| Version: | 1.2 |
| Name: | (Q)SAR Model Reporting Format |
| Author: | European Chemicals Bureau |
| Date: | July 2007 |
| Contact: | Joint Research Centre, European Commission |
| e-mail: | qsardb@jrc.it |
| www: | http://ecb.jrc.ec.europa.eu/qsar/ |
Molcode QSAR for abiotic degradation in air (OH tropospheric degradation of volatile organic compounds)
QSARModel 4.0.4
Molcode Ltd., Turu 2, Tartu, 51014, Estonia
http://www.molcode.com
08.02.2010
Indrek Tulp
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Tarmo Tamm
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Gunnar Karelson
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Dimitar Dobchev
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Dana Martin
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Kaido Tämm
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Deniss Savchenko
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Jaak Jänes
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Eneli Härk
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Andres Kreegipuu
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Mati Karelson
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Molcode model development team
Molcode Ltd. Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
07.02.2010
Software is proprietary but model training and test sets provided. Algorithm provided.
None to date
Not applicable - environmental fate parameter
2.Environmental fate parameters. 2.Persistence: Abiotic degradation in air (Phototransformation). 2.2.b.Indirect photolysis (OH-radical reaction, ozone-radical reaction, other)
Rate constant for OH radical degradation.
The dominant chemical processes for chemicals in the gas phase are their reactions with OH radicals, NO3 radicals, and ozone. The hydroxyl radical is the key reactive species in the troposphere, where it reacts with practically every organic compound.
cm3 s-1 molecule-1
-logK (OH) (original rate constants were transformed into log scale and multiplied by -1 to reduce data range and obtain positive values)
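The transformation described above can be sketched as follows. A base-10 logarithm is assumed here, since the report does not state the base; the function names are illustrative and are not part of QSARModel.

```python
import math

def to_neg_log_k(k):
    """Convert a rate constant (cm3 s-1 molecule-1) to -log10(k)."""
    if k <= 0:
        raise ValueError("rate constant must be positive")
    return -math.log10(k)

def from_neg_log_k(neg_log_k):
    """Convert -log10(k) back to a rate constant (cm3 s-1 molecule-1)."""
    return 10.0 ** (-neg_log_k)

# Endpoint extremes reported in the statistics of this report.
fastest = from_neg_log_k(9.44)   # smallest -logK(OH) = largest rate constant
slowest = from_neg_log_k(15.7)   # largest -logK(OH) = smallest rate constant
```

Under this assumption, a smaller -logK(OH) value corresponds to faster degradation by OH radicals.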
The selected data are for reactions at 25 °C and 1 atm. The gas-phase reaction rate constants of OH radical and organic chemicals have been directly measured.
Original experimental data were collected from ref 3.
Statistics for -logK(OH):
max value: 15.7
min value: 9.44
standard deviation: 1.03
skewness: 1.16
QSAR
Multilinear regression QSAR
Multilinear regression QSAR derived with BMLR (Best Multiple Linear Regression) method
-logK(OH) = 3.61
+2.15*HASA-1/TMSA (AM1) (all)
-0.698*HOMO energy (AM1)
+1.67*Relative number of aromatic bonds
-12.7*HACA-1/TMSA (Zefirov)
HASA-1/TMSA (AM1) (all), [unitless] relative solvent-accessible surface area of H-bonding acceptor atoms (from AM1 calculation)
HOMO energy (AM1), [eV] energy of the highest occupied molecular orbital
Relative number of aromatic bonds, [unitless] relative number of aromatic bonds
HACA-1/TMSA (Zefirov), [unitless] sum of solvent-accessible surface area of H-bonding acceptor atoms, selected by threshold charge
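As an illustration only, the regression equation above can be evaluated with a few lines of Python. The coefficients are those given in this report; the dictionary layout, the function name, and the example descriptor values are hypothetical. Real descriptor values would have to come from the descriptor software (AM1 calculation), which this sketch does not reproduce.

```python
# Coefficients as reported for the four-descriptor model.
INTERCEPT = 3.61
COEFFICIENTS = {
    "HASA-1/TMSA (AM1) (all)": 2.15,
    "HOMO energy (AM1)": -0.698,              # descriptor in eV
    "Relative number of aromatic bonds": 1.67,
    "HACA-1/TMSA (Zefirov)": -12.7,
}

def predict_neg_log_k_oh(descriptors):
    """Return the predicted -logK(OH) for a dict of descriptor values."""
    return INTERCEPT + sum(
        coefficient * descriptors[name]
        for name, coefficient in COEFFICIENTS.items()
    )

# Hypothetical descriptor values, chosen only to show the arithmetic:
# no H-bond acceptor surface, no aromatic bonds, HOMO energy of -10.0 eV.
example = {
    "HASA-1/TMSA (AM1) (all)": 0.0,
    "HOMO energy (AM1)": -10.0,
    "Relative number of aromatic bonds": 0.0,
    "HACA-1/TMSA (Zefirov)": 0.0,
}
prediction = predict_neg_log_k_oh(example)  # 3.61 + 0.698 * 10.0 = 10.59
```

The negative coefficient on the HOMO energy means that a higher (less negative) HOMO energy lowers the predicted -logK(OH), that is, it predicts a faster reaction.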
Initial pool of ~1000 descriptors for each structure calculated. Stepwise descriptor selection was applied to reduce the pool based on a set of statistical selection rules.
For one-parameter equations: Fisher criterion and R2 over threshold, variance and t-test value over threshold, intercorrelation with another descriptor not over threshold.
Two-parameter correlations were developed from the previously reduced pool, with the following statistical selection applied: intercorrelation coefficient below threshold and significant correlation with the endpoint in terms of correlation coefficient and t-test. This was followed by a stepwise trial of additional descriptors not significantly correlated with any already in the model. See refs 1-2.
1D, 2D, and 3D theoretical calculations. Descriptors derived from mol files. Quantum chemical descriptors from AM1 calculations. Model developed by multilinear regression with ordinary least squares.
QSARModel 4.0.4
QSAR/QSPR package that will compute chemically meaningful descriptors and includes statistical tools for regression modeling
Molcode Ltd, Turu 2, Tartu, 51014, Estonia
http://www.molcode.com
53 (212 chemicals / 4 descriptors)
Applicability domain based on training set:
a) by chemical identity: Diverse set of Volatile Organic Compounds (aliphatic and aromatic hydrocarbons, alcohols, amines, halogenated compounds, etc.)
b) by descriptor value range: The model is suitable for compounds that have the descriptors in the following minimal-maximal range:
HASA-1/TMSA (AM1) (all): 0 - 0.911
HOMO energy (AM1): -13.3 - -8.10
Relative number of aromatic bonds: 0 - 0.615
HACA-1/TMSA (Zefirov): 0 - 0.0587
By chemical identity - compounds must be similar to training set compounds in terms of functionality.
By descriptor value range: range of descriptor values similar to training set with ±30% confidence. Descriptor values must fall between maximal and minimal descriptor values of training set ±30%.
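A minimal sketch of the descriptor-range check follows. The report does not say how the ±30% margin is computed; this sketch assumes that each training-set bound is widened by 30% of the training range width, which is only one possible reading. The function name and data layout are illustrative.

```python
# Minimal-maximal descriptor ranges of the training set, as listed in this report.
TRAINING_RANGES = {
    "HASA-1/TMSA (AM1) (all)": (0.0, 0.911),
    "HOMO energy (AM1)": (-13.3, -8.10),
    "Relative number of aromatic bonds": (0.0, 0.615),
    "HACA-1/TMSA (Zefirov)": (0.0, 0.0587),
}

def descriptors_outside_domain(descriptors, ranges=TRAINING_RANGES, tolerance=0.30):
    """Return the names of descriptors that fall outside the widened range.

    Assumption: the margin is `tolerance` times the width of the training range.
    """
    outside = []
    for name, (low, high) in ranges.items():
        margin = tolerance * (high - low)
        if not (low - margin <= descriptors[name] <= high + margin):
            outside.append(name)
    return outside

# Hypothetical compound with a HOMO energy well above the training range.
candidate = {
    "HASA-1/TMSA (AM1) (all)": 0.20,
    "HOMO energy (AM1)": -6.0,
    "Relative number of aromatic bonds": 0.0,
    "HACA-1/TMSA (Zefirov)": 0.01,
}
flagged = descriptors_outside_domain(candidate)
```

An empty result means only that the descriptor-range criterion is met; the chemical-identity criterion still has to be judged separately.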
QSARModel 4.0.4
QSAR/QSPR package that will compute chemically meaningful descriptors and includes statistical tools for regression modeling
Molcode Ltd, Turu 2, Tartu, 51014, Estonia
http://www.molcode.com
See 5.1
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:Yes
All
All
212 data points: 0 negative values; 212 positive values
The original source dataset was split into training and test sets. The 423 original values were sorted by endpoint value, and every second compound was assigned to the test set.
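The splitting procedure can be sketched as below. The records here are synthetic placeholders, not the actual data, and the sketch assumes that the first compound in sorted order goes to the training set, which reproduces the reported set sizes of 212 and 211.

```python
def split_every_second(records, key):
    """Sort records by endpoint value and assign every second one to the test set."""
    ordered = sorted(records, key=key)
    training = ordered[0::2]   # 1st, 3rd, 5th, ... in sorted order
    test = ordered[1::2]       # 2nd, 4th, 6th, ... in sorted order
    return training, test

# Synthetic stand-in for the 423 source values (hypothetical endpoint values).
records = [{"id": i, "neg_log_k": 9.44 + 0.01 * i} for i in range(423)]
training_set, test_set = split_every_second(records, key=lambda r: r["neg_log_k"])
```

Because the data are sorted before splitting, both sets span nearly the same endpoint range.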
No more than specified in 3.5
R2 = 0.832 (Squared correlation coefficient)
s2 = 0.427 (Standard error of the estimate)
F = 256.8 (Fisher function)
R2CV = 0.821
R2CVMO = 0.819 (80% : 20% , training : testing)
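As a plausibility check, the Fisher value can be recomputed from R2, the number of training compounds, and the number of descriptors. This assumes the standard overall F statistic for an ordinary least squares model with an intercept; the report does not spell out the formula used by the software.

```python
def overall_f_statistic(r_squared, n_samples, n_descriptors):
    """Overall F statistic of an OLS regression with an intercept."""
    explained = r_squared / n_descriptors
    residual = (1.0 - r_squared) / (n_samples - n_descriptors - 1)
    return explained / residual

# Values reported for the training set: R2 = 0.832, 212 compounds, 4 descriptors.
f_value = overall_f_statistic(0.832, 212, 4)
```

With the rounded R2 this gives a value of about 256, close to the reported 256.8; the small difference is within what rounding R2 to three digits can explain.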
ABC analysis (2:1 training : prediction) on data sorted in increasing order of endpoint value and divided into 3 subsets (A, B, C). Each training set comprised 2/3 of the compounds (sets A+B, A+C, B+C) and the corresponding validation set comprised the remaining 1/3 (C, B, A).
average R2 (fitting) = 0.833
average R2 (prediction) = 0.824
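The partitioning used in the ABC analysis can be sketched as follows. The report does not state whether the three subsets are contiguous blocks or interleaved; this sketch assumes interleaving (every third compound of the sorted data goes to the same subset). Model fitting is left out, and the function name is illustrative.

```python
def abc_partitions(records, key):
    """Yield (training, validation) pairs for the three ABC rounds.

    Assumption: subsets A, B, C are formed by taking every third record
    of the data sorted by endpoint value.
    """
    ordered = sorted(records, key=key)
    subsets = [ordered[i::3] for i in range(3)]          # A, B, C
    for held_out in (2, 1, 0):                           # validate on C, B, A
        training = [
            record
            for index, subset in enumerate(subsets)
            if index != held_out
            for record in subset
        ]
        yield training, subsets[held_out]

# Synthetic endpoint values standing in for the 212 training compounds.
values = [9.44 + 0.03 * i for i in range(212)]
rounds = list(abc_partitions(values, key=lambda v: v))
```

In each round a model would be fitted on the training part and R2 computed on both parts; the averages over the three rounds correspond to the fitting and prediction R2 values reported here.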
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:Yes
All
All
211 data points: 0 negative values; 211 positive values
The original source dataset was split into training and test sets. The original source data were sorted by endpoint value, and every second compound was assigned to the test set.
R2 = 0.773 (Coefficient of determination)
All are in range of applicability domain:
HASA-1/TMSA (AM1) (all): 0 - 0.942
HOMO energy (AM1): -13.1 - -8.27
Relative number of aromatic bonds: 0 - 0.579
HACA-1/TMSA (Zefirov): 0 - 0.0561
The validation coefficient of determination (R2) is significant and close to the coefficients derived by internal validation (R2CV and R2CVMO).
The descriptors "HASA-1/TMSA (AM1) (all)" and "HACA-1/TMSA (Zefirov)" simultaneously take into account the hydrogen-bond acceptor capability and the size of the compound. Although the descriptors appear similar, they capture different aspects of hydrogen-bond acceptor ability: "HASA-1/TMSA (AM1) (all)" counts the solvent-accessible surface area of all possible hydrogen acceptor atoms, while "HACA-1/TMSA (Zefirov)" counts only charged areas. "HOMO energy (AM1)" is an indicator of the nucleophilicity of the molecule: reactive molecules have a relatively higher HOMO energy. "Relative number of aromatic bonds" is a measure of aromaticity that differentiates aromatic compounds from aliphatic ones. All presented descriptors represent important molecular properties related to H-abstraction.
A posteriori mechanistic interpretation, consistent with published scientific interpretations of experiments (in ref. 4, HOMO energy and aromatic carbon were found to be important).
Similar methodology to current approach was used in refs 1-2
Karelson M, Dobchev D, Tamm T, Tulp I, Jänes J, Tämm K, Lomaka A, Savchenko D & Karelson G (2008). Correlation of blood-brain penetration and human serum albumin binding with theoretical descriptors. ARKIVOC 16, 38-60.
Karelson M, Karelson G, Tamm T, Tulp I, Jänes J, Tämm K, Lomaka A, Savchenko D & Dobchev D (2009). QSAR study of pharmacological permeabilities. ARKIVOC 2, 218–238.
Atkinson R (1989). Kinetics and mechanisms of the gas-phase reactions of the hydroxyl radical with organic compounds. Journal of Physical and Chemical Reference Data, Monograph 1.
Gramatica P, Pilutti P & Papa E (2004). Validated QSAR Prediction of OH Tropospheric Degradation of VOCs: Splitting into Training-Test Sets and Consensus Modeling. Journal of Chemical Information and Computer Sciences 44, 1794–1802.
Training data set: Photolysis_#1_212_training
Validation data set: Photolysis_# 1_211_testset
Other documents
Molcode, abiotic degradation in air, OH tropospheric degradation, volatile organic compounds