| Version: | 1.2 |
| Name: | (Q)SAR Model Reporting Format |
| Author: | European Chemicals Bureau |
| Date: | July 2007 |
| Contact: | Joint Research Centre, European Commission |
| e-mail: | qsardb@jrc.it |
| www: | http://ecb.jrc.ec.europa.eu/qsar/ |
QSAR for acute toxicity to Daphnia magna (LC50)
QSARModel 4.0.4
Molcode Ltd., Turu 2, Tartu, 51014, Estonia
http://www.molcode.com
19.05.2010
Indrek Tulp
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Tarmo Tamm
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Gunnar Karelson
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Dimitar Dobchev
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Dana Martin
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Kaido Tämm
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Deniss Savchenko
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Jaak Jänes
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Eneli Härk
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Andres Kreegipuu
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Mati Karelson
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Molcode model development team
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
12.05.2010
Karelson M, Dobchev D, Tamm T, Tulp I, Jänes J, Tämm K, Lomaka A, Savchenko D & Karelson G (2008). Correlation of blood-brain penetration and human serum albumin binding with theoretical descriptors. ARKIVOC 16, 38-60.
Karelson M, Karelson G, Tamm T, Tulp I, Jänes J, Tämm K, Lomaka A, Savchenko D & Dobchev D (2009). QSAR study of pharmacological permeabilities. ARKIVOC 2, 218–238.
Training and test sets are available.
None to date
Daphnia magna (water flea)
3. Ecotoxic effects. 3.1. Short-term toxicity to Daphnia (immobilisation)
Acute toxicity, 48 h LC50 (median lethal concentration). This is the concentration that immobilizes 50% of the Daphnia in a test batch within 48 h.
mol/L
log(LC50)
Acute toxicity for Daphnia is expressed as the median effective concentration EC50. The concentrations of the substances are given in mol per litre (mol/L). Those animals which are not able to swim within 15 seconds after gentle agitation of the test batch are considered to be immobile.
Some studies use mortality (LC50) and immobilization (EC50) as identical endpoints in the context of daphnid toxicity, as is, for example, reported in the toxicity analysis of parathion that is also included in the presently selected AQUIRE data set [2].
From the U.S. EPA database AQUIRE [2], acute toxicity values (48 h LC50) for Daphnia magna were collected for a total of 380 compounds.
When multiple test values were found for one substance, these values were checked for consistency. If values differed by more than a factor of 30 from the closest one in a group of at least three other references, the aberrant value was discarded so as to remove outliers from the data set. Of all the remaining values for a given substance, the arithmetic mean was taken as the valid experimental value.
From the initial set of 1067 LC50 data, 77 values were excluded as outliers as described above, which led to a set of 349 chemicals with at least one LC50 value per substance. Subsequently, 49 chemicals were excluded because their LC50 values exceeded the predicted water solubility or because they contained metal atoms or were inorganic, leading to the final set of 300 organic compounds that cover a log Kow (octanol/water partition coefficient) range from -2 to 8.
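The consistency check and averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `curate_lc50` is hypothetical, "closest one" is read as the nearest of the other values reported for the same substance, and the fallback of keeping all values when every value would be discarded is an assumption not stated in the source.

```python
def curate_lc50(values, factor=30.0, min_others=3):
    """Return the arithmetic mean of the LC50 values kept for one substance.

    values: positive LC50 concentrations reported by different references.
    """
    values = list(values)
    kept = values
    # The check needs the value itself plus at least `min_others` other references.
    if len(values) - 1 >= min_others:
        kept = []
        for i, v in enumerate(values):
            others = values[:i] + values[i + 1:]
            # Ratio (>= 1) between this value and the closest other value.
            closest_ratio = min(max(v, o) / min(v, o) for o in others)
            if closest_ratio <= factor:
                kept.append(v)
        if not kept:  # assumption: never discard every value of a substance
            kept = values
    return sum(kept) / len(kept)
```

With four reported values of which one is more than 30 times larger than its nearest neighbour, only the three mutually consistent values enter the mean.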
Experimental data from different laboratories were used. As explained above (see 3.6), the average experimental error, which also includes an error caused by interlaboratory differences, could be large. Since the authors do not provide results from interlaboratory calibrations, it is difficult or even impossible to estimate the exact error.
Statistics:
max value: -0.460
min value: -10.1
standard deviation: 1.75
skewness: -0.259
QSAR
QSAR
Multilinear regression QSAR
log(LC50) = -4.904
-2.272*Average Bonding Information content (order 2)
+0.377*HOMO - LUMO energy gap (AM1)
+4.653E-003*HPSA Polar (AM1) part of SASA
-1.240E-002*Molecular weight
+0.256*min(#HA, #HD) (AM1)
Average Bonding Information content (order 2), [unitless]: Information theoretic index showing the complexity of structure
HOMO - LUMO energy gap (AM1), [eV]: Energy difference between highest occupied and lowest unoccupied molecular orbitals
HPSA Polar (AM1) part of SASA, [Å²]: Polar part of solvent accessible surface area
Molecular weight, [g/mol]: Molecular weight
min(#HA, #HD) (AM1), [unitless]: Minimum value of the count of hydrogen-acceptor sites and the count of hydrogen-donor sites
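The regression equation above can be evaluated directly once the five descriptor values are known. The sketch below only restates the reported coefficients; the function and argument names are hypothetical, and the descriptor values themselves would have to come from the descriptor software (they are not computed here).

```python
def predict_log_lc50(abic2, homo_lumo_gap, hpsa_polar, mol_weight, min_ha_hd):
    """Evaluate the reported multilinear model; result is log(LC50), LC50 in mol/L.

    abic2:         Average Bonding Information content (order 2), unitless
    homo_lumo_gap: HOMO - LUMO energy gap (AM1), eV
    hpsa_polar:    HPSA Polar (AM1) part of SASA, square angstroms
    mol_weight:    Molecular weight, g/mol
    min_ha_hd:     min(#HA, #HD) (AM1), unitless
    """
    return (
        -4.904
        - 2.272 * abic2
        + 0.377 * homo_lumo_gap
        + 4.653e-3 * hpsa_polar
        - 1.240e-2 * mol_weight
        + 0.256 * min_ha_hd
    )


# Hypothetical descriptor values, chosen only to illustrate the call.
example = predict_log_lc50(
    abic2=0.5, homo_lumo_gap=10.0, hpsa_polar=100.0, mol_weight=200.0, min_ha_hd=1
)
```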
Initial pool of ~1000 descriptors. Stepwise descriptor selection based on a set of statistical selection rules:
a) one-parameter equations: Fisher criterion and R2 over threshold, variance and t-test value over threshold, intercorrelation with another descriptor not over threshold
b) two-parameter equations: intercorrelation coefficient below threshold, significant correlation with endpoint (in terms of correlation coefficient and t-test).
Stepwise trial of additional descriptors not significantly correlated to any already in the model.
1D, 2D, and 3D theoretical calculations. Quantum chemical descriptors derived from AM1 calculation. Model developed by using multilinear regression.
QSARModel 4.0.4
QSAR/QSPR package that will compute chemically meaningful descriptors and includes statistical tools for regression modeling
Molcode Ltd, Turu 2, Tartu, 51014, Estonia
http://www.molcode.com
38.8 (194 chemicals / 5 descriptors)
Applicability domain based on training set:
a) by chemical identity: Organic compounds (hydrocarbons, aliphatic alcohols, phenols, ethers, and esters; anilines, amines, nitriles, nitroaromatics, amides, and carbamates; urea and thiourea derivatives; isothiocyanates; thiols; phosphorothionate and phosphate esters; and halogenated derivatives)
b) by descriptor value range: The model is suitable for compounds whose descriptors fall in the following minimal-maximal ranges:
Average Bonding Information content (order 2): 0.279 - 0.976
HOMO - LUMO energy gap (AM1): 4.97 - 14.7
HPSA Polar (AM1) part of SASA: 0 - 392
Molecular weight: 44.1 - 505
min(#HA, #HD) (AM1): 0 - 5
Chemicals in the same structural domain as training set (similar functionality)
Range of descriptor values in training set with ±30% confidence. Descriptor values must fall between maximal and minimal descriptor values of training set ±30%.
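A descriptor-range check of this kind might look like the sketch below. The training ranges are those listed in 5.1; the reading of "±30%" as widening each bound by 30% of the training range width is an assumption (the source does not define it precisely), and the names `TRAINING_RANGES` and `outside_domain` are hypothetical.

```python
# Minimal-maximal descriptor values of the training set (from section 5.1).
TRAINING_RANGES = {
    "abic2": (0.279, 0.976),        # Average Bonding Information content (order 2)
    "homo_lumo_gap": (4.97, 14.7),  # HOMO - LUMO energy gap (AM1), eV
    "hpsa_polar": (0.0, 392.0),     # HPSA Polar (AM1) part of SASA
    "mol_weight": (44.1, 505.0),    # Molecular weight, g/mol
    "min_ha_hd": (0.0, 5.0),        # min(#HA, #HD) (AM1)
}


def outside_domain(descriptors, margin=0.30):
    """Return the names of descriptors that fall outside the widened ranges.

    Assumption: each bound is extended by `margin` times the training range width.
    """
    flagged = []
    for name, (low, high) in TRAINING_RANGES.items():
        slack = margin * (high - low)
        if not (low - slack <= descriptors[name] <= high + slack):
            flagged.append(name)
    return flagged
```

An empty list means the compound lies inside the descriptor-range part of the applicability domain; the structural (chemical identity) criterion still has to be checked separately.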
QSARModel 4.0.4
QSAR/QSPR package that will compute chemically meaningful descriptors and includes statistical tools for regression modeling
Molcode Ltd, Turu 2, Tartu, 51014, Estonia
http://www.molcode.com
See 5.1
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:Yes
All
All
Two compounds (2,4,6-trinitro-1,3-benzenediol, CAS:15245-44-0 and mancozeb, CAS:8018-01-7) were eliminated from the original data because they are metal salts and do not fit into the applicability domain.
During the modeling procedure five compounds (paclobutrazol, CAS:76738-62-0; pirimiphos-methyl, CAS:29232-93-7; TEDP, CAS:3689-24-5; 2,4-dichlorophenoxyacetic acid, CAS:94-75-7, and dichlorvos, CAS:62-73-7) were excluded as statistical outliers (their residuals exceeded two standard deviations).
Final training set consisted of 194 data points: 194 negative values; 0 positive values
R2 = 0.741 (Coefficient of determination)
s2 = 0.903 (Standard error of the estimate)
F = 108 (Fisher function)
R2CV = 0.725
R2CVMO = 0.719
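As a consistency check, the reported Fisher value can be recomputed from R2 using the standard F statistic of a multilinear regression with an intercept, taking n = 194 training compounds and k = 5 descriptors. The function name is hypothetical; the formula is the textbook one and is not quoted from the source.

```python
def f_statistic(r2, n, k):
    """F statistic of a multilinear regression with intercept.

    r2: coefficient of determination, n: number of data points, k: number of descriptors.
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Reported training statistics: R2 = 0.741, 194 compounds, 5 descriptors.
f_value = f_statistic(0.741, 194, 5)
```

Rounded to three significant figures, the result agrees with the reported F = 108.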
ABC analysis (2:1 training : prediction) on sorted (in increased order of endpoint value) data divided into 3 subsets (A;B;C). Training set formed with 2/3 of the compounds (set A+B, A+C, B+C) and validation set consisted of 1/3 of the compounds (C, B, A).
average R2 (fitting) = 0.747
average R2 (prediction) = 0.712
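The ABC partitioning can be sketched as below. The source does not state how the sorted data are divided into A, B and C; this sketch assumes that every third compound of the sorted list goes to the same subset. The function name `abc_splits` is hypothetical, and refitting the model on each training part is left out.

```python
def abc_splits(endpoint_values):
    """Return three (held_out_name, training_indices, validation_indices) tuples.

    Compounds are sorted by increasing endpoint value and dealt into subsets
    A, B and C in turn (assumption: interleaved assignment).
    """
    order = sorted(range(len(endpoint_values)), key=lambda i: endpoint_values[i])
    subsets = {"A": order[0::3], "B": order[1::3], "C": order[2::3]}
    splits = []
    # Training on A+B, A+C, B+C; validation on C, B, A respectively.
    for held_out in ("C", "B", "A"):
        training = sorted(
            i for name, members in subsets.items() if name != held_out for i in members
        )
        splits.append((held_out, training, sorted(subsets[held_out])))
    return splits
```

Each split trains on two thirds of the compounds and validates on the remaining third; the averaged fitting and prediction R2 values reported above would be taken over the three splits.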
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:Yes
All
All
One compound was excluded from the test set because its value of the descriptor "min(#HA, #HD) (AM1)" falls outside the applicability domain.
98 data points: 98 negative values; 0 positive values
Every third compound of the sorted data source was assigned to the test set.
R2 = 0.621 (Coefficient of determination)
After excluding that compound, all remaining test set compounds are within the range of the applicability domain:
Average Bonding Information content (order 2): 0.393 - 0.980
HOMO - LUMO energy gap (AM1): 6.53 - 14.2
HPSA Polar (AM1) part of SASA: 0 - 353
Molecular weight: 41.1 - 420
min(#HA, #HD) (AM1): 0 - 6
The validation coefficient of determination (R2) is relatively low but still acceptable, bearing in mind the diversity of the compounds and the possible differences in experimental protocols (see 3.6 and 3.7). The large chemical diversity (complexity) of the test set also affects R2. Investigation of the descriptor value ranges of the test set compounds also reveals that the values are quite often on the edge of the applicability domain.
The toxicity baseline (which is usually modeled by logP) is defined here by a combination of "Molecular weight", "Average Bonding Information content (order 2)", "HPSA Polar (AM1) part of SASA" and "min(#HA, #HD) (AM1)". "Molecular weight" generally reflects the mass and size of the structure; "Average Bonding Information content (order 2)" accounts for the bonding complexity (aromatic, single, double and triple bonds) and also takes heteroatoms into account; "HPSA Polar (AM1) part of SASA" gives the amount of polar surface area; and "min(#HA, #HD) (AM1)" accounts for hydrogen bonding. All these descriptors affect hydrophobicity, and thus the baseline, to a greater or lesser extent. Indirectly they are also related to other modes of action (MOA), such as polar narcosis; for instance, heteroatoms, polar surface area and hydrogen bonding are important factors for different MOA. The "HOMO - LUMO energy gap (AM1)" specifically describes the electronic hardness of molecules and is an important descriptor for capturing deviations from the baseline.
A posteriori mechanistic interpretation.
Interpretation consistent with scientific literature [1,3]
The data were gathered from different sources, and therefore the quality of the data suffers. This means that the error term also includes a component from interlaboratory experimental differences. Thus, very high quality QSAR models cannot be expected.
[1] Von der Ohe PC, Kühne R, Ebert RU, Altenburger R, Liess M & Schüürmann G (2005). Structural Alerts - A New Classification Model to Discriminate Excess Toxicity from Narcotic Effect Levels of Organic Compounds in the Acute Daphnid Assay. Chemical Research in Toxicology 18, 536–555.
[2] U.S. Environmental Protection Agency (2002). AQUIRE (Aquatic Toxicity Information Retrieval Database), National Health and Environmental Effects Research Laboratory, Duluth, MN.
[3] Netzeva IT, Pavan M & Worth AP (2008). Review of (Quantitative) Structure–Activity Relationships for Acute Aquatic Toxicity. QSAR & Combinatorial Science 27, 77–90.
Training data set: Daphnia#2_194_training
Validation data set: Daphnia#2_98_test
Other documents:
Molcode, acute toxicity, Daphnia, immobilisation