| Version: | 1.2 |
| Name: | (Q)SAR Model Reporting Format |
| Author: | European Chemicals Bureau |
| Date: | July 2007 |
| Contact: | Joint Research Centre, European Commission |
| e-mail: | qsardb@jrc.it |
| www: | http://ecb.jrc.it/QSAR/ |
QSAR for acute toxicity to Pimephales promelas (Fathead Minnow)
MOBY DIGS
Software for multilinear regression analysis and variable subset selection by Genetic Algorithm, ver. 1.0 beta for Windows, 2004
Todeschini Roberto, Talete srl, Milan (Italy)
29/05/2009
Ester Papa
University of Insubria
ester.papa@uninsubria.it
Fulvio Villa
University of Insubria
fulvio.villa@uninsubria.it
Paola Gramatica
University of Insubria
paola.gramatica@uninsubria.it
Ester Papa
University of Insubria
ester.papa@uninsubria.it
Fulvio Villa
University of Insubria
fulvio.villa@uninsubria.it
Paola Gramatica
University of Insubria
paola.gramatica@uninsubria.it
2005
Papa E, Villa F & Gramatica P (2005). Statistically Validated QSARs, Based on Theoretical Descriptors, for Modeling Aquatic Toxicity of Organic Chemicals in Pimephales promelas (Fathead Minnow). Journal of Chemical Information and Modeling 45, 1256-1266.
This model is not proprietary, training and test sets are available
no information available
Pimephales promelas (Fathead Minnow)
3. Ecotoxic effects; 3.3. Acute toxicity to fish (lethality)
Flow-through bioassays, conducted with juvenile fathead minnows.
The median lethal concentrations are reported as the logarithm of the inverse molar concentration: log(1/LC50).
log(1/LC50), with LC50 in mol/L
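The log(1/LC50) transformation can be sketched as follows. This assumes the raw LC50 is given in mg/L, which is typical of the Duluth database but is not stated in this record, and that the molecular weight is in g/mol. The function name `log_inverse_molar` is hypothetical.

```python
import math

def log_inverse_molar(lc50_mg_per_l: float, mol_weight_g_per_mol: float) -> float:
    """Convert an LC50 in mg/L (assumed unit) to log(1/LC50) with LC50 in mol/L."""
    if lc50_mg_per_l <= 0 or mol_weight_g_per_mol <= 0:
        raise ValueError("LC50 and molecular weight must be positive")
    # mg/L -> g/L -> mol/L
    lc50_mol_per_l = (lc50_mg_per_l / 1000.0) / mol_weight_g_per_mol
    return -math.log10(lc50_mol_per_l)

# Made-up example: 1 mg/L of a 100 g/mol chemical is 1e-5 mol/L
print(round(log_inverse_molar(1.0, 100.0), 6))  # 5.0
```

Larger values of log(1/LC50) mean that a lower molar concentration kills half of the fish, i.e. the chemical is more toxic.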
Experimentally determined 96-h LC50 values for 468 industrial organic chemicals were collected from Russom et al. (1997) (original source: U.S. EPA Duluth Fathead Minnow Database). The data refer to flow-through bioassays, conducted with juvenile fathead minnows, on chemicals selected from a cross-section of the Toxic Substances Control Act Inventory of industrial organic chemicals.
A detailed analysis of the quality of the data in the Duluth Fathead Minnow Database was carried out by Russom et al. (1997).
QSAR
multilinear regression QSAR
Log (1/LC50)96h = -2.54 (±0.4) + 0.91 (±0.06) WA + 6.2 (±0.6) Mv + 0.08 (±0.01) H-046 + 0.22 (±0.03) nCb - 0.19 (±0.04) MAXDP - 0.33 (±0.06) nN
WA, topological descriptor: mean Wiener index
Mv, constitutional descriptor: mean atomic van der Waals volume
H-046, H attached to C-O sp3
nCb-, number of C sp2 in substituted benzenes
MAXDP, topological descriptor: maximal electrotopological positive variation
nN, number of nitrogen atoms
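As a sketch, the published equation can be applied to a chemical whose six descriptor values are already known. In practice those values would come from DRAGON. Only the central coefficient estimates are used; the ± uncertainties are dropped. Because the coefficients are rounded, results may differ slightly from the authors' own software. The dictionary keys follow the descriptor names in the equation, and the function name is hypothetical.

```python
# Coefficients as printed in the equation (the +/- uncertainties are dropped)
INTERCEPT = -2.54
COEFFICIENTS = {"WA": 0.91, "Mv": 6.2, "H-046": 0.08, "nCb": 0.22, "MAXDP": -0.19, "nN": -0.33}

def predict_log_inv_lc50(descriptors):
    """Return the predicted log(1/LC50)96h for a dict of descriptor values."""
    missing = sorted(set(COEFFICIENTS) - set(descriptors))
    if missing:
        raise KeyError(f"missing descriptors: {missing}")
    return INTERCEPT + sum(coef * descriptors[name] for name, coef in COEFFICIENTS.items())

# Hypothetical descriptor values, not those of a real chemical
example = {"WA": 1.0, "Mv": 0.5, "H-046": 2, "nCb": 6, "MAXDP": 1.0, "nN": 1}
print(round(predict_log_inv_lc50(example), 2))  # 2.43
```

A prediction is only meaningful for chemicals inside the model's applicability domain, which is checked with the leverage approach described in this record.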
A total of 1200 molecular descriptors of different kinds (0D, 1D, 2D and 3D) were calculated with the DRAGON software to describe the chemical diversity of the compounds. Constant descriptors and pairwise-correlated descriptors were excluded in a pre-reduction step: whenever two descriptors were more than 98% pairwise correlated, one of them was deleted. Quantum-chemical descriptors were then added to the DRAGON descriptors. These were HOMO (highest occupied molecular orbital), LUMO (lowest unoccupied molecular orbital), HOMO-LUMO gap (DHL), ionisation potential (P ion) and heat of formation (H). The Genetic Algorithm (GA) was applied to the final set of 400 descriptors for variable selection.
Multiple linear regression (Ordinary Least Squares method) was applied to generate the model.
Molecular descriptors were generated by the DRAGON software. The input files for descriptor calculation contain information on atom and bond types, connectivity, partial charges and the atomic spatial coordinates of the minimum-energy conformation of each molecule. This conformation was obtained with Allinger's molecular mechanics method (MM+) in the HYPERCHEM package.
Quantum-chemical descriptors were calculated with the semiempirical PM3 Hamiltonian, using the geometry optimization method available in the HYPERCHEM package.
DRAGON - 2005, version 5.2 for Windows
Software for the calculation of molecular descriptors
R. Todeschini, Talete s.r.l. Milano
HYPERCHEM - ver. 7.03
Software for molecular drawing and conformational energy optimization
249 chemicals / 6 descriptors = 41.5
Structural Applicability Domain - high leverage chemicals (training set):
nicotine sulfate, 2,2'-methylenebis(3,4,6-trichlorophenol), hexachloro-1,3-butadiene, pentachloropyridine, rotenone and 2,6-di-tert-butyl-4-methylphenol.
Response domain - response outliers (training set):
chloroacetonitrile.
Structural Applicability Domain - high leverage chemicals (validation set):
tetrachloroethylene, hexachloroethane, 2,4,5-tribromoimidazole, 3-amino-5,6-dimethyl-1,2,4-triazine, caffeine and pentabromophenol.
Most of the chemicals falling outside the AD of the model belong to the class of specifically acting compounds.
The structural AD of the model was checked with the leverage approach, and the Williams plot was used to identify two kinds of compounds:
- response outliers: compounds with cross-validated standardized residuals greater than 2.5 standard deviation units;
- structurally very influential chemicals in determining the model parameters: compounds whose leverage value h exceeds the warning leverage h* = 3p'/n, where p' is the number of model variables plus one and n is the number of objects used to calculate the model.
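The leverage check can be sketched as follows. The hat value of a chemical with descriptor row x (plus a leading 1 for the intercept) is h = xᵀ(XᵀX)⁻¹x, where X is the training design matrix, and the warning leverage is h* = 3p'/n. The function names are hypothetical. The standardized residuals are assumed to be supplied already (for example from leave-one-out cross-validation), because their exact computation is not detailed here.

```python
def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def hat_values(X_train, X_query=None):
    """h = x^T (X^T X)^-1 x using the training X; an intercept column is prepended."""
    train = [[1.0, *x] for x in X_train]
    query = train if X_query is None else [[1.0, *x] for x in X_query]
    p = len(train[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in train) for j in range(p)] for i in range(p)])
    return [sum(r[i] * xtx_inv[i][j] * r[j] for i in range(p) for j in range(p)) for r in query]

def warning_leverage(n_descriptors, n_training):
    """h* = 3p'/n with p' = number of model variables + 1."""
    return 3 * (n_descriptors + 1) / n_training

def williams_flags(h, std_residuals, h_star, residual_cutoff=2.5):
    """Return (high_leverage, response_outlier) for each compound."""
    return [(hi > h_star, abs(ri) > residual_cutoff) for hi, ri in zip(h, std_residuals)]

# 6 descriptors, 249 training chemicals: h* = 3 * 7 / 249
print(round(warning_leverage(6, 249), 3))  # 0.084
```

With p' = 7 and n = 249, this arithmetic reproduces the 0.084 cut-off quoted for high-leverage compounds. For validation chemicals, `X_query` lets their hat values be computed against the training-set XᵀX, which is how leverage is usually applied to new compounds.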
MOBY DIGS - MOdels BY Descriptors In Genetic Selection - ver.1 beta for Windows, Talete S.r.l., Milan, Italy, 2004.
Calculation of hat values and of calculated and predicted values
Excel
Calculation of standardised residuals (to identify compounds with cross-validated standardized residuals greater than 2.5 standard deviation units)
High leverage compounds: hat value > 0.084
Outliers for the response: standardised residuals > 2.5 standard deviation units
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:No
Formula:No
All
All
249 compounds
Transformation of LC50 into Log1/LC50
n = 249; R2 = 0.79; SDEP = 0.613; SDEC = 0.595; RMSE (training set) = 0.38
KX = 34.81; KXY = 39.94
Q2LOO = 0.78
Q2LMO(50%) = 0.77
R2Y-SC = 0.024
Q2BOOT = 0.78
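The internal validation statistics can be illustrated with the following sketch, which assumes the usual definitions:
- Q2LOO = 1 − PRESS/TSS, where PRESS is built from leave-one-out predictions;
- SDEC = √(RSS/n);
- SDEP = √(PRESS/n);
- R2Y-SC is the mean R² of models refitted on randomly permuted responses (Y-scrambling).

The function names, the number of scrambling rounds and the toy data are all hypothetical. LMO and bootstrap variants are not shown.

```python
import math
import random

def ols_fit(X, y):
    """OLS with intercept via the normal equations (Gauss-Jordan elimination)."""
    rows = [[1.0, *x] for x in X]
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular X^T X")
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def internal_validation(X, y):
    """Return R2, Q2LOO, SDEC and SDEP for an OLS model on (X, y)."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    coef = ols_fit(X, y)
    rss = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    press = 0.0
    for i in range(n):  # leave-one-out: refit without object i, then predict it
        loo = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(loo, X[i])) ** 2
    return {"R2": 1 - rss / tss, "Q2LOO": 1 - press / tss,
            "SDEC": math.sqrt(rss / n), "SDEP": math.sqrt(press / n)}

def y_scrambling_r2(X, y, n_rounds=100, seed=0):
    """Mean R2 of models fitted to randomly permuted responses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rounds):
        ys = y[:]
        rng.shuffle(ys)
        coef = ols_fit(X, ys)
        mean_s = sum(ys) / len(ys)
        rss = sum((v - predict(coef, xi)) ** 2 for xi, v in zip(X, ys))
        total += 1 - rss / sum((v - mean_s) ** 2 for v in ys)
    return total / n_rounds

X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
stats = internal_validation(X, y)
print(stats, y_scrambling_r2(X, y))
```

Because each leave-one-out residual is at least as large as the corresponding fitted residual, Q2LOO ≤ R2 and SDEP ≥ SDEC always hold. A low scrambled R2 compared with the real R2 indicates that the model is not the result of chance correlation.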
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:No
Formula:No
All
All
200 compounds
The original data set was split into a training set of 249 chemicals, representative of the entire data set, and a validation set of 200 chemicals (an approximately 50% split). The split was made with a Kohonen artificial neural network (K-ANN) using the software KOALA. The three most significant principal components, calculated from each group of DRAGON molecular descriptors, were used to summarize the structural information of the chemicals. This structural information and the response were used as variables to organize the structure of a Kohonen map. At the end of the network training, similar chemicals fell within the same neuron, i.e. they carried the same information. To select the training set, the compound closest to each neuron centroid was assumed to be the most representative of all the chemicals within that neuron. Thus, the training set chemicals were selected according to the minimal distance from the centroid of each cell in the top map. The remaining objects, which are close to the training set chemicals, were used as the validation set.
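The following is a rough pure-Python sketch of the selection rule: one training chemical per occupied neuron, namely the one nearest that cell's centroid. It uses a small self-organizing (Kohonen) map written from scratch, not KOALA. The map size, learning-rate and radius schedules, and the random initialisation in [0, 1) are all assumptions, so the inputs should be range-scaled. "Centroid" is taken here to be the mean of the chemicals in the cell. All function names are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rectangular Kohonen map; returns {(i, j): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(rows) for j in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):    # present objects in random order
            bmu = min(weights, key=lambda k: sq_dist(weights[k], x))
            for node in list(weights):
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                g = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                weights[node] = [w + lr * g * (xi - w) for w, xi in zip(weights[node], x)]
    return weights

def som_split(data, rows, cols, **som_kwargs):
    """Training set: the object nearest each occupied cell's centroid; the rest is validation."""
    weights = train_som(data, rows, cols, **som_kwargs)
    cells = {}
    for idx, x in enumerate(data):
        bmu = min(weights, key=lambda k: sq_dist(weights[k], x))
        cells.setdefault(bmu, []).append(idx)
    training = []
    for members in cells.values():
        centroid = [sum(data[i][d] for i in members) / len(members) for d in range(len(data[0]))]
        training.append(min(members, key=lambda i: sq_dist(data[i], centroid)))
    training.sort()
    chosen = set(training)
    validation = [i for i in range(len(data)) if i not in chosen]
    return training, validation

rng = random.Random(1)
points = [[rng.random() for _ in range(3)] for _ in range(30)]  # toy scaled inputs
train_idx, valid_idx = som_split(points, 3, 3)
print(len(train_idx), len(valid_idx))
```

With this rule, the training-set size equals the number of occupied neurons. The map must therefore be large enough (here it would need at least 249 occupied cells) to reach the reported split.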
Q2 EXT = 0.71; RMSE (validation set) = 0.64
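The external statistics can be sketched as follows. The formula for Q2 EXT is not given in this record, so the sketch assumes the common form that uses the training-set mean in the denominator: Q2 EXT = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ_TR)², summed over the validation chemicals. RMSE is taken as √(Σ(yᵢ − ŷᵢ)²/n_ext). The function name is hypothetical.

```python
import math

def external_metrics(y_train, y_ext, y_ext_pred):
    """Return (Q2_EXT, RMSE) for validation-set predictions.

    Assumes Q2_EXT uses the training-set mean in the denominator.
    """
    mean_train = sum(y_train) / len(y_train)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_ext, y_ext_pred))
    ss = sum((obs - mean_train) ** 2 for obs in y_ext)
    return 1 - press / ss, math.sqrt(press / len(y_ext))

# Toy numbers: predicting the training mean for every validation chemical gives Q2_EXT = 0
print(external_metrics([0.0, 2.0], [0.0, 2.0], [1.0, 1.0]))  # (0.0, 1.0)
```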
The use of Kohonen-map artificial neural networks allowed the selection of a large, structurally representative validation set.
The model was developed by a statistical approach. No mechanistic basis was defined a priori.
A posteriori mechanistic interpretation. The theoretical descriptors selected in this model combine global structural features that can represent the high structural heterogeneity of the training and test sets:
- WA (topological descriptor representing the mean Wiener index);
- Mv (mean atomic van der Waals volume);
- nCb- (number of C sp2 in substituted benzenes);
- H-046 (H attached to C-O sp3);
- MAXDP (maximal electrotopological positive variation);
- nN (number of nitrogen atoms).
Information on dimensional features is condensed in WA and Mv, and information on the electronic distribution in MAXDP. The counters (nN, nCb- and H-046) are mainly needed to model particular chemicals in the data set.
No significant difference in model performance was found with or without the inclusion of log P.
Russom CL, Bradbury SP, Broderius SJ, Drummond RA, & Hammermeister DE (1997). Predicting modes of toxic action from chemical structure: Acute toxicity in the fathead minnow (Pimephales promelas). Environmental Toxicology and Chemistry 16, 948-967.
US EPA. Duluth Fathead Minnow Database
Training data set: PimephalesTrainingSet.sdf
Validation data set: PimephalesValidationSet.sdf
Q2-17-11-126
2009/12/04
QSAR, fathead minnow, Pimephales promelas, acute toxicity