| Version: | 1.2 |
| Name: | (Q)SAR Model Reporting Format |
| Author: | Joint Research Centre, European Commission |
| Date: | July 2007 |
| Contact: | Joint Research Centre, European Commission |
| e-mail: | qsardb@jrc.it |
| www: | http://ecb.jrc.ec.europa.eu/qsar/ |
Nonlinear QSAR: artificial neural network for in vitro chromosomal aberration
QSARModel 3.3.8
The software was used to calculate the molecular descriptors
Turu 2, Tartu, 51014, Estonia
http://www.molcode.com
Statistica 7
The software was used to build the ANN models
StatSoft Ltd.
statsoft.com
4.06.2010
Dimitar Dobchev
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Tarmo Tamm
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Gunnar Karelson
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Indrek Tulp
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Dana Martin
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Kaido Tämm
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Deniss Savchenko
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Jaak Jänes
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Eneli Härk
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Andres Kreegipuu
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Mati Karelson
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Molcode model development team
Molcode Ltd.
Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Molcode model development team
Molcode Ltd
Turu 2, Tartu, 51014, Estonia
models@molcode.com
www.molcode.com
12.04.2010
Karelson M, Karelson G, Tamm T, Tulp I, Jänes J, Tämm K, Lomaka A, Savchenko D & Dobchev D (2009). QSAR study of pharmacological permeabilities. Arkivoc 2, 218-238.
Training, selection and test sets are available. Model algorithm is available (snn file).
None to date.
Chinese Hamster Lung Cells
4. Human health effects. 4.10. Mutagenicity
Chromosomal Aberration Index (indicated as CA: +1 and -1)
Description of the in vitro chromosome aberration test:
The test system and its purpose are described in OECD Guideline for the Testing of chemicals, No. 473 (1).
“The purpose of the in vitro chromosome aberration test is to identify agents that cause structural chromosome aberrations in cultured mammalian cells. Structural aberrations may be of two types, chromosome or chromatid. With the majority of chemical mutagens, induced aberrations are of the chromatid type, but chromosome-type aberrations also occur. An increase in polyploidy may indicate that a chemical has the potential to induce numerical aberrations. However, this guideline is not designed to measure numerical aberrations and is not routinely used for that purpose. Chromosome mutations and related events are the cause of many human genetic diseases and there is substantial evidence that chromosome mutations and related events causing alterations in oncogenes and tumour suppressor genes of somatic cells are involved in cancer induction in humans and experimental animals.”
Unitless, binary property
Chromosome aberration (CA) values -1 (NEG) or 1 (POS). No preprocessing of the original data has been done for this model.
All tests were performed using a Chinese Hamster Lung Cell (CHL) fibroblast cell line, which has been kept as a single cell sub-clone since 1973. This cell line has been used almost exclusively in Japan to test hundreds of chemicals over more than two decades, as opposed to the Chinese Hamster Ovary (CHO) cell lines that are more common in Europe and the United States. Much of the test information has been published in numerous scientific articles during the years over which it has been generated. An example is provided by Ishidate et al. (4).
The test data used in this model were taken from a single source, the Data Book of Chromosomal Aberration Test In Vitro [ref 2, sect 9.2]. This book is written in Japanese, but all tables are in English and the authors were provided with English translations for everything except the Introduction. The Introduction is identical to that used in the previous version of the book, published in English by Dr. Motoi Ishidate [ref 3, sect 9.2], which was also available to the authors.
Test results for a total of 901 substances are presented in the Data Book [ref 2, sect 9.2]. The chemicals were chosen for a variety of reasons, including use in foods. A number fall into the class commonly referred to as UVCBs, that is, chemicals that cannot be represented by a complete structure diagram and specific molecular formula. These were excluded because it is impossible to model a chemical for which a structure is not available. However, deciding which substances fall into this class was not always unambiguous, so the authors made the best judgement they could. Inorganic chemicals were also excluded, as the modeling platform used by the authors cannot deal with them. A very small number of chemicals were excluded because their true identity was not clear (inconsistencies between chemical name, CAS number and structure/molecular weight that the authors were unable to resolve). A few stereoisomers with conflicting results were also removed, as they cannot be distinguished by SMILES notation (a computer code for 2D structures).
A toxicological decision was made to include chemicals as being positive if they were active in inducing either aberrations or polyploidy. While the current test guideline does not specify testing for a length of time, which would allow polyploidy to be assessed, much of the CHL data does and the information was felt to be too valuable to lose (18 chemicals). Chemicals were also retained even if the test had not been performed both in the presence and absence of metabolic activation.
Beyond this, the judgement of the authors was used in their interpretation of the final test result. This included dropping 16 of 18 chemicals that the authors considered inconclusive in repeat tests (two were kept because while they were inconclusive for polyploidy, they were clearly positive for structural aberrations).
Seventy-eight chemicals were excluded because the authors considered them false positives (active only at doses above 10 mM, where effects could be due to osmotic pressure).
As the modeling system was not able to handle salts (e.g. sodium salts, hydrochlorides), further interpretation was necessary. In the majority of cases there was no conflict between the results of testing ionised and non-ionised forms, but in certain cases there was. For some simple organic acids that were active but whose salt was clearly inactive, the authors decided to consider the acid inactive, in accordance with the advice given in the OECD Guidelines and by Morita et al. (5) that particularly low pH may lead to false positive results. It is not known if this decision is right or wrong in relation to the use of results of this in vitro system for predicting in vivo effects, but it will clearly affect the performance of the model.
A few decisions were made on the basis of additional data from the literature. Vitamin B2 (riboflavin, CAS 83-88-5) tested positive in insoluble form but negative in soluble form. The negative result was retained, as the mechanism for the insoluble compound appears to be physical [ref 6, sect 9.2]. After some consideration, saccharin (CAS 81-07-2) and EDTA (CAS 60-00-4) were entered as negatives, in agreement with Ashby et al. [ref 7, sect 9.2], even though there was conflicting information for some of the salts.
Finally, about 40 chemicals having only equivocal results were excluded. This is also an arbitrary decision, but it was felt that equivocal results were not likely to lead to a better training set.
Thus, a total of 513 chemicals remained. Their identities and SMILES notations are available in Training_set.doc. There were 263 positive and 250 negative substances in the training set, giving the nearly 50:50 split considered ideal for modeling purposes.
For external validation, the authors used data generated over a six-year period (1991-1996) from chromosomal aberration testing of high production volume (HPV) industrial chemicals. The tests had been conducted using Chinese hamster lung (CHL/IU) cells according to the OECD HPV testing program and the national program in Japan [Kusakabe et al., ref 8, sect 9.2].
Of a total of 98 substances, two were removed in the authors’ analyses: dicyclopentadiene (CAS 77-73-6), because it was already in the training set, and Pigment Green No. 7 (CAS 14832-14-5), a copper complex that cannot be modeled in the selected system. The 98 chemicals are available in Validation_set.doc. On further examination of the data set, it was noticed that one substance (4-(1-methylpropyl)phenol, CAS 99-71-8) was actually a false positive (active only at very high concentration, and ultimately judged inactive following an in vitro micronucleus test). Eight additional chemicals were identified in which chromosomal aberrations were induced under non-physiological culture conditions (pH<6), which should be kept in mind when using the data.
Neural network
Neural network
Standard Backpropagation Neural Network (Multilayer Perceptron) classification
The algorithm is based on a neural network predictor with structure 9-9-8-1. It is available as an snn file.
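As an illustration only, the 9-9-8-1 structure can be read as nine descriptor inputs, two hidden layers of nine and eight units, and one output unit. The sketch below is a minimal forward pass under stated assumptions: the weights are random placeholders (the fitted weights exist only in the snn file), every layer is assumed to use the hyperbolic tangent, and the output is assumed to be thresholded at zero to give the CA labels +1 and -1. The names `init_network`, `forward`, `classify` and `count_parameters` are hypothetical.

```python
import math
import random

LAYER_SIZES = [9, 9, 8, 1]  # inputs, hidden layer 1, hidden layer 2, output


def init_network(sizes, seed=0):
    """Build placeholder weights; these are NOT the fitted model weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate one 9-descriptor vector through the network (tanh assumed)."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]


def classify(output):
    """Map the continuous output to a CA label (threshold at 0 is assumed)."""
    return 1 if output >= 0.0 else -1


def count_parameters(layers):
    """Total number of adjustable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = init_network(LAYER_SIZES)
score = forward(network, [0.0] * 9)
label = classify(score)
n_params = count_parameters(network)
```

Under these assumptions a 9-9-8-1 network has 179 adjustable parameters (90 + 80 + 9), which is worth setting against the number of training compounds when judging the fit statistics.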
Square root of Partial Surface Area of H atoms,
Partial Surface Area of H atoms,
HOMO - LUMO energy gap (AM1),
No. of occupied electronic levels (AM1) / # atoms,
WFOSA Atomic charge (Zefirov) weighted FOSA,
Highest exchange energy (AM1) for C - C bonds,
Number of H atoms,
DPSA1 Difference in CPSAs (PPSA1-PNSA1) (AM1),
Max Sigma-Sigma bond order (AM1),
An initial pool of roughly 1000 descriptors (~1075) was calculated. Descriptors were chosen by stepwise forward selection based on a set of statistical selection rules: the F statistic and the associated p value of the F distribution. The nine descriptors with the highest F (lowest p) were selected from the whole pool and used as inputs to the network. Twelve networks with different structures were tested in order to find the ANN with the lowest RMS (root-mean-squared) error and the highest number of correct predictions for the training, selection and test sets. The final network, with the architecture given in 4.2, was then trained for 1998 epochs. The weights were optimized with the Levenberg-Marquardt algorithm within the backpropagation scheme, using linear and hyperbolic activation functions. The cost function was the entropy function.
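The F-based ranking can be illustrated with a simplified sketch. It is not the stepwise procedure actually run by the authors: it computes a univariate one-way ANOVA F for each descriptor between the NEG and POS classes and keeps the highest-ranked ones, without the conditional re-evaluation a true stepwise selection performs. The function names and the toy data are hypothetical.

```python
def f_statistic(values, labels):
    """One-way ANOVA F for a single descriptor across the CA classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n = len(values)
    k = len(groups)
    grand_mean = sum(values) / n
    # Between-class and within-class sums of squares.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0.0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_descriptors(table, labels, k):
    """Return the names of the k descriptors with the highest F."""
    ranked = sorted(
        table, key=lambda name: f_statistic(table[name], labels), reverse=True
    )
    return ranked[:k]


# Toy data, invented for illustration; not taken from the training set.
labels = [-1, -1, -1, 1, 1, 1]
table = {
    "informative": [0.10, 0.20, 0.15, 0.90, 1.00, 0.95],
    "noisy": [0.50, 0.10, 0.90, 0.40, 0.80, 0.20],
}
selected = top_descriptors(table, labels, k=1)
```

In the report the same idea is applied with k = 9 over the whole descriptor pool.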
All descriptors were generated using QSARModel on structures optimized with the AM1 semiempirical quantum mechanical model.
QSARModel 3.3.8
The descriptors are based on structures optimized with MOPAC 6 using the keywords AM1 BOND, PRECISE, GNORM=0.01, PI, POLAR, ENPART, VECTOR
http://www.molcode.com
66 (501 chemicals / 9 descriptors)
Applicability domain based on training set and by descriptor value range (between min and max values):
The model is suitable for compounds (including ethers, esters, amides, halides, and aromatic and aliphatic functional groups, etc.) whose descriptor values fall within the following ranges, extended by the confidence margin given in 5.2:
| Desc ID (see 4.3) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Min | 0.000000 | 0.000000 | 1.25747 | 0.971429 | 0.00000 | -10.0872 | 0.00000 | -228.998 | 0.590701 |
| Max | 0.237228 | 0.978325 | 14.61591 | 2.900000 | 25.56523 | 0.0000 | 67.00000 | 791.387 | 0.930916 |
Presence of functional groups in structures
Range of descriptor values in training set with ±30% confidence
Descriptor values must fall between the maximal and minimal descriptor values of the training set (see 5.1) ±30%.
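A range check of this kind can be sketched as follows, using the training-set minima and maxima listed in 5.1. The report does not say how the ±30% margin is applied, so the sketch assumes each bound is widened by 30% of that descriptor's training range; widening by 30% of the bound value itself would be another possible reading. The function names are hypothetical.

```python
# Training-set ranges for descriptors 1-9, copied from the table in 5.1.
TRAIN_MIN = [0.000000, 0.000000, 1.25747, 0.971429, 0.00000,
             -10.0872, 0.00000, -228.998, 0.590701]
TRAIN_MAX = [0.237228, 0.978325, 14.61591, 2.900000, 25.56523,
             0.0000, 67.00000, 791.387, 0.930916]


def domain_bounds(lo, hi, margin=0.30):
    """Widen [lo, hi] by margin * (hi - lo) on each side (an assumption)."""
    span = hi - lo
    return lo - margin * span, hi + margin * span


def out_of_domain(descriptors, margin=0.30):
    """Return the 1-based IDs of descriptors outside the widened range."""
    flagged = []
    rows = zip(descriptors, TRAIN_MIN, TRAIN_MAX)
    for desc_id, (value, lo, hi) in enumerate(rows, start=1):
        low, high = domain_bounds(lo, hi, margin)
        if not low <= value <= high:
            flagged.append(desc_id)
    return flagged


# Hypothetical compound sitting at the midpoint of every training range.
midpoint = [(lo + hi) / 2 for lo, hi in zip(TRAIN_MIN, TRAIN_MAX)]
inside = out_of_domain(midpoint)

# Same compound, but with 120 H atoms (descriptor 7).
too_many_h = list(midpoint)
too_many_h[6] = 120.0
outside = out_of_domain(too_many_h)
```

An empty list means every descriptor falls inside the assumed domain; any returned ID points to the descriptor that makes the prediction an extrapolation.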
QSARModel 3.3.8
http://www.molcode.com
See 5.2
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:No
All
All
Data points: 501 (initial set was refined: salts and equivocal experimental values were removed). See also 6.7
The inputs were standardized and normalized using their mean and standard deviation. Structures that could not be properly optimized were discarded from the original set.
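Standardization by mean and standard deviation is commonly done as a z-score transform, sketched below. Whether the software divides by the sample or the population standard deviation is not stated in the report; the sample form (n - 1) is assumed here, and the function names and example values are hypothetical.

```python
import math


def fit_scaler(column):
    """Return the mean and sample standard deviation of one descriptor."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / (len(column) - 1)
    return mean, math.sqrt(variance)


def standardize(column, mean, std):
    """Apply the z-score transform with previously fitted parameters."""
    return [(x - mean) / std for x in column]


# Illustrative values only; not taken from the data set.
train_column = [2.0, 4.0, 6.0]
mean, std = fit_scaler(train_column)
scaled = standardize(train_column, mean, std)
```

The parameters would be fitted on the training set only and then reused unchanged for the selection and test sets, so that no information from those sets leaks into the inputs.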
| | Training negatives | Training positives | Selection negatives | Selection positives | Test negatives | Test positives |
| Total | 242 | 259 | 19 | 31 | 23 | 27 |
| Correct | 233 | 252 | 13 | 22 | 13 | 18 |
| Wrong | 9 | 7 | 6 | 9 | 10 | 9 |
| Correct (%) | 96.28 | 97.30 | 68.42 | 70.97 | 56.52 | 66.67 |
| Wrong (%) | 3.72 | 2.70 | 31.58 | 29.03 | 43.48 | 33.33 |
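The per-class percentages above can be combined into the usual summary statistics. The sketch below recomputes specificity (correct negatives), sensitivity (correct positives) and overall accuracy from the reported counts; the dictionary layout and the function name `summarize` are hypothetical.

```python
# (negatives total, negatives correct, positives total, positives correct),
# copied from the classification table.
COUNTS = {
    "training": (242, 233, 259, 252),
    "selection": (19, 13, 31, 22),
    "test": (23, 13, 27, 18),
}


def summarize(neg_total, neg_correct, pos_total, pos_correct):
    """Return specificity, sensitivity and overall accuracy as fractions."""
    return {
        "specificity": neg_correct / neg_total,
        "sensitivity": pos_correct / pos_total,
        "accuracy": (neg_correct + pos_correct) / (neg_total + pos_total),
    }


summary = {name: summarize(*counts) for name, counts in COUNTS.items()}

for name, stats in summary.items():
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

With these counts the overall accuracy is about 96.8% for the training set, 70% for the selection set and 62% for the test set.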
See 6.7
See 6.7 for classification statistics
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:No
All
All
The method used two randomly selected validation sets: selection (50) and test (50; 23 negative and 27 positive) (see 7.9 for description)
Randomly selected 50 (for selection set) and 50 (test set) data points
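A random split of this kind can be sketched as below. The pool size of 601 is an assumption obtained by adding the reported set sizes (501 + 50 + 50); the seed, the integer identifiers and the function name are hypothetical, and the report does not describe the random procedure actually used.

```python
import random


def split_dataset(ids, n_selection=50, n_test=50, seed=0):
    """Shuffle the compound IDs and cut off the selection and test sets."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    selection = shuffled[:n_selection]
    test = shuffled[n_selection:n_selection + n_test]
    training = shuffled[n_selection + n_test:]
    return training, selection, test


# Hypothetical integer IDs standing in for the compounds.
training, selection, test = split_dataset(range(601))
```

Fixing the seed makes the split reproducible, which matters when the selection set is used to stop training and the test set is held back for the final check.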
| | NEG | POS |
| Total | 23 | 27 |
| Correct | 13 | 18 |
| Wrong | 10 | 9 |
| Correct (%) | 56.52 | 66.67 |
| Wrong (%) | 43.48 | 33.33 |
The descriptors for the test set fall within the limits of applicability; see 6.7 and 6.12.
Overall predictions for the selection set (used to stop the ANN training and avoid overfitting) and the test set (used to test the external predictivity of the network after training) are given in the classification matrix; see 6.7.
The mechanistic picture is difficult to analyze because of the nature of ANN models. Judging from the descriptors used as inputs to the network, the property is mainly related to the charged surfaces, which may play an important role in defining the property values. For instance, lower values of the most significant descriptor (according to F), Square root of Partial Surface Area of H atoms, lead to a positive chromosomal aberration index.
In addition to the charged surfaces, hydrogen abilities of the compounds are also important in conjunction with the energy terms related to HOMO-LUMO and exchange interactions for the C-C bond.
A posteriori relation between the CA and the charge distribution over certain areas in the molecule was observed [ref 7, sect 9.2].
Supporting information is provided for the training set(s), selection set(s) and test set(s). The 9-9-8-1.snn file (binary) contains the ANN model; to use it, the user must have Statistica 7 or higher with the ANN modules.
OECD (1997). OECD Guidelines for the Testing of Chemicals No. 473: Genetic Toxicology: In Vitro Mammalian Cytogenetic Test. Organisation for Economic Cooperation and Development, Paris, France.
Sofuni T (1998). Data Book of Chromosomal Aberration Test In Vitro, Revised Edition. Life-Science Information Center, Tokyo, Japan.
Ishidate M (1988). Data Book of Chromosomal Aberration Test In Vitro, Revised Edition. Elsevier, Amsterdam, New York, Oxford.
Ishidate M, Harnois MC & Sofuni T (1988). A comparative analysis of data on the clastogenicity of 951 chemicals tested in mammalian cell cultures. Mutation Research 195, 151-213.
Morita T, Nagaki T, Fukuda I & Okumura K (1992). Clastogenicity of low pH to various cultured mammalian cells. Mutation Research 268, 297-305.
Kawaguchi Y, Hayashi H, Sato M & Shindo Y (1997). Needle crystals of Vitamin B2 induce polyploidy in Chinese hamster lung (CHL/IU) cells. Mutation Research 373, 1-7.
Ashby J & Ishidate M Jr (1986). Clastogenicity in vitro of the Na, K, Ca and Mg salts of saccharin; and of magnesium chloride; consideration of significance. Mutation Research 163, 63-73.
Kusakabe H, Yamakage K, Wakuri S, Sasaki K, Nakagawa Y, Watanabe M, Hayashi M, Sofuni T, Ono H & Tanaka N (2002). Relevance of chemical structure and cytotoxicity to the induction of chromosome aberrations based on testing of 98 high production volume industrial chemicals. Mutation Research 517, 187-198.
Niemelä J & Wedebye E (2004). Evaluation of the Setubal principles for establishing the status of development and validation of (Q)SARs, Annex 4, A “global” MULTI-CASE model for in vitro chromosomal aberrations in mammalian cells. pp 113-133 in: OECD Environment Health and Safety Publications, Series on Testing and Assessment, no 49, Report from the expert group on (Quantitative) Structure-Activity Relationships ((Q)SARs) on the principles for the validation of (Q)SARs.
Training data set: Chromosomal_Aberration_trainingset_501
Validation data sets: Chromosomal_Aberration_testset_50, Chromosomal_Aberration_selectionset_50
Other documents: 9-9-8-1
Q17-10-1-311
2011/06/06
Molcode, artificial neural network, in vitro chromosome aberration, Chinese Hamster Lung cell