| Version: | 1.2 |
| Name: | (Q)SAR Model Reporting Format |
| Author: | European Chemicals Bureau |
| Date: | July 2007 |
| Contact: | Joint Research Centre, European Commission |
| e-mail: | qsardb@jrc.it |
| www: | http://ecb.jrc.it/QSAR/ |
Nonlinear QSAR: artificial neural network for classification of skin sensitisation potential
QSARModel 3.3.8
Turu 2, Tartu, 51014, Estonia
http://www.molcode.com
Statistica 7
StatSoft Ltd.
http://www.statsoft.com
23.09.2009
Dimitar Dobchev
Molcode model development team
Molcode Ltd. Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Tarmo Tamm
Molcode model development team
models@molcode.com
Gunnar Karelson
Molcode model development team
models@molcode.com
Indrek Tulp
Molcode model development team
models@molcode.com
Dana Martin
Molcode model development team
models@molcode.com
Kaido Tämm
Molcode model development team
models@molcode.com
Deniss Savchenko
Molcode model development team
models@molcode.com
Jaak Jänes
Molcode model development team
models@molcode.com
Eneli Härk
Molcode model development team
models@molcode.com
Andres Kreegipuu
Molcode model development team
models@molcode.com
Mati Karelson
Molcode model development team
models@molcode.com
Molcode model development team
Molcode Ltd Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
23.9.2009
Katritzky A R, Dobchev DA, Fara DC, Hur E, Tämm K, Kuruncz L, Karelson M, Varnek A & Solov'ev VP (2006). Skin Permeation Rate as a Function of Chemical Structure. Journal of Medicinal Chemistry 49, 3305 - 3314.
Karelson M, Dobchev DA, Kulshyn OV & Katritzky A (2006). Neural Networks Convergence Using Physicochemical Data. Journal of Chemical Information and Modeling 46, 1891 - 1897.
Training and test sets are available. Model algorithm is available (snn file).
None to date.
Mouse
4.Human health effects. B.40. Human health effects: skin sensitisation, ranking of local lymph node assay (Score index of LLNA). 4.6.Skin sensitisation
In the Local Lymph Node assay (LLNA), the classification (Score index S) is based on the chemical concentration necessary to induce a three-fold or greater increase in lymph node cell proliferation activity in treated groups relative to the control. This concentration, known as the EC3 value, is estimated by linear interpolation of skin sensitization factors above and below the value of three on the LLNA dose response plot. A close association between the EC3 values and the relative skin sensitizing potential of chemicals among humans has been observed. Thus, based on the EC3 results obtained, a chemical can be classified as being extreme (1), strong (0.725), moderate (0.5), weak (0.25), or non-sensitizing (0).
LLNA Score index (S)
LLNA Score index (S)
The local lymph node assay (LLNA) was performed according to EU Test Guideline B.42 (OECD TG 429). The LLNA can be used as an alternative to the guinea-pig maximization test and the Buehler test for identifying skin sensitising chemicals and for confirming that chemicals lack a significant potential to cause skin sensitisation. The basic principle underlying the LLNA is that sensitizers induce a primary proliferation of lymphocytes in the lymph node draining the site of chemical application. This proliferation is proportional to the dose applied (and to the potency of the allergen) and provides a simple means of obtaining an objective, quantitative measurement of sensitisation.
Animals
Young adult (6–12 weeks old) female CBA strain mice are used for regulatory LLNA studies. Animals are maintained under hygienic barriered conditions with free access to food and water. The ambient temperature is maintained between 20 and 24 °C and relative humidity is maintained between 40 and 70% with a 12 h light/dark cycle. Mice are allowed to acclimatize for at least two days after arrival in the facility in cages of four or five animals per group.
Chemicals
Dosing solutions are prepared. In general, three consecutive concentrations are selected from the following: 50, 25, 10, 5, 2.5, 1, 0.5, 0.25, and 0.1% (w/v). The appropriate vehicle solution is also prepared. Solutions must be prepared freshly (within 1 h of dosing). Although in the context of hazard identification it is desirable to select the highest test concentrations possible, this is not always practical. Poor solubility and/or concerns regarding acute or systemic toxicity may dictate a more conservative approach. Dosing levels may be set on the basis of oral toxicity data, but when dealing with a new chemical it is advisable to perform preliminary sighting studies using limited numbers of animals.
Vehicle
Many organic vehicles may be used. Water, however, is inappropriate because its high surface tension makes it impossible to apply evenly and prevents it from remaining in contact with the surface of the skin for a sufficient period of time for absorption. Experience indicates that, in order of preference, the vehicles of choice are: 4:1 [v:v] acetone:olive oil (AOO), methylethyl ketone, dimethylformamide and dimethylsulfoxide. Vehicle selection is dictated by the relative solubility of the test material. For most purposes, AOO is suitable.
Topical exposure to chemical
The body weight of all animals is recorded, so that body weight changes over the course of the experiment can be monitored. Significant inhibition of increases in body weight is indicative of systemic toxicity and should be recorded. Twenty-five microlitres of chemical, or vehicle alone, is dispensed on to the dorsum of both ears of each animal (n = 4 per group) using an automatic pipette with a disposable tip, ensuring an even distribution over the surface of the ear. Identical treatment is performed once daily for the next two consecutive days. The animals are monitored daily for signs of local toxicity (irritation and/or necrosis at the site of application) and systemic toxicity. Dosing may be suspended if such signs are observed, although the likelihood of this is minimized by prior sighting studies. The animals are then rested for two days.
Injection of thymidine
A solution of filter-sterilized tritiated thymidine in PBS (80 μCi/ml or 2960 kBq/ml) is prepared. Gloves must be worn, and the area in which thymidine is used must be swabbed and monitored regularly for radioactive contamination. The animals are placed in a temperature controlled “hot box”, one experimental group at a time, for 5 min to allow the veins to dilate. The temperature must not exceed 37 °C. An alternative approach to improve tail vein dilation is to hold the tails under warm running tap water. Each mouse is restrained individually using a restraining tube with an outlet for the tail and injected via the tail vein with 0.25 ml of radiolabeled thymidine (80 μCi/ml or 2960 kBq/ml), dispensed with a 1 ml graduated syringe and 25G 5/8 needle. Care must be taken to ensure that the syringe is free of air bubbles. The animals are returned to their cages and allowed to rest for 5 h.
Processing of lymph nodes
Animals are euthanized and body weights recorded. The draining (auricular) lymph nodes are excised, counted and pooled for each experimental group in a small volume (approximately 2 ml) of PBS. Using tweezers, the nodes are placed onto a square of stainless steel gauze or a nylon mesh filter (100 μm pore size) contained within a 60 mm plastic Petri dish with a small volume (approximately 2 ml) of fresh PBS. A single cell suspension of lymph node cells (LNC) is prepared by gently disrupting the lymph nodes and pushing them through the gauze using the plunger of a 5 ml syringe (mechanical disaggregation). The LNC are transferred from the Petri dish into a 10 ml plastic centrifuge tube, rinsing the gauze and the Petri dish with fresh PBS. The LNC are washed twice in fresh PBS by centrifugation at 100g for 10 min. After the final wash, the cell pellet is resuspended in 3 ml of trichloroacetic acid (TCA) and stored overnight at 4 °C. Clumping of LNC should be avoided by ensuring that the pellet is completely resuspended in a small volume of liquid before making up to the final volume. The suspension is centrifuged at 100g for 10 min, the TCA is removed and the pellet is resuspended in 1 ml of fresh TCA. The resuspended pellet is transferred to 10 ml of scintillation fluid (e.g., Hisafe Optiphase) and thymidine incorporation is measured by β-scintillation counting.
Processing of data
Results are recorded as total disintegrations per minute per node (dpm/node) for each experimental group. The vehicle control group is used as the comparator in order to derive a stimulation index (SI) according to the following equation:
SI = (dpm/node of test group)/(dpm/node of vehicle control group).
If topical exposure to one or more concentrations of the test chemical results in an SI of three or greater, the chemical is considered to have a significant potential to cause contact sensitization.
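The SI calculation and the positivity criterion can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definition of SI as the ratio of test-group dpm/node to vehicle-control dpm/node. The function names and the dpm/node readings are hypothetical and are not taken from the dataset.

```python
def stimulation_index(test_dpm_per_node: float, control_dpm_per_node: float) -> float:
    """Ratio of test-group dpm/node to vehicle-control dpm/node."""
    if control_dpm_per_node <= 0:
        raise ValueError("control dpm/node must be positive")
    return test_dpm_per_node / control_dpm_per_node


def is_sensitizer(si_values, threshold: float = 3.0) -> bool:
    """True if at least one tested concentration reaches SI >= threshold."""
    return any(si >= threshold for si in si_values)


# Hypothetical readings, for illustration only.
control = 250.0                                        # vehicle control, dpm/node
test_groups = {2.5: 400.0, 5.0: 700.0, 10.0: 1250.0}   # concentration (% w/v) -> dpm/node

si_by_dose = {dose: stimulation_index(dpm, control) for dose, dpm in test_groups.items()}
positive = is_sensitizer(si_by_dose.values())
print(si_by_dose, positive)
```

In this made-up example only the highest concentration reaches an SI of three or more, which is enough for the chemical to be considered positive.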
Modified procedure
A modified protocol based upon the standard method described above is sometimes utilized. In this protocol, lymph nodes obtained from individual mice, rather than lymph nodes pooled for each experimental group, are analyzed. Groups of mice (n = 5) receive chemical daily for three consecutive days, followed by intravenous injections of thymidine as described for the standard protocol. Five hours after the injection of thymidine, mice are euthanized and the draining auricular lymph nodes are excised and pooled for each individual mouse. Each pair of lymph nodes is processed separately. Incorporation of 3H-TdR is measured by β-scintillation counting as dpm/node for each individual animal. For each test and vehicle control experimental group, the mean and SD or SE dpm/individual animal are calculated. The vehicle control group is used as the comparator in order to derive a stimulation index (SI) according to the following equation:
SI = (mean dpm per animal of test group)/(mean dpm per animal of vehicle control group).
As for the standard protocol, if topical exposure to one or more concentrations of the test chemical results in an SI of three or greater, the chemical is considered to have a significant potential to cause contact sensitization. In addition, the data can be assessed statistically, although generally the SI value takes precedence over statistical evaluation for determination of positivity. For each experimental group, the data are normalized by obtaining the log values. Depending on whether data are parametric or non-parametric, Dunnett’s t test or the Kruskal–Wallis test followed by Dunn’s multiple comparison procedure, respectively, is applied to determine the statistical significance of differences between test and control.
Mathematical analyses
Linear interpolation
In order to make comparisons of the relative potency of chemical sensitizers, the estimated concentration of chemical required to induce an SI of three relative to concurrent vehicle-treated controls, or EC3 value, is derived by linear interpolation of dose response data. The EC3 value is calculated by interpolating between two points on the SI axis, one immediately above, and the other immediately below, the SI value of three. The vehicle-treated control (by definition, SI = 1) cannot be used for the latter. Where the data points lying immediately above and below the SI value of three have the co-ordinates (a,b) and (c,d) respectively, the EC3 value is calculated using the following equation:
EC3=c+[(3-d)/(b-d)](a-c).
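The interpolation formula above translates directly into code. The sketch below is illustrative only: the function name and the dose-response points are hypothetical, and the input check reflects the requirement that one point lies above and the other below an SI of three.

```python
def ec3_linear(a: float, b: float, c: float, d: float) -> float:
    """Linear interpolation of EC3.

    (a, b): dose and SI of the point immediately above SI = 3
    (c, d): dose and SI of the point immediately below SI = 3
    """
    if not (d < 3.0 <= b):
        raise ValueError("need d < 3 <= b to interpolate")
    return c + (3.0 - d) / (b - d) * (a - c)


# Hypothetical points: SI = 4 at 10% and SI = 2 at 5%.
ec3 = ec3_linear(a=10.0, b=4.0, c=5.0, d=2.0)
print(ec3)
```

With these made-up points, SI = 3 lies halfway between the two measured SI values, so the estimate falls halfway between the two doses.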
Log-linear extrapolation
In certain situations where the dose–response does not incorporate a data point lying below the SI value of three, provided the data are of good quality (relatively close to an SI of three and evidence of a dose response; see data interpretation section), an EC3 value may be estimated by using the two doses closest to the SI value of three. The EC3 value is estimated by log-linear extrapolation from these two points on a plane where the x-axis represents the dose level and the y-axis represents the SI. The point with the higher SI is denoted (a,b) and the point with the lower SI is denoted (c,d). The formula for the EC3 estimate is as follows:
EC3=2^(log2(c)+(3-d)/(b-d)*(log2(a)-log2(c))),
by log-transforming the doses, EC3 estimates will never fall below zero.
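A minimal Python sketch of the log-linear formula is given below. The function name and the example points are hypothetical. The code only restates the equation above and checks that both doses are positive so that the logarithms are defined.

```python
import math


def ec3_loglinear(a: float, b: float, c: float, d: float) -> float:
    """Log-linear estimate of EC3 from two points that both lie above SI = 3.

    (a, b): dose and SI of the point with the higher SI
    (c, d): dose and SI of the point with the lower SI
    """
    if a <= 0 or c <= 0:
        raise ValueError("doses must be positive to take log2")
    if b == d:
        raise ValueError("the two SI values must differ")
    exponent = math.log2(c) + (3.0 - d) / (b - d) * (math.log2(a) - math.log2(c))
    return 2.0 ** exponent


# Hypothetical points: SI = 5 at 4% and SI = 4 at 2%.
ec3 = ec3_loglinear(a=4.0, b=5.0, c=2.0, d=4.0)
print(ec3)
```

Because the result is a power of two, the estimate is always positive, which is the property noted in the text.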
The chemicals were categorized with respect to relative skin sensitising activity based on derived EC3 values by defining four categories with the descriptors: Extreme, Strong, Moderate and Weak [Kimber et al, 2003]. The scheme distinguishes between contact allergens on the basis of 10-fold variations in potency, as illustrated in the table below.
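The categorisation step can be sketched as a simple threshold lookup. The EC3 boundaries used here (0.1, 1 and 10%) are an assumption chosen to match the 10-fold spacing described above; they are not stated in this report and should be checked against Kimber et al (2003) before use. The function name is hypothetical.

```python
from typing import Optional


def potency_category(ec3_percent: Optional[float]) -> str:
    """Map an EC3 value (%) to a potency descriptor.

    ASSUMED boundaries (10-fold steps): <0.1 Extreme, <1 Strong,
    <10 Moderate, otherwise Weak. None means no EC3 could be derived.
    """
    if ec3_percent is None:
        return "Non-sensitizing"
    if ec3_percent <= 0:
        raise ValueError("EC3 must be positive")
    if ec3_percent < 0.1:
        return "Extreme"
    if ec3_percent < 1.0:
        return "Strong"
    if ec3_percent < 10.0:
        return "Moderate"
    return "Weak"


for value in (0.05, 0.5, 7.5, 25.0, None):
    print(value, potency_category(value))
```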
The LLNA dataset consists of 238 substances, randomly split into a training (n = 215) and a test (n = 23) set. QSAR models were developed using only chemicals in the training set. Results were validated using the test set.
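A random split of this size can be reproduced in outline as follows. This is only a sketch: the report does not state how the random selection was performed, so the seed, the helper name and the use of integer identifiers in place of real substances are all assumptions.

```python
import random


def split_dataset(ids, n_test: int, seed: int):
    """Shuffle the identifiers and hold out n_test of them as the test set."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# 238 placeholder identifiers standing in for the substances.
train_ids, test_ids = split_dataset(range(238), n_test=23, seed=0)
print(len(train_ids), len(test_ids))
```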
Experimental data from different sources is considered reliable (Golla et al, 2009). The EC3 experimental data accuracy is known to be variable at best. The same data has been modeled before with an alternative approach, which supports consistency (Golla et al, 2009).
Neural network
Neural network
Nonlinear QSAR: Backpropagation Neural Network (Multilayer Perceptron) classification
Neural network algorithm based on neural network predictor with structure 7-7-6-1. The precise explicit algorithm of the network is given in supplementary file ANN.snn. Descriptor selection explained in 4.4.
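To make the 7-7-6-1 structure concrete, the sketch below runs a forward pass through a multilayer perceptron of that shape. It is an illustration under stated assumptions only: the weights are random placeholders (the trained weights are in the supplementary ANN.snn file and are not reproduced here), and a logistic activation is assumed on every layer, including the output.

```python
import math
import random


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in: int, n_out: int, rng: random.Random):
    """One row per neuron: n_in weights followed by a bias (random placeholders)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layers, inputs):
    """Propagate the inputs through every layer in turn."""
    activations = list(inputs)
    for layer in layers:
        activations = [
            logistic(sum(w * x for w, x in zip(neuron[:-1], activations)) + neuron[-1])
            for neuron in layer
        ]
    return activations


rng = random.Random(0)
sizes = [7, 7, 6, 1]               # inputs, two hidden layers, one output
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
n_parameters = sum(len(neuron) for layer in layers for neuron in layer)

score = forward(layers, [0.0] * 7)[0]   # seven placeholder descriptor values
print(n_parameters, score)
```

A network of this shape has 111 adjustable parameters (weights plus biases), which is worth keeping in mind next to the size of the training set.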
Avg nucleophilic reactivity index (AM1) for H atoms, (1/eV)
Relative number of N atoms
Global softness: 1/(LUMO - HOMO) (AM1), (1/eV)
HA dependent HDCA-1 (AM1) (all), (Å²)
Highest e-e repulsion (1-center) (AM1) for Br atoms, (eV)
RNCG Relative negative charge (QMNEG/QTMINUS) (AM1)
Highest n-n repulsion (AM1) for N - O bonds, (eV)
1) Initial pool of ~1000 descriptors. Stepwise descriptor selection based on a set of statistical selection rules:
1-parameter equations: Fisher criterion and R2 over threshold, variance and t-test value over threshold, intercorrelation with another descriptor not over threshold.
2-parameter equations: intercorrelation coefficient below threshold, significant correlation with endpoint in terms of correlation coefficient and t-test.
Stepwise trial of additional descriptors not significantly correlated to any already in the model.
Six BMLR models were selected by highest R2. Their descriptors formed a pool of 32 descriptors. From these, 7 descriptors were selected by a genetic algorithm and used as inputs to the network. Eleven networks with different structures were tested in order to find the ANN with the lowest RMS (root-mean-squared) error. Approximately 600 epochs were used to train the final network with the architecture described in 4.2. The weights were optimized with the Levenberg-Marquardt algorithm using a logistic activation function.
All descriptors were generated using QSARModel on structures optimized by the AM1 semi-empirical quantum mechanical method.
QSARModel 1
http://www.molcode.com
215 chemicals / 7 descriptors = 30.7 chemicals per descriptor
Applicability domain based on training set: diverse set of organic compounds (ketones, esters, carboxylic acids, halogen-derivatives, alcohols, amino-compounds, etc).
By descriptor value range (between min and max values): The model is suitable for compounds that have the descriptors in the following range:
| Desc ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Min | 0 | 0 | 0.06572 | 0 | 0 | 0.05874 | 0 |
| Max | 0.0135 | 0.3333 | 0.1599 | 51.99 | 239.0 | 1 | 299.3 |
Presence of functional groups in structures
Range of descriptor values in training set with ±30% confidence
Descriptor values must fall between maximal and minimal descriptor values of training set ±30%.
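The range-based domain check can be sketched as below, using the minimum and maximum descriptor values listed above. How the ±30% margin is applied is not specified in this report; the sketch assumes it is 30% of each descriptor's training-set range (max minus min), added on both sides. The function name and the query compounds are hypothetical.

```python
# Training-set descriptor ranges for descriptors 1 to 7, as listed in this report.
DESCRIPTOR_MIN = [0.0, 0.0, 0.06572, 0.0, 0.0, 0.05874, 0.0]
DESCRIPTOR_MAX = [0.0135, 0.3333, 0.1599, 51.99, 239.0, 1.0, 299.3]


def in_domain(values, tolerance: float = 0.30) -> bool:
    """True if every descriptor lies within [min - margin, max + margin].

    ASSUMPTION: margin = tolerance * (max - min) for each descriptor.
    """
    if len(values) != len(DESCRIPTOR_MIN):
        raise ValueError("expected 7 descriptor values")
    for value, low, high in zip(values, DESCRIPTOR_MIN, DESCRIPTOR_MAX):
        margin = tolerance * (high - low)
        if not (low - margin <= value <= high + margin):
            return False
    return True


# Hypothetical descriptor vectors for two query compounds.
inside = in_domain([0.005, 0.1, 0.10, 10.0, 0.0, 0.5, 0.0])
outside = in_domain([0.005, 0.1, 0.10, 100.0, 0.0, 0.5, 0.0])   # descriptor 4 too large
print(inside, outside)
```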
QSARModel 3.3.8
http://www.molcode.com
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:No
All
All
Data points: 215 classification values – 5 classes
Standardization and normalization by taking into account the mean and standard deviation
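Standardization by mean and standard deviation is commonly implemented as a z-score transform, sketched below. Whether the software uses the sample or the population standard deviation is not stated here; the sketch assumes the sample form, and the input values are placeholders.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation (assumed form)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / sd for v in values]


# Placeholder values for one descriptor column.
z = standardize([0.0, 0.25, 0.5, 0.5, 1.0])
print(z)
```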
| | Test | Train |
| Data Mean | 0.370 | 0.323 |
| Data S.D. | 0.254 | 0.290 |
| Error Mean | -0.031 | 0.000 |
| Error S.D. | 0.236 | 0.164 |
| Abs E. Mean | 0.163 | 0.117 |
| S.D. Ratio | 0.928 | 0.566 |
| Correlation | 0.499 | 0.824 |
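As a consistency check on these statistics, the S.D. Ratio can be recomputed from the rounded values, assuming it is defined as the error standard deviation divided by the data standard deviation (the usual convention in neural network regression statistics, but an assumption here). The function name is hypothetical.

```python
def sd_ratio(error_sd: float, data_sd: float) -> float:
    """Error S.D. divided by data S.D. (assumed definition of the S.D. Ratio)."""
    return error_sd / data_sd


# Rounded values as reported for the test and training sets.
test_ratio = sd_ratio(0.236, 0.254)
train_ratio = sd_ratio(0.164, 0.290)
print(round(test_ratio, 3), round(train_ratio, 3))
```

Both recomputed ratios agree with the reported 0.928 and 0.566 to within the rounding of the inputs. A ratio close to 1, as for the test set, means the prediction error is almost as large as the spread of the data itself.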
89% correct predictions of the classes
RMS = 0.09
Yes
Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:No
All
All
Randomly selected 23 from source (dataset split into training and testing sets)
See 6.7
The descriptor values for the test set are within the limits of the applicability domain.
Overall classification is 77% correct
The reaction between the chemical and the protein is believed to be covalent in nature. Skin sensitization is therefore underpinned by mechanisms based on chemical reactivity, in which the chemical behaves as an electrophile and the protein as a nucleophile. These properties are reflected by descriptors such as Global softness: 1/(LUMO - HOMO) (AM1) and Avg nucleophilic reactivity index (AM1) for H atoms.
A posteriori mechanistic interpretation, consistent with published scientific interpretations of experiments.
The descriptor HA dependent HDCA-1 (AM1) (all) reflects transfer of the compounds to a phase characterized by hydrogen bonding, while Highest n-n repulsion (AM1) for N - O bonds reflects the interactions between the O and N atoms.
Golla S, Madihally S, Robinson RL Jr & Gasem KAM (2009). Quantitative structure–property relationship modeling of skin sensitization: A quantitative prediction. Toxicology in Vitro 23, 454–465.
Gerberick GF, Ryan CA, Dearman RJ & Kimber I (2007). Local lymph node assay (LLNA) for detection of sensitization capacity of chemicals. Methods 41, 54-60.
Loveless SE, Ladics GS, Gerberick GF, Ryan CA, Basketter DA, Scholes EW, House RV, Hilton J, Dearman RJ & Kimber I (1996). Further evaluation of the local lymph node assay in the final phase of an international collaborative trial. Toxicology 108, 141–152.
Kimber I, Basketter DA, Butler M, Gamer A, Garrigue J-L, Gerberick GF, Newsome C, Steiling W & Vohr H-W (2003). Classification of contact allergens according to potency: proposals. Food and Chemical Toxicology 41, 1799–1809.
Training data set: Training set
Validation data set: test set
Other documents
Q17-10-1-241
2010/07/18
skin sensitisation, local lymph node assay, neural network, Molcode, QSARModel