QSAR Model Reporting Format

Version: 1.2
Name: (Q)SAR Model Reporting Format
Author: European Chemicals Bureau
Date: July 2007
Contact: Joint Research Centre, European Commission
e-mail: qsardb@jrc.it
www: http://ecb.jrc.it/QSAR/

1.QSAR identifier

1.1 QSAR identifier (title)

Nonlinear QSAR: artificial neural network for classification of skin sensitisation potential

1.2 Other related models

1.3 Software coding the model

QSARModel 3.3.8

Turu 2, Tartu, 51014, Estonia
http://www.molcode.com
Statistica 7

StatSoft Ltd.
http://www.statsoft.com

2.General information

2.1 Date of QMRF

23.09.2009

2.2 QMRF author(s) and contact details

Dimitar Dobchev
Molcode model development team
Molcode Ltd. Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com
Tarmo Tamm
Molcode model development team

models@molcode.com

Gunnar Karelson
Molcode model development team

models@molcode.com

Indrek Tulp
Molcode model development team

models@molcode.com

Dana Martin
Molcode model development team

models@molcode.com

Kaido Tämm
Molcode model development team

models@molcode.com

Deniss Savchenko
Molcode model development team

models@molcode.com

Jaak Jänes
Molcode model development team

models@molcode.com

Eneli Härk
Molcode model development team

models@molcode.com

Andres Kreegipuu
Molcode model development team

models@molcode.com

Mati Karelson
Molcode model development team

models@molcode.com

2.3 Date of QMRF update(s)

2.4 QMRF update(s)

2.5 Model developer(s) and contact details

Molcode model development team

Molcode Ltd Turu 2, Tartu, 51014, Estonia
models@molcode.com
http://www.molcode.com

2.6 Date of model development and/or publication

23.09.2009

2.7 Reference(s) to main scientific papers and/or software package

Katritzky A R, Dobchev DA, Fara DC, Hur E, Tämm K, Kuruncz L, Karelson M, Varnek A & Solov'ev VP (2006). Skin Permeation Rate as a Function of Chemical Structure. Journal of Medicinal Chemistry 49, 3305 - 3314.
Karelson M, Dobchev DA, Kulshyn OV & Katritzky A (2006). Neural Networks Convergence Using Physicochemical Data. Journal of Chemical Information and Modeling 46, 1891 - 1897.

2.8 Availability of information about the model

Training and test sets are available. Model algorithm is available (snn file).

2.9 Availability of another QMRF for exactly the same model

None to date.

3.Defining the endpoint - OECD Principle 1

3.1 Species

Mouse

3.2 Endpoint

4.Human health effects. B.40. Human health effects: skin sensitisation, ranking of local lymph node assay (Score index of LLNA). 4.6.Skin sensitisation

3.3 Comment on endpoint

In the Local Lymph Node assay (LLNA), the classification (Score index S) is based on the chemical concentration necessary to induce a three-fold or greater increase in lymph node cell proliferation activity in treated groups relative to the control. This concentration, known as the EC3 value, is estimated by linear interpolation of skin sensitization factors above and below the value of three on the LLNA dose response plot. A close association between the EC3 values and the relative skin sensitizing potential of chemicals among humans has been observed. Thus, based on the EC3 results obtained, a chemical can be classified as being extreme (1), strong (0.725), moderate (0.5), weak (0.25), or non-sensitizing (0).

3.4 Endpoint units

LLNA Score index (S)

3.5 Dependent variable

LLNA Score index (S)

3.6 Experimental protocol

The local lymph node assay (LLNA) was performed according to EU Test Guideline B.42 (OECD TG 429). The LLNA can be used as an alternative to the guinea-pig maximization test and the Buehler test for identifying skin sensitising chemicals and for confirming that chemicals lack a significant potential to cause skin sensitisation. The basic principle underlying the LLNA is that sensitizers induce a primary proliferation of lymphocytes in the lymph node draining the site of chemical application. This proliferation is proportional to the dose applied (and to the potency of the allergen) and provides a simple means of obtaining an objective, quantitative measurement of sensitisation.

Animals

Young adult (6–12 weeks old) female CBA strain mice are used for regulatory LLNA studies. Animals are maintained under hygienic barriered conditions with free access to food and water. The ambient temperature is maintained between 20 and 24 °C and relative humidity is maintained between 40 and 70% with a 12 h light/dark cycle. Mice are allowed to acclimatize for at least two days after arrival in the facility in cages of four or five animals per group.

Chemicals

Dosing solutions are prepared. In general, three consecutive concentrations are selected from the following: 50, 25, 10, 5, 2.5, 1, 0.5, 0.25, and 0.1% (w/v). The appropriate vehicle solution is also prepared. Solutions must be prepared freshly (within 1 h of dosing). Although, in the context of hazard identification, it is desirable to select the highest test concentrations possible, this is not always practical. Poor solubility and/or concerns regarding acute or systemic toxicity may dictate a more conservative approach. Dosing levels may be set on the basis of oral toxicity data, but when dealing with a new chemical it is advisable to perform preliminary sighting studies using limited numbers of animals.

Vehicle

Many organic vehicles may be used. Water, however, is inappropriate because its high surface tension makes it impossible to apply evenly and prevents it from remaining in contact with the surface of the skin for a sufficient period of time for absorption. Experience indicates that, in order of preference, the vehicles of choice are: 4:1 [v:v] acetone:olive oil (AOO), methylethyl ketone, dimethylformamide and dimethylsulfoxide. Vehicle selection is dictated by the relative solubility of the test material. For most purposes, AOO is suitable.

Topical exposure to chemical

The body weight of all animals is recorded so that body weight changes over the course of the experiment can be monitored. Significant inhibition of increases in body weight is indicative of systemic toxicity and should be recorded. Twenty-five microlitres of chemical, or vehicle alone, is dispensed onto the dorsum of both ears of each animal (n = 4 per group) using an automatic pipette with a disposable tip, ensuring an even distribution over the surface of the ear. Identical treatment is performed once daily for the next two consecutive days. The animals are monitored daily for signs of local toxicity (irritation and/or necrosis at the site of application) and systemic toxicity. Dosing may be suspended if such signs are observed, although the likelihood of this is minimized by prior sighting studies. The animals are then rested for two days.

Injection of thymidine

A solution of filter-sterilized tritiated thymidine in PBS (80 μCi/ml or 2960 kBq) is prepared. Gloves must be worn, and the area in which thymidine is used must be swabbed and monitored regularly for radioactive contamination. The animals are placed in a temperature-controlled “hot box”, one experimental group at a time, for 5 min to allow the veins to dilate. The temperature must not exceed 37 °C. An alternative approach to improve tail vein dilation is to hold the tails under warm running tap water. Each mouse is restrained individually using a restraining tube with an outlet for the tail and injected via the tail vein with 0.25 ml of radiolabeled thymidine (80 μCi/ml or 2960 kBq), dispensed with a 1 ml graduated syringe and 25G 5/8 needle. Care must be taken to ensure that the syringe is free of air bubbles. The animals are returned to their cages and allowed to rest for 5 h.

Processing of lymph nodes

Animals are euthanized and body weights recorded. The draining (auricular) lymph nodes are excised, counted and pooled for each experimental group in a small volume (approximately 2 ml) of PBS. Using tweezers, the nodes are placed onto a square of stainless steel gauze or a nylon mesh filter (100 μm pore size) contained within a 60 mm plastic Petri dish with a small volume (approximately 2 ml) of fresh PBS. A single cell suspension of lymph node cells (LNC) is prepared by gently disrupting the lymph nodes and pushing them through the gauze using the plunger of a 5 ml syringe (mechanical disaggregation). The LNC are transferred from the Petri dish into a 10 ml plastic centrifuge tube, rinsing the gauze and the Petri dish with fresh PBS. The LNC are washed twice in fresh PBS by centrifugation at 100g for 10 min. After the final wash, the cell pellet is resuspended in 3 ml of trichloroacetic acid (TCA) and stored overnight at 4 °C. Clumping of LNC should be avoided by ensuring that the pellet is completely resuspended in a small volume of liquid before making up to the final volume. The pellet is centrifuged at 100g for 10 min, the TCA is removed and the pellet is resuspended in 1 ml of fresh TCA. The pellet is transferred to 10 ml of scintillation fluid (e.g., Hisafe Optiphase) and thymidine incorporation is measured by β-scintillation counting.

Processing of data

Results are recorded as total disintegrations per min per node (dpm/node) for each experimental group. The vehicle control group is used as the comparator in order to derive a stimulation index (SI) according to the following equation

SI = (dpm/node of test group) / (dpm/node of vehicle control group).

If topical exposure to one or more concentrations of the test chemical results in an SI of three or greater, the chemical is considered to have a significant potential to cause contact sensitization.
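The SI calculation and the SI >= 3 decision rule described above can be sketched as follows. This is a minimal illustration, assuming the SI is the ratio of the test-group dpm/node to the vehicle-control dpm/node; the function names and all readings are hypothetical and are not taken from the LLNA dataset.

```python
def stimulation_index(test_dpm_per_node, control_dpm_per_node):
    """Ratio of test-group dpm/node to vehicle-control dpm/node (assumed SI definition)."""
    if control_dpm_per_node <= 0:
        raise ValueError("control dpm/node must be positive")
    return test_dpm_per_node / control_dpm_per_node


def is_sensitizer(si_values, threshold=3.0):
    """True if at least one tested concentration reaches the SI threshold."""
    return any(si >= threshold for si in si_values)


# Hypothetical pooled readings: concentration in % (w/v) -> dpm/node.
control = 250.0
readings = {2.5: 400.0, 5.0: 700.0, 10.0: 1200.0}

si_by_dose = {conc: stimulation_index(dpm, control) for conc, dpm in readings.items()}
print(si_by_dose)
print("significant sensitization potential:", is_sensitizer(si_by_dose.values()))
```

With these made-up readings only the highest concentration exceeds the threshold, which is enough for a positive call under the rule stated above.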

Modified procedure

A modified protocol based upon the standard method described above is sometimes utilized. In this protocol, lymph nodes obtained from individual mice, rather than lymph nodes pooled for each experimental group, are analyzed. Groups of mice (n = 5) receive chemical daily for three consecutive days, followed by intravenous injections of thymidine as described for the standard protocol. Five hours after the injection of thymidine, mice are euthanized and the draining auricular lymph nodes are excised and pooled for each individual mouse. Each pair of lymph nodes is processed separately. Incorporation of 3H-TdR is measured by β-scintillation counting as dpm/node for each individual animal. For each test and vehicle control experimental group, the mean and SD or SE dpm/individual animal are calculated. The vehicle control group is used as the comparator in order to derive a stimulation index (SI) according to the following equation

SI = (mean dpm/animal of test group) / (mean dpm/animal of vehicle control group).

As for the standard protocol, if topical exposure to one or more concentrations of the test chemical results in an SI of three or greater, the chemical is considered to have a significant potential to cause contact sensitization. In addition, the data can be assessed statistically, although generally the SI value takes precedence over statistical evaluation for determination of positivity. For each experimental group, the data are normalized by taking the log values. Depending on whether the data are parametric or non-parametric, Dunnett’s t test or the Kruskal–Wallis test followed by Dunn’s multiple comparison procedure, respectively, is applied to determine the statistical significance of differences between test and control.
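For the modified protocol, the per-group summary (mean, SD, SE), the SI and the log transformation might be computed along these lines. This is a sketch only: the readings are hypothetical, the SI is assumed to be the ratio of group means, base-10 logs are an arbitrary choice, and the significance tests named above (Dunnett, Kruskal–Wallis, Dunn) are not implemented here.

```python
from math import log10, sqrt
from statistics import mean, stdev


def group_summary(dpm_per_animal):
    """Mean, sample SD and SE of the dpm values of one experimental group."""
    m = mean(dpm_per_animal)
    sd = stdev(dpm_per_animal)
    se = sd / sqrt(len(dpm_per_animal))
    return m, sd, se


def stimulation_index_individual(test_dpm, control_dpm):
    """SI from individual-animal data, assumed to be the ratio of group means."""
    return mean(test_dpm) / mean(control_dpm)


# Hypothetical dpm values for five vehicle-treated and five chemical-treated mice.
vehicle = [200.0, 250.0, 300.0, 250.0, 250.0]
treated = [800.0, 700.0, 900.0, 750.0, 850.0]

print("vehicle mean/SD/SE:", group_summary(vehicle))
print("treated mean/SD/SE:", group_summary(treated))
print("SI:", stimulation_index_individual(treated, vehicle))

# Log-transformed values, as would be passed on to a significance test.
log_vehicle = [log10(x) for x in vehicle]
log_treated = [log10(x) for x in treated]
```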

Mathematical analyses

Linear interpolation

In order to make comparisons of the relative potency of chemical sensitizers, the estimated concentration of chemical required to induce an SI of three relative to concurrent vehicle-treated controls, or EC3 value, is derived by linear interpolation of dose response data. The EC3 value is calculated by interpolating between two points on the SI axis, one immediately above, and the other immediately below, the SI value of three. The vehicle-treated control (by definition, SI = 1) cannot be used for the latter. Where the data points lying immediately above and below the SI value of three have the co-ordinates (a,b) and (c,d) respectively, then the EC3 value is calculated using the following equation

EC3=c+[(3-d)/(b-d)](a-c).
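The interpolation formula above translates directly into code. The sketch below uses the same symbols, with (a, b) the dose and SI of the point above SI = 3 and (c, d) the point below; the function name and the example doses are hypothetical.

```python
def ec3_linear(above, below):
    """EC3 by linear interpolation: EC3 = c + [(3 - d) / (b - d)] * (a - c).

    above = (a, b): dose and SI of the point immediately above SI = 3
    below = (c, d): dose and SI of the point immediately below SI = 3
    """
    a, b = above
    c, d = below
    if c <= 0:
        # The vehicle control (dose 0, SI = 1) cannot serve as the lower point.
        raise ValueError("the lower point must be a treated dose group")
    if not (d < 3.0 <= b):
        raise ValueError("the two points must bracket SI = 3")
    return c + (3.0 - d) / (b - d) * (a - c)


# Hypothetical dose-response points: (concentration in %, SI).
print(ec3_linear(above=(10.0, 4.8), below=(5.0, 2.8)))
```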

Log-linear extrapolation

In certain situations where the dose–response does not incorporate a data point lying below the SI value of three, provided the data are of good quality (relatively close to an SI of three and evidence of a dose response; see data interpretation section), an EC3 value may be estimated by using the two doses closest to the SI value of three. The EC3 value is estimated by log-linear extrapolation from these two points on a plane where the x-axis represents the dose level and the y-axis represents the SI. The point with the higher SI is denoted (a,b) and the point with the lower SI is denoted (c,d). The formula for the EC3 estimate is as follows:

EC3=2^(log2(c)+(3-d)/(b-d)*(log2(a)-log2(c))).

Because the doses are log-transformed, EC3 estimates will never fall below zero.
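A small sketch of the log-linear estimate, using the same symbols as the formula above, may help to show why the result stays positive. The function name and example points are hypothetical.

```python
from math import log2


def ec3_loglinear(higher, lower):
    """EC3 = 2^(log2(c) + (3 - d) / (b - d) * (log2(a) - log2(c))).

    higher = (a, b): dose and SI of the point with the higher SI
    lower  = (c, d): dose and SI of the point with the lower SI
    """
    a, b = higher
    c, d = lower
    if a <= 0 or c <= 0:
        raise ValueError("doses must be positive to take logarithms")
    if b == d:
        raise ValueError("the two SI values must differ")
    return 2 ** (log2(c) + (3.0 - d) / (b - d) * (log2(a) - log2(c)))


# Hypothetical case where both tested doses already give SI > 3.
estimate = ec3_loglinear(higher=(4.0, 5.0), lower=(2.0, 4.0))
print(estimate)
```

Because the result is a power of two, it is positive for any real exponent, whereas a purely linear extrapolation from the same two points could produce a negative concentration.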

The chemicals were categorized with respect to relative skin sensitising activity based on derived EC3 values by defining four categories with the descriptors Extreme, Strong, Moderate and Weak [Kimber et al, 2003]. The scheme distinguishes between contact allergens on the basis of 10-fold variations in potency, as illustrated in the table below.
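The category table itself is not reproduced in this text. As a rough sketch only, the mapping from EC3 to potency category and Score index could look like the following. The EC3 cut-offs (0.1, 1 and 10%) are an assumption based on the 10-fold steps mentioned above and should be checked against Kimber et al (2003); the Score index values are those listed in section 3.3.

```python
# Score index values as listed in section 3.3 of this QMRF.
SCORE_INDEX = {
    "extreme": 1.0,
    "strong": 0.725,
    "moderate": 0.5,
    "weak": 0.25,
    "non-sensitizing": 0.0,
}


def potency_category(ec3_percent):
    """Map an EC3 value (%) to a potency category.

    ASSUMPTION: the cut-offs 0.1, 1 and 10 % are illustrative 10-fold steps
    and are not stated in this document. None means no EC3 could be derived
    (no SI of three or greater at any tested concentration).
    """
    if ec3_percent is None:
        return "non-sensitizing"
    if ec3_percent < 0.1:
        return "extreme"
    if ec3_percent < 1.0:
        return "strong"
    if ec3_percent < 10.0:
        return "moderate"
    return "weak"


for ec3 in (0.05, 0.5, 5.5, 50.0, None):
    category = potency_category(ec3)
    print(ec3, category, SCORE_INDEX[category])
```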

The LLNA dataset consists of 238 substances, randomly split into a training (n = 215) and a test (n = 23) set. QSAR models were developed using only chemicals in the training set. Results were validated using the test set.
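A random split of this size can be reproduced in outline as shown below. The seed, the function name and the placeholder substance identifiers are assumptions; the actual split used for the model is the one supplied in the supporting information.

```python
import random


def split_dataset(items, n_test, seed=0):
    """Shuffle a copy of the items and hold out n_test of them as the test set."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder identifiers standing in for the 238 substances.
substances = [f"substance_{i:03d}" for i in range(238)]
train, test = split_dataset(substances, n_test=23)
print(len(train), len(test))
```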

3.7 Endpoint data quality and variability

Experimental data from different sources is considered reliable (Golla et al, 2009). The EC3 experimental data accuracy is known to be variable at best. The same data has been modeled before with an alternative approach, which supports consistency (Golla et al, 2009).

4.Defining the algorithm - OECD Principle 2

4.1 Type of model

Neural network

4.2 Explicit algorithm

Neural network
Nonlinear QSAR: Backpropagation Neural Network (Multilayer Perceptron) classification

The algorithm is a neural network predictor with a 7-7-6-1 structure (7 inputs, two hidden layers of 7 and 6 neurons, 1 output). The precise explicit algorithm of the network is given in the supplementary file ANN.snn. Descriptor selection is explained in 4.4.
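To make the 7-7-6-1 structure concrete, the sketch below runs a forward pass through a multilayer perceptron of that shape with logistic activations (the activation named in 4.4). The weights are random placeholders, not the trained values stored in ANN.snn, and applying the logistic function to the output neuron is an assumption.

```python
import math
import random

LAYERS = [7, 7, 6, 1]  # inputs, hidden layer 1, hidden layer 2, output


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(layers, seed=0):
    """Random placeholder weights and biases; the real ones are in ANN.snn."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layers, layers[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        network.append((weights, biases))
    return network


def forward(network, inputs):
    """Propagate one vector of 7 pre-processed descriptor values to the output."""
    activations = list(inputs)
    for weights, biases in network:
        activations = [
            logistic(sum(w * x for w, x in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]


def n_parameters(layers):
    """Number of adjustable weights and biases for a fully connected network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))


net = init_network(LAYERS)
print("output:", forward(net, [0.0] * 7))
print("adjustable parameters:", n_parameters(LAYERS))
```

Under these assumptions a fully connected network of this shape has 111 adjustable parameters, which is worth keeping in mind next to the 215 training compounds.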

4.3 Descriptors in the model

Avg nucleophilic reactivity index (AM1) for H atoms (1/eV)
Relative number of N atoms
Global softness: 1/(LUMO - HOMO) (AM1) (1/eV)
HA dependent HDCA-1 (AM1) (all) (Å²)
Highest e-e repulsion (1-center) (AM1) for Br atoms (eV)
RNCG Relative negative charge (QMNEG/QTMINUS) (AM1)
Highest n-n repulsion (AM1) for N - O bonds (eV)
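Of the descriptors above, global softness is the only one whose formula is given explicitly, so it can be illustrated directly. The orbital energies below are hypothetical; in the model they come from AM1 calculations.

```python
def global_softness(homo_ev, lumo_ev):
    """Global softness 1/(LUMO - HOMO), in 1/eV, from orbital energies in eV."""
    gap = lumo_ev - homo_ev
    if gap <= 0:
        raise ValueError("LUMO energy must lie above HOMO energy")
    return 1.0 / gap


# Hypothetical orbital energies (eV), not taken from the dataset.
print(global_softness(homo_ev=-9.5, lumo_ev=0.5))
```

A smaller HOMO-LUMO gap gives a larger softness, which is usually read as higher reactivity.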

4.4 Descriptor selection

1) Initial pool of ~1000 descriptors. Stepwise descriptor selection based on a set of statistical selection rules:

1-parameter equations: Fisher criterion and R2 over threshold, variance and t-test value over threshold, intercorrelation with another descriptor not over threshold.

2-parameter equations: intercorrelation coefficient below threshold, significant correlation with endpoint in terms of correlation coefficient and t-test.

Stepwise trial of additional descriptors not significantly correlated to any already in the model.

6 BMLR (best multilinear regression) models were selected by highest R2. Their descriptors formed a pool of 32 descriptors. From these descriptors, 7 were selected by a genetic algorithm and used as inputs to the network. 11 networks with different structures were tested in order to find the best ANN with the lowest RMS (root-mean-squared error). Approximately 600 epochs were used to train the final network with the architecture given in 4.2. Optimization of the weights was performed with the Levenberg-Marquardt algorithm using a logistic activation function.
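The intercorrelation rule used during descriptor selection can be illustrated with a simple greedy filter. This is a deliberately simplified stand-in, not the BMLR procedure or the genetic algorithm of QSARModel: the threshold of 0.8, the ranking by correlation with the endpoint, and all data are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_descriptors(columns, endpoint, max_intercorr=0.8, max_count=7):
    """Greedy filter: rank by |r| with the endpoint, skip intercorrelated descriptors."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], endpoint)), reverse=True)
    chosen = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[kept])) <= max_intercorr for kept in chosen):
            chosen.append(name)
        if len(chosen) == max_count:
            break
    return chosen


# Hypothetical descriptor columns; d1 and d2 are almost collinear.
columns = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
endpoint = [1.1, 1.9, 3.2, 3.9, 5.1]
print(select_descriptors(columns, endpoint))
```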

4.5 Algorithm and descriptor generation

All descriptors were generated using QSARModel on structures optimized by the AM1 semi-empirical quantum mechanical method.

4.6 Software name and version for descriptor generation

QSARModel 1


http://www.molcode.com

4.7 Descriptors/Chemicals ratio

215 chemicals / 7 descriptors = 30.7 chemicals per descriptor

5.Defining the applicability domain - OECD Principle 3

5.1 Description of the applicability domain of the model

Applicability domain based on training set: diverse set of organic compounds (ketones, esters, carboxylic acids, halogen-derivatives, alcohols, amino-compounds, etc).

By descriptor value range (between min and max values): The model is suitable for compounds that have the descriptors in the following range:

Desc ID   1        2        3         4       5       6         7
Min       0        0        0.06572   0       0       0.05874   0
Max       0.0135   0.3333   0.1599    51.99   239.0   1         299.3

5.2 Method used to assess the applicability domain

Presence of functional groups in structures

Range of descriptor values in training set with ±30% confidence

Descriptor values must fall between maximal and minimal descriptor values of training set ±30%.
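A range check of this kind can be sketched as follows, using the training-set minima and maxima listed in 5.1. How the ±30% is applied is not spelled out in the text; the sketch assumes that each limit is widened by 30% of the training-set range, which is only one possible reading. The example descriptor vectors are hypothetical.

```python
# Training-set descriptor ranges from section 5.1 (Desc ID 1 to 7).
TRAIN_MIN = [0.0, 0.0, 0.06572, 0.0, 0.0, 0.05874, 0.0]
TRAIN_MAX = [0.0135, 0.3333, 0.1599, 51.99, 239.0, 1.0, 299.3]


def in_domain(descriptors, tolerance=0.30):
    """Per-descriptor flags: True if the value lies within the widened range.

    ASSUMPTION: the margin is tolerance * (max - min) on each side.
    """
    if len(descriptors) != len(TRAIN_MIN):
        raise ValueError("expected 7 descriptor values")
    flags = []
    for value, low, high in zip(descriptors, TRAIN_MIN, TRAIN_MAX):
        margin = tolerance * (high - low)
        flags.append(low - margin <= value <= high + margin)
    return flags


# Hypothetical query compounds.
inside = [0.005, 0.10, 0.10, 20.0, 0.0, 0.5, 0.0]
outside = [0.005, 0.90, 0.10, 20.0, 0.0, 0.5, 0.0]
print(all(in_domain(inside)))
print(in_domain(outside))
```

Returning one flag per descriptor, rather than a single yes/no, makes it easy to report which descriptor puts a compound outside the domain.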

5.3 Software name and version for applicability domain assessment

QSARModel 3.3.8


http://www.molcode.com

5.4 Limits of applicability

6.Internal validation - OECD Principle 4

6.1 Availability of the training set

Yes

6.2 Available information for the training set

Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:No

6.3 Data for each descriptor variable for the training set

All

6.4 Data for the dependent variable for the training set

All

6.5 Other information about the training set

Data points: 215 classification values – 5 classes

6.6 Pre-processing of data before modelling

Standardization and normalization by taking into account the mean and standard deviation
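Standardization by mean and standard deviation is commonly done as a z-score transform. The sketch below assumes that form, with the mean and sample standard deviation estimated on the training set and then reused for new compounds; the exact scaling applied by the modelling software may differ, and the values are hypothetical.

```python
from statistics import mean, stdev


def fit_scaler(train_values):
    """Mean and sample standard deviation of one descriptor over the training set."""
    return mean(train_values), stdev(train_values)


def standardize(values, mu, sigma):
    """z-score transform using parameters fitted on the training set."""
    return [(v - mu) / sigma for v in values]


# Hypothetical descriptor values.
train_column = [1.0, 2.0, 3.0]
mu, sigma = fit_scaler(train_column)
print(standardize(train_column, mu, sigma))
print(standardize([2.5], mu, sigma))  # a new compound scaled with training parameters
```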

6.7 Statistics for goodness-of-fit

              Test     Train
Data Mean     0.370    0.323
Data S.D.     0.254    0.290
Error Mean   -0.031    0.000
Error S.D.    0.236    0.164
Abs E. Mean   0.163    0.117
S.D. Ratio    0.928    0.566
Correlation   0.499    0.824
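The quantities in this table can be computed from observed and predicted values roughly as shown below. The definitions are assumptions (error taken as predicted minus observed, sample standard deviations, Pearson correlation); the reported S.D. Ratio values do agree, within rounding, with Error S.D. divided by Data S.D. The example data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def regression_stats(observed, predicted):
    """Summary statistics in the layout of the goodness-of-fit table."""
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign convention
    mo, mp = mean(observed), mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    data_sd = stdev(observed)
    error_sd = stdev(errors)
    return {
        "data_mean": mo,
        "data_sd": data_sd,
        "error_mean": mean(errors),
        "error_sd": error_sd,
        "abs_error_mean": mean([abs(e) for e in errors]),
        "sd_ratio": error_sd / data_sd,
        "correlation": sxy / sqrt(sxx * syy),
    }


# Hypothetical observed scores and network outputs.
stats = regression_stats([0.0, 0.5, 1.0], [0.1, 0.5, 0.9])
print(stats)

# Consistency check against the rounded values reported in the table.
print(round(0.236 / 0.254, 3), round(0.164 / 0.290, 3))
```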

6.8 Robustness - Statistics obtained by leave-one-out cross-validation

89% correct predictions of the classes

6.9 Robustness - Statistics obtained by leave-many-out cross-validation

6.10 Robustness - Statistics obtained by Y-scrambling

6.11 Robustness - Statistics obtained by bootstrap

6.12 Robustness - Statistics obtained by other methods

RMS = 0.09

7.External validation - OECD Principle 4

7.1 Availability of the external validation set

Yes

7.2 Available information for the external validation set

Chemname:Yes
SMILES:No
CAS RN:Yes
InChI:No
MOL file:Yes
Formula:No

7.3 Data for each descriptor variable for the external validation set

All

7.4 Data for the dependent variable for the external validation set

All

7.5 Other information about the external validation set

7.6 Experimental design of test set

Randomly selected 23 from source (dataset split into training and testing sets)

7.7 Predictivity - Statistics obtained by external validation

See 6.7

7.8 Predictivity - Assessment of the external validation set

The descriptor values for the test set are within the limits of applicability.

7.9 Comments on the external validation of the model

Overall classification is 77% correct
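A percentage of correct classifications requires the continuous network output to be mapped onto one of the five classes. The document does not state how this is done; the sketch below assumes assignment to the nearest Score index value from section 3.3, and the observed and predicted values are hypothetical.

```python
# The five Score index values from section 3.3.
CLASS_SCORES = [0.0, 0.25, 0.5, 0.725, 1.0]


def nearest_class(output):
    """Assign a continuous output to the closest class score (assumed rule)."""
    return min(CLASS_SCORES, key=lambda score: abs(score - output))


def percent_correct(observed, outputs):
    """Percentage of compounds whose assigned class equals the observed class."""
    hits = sum(1 for obs, out in zip(observed, outputs) if nearest_class(out) == obs)
    return 100.0 * hits / len(observed)


# Hypothetical observed classes and network outputs.
observed = [0.0, 0.25, 0.5, 1.0]
outputs = [0.05, 0.30, 0.70, 0.90]
print(percent_correct(observed, outputs))
```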

8.Providing a mechanistic interpretation - OECD Principle 5

8.1 Mechanistic basis of the model

The reaction between the chemical and protein is believed to be covalent in nature. Therefore, skin sensitization is underpinned by mechanisms based on chemical reactivity, where the chemical behaves as an electrophile and the protein behaves as a nucleophile. This is reflected by descriptors such as Global softness: 1/(LUMO - HOMO) (AM1) and Avg nucleophilic reactivity index (AM1) for H atoms.

8.2 A priori or a posteriori mechanistic interpretation

A posteriori mechanistic interpretation, consistent with published scientific interpretations of experiments.

8.3 Other information about the mechanistic interpretation

The descriptor HA dependent HDCA-1 (AM1) (all) reflects transfer of the compounds to a phase characterized by hydrogen bonding, while the descriptor Highest n-n repulsion (AM1) for N - O bonds reflects the interactions between the O and N atoms.

9.Miscellaneous information

9.1 Comments

9.2 Bibliography

Golla S, Madihally S, Robinson RL Jr & Gasem KAM (2009). Quantitative structure–property relationship modeling of skin sensitization: A quantitative prediction. Toxicology in Vitro 23, 454–465.
Gerberick GF, Ryan CA, Dearman RJ & Kimber I (2007). Local lymph node assay (LLNA) for detection of sensitization capacity of chemicals. Methods 41, 54–60.
Loveless SE, Ladics GS, Gerberick GF, Ryan CA, Basketter DA, Scholes EW, House RV, Hilton J, Dearman RJ & Kimber I (1996). Further evaluation of the local lymph node assay in the final phase of an international collaborative trial. Toxicology 108, 141–152.
Kimber I, Basketter DA, Butler M, Gamer A, Garrigue J-L, Gerberick GF, Newsome C, Steiling W & Vohr H-W (2003). Classification of contact allergens according to potency: proposals. Food and Chemical Toxicology 41, 1799–1809.

9.3 Supporting information

Training data set
Training set
Validation data set
test set
Other documents

10.Summary (JRC QSAR Model Database)

10.1 QMRF number

Q17-10-1-241

10.2 Publication date

2010/07/18

10.3 Keywords

skin sensitisation, local lymph node assay, neural network, Molcode, QSARModel

10.4 Comments