ALOGPS 2.1 is described
in two articles:
Application of Associative Neural Networks for Prediction of Lipophilicity
in ALOGPS 2.1 program
Igor V. Tetko (1,2), Vsevolod Yu. Tanchuk (2)
1-Laboratoire de Neuro-Heuristique, Institut de Physiologie, Rue du
Bugnon 7, Lausanne, CH-1005, Switzerland
2-Biomedical Department, Institute of Bioorganic & Petroleum Chemistry,
Murmanskaya 1, Kiev-660, 253660, Ukraine
This article provides a systematic study of several
important parameters of the Associative Neural Network (ASNN), such as
the number of networks in the ensemble, distance measures, neighbor functions,
selection of smoothing parameters and strategies for the user-training
feature of the algorithm. The performance of the different methods is assessed
with several training/test sets used to predict lipophilicity of chemical
compounds. The Spearman rank-order correlation coefficient and Parzen-window
regression methods provide the best performance of the algorithm. If additional
user data are available, the prediction of lipophilicity of chemicals
can be improved by up to 2-5 times, provided that appropriate smoothing parameters
for the neural network are selected. The best combinations of
parameters and strategies are implemented in the ALOGPS 2.1 program that
is publicly available at https://vcclab.org/lab/alogps.
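As a rough illustration of how a Spearman rank-order similarity and a Parzen-window regression could fit together, the sketch below corrects an ensemble-average prediction with kernel-weighted residuals of the training cases. It is a minimal sketch under stated assumptions, not the ALOGPS implementation: the function names, the distance defined as 1 minus the correlation, the Gaussian kernel, and the default `sigma` are all illustrative choices that the abstract does not specify.

```python
import numpy as np

def rank(values):
    """0-based ranks of a 1-D array (ties broken by position, not averaged)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman(a, b):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return np.corrcoef(rank(a), rank(b))[0, 1]

def asnn_correct(query_preds, train_preds, train_targets, sigma=0.2):
    """Correct the ensemble mean for one query molecule.

    query_preds:   (M,) outputs of the M ensemble networks for the query
    train_preds:   (N, M) outputs of the same networks for N training molecules
    train_targets: (N,) experimental values for the training molecules
    sigma:         smoothing parameter of the kernel (assumed default)
    """
    ensemble_mean = query_preds.mean()
    # Error of the plain ensemble average on each training molecule.
    residuals = train_targets - train_preds.mean(axis=1)
    # Similarity in the space of ensemble responses, not in descriptor space.
    sims = np.array([spearman(query_preds, row) for row in train_preds])
    dist = 1.0 - sims  # assumption: distance derived as 1 - correlation
    # Parzen-window (Gaussian kernel) weights; closer cases count more.
    weights = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    correction = np.sum(weights * residuals) / np.sum(weights)
    return ensemble_mean + correction
```

With a small `sigma`, the correction is dominated by the few training molecules whose ensemble responses rank most like the query's, which is why the choice of smoothing parameter matters when user data are added.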
Current status: published in J. Chem. Inf. Comput. Sci., 2002, 42, 1136-1145.
Prediction of n-Octanol/Water Partition
Coefficients from PHYSPROP Database Using Artificial Neural Networks and
E-state Indices
Igor V. Tetko (1,2), Vsevolod Yu. Tanchuk (2), Alessandro E. P. Villa (1)
1-Laboratoire de Neuro-Heuristique, Institut de Physiologie, Rue du
Bugnon 7, Lausanne, CH-1005, Switzerland
2-Biomedical Department, Institute of Bioorganic & Petroleum Chemistry,
Murmanskaya 1, Kiev-660, 253660, Ukraine
A new method, ALOGPS v 2.0 (https://www.lnh.unil.ch/~itetko/logp/*),
for the assessment of the n-octanol/water partition coefficient, logP, was
developed on the basis of neural network ensemble analysis of 12908 organic
compounds available from the PHYSPROP database of Syracuse Research Corporation.
The atom and bond-type E-state indices as well as the number of hydrogen
and non-hydrogen atoms were used to represent the molecular structures.
A preliminary selection of indices was performed by multiple linear regression
analysis and 75 input parameters were chosen. Some of the parameters combined
several atom-type or bond-type indices with similar physico-chemical properties.
The neural network ensemble was trained using the Efficient Partition
Algorithm developed by the authors. The ensemble contained 50 neural networks
and each neural network had 10 neurons in one hidden layer. The prediction
ability of the developed approach was estimated using both the leave-one-out
(LOO) technique and a training/test protocol. In the case of inter-series predictions,
i.e. when molecules in the test and in the training sub-sets were selected
by chance from the same set of compounds, both approaches provided similar
results. ALOGPS performance was significantly better than the results obtained
by other tested methods. For a subset of 12777 molecules, the LOO results were
a squared correlation coefficient r^2=0.95, a root mean squared error RMSE=0.39,
and a mean absolute error MAE=0.29.
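For readers who want to reproduce this kind of summary on their own predictions, the following minimal sketch computes the three statistics from paired observed and predicted logP values. It assumes that r2 denotes the squared Pearson correlation coefficient; the function name `regression_metrics` is illustrative and not part of ALOGPS.

```python
import math

def regression_metrics(observed, predicted):
    """Return (r2, rmse, mae) for two equal-length sequences of numbers."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Squared Pearson correlation (assumed meaning of r2 here).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)
    # Root mean squared error and mean absolute error of the predictions.
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r2, rmse, mae
```

Note that a constant offset between predictions and observations leaves the correlation untouched but shows up fully in RMSE and MAE, so the three numbers are best read together.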
For two cross-series predictions, i.e. when molecules
in the training and in the test sets belong to different series of compounds,
all analyzed methods performed less efficiently. The decrease in the performance
could be explained by a different diversity of molecules in the training
and in the test sets. However, even for such difficult cases the ALOGPS
method provided better prediction ability than the other tested methods.
We have shown that the diversity of the training sets rather than the design
of the methods is the main factor determining their prediction ability
for new data. A comparative performance of the methods as well as a dependence
on the number of non-hydrogen atoms in a molecule is also presented.
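The two evaluation settings described above differ only in how the test set is chosen. The sketch below contrasts them on a toy list of molecules. It is an illustration under assumptions: each molecule is a hypothetical `(name, series)` tuple, and the function names and the `test_fraction` default are not taken from the article.

```python
import random

def random_split(molecules, test_fraction=0.5, seed=0):
    """Test molecules are drawn by chance from the same set as the training ones."""
    rng = random.Random(seed)
    shuffled = molecules[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def series_split(molecules, held_out_series):
    """Test molecules come only from series that are absent from training."""
    train = [m for m in molecules if m[1] not in held_out_series]
    test = [m for m in molecules if m[1] in held_out_series]
    return train, test
```

In the first setting the test molecules resemble the training ones by construction, while in the second the model must extrapolate to unseen series, which is consistent with the lower performance reported for all methods.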
Current status: published in J. Chem. Inf. Comput. Sci., 2001, 41, 1407-1421.
* Unfortunately, since Dr. Tetko's work in Lausanne ended,
this site is no longer supported. Please contact Dr.
Tetko if you need the old version of the program.
See FAQ if you have questions.