PARTITION
Keyword of Integer Type.
Indicates how to subdivide the initial training data set into learning and
validation sets.
no validation {0} -- use all data in the learning set. The neural network fits all data as
closely as possible.
validation {1} -- the data entries specified by the keyword VALIDATION
are used to monitor the performance of the neural networks. The statistical coefficients
are calculated for two early stopping points (the
first point, S1, corresponds to the minimum RMSE for the validation set, and
the second, S2, to the minimum RMSE for the whole set) and for the minimum RMSE achieved
by the neural network.
random sets {2} -- the same procedure as validation, but in this case the validation
set is selected by chance for each neural network in the ensemble (see Figure).
Thus, each network is characterized by its own learning and validation
sets. This allows LOO coefficients to be estimated for the whole initial training
set.
EPA {3} -- performs weighted selection of data entries for the learning/validation sets as described
in Tetko & Villa, 1997.
Besides the weighting procedure, the selection of data for both sets is
done by chance, as with the random sets option.
We recommend using the random sets or EPA options. In our opinion and experience, these
options provide the most correct estimation
of the generalization ability of neural networks and do not suffer from overfitting/overtraining
problems, as discussed in our publications.
The other two options are provided mainly for comparison. The default and suggested value is random sets {2}.
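
The random sets partition and the two early stopping points can be illustrated with a minimal Python sketch. It is not the applet's own code: the function names, the 50% validation fraction, and the fixed seed are assumptions made only for illustration, and the RMSE curves in the usage lines are made-up numbers.

```python
import random


def random_partition(n_entries, n_networks, validation_fraction=0.5, seed=0):
    """Draw an independent learning/validation split for each network.

    validation_fraction and seed are illustrative assumptions; the
    source text does not state the values used by the program.
    """
    rng = random.Random(seed)
    n_valid = int(n_entries * validation_fraction)
    partitions = []
    for _ in range(n_networks):
        indices = list(range(n_entries))
        rng.shuffle(indices)  # selection "by chance" for this network
        validation = sorted(indices[:n_valid])
        learning = sorted(indices[n_valid:])
        partitions.append((learning, validation))
    return partitions


def early_stopping_points(rmse_validation, rmse_whole):
    """Return the epochs S1 and S2 from per-epoch RMSE curves.

    S1 is the epoch with minimum RMSE on the validation set,
    S2 the epoch with minimum RMSE on the whole set.
    """
    s1 = min(range(len(rmse_validation)), key=rmse_validation.__getitem__)
    s2 = min(range(len(rmse_whole)), key=rmse_whole.__getitem__)
    return s1, s2


# Usage: three networks, ten data entries, hypothetical RMSE curves.
partitions = random_partition(n_entries=10, n_networks=3)
s1, s2 = early_stopping_points([0.9, 0.5, 0.6, 0.7], [0.8, 0.6, 0.4, 0.5])
```

Because every network draws its own validation set, each data entry is left out of the learning set of some networks, which is what makes an estimate over the whole initial training set possible.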