Table 2. Central parameters of a neural network and recommended settings
| Name | Range | Default value |
| --- | --- | --- |
| Learning rate | 0.1, 0.01, 0.001, 0.0001 | 0.01 |
| Batch size | 64, 128, 256 | 128 |
| Momentum rate | 0.8, 0.9, 0.95 | 0.9 |
| Weight initialization | Normal, Uniform, Glorot uniform | Glorot uniform |
| Per-parameter adaptive learning rate methods | RMSprop, Adagrad, Adadelta, Adam | Adam |
| Batch normalization | Yes, no | Yes |
| Learning rate decay | None, linear, exponential | Linear (rate 0.5) |
| Activation function | Sigmoid, Tanh, ReLU, Softmax | ReLU |
| Dropout rate | 0.1, 0.25, 0.5, 0.75 | 0.5 |
| L1, L2 regularization | 0, 0.01, 0.001 | |
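
The sketch below, assuming TensorFlow/Keras, shows how the default values from Table 2 might be expressed in a model configuration. The layer sizes, output dimension, and training call are illustrative placeholders, not part of the table; the momentum rate of 0.9 applies to SGD-style momentum and corresponds to the beta_1 default in Adam.

```python
# Illustrative sketch only: wiring the Table 2 defaults into a small Keras model.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(
        256,                                        # hidden width is a placeholder
        kernel_initializer="glorot_uniform",        # weight initialization: Glorot uniform
        kernel_regularizer=regularizers.l2(0.01),   # L2 regularization: 0.01
        use_bias=False,                             # bias is redundant before batch norm
    ),
    layers.BatchNormalization(),                    # batch normalization: yes
    layers.Activation("relu"),                      # activation function: ReLU
    layers.Dropout(0.5),                            # dropout rate: 0.5
    layers.Dense(10, activation="softmax"),         # softmax output for classification
])

# Adaptive learning rate method: Adam with initial learning rate 0.01.
# Adam's beta_1=0.9 default plays the role of the momentum rate in Table 2.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Batch size (128) and any learning rate decay schedule are applied at training time:
# model.fit(x_train, y_train, batch_size=128, epochs=10)
```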