cross-entropy – expectation value of log(p).
initialization – randn for weights. use 2/sqrt(input size) if using relu. See He. Avoids blow up
epoch – one run through all data
mini-batch – break up data into 1 gpus worth chunks. Worth trying different values to see
momentum – smooths gradients that are oscillating and lets build up
Adam – combined momentum and RMS prop. Works better often? 0.9 for beta1 and 0.999 for beta2 are common parameters.
Hyperparameter search – random points use log scales for some things.
reevalute your hyperparametrs occasionally
batch normalization – adds a normalization and mean subtraction at every hidden layer. makes later neurons less susceptible to earlier changes
tensorflow – variables – placeholder, make sessions, run a trainer
strategy – fit training set, dev set, test set, real world
use better optimizer bigger network if not fitting training
use more dat, rgularize if not
add weight to realyy important examples
bias – perforance on triainig set – human level is good benchmark
error analysis – ceiling on performance. Find out how many of some kind of problem are happening to figure out what is worthwhile. Do it manually
reasonably robust to random errors in training set
build first fast, iterate fast
if you need to use a different distro from training set, use the real stuff mostly in your dev and test
Break up into train dev and train-dev. so that you can know if the problem is due to mismatch or due to overfitting
manually try to make training set more like dev set on problem cases. Maybe add noise or find more examples of the error prone thing
end to end – use subproblems if you have data for subproblems
And… I can’t access the two last ones yet. Poo.