Due: Wed, June 5th, 11:59 P.M.
Deep Learning derivations, Naive Bayes and Logistic Regression implementation.
Most of the content here is about using numpy to your advantage. If there are two things to take away, they are:
The pseudocode in the lecture notes has three nested for loops: the outermost loop is over training steps/iterations, the next is over the datapoints, and the innermost is over the features. In ML we generally avoid for loops and prefer bulk operations on matrices/arrays, and the inner two loops can in fact be eliminated entirely. The good news for you is that you can get your code working reasonably fast without eliminating any loops. I have written three versions of the code with decreasing numbers of loops (and, as a result, decreasing runtimes), and I will talk about each one:
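As a point of reference, here is a minimal sketch of what the three-loop structure looks like in numpy. I'm assuming logistic regression trained by per-example gradient updates; the function name and hyperparameters here are my own, not from the lecture notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_three_loops(X, y, lr=0.5, n_iters=500):
    """All three loops, matching the pseudocode structure.
    X: (n_points, n_features) floats; y: (n_points,) 0/1 labels."""
    n_points, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iters):              # loop 1: training steps
        for i in range(n_points):         # loop 2: datapoints
            pred = sigmoid(np.dot(w, X[i]))
            for j in range(n_features):   # loop 3: features
                w[j] += lr * (y[i] - pred) * X[i, j]
    return w
```

The innermost loop touches one weight at a time, which is exactly the per-feature work the later versions fold into array operations.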
The speedups I talk about here are the most important ones for getting your runtime down to something reasonable. The speedups I used for this version are:
In this version, I eliminate the for loop over features. If you implemented everything I described in part 1, then you already have the tools to do this: you can calculate the gradient for a whole datapoint at once using numpy's overloaded * operator.
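A sketch of that middle version, again assuming logistic regression with per-example updates (the names are mine): the per-feature loop collapses into a single scalar-times-row operation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_two_loops(X, y, lr=0.5, n_iters=500):
    """Feature loop eliminated: the whole gradient for one datapoint
    is one elementwise operation on the row X[i]."""
    n_points, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iters):          # loop over training steps
        for i in range(n_points):     # loop over datapoints
            pred = sigmoid(w @ X[i])
            # scalar * row broadcasts across every feature at once
            w += lr * (y[i] - pred) * X[i]
    return w
```

The update line does exactly what the j-loop did, but numpy performs it in compiled code rather than the Python interpreter, which is where the speedup comes from.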
As I said above, it's actually possible to do this with only the for loop over training steps/iterations. This is the hardest version to wrap your mind around, but it should shorten your code significantly (I only have three lines in my for loop) and will give you a huge speedup. I especially recommend doing this version if you plan on doing more with ML in the future, since the tricks you need to do this are the bread and butter of most ML software like TensorFlow/PyTorch/Caffe. Some useful tricks for getting here are:
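One thing worth flagging about this version: collapsing the datapoint loop naturally turns the per-example updates into one full-batch gradient step per iteration, which is slightly different math but works fine here. A sketch of what the three-line loop body might look like (again assuming logistic regression; the names and hyperparameters are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_loop(X, y, lr=0.1, n_iters=1000):
    """Only the training-step loop remains; its body is three lines."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        preds = sigmoid(X @ w)     # predictions for every datapoint at once
        grad = X.T @ (y - preds)   # each row of X weighted by its error, summed
        w += lr * grad
    return w
```

The two workhorses are the matrix-vector product `X @ w` (all dot products in one call) and `X.T @ (y - preds)` (the sum of per-point gradients in one call), which are the same primitives TensorFlow/PyTorch build on.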
I hope some of the stuff here is helpful. Feel free to follow up on Piazza (but try not to post chunks of code there) and I will do my best to reply!