Deep Learning derivations, plus Naive Bayes and Logistic Regression implementations.

Resources


NOTE: We will not be accepting late submissions.


Logistic Regression Efficiency Tips

by Noah

NOTE: There is no speed requirement for this assignment. We will accept any working code. However, if you want to make your code run way faster, read on!

Most of the content here is about using numpy to your advantage. If there are two things to take away, they are the following (with a tiny example after the list):

  1. Store all values in numpy arrays.
  2. Do operations in bulk, not one at a time.
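
To make both points concrete, here's a tiny made-up example: the same dot product computed one element at a time, and then as a single bulk numpy operation.

    import numpy as np

    theta = np.array([0.5, -1.0, 2.0])
    x_i = np.array([1.0, 2.0, 3.0])

    # One element at a time (slow once the vectors get long):
    total = 0.0
    for j in range(len(x_i)):
        total += theta[j] * x_i[j]

    # The same computation as a single bulk operation:
    total = np.dot(theta, x_i)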

The pseudocode in the lecture notes has three nested for-loops in it. The outermost loop is over training steps/iterations, the middle loop is over the datapoints, and the innermost loop is over the features. In ML, we don't like for loops too much and prefer to do operations in bulk on matrices/arrays. We can actually eliminate the inner two loops if we would like. The good news for you is that you can get your code working pretty fast without eliminating any loops. I have written three versions of the code with decreasing numbers of loops (and, as a result, decreasing runtimes), and I will talk about each one:
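
For reference, here is a rough sketch of that three-loop structure in plain Python. To keep it concrete, I'm assuming one particular formulation (batch gradient ascent on the unregularized log-likelihood with labels in {0, 1}) and made-up names (X_rows, ys, theta, eta, num_steps, n, m); follow whatever update rule and conventions your lecture notes actually use.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Assumed setup: X_rows is a list of n feature lists (each of length m),
    # ys is a list of n labels in {0, 1}, theta is a length-m list of weights,
    # eta is the learning rate, num_steps is the number of training iterations.
    for step in range(num_steps):                     # loop 1: training steps
        grad = [0.0] * m
        for i in range(n):                            # loop 2: datapoints
            for j in range(m):                        # loop 3: features
                # theta . x_i is recomputed for every feature j here;
                # tip 3 in the next section caches it instead
                z = sum(theta[k] * X_rows[i][k] for k in range(m))
                grad[j] += (ys[i] - sigmoid(z)) * X_rows[i][j]
        for j in range(m):
            theta[j] = theta[j] + eta * grad[j]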

1. Three Loops (~2 min runtime on the Netflix data for me)

The speedups described here are the most important ones for getting your runtime down to something reasonable; a short sketch of this version follows the list. For this version, I used:

  1. Load your data in once and store it in arrays - When you load your data, store the y's in one array/vector of length n and store your x's in one big n-by-m array. You can access a whole row of a 2D np array using arr[i,:] and assign a whole row using arr[i,:] = vec.
  2. Store theta and your gradient in np arrays - This allows you to take the dot product between theta and x using np.dot. It also lets you update theta efficiently using the fact that numpy conveniently overloads + to work with arrays of the same size and * to work between a scalar and an array.
  3. Part of the gradient calculation is the same for every feature - compute this value once and cache it before starting the innermost for loop.
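
Here's a sketch of how those three tips fit together, under the same assumed formulation as the skeleton above (X is the n-by-m feature array, y is the length-n label array, and eta and num_steps are made-up names for the learning rate and iteration count):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Tip 1: X is an n-by-m np array of features, y is a length-n np array of labels
    n, m = X.shape
    theta = np.zeros(m)                           # Tip 2: theta lives in an np array
    for step in range(num_steps):
        grad = np.zeros(m)
        for i in range(n):
            p = sigmoid(np.dot(theta, X[i, :]))   # Tip 2: dot product via np.dot
            err = y[i] - p                        # Tip 3: same for every feature, so cache it once
            for j in range(m):
                grad[j] += err * X[i, j]
        theta = theta + eta * grad                # Tip 2: overloaded + (arrays) and * (scalar * array)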

2. Two Loops (~1 min runtime on the Netflix data for me)

In this version, I eliminate the for loop over features. If you implemented everything I described in part 1, then you already have the tools to do this. Basically, you can calculate the gradient for a whole datapoint at once using numpy's overloaded * operator.
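
In code, the only change from the three-loop sketch above is that the innermost loop collapses into one array operation (same assumed setup, reusing sigmoid, X, y, theta, eta, and num_steps from that sketch):

    for step in range(num_steps):
        grad = np.zeros(m)
        for i in range(n):
            err = y[i] - sigmoid(np.dot(theta, X[i, :]))
            grad += err * X[i, :]     # scalar * row vector updates the whole gradient at once
        theta = theta + eta * grad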

3. One Loop (~1 second runtime on the Netflix data for me)

As I said above, it's actually possible to do this with only the for loop over training steps/iterations. This is the hardest version to wrap your mind around, but it should shorten your code significantly (I only have three lines in my for loop) and will give you a huge speedup. I especially recommend doing this version if you plan on doing more with ML in the future, since the tricks you need here are the bread and butter of most ML software like TensorFlow/PyTorch/Caffe. Some useful tricks for getting there (sketched after the list) are:

  1. Matrix multiplication is your friend - np.matmul lets you multiply two matrices together or a matrix by a vector.
  2. Broadcasting is crucial - Read up on broadcasting in numpy. Basically, if you want to perform the same operation (or similar operations) on every part of an array, numpy can do it! Sometimes the syntax is a little tricky (I had to use Stack Overflow once or twice while writing my code), so make sure to test each piece of syntax on a small standalone example so that you know your code is working correctly.
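
Putting those two tricks together, each training step becomes a few bulk operations. Here's a sketch under the same assumed setup and names as the earlier sketches:

    for step in range(num_steps):
        p = sigmoid(np.matmul(X, theta))   # length-n predictions: (n x m) matrix times length-m vector
        grad = np.matmul(X.T, y - p)       # length-m gradient, summed over all datapoints at once
        theta = theta + eta * grad         # broadcasting: scalar * array, then array + array

Note that np.matmul(X, theta) and sigmoid (written with np.exp) both operate on whole arrays, which is exactly the bulk-operation idea from the top of this post.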

I hope some of the stuff here is helpful. Feel free to follow up on Piazza (but try not to post chunks of code there) and I will do my best to reply!

Happy coding!

-Noah