March 12th, 2021
To gain a high-level understanding of deep learning, how it's implemented, and why it works.
Q: welcome back chris! can you see the question now?
Q: yay bye I am off the class
A1: have a great class :)
Q: do you still need a QA test?
A1: we are good!
Q: this stuff is so cool, it's incredible
Q: Where's Stanford art class to analyze computer generated paintings?
A1: that sounds so fun. i have a friend in the art department and ill ask :)
Q: So when a neuron fires is that like y-hat being 1 and if it doesn't fire then y-hat is 0?
A1: more precisely, the “firing” is the output of a sigmoid, so a number between 0 and 1. For the prediction at the end, we interpret a sigmoid output > 0.5 as a prediction of 1
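A sketch of that answer in code (the numbers here are made up for illustration, not from the class):

```python
import math

def sigmoid(z):
    # squash any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# a neuron's "firing" strength is its sigmoid output...
activation = sigmoid(2.0)          # some strength between 0 and 1

# ...and only for the final prediction do we threshold at 0.5
prediction = 1 if activation > 0.5 else 0
```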
Q: hi is there any update on maybe having review slides for quiz 3?
A1: we are looking into it. Historically we haven't had slides for quiz 3, but I just heard that an amazing TA has volunteered to make some. You can expect something before Monday
Q: How do you pick the number of hidden layer values?
A1: it's engineering. Often it's just instinct based on how much data you have. There is work on algorithms that can choose their own structure…
Q: And what new sets of thetas are we using? Is this new training data?
A1: the thetas are tunable parameters, not the training data. But we will use training data in order to choose values for our thetas. Initially all thetas are random numbers
Q: How would it come up with the hidden layer inputs using all of the data? Do we separate the original picture into sections to create the inputs to the hidden layer?
A1: it's simply MLE with gradient descent! Gradient descent updates all the thetas, including the ones that feed the hidden layer!
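To make that concrete, here is a toy sketch of MLE by gradient ascent on the log likelihood for a single sigmoid neuron (two thetas, where the first input is a constant 1.0 bias feature). The data, learning rate, and iteration count are all hypothetical; a full network would apply the same update to every theta in every layer.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)

# toy dataset: (features, label); first feature is a bias term
data = [([1.0, 0.0], 0), ([1.0, 1.0], 1),
        ([1.0, 2.0], 1), ([1.0, -1.0], 0)]

# initially all thetas are random numbers
thetas = [random.gauss(0, 0.1) for _ in range(2)]

eta = 0.1  # learning rate (hypothetical choice)
for _ in range(2000):
    # gradient of the log likelihood wrt theta_j is sum_i (y_i - y_hat_i) * x_ij
    grad = [0.0, 0.0]
    for x, y in data:
        y_hat = sigmoid(sum(t * xi for t, xi in zip(thetas, x)))
        for j in range(2):
            grad[j] += (y - y_hat) * x[j]
    # ascend the log likelihood (same as descending negative log likelihood)
    thetas = [t + eta * g for t, g in zip(thetas, grad)]
```

After training, the learned thetas put sigmoid outputs above 0.5 for the positive examples and below 0.5 for the negative ones.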
Q: So when we choose new versions of theta sets, we stray a bit from the MLE set of thetas?
A1: It is still MLE. With more thetas, you simply need a more sophisticated application of the chain rule in order to find the derivative of the LL with respect to each and every parameter
Q: awesome!!! thats so helpful thank you so much!!
A1: our pleasure
Q: How do you decide what the layers should be (ex from the demo - how do you decide that the first layer finds edges, the second layer finds corners and the third layer puts them together to get a number)?
A1: that's the truly amazing thing. Nobody decides. Instead, gradient descent on log likelihood generates that phenomenon naturally. One of the reasons this is so wild is that it's very similar to what human brains do (V1 cortex is edges, V2 is parts, etc)
Q: so the goal of a hidden layer is to weight the outputs of the previous layer, using thetas adjusted by gradient descent?
A1: that sounds about right. The previous layer gives its output, then it goes into the next “hidden” layer as input. That hidden layer is weighting, summing, and squashing its input
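The "weight, sum, squash" step can be sketched as a tiny forward pass. The network shape and theta values below are invented for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, thetas):
    # each neuron weights its inputs, sums them, and squashes with a sigmoid
    return [sigmoid(sum(t * x for t, x in zip(row, inputs)))
            for row in thetas]

# hypothetical tiny network: 3 inputs -> 2 hidden neurons -> 1 output
x = [0.5, -1.0, 2.0]
hidden_thetas = [[0.1, 0.2, 0.3], [-0.4, 0.5, -0.6]]
output_thetas = [[0.7, -0.8]]

hidden = layer_forward(x, hidden_thetas)         # hidden layer's outputs...
y_hat = layer_forward(hidden, output_thetas)[0]  # ...become the next layer's inputs
```

Each layer's output list simply becomes the next layer's input list, which is the hand-off the answer describes.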