L28

Today: Last lecture, your future in CS, conclusions

Today

Show some fun stuff for the future
Talk about classes you might take

Final Exam

Mon Mar 20th, 10:00-11:30 am
Arrive a little early to get settled
Time/place logistics will be on the course page by Wed
Similar structure to midterm
1.5 hours, closed note
Enough time to write code, not extra time
Covers the whole quarter, weighted towards post-midterm
Improved final exam score can compensate for a poor midterm score
Exam problems look like homework functions
Not on final exam:
Bit
Writing main()
Calling matplotlib
See course page for review materials: final-prep, previous exams
Lots of problems for practice
Section this week - work some review problems

Chat After Class

I'm happy to talk to people after class at Bytes cafe with any sort of CS questions, then heading over to Durand. (Office hours 2-3 merged into this). Also on Zoom at 3:10 for remote people.

Python Guide

I'm gradually working on and expanding the Python Guide, aiming to keep it as a free resource on the web. If you want to find it in the future, it's linked from my home page and the CS106A page.

Today: Special Timely Topic

I'm going to take 10 minutes to show you something of interest for your future. At the beginning it will look like an innocent Python talk, and at some point it will transition to something you have heard about.

I Have Some Text

This and that and the other.

It's linear - can scan it from first word to last.

Bigrams Structure

For each word in the text, we could note which words come after it, essentially the pairs of words in the text sequence, aka "bigrams". We could build a Python structure, where each word is a key, and its value is a list of all the words that come after that word at some point in the text. As a special case, we'll say the empty string appears before the very first word of the text.

{
 '': ['This'],
 'This': ['and'],
 'and': ['that', 'the'],
 'that': ['and'],
 'the': ['other.'],
}
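
Here is a minimal sketch of how such a structure could be built. It assumes the text is split on whitespace, so punctuation stays attached to its word (as with 'other.' above). The function name build_bigrams is made up for this illustration and is not necessarily the code used in lecture.

```python
def build_bigrams(text):
    """Return a dict mapping each word to the list of words
    that come right after it somewhere in text."""
    bigrams = {}
    prev = ''  # special case: the empty string comes before the first word
    for word in text.split():
        if prev not in bigrams:
            bigrams[prev] = []
        bigrams[prev].append(word)
        prev = word
    return bigrams


model = build_bigrams('This and that and the other.')
print(model['and'])   # ['that', 'the']
```

The last word of the text never becomes a key, since nothing comes after it.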

Gettysburg Text

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. ...

And here are some of the bigrams for the Gettysburg Address:

{
 '': ['Four'],
 'Four': ['score'],
 'score': ['and'],
 'and': ['seven', 'dedicated', 'so', 'proper', 'dead,', 'that'],
 'seven': ['years'],
 'dedicated': ['to', 'here', ...],
 ...
}

Bigrams - A Model

The bigrams structure of the text is .. what is it? It's a sort of summary. It does not replicate the text, but it has much of the text distilled into it. We might call it a "model" of the text. The word "model" appears frequently in computer systems that are trying to work with masses of real data.

Do What? Random Generation

We could use the model to generate random text that is loosely based on the input text.

Algorithm - "chase" through the bigrams to create text.

1. Start with a word, e.g. "Four". This is the first word of the output.

2. Look at the list of words that come after it.

3. Choose one of those words at random as the next word. Repeat.

alt: chase through the bigrams

Output: Four score and dedicated here ...
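
The chase steps above can be sketched roughly as follows, assuming a bigrams dict shaped like the small 'This and that and the other.' example. The function name generate and its limit parameter are made up for this illustration.

```python
import random


def generate(bigrams, start, limit):
    """Chase through bigrams beginning at start.
    Returns the start word plus followers, at most limit words total."""
    words = [start]
    word = start
    while len(words) < limit:
        followers = bigrams.get(word)
        if not followers:
            break  # dead end: nothing ever came after this word
        word = random.choice(followers)  # step 3: pick a next word at random
        words.append(word)
    return ' '.join(words)


# Small hand-written model of 'This and that and the other.'
bigrams = {
    '': ['This'],
    'This': ['and'],
    'and': ['that', 'the'],
    'that': ['and'],
    'the': ['other.'],
}
print(generate(bigrams, 'This', 20))
```

Each run can print something different. The limit is there because the chase can loop ('and' then 'that' then 'and' ...) for a long time before it happens to reach a dead end.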

Demo: Gettysburg

Here are some random outputs from the Gettysburg bigrams. They do not make a ton of sense, but you can see how the model is replaying bits of the original, a weak copy of the original human authorship.

1. Four score and dedicated here highly resolve that all men are created equal. Now we can never forget what we can not dedicate a final resting place for us -- that nation so conceived in Liberty, and so nobly advanced. It is rather for the proposition that all men are created equal.

2. Four score and so nobly advanced. It is rather for those who struggled here, have thus far above our fathers brought forth on this continent, a great battle-field of devotion -- that we should do this. But, in Liberty, and dead, who here have consecrated it, far above our fathers brought forth on this continent, a final resting place for which they gave the unfinished work which they gave the people, for which they who fought here highly resolve that these dead we say here, have thus far above our poor power to add or detract.

Demo: Alice in Wonderland

Gettysburg is so short, it really limits how well the model can do. Try some longer texts.

Alice's elbow against herself, to queer things -- at once tasted -- at once, with an atom of the house down!' said the Dodo, `the best plan.' It was the Mock Turtle at the OUTSIDE.' He unfolded the way through all like that!' screamed the glass table set to speak severely to twenty at this, that was peering about ravens and ending with its voice.

Demo: Tale of Two Cities

A wonderful quickness, and beautiful. “Eighteen years!” said Mr. Barsad was coming and table, covered by Madame Defarge was sleep or less—he stationed Miss Pross and alluvial mud, to themselves to use them,' I was present. Let me where anything to her, on Carton. Some of things to do you to-night.

What did We See There?

Model distills some of the sense of the original

Can create a kind of lame echo - re-mixing elements from the original. It's pretty good considering that all it does is look at pairs of words.

Does not have intelligence. It is remixing/replaying elements from the original.

This gives it the appearance of intelligence, since it is remixing something written with intelligence.

This is How Chat GPT Works

If our bigrams code is like a rollerskate, then Chat GPT is like a 747. Chat GPT, at a guess, is the product of 10 PhDs working for 10 years.

The important similarity is that you have a human, intelligently created body of work. The computer absorbs this into a model.

1. Chat GPT - Maybe Not Intimidated

Reasons to not be intimidated
It's just replaying/remixing
Not having an intelligence about the world
This is why it confidently generates falsehoods
It's going off patterns
Not knowledge of the world
Some of the appearance of intelligence is due to replaying the intelligence of the source text

2. Chat GPT - Revolutionary

Maybe it is revolutionary actually
The model is capturing the source nuance better and better
Perhaps that does form knowledge about the world?

Replay Reconsidered

We think of the human brain as "thinking"
But actually perhaps "replay" is commonly used
Muscle memory, or making a joke in a particular situation
Maybe the human brain is not entirely different from what Chat GPT is doing?

Two Predictions - 1 Bad and 1 Good

Two predictions - a bad one and a good one. Let's do the bad one first.

Preamble - Google Search Model

alt: google search model

Say we have an author, and they create some content on the web that is good - using their human intelligence. Say it's a recipe.

Google finds this page, and shows it to you when you are searching for that, plus puts some ads around it. This model can be pretty great, making a lot of knowledge available.

Note that the Google algorithm is trying to show you the best page. Google does not have a nefarious purpose to show you a not-great page. But it's just an algorithm.

SEO

Search Engine Optimization

The idea is that people create knock-offs or just copies of the original good content, and cover them with extra ads. The knock-offs are optimized to appeal to the Google algorithm. It's a sort of cat-and-mouse game: Google tweaks the algorithm to try to favor the good content, and the SEO pages tweak to match the new algorithm. Note that the SEO pages are not necessarily the best for the end user. They are optimized to fool the Google algorithm. That's why it is called Search Engine Optimization.

alt: google with SEO

I'm sure you have seen SEO type pages when you are looking for something. Often they do have the answer in there, but with lots of ads and click-through. You kind of miss the regular old page that had the answer, and wonder where it went.

GPT Bad News: SEO

I believe Chat GPT will supercharge the SEO downsides of the internet. I fear the technology makes it so cheap to semi-plagiarize content, filling the internet with many not-best quality knockoffs.

e.g. Reddit - Looking for an Opinion

Theme: sub-reddit type discussions are an often-overlooked part of the value of the internet.

Suppose you are trying to read people's opinions in a sub-reddit about mechanical keyboards, or knitting needles, or kittens, or whatever. Seeing the opinions of other random people can be fantastic, and reddit is a great example.

But the bad scenario here is that GPT fills the domain with a blizzard of fake, biased information or trolling, so it's hard to find the actual opinions.

Now Google and Amazon and Reddit are not powerless here, but they are going to have to work at it. A valuable feature of content may be that you know it has a real human, unbiased author. Perhaps there will be a shift to knowing about the author vs. just the content itself.

2. Good: Super Auto-Correct

Many of the sentences or bits of code we write look like other sentences and bits of code. I expect Chat GPT type technology will appear where we are authoring as a sort of co-pilot or super auto-correct. A bit like spelling correction now - a helper filling things in, so we get work done quicker.

This is definitely going to happen, and I expect we will be better off for it.

That's our Chat GPT futurism discussion. Now let's talk about CS106A and later courses.

Preamble: What is the Role of CS106A?

What is the role of CS106A?
It is not to browbeat people into liking CS!
It's to lay out CS honestly
See the nature of code
See how to solve this problem
And then you pick what's best for yourself

Life After CS106A

CS106A - You Will Never Not Know This

Things said on the first day you now know
The computer just follows mechanical instructions
There will be bugs
The programmer has the insight, directs the computer
We've done this so much, it scarcely needs mentioning
Concatenate

You will never not know this nature of the computer.

I imagine 20 years from now, you are playing trivia with friends, and the word "concatenate" or "off by one error" comes up, and it's all going to come back to you.

Learned All The Programming Techniques?

No!

Learned the Important Core

Here is the deal: Python and the space of all programming techniques is very large. A bigger space than you might think.

You have learned the most important 80% core: loops, lists, strings, functions, tests, files

There are a few more important techniques in CS106B. Most programs, even very advanced programs, are centered around those core features. If you need to use a less-common technique, you can look it up and figure it out as you go. That's how most programmers proceed - the core they know well, the other stuff they look up as they go.

Women in CS Trend - 1994 - 2020

Slide from Mehran Sahami, Stanford CS-Education. The bars are the number of students. The line is the percentage of women. Both are going up, which is great, and it looks like a gradual broadening of the field.

alt: increasing percentage of women in CS

Fact: Programmer Shortage

A fact for you to internalize
What percentage in the US work as programmers?
Walking around Stanford .. you would think it's like 25%
Approx 1% of the population work formally as programmers
Formal "programmer" jobs in the US statistics
There are additional programmer-adjacent jobs, but still a tiny number
For comparison approx 12% of population works in health care
Lots of programming problems are not being solved due to lack of programmers
In other words, there is a programmer shortage
Very high pay for programmers is another bit of evidence of the shortage
Not that we need to make any Stanford student's head any bigger...
But by these numbers, knowing Python makes you a little special

Why is there a Programmer Shortage?

Coding seems impossible
People are scared off before trying, weird syntax
Don't see themselves as programmers, keep a distance
Maybe they tried it and just don't like it!

Nick Python T-Shirt Story

I was on a bicycle, wearing ratty clothes and a "Python" t-shirt, stopped for a red light. A person walking in the crosswalk in front of me stopped, turned to me, and asked if I was looking for work.

Not to disillusion you about graduating from Stanford, but that is not how hiring is normally done.

Like how desperate for programmers was that person? That is what an extreme programmer shortage looks like!

Fear: Programmer Recession?

No. For the last 30 years in Silicon Valley, there have been perhaps three 1-year stretches where demand for programmers fell. The other 27 years were dominated by white-hot demand for programmers. I expect demand for programmers to be undiminished. That historical pattern has been super strong.

Background: Many Computer Languages

There are many different computer languages
Python, C++, Javascript, Java, C, Rust
Each language is good at different things

Python Niche - Programmer Efficient

Python is great at letting the programmer express their ideas with minimal code
At what cost?
Python runs slower and uses more memory than code in other languages
In some sense, shifting costs from the programmer to the computer
That's a great tradeoff in many situations!

Code Ideas We've Seen in Python

Code ideas we've seen in Python
Storing data
ints, floats, strings, lists, dicts
Language features:
functions, parameters, strings, loops, if-logic, lists, dicts
Good programming style:
Divide and conquer, decomposition
Testing functions
Readability

Your Second Programming Language

Your second computer language
C++ or Javascript or Java or whatever..
Has those same features:
ints, loops, strings, if-statements, ..
Computer languages are 80% similar to each other
Different syntax - superficial
Your second language is surprisingly easy to learn
(you may be skeptical)
Python has a "light" syntax
other languages have more to type in

C++ (CS106B)

Here is some C++ code

// comments start with 2 slashes
int i = 0;                 // must declare var
while (i < 100) {          // parens + braces
    i += 1;                // same as py + semicolon
    if (is_bad(i)) {       // parens + braces
        return;
    }
    i += "Hello";          // error detected
    // int/string types different,
    // so above does not work.
    // Error is flagged at edit-time:
    // earlier than python, an improvement
}

C++ code looks different
Actually 80% familiar
Picking up C++ will not be a big problem
Advantage of heavier syntax: more automatic error detection
Also C++ runs much faster than Python
Disadvantage: more to type in
Fun fact: the Python interpreter is itself written in the language C, related to C++
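
For comparison, here is a sketch of roughly the same loop in Python. The is_bad function is a made-up stand-in, since the C++ snippet above does not define it. Python accepts this code without complaint, and the int/string mistake is only detected when that line actually runs.

```python
def is_bad(i):
    """Hypothetical stand-in for the is_bad() in the C++ snippet."""
    return i > 1000


def count_up():
    i = 0                  # no declaration needed
    while i < 100:         # no parens or braces, just a colon
        i += 1
        if is_bad(i):
            return
        i += 'Hello'       # int += str raises TypeError, but only at run time


try:
    count_up()
except TypeError as err:
    print('Detected at run time:', err)
```

If the bad line sat on a code path that rarely runs, the error could stay hidden for a long time. That is the cost side of Python's lighter syntax.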

Possible Next Steps

Most Stanford students take 1 or 2 CS classes and keep with their chosen major. It's easy to imagine they use Python here and there as part of their work.

After CS106A ..

Even if you don't write code any more
Now you know code is not a magic box
Algorithms thought up by people
Expressed as code for the computer
Code some more..
Perhaps a little code alongside, say, your bio research
Interested in more CS .. take CS106B
See how far you like going
Hidden agenda:
Many students coming to Stanford don't see themselves in CS
CS106A tries to pick off a few was-not-planning-to-like-CS students

Next "CS106" CS106B

The next step in CS - mixture of coding and CS
Coding problems are harder and more impressive compared to 106A
Has section leaders
Many non-CS-majors take this
More powerful algorithms
Uses C++ language - don't worry about this
Recursion (beautiful) .. e.g. solving a maze
A sort of jaw-dropping idea when you get it
Really understand: hash table (dict), sorting algorithms
More hands-on use of memory

If you want to take CS106B, we generally recommend taking it within 6 months. The topics and workload in CS106B go up or down somewhat depending on who's teaching it, so you have to think a little bit about which quarter is right for you. Any version of CS106B is fine for going on in the major.

Think About Section Leading

Few schools have this opportunity
Section Leader program - amazing thing at Stanford
Section leaders are drawn from students who have completed CS106B
Don't need to be a CS major
Section leaders - not for everyone
like code, like helping people
Open secret: SLs pick up fantastic skills
debugging, organizing ideas, public speaking, confidence

Map of CS Major

Programming core sequence:
CS106A, CS106B, CS107, CS111
CS Mathematical core sequence:
CS103, CS109, CS161 (integer mathematics, probability)
Then there are more courses in an area of concentration

Aside: What is CS Integer Mathematics?

CS Integer mathematics (vs. calculus real-number)
int div and mod (// %)
Hash Table (dict) - how is this so fast?
Not so much: integrals, differential equations
Calculus is not the ceiling, it's a branch
Integer mathematics is its own domain
Relates to list/string algorithms we've done
Personal take:
I was tired of Calculus, but integer mathematics seemed neat and applied
CS106B goes down the path of integer mathematics a little
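
As a small taste, here is a sketch of int div and mod, plus a rough hint at how a hash table can use mod to be fast. The bucket_index and lookup helpers and the choice of 8 buckets are made up for illustration. Python's real dict is more elaborate than this.

```python
# Integer division and mod: 17 = 5 * 3 + 2
print(17 // 5)   # 3, the quotient with the remainder dropped
print(17 % 5)    # 2, the remainder


def bucket_index(key, n_buckets):
    """Map a key to a bucket number in the range 0..n_buckets-1."""
    return hash(key) % n_buckets


# A toy hash table: a list of 8 buckets, each a list of (key, value) pairs
buckets = [[] for _ in range(8)]
for key, value in [('apple', 3), ('pear', 7), ('kiwi', 1)]:
    buckets[bucket_index(key, 8)].append((key, value))


def lookup(key):
    """Search only the one bucket where key would be stored."""
    for k, v in buckets[bucket_index(key, 8)]:
        if k == key:
            return v
    return None


print(lookup('pear'))   # 7
```

The lookup jumps straight to one bucket using integer arithmetic instead of scanning every stored key, which is the core of why a dict stays fast as it grows.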

CS Major Tracks / Concentrations

If interested in majoring in CS
Don't have to pick an area at first
Undergraduate CS concentration areas:
Artificial Intelligence (AI)
Human Computer Interaction (HCI)
Systems (operating systems, networks)
Graphics
Biocomputation
Choose a concentration, take advanced courses in that area
HCI - call this one out, since most don't realize this field exists (below)
CS Minor is a good deal
Up through CS107 and CS109 + 2 electives
This gives a strong CS background
Minor is a better idea than double-major

Some Select Courses

We'll just mention a few courses you could take, to build the picture that there are many different areas of CS you might explore. Many of these require CS106B as the pre-requisite.

Scientific Python CME 193

Python and scientific computing
Prereq: CS106A basically
1 unit s/nc
Applied Python (vs. CS fundamentals)
See CME193

Human Computer Interaction (HCI) CS147

How to design systems to work well with humans
Details below about HCI
Prereq: CS106B
See CS147 intro HCI

Web Applications CS142

Building a web server and pages
Prereq: CS107
See CS142

Graphics CS148

3d imagery
Prereq: CS107, Math 51
How does Mario Kart work?
See CS148

Applied Machine Learning CS129

Machine Learning is the cutting edge AI technique
Drawing conclusions from data
Prereq: CS106B, linear algebra / Math 51
See CS129

Human Computer Interaction - HCI Design

What if you don't want to write code all the time?
An unexpected part of CS
Human Computer Interaction Design
Anyone working in management in a computer field should take this course
You can do an undergrad or grad emphasis in HCI
More info: CS147 intro HCI
aka "interaction design" - product manager, designer
Demo Image search: push pull handle HCI
Door Push-Pull Handle
The appearance communicates to the subconscious
Great design works without the user thinking!
(analogy: film making to create an emotion in the audience)
Mostly we notice HCI in the world when it is done badly
You click a control, and are surprised by what happens
Nick's Open Question: what is the greater drag on human potential on earth:
Missing software, not coded up yet (CS106B)
Software with bad HCI design (CS147)

Symbolic Systems Major

A sibling to the CS major - similar intellectual domains but less focus on coding

An interdisciplinary major that uses the lenses of CS, Philosophy, Psychology and Linguistics to study systems that use symbols to represent information. In Symsys you can concentrate on AI, Neuroscience, Natural Language, Philosophical Foundations or design your own concentration.

Big Data - Machine Learning

Very hot areas in CS these days...
Big Data
Machine Learning
Think about the Ghost project

Ghost Example

Suppose you show Ghost to your parents
Ask them how it works
Like does it "think" about the images?
No!
But now you know the internal story..
Break it into a bunch of little numbers
Programmer has an algorithm, computer follows the code
This points towards Machine Learning...

Machine Learning Sketch

This is just a sketch of the idea of Machine Learning
Machine Learning - computed insights
Cancer cell grading research
e.g. Show the computer many example cell slides
Program the computer to break out many different metrics of each slide
dark/light, dark/light of boundaries, colors, number of cells, texture, ...
Do not pre-bias the computer about the meaning of the various metrics
Also give the computer the outcome data associated with each slide
What if you have a million such slides?
Let the computer recombine all the data, sift out the patterns
Not insight like a human
Insight by looking at masses of data, guided by a human plan
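
To make the sketch a little more concrete, here is a toy version in Python. Every number below is invented for illustration and the two metrics are placeholders. The method (average the metrics for each outcome, then pick the closest average) is a deliberately simple stand-in, not what the actual research uses.

```python
def average(rows):
    """Return the column-wise average of a list of equal-length lists."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(examples):
    """examples: list of (metrics, outcome) pairs.
    Returns a dict mapping each outcome to its average metrics."""
    grouped = {}
    for metrics, outcome in examples:
        if outcome not in grouped:
            grouped[outcome] = []
        grouped[outcome].append(metrics)
    return {outcome: average(rows) for outcome, rows in grouped.items()}


def predict(model, metrics):
    """Return the outcome whose average metrics are closest to metrics."""
    def distance(center):
        return sum((a - b) ** 2 for a, b in zip(metrics, center))
    return min(model, key=lambda outcome: distance(model[outcome]))


# Made-up slides: [darkness, texture] metrics plus a known outcome
examples = [
    ([0.9, 0.8], 'high'),
    ([0.8, 0.7], 'high'),
    ([0.2, 0.3], 'low'),
    ([0.1, 0.2], 'low'),
]
model = train(examples)
print(predict(model, [0.7, 0.9]))   # high
```

Note that the code never says what "dark" means for the outcome. The human supplies the plan (which metrics, how to compare), and the pattern itself comes out of the example data.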

Machine Learning - Programmer + Computer

Machine Learning is solving real problems in the world
Much more sophisticated math than in Ghost
The math insight / framework is from the programmer
The computer is sifting through the details

Self-Driving Cars - Machine Learning

I think this will work someday, at least for freeways
A vision / radar problem - see all the possible collisions
Machine Learning to try to recognize the things around the car

Needs to work for 100% of Cases

Part of getting code to work is that you need to chase down those rare, difficult cases as well.

Below is a difficult case for the self-driving logic.

alt: bike attached to back of car

Thanks To Elyse and the Section Leaders

Thanks to Elyse and the section leaders! The only way this course can work is with their prodigious and generous efforts. Elyse and the section leaders are a tribe selected for technical skill and generosity - a fantastic group of people and we are lucky to have them.

Where is the Magic in CS?

The computer seems magic
Such neat output
But where is the magic?

alt: ghost input image with foot in the way

Where is the Insight? The Power?

Where is the power in this story?

You are the power in this story
You have an insight about a problem to solve in the world
Your idea makes the algorithm
Python is just your instrument
The computer solves a real problem, driven by your idea
This is a great story

Fare Well Python Programmers!

In closing, I'll say that teaching this class is a very satisfying endeavor - it's great to see the light in someone's eyes when the power we know in CS starts working for that student.

Best of luck with your future projects!