Linguist 278: Programming for linguists

Course: MW 10:30 am-12:20 pm, Encina West 202 (Serra St, past Hoover Tower)
Course email discussion: Use Piazza! You can still post privately. I'm the only instructor.
Instructor: Christopher Manning
Office hours: Monday 3:00-4:00, Friday 1:30-2:00
Office: Gates 248


Plan Assignment Reading/Reference
Sep 26 Week 1
  1. Strings, integers, lists
  2. The basics of for loops
  3. Notebooks: (right click on these and choose Download/Save link as... or similar)
  4. First class (installing, etc.) IPython Notebook [zip file] [html (read only)]
  5. Monty Python scene 1 IPython Notebook [zip file]
  1. Automate the boring stuff: Chapter 1 Python Basics
  2. Python tutorial: An informal introduction to Python
Sep 28
  1. In-class exercises and assignment 1: [IPython notebook] [zip file] [html file]
  2. The basics of functions
  3. (Some) built-in functions and methods
  1. Assignment 1 out (integrated into in-class exercises) [due Oct 5]. Please submit by emailing to manning@stanford.edu, with linguist278 in the subject line.
  2. Assignment 1 solutions: [ipynb] [html] [zip]
  1. Textbook: We've only covered parts of what is in the next 3 chapters so far, so read selectively as needed! Or read it all if you're super keen. We'll do much of the rest on Monday.
  2. Automate the boring stuff: Chapter 2 Flow Control
  3. Automate the boring stuff: Chapter 3 Functions
  4. Automate the boring stuff: Chapter 4 Lists
  5. Reference
  6. str
  7. Numeric types
  8. Built-in functions
Oct 3 Week 2
  1. In-class exercise and assignment 2: [zip file] [html]
  2. Google Books 1-gram sample
  3. Dicts, tuples
  4. File handling
  5. Control statements and iterators
  6. No class on Oct 5!
  1. Assignment 2 out (integrated into in-class exercises) [due Oct 12].
  2. Assignment 2 solutions: [ipynb] [html]
  3. Assignment 1 due. Please submit by emailing to manning@stanford.edu, with linguist278 in the subject line.
  1. Control flow tools
  2. Reading and writing files
  3. Automate the boring stuff: Chapter 5 Dicts
  4. Reference
  5. Python tutorial: Control Flow
  6. Python tutorial: Data structures (including dicts)
  7. Library reference: dict
Oct 5
Oct 10 Week 3
  1. In-class exercise: [zip file] [html] Alice in Wonderland: [text] [excerpt]
  2. More flow control: while, break, continue
  3. Slices of lists and strings
  1. Textbook: On Monday, we cover more of chapters 2 and 4. We've now covered chapter 1, most of chapters 2 and 4, and some of chapters 3 and 5. So, it's basically okay to read all of chapters 1–5. On Wednesday, we'll move on to the material in chapters 6 and 7!
  2. Automate the boring stuff: Chapter 2 Flow Control
  3. Automate the boring stuff: Chapter 4 Lists
  4. Reference
  5. Python tutorial: Control Flow
Oct 12
  1. Regular expressions
  2. Slides: [pdf] [pptx]
  3. Learn regular expressions interactively: RegexOne
  4. IPython notebook and assignment: [zip] [html]
  5. English transliterations of Gaddafi: ABC News 2009, Business Insider [written by someone a bit more linguistically competent!], The Atlantic
  1. Assignment 3 out [due Oct 19]

  2. Assignment 3 solutions: [ipynb] [html]
  3. Assignment 2 due.
  1. Automate the boring stuff: Chapter 6 Manipulating Strings
  2. (We've already seen some of this, but the chapter covers some other useful methods.)
  3. Automate the boring stuff: Chapter 7 Pattern matching with regular expressions
  4. Reference
  5. Python HOWTOs: Regular Expressions
  6. Python library: Regular Expressions
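As a small illustration of the regular expressions topic and the Gaddafi transliteration material above, here is a minimal sketch using Python's re module. The pattern and the `find_spellings` helper are assumptions made up for this example, not the assignment's intended answer, and the pattern covers only a handful of the spellings found in English sources.

```python
import re

# Illustrative pattern (an assumption, not the assignment's answer): it covers
# only a few of the English spellings of the name.
GADDAFI = re.compile(r"\b(?:Kh|[GKQ])a(?:dd|dh|d)af[iy]\b")

def find_spellings(text):
    """Return every substring of text that the sketch pattern matches."""
    # The groups are non-capturing, so findall() returns whole matches.
    return GADDAFI.findall(text)

sample = "Reports spell it Gaddafi, Qaddafi, Kadafi, Gadhafi or Khadafy."
print(find_spellings(sample))
```

The non-capturing groups `(?:...)` matter here: with ordinary capturing groups, `findall()` would return the group contents rather than the full matched name.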
Oct 17 Week 4
  1. No class!
  2. Sorry, but I'm out again - NSF review panel, this time. :(
  1. Defining functions
  2. More on keyword arguments
  3. matplotlib site
  4. Python 3 collections module (OrderedDict, Counter, etc.)
  5. operator module
Oct 19
  1. IPython notebook and assignment: [zip] [html]
  2. Data preparation:
    my script: [clean-text.py]
    one-sentence per line: [txt]
  3. Keyword arguments to functions
  4. matplotlib basics
  5. The collections and operator libraries (OrderedDict)
  1. Assignment 4 out [due Oct 26]
  2. Assignment 4 solutions: [ipynb] [html] [zip]

  3. Assignment 3 due.
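For the collections and operator libraries listed above, a minimal sketch of how Counter, OrderedDict, and operator.itemgetter can fit together. The sentence and the `word_counts` helper are made up for illustration; the assignment notebook may well do this differently.

```python
from collections import Counter, OrderedDict
from operator import itemgetter

def word_counts(text):
    """Count lowercased, whitespace-separated tokens in text."""
    return Counter(text.lower().split())

counts = word_counts("the cat saw the dog and the bird saw the cat")

# most_common(n) returns the n most frequent (word, count) pairs.
top_two = counts.most_common(2)

# itemgetter(0) sorts the (word, count) pairs alphabetically by word;
# OrderedDict then keeps that insertion order.
alphabetical = OrderedDict(sorted(counts.items(), key=itemgetter(0)))

print(top_two)
print(list(alphabetical))
```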
Oct 24 Week 5
  1. Working at the command line
  2. Unix command-line text tools
  3. Slides: [pptx] [pdf] [6up pdf]

  4. Doing data science: Pandas
  5. IPython notebook and assignment: [zip] [html]
  1. Assignment 5 out (see notebook to left) [due Nov 2]
  2. Assignment 4 due.
  1. pandas (Python Data Analysis Library)
Oct 26
Oct 31 Week 6
  1. More Unix command line tools
  2. Slides: [pptx] [pdf] [6up pdf]
  3. Working with Python programs in files
  4. Text editors and IDEs
  5. Using command-line arguments in Python
  6. Class notes: [IPython notebook] [zip file] [html file]

  7. A first look at working with webpages: (i) loading pages with webbrowser, decoding URLs; (ii) using requests to grab the text or contents of a page and showing that in IPython; (iii) learning about HTML, view source, developer tools; (iv) using BeautifulSoup (bs4: BeautifulSoup(), select(), Tag.get())
  8. "Scraping" web pages - downloading and cleaning them
  9. More on working with files
  10. IPython in a terminal
  11. Processing XML/HTML
  12. Toy HTML file
  13. Programs for web pages that I wrote in class
  1. Assignment 6 out [due Nov 9]: [zip]
  2. Begin thinking about final project
  3. Assignment 6 solutions: [zip file] [duedates.py] [webscraper.py] [amazon.py] [congress.py]
  4. Assignment 5 due.
  1. Argparse Tutorial
  2. argparse module

  3. Automate the boring stuff ch. 8: Reading and writing files. Look through this. We've seen some of it, but: learn about the os module; learn how Windows machines use backslash instead of slash as a path separator; and learn how to write files as well as read them.

  4. Automate the boring stuff chapter 11: Web Scraping. You don't need to read the part about Selenium, though. Also, the "I'm Feeling Lucky" project no longer works with modern Google.
  5. webbrowser
  6. Requests: HTTP for Humans
  7. BeautifulSoup

  8. As unstructured data heats up, will you need a license to webcrawl?, GigaOm, 2012/04/22.
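For the command-line arguments topic and the argparse references above, a minimal sketch of an argparse parser. The script's purpose, the option names, and the file name alice.txt are all hypothetical, chosen only for illustration.

```python
import argparse

def build_parser():
    """Build a parser for a hypothetical word-counting script."""
    parser = argparse.ArgumentParser(description="Count words in a text file.")
    parser.add_argument("path", help="text file to read")
    parser.add_argument("-n", "--top", type=int, default=10,
                        help="how many of the most frequent words to show")
    parser.add_argument("--lower", action="store_true",
                        help="lowercase the text before counting")
    return parser

# Passing a list to parse_args() keeps the sketch runnable in a notebook;
# in a script, calling parse_args() with no arguments reads sys.argv instead.
args = build_parser().parse_args(["alice.txt", "--top", "5", "--lower"])
print(args.path, args.top, args.lower)
```

Declaring `type=int` lets argparse convert and validate the value, so the rest of the script can treat `args.top` as a number.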
Nov 2
Nov 7 Week 7
  1. Getting all the data
  2. Unicode
  3. XML
  4. Back to HTML
  5. Notebook: [zip file]

  6. CSV library
  7. JSON
  8. Notebook: [zip file]
  1. Assignment 7 out [due Nov 16]
  1. An understandable guide to the Python str format() method for printing nicely formatted strings
  2. Philip Guo's brief introduction to Unicode and Python (2 and 3!)
  3. Dive into Python 3: Chapter 12: XML
  4. lxml

  5. Automate the boring stuff: Chapter 14 – Working with CSV Files and JSON Data
  6. Python 3 CSV library
  7. Python 3 JSON library
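For the CSV and JSON libraries listed above, a minimal sketch that reads CSV rows and round-trips them through JSON. The data is made up and held in a string so that the example needs no files; real code would open a file instead.

```python
import csv
import io
import json

# Made-up rows standing in for a real CSV file.
csv_text = "word,count\nthe,4\ncat,2\n"

# DictReader uses the header row as keys; every value comes back as a string.
rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
for row in rows:
    row["count"] = int(row["count"])

# json.dumps() serializes the list of dicts; json.loads() reverses it.
as_json = json.dumps(rows)
round_trip = json.loads(as_json)
print(as_json)
```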
Nov 9
Nov 14 Week 8
  1. Advanced Python: [zip file]
  2. More collections
  3. List (and other) comprehensions
  4. Classes
  5. Exceptions

  6. More Advanced Python: [zip file]
  7. Iterators
  8. Catching exceptions; opening files with with
  9. Lambda functions
  1. Assignment 7 due
  2. Assignment 8 out [due Nov 30]: [zip file]
  1. Dive Into Python 3: Ch. 3. Comprehensions
  2. Dive Into Python 3: Ch. 7. Classes and Iterators
  3. Python 3 docs: 8.3. Collections
  4. Python 3 docs: 9. Classes

  5. Dive Into Python 3: Ch. 6. Closures and Generators
  6. Dive Into Python 3: Ch. 7. Classes and Iterators
  7. Dive Into Python 3: Ch. 8. Advanced Iterators
  8. Python Course: Errors and Exceptions
  9. Python 3 docs: 4.3 Exceptions
Nov 16
Thanksgiving recess
Nov 28
  1. More advanced language processing
  2. Using NLTK [zip file]
  3. Natural Language Processing (NLP): part-of-speech (POS) tagging, named entity recognition (NER)
  1. NLP
  2. NLTK
  3. spaCy
  4. gensim (topic modelling for humans)

  5. Machine Learning
  6. scikit-learn

  7. File paths, etc.
  8. Automate the Boring Stuff: Chapter 8: Reading and Writing Files has os.path examples and Windows examples!
  9. os (.listdir(), .getcwd(), .chdir(), .mkdir(), .scandir())
  10. os.path (.join(), .split())
  11. glob (.glob(), .iglob(), .escape())
  12. pathlib (.Path(), path.open(); object-oriented, with operators like '/')
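The file path tools listed above can be sketched together as follows. This is a minimal example that works in a temporary directory; the file names are made up for illustration.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # os.path.join() uses the right separator for the current platform.
    notes = os.path.join(tmp, "notes.txt")
    with open(notes, "w", encoding="utf-8") as f:
        f.write("hello\n")

    # pathlib builds the same kind of path with the '/' operator.
    data = Path(tmp) / "data.txt"
    data.write_text("world\n", encoding="utf-8")

    # os.listdir() returns bare names; glob.glob() returns matching paths.
    listed = sorted(os.listdir(tmp))
    globbed = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(tmp, "*.txt")))

    # os.path.split() separates the directory from the final component.
    head, tail = os.path.split(notes)
    with data.open(encoding="utf-8") as f:
        contents = f.read()

print(listed, globbed, tail, contents)
```

Building paths with `os.path.join()` or `Path / "name"` rather than string concatenation avoids hard-coding a slash or backslash, which is the Windows issue the chapter 8 reading discusses.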
Nov 30
  1. Machine learning classifiers
  2. Classifiers
  3. More NLTK
  4. scikit-learn
Dec 5
  1. Speech/Audio
  2. Statistical models
  3. [zip file]

  4. Getting code working
  5. LDA
  6. scikit-learn
  7. [zip file]
  8. [LDA slides PDF]
  1. Final project [due Dec 15, 11:30 am]
  1. The SciPy stack specification
  2. Stanford IT site
  3. LDA packages:
    1. lda
    2. scikit-learn LDA (example of use)
    3. Gensim
    4. pyLDAvis
Dec 7