Linguist 278: Programming for Linguists (Stanford Linguistics, Fall 2020)

Class 1: The basics of numerical types and strings

Numbers and basic math

  1. Basic math operations work as one would expect from a fancy calculator, including using parentheses to explicitly indicate grouping.

  2. If parentheses are left out, then specific conventions are used: https://docs.python.org/3/reference/expressions.html#operator-precedence

  3. To get a float, include a decimal point, as in 2.0. A bare trailing point, as in 2., also works. Bare digits like 11 are int.

  4. In Python 3, ints get coerced into floats where you would expect, as in 2/3 returning 0.6666666666666666.

  5. Exponentiation is with **, as in 2**3. Take care not to use ^, as it is something else (something obscure: bitwise exclusive 'or').

  6. Assignment statements are like x = 2. This is not saying x is equal to 2, but rather setting the value of the variable x to 2.

  7. For equality, use ==, as in (2 + 2) == (1 + 3). This will return True.

  8. For not-equal, use !=.

  9. The inequality operators are <, >, <=, >=.
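
The points above can be tried out with a short sketch like the following. The variable names are just illustrative choices, not anything special to Python.

```python
# Literals: bare digits are int; a decimal point makes a float.
whole = 11
with_point = 2.0
bare_point = 2.
print(type(whole), type(with_point), type(bare_point))

# Grouping: without parentheses, * is applied before +.
ungrouped = 2 + 3 * 4      # 14
grouped = (2 + 3) * 4      # 20

# Dividing two ints with / gives a float.
ratio = 2 / 3              # 0.6666666666666666

# ** is exponentiation; ^ is bitwise exclusive 'or'.
power = 2 ** 3             # 8
xor = 2 ^ 3                # 1

# = assigns; ==, !=, <, >, <=, >= compare and return a bool.
x = 2
same = (2 + 2) == (1 + 3)  # True
differs = x != 3           # True
at_most = x <= 2           # True
print(ungrouped, grouped, ratio, power, xor, same, differs, at_most)
```

Changing the numbers and rerunning is a good way to check your intuitions about grouping.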

Basic math exercises

  1. How do you specify a complex expression like 2 + 4 - 1 as an exponent?

  2. What do / and // do, and how do they differ?

  3. How can % (modulo) be used to test whether a number is even?

  4. Python has a keyword operator, is, that we will cover later. Can you find a case where == and is give the same result and one where they differ?

str

  1. Strings can be specified using single quotes, double quotes, or a sequence of three single or double quotes (triple quotes).

  2. Use single quotes (or triple quotes) if your string contains double quotes, and use double quotes (or triple) if your string contains single quotes.

  3. Triple quotes can include newlines.

  4. Digits inside strings are strings. That is, in 's9', '9' is a str, not an int.

  5. To turn a number into a string, use str(), as in str(11), which returns '11'.
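
Here is a small sketch of the quoting options above. The example sentences and variable names are made up for illustration.

```python
# Single quotes outside, double quotes inside (and vice versa).
with_double = 'She said "hi" to me'
with_single = "It's a dog"

# Triple quotes can hold both kinds of quote, and newlines too.
with_both = '''She said "it's a dog"'''
two_lines = """first line
second line"""
print(with_double)
print(with_single)
print(with_both)
print(two_lines)

# A digit inside quotes is a str, not an int.
digit = "9"
print(type(digit))

# str() turns a number into a string.
as_text = str(11)   # '11'
print(type(as_text))
```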

Indexing

  1. x[0] picks out the first character of x.

  2. x[0: 2] picks out characters 0 and 1 (first and second) of x.

  3. So, note: indexed ranges include the first index named and everything up to, but not including, the second index.

  4. If you leave off the first index in a range, then it is assumed to go to the beginning. x[ : 2] is the same as x[0: 2].

  5. If you leave off the second index in a range, then it is assumed to go to the end: x[1: ].

  6. If the second index is larger than the final index, then the returned subspan goes all the way to the end of the string. Thus, if we did x = "abc", then x[1: ] and x[1: 3] are the same. Notice that x[3] raises an exception (an IndexError), though.

  7. x[-1] picks out the final element of x, and x[-2] picks out the second to last element (and so forth).
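
The indexing rules above, collected into one sketch that you can run and modify. It uses x = "abc" as in the notes; the other variable names are illustrative.

```python
x = "abc"

first = x[0]             # 'a'
first_two = x[0: 2]      # 'ab': characters 0 and 1, but not 2

# Leaving off an index means "from the beginning" or "to the end".
from_start = x[: 2]      # 'ab', same as x[0: 2]
to_end = x[1: ]          # 'bc'

# A second index past the end is fine in a range...
past_end = x[1: 100]     # 'bc'

# ...but a single index past the end raises an IndexError.
try:
    x[3]
    failed = False
except IndexError:
    failed = True

# Negative indices count from the end.
last = x[-1]             # 'c'
second_to_last = x[-2]   # 'b'

print(first, first_two, from_start, to_end, past_end, failed, last, second_to_last)
```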

Exploratory exercises

  1. We've seen that + is flexible with regard to the types it can combine with. For example, 1 + 2 returns 3, but "1" + "2" returns "12". What happens if you try to use + to combine things with different types?

  2. Same question as the above, but now with * as the operator.

  3. What happens if you apply the built-in function int to a bool? (Try it with float too!)

  4. You've probably noticed that the types all have associated built-ins: int, float, str, bool. What does bool do if you apply it to the string "a"? What about the string "" (the empty string)?

  5. Suppose you define x = "abc". What does x[0:1] return? What about x[0:0]? See if you can guess correctly before trying the code.

  6. Suppose you have a string with an odd number of characters in it, but you don't know its precise length. Write code that will return the character that is at the precise middle of the string.

String methods

  1. Concatenation: use +. Thus s + s returns the concatenation of s with itself. This is a common pattern in Python: the + operator is addition if the arguments are numerical, and it is concatenation if the arguments are str.

  2. x.upper() returns a version of x with all letters mapped to uppercase. See also x.lower(), x.capitalize(), x.title().

  3. x.replace("b", "X") returns a new string in which all "b" characters in x have been replaced with "X".

  4. x.replace("b", "") returns a new string in which all "b" characters in x have been replaced with the empty string, which basically deletes every b.

  5. x.strip() removes all whitespace at both edges of x. The variants rstrip and lstrip target only the right and left edges, respectively. Only the edges are targeted; whitespace inside the string is left alone.

  6. x.strip('b') removes all tokens of 'b' at the edges, if there are any.

  7. If s is a str, then s.split() will split s on whitespace. This deletes the whitespace itself.

  8. split can also take arguments: "a,b,c".split(",") will split on the comma, returning ['a', 'b', 'c'].

  9. To delete things from inside a string, use replace with the empty string, as above.

  10. Strings are not mutable -- they can't be changed. So all of the above return new strings, leaving the original untouched. The only way to change the value of a string variable is via assignment: x = x.upper().

  11. The general pattern for the above is str.method(), where () might contain arguments to the method, as in replace.

  12. A common gotcha is to forget the parentheses, which actually call (run) the method. x.upper returns the method itself, whereas x.upper() calls the upper method on the str x.

  13. There are many, many string methods, so it's good to check the documentation to see if there is a method that does what you want before you write your own: https://docs.python.org/3.6/library/stdtypes.html#string-methods

  14. The join method on strings is a converse of split: given a string, it will join its list argument on that string. Try these out to see what's happening:

    "_".join(['a', 'b', 'c'])
    
    " ".join(['a', 'b', 'c'])
    
    " tiny ".join(['the', 'puppy'])
    

    I personally find that I often mistakenly think of join as a list method, rather than a str method, and so I write x.join("_") where x is a list of str. Python doesn't do things that way!

  15. The format method for str is very powerful. The basics:

    "{} is my name".format("chris")
    
    "{} plus {} is {}".format(2, 3, 2+3)
    

    Notice that, in the second case, the int values get coerced into str so that they can be interpolated.

  16. For much more on format: https://docs.python.org/3.6/library/string.html#formatstrings

  17. Methods can be chained together to create powerful one-liners. For example:

    x = "***the dog***"
    x.strip("*").split()
    

    This returns ['the', 'dog']. Breaking it down, x.strip("*") is a str, so all str methods work for it.

    We observed that reversing them fails: x.split().strip("*"). This is because x.split() is a list, whereas strip is a str method.
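
To pull the points above together, here is a sketch that exercises several of the methods in one place. The example strings and variable names are invented for illustration; the methods themselves are the ones described in the list.

```python
x = "  abba  "

# strip and its variants only touch the edges.
stripped = x.strip()                      # 'abba'
right_only = x.rstrip()                   # '  abba'
left_only = x.lstrip()                    # 'abba  '
edge_b = "bbabbabb".strip("b")            # 'abba': the inner b's stay

# replace works everywhere in the string.
swapped = stripped.replace("b", "X")      # 'aXXa'
deleted = stripped.replace("b", "")       # 'aa'

# Strings are not mutable: x is unchanged unless we reassign it.
unchanged = x == "  abba  "               # True

# split and join are converses.
fields = "a,b,c".split(",")               # ['a', 'b', 'c']
rejoined = ",".join(fields)               # 'a,b,c'

# format interpolates its arguments into the {} slots.
sentence = "{} plus {} is {}".format(2, 3, 2 + 3)   # '2 plus 3 is 5'

# Without parentheses we get the method itself, not its result.
method = stripped.upper
shout = stripped.upper()                  # 'ABBA'

# Chaining in the wrong order fails: split returns a list,
# and lists have no strip method.
try:
    "***the dog***".split().strip("*")
    chain_failed = False
except AttributeError:
    chain_failed = True

print(stripped, edge_b, swapped, deleted, unchanged)
print(fields, rejoined, sentence, shout, chain_failed)
```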

Str exercises

p = "the quick brown fox jumps over the lazy dog"

  1. Return the first 'the'.

  2. Return 'lazy' using negative indexing.

  3. Change it to title case.

  4. Remove the final g.

  5. Now put that g back.

  6. Remove all occurrences of quick.