Introduction to the main Python language features.
Nick Parlante
Python is an extremely popular computer language for solving many types of problems. Compared to other languages Python is in the "programmer efficient" niche - allowing the programmer to specify what the want pretty easily, perhaps at the cost of running a little slower or using more memory.
Python is distributed for free as open source software, and the central web site for Python information is python.org. Python was created and for many years was run by Guido Van Rossum, who is kind of a rock star for creating such a world-influencing technology.
Python is cross-platform, meaning a python program developed on Mac OS X, can likely also run on Windows or Linux without any change to the code.
Python version 3, used here, made some changes vs. Python 2.x,
in particular changing strings to be unicode, and the print() function
to be a function. For the most part, Python 2.x code looks
very similar to Python 3.x code. All discussion here is Python 3.
A variable in Python is like a named box that holds a reference to a value. Variables are assigned a value using a single equal sign
Assigning a value into a variable overwrites any previous reference stored in that variable, as shown with the variable named "y" above. Values in Python store their type, so in the example above the number 10 is tagged with the type "int", and the string 'hello' is tagged with the type "str".
In Python, Using = with a variable, sets that variable to "point to" that value in memory. Using = again
sets the variable to point to something else, leaving the original value in memory. If nothing
refers to a value in memory (no arrow to it), it is called "garbage". Eventually
the bytes of such garbage values are recycled to hold new values, called "garbage collection".
The above sequence works fine. The intermediate string 'Python!' is left as garbage.
Many algorithms throw off some garbage as they go along, computing their final answer.
In some languages, garbage-collection is a task for the programmer - ugh! In modern
languages like Python, Java, and Javascript, it's done automatically by the language.
Surprisingly, there are two distinct types of numbers for doing arithmetic in a computer - "int" for whole integer numbers, and "float" for numbers like 3.14, such as on a calculator.
The Python "int" type represents integer values. Addition, subtraction, and multiplication work the usual way with Multiplication and division have a higher "precedence" than addition and subtraction, so they evaluated first in an expression. After accounting for precedence, the arithmetic is done left-to-right. Add in parenthesis to change the order of evaluation.
The Unusually, Python int values do not have a maximum. Python allocates more and more bytes to store the int as it gets larger. The number of particles in the universe when I was in school was thought to be about 2100, playing the role of handy very-large-number. In Python, we can write an expression with that number and it just works.
Memory use approximation: int values of 256 or less are stored in a special way that uses very few bytes. Other ints take up about 24 bytes apiece.
Division is tricky with ints, since it takes in two ints but does not necessarily yield an int - e.g. 5 / 2 is 2.5. You should only use "/" if you want a float value.
Many times an algorithm makes the most sense if all of the values are kept as ints. Therefore (Python v3), Python has the The "modulus" or "mod" operator % is essentially the remainder after division. So 11 % 5 is 1 - if we divide 11 by 5, 1 is the leftover remainder.
The modulus is 0, it means the division can out evenly, like (15 % 5) above. The best practice is to only think about mod with non-negative numbers. Modding by 0 is an error, just like dividing by 0.
Floating point numbers are used to do math with real quantities, such as a velocity or angle. The regular math operators Famously, floating point numbers have a tiny error term that builds up way off 15 digits or so to the right. Mostly this error is not shown when the float value is printed, but it will appear sometimes. This is an intrinsic limitation of floating point values in the computer. (Perhaps also why CS people are drawn to do their algorithms with int.)
Therefore, do not use Memory use approximation: float values take up about 24 bytes apiece.
There are more than 50 built-in functions in python, the full list is here:
https://docs.python.org/3/library/functions.html
Here are the most commonly used functions (we'll add to this list later):
Convert to type: int() str() list() bool() float()
Collection of elements (like str or list): len(), in, [ ]
meta: type('hello') - returns the type of value parameter: int, str, float, list, ...
min(), max(),
Math: round(n), abs(n) absolute value, pow(a, b) a to the b power
The Python if statement looks like this
Any value can be used in the test expression - string, int, float, list, ...
Python has a de-facto true/false system, where all the "empty" values count as de-facto False: '', None, 0, 0.0, empty-list, empty-dict
Any other value de-facto counts as True: a non-empty string, a non-zero int, a non-empty list. Many languages use this
anything-empty-is-false system. The The if-test requires some value. It's easy to accidentally type in Suppose some foo() function is supposed to return True or False. Do not write an if-test like this:
Instead, let the if/while take the value directly like this:
The reason is that the flow of data from foo() could easily use the de-facto convention, like 1 for True and 0 for False. That would not be an ideal design, but Python makes that easy to do even by accident. It's better to not depend on precisely the values True and False, instead code that works for any de-facto True/False value.
The optional 'else:' part of an if provides code to run if the test if not True. Else is perfect if in the code we want to do exactly one of action-1 or action-2.
There's also 'elif' where a series of if-tests can be strung together (mnemonic: 'elif' is length 4, like 'else'):
Exactly 1 of the 3 cases will run. However, the logic of when each case runs can get very tangled. Combining if/else with and/or can make a deceptively complicated structure. What must be true for case c below? It is not transparent from the code.
Answer: c happens when s is not 'bob' but also (score <= high or s == 'Alice or both)
If/else chains are fine, just don't think they are trivial.
Nick style idea: I only use else when it is truly needed. If the code can work using only a plain-if, I prefer to write it that way. This works extra well with decomposed out functions, where 'return' can be used to bail out for the first few cases, leaving the main case below without indentation, like this:
You can use else if you prefer, just thinking about possible alternative structure here.
The Python "bool" type (short for "boolean") has only two values - Comparison operators..
A custom defined type can provide its own definition of Constructs like Boolean expressions like Python is unique in using plain words like "and" for this. many languages use "&&" for and, "||" for or.
There is "precedence" between and/or/not, analogous to arithmetic precedence, where "*" goes before "+". Precedence order: not is highest precedence, followed by and, followed by or. Mnemonic: not is like unary minus e.g. '-6', and is like *, or is like +.
Q: what does the following do:
The "and" goes first (higher precedence), so the above is equivalent to this form with parenthesis added to show in what order the comparisons are evaluated:
To force the or to go first, put in parenthesis like this:
If you are unsure, you can always add parenthesis to an expression to clarify what order you want and/or/not to apply.
Suppose you have an int i, and you want to check if the char at that index is alphabetic .. but only if i is valid. You can write that this way...
This works because the boolean evaluation goes left to right, stopping ("short-circuiting") as soon as it can. So in the example above, if A python string represents a sequence of unicode characters (this was an area of language-cleanup in the move to Python 3.0). Python strings are written between single or double quotes. Commonly strings are written on one line.
A backslash \ in a string "escapes" a special char we wish to include in the string, such as a quote or \n newline. Common backslash escapes:
A string using \n:
Python strings can be written within triple ''' or """, in which case they can span multiple lines. This is useful for writing longer blocks of text.
Each character is drawn from the unicode character set, which includes the "characters" or pretty much every language on earth. Also, common
emoji have been added. Every unicode character is defined by a unicode "code point" which is basically a big int value that identifies that character. Unicode characters can be written using the "hex" version of their codepoint, e.g. "03A3" is the "Sigma" char Σ, and "2665" is the heart emoji char ♥.
Hexadecimal aside: Hex numbers uses the digits 0-9 plus the letters A-F to write a number like this: 7F9A or 7f9a. Two hex digits together like 9A or FF represent the value stored in one byte. Traditionally, unicode uses hex digits like this to write the 2 or 4 byte codepoint of a unicode char.
You can write a unicode char out in python with \u followed by 4 hex digits of its codepoint:
For more than 4-hex-digits in Python use \U (uppercase U) followed by 8 digits with leading 0's as needed, like the toilet emoji 1F6BD, and the inevitable 1F489.
Not all computers have the ability to display all unicode chars, so the display of a string may fall back to something like \x0001F489 -- telling you the hex digits for the char, even though it can't draw it.
The len() function returns the length of a string. The len() function in Python is omnipresent - it's used to retrieve the length of every data type, with string just a first example. Chars are accessed with zero-based indexing with square brackets, so the valid index numbers are 0..len-1. Accessing a too large index number is an error. Strings are immutable, so they cannot be changed once created. Code to compute a different string always creates a new string in memory to represent the result (e.g. + below), leaving the original strings unchanged.
The Concatenate + only works with 2 or more strings, not for example to concatenate a string and an int. Call the function Here are the most useful Python string functions (mostly noun.verb form):
These functions return a boolean True/False test result about a string
Character class tests - it's handy to divide characters into broad classes: "alphabetic" chars like 'abc' that make words, "digits" '0', '1' to make numbers, "space" chars like space, tab, and newline. Then there are all the other miscellaneous characters like '$' and '^' which are not alphabetic, digit, or space.
Upper/lower case conversion
Standard i/range() loop goes through indices of all chars in s:
The "foreach" loop works on strings too, accessing each char.
Unlike the above form, here you do not have access to the index of
each char as it accessed.
list('abc') of a string yields a list ['a', 'b', 'c'] of its chars.
More details at official Python String Docs
Slice syntax is a powerful way to refer to sub-parts of a string instead of just 1 char. s[ start : end ] - returns a substring from s beginning at start index, running up to but not including end index. If the start index is omitted, starts from the beginning of the string. If the end index is omitted, runs through the end of the string. If the end index is too large (out of bounds), the slice just runs through the end of the string. This is the one case where Python is permissive about wrong/out-of-bounds indexes.
Negative numbers also work within [ ] and slices: -1 is the rightmost char, -2 is the char to its left, and so on. This is convenient when you want to extract chars relative to their position from the end of the string.
A returned piece will be the empty string if we have two separators next to each other, e.g. the '::', or the separator is at the very start or end of the string:
Special whitespace: split with no arguments at all splits on whitespace (space, tab, newline), and it groups multiple whitespace together. So it's a simple way to break a line of text into 'words' based on whitespace (note how the punctuation is lumped onto each 'word'):
File strategy: a common pattern is to use 'for line in f' to loop over the lines in a file and 'line.split()' to break each line up into pieces. Some text file formats have a format that split()
works on easily.
The elements in the list should be strings, and join just puts them all together to make
one big string. Note that split() and join() are both noun.verb on string. The list is just passed in as a parameter.
A list can contain a linear series of any data type: strings, ints, other lists. The things inside a list are generically called "elements". Unlike strings, lists are "mutable" - they can be changed.
The first element of the list is at index [0] and last element is at length-1, i.e. len(lst)-1.
Append vs. extend example:
More details at official
Python List Docs
Slices work to pull out parts of list just as with strings.
The original list is not modified, this creates a new list populated with elements from the original. Omitting both start and end indexes yields a copy -- lst[:]
The dict "dictionary" type is very important because it takes in disorganized data, organizes it,
and it is fast. The dict here is based on the "hash table" data structure.
You will learn how to build and analyze a hash table in CS106B, here we're happy to just use them. Many important algorithms leverage hash tables in some way since they can handle large data sets and remain fast.
'dict' is the name of the Python dict type, so we'll use just 'd' as a generic variable name.
1. 2. 3. Key point: dicts (hash tables) are fast. Even if the dict has 1 million key/value pairs in it, performing the get/set/in of single key is very fast. If the dict grows to hold 10 million key/value pairs, the speed is largely unchanged.
Strategy: therefore if we are reading a file or a network taking in disorganized data, load the data into a dict choosing the key and value definitions we want to output. The data will become organized by that key, even though it was in random order as it was read.
The type used as key should be "immutable", often a string or int (or tuple when we get to those).
Here is the canonical logic to load up a dict - in this example
the code counts the number of occurrences of each word, but many useful
dict operations will follow this basic loop/in/out pattern.
1. 2. 3. 4. 5. The most common pattern for looping over a dict, using sorted() function which returns a linear collection sorted into increasing order:
The double-variable key,value loop (more detailed explanation in the tuples section below)
The 'tuple' is a Python type which is like a small list. Tuples are like a list but written within parenthesis, here is a tuple of three strings: Accessing a tuple works like a list -- len, [ ], in, slices, loops -- but the big difference is that a tuple is immutable. It is built once and then never changes. There is no append() function.
Tuples are useful where you have a fixed number of things to keep as a group -- e.g. an x,y coordinate pair stored in a len-2 tuple like (4, 10) keeps the x,y together as a unit. In contrast, a list is useful when you have an unlimited number of items you want to store together, such as the typical pattern of reading out of a file and using lst.append() to put all the items together in one list.
Recall that dict-keys should be immutable. Therefore if you have some composite data that you want to use as a dict key, e.g. a string and an int, form them into a tuple ('meh', 16) and then use that as the key.
One quirky but important use of tuples is assignment multiple variable at once, like this:
The length of the left-hand-side and right-hand-side must be the same (CS terminology protip "lhs" and "rhs".)
Communicating in to a function is a rich area: you can have any number of parameters, they each get a name. How do you communicate out of a function .. return 1 value. Returning 1 value covers 90% of the cases. But sometimes it really makes sense to return 2 or more values. The Pythonic way to do this is to return a tuple packing together the values. Like this
There are 2 requirements:
1. The function doc string must state the size and content-types of the returned tuple. There's no way the caller code can be written correctly without
this information.
2. All paths should return a tuple of that same length, even if it is
(None, None), so that standard looking calling code of the form (x, y) = fn(..) works
without crashing should it hit the None case.
Calling min2 looks like this, catching the len-2 tuple result into 2 variables.
One handy use of tuples is the dict.items() function, which returns the entire contents of the dict as an list of len-2 (key, value) tuples.
Sorting of tuples goes left-to-right within each tuple -- first sorting by all the [0] values across the tuples, then by [1], and so on.
Once all the data has been loaded into a dict, it's natural to have a process-all-the-data phase. This can be written as a loop over the above d.items() list. The loop syntax below takes one tuple off the list for each iteration, setting the two variable, key and value each time:
This is a special version of the for loop, where there are multiple variables, and the number of variables matches the size of a tuples coming off the list. The above example, looping key,value over dict.items() is probably the most common use of this multiple-variable variant of the for loop.
Sorting is a classic computer algorithm and all computer languages provide it for you. There are piles of CS research and language features on just this one topic which I would venture say is now a solved problem. (In CS106B you'll design and analyze your own sort algorithm.) Here we'll look at the sorted() function which works on any linear data structure.
The default built-in sorted() orders the elements into increasing order. In a very Pythonic twist, the elements can be any type (str, int, float, ..) so long as that type supports < and == correctly.
The simplest use of sorted is just sorting a linear series of strs or ints...
The elements should be < comparable to each other - int vs. int, str vs. str is fine. Comparing < str to int does not have an intuitive definition and is an error (and that's got to one of the more readable error messages!):
Say you have a list of tuples, giving a web domain name and a path on that domain.
A common goal for this data is to sort first by domain, but when domains are equal, fall-back to sorting by path. This fall-back strategy is exactly what sorted() does, ordering first by [0], then falling back to [1] and so on.
Stable - a common feature of good sort algorithm is that it only re-orders elements from the original if they are not ==. So if we had 2 copies of the ('stanford.edu', '/admit') tuple in that list, they would stay in the same order after sorting. The built in sorting algorithm is stable in this way.
Say you have a list of strings, and you want to sort in a way other than alphabetic. Python has a very nice way to do this, the 'key' function.
The 'key' function is a function of 1 argument, that produces a value to use in the sort-comparison instead of the actual list value. The key value is a sort of proxy for the sorting, but the actual values are what is sorted. The other option for sorting is a reverse=True option to switch to output in descending order (reverse=False is the default).
Sort strings by their length. Introduce by_len(s) function that returns length. When calling sorted, pass in key=by_len. Note: no parens! We are indicating what function to use, but we are not calling it.
Actually can just use the built-in function 'len' but typically you need to write your own by_len or whatever function to capture the sorting you want.
Say we have (str, int) 'food' tuples where the int is a count, such as from a dict-count algorithm.
Say we want to sort these same tuples, but increasing by the count. Introduce
What if we want the highest count first? That's for the reverse=True parameter.
Aside: How much time does sorting take?
An algorithm that makes 1 pass over a list, doing a fixed-cost computation for each element could be called "linear" time cost. If you double the length of the list, the time taken roughly doubles. Many algorithms we've worked out have this linear 1-pass-over-the-data quality.
Sorting is more expensive than linear. It has to go back and forth repeatedly, some elements have to be examined many times. For CS106A, just remember that sorting is medium expensive.
1. Here is the canonical code to open a file, read all the lines out of it, handling
one line at a time. The advantage of processing 1 line at a time is that we do not require memory to hold every byte of the file at once - files can be quite big! This way we can process a file larger than the install memory of the computer, which is why
every language supports something like this. This example uses Python's newish 'with' form which automates closing the file properly. The 'r' here indicates reading. We'll talk about 'w' writing forms later.
2. Sometimes it is more handy to read the whole text file
in as one big string (requiring enough bytes of memory to hold
the whole file contents). The f.read() function does this, and here
are 3 common things to do with the text from read():
3. Can read the entire file into memory as a list of string lines:
Loop through the elements in various liner list-type things, called a 'foreach' in many languages.
Do not modify the lst during a foreach. It can work in some cases, but it's an unreliable practice. If you really need to modify lst, make a copy of lst, loop over the copy, and safely modify lst.
Foreach is often paired with the range() function:
while test:
Check the test, run all the 'body' lines inside the loop, check the test again, and so on. Can run the body zero times if the test is False the first time.
'break' exits the loop immediately. Use to make up a loop-exit test midway through the loop body.
'continue' is used less frequently. It jumps immediately back to the top/test of the loop, discarding the current run of the body. Many languages have these same 'break' and 'continue' operations.
Looping over collections is very common. Here are the most common forms.
If you do not need to know the index numbers as you go, you can just use for/in to see each element once. Do not modify the list during iteration. (In fact range(10) returns something like
a list so this form and the one above are the same.)
For a linear storage (e.g. a list, a string), index numbers are 0, 1, ... len-1. With this
form you access each element with [i], but since you know the value of i too, you can use it in slices or whatever.
Reading data out of a file or something, loading it up into a list.
Experience shows that the best code is "readable" - someone sweeping their eye over it can see what it does and how its parts fit together.
A good variable names reflects what data it holds so lines with that data make sense. Typing a very long variable name is a chore, so we want a name just long enough to show what's going on; 's2' is not good, but 'start' or 'mid' are enough.
Below is an example from the midterm review - lots of string variables. If they had names like s1 s2 s3 it's a lot harder to get right. Creating meaningful+short variable names lets the programmer narrate their own ideas as they write each line.
The last line is a nice example of names supporting the code. We would say that the code "reads" nicely, showing what it does.
foo() Example Function Problem - Given a string s of even length, if the string length is 2 or less, return it unchanged. Otherwise take off the first and last chars. Consider the remaining middle piece. Split the middle into front and back halves. Swap the order of these two halves, and return the whole thing with the first and last chars restored. So 'x1234y' yields 'x3412y'.
In a real program, most variables are not 1-letter, since the variables are carrying real data around, and the variable name reflects that type of data. But there are a few idiomatic uses where the 1-letter variable is quite readable, as the 1-letter form is a kind of standard for certain situations:
Variable names for lists, dicts, .. that hold many things work well with 's' at the end. One sign this is working well is writing the foreach loop, and the logic of the names confirms you have it right:
Tracking down bugs is a very time consuming. Often a bug creeps in when you lose track of - a word vs. a list of words vs. a dict of lists of words. Being careful with your variable names as you go code up your steps can be a big time saver by cutting down on these bugs.
Decomposition refers to separating out smaller functions to compute sub-parts of the big problem. Divide and conquer!
The 'PEP8' standard is a series of code formatting and other standards for Python code which help keep things clean. Here's the top issues IMHO...
Lots more detail:
Official PEP8
A Python "module" is a container of code ready-to-use. One reason coding in Python is so productive is the great variety of off-the-shelf code ready to use from your code, and modules are how this code is organized.
There are more than 100 "standard" modules included with Python 3 -- ranging from math functions, to parsing Excel CSV files, to reading and writing .zip archive files.
This is not the sort of list you memorize. Rather, when you come to part of your problem that may have an off-the-shelf solution, look through the modules to see what's available.
There are also modules which exist but which are not part of the standard Python distribution. Such modules need to be downloaded and installed separately (google "pip3"). Standard modules are the most common and easiest to work with, since they are already installed.
To use a module, it must be imported by its name. This is typically done at the top of the Python file. Here's an example that imports the 'sys' module, an extremely common thing to do. The sys module contains some variables and functions for interacting with the operating system of the computer which starts and stops programs.
The sys module contains the historically named 'argv' list which is the list of command line arguments to when this program was started. Elements inside a module are referred to using a dot, so here we see 'sys.argv'.
The sys module also includes a function called exit() which exits the program immediately. An old convention is that a program exit has an int status code, 0 = ran correctly, non-zero means there was some problem, so in this example we pass 1 to exit. Note that the exit() function is referred to with module-dot notation.
Python Variable model
=. Assigning to a variable creates the variable if necessary (some languages make the programmer pre-declare their variables).
x = 10
y = 'hello'
y = 'bye'
= assignment to a variable always works in a shallow way, not copying any data, just rearranging references to data. The line above y = 'bye' does not change the string data on the right. It only moves the arrows around on the left.
Memory - Garbage
x = 'Python'
y = x + '!'
y = 'Yay'
Numbers - int and float
Int Type
+ - *.
>>> 1 + 2 * 3
7
>>> 10 - 2 * 3
4
>>> (10 - 2) * 3
24
** operator does exponentiation, so 3 ** 2 is 32
>>> 3 ** 2
9
>>> 2 ** 10
1024
>>> 2 ** 100
1267650600228229401496703205376
>>> 2 ** 100 + 1
1267650600228229401496703205377
Int Division //
// operator which does "int" division - meaning it rounds-down to the next int.
>>> 5 / 2 # "/" yields a float, not what we wanted
2.5
>>> 5 // 2 # "//" rounds down to int
2
>>> 4 // 2
2
>>> 10 // 3
3
Int Mod %
>>> 11 % 5
1
>>> 12 % 5
2
>>> 15 % 5
0
>>> 1 % 5
1
>>> 10 % 5
0
>>> 10 % 0
Traceback (most recent call last):
ZeroDivisionError: integer division or modulo by zero
Float Type
+ - * / ** all work with floats, producing a float result. If an expression mixes some int values and some float values, the math is converted to float - a one-way street called "promotion" to float.
>>> 1.0 + 2.0 * 3.0
7.0
>>>
>>> 1.0 + 2 * 3
7.0
>>>
>>> 2.1 ** 2
4.41
>>> 0.1 + 0.1
0.2
>>> 0.1 + 0.1 + 0.1
0.30000000000000004
== with float values, which would be confused by the error term. To compare float values, subtract them and compare the absolute value of the difference to some small delta.
Builtin Functions
e.g. int('600') - > 600
e.g. range(10) does not print usefully, use list():
list(range(10)) -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
-len(lst)
in - check if a value is in a collection
-e.g. 6 in [1, 2, 3, 4, 5, 6, 7] -> True
Works uniformly for many collections we'll see (as len() is uniform)
Typically you do not call this yourself, but the system uses it on your behalf
-Return the min or max value according to < or >
-Take either a series of parameters separated by commas, or an iterable
-e.g. min(4, 5, 1) -> 1
-e.g. min([1, 2, 3]) -> 1
-e.g. min(['banana', 'apple']) -> 'apple' # < applied to strings
-Takes an optional key= parameter (see sorting/lambda section)
-pow can be written as "a ** b"
-Later we'll see the math module with lots of math functions
Python if - de-facto True/False
if test:
print('test is true!')
bool() function takes
any value and returns a formal bool False/True value, so that's a way to see the empty=False interpretation:
>>> bool(0)
False
>>> bool(0.0)
False
>>> bool('') # empty string - False
False
>>> bool([]) # empty list - False
False
>>> bool(None)
False
>>> bool(6) # non-zero int - True
True
>>> bool('hi') # non-empty string - True
True
>>> bool([1, 2]) # list of something - True
True
If test = vs. ==
if a = 6: # syntax error, meant ==
print('hi')
= instead of == in an if-test. Fortunately in Python an expression like a = 6 does not represent any value itself, so the if will generate a syntax error, flagging the mistake. In the language C, typing this sort of mistake just silently runs incorrectly, providing many, many, many hours of debugging fun.
Do Not Write x == True:
if foo() == True: # NO NO NO == True
print('yay')
if foo(): # YES this way
print('yay')
# Or this way for False
if not foo():
print('not yay')
Python if-else
# set message according to score
if score > high_score:
message = 'New high score!'
else:
message = 'Oh well!
if s == 'a':
# a case
elif s == 'b':
# b case
else:
# catch-all case
if score > high and s != 'alice':
# a
elif s == 'bob':
# b
else:
# c
Alternative to if/else chains
def foo(s):
if len(s) < 4:
return 'too small!'
if len(s) > 100:
return 'too big!'
# rest of computation having screened out too-short and too-long
# notice: no if-else structure, not indented down here
...
Boolean Comparison Operators
True and False. The most common way to get a boolean value is a comparison like > ...
4 > 2 → True
4 > 5 → False
4 > 4 → False
4 >= 4 → True
== : (2 equals signs together) equal, works for int, string, list, dict, .. most types. Not recommended to use with float values
!= : not-equal (uses == under the hood)
< <= : less-than, less-or-equal. These work for most types that have an ordering: numbers, strings, dates. For strings, < provides alphabetic ordering. Uppercase letters are ordered before lowercase. It is generally an error to compare different types, like this error: 4 < 'hello'
> >= : greater than greater-or-equal.
== by defining a function with the special name __eq__() and analogous functions define < etc.
if test: can use any type - int, string, list .. - as as test value, not just boolean. See the if section for examples of this behavior.
Boolean And Or Not - Precedence
a > 10, can be combined with and or not, like this:
if a >= 0 and a <= 100:
print('yay')
a and b → True if both are True
a or b → True if one or the other or both are True
not a → Inverts True/False
if a < 6 and b < 6 or c < 6:
if (a < 6 and b < 6) or c < 6:
if a < 6 and (b < 6 or c < 6):
Boolean Short-Circuit
if i < len(s) and s[i].isalpha():...
i < len(s) is False (i.e. i is large), the whole expression evaluates to False. In particular the s[i].isalpha() is not evaluated. Attempting to evaluate it would be an error anyway, since i is too large, creating an IndexError. Writing the i<len(s) test to the left of the s[i] in effect protects it, and this is a common programming pattern. Most computer languages use short circuiting in the their boolean expressions like this.
Strings
a = 'hello'
b = '' # the empty string, length 0
c = "Python Spaces Ok"
\' # single quote
\" # double quote
\\ # a backslash
\n # newline char
a = 'First line\nSecond line\nThird line\n'
a = """First line
Second line
Third line
"""
>>> s = 'hi \u03A3'
>>> s
'hi Σ'
>>> len(s)
4
>>> 'I \u2665'
'I ♥'
>>> 'in the \U0001F6BD'
'in the 🚽'
>>> s = 'oh \U0001F489'
>>> len(s)
4
String Char Access [ ], Concatenate +
>>> s = 'Hello'
>>> len(s)
5
>>> s[0]
'H'
>>> s[4]
'o'
>>> s[5]
IndexError: string index out of range
>>> s[0] = 'x' # no, string is immutable
TypeError: 'str' object does not support item assignment
+ combines (aka "concatenates") strings to make a bigger string. This creates new strings to represent the result, leaving the original strings unchanged.
>>> s1 = 'Hello'
>>> s2 = 'There'
>>> s3 = s1 + ' ' + s2
>>> s3
'Hello There'
>>> s1
'Hello'
str(4) to make a string out of an int, then concatenation works.
>>> 'score:' + 6
TypeError: can only concatenate str (not "int") to str
>>> 'score:' + str(6)
'score:6'
String Search Functions
x in s - true if string x appears anywhere in s. Use find() below if you want to know the index where it appears, but in is great for simple testing.
>>> 'c' in 'abcd'
True
>>> 'aa' in 'XXaaZZ' # test string can be any length
True
s.find(x) - searches s left to right, returns int index where string x appears, or -1 if not found. In this and other searches, characters much match exactly, so 'a' matches 'a', but does not match 'A'.
>>> s = 'Python'
>>> s.find('y')
1
>>> s.find('tho')
2
>>> s.find('xx')
-1
s.find(x, start_index) - variant of find(), beginning the search at the given index
String Test Functions
s.startswith(x) - True if s start with string x
s.endswith(x) - True if s ends with string x
>>> 'Python'.startswith('Py')
True
>>> 'Python'.startswith('Px')
False
>>> 'resume.html'.endswith('.html')
True
s.isdigit() - True if all chars in s are digits '0..9'
s.isalpha() - True for alphabetic word char, i.e. a-z A-Z (applies to "word" characters in other unicode alphabets too)
s.isalnum() - alphanumeric, just combines isalpha() and isdigit()
s.isspace() - True for whitespace char, e.g. space, tab, newline
s.isupper(), s.islower() - True for uppercase / lowercase alphabetic. False for characters like '9' and '$' which do not have upper/lower versions.
>>> '6'.isdigit()
True
>>> 'a'.isalpha()
True
>>> '$'.isalpha()
False
>>> s = '\u03A3'
>>> s
'Σ'
>>> s.isalpha()
True
>>> 'a'.islower()
True
>>> 'a'.isupper()
False
>>> '$'.islower()
False
>>> '\n'.isspace()
True
String Change Functions
s.lower() - returns a version of s where each char is converted to its lowercase form, chars like '$' are unchanged. The original s is unchanged - a good example of strings being immutable. Each unicode alphabet includes its own rules about upper/lower case.
s.upper() - returns an uppercase version of s
s.strip() - return a version of s with the whitespace characters from the very start and very end of the string all removed. Handy to clean up strings
parsed out of a file.
>>> ' hi there \n'.strip()
'hi there'
s.replace(old, new) - returns a version of s where all occurrences of old have been replaced by new. Does not pay attention to word boundaries, just replaces every instance of old in s. Replacing with the empty string effectively deletes the matching strings.
>>> 'this is it'.replace('is', 'xxx')
'thxxx xxx it'
>>> 'this is it'.replace('is', '')
'th it'
String Loops
for i in range(len(s)):
# use s[i] in here
for char in s:
# use char in here
String Slices
>>> s = 'Python'
>>> s[2:4]
'th'
>>> s[2:]
'thon'
>>> s[:5]
'Pytho'
>>> s[2:999]
'thon'
>>> s[4:4] # a slice with zero chars in it
''
>>> s[-1]
'n'
>>> s[-2:]
'on'
String split()
str.split(',') is a string function which divides a string up into a list of string pieces based on a "separator" parameter that separates the pieces:
>>> 'a,b,c'.split(',')
['a', 'b', 'c']
>>> 'a:b:c'.split(':')
['a', 'b', 'c']
>>> ':a:b::c:'.split(':')
['', 'a', 'b', '', 'c', '']
>>> 'Hello there, he said.'.split()
['Hello', 'there,', 'he', 'said.']
String Join
','.join(lst) is a string function which is approximately the opposite of split: take a list of strings parameter and forms it into a big string, using the string as a separator:
>>> ','.join(['a', 'b', 'c'])
'a,b,c'
Lists
lst = [] - create empty list
len(lst) - access length of string
lst[0] - access individual elements with square brackets
for x in lst: - loop over contents, do not modify lst during loop
x in lst - boolean test if x is in lst (just like for string)
lst.append(x) -- add x to the end of lst, increasing its length by 1. The easiest way to add to a list. Does not return anything. Changes lst in place.
lst.extend(lst2) - add all the elements of lst2 on to the end of lst. Alternately + can be used: (lst + lst2) puts all their elements together to make a new, big list, e.g. [1, 2] + [3, 4] yields [1, 2, 3, 4]. Does not return anything.
lst = [1, 2, 3]
x =[4, 5]
# what happens .append vs. extend
# append:
lst.append(x) -> x is added as an *element* so lst is [1, 2, 3, [4, 5]]
# extend:
lst.extend(x) -> all elements of x are added at end, so lst is [1, 2, 3, 4, 5]
lst.pop() -- remove the element from the end of the list and return it, decreasing the length of the list by 1, i.e. the opposite of append()
lst.pop(index) -- alternate version with the index to remove is given, e.g. lst.pop(0) removes the first element from the list.
lst.insert(index, x) -- insert the element x so it is at the given index, shifting elements towards the end of the list as needed. Use index=len(lst) to insert at the end. Append() is simpler since it just goes on the end without any shifting and you don't have to think about index numbers. So it seems like I never use insert().
lst.copy() -- returns a copy of lst. You could use this to loop over a list and also modify it - loop over a copy, modify the original. (mentioned for completeness, I don't think this will ever come up.)
lst.reverse() -- reverse lst "in place", that means no new list is created. lst is modified where it is. (Related: reversed(lst) takes a list or similar linear collection, returns a reversed version, not modifying the original).
lst.sort() -- sort lst into increasing order in place, modifying it. By default numbers sort increasing by value, text sorts increasing alphabetically. We'll study custom sorting in more detail later. (Related: sorted(lst) takes a list or similar linear collection, returns a sorted version, not modifying the original).
min(lst) max(lst) - Return the smallest or largest element in lst. Uses the same underlying < foundation as sort(), but much faster than sorting the whole list. Raises an error if the list is empty. Note: these are functions like len(), not noun.verb.
lst.index(x) -- Look for first instance of x in lst and return its index.
Raises an error if x is not in there (!). Therefore check with 'in' first, and only if x
is in there call index(). Translation: there is no .find() for lists which IMHO seems
like a real omission.
List Slices
lst = ['a', 'b', 'c']
lst[:2] -> ['a', 'b']
lst[2:] -> ['c']
Dict Type
d = {} # Create empty dict
d['ca'] = 'California' # 1. Set key/value pairs into dict
d['ok'] = 'Oklahoma'
d['nj'] = 'New Jersey'
d['tx'] = 'Texas'
val = d['nj'] # 2. Retrieve value by key
val = d['xx'] # fails with KeyError
check = 'nj' in d # 3. in check -> True
d['ca'] = 'California' - with equal-sign: d[key] = xxx creates a key/value pair in the dict. If a pair was in there already, it is overwritten.
val = d['nj'] - referring to d[key] retrieves the value for that key, or is a KeyError. Code needs to check that the key is in before doing [ ] retrieval.
if 'nj' in d: - use in to check if a key is in the dict or not. This sort of 'in' works for lists too, but for dicts it's much faster.
Dict Counting
def strs_counts(strs):
"""
Given a list of strings, return a 'counts' dict of
str->count key/value pairs
"""
counts = {}
for s in strs:
# fix up not-in case
if not s in counts:
counts[s] = 0
# invariant: at this line, s is in
counts[s] += 1
# alternate 'else' solution:
#if not s in counts:
# counts[s] = 1
#else:
# counts[s] += 1
return counts
Getting Data out of Dict
len(d) - as you would guess, the number of key/value pairs in the dict
d.get(key) - retrieves the value for a key, but if the key is not there, returns None by default (vs. throwing an error like [ ]). A 2 parameter form d.get(key, missing-value) specifies what value to return if the key is missing. This forms an alternative to writing if/in logic to check if the key is in.
d.keys() - returns an iterable of all the keys in dict (in a random order). Can loop over this to get each key. The keys alone, no values.
d.values() - returns an iterable of all the values in dict (in a random order). Can loop over this to get each value.
d.items() returns an iterable of the key,value pairs. This works with a particular sort of double-variable loop, see below. If you want to look at all of the key,value pairs, this is the most direct way.
def print_counts2(counts):
# sort the keys for nicer output
for key in sorted(counts.keys()):
print(key, counts[key])
def print_counts3(counts):
# key,value loop over .items()
# unsorted
for key,value in counts.items():
print(key, value)
Tuples
('a', 'b', 'c')
>>> t = ('a', 'b', 'c')
>>> len(t)
3
>>> t[0]
'a'
>>> 'c' in t
True
>>> t[0:2]
('a', 'b')
>>> t[0:2]
('a', 'b')
>>> t[0] = 'hi' # no changing it!
Error:'tuple' object does not support item assignment
Tuples and Assignment
>>> (x, y) = (12, 4)
>>> x
12
>>> y
4
Tuples and Multiple Value Return
def min2(a, b):
"""
Given 2 ints, returns (True, min_val) if at least
one is negative and min_val is the minimum value.
Otherwise returns (False, None)
"""
if a<0 or b<0:
return (True, min(a, b))
return (False, None)
(negative, min_val) = min2(a, b)
if negative:
...
Tuples and Dicts
>>> d = {'a':1, 'd':4, 'c':2, 'b':3}
>>> d
{'a': 1, 'd': 4, 'c': 2, 'b': 3}
>>> d.items()
dict_items([('a', 1), ('d', 4), ('c', 2), ('b', 3)]) # (key, value) tuples
>>> sorted(d.items()) # same tuples, sorted by key
[('a', 1), ('b', 3), ('c', 2), ('d', 4)]
# Example: for loop setting key/value for each iteration
for key,value in d.items():
# use key and value in here
Sorting
Sorting 1.0
>>> a = [23, 42, 4, 8, 15, 16]
>>> sorted(a)
[4, 8, 15, 16, 23, 42]
>>> a
[23, 42, 4, 8, 15, 16] # original lst not changed
>>>
>>> b = ['banana', 'apple']
>>> sorted(b)
['apple', 'banana']
>>> sorted([2, 3, 'hi'])
Error:'<' not supported between instances of 'str' and 'int'
Sorting 2.0 - Tuples
>>> domains = [('stanford.edu', '/meh'), ('google.com','/search'), ('stanford.edu', '/admit')]
>>> sorted(domains)
[('google.com', '/search'), ('stanford.edu', '/admit'), ('stanford.edu', '/meh')]
Sorting 3.0 - Custom Sorting
Sort Key Example 1
>>> a = ['aaaa', 'bb', 'z']
>>> sorted(a)
['aaaa', 'bb', 'z']
>>> def by_len(s): return len(s) # writing def as 1-line in the console
>>> sorted(a, key=by_len)
['z', 'bb', 'aaaa']
>>> sorted(a, key=by_len, reverse=True)
['aaaa', 'bb', 'z']
Sort Key Example 2
>>> foods = [('apple', 5), ('banana', 2), ('chocolate', 137), ('bread', 10)]
>>> sorted(foods)
[('apple', 5), ('banana', 2), ('bread', 10), ('chocolate', 137)]
>>> def by_count(food): return food[1]
>>> sorted(foods)
[('apple', 5), ('banana', 2), ('bread', 10), ('chocolate', 137)]
>>> sorted(foods, key=by_count)
>>> sorted(foods, key=by_rating, reverse=True)
[('chocolate', 137), ('bread', 10), ('apple', 5), ('banana', 2)]
File Reading
with open(filename, 'r') as f:
for line in f:
# process line (line str has '\n' at its end)
print(line, end='')
with open(filename, 'r') as f:
text = f.read() # the whole file contents is in str variable 'text'
# with '\n' chars embedded in it
# a. Could use split() to make a words-list out of the whole file.
# The '\n' are treated like spaces by split()
words = text.split()
# b. Say we want print the text out unchanged.
# The text probably ends with '\n' already so just printing
# it you get an extra '\n' at the end. Therefore use end=''
print(text, end='')
# c. Say you want to process the lines anyway, str.splitlines()
# finds all the '\n' type chars, and divides the text into a list
# of lines. The '\n' are omitted from each returned line by default.
lines = text.splitlines()
with open(filename, 'r') as f:
lines = f.readlines()
# lines is a list of the lines you would get with 'for line in f'
# now can access them all at once
# but requires much more memory
Loops
1. for in - 'foreach'
# words is a list of strings
for word in words:
# use word
# string s
for i in range(len(s)): # all indexes of s
if s[i].isdigit():
...
2. while
i = 0
while i < len(s) and s[i].isdigit():
i += 1
break and continue
while True:
s = obtain_string()
if len(s) == 0:
break
...
Canonical Loops Phrases
Loop over int range
for i in range(10):
print(i)
# 0 1 2 3 4 5 6 7 8 9
Loop over elements in a list
words = ['hi', 'there!', ...]
for word in words:
# use word in here
print(word)
Loop over each index into a collection
s = 'Hello'
for i in range(len(s)):
print(s[i])
# H e l l o
Load a list with data
lst = []
for i in range(10):
lst.append(i)
# lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Programming Style - Readable
Variable Names
def foo(s):
if len(s) <= 2:
return s
first = s[0]
last = s[len(s)-1]
mid = s[1:len(s)-1]
halfway = len(mid) // 2 # index of halfway
return first + mid[halfway:] + mid[:halfway] + last
Allowable 1-Letter Names
open(filename, 'r') as f
if the string is an ip address we would use 'ip' as its name,
but some algorithms take in a string with zero semantics, in which case 's' is ok.
Collection Data Structures - 's'
# e.g. 'lines' is a list of text lines from a file
for line in lines:
...
# e.g. 'nums' is a list of int values
for num in nums:
...
# e.g. 'weights' is a list of weight values
for weight in weights:
...
# e.g. 'word_counts' is a dict mapping word -> count
for word in word_counts.keys():
count = word_counts[word]
...
Names vs. Bugs - Time Saver!
Function Decomposition
PEP8
{'a': 1, 'b': 2}
1 + 2
non PEP8: possibly helpful to vary spaces to reflect ordering
'a*b + c + c*d' vs. 'a * b + c + c * d'
Python Modules and Import
Import
#!/usr/bin/env python3
# typically import a few modules at the top of the file,
# here the standard 'sys' module
import sys
def main():
args = sys.argv[1:]
if len(args) == 0:
print('Expected a command line argument')
sys.exit(1)
print("We don't get to this line!")
...