Today: find()/slice practice, text files, file-reading, crazycat example program
> find() + slice exercises
Int indexing into something is very common in computer code, so of course doing it slightly wrong is very common as well. It is so common that there is a phrase for it, "off by one error" or OBO, and it even has its own Wikipedia page. You can feel some kinship with other programmers each time you stumble on one of these.
"This code is perfect! Why is this not working. Why is this not work ... oh, off by one error. We meet again."
'aabb' -> 'bbbbaaaa'
We'll say the midpoint of a string is index len // 2, dividing the string into a left half before the midpoint and a right half starting at the midpoint. Given string s, return a new string made of 2 copies of right followed by 2 copies of left. So 'aabb' returns 'bbbbaaaa'.
def right_left(s):
    mid = len(s) // 2
    left = s[:mid]
    right = s[mid:]
    return right + right + left + left

    # Style comparison:
    # Without using any variables, the solution
    # is longer and not so readable:
    # return s[len(s) // 2:] + s[len(s) // 2:] + s[:len(s) // 2] + s[:len(s) // 2]
Notice the decomp-by-var strategy: break the computation into smaller, named parts. A big improvement.
Lesson: if you have a line you cannot get working, can you divide it into a few lines, each a sub-part? This is our old divide-and-conquer strategy, working at the level of lines.
This looks simple, but the details are tricky. Make a drawing.
Given string s. Find the first '@' within s. Return the len-3 substring immediately following the '@'. Except, if there is no '@' or there are not 3 chars after the @, return ''.
Suggestion: what is the index of the last char we want to pull out? If that index is beyond the valid chars in s, then the string is not long enough and we return the empty string.
def at_3(s):
    at = s.find('@')
    if at == -1:
        return ''

    # Is at + 3 past end of string?
    # Could "or" combine with above
    if at + 3 >= len(s):
        return ''

    return s[at + 1:at + 4]
    # Working out >= above ... drawing!
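The >= boundary can also be worked out with concrete numbers instead of a drawing. This is only a sketch; the two test strings and the variable names are made up for illustration.

```python
# Case 1: exactly 3 chars after the '@'
s = 'x@abc'
at = s.find('@')            # '@' is at index 1
last = at + 3               # index of the last char we want
enough = last < len(s)      # 4 < 5, so the last index is valid
piece = s[at + 1:at + 4]    # slice end is one past the last index

# Case 2: only 2 chars after the '@'
s2 = 'x@ab'
at2 = s2.find('@')
too_short = at2 + 3 >= len(s2)   # 4 >= 4, index 4 does not exist

print(enough, piece, too_short)
```

The largest valid index in a string is len - 1, so an index equal to len is already out of bounds. That is why the test uses >= rather than >.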
This is a nice, realistic string problem with a little logic in it.
s.find() variant with 2 params:
s.find(target, start_index) - start search at start_index vs. starting search at index 0.
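Here is a small sketch of the two-parameter form. The string 'a-b-c' and the variable names are chosen only for illustration.

```python
s = 'a-b-c'

first = s.find('-')                 # search starts at index 0
second = s.find('-', first + 1)     # search starts just past the first '-'
missing = s.find('-', second + 1)   # nothing left to find, so -1

print(first, second, missing)
```

Passing first + 1 as the start index skips over the match already found, so the second call cannot return the same position again.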
Given string s. Look for a '(.....)' within s: look for the first '(' in s, then the first ')' after the '(', using the second start_index parameter of .find(). If both parens are found, return the chars between them. If no such pair of parens is found, return the empty string. Think about which inputs might trip this up.
def parens(s):
    left = s.find('(')
    if left == -1:
        return ''

    # Start search at left + 1:
    right = s.find(')', left + 1)
    if right == -1:
        return ''

    # Use slice to pull out chars between left/right
    return s[left + 1:right]
See guide: Python Print for details about print() and standard out.
See guide: Python File for details about file reading and writing.
We'll use the crazycat example to demonstrate files, file-processing, and printing.
'\n' is called the "newline" char
'\n' is like hitting the "return" or "enter" key on your keyboard
Use backslash \ to include special chars in a string
s = 'isn\'t'
# or use double quotes
# s = "isn't"

\n   newline char
\\   backslash char
\'   single quote
\"   double quote
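A quick check that each escape sequence stands for a single char in the string. The variable names here are only for illustration.

```python
s = 'isn\'t'      # backslash-escaped single quote
t = "isn't"       # same string, written with double quotes
same = (s == t)

n_newline = len('a\nb')    # 'a', newline, 'b'
n_backslash = len('\\')    # two typed chars, one stored char

print(same, n_newline, n_backslash)
```

The backslash is something you type in the source code. It is not stored in the string, except when you write \\ to get one actual backslash.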
The file hibye.txt has 2 lines, and each line has a '\n' at the end. The first line has a space, aka ' ', between the two words.

Hi and
bye
Here is what that file looks like in an editor that shows little gray marks for the space and \n
In fact, the contents of that file can be expressed as a Python string: 'Hi and\nbye\n'
How many chars are in that file (each \n is one char)? Roman alphabet A-Z chars like this take up 1 byte per char. This comes to 11 chars. Look in your file-system explorer on your computer, get-info on the file. See if it's 11 bytes in size.
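The count can be checked in Python. The string below is an assumption: it is the file contents as reconstructed from the description above (two lines, each ending in \n).

```python
# Assumed contents of the file, reconstructed from the description
text = 'Hi and\nbye\n'

num_chars = len(text)                   # each \n counts as one char
num_bytes = len(text.encode('utf-8'))   # bytes when stored as utf-8

print(num_chars, num_bytes)
```

For plain A-Z text like this the two numbers match. Chars outside that basic set can take more than one byte each.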
So when you send a 50 char text message, that's about 50 bytes sent on the network plus some overhead. Text uses very few bytes compared to sound or images or video.
By default, print() puts a '\n' at the end of the line
>>> print('hello', 'there', '!')
hello there !
>>> print('hello', 123, '!')
hello 123 !
>>> print('hello', 123, '!', sep=':')
hello:123:!
>>> print(1, 2, 3)    # end='\n' the default
1 2 3
>>> print(1, 2, 3, end='xxx\n')  # end= what goes at end
1 2 3xxx
>>> print(1, 2, 3, end='')  # suppress the \n
1 2 3>>>
Return and print() are both ways to get data out of a function, so they can be confused with each other. We will be careful when specifying a function to say that it should "return" a value (most common), or it should "print" something to standard output. Return is the most common way to communicate data out of a function, but below are some print examples.
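To make the distinction concrete, here is a small sketch. Both function names are hypothetical, made up for this comparison.

```python
def double_return(n):
    # Hands the value back to the caller
    return n * 2


def double_print(n):
    # Writes text to standard output, returns nothing
    print(n * 2)


a = double_return(5)   # a holds the int 10, nothing is printed
b = double_print(5)    # prints 10, but b holds None

print(a, b)
```

A returned value can be stored in a variable or used in a later computation. Printed text goes to standard output, and the caller gets back None.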
Open a terminal in the crazycat directory (see the Command Line guide for more information on running in the terminal). These terminal commands work on both Mac and Windows. When you type a command in the terminal, you are typing a command directly to the operating system that runs your computer: Mac OS, Windows, or Linux.
pwd - print out what directory we are in
ls - see list of filenames ("dir" on older Windows)
cat *filename* - see file contents ("type" on older Windows)
$ ls
alice.txt  crazycat.py  hibye.txt  poem.txt  quotes
$ cat poem.txt
Roses Are Red
Violets Are Blue
This Does Not Rhyme
$
$ python3 crazycat.py poem.txt
Roses Are Red
Violets Are Blue
This Does Not Rhyme
$ python3 crazycat.py hibye.txt
Hi and
bye
$
Here is the canonical file-reading code:
with open(filename) as f:
    for line in f:
        # use line in here
Visualization of how the variable "line" behaves for each iteration of the loop:
Each line has the '\n' at its end
open(filename) - open for reading
open(filename, 'r') - same as above, 'r' denotes reading
open(filename, 'w') - open for writing
open(filename, encoding='utf-8') - specify unicode encoding (later)
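A minimal sketch that exercises the write and read modes together. The scratch file name demo.txt and the temporary directory are assumptions for the demo, not part of the crazycat project.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# 'w' opens for writing, creating the file
with open(path, 'w', encoding='utf-8') as f:
    f.write('Hi and\n')
    f.write('bye\n')

# No mode given, so the file is opened for reading
lines = []
with open(path, encoding='utf-8') as f:
    for line in f:
        lines.append(line)

print(lines)
```

Note that write() does not add a '\n' on its own, so the code includes it in each string.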
Here is the working function to print the contents of a file. Why do we need end='' in the print() call? The line already has \n at its end, so we get double spacing if print() adds its standard \n.
def print_file_plain(filename):
    with open(filename) as f:
        for line in f:
            # use line in here
            print(line, end='')
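The double-spacing effect can be seen without a real file. In this sketch a list of strings stands in for the lines of a file, and the printed output is captured so it can be shown with repr(). The list contents and variable names are assumptions for the demo.

```python
import io
from contextlib import redirect_stdout

lines = ['Roses\n', 'Violets\n']   # stand-in for lines read from a file

# Plain print(): each line's own \n plus the one print() adds
doubled = io.StringIO()
with redirect_stdout(doubled):
    for line in lines:
        print(line)

# end='' suppresses the \n that print() would add
single = io.StringIO()
with redirect_stdout(single):
    for line in lines:
        print(line, end='')

print(repr(doubled.getvalue()))
print(repr(single.getvalue()))
```

The first captured string has two \n chars after each word, which shows up as a blank line between lines of output.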
def crazy_line(line):
    """
    Given a line of text, returns a "crazy" version of that line,
    where upper/lower case have all been swapped, so 'Hello'
    returns 'hELLO'.
    >>> crazy_line('Hello')
    'hELLO'
    >>> crazy_line('@xYz!')
    '@XyZ!'
    >>> crazy_line('')
    ''
    """
    result = ''
    for i in range(len(line)):
        char = line[i]
        if char.islower():
            result += char.upper()
        else:
            result += char.lower()
    return result
Here is the command line to run with the -crazy option:
$ python3 crazycat.py -crazy poem.txt
rOSES aRE rED
vIOLETS aRE bLUE
tHIS dOES nOT rHYME
Here is print_file_crazy(), similar to print_file_plain() but passes each line through the crazy_line() function before printing.
def print_file_crazy(filename):
    """
    Given a filename, read all its lines and print them out
    in crazy form.
    """
    with open(filename) as f:
        for line in f:
            print(crazy_line(line), end='')
A loose end we'll need cleared up soon. See guide: Python String
What is the difference between 123 and '123'? How do they work with the + operator?
123 is an integer, type is int
'123' is a string, a series of chars, type is str
Operators like + use this type information
Convert between the two types with int() and str()
>>> a = 123
>>> b = 5
>>> a + b
128
>>>
>>> a = 'hi'
>>> b = 'there'
>>> a + b
'hithere'
>>>
>>> # e.g. line is out of a file - a string
>>> # convert str form to int
>>> line = '123\n'
>>> int(line)
123
>>>
>>> # works the other way too
>>> str(123)
'123'
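Putting the conversions to work: a sketch that adds up numbers that arrive as text, the way lines from a file do. The list of lines is hypothetical, standing in for a file with one number per line.

```python
lines = ['123\n', '5\n']   # hypothetical lines, as read from a file

total = 0
for line in lines:
    # int() accepts surrounding whitespace, including the trailing \n
    total += int(line)

# + needs both sides to be str, so convert the int back
message = 'total: ' + str(total)

print(message)
```

Without the int() call, + would need two strings and would concatenate them. Without the str() call, 'total: ' + total would raise a TypeError.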