L11

Today: accumulate patterns - counting and summing, int modulo, text files, standard output, print(), file-reading, crazycat example program

Accumulate Pattern

Look at double_char() and some similar functions to see a common pattern.

1. at the start: result = empty

2. In the loop, some form of: result += xxx

3. At the end: return result

Recognizing this pattern gives you a head start on solving similar problems.

Loop Counting

A common problem in computer code is counting the number of times something happens within a data set. This fits the accumulate pattern, using count = 0 before the loop and count += 1 in the loop. Recall that the line count += 1 increases the int stored in the variable by 1.

count = 0

loop:
    if thing-to-count:
        count += 1

return count

Example count_e()

This string problem shows how to use += 1 to count the occurrences of something, in this case the number of 'e' in a string.

> count_e()

count_e() Solution

def count_e(s):
    count = 0
    for i in range(len(s)):
        if s[i] == 'e':
            count += 1
    return count

Loop Summing

Suppose I want to add up a bunch of numbers. We can use the accumulate pattern here too. Set sum = 0 before the loop. Inside the loop, use sum += next_number to add each number to the sum. When the loop is done, the sum variable holds the answer.

sum = 0

loop:
    sum += next_number

return sum
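As a concrete instance of the summing pattern, here is a minimal sketch. The function name sum_squares() is made up for illustration and is not one of the course problems. It uses the variable name total rather than sum, since sum is also the name of a built-in Python function.

```python
def sum_squares(n):
    """Return 0*0 + 1*1 + ... for the ints 0..n-1 (hypothetical example)."""
    total = 0               # 1. at the start: empty total
    for i in range(n):
        total += i * i      # 2. in the loop: add the next number
    return total            # 3. at the end: return the result


print(sum_squares(4))   # 0 + 1 + 4 + 9 -> 14
```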

Example shout_score()

> shout_score()

Say we want to rate an email by how long it is and how much shouting it contains before we read it - like scoring emails from your nutty relatives.

Example high-score email:

Hi Sarah, just relaxing in retirement.
I CAN'T BELIEVE WHAT YOUR MOM IS UP TO!!!!!!
WITH THAT NEW HAIRCUT!!!!!!!!!!!!
AND WHY IS THANKSGIVING SO EARLY THIS YEAR!!!!!

Scoring for each char:

lowercase char -> 1 point
uppercase char -> 2 points
      '!' char -> 10 points

Reminder, boolean string tests:

s.isalpha() s.isdigit() s.isspace() s.islower() s.isupper()

shout_score(s): Given a string s, we'll say the "shout" score is defined this way: each exclamation mark '!' is 10 points, each lowercase char is 1 point, and each uppercase char is 2 points. Return the total of all the points for the chars in s.

'Arg!!'  -> 24 points

'A' -> 2
'r' -> 1
'g' -> 1
'!' -> 10
'!' -> 10

In the loop, use the sum pattern to compute the score for the string.

shout_score() Solution

def shout_score(s):
    score = 0
    for i in range(len(s)):
        if s[i] == '!':
            score += 10
        elif s[i].islower():
            score += 1
        elif s[i].isupper():
            score += 2
    return score

Here we use the if/elif structure, since our intention is to pick out 1 of N tests. As a practical matter, it also works as a series of plain if statements. Since '!', lowercase chars, and uppercase chars are all mutually exclusive, at most one if test will be true for each char.
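The plain-if variant can be sketched as follows. The name shout_score_ifs() is made up here to keep it separate from the solution above. It should give the same score, because a char can pass at most one of the three tests.

```python
def shout_score_ifs(s):
    """Same scoring as shout_score(), written as three plain if tests."""
    score = 0
    for i in range(len(s)):
        if s[i] == '!':
            score += 10
        if s[i].islower():
            score += 1
        if s[i].isupper():
            score += 2
    return score


print(shout_score_ifs('Arg!!'))   # 2 + 1 + 1 + 10 + 10 -> 24
```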

Data Types

Python code works on values, and each value has a "type" which determines how it behaves. Most often, what Python code will do follows your intuition. Here we'll look under the hood to see how Python tracks values and types.

Demo

Start off typing some + expressions in the interpreter. The results here are not surprising, but how does the + know what to do?

>>> 1 + 2
3
>>> 
>>> 'a' + 'b'
'ab'
>>>

`123` vs. `'123'`

Q: What is the difference between these two?

123 vs. '123'

A: 123 is an int number, and '123' is a string of length 3, made of 3 digit chars

Types - `int` and `str`

These two values are different types. Every value in Python has a "type" which is its category of data. Each type in Python has an official name: the name of the integer type is int and the name of the string type is str.

`int str` Variables

Suppose we set up these three variables

>>> a = 3
>>> b = 'hi'
>>> c = '7'

Here is what memory looks like. Each variable points to its assigned value, as usual. In addition, each value in memory is tagged with its type - here int and str.

alt: a b c variables, each pointing to value+type

How `+` Operator Works - Type

Python uses the type of a value to guide operations on that value. Look at the + operator in the expressions below. At the moment the + runs, it follows the arrow to see the values to use. On each value, in particular, it can see the type. In this case, when it sees int, it does arithmetic and returns an int value. When it sees str, it does string concatenation and returns a str value.

For each variable, Python follows the arrow to get the value to use, and each value is tagged with its type. What is the result for the expressions like a + a below?

alt: highlight type on variable a

>>> a = 3
>>> b = 'hi'
>>> c = '7'
>>>
>>> a + a
6
>>> b + b
'hihi'
>>> c + c
'77'
>>>

The + with int values does addition, but with str values it does string concatenation.

The type of '7' is str, so '7' + '7' is '77'
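One way to see the type tag on a value is Python's built-in type() function, which is not otherwise used in this lecture. A small sketch with the same three variables:

```python
a = 3
b = 'hi'
c = '7'

# type() reports the type of the value each variable points to
print(type(a))   # <class 'int'>
print(type(b))   # <class 'str'>
print(type(c))   # <class 'str'>, '7' is a string even though it looks like a number
```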

(optional) Python Does Not Deduce Type from Variable Name

Normally we follow the convention that a variable named s points to a string. This is a good convention, allowing people reading the code to get the right impression of what the variable stores. We always follow this convention in our example code, so students naturally get the impression that it's some sort of rule. As if Python knows the value is a string because the variable name is s.

In fact, Python does not have a rule that a certain variable name must point to a certain type. To Python, the variable name is just a label of that variable used to identify it within the code. Python's attitude to the variable name is like: this is the name my human uses for this variable.

The type comes from the value at the end of the arrow, such as 7 (int) or 'Hello' (str).

Contrary Name Example

Just to be difficult, here we've chosen variable names that do not correspond to the types. What does Python do in this case?

>>> s = 7
>>> x = '9'
>>>
>>> s + s
???
>>> x + x
???

What are the ??? above?

>>> s + s
14
>>> x + x
'99'

Type Conversions - `int() str()`

Integer type is int
String type is str
Each type name is also the name of a conversion function:
int(xxx) takes in string (or other) value, converts to int
e.g. int('123') -> 123
str(xxx) takes in int (or other) value, converts to str
e.g. str(77) -> '77'
Works for other types we will see later too: float(), list(), bool()
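A short sketch of the conversion functions listed above. The particular values are chosen only for illustration. The last part shows that int() raises a ValueError when the text does not look like an integer.

```python
n = int('123')        # str -> int: 123
s = str(77)           # int -> str: '77'
f = float('3.5')      # str -> float: 3.5
b = bool(0)           # int -> bool: 0 converts to False
chars = list('abc')   # str -> list of its chars: ['a', 'b', 'c']

# Conversion fails if the text does not look like an integer.
try:
    int('12abc')
except ValueError:
    print('cannot convert 12abc to int')
```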

Challenge - `'123'` Addition

Say we have a number text = '123' typed by the user. We want to add 100 to it.

1. `str + int` - Error

>>> text = '123'
>>> text + 100
TypeError: can only concatenate str (not "int") to str
>>>

The + works with int + int or str + str, but not with a mix like the above. Solution? Convert the str to int, then do the addition.

2. Convert `int(text)` Then Add

>>> text = '123'
>>> int(text) + 100
223
>>>

The int() function converts str to int form, then we can do addition.

3. Convert `str(n)` Then Concatenate

Similarly, concatenation does not work with int. Use the str(n) function to convert int to str, then concatenate.

>>> # works the other way too
>>> str(123)
'123'
>>>
>>> # append int to str - error
>>> 'score:' + 13
TypeError: can only concatenate str (not "int") to str
>>>
>>> # use str() convert int -> str
>>> # then can concatenate
>>> 'score:' + str(13)
'score:13'
>>>

Exercise sum_digits()

> sum_digits()

'12abc3' -> 6

Students try this one. It combines the accumulate pattern and str/int conversion. Reminder, boolean string test: s.isdigit()

sum_digits(s): Given a string s. Consider the digit chars in s. Return the arithmetic sum of all those digits, so for example, '12abc3' returns 6. Return 0 if s does not contain any digits.

sum_digits() Starter

Here's the rote parts of sum_digits() you can start with. Work out the code inside the loop.

def sum_digits(s):
    sum = 0

    for i in range(len(s)):
        # use s[i]
        pass

    return sum

sum_digits() Solution

def sum_digits(s):
    sum = 0
    for i in range(len(s)):
        if s[i].isdigit():
            # str '7' -> int 7
            num = int(s[i])  
            sum += num
    return sum

`int` — Division and Mod

Looking just briefly at two int arithmetic operators today - division and modulus - which go together.

1. Division `/` always produces float

>>> 7 / 2
3.5
>>> 8 / 2
4.0

2. Problem: Cannot use float for indexing or range()

Suppose we want to compute the middle index of a string, and then access the char at that index. The obvious way to do this is len(s) / 2 to get the midpoint index. Unfortunately that's a float, and it does not work within the square brackets:

>>> s = 'Python'
>>> mid = len(s) / 2
>>> mid
3.0
>>>
>>> s[mid]
TypeError: string indices must be integers, not 'float'
>>>
>>> # Similarly with range()
>>> range(7 / 2)
TypeError: 'float' object cannot be interpreted as an integer

3. Solution: int-division operator `//`

Python has a separate "int division" operator //. It does division and discards any remainder, rounding the result down to produce an int. Use it whenever we want to divide and get an int result.

>>> 6 // 2
3
>>> 7 // 2
3
>>> 8 // 2
4
>>> 94 // 10
9
>>> 102 // 4
25

Using int-division, we can compute the string's midpoint index, for indexing or for a slice:

>>> s
'Python'
>>> mid = len(s) // 2
>>> mid
3
>>> s[mid]
'h'

Later Practice: right_left()

> right_left()

'aabb' -> 'bbbbaaaa'

A problem using int-division.

right_left(s): We'll say the midpoint of a string is the len divided by 2, dividing the string into a left half before the midpoint and a right half starting at the midpoint. Given string s, return a new string made of 2 copies of right followed by 2 copies of left. So 'aabb' returns 'bbbbaaaa'.
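One possible approach to right_left(), sketched under the assumption that the slices s[:mid] and s[mid:] are used for the two halves. This is an illustration, not the official solution.

```python
def right_left(s):
    """Return 2 copies of the right half followed by 2 copies of the left half."""
    mid = len(s) // 2     # int-division gives an int index
    left = s[:mid]        # chars before the midpoint
    right = s[mid:]       # chars from the midpoint to the end
    return right + right + left + left


print(right_left('aabb'))   # bbbbaaaa
print(right_left('abc'))    # mid is 1, so left is 'a', right is 'bc' -> bcbcaa
```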

Modulo, Mod `%` Operator

The "modulo" operator % is essentially the remainder after int division. It's usually called the "mod" operator for short. So for example (57 % 10) yields 7 — int divide 57 by 10 and 7 is the leftover remainder. The mod operator makes the most sense with positive integers, so avoid negative numbers or floats with modulo.

Say we have positive ints a and n

a % n ...

1. Is the int remainder after dividing a by n

2. Always yields an int in the range 0..n-1 inclusive, e.g. mod by 10, is always an int 0..9

3. Returning 0 means the division came out evenly (i.e. 0 remainder)

4. Mod by 0 is an error, just like divide by 0
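The points above can be checked with a small sketch showing how // and % fit together: the int-division result times n, plus the remainder, rebuilds the original number. The variable names are chosen for illustration.

```python
a = 57
n = 10

quotient = a // n       # 5, int division discards the remainder
remainder = a % n       # 7, the leftover part

rebuilt = quotient * n + remainder     # 5 * 10 + 7 -> 57, the original a
in_range = 0 <= remainder <= n - 1     # True, mod by 10 is always in 0..9

print(quotient, remainder, rebuilt, in_range)   # 5 7 57 True
```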

Mod Examples

>>> 56 % 10
6
>>> 59 % 10   # biggest case
9
>>> 60 % 10   # 0 result -> divides evenly
0
>>> 54 % 5
4
>>> 55 % 5
0
>>> 56 % 5
1
>>> 56 % 0
ZeroDivisionError: integer division or modulo by zero
>>>

Mod - Even vs. Odd

A simple use of mod is checking if an int is even or odd. Consider the result of n % 2. If the result is 0, then n is even, otherwise odd. It's common to use mod like this to, say, color every other row of a table green, white, green, white .. pattern. (See next example)

>>> 8 % 2
0
>>> 9 % 2
1
>>> 10 % 2
0
>>> 11 % 2
1
>>> 12 % 2
0
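The table-row idea can be sketched like this. The helper name row_color() and the color strings are hypothetical, chosen just to show the even/odd test in use.

```python
def row_color(i):
    """Return the color for row number i: even rows green, odd rows white."""
    if i % 2 == 0:
        return 'green'
    return 'white'


for i in range(4):
    print(i, row_color(i))
# 0 green
# 1 white
# 2 green
# 3 white
```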

Example crazy_str()

Produce that internet crazy capitalization like

tHeRe aRe nO MoRe bUgS

crazy_str(s): Given a string s, return a crazy looking version where the first char is lowercase, the second is uppercase, the third is lowercase, and so on. So 'Hello' returns 'hElLo'. Use the mod % operator to detect even/odd index numbers.

'Hello' -> 'hElLo'

index:   0      1      2      3      4
       lower, upper, lower, upper, lower ...

even index: lower
odd index: upper

> crazy_str()

crazy_str() Solution

def crazy_str(s):
    result = ''
    for i in range(len(s)):
        if i % 2 == 0:  # even
            result += s[i].lower()
        else:
            result += s[i].upper()
    return result

File Processing - crazycat example

We'll use the crazycat example to demonstrate files, file-processing, printing, standard output, and functions.

Foreshadow: Parts of the Computer

alt: computer is made of CPU, RAM, storage

We'll meet these later, but the CPU does the computation, RAM stores data when it's worked on, and storage holds files, data to work on later.

crazycat.zip

What Are Files?

Where does your code get its data? files
On the computer "files" provide storage of data
A file has a name and stores some data as bytes
Typically each file is sitting in a folder on your computer
Here is the file named "hibye.txt" we'll use below

alt: hibye.txt file

Text File

A "text file" is a very common form of file
Very old (teletype) .. and used up through today
A text file is a series of lines
Each line is a series of chars ending with a '\n' char
'\n' is a special char called the "newline" char
'\n' is like hitting the "return" or "enter" key on your keyboard
Aside: a few other chars can appear instead of '\n', detailed below

hibye.txt Text File Example

The file named "hibye.txt" is in the crazycat folder. What is a file? A file stores some data. The file has a name and holds a series of bytes representing, say, text, or an image. The data in the file remains intact, even if the computer is switched off. The file is said to be "non-volatile".

hibye.txt Contents

Text file: series of lines, each line a series of chars, each line marked by '\n' at end

The hibye.txt file has 2 lines, each line has a '\n' at the end. The first line has a space, aka ' ', between the two words. Here is the complete contents:

Hi and
bye

Here is what that file looks like in an editor that shows little gray marks for the space and \n (like show-invisibles mode in word processor):

alt: hibye.txt chars, showing \n ending each line

In fact, the contents of that file can be expressed as a Python string - see how the newline chars end each line:

'Hi and\nbye\n'

Backslash Chars in a String

Use backslash \ to include special chars within a string literal. Note: different from the regular slash / on the ? key.

\n  newline char
\'  single quote
\"  double quote
\\  backslash char

# Write the word: isn't

s = 'isn\'t'    # use \
s = "isn't"     # use "
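A quick sketch to confirm that the two spellings above produce the same string, and that each backslash sequence stands for a single char in the string. The variable names are chosen for illustration.

```python
s1 = 'isn\'t'    # backslash-quote inside single quotes
s2 = "isn't"     # double quotes outside, so no backslash needed

same = (s1 == s2)           # True, both are the 5 chars: i s n ' t
newline_len = len('\n')     # 1, \n is one char
backslash_len = len('\\')   # 1, \\ is one backslash char

print(same, len(s1), newline_len, backslash_len)   # True 5 1 1
```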

(optional) How many chars? How many bytes?

How many chars are in that file (each \n is one char)?

There are 11 chars. The Latin alphabet A-Z chars like this take up 1 byte per char. Characters in other languages take 2 or 4 bytes per char. Use your operating system to get the information about the hibye.txt file. What size in bytes does your operating system report for this file?

So when you send a 50 char text message .. that's about 50 bytes sent on the network + some overhead. Text data like this uses very few bytes compared to sound or images or video.
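The char and byte counts can be checked in Python. This sketch assumes the text is stored with the widely used UTF-8 encoding. The s.encode() function, not otherwise covered here, converts a string to its bytes.

```python
text = 'Hi and\nbye\n'

num_chars = len(text)                    # 11 chars, each \n counts as one
num_bytes = len(text.encode('utf-8'))    # 11 bytes, 1 byte per char here

accented = '\u00e9'                      # the single char é, written as a code
accented_chars = len(accented)                   # 1 char
accented_bytes = len(accented.encode('utf-8'))   # 2 bytes in UTF-8

print(num_chars, num_bytes, accented_chars, accented_bytes)   # 11 11 1 2
```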

Aside: Detail About Line Endings

In the old days, there were two chars to end a line. The \r "carriage return" would move the typing head back to the left edge. The \n "new line" would advance to the next line. So in old systems, e.g. DOS, the end of a line is marked by two chars next to each other: \r\n. On Windows, you will see text files with this convention to this day. Python largely insulates your code from this detail - the for line in f form shown below will go through the lines, regardless of what line-ending they are encoded with.
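A minimal sketch of that insulation. It writes a small file with DOS-style \r\n endings as raw bytes, then reads it back with the ordinary for line in f loop. The filename dos.txt and the temporary folder are made up for this demonstration.

```python
import os
import tempfile

# Write a file whose lines end with \r\n, using 'wb' to write the exact bytes.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'dos.txt')
with open(path, 'wb') as f:
    f.write(b'Hi and\r\nbye\r\n')

# Read it back the normal way, accumulating the lines into one string.
seen = ''
with open(path) as f:
    for line in f:
        seen += line

# Each line arrives ending with a plain '\n', with the '\r' already handled.
print(seen == 'Hi and\nbye\n')   # True
```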

Before reading the file, we need some background.

Recall: Function Dataflow - Parameters and Return

Q: How does data flow between the functions in your program?

A: Parameters and Return value

Parameters carry data from the caller code into a function when it is called. The return value of a function carries data back to the caller.

This is the key data flow in your program. It is 100% the basis of the Doctests. It is also the basis of the old black-box picture of a function. This is still true, despite what we see in the next section.

alt: black-box function, params in, return value out

"Standard Output" Text Area

BUT .. there is an additional, parallel output area for a program, shared by all its functions.

There is a text area known as Standard Output associated with every run of a program. By default standard output is made of text, a series of text lines, just like a text file. Any function can append a line of text to standard output by calling the print() function, and conveniently that text will appear in the terminal window hosting that run of Python code. The standard output area works in other computer languages too, and each language has its own form of the print() function.

Here we see the print() output from calling the main() function in this example:

alt: print() function prints to standard output text area

print() function

See guide: print()

Python print() function
Adds text to the end of the standard output area
Takes a number of items, separated by commas
Converts each item to string form
Adds a single '\n' at the end of the output line
Note that strings do not have quotes around them when printed
In the >>> interpreter, print() output appears right in the interpreter

Try print() in the interpreter and see its output right there:

>>> print('hello there')
hello there
>>> print('hello', 123, '!')
hello 123 !
>>> print(1, 2, 3)
1 2 3

Data out of function: return vs. print

Return and print() are both ways to get data out of a function, so they can be confused with each other. We will be careful when specifying a function to say that it should "return" a value (very common), or it should "print" something to standard output (rare). Return is the most common way to communicate data out of a function, but below are some print examples.
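To make the difference concrete, here is a sketch with two made-up functions, double_return() and double_print(). The first sends its answer back to the caller. The second only writes text to standard output, so the caller gets back None.

```python
def double_return(n):
    """Return twice n to the caller."""
    return n * 2


def double_print(n):
    """Print twice n to standard output. Returns nothing."""
    print(n * 2)


a = double_return(5)    # nothing is printed, a is 10
b = double_print(5)     # prints 10, but b is None

print(a)   # 10
print(b)   # None
```

The returned value in a can be used in later computation, such as a + 1. The printed text cannot, which is one reason return is the usual choice.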

Crazycat Program example

This example program is complete, showing some functions, Doctests, and file-reading.

crazycat.zip

1. Try "ls" and "cat" in terminal

See guide: Command line

See guide: File Read/Write

Open the crazycat project in PyCharm. Open a terminal in the crazycat directory (see the Command Line guide for more information on running in the terminal). The terminal commands below work on both Mac and Windows. When you type a command in the terminal, you are typing a command directly to the operating system that runs your computer - Mac OS, Windows, or Linux.

pwd - print out what directory we are in

ls - see list of filenames ("dir" on older Windows)

cat filename - see file contents ("type" on older Windows)

$ ls
__pycache__	hibye.txt	quote2.txt
alice-book.txt	poem.txt	quote3.txt
crazycat.py	quote1.txt	quote4.txt
$ 
$ cat hibye.txt 
Hi and
bye
$
$ cat poem.txt 
Roses Are Red
Violets Are Blue
This Does Not Rhyme
$
$ cat quote1.txt 
Shut up, he explained.
 - Ring Lardner
$

2. Run crazycat.py with filename

The crazycat.py program does "cat" but implemented in Python
Demonstrating how to read lines of a text file and print them out
Use the tab-key to autocomplete filenames
The standard out of a program is typically printed to the terminal

$ python3 crazycat.py poem.txt 
Roses Are Red
Violets Are Blue
This Does Not Rhyme
$ python3 crazycat.py hibye.txt 
Hi and
bye
$

3. Standard File-Read Code v1

Say the variable filename holds the name of a file as a string, like 'poem.txt'. The file 'poem.txt' is out in the file system with lines of text in it. Here is the standard code to read through the lines of the file:

with open(filename) as f:
    for line in f:
        # use line
        ...

1. The phrase - with open(filename) as f - opens a connection to that file and stores it in the variable f. Code that wants to read the data from the file works through f which is a sort of conduit to the file.

2. The phrase for line in f: accesses each line of the file, one line at a time, as detailed below.

File-Read Picture

Here is how the variables "f" and "line" access the lines of the file:

alt: file read loop, gets one line at a time from file

This loop reads the lines of the file: for line in f:
On the first iteration, line is set to point to a string of the first line of the file
On the second iteration, the second line, and so on through all the lines from the file
Each line string has a '\n' newline char at its end

Detail: in reality, the chars for each line reside in the file, not in memory. The loop constructs a string in memory to represent each line on the fly. This can be done using little memory, since it only constructs one line at a time, even if the file is very large.
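Here is a self-contained sketch of the loop in action. It first creates its own copy of hibye.txt in a temporary folder (an assumption made so the example runs anywhere), then reads it line by line, using len() to show that each line string includes its '\n'.

```python
import os
import tempfile

# Create a small hibye.txt so the sketch does not depend on the crazycat folder.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'hibye.txt')
with open(filename, 'w') as f:
    f.write('Hi and\nbye\n')

total_chars = 0
with open(filename) as f:
    for line in f:
        # line is 'Hi and\n' on the first iteration, 'bye\n' on the second
        print(len(line))            # 7, then 4
        total_chars += len(line)

print(total_chars)   # 11, matching the char count of the file
```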

There are other less commonly used variations on the open function described in the guide. If the file read fails with a unicode error, the file may have an unexpected unicode encoding. The following variation lets you specify a different encoding, so you can try to find an encoding that matches the file: open(filename, encoding='utf-8'). The encoding "utf-8" is one widely used encoding shown as an example.

4. `s.strip()` Function

The string s.strip() function removes whitespace chars like space and newline from the beginning and end of a string and returns the cleaned-up string. Here we use it as an easy way to get rid of the newline.

>>> s = '  hello there\n'
>>> s
'  hello there\n'
>>> s.strip()
'hello there'

Addition: `line = line.strip()`

Each line string inside the for line in f loop has the '\n' newline char at its end. Half the time, this newline char makes no difference to anything, and half the time it ends up getting in the way.

Therefore, we will make a habit of adding line = line.strip() in the loop which removes the '\n' char so we don't have to think about it.

5. Standard File Read Code v2 - `line.strip()`

Here is the file read code with the line.strip() added to remove the '\n'. For CS106A, we will always write it this way, so we never see the '\n'.

with open(filename) as f:
    for line in f:
        line = line.strip()
        # use line

If some CS106A problem asks you to read all the lines of a file, you could paste in the above.
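As an illustration of combining this loop with the counting pattern from earlier, here is a sketch of a made-up function count_lines() that counts the non-empty lines in a file. The sample file it reads is created by the sketch itself.

```python
import os
import tempfile


def count_lines(filename):
    """Return the number of non-empty lines in the file (hypothetical example)."""
    count = 0
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line != '':       # a blank line is '' after strip()
                count += 1
    return count


# Make a small sample file: 4 lines, one of them blank.
folder = tempfile.mkdtemp()
sample = os.path.join(folder, 'sample.txt')
with open(sample, 'w') as f:
    f.write('Roses Are Red\nViolets Are Blue\n\nThis Does Not Rhyme\n')

print(count_lines(sample))   # 3
```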

6. Look at print_file_plain() Code

Back to crazycat example - look at the code.

This command line we saw earlier calls the print_file_plain() function below, passing in the string 'poem.txt' as the filename.

$ python3 crazycat.py poem.txt 
Roses Are Red
Violets Are Blue
This Does Not Rhyme

Here is the print_file_plain() function that implements the "cat" feature - printing out the contents of a file. You can see the code is simply the standard file-reading code, and then for each line, it simply prints the line to standard output.

def print_file_plain(filename):
    """
    Given a filename, read all its lines and print them out.
    This shows our standard file-reading loop.
    """
    with open(filename) as f:
        for line in f:
            line = line.strip()
            print(line)

7. Run With -crazy Command Line Option

The main() function looks for '-crazy' option on the command line. We'll learn how to code that up soon. For now, just know that main() calls the print_file_crazy() function.

Here is the command line to run with the -crazy option:

$ python3 crazycat.py -crazy poem.txt 
rOsEs aRe rEd
vIoLeTs aRe bLuE
tHiS DoEs nOt rHyMe

How does the program produce that output?

8. Recall: `crazy_str(s)` Function

Recall the crazy_str(s) black-box function that takes in a string, and computes and returns a funny-capitalization version of it. This function is included in crazycat.py.

crazy_str('Hello') -> 'hElLo'

9. `-crazy` Code Plan

1. Read each line of text from the file with the standard loop.

2. Call the crazy_str() function passing in each line, getting back the crazy version of that line.

3. Print the crazy version of the line.

10. print_file_crazy() Code

The code is similar to print_file_plain() but passes each line through the crazy_str() function before printing. Think about the flow of data for each iteration of the loop - from the line variable, through crazy_str(), and printed to standard output.

def print_file_crazy(filename):
    """
    Given a filename, read all its lines and print them out
    in crazy form.
    """
    with open(filename) as f:
        for line in f:
            line = line.strip()
            line_crazy = crazy_str(line)
            print(line_crazy)

Experiments

1. Run on alice-book.txt - 3600 lines. The file for-loop rips through the data in a fraction of a second. You can get a feel for how your research project could use Python to tear through some giant text file of data.

python3 crazycat.py -crazy alice-book.txt

2. Shorten the print() in the loop to one line, as below. Describe the sequence of things that happens to each line:

            print(crazy_str(line))

3. (very optional) Try removing the line = line.strip(). What happens to the output? What is happening: the line has a '\n' at its end. The print() function also adds a newline at the end of what it prints.
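The double-newline effect can be seen in a small sketch. To inspect exactly what print() writes, this uses io.StringIO with the file= parameter of print() as a stand-in for standard output. Both go beyond what the lecture covers and are used here only for the demonstration.

```python
import io

line = 'Roses Are Red\n'     # a line as it comes from the file

buffer = io.StringIO()       # stands in for standard output so we can look at it
print(line, file=buffer)            # no strip: the line's '\n' plus print's '\n'
print(line.strip(), file=buffer)    # with strip: only print's '\n'

output = buffer.getvalue()
print(output == 'Roses Are Red\n\nRoses Are Red\n')   # True
```

The two '\n' chars in a row after the first copy are what show up as a blank line in the terminal.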

Optional > Trick

Try running this way, works on all operating systems:

$ python3 crazycat.py -crazy alice-book.txt > capture.txt

What does this do? Instead of printing to the terminal, it captures standard output to a file "capture.txt". Use "ls" and "cat" to look at the new file. This is a super handy way to use your programs. You run the program, experimenting and seeing the output directly. When you have a form you like, use > once to capture the output. Like the pros do it!