Today: Loose ends, debugging - bug and symptom. String - replace(), split(), join(), unicode. computer = CPU + RAM + Storage, CPU use, RAM use

Python Debug

See Python Debug

Bug vs. Symptom
Bug = flaw in code
e.g. x = x + 11
e.g. meant x = x + 1
Symptom
e.g. bad index
e.g. bad function name
Python notices the symptom
The symptom is logically downstream from the bug
Work backwards from the symptom to the bug
Look at the bottom of the error listing, work backwards looking for bug
1. Bad function name example
2. Bad data passed to function, then it crashes
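
A minimal sketch of case 2, using two made-up functions (make_index and char_at are hypothetical, not from the lecture code). The bug is in one function, but Python only notices the symptom later, in a different function:

```python
def make_index(s):
    """Return the index of the last char in s (hypothetical example)."""
    # Bug: the last valid index is len(s) - 1, not len(s).
    return len(s)


def char_at(s, i):
    """Return the char at index i. The symptom shows up on this line."""
    return s[i]


def demo():
    s = 'Python'
    i = make_index(s)             # the bug runs here, silently
    try:
        return char_at(s, i)      # the symptom: IndexError raised in here
    except IndexError as e:
        return 'symptom: ' + str(e)


print(demo())  # symptom: string index out of range
```

Without the try/except, the bottom of the error listing would point at s[i] in char_at. Working backwards from there leads to the bad value of i, and then to make_index where the bug actually is.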

alt:python bug then symptom

String - More Functions

See: Python String

String - replace()

str.replace(old, new)
Returns a new string with replacements done (immutable)
Does not respect word boundaries, just dumb replacement
Strategy:
Say we are given s and need to compute something about it
e.g. count the digits in s
Do not use replace() to modify s as a shortcut to computing about s
Not a good strategy

>>> s ='this is it'
>>> s.replace('is', 'xxx')
'thxxx xxx it'
>>> 
>>> s.replace('is', '')
'th  it'
>>> 
>>> s
'this is it'
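
For the "count the digits in s" case from the strategy above, here is a sketch of the direct approach: loop over the chars and look at each one, leaving s alone. The function name count_digits is just made up for this example.

```python
def count_digits(s):
    """Return the number of digit chars in s, without modifying s."""
    count = 0
    for ch in s:
        if ch.isdigit():
            count += 1
    return count


print(count_digits('a1b22'))  # 3
```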

String - strip()

Removes whitespace chars from either end
Use with for line in f to trim off \n

>>> s = '   this and that\n'
>>> s.strip()
'this and that'
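
A small sketch of the "for line in f" pattern with strip(). To keep the example self-contained, io.StringIO stands in for an open text file (that stand-in, and the name read_clean_lines, are assumptions for this demo).

```python
import io


def read_clean_lines(f):
    """Return the lines of f with whitespace stripped from both ends."""
    result = []
    for line in f:
        result.append(line.strip())  # trims the trailing '\n' too
    return result


# io.StringIO behaves like an open text file for looping purposes
f = io.StringIO('apple\n  banana  \ndonut\n')
print(read_clean_lines(f))  # ['apple', 'banana', 'donut']
```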

String - split()

Nice feature to parse a line of text
e.g. from a file line 11,45,19.2,N
str.split() -> list of strings
str.split(',') - splits on ',' substring
str.split() - zero params
special
splits on 1 or more whitespace chars
handy, primitive way to pull the "words" out of a line

>>> s = '11,45,19.2,N'
>>> s.split(',')
['11', '45', '19.2', 'N']
>>> 'apple:banana:donut'.split(':')
['apple', 'banana', 'donut']
>>> 
>>> 'this    is     it  '.split()  # special space form
['this', 'is', 'it']
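
split() always gives back strings, so a typical next step is converting the pieces. A sketch for the file line above, assuming the fields are int, int, float, str in that order (the notes do not say what the fields mean, so that layout is a guess):

```python
def parse_line(line):
    """Parse a line like '11,45,19.2,N' into [int, int, float, str]."""
    parts = line.strip().split(',')   # strip() first to drop any '\n'
    return [int(parts[0]), int(parts[1]), float(parts[2]), parts[3]]


print(parse_line('11,45,19.2,N\n'))  # [11, 45, 19.2, 'N']
```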

String - join()

Reverse of split()
List of strings, puts them together to make a big string
Mnemonic: str.split() and str.join(), the string is the noun in noun.verb form

>>> foods = ['apple', 'banana', 'donut']
>>> ':'.join(foods)
'apple:banana:donut'
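
Since join() is the reverse of split(), the two can be chained to change the separator in a string. A sketch, with made-up helper names. Note that join() only accepts strings, so other values need str() first.

```python
def swap_separator(s, old_sep, new_sep):
    """Split s on old_sep, then join the pieces back with new_sep."""
    return new_sep.join(s.split(old_sep))


def join_values(values, sep):
    """Join any values into one string, converting each with str()."""
    strs = []
    for value in values:
        strs.append(str(value))
    return sep.join(strs)


print(swap_separator('apple:banana:donut', ':', ', '))  # apple, banana, donut
print(join_values([11, 45, 19.2], ','))                 # 11,45,19.2
```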

String - format()

Want to have a string and paste values into it
str.format() does this
The marker {} marks where to paste in
Simple first way: use + and str(12) to assemble string

>>> 'Alice' + ' got score:' + str(12)  # old: use +
'Alice got score:12'
>>>
>>> '{} got score:{}'.format('Alice', 12) # new: format()
'Alice got score:12'
>>>
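
A short sketch of format() inside a function (score_line is a made-up name). Each {} is filled in left to right, and format() converts non-string values itself, so no str() call is needed:

```python
def score_line(name, score):
    """Return a line of text like 'Alice got score:12'."""
    return '{} got score:{}'.format(name, score)


print(score_line('Alice', 12))   # Alice got score:12
print(score_line('Bob', 9.5))    # Bob got score:9.5
```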

String Unicode

(just quoting from Python String) In the early days of computers, the ASCII character encoding was very common, encoding the roman a-z alphabet. ASCII is simple, and requires just 1 byte to store 1 character, but it has no ability to represent characters of other languages.

Each character in a Python string is a unicode character, so characters for all languages are supported. Also, many emoji have been added to unicode as a sort of character.

Every unicode character is defined by a unicode "code point" which is basically a big int value that uniquely identifies that character. Unicode characters can be written using the "hex" version of their code point, e.g. "03A3" is the "Sigma" char Σ, and "2665" is the heart emoji char ♥.

Hexadecimal aside: hexadecimal is a way of writing an int in base-16 using the digits 0-9 plus the letters A-F, like this: 7F9A or 7f9a. Two hex digits together like 9A or FF represent the value stored in one byte, so hex is a traditional easy way to write out the value of a byte. When you look up an emoji on the web, typically you will see the code point written out in hex, like 1F644, the eye-roll emoji 🙄.
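
To make the hex aside concrete, here is a sketch using Python's built-in int() with base 16 and format() for going the other way. The function names are made up for the example.

```python
def hex_to_int(text):
    """Convert hex digits like '7F9A' or '7f9a' to an int."""
    return int(text, 16)


def int_to_hex(n):
    """Convert an int to uppercase hex digits, no prefix."""
    return format(n, 'X')


print(hex_to_int('7F9A'))   # 32666
print(hex_to_int('FF'))     # 255, the largest value that fits in one byte
print(int_to_hex(255))      # FF
print(0x7F9A == 32666)      # True: Python code can use the 0x prefix
```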

You can write a unicode char out in a Python string with a \u followed by the 4 hex digits of its code point. Notice how each unicode char is just one more character in the string:

>>> s = 'hi \u03A3'
>>> s
'hi Σ'
>>> len(s)
4
>>> s[0]
'h'
>>> s[3]
'Σ'
>>>
>>> s = '\u03A9'  # upper case omega
>>> s
'Ω'
>>> s.lower()     # compute lowercase
'ω'
>>> s.isalpha()   # isalpha() knows about unicode
True
>>>
>>> 'I \u2665'
'I ♥'

For a code point with more than 4 hex digits, use \U (uppercase U) followed by 8 hex digits, with leading 0's as needed, like the fire emoji 1F525, and the inevitable 1F4A9.

>>> 'the place is on \U0001F525'
'the place is on 🔥'
>>> s = 'oh \U0001F4A9'
>>> len(s)
4
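
The built-in functions ord() and chr() (not covered in the notes above) go between a char and its code point int. A sketch showing that the code points quoted above match the chars, with a made-up helper name:

```python
def code_point_hex(ch):
    """Return the code point of a single char as hex digits, at least 4 wide."""
    return format(ord(ch), '04X')


print(code_point_hex('\u03A3'))        # 03A3
print(code_point_hex('\U0001F525'))    # 1F525
print(chr(0x2665) == '\u2665')         # True
print(len('\U0001F525'))               # 1, an emoji is one char in the string
```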

What is a computer?

You have one on your person all day. You're debugging code for one. You see the output of them constantly. What is it and how does it work?

Step 1 - Why is it called Silicon Valley?

Silicon Valley is here because of Stanford
Prof Fred Terman -> Hewlett and Packard (1939) -> Silicon Valley
Orchards and cheap real estate at that time!
Thin silicon chip
note: Silicon (chips) and Silicone (rubbery stuff) easily confused
Tiny transistors are "etched" onto the chip
Moore's Law: the number of transistors per chip doubles every 2 years
(Moore's law appears to be slowing from the 2 year cadence at this time)
i.e. smaller transistors, fit more per chip ... cheaper!
Since 1965, an incredible run of improvement
Think about phone 6 years ago, 3 doublings = 8x
was 8GB storage .. now 64 GB is the minimum
.. Moore's law!
alt: silicon chip
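
The phone arithmetic above can be sketched as a tiny function. This is an idealized model that assumes an exact doubling every 2 years, which real hardware only roughly follows. The name moore_factor is made up.

```python
def moore_factor(years, years_per_doubling=2):
    """Return the improvement factor after the given number of years."""
    doublings = years // years_per_doubling
    return 2 ** doublings


factor = moore_factor(6)
print(factor)        # 8, from 3 doublings
print(8 * factor)    # 64, i.e. 8 GB six years ago -> 64 GB now
```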

We'll look starting from the outside...

Computer - CPU, RAM, Storage

alt: computer is made of CPU, RAM, storage

3 parts of the computer (or phone)
1. CPU
The brains, 2 GHz, simple instructions
CPU does work (RAM stores the work)
e.g. run a line: a = b + c
(Central Processing Unit)
2. RAM
Temporary store of bytes for CPU
Stores code and its variables
Not persistent (power-off = erased)
(Random Access Memory)
3. Persistent Storage
"storage" in laptop / phone / USB key
Storage in the form of files, folders
Measured in bytes, like RAM, but much cheaper.
Your phone might have 4GB of RAM, but 64GB of storage
"Persistent", keeps state even if powered-off

Want to talk about running a computer program...

1. Running Program Gets its own RAM Area

Running program gets its own area in RAM
The areas are kept separate from each other
Multiple programs can run at one time
When a program exits, its RAM space is reclaimed

alt:each running program gets its own area in RAM

2. Operating System (Terminal)

"Operating System" (OS) manages CPU, RAM etc.
Starts and stops programs
Manages files
e.g. Windows, iOS, Android, Mac OS, Linux
OS starts programs, knows about files
When you bring up the "terminal"
You are typing commands to the OS
Run programs: python3 crazycat.py alice.txt
List files: ls
show the contents of files: cat poem.txt

3. RAM = Code + Vars, CPU Runs the Code

RAM holds Code of program for CPU
RAM holds values like 'Hello' and [1, 2, 3]
CPU runs the code, manipulates the values

alt:program RAM area stores both code and values

4. CPU - Cores

Modern CPU can walk and chew gum at the same time
CPU has 2 or 4 or more "cores"
Each core can run code at the same time
i.e. You see your computer running multiple programs at once
Note: a 16 core machine is not 16x more useful than a 1 core machine - greatly diminishing returns. Most of the benefit comes from the first 2 cores.
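
To see the core count of your own machine, Python's standard os.cpu_count() function reports it. A sketch (describe_cores is a made-up name). The number is "logical" cores, which on some CPUs is higher than the number of physical cores.

```python
import os


def describe_cores():
    """Return a short description of how many cores this machine reports."""
    n = os.cpu_count()   # may be None if the count cannot be determined
    if n is None:
        return 'core count unknown'
    return '{} logical cores'.format(n)


print(describe_cores())
```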

Python Shields us from Hardware Details - Great!

Python shields us from much detail about CPU and RAM, which is great. We're just peeking at the details here to get a little insight about what it means for a program to run, use CPU and RAM.

Hardware Demo Program

Nick's Hardware Squandering Program!

> hardware-demo.zip

Demo: computer is mostly idle to start. Idle CPU is cool. CPU starts running hard, generates heat .. fan spins! This program is an infinite loop - uses 100% of one core. Why is the fan running on my laptop? Use Activity Monitor (Mac), Task Manager (Windows) to see programs that are currently running, see CPU% and MEM%. Run program twice, once in each of 2 terminals - 200%

Core function of -cpu feature:

def use_cpu(n):
    """
    Infinite loop counting a variable 0, 1, 2...
    print a line every n (0 = no printing)
    """
    i = 0
    while True:
        if n != 0 and i % n == 0:
            print(i)
        i = i + 1

Try 1000 first ... whoa! Try 1 million instead

$ python3 hardware-demo.py -cpu 1000000
0
1000000
2000000
3000000
4000000
5000000
6000000
7000000
^CTraceback (most recent call last):
  File "hardware-demo.py", line 66, in <module>
    main()
  File "hardware-demo.py", line 56, in main
    use_cpu(n)
  File "hardware-demo.py", line 24, in use_cpu
    i = i + 1
KeyboardInterrupt

(ctrl-c to exit)
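
For a rough feel of how fast one core runs this kind of loop, here is a sketch of a variant that stops by itself after a time limit. This is not part of hardware-demo.py; count_for is a made-up function, and checking the clock on every iteration slows the loop down compared to use_cpu().

```python
import time


def count_for(seconds):
    """Count 0, 1, 2... for roughly the given seconds, return the count."""
    i = 0
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        i = i + 1
    return i


# The result depends on the machine, so no expected output is shown.
print(count_for(0.5))
```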

Let's Talk About RAM

When code reads and writes values, those values are stored in RAM. RAM is a big array of bytes, read and written by the CPU.

Say we have this code

n = 10
s = 'Hello'
lst = [1, 2, 3]
lst2 = lst

Every value in use by the program takes up space in RAM.
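
One detail worth checking in that code: the line lst2 = lst does not create a second list in RAM. The notes do not spell this out, but it can be verified with the "is" operator, which tests whether two variables refer to the very same value:

```python
n = 10
s = 'Hello'
lst = [1, 2, 3]
lst2 = lst           # no new list: lst2 refers to the same list as lst

lst2.append(4)       # change the list through lst2 ...
print(lst)           # [1, 2, 3, 4]  ... and lst sees the change
print(lst2 is lst)   # True
```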

alt:python values each taking space in RAM

RAM

Each Python value is stored using bytes in RAM
Every value gets its own area
Every value is tagged with its type - int, str, ...
Bytes required
Each Python value has, say, 16 bytes of fixed overhead
Here's how it works out
The int value 10 is 8 bytes + overhead = 24 bytes
The string 'Hello' - is 2 bytes per char + 16 = 26 bytes
If the string were 100 chars long, that would be 200 + 16 = 216 bytes
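
The byte arithmetic above, written out as a sketch. These functions implement the simplified model from the notes (16 bytes of overhead, 8 bytes per int, 2 bytes per char), not Python's real accounting. The standard sys.getsizeof() function reports the actual size, which differs by Python version, so no output is shown for it.

```python
import sys

OVERHEAD = 16   # simplified fixed overhead per value, per the notes


def int_bytes():
    """Bytes for an int in the simplified model."""
    return 8 + OVERHEAD


def str_bytes(s):
    """Bytes for a string in the simplified model: 2 per char + overhead."""
    return 2 * len(s) + OVERHEAD


print(int_bytes())            # 24
print(str_bytes('Hello'))     # 26
print(str_bytes('x' * 100))   # 216

# The real numbers on this machine, for comparison
print(sys.getsizeof(10), sys.getsizeof('Hello'))
```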

Demo using -mem: look in Activity Monitor, "mem" area. 100 = 100 MB per second. Watch our program use more and more memory of the machine. Program exits .. not in the list any more!

$ python3 hardware-demo.py -mem 100
Memory MB: 100
Memory MB: 200
Memory MB: 300
Memory MB: 400
Memory MB: 500
Memory MB: 600
Memory MB: 700
^CTraceback (most recent call last):
...
KeyboardInterrupt
(ctrl-c to exit)
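
The source of the -mem feature is not shown in these notes. As a guess at the idea, here is a hypothetical sketch: keep allocating blocks of bytes and hold on to them in a list so the memory stays in use. Unlike the real demo, this version stops after a fixed number of steps and does not pause between them.

```python
def use_mem(mb_per_step, steps):
    """Allocate mb_per_step MB per step, printing the running total.

    Hypothetical sketch, not the actual hardware-demo.py code.
    """
    chunks = []
    for i in range(steps):
        # bytearray(n) allocates n zero bytes; the list keeps them alive
        chunks.append(bytearray(mb_per_step * 1024 * 1024))
        print('Memory MB:', (i + 1) * mb_per_step)
    return chunks


chunks = use_mem(1, 3)   # small numbers here: 3 MB total
```

When the function's result is no longer referenced, or the program exits, that memory is reclaimed, matching what the demo shows in Activity Monitor.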