Today: exam prep, hardware tour, string functions, unicode

Midterm Tuesday Evening

See the course page for timing, logistics, and lots of practice problems. Finish the Crypto program first, take a couple of days off, then worry about the exam. You might plan on spending, say, Sun and Mon evenings working practice problems.

Topics on the exam: simple Bit (hw1), images/pixels/nested-loops (hw2), 2-d grids (hw3), strings, loops, simple lists (hw4)

Topics not on exam: bit decomposition problems, bluescreen algorithm, writing main(), file reading, int div //

CS Coding Exam

The bad news / good news of it

How To Practice For A CS Exam

Practice Problems - Reps


Computer Hardware

What is a Computer?

You have one on your person all day. You're debugging code for one. You see the output of them constantly. What is it and how does it work?

1. Why is it called Silicon Valley?

2. Moore's Law

Features on a chip - "nm" Generations

Aside: Chip Factories are Expensive, Amazingly Complex

Quick Tour of How Computers Work

Computer - CPU, RAM, Storage

alt: computer is made of CPU, RAM, storage

Extra: GPU

Extra: CPU types: x86, Arm, Risc-V

There are different types of CPU: x86, Arm, and more recently Risc-V. Low-level software created for one will not run on another. (Python is portable - your Python code will work without modification on many different CPUs.) The x86 processors are associated with Intel and AMD and have had a long dominance dating back to the creation of the IBM PC in 1981, which used an x86 processor. More recently, Arm licenses processor designs which are totally dominant in cell phones, and Apple now uses them in its computers. Arm chips are a more modern design compared to x86. Most recently, Risc-V is an open, royalty-free type of CPU, where a manufacturer has the freedom to make them without permission (Arm and x86 are quite the opposite). I would not be surprised to see Risc-V grow in importance, as openness has a long history of bringing in a lot of investment and creativity.


Want to talk about running a computer program...

A Running Program is a "Process"
Gets its own area in RAM

For example, we have cat.py - a Python program. When not running, it is just a file sitting in storage (a file which you wrote!). To run the cat.py program, a "process" is created with space in RAM, and the CPU runs it there. When the program exits, the process is destroyed and the space in RAM can be used for something else.

alt:each running program is a process, gets its own area in RAM
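
To see that a running Python program really is a process, here is a minimal sketch (not part of cat.py) that asks the operating system for the id number of the process running it. The number will be different on every run, and you should be able to find it in the process manager while the program runs.

```python
import os

# Every running program is a process, and the OS gives each one an id number.
pid = os.getpid()
print('This Python process has id:', pid)
```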

Operating System (OS)

CPU / Cores

RAM holds code + variables

alt:process area in RAM stores both code and values, CPU core runs the code of process1

Multiple Processes at Once

alt:CPU core switches to run process2 for a fraction of a second

Browser Tab = Process

Demo: See the Processes on your Computer
Process Manager


Python Shields us from Hardware Details - Great!

Python allows us to write code to solve the problems we want without needing to know the details of the CPU and RAM. This is progress, much as it's useful to be able to ride a bicycle without knowing the details of, say, its wheel bearings. That said, here we will look at how CPU and RAM are used to get a feel for the whole picture.

Hardware Demo Program

Nick's Hardware Squandering Program!

> hardware-demo.zip

Demo: the computer is mostly idle to start. An idle CPU does not create much heat. When the CPU starts running hard, it generates heat, and often the laptop fan will start running to cool the CPU. This program is an infinite loop, see the code below. It uses 100% of one core. If the fan is running on your laptop, use Activity Monitor (Mac) or Task Manager (Windows) to see what programs are running, and see their CPU% and MEM%.

Core function of -cpu feature:

def use_cpu(n):
    """
    Infinite loop counting a variable 0, 1, 2...
    print a line every n (0 = no printing)
    """
    i = 0
    while True:
        if n != 0 and i % n == 0:
            print(i)
        i = i + 1

Try 1000 first ... yikes! Try 1 million instead. Type ctrl-c in the terminal to kill the process.

$ python3 hardware-demo.py -cpu 1000000
0
1000000
2000000
3000000
4000000
5000000
6000000
7000000
^CTraceback (most recent call last):
  File "hardware-demo.py", line 66, in <module>
    main()
  File "hardware-demo.py", line 56, in main
    use_cpu(n)
  File "hardware-demo.py", line 24, in use_cpu
    i = i + 1
KeyboardInterrupt

(ctrl-c to exit)
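
If you want a feel for how fast one core gets through that counting loop without needing ctrl-c, here is a rough sketch of a bounded version. The function count_for() is a made-up helper for illustration, not something in hardware-demo.py, and the count you see depends entirely on your machine.

```python
import time

def count_for(seconds):
    """
    Count up from 0 for roughly the given number of seconds,
    return the final count. (Hypothetical helper, not in hardware-demo.py.)
    """
    i = 0
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        i = i + 1
    return i

count = count_for(0.2)
print('iterations in 0.2 seconds:', count)
```

Checking the clock on every pass slows the loop down, so treat the number as a loose lower bound on what use_cpu() does.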

Run It Twice

Demo: Nick opens a second terminal. This needs to be done outside of PyCharm - see the Command Line chapter. Run a second copy of hardware-demo.py. Look in the process manager .. now see two programs running at once.


(optional) Let's Talk About RAM

When code reads and writes values, those values are stored in RAM. RAM is a big array of bytes, read and written by the CPU.

Say we have this code

n = 10
s = 'Hello'
lst = [1, 2, 3]
lst2 = lst

Every value in use by the program takes up space in RAM.

alt:python values each taking space in RAM

RAM

How Many Bytes does a Python Value Use?
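
One way to get a rough answer is sys.getsizeof() from the standard library. This is a sketch using the variables from the code above. The exact byte counts vary with the Python version and the machine, so no particular numbers are promised here. Also, for a list, getsizeof() counts the list itself but not the elements it holds.

```python
import sys

n = 10
s = 'Hello'
lst = [1, 2, 3]
lst2 = lst    # a second name for the same list, not a copy

# sys.getsizeof() reports the bytes used by one object itself.
for value in [n, s, lst]:
    print(repr(value), 'uses', sys.getsizeof(value), 'bytes')

print(lst2 is lst)  # True: both names refer to one list in RAM
```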

Demo using -mem: look in Activity Monitor (Task Manager), "mem" area. The argument 100 means use 100 MB more per second. Watch our program use more and more of the machine's memory. Program exits .. not in the list any more! Fancy: try killing off the process from inside the process manager window.

$ python3 hardware-demo.py -mem 100
Memory MB: 100
Memory MB: 200
Memory MB: 300
Memory MB: 400
Memory MB: 500
Memory MB: 600
Memory MB: 700
^CTraceback (most recent call last):
...
KeyboardInterrupt
(ctrl-c to exit)

String - More Functions

See guide for details: Strings

Thus far we have done String 1.0: len, index numbers, [ ], in, upper, lower, isalpha, isdigit, slices, .find().

There are more functions. You should at least have an idea that these exist, so you can look them up if needed. The important strategy is: don't write code manually to do something a built-in function in Python will do for you. The most important functions you should have memorized, and the more rare ones you can look up.

s.startswith() s.endswith()

These are very convenient True/False tests for the specific case of checking if a substring appears at the start or end of a string. Also a pretty nice example of function naming.

>>> 'Python'.startswith('Py')
True
>>> 'Python'.startswith('Px')
False
>>> 'resume.html'.endswith('.html')
True
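
As an example of letting the built-in do the work, here is a small sketch. The function is_html() is a made-up name for illustration. It is compared with the manual slice version, which works but makes you count characters.

```python
def is_html(filename):
    """Return True if filename ends with '.html', ignoring case."""
    return filename.lower().endswith('.html')

print(is_html('resume.html'))   # True
print(is_html('INDEX.HTML'))    # True
print(is_html('notes.txt'))     # False

# The manual slice version works too, but is easier to get wrong:
filename = 'resume.html'
print(filename[-5:] == '.html')  # True
```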

String - strip()

>>> s = '   this and that\n'
>>> s.strip()
'this and that'
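
The strip() function returns a new string with whitespace (spaces, tabs, newlines) removed from both ends, leaving the middle alone. A short sketch that also shows the one-sided relatives lstrip() and rstrip(). As with other string functions, the original string is not changed.

```python
s = '   this and that\n'

print(repr(s.strip()))    # 'this and that'
print(repr(s.lstrip()))   # 'this and that\n'
print(repr(s.rstrip()))   # '   this and that'
print(repr(s))            # '   this and that\n'  (original unchanged)
```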

String - split()

>>> # Say we read a line like this from a file
>>> line = 'Smith,Astrid,112453,2022'
>>> parts = line.split(',')
>>> parts
['Smith', 'Astrid', '112453', '2022']  # split into parts
>>> parts[0]
'Smith'
>>> parts[2]
'112453'
>>>
>>> 'apple:banana:donut'.split(':')
['apple', 'banana', 'donut']
>>> 
>>> 'this    is     it\n'.split()  # special whitespace form
['this', 'is', 'it']
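
A common pattern is strip() then split() on a line of text, then converting the number-like parts with int(). Here is a sketch. The function parse_line() is a made-up name, and treating the last two fields as an id number and a year is just an assumption about what that example line means.

```python
def parse_line(line):
    """
    Given a line like 'Smith,Astrid,112453,2022\n', return a list
    of its 4 parts, with the last two converted to int.
    """
    parts = line.strip().split(',')
    return [parts[0], parts[1], int(parts[2]), int(parts[3])]

print(parse_line('Smith,Astrid,112453,2022\n'))
# ['Smith', 'Astrid', 112453, 2022]
```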

String - join()

>>> foods = ['apple', 'banana', 'donut']
>>> ':'.join(foods)
'apple:banana:donut'
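
The join() function is the reverse of split(). One thing to know: join() only works on a list of strings, so a list of ints needs to be converted with str() first. A short sketch:

```python
foods = ['apple', 'banana', 'donut']
line = ':'.join(foods)
print(line)                        # apple:banana:donut
print(line.split(':') == foods)    # True

nums = [1, 2, 3]
# ','.join(nums) would raise TypeError, since join() needs strings
strs = []
for num in nums:
    strs.append(str(num))
print(','.join(strs))              # 1,2,3
```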

Recall: String + and str()

>>> name = 'Alice'
>>> score = 12
>>> name + ' got score:' + str(score)
'Alice got score:12'
>>> 

Format String - New

Put a lowercase 'f' to the left of the string literal, making a specially treated "format" string. For each curly bracket {..} in the string, Python evaluates the expression within and pastes the resulting value into the string. Super handy! The expression has access to local variables. We do not need to call str() to convert to string, it's done automatically.

>>> name = 'Alice'
>>> 
>>> f'this is {name}'
'this is Alice'
>>> 
>>> score = 12
>>> f'{name} got score:{score}'
'Alice got score:12'
>>>
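
The curly brackets can hold any expression, not just a variable name. A few examples, with a made-up scores list for illustration:

```python
name = 'Alice'
scores = [12, 15, 9]

print(f'{name} has {len(scores)} scores')      # Alice has 3 scores
print(f'best: {max(scores)}')                  # best: 15
print(f'upper: {name.upper()}')                # upper: ALICE
print(f'total + bonus: {sum(scores) + 10}')    # total + bonus: 46
```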

Optional: Limit Digits {x:.4}

Add ':.4' after the value in the curly braces to limit the digits printed to 4 significant digits. (Use ':.4f' instead to get exactly 4 digits after the decimal point.) There are many other "format options", but this is the one I use the most by far.

>>> x = 2/3
>>> f'value: {x}'
'value: 0.6666666666666666'
>>> f'value: {x:.4}'
'value: 0.6667'
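
A detail worth seeing once: for a float, ':.4' counts significant digits, while ':.4f' counts digits after the decimal point. For a number less than 1 like 2/3 the two look the same, but for a bigger number they differ. A sketch, with y as a made-up example value:

```python
x = 2 / 3
y = 123.456789

print(f'{x:.4}')     # 0.6667
print(f'{x:.4f}')    # 0.6667
print(f'{y:.4}')     # 123.5      (4 significant digits)
print(f'{y:.4f}')    # 123.4568   (4 digits after the decimal point)
print(f'{y:.2f}')    # 123.46
```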

String Unicode

In the early days of computers, the ASCII character encoding was very common, encoding the roman a-z alphabet. ASCII is simple, and requires just 1 byte to store 1 character, but it has no ability to represent characters of other languages.

Each character in a Python string is a unicode character, so characters for all languages are supported. Also, many emoji have been added to unicode as a sort of character.

Every unicode character is defined by a unicode "code point" which is basically a big int value that uniquely identifies that character. Unicode characters can be written using the "hex" version of their code point, e.g. "03A3" is the "Sigma" char Σ, and "2665" is the heart emoji char ♥.

Hexadecimal aside: hexadecimal is a way of writing an int in base-16 using the digits 0-9 plus the letters A-F, like this: 7F9A or 7f9a. Two hex digits together like 9A or FF represent the value stored in one byte, so hex is a traditional easy way to write out the value of a byte. When you look up an emoji on the web, typically you will see the code point written out in hex, like 1F644, the eye-roll emoji 🙄.
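
Python can convert between hex and regular ints for you. A short sketch using the built-in int() with base 16, the hex() function, and the 0x form for writing a hex number directly in code:

```python
n = int('7F9A', 16)  # parse a hex string into an int
print(n)             # 32666
print(hex(n))        # 0x7f9a
print(0xFF)          # 255, the largest value that fits in one byte
print(f'{255:X}')    # FF
```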

You can write a unicode char out in a Python string with a \u followed by the 4 hex digits of its code point. Notice how each unicode char is just one more character in the string:

>>> s = 'hi \u03A3'
>>> s
'hi Σ'
>>> len(s)
4
>>> s[0]
'h'
>>> s[3]
'Σ'
>>>
>>> s = '\u03A9'  # upper case omega
>>> s
'Ω'
>>> s.lower()     # compute lowercase
'ω'
>>> s.isalpha()   # isalpha() knows about unicode
True
>>>
>>> 'I \u2665'
'I ♥'

For a code point with more than 4-hex-digits, use \U (uppercase U) followed by 8 digits with leading 0's as needed, like the fire emoji 1F525, and the inevitable 1F4A9.

>>> 'the place is on \U0001F525'
'the place is on 🔥'
>>> s = 'oh \U0001F4A9'
>>> len(s)
4
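
To go between a char and its code point in code, Python has the built-in functions ord() (char to int) and chr() (int to char). A short sketch, printing numbers and True/False so it works in any terminal:

```python
sigma = '\u03A3'
print(ord(sigma))                      # 931
print(hex(ord(sigma)))                 # 0x3a3
print(chr(0x03A3) == sigma)            # True
print(chr(0x1F525) == '\U0001F525')    # True
```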

Ethics of Generosity and Unicode

Generosity is Good

History of Unicode and Python

The history of ASCII and Unicode is an example of ethics.

ASCII

One byte per char, but only the roman a-z alphabet. Not so helpful for the non-English-speaking world.

In the early days of computing in the US, computers were designed with the ASCII character set, supporting only the roman a-z alphabet. This hurt the rest of the planet, which mostly doesn't write in English. There is a well known pattern where technology comes first in the developed world, is scaled up and becomes inexpensive, and then proliferates to the developing world. Computers in the US using ASCII hurt that technology pipeline. Choosing a US-only solution was the cheapest choice for the US in the moment, but made the technology hard to access for most of the world. This choice is somewhere between ungenerous and unethical.

Unicode Technology

Unicode text can take up to 4 bytes per char, depending on the char and the encoding, so it is generally more costly than ASCII's 1 byte per char.

Cost per byte aside, Unicode is a good solution - a freely available standard. If a system uses Unicode, it and its data can interoperate with the other Unicode compliant systems.
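
How many bytes a unicode char takes depends on the encoding used to store it, a detail these notes do not go into. As a rough sketch, Python's encode() function can show the byte counts for three standard encodings (UTF-8, UTF-16, UTF-32), using chars from the examples above plus a plain 'a':

```python
for ch in ['a', '\u03A3', '\u2665', '\U0001F525']:
    code = hex(ord(ch))
    utf8 = len(ch.encode('utf-8'))
    utf16 = len(ch.encode('utf-16-le'))
    utf32 = len(ch.encode('utf-32-le'))
    print(code, 'utf-8:', utf8, 'utf-16:', utf16, 'utf-32:', utf32)
# 0x61 utf-8: 1 utf-16: 2 utf-32: 4
# 0x3a3 utf-8: 2 utf-16: 2 utf-32: 4
# 0x2665 utf-8: 3 utf-16: 2 utf-32: 4
# 0x1f525 utf-8: 4 utf-16: 4 utf-32: 4
```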

Unicode vs. RAM Costs vs. Moore's Law

The cost of supporting non-ASCII data can be related to the cost of the RAM to store the unicode characters. In the 1960s every byte was literally expensive. An IBM System/360 could be leased for $5,000 per month, not inflation adjusted, and had about 32 kilobytes of RAM (not megabytes or gigabytes .. kilobytes!). So doing very approximate math, figuring RAM is half the cost of the computer, we get a cost of about $1 per byte per year.

>>> 5000 * 12 / (2 * 32000)
0.9375

So in the 1960s, Unicode is a non-starter. RAM is expensive.

RAM Costs Today

What does the RAM in your phone cost today? Say the RAM in your phone costs $500 and there is 8 GB of it. What is the cost per byte?

The figure 8 GB is 8 billion bytes. In Python, you can write that as 8e9 - like on your scientific calculator.

>>> 500 / 8e9   # 8 GB
6.25e-08
>>> 
>>> 500 / 8e9 * 100  # in pennies
6.2499999999999995e-06

RAM costs almost nothing today - about 6 millionths of a cent per byte. This is the result of Moore's law. Exponential growth is incredible.
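
To put the two estimates side by side, here is a very rough sketch dividing the mainframe cost per byte by the phone cost per byte. The variable names are made up, and the comparison is loose (one number is a yearly lease, the other a purchase, and neither is inflation adjusted), so only the order of magnitude means anything.

```python
cost_then = 5000 * 12 / (2 * 32000)   # dollars per byte per year, mainframe estimate
cost_today = 500 / 8e9                # dollars per byte, phone estimate
ratio = cost_then / cost_today
print(f'{ratio:,.0f}')   # 15,000,000
```

So by this estimate a byte of RAM is something like 15 million times cheaper.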

Unicode Makes Sense in 1990s

Sometime in the 1990s, RAM was cheap enough that spending up to 4 bytes per char (unicode) was not so bad compared to 1 byte per char (ASCII). The Unicode standard was created around this time. Unicode is a standard way of encoding chars in bytes, so that all the Unicode systems can transparently exchange data with each other.

With Unicode, the tech leaders were showing a little generosity to all the non-ASCII computer users out there in the world.

Generosity and Python Story

With Unicode, there is a single Python language that can be used in every country - US, China, India, Netherlands.

A world of programmers contribute to Python as free, open source software. We all benefit from that community, vs. each country maintaining its own in-country programming language, which would be a crazy waste of duplicated effort.

Ethic: Generosity

So being generous is the right thing to do. But the story also shows that when you are generous to the world, that generosity may well come around and help you as well.