Plan for today
Today, we'll learn how computers store information ("data"). We'll also learn how we can manipulate data in code.
- Recall: transistors are on or off (two states)
- Use binary (base 2) instead of decimal (base 10)
- "Bit": 0 or 1 (off/on)
- Equivalent to a digit in decimal
- How many numbers can we store with 1 bit? 2? 10?
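The question above can be checked with a short sketch. Each extra bit doubles the number of patterns, so n bits give 2^n different values. The function name `valuesForBits` is made up for this example, and JavaScript is assumed as the language.

```javascript
// Count how many distinct values fit in a given number of bits.
// Each extra bit doubles the count, so n bits give 2^n values.
function valuesForBits(bitCount) {
  return 2 ** bitCount;
}

for (const bitCount of [1, 2, 10]) {
  console.log(bitCount + " bit(s): " + valuesForBits(bitCount) + " values");
}
// 1 bit(s): 2 values
// 2 bit(s): 4 values
// 10 bit(s): 1024 values
```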
Bytes and Words
- Individual bits aren't that useful
- Solution: group 8 bits together into bytes
- Computers are optimized to handle bytes instead of bits
- Hexadecimal vs. binary
- Group 4 or 8 bytes together to make a word
- Number of bits the CPU reads from memory at a time
- Part of the architecture
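To make the hexadecimal vs. binary comparison concrete, here is a small sketch that writes one byte both ways. One hexadecimal digit covers exactly 4 bits, so a byte is always two hex digits. The helper names and the sample value 181 are assumptions chosen for illustration.

```javascript
// Show one byte (a value from 0 to 255) in binary and in hexadecimal.
function byteToBinary(value) {
  // toString(2) writes the number in base 2; padStart fills in leading zeros.
  return value.toString(2).padStart(8, "0");
}

function byteToHex(value) {
  // toString(16) writes the number in base 16 (digits 0-9, then a-f).
  return value.toString(16).padStart(2, "0");
}

const sampleByte = 181; // arbitrary example value
console.log(byteToBinary(sampleByte)); // 10110101
console.log(byteToHex(sampleByte));    // b5
```

Reading the binary form in groups of four, 1011 is b and 0101 is 5, which is why hexadecimal is a convenient shorthand for bytes.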
Representing Data: Characters
- Plain text uses ASCII (a numbering system for characters)
- Recall: ASCII art
- Each character is represented by one byte (8 bits)
- Unicode is used for representing "special" characters and emojis
- Controlled by the Unicode Consortium
- Originally represented with two bytes (65,536 combinations); modern encodings such as UTF-8 and UTF-16 use a variable number of bytes per character
- The Unicode Consortium also decides which emojis become official, which has caused lots of controversy
- Words are just sequences of characters (computers use ASCII when possible)
Representing Data: ASCII
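As a rough illustration of characters being stored as numbers, the sketch below looks up the numeric code behind each character. It assumes JavaScript; `charCodes` is a hypothetical helper, while `codePointAt` and `String.fromCharCode` are built-in.

```javascript
// Look up the numeric code for each character in a piece of text.
function charCodes(text) {
  return Array.from(text).map((ch) => ch.codePointAt(0));
}

// Plain English text stays within the ASCII range (0 to 127).
console.log(charCodes("Hi!")); // codes 72, 105, 33

// Going the other way: turn codes back into text.
console.log(String.fromCharCode(72, 105, 33)); // Hi!

// An emoji has a much larger code than any ASCII character.
console.log(charCodes("\u{1F600}")); // code 128512
```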
Representing Data: Integers
- Represented with one computer word (32 or 64 bits)
- Problems with 32 bits:
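- Not enough options to label all computers in the world (32-bit IPv4 addresses allow only about 4.3 billion labels)
- Gangnam Style "overflow": the video's view count passed 2,147,483,647, the largest value a signed 32-bit integer can hold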
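The limits behind both 32-bit problems can be computed directly. This is a sketch in JavaScript, where the `| 0` trick forces a number into a signed 32-bit integer; the variable names are made up for the example.

```javascript
const bitsPerWord = 32;

// Number of distinct patterns in 32 bits (the limit on unique labels).
const unsignedValues = 2 ** bitsPerWord;
console.log(unsignedValues); // 4294967296

// Largest value when one bit is reserved for the sign.
const maxSigned = 2 ** (bitsPerWord - 1) - 1;
console.log(maxSigned); // 2147483647

// Adding 1 to the largest value "wraps around" to the most negative value.
const wrapped = (maxSigned + 1) | 0;
console.log(wrapped); // -2147483648
```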
Representing Data: Adding Integers
- Adding integers in binary is exactly like adding integers in decimal
0101 (5) + 0111 (7)
0101 (5) + 1011 (11)
0111 (7) + 0011 (3)
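One way to check the three sums above is to let the computer do the conversion. This sketch assumes JavaScript; `addBinary` is a hypothetical helper that reads two binary strings, adds them as ordinary numbers, and writes the result back in binary.

```javascript
// Add two numbers written in binary and return the sum in binary.
function addBinary(first, second) {
  // parseInt(text, 2) reads the text as a base-2 number.
  const sum = parseInt(first, 2) + parseInt(second, 2);
  return sum.toString(2);
}

console.log(addBinary("0101", "0111")); // 1100  (5 + 7 = 12)
console.log(addBinary("0101", "1011")); // 10000 (5 + 11 = 16)
console.log(addBinary("0111", "0011")); // 1010  (7 + 3 = 10)
```

Note that the second answer needs five bits, so it would not fit if only four bits were available.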
Representing Data: Real Numbers
- Usually called doubles
- Represented with one computer word
- Much like scientific notation (IEEE Floating Point)
- Keeps track of the sign, the exponent, and the fractional part
- Idea: 7.5 can be represented as
- Tradeoff: fixed number of bytes means not perfectly precise
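The scientific-notation idea can be sketched by splitting a number into a sign, a power of two, and a fractional part. This is only an illustration of the idea, not the exact IEEE bit layout; `decompose` is a hypothetical helper and it does not handle 0.

```javascript
// Split a nonzero number into sign * fraction * 2^exponent,
// where the fraction is at least 1 and less than 2.
function decompose(value) {
  const sign = value < 0 ? -1 : 1;
  const magnitude = Math.abs(value);
  const exponent = Math.floor(Math.log2(magnitude));
  const fraction = magnitude / 2 ** exponent;
  return { sign, exponent, fraction };
}

const parts = decompose(7.5);
console.log(parts.sign, parts.fraction, parts.exponent); // 1 1.875 2
// So 7.5 = +1.875 * 2^2.

// The precision tradeoff: 0.1 and 0.2 cannot be stored exactly,
// so their sum is not exactly 0.3.
console.log(0.1 + 0.2 === 0.3); // false
```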
Lots of Bytes
- Fact: 2^10 is 1024 (about 1000)
- 1 kilobyte (KB) = 1024 bytes
- About the size of a 1000 character (200-250 word) paper
- Measures emails and text documents (each email is about 2KB)
- 1 megabyte (MB) = 1024KB (about 1 million bytes)
- MP3 audio is about 1 MB per minute
- Used to measure audio clips and image sizes
- 1 gigabyte (GB) = 1024MB (about 1 billion bytes)
- 1 hour of video is about 2GB
- Used to measure video sizes and computer storage space
- 1 terabyte (TB) = 1024GB (about 1 trillion bytes)
- Used to measure computer storage space
- Sometimes used in the context of "big data", along with petabytes (1024TB)
Storage space practice
- Alice has 600 MB of data. Bob has 700 MB of data. Will it all fit on Alice's 2 GB thumb drive?
- Alice has 100 small images, each of which is 500 KB. How much space do they take up overall in MB?
- Your ghost hunting group is recording the sound inside a haunted Stanford classroom for 20 hours as MP3 audio files. About how much data will that be, expressed in GB?
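The practice problems can be worked through in code once the units are written as numbers of bytes. This sketch assumes JavaScript, uses the 1024-based units from the previous slide, and uses the estimate of about 1 MB per minute of MP3 audio; all variable names are made up for the example.

```javascript
// Units expressed in bytes.
const KB = 1024;
const MB = 1024 * KB;
const GB = 1024 * MB;

// Problem 1: do 600 MB and 700 MB fit on a 2 GB drive?
const combinedBytes = 600 * MB + 700 * MB;
const fitsOnDrive = combinedBytes <= 2 * GB;
console.log(fitsOnDrive); // true (1300 MB is less than 2048 MB)

// Problem 2: 100 images of 500 KB each, expressed in MB.
const imagesInMB = (100 * 500 * KB) / MB;
console.log(imagesInMB); // 48.828125 (a bit under 50 MB)

// Problem 3: 20 hours of MP3 audio at about 1 MB per minute, in GB.
const audioInGB = (20 * 60 * 1 * MB) / GB;
console.log(audioInGB); // 1.171875 (a bit over 1 GB)
```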
Megabytes vs. Mebibytes
- Marketers like to interpret a megabyte as 1 million bytes (less memory to make)
- A mebibyte (MiB) is exactly 1024 * 1024 = 1,048,576 bytes
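The gap between the two definitions is easy to compute. A minimal sketch in JavaScript, with made-up variable names:

```javascript
const marketingMegabyte = 1000000; // 1 million bytes
const mebibyte = 1024 * 1024;      // 1048576 bytes

// How many bytes are "missing" under the 1-million-byte reading?
const missingBytes = mebibyte - marketingMegabyte;
console.log(missingBytes); // 48576

// The same gap as a percentage of a mebibyte, rounded to one decimal place.
const missingPercent = ((missingBytes / mebibyte) * 100).toFixed(1);
console.log(missingPercent + "%"); // 4.6%
```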
- Compression involves storing information using fewer bytes
- Lossless vs. lossy compression
- Original data
- 12000, 12002, 12006, 12007, 12010, 12006, 12005
- One potential lossless scheme - store differences:
- 12000, +2, +4, +1, +3, -4, -1
- One potential lossy scheme: store every other number
- 12000, (xxx), 12006, (xxx), 12010, (xxx), 12005
- Recreated data: 12000, (12003), 12006, (12008), 12010, (12007), 12005
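Both schemes in the example above can be sketched in a few lines. This assumes JavaScript, and the function names are hypothetical. For the lossy scheme, the sketch assumes each missing number is rebuilt as the midpoint of its two neighbors, rounded down, which matches the recreated values shown above.

```javascript
const original = [12000, 12002, 12006, 12007, 12010, 12006, 12005];

// Lossless: keep the first value, then store each change from the previous value.
function deltaEncode(values) {
  return values.map((v, i) => (i === 0 ? v : v - values[i - 1]));
}

// Undo deltaEncode by adding each change back onto the running value.
function deltaDecode(deltas) {
  const out = [];
  for (const d of deltas) {
    out.push(out.length === 0 ? d : out[out.length - 1] + d);
  }
  return out;
}

// Lossy: keep only every other number.
function keepEveryOther(values) {
  return values.filter((_, i) => i % 2 === 0);
}

// Rebuild by guessing each dropped number as the midpoint of its neighbors.
function rebuild(kept) {
  const out = [];
  for (let i = 0; i < kept.length; i++) {
    out.push(kept[i]);
    if (i + 1 < kept.length) {
      out.push(Math.floor((kept[i] + kept[i + 1]) / 2));
    }
  }
  return out;
}

console.log(deltaEncode(original).join(", "));
// 12000, 2, 4, 1, 3, -4, -1
console.log(deltaDecode(deltaEncode(original)).join(", "));
// 12000, 12002, 12006, 12007, 12010, 12006, 12005
console.log(rebuild(keepEveryOther(original)).join(", "));
// 12000, 12003, 12006, 12008, 12010, 12007, 12005
```

The lossless round trip returns exactly the original data, while the lossy version is close but not identical.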
- Lossless text compression
- Idea: not every letter is used equally
- Give each character a custom encoding
- More frequent characters get shorter encodings; rare ones like z and q get encodings longer than a byte
- Note: computers are fastest at reading at the byte level
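Here is a toy sketch of the custom-encoding idea, assuming JavaScript. The code table is invented for this example and covers only four letters, so even the rare letter gets a short code; with a full alphabet the rare letters would end up with much longer codes. No code in the table is the start of another, which is what lets the decoder know where each character ends.

```javascript
// Hypothetical code table: the most common letter gets the shortest code.
const codeTable = { e: "0", t: "10", a: "110", z: "111" };

// Replace each character with its bit pattern.
function encode(text) {
  return Array.from(text).map((ch) => codeTable[ch]).join("");
}

// Read bits one at a time until they match a known pattern.
function decode(bits) {
  const lookup = Object.fromEntries(
    Object.entries(codeTable).map(([ch, code]) => [code, ch])
  );
  let text = "";
  let current = "";
  for (const bit of bits) {
    current += bit;
    if (current in lookup) {
      text += lookup[current];
      current = "";
    }
  }
  return text;
}

const message = "eeteaez";
const encoded = encode(message);
console.log(encoded);            // 001001100111
console.log(encoded.length);     // 12 bits
console.log(message.length * 8); // 56 bits at one byte per character
console.log(decode(encoded));    // eeteaez
```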
Source: Marty Stepp
Other compression schemes
- Lossless vs. lossy?
- 10x less space, but still sounds good
- Free and open source; very widely used and supported
- GIF and PNG
- PNG is a little bit better; mostly used for non-photographs
The image on the right has been extremely compressed and takes up only about 29% of the space of the image on the left.
- A note on the readings (they're optional)
- First homework due tomorrow
- Thanks to Shreya for covering Thursday's lecture!
- Please email Shreya and me if you need any accommodations
- Variable: a name for a piece of memory
- Quickly change code output
- Store value for easy access
- Name (myName) without spaces and a value
- Change value with an equals sign (=)
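A minimal sketch of the variable ideas above, assuming JavaScript syntax (suggested by `parseInt` later in these notes). The stored values "Ada" and "Grace" are made up for the example.

```javascript
// Create a variable: a name (myName) and a value ("Ada").
let myName = "Ada";
const greetingBefore = "Hello, " + myName;

// Change the value with an equals sign; the name stays the same.
myName = "Grace";
const greetingAfter = "Hello, " + myName;

console.log(greetingBefore); // Hello, Ada
console.log(greetingAfter);  // Hello, Grace
```

Changing one stored value changes the output everywhere the variable is used afterward, which is what makes it quick to change what the code prints.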
Code: Getting input
- Already seen output
- Input keyword:
- The argument is the question you ask the user
A very fancy calculator
- Variables can store numbers too
- Math operations work as normal: + - * /
- Use = to store the result
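The four operations can be tried out on numbers stored in variables. This sketch assumes JavaScript; the variable names and values are made up.

```javascript
// Variables holding numbers.
const price = 12;
const quantity = 3;

// Each result is stored in its own variable with =.
const sum = price + quantity;        // 15
const difference = price - quantity; // 9
const product = price * quantity;    // 36
const quotient = price / quantity;   // 4

console.log(sum, difference, product, quotient); // 15 9 36 4
```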
You try it!
- Write code to ask the user for two numbers, then print their sum.
- Hint: parseInt("4") turns the text "4" into the number 4
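One possible shape for the exercise is sketched below, assuming JavaScript. The course's input keyword is not shown in these notes, so the two strings passed in simply stand in for whatever the user typed; `sumOfInputs` is a hypothetical helper name.

```javascript
// The user's answers arrive as text, so convert them to numbers before adding.
function sumOfInputs(firstText, secondText) {
  // The second argument, 10, tells parseInt to read the text as a decimal number.
  return parseInt(firstText, 10) + parseInt(secondText, 10);
}

// Without parseInt, + glues the two pieces of text together.
console.log("4" + "5"); // 45

// With parseInt, + adds the two numbers.
console.log(sumOfInputs("4", "5")); // 9
```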
- Computers store data in bits using binary.
- Collections of bits are more useful for communicating information.
- Different types of information are stored differently. Data can be compressed to maximize storage space.
- We can label places in memory using variables.