Review

Review

Lecture 2 (Integer Representations and Bits / Bytes)

  • 8 bits = 1 byte
  • base 2 (0-1) vs base 8 (0-8) vs base 10 (0-9) vs base 16 (0-9a-f)
  • Each hex digit is 4 bits
  • three types of number
    • unsigned integer (non-negative only---direct representation of number)
    • signed integer (sign bit exists, uses two's complement---invert and add one)
    • floating point (base-2 scientific notation, for decimals)
  • Myth machine sizes

    type size (in bytes)
    char 1
    short 2
    int 4
    long 8
    size_t 8
    pointer 8
    float 4
    double 8
  • Overflow occurs when the we go above the maximum or below the minimum for our data type and get a result that isn't consistent with regular math (e.g., for an unsigned char 255 + 1 == 0

Lecture 3 (Byte Ordering & Bitwise Operators)

  • Also see https://107.danielr.org/review
  • Integer promotion---when adding two numbers, the one with a smaller positive range will get promoted to the type of the larger one (e.g., adding unsigned int and int gives us an unsigned int)

Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:

  • If both operands have the same type, then no further conversion is needed.
  • Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
  • Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
  • Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
  • Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
  • Main bitwise operators: &, |, ^, ~, >> (remember arithmetic vs logical), <<

Lecture 4 (Chars and C-Strings)

  • A character on the myth machine is represented using char which is a one-byte integral type---characters are stored based on their ASCII value and you can use 'a' to make it more readable/portable
  • ctype.h functions

    function description
    isalpha(ch) true if ch is 'a' through 'z' or 'A' through 'Z'
    islower(ch) true if ch is 'a' through 'z'
    isupper(ch) true if ch is 'A' through 'Z'
    isspace(ch) true if ch is a space, tab, new line, etc.
    isdigit(ch) true if ch is '0' through '9'
    toupper(ch) returns uppercase equivalent of a letter
    tolower(ch) returns lowercase equivalent of a letter
  • C strings are represented as arrays of characters ending with the null-terminator ('\0')

  • C strings are passed as char *s
  • Common string.h Functions

    Function Description
    strlen(str) returns the # of chars in a C string (before null-terminating character).
    strcmp(str1, str2), strncmp(str1, str2, n) compares two strings; returns 0 if identical, <0 if str1 comes before str2 in alphabet, >0 if str1 comes after str2 in alphabet. strncmp stops comparing after at most n characters.
    strchr(str, ch), strrchr(str, ch) character search: returns a pointer to the first occurrence of ch in str, or NULL if ch was not found in str. strrchr find the last occurrence.
    strstr(haystack, needle) string search: returns a pointer to the start of the first occurrence of needle in haystack, or NULL if needle was not found in haystack.
    strcpy(dst, src), strncpy(dst, src, n) copies characters in src to dst, including null-terminating character. Assumes enough space in dst. Strings must not overlap. strncpy stops after at most n chars*, and does not necessarily add null-terminating char.
    strcat(dst, src), strncat(dst, src, n) concatenate src onto the end of dst. strncat stops concatenating after at most n characters. Always adds a null-terminating character.
    strspn(str, accept), strcspn(str, reject) strspn returns the length of the initial part of str which contains only characters in accept. strcspn returns the length of the initial part of str which does not contain any characters in reject.
  • We can get a substring by doing pointer arithmetic (string starting at index 1 of a char *str is just str + 1)

  • Three(ish) ways to declare strings
    1. char *s = "Hello world"---s lives on the stack, but points to a constant (read-only) string in the data segment (is really a const char *)
    2. char s[] = "Hello world"---s is an array and the characters live on the stack (you can also specify the number, but easier to make a mistake that way
    3. Initialize memory for s and then fill it in (could use malloc or stack allocation (char buf[50]))

Lecture 5 (From C Strings to Pointers)

  • A pointer is a variable that stores a memory address.
  • If you are performing an operation with some input and do not care about any changes to the input, pass the data type itself. This makes a copy of the data.
  • If you are modifying a specific instance of some value, pass the location of what you would like to modify. This makes a copy of the data’s location.
  • If a function takes an address (pointer) as a parameter, it can go to that address if it needs the actual value.
  • C is always copy by value
  • "Type hierarchy" (Daniel will draw in lecture)

  • If we create a string as a char[], we can modify its characters because its memory lives in our stack space.

  • We cannot set a char[] equal to another value, because it is not a pointer; it refers to the block of memory reserved for the original array.
  • If we pass a char[] as a parameter, set something equal to it, or perform arithmetic with it, it’s automatically converted to a char *.
  • If we create a new string with new characters as a char * , we cannot modify its characters because its memory lives in the data segment.
  • We can set a char * equal to another value, because it is a reassign-able pointer.
  • Adding an offset to a C string gives us a substring that many places past the first character.
  • If we change characters in a string parameter, these changes will persist outside of the function.

  • Buffer overflows can happen if we're not careful about making sure to not write/read past the correct length in a string/array/buffer generally (think about A5)

Lecture 6 (More Pointers and Arrays)

  • A pointer is a variable that stores a memory address.
  • Because there is no pass-by-reference in C like in C++, pointers let us pass around the address of one instance of memory, instead of making many copies.
  • One (8 byte) pointer can represent any size memory location!
  • Pointers also let us refer to memory generically
  • Memory is a big array of bytes.
  • Each byte has a unique numeric index that is commonly written in hexadecimal.
  • A pointer stores one of these memory addresses.
  • const on a pointer (e.g., const char *myptr means that the underlying characters can't be changed---we can still reassign myptr (you would need a char const *myptr to not be able to reassign, but this would let you change the underlying characters))
  • structs are a convenient way of packaging data

    ```c typedef struct { int a; int b; } MyStruct;

    void bar(MyStruct m) { // m->a is syntatic sugar for (m).a }

    void foo(void) { MyStruct one = {3, 4}; MyStruct two; two.a = 3; two.b = 4; MyStruct three = {.a = 3, .b = 4}; } ```

  • ternary operator is a way of doing an if/else expression

    ``` int x; if (argc > 1) { x = 50; } else { x = 0; }

    // is equivalent to int x = argc > 1 ? 50 : 0; ```

  • pointer arithmetic works in terms of the pointee size (+1 on an int pointer advances by 4 bytes, +1 on a char pointer advances by 1 byte---that way &arr[i] gives the same address of arr + i (similar for subtraction))

  • arr[i] is syntactic sugar for *(arr + i)

Lecture 7 (Stack and Heap)

  • Stack
    • Stores local variables and parameters
    • Each function call has its own stack frame, which is created when the call starts, and goes away when the function ends
    • The stack grows downwards and shrinks upwards
    • Fast allocation (similar to a bump allocator)
  • Heap
    • We manually allocate this memory
    • Beneficial if we want to store data that persists beyond a single function call
    • Must clean up the memory when we are done
      • C no longer knows when we are done with it, as it does with stack memory.
    • Malloc/calloc/etc. to request memory on the heap
      • Request number of bytes
      • Get void* to newly-allocated memory
    • Use free when done
    • Memory is resizable with realloc
    • Slow allocation, but is much bigger (stack about 8 MB on the myth machines, heap over 30 GB)

Lecture 8 (C Generics---Void *)

  • void *s let us make more general (polymorphic functions)---all pointers can be converted to/from a void * (although it's not always correct)
  • void *memcpy(void *dest, const void *src, size_t n);
  • void *memmove(void *dest, const void *src, size_t n); (used if source and desintation ranges overlap)
  • Potentially useful trick: remember that there is no void type that we can pass variables to, so if we are doing a generic function and working on ints, we will pass in an int *---then, if we are doing a generic function and working on char *s, we will pass in a char **
  • Normally do arithmetic with a char * typecast to operate in terms of bytes

Lecture 9 (C Generics---Function Pointers)

  • Function pointers as callback functions let you pass in logic to a generic function (e.g., the comparison function we pass into qsort or bsearch)---remember that everything is just bytes under the hood: function pointers just point to the address of the first instruction in the functino (think about A5)
  • Comparison functions int (*compare_fn)(const void *a, const void *b), returns
    • \< 0 if first value should come before second value
    • > 0 if first value should come after second value
    • 0 if first value and second value are equivalent
  • Golden rule: we always pass in a pointer to the underlying type (int -> int *, char * -> char **, etc...)

    ```c int compare_ints(const void a, const void b) { return (int )a - (int )b; // give or take overflow issues }

    int compare_strings(const void a, const void b) { return strcmp(*(const char )a, *(const char )b); } ```

Lecture 10 (Introduction to Assembly)

  • Our C code gets compiled into machine code which has (approximately) a one-to-one correspondence with ASM (with ASM just using "human-readable" mnemonics)
  • Data types are more of a C-level construct than an ASM-level one (everything is ultimately just bytes)
  • x86_64 is a register based architecture---we can only do operations with at least one register (can mov from memory to register or register to register or register to memory but not memory to memory) (true for most instructions): registers are like a limited working set of variables
  • types of values
    • immediate (e.g., $0x104): that literal number
    • register (e.g., %rdi)
    • memory location (e.g., 0x6005c0)
    • indirect offset(reg, reg, scale)
      • (%rax) get the memory pointed to by rax
      • (%rax, %rdi) calculate the address rax + rdi and get the memory there
      • (%rax, %rdi, 4) calculate the address rax + rdi * 4 and get the memory there
      • 3(%rax, %rdi, 4) calculate the address 3 + rax + rdi * 4 and get the memory there
      • 3(%rax) calculate the address 3 + rax and get the memory there
      • note that scale must be one of 1, 2, 4, or `8``
  • mov src, dst moves the value from src to destination

Lecture 11 (Assembly Continued)

  • Probably just bring https://web.stanford.edu/class/archive/cs/cs107/cs107.1238/guide/x86-64.html and https://web.stanford.edu/class/archive/cs/cs107/cs107.1238/resources/x86-64-reference.pdf to the exam

  • data/suffix sizes

    • byte: 1 byte
    • word: 2 bytes
    • long word (or double word): 4 bytes
    • quad word: 8 bytes
  • We have register sizes from 1 byte to 8 bytes
    • e.g., %rax is the full register, its low order 32 bits are %eax, its low order 16 are ax, its low order 8 are al (and the upper half of ax is ah)
  • Instruction purposes
    • %rsp: the stack pointer, points to the current "top" of the stack
    • %rip: stores the address of the next instruction to execute
    • System V ABI parameter order: %rdi, %rsi, %rdx, %rcx, %r8, %r9---return value goes in %rax
  • movs does a sig extend
  • cltq is short for movslq %eax,%rax
  • lea is like one layer of indirection above mov (conceptually, arr[i] is mov, arr + i is lea) syntax is the same
  • Unary instructions (operate on memory or register)

    Instruction Effect Description
    inc D D ← D + 1 Increment
    dec D D ← D - 1 Decrement
    neg D D ← -D Negate
    not D D ← ~D Complement
  • Binary instructions (require at least one register)

    Instruction Effect Description
    add S, D D ← D + S Add
    sub S, D D ← D - S Subtract
    imul S, D D ← D * S Multiply
    xor S, D D ← D ^ S Exclusive-or
    or S, D D ← D \ S
    and S, D D ← D & S And
  • multiplication and division support 128-bit values see slides 28/29

  • Shift instructions

    Instruction Effect Description
    sal k, D D ← D << k Left shift
    shl k, D D ← D << k Left shift (same as sal)
    sar k, D D ← D >>A k Arithmetic right shift
    shr k, D D ← D >>L k Logical right shift
    • When using %cl, the width of what you are shifting determines what portion of %cl is used.
    • For w bits of data, it looks at the low-orderlog2(w)bits of%cl` to know how much to shift.
    • Instructions are just bytes like everything else
    • %rip holds the next instruction to go to

Lecture 12 (Control Flow: When in doubt just JMP!)

  • Control flow:

    • jmp is unconditional, also have conditional versions of the form ja, jg, etc...
    Instruction Synonym Set Condition
    je Label jz Equal / zero
    jne Label jnz Not equal / not zero
    js Label Negative
    jns Label Nonnegative
    jg Label jnle Greater (signed >)
    jge Label jnl Greater or equal (signed >=)
    jl Label jnge Less (signed <)
    jle Label jng Less or equal (signed <=)
    ja Label jnbe Above (unsigned >)
    jae Label jnb Above or equal (unsigned >=)
    jb Label jnae Below (unsigned <)
    jbe Label jna Below or equal (unsigned <=)
    • conditional versions look at the value of the flags register, which is changed by arithmetic and logical operations
      • CF: Carry flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
      • ZF: Zero flag. The most recent operation yielded zero.
      • SF: Sign flag. The most recent operation yielded a negative value.
      • OF: Overflow flag. The most recent operation caused a two’s-complement overflow-either negative or positive.
    • cmp sets flags based on the result of sub, but doesn't store the result anywhere
    • test sets flags based on the result of and, but doesn't store the result anywhere
    • Read cmp S1,S2 as "compare S2 to S1"

      asm cmp $2, %edi jg [target] * if/for/while all have basic ASM sort of templates that GCC translates them to * set instructions condinonally set a byte to 0 or 1. Reads current state of flags operand is a single-byte register (e.g., %al) or single-byte memory locanon Does not perturb other bytes of register Typically followed by movzbl to zero those bytes * cmovx src,dst conditionally moves data in src to data in dst.

Lecture 13 (Control Flow: When in doubt just JMP!)

  • pushq S pushes on to the stack (S R[%rsp] ⟵ R[%rsp] – 8; M[R[%rsp]] ⟵ S)
  • popq D pops off from the stack (D ⟵ M[R[%rsp]]; R[%rsp] ⟵ R[%rsp] + 8;)
  • callq conceptually pushes %rip (the instruction to return to) and then jumps to the start of the function
  • retq conceptually pops %rip (returning to the return instruction)
  • caller vs callee owned registers (inverse of caller vs callee save)
    • caller owned registers are registers that a caller can assume are the same before and after calling a function (so, if a function wants to change them, it must be sure to change them back to their original values---often, we'll see this with pushq in the prologue and popq in the epilogue)
    • vice versa for callee owned registers
  • data types are aligned

Lecture 14

  • GCC will do optimizations to make our code more efficient (controlled with -O flag from -O0 to -O3 (and -Os, -Oz, -Og)
    • Constant Folding
    • Common Sub-Expression Elimination
    • Dead Code Removal
    • Strength Reduction
    • Code Motion
    • Tail Recursion
    • Loop Unrolling

Heap Allocator Notes

  • A heap allocator is basically something that implements malloc, free, etc... (handles heap allocations)
  • Goal 1: Maximize throughput, or the number of requests completed per unit time. This means minimizing the average time to satisfy a request.
  • Goal 2: Maximize memory utilization, or how efficiently we make use of the limited heap memory to satisfy requests.
  • Types:
    • Implicit allocator
    • Explicit allocator
    • Bump allocator
  • Fragmentation
    • External Fragmentation (this example): no single space is large enough to satisfy a request, even though enough aggregate free memory is available
    • Internal Fragmentation: space allocated for a block is larger than needed (more later)
  • Many design decisions
    • Splitting policy (minimum block sie, alignment requirements, etc...)
    • Which block to pick (first-fit, best-fit, etc...)
    • Ordering of explicit list (address-order, lifo, etc...)
  • In-place realloc
  • Coalescing
  • Requirements: A heap allocator must…
    1. Handle arbitrary request sequences of allocations and frees
    2. Keep track of which memory is allocated and which is available
    3. Decide which memory to provide to fulfill an allocation request
    4. Immediately respond to requests without delay
    5. Return addresses that are 8-byte-aligned (must be multiples of 8).