Today: while, break parsing, parse words out of string patterns

Data and Parsing

Here's some fun looking data...

$GPGGA,005328.000,3726.1389,N,12210.2515,W,2,07,1.3,22.5,M,-25.7,M,2.0,0000*70
$GPGSA,M,3,09,23,07,16,30,03,27,,,,,,2.3,1.3,1.9*38
$GPRMC,005328.000,A,3726.1389,N,12210.2515,W,0.00,256.18,221217,,,D*78
$GPGGA,005329.000,3726.1389,N,12210.2515,W,2,07,1.3,22.5,M,-25.7,M,2.0,0000*71
$GPGSA,M,3,09,23,07,16,30,03,27,,,,,,2.3,1.3,1.9*38
$GPRMC,005329.000,A,3726.1389,N,12210.2515,W,0.00,256.18,221217,,,D*79
$GPGGA,005330.000,3726.1389,N,12210.2515,W,2,07,1.3,22.5,M,-25.7,M,3.0,0000*78
$GPGSA,M,3,09,23,07,16,30,03,27,,,,,,2.3,1.3,1.9*38
...

Foreshadow: Advance With var += 1

'xx @abc @xyz xx'

See "parse1" Problems on Server

See the "parse1" examples on the experimental server

Start off with some Notes, use them later on real examples

Note 1: Loop Break - censored()

Censor problem: censored(n, censor): Given a non-negative int n. Return a list of the ints [1, 2, 3, ... n]. Except as soon as a number in the given censor list is seen, end the list without that number, so censored(10, [5, 4]) returns [1, 2, 3]. Use "break"

Solution

def censored(n, censor):
    nums = []
    for i in range(1, n + 1):
        if i in censor:  # Key: if-break
            break
        nums.append(i)
    # "break" jumps to here
    return nums

Note 2: Can you control i Within Range Loop? No!

Tempting: inside a loop, control i with = for some case. This does not work at all. At the top of the loop, for/range uses = to set i to its liking. Therefore: cannot use for/i/range if we want to adjust i with our own code.

def censored(n, censor):
    nums = []
    for i in range(n):
        ....
        if something:
            i += 10    # NO does not work
        ...
    return nums

Aside: continue

for i/range vs. while

The for/i/range form is great for going through numbers which you know ahead of time - a common pattern in real programs. However, while is more flexible - can test as we go, not needing to know ahead of time. Ultimately you need both forms.

Note 3: while Equivalent of for/range

Use for/i/range if have a series of numbers to step through. That is a common case, and for/i/range is perfect for it. We'll use while for situations that require more flexibility.

Here is the while-equivalent to for i in range(n)

i = 0         # 1. init
while i < n:  # 2. test
    # use i
    i += 1    # 3. update, loop-bottom (easy to forget)

Example while_double()

double_char() written as a while (using a range() is easier for this problem, so this just demonstrates what while would look like)

def while_double(s):
    result = ''
    i = 0
    while i < len(s):
        result += s[i] + s[i]
        i += 1
    return result

Recall: s.find(target, start)

The str.find() function with an added 2 parameter, start index, indicates where to begin scanning for the target. The default start index is 0

>>> s = 'xx[abc[xx'
>>> s.find('[')     # start = 0, the default
2
>>> s.find('[', 2)  # start = 2, no help
2
>>> s.find('[', 3)  # start = 3, find next one
6

Today - Bad news / Good news

Here we have three main code examples for today. Bad news: these are challenging. Good news: these follow a parsing pattern that we'll use again and again so you get used to it. It will also be useful in your future career.

1. all_lefts()

This function is a halfway point, working out half of the difficulties of really solving this.

Want - find all the left brackets in s, return a list of them:

'xx[xxx[xxxx[xxx' -> ['[', '[', '[']

Figure Out all_lefts()

Challenges:

Space for a drawing, work out the algorithm:



'xx[xxx[xxxx[xxx'

Given string s. Return a list of all the '[' strings in s. Use s.find() within a while loop to find all the '['.

Use search as index variable, marking current position of search within s - this will be a stereotypical pattern for searching through a string. Starting code - work from here with drawing

def all_lefts(s):
    search = 0
    result = []
    while search < len(s):
        # code in here:
        # -s.find() to find the '['
        # -slice to grab '['
        # -update search = ???
        # -if/break when no more '['
    return result

Solution

def all_lefts(s):
    search = 0
    result = []
    while search < len(s):
        # Find '[', at search index
        left = s.find('[', search)
        # No '[' -> exit loop
        if left == -1:
            break
        result.append(s[left:left + 1])
        # Update search for next iteration
        search = left + 1
    return result


alt: drawing in s=search and L=left on top of input string to work out algorithm

2. all_brackets()

Now use both left and right brackets, so this is more realistic. Find all bracket-pairs in s, return a list of all the contained data

'xx[abc]xxx[hi]xxx[woot]xxx' -> ['abc', 'hi', 'woot']

Given string s. Return a list of all the 'abc' strings for each '[abc]' substring within s. For each left bracket, find the right bracket that follows it. End the search if no left or right bracket is found. Use s.find() within a while loop.

Use the all_brackets() code as a starting point. Challenges:

Figure Out all_brackets()

Input case: 'xx[abc]xx[42]xx' -> ['abc', '42']

Draw on top of the input to work out the algorithm:



'xx[abc]xx[42]xx'

Standard CS106A steps - draw an input case, introduce vars left + right on the drawing to work out the details. Run it.

Starter code:

def all_brackets(s):
    search = 0
    result = []
    while search < len(s):
        left = s.find('[', search)
        if left == -1:
            break

        # Your code here





    return result

all_brackets() Solution


alt: drawing in s=search and L=left R=right on top of input string to work out algorithm

def all_brackets(s):
    search = 0
    result = []
    while search < len(s):
        left = s.find('[', search)
        if left == -1:
            break
        right = s.find(']', left)  # or left+1
        if right == -1:
            break
        result.append(s[left + 1:right])
        # Update search at loop end
        search = right  # or right+1
    return result

Q: What does while search < len(s): do?

When to Use while True:

search = right + 1 ?


3. at_words() Example

at_words(s): For each '@' in s, parse out the "word" substring of 1 or more alphabetic chars which immediately follow the '@', so '@abc @ @xyz' returns ['abc', 'xyz'].

'xx @abc xx @xyz' -> ['abc', 'xyz']

at_words() #1

Say we have a loop structure to find the '@' as we have before

    at = s.find('@', search)
    if at == -1:
        break

    end = at + 1
    # loop, advance end past alpha chars


'xx @abc @xyz xx'

at_words(): Find End of Alpha Chars

    end = at + 1
    while s[end].isalpha():
        end += 1

at_words(): After "end" computed

    word = s[at + 1:end]
    result.append(word)
    search = end

at_words(): End Bug

There is a bug in the end/loop. It has to do with this input case:



'@abc'
 0123

Here is the code again. Think about how the loop works when advancing "end" for '@xyz':

    end = at + 1
    while s[end].isalpha():
        end += 1

Problem: keep advancing "end" .. past the end of the string, eventually end is 4. Then the code s[end].isalpha() throws an error since end (4) is past the end of the string.

The loop above translates to: "advance end so long as the char it refers to is alphabetic"

To fix the bug, we modify the test to: "advance end so long as it refers to a char that exists and that char is alphabetic"

Fix End Bug

at_words() Case #2 - Zero Chars

at_words() Solution

def at_words(s):
    search = 0
    words = []
    while True:
        at = s.find('@', search)
        if at == -1:
            break
            
        # Pass over alpha chars to find end
        end = at + 1
        while end < len(s) and s[end].isalpha():
            end += 1
        
        word = s[at + 1:end]
        # Screen out len-0 word
        if len(word) > 0:
            words.append(word)
        
        # Set up next iteration
        search = end
    return words

Extra Example exclaim_words()

exclaim_words(s): For each '!' in s, parse out the "word" substring of one or more alphabetic chars which are immediately to the left of the '!'. Return a list of all such words including the '!', so 'x hey!@ho! returns ['hey!', 'ho!']. (Like at_words, but right-to-left)