L10

Today: string in, if/elif, string .find(), slices and drawing

See chapters in Guide: String - If

String `in` Test

String in - test if a substring appears in a string
Chars must match exactly - recall "case sensitive"
This is a boolean True/False test
Use .find() (below) to learn where the substring is
Mnemonic: Python re-uses the word "in" from the for loop
General: we'll use in later for other data types; strings are just the first example
Strategy: don't write code for something that Python has built-in
in is a good example

>>> 'Dog' in 'CatDogBird'
True
>>> 'dog' in 'CatDogBird'   # upper vs. lower case
False
>>> 'd' in 'CatDogBird'     # finds d at the end
True
>>> 'atD' in 'CatDogBird'   # not picky about words
True
>>> 
>>> 'x' in 'CatDogBird'
False

Variant: `not in`

There's also a not in form, which is True if the element is not in there, similar to !=. Use this form in an if-statement where you want to take an action if something is not in a string.

>>> s = 'CatDogBird'
>>> if 'Fish' not in s:     # YES this way
...     print('no Fish')
...
no Fish
>>>
>>> if not 'Fish' in s:     # NO: works, but not PEP8 style
...     print('no Fish')
...
no Fish
>>>

In Python style, the first form above - not in - is officially preferred to the second.

Example: has_pi()

> has_pi()

    '3.14' -> True
'a 14 b 3' -> True
     '315' -> False

has_pi(s): Given a string s, return True if it contains the substrings '3' and '14' somewhere within it, but not necessarily together. Use "in".

Note: these functions are in the string-3 section on the Experimental server.

Syntax That Does Not Work

The form below looks sensible and works in English. However, it does not work in Python or most computer languages:

if '3' and '14' in s:   # NO does not work
    ...

Unlike in English, the and cannot appear in just any position. The and must be between two boolean values, each produced by an expression like < or == or in. The correct form is shown below. Notice how each side of the and is an expression that produces a boolean.

if '3' in s and '14' in s:
    ...
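Putting the correct form into a complete function, a minimal has_pi() sketch could look like the following. It also shows why the broken form is dangerous: it runs without any error, because Python reads it as '3' and ('14' in s), and a non-empty string like '3' counts as True, so the whole expression just becomes '14' in s.

```python
def has_pi(s):
    """Return True if s contains both '3' and '14', not necessarily together."""
    # Each side of the "and" is its own complete boolean expression
    return '3' in s and '14' in s


s = 'x14x'                    # contains '14' but no '3'
print('3' and '14' in s)      # True  (the broken form gives the wrong answer)
print(has_pi(s))              # False (the correct form)
```

For the examples above, has_pi('3.14') and has_pi('a 14 b 3') return True, while has_pi('315') returns False.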

Strategy: Built In Functions

Python has many built-in functions, like "in" shown above, and we will see all the important ones in CS106A. You want to know the common built-in functions, since using a built-in is far preferable to writing code for it yourself; "in" is a nice example. It would be a mistake to manually write code to look through a string to see if another string appears in there. Just use "in" for that. The built-in works correctly, and it makes readable code, since Python programmers are already familiar with "in" and know what it does at a glance.

The "in" operator works for several data structures to see if a value is in there, and its use with strings is our first example of it. Also, in some cases, the built-in can run faster than what you could code yourself.
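For comparison only, here is a sketch of the kind of manual code that "in" saves us from writing. The helper name contains() is hypothetical, not a Python built-in. It tries every starting index and compares a slice (slices are covered later in this lecture) against the target. It gives the same answers as "in", but it is longer and easier to get wrong.

```python
def contains(s, target):
    """Hypothetical manual version of: target in s."""
    # Try every index where target could start and still fit inside s
    for i in range(len(s) - len(target) + 1):
        if s[i:i + len(target)] == target:
            return True
    return False


for target in ['Dog', 'dog', 'atD', 'x']:
    # The two columns on the right should always agree
    print(target, contains('CatDogBird', target), target in 'CatDogBird')
```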

Later Practice: has_first()

> has_first()

Example: catty()

> catty()

'xaCtxyzAx' -> 'aCtA'

Return a string made of the chars from the original string that are one of 'c', 'a', 't' (not case sensitive).

Catty Solution V1

Not case sensitive: convert each char to lowercase form with s[i].lower(), then do testing with the lowercase form.

This works correctly, but the if-test is quite long, so long that it is awkward to fit on screen. Can we do better?

Aside: see the section of the style guide on breaking up long lines for ways to handle a line like this.

def catty(s):
    result = ''
    for i in range(len(s)):
        if s[i].lower() == 'c' or s[i].lower() == 'a' or s[i].lower() == 't':
            result += s[i]
    return result

Strategy Idea: Add Var

The code is getting a little lengthy
The repeated s[i].lower() is irksome
Add var:
1. We have an intermediate value
2. Store the value in a well-named variable
3. Code below just uses the variable
Shorten the code, less repetitive typing
Variable name helps the code "read" better
Can help divide the code into more understandable pieces
Like the strategy of decomposition, but within a function
Also known as "decomp by var"

Catty Solution V2 - Better

Start with the V1 code. Add a variable to hold the repeated computation. This shortens the code, and the code "reads" better with the new variable.

    low = s[i].lower()

Solution with "low" variable, better

def catty(s):
    result = ''
    for i in range(len(s)):
        low = s[i].lower()   # Add var
        if low == 'c' or low == 'a' or low == 't':
            result += s[i]
    return result

We will make frequent use of this strategy in CS106A. If the solution is getting a little lengthy, add a variable to hold some sub-part of the computation for use on later lines.

Style: Variable Names

We'll talk about Style more later. For today, the name of a variable should label that value in the code, helping the programmer to keep their ideas straight. Other than that, the name can be short. The name does not need to repeat every true thing about the value. Just enough to distinguish it from other values in this algorithm.

1. Good names for this example, short but with key facts: low, low_char

2. Names with more detail, probably too long: low_char_i, low_char_in_s

3. Avoid this name: lower - the name would work, but we avoid naming a variable with a word that is also a function, to avoid confusion. Here .lower() is the name of a string function.

The V1 code above is acceptable, but V2 is shorter and nicer. The V2 code also runs slightly faster, as it does not needlessly re-compute the lowercase form three times per char.

Optional Aside: "in" Trick Form of "or"

This is just a coding trick, not something we would ever require or look for students to do. The way in works for strings, it can do the "or" logic for us, like this:

# instead of this
if low == 'c' or low == 'a' or low == 't':
    ...

# use "in"
if low in 'cat':
    ...
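As a sketch of how the trick fits into the full function, here is catty() V2 with the three == tests replaced by a single "in" test. This relies on low always being exactly one char: "in" is a substring test, so a longer or empty string such as 'ca' or '' would also pass.

```python
def catty(s):
    result = ''
    for i in range(len(s)):
        low = s[i].lower()
        # low is a single char, so this is True exactly
        # when low is 'c', 'a', or 't'
        if low in 'cat':
            result += s[i]
    return result


print(catty('xaCtxyzAx'))   # aCtA
```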

Recall: `if` and `if/else:`

The plain if is the most common: one test and one action
Next is if/else: one test selecting between two actions
Those two are the most common
For the case where there are N tests to go through, there is a third structure: if/elif

N Tests - `if/elif`

Use the if/elif structure to look through a series of tests, stopping at the first True test. This structure is used much more rarely than the plain if-statement.

The sequence is akin to looking through a series of drawers for a pen: you look in each drawer in turn, and stop as soon as you find the pen.

The structure has n if-tests.

if test1:
    action1
elif test2:
    action2
elif test3:
    action3
else:
    action4

Python goes through the tests from top to bottom, stopping at the first True test. Python runs the corresponding action, and then exits the if/elif structure. The result is that at most 1 of the n actions runs. An optional "else" at the end runs if none of the tests succeed. Mnemonic: the words "else" and "elif" are the same length.
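A small sketch of the "first True test wins" rule, using a hypothetical size_word() function. For n = 500, both n > 100 and n > 10 are True, but only the first matching action runs, so the result is 'huge'.

```python
def size_word(n):
    """Hypothetical example: describe the size of the number n."""
    if n > 100:
        word = 'huge'
    elif n > 10:
        word = 'big'      # only reached when n > 100 was False
    else:
        word = 'small'    # runs when none of the tests succeed
    return word


print(size_word(500))   # huge (n > 10 is also True, but it is never checked)
print(size_word(50))    # big
print(size_word(5))     # small
```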

Example: vowel_swap()

> vowel_swap()

The need for an if/elif structure is a little rare, but this problem is dialed in to show what if/elif solves.

The most common letters used in English text are: e, t, a, i, o, n

Here we process string s, swapping around the 3 vowels like this:

e -> a
a -> i
i -> e

This changes an English word in a way that looks like a word and is kind of funny.

'table' -> 'tibla'
'kitten' -> 'kettan'
'radio' -> 'rideo'

vowel_swap(s): Given string s. We'll swap around the three most common vowels in English, which are 'e', 'a', and 'i'. Return a form of s where each lowercase 'e' is changed to 'a', each 'a' is changed to 'i', and each 'i' is changed to 'e'. Other chars are left unchanged. So the word 'kitten' returns 'kettan'. The provided loop sets a variable ch to hold each char in turn, appending ch to the result. Add code to change ch.

vowel_swap() v1 Code

The provided loop sets a variable ch to be each char in turn. This solution is written with plain "if" to check and change each char. This code has a subtle problem.

def vowel_swap(s):
    result = ''
    for i in range(len(s)):
        ch = s[i]
        # Make changes to ch
        if ch == 'e':
            ch = 'a'
        if ch == 'a':
            ch = 'i'
        if ch == 'i':
            ch = 'e'
        
        result += ch
    return result

Run this code. Here is some incorrect output it produces:

'aaaa' -> 'eeee'

Why does it produce a bunch of 'e' chars here instead of the expected 'i'?

Problem Trace - Multiple If Interference

The problem is not obvious from glancing at the code. Trace through the v1 code carefully for the input 'aaaa'. The ch == 'a' if-test succeeds and changes ch to 'i', which is fine. But then the ch == 'i' test also succeeds, since ch now holds 'i', and changes it to 'e', which is a problem. We have multiple if-tests, and they are interfering with each other.

vowel_swap() Solution if/elif

With if/elif, only one if-test succeeds, which is what we want for this 'e' 'a' 'i' detection:

def vowel_swap(s):
    result = ''
    for i in range(len(s)):
        ch = s[i]
        # Make changes to ch
        if ch == 'e':
            ch = 'a'
        elif ch == 'a':
            ch = 'i'
        elif ch == 'i':
            ch = 'e'
        
        result += ch
    return result

if/elif vs. if/return

A return can accomplish something similar to the if/elif structure, which is why we have not really needed if/elif up until now. Suppose we are doing the vowel-swap algorithm, but in a function that processes a single char. This is our pick-off strategy, exiting the function once a solution is known.

def swap_ch(ch):
    """Vowel-swap on one char."""
    if ch == 'e':
        return 'a'
    if ch == 'a':
        return 'i'
    if ch == 'i':
        return 'e'
    return ch

Since the return exits the function, we get in effect the if/elif behavior. Once an if-test succeeds, the later ones are skipped.

However, the full-string vowel_swap() above cannot use return like this, as it needs to keep running the loop to do the other characters. We need to handle each char in the loop but without leaving the function, and for that, the if/elif is perfect.
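The two ideas can also be combined with decomposition. In this sketch, the loop stays in vowel_swap(), while each char is handed to the swap_ch() helper above. The return inside swap_ch() only exits swap_ch(), so the loop keeps going.

```python
def swap_ch(ch):
    """Vowel-swap on one char."""
    if ch == 'e':
        return 'a'
    if ch == 'a':
        return 'i'
    if ch == 'i':
        return 'e'
    return ch


def vowel_swap(s):
    result = ''
    for i in range(len(s)):
        # The helper's returns end the helper call, not this loop
        result += swap_ch(s[i])
    return result


print(vowel_swap('kitten'))   # kettan
```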

Later Practice: str_adx()

> str_adx()

str_adx(s): Given string s. Return a string of the same length. For every alphabetic char in s, the result has an 'a', for every digit a 'd', and for every other type of char the result has an 'x'. So 'Hi4!x3' returns 'aadxad'. Use an if/elif structure.
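One possible sketch, assuming that "alphabetic" and "digit" mean what Python's str.isalpha() and str.isdigit() report for each char:

```python
def str_adx(s):
    result = ''
    for i in range(len(s)):
        ch = s[i]
        # At most one of these three branches runs for each char
        if ch.isalpha():
            result += 'a'
        elif ch.isdigit():
            result += 'd'
        else:
            result += 'x'
    return result


print(str_adx('Hi4!x3'))   # aadxad
```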

String .find()

[Figure: the string 'Python' shown with index numbers 0..5]

s.find(target_str) - search s for target_str
Returns int index where found first, searching from start of s
Returns -1 if not found
Alternate form: 2nd "start_index" parameter, starts search from there
s.find(target_str, start_index) (shown on later example)

>>> s = 'Python'
>>> 
>>> s.find('th')
2
>>> s.find('o')
4
>>> s.find('y')
1
>>> s.find('x')
-1
>>> s.find('N')
-1
>>> s.find('P')
0
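A few more calls, as a sketch of two details: .find() reports only the first match, and, like "in", it is case sensitive. Since -1 means "not found", code typically checks for -1 before using the returned index.

```python
s = 'banana'
print(s.find('an'))    # 1: the first match, even though 'an' also starts at 3
print(s.find('nan'))   # 2
print(s.find('Ban'))   # -1: case sensitive, like "in"

# Common pattern: check for -1 before using the index
i = s.find('z')
if i == -1:
    print('no z')
```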

Strategy: Dense = Slow Down

Some lines of code are routine
Require just normal attention
An advantage of using idiomatic phrases
for i in range(len(s)):
But some lines are dense
Slow down for those, work carefully
Slices (below) are quite dense
Dense = Powerful!

Python String Slices 1

This is a fantastic feature
"substring" - contiguous sub-part of a string
Access substring with 2 numbers
"slice" uses colon to indicate a range of indexes
s[1:3] returns 'yt'
Start at first number
Up to but not including the second number (UBNI)
s[3:3] = empty string
"Not including" dominates the "starting at"
Can try it in the interpreter
Style: typically written with no spaces around ":"

[Figure: the string 'Python' shown with index numbers 0..5]

>>> s = 'Python'
>>> s[1:3]    # 1 .. UBNI
'yt'
>>> s[1:5]
'ytho'
>>> s[4:5]
'o'
>>> s[4:4]    # "not including" dominates
''
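A handy consequence of UBNI: when both numbers are in range and start <= end, the slice s[start:end] has exactly end - start chars. This sketch checks that claim for every such pair on 'Python'. For example, s[1:3] has 3 - 1 = 2 chars.

```python
s = 'Python'
for start in range(len(s) + 1):
    for end in range(start, len(s) + 1):
        # UBNI: the slice covers start, start+1, ..., end-1
        assert len(s[start:end]) == end - start

print(s[1:3], len(s[1:3]))   # yt 2
```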

Slices 2 - Can Omit Start/End Numbers

If start index is omitted, slice goes from start of string
If end index is omitted, slice goes through end of string
If a number is too big, the slice just goes through the end of the string
This is a little unusual
In other cases, Python will halt with an error if an index is too big, but with slices Python is permissive
Note perfect split: s[:4] and s[4:]
Number used as both start and end
Splits the string into 2 pieces exactly

[Figure: the string 'Python' shown with index numbers 0..5]

>>> s[:3]     # omit num = from/to end
'Pyt'
>>> s[:4]
'Pyth'
>>> s[4:]     # split str at 4
'on'
>>> s[4:999]  # too big = through the end
'on'
>>> s[:]      # the whole thing
'Python'
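The "perfect split" works for every index, not just 4. In this sketch, each number i is used as the end of one slice and the start of the other, so every char lands in exactly one of the two pieces, and gluing them back together gives the original string. A too-big number like 999 works too.

```python
s = 'Python'
for i in range(len(s) + 1):
    left_part = s[:i]     # chars before index i
    right_part = s[i:]    # chars from index i through the end
    assert left_part + right_part == s

assert s[:999] + s[999:] == s   # too big: first piece is everything
print(s[:4], s[4:])             # Pyth on
```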

brackets() Strategy - Drawing vs. OBO Errors

This is a nice example. The code is dense, but the details can be managed with careful use of variables and a drawing. This code can easily fail with off-by-one (OBO) errors, so we proceed carefully and get each line exactly right. Or more simply: don't try to do it in your head.

Example: brackets()

> brackets

'cat[dog]bird' -> 'dog'

brackets(s): Look for a pair of brackets '[...]' within s, and return the text between the brackets, so the string 'cat[dog]bird' returns 'dog'. If there are no brackets, return None. If the brackets are present, there will be only one of each, and the right bracket will come after the left bracket.

A first venture into using index numbers and slices. Many problems work in this domain - e.g. extracting all the hashtags from your text messages.

Problem spec: either 2 brackets, or zero brackets
Start with drawing 'cat[dog]bird'
Think of each line - add to drawing
The drawing keeps the details as we work out the code
Strategy:
Use s.find()
left = s.find('[')
right = s.find(']')
Switch between drawing and code
Add Var Strategy
Store in variable left for later lines
Nice to have words left and right in code narrative
Use the same words in the drawing
Look for right bracket
Use slice to pull out and return answer

Brackets Drawing

[Figure: drawing of 'cat[dog]bird', showing left and right before the arrows are added]

Brackets Observations

Code should work in general
BUT can use a concrete case to work out the numbers
e.g. 'cat[dog]bird'
Empty string input - works?
What about input 'cat[]'
What are left/right for this case (put on drawing)
Verify that our slice works on that case too - returning the empty string

Brackets Solution + Readable

def brackets(s):
    left = s.find('[')
    if left == -1:
        return None
    right = s.find(']')
    return s[left + 1:right]

For programming style, we prefer "readable" code: when the eye sweeps over the code, what the code does is apparent. This code is quite dense, but the variable names do help. Look at the last line. You can see how it uses the index numbers of the left and right brackets, even if the exact +1 OBO details are something to puzzle over. We'll talk more about readability soon.
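To connect the code to the drawing, this sketch prints the concrete numbers for 'cat[dog]bird' and also runs the edge cases from the observations list. For 'cat[]', left is 3 and right is 4, so the slice s[4:4] is the empty string.

```python
def brackets(s):
    left = s.find('[')
    if left == -1:
        return None
    right = s.find(']')
    return s[left + 1:right]


s = 'cat[dog]bird'
print(s.find('['), s.find(']'))   # 3 7
print(s[3 + 1:7])                 # dog
print(repr(brackets('cat[]')))    # '' (left=3, right=4, s[4:4])
print(brackets(''))               # None
```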

Brackets Drawing After

[Figure: drawing of 'cat[dog]bird', showing left and right with arrows pointing into the string]

Exercise: inside3x()

> inside3x()

'hi((yo))bye' -> 'yo,yo,yo'

inside3x(s): Given a string that may contain a pair of double parentheses, like 'aa((bbb))cc'. There is some text inside the parentheses and some before and after. Return a string like 'bbb,bbb,bbb', made of three copies of the inside text separated by commas. The string is guaranteed to either contain the double parentheses in the correct order, or contain no parentheses. The starting code includes the two s.find() calls.

Hint: make a drawing. Pull out the text inside, store it in a variable "inside". Use + to put together the result string. Add an if-statement to pick off the case that there are no parentheses. We cannot use "in" as a variable name, since it is a Python operator.

inside3x() Solution

def inside3x(s):
    left = s.find('((')
    right = s.find('))')
    if left == -1:
        return None
    inside = s[left + 2:right]  # Add var strategy
    return inside + ',' + inside + ',' + inside
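A quick trace on the example input, as a sketch of where the + 2 comes from. In 'hi((yo))bye', left is 2 (where '((' starts) and right is 6 (where '))' starts). The inside text begins 2 past left, since '((' is 2 chars wide, and the slice stops just before right, by UBNI.

```python
def inside3x(s):
    left = s.find('((')
    right = s.find('))')
    if left == -1:
        return None
    inside = s[left + 2:right]  # Add var strategy
    return inside + ',' + inside + ',' + inside


s = 'hi((yo))bye'
print(s.find('(('), s.find('))'))   # 2 6
print(s[2 + 2:6])                   # yo
print(inside3x(s))                  # yo,yo,yo
```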

Later Practice: at_3()

> at_3

Here is a more difficult problem, similar to brackets, for you to try. A drawing really helps with the OBO details on this one.

Milestone-1 - get the 'abc' output below, not worrying about whether the input is too short

Milestone-2 - add logic for the too-short case. Note the i < len(s) valid idea below.

at_3(s): Given string s. Find the first '@' within s. Return the len-3 substring immediately following the '@'. Except, if there is no '@' or there are not 3 chars after the '@', return None.

'xx@abcd' -> 'abc'
'xxabcd' -> None
'x@x' -> None

at_3() Hint: Valid Index `i < len(s)`

Zero-based indexing, e.g. for a string
Valid index numbers are 0, 1, 2, ... len-1
This means for a non-negative i:
i < len tests if i is valid
Often see < in tests for this "valid" pattern
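One possible sketch of at_3() that uses the valid-index idea. The 3 chars we want sit at indexes at + 1, at + 2, and at + 3, where at is the index of the '@', so the string is long enough exactly when at + 3 < len(s). The variable name at is just an illustrative choice.

```python
def at_3(s):
    at = s.find('@')
    if at == -1:
        return None
    # The last wanted char is at index at + 3, which must be valid
    if at + 3 >= len(s):
        return None
    # UBNI: end at at + 4 to include index at + 3
    return s[at + 1:at + 4]


print(at_3('xx@abcd'))   # abc
print(at_3('xxabcd'))    # None
print(at_3('x@x'))       # None
```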

More s.find() if we have time...

s.find() 2 Param Form

s.find() variant with 2 params: s.find(target, start_index) - start search at start_index vs. starting search at index 0. Returns -1 if not found, as usual. Use to search in the string starting at a particular index.

Suppose we have the string '[xyz['. How to find the second '[' which is at 4? Start the search at 1, just after the first bracket:

>>> s = '[xyz['
>>> s.find('[')      # find first [
0
>>> s.find('[', 1)   # start search at 1
4
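A common use of the 2-param form is to keep searching past each match. This is a sketch with a hypothetical count_occurrences() helper: each new search starts 1 past the previous match, and the loop stops when .find() returns -1. Because the search restarts 1 past each match, overlapping matches are counted, so 'aa' is found twice in 'aaa'.

```python
def count_occurrences(s, target):
    """Hypothetical helper: count matches of target in s using find()."""
    count = 0
    i = s.find(target)
    while i != -1:
        count += 1
        # Start the next search just after this match
        i = s.find(target, i + 1)
    return count


print(count_occurrences('[xyz[', '['))   # 2
print(count_occurrences('banana', 'an')) # 2
```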

Exercise: parens()

> parens()

'x)x(abc)xxx' -> 'abc'

This is a nice, realistic string problem with a little logic in it.

Think about this input: '))(abc)'. A plain s.find(')') returns 0 here, the index of a right paren that comes before the left paren.

Hint: here is some starting code, to find the right paren after the left paren:

    left = s.find('(')
    ...
    right = s.find(')', left)

1. This is fine:
right = s.find(')', left)

Is there a right parenthesis at index left? No, it is not possible for a right parenthesis to be at that exact index. We already know that index holds a left parenthesis.

2. Therefore, we could write it this way, starting the search for the right parenthesis 1 index farther along:
right = s.find(')', left + 1)

We can appreciate having the sort of analytical mind that works out that (2) will work. That said, keeping things as simple as possible (KISS) is a great strategy for code, so simply writing (1) is probably for the best.
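Putting the hint together, here is one possible sketch of parens(). The full problem statement is not reproduced here, so this assumes, as in brackets(), that the function returns None when there is no left paren, and that a right paren always appears somewhere after the left paren when there is one.

```python
def parens(s):
    left = s.find('(')
    if left == -1:
        return None          # assumption: no parens means None
    # Search for ')' only from the left paren onward,
    # skipping any stray ')' that comes earlier
    right = s.find(')', left)
    return s[left + 1:right]


print(parens('x)x(abc)xxx'))   # abc
print(parens('))(abc)'))       # abc
```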

Optional: Negative Slice

[Figure: negative index numbers into a string]

Optional / advanced shorthand - you never need to use this
Handy way to refer to chars near the end of string
Negative numbers to refer to chars at end of string
-1 is the last char
-2 is the next to last char
Works in slices etc.
Maybe just memorize this one:
s[-1] is the last char in s

>>> s = 'Python'
>>> s[len(s) - 1]
'n'
>>> s[-1]  # -1 is the last char
'n'
>>> s[-2]
'o'
>>> s[-3]
'h'
>>> s[1:-3]  # works in slices too
'yt'
>>> s[-3:]
'hon'
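The shorthand is just a different way of writing an ordinary index: s[-k] means s[len(s) - k]. This sketch checks that for every valid k on 'Python', and also checks that a negative number in a slice behaves the same way.

```python
s = 'Python'
for k in range(1, len(s) + 1):
    # s[-k] is shorthand for s[len(s) - k]
    assert s[-k] == s[len(s) - k]

assert s[1:-3] == s[1:len(s) - 3]   # both are 'yt'
print(s[-1], s[-len(s)])            # n P
```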
