Today: string in, if/elif, string .find(), slices and drawing

See chapters in Guide: String - If

String in Test

>>> 'Dog' in 'CatDogBird'
True
>>> 'dog' in 'CatDogBird'   # upper vs. lower case
False
>>> 'd' in 'CatDogBird'     # finds d at the end
True
>>> 'atD' in 'CatDogBird'   # not picky about words
True
>>> 
>>> 'x' in 'CatDogBird'
False

Variant: not in

There's also a not in form which is True if the element is not in there, similar to !=. Use this form in an if-statement where you want to take an action if something is not in a string.

>>> s = 'CatDogBird'
>>> if 'Fish' not in s:     # YES this way
        print('no Fish')
no Fish
>>>
>>> if not 'Fish' in s:     # NO works but not PEP8
        print('no Fish')
no Fish
>>>

In Python style, the first form above - not in - is officially preferred to the second.

Example: has_pi()

> has_pi()

    '3.14' -> True
'a 14 b 3' -> True
     '315' -> False

has_pi(s): Given a string s, return True if it contains the substrings '3' and '14' somewhere within it, but not necessarily together. Use "in".

Note: these functions are in the string-3 section on the Experimental server.

Syntax That Does Not Work

The form below looks sensible and works in English. However, it does not work in Python or most computer languages:

if '3' and '14' in s:   # NO does not work
    ... 

The and cannot appear in the variety of positions it can in English. The and must be between two boolean values, with each boolean value produced by an expression like < or == or in. The form above is actually read by Python as '3' and ('14' in s), so the '3' is never tested against s. The correct form is shown below. Notice how each side of the and is an expression that produces a boolean.

if '3' in s and '14' in s:
    ...
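
To see the difference in action, here is a minimal sketch. The name has_pi_wrong is made up for this illustration, and has_pi is just one possible way to write the has_pi() function described above. It relies on Python treating a non-empty string like '3' as True.

```python
def has_pi_wrong(s):
    # Python groups this as: '3' and ('14' in s).
    # The non-empty string '3' counts as True, so only '14' is checked.
    if '3' and '14' in s:
        return True
    return False


def has_pi(s):
    # Each side of the "and" is its own "in" test producing a boolean.
    if '3' in s and '14' in s:
        return True
    return False


print(has_pi_wrong('x14'))   # True, even though there is no '3'
print(has_pi('x14'))         # False
print(has_pi('a 14 b 3'))    # True
```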

Strategy: Built In Functions

Python has many built-in functions and operators, like "in" shown above, and we will see all the important ones in CS106A. You want to know the common built-ins, since using a built-in is far preferable to writing code for it yourself - "in" is a nice example. It would be a mistake to manually write code to look through a string to see if another string appears in there. Just use "in" for that. The built-in works correctly, and it makes readable code, since Python programmers are already familiar with "in" and know what it does at a glance.

The "in" operator works for several data structures to see if a value is in there, and its use with strings is our first example of it. Also, in some cases, the built-in can run faster than what you could code yourself.
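
As a small preview of "in" on other data structures, here is a sketch using a list and a dict. The variable names nums and ages are made up for this illustration. Note that for a dict, "in" looks at the keys, not the values.

```python
nums = [1, 2, 3]
ages = {'a': 1, 'b': 2}

print('Dog' in 'CatDogBird')   # string: substring test -> True
print(3 in nums)               # list: element test -> True
print('b' in ages)             # dict: key test -> True
print(2 in ages)               # dict values are not checked -> False
```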

Later Practice: has_first()

> has_first()


Example: catty()

> catty()

'xaCtxyzAx' -> 'aCtA'

Return a string made of the chars from the original string, whenever the chars are one of 'c' 'a' 't', (not case sensitive).

Catty Solution V1

Not case sensitive: convert each char to lowercase form with s[i].lower(), then do testing with the lowercase form.

This works correctly, but that if-test is quite long. Can we do better? Indeed, it's so long, it's awkward to fit on screen.

Aside: see the style guide section on breaking up long lines for a way to break up long lines like this.

def catty(s):
    result = ''
    for i in range(len(s)):
        if s[i].lower() == 'c' or s[i].lower() == 'a' or s[i].lower() == 't':
            result += s[i]
    return result

Strategy Idea: Add Var

Catty Solution V2 - Better

Start with the V1 code. Add a variable to hold the repeated computation. This shortens the code, and it "reads" better with the new variable.

    low = s[i].lower()

Solution with "low" variable, better

def catty(s):
    result = ''
    for i in range(len(s)):
        low = s[i].lower()   # Add var
        if low == 'c' or low == 'a' or low == 't':
            result += s[i]
    return result

We will make frequent use of this strategy in CS106A. If the solution is getting a little lengthy, add a variable to hold some sub-part of the computation for use on later lines.

Style: Variable Names

We'll talk about Style more later. For today, the name of a variable should label that value in the code, helping the programmer to keep their ideas straight. Other than that, the name can be short. The name does not need to repeat every true thing about the value. Just enough to distinguish it from other values in this algorithm.

1. Good names for this example, short but with key facts: low, low_char

2. Names with more detail, probably too long: low_char_i, low_char_in_s

3. Avoid this name: lower - the name would work, but we avoid naming a variable with a word that is also a function, to avoid confusion. Here .lower() is the name of a string function.

The V1 code above is acceptable, but V2 is shorter and nicer. The V2 code also runs slightly faster, as it does not needlessly re-compute the lowercase form three times per char.

Optional Aside: "in" Trick Form of "or"

This is just a coding trick, not something we would ever require or look for students to do. Because of the way "in" works for strings, it can do the "or" logic for us, like this:

# instead of this
if low == 'c' or low == 'a' or low == 't':
    ...

# use "in"
if low in 'cat':
    ...
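
Putting the trick into the V2 code might look like the sketch below. This is one possible variation, not a required form. It works here because low is always a single char. The "in" test would also be True for a longer piece like 'ca', so the trick is only safe when testing one char at a time.

```python
def catty(s):
    result = ''
    for i in range(len(s)):
        low = s[i].lower()
        if low in 'cat':   # True when low is 'c', 'a', or 't'
            result += s[i]
    return result


print(catty('xaCtxyzAx'))   # aCtA
print('ca' in 'cat')        # True, so only use this with a single char
```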

Recall: if and if/else:

N Tests - if/elif

Use the if/elif structure to look through a series of tests, stopping at the first True test. This is much more rarely used than the plain if-statement.

The sequence is akin to looking through a series of drawers for a pen — you look in each drawer in turn, and stop as soon as you find the pen.

The structure has n if-tests.

if test1:
  action1
elif test2:
  action2
elif test3:
  action3
else:
  action4

Python goes through the tests from top to bottom, stopping at the first True test. Python runs the corresponding action, and then exits the if/elif structure. The result is that at most 1 of the n actions runs. An optional "else" at the end runs if none of the tests succeed. Mnemonic: the words "else" and "elif" are the same length.
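
Here is a small sketch of the structure in real code. The function size_word() is a made-up example, not one of the course problems. Notice that for 500, both n >= 100 and n >= 10 are true, but only the first action runs.

```python
def size_word(n):
    """Hypothetical example: label a number by size."""
    if n >= 100:
        result = 'big'
    elif n >= 10:       # only checked if n >= 100 was False
        result = 'medium'
    elif n >= 1:
        result = 'small'
    else:               # none of the tests succeeded
        result = 'tiny'
    return result


print(size_word(500))   # big
print(size_word(50))    # medium
print(size_word(0))     # tiny
```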

Example: vowel_swap()

> vowel_swap()

The need for an if/elif structure is a little rare, but this problem is dialed in to show what if/elif solves.

The most common letters used in English text are: e, t, a, i, o, n

Here we process string s, swapping around the 3 vowels like this:

e -> a
a -> i
i -> e

This changes an English word in a way that looks like a word and is kind of funny.

'table' -> 'tibla'
'kitten' -> 'kettan'
'radio' -> 'rideo'

vowel_swap(s): Given string s. We'll swap around the three most common vowels in English, which are 'e', 'a', and 'i'. Return a form of s where each lowercase 'e' is changed to 'a', each 'a' is changed to 'i', and each 'i' is changed to 'e'. Leave other chars unchanged. So the word 'kitten' returns 'kettan'. The provided loop sets a variable ch to hold each char in turn, appending ch to the result. Add code to change ch.

vowel_swap() v1 Code

The provided loop sets a variable ch to be each char in turn. This solution is written with plain "if" to check and change each char. This code has a subtle problem.

def vowel_swap(s):
    result = ''
    for i in range(len(s)):
        ch = s[i]
        # Make changes to ch
        if ch == 'e':
            ch = 'a'
        if ch == 'a':
            ch = 'i'
        if ch == 'i':
            ch = 'e'
        
        result += ch
    return result

Run this code. Here is some incorrect output it produces:

'aaaa' -> 'eeee'

Why does it produce a bunch of 'e' instead of the expected 'i' here?

Problem Trace - Multiple If Interference

The problem is not obvious glancing at the code. Trace through the v1 code carefully for the input 'aaaa'. The ch == 'a' if-test succeeds, which is fine. But then the ch == 'i' test also succeeds, which is a problem. We have multiple if-tests, and they are interfering with each other.
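
The trace can also be done in code. This is a sketch of the three v1 if-statements for a single char, with a made-up list named steps added to record each value ch takes on.

```python
ch = 'a'
steps = [ch]          # record each value ch takes on
if ch == 'e':
    ch = 'a'
    steps.append(ch)
if ch == 'a':         # True: ch changes to 'i'
    ch = 'i'
    steps.append(ch)
if ch == 'i':         # Also True, because the previous if just set ch to 'i'
    ch = 'e'
    steps.append(ch)

print(steps)          # ['a', 'i', 'e']
```

The 'a' passes through 'i' on its way to 'e', which is why the output is all 'e'.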

vowel_swap() Solution if/elif

With if/elif, only one if-test succeeds, which is what we want for this 'e' 'a' 'i' detection:

def vowel_swap(s):
    result = ''
    for i in range(len(s)):
        ch = s[i]
        # Make changes to ch
        if ch == 'e':
            ch = 'a'
        elif ch == 'a':
            ch = 'i'
        elif ch == 'i':
            ch = 'e'
        
        result += ch
    return result

if/elif vs. if/return

A return can accomplish something similar to the if/elif structure, which is why we have not really needed if/elif up until now. Suppose we are doing the vowel-swap algorithm, but in a function that processes a single char. This is our pick-off strategy, exiting the function once a solution is known.

def swap_ch(ch):
    """Vowel-swap on one char."""
    if ch == 'e':
        return 'a'
    if ch == 'a':
        return 'i'
    if ch == 'i':
        return 'e'
    return ch

Since the return exits the function, we get in effect the if/elif behavior. Once an if-test succeeds, the later ones are skipped.

However, the full-string vowel_swap() above cannot use return like this, as it needs to keep running the loop to do the other characters. We need to handle each char in the loop but without leaving the function, and for that, the if/elif is perfect.
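
The two approaches can also be combined. As a sketch of an alternative (not the solution shown above), the loop could call swap_ch() for each char. The return exits swap_ch() only, so the loop in vowel_swap() keeps running.

```python
def swap_ch(ch):
    """Vowel-swap on one char."""
    if ch == 'e':
        return 'a'
    if ch == 'a':
        return 'i'
    if ch == 'i':
        return 'e'
    return ch


def vowel_swap(s):
    result = ''
    for i in range(len(s)):
        result += swap_ch(s[i])   # return exits swap_ch, not this loop
    return result


print(vowel_swap('kitten'))   # kettan
print(vowel_swap('aaaa'))     # iiii
```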

Later Practice: str_adx()

> str_adx()

str_adx(s): Given string s. Return a string of the same length. For every alphabetic char in s, the result has an 'a', for every digit a 'd', and for every other type of char the result has an 'x'. So 'Hi4!x3' returns 'aadxad'. Use an if/elif structure.


String .find()

alt:string 'Python' shown with index numbers 0..5

>>> s = 'Python'
>>> 
>>> s.find('th')
2
>>> s.find('o')
4
>>> s.find('y')
1
>>> s.find('x')
-1
>>> s.find('N')
-1
>>> s.find('P')
0

Strategy: Dense = Slow Down

Python String Slices 1

alt:string 'Python' shown with index numbers 0..5

>>> s = 'Python'
>>> s[1:3]    # 1 up to but not including (UBNI) 3
'yt'
>>> s[1:5]
'ytho'
>>> s[4:5]
'o'
>>> s[4:4]    # "not including" dominates
''

Slices 2 - Can Omit Start/End Numbers

alt:string 'Python' shown with index numbers 0..5

>>> s[:3]     # omit num = from/to end
'Pyt'
>>> s[:4]
'Pyth'
>>> s[4:]     # split str at 4
'on'
>>> s[4:999]  # too big = through the end
'on'
>>> s[:]      # the whole thing
'Python'
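
One way to check your understanding of the slice numbers is the little sketch below. The variable names front and back are made up for this illustration. Splitting a string at index i with s[:i] and s[i:] gives two pieces that add back up to the original.

```python
s = 'Python'
i = 4
front = s[:i]    # chars before index 4 -> 'Pyth'
back = s[i:]     # chars from index 4 through the end -> 'on'

print(front, back)           # Pyth on
print(front + back == s)     # True
print(len(s[1:5]))           # 4, which is 5 - 1
```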

brackets() Strategy - Drawing vs. OBO Errors

This is a nice example. The code is dense, but the details can be managed with careful use of variables and a drawing. This code can easily fail with Off-By-One (OBO) errors, but we try to proceed carefully and get each line exactly right. Or more simply — don't try to do it in your head.

Example: brackets()

> brackets()

'cat[dog]bird' -> 'dog'

brackets(s): Look for a pair of brackets '[...]' within s, and return the text between the brackets, so the string 'cat[dog]bird' returns 'dog'. If there are no brackets, return None. If the brackets are present, there will be only one of each, and the right bracket will come after the left bracket.

A first venture into using index numbers and slices. Many problems work in this domain - e.g. extracting all the hashtags from your text messages.

Brackets Drawing

alt: draw 'cat[dog]bird', show left, right before arrows added

Brackets Observations

Brackets Solution + Readable

def brackets(s):
    left = s.find('[')
    if left == -1:
        return None
    right = s.find(']')
    return s[left + 1:right]

For programming style, we prefer "readable" code: when the eye sweeps over the code, what the code does is apparent. This code is quite dense, but the variable names do help. Look at the last line. You can see how it is using the index numbers for the left and right brackets, even if the OBO of the exact numbers is something to puzzle over. We'll talk more about readability soon.

Brackets Drawing After

alt: draw 'cat[dog]bird', show left, right with arrows added pointing into string
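
The numbers in the drawing can be checked in code. This sketch prints the index numbers for 'cat[dog]bird', and also shows two slices that are each off by one.

```python
s = 'cat[dog]bird'
left = s.find('[')     # 3
right = s.find(']')    # 7

# Start 1 past the '['. The slice end is "up to but not including",
# so using right directly already leaves out the ']'.
print(left, right)              # 3 7
print(s[left + 1:right])        # dog
print(s[left:right])            # [dog   (OBO at the start)
print(s[left + 1:right + 1])    # dog]   (OBO at the end)
```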

Exercise: inside3x()

> inside3x()

'hi((yo))bye' -> 'yo,yo,yo'

inside3x(s): Given a string that may contain a pair of double parentheses, like 'aa((bbb))cc'. There is some text inside the parentheses and some before and after. Return a string like 'bbb,bbb,bbb', made of three copies of the inside text separated by commas. The string is guaranteed to either contain the double parentheses in the correct order, or contain no parentheses. The starting code includes the two s.find() calls.

Hint: make a drawing. Pull out the text inside, store in a variable "inside". Use + to put together the result string. Add an if-statement to pick off the case that there are no parentheses. We cannot use "in" as a variable name, since it is a Python operator.

inside3x() Solution

def inside3x(s):
    left = s.find('((')
    right = s.find('))')
    if left == -1:
        return None
    inside = s[left + 2:right]  # Add var strategy
    return inside + ',' + inside + ',' + inside

Later Practice: at_3()

> at_3()

Here is a more difficult problem, similar to brackets, for you to try. A drawing really helps with the OBO on this one.

Milestone-1 - get the 'abc' output below, not worrying about if the input is too short

Milestone-2 - add logic for the too-short case. Note the i < len(s) valid idea below.

at_3(s): Given string s. Find the first '@' within s. Return the len-3 substring immediately following the '@'. Except, if there is no '@' or there are not 3 chars after the '@', return None.

'xx@abcd' -> 'abc'
'xxabcd' -> None
'x@x' -> None

at_3() Hint: Valid Index i < len(s)
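
An index i is valid in s when i < len(s). Try the problem yourself first. After that, here is one possible sketch of how the valid-index idea could fit in, assuming the variable name at for the index of the '@'. The three chars wanted are at indexes at + 1, at + 2 and at + 3, so it is enough to check that the last one is valid.

```python
def at_3(s):
    at = s.find('@')
    if at == -1:
        return None
    # The 3 chars after '@' are at indexes at+1, at+2, at+3.
    # The last one must be a valid index: at + 3 < len(s).
    if at + 3 >= len(s):
        return None
    return s[at + 1:at + 4]


print(at_3('xx@abcd'))   # abc
print(at_3('xxabcd'))    # None
print(at_3('x@x'))       # None
```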


More s.find() if we have time...

s.find() 2 Param Form

s.find() variant with 2 params: s.find(target, start_index) - start the search at start_index instead of at index 0. Returns -1 if not found, as usual. Use this form to search in the string starting at a particular index.

Suppose we have the string '[xyz['. How to find the second '[' which is at 4? Start the search at 1, just after the first bracket:

>>> s = '[xyz['
>>> s.find('[')      # find first [
0
>>> s.find('[', 1)   # start search at 1
4
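
Instead of typing in the 1 by hand, the start index can be computed from the previous result. A small sketch, with made-up variable names first, second and third:

```python
s = '[xyz['
first = s.find('[')               # 0
second = s.find('[', first + 1)   # start just after the first -> 4
third = s.find('[', second + 1)   # no more brackets -> -1

print(first, second, third)       # 0 4 -1
```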

Exercise: parens()

> parens()

'x)x(abc)xxx' -> 'abc'

This is a nice, realistic string problem with a little logic in it.

Think about this input: '))(abc)'.

Hint: Here is some starting code, to find the right paren after the left paren:

    left = s.find('(')
    ...
    right = s.find(')', left)

1. This is fine:
right = s.find(')', left)

Is there a right parenthesis at index left? No, it is not possible for a right parenthesis to be at that exact index. We already know that index holds a left parenthesis.

2. Therefore, we could write it this way, moving the search for the right parenthesis 1 index farther along:
right = s.find(')', left + 1)

We can appreciate having the sort of analytical mind that works out that (2) will work. That said, keeping things as simple as possible, KISS, is a great strategy for code, and so simply writing (1) is probably for the best.
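
To confirm that forms (1) and (2) agree, here is a quick sketch on the input '))(abc)'. The names right_1 and right_2 are made up for this comparison. It also shows why a plain s.find(')') is not enough for this input.

```python
s = '))(abc)'
left = s.find('(')                  # 2
right_1 = s.find(')', left)         # form (1): search from left
right_2 = s.find(')', left + 1)     # form (2): search from left + 1

print(left, right_1, right_2)       # 2 6 6
print(s.find(')'))                  # 0, a plain find gets the wrong paren
print(s[left + 1:right_1])          # abc
```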

Optional: Negative Slice

alt: negative index into string

>>> s = 'Python'
>>> s[len(s) - 1]
'n'
>>> s[-1]  # -1 is the last char
'n'
>>> s[-2]
'o'
>>> s[-3]
'h'
>>> s[1:-3]  # works in slices too
'yt'
>>> s[-3:]
'hon'
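
A negative number works like len(s) plus that number. The sketch below checks a few of the examples above against the longer len(s) form.

```python
s = 'Python'

print(s[-2] == s[len(s) - 2])        # True, both are 'o'
print(s[1:-3] == s[1:len(s) - 3])    # True, both are 'yt'
print(s[-3:] == s[len(s) - 3:])      # True, both are 'hon'
```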