Today: more string algorithms, decomp-by-var, if/elif, quick-return, str.find(), slices

See chapters in Guide: String - If

Patterns

Solving a problem, it's natural to think of similar problems
There are natural "patterns"
Recognize and re-use patterns
"Patterns" - common code forms
Handy to know common patterns, idiomatic
Add the quick-return pattern today

Recall: String += Accumulate Pattern

result = '' - at start
result += xxxxx - in loop
return result - at end
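
A minimal sketch of the three accumulate steps in one function. The name double_char() is hypothetical, chosen only for this illustration.

```python
def double_char(s):
    result = ''                  # at start: empty result
    for i in range(len(s)):
        result += s[i] + s[i]    # in loop: add each char twice
    return result                # at end: return what accumulated


print(double_char('abc'))  # aabbcc
```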

String Character Classes

alt: character classes

Strings are made of characters, chars
Chars are divided into "classes"
"alpha" - alphabetic "word" chars, e.g. a-z
Non-roman alphabets have their own alphabetic chars
"digit" - 0-9
"space" - space, tab, newline
Python char test functions
Work on 1 or more chars
Returns True if true for all chars
False for the empty string
s.isdigit() - is digit char
s.isalpha() - is alphabetic char
s.isspace() - is space char
Space is mentioned for completeness, we'll concentrate on alpha and digit

>>> 'a'.isalpha()
True
>>> 'cat'.isalpha()
True
>>> '5'.isalpha()
False
>>> '5'.isdigit()
True
>>> '@'.isalpha()
False
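
A few more cases, as a sketch of the "all chars" and "empty string" rules above. The strings here are arbitrary examples.

```python
# True only if the test holds for every char in the string
print('cat5'.isalpha())    # False - '5' is not alphabetic
print('123'.isdigit())     # True

# The empty string gives False for each test
print(''.isalpha())        # False
print(''.isdigit())        # False

# Space chars: space, tab, newline
print(' \t\n'.isspace())   # True

# Alphabetic chars from a non-roman alphabet
print('Ω'.isalpha())       # True
```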

Uppercase / Lowercase chars

In some languages, alpha chars can have upper/lower pairings
e.g. 'a' is lowercase form of 'A'
s.upper() s.lower() - convert all chars in a string
A char with no upper/lower goes through unchanged, e.g. '@'
s.upper() - return uppercase form of s
s.lower() - return lowercase form of s
s.isupper() - True if s has at least one upper/lower char, and all such chars are uppercase
s.islower() - True if s has at least one upper/lower char, and all such chars are lowercase

>>> 'Kitten123'.upper()  # return with all chars in upper form
'KITTEN123'
>>> 'Kitten123'.lower()
'kitten123'
>>> 
>>> 'a'.islower()
True
>>> 'A'.islower()
False
>>> 'A'.isupper()
True
>>> '@'.islower()
False
>>> 'a'.upper()
'A'
>>> 'A'.upper()
'A'
>>> '@'.upper()
'@'
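
A short sketch of how isupper() / islower() treat strings that mix letters with chars that have no upper/lower form. The strings are arbitrary examples.

```python
# Digits and punctuation are ignored; only the letters are checked
print('ABC123'.isupper())    # True
print('kitten!'.islower())   # True

# One lowercase letter is enough to make isupper() False
print('Kitten'.isupper())    # False

# No letters at all: both tests are False
print('123'.isupper())       # False
print('123'.islower())       # False
```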

Example: alpha_up()

> Experimental server. These examples are on the "String2" section on the experimental server.

'12abc34' -> 'ABC'

Given string s. Return a string made of all the alphabetic chars in s, converted to uppercase form.

Use string functions .isalpha() and .upper()

Solution

def alpha_up(s):
    result = ''
    for i in range(len(s)):
        if s[i].isalpha():
            result += s[i].upper()
    return result

Example: catty()

'xCtxxxAax' -> 'CtAa'

Return a string made of the chars from the original string that are 'c', 'a', or 't' (either lower or upper case). So the string 'xaCxxxTx' returns 'aCT'.

Catty Version That Doesn't Work - V1

Here is a natural way to think of the code, but it does not work:

def catty(s):
    result = ''
    for i in range(len(s)):
        if s[i] == 'c' or s[i] == 'a' or s[i] == 't':
            result += s[i]
    return result

What is the problem? Upper vs. lower case. We are not getting any uppercase chars 'C' for example.
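
To see the bug concretely, here is the V1 code run on the example input. It is renamed catty_v1() here, a hypothetical name used only to keep it apart from the working versions.

```python
def catty_v1(s):
    result = ''
    for i in range(len(s)):
        # Bug: only matches the lowercase forms
        if s[i] == 'c' or s[i] == 'a' or s[i] == 't':
            result += s[i]
    return result


print(catty_v1('xCtxxxAax'))  # ta  (wanted: CtAa)
```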

Catty Solution V2

Solution: convert each char to lowercase form .lower(), then test.

Solution - this works, but that if-test is ugly

def catty(s):
    result = ''
    for i in range(len(s)):
        if s[i].lower() == 'c' or s[i].lower() == 'a' or s[i].lower() == 't':
            result += s[i]
    return result

Idea: Decomp By Var

The code is getting a little lengthy
Introduce a variable to hold an intermediate result
Advantages
1. Shorten the code, less repetitive typing
2. Variable name helps the code "read" better
3. Sort of decomp within a function - break the big thing into little steps

Decomp Var Steps

Some phrase X repeated in code
e.g. s[i].lower()
Used several times, or it's just wordy to type
Create a variable, compute once and store
low = s[i].lower()
Use that variable on later lines
Variable name noun - code "reads" better
Aside: can name the var anything, and the code still works

Catty Solution V3 - Better

Create variable to hold the lengthy computation - shorter code and "reads" better.

    low = s[i].lower()

The complete solution

def catty(s):
    result = ''
    for i in range(len(s)):
        low = s[i].lower()   # decomp by var
        if low == 'c' or low == 'a' or low == 't':
            result += s[i]
    return result

Style aside: the name of a variable should remind us of the role of that data in the local code. Other than that, the name can be short. The name does not need to repeat every true thing about the value. Just enough to distinguish it from other values in this algorithm.

Good names, short but including essential details: low, low_char

Names with more detail, probably too long: low_char_i, low_char_in_s

The V2 code above is acceptable, but V3 is shorter and nicer. The V3 code also does less work, as it computes the lowercase form once per char instead of up to three times.

if/elif Structure

if/elif - a series of if-tests
Evaluate test1, test2, test3
As soon as a test is true
Runs that action
Exits the if/elif structure
No further tests/actions will run
Optional "else:" at end runs if no test is true
The plain if-statement is used most often
Only use this form when you have a series of tests
Mnemonic: "else" and "elif" are the same length

if test1:
  action-1
elif test2:
  action-2
else:
  action-3
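
A sketch contrasting if/elif with a series of plain ifs. The function names and the thresholds 100 and 10 are made up for illustration. With 500, both tests are true, but if/elif runs only the first matching action.

```python
def describe(n):
    result = ''
    if n > 100:
        result += 'big'
    elif n > 10:           # skipped when the first test was true
        result += 'medium'
    else:                  # runs only if no test was true
        result += 'small'
    return result


def describe_two_ifs(n):
    result = ''
    if n > 100:
        result += 'big'
    if n > 10:             # separate if: always evaluated
        result += 'medium'
    return result


print(describe(500))           # big
print(describe(50))            # medium
print(describe(5))             # small
print(describe_two_ifs(500))   # bigmedium
```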

Example str_adx()

If/elif logic to check for different char types
alpha char → 'a', digit → 'd', otherwise → 'x'
e.g. 'Z5$$t' → 'adxxa'

Solution

def str_adx(s):
    result = ''
    for i in range(len(s)):
        if s[i].isalpha():
            result += 'a'
        elif s[i].isdigit():
            result += 'd'
        else:
            result += 'x'
    return result

Pattern: Quick Return

Loop over, looking for X
If find X, return answer immediately
If there is no X found in loop
Run return line after the loop - "not found" case

Example: first_alpha()

'123abc' -> 'a'

'123-456' -> None

Given a string s, return the first alphabetic char in s, or None if there is no alphabetic char. Demonstrates quick-return strategy.

1. Hinges on the fact that return exits the function immediately, so we can bury the return inside a loop or if-statement, structured to return a result immediately if the code finds one.

2. If the code gets to the end of the loop, the condition was never found. Like the dog that did not bark in the Sherlock Holmes story, the absence is itself the clue: the loop never hit a true test. So what can we conclude about the string? In this case, there are no alpha chars in the string.

first_alpha() Solution

def first_alpha(s):
    for i in range(len(s)):
        if s[i].isalpha():
            return s[i]
            # 1. Exit immediately if found
    # 2. If we get here,
    # there was no alpha char.
    return None

Optional Exercise: has_digit()

'abc123' -> True

'abc' -> False

Use quick-return strategy, returning True or False.

Given a string s, return True if there is a digit in the string somewhere, False otherwise.

Solution - same pattern as first_alpha(), but returns boolean instead of a char

def has_digit(s):
    for i in range(len(s)):
        if s[i].isdigit():
            # 1. Exit immediately if found
            return True
    # 2. If we get here,
    # there was no digit.
    return False

String find()

alt:string 'Python' shown with index numbers 0..5

s.find(target_str) - search s for target_str
Returns int index where found first, searching from start of s
Returns -1 if not found
str noun.verb function call style (aka "object oriented")
Alternate form: 2nd "start_index" parameter, starts search from there
s.find(target_str, start_index) (use this later)

>>> s = 'Python'
>>> s.find('t')
2
>>> s.find('th')
2
>>> s.find('n')
5
>>> s.find('N')
-1
>>> s.find('x')
-1
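
A small sketch of the 2-parameter form, s.find(target_str, start_index). The string 'banana' and the variable names are just for illustration. Each search starts one past the previous hit.

```python
s = 'banana'

first = s.find('an')               # search from the start
second = s.find('an', first + 1)   # search starting after the first hit
third = s.find('an', second + 1)   # no more occurrences

print(first)    # 1
print(second)   # 3
print(third)    # -1
```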

Strategy: Dense = Slow Down

Some lines of code are routine
Require just normal attention
An advantage of using idiomatic phrases
for i in range(len(s)):
But some lines are dense
Slow down for those, work carefully
Slices (below) are dense!
Dense = Powerful!

Python String Slices

This is a fantastic feature
"substring" - contiguous sub-part of a string
Access substring with 2 numbers
"slice" uses colon to indicate a range of indexes
s[1:3] returns 'yt'
Start at first number
Up to but not including second number UBNI
s[3:3] = empty string
"Not including" dominates the "starting at"
Try it in the interpreter
Style: typically written with no spaces around ":"

alt:string 'Python' shown with index numbers 0..5

>>> s = 'Python'
>>> s[1:3]
'yt'
>>> s[1:5]
'ytho'
>>> s[4:5]
'o'
>>> s[4:4]    # "not including" dominates
''

Omit Start/End Index

If start index is omitted, goes from start of string
If end index is omitted, goes through end of string
If a number is too big, the slice just goes through the end of the string
Note perfect split: s[:4] and s[4:]
Number used as both start and end
Splits the string into 2 pieces exactly

>>> s[:3]     # omit = from/to end
'Pyt'
>>> s[4:]
'on'
>>> s[4:999]  # too big = to end
'on'
>>> s[:4]     # "perfect split" on 4
'Pyth'
>>> s[4:]
'on'
>>> s[:]      # the whole thing
'Python'
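
A sketch checking the "perfect split" idea: for any index n, the two slices s[:n] and s[n:] put back together give the original string. The helper name split_at() is hypothetical.

```python
def split_at(s, n):
    # Same number used as end of the first slice and start of the second
    return s[:n], s[n:]


s = 'Python'
print(split_at(s, 4))   # ('Pyth', 'on')

# Works for every n from 0 through len(s)
for n in range(len(s) + 1):
    left, right = split_at(s, n)
    assert left + right == s
```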

Example: brackets()

(This is in the String3 category on the server.)

A first venture into using index numbers and slices. Many problems work in this domain - e.g. extracting all the hashtags from your text messages.

'cat[dog]bird' -> 'dog'

> brackets

Problem spec: either 2 brackets, or zero brackets
Strategy:
Use s.find()
left = s.find('[')
right = s.find(']')
Switch between drawing and code
Decomp by var
Store in variable left for later lines
Nice to have words left and right in code narrative
Look for right bracket
Use slice to pull out and return answer

Brackets Drawing

alt: draw string plan for brackets

Brackets Observations

Make a drawing - work out the index numbers
Diagram / example strategy
Code should work in general
BUT can use specific string to work out numbers
e.g. 'cat[dog]bird'
Empty string input - works?
What about input 'a[]z'
Verify that our slice works here too

Solution

def brackets(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    # Use slice to pull out chars between left/right
    # make a drawing!
    return s[left + 1:right]
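
A sketch that tries the cases raised in the observations: the drawing example, the empty string, 'a[]z', and a string with no brackets. The function is repeated here so the block runs on its own.

```python
def brackets(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    return s[left + 1:right]


print(brackets('cat[dog]bird'))   # dog
print(repr(brackets('')))         # '' - no brackets, find() gives -1
print(repr(brackets('a[]z')))     # '' - slice s[2:2] is empty
print(repr(brackets('abc')))      # ''
```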

Optional Exercise: at_3()

Here is a problem similar to brackets for you to try

> at_3

Optional: Negative Slice

alt: negative index into string

We'll cover this someday maybe
Optional / advanced shorthand
Handy to refer to chars at end of string instead of beginning
Negative numbers to refer to chars at end of string
-1 is the last char
-2 is the next to last char
Works in slices etc.

>>> s = 'Python'
>>> s[len(s)-1]
'n'
>>> s[-1]
'n'
>>> s[-2]
'o'
>>> s[-3]
'h'
>>> s[1:-3]
'yt'
>>> s[-3:]
'hon'
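
A sketch of how negative indexes relate to regular ones: s[-n] refers to the same char as s[len(s) - n]. The loop just checks that for each valid n.

```python
s = 'Python'

# -1 through -6 line up with len(s)-1 down through 0
for n in range(1, len(s) + 1):
    assert s[-n] == s[len(s) - n]

print(s[-3:])   # hon  - last 3 chars
print(s[:-3])   # Pyt  - everything except the last 3 chars
```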