Section 5. String Parsing and Dictionaries


Written by Ecy King, Juliette Woodrow, Brahm Capoor, Nick Parlante, Anna Mistele, John Dalloul, and Zheng Lian

Section 5: Strings, Files, and Dictionaries

CONGRATS on finishing up your midterm! You've learned a lot so far. This week in section you will get some practice with strings, file reading, and dictionaries. There are more problems here than we expect you to get through in section, so feel free to use the rest as practice!


Here is the zip file for the project:

PyCharm Project


String Warm Ups

  • Write a function called intersect(a, b). Given two strings, a and b, return a version of a that includes only those chars which also appear in b. Use case-sensitive comparisons. Use a for/ch/s loop and "in". For example:

    • intersect('Kitten', 'tan') should return 'ttn'
    • intersect('Apple', 'ale') should return 'le'
    • intersect('Apple', 'ALE') should return 'A'
    • intersect('Apple', '') should return ''
  • Write a function called alpha_list(s). Given a string s, build and return a list of all the alphabetic chars in s, so the string 'ow#w' returns ['o', 'w', 'w']. Use list.append() to add to the list and the loop of your choice to loop over the string. For example:

    • alpha_list('AbCd$') should return ['A', 'b', 'C', 'd']
    • alpha_list('!@#a$%^') should return ['a']
    • alpha_list('@') should return []
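
If you get stuck on the warm ups, here is one possible way to approach them. This is a sketch rather than the official solution; the variable names result and chars are our own choices, and other loop styles work just as well.

```python
def intersect(a, b):
    """Return the chars of a that also appear in b (case-sensitive)."""
    result = ''
    for ch in a:
        # "in" on a string checks whether ch occurs anywhere in b
        if ch in b:
            result += ch
    return result


def alpha_list(s):
    """Return a list of the alphabetic chars in s, in their original order."""
    chars = []
    for ch in s:
        if ch.isalpha():
            chars.append(ch)
    return chars
```

Both functions follow the same pattern: start with an empty result, loop over the string one char at a time, and keep only the chars that pass a test.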

String Problems

  • Write a function called is_peaceful(word) that returns True if the word is peaceful and False otherwise. We say that a word is peaceful if its letters are in alphabetical order. You may assume you have access to a constant ALPHABET, which is a string of the uppercase letters of the alphabet in sequential order, i.e., ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

    • For example, ABS, ALMOST, CHIPS, DIRTY, FIRST, and HOST are all peaceful words.
  • Write a function is_stacatto(word) that returns True if a word is a stacatto word and False otherwise. We say that a word is a stacatto word if all of the letters in even positions are vowels (i.e., the second, fourth, sixth, etc. letters are vowels).

    • For this problem, the vowels are A, E, I, O, U, and Y.
    • For example, AUTOMATIC, CAFETERIA, HESITATE, LEGITIMATE, and POPULATE are stacatto words.
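
One way these two checks might look in code is sketched below. It makes a few assumptions that the problem statement leaves open: the word contains only letters, case is ignored by uppercasing first, and repeated letters (as in 'ABBEY') still count as alphabetical order. The VOWELS constant is a name we made up for this sketch.

```python
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
VOWELS = 'AEIOUY'  # hypothetical helper constant, not part of the handout


def is_peaceful(word):
    """Return True if the letters of word are in alphabetical order."""
    word = word.upper()  # assumption: treat lowercase and uppercase alike
    for i in range(len(word) - 1):
        # Compare each letter's position in ALPHABET with the next letter's
        if ALPHABET.find(word[i]) > ALPHABET.find(word[i + 1]):
            return False
    return True


def is_stacatto(word):
    """Return True if the 2nd, 4th, 6th, ... letters of word are all vowels."""
    word = word.upper()
    # Index 1 is the second letter, so step through indices 1, 3, 5, ...
    for i in range(1, len(word), 2):
        if word[i] not in VOWELS:
            return False
    return True
```

Both functions return False as soon as they find a letter that breaks the rule, and return True only after the loop finishes without finding one.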

Full Program

  • Suppose you're given a file called many_words.txt that contains many words in the English language, where each one is on a different line. Write the following functions, using the functions you wrote in the previous problem:

    • count_peaceful(filename) which returns the number of English words that are peaceful.
    • count_stacatto(filename) which returns the number of English words that are stacatto words.
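
A possible shape for the file-reading part is sketched below. So that the example runs on its own, it includes compact stand-ins for is_peaceful and is_stacatto; in your own code you would call the versions you wrote for the previous problem. The helper count_matching is a name invented for this sketch, and skipping blank lines is an assumption about the file.

```python
def is_peaceful(word):
    # Compact stand-in: letters already in sorted order
    word = word.upper()
    return list(word) == sorted(word)


def is_stacatto(word):
    # Compact stand-in: every 2nd, 4th, 6th, ... letter is a vowel
    return all(ch in 'AEIOUY' for ch in word.upper()[1::2])


def count_matching(filename, check):
    """Hypothetical helper: count the words in filename for which check(word) is True."""
    count = 0
    with open(filename) as f:
        for line in f:
            word = line.strip()  # remove the trailing newline
            if word and check(word):  # assumption: ignore blank lines
                count += 1
    return count


def count_peaceful(filename):
    return count_matching(filename, is_peaceful)


def count_stacatto(filename):
    return count_matching(filename, is_stacatto)
```

Since the two counting functions differ only in which test they apply, the shared loop lives in one place. Writing the loop out twice, once in each function, is also perfectly fine.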

String Parsing

Parse Out Hashtags

  • Implement a function, parse_out_hashtag(s), which takes in a string representing a single tweet and returns a list containing all of the hashtags within the tweet. A hashtag is defined as a string of 1 or more alphanumeric characters immediately following a '#' character. A single hashtag ends at the first non-alphanumeric character following the '#'. For example:
    • parse_out_hashtag('You are so cool #amazing') should return the list ['amazing']
    • parse_out_hashtag('I love #cats #dogs #hamsters') should return the list ['cats', 'dogs', 'hamsters']
    • parse_out_hashtag('I do not like hashtags') should return []
    • parse_out_hashtag("So far, we've covered #strings #lists #bit #functions") should return ['strings', 'lists', 'bit', 'functions']
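
The definition above suggests a find-then-scan loop: find the next '#', then advance an end index while the chars are alphanumeric. Here is a sketch of that idea. The variable names search, start, and end are our own, and skipping a '#' with nothing alphanumeric after it follows from the "1 or more" wording.

```python
def parse_out_hashtag(s):
    """Return a list of the hashtags in s, without the leading '#'."""
    hashtags = []
    search = 0  # index where the next search for '#' begins
    while True:
        start = s.find('#', search)
        if start == -1:
            break  # no more '#' chars in the string
        # Advance end past every alphanumeric char after the '#'
        end = start + 1
        while end < len(s) and s[end].isalnum():
            end += 1
        tag = s[start + 1:end]
        if tag:  # a hashtag needs at least 1 alphanumeric char
            hashtags.append(tag)
        search = end
    return hashtags
```

Note the end < len(s) check in the inner loop: it keeps the scan from running off the end of the string when a hashtag is the last thing in the tweet.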

Dictionaries

  • Write a function create_nums(n). Given a non-negative int n, create and return a "nums" dict that has a key for every number in the range 0..n-1, where each value is the string form of that number. For example:
    • n=3 returns {0: '0', 1: '1', 2: '2'}
  • Write a function year_count(dates). You are given a list of date strings, all of which have the m/d/y form, like '3/25/2020'. Build and return a counts dict, where each key is an int year and its value is the number of dates in the list with that year. For example:
    • year_count(['1/2/2019', '11/12/2019', '12/4/2019', '1/6/2019']) should return {2019: 4}
    • year_count(['1/2/2019', '11/12/2020', '12/4/2021', '1/6/2022']) should return {2019: 1, 2020: 1, 2021: 1, 2022: 1}
    • year_count(['1/1/2000']) should return {2000: 1}
    • year_count([]) should return {}
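
For reference, here is one way the two dictionary problems could be approached. Treat it as a sketch: the parameter name dates is our own, and using split('/') to pull out the year is just one option (slicing after the last '/' found with s.find() or s.rfind() would work too).

```python
def create_nums(n):
    """Return a dict mapping each int in 0..n-1 to its string form."""
    nums = {}
    for i in range(n):
        nums[i] = str(i)
    return nums


def year_count(dates):
    """Return a dict mapping each int year to how many dates have that year."""
    counts = {}
    for date in dates:
        # '3/25/2020'.split('/') gives ['3', '25', '2020']; the year is last
        year = int(date.split('/')[-1])
        if year not in counts:
            counts[year] = 0  # first time we see this year
        counts[year] += 1
    return counts
```

The "if key not in dict, set it to 0, then add 1" pattern in year_count is the standard way to build a counts dict.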