#include "lexicon.h"

class Lexicon

This class is used to represent a lexicon, or word list. The main difference between a lexicon and a dictionary is that a lexicon does not provide any mechanism for storing definitions; the lexicon contains only words, with no associated information. It is therefore similar to a set of strings, but with a different internal representation. The Lexicon class supports efficient lookup operations for words and prefixes.

This version of Lexicon is implemented internally using a data structure called a prefix tree or trie (pronounced "try"), which allows for efficient prefix and word searching. Some runtimes below are expressed in terms of "W" where W represents the length of the word being added.

Constructor
Lexicon() O(1) Initializes a new empty lexicon.
Lexicon(filename) 
Lexicon(istream) 
O(WN) Initializes a new lexicon that reads words from the given file or input stream.
Methods
contains(word)  O(W) Returns true if word is contained in the lexicon.
containsPrefix(prefix)  O(W) Returns true if any words in the lexicon begin with prefix.
size()  O(1) Returns the number of words contained in the lexicon.
toString()  O(1) Returns a printable string representation of this lexicon.
Operators
for (string word : lex)  O(N) Iterates through the words in a lexicon in alphabetical order.
ostream << lex  O(N) Outputs the contents of the lexicon to the given output stream.
istream >> lex  O(WN) Reads the contents of the given input stream into the lexicon.

Constructor detail


Lexicon();
Lexicon(istream& input);
Lexicon(string filename);
Initializes a new lexicon. The default constructor creates an empty lexicon. The second form reads in the contents of the lexicon from the specified data file or stream. The data file must be in one of two formats: (1) a space-efficient precompiled binary format or (2) a text file containing one word per line. The Stanford library distribution includes a binary lexicon file named EnglishWords.dat containing a list of words in English. The standard code pattern to initialize that lexicon looks like this:
Lexicon english("EnglishWords.dat");

Usage:

Lexicon lex;
Lexicon lex(filename);

Method detail


bool contains(string word) const;
Returns true if word is contained in the lexicon. In the Lexicon class, the case of letters is ignored, so "Zoo" is the same as "ZOO" or "zoo".

Usage:

if (lex.contains(word)) ...

bool containsPrefix(string prefix) const;
Returns true if any words in the lexicon begin with prefix. Like containsWord, this method ignores the case of letters so that "MO" is a prefix of "monkey" or "Monday".

Usage:

if (lex.containsPrefix(prefix)) ...

int size() const;
Returns the number of words contained in the lexicon.

Usage:

int n = lex.size();

string toString() const;
Returns a printable string representation of this lexicon, such as "{word1, word2, word3}".

Usage:

string str = lexicon.toString();

Operator detail


for (string word : lex)
The range-based for loop can be used to iterate through the elements in a collection. The iteration accesses the lexicon words in alphabetical order.

Usage:

for (string word : lex) {
	cout << word << endl;
}

Documented: Fall Quarter 2022


ostream& operator<<(const Lexicon& lex);
Outputs the contents of lex to the given output stream. The output is in the form { word1, word2, word3 }. The words will be listed in alphabetical order.

Usage:

cout << lex << endl;

Documented: Fall Quarter 2022


istream& operator>>(Lexicon& lex);
Reads the contents of the given input stream into lex. Any previous contents of the lex are replaced. The input is expected to be in the form { word1, word2, word3 }. If unable to read a proper lexicon from the stream, the operation results in a stream fail state.

Usage:

if (infile >> lex) ...

Documented: Fall Quarter 2022