How to parse, sort, and display a text file using the Stanford Web Application Toolkit

From Web Services Wiki

Jump to: navigation, search

Contents

Problem

You want to parse a CSV (comma-separated value) or similarly formatted file.

Solution

Use the StanfordTextFile class of the Stanford Web Application Toolkit to parse, sort, and display a text file as an HTML table.

Example text file

The following is an example of a Comma Separated Value (CSV) file with headings.

"First Name","Last Name","Date of Birth","Years in Office"
"George","Washington","February 22, 1732","8"
"John","Adams","October 30, 1735","4"
"Thomas","Jefferson","April 13, 1743","8"
"James","Madison","March 16, 1751","8"
"James","Monroe","April 28, 1758","8"

Basic usage

Create a new StanfordTextFile by passing its constructor the location of the text file. Call display_as_html_table to output the file in an HTML table.

// Include StanfordTextFile
include_once("stanford.textfile.php");
 
// Create a new StanfordTextFile
$data = new StanfordTextFile("/path/to/file.csv");
 
// Retrieve the data from the file and output it to the browser as a table
$data->display_as_html_table();

If you would like to simply parse the text file and get data from it without outputting a table, use the retrieve method.

// Get the data from the file as an associative array
$people = $data->retrieve();
 
// Output the result as a list
echo '<ul>';
 
foreach($people as $person) {
  echo '<li>' . $person['Last Name'] . ', ' . $person['First Name'] . '</li>';
}
 
echo '</ul>';

Filtering results

Using StanfordTextFile you may be selective about what data to return from the source file. It is possible to select only certain fields in a certain order, sort results by one or more parameters, and choose the number of rows to return.

// We want to retrieve just the last name, first name, and date of birth
$data->set_fields_to_retrieve("Last Name", "First Name", "Date of Birth");
 
// Order first by last name
$data->sort_by("Last Name");
 
// And then by first name
$data->sort_by("First Name");
 
// Get the first 10 results
$data->set_limit(10);
$data->set_page(1);
 
// Get the data from the file
$result = $data->retrieve();

To properly sort date fields, you must first set their data type.

// Mark the date of birth as a date field
$data->set_type("Date of Birth", StanfordTextFile::DATE);
 
// Order by date of birth
$data->sort_by("Date of Birth");

To sort in the reverse order, specify the direction in sort_by.

// Sort in the reverse order -- default is ASCENDING
$data->sort_by("Date of Birth", StanfordTextFile::DESCENDING);

While using set_limit and set_page is usually sufficient for returning a particular slice of the data, you may manually set the offset using set_offset.

// Manually set the offset when returning only part of the dataset
$data->set_offset(15);

Getting information about the resultset

When using a limit on the number of results to retrieve, it is often useful to get the total number of results available. Use the methods get_num_results and get_num_results_available to find out how much data was retrieved and how much remains. Call them after retrieving data or displaying a table.

// Get the number of results retrieved
$num = $data->get_num_results();
 
// Get the total number of results available
$num = $data->get_num_results_available();

Displaying the Data

Custom Headings

After sorting and filtering the results, you may set custom table headings and change the order of the fields outputted by display_as_html_table.

// Set the table heading for the field named "Date of Birth" to "Birthday"
$data->set_heading("Date of Birth", "Birthday");
 
// Display the date of birth first, then the name
$data->set_fields_to_display("Date of Birth", "First Name", "Last Name");
 
// Output the table
$data->display_as_html_table();

Alternatively, use the shortcut method set_headings to create labels for your table, which in some cases may simplify your code compared to using set_heading on each field.

// Display the date of birth first, then the name
$data->set_fields_to_display("Date of Birth", "First Name", "Last Name");
 
// Set all of the headings at once (in display order -- must correspond to the data being displayed)
$data->set_headings("Birthday", "Given Name", "Surname");
 
// Output the table
$data->display_as_html_table();

If you don't want to display any headings at all, simply set display_headings to false.

// Do not display table headings when calling display_as_html_table
$data->set_display_headings(false);

Date Formats

To format date fields in the table, call the function set_date_format, as shown below.

// Change date format for "Date of Birth" (e.g. from "December 3, 1984" to "Dec 03, 1984")
$data->set_date_format("Date of Birth", "M d, Y");

For a list of all available date formats, visit PHP.net: date function.

Please note that date formats apply only to displaying data in an HTML table. When getting data using the retrieve function, format conversion is left to the developer.

Display Functions

In many cases, data is stored differently than it is displayed. For example, a standard phone number may be stored as a 10 character field with no formatting, but you'd prefer to display it with parentheses, dashes, and spaces to make it more readable. Use the method add_display_function to define a custom formatting function to be called before a field is displayed. See the sample code below for an example of calling a function with a single parameter.

// This example assumes that values in the "amount" column are stored as floats (e.g. 1234.56)
 
// Define a custom function to return values in the format $1,234.56
function formatMoney($number) {
  return "$" . number_format($number, 2);
}
 
// Add custom display function to the "amount" field
// $data is a StanfordData object
$data->add_display_function("amount", "formatMoney");
 
// Display the table
// The amount column will be displayed as defined in the formatMoney function
$data->display_as_html_table();

To define a display function with more than one parameter, set the third parameter of add_display_function to an array containing the arguments to the method. Use the placeholder %FIELD% to mark which parameter is the value of the field.

// Configure StanfordData to call the function number_format($amount, 2) each time a cell in the "amount"
// column is outputted by display_as_html_table
$data->add_display_function("amount", "number_format", array("%FIELD%", 2));

You may define multiple display functions for one field. Each function will be called successively in the order that it is defined, and the return value from each function is passed as the field value to the method that follows.

Allowing HTML

By default, HTML code contained within cells of the table is escaped (and thus not parsed by the browser) for security purposes. If you define a custom display function (see above) to make a field bold by surrounding it with HTML tags, you must also enable HTML for that column. To remove the HTML output restriction from a column, use the function set_allow_html_in_column($field_name), as shown below.

// Do not escape HTML in the column for the 'email' field
$data->set_allow_html_in_column("email", true);

To allow HTML output in the entire table, use the function set_allow_html_in_table.

// Do not escape any HTML output
$data->set_allow_html_in_table(true);

You may use a combination of the HTML output settings shown above. The more specific setting (per-column) takes precedence. Setting the boolean parameter in each of the functions to false means that HTML code will be escaped.

If you need to disable the automatic HTML restriction, we strongly suggest setting up a custom display function to escape user input wherever possible. Visit PHP.net: htmlspecialchars for one way of escaping HTML output.

Discussion

How do I parse data separated by a character or string other than a comma?

Use the set_delimiter method to parse files which use tabs, newlines, or other characters or strings to partition data.

// Set the delimiter to a tab
$data->set_delimiter("\t");
 
// Set the delimiter to a newline (for reading a text file that does not have multiple columns of data line-by-line)
$data->set_delimiter("\n");

What to do when the data contains the delimiter

In many cases, the text being parsed contains the character or string being used as a delimiter. For example:

Name,Sentence,Date
Jim,Hello, my name is Jim, Happy New Year!,2008-01-01

The parser is unable to understand the above text due to the commas in the sentence. To avoid this problem, always enclose data in double quotes:

"Name","Sentence","Date"
"Jim","Hello, my name is Jim, Happy New Year!","2008-01-01"

Can I change the style of the table outputted by 'display_as_html_table?'

Yes, either modify the default style for a table or create a custom class.

// In your PHP code, before outputting the table, set the desired name of the CSS class for the <table> tag
$data->set_css_class("php-table");
 
// Output the table
$data->display_as_html_table();

Some sample style settings are shown below. Rows containing table headings automatically belong to the 'headings' class (use set_headings_class to modify). Every other data row's class is set to 'alternate' by default (use set_alternate_row_class to modify).

/* Table */
.php-table {
  background-color: #900;
  border: 3px solid #000;
  border-collapse: collapse;
}
 
/* Column headings */
.php-table tr.headings {
  color: #FFF;
}
 
/* Data cells */
.php-table td {
  color: #E7D19A;
  padding: .5em;
}
 
/* Alternate rows */
.php-table tr.alternate {
  background-color: #800;
}

What if my text file does not have labels or headings?

Since the parser assumes that the first line of every file contains a heading by default, call set_file_has_headings to tell it otherwise. You may set the names of each field without modifying the text file using the set_field_names method of StanfordTextFile.

Our example text file is as follows:

"Banana","Yellow"
"Orange","Orange"
"Strawberry","Red"

The following code allows you to access the data from the text file with descriptive field names. Please note that set_field_names accepts a variable number of arguments which must equal the number of fields in the file (in this case, two).

// Create a new StanfordTextFile
$data = new StanfordTextFile("/path/to/fruits.csv");
 
// The parser assumes that the first line of each file contains headings, so we must explicitly tell it to treat the first line as data
$data->set_file_has_headings(false);
 
// Set the field names corresponding to the number and order of the columns in the text file
$data->set_field_names("Name", "Color");
 
// Get the data
$fruits = $data->retrieve();
 
// Output result
foreach($fruits as $fruit) {
  echo "<p>A " . $fruit['Name'] . " is " . $fruit['Color'] . "</p>\n";
}

If you'd just like to display the data in a table with custom headings, use the shortcut method set_headings. It works the same way as set_field_names, as shown above, except it only affects the headings in the table outputted by display_as_html_table and not the field names themselves.

// Create a new StanfordTextFile
$data = new StanfordTextFile("/path/to/fruits.csv");
 
// The parser assumes that the first line of each file contains headings, so we must explicitly tell it to treat the first line as data
$data->set_file_has_headings(false);
 
// Set the headings corresponding to the number and order of the columns in the text file
$data->set_headings("Name", "Color");
 
// Display table
$data->display_as_html_table();

How can I retrieve the code for the HTML table without displaying it?

Set the only parameter to display_as_html_table to false. The HTML code is returned and no output is displayed.

// Get the code, do not display the table
$html_code = $data->display_as_html_table(false);
 
// Get the number of results
$num = $data->get_num_results();
$total = $data->get_num_results_available();
 
// Output a heading
echo "<p>Displaying $num of $total results</p>";
 
// Display the table
echo $html_code;
Personal tools