AUGUST 02, 2014
Mining and Mapping the Production of Space

A View of the World from Houston
Cameron Blevins 1
1. Stanford University, History Department
The following essay accompanies the print article "Space, Nation, and the Triumph of Region: A View of the World from Houston" in the June 2014 issue of the Journal of American History.
In the 1970s the philosopher Henri Lefebvre made the deceptively simple assertion that societies produce space. Instead of space serving as a neutral backdrop for the march of historical events, societies dynamically produce space over time. And space in turn shapes societies. The grid system of the urban planner segments a city into different areas of labor, leisure, and consumption. Flows of goods and capital into and out of the city in turn reshape this space. The daily movement of people (commuters, tourists, shoppers, cyclists, taxi drivers) further realigns cities, imbuing different neighborhoods with different meanings and inscribing them with a specific sense of place. These are some of the ways societies produce space.1
My research measures and maps the production of nineteenth-century space using the tools of the digital age. Computational analysis allowed me to quantify how late nineteenth-century newspapers crafted a view of the world for their readers. Specifically, I examine the Houston Daily Post from 1894 to 1901 to study how late nineteenth-century America appeared from a specific vantage point in time and space. What places loomed large in the paper’s imagined geography? How did large-scale processes of incorporation, standardization, and nationalization shape the paper’s production of space and place? What was the relationship between region and nation? To answer these questions I combine traditional historical research with digital analysis of the paper.2
The print article "Space, Nation, and the Triumph of Region" presents the findings of my project. The following online essay previews the article and offers a more in-depth treatment of its digital foundation. It does so in three parts. Part I presents a preview of my project’s results through a series of dynamic visualizations that compare the imagined geographies of the Houston Daily Post (1894-1901) to an earlier Houston-based paper, the Telegraph and Texas Register (1836-1851). Part II discusses the specific methodology used to obtain these results, while Part III concludes with a broader reflection on digital research in history and the humanities.
I. Results
The following visualizations compare the frequency of different place-names in two newspapers from Houston, Texas: the Telegraph and Texas Register in the 1830s and 1840s and the Houston Daily Post in the 1890s. The two periods stand in as bookends for Houston in the nineteenth century. The first period, 1836-1851, encompassed Texas’s independence from Mexico, its annexation to the United States, and early statehood. The second period, 1894-1901, marked Houston’s final years as a minor commercial city before its twentieth-century ascent as the energy capital of the United States.
The first visualization charts the production of space in the Telegraph and Texas Register during the initial years of independence from Mexico and early American statehood. Several features stand out. First, the paper oriented its perspective toward the Gulf of Mexico. Galveston and New Orleans towered over its constructed space, with the ports of Havana and Mobile close behind, demonstrating Houston’s reliance on the lifelines of shipping and commerce that ran through the Gulf to the Atlantic.3 Second, the Telegraph and Texas Register’s spatial fixation on Mexico ebbed and flowed during this period, cresting in the mid-1840s with American annexation and the Mexican-American War.4 Finally, the paper’s imagined geography reflected the precarious territorial claims of the young republic. For most of the period the Telegraph and Texas Register had almost no Anglo place-names in western Texas to print. This began to change at mid-century, as selecting the years 1849-1851 reveals the ascent of El Paso as a destination for exploratory expeditions by the U.S. military to the far western edge of the state.5
Occurrences of Place-Names in the Telegraph and Texas Register, 1836-1851
The second visualization compares the Telegraph and Texas Register and Houston Daily Post’s imagined geographies from the same vantage point in space but at different points in time. Switching from one perspective to the other demonstrates just how much had changed over half a century. The city’s reliance on the shipping channels of the Gulf region declined from the first paper to the second, exemplified by the decrease in mentions of New Orleans, Havana, and Mobile. In the northeast, New York skyrocketed from the Telegraph and Texas Register to the Houston Daily Post, while other Atlantic port cities such as Boston, Philadelphia, and Baltimore fell in relative frequency. Finally, the visualization illustrates the dramatic rise of the American Midwest in both absolute space and the imagined geography of the Houston Daily Post. Midwestern cities such as Chicago, St. Louis, and Kansas City moved from being largely non-existent in the pages of the Telegraph and Texas Register to dominating the Houston Daily Post’s view of the nation.
Comparing Place-Name Mentions in Newspapers over Time
The third visualization illustrates how the Houston Daily Post shaped its imagined geography in the 1890s. It charts obvious events in the region and world: the discovery of oil just northeast of Houston in Beaumont in 1901, for instance, or the rise of Cuba with the outbreak of the Spanish-American War in 1898. The visualization also reveals striking, and more surprising, spatial orientations. Despite Houston’s close political, cultural, and economic ties to the American South, the city’s leading newspaper turned its attention away from the region’s cities. Atlanta, Memphis, and Nashville were dwarfed by references to urban centers outside of the South.6 Meanwhile, the forces of centralization and consolidation sweeping the nation appeared in the Houston Daily Post, but they did so in spatially specific ways. National metropolises such as San Francisco, Baltimore, or Boston were surprisingly muted in the Houston Daily Post relative to their populations, especially in comparison to the midwestern cities of Chicago, St. Louis, and Kansas City. Finally, the overwhelming presence of Texas places reveals the dominance of regional space. Galveston, Dallas, Fort Worth, Waco, and San Antonio may have occupied a relatively lowly position in the nation’s urban hierarchy, but they sprawled across the Houston Daily Post’s imagined geography.
Occurrences of Place-Names in the Houston Daily Post, 1894-1901
II. Methodology
Newspapers are a particularly rich source for exploring the historical production of space. Although far from the only medium of spatial information for nineteenth-century Americans, newspapers were cheap and widely available. Their daily or weekly print cycles also allow historians to track temporal changes in much finer detail than do other sources such as maps or novels. Of course, their ubiquity also creates a problem of abundance: during the two periods I examine, the Telegraph and Texas Register and the Houston Daily Post printed 2,431 issues, 23,331 pages, and 139,018,540 words. For perspective, a researcher poring over the newspapers nonstop for eight hours a day, five days a week, would need four years to finish reading them. Furthermore, the Telegraph and Texas Register and Houston Daily Post are merely drops in the ocean of digitized newspapers now available. The challenge, then, is this: How can historians measure the production of space on this kind of scale?7
To meet the challenge of abundance I turned to the method of “distant reading” articulated most famously by literary scholar Franco Moretti. In contrast to the close reading of individual texts, distant reading focuses on how content and meaning emerge across a much larger scale. Rather than analyze specific passages in Edward Bellamy’s 1887 utopian novel Looking Backward, for instance, a distant reading approach might track themes of utopianism across hundreds of nineteenth-century novels. Not surprisingly, distant reading often necessitates computers that can "read" massive quantities of text in a matter of seconds. In the case of my project, I wrote a computer program to track how frequently a newspaper printed specific geographic place-names to re-create how it produced space.8
Computational analysis requires, first and foremost, that the text be machine readable. Much like a human trying to read a book in a foreign language, a computer cannot understand the content of a text until it is translated. This occurs through a process known as Optical Character Recognition (OCR), in which a static image of words on a page is transformed into the electronic language of 1’s and 0’s. My analysis would have been impossible if the University of North Texas Library had not first digitized the Telegraph and Texas Register and the Houston Daily Post, converted them through OCR, and then given me access to the machine-readable text.9 In part, my selection of the two papers stemmed from pragmatism: much like the historian who shapes their research agenda around the availability of archival sources, I shaped the project around the availability of digitized and machine-readable sources. Accessibility can be a major impediment to digital analysis. Online databases often use OCR to enable users to search their collections, but few provide access to the "raw material" of their underlying machine-readable text needed for large-scale text mining. Private, for-profit content providers are particularly hesitant to provide individual researchers with that degree of access to their material.10
Even after a researcher gains access to machine-readable text, many challenges remain. Like any form of translation, the accuracy of the OCR process varies widely. Torn pages, blurred text, or divergent typefaces can all lower recognition rates during the OCR translation process. (See figure 1.) A smudged word might cause the computer to translate “Texas” as “Toxas,” for instance. To test the quality of the OCR translation of the newspapers, a computer program looked up each word in a dictionary. In the case of the Telegraph and Texas Register, 9.4% of its digitized words did not appear in a dictionary, while the Houston Daily Post had a substantially higher percentage (19.7%) of unrecognizable words.11 These are admittedly poor rates: for every four occurrences of “Texas” caught in my net, another “Toxas” got away. The layout of newspapers presents an even higher degree of difficulty for the digitization process. Computers have trouble distinguishing advertisements from adjacent columns of text, for instance, or following a story that begins on page 1 and is continued on page 4.
Figure 1. Smudged and Torn Pages Limit the Effectiveness of OCR 
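The dictionary check described above can be sketched in a few lines of Python. The word list, sample text, and function names here are illustrative stand-ins, not the program actually used for the project:

```python
# Sketch of a dictionary-based OCR quality check: what share of a text's
# words fail to appear in a reference word list? DICTIONARY is a toy
# stand-in for a real word list such as /usr/share/dict/words.
import re

DICTIONARY = {"the", "texas", "cotton", "market", "closed", "firm"}

def unrecognized_rate(text, dictionary):
    """Return the fraction of alphabetic tokens not found in the dictionary."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    misses = sum(1 for t in tokens if t.lower() not in dictionary)
    return misses / len(tokens)

sample = "The Toxas cotton market closed firm"
rate = unrecognized_rate(sample, DICTIONARY)  # "Toxas" is the only miss
```

Applied to every issue of a digitized paper, a check along these lines yields the 9.4% and 19.7% unrecognized-word rates reported above.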
Digitized newspapers are inherently messy sources. They often resemble a jumbled bag of mistake-ridden words as much as neatly segmented columns of text. If historians insist on perfect data, however, we risk ignoring huge swathes of the digital archive. One goal of my project was to find a way to draw meaning from messy text. This led to the comparatively unsophisticated approach of term frequency, or counting how often specific place-names occurred across different issues of a newspaper. To identify these place-names, I built a customized gazetteer of roughly six hundred states, cities, and towns (see footnote 12 for details on the construction of this gazetteer). A computer program then iterated through each newspaper issue attempting to find and count occurrences of these places. It is important to acknowledge the limitations of this approach. Most importantly, the program only identified places that appeared in the gazetteer. The Houston Daily Post may have printed a story on a small mining town in Montana, for instance, but the program would not recognize the town because it did not appear in the gazetteer. Similarly, ambiguous place-names such as "Washington" (the state or the capital?) were removed from the results. The above visualizations are not a comprehensive view of every place-name printed in the two papers, therefore, but a measurement of how frequently selected places appeared: "a" view of the world from Houston rather than "the" view of the world from Houston. Nevertheless, even within these narrow constraints a skeleton of spatial production emerged, with nearly 1.3 million occurrences of place-names across thousands of issues and hundreds of millions of words.12
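In code, the term frequency approach amounts to little more than a tally per place-name. The sketch below is a deliberately simplified, hypothetical version with a tiny flat gazetteer and a toy line of text, not the project's actual program (which, as note 12 describes, used three sequential filters):

```python
# A minimal sketch of gazetteer-based term frequency counting. The
# place-names and sample text are illustrative stand-ins, not the real
# 600-entry gazetteer.
import re
from collections import Counter

GAZETTEER = ["Galveston", "Dallas", "Kansas City", "Chicago"]

def place_name_frequencies(issue_text, gazetteer):
    """Tally how often each gazetteer place-name occurs in one issue."""
    counts = Counter()
    for place in gazetteer:
        # \b word boundaries keep "Dallas" from matching inside longer words
        pattern = r"\b%s\b" % re.escape(place)
        counts[place] = len(re.findall(pattern, issue_text))
    return counts

issue = "Galveston cotton sold in Kansas City while Galveston wharves hummed"
counts = place_name_frequencies(issue, GAZETTEER)
```

Run across thousands of issues, per-issue tallies like these are what the visualizations above aggregate.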
Counting every instance of a place-name across every issue of a newspaper flattened the text. This approach intentionally disregarded context: the "Dallas" in a front-page headline was given the same weight as the "Dallas" in a retail advertisement. What began as a pragmatic response to messy data, however, soon became an intellectual cornerstone of my project. Specifically, it led me to a more holistic concept of "news" that included the entire spectrum of a newspaper (feature stories, editorials, advertisements, classifieds, freight tables, stock quotes, weather tables) as potential geographic sources for readers. This avoided privileging certain modes of reading over others. After all, a reader looking for a new pair of gloves may have been more interested in a back-page advertisement from a Dallas merchant than a front-page editorial by a Dallas mayor. Flattening the text helped me understand the multifaceted ways a newspaper produced space.
To measure different kinds of content, I turned to a form of sampled content analysis that operated in a middle ground between close and distant reading. I first took a statistically significant sample of issues from my collection. I then helped design a program to overlay a grid onto each image of a newspaper page. The program allowed me to quickly divide the image into different blocks of content, categorize them, and aggregate the amount of page space dedicated to, say, news columns versus advertisements. The results of my sampled content analysis reinforced just how important it was to examine the entire spectrum of newspaper content: more than 40% of the Houston Daily Post's page space was dedicated to non-narrative, fragmentary content such as advertisements, classifieds, railroad schedules, or stock market listings. If I had limited myself to the more traditional narrative-based stories we tend to think of as "news," I would have missed a huge amount of content that proved foundational to how a newspaper produced space for its readers.13 (See figure 2.)
Figure 2: Fragmentary and Non-narrative Newspaper Content
The Houston Daily Post, January 25, 1901
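The aggregation step of this sampled content analysis reduces to counting labeled grid cells. The sketch below illustrates the idea with hypothetical category labels and a toy page; it is not the ImageGrid program itself:

```python
# Sketch of aggregating categorized grid cells into page-space percentages,
# in the spirit of the grid-overlay workflow described above. The cell
# labels are illustrative, not ImageGrid's actual six categories.
from collections import Counter

def page_space_shares(cell_labels):
    """Given one category label per grid cell, return each category's share
    of total page space as a percentage."""
    counts = Counter(cell_labels)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}

# A toy ten-cell page: six cells of news, three of ads, one of marginalia.
cells = ["news"] * 6 + ["ads"] * 3 + ["marginalia"]
shares = page_space_shares(cells)
```

Averaging such shares over a sample of issues gives the kind of page-space percentages reported in note 13.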
Fragmentary content such as wholesale prices or timetables was utterly ubiquitous in nineteenth-century newspapers. But it is often hidden in plain sight for contemporary readers. A ten-word wanted ad for lumber in La Porte, Texas recedes from the historian’s memory in a way that a feature story about a grisly murder in Kansas City does not. This problem is compounded when the scale of analysis grows from dozens of pages to tens of thousands of pages. The human reader simply cannot identify spatial patterns at that level of mundaneness, granularity, and fragmentation. Fortunately, computers are quite good at this kind of reading. Instead of struggling to wade through yet another advertisement for a healing tonic, a computer program locates Chicago in its address line, updates its tally of place-names, and moves on with a speed that human readers cannot match. The ability to process hundreds of millions of words in a matter of seconds allows for experimentation and modularity, such as modifying the program to search for additional place-names or applying it to other newspapers. The occurrence of Chicago in an address line of an advertisement does not in and of itself speak to the city’s relative importance. But tabulating the occurrences of Chicago in tens of thousands of advertisements, news stories, classified ads, market reports, and railroad schedules throughout the 1890s reveals the city’s changing position in a hierarchy of constructed space. By systematically measuring the process of spatial production across such a broad scale, I was able to recreate a detailed view of the world from a specific vantage point in time and space.
III. Beyond Houston
Spatial history is not a new field. In 1949 Fernand Braudel placed scale and spatial relations at the forefront of his magisterial study, The Mediterranean and the Mediterranean World in the Age of Philip II, which remains one of the discipline’s most influential works. Forty years later William Cronon based Nature’s Metropolis (1991) on the study of Chicago’s commercial and environmental connections to its hinterland. From Braudel’s analysis of communication lags to Cronon’s mapping of midwestern debtor patterns, both historians relied on empirical methods to make spatial arguments.14 My research builds on earlier spatial histories through its use of technology to analyze sources on a much larger scale. Historians have long written about urban systems, regionalism, or the production of space, but computing allows us to access and make sense of otherwise incomprehensibly vast amounts of information. Digital methods are not any more or less valid than traditional approaches, but they do provide a different entry point into the historical archive.
Electronic sources and digital tools offer fundamentally new ways for humanities scholars to practice their craft. Historians at George Mason University analyzed the titles of over 1 million British books published in the nineteenth century to test long-standing claims about a Victorian crisis of faith. A group at Stanford University charted the correspondence flows, travel patterns, and social networks of Enlightenment thinkers to map the early modern republic of letters. Historians at the University of Richmond used both textual and geospatial analysis to reveal the hidden patterns of sectional conflict and emancipation during the U.S. Civil War.15 All three projects fall under the category of "digital humanities," an expansive phrase encompassing the use of technology and computing to study the human experience. These projects illustrate many of the threads that run through digital humanities research: a reliance on large-scale datasets, the use of computing to conduct analysis, and the visualization of evidence and results. Emphasizing technology, however, risks overshadowing an even more important commonality: collaboration.16
Collaboration stands at the heart of the digital humanities. This stems in part from pragmatism; most of us aren’t equipped to tackle the size and complexity of the digital archive on our own. No humanist, "digital" or otherwise, can learn every new method of database design, network analysis, geospatial processing, or data visualization. Collaboration allows scholars to specialize in specific areas. But collaboration also embodies a deeper ethos of the digital humanities movement: to push the boundaries of how we do what we do. To some degree all humanities scholars work collaboratively. But digital projects require a fundamentally different workflow. Each of the projects referenced above operates within an institutional center or lab built around group-based research. My own project was no different, as I relied on the expertise of computer scientists, geographers, and graphic designers at the Stanford Center for Spatial and Textual Analysis. Interdisciplinary collaboration pushes the humanities in unfamiliar directions, from natural language processing to human-computer interaction. It also blurs the edges of academia, as scholars work alongside software developers, librarians, archivists, technical staff, and administrators. Collaboration is not a panacea. But more so than any particular technology or method, it is the best strategy for humanities research to flourish in the twenty-first century.17
Acknowledgments
Many thanks to Erik Steiner, Kathy Harris, Jake Coolidge, Zephyr Frank and the rest of the Stanford Spatial History Lab and the Center for Spatial and Textual Analysis. The project grew out of a larger partnership between the University of North Texas and Stanford University, and I extend my gratitude to Andrew Torget and Jon Christensen for their support. Funding was provided by the Stanford Fund for Innovation in the Humanities and the Bill Lane Center for the American West. Lastly, I would like to thank Richard White for his guidance, generosity, and patience.
End Notes

1 Henri Lefebvre, The Production of Space, trans. Donald Nicholson-Smith (Wiley-Blackwell, 1991). A sampling of other influential works on the production of space includes David Harvey, Social Justice and the City (Johns Hopkins University Press, 1973); Doreen Massey, Spatial Divisions of Labor (Routledge, 1984); Neil Smith, Uneven Development: Nature, Capital, and the Production of Space (Blackwell, 1984); Michel Foucault, “Of Other Spaces,” trans. Jay Miskowiec, Diacritics, 16 (April 1, 1986), 22–27; Edward W. Soja, Postmodern Geographies: The Reassertion of Space in Critical Social Theory (Verso, 1989); and Derek Gregory, Geographical Imaginations (Wiley-Blackwell, 1994). For an introduction to the major issues and writers addressing space and place, see the introduction to Phil Hubbard and Rob Kitchin, eds., Key Thinkers on Space and Place, Second ed. (Sage Publications, 2010). Some of the more influential writings on place and place-making include Yi-Fu Tuan, Space and Place: The Perspective of Experience (University of Minnesota Press, 1977); Denis E. Cosgrove, Social Formation and Symbolic Landscape (Croom Helm, 1984); Michel de Certeau, The Practice of Everyday Life (University of California Press, 1984); Doreen Massey, “A Global Sense of Place,” Marxism Today, 35 (no. 6, 1991): 24–29; and Edward Casey, The Fate of Place: A Philosophical History (University of California Press, 1997). For accessible introductions to the concept of place, see John A. Agnew and James S. Duncan, eds., The Power of Place: Bringing Together Geographical and Sociological Imaginations (Unwin Hyman, 1989); and Tim Cresswell, Place: A Short Introduction (Wiley-Blackwell, 2004).

2 My use of "imagined geography" is a conscious reference to Edward Said's "imaginative geography" and Benedict Anderson's "imagined communities." Both concepts engage many of the themes I explore in my project: socially constructed space, relations of power, and overlapping scales, among others. See Edward W. Said, Orientalism (Vintage, 1979), 53-55; and Benedict Anderson, Imagined Communities: Reflections on the Origin and Spread of Nationalism (Verso, 1983). Americans’ relationship to geography during this period is articulated in Susan Schulten, The Geographical Imagination in America, 1880-1950 (University of Chicago Press, 2001) and Susan Schulten, Mapping the Nation: History and Cartography in Nineteenth-Century America (University of Chicago Press, 2012).

3 For examples of Gulf shipping and commerce, see Democratic Telegraph and Texas Register (Houston, Tex.), Aug. 20, 1845, p. 3. Online at: http://texashistory.unt.edu/ark:/67531/metapth78113/m1/3/

4 See the fixation on the Rio Grande as a sovereign boundary in Telegraph and Texas Register (Houston, Tex.), May 11, 1842, p. 1. Online at: http://texashistory.unt.edu/ark:/67531/metapth48061/m1/1/

5 Democratic Telegraph and Texas Register (Houston, Tex.), Vol. 14, No. 28, Thursday, July 12, 1849. Online at: http://texashistory.unt.edu/ark:/67531/metapth48547/ and Democratic Telegraph and Texas Register (Houston, Tex.), Vol. 14, No. 33, Ed. 1, Thursday, Aug. 16, 1849. Online at http://texashistory.unt.edu/ark:/67531/metapth48551/

6 Harold L. Platt, City Building in the New South: The Growth of Public Services in Houston, Texas, 1830-1910 (Temple University Press, 1983).

7 As of September 3, 2013, the Library of Congress's Chronicling America project listed 1,005 different newspaper titles on its website. Rough calculations for manual reading were based on a generous reading speed of 300 words per minute, multiplied by 480 minutes in an eight-hour workday, multiplied by 260 workdays in a year.
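The arithmetic behind that estimate is easy to reproduce:

```python
# Reproducing the reading-time estimate from the note above: total words
# divided by the words a dedicated reader could cover in a working year.
total_words = 139_018_540
words_per_minute = 300          # generous reading speed
minutes_per_day = 480           # eight-hour workday
workdays_per_year = 260         # five days a week

words_per_year = words_per_minute * minutes_per_day * workdays_per_year
years_needed = total_words / words_per_year  # about 3.7, i.e. roughly four years
```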

8 Franco Moretti, “Conjectures on World Literature,” New Left Review (Jan.-Feb. 2000), 54–68; Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (Verso, 2005), 1. For an example of the quantitative application of distant reading in literature, see the Stanford Literary Lab's pamphlet publication, Ryan Heuser and Long Le-Khac, “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method.” Available online at: http://litlab.stanford.edu/LiteraryLabPamphlet4.pdf. For a collection of responses to Moretti and “distant reading,” see Jonathan Goodwin and John Holbo, eds., Reading Graphs, Maps, and Trees: Responses to Franco Moretti (Parlor Press, 2011), http://www.parlorpress.com/pdf/ReadingMapsGraphsTrees.pdf. Literary historian and digital humanities luminary Matthew Jockers, meanwhile, prefers the term macroanalysis to distant reading. Matthew Jockers, Macroanalysis: Digital Methods and Literary History (University of Illinois Press, 2013).

9 The source material was produced by the University of North Texas Library’s Texas Digital Newspaper Program. For more information about the program, see http://tdnp.unt.edu/. To access the digitized collections, see http://texashistory.unt.edu/. My many thanks to Andrew Torget for giving me access to the collection.

10 For a brief introduction to the digitization of sources, see Marilyn Deegan and Simon Tanner, “Conversion of Primary Sources,” in Companion to Digital Humanities, ed. Ray Siemens, John Unsworth, and Susan Schreibman (Oxford: Blackwell Publishing Professional, 2004). For problems posed by copyright laws for text analysis, see Matthew L. Jockers, Matthew Sag, and Jason Schultz, "Brief of Digital Humanities and Law Scholars as Amici Curiae in Authors Guild v. Google,” Aug. 3, 2012. Available at SSRN: http://ssrn.com/abstract=2102542 or http://dx.doi.org/10.2139/ssrn.2102542

11 Many thanks to Tze-I Yang at the University of North Texas for providing OCR accuracy rates for the two newspapers.

12 The three scales of my gazetteer can be thought of as sequential filters: national, regional, and local. The program iterated through every word in the newspaper text. If a word (unigram) or pair of words (bigram) was capitalized (Kansas City, for example), the program paused and sent it through the first filter of national place-names. If it found a matching record (Kansas City in the national filter), the program updated the tally for that place-name in that issue and moved on to the next word in the text. If it did not match any record in the first filter, the program sent it through the second and third filters looking for a match. The program looked first for a bigram and then for a unigram, meaning that in the case of Kansas City it would identify the two-word place-name Kansas City and subsequently skip over the one-word place-name Kansas without trying to identify it as a separate place.
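A sketch of that iteration, with toy stand-ins for the three filters and invented helper names (not the original program), might look like this:

```python
# Illustrative sketch of the sequential-filter matching described above:
# capitalized bigrams are tried before unigrams, and each candidate passes
# through the national, regional, and local filters in order. The filter
# contents here are tiny stand-ins for the real gazetteer.
from collections import Counter

NATIONAL = {"Kansas City", "Chicago", "Louisiana"}
REGIONAL = {"Galveston", "Waco", "Monterrey"}
LOCAL = {"Harrisburg", "La Porte"}
FILTERS = (NATIONAL, REGIONAL, LOCAL)

def tally_issue(tokens, filters):
    """Count place-names in one issue's token list."""
    counts = Counter()
    i = 0
    while i < len(tokens):
        if not tokens[i][:1].isupper():
            i += 1          # only capitalized words are candidates
            continue
        bigram = " ".join(tokens[i:i + 2])
        if any(bigram in f for f in filters):
            counts[bigram] += 1
            i += 2          # skip both words so "Kansas" is not recounted
        elif any(tokens[i] in f for f in filters):
            counts[tokens[i]] += 1
            i += 1
        else:
            i += 1
    return counts
```

Because `any()` checks the filters in order, a national match short-circuits the regional and local lookups, mirroring the cascade described above.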

National place-names included any city that had been in the top 100 highest-populated cities during any census year, in addition to all states excluding Texas. Regional place-names included any U.S. town within a 200-mile radius of Texas that had more than 10,000 residents during any census year, along with major cities in Mexico that were within a 200-mile radius of the border with Texas. I also manually included the major place-names of Texas, Mexico, Cuba, and Havana in the regional filter. Local place-names included any place listed in the Geographic Names Information System (GNIS) database that fell within a thirty-mile radius of Houston (full GNIS data available for download at: http://geonames.usgs.gov/domestic/index.html). The full gazetteer that I used, along with the frequency of each place-name in the two newspapers, can be downloaded at http://spatialhistory.stanford.edu/viewoftheworld/PlaceNameFrequencies.csv

After compiling a gazetteer of place-names, I had to decide which words to exclude based on ambiguous identification (such as Washington). Due to the poor quality of the digitized text, it would have been fruitless to try to evaluate each individual instance of a place-name using contextual clues. Instead, I either included or excluded each place-name wholesale. For each potentially ambiguous place-name I evaluated a random sampling of occurrences in the text to determine whether the name's ambiguity would have a discernible impact on the end results. If a place-name consistently referred to more than one location, I omitted it entirely. If it consistently referred to a single location, I included the name and used that location. For instance, Abilene is a city in both Kansas and Texas. Houston papers overwhelmingly referred to Abilene, Texas, so I included the place-name and used its Texas location. The same process applied to disambiguating between place-names and proper names. In the case of Jackson the name too often referred to people such as Andrew Jackson rather than places such as Jackson, Mississippi, so it was omitted. If I found consistent ambiguity in random samples of text, I omitted the place. The full list of the place-names I omitted and the reasons for doing so can be downloaded at http://spatialhistory.stanford.edu/viewoftheworld/PlaceNameRemovals.csv

The literature in computational linguistics and natural language processing is vast. Many of these approaches rely on higher-quality text than the sources I used and can therefore employ more sophisticated techniques. One of the most relevant recent examples in the field of digital humanities is Ian Gregory and Andrew Hardie, "Visual GISting: Bringing Together Corpus Linguistics and Geographical Information Systems," Literary and Linguistic Computing, 26 (Sept. 2011), 297-314. See also Kalev H. Leetaru, "Fulltext Geocoding Versus Spatial Metadata for Large Text Archives: Towards a Geographically Enriched Wikipedia," D-Lib Magazine, 18 (Sept./Oct. 2012), online at: http://www.dlib.org/dlib/september12/leetaru/09leetaru.html

13 I performed content analysis on a seventeen-issue sample of the Houston Daily Post (representing 1% of the total number of issues) by developing a program called ImageGrid. The program overlays a grid onto each page image and categorizes each cell in the grid, in this case into one of six different news categories. Each category was then aggregated according to the percentage of the total printed page space it took up. ImageGrid is an open-source program available at http://www.cameronblevins.org/imagegrid/. This sampled content analysis resulted in the following categories, with accompanying 95% confidence intervals. Least squares estimation was used to produce statistically significant estimates for the entire collection.

  1. Traditional narrative news (stories, editorials, reports): 48.6% (confidence interval between 38.5% and 51.6%)
  2. Narrative-based miscellany (jokes, advice columns, sermons, speeches): 8.8% (confidence interval between 6.1% and 17.9%)
  3. Advertisements and classifieds: 29% (confidence interval between 26.5% and 32.2%)
  4. Commercial nonnarrative (stock listings, price reports, ship registers, exchange rates, etc.): 5.3% (confidence interval between 3.9% and 6.3%)
  5. Miscellaneous nonnarrative (railroad schedules, weather tables, hotel guest lists): 4.7% (confidence interval between 3.8% and 5.3%)
  6. Marginalia (newspaper headers, page numbers, subscription rates, contact information): 3.6% (confidence interval between 3.1% and 4.9%)
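For readers unfamiliar with the intervals above, a generic normal-approximation interval for a sampled proportion conveys the idea; this is a standard textbook formula, not the least squares procedure actually used, and the inputs below are hypothetical:

```python
# A generic 95% normal-approximation (Wald) confidence interval for a
# proportion estimated from a sample. Shown only to illustrate the concept;
# the article's own estimates came from a different procedure.
import math

def wald_interval(p_hat, n, z=1.96):
    """Confidence interval for a proportion p_hat estimated from n samples."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half_width, p_hat + half_width)

# e.g. a category occupying 29% of sampled page space, across a
# hypothetical sample of 300 grid cells:
low, high = wald_interval(0.29, 300)
```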

14 Fernand Braudel, The Mediterranean and the Mediterranean World in the Age of Philip II, 2 vols., trans. Siân Reynolds (Harper and Row, 1972). William Cronon, Nature’s Metropolis: Chicago and the Great West (W.W. Norton, 1991).

15 Publications that came out of these projects include Frederick W. Gibbs and Daniel J. Cohen, “A Conversation with Data: Prospecting Victorian Words and Ideas,” Victorian Studies, 54 (no. 1, 2011): 69–77; Caroline Winterer, “Where Is America in the Republic of Letters?,” Modern Intellectual History, 9 (no. 3, 2012): 597–623; and Edward L. Ayers and Scott Nesbit, “Seeing Emancipation: Scale and Freedom in the American South,” Journal of the Civil War Era, 1 (no. 1, 2011), 3–24.

16 For an overview of the field of digital humanities, see Matthew K. Gold, ed., Debates in the Digital Humanities (University of Minnesota Press, 2012). The book of abstracts for the annual Digital Humanities Conference also offers a glimpse into the current popular topics in the field. The 2013 conference book of abstracts is available online at: http://dh2013.unl.edu/abstracts/files/downloads/DH2013_conference_abstracts_print.pdf. For an early roundtable on the role of computing specifically in the field of history, see “Interchange: The Promise of Digital History,” The Journal of American History, 95 (Sept. 2008), 452–491.

17 For more on the changing boundaries of academia in the humanities, see Bethany Nowviskie, ed., #alt-academy: Alternative Academic Careers for Humanities Scholars, http://mediacommons.futureofthebook.org/alt-ac/.