Intro & Overview

Corpora@Stanford

Overview

This page contains introductory information about the Corpus Computer and Stanford's AFS. If you have more basic questions, such as "What is a corpus?" or "What is corpus linguistics?", I highly recommend the introduction to corpus linguistics from the University of Essex.

Below are some broad introductions written by Stanford folks. They are worth checking out because they pay specific attention to the local setup:

Summary of the available types of corpora (different annotation schemes, with examples) provided by Jeanette Pettibone (originally developed for CS224N by Chris Manning et al.) [pdf | doc]
Introduction to using speech corpora (by Colleen Richey) [pdf | doc] and a tutorial on using phonetically transcribed corpora (by Florian Jaeger). Also, check out this list of prosodically annotated corpora (by Florian Jaeger).

Tip: you can always ask the Corpus TA.

The Corpus Computer

The Corpus Computer (a.k.a. Corpus PC or CC on these pages) is located in the linguistics department's computer cluster. It is the rightmost computer in the cluster (closest to the printer). It runs Windows XP and is set up to make access to corpora (both on AFS and on the CC itself) as easy as possible. It comes with pre-installed tools for corpus research, including tools for syntactic searches, regular expression searches, and searches in phonetic transcriptions. Several corpora are installed only on the Corpus PC; a list of the corpora on the CC is available. Access to the Corpus PC is restricted. Read more about how to get access. Once you have access, you will need to log in twice: first with the CC-specific user login, then with your personal SUNet ID. Once you are logged in, click on "My Computer" to see AFS mapped as a network drive.
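To give a feel for what a "regular expression search" over a corpus returns, here is a minimal keyword-in-context (KWIC) sketch in Python. It is only an illustration, not one of the tools installed on the CC; the function name `kwic` and the sample sentence are hypothetical, and it assumes the corpus text is already loaded as a plain string.

```python
import re
from typing import Iterator, Tuple


def kwic(text: str, pattern: str, width: int = 30) -> Iterator[Tuple[str, str, str]]:
    """Yield (left context, match, right context) for each regex match in text.

    Newlines in the context windows are replaced by spaces so that each
    hit prints on one line, as in a keyword-in-context (KWIC) concordance.
    """
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        yield left, m.group(0), right


if __name__ == "__main__":
    sample = "The cat sat on the mat.\nTwo cats and a dog met another cat."
    # \bcats?\b matches "cat" or "cats" as a whole word, but not e.g. "catalog".
    for left, hit, right in kwic(sample, r"\bcats?\b", width=12):
        print(f"{left:>12} [{hit}] {right}")
```

The word-boundary anchors (`\b`) are what separate a useful corpus query from a naive substring search, which would also hit "catalog" or "concatenate".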

Introduction to the local setup

This site provides an introduction to the local setup of corpora at Stanford. You may find the following summaries interesting:

Summary of the local setup (AFS, the Corpus Computer, etc.) as of 12/06/03 [pdf | doc]
PowerPoint presentation: Corpora@Stanford, Methods Class 12/05/03 [ppt | html]. It covers:
- How do I find a corpus, and how do I start searching for the right corpus if I don't even know what kinds of corpora exist?
- Where can I find the software that will help me solve the problem I am working on?
- TGrep syntax
- Links to more information
Overview of available corpora with phonetic or prosodic transcription (by Florian Jaeger) [pdf | doc]
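The presentation above covers TGrep syntax. Two of its core relations are `A < B` (a node labeled A immediately dominates a node labeled B) and `A << B` (A dominates B at any depth). The Python sketch below mimics just these two relations on a single Penn Treebank-style bracketed tree, to show what such a query matches. It is not TGrep: it handles only exact label matches, assumes well-formed bracketing, and all function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    label: str  # category label, or the word itself for a leaf
    children: List["Node"] = field(default_factory=list)


def parse(bracketed: str) -> Node:
    """Parse one Penn-Treebank-style bracketed tree into Node objects."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        pos += 1  # skip "("
        label = ""  # treebank files may wrap trees in an unlabeled "( ... )"
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = Node(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:  # a bare token is a leaf (word)
                node.children.append(Node(tokens[pos]))
                pos += 1
        pos += 1  # skip ")"
        return node

    return parse_node()


def walk(node: Node) -> Iterator[Node]:
    """Pre-order traversal of all nodes."""
    yield node
    for child in node.children:
        yield from walk(child)


def immediately_dominates(tree: Node, a: str, b: str) -> List[Node]:
    """Nodes labeled a with a phrasal child labeled b (cf. TGrep 'a < b')."""
    return [n for n in walk(tree)
            if n.label == a and any(c.label == b and c.children for c in n.children)]


def dominates(tree: Node, a: str, b: str) -> List[Node]:
    """Nodes labeled a with a phrasal descendant labeled b (cf. TGrep 'a << b')."""
    return [n for n in walk(tree)
            if n.label == a and any(d is not n and d.label == b and d.children for d in walk(n))]


if __name__ == "__main__":
    t = parse("(S (NP (DT the) (NN man)) (VP (VBD saw) (NP (NP (DT a) (NN dog))"
              " (PP (IN with) (NP (DT a) (NN telescope))))))")
    print(len(immediately_dominates(t, "NP", "PP")))  # NP < PP  -> 1
    print(len(immediately_dominates(t, "VP", "PP")))  # VP < PP  -> 0
    print(len(dominates(t, "VP", "PP")))              # VP << PP -> 1
```

The example sentence shows why the distinction matters: the PP "with a telescope" is attached inside the object NP, so `VP < PP` finds nothing while `VP << PP` still matches.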

AFS

AFS (the Andrew File System) is basically a large network drive on which several departments (including the linguistics department) have accounts. We store many of our widely used corpora there. AFS can be accessed over the Stanford network; you need a SUNet ID, and you have to be granted access to AFS. AFS is UNIX-based, so you will need some minimal UNIX expertise to find your way around. If you are more comfortable with Windows, you can use the Corpus PC, which has AFS mapped as a regular drive (i.e., you can browse AFS using the familiar Windows Explorer).
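Finding your way around a corpus directory on AFS mostly means listing and locating files. As a rough stand-in for a UNIX command like `find DIR -type f -name PATTERN`, here is a small Python sketch. The directory argument and the `*.txt` pattern are placeholders only: the actual AFS locations of the corpora are not given here, and the function names are hypothetical.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def find_files(root: str, pattern: str = "*") -> List[Path]:
    """Return regular files under root (recursively) whose names match the glob pattern."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def total_size(paths: Iterable[Path]) -> int:
    """Sum of file sizes in bytes."""
    return sum(p.stat().st_size for p in paths)


if __name__ == "__main__":
    # Hypothetical usage: python find_files.py /path/to/some/corpus "*.txt"
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    pattern = sys.argv[2] if len(sys.argv) > 2 else "*"
    files = find_files(root, pattern)
    for f in files:
        print(f)
    print(f"{len(files)} files, {total_size(files)} bytes")
```

Checking the total size before copying anything is a good habit, since many corpora are large.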