Before using the corpora
Licenses, copyright, and user agreements
In order to use any of the corpora or the software on AFS or the corpus computer, you need to register. By registering you agree to inform yourself of and observe all access and use restrictions for the corpora and software, and not to copy them, or substantial portions of them, to non-Stanford machines. This also applies to corpora that can be freely downloaded on the web. We urge corpus users to be responsible about reporting the source of information obtained in corpus searches. At a minimum, the corpus creator should be identified, and bibliographic information on the source document should be included with any citation. Ideally, the location in the source document (e.g. page number) should be noted as well.
A few corpora require an individual signed agreement or are subject to other special regulations. They are in special protected groups on AFS, and you need to present a signed agreement to get access to them. Be aware that by not following the rules above you violate federal copyright law (in several countries, depending on the corpus).
How to register
General use agreement
In order to register, please send an email to the corpus TA containing the following information.
- An indication that you will inform yourself of copyright restrictions and user agreements for any corpus you will use on AFS. You can copy the following sentences into your email: "I will inform myself about any copyright restrictions that hold for the corpora on AFS. I recognize that it is my responsibility to do so. I will also follow all guidelines outlined by user agreements (if there are any) of any corpus I will use."
- Your SUNetID (not your student ID number, but your user name for the Stanford network)
- Your first and last name
- If you need to use any of the restricted corpora, include the appropriate agreement (see list below)
- Your departmental affiliation
- Sponsor (only if you are not within the linguistics department): the professor or class for which you need corpus access. Please cc your advisor or sponsor if you cite one.
As mentioned above, we understand that by registering you agree to the general user agreement that holds for all corpora at Stanford.
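The checklist above can be sketched as a small script that assembles the body of the registration email. This is only an illustration: the function name, the field labels, and the example values are assumptions rather than a required format, and the script does not send anything, so you still paste the result into an email to the corpus TA.

```python
# Agreement sentences quoted from the registration instructions.
AGREEMENT = (
    "I will inform myself about any copyright restrictions that hold for "
    "the corpora on AFS. I recognize that it is my responsibility to do so. "
    "I will also follow all guidelines outlined by user agreements (if there "
    "are any) of any corpus I will use."
)


def build_registration_email(sunet_id, first_name, last_name, department,
                             sponsor=None, restricted_corpora=()):
    """Return the email body as a string (hypothetical helper, assumed layout)."""
    lines = [
        AGREEMENT,
        "",
        f"SUNetID: {sunet_id}",
        f"Name: {first_name} {last_name}",
        f"Department: {department}",
    ]
    # A sponsor is only needed for users outside the linguistics department.
    if sponsor:
        lines.append(f"Sponsor: {sponsor}")
    # Restricted corpora additionally need the signed agreement handed in.
    if restricted_corpora:
        lines.append("Restricted corpora requested: "
                     + ", ".join(restricted_corpora))
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_registration_email("jdoe", "Jane", "Doe", "Linguistics"))
```

If you name a sponsor, remember to cc them on the actual email; the script only produces the text.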
Corpora with special access restrictions
Some corpora require a special signed agreement from you. Before the corpus TA can give you access to those corpora, you need to hand in the signed agreement. Corpora that need a special signed agreement have special protection on AFS: you must be added to a specific group to be able to use them. Some corpora are subject to other kinds of limitations, for example a maximum number of simultaneously registered users. To get access to any corpus that is subject to special regulations, contact the corpus TA and tell them which corpus you are interested in.
The restricted corpora, their special groups, and links to their user agreements are listed below.
|Corpus/Corpora collection||Required group membership||User agreement or access note|
|CELEX 2||corpora-celex||user agreement|
|PPCME2||corpora-ppcme2||limited to 5 simultaneous users; contact corpus TA|
|TDT Pilot Study Corpus||corpora-tdtpilot||user agreement|
|TIPSTER Complete||corpora-tipster||user agreement|
|LINK Project Switchboard Corpus||corpora-link||see README in AFS directory|
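As a rough illustration of how the table is used, the sketch below looks up which AFS groups to ask the corpus TA for, given the corpora you plan to use. The group names are copied from the table; the function name and the example corpus "Some Unrestricted Corpus" are hypothetical. Corpora that do not appear in the table are treated as needing only the general registration.

```python
# Corpus-to-group mapping copied from the table above.
RESTRICTED_GROUPS = {
    "CELEX 2": "corpora-celex",
    "PPCME2": "corpora-ppcme2",
    "TDT Pilot Study Corpus": "corpora-tdtpilot",
    "TIPSTER Complete": "corpora-tipster",
    "LINK Project Switchboard Corpus": "corpora-link",
}


def groups_needed(corpora):
    """Return the sorted AFS groups required for the given corpus names.

    Hypothetical helper: names missing from RESTRICTED_GROUPS are assumed
    to be unrestricted and contribute no group.
    """
    return sorted({RESTRICTED_GROUPS[name]
                   for name in corpora
                   if name in RESTRICTED_GROUPS})


if __name__ == "__main__":
    print(groups_needed(["CELEX 2", "Some Unrestricted Corpus"]))
```

Being listed in a group is still granted only by the corpus TA, after any required signed agreement has been handed in.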