Corpus-tools & other useful software

Corpora@Stanford

TnT - Thorsten Brants's part-of-speech tagger

TnT is a part-of-speech tagger (POS tagger) that can be used to prepare corpora for search tools that presume POS tagging (e.g. Gsearch or tgrep). It comes pretrained to tag English and German newspaper text, but it can also be trained on other corpora. The Unix and Windows versions can be found on AFS at:

    /afs/ir/data/linguistic-data/lib/tnt
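
Once a corpus has been tagged, downstream tools usually need the token/tag pairs. The following is a minimal sketch of reading tagged output, assuming a layout of one token per line followed by whitespace and its tag, with comment lines starting with `%%`. That layout, the function name `read_tagged`, and the sample lines are assumptions made for illustration; check them against the documentation shipped in the directory above.

```python
from typing import Iterable, Iterator, Tuple


def read_tagged(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (token, tag) pairs from tagged output.

    Assumed layout (not confirmed by this page): one token per line,
    whitespace, then the tag; lines starting with '%%' are comments.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("%%"):
            continue
        parts = line.split()
        # Skip lines that carry a token but no tag.
        if len(parts) < 2:
            continue
        yield parts[0], parts[1]


# Hypothetical sample input, written by hand for this sketch.
sample = [
    "%% example tagged output",
    "The\tDT",
    "tagger\tNN",
    "works\tVBZ",
    "",
    ".\t.",
]

pairs = list(read_tagged(sample))
for token, tag in pairs:
    print(f"{token}/{tag}")
```

The same function can be given an open file object instead of a list, since it only iterates over lines.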

Tip - Jeanette Pettibone has provided us with her presentation on part-of-speech taggers.