NLP, IR, IE, ML tools
-
FreeLing:
Open-source language analyzer; includes morphological analysis, a shallow parser, and a POS tagger.
-
VIEW:
Variation in English Words and Phrases, tool to compare semantically-related words and phrases in the British National Corpus.
-
OAK System:
English analyzer, which consists of a sentence splitter, a tokenizer, a POS tagger, a stemmer, a chunker, a Named Entity (NE) tagger, a dependency analyzer, a parser, a function tagger and a regularizer.
-
TreeTagger:
Language-independent part-of-speech tagger.
-
SVM_light:
Support Vector Machines (SVMs) implementation in C.
-
Stanford Lexicalized Parser:
Probabilistic natural language parser.
-
TIGERSearch:
Tools for linguistic text exploration.
-
NLTK:
Natural Language Toolkit, Python library for natural language processing.
-
GATE:
General Architecture for Text Engineering, Natural Language Processing system.
-
Anaphora resolution tool
Prolog tool for anaphora resolution.
-
GuiTAR
General tool for anaphora resolution.
-
JavaRAP
Java implementation of the classic Resolution of Anaphora Procedure (RAP).
-
Lemur Toolkit
for Language Modeling and Information Retrieval.
-
Zettair
search engine and tool for building an inverted file index.
-
SATZ
adaptive Sentence Boundary Detector written in C, neural network based.
-
Ngrams
n-gram analysis tool written in Perl.
-
Rubryx
text classification program (pattern classification of web sites), for Windows.
-
SEFT
Search Engine For Text, returns relevant text windows for a given set of query terms.
-
Bow
Toolkit written in C for Statistical Language Modeling, Text Retrieval, Classification and Clustering.
-
Approximate String Matching
source code for string matching programs.
-
Strmat
Set of C programs implementing string matching and pattern discovery algorithms.
-
FCLUSTER
Program for fuzzy cluster analysis.
-
LNKnet
Program for pattern classification using a variety of techniques, such as neural network, statistical, and machine learning algorithms.
-
TextSTAT
Program for basic text analysis implemented in Python.
-
Suffix sort
Program for suffix sorting written in C.
-
Alembic
Workbench for corpus analysis and domain specific tagging.
-
Quirk
Toolkit for terminology extraction and management.
-
Nice stemmer
Stemmer that integrates different stemming algorithms, such as a simple stemmer, Porter, Krovetz and Combo Stemmer.
-
TnT
Statistical Part-of-Speech Tagger.
-
C. Manning list*
useful list of NLP resources by Christopher Manning.
-
SenseClusters
package (Perl) for clustering similar contexts together using unsupervised knowledge-lean methods.
-
CCG tools
tools developed by the Cognitive Computation Group at the University of Illinois, including a verb tense changer, a sentence segmenter, a word splitter,
a shallow parser and an HTML tag stripper.
-
LingPipe
suite of Java tools designed to perform linguistic analysis on natural language data (e.g. a heuristic within-document coreference resolution engine, general chunking,
text classification, clustering).