BioCreAtIvE - Critical Assessment for Information Extraction in Biology

NLP, IR, IE, ML tools

  • FreeLing: Open-source language analyzer that includes morphological analysis, a shallow parser and a POS tagger.
  • VIEW: Variation in English Words and Phrases, a tool for comparing semantically related words and phrases in the British National Corpus.
  • OAK System: English analyzer consisting of a sentence splitter, a tokenizer, a POS tagger, a stemmer, a chunker, a Named Entity (NE) tagger, a dependency analyzer, a parser, a function tagger and a regularizer.
  • TreeTagger: Language-independent part-of-speech tagger.
  • SVM_light: Support Vector Machines (SVMs) implementation in C.
  • Stanford Lexicalized Parser: Probabilistic natural language parser.
  • TIGERSearch: Tools for linguistic text exploration.
  • NLTK: Natural Language Toolkit, a Python library for natural language processing.
  • GATE: General Architecture for Text Engineering, a natural language processing system.
  • Anaphora resolution tool: Prolog tool for anaphora resolution.
  • GuiTAR: General tool for anaphora resolution.
  • JavaRAP: Java implementation of the classic Resolution of Anaphora Procedure (RAP).
  • Lemur: Toolkit for Language Modeling and Information Retrieval.
  • Zettair: Search engine and tool for building an inverted file index.
  • SATZ: Adaptive sentence boundary detector written in C, neural network based.
  • Ngrams: N-gram analysis tool written in Perl.
  • Rubryx: Text classification program (pattern classification of web sites), for Windows.
  • SEFT: Search Engine For Text, returns relevant text windows for a given set of query terms.
  • Bow: Toolkit written in C for Statistical Language Modeling, Text Retrieval, Classification and Clustering.
  • Approximate String Matching: Code for string matching programs.
  • Strmat: Set of C programs implementing string matching and pattern discovery algorithms.
  • FCLUSTER: Program for fuzzy cluster analysis.
  • LNKnet: Program for pattern classification using a variety of techniques, such as neural network, statistical and machine learning algorithms.
  • TextSTAT: Program for basic text analysis implemented in Python.
  • Suffix sort: Program for suffix sorting written in C.
  • Alembic: Workbench for corpus analysis and domain-specific tagging.
  • Quirk: Toolkit for terminology extraction and management.
  • Nice stemmer: Stemmer that integrates different stemming algorithms, such as a simple stemmer, Porter, Krovetz and Combo Stemmer.
  • TnT: Statistical part-of-speech tagger.
  • C. Manning list*: Useful list of NLP resources by Christopher Manning.
  • SenseClusters: Package (Perl) for clustering similar contexts together using unsupervised, knowledge-lean methods.
  • CCG tools: Tools developed by the Cognitive Computation Group at the University of Illinois, including a verb tense changer, sentence segmentation, a word splitter, a shallow parser and an HTML tag stripper.
  • LingPipe: Suite of Java tools designed to perform linguistic analysis on natural language data (e.g. a heuristic within-document coreference resolution engine, general chunking, text classification, clustering).




Last update of this page: 07 August 2006
© by Martin Krallinger 2006