*BioCreAtIvE - glossary*

BioCreAtIvE - Critical Assessment for Information Extraction in Biology


- CNIO

- MITRE

BioCreAtIvE glossary

B

BioCreAtIvE: Critical Assessment of Information Extraction systems in Biology. This challenge evaluation is a community-wide effort to evaluate text mining and information extraction systems applied to the biological domain.

C

Curation (Biology): in this context, curation of biological databases means the manual extraction of biological information from the literature by a domain expert. The aim is to transform information contained in free text (the scientific literature) into structured database records (biological databases).

E

EBI: European Bioinformatics Institute (EMBL-EBI). Among other research groups, the EBI hosts the GOA-EBI group for annotation of gene products with GO terms, the IntAct team for protein-protein interaction annotation, and the Rebholz team for biomedical text mining.

F

F-measure (balanced F-score): the harmonic mean of precision and recall, F = 2 × precision × recall / (precision + recall). It is a commonly used performance measure in information retrieval (IR).
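A minimal sketch of the balanced F-score computed from precision and recall. The function name `f_measure` is our own, and returning 0.0 when both precision and recall are 0 is an assumed convention to avoid dividing by zero.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumed convention: F is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f_measure(0.5, 0.5))  # 0.5
```

The harmonic mean is dominated by the smaller of the two values: a system with perfect precision but zero recall gets `f_measure(1.0, 0.0) == 0.0`, whereas the arithmetic mean would give 0.5.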

G

GO: the Gene Ontology (GO) is an initiative to provide a set of controlled vocabulary terms for describing gene and gene product attributes. These terms are used to annotate gene products in a consistent way. The three main GO categories are Cellular Component, Molecular Function and Biological Process.

GOA: Gene Ontology Annotation (GOA) is a project run by the EBI to provide assignments of gene products to Gene Ontology (GO) terms.
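A GO annotation assigns a gene product to a GO term belonging to one of the three categories. The record below is a hypothetical minimal sketch, not the GOA distribution format: the class and field names are our own, and the values in the example are placeholders rather than a real curated annotation. The enum values follow GO's namespace labels, and the check relies on GO term identifiers having the form "GO:" followed by seven digits.

```python
import re
from dataclasses import dataclass
from enum import Enum


class GOCategory(Enum):
    """The three main GO categories, labelled with GO namespace names."""
    CELLULAR_COMPONENT = "cellular_component"
    MOLECULAR_FUNCTION = "molecular_function"
    BIOLOGICAL_PROCESS = "biological_process"


# GO term identifiers: "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


@dataclass(frozen=True)
class GOAnnotation:
    """Hypothetical record assigning one gene product to one GO term."""
    gene_product_id: str  # e.g. a protein accession; its format is not checked here
    go_id: str
    category: GOCategory

    def __post_init__(self) -> None:
        # Reject identifiers that do not look like GO term IDs.
        if not GO_ID_PATTERN.fullmatch(self.go_id):
            raise ValueError(f"not a GO identifier: {self.go_id!r}")


# Placeholder values for illustration only.
ann = GOAnnotation("PROTEIN_X", "GO:0000000", GOCategory.CELLULAR_COMPONENT)
print(ann.category.value)  # cellular_component
```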

H

HUPO: Human Proteome Organisation (HUPO).

I

IMEx: the IMEx consortium is a group of protein interaction data providers that share the curation effort and exchange molecular interaction records in an XML format following the PSI-MI standard for molecular interactions. Its partners comprise IntAct, MINT, BIND, DIP and MPact.

Information extraction (IE): IE systems perform natural language text analysis in order to identify information related to pre-defined types of entities (e.g. genes or proteins), relationships, facts or events.
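One of the simplest ways to find entity mentions, the first step of many IE systems, is dictionary (gazetteer) lookup. The sketch below assumes a tiny hypothetical lexicon and tags gene/protein names by case-insensitive exact token matching. It does not extract relationships, facts or events, and it ignores spelling variants, ambiguous names and multi-word names, which real systems must handle.

```python
import re

# Hypothetical toy lexicon mapping lower-cased names to an entity type.
LEXICON = {
    "p53": "protein",
    "brca1": "gene",
    "mdm2": "protein",
}


def tag_entities(text: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, mention, type) for every token found in the lexicon."""
    mentions = []
    # Tokens are runs of letters, digits and hyphens; punctuation is skipped.
    for match in re.finditer(r"[A-Za-z0-9-]+", text):
        entity_type = LEXICON.get(match.group().lower())
        if entity_type is not None:
            mentions.append((match.start(), match.end(), match.group(), entity_type))
    return mentions


sentence = "MDM2 binds p53 and promotes its degradation."
for start, end, mention, etype in tag_entities(sentence):
    print(start, end, mention, etype)
```

Character offsets are kept so that each mention can be traced back to its position in the source text.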

Information retrieval (IR): ...

IntAct: IntAct is a freely available, open source database system and set of analysis tools for protein interaction data. The interactions stored in IntAct are derived from literature curation or direct user submissions. The project also distributes the software it develops and controlled vocabularies for interaction methods.

M

MINT: MINT (Molecular INTeraction database) is an initiative of the University of Rome Tor Vergata to store data on functional interactions between proteins, focusing on experimentally verified interactions. It covers both direct and indirect relationships and hosts a team of expert curators who extract interaction information from the literature. Refer to Zanzoni et al. (2002).

P

PMID: the PubMed database identifier (PMID) is a unique identifier for each PubMed citation, e.g. 11911893.
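Because a PMID is a plain number, text mining pipelines often need to pull PMIDs out of text and turn them into links. This is a minimal sketch under two assumptions: citations are written as "PMID" optionally followed by a colon and whitespace, and links use the PubMed web address pattern `https://pubmed.ncbi.nlm.nih.gov/<PMID>/`. The function names are hypothetical.

```python
import re

# Assumed citation convention: "PMID", optional ":", optional whitespace, digits.
PMID_PATTERN = re.compile(r"PMID:?\s*(\d+)")


def extract_pmids(text: str) -> list[str]:
    """Return PMIDs as digit strings, in order of first appearance, without duplicates."""
    found = []
    for pmid in PMID_PATTERN.findall(text):
        if pmid not in found:
            found.append(pmid)
    return found


def pubmed_url(pmid: str) -> str:
    """Build a PubMed web link for a numeric PMID."""
    if not (pmid.isascii() and pmid.isdigit()):
        raise ValueError(f"PMID must be an ASCII digit string: {pmid!r}")
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


for pmid in extract_pmids("See PMID: 11911893 and PMID 11911893 again."):
    print(pubmed_url(pmid))
```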

Precision: the number of answers the system got right divided by the number of answers the system gave.

Protein-protein interaction: molecular interactions between proteins. Although there are many different types of interactions, protein interactions are often understood as physical interactions between proteins.
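A physical interaction between two proteins has no direction, so "A interacts with B" and "B interacts with A" describe the same interaction. One convenient representation, assumed here rather than taken from any particular database, is an unordered pair, which makes both orderings compare equal, for example when deduplicating interactions extracted from text. The function name and protein names are illustrative.

```python
def interaction_key(protein_a: str, protein_b: str) -> frozenset[str]:
    """Direction-independent key for a physical interaction between two proteins."""
    return frozenset((protein_a, protein_b))


extracted = {interaction_key("MDM2", "TP53"), interaction_key("TP53", "MDM2")}
print(len(extracted))  # 1: both orderings map to the same key
```

A self-interaction such as a homodimer collapses to a one-element set, which keeps it distinct from every two-protein pair.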

PSI-MI: the Proteomics Standards Initiative Molecular Interaction (PSI-MI) XML format is a community standard for the representation of protein interaction data, followed by several interaction databases. Refer to Hermjakob et al. (2004).

PubMed: PubMed is a database available via the NCBI Entrez retrieval system and was developed by the NCBI. It is currently the most important literature database for the life sciences and contains over 15 million citations.

R

Recall: the number of answers the system got right divided by the number of possible right answers.
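Precision and recall can be computed together by treating the system's answers and the gold-standard (possible right) answers as sets. This is a minimal sketch: the function name and answer identifiers are hypothetical, and returning 0.0 for an empty denominator is an assumed convention.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision and recall of a system's answers against the gold-standard answers."""
    correct = len(predicted & gold)  # answers the system got right
    # Assumed convention: 0.0 when the system gave no answers / no right answers exist.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


system_answers = {"a", "b", "c", "d"}  # hypothetical answers given by the system
gold_answers = {"a", "b", "e"}         # hypothetical possible right answers
p, r = precision_recall(system_answers, gold_answers)
print(p, r)  # 0.5 0.6666666666666666
```

Here two of the four answers are right (precision 2/4) and two of the three possible right answers were found (recall 2/3).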
