1 Premise
Reliable protein interaction information requires that the interactions have been
confirmed experimentally. For annotation purposes, as well as to judge the quality of
protein interactions, it is important to know which methods were applied to detect
them.
For protein-protein interaction annotation, considerable effort has gone into
developing a controlled vocabulary of interaction detection methods.
This sub-task refers to the identification of the type of experiment which was used to confirm
a given protein-protein interaction. The experimental method used to detect a given protein
interaction described in the article has to be mapped to the controlled hierarchical vocabulary
of experimental methods of the Molecular Interaction (MI) Ontology. This means that each method has to
be mapped to the correct concept within the interaction detection method (MI:0001) branch of
the MI ontology. Be aware that we ask for the correct interaction detection method (MI:0001) and
not for the participant identification method (MI:0002).
For each concept in the interaction detection method ontology, in addition to the controlled vocabulary
term, further information is provided: its definition, exact synonyms and
related synonyms, as well as an external reference for the method in the form of the PubMed identifier
of the article describing it.
To browse the MI ontology, please select the Molecular Interaction ontology at the Ontology Lookup
Service offered at the EBI.
To download the MI ontology, please refer to the MI obo download file.
Example of an interaction detection method concept:
ID: MI:0090
Name: protein complementation assay
Definition: The function of numerous proteins or ribonucleic particles (enzymes,
transcription factors, etc...) can be rationally dissected into two fragments that fold
autonomously but cannot complement to reconstitute the complex function, unless they are
located in close proximity. In a two hybrid experiment, restoration of the activity by
complementation of the two fragments when expressed as fusion with two polypeptides is
taken as an evidence that the two polypeptides interact together.
Related synonym: PCA
Exact synonym: complementation
xref_definition: PMID:11495741
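Because predictions must come from the interaction detection method (MI:0001) branch, it can help to collect the identifiers of that branch before predicting. The following is a minimal sketch, assuming the obo download uses the usual [Term] stanzas with id: and is_a: lines. The sample stanzas (including MI:9999 and the simplified hierarchy) are hypothetical and for illustration only; the function names are not part of any official tool.

```python
from collections import defaultdict

# Illustrative OBO-style stanzas. The hierarchy is simplified and partly
# hypothetical (MI:9999 is invented); it is not copied from the real MI obo file.
SAMPLE_OBO = """\
[Term]
id: MI:0001
name: interaction detection method

[Term]
id: MI:0002
name: participant identification method

[Term]
id: MI:0090
name: protein complementation assay
is_a: MI:0001 ! interaction detection method

[Term]
id: MI:9999
name: hypothetical participant identification child
is_a: MI:0002 ! participant identification method
"""


def parse_is_a(obo_text):
    """Map each parent id to the set of its direct child ids ([Term] stanzas only)."""
    children = defaultdict(set)
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and line.startswith("is_a:") and current_id:
            # Drop the trailing "! name" comment after the parent id.
            parent = line[len("is_a:"):].split("!")[0].strip()
            children[parent].add(current_id)
    return children


def branch_ids(children, root):
    """Return the root id together with all of its descendants."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


detection_ids = branch_ids(parse_is_a(SAMPLE_OBO), "MI:0001")
```

With the sample above, detection_ids contains MI:0001 and MI:0090 but nothing from the MI:0002 branch, so a candidate identifier can be checked with a simple membership test.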
2 System Input
Collection of full text articles which contain protein interaction information curated by
IntAct and MINT.
3 System Output
A ranked list of at most 5 experimental interaction detection method concept identifiers (Molecular
Interaction, MI, identifiers) for each protein interaction pair.
4 Evaluation
For evaluating the predictions submitted for this sub-task, we will measure the mean reciprocal rank
of correctly identified interaction methods (correct MI identifiers) for each protein-protein interaction
pair compared to the previously manually annotated interaction detection methods.
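As an illustration of the measure, the sketch below computes a mean reciprocal rank over interaction pairs. It assumes that the score of a pair is the reciprocal of the rank of the first correct MI identifier in its ranked list, and that a pair with no correct identifier scores 0; the official scorer may treat these cases differently. The pair keys and the data are hypothetical.

```python
def reciprocal_rank(ranked_ids, gold_ids):
    """Return 1/rank of the first correct identifier, or 0.0 if none is correct."""
    for rank, mi_id in enumerate(ranked_ids, start=1):
        if mi_id in gold_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(predictions, gold):
    """Average the reciprocal rank over all annotated interaction pairs.

    predictions: dict mapping a pair key to a ranked list of MI identifiers.
    gold: dict mapping a pair key to the set of annotated MI identifiers.
    Pairs without a prediction contribute 0 (an assumption of this sketch).
    """
    if not gold:
        return 0.0
    total = sum(
        reciprocal_rank(predictions.get(pair, []), methods)
        for pair, methods in gold.items()
    )
    return total / len(gold)


# Hypothetical example data: two interaction pairs.
gold = {"pair_a": {"MI:0004"}, "pair_b": {"MI:0090"}}
predictions = {
    "pair_a": ["MI:0004", "MI:0090"],  # correct at rank 1 -> 1.0
    "pair_b": ["MI:0004", "MI:0090"],  # correct at rank 2 -> 0.5
}
mrr = mean_reciprocal_rank(predictions, gold)  # (1.0 + 0.5) / 2 = 0.75
```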
5 Tentative release dates
The test set of this subtask will be released after the due date of the result submission of PPI subtask 1
(detection of protein interaction curation relevant articles).
Training set PPI subtasks 1-4: June 2006
Test set release for IMS: October 15, 2006
Test set prediction due for IMS: October 22, 2006
6 Training data
The training data was derived from the content of the IntAct and MINT databases. The data files of both
databases are freely accessible for download and are compliant with the HUPO PSI-MI 2.5
(Molecular Interaction Format).
In principle, any of the data files from the IntAct and the MINT ftp servers would be usable to derive the
annotated interaction detection methods for protein interaction pairs, but we recommend (as for the
other sub-tasks) using the files released in 2005 and 2006. All the articles contained in these databases were
manually reviewed for whether they contain protein interactions together with the interaction detection
methods employed.
The protein interactions mentioned in these articles were extracted manually, linking each interacting
protein to its corresponding unique UniProt ID (or accession number) and providing the identifier of the described
interaction detection method. We also recommend not using articles describing very large-scale experiments (more
than 20-30 interactions), because they are not used in the test set and could bias your results. Ideally,
only articles with fewer than 21 interaction pairs should be used for training.
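The recommended size filter could be sketched as follows. This assumes the annotations have already been extracted from the database files into simple tuples; the tuple layout, the function name and the example records are hypothetical, and interaction pairs are treated as unordered.

```python
from collections import defaultdict

MAX_PAIRS = 20  # i.e. fewer than 21 interaction pairs per article


def select_training_articles(annotations, max_pairs=MAX_PAIRS):
    """Return the PMIDs of articles with at most max_pairs distinct pairs.

    annotations: iterable of (pmid, interactor_1, interactor_2, mi_id) tuples.
    The same pair annotated with several detection methods is counted once.
    """
    pairs_by_pmid = defaultdict(set)
    for pmid, first, second, _mi_id in annotations:
        # frozenset makes (A, B) and (B, A) count as the same pair.
        pairs_by_pmid[pmid].add(frozenset((first, second)))
    return {
        pmid for pmid, pairs in pairs_by_pmid.items() if len(pairs) <= max_pairs
    }


# Hypothetical records: one small article and one large-scale article.
records = [
    ("small", "P1", "P2", "MI:0004"),
    ("small", "P1", "P2", "MI:0090"),  # same pair, second method
    ("small", "P2", "P3", "MI:0004"),
]
records += [("large", "P0", f"X{i}", "MI:0004") for i in range(25)]

kept = select_training_articles(records)
```

Here only the article labelled "small" is kept, because the one labelled "large" has 25 distinct pairs.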
We recommend that you check carefully the 'interaction detection method' node (MI:0001) and its child nodes of
the MI ontology.
7 Test data
The interaction databases MINT and IntAct are holding back a set of curated records to produce the test set
for the BioCreAtIvE contest. Both are making a considerable annotation effort to produce the test and training
data collections. A total of around 300 publications are expected to be part of this test set collection. These
articles will be provided to the participants in PDF and plain text (converted automatically from PDF
using pdftotext), and some of them also in HTML format. Note that the test set of this task will be released after
the due date of subtask 1 (detection of protein interaction curation relevant articles).
8 Data Selection
Note that there is no specific restriction regarding the type of interaction detection method. You should not
confuse interaction detection method (which should be predicted by the participating systems) and participant
identification method (which will not be part of this task). The interaction detection method is used to detect
the actual interaction while the participant identification method is used to identify specifically each of the
interactor proteins.
Note that a given interaction might be confirmed by several protein interaction detection methods. Usually, not all
the proteins mentioned in a given article are studied with all the protein interaction detection methods
mentioned in it.
9 Submission format
Each run of predictions has to be provided as a single file in an XML-like format, containing all the submitted
interaction detection methods for the interaction pairs extracted from an article.
A sample prediction entry is shown below:
<ENTRY>
<PPI_SUB_TASK_ID> BC2_PPI_IMS </PPI_SUB_TASK_ID>
<TEAM_ID> T1_BC2_PPI </TEAM_ID>
<RUN_NR> 1 </RUN_NR>
<PMID> 10924507 </PMID>
<INTERACTION_PAIR>
<INTERACTOR_1> Q08211 </INTERACTOR_1>
<INTERACTOR_2> Q9UBU9 </INTERACTOR_2>
</INTERACTION_PAIR>
<INT_DET_METHOD>
<INT_DET_METHOD_ID> MI:0004 </INT_DET_METHOD_ID>
<RANK> 1 </RANK>
</INT_DET_METHOD>
</ENTRY>
Where:
1) ENTRY: corresponds to a single interaction detection method prediction
2) PPI_SUB_TASK_ID: the identifier of the interaction method sub-task, i.e. BC2_PPI_IMS
3) TEAM_ID: the identifier of the team (as provided to each participating team)
4) RUN_NR: the number of the submission run (maximum of three runs)
5) PMID: corresponds to the PubMed identifier of the article
6) INTERACTOR_1: corresponds to the UniProt ID (or accession number) of the interactor protein 1
7) INTERACTOR_2: corresponds to the UniProt ID (or accession number) of the interactor protein 2
8) INT_DET_METHOD_ID: MI identifier of the interaction detection method
9) RANK: rank of the interaction detection method for a given interaction pair
Be sure that your prediction is compliant with this simple output format.
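An entry in the format shown above could be generated along the following lines. This is only a sketch: the function name is hypothetical, it assumes one ENTRY per predicted method and rank (as in the sample), and it takes the limits of three runs and five ranked methods from this description.

```python
def format_entry(team_id, run_nr, pmid, interactor_1, interactor_2, mi_id, rank):
    """Build one ENTRY block for a single predicted interaction detection method."""
    if not 1 <= run_nr <= 3:
        raise ValueError("RUN_NR must be between 1 and 3")
    if not 1 <= rank <= 5:
        raise ValueError("RANK must be between 1 and 5")
    lines = [
        "<ENTRY>",
        "<PPI_SUB_TASK_ID> BC2_PPI_IMS </PPI_SUB_TASK_ID>",
        f"<TEAM_ID> {team_id} </TEAM_ID>",
        f"<RUN_NR> {run_nr} </RUN_NR>",
        f"<PMID> {pmid} </PMID>",
        "<INTERACTION_PAIR>",
        f"<INTERACTOR_1> {interactor_1} </INTERACTOR_1>",
        f"<INTERACTOR_2> {interactor_2} </INTERACTOR_2>",
        "</INTERACTION_PAIR>",
        "<INT_DET_METHOD>",
        f"<INT_DET_METHOD_ID> {mi_id} </INT_DET_METHOD_ID>",
        f"<RANK> {rank} </RANK>",
        "</INT_DET_METHOD>",
        "</ENTRY>",
    ]
    return "\n".join(lines)


# Reproduces the sample entry shown above.
entry = format_entry("T1_BC2_PPI", 1, "10924507", "Q08211", "Q9UBU9", "MI:0004", 1)
```

Writing all entries of one run to a single file then amounts to joining the returned strings with newlines.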
10 Number of runs
For this sub-task, each participating team can submit up to three runs.
11 Useful Links
1) IntAct
2) MINT
3) MI ontology
4) MI ontology browser
5) PSI-MI 2.5 format
6) UniProt
7) UniProt download
12 Training data release
People who intend to participate in the protein-protein interaction (PPI) task of the
second BioCreAtIvE challenge should send the following information:
1) Team contact e-mail (one per team).
2) Tentative list of participant team members (name and e-mail).
3) Institutions.
to: mkrallinger@cnio.es
Last update of this page: 05 September 2006