Protein Interaction Method Sub-task 4 (IMS)

A description of this sub-task is available in pdf format.

Premise

System Input

System output

Evaluation

Tentative release dates

Training data

Test data

Data Selection

Submission format

Number of runs

Useful Links

Training data release


1 Premise  
In order to obtain reliable protein interaction information, it is necessary that these 
interactions have been experimentally confirmed. For annotation purposes, as well as to 
judge the quality of protein interactions, it is important to know which methods have 
been applied to detect protein interactions. 

For protein-protein interaction annotation, considerable effort has gone into developing a 
controlled vocabulary of interaction detection methods. 
This sub-task concerns identifying the type of experiment that was used to confirm 
a given protein-protein interaction. The experimental method used to detect a protein 
interaction described in the article has to be mapped to the controlled hierarchical vocabulary 
of experimental methods of the Molecular Interaction (MI) Ontology, that is, to the correct 
concepts within the interaction detection method (MI:0001) branch of the MI ontology. Be aware 
that we ask for the correct interaction detection methods (MI:0001) and not for the participant 
identification method (MI:0002). 
For each concept in the interaction detection method ontology, besides the controlled vocabulary 
term, additional information is provided: its definition, exact synonyms and related synonyms, 
as well as an external reference in the form of the PubMed identifier of the article describing 
the method. 
To browse the MI ontology, please select the Molecular Interaction ontology at the Ontology Lookup 
service offered at the EBI. 
To download the MI ontology, please refer to the MI obo download file. 

Example of interaction detection method concept:

ID: MI:0090
Name: protein complementation assay
Definition: The function of numerous proteins or ribonucleic particles (enzymes, 
transcription factors, etc...) can be rationally dissected into two fragments that fold 
autonomously but cannot complement to reconstitute the complex function, unless they are 
located in close proximity. In a two hybrid experiment, restoration of the activity by 
complementation of the two fragments when expressed as fusion with two polypeptides is 
taken as an evidence that the two polypeptides interact together.
Related synonym: PCA
Exact synonym: complementation
xref_definition: PMID:11495741
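
Whether a predicted identifier lies inside the interaction detection method (MI:0001) branch can 
be checked from the is_a links of the obo file. The following Python sketch illustrates the idea 
under stated assumptions: the function names are hypothetical, and TOY_OBO is a hand-written toy 
excerpt whose is_a links are illustrative only. The real parent links should be read from the 
downloaded MI obo file.

```python
# Toy excerpt written by hand for illustration; the is_a links are assumptions.
TOY_OBO = """
[Term]
id: MI:0001
name: interaction detection method

[Term]
id: MI:0002
name: participant identification method

[Term]
id: MI:0045
name: experimental interaction detection
is_a: MI:0001 ! interaction detection method

[Term]
id: MI:0090
name: protein complementation assay
is_a: MI:0045 ! experimental interaction detection
"""


def parse_is_a(obo_text):
    """Return {term_id: set(parent_ids)} for the [Term] stanzas of obo text."""
    parents = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
            parents.setdefault(current, set())
        elif in_term and current and line.startswith("is_a:"):
            # Drop the trailing "! name" comment after the parent identifier.
            parents[current].add(line[5:].split("!")[0].strip())
    return parents


def in_branch(term_id, root_id, parents):
    """True if term_id is root_id or reaches it by following is_a links."""
    seen = set()
    stack = [term_id]
    while stack:
        node = stack.pop()
        if node == root_id:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return False
```

With the toy excerpt, in_branch("MI:0090", "MI:0001", parse_is_a(TOY_OBO)) holds, while the same 
check for MI:0002 does not.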

 

2 System Input 
A collection of full-text articles containing protein interaction information curated by 
IntAct and MINT.

 

3 System output   
A ranked list of at most five experimental interaction detection method concept identifiers 
(Molecular Interaction MI identifiers) for each protein interaction pair.

 

4 Evaluation
For evaluating the predictions submitted for this sub-task, we will measure the mean reciprocal rank 
of correctly identified interaction methods (correct MI identifiers) for each protein-protein interaction 
pair compared to the previously manually annotated interaction detection methods.
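
As an illustration, the mean reciprocal rank can be sketched in Python as follows. This is not 
the official scorer. It assumes that each interaction pair is scored by the rank of the first 
correct MI identifier in its ranked list, that pairs without a correct prediction score 0, and 
that the mean is taken over the annotated pairs. The function names and the pair keys are 
hypothetical.

```python
def reciprocal_rank(ranked_ids, gold_ids):
    """1/rank of the first correct identifier in the ranked list, else 0.0."""
    for position, mi_id in enumerate(ranked_ids, start=1):
        if mi_id in gold_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(predictions, gold):
    """Average the reciprocal rank over all annotated interaction pairs.

    predictions: {pair_key: ranked list of MI identifiers}
    gold:        {pair_key: set of manually annotated MI identifiers}
    """
    if not gold:
        return 0.0
    total = 0.0
    for pair_key, gold_ids in gold.items():
        # A pair with no submitted prediction contributes 0 to the sum.
        total += reciprocal_rank(predictions.get(pair_key, []), gold_ids)
    return total / len(gold)
```

For example, if the correct method is ranked first for one pair and second for another, the 
two reciprocal ranks are 1 and 1/2, giving a mean of 0.75.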


5 Tentative release dates
The test set of this subtask will be released after the due date of the result submission of PPI subtask 1 
(detection of protein interaction curation relevant articles). 

Training set for PPI subtasks 1-4:  June 2006
Test set release for IMS:           October 15, 2006
Test set predictions due for IMS:   October 22, 2006


6 Training data 
The training data was derived from the content of the IntAct and MINT databases. The data files of both 
databases are freely accessible for download and are compliant with the HUPO PSI-MI 2.5 format 
(Molecular Interaction Format). 
In principle, any of the data files from the IntAct and MINT ftp servers could be used to derive the 
annotated interaction detection methods for protein interaction pairs, but we recommend (as for the 
other sub-tasks) using the files released in 2005 and 2006. All the articles contained in these databases 
were manually reviewed for whether they contain protein interactions together with the interaction 
detection methods employed. 
The protein interactions mentioned in these articles were extracted manually, linking each interacting 
protein to its corresponding unique UniProt ID (or accession number) and providing the identifier of the 
described interaction detection method. We also recommend not using articles describing very large-scale 
experiments (i.e. more than 20-30 interactions), because such articles are not used in the test set and 
could bias your results. Ideally, only articles with fewer than 21 interaction pairs should be used for 
training. 
We recommend that you carefully check the 'interaction detection method' node (MI:0001) of the MI ontology 
and its child nodes.
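
The recommendation to train only on articles with fewer than 21 interaction pairs can be applied 
with a simple filter. The Python sketch below assumes that (PMID, interactor 1, interactor 2) 
records have already been extracted from the PSI-MI 2.5 files. That extraction step is not shown, 
and the function name and the record layout are hypothetical.

```python
from collections import defaultdict

MAX_PAIRS = 20  # "fewer than 21 interaction pairs"


def select_training_articles(records, max_pairs=MAX_PAIRS):
    """Return the PMIDs of articles with at most max_pairs distinct pairs.

    records: iterable of (pmid, interactor_1, interactor_2) tuples.
    """
    pairs_by_pmid = defaultdict(set)
    for pmid, interactor_1, interactor_2 in records:
        # frozenset makes the pair unordered, so (A, B) and (B, A) count once.
        pairs_by_pmid[pmid].add(frozenset((interactor_1, interactor_2)))
    return {
        pmid for pmid, pairs in pairs_by_pmid.items() if len(pairs) <= max_pairs
    }
```

Counting unordered pairs is a design choice made here; if the curated records are directional, 
the frozenset can be replaced by a plain tuple.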


7 Test data 
The interaction databases MINT and IntAct are holding back a set of curated records to produce the test set 
for the BioCreAtIvE contest. Both are making a considerable annotation effort to produce the test and training 
data collections. A total of around 300 publications are expected to be part of this test set collection. These 
articles will be provided to the participants in pdf and plain text (converted automatically from pdf to text 
using pdftotext), and some of them also in html format. Note that the test set of this task will be released 
after the due date of subtask 1 (detection of protein interaction curation relevant articles).


8 Data Selection

Note that there is no specific restriction regarding the type of interaction detection method. You should not 
confuse the interaction detection method (which should be predicted by the participating systems) with the 
participant identification method (which is not part of this task). The interaction detection method is used to 
detect the actual interaction, while the participant identification method is used to identify each of the 
interactor proteins specifically.
Note that a given interaction might be confirmed by several protein interaction detection methods. Usually, not 
all the proteins mentioned in a given article are studied with all of the mentioned protein interaction detection 
methods.


9 Submission format
Each run of predictions has to be provided as a single file in an xml-like format, containing all the submitted 
interaction detection methods for the interaction pairs extracted from an article.

A sample prediction entry is shown below: 


<ENTRY>
<PPI_SUB_TASK_ID> BC2_PPI_IMS </PPI_SUB_TASK_ID>
<TEAM_ID> T1_BC2_PPI </TEAM_ID>
<RUN_NR> 1 </RUN_NR>
<PMID> 10924507 </PMID>
<INTERACTION_PAIR>
<INTERACTOR_1> Q08211 </INTERACTOR_1>
<INTERACTOR_2> Q9UBU9 </INTERACTOR_2>
</INTERACTION_PAIR>
<INT_DET_METHOD>
<INT_DET_METHOD_ID> MI:0004 </INT_DET_METHOD_ID>
<RANK> 1 </RANK>
</INT_DET_METHOD>
</ENTRY>

Where:
1) ENTRY: corresponds to a single interaction detection method prediction for one interaction pair
2) PPI_SUB_TASK_ID: the identifier of the interaction method sub-task, i.e. BC2_PPI_IMS
3) TEAM_ID: the identifier of the team (as provided to each participating team)
4) RUN_NR: the number of the submission run (maximum of three runs)
5) PMID: the PubMed identifier of the article
6) INTERACTOR_1: the UniProt ID (or accession number) of interactor protein 1
7) INTERACTOR_2: the UniProt ID (or accession number) of interactor protein 2
8) INT_DET_METHOD_ID: the MI identifier of the interaction detection method
9) RANK: the rank of the interaction detection method for a given interaction pair


Be sure that your prediction is compliant with this simple output format.
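
A minimal Python sketch of producing entries in this layout is given below. The function names 
are hypothetical. Emitting one ENTRY per method is an assumption: the sample shows a single 
method per entry, and the description does not state how several ranked methods for the same 
interaction pair should be grouped. The limits of three runs and five ranked methods are taken 
from the task description.

```python
def format_entry(team_id, run_nr, pmid, interactor_1, interactor_2,
                 method_id, rank):
    """Render one ENTRY block in the xml-like layout of the sample entry."""
    if not 1 <= run_nr <= 3:
        raise ValueError("run number must be between 1 and 3")
    if not 1 <= rank <= 5:
        raise ValueError("rank must be between 1 and 5")
    return "\n".join([
        "<ENTRY>",
        "<PPI_SUB_TASK_ID> BC2_PPI_IMS </PPI_SUB_TASK_ID>",
        f"<TEAM_ID> {team_id} </TEAM_ID>",
        f"<RUN_NR> {run_nr} </RUN_NR>",
        f"<PMID> {pmid} </PMID>",
        "<INTERACTION_PAIR>",
        f"<INTERACTOR_1> {interactor_1} </INTERACTOR_1>",
        f"<INTERACTOR_2> {interactor_2} </INTERACTOR_2>",
        "</INTERACTION_PAIR>",
        "<INT_DET_METHOD>",
        f"<INT_DET_METHOD_ID> {method_id} </INT_DET_METHOD_ID>",
        f"<RANK> {rank} </RANK>",
        "</INT_DET_METHOD>",
        "</ENTRY>",
    ])


def format_ranked_methods(team_id, run_nr, pmid, interactor_1, interactor_2,
                          ranked_method_ids):
    """Emit one ENTRY per method (an assumption), keeping at most five."""
    entries = [
        format_entry(team_id, run_nr, pmid, interactor_1, interactor_2,
                     method_id, rank)
        for rank, method_id in enumerate(ranked_method_ids[:5], start=1)
    ]
    return "\n\n".join(entries)
```

Calling format_entry("T1_BC2_PPI", 1, "10924507", "Q08211", "Q9UBU9", "MI:0004", 1) reproduces 
the sample entry line by line.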



10 Number of runs
For this sub-task, each participating team can submit up to three runs.



11 Useful Links
1) IntAct
2) MINT
3) MI ontology
4) MI ontology browser
5) PSI-MI 2.5 format
6) UniProt
7) UniProt download




12 Training data release
People who intend to participate in the protein-protein interaction (PPI) task of the 
second BioCreAtIvE challenge should send the following information:

1) Team contact e-mail (one per team).
2) Tentative list of participant team members (name and e-mail).
3) Institutions.

to: mkrallinger@cnio.es

Last update of this page: 05 September 2006
