BioCreAtIvE - Critical Assessment for Information Extraction in Biology

IAS test set prediction check list


1 TEST DATA DOWNLOAD:  
First of all, make sure that you can download the test data properly. The FTP 
download link will be sent to the team contact e-mail address you provided.
 
 


2 TEST DATA CONTENT: 
1) bc2_ppi_ias_abstract_test.txt
The actual test data file containing the test data entries, in a format compatible with the training set.

2) bc2_ppi_ias_test_readme_v1.txt
The readme file of this sub-task

3) bc2_ppi_ias_pmid_test.txt
The PMIDs of the IAS test set

4) formatcheck_bc2_ppi_ias_V01.py
The submission format check script:

Example input for team 60, run 1:
./formatcheck_bc2_ppi_ias_V01.py --t BC2_PPI_IAS_T60_BC2_PPI_1_T --f BC2_PPI_IAS_T60_BC2_PPI_1_F --i bc2_ppi_ias_pmid_test.txt

Where:
--t refers to the entries predicted as PPI relevant
--f refers to the entries predicted as not PPI relevant
--i refers to the test set PMID check file, i.e. bc2_ppi_ias_pmid_test.txt
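
When several runs have to be checked, the same call can be assembled 
programmatically. The following is a minimal Python sketch and not part of the 
official distribution: the helper name format_check_command is hypothetical, 
and it only uses the --t, --f and --i options described above.

```python
def format_check_command(t_file, f_file,
                         pmid_file="bc2_ppi_ias_pmid_test.txt",
                         script="./formatcheck_bc2_ppi_ias_V01.py"):
    """Build the argument list for one format check call (hypothetical helper)."""
    return [script, "--t", t_file, "--f", f_file, "--i", pmid_file]


# Example for team 60, run 1 (file names taken from the example above).
command = format_check_command("BC2_PPI_IAS_T60_BC2_PPI_1_T",
                               "BC2_PPI_IAS_T60_BC2_PPI_1_F")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command) once the script 
and the run files are in the working directory.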



 

3 TEAM IDENTIFIER:   
Make sure that you know your team identifier and team contact e-mail. You will 
need the team identifier for your predictions.



 

4 NUMBER OF RUNS:
After running your system and making the predictions, make sure that you do 
not exceed the permitted number of runs. Each team can submit a total of 
three runs.



5 PREDICTION RUN FILES:
A single run consists of two files: the file containing the ranked list of 
positive (PPI-relevant) predictions and the file containing the ranked list of 
negative (non-relevant) predictions. In the first file the predictions are 
ranked by relevance for physical protein interaction curation, while in the 
second file they are ranked by non-relevance (i.e. articles that should not be 
useful for protein interaction curation).




6 NAMING OF THE RUN FILES: 
To identify your prediction runs unambiguously, please follow the naming 
convention we propose for this sub-task. The names of the prediction run 
files start with the root BC2_PPI_IAS_ , followed by your team identifier, 
an underscore, the run number, an underscore, and 'T' or 'F' for the entries 
predicted as relevant and non-relevant, respectively.

A sample submission of three runs by team 60 would thus consist of:

BC2_PPI_IAS_T60_BC2_PPI_1_T
BC2_PPI_IAS_T60_BC2_PPI_1_F
BC2_PPI_IAS_T60_BC2_PPI_2_T
BC2_PPI_IAS_T60_BC2_PPI_2_F
BC2_PPI_IAS_T60_BC2_PPI_3_T
BC2_PPI_IAS_T60_BC2_PPI_3_F
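
The naming convention can be sketched in a few lines of Python. This is only 
an illustration; the function name run_file_names is hypothetical, and the 
team identifier is assumed to have the form shown in the sample above 
(T60_BC2_PPI).

```python
ROOT = "BC2_PPI_IAS_"


def run_file_names(team_id, run_id):
    """Return the ('T' file, 'F' file) names for one run (hypothetical helper)."""
    if run_id not in (1, 2, 3):
        raise ValueError("run_id must be 1, 2 or 3")
    base = f"{ROOT}{team_id}_{run_id}"
    return base + "_T", base + "_F"


# List the six file names of the sample submission of team 60.
for run_id in (1, 2, 3):
    for name in run_file_names("T60_BC2_PPI", run_id):
        print(name)
```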




7 COMPLETENESS OF PREDICTIONS 
Note that you have to submit predictions for all the entries in the test set; 
otherwise the results are not comparable. This means that you cannot make 
predictions for only a subset of the test set cases, but need to provide a 
prediction for all of them. 
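
A quick completeness check can be sketched in Python as follows. This is not 
the official format check script. It assumes that the PMIDs of the test set 
and of both run files have already been read into lists, and that every test 
set PMID should appear exactly once across the 'T' and 'F' files of a run. 
The function name and the PMIDs in the example are made up for illustration.

```python
def check_completeness(test_pmids, relevant_pmids, non_relevant_pmids):
    """Compare the predicted PMIDs of one run against the test set PMIDs."""
    test = set(test_pmids)
    seen = set()
    duplicated = set()
    for pmid in list(relevant_pmids) + list(non_relevant_pmids):
        if pmid in seen:
            duplicated.add(pmid)  # listed twice, in one file or in both
        seen.add(pmid)
    return {
        "missing": sorted(test - seen),   # test entries without a prediction
        "unknown": sorted(seen - test),   # predictions not in the test set
        "duplicated": sorted(duplicated),
    }


# Made-up PMIDs, for illustration only.
report = check_completeness(
    test_pmids=["101", "102", "103", "104"],
    relevant_pmids=["101", "103"],
    non_relevant_pmids=["102", "103", "999"],
)
print(report)
```

A run is complete under these assumptions when all three lists in the report 
are empty.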



8 PREDICTION RUN FORMAT
Make sure that you follow the prediction format, which consists of 
tab-separated columns containing the following information:

team_id       run_id        sub_task_id   type  rank  pmid

where 
team_id:      corresponds to the assigned team identifier (provided to each team), e.g.  T60_BC2_PPI
run_id:       corresponds to the run id (max. of three runs per team), e.g. 1 or 2 or 3
sub_task_id:  the identifier of this subtask, i.e. 'BC2_PPI_IAS'
type:         prediction of relevance for protein-protein interaction: 'T' or 'F'
rank:         corresponds to the rank of the prediction; ranks must start at 1
pmid:         the PubMed identifier of the prediction.
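
The column layout above can be produced with a short Python sketch like the 
one below. It is an illustration only: the function name format_run is 
hypothetical, the PMIDs in the example are made up, and the input list is 
assumed to be already sorted from the top-ranked to the lowest-ranked entry.

```python
SUB_TASK_ID = "BC2_PPI_IAS"


def format_run(team_id, run_id, pred_type, ranked_pmids):
    """Return the tab-separated lines of one run file as a single string."""
    if pred_type not in ("T", "F"):
        raise ValueError("type must be 'T' or 'F'")
    lines = []
    # Ranks are assigned in list order and start at 1.
    for rank, pmid in enumerate(ranked_pmids, start=1):
        fields = [team_id, str(run_id), SUB_TASK_ID, pred_type,
                  str(rank), str(pmid)]
        lines.append("\t".join(fields) + "\n")
    return "".join(lines)


# Made-up PMIDs, for illustration only.
text = format_run("T60_BC2_PPI", 1, "T", ["111", "222"])
print(text, end="")
```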

Given the exhaustive journal curation strategy used by MINT and IntAct, 
there should be no bias in the initial article selection. Note that these 
databases are not organism-specific, so they curate proteins from a number 
of model organisms. 




9 SUBMISSION DATES:
October 8 2006:  Release of test data for Protein Protein Interaction IAS sub-task 
October 13 2006: Results of Protein-Protein Interaction Task IAS sub-task due
 
The results due date is not tied to a single time zone. The time zone(s) of the
country that participants provided by e-mail when they registered will be considered.
This way we ensure that nobody gains an advantage from their time zone.  



10 SYSTEM DESCRIPTION
In order to assess and compare the relative performance of different approaches, we require the teams to provide a short system description and to submit a short description questionnaire. Both should be provided before October 31. If you are not able to deliver the description in time, or want part of your system description to be anonymous, contact the PPI task organizers: mkrallinger@cnio.es
The system description should be around 800 words. 

11 SUBMISSION PROCESS:
The submissions should be sent as e-mail attachments to the following two e-mail addresses:
mkrallinger@cnio.es
biocreative-ppi-sub-2006@lists.source.net

The submission e-mail should specify the names of the attached files,
the sub-task ID and the team ID, and should include the questionnaire:

1. Attached prediction files: ...
2. Team ID: ....
3. Task ID: BC2_PPI_IAS
4. Number of files in the attachment: ....
5. IAS baseline questionnaire:

1- Did you use training data in addition to the data provided? (Y/N)
2- Did you use the additional noisy training data of TP abstracts? (Y/N)
3- Did you use machine learning (ML) approaches? (Y/N)
4- If you used ML techniques, which ones did you use? (Short method names only)
5- Did you use protein name tagging in your strategy? (Y/N)
6- Did you use NLP technique components (e.g. POS tagging, stemming, shallow parsing)? (Y/N)
7- If you used NLP components, which ones? (Short list of names only)
8- Did you use Bio-NLP components (i.e. NLP tools adapted to the biomedical literature, such as MedPost)? (Y/N)
9- Did you use external lexical resources, such as dictionaries or ontologies? (Y/N)
10- Did you process the text in sentence units? (Y/N)
11- Did you process whole abstracts as units? (Y/N)
12- Did you use regular expressions or pattern matching strategies? (Y/N)

By submitting results, the groups agree to have their submissions made public 
in an anonymous form at the end of the evaluation. 




12 UNABLE TO RETURN RESULTS: 
If you are unable to return results for the test set, please send a short note 
explaining the main reasons for not submitting predictions to: 
mkrallinger@cnio.es




Last time schedule update: 06 October 2006.



© by Martin Krallinger 2006