1 TEST DATA DOWNLOAD:
First of all, make sure you can download the test data properly. The download
FTP link will be sent to the team contact e-mail address you provided.
2 TEST DATA CONTENT:
1) bc2_ppi_ias_abstract_test.txt
The actual test data file containing the test data entries, in a format compatible with the training set.
2) bc2_ppi_ias_test_readme_v1.txt
The readme file of this sub-task
3) bc2_ppi_ias_pmid_test.txt
The PMIDs of the IAS test set
4) formatcheck_bc2_ppi_ias_V01.py
The submission format check script:
Example input for team 60, run 1:
./formatcheck_bc2_ppi_ias_V01.py --t BC2_PPI_IAS_T60_BC2_PPI_1_T --f BC2_PPI_IAS_T60_BC2_PPI_1_F --i bc2_ppi_ias_pmid_test.txt
Where:
--t refers to the entries predicted as PPI relevant
--f refers to the entries predicted as not PPI relevant
--i refers to the test set PMID check file, i.e. bc2_ppi_ias_pmid_test.txt
3 TEAM IDENTIFIER:
Make sure that you know your team identifier and team contact e-mail. You will
need the team identifier for your predictions.
4 NUMBER OF RUNS:
After running your system and making the predictions, make sure that you do
not submit more runs than allowed. Each team can submit a total of three
runs.
5 PREDICTION RUN FILES:
A single run consists of two files: one containing the ranked list of entries
predicted as relevant (T) and one containing the ranked list of entries
predicted as non-relevant (F). In the first file the predictions are ranked by
relevance for physical protein interaction curation, while in the second file
they are ranked by non-relevance (i.e. articles that should not be useful for
deriving protein interaction annotations).
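The two ranked lists of a run could be derived from classifier output roughly
as sketched below. This is only an illustration: the per-abstract score, the
0.5 threshold, the function name split_and_rank and the PMIDs are all
assumptions made for the example and are not prescribed by the task.

```python
def split_and_rank(scores, threshold=0.5):
    """Split PMIDs into a ranked relevant list and a ranked non-relevant list.

    scores: dict mapping PMID -> hypothetical relevance score in [0, 1].
    threshold: hypothetical decision cut-off (an assumption, not a task rule).
    """
    relevant = [pmid for pmid, s in scores.items() if s >= threshold]
    non_relevant = [pmid for pmid, s in scores.items() if s < threshold]
    # Relevant list: most relevant entry first.
    relevant.sort(key=lambda pmid: scores[pmid], reverse=True)
    # Non-relevant list: most clearly non-relevant entry (lowest score) first.
    non_relevant.sort(key=lambda pmid: scores[pmid])
    return relevant, non_relevant


# Placeholder PMIDs and scores, for illustration only.
example_scores = {"1001": 0.92, "1002": 0.10, "1003": 0.65, "1004": 0.40}
t_list, f_list = split_and_rank(example_scores)
```

Every PMID ends up in exactly one of the two lists, so the two files together
cover all scored entries.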
6 NAMING OF THE RUN FILES:
In order to identify your prediction runs unambiguously, please follow the
naming convention we propose for this sub-task. The names of the prediction
run files have the root name BC2_PPI_IAS_ followed by your team identifier,
an underscore, the run number, an underscore, and 'T' or 'F' for the entries
predicted as relevant and non-relevant respectively.
A sample prediction consisting of three runs of team 60 would thus consist of:
BC2_PPI_IAS_T60_BC2_PPI_1_T
BC2_PPI_IAS_T60_BC2_PPI_1_F
BC2_PPI_IAS_T60_BC2_PPI_2_T
BC2_PPI_IAS_T60_BC2_PPI_2_F
BC2_PPI_IAS_T60_BC2_PPI_3_T
BC2_PPI_IAS_T60_BC2_PPI_3_F
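As a small sketch, the file names listed above can be generated
programmatically. The helper name run_file_name is hypothetical; the root
name, the limit of three runs and the 'T'/'F' suffixes are taken from this
readme.

```python
def run_file_name(team_id, run_id, pred_type):
    """Build a run file name following the convention described above."""
    if run_id not in (1, 2, 3):
        raise ValueError("run_id must be 1, 2 or 3")
    if pred_type not in ("T", "F"):
        raise ValueError("pred_type must be 'T' or 'F'")
    return f"BC2_PPI_IAS_{team_id}_{run_id}_{pred_type}"


# All six file names for three runs of team 60.
names = [run_file_name("T60_BC2_PPI", run, kind)
         for run in (1, 2, 3)
         for kind in ("T", "F")]
```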
7 COMPLETENESS OF PREDICTIONS
Note that you have to submit predictions for all the entries in the test set,
otherwise results are not comparable. This means that you cannot make
predictions for only a subset of the test set entries; you need to provide a
prediction for every one of them.
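Before submitting, a quick self-check along the following lines may help
confirm that a run covers the whole test set. It is only a sketch: the
function name check_completeness is hypothetical, the PMIDs are placeholders,
and treating a PMID that appears in both files as an error is an assumption.
The provided format check script remains the authoritative check.

```python
def check_completeness(test_pmids, t_pmids, f_pmids):
    """Compare the PMIDs of one run against the test set PMIDs."""
    test = set(test_pmids)
    predicted_t, predicted_f = set(t_pmids), set(f_pmids)
    predicted = predicted_t | predicted_f
    return {
        # Test set entries without any prediction.
        "missing": sorted(test - predicted),
        # Predicted PMIDs that are not part of the test set.
        "unexpected": sorted(predicted - test),
        # PMIDs listed as both relevant and non-relevant (assumed invalid).
        "in_both": sorted(predicted_t & predicted_f),
    }


# Placeholder PMIDs, for illustration only.
report = check_completeness(["1", "2", "3"], ["1"], ["2", "4"])
```

A run is complete when all three lists in the report are empty.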
8 PREDICTION RUN FORMAT
Make sure that you follow the prediction format, consisting of tab-separated
columns containing the following information:
team_id run_id sub_task_id type rank pmid
where
team_id: corresponds to the assigned team identifier (provided to each team), e.g. T60_BC2_PPI
run_id: corresponds to the run id (max. of three runs per team), e.g. 1 or 2 or 3
sub_task_id: the identifier of this subtask, i.e. 'BC2_PPI_IAS'
type: prediction of relevance for protein-protein interaction: 'T' or 'F'
rank: corresponds to the rank of the prediction, must start with 1
pmid: the PubMed identifier of the prediction.
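The column layout described above can be produced with a short helper such as
the one below. This is a sketch under assumptions: the function name
format_run_lines and the PMIDs are made up for the example, and the ranks are
assumed to restart at 1 within each file.

```python
def format_run_lines(team_id, run_id, pred_type, ranked_pmids,
                     sub_task_id="BC2_PPI_IAS"):
    """Return one tab-separated line per prediction, in rank order."""
    lines = []
    # Ranks start at 1, as required by the format description.
    for rank, pmid in enumerate(ranked_pmids, start=1):
        fields = [team_id, str(run_id), sub_task_id, pred_type,
                  str(rank), str(pmid)]
        lines.append("\t".join(fields))
    return lines


# Placeholder PMIDs, already sorted by relevance.
lines = format_run_lines("T60_BC2_PPI", 1, "T", ["1001", "1003"])
```

Joining the returned lines with newline characters gives the content of one
run file.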
Given the exhaustive journal curation strategy used by MINT and IntAct,
there should be no bias of initial article selection. Note that these
databases are not organism specific, so they curate proteins from a number
of model organisms.
9 SUBMISSION DATES:
October 8 2006: Release of test data for Protein Protein Interaction IAS sub-task
October 13 2006: Results of Protein-Protein Interaction Task IAS sub-task due
The results due date is not tied to a specific time zone. The time zone(s) of the
country given by the participants in their registration e-mail will be considered.
This way we ensure that nobody has an advantage related to their time zone.
10 SYSTEM DESCRIPTION
In order to assess and compare the relative performance of different approaches, we require the teams to provide a short system description and to submit a short description questionnaire, both of which should be provided before October 31. If you are not able to deliver the description in time, or want part of your system description to remain anonymous, contact the PPI task organizers: mkrallinger@cnio.es
The system description should be around 800 words.
11 SUBMISSION PROCESS:
The submissions should be sent by e-mail to the following two e-mail addresses:
mkrallinger@cnio.es
biocreative-ppi-sub-2006@lists.source.net
as attachments. The submission e-mail should specify the names of the files included in the attachment,
the sub-task ID as well as the team ID.
1. Attached prediction files: ...
2. Team ID: ....
3. Task ID: BC2_PPI_IAS
4. Number of files in the attachment: ....
5. IAS baseline questionnaire:
1- Did you use training data in addition to the provided set? (Y/N)
2- Did you use the additional noisy training data of TP abstracts ? (Y/N)
3- Did you use machine learning (ML) approaches? (Y/N)
4- In case you used ML techniques, which ones did you use ? (Just short method names)
5- Did you use protein name tagging for your strategy? (Y/N)
6- Did you use NLP technique components (e.g. POS tagging, stemming, shallow parsing)? (Y/N)
7- In case you used NLP components, which ones ? (Only short list of names)
8- Did you use Bio-NLP components (i.e. NLP tools adapted to the biomedical literature, such as MedPost)? (Y/N)
9- Did you use external lexical resources, such as dictionaries or ontologies? (Y/N)
10- Did you do processing using sentence units ? (Y/N)
11- Did you do processing using whole abstracts as units? (Y/N)
12- Did you use regular expression or pattern matching strategies? (Y/N)
By submitting results, the groups agree to have their submissions made public
in an anonymous form at the end of the evaluation.
12 UNABLE TO RETURN RESULTS:
If you are unable to return results for the test set, please send a short note
justifying the main reasons for not submitting predictions to:
mkrallinger@cnio.es
Last time schedule update: 06 October 2006.