1. Am I interpreting the PPI-IAS task correctly in that two separate
ranked lists, one for predicted positives and one for predicted negatives
will be required and evaluated?
You are right in your assumption. Two separate lists, one for predicted
positives and one for predicted negatives, will be required.
2. Do you have any further information on why you are requiring ranked negatives instead
of just the more usual positives?
The reason why we want, in addition to the usual ranked list of positives, a ranked
list of negatives is the curation procedure followed by the databases for this task.
Here is the background information behind this decision.
There are two basic curation strategies followed by database annotators in biology (although
there might be exceptions and hybrid approaches):
Approach A) The biologist screens the 'whole' of PubMed, or a large collection of journals, to detect
annotation-relevant articles, often using keyword searches, for example.
Approach B) The biologist does exhaustive journal curation, meaning that they curate and check all
the articles published by a given journal (usually during a certain period).
In the case of approach B, which is followed by the MINT and IntAct curators for this contest, it is often
of even more practical relevance to know which articles certainly do not contain annotation-relevant
information and should therefore not be checked than to know the most relevant ones.
3. Should the results be returned all in one file, or two separate files, one for positive
predictions and one for negatives? If all in one file, then many of the rank indices will
have two entries, one for the "T" type and one for "F".
They should be returned in two separate files; I think this would be more adequate. The naming convention
for the prediction files will be announced as well.
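Purely as an illustration (the official file format and naming convention have not been announced yet), a minimal sketch of how one might split classifier scores into the two ranked files could look like the following; the field layout, file names, and the 0.5 threshold are assumptions, not the submission format:

def write_ranked_lists(scored_articles, pos_path="ias_positives.txt",
                       neg_path="ias_negatives.txt"):
    """scored_articles: iterable of (article_id, score) pairs, where a
    score >= 0.5 is treated here as a predicted positive (an assumption)."""
    positives = [(aid, s) for aid, s in scored_articles if s >= 0.5]
    negatives = [(aid, s) for aid, s in scored_articles if s < 0.5]

    # Rank positives by decreasing confidence of being relevant,
    # negatives by decreasing confidence of being irrelevant.
    positives.sort(key=lambda x: x[1], reverse=True)
    negatives.sort(key=lambda x: x[1])

    for path, ranked in ((pos_path, positives), (neg_path, negatives)):
        with open(path, "w") as out:
            for rank, (article_id, score) in enumerate(ranked, start=1):
                out.write(f"{article_id}\t{rank}\t{score:.4f}\n")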
4. I've been experimenting with the IAS dataset and I have a concern. It appears that there is a
publication date bias between the positive and negative training data. Many of the articles in the
positive set were published in 1995, 1998, and 1999, and most of the articles in the negative set
were published in 2005 and 2006.
You are absolutely right, and we are aware of this. (An 'ideal' training set would actually take
the publication dates into account, to also consider effects of time-dependent changes in word usage,
and would balance the articles in this sense.)
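As a quick check of this bias, one can tabulate the publication years per class in the training data; a minimal sketch, assuming the records are already available as (year, label) pairs (which is not the distributed format), could be:

from collections import Counter

def year_distribution(records):
    """records: iterable of (publication_year, label) pairs, with label
    'T' for the positive and 'F' for the negative training class.
    Returns one Counter of years per class, so a skew such as
    1995/1998/1999 vs. 2005/2006 becomes visible at a glance."""
    per_class = {"T": Counter(), "F": Counter()}
    for year, label in records:
        per_class[label][year] += 1
    return per_class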
5. Just looking at the string "1999" as a single predictive feature, this identifies 340 positives
correctly with no false positives (P=1.0, R=0.0962, F1=0.1754). The strings "1995" and "1998"
are also highly predictive. Furthermore, there may be an issue with journal bias, as "embo" and
"biol" are also highly predictive strings. As it is, the inclusion of journal and data based
features leads to very high ROC scores in my cross validations. Has any one else reported this?
Will the test data reflect the same proportions?
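For reference, the F1 quoted in the question follows directly from the standard definition F1 = 2PR/(P+R). A small sketch of scoring such a single-string "classifier" could look like this; the (text, label) input structure is an assumption:

def single_token_metrics(documents, token="1999"):
    """documents: iterable of (text, label) pairs with label 'T' or 'F'.
    Predict positive whenever `token` occurs in the text, then report
    precision, recall and F1 of that trivial classifier."""
    tp = fp = fn = 0
    for text, label in documents:
        predicted_positive = token in text
        if predicted_positive and label == "T":
            tp += 1
        elif predicted_positive and label == "F":
            fp += 1
        elif not predicted_positive and label == "T":
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# With the numbers quoted above (P = 1.0, R = 0.0962),
# F1 = 2 * 1.0 * 0.0962 / (1.0 + 0.0962) ≈ 0.1754.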
Note that people should definitely NOT rely on the publication dates or the journal names
as features for their classifier, for several reasons:
1) The resulting classifier would not be useful in real-world applications.
2) We want the test set to be balanced with respect to both the publication dates
and the journal names. This means that neither dates nor journal names would be
discriminative features.
3) The predictions should be based on titles and abstracts, not on the journals or dates (see the sketch after this list).
4) The test data will be derived from journals already included in the training collection,
but the proportion of the journals will be different, and also the publication dates will
be partly different.
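In practice this simply means building the classifier's input text from the title and abstract alone and discarding journal, date and PMID fields entirely. A minimal sketch, where the dictionary keys are only an assumption about how an article record might be stored, not the distributed format:

def feature_text(article):
    """Concatenate only the title and abstract of an article record;
    journal, date and PMID are deliberately ignored so the classifier
    cannot pick up on them. The key names are illustrative only."""
    return " ".join([article.get("title", ""), article.get("abstract", "")])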
6. Can you confirm that the test data will be anonymized as far as actual pmid, journal, and publication
date? That would make it completely clear and enforce that these things are not appropriate features for the
task.
Yes, the test set will be anonymized (in terms of journal, PMID and date); the actual fields will be
tagged with NONE instead, and the article will be provided with a randomly generated article identifier
instead of the PMID.
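Purely to illustrate the NONE tagging described above (the actual test-set layout is not specified here, so the field names and the example identifier below are assumptions):

# Hypothetical example of an anonymized test record: journal, PMID and
# date are replaced by the sentinel 'NONE', and a randomly generated
# identifier stands in for the PMID. Field names are assumptions.
anonymized_record = {
    "article_id": "IAS_TEST_0001",   # randomly generated, not a PMID
    "pmid": "NONE",
    "journal": "NONE",
    "date": "NONE",
    "title": "...",                  # title text provided as usual
    "abstract": "...",               # abstract text provided as usual
}

def usable_fields(record):
    """Return only the fields that are not masked with 'NONE'."""
    return {k: v for k, v in record.items() if v != "NONE"}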
7. Are there cases in the test set where the abstract at first sight appears interaction relevant,
but the interactions it discusses do not actually appear in the rest of the full-text article?
The annotators encountered some, very few, cases of such misleading abstracts. They were excluded from
the final test set.
Last update of this page: 21 September 2006