Drug candidates are tested thoroughly for safety before they enter clinical trials. Gene expression profiling with omics technologies, often applied in combination with cell-based assays or animal tests, has contributed significantly to our understanding of safety-relevant findings for drug candidates.
In this contest, we focus on two types of data that are often encountered in preclinical research: drug-induced gene expression data and pathology records. The goal of the contest is to create algorithms and software that (1) best predict pathology findings from gene expression profiles and (2) most deepen our understanding of the molecular mechanisms underlying the pathology findings.
Participants of the contest are provided with a selected subset of the TG-GATEs database, including drug-response gene expression data and pathology records. Before the contest starts, an introduction to the database and data format will be given.
The contest is divided into 3 stages.
In the first stage, the participants will design machine-learning tools that predict pathology records from drug-response gene expression data. This stage can be called the ‘black-box stage’, because the tool can be a black box in the sense that it relies purely on statistics and does not have to help us better understand the pathology.
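As a rough illustration of what a first-stage tool could look like, the sketch below fits a nearest-centroid classifier in plain Python. It is only one of many possible approaches and not a method prescribed by the contest. The gene values, the two pathology labels, and the function names `fit_centroids` and `predict` are all made up for this example and do not come from TG-GATEs.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def fit_centroids(profiles, labels):
    """Average the expression profiles that share a pathology label."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, profile):
    """Return the label whose centroid is closest to the given profile."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))


# Toy log fold changes for three hypothetical genes (not real contest data).
train_profiles = [
    [2.1, 0.1, -0.3],
    [1.8, -0.2, 0.0],
    [0.1, 0.0, 0.2],
    [-0.2, 0.3, -0.1],
]
train_labels = ["necrosis", "necrosis", "no_finding", "no_finding"]

centroids = fit_centroids(train_profiles, train_labels)
print(predict(centroids, [1.9, 0.0, 0.1]))  # necrosis
```

The model is a black box in the sense used above: it separates the labels by distance alone and says nothing about why a profile is associated with a finding.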
In the second stage, the participants are encouraged to explore the meaning of the gene expression data with tools such as gene-set enrichment analysis and co-expression analysis, and to implement machine-learning strategies based on the statistics derived from such analyses, with the aim of deriving explainable prediction models. This stage can be called the ‘unboxing stage’, because the aim is to gain a better understanding of the mechanisms underlying pathologies through alternative ways of feature selection/extraction based on either bioinformatics-specific or general techniques.
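One simple relative of gene-set enrichment analysis is an over-representation test based on the hypergeometric distribution. The sketch below is a minimal version of that test, assuming a list of selected genes (for example, those most associated with a pathology finding) has already been chosen; it is not the ranking-based GSEA procedure, and the counts in the example are invented.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric test.

    Probability of finding at least `n_overlap` members of a gene set
    among `n_selected` genes drawn without replacement from a background
    of `n_background` genes, `n_in_set` of which belong to the gene set.
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Invented counts: 20 background genes, a gene set of 5,
# 4 selected genes, 3 of which fall in the gene set.
p_value = enrichment_p_value(20, 5, 4, 3)
print(round(p_value, 3))  # 0.032
```

A small p-value for a gene set suggests that the selected genes are not a random draw with respect to that set, which is one way to connect a prediction to a candidate mechanism. In practice, p-values across many gene sets would need correction for multiple testing.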
In the third, final, and optional stage, the participants are encouraged to collect publicly available information on the drugs (such as the information in the ChEMBL and DrugBank databases, see reference) and to improve the machine-learning tools produced in the first stage with this additional information. This stage can be called the ‘making-a-bigger-box stage’, because the aim is to improve the prediction performance using additional information besides gene expression.
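One straightforward way to use such information is to append per-compound annotations to each expression feature vector before training. The sketch below assumes the annotations have already been collected and converted to numbers; the compound names, the descriptor values, and the helper `add_drug_features` are hypothetical and do not reflect the actual ChEMBL or DrugBank formats.

```python
def add_drug_features(expression_rows, drug_annotations, default):
    """Append per-compound annotation values to each expression vector.

    Compounds without an annotation receive `default`, a placeholder
    choice that a real tool would want to handle more carefully.
    """
    combined = []
    for compound, features in expression_rows:
        extra = drug_annotations.get(compound, default)
        combined.append((compound, features + extra))
    return combined


# Hypothetical inputs: two expression features per sample and
# two numeric descriptors per annotated compound.
expression_rows = [
    ("compound_A", [2.1, 0.1]),
    ("compound_B", [0.1, 0.0]),
]
drug_annotations = {"compound_A": [1.0, 3.2]}

rows = add_drug_features(expression_rows, drug_annotations, default=[0.0, 0.0])
for compound, features in rows:
    print(compound, features)
```

The enlarged vectors can then be passed to the same kind of model used in the first stage, which makes it easy to compare performance with and without the extra information.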
A total of 50 points will be given.
For the first stage, a maximum of 15 points will be given. Tools are judged by generalised performance measures (e.g. F1 score, precision, and recall).
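For reference, these measures can be computed from true and predicted labels as in the sketch below. It assumes a binary setting in which a pathology finding is coded as 1 and its absence as 0; the labels are invented, and the exact evaluation protocol of the contest is not specified here.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Invented labels: 1 = finding present, 0 = finding absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

To reflect generalised performance, such measures should be computed on samples that were not used for training, for example with a held-out test set or cross-validation.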
For the second stage, a maximum of 15 points will be given. Tools are judged both by generalised performance measures and by their ability to shed light on the molecular mechanisms of pathology.
For the third, optional stage, a maximum of 10 points will be given. Tools are judged by generalised performance measures and by the additional information used.
A maximum of 10 points will be given for the presentations.
We would like participants to be aware that we reserve the right to publish and discuss the results of this contest, either completely or partially, in a future manuscript for a peer-reviewed journal. Participants contributing to the published results will be invited as co-authors of the publication.
Copyright © 2018 F. Hoffmann-La Roche Ltd. All rights reserved.