Background

Drug candidates are tested thoroughly for safety before they enter clinical trials. Gene expression profiling with omics technologies, often applied in combination with cell-based assays or animal tests, has contributed significantly to our understanding of safety-relevant findings for drug candidates.

In this contest, we focus on two types of data that are often encountered in preclinical research: drug-induced gene expression data and pathology records. The goal of the contest is to create algorithms and software that (1) best predict pathology findings from gene expression profiles and (2) most deepen our understanding of the molecular mechanisms underlying those findings.


Format of the contest

Participants of the contest are provided with a selected subset of the TG-GATEs database, including drug-response gene expression data and pathology records. Before the contest starts, an introduction to the database and data format will be given.

The contest is divided into 3 stages.

  1. In the first stage, the participants will design machine-learning tools that predict pathology records from drug-response gene expression data. This stage can be called the ‘black-box stage’, because the tool may be a black box: it can rely on pure statistics and does not have to help us better understand the pathology.

  2. In the second stage, the participants are encouraged to explore the meaning of the gene expression data with tools such as gene-set enrichment analysis and co-expression analysis, and to implement machine-learning strategies based on the statistics derived from such analyses, with the aim of deriving explainable prediction models. This stage can be called the ‘unboxing stage’, because the aim is to gain a better understanding of the mechanisms underlying pathologies through alternative ways of feature selection or extraction, based on either bioinformatics-specific or general techniques.

  3. In the third and final stage, which is optional, the participants are encouraged to collect publicly available information on the drugs (such as the information in the ChEMBL and DrugBank databases; see References) and to improve the machine-learning tools produced in the first stage with this additional information. This stage can be called the ‘making-a-bigger-box stage’, because the aim is to improve the prediction performance using additional information besides gene expression.
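
To make the first two stages more concrete, the following is a minimal Python sketch, not a prescribed method. It summarizes each expression profile as one score per gene set (a plain mean over member genes, which is a simple stand-in and not gene-set enrichment analysis itself) and then predicts a pathology label with a nearest-centroid rule. The expression values, labels, gene-set names and memberships, and function names are all invented for illustration; they do not come from TG-GATEs or from the contest data format.

```python
from statistics import mean

# Hypothetical toy data: one dict of log2 fold changes per sample,
# paired with an invented pathology label. Not taken from TG-GATEs.
train = [
    ({"Cyp1a1": 2.1, "Nqo1": 1.8, "Alb": -0.1, "Apoa1": 0.0}, "finding"),
    ({"Cyp1a1": 1.7, "Nqo1": 2.2, "Alb": 0.2, "Apoa1": -0.2}, "finding"),
    ({"Cyp1a1": 0.1, "Nqo1": -0.2, "Alb": 0.0, "Apoa1": 0.1}, "no_finding"),
    ({"Cyp1a1": -0.3, "Nqo1": 0.2, "Alb": -0.1, "Apoa1": 0.3}, "no_finding"),
]

# Hypothetical gene sets; real ones would come from a curated collection.
gene_sets = {
    "xenobiotic_response": ["Cyp1a1", "Nqo1"],
    "plasma_proteins": ["Alb", "Apoa1"],
}


def gene_set_scores(profile, gene_sets):
    """Return one feature per gene set: the mean change of its member genes."""
    return [mean(profile[g] for g in genes) for genes in gene_sets.values()]


def fit_centroids(samples, gene_sets):
    """Average the gene-set scores of all training samples sharing a label."""
    by_label = {}
    for profile, label in samples:
        by_label.setdefault(label, []).append(gene_set_scores(profile, gene_sets))
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(profile, centroids, gene_sets):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    x = gene_set_scores(profile, gene_sets)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = fit_centroids(train, gene_sets)
new_sample = {"Cyp1a1": 1.9, "Nqo1": 1.5, "Alb": 0.1, "Apoa1": 0.0}
prediction = predict(new_sample, centroids, gene_sets)
print(prediction)
```

Because every feature corresponds to a named gene set, the centroids can be read directly: in this toy example the two classes differ mainly in the hypothetical xenobiotic_response score. For the optional third stage, drug-level descriptors could be appended to the same feature vector, although which descriptors are useful is left open here.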


Evaluation of the results

A total of 50 points will be given.


Future publication of the results

We would like participants to be aware that we reserve the right to publish and discuss the results of this contest, either completely or partially, in a future manuscript for a peer-reviewed journal. Participants contributing to the published results will be invited as co-authors of the publication.


References

Igarashi, Yoshinobu, Noriyuki Nakatsu, Tomoya Yamashita, Atsushi Ono, Yasuo Ohno, Tetsuro Urushidani, and Hiroshi Yamada. “Open TG-GATEs: A Large-Scale Toxicogenomics Database.” Nucleic Acids Research 43, no. D1 (January 28, 2015): D921–27.

Zhang, J. D., N. Berntenis, A. Roth, and M. Ebeling. “Data Mining Reveals a Network of Early-Response Genes as a Consensus Signature of Drug-Induced in Vitro and in Vivo Toxicity.” The Pharmacogenomics Journal 14, no. 3 (June 2014): 208–16.

Sutherland, Jeffrey J., Robert A. Jolly, Keith M. Goldstein, and James L. Stevens. “Assessing Concordance of Drug-Induced Transcriptional Response in Rodent Liver and Cultured Hepatocytes.” PLOS Computational Biology 12, no. 3 (March 30, 2016): e1004847.

Thiel, Christoph, Henrik Cordes, Lorenzo Fabbri, Hélène Eloise Aschmann, Vanessa Baier, Ines Smit, Francis Atkinson, Lars Mathias Blank, and Lars Kuepfer. “A Comparative Analysis of Drug-Induced Hepatotoxicity in Clinically Relevant Situations.” PLOS Computational Biology 13, no. 2 (February 2, 2017): e1005280.

The ChEMBL database

The DrugBank database (freely available for academic use)


Copyright © 2018 F. Hoffmann-La Roche Ltd. All rights reserved.