The MR Data Challenge 2019

The challenge

The MR Data Challenge will engage conference participants in exploring and developing innovative approaches to causal inference using an example data set. Data Challenge participants are asked to use all or part of the data set to illustrate new methodology, or to compare or explain existing methods, as part of an oral or poster presentation. Participants may enter as individuals or in teams of up to four people.

The example data set is a rich source of genetic association data published in the research paper:

Kettunen et al. “Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA”, Nature Communications (2016) 7:11122.

At a glance, the data set comprises information on 150 SNPs and their associations with:

  • 118 lipid fraction traits (including HDL and LDL cholesterol)
  • 7 health outcomes (including type II diabetes and stroke)

Submitting to the data challenge

The MRChallenge2019 R package providing all data and documentation can be downloaded from here.

Participants are asked to submit the knitted output from an R Markdown (.Rmd) file, in docx, pdf, html, or nb.html format, containing the following sections:

  1. Participants: Name(s) and affiliation(s) of entrant(s)
  2. Motivation: What is your research question? For example, ‘Which lipid fraction(s) are the most important drivers of type II diabetes (T2D) risk, and what are their estimated causal effects?’
  3. Data: What genetic, exposure and outcome data have you used?
  4. Analysis methods: A non-technical description of the methodological approach taken in your analysis.
  5. Results: Answers to your research question, including tabular and graphical summaries.
  6. Technical appendix: Further technical details underpinning the approach taken.
  7. Software: R code used to run the analysis.
  8. References: References and links to additional technical reports of relevance to the work.
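
One way to lay out the eight sections above is a skeleton .Rmd file along the following lines. This is only an illustrative sketch, not an official template: the title, chunk labels and output format are placeholders, and the `library(MRChallenge2019)` call assumes the package has already been installed.

````markdown
---
title: "MR Data Challenge 2019 entry"   # placeholder title
output: html_document                   # or word_document, pdf_document, html_notebook
---

```{r setup, include=FALSE}
# Assumption: the MRChallenge2019 package is already installed
library(MRChallenge2019)
```

# 1. Participants

# 2. Motivation

# 3. Data

# 4. Analysis methods

# 5. Results

# 6. Technical appendix

# 7. Software

```{r all-code, ref.label=knitr::all_labels(), echo=TRUE, eval=FALSE}
# Reprints the code from every chunk in the document without re-running it
```

```{r session}
sessionInfo()   # records the R version and package versions used
```

# 8. References
````

Knitting a file like this, for example with `rmarkdown::render("entry.Rmd")` (the file name is hypothetical), produces output in the format named in the header. Collecting all chunks in the Software section is one possible way to meet the "R code used to run the analysis" requirement; pasting the code in by hand would serve equally well.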

Please submit your entry by the closing date of 7th June 2019. Entrants must be registered conference attendees. There are no other entry requirements or costs.

If you have any questions about the data challenge, please contact Jack Bowden.

Data challenge conference session

A plenary session on Friday 19th July within the Mendelian randomization conference will showcase the submitted analyses. A key aim of the session is to bring together methodologists and statisticians with experts from epidemiology and the medical and biological sciences to comment on and debate the results. The session will include presentations from subject matter experts on state-of-the-art lipids research; a meta-level overview of all the analyses attempted; quickfire presentations from individuals and teams on their analyses; and a debate on the strengths and limitations of the different methodological approaches.


Awards will be made under the following categories:

  • Best report
  • Best oral presentation
  • Best data visualization
  • Best analysis tool


  • Jack Bowden (MRC Integrative Epidemiology Unit)
  • Wes Spiller (MRC Integrative Epidemiology Unit)
  • Verena Zuber (MRC Biostatistics Unit and Imperial College)
  • Gibran Hemani (MRC Integrative Epidemiology Unit)
  • Chin Yang Shapland (MRC Integrative Epidemiology Unit)
  • Eleanor Sanderson (MRC Integrative Epidemiology Unit)
  • Mika Ala-Korpela (Baker Heart and Diabetes Institute)