Modelling and inference in single-cell RNA sequencing

Dr Aden Forrow
Research Fellow 2018
University of Oxford

All of one individual’s cells share the same DNA, yet they perform very different functions. The differences lie in gene expression: the process by which DNA is transcribed into RNA and translated into functional proteins. The RNA present in a cell is therefore intimately connected to the cell’s behaviour. Counts of RNA molecules reveal what a cell is doing, what other cells it resembles, and what those cells might do in the future. RNA was first measured in bulk by extracting it from a whole tissue, which gives an average of expression levels across the sample. More recent techniques allow these measurements to be made at the single-cell level, producing counts of individual RNA molecules within individual cells.

Such single-cell analysis enables fine-grained distinctions between cell types within the same tissue, with extensive applications across biology and medicine. Gene expression signatures can, for example, be used to identify pluripotent stem cells, or malignant tumour cells and their precursors. A better understanding of the early signs of cancer could lead to earlier, more accurate diagnosis and better treatment. However, despite significant progress in developing methods for single-cell RNA sequencing, neither the underlying experimental procedures nor the inference methods they require are yet sufficiently well understood. I propose to work on both problems in parallel.

My first goal is a deep understanding of the possible experimental protocols, which will enable careful optimization for particular goals. In particular, I will study the sources of measurement error by building models that simulate experimental protocols, and then use those models to suggest optimal experimental parameters. This mathematical work will build on my current collaboration with the group of Professor Alex Shalek (MIT, Broad Institute), which is refining state-of-the-art methods for high-throughput single-cell sequencing. The second problem involves tailoring existing methods from statistics and machine learning to the particular challenges of single-cell RNA sequencing, including the high variability in measurement accuracy and biologists’ desire for clear, mechanistic explanations of new discoveries. Solving these mathematical and algorithmic problems will allow researchers in the field to draw stronger conclusions with greater confidence and accuracy.