Large, complex datasets

Reliable and reproducible analysis of large, complex datasets

Theme: Translational data science

In this workstream, we use machine learning and other state-of-the-art methods to analyse large-scale linked electronic health records to inform translational research. Translational research takes the results of early-stage research and applies them to humans.

Complex, high-dimensional data, such as digital images, can’t be analysed efficiently using standard methods. Machine learning is being rapidly adopted to analyse medical images, but there is a shortage of suitably labelled data for this purpose. This leads to poor accuracy and results that can’t be reproduced.

Automated, scalable methods can overcome issues such as missing data, misclassification and confounding factors. All of these issues can bias an analysis and give misleading results.

We are developing state-of-the-art methods to address bias in machine learning, alongside a large, labelled dataset of images for evaluating machine learning.

We are also developing training in machine learning, including consideration of ethical issues. This work will benefit all the Bristol BRC themes.