Large, complex datasets

Reliable and reproducible analysis of large, complex datasets

In this workstream, we use machine learning and other state-of-the-art methods to analyse large-scale linked electronic health records to inform translational research. Translational research takes the results of early-stage research and applies them to humans.

Complex, high-dimensional data, such as digital images, can’t be analysed efficiently using standard methods. Machine learning is being rapidly adopted to analyse medical imaging, but suitably labelled data for this purpose is scarce. This leads to poor accuracy and results that can’t be reproduced.

Automated, scalable methods can overcome issues such as missing data, misclassification and confounding factors. All these issues can bias analysis, giving misleading results.

We are developing state-of-the-art methods to address bias in machine learning, alongside a large, labelled dataset of images for evaluating machine learning methods.

We are also developing training in machine learning, including consideration of ethical issues. This work will benefit all the Bristol BRC themes.

Predicting mental illness risk using health records

Guidelines for using multiple imputation without bias in studies

Improving decisions on what to focus on in research using large datasets

Combining data and AI to predict heart problems following Covid

Do ethnicity and coexisting health conditions impact high-risk diabetes?

Handling missing data in large electronic healthcare record datasets