Large, complex datasets

Reliable and reproducible analysis of large, complex datasets

Theme Translational data science

In this workstream, we use machine learning and other state-of-the-art methods to analyse large-scale linked electronic health records to inform translational research. Translational research takes the results of early-stage research and applies them to humans.

Complex data with many dimensions, such as digital images, can’t be analysed efficiently using standard methods. Machine learning is being rapidly adopted to analyse medical images, but suitably labelled data for this purpose is scarce. This leads to poor accuracy and results that can’t be reproduced.

Automated, scalable methods can overcome issues such as missing data, misclassification and confounding factors. All these issues can bias analysis, giving misleading results.

We are developing state-of-the-art methods to address bias in machine learning, alongside a large, labelled dataset of images for evaluating machine learning.

We are also developing training in machine learning, including consideration of ethical issues. This work will benefit all the Bristol BRC themes.

Improving decisions on what to focus on in research using large datasets

Research using de-personalised data from electronic health records is increasingly common. Electronic…

Theme Translational data science

Workstream Large, complex datasets

Exploring inflammation as a driver for post-operative complications

One in seven patients develop a serious medical problem after surgery. These types of complications…

Theme Translational data science

Workstream Omics for prediction and prognosis

Investigating new approaches to drug development using human genetics

Developing new drugs is an important part of improving our ability to treat disease. To…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention

Exploring the link between obesity and heart failure using genetics

Heart failure is a condition that develops when the heart is unable to pump blood around…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention

Lung development in early life and respiratory diagnosis and treatment

More than 17 per cent of deaths around the world can be attributed to respiratory…

Themes Respiratory disease; Translational data science

Workstream Exacerbation prediction and aerosol emissions

Can we use DNA methylation to predict disease in diverse populations?

DNA methylation is a process during which methyl groups become attached to parts of our…

Theme Translational data science

Workstream Omics for prediction and prognosis

South West Secure Data Environment

Secure Data Environments (SDEs) are online platforms for analysing health and social care data for…

Theme Translational data science

Workstream Clinical informatics platforms

Preventing cardiovascular events in stroke patients

Having a stroke means you are more likely to experience a subsequent cardiovascular event. Cardiovascular…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention

Exploring the link between genes and cognitive decline

Cognitive decline in older adults refers to the difficulties someone may experience later in life…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention

Do ethnicity and coexisting health conditions impact high-risk diabetes?

About a third of people diagnosed with type 2 diabetes have very high blood sugar…

Theme Translational data science

Workstreams Clinical informatics platforms; Large, complex datasets

Handling missing data in large electronic healthcare record datasets

Electronic healthcare records (EHRs) are created when healthcare professionals record information about the health of…

Theme Translational data science

Workstream Large, complex datasets

Biomarkers for screening and diagnosing lung cancer

In the UK, only 15 per cent of people diagnosed with lung cancer will still…

Theme Translational data science

Workstream Omics for prediction and prognosis

Treatment resistance and drug side effects in schizophrenia

Schizophrenia is a mental health condition where people may see, hear or believe things that…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention

Exploring how obesity influences cancer survival

Evidence from different studies suggests that obesity or body mass index (BMI) might play a…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention

Using biomarkers and machine learning to predict antidepressant resistance

Around half of patients with depression don’t improve after taking antidepressants. Clinicians need to…

Theme Translational data science

Workstream Omics for prediction and prognosis

Can DNA methylation biomarkers predict whether pleural effusion is caused by cancer?

Pleural effusion, where fluid builds up in the cavity around the lungs, can develop…

Theme Translational data science

Workstream Omics for prediction and prognosis

Using DNA methylation biomarkers to understand Parkinson’s disease severity and progression

The Biogen Tel Aviv Parkinson Project (BeatPD) looks in-depth at clinical and genetic information…

Theme Translational data science

Workstream Omics for prediction and prognosis

Data-driven approaches to drug target prioritisation

Despite more money going towards developing drugs, the success rate of getting new drugs to…

Theme Translational data science

Workstream Genetic evidence to prioritise intervention