Developing a reproducible data pipeline using stroke data

Theme: Translational data science

Workstream: Clinical informatics platforms

Status: This project is ongoing

Routinely collected health and care data is a vital tool in research. Data pipelines move data securely from health and care organisations into Secure Data Environments, where trusted researchers can safely access it.

Project aims

We want to develop a ‘blueprint’ data pipeline that could be re-used by NHS trusts for many projects. 

The pipeline will allow different types of data, from medical images to treatment codes, for a range of conditions, to move from health and care institutions into a Secure Data Environment. Pipelines can automatically quality-check and depersonalise data and update as new data becomes available. 
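As a rough illustration of the three automated steps described above (quality-checking, depersonalising, and updating as new data arrives), here is a minimal Python sketch. It is not the project's actual pipeline. The field names, the keyed-hash pseudonymisation, and the functions passes_quality_check, depersonalise and run_pipeline are all hypothetical, chosen only to make the idea concrete. Depersonalising real medical images is considerably more involved than this record-level example.

```python
import hashlib
import hmac

# Hypothetical field names, used only for illustration.
REQUIRED_FIELDS = ("patient_id", "scan_date", "treatment_code")
DIRECT_IDENTIFIERS = ("patient_name", "address")


def passes_quality_check(record):
    """Return True when every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def depersonalise(record, secret_key):
    """Return a copy without direct identifiers and with a pseudonymous patient_id.

    The pseudonym is a keyed hash (HMAC-SHA256), so the same patient always
    maps to the same pseudonym, but the mapping cannot be recomputed
    without the secret key.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned["patient_id"] = digest[:16]
    return cleaned


def run_pipeline(incoming, already_seen, secret_key):
    """Process a batch of records and return (accepted, rejected).

    Records that fail the quality check are rejected. Records whose
    (pseudonym, scan_date) pair is already in `already_seen` are skipped,
    so re-running the pipeline only adds new data.
    """
    accepted, rejected = [], []
    for record in incoming:
        if not passes_quality_check(record):
            rejected.append(record)
            continue
        cleaned = depersonalise(record, secret_key)
        key = (cleaned["patient_id"], cleaned["scan_date"])
        if key in already_seen:
            continue
        already_seen.add(key)
        accepted.append(cleaned)
    return accepted, rejected
```

Calling run_pipeline repeatedly with the same already_seen set and secret key mimics a scheduled update: records handled in an earlier run are skipped and only new ones are passed on. In practice the set of seen records and the key would need to be stored securely between runs, which this sketch leaves out.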

We will focus on an existing pipeline that takes stroke-related medical images from North Bristol NHS Trust and stores them in the University of Bristol’s Research Data Storage Facility (RDSF). The data is used to train and test machine learning methods designed to help clinicians make decisions about patient care.

This pipeline will be the basis for our blueprint. 

What we hope to achieve

By developing a blueprint data pipeline, we hope to streamline the process of moving research data into Secure Data Environments. Ultimately, this could enable research to happen more quickly and efficiently.