Barts Health Datasets
Barts Health NHS Trust makes several sets of health data available to external organisations through the Barts Health Data Platform (BHDP). As a rule, this data stays in the secure data environment (SDE) of the BHDP unless we can be certain it can be made anonymous. Any data leaving the SDE is strictly monitored and controlled, and removal may not be allowed. The different types of data are described below:
Structured Clinical Data
This type of dataset includes coded information about a patient’s diagnoses, treatments, procedures, medicines, test results, and measurements such as blood pressure, pulse rate, body temperature, and oxygen levels. Barts Health collects around 2 million new data points every day and covers a population of close to 3 million patients, over 40% of whom are non-white.
Free text
This covers all documents for patients in Barts Health, including scanned letters and reports, which are available for analysis within the BHDP. Without the appropriate approvals or consent, patient names and identifying information will be removed, and the resulting data can only be analysed within the BHDP. With over 170 million documents already in our records and hundreds of thousands more being added each day, this provides a unique resource for analysis.
OMOP clinical data
This is a type of Structured Clinical dataset which has been standardised into the OMOP form. This makes it easier to analyse data across multiple hospitals. Over 60,000 distinct concepts have been standardised in OMOP.
Imaging
Over 13 million medical images (DICOM) are available from radiology (X-rays and MRIs) with reports explaining what the images are showing. The patient's name and identifiable information are automatically removed from the images and reports. Digital pathology with reports and findings is expected to be added soon.
Maternity
There are records of over 160,000 births, each linking the mother’s and the baby’s health records. This also includes data from the Neonatal Intensive Care Unit. Patient names and identifiable information are removed before the data is used for analysis inside the BHDP.
Geographic
All addresses over the course of patient care are mapped to geographical areas (LSOA and UPRN) and linked with deprivation (poverty levels) and air quality data. This allows patient data to be tracked over many years and compared with data on up to 15 different pollutants, to see how these factors affect a person's health.
Blood Counts
| Data Set | Identified | Anonymous |
|---|---|---|
| Structured Clinical Data | Yes | Yes |
| Free text | Yes | Yes (if in BHDP) |
| OMOP Clinical Data | Yes | Yes |
| Imaging | Yes | Yes |
| Maternity | Yes | Yes |
Ethical approvals
All requests to access data from Barts Health must go through an application and review process. The review process is managed by the Barts Health Data Access Committee, which includes members from the technical, commercial, clinical, research, nursing, and information and research governance teams, as well as public contributors.
When access to data is granted, it will be limited to the minimum amount necessary for the request. If the request includes a large amount of data or data with sensitive information, a more detailed review will be needed.
If any personal data is required, then a complete ethical review process is needed, which may include individual patient consent for the data set. Further approval or sign-off is needed for data where personal details have been removed.
Some structured data sets can be made anonymous by removing fields with known patient details from the record. Unstructured free text fields can have specific patient-related information removed, but this may still leave related identifiable details about the patient (e.g., family) in the text. Such data would still need careful ethical review and potentially Caldicott Guardian sign-off, and would not normally be allowed outside the BHDP.
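To illustrate the idea of removing fields with known patient details from a structured record, here is a minimal Python sketch. The field names and the record are hypothetical, since the BHDP schema is not described here, and dropping these fields alone does not guarantee that a data set is anonymous; that is decided through the review process described above.

```python
# Hypothetical field names; the actual BHDP schema is not described in this text.
IDENTIFYING_FIELDS = {"nhs_number", "name", "address", "date_of_birth"}


def drop_identifying_fields(record, identifying_fields=IDENTIFYING_FIELDS):
    """Return a copy of the record without the listed identifying fields."""
    return {
        field: value
        for field, value in record.items()
        if field not in identifying_fields
    }


# Made-up example record, for illustration only.
record = {
    "nhs_number": "9434765919",
    "name": "Example Patient",
    "systolic_bp": 128,
    "pulse_rate": 72,
}

deidentified = drop_identifying_fields(record)
print(deidentified)  # {'systolic_bp': 128, 'pulse_rate': 72}
```

The original record is left unchanged, so the identified copy can stay inside the secure environment while only the reduced copy is considered for wider use.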
Data linkage
How we link data
We use various methods for data linkage:
• Data Transfers: We can either send Barts Health data to a partner's secure environment or receive data from partners to link with our datasets.
• Federated Analysis: In some cases, we analyse datasets separately without moving them, then combine the results using statistical methods.
• Information Governance: Before linking data, we ensure that all necessary approvals are in place. This includes checking whether the identifiers (like NHS numbers) are already authorised for use. We assess the sensitivity of the data to determine how safely it can be shared. For example, sharing a list of NHS numbers is generally low risk, while sharing names and addresses is much more sensitive.
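As a simple illustration of federated analysis, the Python sketch below has each site compute only a summary (a count and a mean) and then combines the summaries into a pooled mean without moving any patient-level rows. The class and function names and the blood pressure values are assumptions made for this example; real federated studies typically use more elaborate statistical methods.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate statistics a site shares instead of patient-level rows."""
    n: int
    mean: float


def summarise(values):
    """Run locally at each site: reduce raw values to a count and a mean."""
    return SiteSummary(n=len(values), mean=sum(values) / len(values))


def pooled_mean(summaries):
    """Combine per-site summaries into one sample-size-weighted mean."""
    total_n = sum(s.n for s in summaries)
    if total_n == 0:
        raise ValueError("no observations to pool")
    return sum(s.n * s.mean for s in summaries) / total_n


# Made-up systolic blood pressure readings held at two separate sites.
site_a = summarise([120.0, 130.0, 125.0])
site_b = summarise([110.0, 130.0])

pooled = pooled_mean([site_a, site_b])
print(pooled)  # 123.0
```

Only the `SiteSummary` objects cross the boundary between sites, which is the property that makes this approach attractive when data cannot be moved.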
Linkage methods
We have several approaches to linking data:
• Partner Matching: One partner matches identifiers based on existing approvals, which is straightforward and low-risk.
• Third-Party Matching: A trusted third party can perform the matching, ensuring that neither partner sees the other's full dataset.
• Cryptographic Matching: Instead of sharing identifiable information directly, we use cryptographic hashes. This means that sensitive identifiers are transformed into a secure format that cannot be reversed, enhancing privacy.
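The cryptographic matching idea can be sketched in Python as follows. This example uses a keyed hash (HMAC-SHA256) with a secret shared by both partners, which is one common way to do this and not necessarily the method used on the BHDP. A key is used because identifiers drawn from a small space, such as ten-digit numbers, could otherwise be guessed by hashing every possible value. The key, function name and identifiers are all made up for illustration.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes) -> str:
    """Turn an identifier into a keyed SHA-256 token (hex string)."""
    # Normalise first so formatting differences do not prevent a match.
    normalised = identifier.replace(" ", "").strip()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret agreed between the two partners through a secure channel.
shared_key = b"example-key-agreed-out-of-band"

# Made-up identifiers held by each partner.
barts_ids = ["943 476 5919", "9999999999"]
partner_ids = ["9434765919", "1111111111"]

# Each partner hashes its own identifiers locally and shares only the tokens.
barts_tokens = {pseudonymise(i, shared_key) for i in barts_ids}
partner_tokens = {pseudonymise(i, shared_key) for i in partner_ids}

# Tokens present on both sides correspond to the same underlying identifier.
matched = barts_tokens & partner_tokens
print(len(matched))  # 1
```

Neither side needs to see the other's raw identifiers: only tokens are exchanged, and a token cannot be turned back into an identifier without the key and an exhaustive search.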
Data transfer options
Once data is linked, we can transfer it in different ways:
• Manual Transfers: Researchers can download and upload data manually when they have the necessary approvals and permissions.
• Team Transfers: Our Barts Health Data Platform team can handle transfers directly for researchers.
• API Integration: For more advanced setups, we can automate data transfers through APIs, making the process seamless.
By continuously improving our data linkage strategies, we aim to support impactful research while safeguarding patient confidentiality.
