Project Summary
Project Title
General Project Information
Project Type
Research
Clinical Audit
Quality Improvement
Service Evaluation
Research - OMOP (CYBORG)
Clinical - OMOP (CYBORG)
Approval Type
Full
Provisional
Planned Project Start Date
Planned Project End Date
Overall Status
Project Summary
<div class="ck-content" data-wrapper="true" dir="ltr" style="--ck-image-style-spacing: 1.5em; --ck-inline-image-style-spacing: calc(var(--ck-image-style-spacing) / 2); font-family: 'Segoe UI','Helvetica Neue',sans-serif; font-size: 9pt;"><p style="margin: 0;">This project aims to make lung cancer care safer by improving how small spots on the lungs, called pulmonary nodules, are followed up after scans. These nodules can sometimes be an early sign of cancer, but patients are often missed because there is no easy way to track them. We will develop and test a computer tool to help hospitals identify and monitor these patients more reliably. Patients in Birmingham and East London could benefit within the next few years through safer systems, fewer missed diagnoses, and earlier treatment, giving patients a better chance of recovery.</p></div>
Detailed Project Description
<div data-wrapper="true" style="font-family:'Segoe UI','Helvetica Neue',sans-serif; font-size:9pt"><p>This project aims to improve early-stage lung cancer detection by strengthening follow-up for incidentally detected pulmonary nodules (PNs) in the NHS. PNs are small lesions often identified on CT scans, and while most are benign, 3–5% represent early lung cancer. Current UK evidence on adherence to British Thoracic Society (BTS) guidelines for PN follow-up is lacking, and international data suggest compliance can be as low as 30%, risking delayed diagnoses.</p><div><p>We will request access to free-text radiology reports from the Barts Secure Data Environment to (1) validate a Natural Language Processing (NLP) algorithm for identifying patients with PNs, and (2) if possible, assess whether follow-up aligns with BTS guidelines. This will involve linking radiology reports with follow-up imaging data and patient demographics to explore patterns of care and potential inequities (e.g., by age, ethnicity, deprivation).</p><p>The study will need to include thousands of patients undergoing thoracic CT scans, because only 10–30% are expected to have nodules. Findings will quantify the scale of missed or delayed follow-up and inform the design of an intervention to improve safety and equity in PN care. Outputs include a validated NLP tool for national audit, evidence on current practice, and a co-designed intervention prototype. This work has the potential to reduce late-stage lung cancer diagnoses, improve survival, and enhance patient safety across the NHS.</p></div></div>
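The linkage of index scans to follow-up imaging, and the comparison of follow-up across demographic groups, could be sketched as follows. This is a minimal illustration in Python on synthetic records, not the project's pipeline: the field names (`group`, `index_date`, `scan_dates`) and the `max_days` window are hypothetical, and the appropriate follow-up interval would come from the BTS guidelines rather than the placeholder value used here.

```python
from collections import defaultdict
from datetime import date


def first_follow_up_days(index_date, scan_dates):
    """Days from the index scan to the earliest later scan, or None if there is none."""
    later = [d for d in scan_dates if d > index_date]
    return (min(later) - index_date).days if later else None


def follow_up_rate_by_group(patients, max_days):
    """Share of patients per group whose first later scan falls within max_days."""
    totals = defaultdict(int)
    followed = defaultdict(int)
    for p in patients:
        totals[p["group"]] += 1
        days = first_follow_up_days(p["index_date"], p["scan_dates"])
        if days is not None and days <= max_days:
            followed[p["group"]] += 1
    return {g: followed[g] / totals[g] for g in totals}


# Synthetic records for illustration only; "group" stands in for any
# demographic stratum (age band, ethnicity, deprivation quintile).
patients = [
    {"group": "A", "index_date": date(2022, 1, 10),
     "scan_dates": [date(2022, 1, 10), date(2022, 4, 15)]},
    {"group": "A", "index_date": date(2022, 2, 1),
     "scan_dates": [date(2022, 2, 1)]},
    {"group": "B", "index_date": date(2022, 3, 5),
     "scan_dates": [date(2022, 3, 5), date(2023, 6, 1)]},
]

# max_days=120 is a placeholder, not a guideline value.
rates = follow_up_rate_by_group(patients, max_days=120)
print(rates)
```

Passing the window as a parameter keeps the guideline logic separate from the linkage, so different intervals can be applied to different nodule categories without changing the matching code.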
Requested Data Summary
<div data-wrapper="true" style="font-family:'Segoe UI','Helvetica Neue',sans-serif; font-size:9pt"><p>We are requesting access to free-text radiology reports for chest CT scans from the Barts Secure Data Environment. Ideally, we would like as many reports as can be provided. These reports will be used to identify patients with incidentally detected pulmonary nodules using a Natural Language Processing (NLP) algorithm. We also require associated metadata to enable follow-up analysis, including:</p><div><ul> <li>Patient demographics (age, sex, ethnicity, postcode/deprivation index) for those linked to CT scans.</li> <br> <li>Scan dates and identifiers to track follow-up imaging (if possible).</li> <br> <li>Basic clinical context (e.g., indication for scan).</li></ul><p>The data will allow us to validate the NLP tool, quantify adherence to British Thoracic Society guidelines for nodule follow-up, and explore potential inequities in care.</p></div></div>
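As an illustration of what case-finding from free-text reports involves, the sketch below flags a report as mentioning a nodule unless a negation cue precedes the mention in the same sentence. It is a deliberately naive keyword baseline with assumed patterns and invented example sentences, not the NLP algorithm this project will validate.

```python
import re

# Assumed patterns for illustration; a real system would need a far richer
# vocabulary and proper negation and uncertainty handling.
NODULE = re.compile(r"\bnodules?\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|not|without)\b", re.IGNORECASE)
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def mentions_nodule(report):
    """Return True if any sentence mentions a nodule with no negation cue before it."""
    for sentence in SENTENCE_BREAK.split(report):
        match = NODULE.search(sentence)
        if match and not NEGATION.search(sentence[:match.start()]):
            return True
    return False


# Invented example sentences, not real report text.
examples = [
    "There is a 6 mm nodule in the right upper lobe.",
    "No pulmonary nodules identified.",
    "No pleural effusion. A 4 mm nodule is seen in the left lower lobe.",
    "Lungs are clear.",
]
flags = [mentions_nodule(text) for text in examples]
print(flags)
```

A baseline of this kind is mainly useful as a comparison point: its errors on negated, historical, or hedged mentions show where a trained model has to do better.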
Technical Description
<div data-wrapper="true" style="font-family:'Segoe UI','Helvetica Neue',sans-serif; font-size:9pt"><p>We will use de-identified free-text radiology reports from the Barts Secure Data Environment (SDE) to validate and refine a Natural Language Processing (NLP) algorithm for identifying pulmonary nodules. The algorithm, developed in <strong>Python</strong> using open-source libraries such as <strong>spaCy</strong>, <strong>scikit-learn</strong>, and <strong>transformers</strong>, will be evaluated using precision, recall, and F1-score against a manually validated reference set.</p><div><p><strong>Computing Requirements:</strong></p><ul> <li><strong>Duration:</strong> 2 years for one named user. A clinician may also need temporary access to manually code/verify a subset of reports for algorithm validation.</li> <br> <li><strong>Environment:</strong> Linux-based secure container within the SDE.</li> <br> <li><strong>Resources:</strong> <ul><br> <li>CPU: 8–16 cores</li> <br> <li>RAM: 16–32 GB</li> <br> <li>GPU: Not required</li> <br> <li>Storage: Depends on the size of the free-text reports; likely to be under 1 TB.</li> </ul> </li></ul><p><strong>Analysis Plan:</strong></p><ul> <li>Validate NLP model on historical CT reports.</li> <br> <li>Link identified cases to imaging metadata to assess adherence to British Thoracic Society guidelines and explore sociodemographic patterns.</li> <br> <li>Perform descriptive and stratified analyses using <strong>R</strong> and <strong>Python</strong>.</li></ul><p><strong>Open Science and Reproducibility:</strong></p><ul> <li>All code will be version-controlled on GitHub and released under an open-source license (e.g., MIT) after project completion.</li> <br> <li>A reproducible analytical pipeline will be provided, enabling replication by other NHS sites.</li> <br> <li>Documentation and an implementation guide will be shared to support national audit and quality improvement initiatives.</li></ul></div></div>
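The evaluation against a manually validated reference set can be illustrated with a short sketch. It assumes each report has a binary label (nodule mentioned or not) from both the algorithm and the clinician reviewer; the function name, the `Metrics` class, and the example labels are hypothetical. The code uses only the standard library so the arithmetic is visible; scikit-learn's `precision_recall_fscore_support` computes the same quantities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float


def evaluate(predicted, reference):
    """Compare algorithm labels with reference labels (True = nodule mentioned)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # correctly flagged reports
    fp = sum(1 for p, r in pairs if p and not r)    # flagged, but no nodule
    fn = sum(1 for p, r in pairs if not p and r)    # nodule missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


# Invented labels for five reports, for illustration only.
predicted = [True, True, False, True, False]
reference = [True, False, False, True, True]
m = evaluate(predicted, reference)
print(m)
```

For a case-finding tool, recall is usually the figure to watch most closely, since a missed nodule report is a patient who may not be followed up, whereas a false positive costs a manual review.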
Public and Patient Involvement and Engagement Summary
<div data-wrapper="true" style="font-family:'Segoe UI','Helvetica Neue',sans-serif; font-size:9pt"><p>Patient and public involvement is central to this project. We have already presented the proposed research to the Midlands PSRC PPIE group for feedback, which informed key design decisions. For example, contributors highlighted the need for clear communication about patient follow-up expectations, the importance of addressing power imbalances in co-design workshops, and the use of anonymous tools (e.g., Mentimeter) to ensure all voices are heard. They also stressed the value of understanding real-world resource constraints and ensuring transparency when patient preferences cannot be fully implemented.</p><div><p>A PPIE co-applicant with lived experience of lung conditions will join the project team, supported with appropriate training. Public contributors will be involved throughout, including in shaping recruitment strategies, advising on patient-facing materials, and participating in co-design workshops for the intervention. We will also provide feedback to contributors on how their input has influenced decisions.</p></div></div>
Reporting
<div data-wrapper="true" style="font-family:'Segoe UI','Helvetica Neue',sans-serif; font-size:9pt"><p>We will deliver a comprehensive public and academic legacy from this project:</p><div><ul> <li><strong>Peer-Reviewed Publications:</strong> At least three papers, including (1) NLP algorithm development and validation, (2) UK-first analysis of pulmonary nodule follow-up and inequalities, and (3) qualitative insights into best practice and intervention design.</li> <br> <li><strong>Open-Source Outputs:</strong> Release of the NLP case-finding tool and a reproducible analytical pipeline (e.g., via GitHub) under an open license, with full documentation for NHS use.</li> <br> <li><strong>Conference Presentations:</strong> Findings will be presented at national and international conferences (e.g., British Thoracic Society, NIHR events).</li> <br> <li><strong>Public-Facing Outputs:</strong> Lay summaries, infographics, and explainer videos co-produced with PPIE contributors, disseminated via Asthma + Lung UK and local NHS Trust channels.</li> <br> <li><strong>Policy and Practice Impact:</strong> Reports for participating Trusts and engagement with the British Thoracic Society to inform future audit frameworks.</li> <br> <li><strong>Regulatory Reporting:</strong> Compliance with REC and SDE reporting requirements throughout the project.</li></ul></div></div>
Contact Points
Project Lead Name
Project Lead Position
Project Lead Email
Lead Organisation Name
Lead Organisation Address
Secure Data Environment (SDE)
Will you be using the BH SDE
No
Yes
Details of the location and IT system (the SDE) where the data extract will be kept and processed.