Improving methods to minimise bias in ethnicity data for more representative and generalisable models, using CVD in COVID-19 as an example
Inequality in health has been made worse by the COVID-19 pandemic. People from minority ethnic backgrounds are more likely to become very sick or die from COVID-19.
One place where inequality in health can arise is in technology for predicting a person’s future health risks. Routinely collected health information is put into a computer model, which then gives a health risk score for the patient. Doctors can use this score to decide patient care.
If there is bias in the data or bias in the model, the doctor can potentially make wrong decisions and patients can get the wrong care or no care. This could result in some groups of patients being incorrectly prioritised over others for booster vaccines, hospital beds, or life-saving treatments. This might affect patient and public trust, as well as cost the NHS.
We are aiming to improve existing technology for predicting personalised future risk of health conditions, particularly those affecting overlooked groups of patients.
We aim to do so by:
- improving the way recorded ethnicity is used in research, and
- improving the modelling process to build risk prediction models that are designed specifically for ethnicity groups, and are therefore more reliable.
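As a rough illustration of why group-specific checks matter, the sketch below compares a model's average predicted risk with the observed event rate within each group. It is not the project's analysis code: the function name `calibration_by_group`, the group labels, and the figures are all hypothetical and made up for this example.

```python
from collections import defaultdict


def calibration_by_group(records):
    """Return {group: (n, mean_predicted_risk, observed_event_rate)}.

    `records` is an iterable of (group, predicted_risk, event) tuples, where
    predicted_risk is a probability in [0, 1] and event is 0 or 1.
    """
    # Per group: [count, sum of predicted risks, sum of observed events]
    totals = defaultdict(lambda: [0, 0.0, 0])
    for group, predicted_risk, event in records:
        t = totals[group]
        t[0] += 1
        t[1] += predicted_risk
        t[2] += event
    return {g: (n, p / n, e / n) for g, (n, p, e) in totals.items()}


# Made-up illustrative records (hypothetical, not real patient data).
example = [
    ("group_a", 0.10, 0), ("group_a", 0.20, 0),
    ("group_a", 0.30, 1), ("group_a", 0.20, 0),
    ("group_b", 0.10, 1), ("group_b", 0.20, 0),
    ("group_b", 0.10, 1), ("group_b", 0.20, 0),
]

for group, (n, predicted, observed) in sorted(calibration_by_group(example).items()):
    print(f"{group}: n={n} mean predicted={predicted:.2f} observed={observed:.2f}")
```

In this toy data, the predicted and observed rates are close for one group but far apart for the other. A gap like that is the kind of signal that could suggest a model is less reliable for a particular group.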
We know that there are ethnicity biases for cardiovascular disease in COVID-19 patients. We are developing a calculator to predict cardiovascular disease in COVID-19 patients. We will use this as a first example and will then be able to use this approach across other health and disease areas.
The calculator can be used by the public to guide lifestyle choices, and by doctors to provide better care. This can also be used by researchers nationwide doing health research involving ethnicity.
This work will be based on health information that represents almost everyone currently living in England and Wales, without being traced back to them. By extending to Northern Ireland and Scotland in future, we hope that this work will help to make health equal and fair for everyone in the UK.
The issues stated above will be addressed in outputs from a number of related sub-projects. Follow the links below to view repositories containing the protocol, data curation and analysis code, and phenotyping algorithms and codelists for each sub-project:
- CCU037_01: Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity
- CCU037_02: Women’s health and ethnic disparities: a population-wide analysis of 3.6 million digital health records for mortality and cardiovascular risk in women diagnosed with COVID-19
Links to repositories for additional outputs will follow in due course.
This project has been approved by the CVD-COVID-UK/COVID-IMPACT Approvals & Oversight Board (Project ID: CCU037). It received funding through a funding call by Health Data Research UK, working in partnership with The Alan Turing Institute and the Office for National Statistics, as part of the wider Data and Connectivity National Core Study.