Integration of transactional and Ecological Momentary Assessment data for cancer risk factor prediction

A fully funded 4-year PhD project with the Wellcome Trust Molecular, Genetic and Lifecourse Epidemiology programme

Supervisors

Dr Anya Skatova (lead), Dr Andy Skinner, Prof Tom Gaunt, Prof Richard Martin

Read more about how to apply and the PhD programme here

Rationale

Shopping history records, collected via purchases tracked on loyalty cards, can provide a new perspective on lifestyle choices and behaviours and how these relate to health outcomes such as cancer. Shopping histories can provide information that is otherwise difficult to measure, such as granular, population-level, objective data on lifestyle behaviours and risk factors (e.g., smoking, alcohol consumption) that can be tracked longitudinally. However, shopping history data also have inherent biases. For example, although they provide details on purchasing habits and basic individual characteristics, patterns in the data could be explained by other factors (e.g., the gap between purchase and consumption). The reliability of health information derived from shopping history data can be assessed by integrating these data with detailed self-reports of behaviour collected through Ecological Momentary Assessment (EMA). This work will improve detection of cancer risk as well as assess the validity of integrated data sources in risk prediction.

Aims & Objectives

The overall aim of this PhD is to integrate supermarket loyalty card data with EMA data and conventional epidemiological measures (e.g., questionnaires, biomarkers) in the Avon Longitudinal Study of Parents and Children (ALSPAC) to predict risk factors for cancer. The innovative aspect is the use of transactional and EMA data, which provide higher-density time-series data with different biases from conventional questionnaire/interview data. The ability to predict risk factors using such data could produce novel insights into early cancer symptoms and associated consumption patterns.

Methods

Identify patterns in standalone shopping history data that may reflect consumption associated with known cancer risks (Years 1-2). Collect EMA data on behaviours related to known cancer risks using wearable technology (e.g., smartwatches) (Year 2). Use statistical methods (e.g., linear and logistic regression) to validate shopping history patterns against EMA and conventional self-report/biomedical data in ALSPAC (Years 2-3). Use statistical and machine learning methods to predict cancer risk factors both in a sample of thousands of ALSPAC participants and in standalone supermarket loyalty card data from a population-wide sample of millions of supermarket customers (Years 2-4).