WP 4 Biostatistical methods and models

Work package leaders:
Marc Chadeau-Hyam
Michelle Kelly-Irving

Main objective: The overarching goal of this WP is to explore available data to characterize the social exposome (the measure of all the exposures of an individual in a lifetime and how those exposures relate to health), relying on the definition of a clear, general and translatable methodological framework to:

• Identify health relevant components of the social exposome
• Quantify the health effects of exposome components
• Improve the understanding of their actual effects and their relationship to established (molecular) biomarkers and modifiable risk factors
• Assess biological consequences by exploring molecular profiles available in the cohorts
• Better understand their downstream consequences and disentangle the exposure, behavioral and physiological factors they capture.

Tasks

Title	Description	PI	Participants

Identify health-relevant and reproducible socioeconomic factors	Using the collated and curated data from WP 2, this task will first identify social determinants and confounding or mediating factors of healthy ageing within each cohort, and will propose harmonization strategies to the database work package (WP 2). Harmonization procedures will ensure comparability across populations and will account for the biological and functional nature of these indicators as well as their clinical relevance. Causal modelling, including mediation analyses and structural equation models, will be used to assess the relative contribution of biological, environmental, psychosocial and behavioral components to i) the overall social exposome and ii) various health outcomes. Joint modeling of longitudinal and survival data will be applied to combine repeatedly measured exposure data with outcomes such as diabetes, cancer, CVD, and total mortality.	Cyrille Delpierre	ICL, UiT
Identify health-relevant and reproducible socioeconomic factors	The main objective of this task is to use life course socio-economic and psychosocial factors to examine their downstream biological consequences, as measured by (possibly targeted) OMICS profiles. Life course analyses will first be carried out to understand the relative importance of SES and social factors at different time points for the outcomes, using a nested regression models approach. Because socioeconomic factors capture a complex mixture of economic, psychosocial, environmental, and behavioral factors, these analyses raise interpretability challenges. Full interpretation of the results will rely on disentangling which factors (or combinations of factors) directly or indirectly affect biology, and will be facilitated by exploring OMIC signatures of the social environment. Consistent and theorized socio-economic metrics developed in the first task will be related to health outcomes and molecular data. In a final step, multi-OMIC data (from the consortium and from publicly available databases) will be related to reconciled SE indicators, in a life course context, to search for biologically effective and interpretable molecular signatures of past SE experiences. Such analyses will rely on the combination of profiling techniques (e.g. univariate analyses, dimensionality reduction techniques, or variable selection approaches) and network topologies. Envisaged network approaches will be either unsupervised (e.g. correlation networks) or supervised (e.g. differential networks).	Cyrille Delpierre	ICL, UiT
Combining life course approaches and exposome science to healthy ageing	This task will combine social factors and related sets of (i) risk factors and (ii) molecular markers into longitudinal models of ageing to evaluate their respective contributions to individual ageing trajectories. We will develop individual-based dynamic models to reconstruct individual health trajectories using multi-state models. These models will incorporate life course features of the social exposome, including OMIC markers of exposures and socio-economic experiences. Accounting for these time-resolved risk drivers should improve the prediction of disease progression and add a dynamic element to the classical case/control discrimination problem. Specifically, we will use established sequence analyses and hidden Markov models, and include selected OMIC and exposome signals in the models' parameters. Model selection approaches will identify the step(s) of the health trajectories at which the potential risk drivers exert their effect. These approaches can be combined with micro-simulation approaches which, in order to perform health impact assessment, will include counterfactual analyses to simulate interventions within and across cohorts. This last step will help identify modifiable factors of vulnerability and resilience that may define prioritised policy-relevant targets for future interventions.	Marc Chadeau-Hyam
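
To illustrate the combination of multi-state models, micro-simulation and counterfactual analysis described in the last task, the following is a minimal sketch only, not the models the WP will develop. It assumes a discrete-time Markov model with three hypothetical states (healthy, diseased, dead), a single binary exposure, and yearly transition probabilities that are invented purely for illustration. The function names, the exposure prevalence and all numerical values are assumptions.

```python
import random

# Hypothetical health states of the multi-state model (assumption).
STATES = ("healthy", "diseased", "dead")


def transition_probs(state, exposed):
    """Yearly transition probabilities out of `state`.

    All numbers are illustrative assumptions, not estimates from any cohort.
    """
    if state == "healthy":
        p_ill = 0.10 if exposed else 0.05
        p_die = 0.01
        return {"healthy": 1 - p_ill - p_die, "diseased": p_ill, "dead": p_die}
    if state == "diseased":
        p_die = 0.15 if exposed else 0.10
        return {"healthy": 0.0, "diseased": 1 - p_die, "dead": p_die}
    # "dead" is an absorbing state.
    return {"healthy": 0.0, "diseased": 0.0, "dead": 1.0}


def simulate_individual(exposed, years, rng):
    """Simulate one individual's yearly health trajectory."""
    state = "healthy"
    path = [state]
    for _ in range(years):
        probs = transition_probs(state, exposed)
        state = rng.choices(STATES, weights=[probs[s] for s in STATES])[0]
        path.append(state)
    return path


def mortality(exposures, years, seed):
    """Proportion of simulated individuals who are dead at the end of follow-up."""
    rng = random.Random(seed)
    deaths = sum(simulate_individual(e, years, rng)[-1] == "dead" for e in exposures)
    return deaths / len(exposures)


def run_scenarios(n=5000, exposure_prevalence=0.4, years=20, seed=0):
    """Compare the observed exposure pattern with a counterfactual one
    in which the (hypothetical) intervention removes the exposure for everyone."""
    rng = random.Random(seed)
    observed = [rng.random() < exposure_prevalence for _ in range(n)]
    counterfactual = [False] * n
    # The same seed is reused so that both scenarios share random draws.
    return {
        "observed": mortality(observed, years, seed + 1),
        "counterfactual": mortality(counterfactual, years, seed + 1),
    }
```

Calling run_scenarios() returns the simulated mortality under both scenarios; their difference gives a crude estimate of the impact of the hypothetical intervention. In a real analysis the transition probabilities would be estimated from cohort data and could depend on time-varying exposome and OMIC covariates.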