Generating Polygenic Risk Scores from Gene Sets: Why & How

Polygenic risk scores (PRS) have been studied for several years for their utility in outcome prediction, yet their application to risk stratification across medical disciplines remains limited.
So far, the features selected to predict post-ischemic stroke mortality have mainly been demographic, social, and a limited set of clinical factors. Identified genetic risk factors have not been integrated into prediction models, individually or together, in either cohort or longitudinal studies. At the same time, short- and long-term outcomes have become “The Next Big Thing” in stroke genetics, driven by a great demand for the development of neuroprotective agents [1].

Why create PRS from gene sets:

  1. Gaining insight based on the ‘core genes’ – The causal relationship between genetic variation and the phenotype of interest is thought to be concentrated in a small number of pathways, or so-called ‘core genes’, at both the population and individual levels.
  2. Improving the signal-to-noise ratio – Adding unrelated genetic components to the PRS model may increase noise, leading to type II error.
  3. Integrating disease etiologies into the picture – Prioritizing pathways associated with the phenotype by PRS derived from gene sets may help to validate known etiologies and identify novel drug targets.
  4. A step towards personalized medicine – Risk stratification by pathway-specific PRS promotes personalized medicine when the PRSs are included in the feature sets of prediction models.
  5. Capturing complex multi-dimensional interactions – Gene-environment interaction (GxE) converges more strongly at the pathway level than at the gene level.
  6. Strength of the associations leading to better biotyping – Endophenotype- or disease subtype-specific risk genes or pathways can be enriched by ranking the significance of the associations (competitive p-value). The intersection across some or all clinically defined subtypes would help to create new subtypes.

How to create PRS from gene sets:

  1. The process is flexible; pathway-specific PRS can be calculated directly from individual-level genotyping data or indirectly from summary statistics.
  2. PRSs from multiple pathways can be integrated using machine learning approaches.
  3. Variants with different minor allele frequencies can be grouped, and the probability distribution of their effect sizes can be approximated by various statistical models.
  4. The weight for each variant must be estimated from one or more large-scale GWAS conducted in populations of matched ancestry.
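
As a minimal sketch of the first and fourth points, the snippet below computes a pathway-specific PRS for one individual as the weighted sum of effect-allele dosages, restricted to variants mapped to genes in each pathway. It is an illustration only and not the pipeline used in our study: the function name `pathway_prs`, the variant IDs, gene symbols, pathway names, and weights are all hypothetical, and steps such as quality control, clumping, and thresholding are assumed to have been done beforehand.

```python
def pathway_prs(dosages, weights, gene_sets, variant_to_gene):
    """Return {pathway: score} for a single individual.

    dosages         -- {variant_id: effect-allele count, 0 to 2}
    weights         -- {variant_id: GWAS effect size, e.g. log odds ratio}
    gene_sets       -- {pathway_name: set of gene symbols}
    variant_to_gene -- {variant_id: gene symbol the variant is mapped to}
    """
    scores = {}
    for pathway, genes in gene_sets.items():
        total = 0.0
        for variant, beta in weights.items():
            # Keep only variants that map to a gene in this pathway
            # and that were actually genotyped for this individual.
            if variant_to_gene.get(variant) in genes and variant in dosages:
                total += beta * dosages[variant]
        scores[pathway] = total
    return scores


# Hypothetical toy inputs (not real variants, genes, or effect sizes).
weights = {"v1": 0.2, "v2": -0.1, "v3": 0.5}
variant_to_gene = {"v1": "GENE_A", "v2": "GENE_B", "v3": "GENE_C"}
gene_sets = {"pathway_1": {"GENE_A", "GENE_B"}, "pathway_2": {"GENE_C"}}
dosages = {"v1": 2, "v2": 1, "v3": 0}

example_scores = pathway_prs(dosages, weights, gene_sets, variant_to_gene)
print(example_scores)
```

Each pathway yields one score per individual, so the resulting scores can be used as separate features in a downstream prediction model, alone or alongside clinical risk factors.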

Lessons learned when predicting mortality among ischemic stroke patients using pathway-derived PRSs:

Our previous study identified several pathway-specific PRSs that are significantly associated with ischemic stroke or its subtypes [2]. Figure 1 shows an example of the top-ranked pathways associated with small vessel stroke (SVS) and the variants from genes belonging to these pathways. Through our study design and analysis pipeline, we have identified biologically relevant pathway-specific PRSs (Li, J. et al. Scientific Reports, 2022) that could, individually or together with other known clinical risk factors, predict post-stroke all-cause mortality. We have provided an alternative way to construct domain knowledge-based PRS and demonstrated its robustness as an independent predictor in outcome prediction.

We have estimated the effect size of each pathway-specific PRS in association with post-stroke mortality in subgroups of early-onset and late-onset stroke patients, and identified some age-dependent associations. We have also demonstrated the pleiotropy of these PRSs by assessing their association with other stroke-related comorbidities. Pathway-specific PRSs not only confirmed or prioritized some known etiologies for the outcome of interest but also identified some novel pathways in the observed disease population. The PRS findings derived from pathway enrichment analysis may promote drug target identification as well as precision medicine for patients with a high pathway-specific genetic risk score.

Figure 1. Pathway-specific PRS augments etiologic subtyping of IS and outcome prediction


In conclusion, we provide evidence that pathway-specific PRSs for ischemic stroke (IS) are associated with 3-year all-cause mortality. The integrated multivariate risk model provides better prognostic value for overall survival after ischemic stroke. The PRSs identified from disease-relevant pathways echoed several known etiologies for IS as well as for post-ischemic stroke mortality.


1. Dichgans, M., Beaufort, N., Debette, S. & Anderson, C. D. Stroke Genetics: Turning Discoveries into Clinical Applications. Stroke 52, 2974-2982, doi:10.1161/STROKEAHA.121.032616 (2021).

2. Li, J. et al. Polygenic Risk Scores Augment Stroke Subtyping. Neurol Genet 7, e560, doi:10.1212/NXG.0000000000000560 (2021).
