Analyzing Big EHR Data: Optimal Cox Regression Subsampling Procedure With Rare Events

    Nir Keret, Malka Gorfine
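    TLDR A Cox regression subsampling method that keeps all observed failures and subsamples the censored observations reduces the time and memory needed to analyze large survival datasets with rare events.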
    This study addresses the computational challenges of analyzing large survival datasets, with the UK Biobank colorectal cancer data as its application. The authors propose a Cox regression subsampling method that includes all observed failures and subsamples the censored observations using optimized sampling probabilities. The resulting estimators approximate the full-data partial-likelihood estimators while reducing computation time and memory requirements. The methodology is particularly useful for datasets with rare events, where observed failures make up only a small portion of the sample. The study establishes the asymptotic properties of the estimators and evaluates their performance through simulation studies.
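The sampling scheme described above can be illustrated with a minimal sketch. It is not the authors' procedure: the summary does not give the optimized sampling probabilities, so this example assumes uniform sampling of censored rows without replacement as a simple stand-in. The function name `subsample_rare_events` and its parameters are hypothetical.

```python
import numpy as np


def subsample_rare_events(event, n_censored, seed=0):
    """Keep every observed failure and draw a uniform subsample of censored rows.

    ASSUMPTION: uniform sampling replaces the optimized probabilities
    proposed in the study, which are not given in the summary.

    event      : array-like, 1 for an observed failure, 0 for censored
    n_censored : number of censored rows to draw
    Returns (row indices, inverse-probability weights).
    """
    rng = np.random.default_rng(seed)
    event = np.asarray(event).astype(bool)

    failure_idx = np.flatnonzero(event)
    censored_idx = np.flatnonzero(~event)

    # Never ask for more censored rows than exist.
    q = min(n_censored, censored_idx.size)
    sampled = rng.choice(censored_idx, size=q, replace=False)

    # Failures are always included, so their weight is 1.
    # Each sampled censored row stands in for (n_censored_total / q) rows.
    w_fail = np.ones(failure_idx.size)
    w_cens = np.full(q, censored_idx.size / q) if q > 0 else np.empty(0)

    idx = np.concatenate([failure_idx, sampled])
    weights = np.concatenate([w_fail, w_cens])
    return idx, weights


# Example: roughly 1% failures among 100,000 rows, keeping 2,000 censored rows.
demo_rng = np.random.default_rng(42)
demo_event = demo_rng.random(100_000) < 0.01
demo_idx, demo_w = subsample_rare_events(demo_event, n_censored=2_000)
```

The returned indices and weights would then be passed to a Cox fitting routine that accepts observation weights, so the weighted partial likelihood on the subsample approximates the full-data one.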
