Analyzing Big EHR Data: Optimal Cox Regression Subsampling Procedure With Rare Events
May 2023
in “Journal of the American Statistical Association”
TLDR A Cox regression subsampling method makes analyzing large survival datasets with rare events faster and less memory-intensive.
This study addresses the computational challenges of analyzing large survival datasets, specifically the UK Biobank colorectal cancer data. The authors propose a Cox regression subsampling procedure that keeps all observed failures and samples only from the censored observations, using optimized sampling probabilities. The resulting estimators approximate the full-data partial-likelihood estimator while reducing computation time and memory requirements. The methodology is particularly useful for rare-event data, where observed failures make up only a small portion of the sample. The study establishes the asymptotic properties of the estimators and evaluates their performance through simulation studies.
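To make the sampling scheme concrete, here is a minimal Python sketch of the general idea: keep every observed failure, draw a subsample of the censored observations, and attach inverse-probability weights so that a weighted Cox fit on the subsample can stand in for the full-data fit. It is an illustration under stated assumptions, not the authors' procedure. The function name `subsample_rare_events`, the parameter `q` (number of censored draws), sampling with replacement, and the weight form `1 / (q * p)` are all choices made for this example. The sampling probabilities default to uniform as a placeholder, because this summary does not give the formulas for the optimal probabilities.

```python
import numpy as np

def subsample_rare_events(event, q, probs=None, seed=0):
    """Keep all failures; draw q censored rows with replacement.

    Returns row indices and inverse-probability weights.
    Hypothetical helper for illustration only.
    """
    event = np.asarray(event, dtype=bool)
    failures = np.flatnonzero(event)    # rows with an observed failure
    censored = np.flatnonzero(~event)   # rows that were censored

    if probs is None:
        # Assumption: uniform probabilities as a placeholder for the
        # optimal probabilities, which are not specified here.
        probs = np.full(censored.size, 1.0 / censored.size)
    probs = np.asarray(probs, dtype=float)

    rng = np.random.default_rng(seed)
    picks = rng.choice(censored.size, size=q, replace=True, p=probs)

    idx = np.concatenate([failures, censored[picks]])
    # Failures get weight 1; each sampled censored row gets 1 / (q * p).
    weights = np.concatenate([np.ones(failures.size), 1.0 / (q * probs[picks])])
    return idx, weights

# Toy data: 100,000 subjects with roughly a 1% failure rate.
toy_rng = np.random.default_rng(1)
event = toy_rng.random(100_000) < 0.01
n_fail = int(event.sum())

# Draw three censored observations per failure.
idx, weights = subsample_rare_events(event, q=3 * n_fail)
```

The returned `idx` and `weights` could then be passed to any Cox regression routine that accepts observation weights. With uniform probabilities, the censored weights sum to the number of censored rows in the full data, so the weighted subsample represents the full sample in size.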