The Temptation of Large Numbers: Pitfalls in Epidemiological Research

    Leonard Leibovici, Adi Turjeman, Mical Paul
    TLDR: Studies based on large databases can reach misleading conclusions because of bias and chance findings; researchers should analyze such data more rigorously.
    The authors discuss the appeal and the pitfalls of using large databases for epidemiological research. Such databases offer extensive data and can be linked to one another, but they have significant limitations:
    - The data were not collected for research purposes.
    - They are prone to bias.
    - Coding of diagnoses is unreliable.
    - Statistical significance tests are misused in very large samples, where even trivial differences become statistically significant.

    They cite numerous studies from Taiwanese and Danish databases that identified risk factors for diseases such as herpes zoster and Parkinson's disease, often with conflicting results. Statins, for example, were reported as both a risk factor and a protective factor. They argue that the sheer number of associations reported from these databases probably reflects chance, and that publication bias compounds the problem. The authors recommend that editors and peer reviewers scrutinize such studies more critically. Reviewers should focus on logic, biological plausibility, clinical relevance, and proper adjustment for multiple comparisons, so that spurious findings are not disseminated. The authors also stress the importance of considering absolute risks, checking how variables were ascertained, and recognizing that noise can create false associations. They conclude by urging researchers to analyze their data more rigorously. Researchers should also account for previously published risk factors and confounders, to avoid misleading conclusions and wasting resources on unnecessary trials.
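The summary contains no code. As an illustration only (not the authors' analysis), the Python sketch below uses entirely hypothetical numbers to show two of the statistical points above.

First, the same small effect moves from "not significant" to "highly significant" purely because the sample grows. Here the effect is a relative risk of 1.05, which is only 5 extra cases per 10,000 in absolute terms. Second, testing many exposures that have no true effect still yields "significant" associations by chance.

Several details are assumptions chosen for illustration: the cohort sizes, the risks, the 200 null exposures, the pooled two-proportion z-test, and the use of Bonferroni as the adjustment method. The function names `two_proportion_test` and `count_chance_findings` are hypothetical.

```python
import math
import random


def two_proportion_test(events_a, n_a, events_b, n_b):
    """Return (relative risk, risk difference, two-sided p-value) from a pooled z-test.

    Hypothetical helper for illustration; uses the normal approximation.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (risk_a - risk_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return risk_a / risk_b, risk_a - risk_b, p_value


def count_chance_findings(n_exposures, n_per_group, baseline_risk, alpha=0.05, seed=1):
    """Simulate exposures with NO true effect and count how many reach significance.

    Hypothetical helper; returns (unadjusted hits, Bonferroni-adjusted hits).
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_exposures):
        # Both groups share the same true risk, so any "association" is pure noise.
        exposed = sum(rng.random() < baseline_risk for _ in range(n_per_group))
        unexposed = sum(rng.random() < baseline_risk for _ in range(n_per_group))
        p_values.append(two_proportion_test(exposed, n_per_group, unexposed, n_per_group)[2])
    naive_hits = sum(p < alpha for p in p_values)
    # Bonferroni: divide alpha by the number of tests performed.
    bonferroni_hits = sum(p < alpha / n_exposures for p in p_values)
    return naive_hits, bonferroni_hits


if __name__ == "__main__":
    # Hypothetical cohorts with risks of 1.05% vs 1.00% (RR = 1.05) at two sample sizes.
    for events_a, events_b, n in [(105, 100, 10_000), (10_500, 10_000, 1_000_000)]:
        rr, rd, p = two_proportion_test(events_a, n, events_b, n)
        print(f"n={n:>9,} per group: RR={rr:.2f}, "
              f"extra cases per 10,000={rd * 10_000:.1f}, p={p:.4g}")

    # Many null comparisons: expected false positives and family-wise error rate.
    m, alpha = 200, 0.05
    print(f"Expected false positives among {m} null tests: {m * alpha:.0f}")
    print(f"Probability of at least one: {1 - (1 - alpha) ** m:.5f}")
    naive, adjusted = count_chance_findings(m, 2_000, 0.05)
    print(f"Simulated 'significant' exposures: {naive} unadjusted, {adjusted} after Bonferroni")
```

With these made-up numbers, the 10,000-per-group comparison gives p of about 0.73. The 1,000,000-per-group comparison gives p below 0.001, even though the absolute difference is identical in both. Bonferroni is used here only because it is the simplest correction; the source does not say which adjustment method the authors favor.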