Weakly Supervised Learning of Biomedical Information Extraction from Curated Data

    January 2016 in BMC Bioinformatics
    Suvir Jain, R.L. Kashyap, Tsung-Ting Kuo, Shitij Bhargava, Gordon Lin, Chun‐Nan Hsu
    TLDR The method can effectively extract biomedical information without needing expert annotation, performing better than previous models.
    In this 2016 paper, Jain et al. introduce a weakly supervised learning method that uses curated biomedical databases as training data for information extraction, avoiding the need for expert annotation. They frame the problem as cost-sensitive learning from noisy labels and use a committee of weak classifiers to estimate the costs from both the curated data and the text itself. The method was applied to extract target phenotypes and ethnicity backgrounds from Genome-Wide Association Studies (GWAS) articles. It achieved a Precision-at-2 of 87% for disease/trait extraction and an F1-score of 0.83 for stage-ethnicity extraction, outperforming cost-insensitive baselines. The study demonstrates that curated biomedical databases can be reused to train information extraction systems and highlights the benefit of cost-sensitive learning in biomedical text mining.
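
The committee idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `committee_weights`, the keyword-rule classifiers, and the toy sentences are all hypothetical, and the agreement fraction is only one simple way to turn committee votes into a per-example cost. The assumption is that an example whose noisy label (taken from a curated database) is contradicted by most weak classifiers should count less during training.

```python
from typing import Callable, List, Sequence

# A weak classifier maps a text snippet to a 0/1 vote (hypothetical interface).
WeakClassifier = Callable[[str], int]


def committee_weights(
    texts: Sequence[str],
    noisy_labels: Sequence[int],
    committee: Sequence[WeakClassifier],
) -> List[float]:
    """Return one weight per example: the fraction of committee members
    whose vote agrees with the noisy label derived from curated data."""
    if not committee:
        raise ValueError("committee must contain at least one classifier")
    weights = []
    for text, label in zip(texts, noisy_labels):
        agree = sum(1 for clf in committee if clf(text) == label)
        weights.append(agree / len(committee))
    return weights


# Toy committee of keyword rules (illustrative only, not from the paper).
committee = [
    lambda t: int("diabetes" in t.lower()),
    lambda t: int("association" in t.lower()),
    lambda t: int("cases" in t.lower()),
]

texts = [
    "Genome-wide association study of type 2 diabetes in 5,000 cases.",
    "We thank the participants for their time.",
]
# Both snippets inherit a positive label from the curated record;
# the second label is presumed wrong in this toy setup.
noisy_labels = [1, 1]

weights = committee_weights(texts, noisy_labels, committee)
print(weights)
```

In a cost-sensitive setup, weights like these could be passed to a learner as per-example costs, so that likely mislabeled examples contribute less to the final model.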