A Combinatorial Tweet Clustering Methodology Utilizing Inter and Intra Cosine Similarity

    Navneet Kaur
    TLDR The method effectively grouped tweets into categories without knowing the number of groups beforehand.
    The study presented a combinatorial tweet clustering methodology that used inter-cluster and intra-cluster cosine similarity to form clusters dynamically, without requiring the number of clusters in advance. The approach combined agglomerative and divisive hierarchical clustering techniques to improve clustering quality. Tweets were first preprocessed to remove clutter and extract relevant features, and cosine similarity was then used to measure how closely related the resulting tweet vectors were. The method was tested on a dataset of 15,062 tweets on topics such as stem cell treatments and hair regrowth, and produced 10 final categories. The study highlighted the limitations of existing algorithms such as k-means and DBSCAN and reported promising results, although the method was noted to be slower because of its hierarchical clustering approach.
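
The summary does not give the exact formulas, so the following is only a minimal sketch of what intra-cluster and inter-cluster cosine similarity could look like. It assumes simple bag-of-words term counts as tweet vectors and averages pairwise similarities; the function names (`vectorize`, `intra_similarity`, `inter_similarity`) and the toy tweets are hypothetical and are not taken from the study.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Turn a preprocessed tweet into a bag-of-words term-count vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tweet is treated as similar to nothing
    return dot / (norm_a * norm_b)


def intra_similarity(cluster):
    """Average pairwise similarity inside one cluster (assumed definition)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # a single-tweet cluster is trivially cohesive
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def inter_similarity(cluster_a, cluster_b):
    """Average similarity across two clusters (assumed definition)."""
    total = sum(cosine_similarity(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical toy tweets, loosely themed on the topics named in the summary.
stem = [vectorize("stem cell treatment trial"), vectorize("new stem cell treatment")]
hair = [vectorize("hair regrowth serum"), vectorize("hair regrowth results")]

print("intra(stem):", intra_similarity(stem))
print("intra(hair):", intra_similarity(hair))
print("inter(stem, hair):", inter_similarity(stem, hair))
```

Under these assumptions, a high intra-cluster value suggests a cluster is cohesive and could be left alone, while a high inter-cluster value suggests two clusters are candidates for merging. How the study actually thresholds or combines the two measures is not stated in the summary.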