A Combinatorial Tweet Clustering Methodology Utilizing Inter and Intra Cosine Similarity

July 2015 in “oURspace (University of Regina)”

Navneet Kaur

TLDR The method effectively grouped tweets into categories without knowing the number of groups beforehand.

The document presented a combinatorial tweet clustering methodology that used inter and intra cosine similarity to form clusters dynamically, without requiring the number of clusters in advance. The approach combined agglomerative and divisive hierarchical clustering techniques to improve clustering quality. Tweets were first preprocessed to remove clutter and extract relevant features, and cosine similarity was then used to measure how closely related two tweet vectors were. The method was tested on a dataset of 15,062 tweets on topics such as stem cell treatments and hair regrowth, and produced 10 final categories. The study highlighted the limitations of existing algorithms such as k-means and DBSCAN and reported promising results, although the method was noted to be slower because of its hierarchical clustering approach.
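As a rough illustration of the similarity measures mentioned above, the following Python sketch computes cosine similarity between tweet vectors, plus an average similarity within one cluster (intra) and between two clusters (inter). It is not the study's implementation: the raw term-count vectors, the simple URL and mention stripping, the use of averaged pairwise similarity, and all function names (`vectorize`, `cosine`, `intra_similarity`, `inter_similarity`) are assumptions made for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def vectorize(tweet):
    """Assumed preprocessing: lowercase, drop URLs and @mentions, count word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return Counter(re.findall(r"[a-z]+", text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def intra_similarity(cluster):
    """Average pairwise cosine similarity inside one cluster (assumed definition)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # a single-tweet cluster is trivially cohesive
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def inter_similarity(cluster_a, cluster_b):
    """Average cosine similarity across two clusters (assumed definition)."""
    if not cluster_a or not cluster_b:
        return 0.0
    total = sum(cosine(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))
```

Under these assumptions, a high inter similarity between two clusters would suggest merging them (the agglomerative direction), while a low intra similarity within a cluster would suggest splitting it (the divisive direction). The thresholds and feature weighting actually used in the study are not stated in this summary.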
