Affiliation of Author(s):
College of Information and Control Engineering
Journal:
Advances in Intelligent Systems Research
Key Words:
MapReduce; K-means; hidden topic model
Abstract:
Micro-blogging content is characterized by short text, sparse feature words, and huge scale. To address these characteristics, this paper proposes a method for detecting micro-blogging hot topics based on the MapReduce programming model. The method first applies hidden topic analysis to mitigate the problems of short micro-blogging content and sparse feature words. The CURE algorithm is then used to alleviate the sensitivity of the K-means algorithm to its initial points. Finally, the hot topic clustering results are obtained with a parallel K-means clustering algorithm based on the MapReduce programming model. The experimental results show that the proposed method effectively improves the efficiency of micro-blogging hot topic detection.
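The final step of the abstract, K-means expressed as map and reduce phases, can be illustrated with a minimal single-process sketch. This is not the authors' implementation: it only simulates the map and reduce phases in plain Python rather than running on a MapReduce cluster, it assumes each post has already been turned into a numeric vector (for example by hidden topic analysis), and it assumes the initial centroids are supplied by the caller (the abstract obtains them with CURE, which is not implemented here). All function names are hypothetical.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_point(point, centroids):
    """Map phase: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce phase: average all points emitted under one centroid index."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, n in values:
        for d in range(dim):
            total[d] += point[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One map -> shuffle -> reduce round, simulated in a single process."""
    groups = defaultdict(list)  # stands in for the shuffle step
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid that receives no points keeps its old position.
    return [reduce_cluster(groups[i]) if i in groups else tuple(centroids[i])
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat rounds until no centroid moves by more than tol (squared)."""
    for _ in range(max_iter):
        new_centroids = kmeans_iteration(points, centroids)
        shift = max(squared_distance(a, b)
                    for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


# Toy vectors standing in for topic features of four posts (made-up data).
posts = [(0, 0), (0, 2), (10, 10), (10, 12)]
seeds = [(0, 0), (10, 10)]  # assumed to come from a seeding step such as CURE
final_centroids = kmeans(posts, seeds)
```

In a real deployment each call to `map_point` would run on the node holding that slice of the data, and only the per-cluster sums and counts would cross the network, which is what makes the assignment step parallelizable.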