Building the topic model is the basis and the most critical module of cross-language topic detection and tracking.
Topic models can also be applied to cross-language text similarity calculation, where they improve the speed and
efficiency of the calculation by reducing the dimensionality of the texts. In this paper, we use the LDA model in
cross-language text similarity computation to obtain Tibetan-Chinese comparable corpora, in three steps: (1) extending
a Tibetan-Chinese dictionary by extracting Tibetan-Chinese entities from Wikipedia; (2) using the topic model to map
the texts into the feature space of topics; (3) calculating the similarity of two texts in different languages
according to the characteristics of news text. The LDA-based method for text similarity calculation reduces the
dimensionality of the text vector space and enhances the understanding of the texts' semantics, which in turn improves
the speed and efficiency of the calculation.
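
Step (3) can be illustrated with a minimal sketch. It assumes that step (2) has already produced, for each document, a probability distribution over the same K topics. The abstract does not name the similarity measure, so cosine similarity is used here purely as an assumption, and the function name and topic vectors below are hypothetical values made up for illustration, not results from the paper.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector carries no topic information
    return dot / (norm_p * norm_q)


# Hypothetical topic distributions over K = 4 shared topics (illustrative only).
theta_tibetan = [0.70, 0.10, 0.10, 0.10]    # a Tibetan news text
theta_chinese_a = [0.65, 0.15, 0.10, 0.10]  # Chinese text with a similar topic mix
theta_chinese_b = [0.05, 0.05, 0.10, 0.80]  # Chinese text about a different topic

sim_a = cosine_similarity(theta_tibetan, theta_chinese_a)
sim_b = cosine_similarity(theta_tibetan, theta_chinese_b)

# The Chinese text whose topic mix is closer to the Tibetan text scores higher,
# so it would be the better candidate for a comparable document pair.
print(sim_a > sim_b)
```

Because each text is compared as a K-dimensional topic vector rather than a vocabulary-sized term vector, each comparison costs O(K), which is the dimensionality reduction the abstract refers to.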