Granted invention patent
US08838599B2 Efficient lexical trending topic detection over streams of data using a modified sequitur algorithm
Status: in force
- Patent title: Efficient lexical trending topic detection over streams of data using a modified sequitur algorithm
- Application number: US12780850; Filing date: 2010-05-14
- Publication (announcement) number: US08838599B2; Publication (announcement) date: 2014-09-16
- Inventors: Zhichen Xu, Yun Fu, Neal Sample
- Applicants: Zhichen Xu, Yun Fu, Neal Sample
- Applicant address: US CA Sunnyvale
- Patentee: Yahoo! Inc.
- Current patentee: Yahoo! Inc.
- Current patentee address: US CA Sunnyvale
- Agency: Martine Penilla Group, LLP
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
Embodiments are directed towards a Modified Sequitur algorithm (MSA) using pipelining and indexed arrays to identify trending topics within a plurality of documents having user generated content (UGC). The documents are parallelized and distributed across a plurality of network devices, which place at least some of the received documents into a buffer for which the MSA may then be applied to the documents within the buffer to identify n-grams or phrases within the documents' contents. The identified phrases are further analyzed to remove extraneous co-occurrences of phrases, and/or words based on a part of speech analysis. A weighting of the remaining phrases is used to identify trending topic phrases. Links to content in the plurality of UGC documents that is associated with the trending topic phrases may then be displayed to a client device.
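To make the flow in the abstract concrete, the following Python sketch walks through its stages on a tiny buffer: find repeated phrases, drop phrases that only occur inside a longer repeated phrase, then weight what remains. It is not the patented Modified Sequitur algorithm. Plain n-gram counting stands in for the MSA, the part-of-speech filter is omitted, and the weighting (buffer count divided by a smoothed background count) is an assumption made for illustration. All function names and the `background` dictionary are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous n-token phrase as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def repeated_phrases(buffer, max_n=3, min_count=2):
    """Count phrases of 1..max_n words across the buffered documents and
    keep those seen at least min_count times.
    Stand-in for the MSA: simple n-gram counting, not Sequitur."""
    counts = Counter()
    for doc in buffer:
        tokens = doc.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return {p: c for p, c in counts.items() if c >= min_count}


def drop_subsumed(phrases):
    """Drop a phrase when a longer phrase containing it occurs equally
    often (a rough analogue of removing extraneous co-occurrences)."""
    def contains(longer, shorter):
        k = len(shorter)
        return any(longer[i:i + k] == shorter
                   for i in range(len(longer) - k + 1))

    return {
        p: c for p, c in phrases.items()
        if not any(len(q) > len(p) and cq == c and contains(q, p)
                   for q, cq in phrases.items())
    }


def trending(buffer, background, top_k=3):
    """Rank phrases by buffer count relative to a smoothed background
    count (assumed weighting; the abstract does not specify one)."""
    phrases = drop_subsumed(repeated_phrases(buffer))
    scored = {
        " ".join(p): c / (1 + background.get(" ".join(p), 0))
        for p, c in phrases.items()
    }
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    buffer = [
        "world cup final tonight",
        "watching the world cup final",
        "the weather is nice",
    ]
    background = {"the": 1000}  # hypothetical long-run phrase counts
    print(trending(buffer, background))
```

With this input, "world cup final" ranks first because it repeats in the buffer but has no background count, while "the" repeats just as often but is discounted by its high background count.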