- Title: Identifying topics in a digital work
- Application number: US13433028
- Filing date: 2012-03-28
- Publication number: US09613003B1
- Publication date: 2017-04-04
- Inventors: Joshua M. Goodspeed, Janna S. Hamaker, Adam J. Iser, Tom Killalea, Abhishek Patnia, Alla Taborisskaya
- Applicants: Joshua M. Goodspeed, Janna S. Hamaker, Adam J. Iser, Tom Killalea, Abhishek Patnia, Alla Taborisskaya
- Applicant address: Reno, NV, US
- Assignee: Amazon Technologies, Inc.
- Current assignee: Amazon Technologies, Inc.
- Current assignee address: Reno, NV, US
- Agent: Lee & Hayes, PLLC
- Main classification: G06F17/27
- IPC classification: G06F17/27 ; G06F3/00 ; G06F3/048 ; G06F17/00 ; G06F17/21
Abstract:
In some implementations, text is extracted from a digital work and a plurality of noun phrases are identified. The noun phrases are checked against a network accessible resource, such as an online encyclopedia, that includes a plurality of interlinked article entries. The noun phrases that have corresponding entries in the network accessible resource are included in a set of candidate topics. The candidate topics are ranked based, at least in part, on the links to and from each of the entries corresponding to the candidate topics. Candidate topics below a ranking threshold are removed from the set of candidate topics. Further, term frequency information for each candidate topic in relation to the digital work is compared against term frequency information for the candidate topic in a large corpus of textual works to remove candidate topics within a frequency difference threshold.
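
The filtering steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function names, the in-memory `entries` mapping that stands in for the network accessible resource, the scoring rule (inbound plus outbound link count), and the threshold values are all assumptions made for the example. Noun-phrase extraction is assumed to have happened already.

```python
from collections import Counter


def link_scores(entries):
    """Score each entry as inbound links from other entries plus its own outbound links.

    `entries` maps an entry title to the set of titles it links to.
    This scoring rule is an assumption; the abstract only says ranking
    depends, at least in part, on links to and from each entry.
    """
    inbound = Counter()
    for source, targets in entries.items():
        for target in targets:
            if target in entries and target != source:
                inbound[target] += 1
    return {title: inbound[title] + len(targets) for title, targets in entries.items()}


def select_topics(noun_phrases, entries, work_tf, corpus_tf, min_score, freq_diff_threshold):
    """Return the noun phrases that survive all three filters, sorted alphabetically."""
    # 1. Keep only noun phrases that have a corresponding entry.
    candidates = {phrase for phrase in noun_phrases if phrase in entries}

    # 2. Drop candidates whose link-based score falls below the ranking threshold.
    scores = link_scores(entries)
    ranked = {phrase for phrase in candidates if scores[phrase] >= min_score}

    # 3. Drop candidates whose frequency in the work is within the threshold
    #    of their frequency in the large reference corpus.
    return sorted(
        phrase
        for phrase in ranked
        if abs(work_tf.get(phrase, 0.0) - corpus_tf.get(phrase, 0.0)) > freq_diff_threshold
    )


# Hypothetical toy data, invented for illustration.
entries = {
    "white whale": {"whaling", "ship"},
    "whaling": {"white whale", "ship"},
    "ship": {"whaling"},
    "time": set(),
}
noun_phrases = ["white whale", "whaling", "ship", "time", "old man"]
work_tf = {"white whale": 0.004, "whaling": 0.003, "ship": 0.002}
corpus_tf = {"white whale": 0.00001, "whaling": 0.0001, "ship": 0.0019}

topics = select_topics(noun_phrases, entries, work_tf, corpus_tf,
                       min_score=3, freq_diff_threshold=0.0005)
print(topics)
```

In this toy run, "old man" is dropped because it has no entry, "time" because its entry has no links, and "ship" because it is about as frequent in the work as in the reference corpus, which leaves "whaling" and "white whale".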