- Patent title: Query generation using structural similarity between documents
- Application number: US12942950
- Application date: 2010-11-09
- Publication number: US08346792B1
- Publication date: 2013-01-01
- Inventors: Steven D. Baker, Michael Flaster, Nitin Gupta, Paul Haahr, Srinivasan Venkatachary, Yonghui Wu
- Applicants: Steven D. Baker, Michael Flaster, Nitin Gupta, Paul Haahr, Srinivasan Venkatachary, Yonghui Wu
- Applicant address: Mountain View, CA, US
- Patentee: Google Inc.
- Current patentee: Google Inc.
- Current patentee address: Mountain View, CA, US
- Attorney/agent: Fish & Richardson P.C.
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
Methods, systems, and apparatus, including computer program products, are described for generating synthetic queries using seed queries and structural similarity between documents. In one aspect, a method includes identifying embedded coding fragments (e.g., HTML tags) from a structured document and a seed query; generating one or more query templates, each corresponding to at least one coding fragment and including a generative rule to be used in generating candidate synthetic queries; generating the candidate synthetic queries by applying the query templates to other documents hosted on the same web site as the document; identifying terms that match the structure of the query templates as candidate synthetic queries; measuring a performance for each candidate synthetic query; and designating as synthetic queries those candidates whose performance measurements exceed a performance threshold.