Field weighting in text searching
    1.
    Patent application
    Field weighting in text searching (status: in force)

    Publication number: US20050210006A1

    Publication date: 2005-09-22

    Application number: US10804326

    Filing date: 2004-03-18

    IPC classification: G06F17/30

    Abstract: A field-weighted search combines statistical information for each term across document fields in a suitably weighted fashion. Field-specific term frequencies, field lengths, and document lengths are all considered to obtain a field-weighted document weight for each query term. The field-weighted document weights can then be combined to generate a field-weighted document score that is responsive to the overall query.
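
    The weighting described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the patented method: the BM25-style saturation formula, the parameters k1 and b, the precomputed idf values, and all function and field names are hypothetical choices that the abstract does not specify.

```python
def field_weighted_term_weight(term, doc, field_weights, avg_weighted_len,
                               idf, k1=1.2, b=0.75):
    """Weight of one query term in one document (hypothetical BM25-style form).

    doc maps a field name to its list of tokens; field_weights maps a field
    name to a numeric weight. Both shapes are assumptions for this sketch.
    """
    # Combine the field-specific term frequencies using the field weights.
    weighted_tf = sum(w * doc.get(f, []).count(term)
                      for f, w in field_weights.items())
    # Combine the field lengths the same way to get a weighted document length.
    weighted_len = sum(w * len(doc.get(f, []))
                       for f, w in field_weights.items())
    # Length normalization followed by saturation (assumed, BM25-like).
    norm = k1 * ((1 - b) + b * weighted_len / avg_weighted_len)
    return idf * weighted_tf / (norm + weighted_tf)


def field_weighted_score(query_terms, doc, field_weights, avg_weighted_len,
                         idf_by_term):
    """Sum the per-term field-weighted weights into one document score."""
    return sum(
        field_weighted_term_weight(t, doc, field_weights, avg_weighted_len,
                                   idf_by_term.get(t, 0.0))
        for t in query_terms
    )


field_weights = {"title": 3.0, "body": 1.0}
doc_a = {"title": ["field", "weighting"], "body": ["text", "search", "engine"]}
doc_b = {"title": ["text", "search"], "body": ["field", "weighting", "engine"]}
idf_by_term = {"field": 1.0}

score_a = field_weighted_score(["field"], doc_a, field_weights, 9.0, idf_by_term)
score_b = field_weighted_score(["field"], doc_b, field_weights, 9.0, idf_by_term)
```

    With these toy documents, the term "field" occurs in the title of doc_a and in the body of doc_b, so doc_a receives the higher score because the title field carries the larger weight.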

    Batching document identifiers for result trimming
    3.
    Granted patent
    Batching document identifiers for result trimming (status: in force)

    Publication number: US07636712B2

    Publication date: 2009-12-22

    Application number: US11600307

    Filing date: 2006-11-14

    IPC classification: G06F17/30

    Abstract: A query is separated into subqueries including a first subquery containing terms applicable to a first data store and a second subquery containing terms applicable to a second data store, where both data stores maintain information regarding the documents. Applying the first subquery to the first data store retrieves a first list of document identifiers associated with documents that satisfy the terms of the first subquery. The first list is combined with the second subquery to form a masked subquery, which is applied to the second data store. The masked subquery seeks to identify only document identifiers that are both included in the first list and satisfy the terms of the second subquery. The document identifiers included in the first list may be ordered to match the order in which the document identifiers are stored in the second data store.
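
    A minimal sketch of the masked-subquery flow described in the abstract above. The two stores are modeled as plain dictionaries (a property store and an inverted text index), and every name here is hypothetical; the abstract does not say how either store is implemented.

```python
def run_first_subquery(property_store, constraints):
    """Return the ids of documents whose properties satisfy every constraint.

    The result is sorted ascending so that it matches the (assumed) order of
    document ids in the second store's posting lists.
    """
    return sorted(
        doc_id for doc_id, props in property_store.items()
        if all(props.get(k) == v for k, v in constraints.items())
    )


def intersect_sorted(a, b):
    """Two-pointer intersection of two ascending lists of document ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def run_masked_subquery(text_index, terms, mask):
    """Apply the second subquery, restricted to ids already in the mask."""
    result = mask
    for term in terms:
        result = intersect_sorted(result, text_index.get(term, []))
    return result


property_store = {
    1: {"author": "kim"},
    2: {"author": "lee"},
    3: {"author": "kim"},
    4: {"author": "kim"},
}
text_index = {"index": [1, 2, 4], "search": [2, 3, 4]}

first_list = run_first_subquery(property_store, {"author": "kim"})
hits = run_masked_subquery(text_index, ["index", "search"], first_list)
```

    Because the first list is ordered the same way as the posting lists, each intersection is a single linear merge, and the second store never has to return identifiers that the first subquery already ruled out.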

    Batching document identifiers for result trimming
    4.
    Patent application
    Batching document identifiers for result trimming (status: in force)

    Publication number: US20080114730A1

    Publication date: 2008-05-15

    Application number: US11600307

    Filing date: 2006-11-14

    IPC classification: G06F17/30

    Abstract: A query is separated into subqueries including a first subquery containing terms applicable to a first data store and a second subquery containing terms applicable to a second data store, where both data stores maintain information regarding the documents. Applying the first subquery to the first data store retrieves a first list of document identifiers associated with documents that satisfy the terms of the first subquery. The first list is combined with the second subquery to form a masked subquery, which is applied to the second data store. The masked subquery seeks to identify only document identifiers that are both included in the first list and satisfy the terms of the second subquery. The document identifiers included in the first list may be ordered to match the order in which the document identifiers are stored in the second data store.

    RANKING REAL ESTATE BASED ON ITS VALUE AND OTHER FACTORS
    5.
    Patent application
    RANKING REAL ESTATE BASED ON ITS VALUE AND OTHER FACTORS (status: in force)

    Publication number: US20120158748A1

    Publication date: 2012-06-21

    Application number: US13331505

    Filing date: 2011-12-20

    IPC classification: G06F17/30

    CPC classification: G06Q30/0623 G06Q50/16

    Abstract: A real estate ranking is computed to sort real estate properties. Such computations use available information regarding real estate properties in any local market and may help a real estate buyer identify properties with various ranked financial values. Suitably, a relatively small geographic area can be used based on the buyer's specified criteria, and the ranking computation is able to handle interactions among predictor variables, possesses suitable predictive confidence, and can dynamically adjust the underlying ranking computation as new patterns in the real estate market emerge over time.

    Ranking real estate based on its value and other factors
    6.
    Granted patent
    Ranking real estate based on its value and other factors (status: in force)

    Publication number: US08832115B2

    Publication date: 2014-09-09

    Application number: US13331505

    Filing date: 2011-12-20

    IPC classification: G06F7/00 G06F17/30 G06Q30/06

    CPC classification: G06Q30/0623 G06Q50/16

    Abstract: A real estate ranking is computed to sort real estate properties. Such computations use available information regarding real estate properties in any local market and may help a real estate buyer identify properties with various ranked financial values. Suitably, a relatively small geographic area can be used based on the buyer's specified criteria, and the ranking computation is able to handle interactions among predictor variables, possesses suitable predictive confidence, and can dynamically adjust the underlying ranking computation as new patterns in the real estate market emerge over time.

    Search index format optimizations
    7.
    Granted patent
    Search index format optimizations (status: in force)

    Publication number: US08166041B2

    Publication date: 2012-04-24

    Application number: US12139213

    Filing date: 2008-06-13

    IPC classification: G06F7/00 G06F17/30

    Abstract: A search index structure extends a typical composite index by incorporating an index that is optimized for fast retrieval from storage and that eliminates data specific to phrase searching. Other data is represented in a manner that allows it to be calculated rather than stored. Associating variable-length entries with logical categories allows their length to be inferred from the category rather than stored. Using delta values between document IDs rather than the IDs themselves generates a compact, dense symbol set that is efficiently compressed by Huffman encoding or a similar compression method. Using an upper threshold to remove large, and thus rare, delta values from the symbol set prior to encoding further improves the encoding performance.
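
    The delta-plus-Huffman idea in the abstract above can be illustrated with a short sketch. Several details are assumptions made only for this example: deltas above the threshold are replaced by a hypothetical ESCAPE symbol and kept in a separate list (the abstract says only that they are removed from the symbol set), and the bit strings are ordinary Python strings rather than packed bits.

```python
import heapq
from collections import Counter
from itertools import count

ESCAPE = "ESC"  # hypothetical marker for deltas stored outside the symbol set


def to_deltas(doc_ids):
    """Convert ascending document ids into gaps between consecutive ids."""
    prev, out = 0, []
    for doc_id in doc_ids:
        out.append(doc_id - prev)
        prev = doc_id
    return out


def split_by_threshold(deltas, threshold):
    """Replace deltas above the threshold with ESCAPE; keep them separately."""
    symbols, escaped = [], []
    for d in deltas:
        if d > threshold:
            symbols.append(ESCAPE)
            escaped.append(d)
        else:
            symbols.append(d)
    return symbols, escaped


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def decode(bits, code, escaped):
    """Rebuild the original document ids from the bit string."""
    inverse = {c: s for s, c in code.items()}
    escaped_iter = iter(escaped)
    doc_ids, current, buf = [], 0, ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            sym = inverse[buf]
            current += next(escaped_iter) if sym == ESCAPE else sym
            doc_ids.append(current)
            buf = ""
    return doc_ids


doc_ids = [3, 4, 5, 7, 8, 1000, 1001, 1003]
symbols, escaped = split_by_threshold(to_deltas(doc_ids), threshold=16)
code = huffman_code(symbols)
bits = "".join(code[s] for s in symbols)
```

    In this toy posting list the single large gap (992) is kept out of the Huffman alphabet, so the code table contains only the small, frequent deltas plus the escape marker.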

    INDEX OPTIMIZATION FOR RANKING USING A LINEAR MODEL
    8.
    Patent application
    INDEX OPTIMIZATION FOR RANKING USING A LINEAR MODEL (status: in force)

    Publication number: US20100121838A1

    Publication date: 2010-05-13

    Application number: US12690100

    Filing date: 2010-01-19

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: Technologies are described herein for providing a more efficient approach to ranking search results. An illustrative technology reduces the amount of ranking data analyzed at query time. In this technology, a term is selected from a master index at index time. The term corresponds to a number of documents greater than a threshold. A set of documents that includes the term is selected based on the master index. A rank is determined for each document in the set. Each document in the set is assigned to a top document list or a bottom document list based on its rank. Predefined values of at least part of the rank are stored in the top document list for documents in the top document list and are not stored in the bottom document list for documents in the bottom document list.
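
    The index-time split described in the abstract above might look roughly like the sketch below. It is a simplified illustration: the rank function, the top-N cutoff used to divide the two lists, and all names are hypothetical, since the abstract states only that the assignment is based on the rank.

```python
def build_top_bottom_lists(master_index, rank_fn, doc_count_threshold, top_size):
    """Split the postings of frequent terms into top and bottom lists.

    master_index maps a term to the ids of documents containing it.
    Only terms with more than doc_count_threshold documents are processed.
    The top list keeps precomputed rank values; the bottom list keeps ids only.
    """
    lists = {}
    for term, doc_ids in master_index.items():
        if len(doc_ids) <= doc_count_threshold:
            continue  # rare term: the master index is used as-is
        ranked = sorted(doc_ids, key=lambda d: rank_fn(term, d), reverse=True)
        # Assumed policy: the top_size best-ranked documents form the top list.
        top = {d: rank_fn(term, d) for d in ranked[:top_size]}
        bottom = sorted(ranked[top_size:])
        lists[term] = (top, bottom)
    return lists


master_index = {
    "search": ["d1", "d2", "d3", "d4"],
    "rare": ["d1"],
}
static_rank = {"d1": 0.9, "d2": 0.1, "d3": 0.5, "d4": 0.7}

lists = build_top_bottom_lists(
    master_index,
    rank_fn=lambda term, doc_id: static_rank[doc_id],
    doc_count_threshold=2,
    top_size=2,
)
top, bottom = lists["search"]
```

    At query time, a ranker built this way could read stored rank values for the top list directly and skip loading rank data for the bottom list, which is where the reduction in ranking data would come from.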

    SEARCH INDEX FORMAT OPTIMIZATIONS
    9.
    Patent application
    SEARCH INDEX FORMAT OPTIMIZATIONS (status: in force)

    Publication number: US20090313238A1

    Publication date: 2009-12-17

    Application number: US12139213

    Filing date: 2008-06-13

    IPC classification: G06F7/06 G06F17/30

    Abstract: A search index structure extends a typical composite index by incorporating an index that is optimized for fast retrieval from storage and that eliminates data specific to phrase searching. Other data is represented in a manner that allows it to be calculated rather than stored. Associating variable-length entries with logical categories allows their length to be inferred from the category rather than stored. Using delta values between document IDs rather than the IDs themselves generates a compact, dense symbol set that is efficiently compressed by Huffman encoding or a similar compression method. Using an upper threshold to remove large, and thus rare, delta values from the symbol set prior to encoding further improves the encoding performance.

    Tenantization of search result ranking
    10.
    Granted patent
    Tenantization of search result ranking (status: in force)

    Publication number: US08694507B2

    Publication date: 2014-04-08

    Application number: US13287656

    Filing date: 2011-11-02

    IPC classification: G06F7/00 G06F17/30

    Abstract: This disclosure describes methods and systems for searching documents in a multi-tenant hosting environment. According to embodiments, to conserve hardware resources, a plurality of documents associated with a plurality of tenants may be mapped to the same search index in the multi-tenant hosting environment. In order to search only the documents associated with a single tenant, a tenant identifier is prepended to every key stored in the search index that is associated with the documents of that tenant. Moreover, where one document links to another document within the multi-tenant hosting environment, the link is stored in a web graph when the source tenant identifier matches the target tenant identifier for the link. According to embodiments, when conducting a search, the link is resolved only if it is stored in the web graph.
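
    The two mechanisms in the abstract above, tenant-prefixed index keys and a tenant-checked web graph, can be sketched as below. The class, its method names, and the ":" separator in the key are hypothetical choices for illustration; the abstract does not describe the key format or the storage layout.

```python
class MultiTenantIndex:
    """One shared index for many tenants (illustrative sketch only)."""

    def __init__(self):
        self.postings = {}      # "tenant:term" -> set of document ids
        self.doc_tenant = {}    # document id -> owning tenant id
        self.web_graph = set()  # (source_doc, target_doc) pairs

    @staticmethod
    def _key(tenant_id, term):
        # Prepend the tenant identifier to the term (assumed separator ":").
        return f"{tenant_id}:{term}"

    def add_document(self, tenant_id, doc_id, terms):
        self.doc_tenant[doc_id] = tenant_id
        for term in terms:
            self.postings.setdefault(self._key(tenant_id, term), set()).add(doc_id)

    def add_link(self, source_doc, target_doc):
        """Store the link only when both documents belong to the same tenant."""
        source_tenant = self.doc_tenant.get(source_doc)
        target_tenant = self.doc_tenant.get(target_doc)
        if source_tenant is not None and source_tenant == target_tenant:
            self.web_graph.add((source_doc, target_doc))
            return True
        return False

    def search(self, tenant_id, term):
        """Look up a term using the tenant-prefixed key only."""
        return sorted(self.postings.get(self._key(tenant_id, term), set()))

    def resolve_link(self, source_doc, target_doc):
        """A link is resolvable only if it was stored in the web graph."""
        return (source_doc, target_doc) in self.web_graph


index = MultiTenantIndex()
index.add_document("tenantA", "a1", ["budget", "report"])
index.add_document("tenantA", "a2", ["report"])
index.add_document("tenantB", "b1", ["report"])

same_tenant_stored = index.add_link("a1", "a2")
cross_tenant_stored = index.add_link("a1", "b1")
```

    Here a search for "report" under tenantA can never return b1, because the lookup key itself carries the tenant identifier, and the cross-tenant link from a1 to b1 is never stored, so it cannot influence ranking.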