Information Retrieval with Unified Search Using Multiple Facets
    1.
    Type: patent application
    Legal status: in force

    Publication number: US20090327271A1

    Publication date: 2009-12-31

    Application number: US12164139

    Filing date: 2008-06-30

    IPC classification: G06F7/06 G06F17/30

    CPC classification: G06F17/30675

    Abstract: Information retrieval with unified search between heterogeneous objects is described. The method includes: indexing a first object as a document in a search index; referencing a second object related to the first object in a facet of the document; and storing a relationship strength between the first and second objects in the facet of the document in the search index. Multiple heterogeneous objects can be related to the first object and referenced in multiple facets of the document, each with its relationship strength to the first object. Scoring an indirect object by indirect relation to a query object can be carried out by aggregating the relationship strengths between the indirect object and the retrieved objects multiplied by the retrieved objects' direct scores of relationship strength to the query object.
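The indirect scoring described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the in-memory `index` layout, the object identifiers, the strength values, and the choice of a plain sum as the aggregation (the abstract only says "aggregating").

```python
from collections import defaultdict

# Hypothetical in-memory "search index": each indexed object is a document,
# and each facet maps related object ids to a relationship strength.
index = {
    "doc:report": {"facets": {"author": {"person:ann": 1.0},
                              "tag": {"tag:search": 0.8}}},
    "doc:memo": {"facets": {"author": {"person:bob": 1.0, "person:ann": 0.5},
                            "tag": {"tag:search": 0.4}}},
}


def score_indirect(direct_scores, facet):
    """Sum strength(document, object) * direct_score(document) per object."""
    totals = defaultdict(float)
    for doc_id, direct in direct_scores.items():
        related = index[doc_id]["facets"].get(facet, {})
        for obj_id, strength in related.items():
            totals[obj_id] += strength * direct
    return dict(totals)


# Direct scores of the documents retrieved for some query (made-up values).
retrieved = {"doc:report": 0.5, "doc:memo": 0.25}
people = score_indirect(retrieved, "author")
print(people)
```

In this sketch a person who is weakly related to several retrieved documents can outrank a person strongly related to a single low-scoring one, which is the effect the aggregation is meant to capture.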

    Information retrieval with unified search using multiple facets
    2.
    Type: granted patent
    Legal status: in force

    Publication number: US08024324B2

    Publication date: 2011-09-20

    Application number: US12164139

    Filing date: 2008-06-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30675

    Abstract: A method for information retrieval with unified search between heterogeneous objects includes indexing a first object as a document in a search index; referencing a second object related to the first object in a facet of the document; and storing a relationship strength between the first and second objects in the facet of the document in the search index. Multiple heterogeneous objects can be related to the first object and referenced in multiple facets of the document, each with its relationship strength to the first object. Scoring an indirect object by indirect relation to a query object can be carried out by aggregating the relationship strengths between the indirect object and the retrieved objects multiplied by the retrieved objects' direct scores of relationship strength to the query object.

    Method and system for detection of authors
    3.
    Type: granted patent
    Legal status: in force

    Publication number: US07752208B2

    Publication date: 2010-07-06

    Application number: US11733808

    Filing date: 2007-04-11

    IPC classification: G06F7/00

    Abstract: A method and system are provided for detection of authors across different types of information sources such as across documents on the Web. The method includes obtaining a compression signature for a document, and determining the similarity between compression signatures of two or more documents. If the similarity is greater than a threshold measure, the two or more documents are considered to be by the same author. Scored pairs of documents are clustered to provide a group of documents by the same author. The group of documents by the same author can be used for user profiling, noise reduction, contribution sizing, detecting fraudulent contributions, obtaining other search results by the same author, or matching a document with undisclosed authorship to a document of known author.

    Method and System for Detection of Authors
    4.
    Type: patent application
    Legal status: in force

    Publication number: US20080256093A1

    Publication date: 2008-10-16

    Application number: US11733808

    Filing date: 2007-04-11

    IPC classification: G06F17/00

    Abstract: A method and system are provided for detection of authors across different types of information sources such as across documents on the Web. The method includes obtaining a compression signature (303) for a document, and determining the similarity (304) between compression signatures of two or more documents. If the similarity is greater than a threshold measure (305), the two or more documents are considered to be by the same author. Scored pairs of documents are clustered (308) to provide a group of documents by the same author. The group of documents by the same author can be used for user profiling, noise reduction, contribution sizing, detecting fraudulent contributions, obtaining other search results by the same author, or matching a document with undisclosed authorship to a document of known author.
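The four steps named in the abstract (signature, similarity, threshold, clustering) can be sketched as follows. The abstract does not define the compression signature or the similarity measure, so this sketch substitutes a well-known stand-in: zlib-compressed sizes combined into one minus the normalized compression distance. The function names, the threshold parameter, and the union-find clustering are likewise assumptions, not the patented method.

```python
import zlib
from itertools import combinations


def csize(text: str) -> int:
    """Compressed size in bytes; stands in for the compression signature (303)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def similarity(a: str, b: str) -> float:
    """1 - normalized compression distance (304); higher means more alike."""
    ca, cb, cab = csize(a), csize(b), csize(a + b)
    return 1.0 - (cab - min(ca, cb)) / max(ca, cb)


def cluster_by_author(docs: dict, threshold: float) -> list:
    """Link pairs scoring above the threshold (305) and return the groups (308)."""
    parent = {name: name for name in docs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if similarity(docs[a], docs[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

A call such as `cluster_by_author({"post-1": text1, "post-2": text2}, threshold=0.5)` returns a list of sets of document names. A usable threshold would have to be tuned on documents of known authorship; 0.5 is only a placeholder.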

    Method and system for assessing quality of search engines
    6.
    Type: patent application
    Legal status: under examination (published)

    Publication number: US20060212265A1

    Publication date: 2006-09-21

    Application number: US11083204

    Filing date: 2005-03-17

    IPC classification: G21C17/00

    CPC classification: G06F16/951

    Abstract: A method and system for assessing the quality of one or more search engines are provided. The method and system monitor reformulation sessions by users (201) of a search engine (308, 402, 403) by retrieving data from a query log (307, 407, 408), wherein a reformulation session is a series of at least two queries to a search engine (308) issued by a user (201) to satisfy a single information need. The method and system then determine a reformulation session parameter for the search engine (308, 402, 403) and analyse the reformulation session parameter. The reformulation session parameter may be a rate of query reformulations in a reformulation session or a reformulation session duration. Analysing the reformulation session parameter for a single search engine may determine if the parameter changes with time or may determine the parameter with different settings in a single search engine. Analysing the reformulation session parameter for two or more search engines includes comparing the parameters of the two or more search engines to measure the search quality. The analysis can be used to control the operation of one or more search engines.
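As an illustration of the reformulation session parameters, the sketch below extracts sessions from a query log and computes two summary values. The log format `(user, timestamp_seconds, query)`, the 300-second gap used to approximate "a single information need", and the function names are all assumptions; the abstract does not say how sessions are delimited.

```python
from collections import defaultdict
from statistics import mean


def reformulation_sessions(log, max_gap=300):
    """Group each user's queries by time gap; keep sessions with >= 2 queries."""
    by_user = defaultdict(list)
    for user, ts, query in log:
        by_user[user].append((ts, query))

    sessions = []
    for entries in by_user.values():
        entries.sort()
        current = [entries[0]]
        for entry in entries[1:]:
            if entry[0] - current[-1][0] <= max_gap:
                current.append(entry)
            else:
                sessions.append(current)
                current = [entry]
        sessions.append(current)
    return [s for s in sessions if len(s) >= 2]


def session_parameters(log, max_gap=300):
    """Mean reformulations per session and mean session duration in seconds."""
    sessions = reformulation_sessions(log, max_gap)
    if not sessions:
        return {"sessions": 0, "mean_reformulations": 0.0, "mean_duration": 0.0}
    return {
        "sessions": len(sessions),
        "mean_reformulations": mean(len(s) - 1 for s in sessions),
        "mean_duration": mean(s[-1][0] - s[0][0] for s in sessions),
    }


# Made-up logs for two hypothetical engines.
log_a = [("u1", 0, "jaguar"), ("u1", 40, "jaguar car"),
         ("u1", 90, "jaguar car price"), ("u2", 10, "weather"),
         ("u1", 5000, "python docs")]
log_b = [("u1", 0, "jaguar car price"), ("u2", 10, "weather"),
         ("u2", 30, "weather paris")]
params_a = session_parameters(log_a)
params_b = session_parameters(log_b)
```

Comparing `params_a` with `params_b` mirrors the two-engine comparison in the abstract: under these assumptions, fewer reformulations and shorter sessions would be read as users reaching their goal sooner.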

    Personal index of items in physical proximity to a user
    7.
    Type: patent application
    Legal status: in force

    Publication number: US20050067492A1

    Publication date: 2005-03-31

    Application number: US10675155

    Filing date: 2003-09-30

    IPC classification: G06Q30/00 G06F17/60

    CPC classification: G06Q30/02

    Abstract: A dynamic index may list physical items in the changing vicinity of a user or a generator of the index. The vicinity may be within the same space as the user or the generator, such as a store, a library, a shelf, an aisle, within a given radius, a street, a city, a campus, a building, an area and a park. The index may store information about the physical items near the user or generator, such as content found on tags associated with the physical items. The content might be a description of the physical items and their locations. The present invention also includes a system and method to generate such a dynamic index.
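A minimal sketch of the "within a given radius" case follows. The `Tag` record, the flat x/y coordinates, and the `nearby_index` function are hypothetical; the abstract does not specify how tags are read or how positions are represented.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Tag:
    """Content read from a tag attached to a physical item (hypothetical layout)."""
    item: str
    description: str
    x: float
    y: float


def nearby_index(tags, user_xy, radius):
    """Rebuild the index for the user's current position."""
    ux, uy = user_xy
    return {t.item: t.description
            for t in tags
            if hypot(t.x - ux, t.y - uy) <= radius}


tags = [Tag("book", "paperback, shelf 3", 0.0, 0.0),
        Tag("lamp", "desk lamp, aisle 2", 3.0, 4.0),
        Tag("chair", "folding chair, aisle 9", 10.0, 0.0)]

at_entrance = nearby_index(tags, (0.0, 0.0), radius=5.0)
further_in = nearby_index(tags, (9.0, 0.0), radius=2.0)
```

Calling `nearby_index` again after each position change is what makes the index dynamic in this sketch; a real system would more likely update incrementally as tags come into and out of range.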

    Detecting content-rich text
    8.
    Type: patent application
    Legal status: under examination (published)

    Publication number: US20060161537A1

    Publication date: 2006-07-20

    Application number: US11038370

    Filing date: 2005-01-19

    IPC classification: G06F17/30

    CPC classification: G06F17/277

    Abstract: A method includes finding content-rich text in a document by identifying areas of narrative in the document. An apparatus includes a detector and a content-rich text indicator. The detector detects linguistic parameters which characterize narrative text in an input document and the content-rich text indicator provides the locations of narrative text in the input document.

    Method and system for determining the focus of a document
    9.
    Type: patent application
    Legal status: under examination (published)

    Publication number: US20060004752A1

    Publication date: 2006-01-05

    Application number: US11165527

    Filing date: 2005-06-23

    IPC classification: G06F7/00

    CPC classification: G06F16/9537 G06F16/313

    Abstract: A method and system for determining the focus of a document are provided. Candidate topics in the form of topic nodes in a hierarchy of topics are input into a focus determining algorithm. For each candidate topic node, a score is allocated to the topic of each level of the hierarchy of the topic node, the scores for each topic are summed and one or more topics are determined to be the focus of the document based on the scores. The scores allocated to the topic of each parent level of the hierarchy of the topic node are progressively lower for the topic of each parent level of the hierarchy. The candidate topics may be provided by identifying occurrences of references to a topic in a document, providing a plurality of possible topics in the form of topic nodes in a hierarchy of topics, and, for each identified occurrence of a reference to a topic, determining the appropriate topic node and adding the topic node to the candidate topics.
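The scoring scheme in the abstract can be sketched with a small topic hierarchy. The `PARENT` map, the halving `decay` factor, and the rule of returning all topics tied for the top score are assumptions; the abstract only requires that parent levels receive progressively lower scores.

```python
from collections import defaultdict

# Hypothetical topic hierarchy, stored as child -> parent.
PARENT = {"Paris": "France", "Lyon": "France", "Marseille": "France",
          "Berlin": "Germany", "France": "Europe", "Germany": "Europe"}


def topic_scores(candidates, parent=PARENT, decay=0.5):
    """Give each candidate node 1.0 and each ancestor a progressively lower score."""
    scores = defaultdict(float)
    for node in candidates:
        weight = 1.0
        while node is not None:
            scores[node] += weight
            weight *= decay
            node = parent.get(node)
    return dict(scores)


def focus(candidates, parent=PARENT, decay=0.5):
    """Return the topic or topics with the highest summed score."""
    scores = topic_scores(candidates, parent, decay)
    if not scores:
        return []
    best = max(scores.values())
    return sorted(t for t, s in scores.items() if s == best)


# One candidate per topic reference found in a made-up document.
mentions = ["Paris", "Lyon", "Marseille", "Berlin"]
scores = topic_scores(mentions)
document_focus = focus(mentions)
```

With these made-up mentions, no single city dominates, yet their shared parent accumulates the highest total, so the focus lands one level up the hierarchy. A smaller `decay` would keep the focus closer to the leaf topics.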

    Search Performance and User Interaction Monitoring of Search Engines
    10.
    Type: patent application
    Legal status: under examination (published)

    Publication number: US20070265999A1

    Publication date: 2007-11-15

    Application number: US11383265

    Filing date: 2006-05-15

    IPC classification: G06F17/30

    Abstract: A system for monitoring search performance and user interaction is provided in the form of a utility (300) including a plurality of monitoring components (302), each for dynamic monitoring of an aspect of searching a collection of documents. An analyzer module (303) analyzes the dynamic monitoring and identifies problems or difficulties in the search performance or user interactions. An output (301), which may be in the form of a display interface, provides information regarding the search performance and user interaction including one or more of: reasoning, improvement suggestions, reports, and problem alerts. The analyzer module (303) compares the dynamic monitoring to benchmark search engine conduct and document collection state.
