-
Publication No.: US07984039B2
Publication date: 2011-07-19
Application No.: US11183086
Filing date: 2005-07-14
Applicant: David Carmel, Adam Darlow, Shai Fine, Elad Yom-Tov
Inventor: David Carmel, Adam Darlow, Shai Fine, Elad Yom-Tov
CPC classification: G06F17/30864
Abstract: A method and system are provided for merging results in distributed information retrieval. A search manager is in communication with a plurality of components, wherein a component is a search engine working on a document collection and returning results in the form of a list of documents in response to a search query. The search manager submits a query to the plurality of components; receives results from each component in the form of a list of documents; estimates the success of a component in handling the query to generate a merit score for a component per query; applies the merit score to the results for the component; and merges results from the plurality of components by ranking in order of the applied merit score.
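The merge step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say how the merit score is estimated or how it is "applied", so the sketch assumes the per-query merit is already known and is applied by multiplication. The function name `merge_by_merit` and the engine names are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_by_merit(
    results: Dict[str, List[Tuple[str, float]]],
    merit: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Weight each component's document scores by that component's
    per-query merit score, then rank all documents together.

    Assumption: "applying" the merit score means multiplying it into
    each document score; the abstract leaves this open.
    """
    weighted = [
        (doc_id, score * merit[component])
        for component, docs in results.items()
        for doc_id, score in docs
    ]
    # Highest weighted score first.
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)


# Two hypothetical search engines answering the same query.
results = {
    "engine_a": [("a1", 0.9), ("a2", 0.5)],
    "engine_b": [("b1", 0.8), ("b2", 0.7)],
}
# Assumed per-query merit: engine_b is judged to handle this query better.
merit = {"engine_a": 0.5, "engine_b": 1.0}

merged = merge_by_merit(results, merit)
print([doc_id for doc_id, _ in merged])
```

With these made-up numbers the documents from `engine_b` outrank those from `engine_a` even though `a1` has the highest raw score, which is the effect the merit weighting is meant to have.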
-
Publication No.: US20080033971A1
Publication date: 2008-02-07
Application No.: US11461464
Filing date: 2006-08-01
Applicant: David Carmel, Adam Darlow, Shai Fine, Dan Pelleg, Elad Yom-Tov
Inventor: David Carmel, Adam Darlow, Shai Fine, Dan Pelleg, Elad Yom-Tov
IPC classification: G06F7/00
CPC classification: G06F17/30675
Abstract: A method and system for analyzing a document set (202, 420) are provided. The method includes determining a set of terms (312) from the terms of the document set that minimizes a distance measurement (405) from the given set of documents (420). The method includes using a greedy algorithm to build the set of terms incrementally, at each stage finding a single word that is closest to the document set (202, 420). The set of terms is evaluated to assess the ability to find the document set (202, 420). The set of terms is compared with expected terms to evaluate the ability to find the document set (202, 420). A measure of the ability to find a document set (202, 420) is provided by computing a distance measure (403) between a document set and an entire collection.
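The greedy construction in this abstract can be illustrated with a small sketch. The abstract does not name the distance measure, so the code below makes two assumptions: the distance is the Jensen-Shannon divergence between unigram term distributions, and a candidate term set is scored by the distribution of the collection documents that contain all of its terms. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Doc = List[str]


def distribution(docs: List[Doc]) -> Dict[str, float]:
    """Unigram term distribution pooled over a non-empty list of documents."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two term distributions."""
    mid = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in set(p) | set(q)}

    def kl(a: Dict[str, float]) -> float:
        return sum(v * math.log2(v / mid[t]) for t, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def greedy_terms(doc_set: List[Doc], collection: List[Doc], k: int) -> List[str]:
    """Add one term per stage, each time picking the term whose addition
    brings the matched documents closest to the target document set."""
    if not doc_set:
        return []
    target = distribution(doc_set)
    chosen: List[str] = []
    for _ in range(k):
        best_term, best_dist = None, float("inf")
        for term in sorted(set(target) - set(chosen)):
            query = chosen + [term]
            matched = [d for d in collection if all(t in d for t in query)]
            if not matched:
                continue
            dist = js_divergence(target, distribution(matched))
            if dist < best_dist:
                best_term, best_dist = term, dist
        if best_term is None:
            break
        chosen.append(best_term)
    return chosen


collection = [
    ["solar", "panel", "energy"],
    ["solar", "energy", "grid"],
    ["wind", "turbine", "energy"],
    ["stock", "market", "price"],
]
doc_set = collection[:2]  # the documents we want to be able to find
print(greedy_terms(doc_set, collection, 2))
```

In this toy collection, "solar" is picked first because the documents containing it are exactly the target set, so the divergence is zero; "energy" alone would also pull in the wind-turbine document.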
-
Publication No.: US20070016574A1
Publication date: 2007-01-18
Application No.: US11183086
Filing date: 2005-07-14
Applicant: David Carmel, Adam Darlow, Shai Fine, Elad Yom-Tov
Inventor: David Carmel, Adam Darlow, Shai Fine, Elad Yom-Tov
IPC classification: G06F17/30
CPC classification: G06F17/30864
Abstract: A method and system are provided for merging results in distributed information retrieval. A search manager (104) is in communication with a plurality of components, wherein a component is a search engine (106-108) working on a document collection and returning results in the form of a list of documents in response to a search query. The search manager (104) submits a query (202) to the plurality of components; receives results (213) from each component in the form of a list of documents; estimates (208) the success of a component in handling the query to generate a merit score (210) for a component per query; applies (220) the merit score (210) to the results for the component; and merges (222) results from the plurality of components by ranking in order of the applied merit score.
-
Publication No.: US07792830B2
Publication date: 2010-09-07
Application No.: US11461464
Filing date: 2006-08-01
Applicant: David Carmel, Adam Darlow, Shai Fine, Dan Pelleg, Elad Yom-Tov
Inventor: David Carmel, Adam Darlow, Shai Fine, Dan Pelleg, Elad Yom-Tov
CPC classification: G06F17/30675
Abstract: A method and system for analyzing a document set (202, 420) are provided. The method includes determining a set of terms (312) from the terms of the document set that minimizes a distance measurement (405) from the given set of documents (420). The method includes using a greedy algorithm to build the set of terms incrementally, at each stage finding a single word that is closest to the document set (202, 420). The set of terms is evaluated to assess the ability to find the document set (202, 420). The set of terms is compared with expected terms to evaluate the ability to find the document set (202, 420). A measure of the ability to find a document set (202, 420) is provided by computing a distance measure (403) between a document set and an entire collection.
-
Publication No.: US20070016545A1
Publication date: 2007-01-18
Application No.: US11181324
Filing date: 2005-07-14
Applicant: Andrei Broder, David Carmel, Adam Darlow, Shai Fine, Elad Yom-Tov
Inventor: Andrei Broder, David Carmel, Adam Darlow, Shai Fine, Elad Yom-Tov
IPC classification: G06F17/30
CPC classification: G06F16/28
Abstract: A method and system for the detection of missing content in a searchable repository is provided. A system includes: a missing content query identifier (401) for identifying queries to a search engine (102) for which little or no relevant content is returned; a missing content detector (110) which clusters missing content queries by topic; and an output provider for providing details of a missing content topic.
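The two stages named in this abstract (flagging queries with little or no relevant content, then grouping them by topic) can be sketched roughly as below. The abstract specifies neither the relevance test nor the clustering method, so this sketch assumes a threshold on the top retrieval score and a simple single-pass clustering on Jaccard term overlap. The function names, the threshold, and the sample queries are hypothetical.

```python
from typing import Dict, List, Set


def find_missing_content_queries(
    top_scores: Dict[str, float], threshold: float
) -> List[str]:
    """Flag queries whose best retrieval score falls below a threshold.

    Assumption: a low top score stands in for "little or no relevant content".
    """
    return [query for query, score in top_scores.items() if score < threshold]


def cluster_by_shared_terms(
    queries: List[str], min_overlap: float = 0.3
) -> List[List[str]]:
    """Single-pass clustering: a query joins the first cluster whose
    vocabulary has Jaccard overlap >= min_overlap with the query's terms."""
    clusters: List[List[str]] = []
    vocabularies: List[Set[str]] = []
    for query in queries:
        terms = set(query.lower().split())
        if not terms:
            continue  # skip empty queries
        for index, vocab in enumerate(vocabularies):
            if len(terms & vocab) / len(terms | vocab) >= min_overlap:
                clusters[index].append(query)
                vocab |= terms  # grow the cluster vocabulary in place
                break
        else:
            clusters.append([query])
            vocabularies.append(set(terms))
    return clusters


# Hypothetical query log: query text -> score of its best result.
top_scores = {
    "vpn setup mac": 0.05,
    "reset password": 0.90,
    "vpn setup linux": 0.10,
    "expense report form": 0.02,
}
missing = find_missing_content_queries(top_scores, threshold=0.2)
topics = cluster_by_shared_terms(missing)
print(topics)
```

Each resulting cluster is a candidate "missing content topic" that a repository owner could review; a production system would likely use a stronger relevance estimate and a proper clustering algorithm.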
-
Publication No.: US07401073B2
Publication date: 2008-07-15
Application No.: US11117749
Filing date: 2005-04-28
Applicant: David Carmel, Adam Darlow, Yael Petruschka, Aya Soffer
Inventor: David Carmel, Adam Darlow, Yael Petruschka, Aya Soffer
IPC classification: G06F17/30
CPC classification: G06F17/30634, G06F17/30613, Y10S707/99931, Y10S707/99933, Y10S707/99935
Abstract: A method for searching a document collection includes providing an index of terms indicating the documents in which the terms appear. A first statistical distribution of each of at least some of the terms in the index and a second statistical distribution of each of at least some of the categories are estimated over the documents in the collection. A query including one or more of the terms and a category restriction referring to at least one of the categories is accepted. A modified term distribution is produced by operating on the first statistical distribution of at least one of the terms in the query using the second statistical distribution, responsively to the category restriction. The query is applied to the index to return a response, in which occurrences of the at least one of the terms are scored responsively to the modified term distribution.
-
Publication No.: US20060248074A1
Publication date: 2006-11-02
Application No.: US11117749
Filing date: 2005-04-28
Applicant: David Carmel, Adam Darlow, Yael Petruschka, Aya Soffer
Inventor: David Carmel, Adam Darlow, Yael Petruschka, Aya Soffer
IPC classification: G06F17/30
CPC classification: G06F16/33, G06F16/31, Y10S707/99931, Y10S707/99933, Y10S707/99935
Abstract: A method for searching a document collection includes providing an index of terms indicating the documents in which the terms appear. A first statistical distribution of each of at least some of the terms in the index and a second statistical distribution of each of at least some of the categories are estimated over the documents in the collection. A query including one or more of the terms and a category restriction referring to at least one of the categories is accepted. A modified term distribution is produced by operating on the first estimated statistical distribution of at least one of the terms in the query using the second estimated statistical distribution of the at least one of the categories, responsively to the category restriction. The query is applied to the index so as to return a response, in which occurrences of the at least one of the terms are scored responsively to the modified term distribution.
-
Publication No.: US20060212265A1
Publication date: 2006-09-21
Application No.: US11083204
Filing date: 2005-03-17
Applicant: Einat Amitay, Adam Darlow, Uri Weiss
Inventor: Einat Amitay, Adam Darlow, Uri Weiss
IPC classification: G21C17/00
CPC classification: G06F16/951
Abstract: A method and system for assessing the quality of one or more search engines are provided. The method and system monitor reformulation sessions by users (201) of a search engine (308, 402, 403) by retrieving data from a query log (307, 407, 408), wherein a reformulation session is a series of at least two queries to a search engine (308) issued by a user (201) to satisfy a single information need. The method and system then determine a reformulation session parameter for the search engine (308, 402, 403) and analyse the reformulation session parameter. The reformulation session parameter may be a rate of query reformulations in a reformulation session or a reformulation session duration. Analysing the reformulation session parameter for a single search engine may determine if the parameter changes with time or may determine the parameter with different settings in a single search engine. Analysing the reformulation session parameter for two or more search engines includes comparing the parameters of the two or more search engines to measure the search quality. The analysis can be used to control the operation of one or more search engines.
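The reformulation-session parameters in this abstract can be computed from a query log along the following lines. The abstract does not define how a "single information need" is detected, so this sketch assumes that consecutive queries by the same user belong to one need when they fall within a time gap and share at least one term. The log layout, the 300-second gap, and all function names are hypothetical.

```python
from typing import List, Tuple

LogEntry = Tuple[str, float, str]  # (user_id, timestamp_seconds, query)


def reformulation_sessions(
    log: List[LogEntry], max_gap: float = 300.0
) -> List[List[LogEntry]]:
    """Group log entries into reformulation sessions of at least two queries.

    Assumption: two consecutive queries serve the same information need if
    they come from the same user, within max_gap seconds, and share a term.
    """
    sessions: List[List[LogEntry]] = []
    current: List[LogEntry] = []
    for entry in sorted(log, key=lambda e: (e[0], e[1])):
        if current:
            prev = current[-1]
            same_need = (
                entry[0] == prev[0]
                and entry[1] - prev[1] <= max_gap
                and bool(set(entry[2].lower().split()) & set(prev[2].lower().split()))
            )
            if not same_need:
                if len(current) >= 2:
                    sessions.append(current)
                current = []
        current.append(entry)
    if len(current) >= 2:
        sessions.append(current)
    return sessions


def mean_reformulations(sessions: List[List[LogEntry]]) -> float:
    """Average number of reformulations (queries after the first) per session."""
    if not sessions:
        return 0.0
    return sum(len(s) - 1 for s in sessions) / len(sessions)


def mean_duration(sessions: List[List[LogEntry]]) -> float:
    """Average time in seconds between first and last query of a session."""
    if not sessions:
        return 0.0
    return sum(s[-1][1] - s[0][1] for s in sessions) / len(sessions)


log = [
    ("u1", 0, "jaguar"),
    ("u1", 20, "jaguar car"),
    ("u1", 45, "jaguar car price"),
    ("u1", 4000, "weather"),
    ("u2", 10, "python sort"),
    ("u2", 30, "python sort list"),
]
sessions = reformulation_sessions(log)
print(len(sessions), mean_reformulations(sessions), mean_duration(sessions))
```

Computing the same two numbers over logs from different engines, or from one engine under different settings or time periods, gives the comparison the abstract describes; which direction counts as "better" is left to the analyst.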