Adaptive evaluation of text search queries with blackbox scoring functions
    11.
    Granted patent
    Status: lapsed

    Publication No.: US07991771B2

    Publication Date: 2011-08-02

    Application No.: US11561949

    Filing Date: 2006-11-21

    IPC Classification: G06F17/30

    CPC Classification: G06F17/30672

    Abstract: Disclosed is an evaluation technique for text search with black-box scoring functions, in which the evaluation engine does not need to maintain the details of the scoring function. Included are a description of a system for black-box search, proofs of correctness, and experimental evidence showing that the technique is comparable in efficiency to the techniques used in custom-built engines.
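
The abstract gives no algorithmic detail, so what follows is only a minimal Python sketch of the contract it describes. The engine sees the scoring function purely as an opaque callable and keeps the best k results. The names `top_k`, `Scorer` and `term_overlap` are hypothetical. The loop scores every candidate exhaustively, so it does not reproduce the patented adaptive evaluation.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

# The engine only knows that a scorer maps (query, document text) to a
# float; how that number is computed stays hidden from it.
Scorer = Callable[[str, str], float]


def top_k(query: str, docs: Iterable[Tuple[int, str]],
          score: Scorer, k: int) -> List[Tuple[float, int]]:
    """Return up to k (score, doc_id) pairs, highest score first."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # min-heap holding the current best k
    for doc_id, text in docs:
        s = score(query, text)  # the only contact with the black box
        if len(heap) < k:
            heapq.heappush(heap, (s, doc_id))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc_id))
    return sorted(heap, reverse=True)


def term_overlap(query: str, text: str) -> float:
    # Stand-in scorer: counts document words that also occur in the query.
    q = set(query.lower().split())
    return float(sum(1 for w in text.lower().split() if w in q))


docs = [(1, "text search engine"), (2, "image viewer"),
        (3, "search queries and text search")]
print(top_k("text search", docs, term_overlap, k=2))  # [(3.0, 3), (2.0, 1)]
```

Any callable with the same shape can replace `term_overlap` without touching `top_k`. That separation is the point the abstract makes about black-box scoring.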

    Architecture for an indexer
    12.
    Granted patent
    Status: lapsed

    Publication No.: US07743060B2

    Publication Date: 2010-06-22

    Application No.: US11834556

    Filing Date: 2007-08-06

    IPC Classification: G06F7/00 G06F17/30

    Abstract: Disclosed is a technique for indexing data. For each token in a set of documents, a sort key is generated that includes a document identifier. The identifier indicates whether the section of the document associated with the sort key is an anchor text section or a context section; the anchor text section and the context section have the same document identifier. It is then determined whether the data field associated with the token is fixed width. When it is fixed width, the token is designated for a fixed-width sort; when it is variable length, the token is designated for a variable-width sort. The fixed-width sort and the variable-width sort are then performed. For each document, the sort keys are used to bring together the anchor text section and the context section of that document.
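
A minimal sketch of this indexing flow, assuming a hypothetical key layout:

- The low bit of the sort key marks the section (0 = anchor text, 1 = context).
- The remaining bits hold the document identifier, so both sections of a document share one identifier.

Integer payloads stand in for fixed-width fields and strings for variable-length ones. Python's built-in sort stands in for both specialized sorts, whose sorted outputs are then merged and grouped per document. All names are illustrative.

```python
import heapq
from itertools import groupby
from typing import Dict, List, Tuple, Union

ANCHOR, CONTEXT = 0, 1            # hypothetical section flags
Payload = Union[int, str]         # int: fixed width, str: variable length
Entry = Tuple[int, str, Payload]  # (sort key, term, payload)


def sort_key(doc_id: int, section: int) -> int:
    # Assumed layout: low bit = section, upper bits = document id.
    return (doc_id << 1) | section


def build_index(tokens: List[Tuple[int, int, str, Payload]]
                ) -> Dict[int, List[Tuple[int, str, Payload]]]:
    fixed: List[Entry] = []
    variable: List[Entry] = []
    for doc_id, section, term, payload in tokens:
        entry = (sort_key(doc_id, section), term, payload)
        # Route each token by the width of its data field.
        (fixed if isinstance(payload, int) else variable).append(entry)
    fixed.sort(key=lambda e: e[0])     # stand-in for a fixed-width sort
    variable.sort(key=lambda e: e[0])  # stand-in for a variable-width sort
    merged = heapq.merge(fixed, variable, key=lambda e: e[0])
    # Grouping on the document id (key >> 1) brings each document's
    # anchor section (low bit 0) and context section (low bit 1) together.
    return {doc_id: [(key & 1, term, payload) for key, term, payload in group]
            for doc_id, group in groupby(merged, key=lambda e: e[0] >> 1)}
```

Because the section flag sits in the lowest bit, a document's anchor entries always sort immediately before its context entries.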

    Adaptive Evaluation of Text Search Queries With Blackbox Scoring Functions
    15.
    Patent application
    Status: lapsed

    Publication No.: US20070150467A1

    Publication Date: 2007-06-28

    Application No.: US11561949

    Filing Date: 2006-11-21

    IPC Classification: G06F17/30

    CPC Classification: G06F17/30672

    Abstract: Disclosed is an evaluation technique for text search with black-box scoring functions, in which the evaluation engine does not need to maintain the details of the scoring function. Included are a description of a system for black-box search, proofs of correctness, and experimental evidence showing that the technique is comparable in efficiency to the techniques used in custom-built engines.

    Method for cataloging, filtering, and relevance ranking frame-based hierarchical information structures
    17.
    Granted patent
    Status: in force

    Publication No.: US06334131B2

    Publication Date: 2001-12-25

    Application No.: US09143733

    Filing Date: 1998-08-29

    IPC Classification: G06F17/30

    Abstract: A method for cataloging, filtering, and ranking information, such as World Wide Web pages on the Internet. The method is preferably implemented in computer software. It includes steps that enable a user to interactively create an information database of preferred information elements, such as preferred-authority World Wide Web pages. It also includes steps that enable a user to interactively create a frame-based, hierarchical organizational structure for the information elements. Further steps identify information elements, such as World Wide Web pages, to populate the structure and automatically filter and rank them by relevance, forming, for example, a searchable World Wide Web page database. Additionally, the method includes steps for:
    (1) enabling a user to interactively define a frame-based, hierarchical information structure for cataloging information;
    (2) identifying a preliminary population of information elements for a particular hierarchical category arranged as a frame, based on the respective frame attributes;
    (3) expanding that population to include related information;
    (4) automatically filtering and ranking the information by relevance;
    (5) populating the hierarchical structure with a definable portion of the filtered, ranked information elements.
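
The cataloging steps can be sketched as a recursive pass over a frame hierarchy. Everything here is an assumption made for illustration:

- Frames carry keyword attributes.
- The preliminary population is the set of pages containing all of a frame's attributes.
- Expansion follows one hop of out-links.
- Relevance is the fraction of frame attributes a page contains.

`Frame`, `populate`, `threshold` and `keep` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Frame:
    # Hypothetical frame: a category with keyword attributes and subframes.
    name: str
    attributes: Set[str]
    children: List["Frame"] = field(default_factory=list)
    members: List[str] = field(default_factory=list)


def relevance(frame: Frame, words: Set[str]) -> float:
    # Assumed measure: share of the frame's attributes found on the page.
    if not frame.attributes:
        return 0.0
    return len(frame.attributes & words) / len(frame.attributes)


def populate(frame: Frame, pages: Dict[str, Set[str]],
             links: Dict[str, List[str]],
             threshold: float = 0.5, keep: int = 3) -> None:
    # 1. Preliminary population from the frame's attributes.
    seed = {url for url, words in pages.items() if frame.attributes <= words}
    # 2. Expand with pages one link away from the seed set.
    expanded = seed | {dst for url in seed
                       for dst in links.get(url, []) if dst in pages}
    # 3. Filter by relevance, rank best first, keep a definable portion.
    scored = [(relevance(frame, pages[u]), u) for u in expanded]
    ranked = sorted((su for su in scored if su[0] >= threshold),
                    key=lambda su: (-su[0], su[1]))
    frame.members = [u for _, u in ranked[:keep]]
    # 4. Repeat for each subcategory in the hierarchy.
    for child in frame.children:
        populate(child, pages, links, threshold, keep)
```

Expansion matters here: a page that only partly matches a frame can still enter through a link from the seed set and survive the relevance filter.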

    KNOWLEDGE-BASED DATA MINING SYSTEM
    18.
    Patent application
    Status: pending (published)

    Publication No.: US20120259890A1

    Publication Date: 2012-10-11

    Application No.: US13526424

    Filing Date: 2012-06-18

    IPC Classification: G06F17/30

    CPC Classification: G06F16/951 G06F2216/03

    Abstract: In a data mining system, data is gathered into a data store using, e.g., a Web crawler, and is classified into entities. Data miners use rules to process the entities and append keys to them. The keys represent characteristics of the entities as derived from the rules embodied in the miners. With these keys, characteristics of entities as defined by the disparate expert authors of the data miners are identified for use in responding to complex data requests from customers.
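
A minimal sketch of the miner-and-key idea, under these assumptions:

- Each miner is simply a function that inspects an entity and yields keys.
- A complex request is answered by requiring a set of keys.

The `Entity` fields, the two example miners and the key strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set
from urllib.parse import urlparse


@dataclass
class Entity:
    # Hypothetical entity classified from crawled data.
    url: str
    text: str
    keys: Set[str] = field(default_factory=set)


# A miner applies its author's rules to one entity and yields keys.
Miner = Callable[[Entity], Iterable[str]]


def run_miners(entities: List[Entity], miners: List[Miner]) -> None:
    for entity in entities:
        for miner in miners:
            entity.keys.update(miner(entity))  # append the derived keys


def query(entities: List[Entity], required: Set[str]) -> List[Entity]:
    # A compound request is answered by requiring every listed key.
    return [e for e in entities if required <= e.keys]


def price_miner(e: Entity) -> Iterable[str]:
    if "$" in e.text:
        yield "commerce:has_price"


def region_miner(e: Entity) -> Iterable[str]:
    host = urlparse(e.url).hostname or ""
    if host.endswith(".de"):
        yield "region:de"
```

Each miner can come from a different author. `query` only needs to know key names, not the rules that produced them.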

    Method and system for filtering of information entities
    19.
    Granted patent
    Status: lapsed

    Publication No.: US06996572B1

    Publication Date: 2006-02-07

    Application No.: US08947221

    Filing Date: 1997-10-08

    IPC Classification: G06F17/00

    Abstract: A system and method are provided for eliciting interesting structure from a collection of entities or resources that have explicit and/or implicit, static and/or dynamic relations between them, called “affinities.” Interesting structure includes (1) notions of the quality, authority, or definitiveness of information, (2) notions of relevance to a user's information need, (3) notions of similarity among the resources retrieved from a universe of resources by a query process, and (4) notions of similarity among the usages of resources by different users/servers. Similarities between entities are computed based on similarities between the entities' affinity values; that is, where the affinity values for two entities resemble each other, the two entities have a high degree of similarity. Using these similarities, the entities are ranked, clustered, etc., according to a significance derived from the similarities. The ranking, clustering, etc., make up the interesting structure that is sought.
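
A sketch of similarity derived from affinities, under these assumptions:

- Each entity's affinities form a sparse vector over other resources.
- Cosine similarity serves as a swapped-in similarity measure; the abstract does not name one.
- Significance is an entity's total similarity to all others.
- Clusters are connected components of the graph whose edges join pairs at or above a similarity threshold.

All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Affinities = Dict[str, Dict[str, float]]  # entity -> {resource: affinity}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Similarity of two entities = similarity of their affinity vectors.
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_by_significance(aff: Affinities) -> List[Tuple[str, float]]:
    # Assumed significance: summed similarity to every other entity.
    names = sorted(aff)
    sig = {n: sum(cosine(aff[n], aff[m]) for m in names if m != n)
           for n in names}
    return sorted(sig.items(), key=lambda kv: (-kv[1], kv[0]))


def clusters(aff: Affinities, threshold: float) -> List[List[str]]:
    # Connected components over pairs whose similarity >= threshold.
    names = sorted(aff)
    seen, out = set(), []
    for start in names:
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            n = stack.pop()
            comp.append(n)
            for m in names:
                if m not in seen and cosine(aff[n], aff[m]) >= threshold:
                    seen.add(m)
                    stack.append(m)
        out.append(sorted(comp))
    return out
```

Two entities with near-identical affinity vectors score close to 1.0 and land in the same cluster. An entity with no shared affinities scores 0.0.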

    Method and system for trawling the World-wide Web to identify implicitly-defined communities of web pages
    20.
    Granted patent
    Status: lapsed

    Publication No.: US06886129B1

    Publication Date: 2005-04-26

    Application No.: US09449697

    Filing Date: 1999-11-24

    IPC Classification: G06F17/30

    Abstract: A method and system are disclosed for identifying groups of pages of common interest in a collection of hyperlinked pages. A plurality of community cores are identified in the collection. Each core includes a first and a second set of pages, and each page in the first set points to every page in the second set. Each identified core is expanded into a full community, which is a subset of the pages on a particular topic. The identification of community cores is based on analysis of the Web graph, in which communities correspond to instances of Web subgraphs. Extraneous pages are then pruned to improve the quality of the resulting communities.
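
The core-identification step can be illustrated with a brute-force search for (i, j) cores. A core here is j "center" pages together with at least i "fan" pages that each link to every center. This sketch is for tiny graphs only, since it enumerates all j-subsets of link targets. It omits the expansion and pruning stages. The function name and the parameters `i` and `j` are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Core = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (fans, centers)


def find_cores(out_links: Dict[str, Set[str]], i: int, j: int) -> List[Core]:
    """Return cores with >= i fans and exactly j centers, where every
    fan page links to every center page."""
    targets = sorted({t for ts in out_links.values() for t in ts})
    cores: List[Core] = []
    for centers in combinations(targets, j):
        wanted = set(centers)
        # Fans are all pages whose out-links cover every chosen center.
        fans = tuple(sorted(p for p, ts in out_links.items() if wanted <= ts))
        if len(fans) >= i:
            cores.append((fans, centers))
    return cores
```

For example, three pages that each link to both `c1` and `c2` form a (3, 2) core even if they also link elsewhere.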
