11. System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a weighted and (WAND)
    Type: patent application; legal status: in force

    Publication number: US20070112763A1
    Publication date: 2007-05-17
    Application number: US11607080
    Filing date: 2006-11-30
    IPC classification: G06F17/30

    Abstract: Disclosed is a system architecture, components and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators and various adapters. The searching technique makes use of a two-level searching technique. A search query includes a search operator consisting of a plurality of search sub-expressions, each having an associated weight value. The search engine returns one or more documents whose weight value sum exceeds a threshold weight value sum. The search operator is implemented as a Boolean predicate that functions as a Weighted AND (WAND).
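The WAND predicate described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: documents are modelled as sets of terms, each sub-expression is assumed to be a single-term test, and, following the abstract's wording, a document qualifies only when its weight sum strictly exceeds the threshold. The names `wand` and `search` are hypothetical.

```python
from typing import Dict, List, Set


def wand(doc_terms: Set[str], weights: Dict[str, float], threshold: float) -> bool:
    """Weighted AND: true when the summed weights of the matching
    sub-expressions (here, single-term tests) exceed the threshold."""
    total = sum(w for term, w in weights.items() if term in doc_terms)
    return total > threshold


def search(docs: Dict[str, Set[str]], weights: Dict[str, float],
           threshold: float) -> List[str]:
    """Return, in id order, the documents that satisfy the WAND predicate."""
    return sorted(d for d, terms in docs.items() if wand(terms, weights, threshold))


docs = {
    "d1": {"patent", "search", "index"},
    "d2": {"search"},
    "d3": {"index", "wand"},
}
weights = {"search": 2.0, "index": 1.0, "wand": 3.0}
print(search(docs, weights, threshold=2.5))  # ['d1', 'd3']
```

With every weight set to 1, a threshold of 0 makes the predicate behave like OR over the terms, and a threshold of k-1 over k terms makes it behave like AND; thresholds in between give "at least m of k" matching, which is why the operator is described as a weighted AND.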

12. System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations
    Type: granted patent; legal status: in force

    Publication number: US07139752B2
    Publication date: 2006-11-21
    Application number: US10449409
    Filing date: 2003-05-30
    IPC classification: G06F17/30

    Abstract: Disclosed is a system architecture, components and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators and various adapters. The searching technique makes use of a two-level searching technique. Also disclosed are a system, method and computer program product for processing document data. The method includes inputting a document and operating at least one text analysis engine that comprises a plurality of coupled annotators for tokenizing document data, in order to identify and annotate a particular type of semantic content. Operating the at least one text analysis engine generates a plurality of views of a document, where each of the plurality of views is derived from a different tokenization of the document. The method further includes storing the plurality of views in a common data structure associated with the document.
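A rough sketch of storing several tokenization-derived views in one common per-document structure follows. It is a plain-Python stand-in, not the UIMS data structure: the `Document` class, the two tokenizers, and the span format `(start, end, text)` are all hypothetical. Because every view records character offsets into the same text, annotations from different views can still be aligned.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, covered text)


@dataclass
class Document:
    text: str
    # Common data structure: every view is stored on the same document object.
    views: Dict[str, List[Span]] = field(default_factory=dict)


def word_tokenizer(text: str) -> List[Span]:
    """Hypothetical tokenizer: runs of word characters (splits 'U.S.' into 'U', 'S')."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+", text)]


def whitespace_tokenizer(text: str) -> List[Span]:
    """Hypothetical tokenizer: runs of non-whitespace (keeps punctuation attached)."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]


def analyze(doc: Document, tokenizers: Dict[str, Callable[[str], List[Span]]]) -> Document:
    """Run each tokenizer and store its output as a separate, named view."""
    for name, tokenize in tokenizers.items():
        doc.views[name] = tokenize(doc.text)
    return doc


doc = analyze(Document("U.S. patents, e.g. WAND."),
              {"words": word_tokenizer, "whitespace": whitespace_tokenizer})
for name, spans in doc.views.items():
    print(name, [s[2] for s in spans])
```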

13. Term-statistics modification for category-based search
    Type: granted patent; legal status: in force

    Publication number: US07401073B2
    Publication date: 2008-07-15
    Application number: US11117749
    Filing date: 2005-04-28
    IPC classification: G06F17/30

    Abstract: A method for searching a document collection includes providing an index of terms indicating the documents in which the terms appear. A first statistical distribution of each of at least some of the terms in the index and a second statistical distribution of each of at least some of the categories are estimated over the documents in the collection. A query including one or more of the terms and a category restriction referring to at least one of the categories is accepted. A modified term distribution is produced by operating on the first statistical distribution of at least one of the terms in the query using the second statistical distribution, responsively to the category restriction. The query is applied to the index to return a response, in which occurrences of the at least one of the terms are scored responsively to the modified term distribution.
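The abstract does not say how the two distributions are combined, so the following is only one plausible sketch. It assumes both statistics are histograms over the same buckets of the document-id space and that, within a bucket, term and category occur independently; the estimated in-category document frequency then yields an IDF relative to the category-restricted sub-collection. All function names and the +1 smoothing are hypothetical.

```python
import math
from typing import List


def estimate_df_in_category(term_hist: List[int], cat_hist: List[int],
                            bucket_sizes: List[int]) -> float:
    """Estimated number of documents in the category that contain the term.

    Assumes both statistics are histograms over the same buckets of the
    document-id space and that, inside a bucket, term and category occur
    independently: expected overlap in bucket b = term[b] * cat[b] / size[b].
    """
    return sum(t * c / n for t, c, n in zip(term_hist, cat_hist, bucket_sizes) if n)


def global_idf(term_hist: List[int], bucket_sizes: List[int]) -> float:
    """IDF over the whole collection (+1 smoothing is an arbitrary choice)."""
    return math.log((sum(bucket_sizes) + 1) / (sum(term_hist) + 1))


def category_idf(term_hist: List[int], cat_hist: List[int],
                 bucket_sizes: List[int]) -> float:
    """IDF recomputed for the category-restricted sub-collection."""
    df = estimate_df_in_category(term_hist, cat_hist, bucket_sizes)
    return math.log((sum(cat_hist) + 1) / (df + 1))


sizes = [100, 100]   # two buckets of 100 documents each
term = [50, 0]       # the term occurs only in the first bucket
category = [0, 40]   # the category occurs only in the second bucket
print(round(global_idf(term, sizes), 2), round(category_idf(term, category, sizes), 2))
```

Here the term is common collection-wide but, under these assumptions, absent from the category, so its IDF (and therefore its scoring weight) is higher inside the category than the global statistics would suggest.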

14. Term-statistics modification for category-based search
    Type: patent application; legal status: in force

    Publication number: US20060248074A1
    Publication date: 2006-11-02
    Application number: US11117749
    Filing date: 2005-04-28
    IPC classification: G06F17/30

    Abstract: A method for searching a document collection includes providing an index of terms indicating the documents in which the terms appear. A first statistical distribution of each of at least some of the terms in the index and a second statistical distribution of each of at least some of the categories are estimated over the documents in the collection. A query including one or more of the terms and a category restriction referring to at least one of the categories is accepted. A modified term distribution is produced by operating on the first estimated statistical distribution of at least one of the terms in the query using the second estimated statistical distribution of the at least one of the categories, responsively to the category restriction. The query is applied to the index so as to return a response, in which occurrences of the at least one of the terms are scored responsively to the modified term distribution.

15. Scoring of crowd-computing inputs
    Type: granted patent; legal status: in force

    Publication number: US08856021B2
    Publication date: 2014-10-07
    Application number: US13158425
    Filing date: 2011-06-12
    IPC classification: G06Q10/00 G06Q10/10 G06Q30/02

    Abstract: Method, system, and computer program product are provided for scoring of crowd-computing inputs. A group of data is provided to crowd-computing participants and the participants are requested to provide candidate members of the group of data. The computer-implemented method performed includes: receiving an input by a participant, wherein the input is a candidate member; counting multiple inputs of the same candidate member by participants; validating a candidate member; rewarding the participants who input the candidate member, with a higher reward for participants who input the candidate member earlier than other participants; and supplying the rewards to participants once the candidate member has been validated.
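The counting, validation, and earlier-is-better reward steps can be sketched as below. The abstract does not say how a candidate is validated or how rewards scale, so two rules here are assumptions: a candidate counts as validated once `min_votes` distinct participants have entered it, and the k-th participant to enter it earns `base_reward / k`. The function name and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def score_inputs(inputs: List[Tuple[str, str]], min_votes: int = 2,
                 base_reward: float = 10.0) -> Dict[str, float]:
    """inputs: (participant, candidate) pairs in arrival order.

    Hypothetical rules: a candidate is validated once `min_votes` distinct
    participants have entered it; the k-th participant (1-based) to enter a
    validated candidate earns base_reward / k, so earlier contributors earn more.
    """
    contributors: Dict[str, List[str]] = defaultdict(list)
    for participant, candidate in inputs:
        if participant not in contributors[candidate]:  # count each participant once
            contributors[candidate].append(participant)

    rewards: Dict[str, float] = defaultdict(float)
    for candidate, people in contributors.items():
        if len(people) < min_votes:  # not validated: no rewards are supplied
            continue
        for rank, participant in enumerate(people, start=1):
            rewards[participant] += base_reward / rank
    return dict(rewards)


log = [("ann", "Paris"), ("bob", "Paris"), ("cat", "Paris"), ("bob", "Lyon")]
print(score_inputs(log))
```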

16. Indexing and searching entity-relationship data
    Type: granted patent; legal status: in force

    Publication number: US08751505B2
    Publication date: 2014-06-10
    Application number: US13417248
    Filing date: 2012-03-11
    IPC classification: G06F7/00 G06F17/30
    CPC classification: G06F17/30604

    Abstract: Method, system, and computer program product for indexing and searching entity-relationship data are provided. The method includes: defining a logical document model for entity-relationship data including: representing an entity as a document containing the entity's searchable content and metadata; dually representing the entity as a document and as a category; and representing each relationship instance for the entity as a category set that contains categories of all participating entities in the relationship. The method also includes: translating entity-relationship data into the logical document model; and indexing the entity-relationship data of the populated logical document model as an inverted index. The method may include searching indexed entity-relationship data using a faceted search, wherein the categories are all categories required for supporting faceted navigation.
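A toy version of the logical document model follows: each entity is both a document and a category (`cat:<id>`), each relationship instance adds the categories of all participants to every participant, and terms and categories share one inverted index, so facet counts and drill-down reduce to posting-list operations. The `cat:` prefix, the whitespace tokenization, and all function names are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Index = Dict[str, Set[str]]


def build_index(entities: Dict[str, str],
                relationships: List[Tuple[str, ...]]) -> Tuple[Index, Index]:
    """Map each entity to a document tagged with its own category and, for every
    relationship it takes part in, with the categories of all participants.
    Returns (inverted index over terms and categories, categories per entity)."""
    categories: Index = {e: {f"cat:{e}"} for e in entities}
    for rel in relationships:
        rel_cats = {f"cat:{e}" for e in rel}   # category set for this instance
        for e in rel:
            categories[e] |= rel_cats

    index: Index = defaultdict(set)
    for e, text in entities.items():
        for token in text.lower().split():      # searchable content
            index[token].add(e)
        for c in categories[e]:                 # categories indexed like terms
            index[c].add(e)
    return index, categories


def facet_counts(hits: Set[str], categories: Index) -> Counter:
    """Category counts over a result set, as a faceted-navigation panel shows."""
    return Counter(c for e in hits for c in categories[e])


entities = {"p1": "Alice engineer", "p2": "Bob engineer", "org1": "Acme research lab"}
works_for = [("p1", "org1"), ("p2", "org1")]
index, cats = build_index(entities, works_for)
hits = index["engineer"]
print(sorted(hits), facet_counts(hits, cats).most_common(1))
# Drilling down on a facet is an intersection with that category's posting list.
print(sorted(hits & index["cat:org1"]))
```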

17. Method and system for using social bookmarks
    Type: granted patent; legal status: in force

    Publication number: US08266157B2
    Publication date: 2012-09-11
    Application number: US12550376
    Filing date: 2009-08-30
    IPC classification: G06F17/30
    CPC classification: G06F17/30

    Abstract: A method and system for using social bookmarks, wherein a social bookmark is a triplet of the entities user, document, and tag. The method includes: collecting multiple bookmarks; representing the bookmarks as a three-dimensional space or matrix of the number of times a user u used tag t to bookmark document d; measuring the similarity of two entities of the same type; and using the similarity to weight bookmarks or entities. The weightings may be used to provide a measure of the usefulness of a bookmark for describing a document for retrieval purposes. Two dimensions of the bookmark space may also be used to predict the third dimension.
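The three-dimensional bookmark space can be held as a sparse counter keyed by `(user, tag, document)`. In the sketch below, cosine similarity over each user's flattened (tag, document) slice is an assumed choice of similarity measure, and `predict_tags` illustrates predicting the third dimension (tags) from the other two by weighting other users' tags on a document by their similarity to the given user. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Bookmark = Tuple[str, str, str]  # (user, tag, document)


def build_tensor(bookmarks: Iterable[Bookmark]) -> Counter:
    """Sparse 3-D count tensor: (u, t, d) -> times user u used tag t on document d."""
    return Counter(bookmarks)


def user_vector(tensor: Counter, user: str) -> Dict[Tuple[str, str], int]:
    """One user's slice of the tensor, flattened over (tag, document)."""
    return {(t, d): n for (u, t, d), n in tensor.items() if u == user}


def cosine(a: Dict, b: Dict) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_tags(tensor: Counter, user: str, doc: str) -> Counter:
    """Predict the tag dimension for a (user, document) pair: tags other users
    put on the document, weighted by their similarity to this user."""
    me = user_vector(tensor, user)
    scores: Counter = Counter()
    for (u, t, d), n in tensor.items():
        if d == doc and u != user:
            scores[t] += cosine(me, user_vector(tensor, u)) * n
    return scores


bookmarks = [("ann", "python", "d1"), ("ann", "search", "d2"),
             ("bob", "python", "d1"), ("bob", "ir", "d3"),
             ("cat", "cooking", "d4"), ("cat", "food", "d3")]
tensor = build_tensor(bookmarks)
print(cosine(user_vector(tensor, "ann"), user_vector(tensor, "bob")))
print(predict_tags(tensor, "ann", "d3").most_common(1))
```

The same slicing works for tags or documents: flatten over the two remaining dimensions and compare entities of the same type.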

18. Method and system for maintaining profiles of information channels
    Type: granted patent; legal status: in force

    Publication number: US07970739B2
    Publication date: 2011-06-28
    Application number: US12111972
    Filing date: 2008-04-30
    IPC classification: G06F7/00
    CPC classification: G06F17/30867 H04L69/14

    Abstract: A method and system are provided for maintaining profiles of information channels available on the Web, wherein the information channels are accessed via pull-only protocols. The method includes monitoring one or more channels by a channel pull action at a monitoring rate, wherein the monitoring rate is determined for the one or more channels based on the number of update events in a previous time period. The method may optionally include filtering the update events in the time period by a novelty measure, wherein the filtering disregards events that do not include significant novel information. The monitoring rate is adapted based on reinforcement learning, applying iterative learning rules over time.
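The abstract names neither the novelty measure nor the learning rule, so this sketch substitutes simple, clearly assumed ones: an update is novel unless its word set has Jaccard similarity above a threshold with a previously seen update, and the rate moves toward the count of novel events by an exponential-moving-average step (a basic iterative update rule, not a full reinforcement-learning algorithm). `ChannelMonitor`, `is_novel`, and all parameters are hypothetical.

```python
from typing import List, Set


def is_novel(item: Set[str], seen: List[Set[str]], max_overlap: float = 0.8) -> bool:
    """Hypothetical novelty measure: an update is novel unless its word set has
    Jaccard similarity above `max_overlap` with some previously seen update."""
    for old in seen:
        union = item | old
        if union and len(item & old) / len(union) > max_overlap:
            return False
    return True


class ChannelMonitor:
    """Adapts how often a pull-only channel (e.g. a feed) is polled."""

    def __init__(self, rate: float = 1.0, alpha: float = 0.3,
                 min_rate: float = 0.1, max_rate: float = 24.0):
        self.rate = rate          # pulls per time period
        self.alpha = alpha        # step size of the iterative update rule
        self.min_rate, self.max_rate = min_rate, max_rate
        self.seen: List[Set[str]] = []  # unbounded history; fine for a sketch

    def end_period(self, updates: List[str]) -> float:
        """Filter the period's updates by novelty, then move the rate toward
        the number of novel events observed (an exponential moving average)."""
        novel = 0
        for text in updates:
            words = set(text.lower().split())
            if is_novel(words, self.seen):
                novel += 1
            self.seen.append(words)
        self.rate += self.alpha * (novel - self.rate)
        self.rate = min(self.max_rate, max(self.min_rate, self.rate))
        return self.rate


monitor = ChannelMonitor()
print(monitor.end_period(["new patent filed", "new patent filed", "court ruling today"]))
print(monitor.end_period([]))
```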

19. METHOD AND SYSTEM FOR IMPROVED QUERY EXPANSION IN FACETED SEARCH
    Type: patent application; legal status: under examination (published)

    Publication number: US20110125764A1
    Publication date: 2011-05-26
    Application number: US12626642
    Filing date: 2009-11-26
    IPC classification: G06F17/30
    CPC classification: G06F16/3338 G06F16/332

    Abstract: A method and system for improved query expansion in faceted search are provided. The method includes: receiving a search query; expanding the search query to obtain query expansion terms; and receiving a facet selection for the search query. A facet profile is retrieved in the form of collected important terms for the facet, and the query expansion terms are weighted by comparing them to the facet profile. The query expansion terms are re-ranked, and the method includes executing the re-weighted query expansion terms whilst filtering for the facet.
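The re-weighting, re-ranking, and facet-filtered execution steps can be sketched as follows. The comparison rule is an assumption: each expansion term's weight is multiplied by its profile importance (normalized to the profile's maximum) plus a small `smoothing` constant, so terms missing from the profile are demoted rather than dropped. Original query terms are given weight 1.0, also an assumption. Function names and data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reweight_expansion(expansion: Dict[str, float], facet_profile: Dict[str, float],
                       smoothing: float = 0.1, top_k: int = 3) -> List[Tuple[str, float]]:
    """Re-weight expansion terms by their importance in the selected facet's
    profile, then re-rank and keep the top_k terms."""
    peak = max(facet_profile.values(), default=0.0) or 1.0
    reweighted = {t: w * (smoothing + facet_profile.get(t, 0.0) / peak)
                  for t, w in expansion.items()}
    return sorted(reweighted.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def execute(query: List[str], expansion: List[Tuple[str, float]],
            corpus: Dict[str, Tuple[Set[str], str]], facet: str) -> List[str]:
    """Score only documents in the selected facet: original terms weigh 1.0,
    expansion terms their re-computed weight."""
    weights = {t: 1.0 for t in query}
    for t, w in expansion:
        weights.setdefault(t, w)
    scored = []
    for doc_id, (terms, doc_facet) in corpus.items():
        if doc_facet != facet:
            continue  # facet filter
        score = sum(w for t, w in weights.items() if t in terms)
        if score > 0:
            scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


expansion = {"car": 0.9, "cat": 0.6, "speed": 0.5, "rainforest": 0.4}
profile = {"cat": 5.0, "rainforest": 4.0, "speed": 1.0}  # facet "animals"
corpus = {"a": ({"jaguar", "cat", "rainforest"}, "animals"),
          "b": ({"jaguar", "car"}, "cars"),
          "c": ({"jaguar", "speed"}, "animals")}
top = reweight_expansion(expansion, profile)
print(top)
print(execute(["jaguar"], top, corpus, "animals"))
```

In this example the facet profile demotes "car", the strongest generic expansion for "jaguar", because it carries no weight in the "animals" facet.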

20. Method and System of Prioritising Operations On Network Objects
    Type: patent application; legal status: under examination (published)

    Publication number: US20100281035A1
    Publication date: 2010-11-04
    Application number: US12432808
    Filing date: 2009-04-30
    IPC classification: G06F17/30 G06N7/02 G06Q99/00
    CPC classification: G06Q50/01 G06F16/951

    Abstract: A method and system for prioritising operations on network objects are provided. The method includes gathering relationship data available through Web 2.0 on the relationships between network entities, wherein network entities are network users and network objects. The relationship data for a network entity is analysed and a first relative score is determined based on the relationship data. For a network object, a second relative score is determined, which is a dynamic score based on user interactions with the network object and formed using the first relative scores of the network entities interacting with the object. The method then prioritizes an operation on a network object using the second relative score.
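The two-score scheme can be illustrated with assumed formulas, since the abstract leaves them open: the first relative score is an entity's relationship count normalized by the largest count (a PageRank-style score would be another option), and the second, dynamic score sums the first scores of interacting entities, each discounted by the interaction's age with a hypothetical half-life. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def first_scores(relationships: List[Tuple[str, str]]) -> Dict[str, float]:
    """Relative score per network entity from relationship data. Assumption: the
    entity's number of relationships divided by the largest such number."""
    degree: Dict[str, int] = defaultdict(int)
    for a, b in relationships:
        degree[a] += 1
        degree[b] += 1
    top = max(degree.values(), default=1)
    return {e: n / top for e, n in degree.items()}


def second_scores(interactions: List[Tuple[str, str, float]], first: Dict[str, float],
                  now: float, half_life: float = 7.0) -> Dict[str, float]:
    """Dynamic score per network object: the first scores of the entities that
    interacted with it, each discounted by the interaction's age (assumed decay)."""
    score: Dict[str, float] = defaultdict(float)
    for entity, obj, when in interactions:
        score[obj] += first.get(entity, 0.0) * 0.5 ** ((now - when) / half_life)
    return dict(score)


def prioritise(objects: List[str], second: Dict[str, float]) -> List[str]:
    """Order objects for an operation (e.g. re-crawling or indexing), highest first."""
    return sorted(objects, key=lambda o: second.get(o, 0.0), reverse=True)


relationships = [("ann", "bob"), ("ann", "cat"), ("ann", "doc1"), ("bob", "doc2")]
interactions = [("ann", "doc1", 10.0), ("cat", "doc2", 10.0), ("bob", "doc2", 3.0)]
first = first_scores(relationships)
second = second_scores(interactions, first, now=10.0)
print(prioritise(["doc2", "doc1", "doc3"], second))  # ['doc1', 'doc2', 'doc3']
```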
