Refining a dictionary for information extraction
    1.
    Granted patent
    Refining a dictionary for information extraction (status: lapsed)

    Publication No.: US08775419B2

    Publication date: 2014-07-08

    Application No.: US13598946

    Filing date: 2012-08-30

    IPC classification: G06F17/30

    CPC classification: G06F17/2735

    Abstract: A method for refining a dictionary for information extraction, the operations including: inputting a set of extracted results from execution of an extractor comprising the dictionary on a collection of text, wherein the extracted results are labeled as correct results or incorrect results; processing the extracted results using an algorithm configured to set a score of the extractor above a score threshold, wherein the score threshold balances a precision and a recall of the extractor; and outputting a set of candidate dictionary entries corresponding to a full set of dictionary entries, wherein the candidate dictionary entries are candidates to be removed from the dictionary based on the extracted results.

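The refinement loop described in this abstract can be illustrated with a minimal sketch. It assumes the score is the F1 measure and that entries are chosen greedily; the abstract specifies neither, so `f1`, `removal_candidates`, and the sample data are hypothetical names and values, not the patented algorithm.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 score: the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def removal_candidates(labeled_results, threshold):
    """Greedily pick dictionary entries whose removal raises F1.

    labeled_results: iterable of (dictionary_entry, is_correct) pairs.
    Returns (candidate entries, final F1).
    """
    labeled_results = list(labeled_results)
    correct = Counter(e for e, ok in labeled_results if ok)
    incorrect = Counter(e for e, ok in labeled_results if not ok)
    tp, fp, fn = sum(correct.values()), sum(incorrect.values()), 0
    remaining = sorted(set(correct) | set(incorrect))
    candidates = []

    def score_without(entry):
        # Removing an entry drops its false positives but turns its
        # correct matches into misses (false negatives).
        return f1(tp - correct[entry], fp - incorrect[entry], fn + correct[entry])

    while remaining and f1(tp, fp, fn) < threshold:
        best = max(remaining, key=score_without)
        if score_without(best) <= f1(tp, fp, fn):
            break  # no single removal helps any further
        tp, fp, fn = tp - correct[best], fp - incorrect[best], fn + correct[best]
        candidates.append(best)
        remaining.remove(best)
    return candidates, f1(tp, fp, fn)


# Hypothetical labeled output of a location extractor.
SAMPLE = (
    [("Paris", True)] * 3 + [("Paris", False)]
    + [("Turkey", False)] * 4 + [("Turkey", True)]
    + [("Lima", True)] * 2
)
```

With the sample data and a threshold of 0.8, the ambiguous entry "Turkey" is the only candidate, because dropping it removes four incorrect matches at the cost of one correct match.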

    RULE-DRIVEN RUNTIME CUSTOMIZATION OF KEYWORD SEARCH ENGINES
    3.
    Patent application
    RULE-DRIVEN RUNTIME CUSTOMIZATION OF KEYWORD SEARCH ENGINES (status: under examination, published)

    Publication No.: US20130185304A1

    Publication date: 2013-07-18

    Application No.: US13351347

    Filing date: 2012-01-17

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: Described herein are methods, systems, apparatuses and products for rule-driven runtime customization of keyword search engines. An aspect provides a method for rule-driven customization of keyword searches, including: receiving by a computer an input keyword query; determining from the input keyword query and a dataset to be queried at least one rule selected from the group consisting of: a re-write rule; a category ranking rule, and a category grouping rule; and applying the at least one rule to generate search results based on domain knowledge of the dataset. Other embodiments are disclosed.

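As an illustration of the three rule kinds named in this abstract, the sketch below applies a re-write rule, a category ranking rule, and a category grouping rule to a toy keyword search. The rule tables, the `Doc` type, and the substring matcher are assumptions made for the example; the abstract does not describe how rules are represented or selected.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    category: str


# Hypothetical domain rules; a real system would derive them from the dataset.
REWRITE_RULES = {"dl": "download"}  # query term -> replacement
CATEGORY_RANKING = {"download": ["downloads", "docs", "forum"]}  # trigger -> order


def rewrite(query):
    """Re-write rule: replace each query term that has a rewrite."""
    return " ".join(REWRITE_RULES.get(t, t) for t in query.lower().split())


def match(query, docs):
    """Plain keyword search: keep documents whose title contains every term."""
    terms = query.split()
    return [d for d in docs if all(t in d.title.lower() for t in terms)]


def customized_search(query, docs):
    rewritten = rewrite(query)
    hits = match(rewritten, docs)
    # Category ranking rule: the first trigger term found sets the order.
    order = next(
        (CATEGORY_RANKING[t] for t in rewritten.split() if t in CATEGORY_RANKING),
        [],
    )
    # Category grouping rule: bucket the hits by category.
    groups = {}
    for d in hits:
        groups.setdefault(d.category, []).append(d.title)

    def rank(category):
        # Unranked categories sort after ranked ones, alphabetically.
        position = order.index(category) if category in order else len(order)
        return (position, category)

    return [(c, groups[c]) for c in sorted(groups, key=rank)]


DOCS = [
    Doc("Notes client download FAQ", "forum"),
    Doc("Download Notes client", "downloads"),
    Doc("Notes client manual", "docs"),
]
```

Calling `customized_search("notes dl", DOCS)` rewrites the query to "notes download" and returns the "downloads" group ahead of the "forum" group.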

    English-language translation of exact interpretations of keyword queries
    4.
    Granted patent
    English-language translation of exact interpretations of keyword queries (status: lapsed)

    Publication No.: US08000957B2

    Publication date: 2011-08-16

    Application No.: US12129082

    Filing date: 2008-05-29

    CPC classification: G06F17/30967

    Abstract: The present invention relates to a methodology to translate exact interpretations of keyword queries into meaningful and grammatically correct plain-language queries in order to convey the meaning of these interpretations to the initiator of the search. The method includes the steps of generating at least one grammatically valid plain-language sentence interpretation for a keyword query from generated plain-language sentence clauses, wherein the grammatically valid plain-language sentence is based upon differing matching elements, and presenting at least one grammatically valid plain-language sentence interpretation for the keyword query to a keyword query system user for the user's review.


    SYSTEM AND METHOD FOR STORING TEXT ANNOTATIONS WITH ASSOCIATED TYPE INFORMATION IN A STRUCTURED DATA STORE
    5.
    Patent application
    SYSTEM AND METHOD FOR STORING TEXT ANNOTATIONS WITH ASSOCIATED TYPE INFORMATION IN A STRUCTURED DATA STORE (status: in force)

    Publication No.: US20090049021A1

    Publication date: 2009-02-19

    Application No.: US12257110

    Filing date: 2008-10-23

    IPC classification: G06F7/06 G06F17/30

    Abstract: A text annotation structured storage system stores text annotations with associated type information in a structured data store. The present system persists or stores annotations in a structured data store in an indexable and queryable format. Exemplary structured data stores comprise XML databases and relational databases. The system exploits type information in a type system to develop corresponding schemas in a structured data model. The system comprises techniques for mapping annotations to an XML data model and a relational data model. The system captures various features of the type system, such as complex types and inheritance, in the schema for the persistent store. In particular, the repository provides support for path navigation over the hierarchical type system starting at any type.

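One way to picture the relational mapping mentioned in this abstract is to derive a table per annotation type, with inherited features copied into each subtype's table. This is only a sketch under that assumption: the type system, the column names, and the flattening strategy are hypothetical, and the abstract's XML mapping and path navigation are not covered.

```python
import sqlite3

# Hypothetical annotation type system with single inheritance.
TYPE_SYSTEM = {
    "Annotation": {"parent": None,
                   "features": {"begin_pos": "INTEGER", "end_pos": "INTEGER"}},
    "Person": {"parent": "Annotation", "features": {"name": "TEXT"}},
    "Author": {"parent": "Person", "features": {"affiliation": "TEXT"}},
}


def all_features(type_name, types):
    """Collect a type's own features plus everything it inherits."""
    spec = types[type_name]
    inherited = all_features(spec["parent"], types) if spec["parent"] else {}
    return {**inherited, **spec["features"]}


def ddl_for(type_name, types):
    """Build a CREATE TABLE statement for one annotation type."""
    cols = ", ".join(f"{n} {t}" for n, t in all_features(type_name, types).items())
    return f"CREATE TABLE {type_name} (id INTEGER PRIMARY KEY, doc_id TEXT, {cols})"


def build_store(types):
    """Create an in-memory SQLite store with one table per type."""
    conn = sqlite3.connect(":memory:")
    for name in types:
        conn.execute(ddl_for(name, types))
    return conn


# Persist one annotation, then query it by a typed feature.
conn = build_store(TYPE_SYSTEM)
conn.execute(
    "INSERT INTO Author (doc_id, begin_pos, end_pos, name, affiliation) "
    "VALUES (?, ?, ?, ?, ?)",
    ("d1", 0, 9, "Ada Byron", "Example Lab"),
)
row = conn.execute(
    "SELECT name, begin_pos FROM Author WHERE affiliation = ?", ("Example Lab",)
).fetchone()
```

Flattening inheritance keeps each query to a single table; a design with one table per level joined on `id` would avoid repeating columns at the cost of joins.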

    System and method for exploiting semantic annotations in executing keyword queries over a collection of text documents
    8.
    Granted patent
    System and method for exploiting semantic annotations in executing keyword queries over a collection of text documents (status: lapsed)

    Publication No.: US07548933B2

    Publication date: 2009-06-16

    Application No.: US11251382

    Filing date: 2005-10-14

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    Abstract: A query interpretation system exploits semantic annotations in keyword queries over a collection of text documents, casting semantic annotations produced by text analysis engines into a formal annotation type system. The system uses the annotation type system to enumerate various interpretations of a keyword query and automatically translate a keyword query into a set of interpretations expressed in some intermediate query language. The system returns a result list of documents by combining the results of executing one or more of these interpretations. Even though the system generates and uses a complex type system, a user is able to use simple keyword queries to locate documents.

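The enumeration step in this abstract can be sketched as follows, assuming each keyword is read either as literal text or as the name of an annotation type. The type table and the string form of the "intermediate query language" are invented for the example; the abstract does not define either.

```python
from itertools import product

# Hypothetical mapping from keywords to annotation types in the type system.
ANNOTATION_TYPES = {"person": "Person", "phone": "PhoneNumber"}


def keyword_options(keyword):
    """List the ways a single keyword can be interpreted."""
    options = [f'text contains "{keyword}"']
    if keyword in ANNOTATION_TYPES:
        options.append(f"exists {ANNOTATION_TYPES[keyword]} annotation")
    return options


def interpretations(query):
    """Enumerate every combination of per-keyword interpretations."""
    per_keyword = [keyword_options(k) for k in query.lower().split()]
    return [" AND ".join(combo) for combo in product(*per_keyword)]
```

For the query "alice phone", this yields two interpretations: one that looks for both words as text, and one that looks for the text "alice" in documents that carry a PhoneNumber annotation.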

    ENGLISH-LANGUAGE TRANSLATION OF EXACT INTERPRETATIONS OF KEYWORD QUERIES
    9.
    Patent application
    ENGLISH-LANGUAGE TRANSLATION OF EXACT INTERPRETATIONS OF KEYWORD QUERIES (status: under examination, published)

    Publication No.: US20080154853A1

    Publication date: 2008-06-26

    Application No.: US11615115

    Filing date: 2006-12-22

    IPC classification: G06F7/10 G06F17/30

    CPC classification: G06F16/9032

    Abstract: The present invention relates to a methodology to translate exact interpretations of keyword queries into meaningful and grammatically correct plain-language queries in order to convey the meaning of these interpretations to the initiator of the search. The method includes the steps of generating at least one grammatically valid plain-language sentence interpretation for a keyword query from generated plain-language sentence clauses, wherein the grammatically valid plain-language sentence is based upon differing matching elements, and presenting at least one grammatically valid plain-language sentence interpretation for the keyword query to a keyword query system user for the user's review.


    System and method for retrieving documents or sub-documents based on examples
    10.
    Patent application
    System and method for retrieving documents or sub-documents based on examples (status: under examination, published)

    Publication No.: US20050114313A1

    Publication date: 2005-05-26

    Application No.: US10723112

    Filing date: 2003-11-26

    IPC classification: G06F17/30

    CPC classification: G06F16/3347

    Abstract: Disclosed are a system, method, and program storage device implementing the method of extracting information, wherein the method comprises inputting a query; searching a database of documents based on the query; retrieving documents from the database matching the query using a plurality of classifiers arranged in a hierarchical cascade of classifier layers, wherein each classifier comprises a set of weighted training data points comprising feature vectors representing any portion of a document; and weighing an output from the cascade according to a rate of success of query terms being matched by each layer of the cascade, wherein the weighing is performed using a terminal classifier.

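A rough sketch of a classifier cascade in the spirit of this abstract is given below. It assumes each layer scores a document by a weighted sum of dot products against its training points, rejects documents below a per-layer threshold, and that the final weighing is a success-rate-weighted average. That average is a stand-in for the terminal classifier, which the abstract does not describe; `Layer`, `cascade_score`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    points: list          # (weight, feature_vector) training data points
    threshold: float      # minimum score needed to pass this layer
    success_rate: float   # assumed rate at which this layer matches query terms

    def score(self, features):
        """Weighted sum of dot products with the training points."""
        return sum(
            w * sum(a * b for a, b in zip(vec, features))
            for w, vec in self.points
        )


def cascade_score(layers, features):
    """Pass features through each layer; None means the document is rejected."""
    total = 0.0
    norm = 0.0
    for layer in layers:
        s = layer.score(features)
        if s < layer.threshold:
            return None
        total += layer.success_rate * s
        norm += layer.success_rate
    return total / norm if norm else None


LAYERS = [
    Layer(points=[(1.0, [1, 0]), (0.5, [0, 1])], threshold=0.5, success_rate=0.8),
    Layer(points=[(2.0, [1, 1])], threshold=1.0, success_rate=0.2),
]
```

A document with features `[1, 1]` passes both layers, while `[0, 0.5]` scores 0.25 in the first layer and is rejected before the second layer is evaluated.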