Systems and methods for information extraction using contextual pattern discovery
    1.
    Granted patent
    Systems and methods for information extraction using contextual pattern discovery (status: in force)

    Publication No.: US08630989B2

    Publication date: 2014-01-14

    Application No.: US13117570

    Filing date: 2011-05-27

    IPC classification: G06F17/30

    CPC classification: G06F17/278

    Abstract: Described herein are methods, systems, apparatuses and products for automatically discovering patterns in a text corpus. An aspect provides extracting at least one context string related to at least one annotator from the at least one text corpus; analyzing the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence; determining at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and grouping the at least one sequence signature into at least one group.

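The four steps named in this abstract (extract context strings, split them into sequences, derive a signature per sequence, group by signature) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the annotator is a phone-number regex, a "sequence" is the three whitespace tokens preceding a match, each token stands in for a subsequence, and the signature rules (`<NUM>`, `<CAP>`, lowercase literal) are hypothetical.

```python
import re
from collections import defaultdict

def extract_contexts(corpus, annotator, window=3):
    """Collect the `window` tokens that precede each annotator match."""
    contexts = []
    for text in corpus:
        for match in annotator.finditer(text):
            left_tokens = text[:match.start()].split()
            contexts.append(left_tokens[-window:])
    return contexts

def signature(sequence):
    """Apply hypothetical rules that map each token to a coarse class."""
    sig = []
    for token in sequence:
        if token.isdigit():
            sig.append("<NUM>")
        elif token[:1].isupper():
            sig.append("<CAP>")
        else:
            sig.append(token.lower())
    return tuple(sig)

def group_by_signature(contexts):
    """Group token sequences that share the same signature."""
    groups = defaultdict(list)
    for sequence in contexts:
        groups[signature(sequence)].append(sequence)
    return dict(groups)

corpus = [
    "Please call Alice at 555-1234 today",
    "You can call Bob at 555-9876 anytime",
    "fax us on 555-0000",
]
phone = re.compile(r"\d{3}-\d{4}")  # stand-in annotator (assumption)
groups = group_by_signature(extract_contexts(corpus, phone))
```

In this toy corpus the contexts "call Alice at" and "call Bob at" collapse into the same signature, which is the kind of recurring contextual pattern the abstract refers to.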

    SYSTEMS AND METHODS FOR INFORMATION EXTRACTION USING CONTEXTUAL PATTERN DISCOVERY
    2.
    Patent application
    SYSTEMS AND METHODS FOR INFORMATION EXTRACTION USING CONTEXTUAL PATTERN DISCOVERY (status: in force)

    Publication No.: US20120303661A1

    Publication date: 2012-11-29

    Application No.: US13117570

    Filing date: 2011-05-27

    IPC classification: G06F17/30

    CPC classification: G06F17/278

    Abstract: Described herein are methods, systems, apparatuses and products for automatically discovering patterns in a text corpus. An aspect provides extracting at least one context string related to at least one annotator from the at least one text corpus; analyzing the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence; determining at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and grouping the at least one sequence signature into at least one group.


    RULE-DRIVEN RUNTIME CUSTOMIZATION OF KEYWORD SEARCH ENGINES
    4.
    Patent application
    RULE-DRIVEN RUNTIME CUSTOMIZATION OF KEYWORD SEARCH ENGINES (status: under examination, published)

    Publication No.: US20130185304A1

    Publication date: 2013-07-18

    Application No.: US13351347

    Filing date: 2012-01-17

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: Described herein are methods, systems, apparatuses and products for rule-driven runtime customization of keyword search engines. An aspect provides a method for rule-driven customization of keyword searches, including: receiving by a computer an input keyword query; determining from the input keyword query and a dataset to be queried at least one rule selected from the group consisting of: a re-write rule; a category ranking rule; and a category grouping rule; and applying the at least one rule to generate search results based on domain knowledge of the dataset. Other embodiments are disclosed.

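As a rough illustration of the three rule types named in this abstract, the Python sketch below applies a re-write rule, a category ranking rule, and a category grouping rule to a toy dataset. The rule tables, the dataset, and the substring matching are all hypothetical stand-ins chosen for brevity; the abstract does not specify how rules are represented or selected.

```python
# Hypothetical domain rules for an intranet search (assumptions, not from the patent).
REWRITE_RULES = {"hr": "human resources"}                 # re-write rule
CATEGORY_RANK = {"policy": 0, "person": 1, "other": 2}    # category ranking rule

DATASET = [
    {"title": "Jane Doe, human resources manager", "category": "person"},
    {"title": "Human resources vacation policy", "category": "policy"},
    {"title": "Cafeteria menu", "category": "other"},
]

def rewrite(query):
    """Expand query tokens that have a re-write rule."""
    return " ".join(REWRITE_RULES.get(tok, tok) for tok in query.lower().split())

def search(query):
    """Rewrite the query, rank hits by category, then group them by category."""
    rewritten = rewrite(query)
    hits = [doc for doc in DATASET if rewritten in doc["title"].lower()]
    # Unknown categories sort after all ranked categories.
    hits.sort(key=lambda doc: CATEGORY_RANK.get(doc["category"], len(CATEGORY_RANK)))
    grouped = {}  # category grouping rule: one bucket per category, in rank order
    for doc in hits:
        grouped.setdefault(doc["category"], []).append(doc["title"])
    return grouped

results = search("HR")
```

Here the query "HR" is rewritten to "human resources" before matching, and the policy document is listed ahead of the person entry even though it appears later in the dataset.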

    RULE-DRIVEN RUNTIME CUSTOMIZATION OF KEYWORD SEARCH ENGINES

    Publication No.: US20130185330A1

    Publication date: 2013-07-18

    Application No.: US13595826

    Filing date: 2012-08-27

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: Described herein are methods, systems, apparatuses and products for rule-driven runtime customization of keyword search engines. An aspect provides a method for rule-driven customization of keyword searches, including: receiving by a computer an input keyword query; determining from the input keyword query and a dataset to be queried at least one rule selected from the group consisting of: a re-write rule; a category ranking rule; and a category grouping rule; and applying the at least one rule to generate search results based on domain knowledge of the dataset. Other embodiments are disclosed.

    SEARCH QUALITY VIA QUERY PROVENANCE VISUALIZATION
    8.
    Patent application
    SEARCH QUALITY VIA QUERY PROVENANCE VISUALIZATION (status: in force)

    Publication No.: US20130325831A1

    Publication date: 2013-12-05

    Application No.: US13485541

    Filing date: 2012-05-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30398

    Abstract: Methods and arrangements for enhancing search quality. Query search results are displayed, and search query provenance related to the search results is graphically depicted. An investigative function is graphically provided to enable investigation of at least one aspect of the search query provenance.


    User-guided regular expression learning
    9.
    Granted patent
    User-guided regular expression learning (status: in force)

    Publication No.: US08805877B2

    Publication date: 2014-08-12

    Application No.: US12369216

    Filing date: 2009-02-11

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30985 G06F17/30648

    Abstract: A method, device, and computer program product are provided for regular expression learning. An initial regular expression may be received from a user. The initial regular expression is executed over a database. Positive matches and negative matches are labeled. The initial regular expression and the labeled positive and negative matches are input in a transformation process. The transformation process may iteratively execute character class restrictions, quantifier restrictions, and negative lookaheads on the initial regular expression to transform the initial regular expression into a pool of candidate regular expressions. The transformation process may execute, one at a time, the character class restrictions, the quantifier restrictions, and the negative lookaheads. A candidate regular expression is selected from the pool of candidate regular expressions, where the selected candidate regular expression has a best F-Measure out of the pool of candidate regular expressions.

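The final selection step in this abstract, picking the candidate with the best F-Measure over labeled matches, can be sketched in Python as follows. This is only an illustration under stated assumptions: the candidate pool is written by hand (one example each of a character class restriction, a quantifier restriction, and a negative lookahead, plus their combination) rather than generated by the transformation process, and the labeled strings are invented.

```python
import re

def f_measure(pattern, positives, negatives):
    """Balanced F-measure (F1) of `pattern` against labeled example strings."""
    regex = re.compile(pattern)
    tp = sum(1 for s in positives if regex.fullmatch(s))
    fp = sum(1 for s in negatives if regex.fullmatch(s))
    fn = len(positives) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

initial = r"\w+-\d+"  # hypothetical user-supplied expression

# Hand-written candidate pool (assumption: the patent derives these automatically).
candidates = [
    initial,
    r"[A-Z]+-\d+",           # character class restriction: \w -> [A-Z]
    r"\w+-\d{4}",            # quantifier restriction: + -> {4}
    r"(?!TMP)\w+-\d+",       # negative lookahead excluding a known bad prefix
    r"(?!TMP)[A-Z]+-\d{4}",  # all three restrictions combined
]

# Invented labels: matches of `initial` marked correct or incorrect by a user.
positives = ["ABC-1234", "XY-0042"]
negatives = ["abc-12", "TMP-9999", "x-1"]

best = max(candidates, key=lambda p: f_measure(p, positives, negatives))
```

On these labels the initial expression matches every negative example, while the combined candidate matches both positives and none of the negatives, so it is the one selected.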

    METHOD TO SEARCH TRANSACTIONAL WEB PAGES
    10.
    Patent application
    METHOD TO SEARCH TRANSACTIONAL WEB PAGES (status: under examination, published)

    Publication No.: US20080033953A1

    Publication date: 2008-02-07

    Application No.: US11462806

    Filing date: 2006-08-07

    IPC classification: G06F17/30

    CPC classification: G06F16/951 G06F16/353

    Abstract: A method of performing transactional web page searches is disclosed. The method includes examining a plurality of web pages, identifying transactional features within a set of the plurality of web pages, and classifying the set of web pages as transactional. The method proceeds with annotating and indexing the transactional web pages, and, in response to a user-designated transactional query, providing only the set of web pages that have been classified as transactional. Identifying the transactional features comprises checking for the existence of positive patterns and verifying the absence of negative patterns with respect to a set of contents within each of the plurality of web pages, and further comprises identifying transactional actions to be performed and the transactional objects of those actions. Annotating and indexing the transactional features comprises annotating and indexing transactional actions and transactional objects.

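The classification step described in this abstract (positive patterns must be present, negative patterns must be absent, and the action and its object are annotated) could look roughly like the Python sketch below. The two regular expressions, the sample pages, and the `transactional_search` helper are hypothetical and exist only to make the idea concrete; the abstract does not disclose the actual patterns or index structure.

```python
import re

# Hypothetical patterns (assumptions): a positive pattern captures an action verb
# and its object; a negative pattern flags informational rather than transactional text.
POSITIVE = re.compile(
    r"\b(download|buy|order)\s+(?:the\s+|a\s+)?(\w+)", re.IGNORECASE
)
NEGATIVE = re.compile(r"\b(how to|review of|history of)\b", re.IGNORECASE)

def annotate(page_text):
    """Return an action/object annotation, or None if the page is not transactional."""
    if NEGATIVE.search(page_text):
        return None
    match = POSITIVE.search(page_text)
    if match is None:
        return None
    return {"action": match.group(1).lower(), "object": match.group(2).lower()}

pages = {
    "p1": "Download the installer for Windows here.",
    "p2": "A review of where to buy tickets online.",
    "p3": "Company mission statement and contact details.",
}

# Index only the pages classified as transactional, keyed by page id.
index = {}
for page_id, text in pages.items():
    annotation = annotate(text)
    if annotation is not None:
        index[page_id] = annotation

def transactional_search(action):
    """Return ids of indexed pages whose annotated action equals `action`."""
    return [pid for pid, ann in index.items() if ann["action"] == action]
```

Page `p2` contains the verb "buy" but is rejected because the negative pattern "review of" is present, which shows why the abstract checks for both kinds of pattern.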