COLLECTING, ORGANIZING, AND SEARCHING KNOWLEDGE ABOUT A DATASET
    1.
    Patent application
    COLLECTING, ORGANIZING, AND SEARCHING KNOWLEDGE ABOUT A DATASET (under examination, published)

    Publication number: US20160132572A1

    Publication date: 2016-05-12

    Application number: US14538393

    Filing date: 2014-11-11

    Abstract: Techniques for organizing knowledge about a dataset storing data from or about multiple sources may be provided. For example, the data can be accessed from the multiple sources and categorized based on the data type. For each data type, a triple extraction technique specific to that data type may be invoked. One set of techniques can allow the extraction of triples from the data based on natural language-based rules. Another set of techniques can allow a similar extraction based on logical or structural-based rules. A triple may store a relationship between elements of the data. The extracted triples can be stored with corresponding identifiers in a list. Further, dictionaries storing associations between elements of the data and the triples can be updated. The list and the dictionaries can be used to return triples in response to a query that specifies one or more elements.
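    The list-plus-dictionaries lookup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `TripleStore`, the use of a list position as the triple identifier, and the sample triples are all assumptions made for the example.

```python
from collections import defaultdict


class TripleStore:
    """Toy index: a list of triples plus a dictionary from elements to triple ids."""

    def __init__(self):
        self.triples = []                   # list position doubles as the identifier (assumption)
        self.by_element = defaultdict(set)  # element -> identifiers of triples containing it

    def add(self, subject, predicate, obj):
        """Store one triple and update the element dictionary."""
        triple = (subject, predicate, obj)
        triple_id = len(self.triples)
        self.triples.append(triple)
        for element in triple:
            self.by_element[element].add(triple_id)
        return triple_id

    def query(self, *elements):
        """Return the triples that contain every queried element."""
        if not elements:
            return []
        ids = set.intersection(*(self.by_element.get(e, set()) for e in elements))
        return [self.triples[i] for i in sorted(ids)]


# Hypothetical triples describing a dataset's structure.
store = TripleStore()
store.add("orders", "has_column", "customer_id")
store.add("customers", "has_column", "customer_id")
store.add("orders", "owned_by", "sales_team")
result = store.query("orders", "customer_id")
```

    Here `result` holds only the triples that mention both queried elements, which mirrors the idea of answering a query that specifies one or more elements.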

    INPUT/OUTPUT INTERFACE FOR CONTEXTUAL ANALYSIS ENGINE
    2.
    Patent application
    INPUT/OUTPUT INTERFACE FOR CONTEXTUAL ANALYSIS ENGINE (under examination, published)

    Publication number: US20150106156A1

    Publication date: 2015-04-16

    Application number: US14054291

    Filing date: 2013-10-15

    CPC classification number: G06Q30/0201

    Abstract: A contextual analysis engine systematically extracts, analyzes and organizes digital content stored in an electronic file such as a webpage. Content can be extracted using a text extraction module which is capable of separating the content which is to be analyzed from less meaningful content such as format specifications and programming scripts. The resulting unstructured corpus of plain text can then be passed to a text analytics module capable of generating a structured categorization of topics included within the content. This structured categorization can be organized based on a content topic ontology which may have been previously defined or which may be developed in real-time. The systems disclosed herein optionally include an input/output interface capable of managing workflows of the text extraction module and the text analytics module, administering a cache of previously generated results, and interfacing with other applications that leverage the disclosed contextual analysis services.
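    The text extraction step, separating the content to be analyzed from scripts and format specifications, might look something like the sketch below. It uses Python's standard `html.parser` module; the class name `VisibleTextExtractor`, the choice to skip only `script` and `style` elements, and the sample page are assumptions for illustration and are not taken from the application.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes while ignoring script and style content."""

    SKIPPED_TAGS = {"script", "style"}  # assumed set of "less meaningful" elements

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the plain text of an HTML string, without scripts or styles."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<html><head><style>p {color: red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Trail shoes</h1><p>Lightweight and <b>waterproof</b></p></body></html>"
)
plain = extract_text(page)
```

    The resulting `plain` string is the kind of unstructured plain-text corpus that a downstream text analytics module could then categorize.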

    CONTEXTUAL ANALYSIS ENGINE
    3.
    Patent application
    CONTEXTUAL ANALYSIS ENGINE (in force)

    Publication number: US20150106078A1

    Publication date: 2015-04-16

    Application number: US14054351

    Filing date: 2013-10-15

    Inventor: Walter Chang

    CPC classification number: G06F17/30705

    Abstract: A contextual analysis engine systematically extracts, analyzes and organizes digital content stored in an electronic file such as a webpage. Content can be extracted using a text extraction module which is capable of separating the content which is to be analyzed from less meaningful content such as format specifications and programming scripts. The resulting unstructured corpus of plain text can then be passed to a text analytics module capable of generating a structured categorization of topics included within the content. This structured categorization can be organized based on a content topic ontology which may have been previously defined or which may be developed in real-time. The systems disclosed herein optionally include an input/output interface capable of managing workflows of the text extraction module and the text analytics module, administering a cache of previously generated results, and interfacing with other applications that leverage the disclosed contextual analysis services.

    NATURAL LANGUAGE CONSUMER SEGMENTATION
    4.
    Patent application
    NATURAL LANGUAGE CONSUMER SEGMENTATION (under examination, published)

    Publication number: US20160103822A1

    Publication date: 2016-04-14

    Application number: US14513410

    Filing date: 2014-10-14

    CPC classification number: G06F17/3043 G06F17/277 G06F17/2795 G06Q30/0204

    Abstract: Techniques are disclosed for using natural language processing techniques to define, manipulate, and interact with consumer segmentations. In such embodiments a content consumption analytics engine can be configured to receive and process a natural language segmentation query. The query may comprise, for example, a command that defines a new segmentation, a command that manipulates existing segmentations, or a command that solicits information relating to existing consumer segmentations. The query is parsed to identify individual grammatical tokens which are then correlated with specific segment token types through the use of a token repository. A custom thesaurus is used to identify synonymous terms for grammatical tokens which may not exist in the token repository. User feedback enables the custom thesaurus to learn additional synonyms for future use. Once the grammatical tokens are mapped onto the identified segment token types, a formal segment definition can be constructed based on a segment definition structure.
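    As a rough illustration of the token mapping described above, the sketch below parses a query into tokens, resolves unknown tokens through a thesaurus, and fills a simple segment definition. The repository contents, the thesaurus entries, the four-slot definition structure, and the function names are all hypothetical; the application does not specify them.

```python
# Hypothetical token repository: grammatical token -> segment token type.
TOKEN_REPOSITORY = {
    "visitors": "container",
    "country": "dimension",
    "browser": "dimension",
    "is": "operator",
}

# Hypothetical custom thesaurus: unknown token -> synonym known to the repository.
THESAURUS = {"users": "visitors", "nation": "country", "equals": "is"}


def learn_synonym(token, known_token):
    """Record user feedback so the thesaurus resolves this token in future queries."""
    THESAURUS[token.lower()] = known_token


def parse_segment_query(query):
    """Map the tokens of a natural language query onto an assumed definition structure."""
    definition = {"container": None, "dimension": None, "operator": None, "value": None}
    for raw in query.split():
        token = raw.lower()
        token = THESAURUS.get(token, token)      # fall back to a synonym if one is known
        token_type = TOKEN_REPOSITORY.get(token)
        if token_type is not None:
            definition[token_type] = token
        elif definition["operator"] is not None and definition["value"] is None:
            definition["value"] = raw            # first unknown token after the operator
    return definition


segment = parse_segment_query("Users whose nation equals Canada")
```

    In this toy run, `segment` ends up with the container `visitors`, the dimension `country`, the operator `is`, and the value `Canada`. Calling `learn_synonym("people", "visitors")` would let later queries use "people" as well, which is one simple way to model the feedback loop the abstract mentions.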

    TEXT EXTRACTION MODULE FOR CONTEXTUAL ANALYSIS ENGINE
    5.
    Patent application
    TEXT EXTRACTION MODULE FOR CONTEXTUAL ANALYSIS ENGINE (under examination, published)

    Publication number: US20150106157A1

    Publication date: 2015-04-16

    Application number: US14054318

    Filing date: 2013-10-15

    CPC classification number: G06Q30/0201 G06F17/27

    Abstract: A contextual analysis engine systematically extracts, analyzes and organizes digital content stored in an electronic file such as a webpage. Content can be extracted using a text extraction module which is capable of separating the content which is to be analyzed from less meaningful content such as format specifications and programming scripts. The resulting unstructured corpus of plain text can then be passed to a text analytics module capable of generating a structured categorization of topics included within the content. This structured categorization can be organized based on a content topic ontology which may have been previously defined or which may be developed in real-time. The systems disclosed herein optionally include an input/output interface capable of managing workflows of the text extraction module and the text analytics module, administering a cache of previously generated results, and interfacing with other applications that leverage the disclosed contextual analysis services.

    IDENTIFICATION OF READING ORDER TEXT SEGMENTS WITH A PROBABILISTIC LANGUAGE MODEL
    6.
    Patent application

    Publication number: US20180267956A1

    Publication date: 2018-09-20

    Application number: US15462684

    Filing date: 2017-03-17

    Abstract: A computer implemented method and system identifies correct structured reading-order sequence of text segments that are extracted from a file structured in a portable document format. A probabilistic language model is generated from a large text corpus to comprise observed word sequence patterns for a given language. The language model measures whether splicing together a first text segment with another continuation text segment results in a phrase that is more likely than a phrase resulting from splicing together the first text segment with other continuation text segments. Sets of text segments are provided to the probabilistic model, where the sets of text segments comprise a first set including the first text segment and a first continuation text segment. A second set includes the first text segment and a second continuation text segment. A score is obtained for each set of text segments. The score is indicative of a likelihood of the set providing a correct structured reading-order sequence. The probabilistic language model may be generated in accordance with a Recurrent Neural Network or an n-gram model.
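    The scoring idea can be sketched with a small bigram model, one of the two model families the abstract names. Everything specific here is an assumption for illustration: the three-sentence corpus, the add-one smoothing, the averaging of log-probabilities, and the function names are not taken from the application.

```python
import math
from collections import Counter


def train_bigram_model(corpus_sentences):
    """Count unigrams and bigrams observed in a (toy) text corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def splice_score(model, first_segment, continuation):
    """Average add-one-smoothed log-probability of the bigrams in the spliced phrase."""
    unigrams, bigrams = model
    vocab_size = len(unigrams) or 1
    words = (first_segment + " " + continuation).lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return float("-inf")
    total = sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in pairs
    )
    return total / len(pairs)


def best_continuation(model, first_segment, candidates):
    """Pick the continuation whose spliced phrase scores highest."""
    return max(candidates, key=lambda c: splice_score(model, first_segment, c))


corpus = [
    "the quarterly report shows strong revenue growth",
    "revenue growth slowed in the second quarter",
    "the table lists revenue by region",
]
model = train_bigram_model(corpus)
choice = best_continuation(
    model,
    "the quarterly report shows strong",
    ["revenue growth", "by region"],
)
```

    With this corpus, splicing the first segment with "revenue growth" produces only bigrams the model has seen, so it outscores "by region" and is chosen as the more likely reading-order continuation.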

    GENERATING A QUERY STATEMENT BASED ON UNSTRUCTURED INPUT
    7.
    Patent application
    GENERATING A QUERY STATEMENT BASED ON UNSTRUCTURED INPUT (in force)

    Publication number: US20160140123A1

    Publication date: 2016-05-19

    Application number: US14540602

    Filing date: 2014-11-13

    CPC classification number: G06F17/3043

    Abstract: Techniques for generating a query statement to query a dataset may be provided. For example, the query statement can be generated from natural language input, such as a natural language utterance. To do so, the input can be analyzed to detect a sentence, identify words in the sentence, and tag the words with the corresponding word types (e.g., nouns, verbs, adjectives, etc.). Expressions using the tags can be generated. Data about the expressions can be inputted to a classifier. Based on a detected pattern associated with the expressions, the classifier can predict a structure of the query statement, such as what expressions correspond to what clauses of the query statement. Based on this prediction, words associated with the expressions can be added to the clauses to generate the query statement and accordingly query the dataset.
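    A heavily simplified sketch of this pipeline follows. A small lookup table stands in for the word-type tagger, and a few hand-written rules stand in for the trained classifier that predicts which words belong to which clause; both substitutions, along with the table name `dataset`, the filter column `city`, and the SQL-like output format, are assumptions made for the example.

```python
# Hypothetical lexicon standing in for a part-of-speech tagger.
LEXICON = {
    "show": "VERB", "list": "VERB",
    "customers": "NOUN", "orders": "NOUN", "city": "NOUN",
    "in": "PREP", "from": "PREP",
}


def tag_words(sentence):
    """Tag each word with a word type; unknown capitalized words count as proper nouns."""
    words = sentence.rstrip(".?!").split()
    return [
        (w, LEXICON.get(w.lower(), "PROPN" if w[0].isupper() else "OTHER"))
        for w in words
    ]


def predict_clauses(tagged):
    """Rule-based stand-in for the classifier: assigns tagged words to query clauses."""
    clauses = {"select": [], "where": []}
    after_prep = False
    for word, tag in tagged:
        if tag == "PREP":
            after_prep = True
        elif tag == "NOUN" and not after_prep:
            clauses["select"].append(word.lower())
        elif tag in ("NOUN", "PROPN") and after_prep:
            clauses["where"].append(word)
    return clauses


def build_query(sentence, table="dataset", filter_column="city"):
    """Build a parameterized SQL-like statement from natural language input."""
    clauses = predict_clauses(tag_words(sentence))
    columns = ", ".join(clauses["select"]) or "*"
    query = f"SELECT {columns} FROM {table}"
    params = tuple(clauses["where"])
    if params:
        conditions = " OR ".join(f"{filter_column} = ?" for _ in params)
        query += f" WHERE {conditions}"
    return query, params


statement, params = build_query("Show customers in Boston")
```

    The filter values are returned separately as parameters rather than spliced into the statement text, a common precaution when a query is built from free-form user input.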

    Generating a query statement based on unstructured input
    8.
    Granted patent

    Publication number: US10025819B2

    Publication date: 2018-07-17

    Application number: US14540602

    Filing date: 2014-11-13

    Abstract: Techniques for generating a query statement to query a dataset may be provided. For example, the query statement can be generated from natural language input, such as a natural language utterance. To do so, the input can be analyzed to detect a sentence, identify words in the sentence, and tag the words with the corresponding word types (e.g., nouns, verbs, adjectives, etc.). Expressions using the tags can be generated. Data about the expressions can be inputted to a classifier. Based on a detected pattern associated with the expressions, the classifier can predict a structure of the query statement, such as what expressions correspond to what clauses of the query statement. Based on this prediction, words associated with the expressions can be added to the clauses to generate the query statement and accordingly query the dataset.

    Contextual analysis engine
    9.
    Granted patent

    Publication number: US09990422B2

    Publication date: 2018-06-05

    Application number: US14054351

    Filing date: 2013-10-15

    Inventor: Walter Chang

    CPC classification number: G06F17/30705

    Abstract: A contextual analysis engine systematically extracts, analyzes and organizes digital content stored in an electronic file such as a webpage. Content can be extracted using a text extraction module which is capable of separating the content which is to be analyzed from less meaningful content such as format specifications and programming scripts. The resulting unstructured corpus of plain text can then be passed to a text analytics module capable of generating a structured categorization of topics included within the content. This structured categorization can be organized based on a content topic ontology which may have been previously defined or which may be developed in real-time. The systems disclosed herein optionally include an input/output interface capable of managing workflows of the text extraction module and the text analytics module, administering a cache of previously generated results, and interfacing with other applications that leverage the disclosed contextual analysis services.

    Natural language consumer segmentation
    10.
    Granted patent

    Publication number: US10102246B2

    Publication date: 2018-10-16

    Application number: US14513410

    Filing date: 2014-10-14

    Abstract: Techniques are disclosed for using natural language processing techniques to define, manipulate, and interact with consumer segmentations. In such embodiments a content consumption analytics engine can be configured to receive and process a natural language segmentation query. The query may comprise, for example, a command that defines a new segmentation, a command that manipulates existing segmentations, or a command that solicits information relating to existing consumer segmentations. The query is parsed to identify individual grammatical tokens which are then correlated with specific segment token types through the use of a token repository. A custom thesaurus is used to identify synonymous terms for grammatical tokens which may not exist in the token repository. User feedback enables the custom thesaurus to learn additional synonyms for future use. Once the grammatical tokens are mapped onto the identified segment token types, a formal segment definition can be constructed based on a segment definition structure.
