System and Method for Automatically Detecting and Interactively Displaying Information About Entities, Activities, and Events from Multiple-Modality Natural Language Sources
    5.
    Patent application
    Status: Under examination (published)

    Publication number: US20130332450A1

    Publication date: 2013-12-12

    Application number: US13493659

    Filing date: 2012-06-11

    IPC classification: G06F17/30 G06F17/28

    Abstract: A method for automatically extracting and organizing information by a processing device from a plurality of data sources is provided. A natural language processing information extraction pipeline that includes an automatic detection of entities is applied to the data sources. Information about detected entities is identified by analyzing products of the natural language processing pipeline. Identified information is grouped into equivalence classes containing equivalent information. At least one displayable representation of the equivalence classes is created. An order in which the at least one displayable representation is displayed is computed. A combined representation of the equivalence classes that respects the order in which the displayable representation is displayed is produced.
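    The grouping, ordering, and combining steps in this abstract can be illustrated with a short sketch. It is not the patented method: the `Fact` record, the string-normalization equivalence test, and the ordering rule (classes backed by more distinct sources first) are hypothetical choices made for illustration, since the abstract specifies none of them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One extracted piece of information about an entity (hypothetical schema)."""
    entity: str
    relation: str
    value: str
    source: str  # e.g. a document, transcript, or caption ID


def equivalence_key(fact):
    """Assumed equivalence test: entity, relation and value match after
    case-folding and collapsing whitespace."""
    return tuple(" ".join(s.casefold().split()) for s in (fact.entity, fact.relation, fact.value))


def group_into_classes(facts):
    """Partition the facts into equivalence classes of equivalent information."""
    classes = defaultdict(list)
    for fact in facts:
        classes[equivalence_key(fact)].append(fact)
    return list(classes.values())


def render(cls):
    """Displayable representation of a class: its first surface form plus its sources."""
    first = cls[0]
    sources = ", ".join(sorted({f.source for f in cls}))
    return f"{first.entity} {first.relation} {first.value} [{sources}]"


def combined_representation(facts):
    """Order the classes (assumed rule: more distinct sources first, then
    alphabetically) and join their representations in that order."""
    classes = group_into_classes(facts)
    ordered = sorted(classes, key=lambda c: (-len({f.source for f in c}), render(c)))
    return "\n".join(render(c) for c in ordered)


if __name__ == "__main__":
    facts = [
        Fact("Acme Corp", "acquired", "Widget Inc", "news-1"),
        Fact("ACME corp", "acquired", "widget  inc", "transcript-7"),
        Fact("Acme Corp", "headquartered in", "Boston", "news-2"),
    ]
    print(combined_representation(facts))
    # Acme Corp acquired Widget Inc [news-1, transcript-7]
    # Acme Corp headquartered in Boston [news-2]
```

    Two mentions of the same acquisition, one from a news item and one from a transcript, collapse into a single displayed line that cites both sources; a real system would need a far richer equivalence test (coreference, paraphrase) than string normalization.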

    Predicting pronouns of dropped pronoun style languages for natural language translation
    6.
    Granted patent
    Status: In force

    Publication number: US08903707B2

    Publication date: 2014-12-02

    Application number: US13348995

    Filing date: 2012-01-12

    IPC classification: G06F17/28

    CPC classification: G06F17/2827

    Abstract: A method, an apparatus and an article of manufacture for determining a dropped pronoun from a source language. The method includes collecting parallel sentences from a source and a target language, creating at least one word alignment between the parallel sentences in the source and the target language, mapping at least one pronoun from the target language sentence onto the source language sentence, computing at least one feature from the mapping, wherein the at least one feature is extracted from both the source language and the at least one pronoun projected from the target language, and using the at least one feature to train a classifier to predict position and spelling of at least one pronoun in the target language when the at least one pronoun is dropped in the source language.
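    The projection step (mapping an unaligned target-language pronoun back onto a gap in the source sentence) and the classifier trained on it can be illustrated with a toy sketch. Every detail here is an assumption, not taken from the abstract: alignments are sets of `(source_index, target_index)` links, an unaligned pronoun is placed before the source word aligned to the next aligned target word, the only features are the neighbouring source words, and a simple count-based scorer stands in for whatever classifier the patent actually trains.

```python
from collections import Counter, defaultdict

# Hypothetical closed set of target-language (English) pronouns.
TARGET_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def project_dropped_pronouns(src, tgt, alignment):
    """Map each unaligned target pronoun onto a gap in the source sentence.

    `alignment` is a set of (source_index, target_index) links. Gap k means
    "just before src[k]"; gap len(src) is the end of the sentence.
    """
    tgt_to_src = defaultdict(list)
    for s, t in alignment:
        tgt_to_src[t].append(s)
    projected = []
    for t, word in enumerate(tgt):
        if word.lower() in TARGET_PRONOUNS and t not in tgt_to_src:
            # Assumed heuristic: place it before the source word aligned to the
            # next aligned target word.
            nxt = [min(tgt_to_src[j]) for j in range(t + 1, len(tgt)) if j in tgt_to_src]
            projected.append((nxt[0] if nxt else len(src), word.lower()))
    return projected


def gap_features(src, k):
    """Features for gap k: the source words on either side of it."""
    prev_w = src[k - 1] if k > 0 else "<s>"
    next_w = src[k] if k < len(src) else "</s>"
    return (f"prev={prev_w}", f"next={next_w}")


def training_examples(corpus):
    """One (features, label) pair per source gap; the label is the projected
    pronoun's spelling, or "NONE" when no pronoun was dropped there."""
    for src, tgt, alignment in corpus:
        gold = dict(project_dropped_pronouns(src, tgt, alignment))
        for k in range(len(src) + 1):
            yield gap_features(src, k), gold.get(k, "NONE")


class CountClassifier:
    """Stand-in classifier: scores a label by how often each feature co-occurred with it."""

    def fit(self, examples):
        self.counts = defaultdict(Counter)
        for feats, label in examples:
            for f in feats:
                self.counts[f][label] += 1
        return self

    def predict(self, feats):
        score = Counter()
        for f in feats:
            score.update(self.counts.get(f, Counter()))
        return score.most_common(1)[0][0] if score else "NONE"


if __name__ == "__main__":
    # "去 了 商店" / "I went to the store": the subject pronoun is dropped in Chinese.
    corpus = [(["去", "了", "商店"], ["I", "went", "to", "the", "store"], {(0, 1), (1, 1), (2, 4)})]
    clf = CountClassifier().fit(training_examples(corpus))
    print(clf.predict(gap_features(["去", "了", "学校"], 0)))
```

    Framing the task as one label per source gap lets a single classifier predict both the position (which gap) and the spelling (which pronoun) that the abstract mentions.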

    Adaptation of statistical parsers based on mathematical transform
    7.
    Granted patent
    Status: In force

    Publication number: US07308400B2

    Publication date: 2007-12-11

    Application number: US09737259

    Filing date: 2000-12-14

    IPC classification: G06F17/27 G10L15/18

    CPC classification: G06F17/2715

    Abstract: An arrangement for adapting statistical parsers to new data using a mathematical transform, particularly a Markov transform. In particular, it is assumed that an initial statistical parser is available and a batch of new data is given. The initial model is mapped to a new model by a Markov matrix, each of whose rows sums to one. In the unsupervised setup, where “true” parses are missing, the transform matrix is obtained by maximizing the log likelihood of the parses of test data decoded using the model before adaptation. The proposed algorithm can be applied to supervised adaptation, as well.
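    One natural reading of this abstract is that the adapted model is p'(y | h) = sum over x of p(x | h) * M[x][y], where p is the initial parser's distribution over outcomes y given a history h and every row of M sums to one. The sketch below estimates M from (history, outcome) counts collected from the decoded test parses. Using expectation-maximization, with the original outcome x as a hidden variable, is our assumed choice of optimizer; the abstract only says that M maximizes the log likelihood. The function names and the near-identity starting point are likewise hypothetical.

```python
import math


def adapt(p, M):
    """Adapted model: p'(y | h) = sum_x p(x | h) * M[x][y]."""
    n = len(M)
    return {h: [sum(ph[x] * M[x][y] for x in range(n)) for y in range(n)] for h, ph in p.items()}


def log_likelihood(p, M, counts):
    """Log likelihood of (history, outcome) counts from the decoded parses."""
    q = adapt(p, M)
    return sum(c * math.log(q[h][y]) for (h, y), c in counts.items())


def em_step(p, M, counts):
    """One EM update of M; the original outcome x is the hidden variable."""
    n = len(M)
    expected = [[0.0] * n for _ in range(n)]
    for (h, y), c in counts.items():
        joint = [p[h][x] * M[x][y] for x in range(n)]
        total = sum(joint)
        for x in range(n):
            expected[x][y] += c * joint[x] / total
    new_M = []
    for x, row in enumerate(expected):
        s = sum(row)
        # Renormalising each row keeps M a Markov (row-stochastic) matrix;
        # a row with no expected mass is left unchanged.
        new_M.append([e / s for e in row] if s > 0 else list(M[x]))
    return new_M


def estimate_transform(p, counts, iterations=50, eps=0.1):
    """Fit M starting near the identity (i.e. near "no adaptation").
    Exact zeros would stay zero under EM, hence the eps smoothing."""
    n = len(next(iter(p.values())))
    M = [[(1 - eps if x == y else 0.0) + eps / n for y in range(n)] for x in range(n)]
    for _ in range(iterations):
        M = em_step(p, M, counts)
    return M


if __name__ == "__main__":
    p = {"h1": [0.9, 0.1], "h2": [0.2, 0.8]}            # initial model p(x | h)
    counts = {("h1", 0): 4, ("h1", 1): 6, ("h2", 1): 5}  # events from decoded parses
    M = estimate_transform(p, counts)
    print(M, log_likelihood(p, M, counts))
```

    Because each EM step cannot decrease the log likelihood, the fitted M scores the decoded data at least as well as the starting matrix; the same update works in the supervised case if the counts come from gold parses instead.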

    Predicting Pronouns for Pro-Drop Style Languages for Natural Language Translation
    8.
    Patent application
    Status: In force

    Publication number: US20130185049A1

    Publication date: 2013-07-18

    Application number: US13348995

    Filing date: 2012-01-12

    IPC classification: G06F17/28 G06F17/27

    CPC classification: G06F17/2827

    Abstract: A method, an apparatus and an article of manufacture for determining a dropped pronoun from a source language. The method includes collecting parallel sentences from a source and a target language, creating at least one word alignment between the parallel sentences in the source and the target language, mapping at least one pronoun from the target language sentence onto the source language sentence, computing at least one feature from the mapping, wherein the at least one feature is extracted from both the source language and the at least one pronoun projected from the target language, and using the at least one feature to train a classifier to predict position and spelling of at least one pronoun in the target language when the at least one pronoun is dropped in the source language.

    Chinese character-based parser
    9.
    Patent application
    Status: In force

    Publication number: US20050234707A1

    Publication date: 2005-10-20

    Application number: US10826707

    Filing date: 2004-04-16

    IPC classification: G06F17/27 G06F17/28

    CPC classification: G06F17/271 G06F17/2863

    Abstract: A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure of Chinese character sequences. A character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure is used to convert word-based parse trees into character-based trees. Character-level tags are derived from word-level part-of-speech tags and word-boundary information is encoded with a positional tag. Word-level parts-of-speech become a constituent label in character-based trees. A maximum entropy parser is then built and tested.
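    The deterministic word-to-character tree conversion can be sketched as a recursive rewrite. Assumptions, not from the abstract: trees are nested `(label, children)` tuples with word-level preterminals `(POS, word)`, and character tags take the hypothetical form `POS_b`, `POS_m`, `POS_e` (begin, middle, end of a multi-character word) or `POS_s` (single-character word). The abstract says only that character tags are derived from the word's part-of-speech tag plus a positional tag. The helper `recover_words` shows that this encoding preserves word boundaries; the maximum entropy parser itself is not sketched.

```python
# Trees are nested tuples: a phrase is (label, [children]) and a word-level
# preterminal is (POS, word). This encoding is an assumption for illustration.


def positional_tags(word):
    """b/m/e = begin/middle/end of a multi-character word; s = single character."""
    if len(word) == 1:
        return ["s"]
    return ["b"] + ["m"] * (len(word) - 2) + ["e"]


def to_char_tree(tree):
    """Deterministically rewrite a word-based tree into a character-based tree."""
    label, rest = tree
    if isinstance(rest, str):  # preterminal (POS, word)
        chars = [(f"{label}_{tag}", ch) for ch, tag in zip(rest, positional_tags(rest))]
        return (label, chars)  # the word's POS becomes a constituent label
    return (label, [to_char_tree(child) for child in rest])


def char_leaves(tree):
    """Collect the (character_tag, character) leaves from left to right."""
    label, rest = tree
    if isinstance(rest, str):
        return [(label, rest)]
    return [leaf for child in rest for leaf in char_leaves(child)]


def recover_words(char_tree):
    """Rebuild (word, POS) pairs using only the character-level tags."""
    words, buf = [], ""
    for tag, ch in char_leaves(char_tree):
        pos, position = tag.rsplit("_", 1)
        buf += ch
        if position in ("e", "s"):
            words.append((buf, pos))
            buf = ""
    return words


if __name__ == "__main__":
    word_tree = ("IP", [("NP", [("NR", "中国")]), ("VP", [("VV", "发展")])])
    print(to_char_tree(word_tree))
    print(recover_words(to_char_tree(word_tree)))  # [('中国', 'NR'), ('发展', 'VV')]
```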

    Chinese character-based parser
    10.
    Granted patent
    Status: In force

    Publication number: US07464024B2

    Publication date: 2008-12-09

    Application number: US10826707

    Filing date: 2004-04-16

    IPC classification: G06F17/27

    CPC classification: G06F17/271 G06F17/2863

    Abstract: A parser is provided that parses a Chinese text stream at the character level and builds a syntactic structure of Chinese character sequences. A character-based syntactic parse tree contains word boundaries, part-of-speech tags, and phrasal structure information. Syntactic knowledge constrains the system when it determines word boundaries. A deterministic procedure is used to convert word-based parse trees into character-based trees. Character-level tags are derived from word-level part-of-speech tags and word-boundary information is encoded with a positional tag. Word-level parts-of-speech become a constituent label in character-based trees. A maximum entropy parser is then built and tested.
