Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics
    1.
    Granted patent
    Status: In force

    Publication No.: US08200487B2

    Publication date: 2012-06-12

    Application No.: US10595831

    Filing date: 2004-11-12

    IPC classification: G10L15/00

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by means of statistical models trained on annotated training data. The method segments the text into sections and assigns a label to each section as its heading. The resulting segmentation and label assignment are presented to a user for review. In addition, alternative segmentations and label assignments are offered to the user, who can select among them or enter a user-defined segmentation and a user-defined label. In response to the user's modifications, a number of different actions are initiated, including the re-segmentation and re-labelling of successive parts of the document or of the entire document. Furthermore, the method comprises a learning functionality that logs and analyzes the user's modifications in order to adapt to the user's preferences and to further train the statistical models.
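The interaction described in this abstract can be pictured with a minimal sketch. Everything named here is hypothetical and not taken from the patent: `apply_user_label`, the `resegment` callback, and the toy stand-in `toy_resegment`. The sketch assumes that sections before the corrected one stay fixed, that the correction is logged for later learning, and that the remaining text is handed back to the statistical segmenter.

```python
def apply_user_label(sections, index, new_label, resegment, log):
    """Apply a user's label correction and re-process the rest of the document.

    sections  -- list of (label, words) pairs in document order
    resegment -- hypothetical callback: (words, previous_label) -> list of sections
    log       -- list that collects modifications for later adaptation
    """
    old_label, words = sections[index]
    # Record the modification so it can later be analyzed or used for training.
    log.append({"index": index, "old": old_label, "new": new_label})
    # Sections up to and including the corrected one are kept as confirmed.
    kept = sections[:index] + [(new_label, words)]
    # Everything after the correction is pooled and segmented again.
    rest = [w for _, ws in sections[index + 1:] for w in ws]
    return kept + (resegment(rest, new_label) if rest else [])


def toy_resegment(words, previous_label):
    # Stand-in for a statistical segmenter (assumption): one section whose
    # label depends only on the label that precedes it.
    label = "Plan" if previous_label == "Findings" else "Findings"
    return [(label, words)]


sections = [
    ("History", ["cough", "for", "days"]),
    ("History", ["lungs", "clear"]),
    ("Findings", ["start", "antibiotics"]),
]
log = []
result = apply_user_label(sections, 1, "Findings", toy_resegment, log)
```

In this toy run the user relabels the second section, so the third section is re-labelled in light of the corrected predecessor while the first section is left untouched.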


    Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics
    2.
    Granted patent
    Status: In force

    Publication No.: US08688448B2

    Publication date: 2014-04-01

    Application No.: US13619972

    Filing date: 2012-09-14

    IPC classification: G10L15/00

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by means of statistical models trained on annotated training data. The method segments the text into sections and assigns a label to each section as its heading. The resulting segmentation and label assignment are presented to a user for review. In addition, alternative segmentations and label assignments are offered to the user, who can select among them or enter a user-defined segmentation and a user-defined label. In response to the user's modifications, a number of different actions are initiated, including the re-segmentation and re-labeling of successive parts of the document or of the entire document.


    TEXT SEGMENTATION AND LABEL ASSIGNMENT WITH USER INTERACTION BY MEANS OF TOPIC SPECIFIC LANGUAGE MODELS AND TOPIC-SPECIFIC LABEL STATISTICS
    3.
    Patent application
    Status: In force

    Publication No.: US20130066625A1

    Publication date: 2013-03-14

    Application No.: US13619972

    Filing date: 2012-09-14

    IPC classification: G06F17/27

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by means of statistical models trained on annotated training data. The method segments the text into sections and assigns a label to each section as its heading. The resulting segmentation and label assignment are presented to a user for review. In addition, alternative segmentations and label assignments are offered to the user, who can select among them or enter a user-defined segmentation and a user-defined label. In response to the user's modifications, a number of different actions are initiated, including the re-segmentation and re-labeling of successive parts of the document or of the entire document.


    Text Segmentation and Label Assignment with User Interaction by Means of Topic Specific Language Models and Topic-Specific Label Statistics
    4.
    Patent application
    Status: In force

    Publication No.: US20080201130A1

    Publication date: 2008-08-21

    Application No.: US10595831

    Filing date: 2004-11-12

    IPC classification: G06F17/27

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by means of statistical models trained on annotated training data. The method segments the text into sections and assigns a label to each section as its heading. The resulting segmentation and label assignment are presented to a user for review. In addition, alternative segmentations and label assignments are offered to the user, who can select among them or enter a user-defined segmentation and a user-defined label. In response to the user's modifications, a number of different actions are initiated, including the re-segmentation and re-labelling of successive parts of the document or of the entire document. Furthermore, the method comprises a learning functionality that logs and analyzes the user's modifications in order to adapt to the user's preferences and to further train the statistical models.


    Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics
    5.
    Granted patent
    Status: In force

    Publication No.: US08332221B2

    Publication date: 2012-12-11

    Application No.: US13210214

    Filing date: 2011-08-15

    IPC classification: G10L15/00

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by means of statistical models trained on annotated training data. The method segments the text into sections and assigns a label to each section as its heading. The resulting segmentation and label assignment are presented to a user for review. In addition, alternative segmentations and label assignments are offered to the user, who can select among them or enter a user-defined segmentation and a user-defined label. In response to the user's modifications, a number of different actions are initiated, including the re-segmentation and re-labeling of successive parts of the document or of the entire document.


    Topic specific models for text formatting and speech recognition
    6.
    Granted patent
    Status: In force

    Publication No.: US08041566B2

    Publication date: 2011-10-18

    Application No.: US10595830

    Filing date: 2004-11-12

    IPC classification: G10L15/00

    Abstract: The present invention relates to a method, a computer system and a computer program product for speech recognition and/or text formatting that make use of topic-specific statistical models. A text document, which may be obtained from a first speech recognition pass, is segmented, and a topic-specific model is assigned to each resulting section. Each model in the set provides statistical information about language model probabilities and about text processing or formatting rules, e.g. the interpretation of commands for punctuation, formatting and text highlighting or of ambiguous text portions requiring specific formatting, as well as a specific vocabulary characteristic of each section of the recognized text. Furthermore, other properties of a speech recognition and/or formatting system (e.g. settings for the speaking rate) may be encoded in the statistical models. The models themselves are generated from annotated training data and/or by manual coding. Based on the assignment of models to sections of text, an improved speech recognition and/or text formatting procedure is performed.
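As a rough illustration of topic-dependent formatting, the sketch below applies a different word-level rule table to each section depending on its assigned topic. The topic names, the rule tables and the function `format_sections` are all assumptions made for this example; the patent's models are statistical, whereas this sketch uses hand-coded lookup tables only.

```python
# Hypothetical per-topic formatting rules. In a "findings" section the word
# "colon" is an anatomical term and is left alone; in a "medication" section
# it is treated as a dictated punctuation command.
FORMATTING_RULES = {
    "findings": {},
    "medication": {"colon": ":", "two": "2"},
}


def format_sections(sections):
    """Format each (topic, text) section with the rules of its own topic."""
    formatted = []
    for topic, text in sections:
        rules = FORMATTING_RULES.get(topic, {})  # unknown topic: no rules
        words = [rules.get(word, word) for word in text.split()]
        formatted.append((topic, " ".join(words)))
    return formatted


recognized = [
    ("findings", "the colon appears normal"),
    ("medication", "dose colon two tablets daily"),
]
formatted = format_sections(recognized)
```

The same recognized word is handled differently in the two sections, which is the kind of ambiguity the abstract attributes to section-specific models.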


    Text Segmentation and Topic Annotation for Document Structuring
    7.
    Patent application
    Status: Under examination (published)

    Publication No.: US20070260564A1

    Publication date: 2007-11-08

    Application No.: US10588639

    Filing date: 2004-11-12

    IPC classification: G06F15/18 G06F17/27 G06F17/30

    CPC classification: G06F17/27 G06F17/2765

    Abstract: The invention relates to a method, a computer program product and a computer system for structuring an unstructured text by means of statistical models trained on annotated training data. Each section into which the text is segmented is further assigned to a topic, which is associated with a set of labels. The statistical models for segmenting the text and for assigning a topic and its associated labels to a section of text explicitly account for: correlations between a section of text and a topic, topic transitions between sections, the position of a topic within the document, and a (topic-dependent) section length. Hence, structural information from the training data is exploited in order to segment and annotate unknown text.
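One way to combine the model components named in this abstract is a dynamic-programming search over section boundaries and topics. The sketch below is a toy under stated assumptions, not the patented method: `segment`, `FLOOR` and all probabilities are invented for illustration, the text-topic correlation is a per-topic unigram model, and topic position is reduced to a prior on the first section's topic, stored under the key `(None, topic)`.

```python
import math

FLOOR = math.log(1e-3)  # assumed log-probability for unseen words/transitions


def segment(words, word_logp, trans_logp, len_logp, max_len=10):
    """Return the best-scoring list of (start, end, topic) sections."""
    topics = list(word_logp)
    n = len(words)
    # best[i][t] = (score, start, prev_topic) for the best segmentation of
    # words[:i] whose last section has topic t and begins at index start.
    best = [dict() for _ in range(n + 1)]
    for end in range(1, n + 1):
        for t in topics:
            for start in range(max(0, end - max_len), end):
                # Text-topic correlation plus topic-dependent section length.
                local = sum(word_logp[t].get(w, FLOOR) for w in words[start:end])
                local += len_logp(t, end - start)
                if start == 0:
                    cands = [(local + trans_logp.get((None, t), FLOOR), None)]
                else:
                    # Topic transition: adjacent sections must differ in topic.
                    cands = [(s[0] + trans_logp.get((p, t), FLOOR) + local, p)
                             for p, s in best[start].items() if p != t]
                for score, prev in cands:
                    if t not in best[end] or score > best[end][t][0]:
                        best[end][t] = (score, start, prev)
    # Trace back from the best final topic.
    t = max(best[n], key=lambda k: best[n][k][0])
    end, sections = n, []
    while end > 0:
        _, start, prev = best[end][t]
        sections.append((start, end, t))
        end, t = start, prev
    return sections[::-1]


in_topic = math.log(0.2)
word_logp = {
    "history": {w: in_topic for w in ["patient", "reports", "cough", "fever"]},
    "medication": {w: in_topic for w in ["prescribed", "aspirin", "daily"]},
}
trans_logp = {
    (None, "history"): math.log(0.9), (None, "medication"): math.log(0.1),
    ("history", "medication"): math.log(0.9),
    ("medication", "history"): math.log(0.1),
}


def len_logp(topic, length):
    return -0.1 * abs(length - 3)  # toy preference for three-word sections


words = "patient reports cough fever prescribed aspirin daily".split()
sections = segment(words, word_logp, trans_logp, len_logp)
```

With these toy numbers the search places a single boundary after the fourth word. In a trained system the four score components would be estimated from annotated documents rather than written by hand.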


    Automatic Text Correction
    8.
    Patent application
    Status: Under examination (published)

    Publication No.: US20070299664A1

    Publication date: 2007-12-27

    Application No.: US11575674

    Filing date: 2005-09-28

    IPC classification: G06F3/16

    Abstract: The present invention provides a method of generating text transformation rules for speech-to-text transcription systems. The text transformation rules are generated by comparing an erroneous text produced by a speech-to-text transcription system with a correct reference text. Comparing the erroneous text with the reference text makes it possible to derive a set of text transformation rules, which are evaluated by applying them strictly to the training text and then comparing the result with the reference text. This evaluation is sufficient to determine which of the automatically generated text transformation rules enhance or degrade the erroneous text. Only those rules of the set that guarantee an enhancement of the erroneous text are selected. In this way, systematic errors of an automatic speech recognition or natural language processing system can be effectively compensated.
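The derive-then-evaluate loop in this abstract can be sketched with Python's standard `difflib` module. This is a simplified illustration under assumptions: rules are plain word-sequence substitutions taken from the `replace` spans of a word-level alignment, the error measure is a rough count of mismatched words, and the function names are invented for the example.

```python
from difflib import SequenceMatcher


def derive_rules(hyp, ref):
    """Collect candidate (wrong_words, right_words) rules from an alignment."""
    sm = SequenceMatcher(a=hyp, b=ref, autojunk=False)
    return {(tuple(hyp[i1:i2]), tuple(ref[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag == "replace"}


def apply_rule(words, rule):
    """Apply a rule strictly, i.e. at every position where it matches."""
    src, dst = rule
    out, i = [], 0
    while i < len(words):
        if tuple(words[i:i + len(src)]) == src:
            out.extend(dst)
            i += len(src)
        else:
            out.append(words[i])
            i += 1
    return out


def errors(hyp, ref):
    """Rough word-error count: size of all non-matching alignment spans."""
    sm = SequenceMatcher(a=hyp, b=ref, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal")


def select_rules(hyp, ref):
    """Keep only rules whose strict application lowers the error count."""
    base = errors(hyp, ref)
    return [rule for rule in sorted(derive_rules(hyp, ref))
            if errors(apply_rule(hyp, rule), ref) < base]


hyp = "dose colon take to pills and go to bed".split()
ref = "dose : take two pills and go to bed".split()
kept = select_rules(hyp, ref)
```

Here two candidate rules are derived. Replacing "colon" with ":" helps everywhere it applies and is kept, while replacing "to" with "two" fixes one word but breaks another, so strict application shows no net gain and the rule is discarded.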


    Speech recognition method with language model adaptation
    9.
    Granted patent
    Status: Expired

    Publication No.: US6157912A

    Publication date: 2000-12-05

    Application No.: US033202

    Filing date: 1998-03-02

    CPC classification: G10L15/065 G10L15/183

    Abstract: Language models that take into account the probabilities of word sequences are used in speech recognition, in particular in the recognition of fluently spoken language with a large vocabulary, in order to increase recognition reliability. These models are obtained from comparatively large quantities of text and accordingly represent values averaged over several texts. This means, however, that the language model is not well adapted to the peculiarities of a particular text. To adapt a given language model to a particular text on the basis of only a short text fragment, the invention proposes that the unigram language model first be adapted using the short text and that the M-gram language model then be adapted in dependence on it. A method is described for adapting the unigram language model values that automatically subdivides the words into semantic classes.
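The two-stage idea (adapt the unigram first, then the M-gram in dependence on it) can be illustrated as follows. The abstract does not give formulas, so both steps are assumptions: the unigram is adapted by linear interpolation with the fragment's relative frequencies, and the M-gram (a bigram here) is adapted by rescaling each probability with the ratio of adapted to background unigram and renormalizing. The subdivision into semantic classes is not modelled, and all names are hypothetical.

```python
from collections import Counter


def adapt_unigram(background, fragment, lam=0.5):
    """Interpolate background unigram with fragment frequencies (assumption)."""
    counts = Counter(w for w in fragment if w in background)
    total = sum(counts.values())
    if total == 0:
        return dict(background)  # nothing usable in the fragment
    return {w: (1 - lam) * p + lam * counts[w] / total
            for w, p in background.items()}


def adapt_bigram(bigram, background, adapted):
    """Rescale P(w | h) by adapted(w) / background(w), then renormalize."""
    result = {}
    for history, dist in bigram.items():
        scaled = {w: p * adapted[w] / background[w] for w, p in dist.items()}
        norm = sum(scaled.values())
        result[history] = {w: p / norm for w, p in scaled.items()}
    return result


background = {"the": 0.5, "heart": 0.25, "car": 0.25}
bigram = {"the": {"heart": 0.5, "car": 0.5}}
fragment = ["heart", "heart", "the", "heart"]

adapted = adapt_unigram(background, fragment)
new_bigram = adapt_bigram(bigram, background, adapted)
```

After seeing a fragment dominated by "heart", the adapted bigram shifts probability mass from "car" to "heart" in the context "the", even though the fragment is far too short to re-estimate bigram statistics directly.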


    ESTABLISHING A CONTOUR OF A STRUCTURE BASED ON IMAGE INFORMATION
    10.
    Patent application
    Status: Under examination (published)

    Publication No.: US20120082354A1

    Publication date: 2012-04-05

    Application No.: US13377401

    Filing date: 2010-06-18

    IPC classification: G06K9/48 G06K9/00

    Abstract: A system for establishing a contour of a structure is disclosed. An initialization subsystem (1) is used for initializing an adaptive mesh representing an approximate contour of the structure, the structure being represented at least partly by a first image, and the structure being represented at least partly by a second image. A deforming subsystem (2) is used for deforming the adaptive mesh, based on feature information of the first image and feature information of the second image. The deforming subsystem comprises a force-establishing subsystem (3) for establishing a force acting on at least part of the adaptive mesh, in dependence on the feature information of the first image and the feature information of the second image. A transform-establishing subsystem (4) is used for establishing a coordinate transform reflecting a registration mismatch between the first image, the second image, and the adaptive mesh.
