Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics
    1.
    Type: granted invention patent
    Status: in force

    Publication No.: US08200487B2

    Publication date: 2012-06-12

    Application No.: US10595831

    Filing date: 2004-11-12

    IPC classification: G10L15/00

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method segments the text into sections and assigns labels to the sections as section headings. The resulting segmentation and label assignment are presented to a user for review. Additionally, alternative segmentations and label assignments are offered to the user, who can select alternative segmentations and alternative labels as well as enter a user-defined segmentation and a user-defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated, including the re-segmentation and re-labelling of successive parts of the document or of the entire document. Furthermore, the method comprises a learning functionality that logs and analyzes user-introduced modifications in order to adapt to the user's preferences and to further train the statistical models.
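The re-labelling step described in the abstract can be illustrated with a small sketch. It is not taken from the patent: the function name `relabel`, the log-score tables and the fallback score `LOW` are all hypothetical. It assumes each section already has per-label scores and that label-to-label transition statistics are available, and it shows how fixing one label by hand can change the labels proposed for the sections that follow.

```python
LOW = -10.0  # hypothetical log score for labels or label pairs without statistics

def relabel(section_scores, trans, fixed=None):
    """Return the best label sequence, honouring user-fixed labels.

    section_scores: one dict per section mapping label -> log score
    trans: dict mapping (previous label, label) -> log score
    fixed: dict mapping section index -> label chosen by the user
    """
    fixed = fixed or {}
    layers = []  # layers[i][label] = (best score so far, previous label)
    for i, scores in enumerate(section_scores):
        allowed = [fixed[i]] if i in fixed else list(scores)
        layer = {}
        for label in allowed:
            own = scores.get(label, LOW)  # a user-defined label may be unseen
            if i == 0:
                layer[label] = (own, None)
            else:
                prev, score = max(
                    ((p, s + trans.get((p, label), LOW))
                     for p, (s, _) in layers[-1].items()),
                    key=lambda item: item[1],
                )
                layer[label] = (score + own, prev)
        layers.append(layer)
    # Trace back from the best-scoring final label.
    label = max(layers[-1], key=lambda l: layers[-1][l][0])
    labels = []
    for layer in reversed(layers):
        labels.append(label)
        label = layer[label][1]
    return labels[::-1]

# Toy statistics (invented for illustration only).
SCORES = [
    {"History": -0.5, "Findings": -2.0, "Plan": -3.0},
    {"History": -1.0, "Findings": -1.2, "Plan": -3.0},
    {"History": -3.0, "Findings": -1.5, "Plan": -1.0},
]
TRANS = {
    ("History", "History"): -1.0, ("History", "Findings"): -0.5,
    ("History", "Plan"): -1.5, ("Findings", "Findings"): -1.0,
    ("Findings", "Plan"): -0.5, ("Plan", "Plan"): -1.0,
}

automatic = relabel(SCORES, TRANS)
corrected = relabel(SCORES, TRANS, fixed={1: "History"})
print(automatic)
print(corrected)
```

In this toy setup the user's correction of the second section also changes the label proposed for the third section, which is the kind of knock-on re-labelling the abstract mentions. Offering ranked alternatives and learning from logged corrections are left out of the sketch.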

    Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics
    2.
    Type: granted invention patent
    Status: in force

    Publication No.: US08688448B2

    Publication date: 2014-04-01

    Application No.: US13619972

    Filing date: 2012-09-14

    IPC classification: G10L15/00

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method segments the text into sections and assigns labels to the sections as section headings. The resulting segmentation and label assignment are presented to a user for review. Additionally, alternative segmentations and label assignments are offered to the user, who can select alternative segmentations and alternative labels as well as enter a user-defined segmentation and a user-defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated, including the re-segmentation and re-labeling of successive parts of the document or of the entire document.

    TEXT SEGMENTATION AND LABEL ASSIGNMENT WITH USER INTERACTION BY MEANS OF TOPIC SPECIFIC LANGUAGE MODELS AND TOPIC-SPECIFIC LABEL STATISTICS
    3.
    Type: invention patent application
    Status: in force

    Publication No.: US20130066625A1

    Publication date: 2013-03-14

    Application No.: US13619972

    Filing date: 2012-09-14

    IPC classification: G06F17/27

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method segments the text into sections and assigns labels to the sections as section headings. The resulting segmentation and label assignment are presented to a user for review. Additionally, alternative segmentations and label assignments are offered to the user, who can select alternative segmentations and alternative labels as well as enter a user-defined segmentation and a user-defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated, including the re-segmentation and re-labeling of successive parts of the document or of the entire document.

    Text Segmentation and Label Assignment with User Interaction by Means of Topic Specific Language Models and Topic-Specific Label Statistics
    4.
    Type: invention patent application
    Status: in force

    Publication No.: US20080201130A1

    Publication date: 2008-08-21

    Application No.: US10595831

    Filing date: 2004-11-12

    IPC classification: G06F17/27

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method segments the text into sections and assigns labels to the sections as section headings. The resulting segmentation and label assignment are presented to a user for review. Additionally, alternative segmentations and label assignments are offered to the user, who can select alternative segmentations and alternative labels as well as enter a user-defined segmentation and a user-defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated, including the re-segmentation and re-labelling of successive parts of the document or of the entire document. Furthermore, the method comprises a learning functionality that logs and analyzes user-introduced modifications in order to adapt to the user's preferences and to further train the statistical models.

    Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics
    5.
    Type: granted invention patent
    Status: in force

    Publication No.: US08332221B2

    Publication date: 2012-12-11

    Application No.: US13210214

    Filing date: 2011-08-15

    IPC classification: G10L15/00

    Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method segments the text into sections and assigns labels to the sections as section headings. The resulting segmentation and label assignment are presented to a user for review. Additionally, alternative segmentations and label assignments are offered to the user, who can select alternative segmentations and alternative labels as well as enter a user-defined segmentation and a user-defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated, including the re-segmentation and re-labeling of successive parts of the document or of the entire document.

    Topic specific models for text formatting and speech recognition
    6.
    Type: granted invention patent
    Status: in force

    Publication No.: US08041566B2

    Publication date: 2011-10-18

    Application No.: US10595830

    Filing date: 2004-11-12

    IPC classification: G10L15/00

    Abstract: The present invention relates to a method, a computer system and a computer program product for speech recognition and/or text formatting by making use of topic-specific statistical models. A text document, which may be obtained from a first speech recognition pass, is segmented, and a topic-specific model is assigned to each resulting section. Each model of the set of models provides statistical information about language model probabilities and about text processing or formatting rules, e.g. the interpretation of commands for punctuation, formatting and text highlighting, or of ambiguous text portions requiring specific formatting, as well as a specific vocabulary that is characteristic of each section of the recognized text. Furthermore, other properties of a speech recognition and/or formatting system (such as settings for the speaking rate) may be encoded in the statistical models. The models themselves are generated on the basis of annotated training data and/or by manual coding. Based on the assignment of models to sections of text, an improved speech recognition and/or text formatting procedure is performed.

    Text Segmentation and Topic Annotation for Document Structuring
    7.
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20070260564A1

    Publication date: 2007-11-08

    Application No.: US10588639

    Filing date: 2004-11-12

    IPC classification: G06F15/18 G06F17/27 G06F17/30

    CPC classification: G06F17/27 G06F17/2765

    Abstract: The invention relates to a method, a computer program product and a computer system for structuring an unstructured text by making use of statistical models trained on annotated training data. Each section into which the text is segmented is further assigned to a topic, which is associated with a set of labels. The statistical models for segmenting the text and for assigning a topic and its associated labels to a section of text explicitly account for: correlations between a section of text and a topic, a topic transition between sections, a topic position within the document and a (topic-dependent) section length. Hence, structural information from the training data is exploited in order to segment and annotate unknown text.
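The combination of model components listed in the abstract can be sketched as a dynamic-programming search. The following is a minimal illustration under stated assumptions, not the patented method: the function `segment` and the three scoring callbacks are hypothetical, the text-topic correlation is reduced to a per-topic unigram score, and the topic-position component is omitted for brevity.

```python
import math

def segment(words, topics, emit_logp, trans_logp, length_logp, max_len=50):
    """Find the best-scoring split of `words` into topic-labelled sections."""
    n = len(words)
    # best[i][t]: best log score for words[:i] when the last section has topic t
    best = [dict() for _ in range(n + 1)]
    back = [dict() for _ in range(n + 1)]
    best[0][None] = 0.0
    for end in range(1, n + 1):
        for t in topics:
            for start in range(max(0, end - max_len), end):
                text = sum(emit_logp(t, w) for w in words[start:end])
                for prev, prev_score in best[start].items():
                    score = (prev_score + trans_logp(prev, t)
                             + length_logp(t, end - start) + text)
                    if score > best[end].get(t, float("-inf")):
                        best[end][t] = score
                        back[end][t] = (start, prev)
    # Trace back the section boundaries and topics.
    t = max(best[n], key=best[n].get)
    sections, end = [], n
    while end > 0:
        start, prev = back[end][t]
        sections.append((start, end, t))
        end, t = start, prev
    return sections[::-1]

# Toy models (invented for illustration only).
LEX = {
    "history": {"patient": 0.3, "reported": 0.3, "pain": 0.3},
    "plan": {"prescribe": 0.3, "rest": 0.3, "followup": 0.3},
}

def emit_logp(topic, word):
    return math.log(LEX[topic].get(word, 0.01))  # floor for unseen words

def trans_logp(prev, topic):
    if prev is None:
        return math.log(0.5)
    return math.log(0.9 if prev != topic else 0.1)

def length_logp(topic, length):
    return -0.1 * abs(length - 3)  # assumed preference for 3-word sections

words = "patient reported pain prescribe rest followup".split()
result = segment(words, list(LEX), emit_logp, trans_logp, length_logp)
print(result)
```

Each tuple in the result is `(start, end, topic)` with `end` exclusive. In a trained system the three callbacks would be estimated from annotated documents rather than written by hand.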

    Speech recognition method with language model adaptation
    8.
    Type: granted invention patent
    Status: lapsed

    Publication No.: US6157912A

    Publication date: 2000-12-05

    Application No.: US033202

    Filing date: 1998-03-02

    CPC classification: G10L15/065 G10L15/183

    Abstract: Language models which take into account the probabilities of word sequences are used in speech recognition, in particular in the recognition of fluently spoken language with a large vocabulary, in order to increase the recognition reliability. These models are obtained from comparatively large quantities of text and accordingly represent values which were averaged over several texts. This means, however, that the language model is not well adapted to the peculiarities of a specific text. To adapt a given language model to a specific text on the basis of only a short text fragment, the invention proposes that the unigram language model is first adapted with the short text and that the M-gram language model is subsequently adapted in dependence thereon. A method is described for adapting the unigram language model values which automatically subdivides the words into semantic classes.
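The two-stage idea (adapt the unigram first, then adapt the M-gram in dependence on it) can be sketched as follows. The abstract does not give formulas, so both steps here are assumptions: the unigram is adapted by simple linear interpolation with the fragment's relative frequencies, and the bigram is adapted by rescaling with the ratio of adapted to background unigram probabilities, followed by renormalization. The semantic-class subdivision mentioned in the abstract is not modelled, and all names and parameters (`lam`, `beta`) are hypothetical.

```python
from collections import Counter

def adapt_unigram(background, fragment, lam=0.5):
    """Interpolate the background unigram with fragment relative frequencies."""
    counts = Counter(w for w in fragment if w in background)
    total = sum(counts.values())
    if total == 0:
        return dict(background)  # nothing usable in the fragment
    return {w: (1 - lam) * p + lam * counts[w] / total
            for w, p in background.items()}

def adapt_bigram(bigram, background, adapted, beta=1.0):
    """Rescale P(w | h) by (adapted(w) / background(w)) ** beta, then renormalize."""
    out = {}
    for history, dist in bigram.items():
        scaled = {w: p * (adapted[w] / background[w]) ** beta
                  for w, p in dist.items()}
        z = sum(scaled.values())
        out[history] = {w: p / z for w, p in scaled.items()}
    return out

# Toy models (invented for illustration only).
background = {"the": 0.5, "heart": 0.25, "car": 0.25}
bigram = {"the": {"heart": 0.5, "car": 0.5}}
fragment = ["heart", "heart", "the"]

adapted = adapt_unigram(background, fragment)
adapted_bigram = adapt_bigram(bigram, background, adapted)
print(adapted)
print(adapted_bigram)
```

After adaptation, "heart" becomes more likely than "car" following "the", even though the short fragment contains no occurrence of the bigram "the heart".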

    Recording content on a record medium that contains a desired content descriptor
    9.
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20070140654A1

    Publication date: 2007-06-21

    Application No.: US10576165

    Filing date: 2004-10-21

    IPC classification: H04N7/00

    Abstract: The invention relates to a method for recording content on a record medium (2) that contains a desired content descriptor (3), comprising the steps of reading said desired content descriptor (3) from said record medium (2), scanning the content (10, 12) of at least one multimedia source (6, 7) for desired content that matches said desired content descriptor (3), and recording said desired content on said record medium (3). Said record medium (2) is preferably a Digital Versatile Disc (DVD), said desired content descriptor (3) is preferably a keyword contained in a blank of said DVD, and said at least one multimedia source (6, 7) is preferably a television receiver. The DVD with the keyword contained therein thus triggers the recording of content from the television receiver that matches said keyword on said DVD. The invention further relates to a computer program product, a device and a record medium.

    Method and device for the rapid, pattern-recognition-supported transcription of spoken and written utterances
    10.
    Type: invention patent application
    Status: under examination (published)

    Publication No.: US20060167685A1

    Publication date: 2006-07-27

    Application No.: US10503420

    Filing date: 2003-01-30

    IPC classification: G10L15/26

    CPC classification: G10L15/24 G06K9/222

    Abstract: The invention relates to a method and a device for the transcription of spoken and written utterances. To this end, the utterances undergo speech or text recognition, and the recognition result (ME) is combined with a manually created transcription (MT) of the utterances in order to obtain the transcription. The additional information rendered usable by the combination as a result of the recognition result (ME) enables the transcriber to work relatively roughly and therefore quickly on the manual transcription. When using a keyboard (25), he can, for example, restrict himself to hitting the keys of only one row and/or can omit some keystrokes completely. In addition, the manual transcribing can also be accelerated by the suggestion of continuations (31) to the text input so far (30), which continuations are anticipated by virtue of the recognition result (ME).
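The continuation suggestion mentioned at the end of the abstract can be illustrated with a deliberately simple sketch. It assumes the recognition result is a single word sequence and that the typed text matches it exactly up to a partially typed last word; a real system would have to tolerate recognition errors and rough typing. The function name `suggest_continuation` and the parameter `n_words` are hypothetical.

```python
def suggest_continuation(recognized, typed, n_words=3):
    """Suggest text to append to `typed`, based on the recognition result."""
    rec = recognized.split()
    words = typed.split()
    if typed.endswith(" ") or not words:
        done, partial = words, ""          # the last word is complete
    else:
        done, partial = words[:-1], words[-1]
    for i in range(len(rec) - len(done)):
        j = i + len(done)
        # The completed words must match, and the next recognized word
        # must start with what has been typed of the current word.
        if rec[i:j] == done and rec[j].startswith(partial):
            rest = rec[j][len(partial):]
            return " ".join([rest] + rec[j + 1:j + n_words])
    return ""  # no anchor found in the recognition result

hypothesis = "the patient reported chest pain"
print(suggest_continuation(hypothesis, "the pat"))
print(suggest_continuation(hypothesis, "the patient "))
```

The returned string is meant to be appended directly to the typed text, so it starts with the remainder of the word currently being typed.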
