Finite state data structures with paths representing paired strings of tags and tag combinations
    1.
    Granted invention patent
    Status: Expired

    Publication number: US06816830B1

    Publication date: 2004-11-09

    Application number: US09419435

    Filing date: 1999-10-15

    Applicant: Andre Kempe

    Inventor: Andre Kempe

    IPC classification: G06F17/27

    CPC classification: G06F17/2715

    Abstract: A finite state data structure includes paths that represent pairs of strings, with a first string that is a string of tag combinations and a second string that is a string of tags for tokens in a language. The second strings of a set of paths with the same first string include only highly probable strings of tags for the first string. The data structure can be a finite state transducer (FST) or a bimachine, and can be used for mapping strings of tag combinations to strings of tags. The tags can, for example, indicate parts of speech of words, and the tag combinations can be ambiguity classes or, in a bimachine, reduced ambiguity classes. An FST can be obtained by approximating a Hidden Markov Model. A bimachine can include left-to-right and right-to-left sequential FSTs obtained based on frequencies of tokens in a training corpus.
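The mapping from a string of ambiguity classes to a string of tags can be illustrated with a small sequential transducer. This is a minimal sketch only: the transition table, the state names, and the function `map_classes_to_tags` are hypothetical and written by hand, whereas the abstract describes FSTs obtained by approximating a Hidden Markov Model.

```python
# Toy transition table (hypothetical, hand-written for illustration):
# (current state, ambiguity class) -> (next state, output tag).
# Here the state simply remembers the previously emitted tag.
TRANSITIONS = {
    ("START", "DET"): ("DET", "DET"),
    ("START", "NOUN|VERB"): ("NOUN", "NOUN"),
    ("DET", "NOUN|VERB"): ("NOUN", "NOUN"),
    ("NOUN", "NOUN|VERB"): ("VERB", "VERB"),
    ("NOUN", "DET"): ("DET", "DET"),
    ("VERB", "DET"): ("DET", "DET"),
    ("VERB", "NOUN|VERB"): ("NOUN", "NOUN"),
}


def map_classes_to_tags(classes, transitions=TRANSITIONS, start="START"):
    """Follow one path through the transducer, emitting one tag per class.

    Returns None if the input string of ambiguity classes has no path.
    """
    state, tags = start, []
    for cls in classes:
        if (state, cls) not in transitions:
            return None
        state, tag = transitions[(state, cls)]
        tags.append(tag)
    return tags


if __name__ == "__main__":
    print(map_classes_to_tags(["DET", "NOUN|VERB", "NOUN|VERB"]))
```

Each path through the table pairs an input string of tag combinations with one output string of tags, which is the pairing the abstract describes; a real table would be far larger and derived from training data.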

    Language acquisition aide
    2.
    Granted invention patent
    Status: Expired

    Publication number: US06704699B2

    Publication date: 2004-03-09

    Application number: US09946391

    Filing date: 2001-09-06

    Applicant: Einat H. Nir

    Inventor: Einat H. Nir

    IPC classification: G06F17/27

    Abstract: A stand-alone, hand-held apparatus is provided, which combines elements of a scanning dictionary with automatic-translation software, for in-context translation. Additionally, the apparatus may include text-to-speech synthesis, for in-tandem exposure to spoken and translated forms of a portion of text, such as a clause, a sentence, or a few sentences. A whole document may thus be read, for example, sentence by sentence. The apparatus may further be used for teaching correct pronunciation of any portion of text, by recording the user's pronunciation of the portion of text and comparing it with the text-to-speech synthesis produced by the apparatus.

    System and method to compile instructions to manipulate linguistic structures into separate functions
    5.
    Granted invention patent
    Status: In force

    Publication number: US06529865B1

    Publication date: 2003-03-04

    Application number: US09419533

    Filing date: 1999-10-18

    IPC classification: G06F17/27

    Abstract: A grammar programming language (“GPL”) compiler compiles each rule in a natural language grammar into a separate function that can be invoked by a translation system to apply the rule to a representation of a natural language expression. The GPL compiler can output the functions for the rules as source code for a standard computer programming language to be further compiled into object code that can be directly executed by a computer processor. The GPL compiler can also generate special functions for each rule to enable multi-layered operations on the representations and to handle the processing of representations of ambiguous expressions.

    Language model adaptation via network of similar users
    6.
    Granted invention patent
    Status: In force

    Publication number: US06484136B1

    Publication date: 2002-11-19

    Application number: US09422383

    Filing date: 1999-10-21

    IPC classification: G06F17/27

    Abstract: A language recognition system, method and program product for recognizing language-based input from computer users on a network of connected computers. Each computer includes at least one user-based language model trained for a corresponding user for automatic speech recognition, handwriting recognition, machine translation, gesture recognition or other similar actions that require interpretation of user activities. Network computer users are clustered into classes of similar users according to user similarities such as nationality, profession, sex, age, etc. User characteristics are collected by sensors and from databases and then distributed over the network during user activities. Language models with similarities among similar users on the network are identified. The language models include a language model domain, with similar language models being clustered according to their domains. Language models identified as similar are modified in response to user production activities. After modification of one language model, other identified similar language models are compared and adapted. Also, user data, including information about user activities and language model data, is transmitted over the network to other similar users. Language models are adapted only in response to similar user activities, when these activities are recorded and transmitted over the network. Language models are given a global context based on similar users that are connected together over the network.

    Automatic search of audio channels by matching viewer-spoken words against closed-caption/audio content for interactive television
    7.
    Granted invention patent
    Status: In force

    Publication number: US06480819B1

    Publication date: 2002-11-12

    Application number: US09258115

    Filing date: 1999-02-25

    IPC classification: G06F17/27

    CPC classification: G10L15/26 G10L15/1815

    Abstract: A method and apparatus are provided to enable a user watching and/or listening to a program to search for new information in a stream of telecommunications data. The apparatus includes a voice recognition system that recognizes the user's request and causes a search to be performed in the long stream of data of at least one other telecommunication channel. The system includes a storage device for storing and processing the request. Upon recognition of the request, the incoming signal or signals are scanned for matches with the request. Upon finding a match between the request and the incoming signal, information related to the data is brought to the viewer's attention. This can be accomplished either by changing the viewer's station or by bringing a split-screen display forward into the display.

    System and method for identifying language using morphologically-based techniques

    Publication number: US06415250B1

    Publication date: 2002-07-02

    Application number: US08878264

    Filing date: 1997-06-18

    IPC classification: G06F17/27

    Abstract: A language identification system for automatically identifying a language in which an input text is written based upon a probabilistic analysis of predetermined portions of words sampled from the input text. The predetermined portions of words reflect morphological characteristics of natural languages. The automatic language identification system determines in which language of a plurality of represented languages a given text is written, based upon a value representing the relative likelihood that the text is a particular one of the plurality of represented languages due to a presence of a morphologically-significant word portion in the text. Preferably the word portion is the last three characters in a word. The relative likelihood is derived from a relative frequency of occurrence of the fixed-length word ending in each of a plurality of language corpuses, with each language corpus corresponding to one of the plurality of represented languages. Specifically, the automatic language identification system includes a language corpus analyzer that generates, for each of a plurality of word endings extracted from at least one of the language corpuses, a plurality of probabilities associated with the word ending and one of the plurality of represented languages. Each of the language corpuses represents a natural language and each of the probabilities represents a relative likelihood that the text is the associated language due to the presence of the associated word ending in the text. The relative likelihood is derived from a relative frequency that the associated word ending occurs in each of the plurality of language corpuses. The automatic language identification system also comprises a language identification engine that determines, for each of the represented languages, an arithmetic sum of the relative probabilities for all the word endings which appear in the text. The source language is determined to be the represented language having the greatest arithmetic sum of relative probabilities, provided this sum exceeds zero.
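The scoring scheme in this abstract can be sketched in a few lines of Python. This is an illustrative reading, not the patented implementation: the function names, the whitespace tokenization, the toy corpora, and the exact normalization used to turn per-corpus frequencies into relative likelihoods are all assumptions.

```python
from collections import Counter


def ending_counts(corpus_text, n=3):
    """Count the last-n-character endings of the words in one corpus."""
    words = corpus_text.lower().split()
    return Counter(w[-n:] for w in words if len(w) >= n)


def build_table(corpora, n=3):
    """Corpus analyzer: map each ending to {language: relative likelihood}.

    Assumption: the likelihood is the ending's relative frequency in one
    corpus divided by the sum of its relative frequencies over all corpora.
    """
    freqs = {}
    for lang, text in corpora.items():
        counts = ending_counts(text, n)
        total = sum(counts.values())
        freqs[lang] = {e: c / total for e, c in counts.items()} if total else {}
    table = {}
    for ending in set().union(*freqs.values()):
        per_lang = {lang: freqs[lang].get(ending, 0.0) for lang in corpora}
        norm = sum(per_lang.values())  # > 0: the ending occurs in some corpus
        table[ending] = {lang: p / norm for lang, p in per_lang.items()}
    return table


def identify(text, table, languages, n=3):
    """Identification engine: sum likelihoods per language, pick the largest.

    Returns (language or None, scores); None when no score exceeds zero.
    """
    scores = {lang: 0.0 for lang in languages}
    for word in text.lower().split():
        if len(word) >= n:
            for lang, p in table.get(word[-n:], {}).items():
                scores[lang] += p
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    # Toy corpora, far too small for real use.
    corpora = {
        "english": "the running walking thing nation station creation",
        "french": "la maison nation attention parlement gouvernement rapidement",
    }
    table = build_table(corpora)
    print(identify("walking and running", table, list(corpora)))
```

Endings that occur in only one corpus contribute a full point to that language, while shared endings such as "ion" split their weight, which is why a handful of distinctive endings is usually enough to separate the candidates.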

    Method and apparatus for multi-language indexing
    9.
    Granted invention patent
    Status: Expired

    Publication number: US06389387B1

    Publication date: 2002-05-14

    Application number: US09321016

    Filing date: 1999-05-27

    IPC classification: G06F17/27

    CPC classification: G06F17/30737

    Abstract: A method for forming an index comprising indexing features for a plurality of documents includes the steps of identifying each of at least some of the terms present in the documents, generating from each identified term at least one equivalent term which is different from but linguistically related to the identified term, forming for each of the identified terms a first indexing feature comprising the identified term and an identifier of the or each document in which the identified term occurs, forming for each of the equivalent terms a second indexing feature comprising the equivalent term and an identifier of the or each document in which the identified term to which the equivalent term is equivalent occurs, and forming an index comprising the first and second indexing features.
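The indexing steps above can be sketched as a small inverted index. This is a minimal sketch under stated assumptions: the `EQUIVALENTS` table, the whitespace tokenization, and the function `build_index` are hypothetical, since the abstract does not say how equivalent terms are generated.

```python
from collections import defaultdict

# Hypothetical table of linguistically related terms (for example,
# translations). How equivalents are produced is not stated in the abstract.
EQUIVALENTS = {
    "dog": ["chien", "hund"],
    "house": ["maison"],
}


def build_index(documents, equivalents=EQUIVALENTS):
    """Build an index holding both kinds of indexing feature.

    documents maps a document identifier to its text.
    The result maps a term to the set of document identifiers.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in set(text.lower().split()):
            # First indexing feature: the identified term itself.
            index[term].add(doc_id)
            # Second indexing feature: each equivalent term points to the
            # documents in which the original identified term occurs.
            for equivalent in equivalents.get(term, []):
                index[equivalent].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {1: "the dog barks", 2: "a house and a dog"}
    index = build_index(docs)
    print(sorted(index["chien"]), sorted(index["maison"]))
```

Because the equivalent terms are stored at indexing time, a query in another language needs no translation step: looking up "chien" directly returns the documents that contain "dog".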

    Text structure analyzing apparatus, abstracting apparatus, and program recording medium
    10.
    Granted invention patent
    Status: Expired

    Publication number: US06374209B1

    Publication date: 2002-04-16

    Application number: US09271569

    Filing date: 1999-03-18

    IPC classification: G06F17/27

    CPC classification: G06F17/2745

    Abstract: A text input section (1) divides an inputted text into sentences and attaches a number to each of the sentences, which is stored in a text database together with the number. An important word recognizing section (2) generates a list of important words for each sentence to store it in a storing section (8). An important word weighting section (3) weights each important word. A relation degree computing section (4) computes a relation degree between an attention sentence and a precedent sentence. An importance degree computing section (5) computes an importance degree of each attention sentence. A tree structure determining section (6) determines a parent sentence of the attention sentence and determines a tree structure of the inputted text. Unlike the case of determining whether or not character strings of key words are merely coincident with each other, it is possible to determine a parent sentence of each sentence based on a degree of connection between two sentences and analyze a structure of the inputted text with high accuracy according to the above construction.
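The parent-sentence selection described above can be sketched as follows. This is a rough illustration only: the abstract does not define the relation degree, so the weighted overlap of important words used here, the stopword filtering, the tie-breaking rule, and all function names are assumptions.

```python
def important_words(sentence, stopwords):
    """Treat every non-stopword as an important word (an assumption)."""
    words = {w.strip(".,;:!?").lower() for w in sentence.split()}
    return words - stopwords - {""}


def relation_degree(attention, precedent, weights):
    """Assumed relation degree: summed weights of shared important words."""
    return sum(weights.get(w, 1.0) for w in attention & precedent)


def build_tree(sentences, stopwords, weights=None):
    """Return, for each sentence, the index of its parent (None for the root).

    Each attention sentence is attached to the precedent sentence with the
    highest relation degree; ties go to the nearest precedent sentence.
    """
    weights = weights or {}
    words = [important_words(s, stopwords) for s in sentences]
    parents = [None] if sentences else []
    for i in range(1, len(sentences)):
        best = max(
            range(i),
            key=lambda j: (relation_degree(words[i], words[j], weights), j),
        )
        parents.append(best)
    return parents


if __name__ == "__main__":
    text = [
        "Cats hunt mice.",
        "Mice eat cheese.",
        "Cheese is made from milk.",
        "Cats also drink milk.",
    ]
    print(build_tree(text, stopwords={"is", "from", "also"}))
```

Raising the weight of a word changes which precedent sentence wins, which is the role the important word weighting section plays in the abstract.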
