Region-Matching Transducers for Text-Characterization
    1.
    Type: Patent application
    Status: In force

    Publication No.: US20100161314A1

    Publication date: 2010-06-24

    Application No.: US12338085

    Filing date: 2008-12-18

    IPC classification: G06F17/27

    CPC classification: G06F17/2775

    Abstract: Computer methods, apparatus and articles of manufacture therefor, are disclosed for text-characterization using a finite state transducer that along each path accepts on a first side an n-gram of text-characterization (e.g., a language or a topic) and outputs on a second side a sequence of symbols identifying one or more text-characterizations from a set of text-characterizations. The finite state transducer is applied to input data. For each n-gram accepted by the finite state transducer, a frequency counter associated with the n-gram of the one or more text-characterizations in the set of text-characterizations is incremented. The input data is classified as one or more text-characterizations from the set of text-characterizations using the frequency counters associated therewith.
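
    The counting scheme in this abstract can be illustrated with a minimal sketch. It assumes a plain dictionary as a stand-in for the finite state transducer: the keys play the role of the accepted n-grams and the values the output symbols. The table contents, the function name `classify`, and the choice of character trigrams are all hypothetical and are not taken from the patent.

```python
from collections import Counter

# Hypothetical toy table standing in for the transducer: each accepted
# n-gram (input side) maps to the characterizations it identifies (output side).
NGRAM_TO_LABELS = {
    "the": {"en"},
    "and": {"en"},
    "der": {"de"},
    "und": {"de"},
    "ein": {"de"},
    "ion": {"en", "fr"},
    "les": {"fr"},
    "que": {"fr"},
}


def classify(text, n=3):
    """Return (best_labels, counters) for the given text.

    Every n-gram of the input that the table accepts increments the
    frequency counter of each characterization it maps to. The text is
    then classified as the characterization(s) with the highest count.
    """
    counters = Counter()
    for i in range(len(text) - n + 1):
        labels = NGRAM_TO_LABELS.get(text[i:i + n])
        if labels:
            for label in labels:
                counters[label] += 1
    if not counters:
        return [], counters
    best = max(counters.values())
    return sorted(l for l, c in counters.items() if c == best), counters
```

    Calling `classify("der hund und ein ball")` scans every character trigram of the input and only updates counters for trigrams present in the table. A real transducer would share prefixes between n-grams instead of hashing each one separately.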

    Region-Matching Transducers for Natural Language Processing
    2.
    Type: Patent application
    Status: In force

    Publication No.: US20100161313A1

    Publication date: 2010-06-24

    Application No.: US12338058

    Filing date: 2008-12-18

    IPC classification: G06F17/27

    CPC classification: G06F17/2775

    Abstract: Computer methods, apparatus and articles of manufacture therefor, are disclosed for developing a region-matching transducer for marking language data having delimited strings. The region-matching transducer defines one or more patterns of one or more sequences of delimited strings, with at least one of the patterns defined in the region-matching transducer having an arrangement of a plurality of class-matching networks. The plurality of class-matching networks defines a combination of two or more entity classes from one or both of part-of-speech classes and application-specific classes. The region-matching transducer has, for each of the one or more patterns, an arc that leads from a penultimate state with a transition label that identifies the entity class of the pattern, and shares states between patterns leading to a penultimate state when segments of delimited strings making up two or more patterns overlap.

    Indexing a database by finite-state transducer
    4.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US5950184A

    Publication date: 1999-09-07

    Application No.: US746684

    Filing date: 1996-11-14

    IPC classification: G06F17/30

    Abstract: A technique of using the path numbers of an acyclic finite-state transducer as a method of indexing a database. Each entry in the database has associated therewith one or more keys. A finite state transducer is provided defining the keys for the database. For each key, a path number is determined associated with that key, the path number defining a mapping between that key and the (or each) corresponding entry in the database.
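
    One common way to obtain path numbers from an acyclic automaton is to store, at each state, the number of keys reachable from it, and to sum the counts of the branches skipped while following a key. The sketch below assumes a simple trie in place of a minimized transducer and numbers keys in lexicographic order. The names `Node`, `build`, and `path_number` are hypothetical, and the patent's own numbering scheme may differ.

```python
class Node:
    """State of a toy acyclic automaton (a trie, not a minimized FST)."""

    def __init__(self):
        self.children = {}
        self.final = False
        self.count = 0  # number of keys accepted from this state onward


def _count(node):
    # Post-order pass that fills in the per-state key counts.
    node.count = int(node.final) + sum(_count(c) for c in node.children.values())
    return node.count


def build(keys):
    """Build the automaton for a collection of keys and annotate counts."""
    root = Node()
    for key in keys:
        node = root
        for ch in key:
            node = node.children.setdefault(ch, Node())
        node.final = True
    _count(root)
    return root


def path_number(root, key):
    """Return the 0-based lexicographic rank of key, or None if absent."""
    index = 0
    node = root
    for ch in key:
        if node.final:
            index += 1  # a shorter key ending here sorts before this one
        if ch not in node.children:
            return None
        # Skip every branch whose label sorts before the one we follow.
        for label, child in node.children.items():
            if label < ch:
                index += child.count
        node = node.children[ch]
    return index if node.final else None
```

    With the keys `car`, `cat`, `do`, and `dog`, the path numbers are 0 to 3, so a list of database entries stored in the same order can be addressed directly, for example `entries[path_number(root, "dog")]`.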

    System And Method For Generating, Updating, And Using Meaningful Tags
    5.
    Type: Patent application
    Status: In force

    Publication No.: US20130159306A1

    Publication date: 2013-06-20

    Application No.: US13330488

    Filing date: 2011-12-19

    IPC classification: G06F17/30 G06F7/00

    Abstract: A system and method for generating tag glossaries and use thereof is provided. A set of tags is accessed. Each tag is associated with a glossary that includes one or more terms and definitions for the terms. A new tag is generated and a new glossary is generated for the new tag based on the glossaries associated with the set of tags. The tag glossaries can be used to provide context for documents associated with the tags, to determine appropriate tags for untagged documents, to help in search for other documents, and to build indices for documents or collections of documents.

Modifying an input string partitioned in accordance with directionality and length constraints
    6.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US6023760A

    Publication date: 2000-02-08

    Application No.: US857942

    Filing date: 1997-05-16

    IPC classification: G06F17/28 G06F17/27 G06F17/20

    Abstract: A processor implemented method of modifying a string of a regular language, which includes at least two symbols and at least two predetermined substrings. Upon receipt of the string, the processor determines an initial position within the string of a substring matching one of the preselected substrings. To make this determination, the processor matches symbols of the string either starting from the left and proceeding to the right or starting from the right and proceeding to the left. After identifying the initial position, the processor then selects either the longest or the shortest of the preselected substrings. The processor then replaces the matching substring with the string of the lower language associated with the selected preselected substring and outputs the modified string.
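
    The two constraints named in this abstract, scan direction and match length, can be shown with a small string-level sketch. It assumes that replacement is applied repeatedly across the whole input and that a dictionary of substring-to-replacement pairs stands in for the transducer. The function name `directed_replace` and its parameters are hypothetical.

```python
def directed_replace(text, rules, left_to_right=True, longest=True):
    """Replace rule substrings in text under direction and length constraints.

    rules maps each preselected substring to its replacement string.
    Scanning proceeds from the chosen side; at each position where at
    least one rule substring matches, the longest or the shortest match
    is selected, replaced, and skipped over.
    """
    if not left_to_right:
        # A right-to-left scan equals a left-to-right scan over the
        # reversed text with reversed rules, reversed back at the end.
        flipped = {k[::-1]: v[::-1] for k, v in rules.items()}
        return directed_replace(text[::-1], flipped, True, longest)[::-1]

    pick = max if longest else min
    out = []
    i = 0
    while i < len(text):
        candidates = [k for k in rules if k and text.startswith(k, i)]
        if candidates:
            match = pick(candidates, key=len)
            out.append(rules[match])
            i += len(match)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

    With the rules `{"ab": "X", "abc": "Y", "bc": "Z"}` and the input `abcd`, the four combinations of direction and length preference select different substrings, which is why both constraints have to be fixed before the result is well defined.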

    Finite-state encoding system for hyphenation rules
    7.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US5737621A

    Publication date: 1998-04-07

    Application No.: US469173

    Filing date: 1995-06-06

    CPC classification: G06F17/26

    Abstract: Valid positions for hyphens in input strings are determined by reading in and processing the symbols of the input string through a finite state transducer which has a state-transition data structure determined by a compilation of a set of hyphenation rules. The encoding system can output a hyphenated string, or can accept a hyphenated string and output an indication of whether the input hyphenation is proper according to the set of hyphenation rules.
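
    The two modes described here, producing a hyphenated string and checking an already hyphenated one, can be sketched as follows. The sketch assumes a single toy rule (a hyphen is allowed between two consonants that are flanked by vowels) evaluated directly in Python rather than compiled into a transducer. The rule and the names `hyphen_positions`, `hyphenate`, and `is_proper` are hypothetical and do not come from the patent.

```python
VOWELS = set("aeiou")


def hyphen_positions(word):
    """Indices i such that a hyphen may be placed just before word[i].

    Toy rule: vowel, consonant | consonant, vowel (for example bet-ter).
    """
    w = word.lower()
    return {
        i
        for i in range(2, len(w) - 1)
        if w[i - 2] in VOWELS
        and w[i - 1] not in VOWELS
        and w[i] not in VOWELS
        and w[i + 1] in VOWELS
    }


def hyphenate(word):
    """Generation mode: insert a hyphen at every allowed position."""
    allowed = hyphen_positions(word)
    return "".join(("-" if i in allowed else "") + ch for i, ch in enumerate(word))


def is_proper(hyphenated):
    """Checking mode: True if every hyphen sits at an allowed position."""
    allowed = hyphen_positions(hyphenated.replace("-", ""))
    used = set()
    pos = 0  # index into the word with hyphens removed
    for ch in hyphenated:
        if ch == "-":
            used.add(pos)
        else:
            pos += 1
    return used <= allowed
```

    Both modes share `hyphen_positions`, which mirrors the idea that one compiled rule set serves generation and validation alike.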

Context-sensitive method of finding information about a word in an electronic dictionary
    8.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US5642522A

    Publication date: 1997-06-24

    Application No.: US396286

    Filing date: 1995-02-28

    IPC classification: G06F17/21 G06F17/27 G06F17/30

    CPC classification: G06F17/274

    Abstract: A technique of using an electronic dictionary in conjunction with electronically-encoded running text that gives the user the most relevant information rather than belaboring the user with all possible information about a selected word. The technique maps the selected word from its inflected form to its citation form, analyzes the selected word in the context of neighboring and surrounding words to resolve ambiguities, and displays the information that is determined to be the most likely to be relevant. The dictionary preferably has information about multi-word combinations that include the selected word, and the context determination typically entails checking whether the selected word is part of a predefined multi-word combination.
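
    The lookup order described in this abstract (inflected form to citation form, then a multi-word check against the neighboring words, then the single-word entry) can be sketched with toy tables. The tables `LEMMAS`, `ENTRIES`, and `MULTIWORD` and the function `lookup` are hypothetical illustrations. The patent's actual morphological analysis and disambiguation are not reproduced here.

```python
# Hypothetical toy data: inflected form -> citation form.
LEMMAS = {"kicked": "kick", "kicks": "kick", "buckets": "bucket"}

# Hypothetical dictionary entries keyed by headword.
ENTRIES = {
    "kick": "to strike with the foot",
    "bucket": "a container for liquids",
    "kick the bucket": "(idiom) to die",
}

# Predefined multi-word combinations, as tuples of citation forms.
MULTIWORD = {("kick", "the", "bucket"): "kick the bucket"}


def lookup(words, index):
    """Return (headword, definition) for the word selected at words[index].

    The selected word and its neighbors are first mapped to citation
    forms. If the selected word lies inside a predefined multi-word
    combination, that entry wins; otherwise the single-word entry is used.
    """
    lemmas = [LEMMAS.get(w.lower(), w.lower()) for w in words]
    for phrase, headword in MULTIWORD.items():
        n = len(phrase)
        first = max(0, index - n + 1)
        last = min(index, len(lemmas) - n)
        # Try every window of length n that covers the selected word.
        for start in range(first, last + 1):
            if tuple(lemmas[start:start + n]) == phrase:
                return headword, ENTRIES[headword]
    lemma = lemmas[index]
    return lemma, ENTRIES.get(lemma)
```

    Selecting "kicked" in "He kicked the bucket yesterday" returns the idiom entry, while the same word in "She kicked the ball" falls back to the entry for the citation form `kick`.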
