System for categorizing character strings using acceptability and category information contained in ending substrings
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US5488719A

    Publication date: 1996-01-30

    Application No.: US814552

    Filing date: 1991-12-30

    IPC classification: G06K9/68 G06F17/30

    CPC classification: G06K9/6807

    Abstract: A data storage medium stores string data that can be used in character recognition and instructions for accessing the string data. The string data includes data units that can be accessed by a processor in executing the instructions. The processor can use character data indicating characters of a string to access a sequence of the data units that ends with an ending subsequence. The ending subsequence includes acceptance information indicating whether a string whose sequence of data units ends with the ending subsequence is an acceptable string. If so, the ending subsequence also includes category set information indicating a set of categories for strings whose sequences end with the ending subsequence. The categories can include words, numbers, compound words, and so forth. The acceptance information can include a bit in a character label data unit that includes information indicating the character type of an ending character. The acceptance information can also include an acceptance data unit whose value indicates an acceptable string ending. The acceptance data unit can be followed by category data units, each with a value indicating a category. The category data units can be used to obtain a bit vector for a string, each bit of which indicates whether the string is in one of the categories. For compactness, all or part of an ending subsequence can be shared by plural acceptable strings. Looping can be used to represent a category with a potentially infinite number of strings, such as numbers.
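    The abstract describes a lookup structure in which the characters of a string select a path, and the data at the end of that path says whether the string is acceptable and, if so, which categories it belongs to. The Python sketch below models that idea with an ordinary node-and-edge automaton rather than the patent's packed sequence of data units, and it stores the acceptance bit as a node field instead of inside a character label data unit. The names `Node`, `categorize`, `DIGIT`, `build_example`, and the category bit positions are hypothetical. It illustrates an acceptance flag, a category bit vector, an ending shared by two words, and a self-loop that accepts numbers of any length.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical category bit positions (illustrative, not taken from the patent).
WORD, NUMBER = 0, 1
CATEGORY_NAMES = ["word", "number"]

DIGIT = "<digit>"  # hypothetical class label shared by all digit characters


@dataclass(eq=False)
class Node:
    # Outgoing transitions keyed by a character or by a class label such as DIGIT.
    edges: Dict[str, "Node"] = field(default_factory=dict)
    # Acceptance information: can an acceptable string end at this node?
    accept: bool = False
    # Category set information: one bit per category, used only when accept is True.
    categories: int = 0


def label(ch: str) -> str:
    """Map a character to the edge label used to look it up."""
    return DIGIT if ch.isdigit() else ch


def categorize(root: Node, s: str) -> Optional[int]:
    """Walk s from root; return its category bit vector, or None if s is not acceptable."""
    node = root
    for ch in s:
        nxt = node.edges.get(label(ch))
        if nxt is None:
            return None  # no stored sequence continues with this character
        node = nxt
    return node.categories if node.accept else None


def category_names(bits: int) -> List[str]:
    """Decode a category bit vector into readable names."""
    return [name for i, name in enumerate(CATEGORY_NAMES) if (bits >> i) & 1]


def build_example() -> Node:
    root = Node()
    # Shared ending: "cat" and "bat" reuse the same "a" -> "t" nodes, so the
    # acceptance and category information is stored once for both strings.
    t_end = Node(accept=True, categories=1 << WORD)
    a_node = Node(edges={"t": t_end})
    root.edges["c"] = Node(edges={"a": a_node})
    root.edges["b"] = Node(edges={"a": a_node})
    # Looping: a digit node that transitions to itself accepts digit runs of any length.
    digits = Node(accept=True, categories=1 << NUMBER)
    digits.edges[DIGIT] = digits
    root.edges[DIGIT] = digits
    return root


root = build_example()
print(category_names(categorize(root, "cat")))   # ['word']
print(category_names(categorize(root, "2024")))  # ['number']
print(categorize(root, "ca"))   # None: "ca" is a prefix, not an acceptable ending
print(categorize(root, "dog"))  # None: no path for "d"
```

    Because "cat" and "bat" point at the same `a_node`, adding another category to that shared ending would change both strings at once; this is the space saving the abstract attributes to sharing, and also the constraint it imposes on which strings can share an ending.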
