Non-text object storage and retrieval
    11.
    Granted invention patent (status: expired)

    Publication No.: US5404435A

    Publication date: 1995-04-04

    Application No.: US195043

    Filing date: 1994-02-14

    IPC classification: G06F12/00 G06F17/30 G06F3/03

    Abstract: The presence of a non-text object is sensed in a mixed object document to be archived in an information retrieval system. In addition to text objects, a mixed object document can contain non-text objects such as image objects, graphics objects, formatted objects, font objects, voice objects, video objects and animation objects. Sensing the object enables the creation of key words which characterize the non-text object, for incorporation in the inverted file index of the data base, thereby enabling the later retrieval of either the entire document or the independent retrieval of the non-text object through the use of such key words.
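The indexing idea in this abstract can be illustrated with a minimal Python sketch. It assumes a toy in-memory inverted index; the function names (`build_index`, `index_document`, `search`), the object ids, and the sample key words are hypothetical and do not come from the patent.

```python
from collections import defaultdict

def build_index():
    # Inverted index: key word -> set of (document id, object id) postings.
    # An object id of None stands for the document's text as a whole.
    return defaultdict(set)

def index_document(index, doc_id, text, non_text_objects):
    """Index the text words and the key words that characterize each non-text object.

    non_text_objects maps a (hypothetical) object id to (object type, key words).
    """
    for word in text.lower().split():
        index[word].add((doc_id, None))
    for obj_id, (obj_type, keywords) in non_text_objects.items():
        for kw in [obj_type, *keywords]:
            index[kw.lower()].add((doc_id, obj_id))

def search(index, keyword):
    """Return (matching documents, matching non-text objects) for one key word."""
    postings = index.get(keyword.lower(), set())
    docs = {doc for doc, _ in postings}
    objects = {(doc, obj) for doc, obj in postings if obj is not None}
    return docs, objects

index = build_index()
index_document(index, "memo-1", "quarterly sales summary",
               {"img-1": ("image", ["chart", "revenue"])})
docs, objects = search(index, "revenue")
```

Here a query for a key word attached to the image returns both the containing document and the image itself, which mirrors the "entire document or independent object" retrieval described above.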

    System and method for deferred processing of OCR scanned mail
    12.
    Granted invention patent (status: expired)

    Publication No.: US5031223A

    Publication date: 1991-07-09

    Application No.: US426617

    Filing date: 1989-10-24

    Abstract: The invention is characterized as a data processing architecture and method for multi-stage processing of mail, using knowledge based techniques. The system includes OCR-scanning a multipart address field of a mail piece at a sending location, the address field including at least two portions, a first stage routing portion (destination city, state, country, zip code) and a second stage routing portion (destination street address, building floor, corporate addressee internal routing). At the sending location, the image of the entire address field is captured by an OCR head and stored in memory. A serial number is printed on the mail piece. The first routing portion is then converted into sorting signals to sort the mail piece to a truck at the sending location which is to be dispatched to the city, state and country indicated in the first stage routing portion. Then, while the mail piece is in transit by truck to the destination city, the image of the second stage routing portion is analyzed by a knowledge base processor to resolve street addresses, building floor, corporate addressee internal routing information and addressee name. The deferred execution of the analysis by the knowledge base processor is available because of the sporadic volume of mail pieces submitted to the system. While the mail piece is in transit on the truck, the knowledge processor completes its analysis and is able to transmit by electronic communications link to the destination location, the information that the mail piece is on its way and the second stage routing information needed to automatically sort and deliver the mail piece to its corporate addressee.
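The two-stage flow can be sketched roughly in Python as follows. This is only an illustration under stated assumptions: the address "image" is represented as a plain string, the `resolve` callable is a stand-in for the knowledge base processor, and all names (`first_stage_sort`, `DeferredResolver`) are hypothetical.

```python
from collections import deque

def first_stage_sort(address_field):
    """Immediate step: pick the outbound truck from the first-stage portion only."""
    return address_field["city"], address_field["state"]

class DeferredResolver:
    """Queues second-stage address images for analysis while the mail is in transit."""

    def __init__(self, resolve):
        self.pending = deque()
        self.resolve = resolve  # stand-in for the knowledge base processor

    def submit(self, serial, second_stage_image):
        # The serial number printed on the mail piece ties the result back to it.
        self.pending.append((serial, second_stage_image))

    def run(self):
        """Drain the queue; returns serial number -> second-stage routing info."""
        results = {}
        while self.pending:
            serial, image = self.pending.popleft()
            results[serial] = self.resolve(image)
        return results

truck = first_stage_sort({"city": "Austin", "state": "TX"})
resolver = DeferredResolver(lambda image: {"street": image.strip().title()})
resolver.submit(1001, " 12 main st ")
routing = resolver.run()  # would be transmitted ahead to the destination
```

The point of the design is that only the cheap first-stage decision blocks the physical sort; the expensive analysis runs later, keyed by serial number.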

Alpha content match prescan method for automatic spelling error correction
    13.
    Granted invention patent (status: expired)

    Publication No.: US4328561A

    Publication date: 1982-05-04

    Application No.: US108000

    Filing date: 1979-12-28

    Abstract: A system for reducing the computation required to match a misspelled word against various candidates from a dictionary to find one or more words that represent the best match to the misspelled word. The major facility offered is the ability to computationally discern the degree of apparent match that exists between words that do not perfectly match a given target word without requiring the computationally tedious procedure of character by character positional matching which necessitates shifting and realignment to accommodate for differences between the candidate and target words due to character differences or added and dropped syllables. The system includes a method for storing and retrieving words from the dictionary based on their likelihood of being the correct version of a misspelled word and then reviewing those words further using the Prescan Alpha Content Match to reduce the number of candidates that must then be examined in a high resolution positional match to find the candidate(s) which matches the misspelled word with the greatest character affinity. The Prescan Alpha Content Match reduces the number of candidates in contention so as to make a high resolution match computationally feasible on a real-time basis.
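A rough Python sketch of the prescan-then-match idea follows. It is not the patented procedure: the position-independent score here is a simple letter-multiset overlap, the 0.75 threshold is arbitrary, and `difflib.SequenceMatcher` is swapped in as a stand-in for the high resolution positional match. All function names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

def alpha_content_score(word, candidate):
    """Fraction of letters the two words share, ignoring position entirely."""
    shared = sum((Counter(word) & Counter(candidate)).values())
    return shared / max(len(word), len(candidate))

def prescan(word, candidates, threshold=0.75):
    """Cheap filter: keep only candidates whose letter content is close enough."""
    return [c for c in candidates if alpha_content_score(word, c) >= threshold]

def best_match(word, candidates, threshold=0.75):
    """Run the costlier positional comparison only on prescan survivors."""
    survivors = prescan(word, candidates, threshold)
    return max(survivors,
               key=lambda c: SequenceMatcher(None, word, c).ratio(),
               default=None)

survivors = prescan("recieve", ["receive", "recipe", "relieve", "believe", "zebra"])
```

Because the prescan never aligns characters, it avoids the shifting and realignment cost and leaves only a short list for the positional comparison.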

Mixed mode enhanced resolution hyphenation function for a text processing system
    14.
    Granted invention patent (status: expired)

    Publication No.: US4574363A

    Publication date: 1986-03-04

    Application No.: US397703

    Filing date: 1982-07-13

    CPC classification: G06F17/25 G06F17/26

    Abstract: The combination of dictionary driven hyphenation, specialized algorithmic hyphenation and intelligent blank insertion provides improved right margin justification capability in a text processing system. When hyphenation is required for right margin justification, the system compares the word to be hyphenated to a prestored dictionary of words containing hyphenation points. When the word to be hyphenated matches one of the dictionary words, the hyphenation points are retrieved and the word is split at the right margin. If the word to be hyphenated does not match one of the dictionary words, then a specialized list of prestored hyphenated suffixes and prestored statistical character digrams are compared to the word to determine the appropriate hyphenation points. Once the word has been split, the system searches the line for sets of predetermined words which may be separated from other words in the sentence by adding space to the line with a minimum of aesthetic distortion. Space is then added to the line until the line ending equals the right margin. The text is then printed.
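The dictionary-first, suffix-fallback lookup can be sketched in Python as below. This is a simplified illustration: the dictionary entries and suffix list are made up, the statistical digram step and the blank-insertion step are omitted, and the function names are hypothetical.

```python
# Hypothetical prestored dictionary with hyphenation points marked by "-".
HYPHENATION_DICT = {"processing": "pro-cess-ing", "system": "sys-tem"}
# Hypothetical prestored hyphenated suffixes, used when the dictionary misses.
SUFFIXES = ["-tion", "-ment", "-ing"]

def hyphenation_points(word):
    """Return the indices in `word` where a hyphen may be inserted."""
    entry = HYPHENATION_DICT.get(word)
    if entry is not None:
        points, pos = [], 0
        for part in entry.split("-")[:-1]:
            pos += len(part)
            points.append(pos)
        return points
    for suffix in SUFFIXES:
        bare = suffix.lstrip("-")
        if word.endswith(bare) and len(word) > len(bare):
            return [len(word) - len(bare)]
    return []

def split_at_margin(word, room):
    """Split `word` so the first part plus its hyphen fits in `room` columns."""
    fitting = [p for p in hyphenation_points(word) if p + 1 <= room]
    if not fitting:
        return None  # no legal break fits; the word moves to the next line
    p = max(fitting)
    return word[:p] + "-", word[p:]
```

For example, `split_at_margin("processing", 8)` picks the latest break that still fits, so the line is filled as far as possible before any space has to be added.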

Method for identification and compression of facsimile symbols in text processing systems
    15.
    Granted invention patent (status: expired)

    Publication No.: US4499499A

    Publication date: 1985-02-12

    Application No.: US454230

    Filing date: 1982-12-29

    CPC classification: G06K9/80 H04N1/4115

    Abstract: An improved system for identifying and compacting text data to be transmitted over communications lines and thereby reducing the data volume and transmission time. Transmitting and receiving text processing systems are provided with identical library memories containing words commonly used in correspondence. Each word in a document to be communicated is compared to the transmitting system's word library and, if found in the library, only the library address is transmitted. If the word is not found in the library, then it is added to the transmitting system's library, sent, and added to the receiving system's library. The receiving system reconstructs the document by using the received addresses to access the appropriate words from its library and place them in the document. The system combines this word match encoding with character match encoding and facsimile run length encoding for communicating words not found in the system library. The character match process requires a template match and non-linear difference code summation combined with N-dimensional weighting using prestored feature vectors for statistically determining the match between an input character and characters stored in the system library.
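The word match encoding step can be illustrated with a short Python sketch. It covers only the shared-library idea; the character match and run length encoding stages are left out, and the token format (`"addr"` / `"word"` tuples) and function names are assumptions made for the example.

```python
def encode(words, library):
    """Sender side: emit a library address for known words, else the literal word.

    `library` maps word -> address and is updated when a new word is sent.
    """
    tokens = []
    for w in words:
        if w in library:
            tokens.append(("addr", library[w]))
        else:
            library[w] = len(library)  # next free address
            tokens.append(("word", w))
    return tokens

def decode(tokens, library):
    """Receiver side: `library` is a list indexed by address, kept in step with the sender."""
    document = []
    for kind, value in tokens:
        if kind == "addr":
            document.append(library[value])
        else:
            library.append(value)  # same address the sender just assigned
            document.append(value)
    return document

shared = ["dear", "sir", "the"]
sender_library = {w: i for i, w in enumerate(shared)}
receiver_library = list(shared)
words = "dear sir the invoice the invoice".split()
tokens = encode(words, sender_library)
decoded = decode(tokens, receiver_library)
```

The second occurrence of "invoice" is sent as an address, because both sides added it to their libraries at the same position when it was first transmitted.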

    Office correspondence storage and retrieval system
    16.
    Granted invention patent (status: expired)

    Publication No.: US4358824A

    Publication date: 1982-11-09

    Application No.: US107994

    Filing date: 1979-12-28

    Abstract: A system that intelligently abstracts and archives a document for storage and interprets a free-form user retrieval query to recall the document from the storage file. The system includes a method for automatically selecting keywords from the document using a parts-of-speech directory. A method is given for weighing the importance or centrality of each keyword with respect to the document of its origin. Using the same logic paths, the system takes a free-form query that describes the document in the same manner that it would have to be described to a secretary to "find" it in a filing cabinet, automatically determines the key matching terms, and finds the archived document(s) with the greatest affinity.
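A minimal Python sketch of the archive-and-query loop is given below. The abstract does not specify the weighting scheme, so relative frequency is swapped in as a placeholder, and a small stop-word set stands in for the parts-of-speech directory. Names such as `keywords`, `retrieve`, and `FUNCTION_WORDS` are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for the parts-of-speech directory:
# words listed here are treated as non-keywords.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "for",
                  "and", "about", "from", "is"}

def keywords(text):
    """Select keywords and weight each by its relative frequency in the text."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    content = [w for w in words if w and w not in FUNCTION_WORDS]
    counts = Counter(content)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}

def retrieve(query, archive):
    """Rank archived documents by the summed weight of keywords shared with the query."""
    q = keywords(query)  # the query goes through the same logic path as documents
    scores = {doc_id: sum(wt for w, wt in kw.items() if w in q)
              for doc_id, kw in archive.items()}
    return sorted((d for d in scores if scores[d] > 0), key=lambda d: -scores[d])

archive = {
    "a": keywords("Budget report for the marketing department budget"),
    "b": keywords("Travel policy for the engineering department"),
}
hits = retrieve("the letter about the marketing budget", archive)
```

Running the query through the same `keywords` function as the documents is what lets a loosely worded description line up with the archived keyword weights.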

Instantaneous alpha content prescan method for automatic spelling error correction
    17.
    Granted invention patent (status: expired)

    Publication No.: US4355371A

    Publication date: 1982-10-19

    Application No.: US133707

    Filing date: 1980-03-25

    IPC classification: G06K9/72 G06F17/27 G06F7/02

    CPC classification: G06F17/273

    Abstract: A system for reducing the computation required to match a misspelled word against various candidates from a dictionary to find one or more words that represent the best match to the misspelled word. The major facility offered is the ability to computationally discern the degree of apparent match that exists between words that do not perfectly match a given target word without requiring the computationally tedious procedure of character by character positional matching which necessitates shifting and realignment to accommodate for differences between the candidate and target words due to character differences or added and dropped syllables. The system includes a method for storing and retrieving words from the dictionary based on their likelihood of being the correct version of a misspelled word and then reviewing those words further to reduce the number of candidates that must then be examined in a high resolution positional match to find the candidate(s) which matches the misspelled word with the greatest character affinity. This technique reduces the number of candidates in contention so as to make a high resolution match computationally feasible on a real-time basis. The discriminant potential and the real-time computational burden associated with the technique are balanced in an optimal manner.

    Stem processing for data reduction in a dictionary storage file
    18.
    Granted invention patent (status: expired)

    Publication No.: US4342085A

    Publication date: 1982-07-27

    Application No.: US1123

    Filing date: 1979-01-05

    CPC classification: G06F17/273

    Abstract: A system for reducing storage requirements and access times in a text processing machine for automatic spelling verification and hyphenation functions. The system includes a method for storing a word list file and accessing the word list file such that legal prefixes and suffixes are truncated and only the unique root element, or "stem", of a word is stored. A set of unique rules is provided for prefix/suffix removal during compilation of the word list file and subsequent accessing of the word list file. Spelling verification is accomplished by applying the rules to the words whose spelling is to be verified, and application of these rules provides, under most circumstances, a natural hyphenation break point at the prefix-stem and stem-suffix junctions.
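The stem-only storage idea can be sketched in Python as follows. The prefix, suffix, and stem lists are invented for the example, and the stripping rule is deliberately naive (it ignores spelling changes such as doubled consonants); the patent's actual rule set is not reproduced here.

```python
PREFIXES = ["un", "re"]          # hypothetical legal prefixes
SUFFIXES = ["ing", "ed", "s"]    # hypothetical legal suffixes
STEMS = {"lock", "play", "do"}   # hypothetical stored word list (stems only)

def analyze(word):
    """Return (prefix, stem, suffix) if the word reduces to a stored stem, else None."""
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem in STEMS:
                return prefix, stem, suffix
    return None

def verify(word):
    """Spelling verification: the word is accepted if it reduces to a stored stem."""
    return analyze(word) is not None

def hyphenate(word):
    """Break points fall at the prefix-stem and stem-suffix junctions."""
    parts = analyze(word)
    return "-".join(p for p in parts if p) if parts else word
```

With these lists, `hyphenate("unlocking")` yields `"un-lock-ing"`, showing how the same affix analysis serves both verification and hyphenation while only "lock" is stored.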

    Multi-channel recognition discriminator

    Publication No.: US3988715A

    Publication date: 1976-10-26

    Application No.: US625618

    Filing date: 1975-10-24

    CPC classification: G06K9/72 G06K2209/01

    Abstract: A multi-channel multi-genre character recognition discriminator is disclosed which performs the decision making process between strings of characters coming from a multi-channel (i.e., three or more channels) alpha-numeric output optical character reader (OCR) system for use in such applications as, for example, text processing and mail processing. The multi-channel output OCR uses separate recognition processes for each genre or character set indicative of a distinct group with respect to style (i.e., font) or form, and attempts to recognize each character independently as belonging to each respective genre. For example, in a three channel output OCR for reading mixed numeric, English and Russian Cyrillic character sets, the English alphabetic interpretation of a scanned word is outputted as an English alphabetic subfield on a first OCR output line, the Cyrillic interpretation of the scanned word is outputted as a Cyrillic subfield on a second OCR output line, and numeric interpretation of the scanned word is outputted as a numeric subfield on a third OCR output line. A multi-channel multi-genre character recognition discriminator analyzes these three subfield character streams by calculating a first conditional probability that given the OCR has scanned and recognized an English alphabetic character E_i, the probability that numeric N_K and Cyrillic C_J characters were respectively misrecognized by their recognition channels; a second conditional probability that given the OCR has scanned and recognized a Cyrillic character C_J, the probability that numeric N_K and English E_i characters were respectively misrecognized by their recognition channels; and a third conditional probability that given the OCR scanned and recognized a numeric character N_K, the probability that English E_i and Cyrillic C_J characters were respectively misrecognized by their recognition channels.
    These conditional probabilities are developed character by character for each character within a string thereof or a word. A first product of all the first type conditional probabilities is calculated for all of the characters in a word (which may, of course, contain only a single character); similarly second and third products are calculated for the second and third conditional probabilities, respectively. The magnitudes of the products of these conditional probabilities are then compared in an N-channel comparator, and the highest probability subfield is selected as the most probable interpretation of the word scanned by the OCR.
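The product-and-compare decision can be sketched in Python as below. Several things are assumptions made for the example: the confusion probabilities are invented, the joint probability of the other channels' outputs is approximated as a product of per-channel terms (i.e. the channels are treated as independent), unseen pairs get an arbitrary small default, and all channel outputs are assumed to have equal length.

```python
# Hypothetical confusion table:
# CONFUSION[(true genre, true char, other genre)][observed char] is the
# probability that the other genre's channel outputs `observed char` when the
# scanned character really is `true char`. The numbers are made up.
CONFUSION = {
    ("numeric", "6", "english"): {"b": 0.6},
    ("numeric", "6", "cyrillic"): {"б": 0.7},
    ("english", "b", "numeric"): {"6": 0.5},
    ("english", "b", "cyrillic"): {"б": 0.2},
    ("cyrillic", "б", "numeric"): {"6": 0.6},
    ("cyrillic", "б", "english"): {"b": 0.3},
}
DEFAULT = 0.01  # arbitrary floor for pairs missing from the table

def channel_score(genre, outputs):
    """Product over all characters of P(other channels' outputs | this channel is right)."""
    others = [g for g in outputs if g != genre]
    score = 1.0
    for i, ch in enumerate(outputs[genre]):
        for other in others:
            table = CONFUSION.get((genre, ch, other), {})
            score *= table.get(outputs[other][i], DEFAULT)
    return score

def discriminate(outputs):
    """N-channel comparison: pick the genre whose product is largest."""
    return max(outputs, key=lambda g: channel_score(g, outputs))

outputs = {"english": "bb", "cyrillic": "бб", "numeric": "66"}
chosen = discriminate(outputs)
```

With these made-up tables the numeric reading wins, because a true "6" explains the "b" and "б" outputs of the other two channels better than either of them explains the rest.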