Abstract:
An improved system for compacting text data to be transmitted over communications lines, thereby reducing the data volume and transmission time. Transmitting and receiving text processing systems are provided with identical library memories containing words commonly used in correspondence. Each word in a document to be communicated is compared to the transmitting system's word library and, if found in the library, only the library address is transmitted. If the word is not found in the library, then it is added to the transmitting system's library, sent, and added to the receiving system's library. The receiving system reconstructs the document by using the received addresses to access the appropriate words from its library and place them in the document. The system combines this word match encoding with character match encoding and facsimile run length encoding for communicating words not found in the system library.
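The word match step described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names `encode` and `decode`, the token tuples, and the use of a Python list index as the "library address" are all assumptions. Unknown words are sent here as plain text, whereas the abstract sends them with character match and run length encoding.

```python
def encode(words, library):
    """Encode each word as ("addr", index) if it is in the library,
    otherwise as ("new", word), adding the new word to the library."""
    index = {w: i for i, w in enumerate(library)}
    tokens = []
    for w in words:
        if w in index:
            tokens.append(("addr", index[w]))
        else:
            # Unknown word: add it to the transmitter's library and send it.
            index[w] = len(library)
            library.append(w)
            tokens.append(("new", w))
    return tokens


def decode(tokens, library):
    """Rebuild the word sequence, growing the receiver's library in step."""
    words = []
    for kind, value in tokens:
        if kind == "addr":
            words.append(library[value])
        else:
            # New word: add it to the receiver's library at the same address.
            library.append(value)
            words.append(value)
    return words
```

Because both sides append new words in the same order, the two libraries stay identical, and a later occurrence of a previously unknown word can be sent as an address alone.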
Abstract:
Method and system for reducing the computation required to match a misspelled word against various candidates from a dictionary to find one or more words that represent the best match to the misspelled word. The method compares (steps 20-24) bit masks whose bits are set to reflect the presence or absence of specific characters or character combinations, without regard to position, in the misspelled word and in each of the dictionary candidate words. A candidate word is then dismissed from additional processing (steps 25-27) if the masks of the misspelled word and the candidate word do not match to a predetermined percentage.
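A minimal sketch of such a bit mask prefilter is given below. Several details are assumptions rather than facts from the abstract: the mask covers only the 26 single letters (no character combinations), the match measure is shared bits divided by bits set in either mask, and the 0.6 threshold and the function names are hypothetical.

```python
def char_mask(word):
    """Set bit i when the letter chr(ord('a') + i) occurs anywhere in word."""
    mask = 0
    for ch in word.lower():
        if "a" <= ch <= "z":
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def mask_match(m1, m2):
    """Fraction of bits set in either mask that are set in both (assumed measure)."""
    union = m1 | m2
    if union == 0:
        return 1.0
    return bin(m1 & m2).count("1") / bin(union).count("1")


def filter_candidates(misspelled, candidates, threshold=0.6):
    """Keep only candidates whose mask matches the misspelled word's mask
    to at least `threshold`; the rest are dismissed before costlier matching."""
    target = char_mask(misspelled)
    return [c for c in candidates if mask_match(target, char_mask(c)) >= threshold]
```

The filter costs a few bitwise operations per candidate, so only the surviving words need to go through a more expensive position-sensitive comparison.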
Abstract:
The combination of dictionary driven hyphenation, specialized algorithmic hyphenation and intelligent blank insertion provides improved right margin justification capability in a text processing system. When hyphenation is required for right margin justification, the system compares the word to be hyphenated to a prestored dictionary of words containing hyphenation points. When the word to be hyphenated matches one of the dictionary words the hyphenation points are retrieved and the word is split at the right margin. If the word to be hyphenated does not match one of the dictionary words, then a specialized list of prestored hyphenated suffixes and prestored statistical character digrams are compared to the word to determine the appropriate hyphenation points. Once the word has been split, the system searches the line for sets of predetermined words which may be separated from other words in the sentence by adding space to the line with a minimum of aesthetic distortion. Space is then added to the line until the line ending equals the right margin. The text is then printed.
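The dictionary lookup with a suffix fallback can be sketched roughly as below. The dictionary entries, the suffix list, and the function names are hypothetical, and the sketch leaves out the statistical character digrams and the blank insertion step that the abstract also describes.

```python
# Hypothetical prestored data; a real system would hold far larger tables.
HYPHEN_DICT = {"justification": "jus-ti-fi-ca-tion"}
SUFFIXES = ["tion", "ment", "ing"]


def hyphen_points(word):
    """Return the character offsets at which the word may be split."""
    entry = HYPHEN_DICT.get(word.lower())
    if entry is not None:
        # Dictionary hit: derive offsets from the stored hyphenated form.
        points, offset = [], 0
        for part in entry.split("-")[:-1]:
            offset += len(part)
            points.append(offset)
        return points
    # Fallback: allow a split just before a known suffix.
    for suffix in SUFFIXES:
        if word.lower().endswith(suffix) and len(word) > len(suffix):
            return [len(word) - len(suffix)]
    return []


def split_at_margin(word, room):
    """Split word so that the head plus a hyphen fits in `room` columns.
    Returns (head_with_hyphen, tail), or None if no split point fits."""
    fits = [p for p in hyphen_points(word) if p + 1 <= room]
    if not fits:
        return None
    p = max(fits)
    return word[:p] + "-", word[p:]
```

Choosing the largest split point that fits keeps as much of the word as possible on the current line, which reduces the space that must later be added for justification.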
Abstract:
A system for automatically proofreading a document for word use validation in a text processing system is provided by coupling a specialized dictionary of sets of homophones and confusable words to sets of di-gram and N-gram conditions whereby proper usage of the words can be statistically determined. A text document is reviewed word-by-word against a dictionary of homophones and confusable words. When a match occurs, the related list of syntactic rules is examined relative to the context of the subject homophone or confusable word. If the syntax in the immediate context of the homophone or confusable word conflicts with the prestored syntax rules, the homophone or confusable word is highlighted on the system display. The system then displays the definition of the highlighted word along with possible intended alternative forms and their respective definitions. The operator can examine the word used and the possible alternatives and make a determination as to whether an error has been made and if a correction of the text is required. If correction is required, the operator may cause the error word to be replaced by the desired word by positioning the display cursor under the desired word and depressing an appropriate key on the system keyboard.
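The review loop can be illustrated with a toy sketch. Everything concrete here is an assumption: the two confusable entries, the rule format (a set of words that should not directly follow the confusable word), and the name `flag_words`. The abstract's rules are richer di-gram and N-gram conditions, and the real system displays definitions to the operator rather than returning tuples.

```python
# Hypothetical confusable-word dictionary: word -> alternative forms.
CONFUSABLES = {
    "their": ["there", "they're"],
    "there": ["their", "they're"],
}

# Hypothetical context rules: following words that conflict with the usage.
BAD_NEXT = {
    "their": {"is", "are", "was"},
    "there": {"house", "car"},
}


def flag_words(text):
    """Return (position, word, alternatives) for each confusable word whose
    immediate right-hand context conflicts with its stored rule."""
    words = text.lower().split()
    flags = []
    for i, word in enumerate(words[:-1]):
        if word in CONFUSABLES and words[i + 1] in BAD_NEXT.get(word, ()):
            flags.append((i, word, CONFUSABLES[word]))
    return flags
```

The function only flags candidates; as in the abstract, the decision whether to replace the word is left to the operator.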
Abstract:
An improved method for storing and accessing relational data bases in information processing systems. The data base records (44) are synthesized into a summary sorted list (45) of unique data elements. The data elements are related by virtue of their positions in the sorted list to pointers stored in an index table (43) which is an isomorphic mapping of the data base records from which the summary sorted list was derived. The index table captures the record content and juxtaposition of the record fields in a relational manner and yields the effect of a totally inverted data base file. The index table pointers facilitate high speed relational query processing with a minimum allocation of memory.
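One way to picture the summary sorted list and the index table is the sketch below. It assumes that all field values are mutually comparable (strings here) and that the index table has one row per record and one pointer per field; the names `build` and `query` are hypothetical, and the abstract does not specify the storage layout at this level of detail.

```python
from bisect import bisect_left


def build(records):
    """Return (values, table): a sorted list of unique data elements and an
    index table with one pointer per field of each record."""
    values = sorted({v for rec in records for v in rec})
    position = {v: i for i, v in enumerate(values)}
    table = [[position[v] for v in rec] for rec in records]
    return values, table


def query(values, table, field, value):
    """Return the records whose given field equals `value`."""
    i = bisect_left(values, value)  # binary search in the sorted list
    if i == len(values) or values[i] != value:
        return []
    # Compare small integer pointers instead of the field values themselves.
    return [[values[p] for p in row] for row in table if row[field] == i]
```

Each distinct value is stored once, and a query resolves the search value to a pointer a single time before scanning the table, which is the memory and speed benefit the abstract points to.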
Abstract:
Method for automatically abstracting a document in machine readable form, consisting of storing in a dictionary memory (8) language terms commonly used in document preparation, comparing language terms from an input document received from an input register (16) with the stored language terms, selecting language terms from the input document which do not compare, selecting language terms from the input document which compare, coding the selected language terms with the identity of the input document, and storing the language terms in memory (12). When retrieving a document from storage, the processor (10), under the control of instruction memory (14), compares the words in an input query against the word index file in memory (12) and provides in register (18) the selected documents whose identification code corresponds to the highest retrieval value calculated using each identification code of each language term that compares.
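The indexing and retrieval flow can be sketched as follows, under loudly stated assumptions: only terms absent from the common-term dictionary are indexed, the "retrieval value" is taken to be the number of query terms that match a document, and the `COMMON` set and the function names are hypothetical. The abstract does not define the retrieval value, so this scoring is a stand-in.

```python
from collections import Counter, defaultdict

# Hypothetical dictionary of commonly used terms.
COMMON = {"the", "a", "of", "and", "to", "in", "for"}


def index_document(index, doc_id, text):
    """Code each uncommon term in the text with the document's identity."""
    for term in set(text.lower().split()):
        if term not in COMMON:
            index[term].add(doc_id)


def retrieve(index, query):
    """Return the document ids with the highest count of matching query terms."""
    scores = Counter()
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    if not scores:
        return []
    best = max(scores.values())
    return sorted(d for d, s in scores.items() if s == best)
```

Here `index` is expected to be a `defaultdict(set)` that maps each stored term to the identification codes of the documents containing it.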
Abstract:
An improved system for identifying and compacting text data to be transmitted over communications lines, thereby reducing the data volume and transmission time. Transmitting and receiving text processing systems are provided with identical library memories containing words commonly used in correspondence. Each word in a document to be communicated is compared to the transmitting system's word library and, if found in the library, only the library address is transmitted. If the word is not found in the library, then it is added to the transmitting system's library, sent, and added to the receiving system's library. The receiving system reconstructs the document by using the received addresses to access the appropriate words from its library and place them in the document. The system combines this word match encoding with character match encoding and facsimile run length encoding for communicating words not found in the system library. The character match process requires a template match and non-linear difference code summation combined with N-dimensional weighting using prestored feature vectors for statistically determining the match between an input character and characters stored in the system library.