Publication number: EP0160672A4
Publication date: 1986-05-12
Application number: EP84903871
Filing date: 1984-10-17
Applicant: TEXT SCIENCES CORP
Inventors: TAGUE LOUIE DON, COBB ALLEN T
Abstract: A method and apparatus for compressing alphanumeric data that is stored or transmitted in the form of digital codes. A dictionary is created which assigns each word of the alphanumeric text, together with the punctuation that follows it, to a unique address or token of, illustratively, up to 16 bits (two bytes). Each word in the alphanumeric text is then replaced by the address that refers to that word in the dictionary. Because the dictionary can contain up to 2^16 = 65,536 entries, it is more than adequate for storing the words associated with almost any book. Because only two bytes of information are needed to address any one of these roughly 65,000 words, replacing each word of text with two bytes of address information reduces the average number of digits required to store the text by a factor of about three. Further reductions of 25% or more in the length of the compressed text can be achieved in most cases by representing the most frequently used words with tokens that are shorter than two bytes in length. The number of bytes required to store the dictionary can be substantially reduced by storing the words in alphabetical order and taking advantage of the resulting redundancy in characters. Thus, if the second of two entries begins with five letters that are the same as those of the preceding entry, this can be signified by storing one character representing the number 5, followed by the remaining characters not common to both entries.
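The two ideas in the abstract, replacing each word with a two-byte dictionary address and shrinking the alphabetized dictionary by storing only a shared-prefix count plus the differing tail, can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names (`build_dictionary`, `compress`, `decompress`, `front_code`, `front_decode`) are hypothetical, a "word" is assumed to be any run of non-space characters (so trailing punctuation stays attached), whitespace is assumed to be normalized to single spaces, and the shorter-than-two-byte tokens for frequent words are not modeled.

```python
def build_dictionary(text):
    """Assign every distinct word (with its trailing punctuation) a 16-bit address."""
    entries = sorted(set(text.split()))      # alphabetical order, as in the abstract
    if len(entries) > 2 ** 16:               # 65,536 addresses fit in two bytes
        raise ValueError("too many distinct words for 16-bit tokens")
    token_of = {word: index for index, word in enumerate(entries)}
    return entries, token_of


def compress(text, token_of):
    """Replace each word with its two-byte (big-endian) dictionary address."""
    return b"".join(token_of[word].to_bytes(2, "big") for word in text.split())


def decompress(data, entries):
    """Look each two-byte address up again; words are rejoined with single spaces."""
    words = [entries[int.from_bytes(data[i:i + 2], "big")]
             for i in range(0, len(data), 2)]
    return " ".join(words)


def front_code(entries):
    """Store each sorted entry as (shared-prefix length, remaining characters)."""
    coded, previous = [], ""
    for word in entries:
        shared = 0
        limit = min(len(previous), len(word))
        while shared < limit and previous[shared] == word[shared]:
            shared += 1
        coded.append((shared, word[shared:]))
        previous = word
    return coded


def front_decode(coded):
    """Rebuild the full entries from the (prefix length, tail) pairs."""
    entries, previous = [], ""
    for shared, tail in coded:
        word = previous[:shared] + tail
        entries.append(word)
        previous = word
    return entries
```

For example, front-coding the sorted entries `compress`, `compressed`, `compression`, `compute` stores the second entry as the count 8 plus `ed`, because its first eight letters repeat the preceding entry.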