Abstract:
Compressing data from a markup language document, such as an XML document, includes the steps of creating from the document a path-based statistical tree built according to a given set of rules, and compressing the document by using the statistical tree. In an embodiment, the statistical tree includes a multitude of paths, and a single bit represents each of said paths. The document may also include both enumerated data and non-enumerated data, with the enumerated data being compressed by using the statistical tree. In an embodiment, the document includes a multitude of document nodes, and the step of creating the path-based statistical tree includes forming said tree with a multitude of tree nodes, each of the tree nodes representing one of the document nodes.
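As an illustration only, the following minimal Python sketch shows one way a path-based tree with occurrence counts could be built from an XML document. The abstract does not state the "given set of rules", the statistics kept, or how a path is mapped to a bit, so the function names `build_path_tree` and `list_paths`, the nested-dict layout, and the sample document are all assumptions made for this sketch rather than the claimed method.

```python
import xml.etree.ElementTree as ET


def build_path_tree(xml_text):
    """Build a nested dict keyed by element tag (hypothetical layout).

    Each tree node has the form {"count": n, "children": {...}}, where
    "count" is how many document nodes were seen at that path.
    """
    root = ET.fromstring(xml_text)
    tree = {}

    def visit(element, level):
        # Create the tree node for this tag on first sight, then count it.
        node = level.setdefault(element.tag, {"count": 0, "children": {}})
        node["count"] += 1
        for child in element:
            visit(child, node["children"])

    visit(root, tree)
    return tree


def list_paths(tree, prefix=""):
    """Flatten the tree into (path, count) pairs in first-seen order."""
    paths = []
    for tag, node in tree.items():
        path = f"{prefix}/{tag}"
        paths.append((path, node["count"]))
        paths.extend(list_paths(node["children"], path))
    return paths


# Assumed sample document, used only to exercise the sketch.
SAMPLE = (
    "<orders>"
    "<order><id>1</id><status>open</status></order>"
    "<order><id>2</id><status>closed</status></order>"
    "</orders>"
)

if __name__ == "__main__":
    for path, count in list_paths(build_path_tree(SAMPLE)):
        print(path, count)
```

A compressor along the lines of the abstract would presumably use such per-path statistics to assign short codes to paths; that encoding step is not specified in the abstract and is left out of the sketch.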
Abstract:
A dictionary for compressing and decompressing textual data has a number of keys. Each key is associated with an identifier. The keys include static word or phrase keys, where each static word or phrase key lists one or more unchanging words in a particular order. The keys further include dynamic phrase keys, where each dynamic phrase key lists a number of words and one or more placeholders in a particular order, and each placeholder denotes a place where a word or phrase other than the words of the dynamic phrase key is to be inserted. At least one of the dynamic phrase keys may identify one or more of the words by identifiers for corresponding static word or phrase keys. At least one of the static word or phrase keys may identify one or more of the words by identifiers for corresponding other static word or phrase keys.
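The key structure described above can be sketched in Python as follows. This is a hedged illustration of the decompression side only: the identifiers, the example phrases, the `PLACEHOLDER` sentinel, and the functions `expand_static` and `expand_dynamic` are hypothetical names chosen for this sketch, and the abstract does not specify how keys are stored or matched.

```python
# Sentinel marking a slot where a word or phrase from outside the key goes.
PLACEHOLDER = object()

# Hypothetical static keys: identifier -> ordered list of items.
# An int item refers to another static key by its identifier.
STATIC_KEYS = {
    1: ["thank", "you"],
    2: ["for"],
    3: [1, "very", "much"],  # reuses static key 1
}

# Hypothetical dynamic keys: words, static-key identifiers, and placeholders.
DYNAMIC_KEYS = {
    100: [1, 2, PLACEHOLDER],           # "thank you for <...>"
    101: ["dear", PLACEHOLDER, 3],      # "dear <...> thank you very much"
}


def expand_static(identifier):
    """Return the list of words for a static key, resolving nested identifiers."""
    words = []
    for item in STATIC_KEYS[identifier]:
        if isinstance(item, int):
            words.extend(expand_static(item))
        else:
            words.append(item)
    return words


def expand_dynamic(identifier, fillers):
    """Rebuild the text for a dynamic key, consuming one filler per placeholder."""
    remaining = iter(fillers)
    words = []
    for item in DYNAMIC_KEYS[identifier]:
        if item is PLACEHOLDER:
            words.extend(next(remaining).split())
        elif isinstance(item, int):
            words.extend(expand_static(item))
        else:
            words.append(item)
    return " ".join(words)


if __name__ == "__main__":
    print(expand_dynamic(100, ["your help"]))
    print(expand_dynamic(101, ["Alice"]))
```

In this sketch a compressed message would carry only a dynamic key identifier plus the filler text, for example `(100, ["your help"])`. Letting keys refer to other keys by identifier keeps the dictionary itself small, which appears to be the point of the nested references in the abstract.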