Abstract:
A non-transitory computer-readable recording medium stores a program that causes a computer to execute a process. The process includes: upon obtaining a character string that includes one unit of character information at one position in the character string, referring to presence/absence information indicating whether at least one character string, in a character string group including a plurality of character strings to which compression codes have been assigned, includes the one unit of character information at the one position; and searching the character string group for the obtained character string unless the presence/absence information indicates that none of the character strings in the character string group includes the one unit of character information at the one position.
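The pre-check described in this abstract can be illustrated with a minimal Python sketch. It assumes that one "unit of character information" is a single character and that the presence/absence information is a set of (position, character) pairs; the function names and the sample codes are hypothetical and do not come from the abstract.

```python
def build_presence(strings):
    """Record every (position, character) pair that occurs in the string group."""
    return {(pos, ch) for s in strings for pos, ch in enumerate(s)}


def may_contain(word, presence):
    """Return False as soon as one character is known to be absent at its position."""
    return all((pos, ch) in presence for pos, ch in enumerate(word))


def find_code(word, codes, presence):
    """Search the group only when the presence/absence check does not rule the word out."""
    if not may_contain(word, presence):
        return None          # search skipped entirely
    return codes.get(word)   # the actual search of the string group


# Hypothetical string group with assigned compression codes.
codes = {"cat": 0x01, "car": 0x02, "dog": 0x03}
presence = build_presence(codes)
```

With this data, "cow" is rejected without a search because no string has "w" at position 2, while "cog" passes the check and is only rejected by the search itself. The check is therefore a necessary condition, not a sufficient one.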
Abstract:
An encoding device 100 encodes a target file by using a static dictionary 121 and a dynamic dictionary 122. The encoding device 100 generates index information of the target file by folding a file axis and a word axis of the target file, each utilizing base numbers. The index information indicates presence information for words registered in the static dictionary 121 and the dynamic dictionary 122. When the target file is updated, the encoding device 100 generates difference information indicating the difference in the index information with respect to the file axis direction or the word axis direction.
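The abstract does not spell out how folding works. One possible reading, shown here purely as an assumption, is that a long presence bitmap along one axis is folded by OR-ing bit i into slot i mod b for each base number b, and that candidates are restored by AND-ing the unfolded bitmaps. The bases 5 and 7 and the function names are hypothetical.

```python
def fold(bits, base):
    """Fold a bitmap to `base` slots: slot r is 1 if any set bit i has i % base == r."""
    folded = [0] * base
    for i, bit in enumerate(bits):
        if bit:
            folded[i % base] = 1
    return folded


def unfold(folded_a, folded_b, length):
    """Restore a candidate bitmap: position i survives only if both folds allow it."""
    return [
        folded_a[i % len(folded_a)] & folded_b[i % len(folded_b)]
        for i in range(length)
    ]


# One axis of a presence bitmap, 30 positions long, with bits 3 and 17 set.
bits = [1 if i in (3, 17) else 0 for i in range(30)]
restored = unfold(fold(bits, 5), fold(bits, 7), len(bits))
```

In this sketch the restored bitmap always contains every original bit and may contain extra ones, so it can act as a filter but not as an exact index. For the sample data the restoration happens to be exact.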
Abstract:
An encoding unit encodes each of first words in a target file utilizing a first code allocation rule. Each of the first words has an appearance frequency larger than that of a word positioned at a given ordinal rank in word frequency information, which is information on word frequencies in a plurality of files that include the target file. The first code allocation rule is generated from the word frequency information. The encoding unit also encodes at least a second word in the target file into a code with a first code length utilizing a second code allocation rule that differs from the first code allocation rule. The second word has an appearance frequency smaller than that of the word positioned at the given ordinal rank in the word frequency information.
Abstract:
An extracting method is executed by a computer. The extracting method includes: storing first information in a storage device, the first information indicating, for each of a plurality of files and for each of a plurality of character data, whether the file includes the character data; storing second information in the storage device when a given file among the files is updated, the second information indicating, for each of the character data, whether the given file includes the character data; and extracting a file group from the files when a search request is received, the file group excluding any file that the first information and the second information indicate does not include character data to be searched for that is specified in the search request.
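A minimal Python sketch of this extraction follows. It assumes that "character data" means single characters and that a file is excluded only when both the first and the second information report a searched character as absent. The class and method names are hypothetical.

```python
class FileExtractor:
    def __init__(self, files):
        # First information: for each file, the set of characters it contains.
        self.first = {name: set(text) for name, text in files.items()}
        # Second information: recorded only for files that are updated later.
        self.second = {}

    def update(self, name, new_text):
        """Store second information for an updated file; the first information is left as is."""
        self.second[name] = set(new_text)

    def candidates(self, query):
        """Return files not ruled out by the first and second information."""
        needed = set(query)
        kept = []
        for name, chars in self.first.items():
            # Assumption: a character counts as present if either record says so.
            known = chars | self.second.get(name, set())
            if needed <= known:
                kept.append(name)
        return sorted(kept)
```

Because the first information is never rewritten on update, the result is a candidate group that may still need a full-text check, which matches the abstract's wording that files are excluded rather than confirmed.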
Abstract:
An information processing apparatus splits a word to be encoded into a plurality of word elements. The apparatus obtains a plurality of hashed word elements by hashing each of the word elements, where the number of bits of each hashed word element corresponds to the position of the respective word element in the word. The apparatus outputs an encoding result in which the hashed word elements are combined.
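As an illustration of position-dependent hash widths, the following Python sketch splits a word into its first three characters plus the remainder and hashes each element down to a fixed number of bits. The split rule, the widths (6, 5, 3, 2) and the use of CRC-32 as the hash are all assumptions chosen for the example; the abstract specifies none of them.

```python
import zlib

# Hypothetical bit widths per element position (16 bits in total).
BIT_WIDTHS = (6, 5, 3, 2)


def split_word(word):
    """Split into single leading characters plus one trailing remainder element."""
    head = len(BIT_WIDTHS) - 1
    return [word[i:i + 1] for i in range(head)] + [word[head:]]


def encode_word(word):
    """Hash each element to its position's bit width and concatenate the results."""
    code = 0
    for element, bits in zip(split_word(word), BIT_WIDTHS):
        hashed = zlib.crc32(element.encode("utf-8")) & ((1 << bits) - 1)
        code = (code << bits) | hashed
    return code
```

With this layout, words that share a first character also share the top six bits of their codes, so `encode_word("search") >> 10` equals `encode_word("select") >> 10`.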
Abstract:
A non-transitory computer-readable recording medium stores a data search program that causes a computer to execute a process including: receiving a search character string for target text data; and searching for the search character string by a logical operation between index information, which associates, as bitmap data, each character or word appearing in the target text data with its appearance positions in the target text data, and search bitmap data generated in association with the appearance order, in the search character string, of the respective characters or words constituting the search character string.
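One way to picture such a bitmap search is the Python sketch below. It assumes a character-level index in which bit i of a character's bitmap is set when that character appears at position i, and it accounts for the appearance order in the search string by shifting each bitmap by the character's offset before AND-ing. This shift-and-AND scheme is an assumed stand-in, not necessarily the construction used in the patent.

```python
def build_index(text):
    """Map each character to an integer bitmap of its positions in the text."""
    index = {}
    for pos, ch in enumerate(text):
        index[ch] = index.get(ch, 0) | (1 << pos)
    return index


def search(index, query, text_len):
    """Return the start positions of `query`, using only shifts and ANDs."""
    if not query:
        return []
    hits = (1 << text_len) - 1  # every position is a candidate start
    for offset, ch in enumerate(query):
        # Align the character's positions with the candidate start positions.
        hits &= index.get(ch, 0) >> offset
        if hits == 0:
            break
    return [pos for pos in range(text_len) if (hits >> pos) & 1]


text = "abracadabra"
index = build_index(text)
```

For this text, `search(index, "abra", len(text))` yields the start positions 0 and 7, and a query containing a character that never appears yields an empty list after the first AND.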
Abstract:
An encoding device 100, when encoding a target file by using a static dictionary unit 121 and a dynamic dictionary unit 122, extracts, from among words registered in the dynamic dictionary unit 122, a registered word that is included in an external dictionary unit 221, the external dictionary unit 221 associating a specific word group and a code group with each other. The encoding device 100 registers, in the dynamic dictionary unit 122, the code of the registered word in the external dictionary unit 221 and a dynamically assigned dynamic code in association with each other.
Abstract:
An information processing apparatus according to an embodiment determines whether a target character string, which is a compression target contained in input data, is registered in a first dictionary, and outputs a compression code corresponding to the target character string when the target character string is registered in the first dictionary. When the target character string is not registered in the first dictionary, the apparatus searches for the target character string in first data, which accumulates character strings that are part of the input data and have been determined not to be registered in the first dictionary. When the target character string is found in the first data, the apparatus registers the matched character string in a second dictionary different from the first dictionary and outputs a compression code corresponding to the registration number of the target character string in the second dictionary.
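The encoding-side flow in this abstract can be sketched in Python as follows. The sketch assumes whitespace-separated words as target character strings, a set as the "first data", and tagged tuples as output; the tags "S", "D" and "RAW" and the function name are hypothetical, and a matching decoder is not shown.

```python
def compress(words, first_dict):
    """Encode words with a fixed first dictionary and a second dictionary built on the fly."""
    first_data = set()   # strings seen so far that were not in the first dictionary
    second_dict = {}     # string -> registration number
    output = []
    for word in words:
        if word in first_dict:
            output.append(("S", first_dict[word]))
        elif word in second_dict:
            output.append(("D", second_dict[word]))
        elif word in first_data:
            # Second sighting of an unregistered string: register it now.
            second_dict[word] = len(second_dict)
            output.append(("D", second_dict[word]))
        else:
            # First sighting: emit the string itself and remember it.
            first_data.add(word)
            output.append(("RAW", word))
    return output, second_dict
```

Registering a string only on its second appearance keeps one-off strings out of the second dictionary, which is one plausible motivation for accumulating them in the first data beforehand.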
Abstract:
A non-transitory computer-readable recording medium stores a compression program that causes a computer to execute a process. The process includes: reading a plurality of character strings from a compression target file; examining order information to determine whether a compression dictionary contains any concatenated character string that includes a certain character string, among the plurality of character strings, at the order of the certain character string, the compression dictionary correlating a plurality of concatenated character strings with a plurality of compression codes, respectively, each of the concatenated character strings including a plurality of character strings, and the order information indicating whether the compression dictionary contains a specific character string at a given order; and searching the compression dictionary utilizing the plurality of character strings when the order information indicates that one or more concatenated character strings include the certain character string.