Abstract:
When a document to be searched is registered, two tables are created: a document identifier management table, which records for each page the range of document identifiers stored in that page together with the page identifier, and a per-search-server search range management table, which records the range of document identifiers assigned to each search server. When a search server searches the documents, it refers to the per-search-server search range management table to acquire its allocated range of document identifiers. For each index key forming a query term specified as a query condition, it refers to the document identifier management table to acquire the identifiers of the pages that store document identifiers in the allocated range. The search is then carried out by referring to the pages indicated by the acquired page identifiers.
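The two-table lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the table layouts, the names `PageRange`, `DOC_ID_TABLE`, `SERVER_RANGES`, and `pages_for_server`, and all identifiers and ranges are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical row of the document identifier management table:
# one row per index page, with the inclusive range of document IDs it stores.
PageRange = namedtuple("PageRange", ["page_id", "first_doc_id", "last_doc_id"])

# Hypothetical tables; index keys, page IDs, and ranges are illustrative only.
DOC_ID_TABLE = {
    "data": [PageRange(10, 1, 400), PageRange(11, 401, 900), PageRange(12, 901, 1500)],
    "base": [PageRange(20, 1, 700), PageRange(21, 701, 1500)],
}
SERVER_RANGES = {
    "server_a": (1, 500),
    "server_b": (501, 1000),
    "server_c": (1001, 1500),
}


def pages_for_server(server, index_keys):
    """For each index key, list the pages whose ID range overlaps the server's range."""
    low, high = SERVER_RANGES[server]
    return {
        key: [
            page.page_id
            for page in DOC_ID_TABLE[key]
            if page.first_doc_id <= high and page.last_doc_id >= low
        ]
        for key in index_keys
    }
```

Under these assumptions, each server reads only the pages that can contain its share of the document identifiers, so the servers can work through the same index in parallel without scanning every page.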
Abstract:
Provided is a data management method. Each piece of data corresponds to an entry that includes a reference to another entry, and is managed in a set, which is a collection of pieces of the data. The set corresponds to a linked list in which the entries are linked in the order in which the data was added. Each entry includes a sequence number assigned when the entry was inserted into the linked list and information indicating whether the data has been deleted from the set. If the data has been deleted, the entry is separated from the linked list at a predetermined timing. The linked list is traced to refer to the data. When the insertion sequence number of the entry being referred to is later than the insertion sequence number of the entry that has already been referred to, it is judged that the entry being referred to has been separated from the linked list.
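The sequence-number check can be illustrated with a small single-threaded sketch. Several details here are assumptions that the abstract does not state: new entries are linked at the head, so a traversal sees decreasing sequence numbers; separated entries are recycled for later insertions; and the names `Entry`, `EntrySet`, `separate_deleted`, and `is_separated` are hypothetical.

```python
class Entry:
    def __init__(self, data, seq, next_entry):
        self.data = data
        self.seq = seq          # sequence number assigned at insertion
        self.next = next_entry  # reference to the previously added entry
        self.deleted = False    # logical-deletion flag


class EntrySet:
    def __init__(self):
        self.head = None
        self.next_seq = 1
        self.free = []  # separated entries awaiting reuse (assumption)

    def add(self, data):
        """Link a new (or recycled) entry at the head with a fresh sequence number."""
        if self.free:
            entry = self.free.pop()
            entry.data, entry.seq = data, self.next_seq
            entry.next, entry.deleted = self.head, False
        else:
            entry = Entry(data, self.next_seq, self.head)
        self.next_seq += 1
        self.head = entry
        return entry

    def delete(self, entry):
        entry.deleted = True  # mark only; the entry stays linked for now

    def separate_deleted(self):
        """The 'predetermined timing': unlink every entry flagged as deleted."""
        prev, cur = None, self.head
        while cur is not None:
            nxt = cur.next
            if cur.deleted:
                if prev is None:
                    self.head = nxt
                else:
                    prev.next = nxt
                self.free.append(cur)
            else:
                prev = cur
            cur = nxt


def is_separated(entry, last_seq):
    """A newer sequence number than the last visited entry means the entry left the list."""
    return entry.seq > last_seq
```

In this sketch a reader that has fetched a reference to the next entry, and then finds a larger sequence number there than on the entry it just visited, knows the entry was separated and reused in the meantime and can restart its traversal.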
Abstract:
Provided is a method, performed in a computer, for creating an index for searching stored structured documents, each having a document data structure. The method includes the steps of: analyzing a structured document to extract its document data structure; normalizing the extracted document data structure to create a logical structure index composed of a plurality of elements having a hierarchical structure; extracting the number of appearances of each element in the created logical structure index; and extracting, based on the logical structure index, the elements for which the index is to be created by comparing the extracted number of appearances of each element with a first predetermined threshold.
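The counting and threshold steps can be sketched as below, assuming XML input. The abstract does not say how normalization works or which side of the threshold qualifies an element, so this sketch assumes that normalization collapses elements into distinct root-to-element tag paths and that paths appearing at least `threshold` times are selected; `element_path_counts` and `select_index_elements` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_path_counts(xml_text):
    """Count how often each root-to-element tag path appears in the document."""
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def select_index_elements(counts, threshold):
    """Assumption: an element path is indexed when it appears at least `threshold` times."""
    return sorted(path for path, n in counts.items() if n >= threshold)
```

For a document such as `<doc><title>t</title><sec><p>a</p><p>b</p></sec><sec><p>c</p></sec></doc>`, the paths `/doc/sec` and `/doc/sec/p` appear two and three times respectively, so a threshold of 2 would select those two and skip `/doc` and `/doc/title`.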
Abstract:
There is provided a data management method for managing data stored in a parallel database system in which a plurality of data servers manage data. The parallel database system manages: correspondence information between a characteristic of the data and each of the plurality of data servers that manages the data; and a data area corresponding to the characteristic of the data. The data management method comprises the steps of: extracting the characteristic of the data from data to be stored in the data area; storing the data in the data area based on the extracted characteristic of the data; specifying a corresponding data area based on the characteristic of the data stored in the data area by referring to the correspondence information; and accessing, by each of the plurality of data servers, the specified data area.
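The four steps above can be sketched as follows. The choice of characteristic (the year of a record's date), the `CORRESPONDENCE` mapping, and the function names are all assumptions made for illustration; the abstract does not say what the characteristic is or how the data areas are laid out.

```python
from collections import defaultdict

# Hypothetical correspondence information: data characteristic -> data server.
CORRESPONDENCE = {"2023": "server_1", "2024": "server_2"}


def extract_characteristic(record):
    """Assumption: the characteristic is the year taken from an ISO date field."""
    return record["date"][:4]


def store(areas, record):
    """Place the record in the data area that matches its characteristic."""
    areas[extract_characteristic(record)].append(record)


def areas_for_server(areas, server):
    """Use the correspondence information to find the data areas a server accesses."""
    return {
        characteristic: rows
        for characteristic, rows in areas.items()
        if CORRESPONDENCE.get(characteristic) == server
    }


# Usage: three records land in two data areas, one per characteristic.
areas = defaultdict(list)
for rec in [
    {"id": 1, "date": "2023-05-01"},
    {"id": 2, "date": "2024-01-15"},
    {"id": 3, "date": "2023-11-30"},
]:
    store(areas, rec)
```

Because each record is placed by its characteristic at storage time, each server in this sketch only needs the correspondence information to know which data areas to read.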
Abstract:
Information on the appearance, in a registered document, of individual elements (characteristic character strings) that indicate characteristics of the document is stored in advance. When the similarity of the registered document is calculated, a query designated by a searcher is analyzed. The query is represented by a characteristic vector whose individual elements take the relation between a plurality of words into consideration. Pieces of appearance information of the individual words contained in the query are counted. The counted appearance information is compared with a search index to calculate the similarity between documents.
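One way to picture the comparison is the sketch below. It swaps in specific techniques that the abstract does not name: adjacent word pairs stand in for elements that "take the relation between a plurality of words into consideration", and cosine similarity stands in for the similarity measure. The names `features`, `cosine`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def features(text):
    """Count single words plus adjacent word pairs (assumed stand-in for word relations)."""
    words = text.lower().split()
    pairs = [" ".join(pair) for pair in zip(words, words[1:])]
    return Counter(words) + Counter(pairs)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(index, query):
    """Compare the query vector with each stored document vector, best match first."""
    q = features(query)
    return sorted(
        ((cosine(q, vec), doc_id) for doc_id, vec in index.items()),
        reverse=True,
    )
```

Here `index` plays the role of the information stored in advance: a mapping from a document identifier to the feature counts of that registered document, built once with `features` when the document is registered.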