Abstract:
A document searching apparatus includes an input unit that inputs a search query for conducting a search in a structured document, the structured document being obtained by expressing elements included in a document in a hierarchical manner; a query converting unit that converts a query sentence constituting the search query and a search target element of the query sentence according to a predetermined rule so as to generate a new search query; a document searching unit that searches the structured document by using the new search query; and a search-result presenting unit that presents a result of the search.
Abstract:
According to one embodiment, a keyword presentation apparatus includes an extraction unit, a selection unit and a clustering unit. The extraction unit is configured to extract, as technical terms, morpheme strings, which are not defined in a general concept dictionary, from a document set. The selection unit is configured to evaluate relevancies between each of basic term candidates and the technical terms, and to preferentially select basic term candidates having high relevancies as basic terms. The clustering unit is configured to calculate weighted sums of statistical degrees of correlation between the basic terms based on the document set, to calculate conceptual degrees of correlation between the basic terms based on the general concept dictionary, and to cluster the basic terms based on the weighted sums.
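The clustering step described above can be illustrated with a minimal sketch. The abstract does not specify the correlation measures or the clustering method, so everything concrete here is an assumption: the statistical correlation is taken to be Jaccard co-occurrence over the document set, the conceptual correlation is a toy lookup of shared parent concepts (`concept_parents` is a hypothetical stand-in for a general concept dictionary), and clustering is a simple single-link merge above a threshold. The names `weight` and `threshold` are illustrative only.

```python
from itertools import combinations


def statistical_correlation(term_a, term_b, documents):
    """Jaccard co-occurrence of two terms over the document set (assumed measure)."""
    docs_a = {i for i, doc in enumerate(documents) if term_a in doc}
    docs_b = {i for i, doc in enumerate(documents) if term_b in doc}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def conceptual_correlation(term_a, term_b, concept_parents):
    """1.0 if both terms share a parent concept in the toy dictionary, else 0.0."""
    parent_a = concept_parents.get(term_a)
    return 1.0 if parent_a is not None and parent_a == concept_parents.get(term_b) else 0.0


def cluster_basic_terms(terms, documents, concept_parents, weight=0.5, threshold=0.5):
    """Merge terms whose weighted sum of the two correlations reaches the threshold."""
    parent = {t: t for t in terms}  # union-find forest, one root per cluster

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        score = (weight * statistical_correlation(a, b, documents)
                 + (1 - weight) * conceptual_correlation(a, b, concept_parents))
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())


# Toy data: each document is a set of terms (hypothetical example values).
documents = [{"search", "index"}, {"search", "index", "speech"}, {"speech", "audio"}]
concept_parents = {"search": "retrieval", "index": "retrieval",
                   "speech": "sound", "audio": "sound"}
clusters = cluster_basic_terms(["search", "index", "speech", "audio"],
                               documents, concept_parents)
```

Raising `weight` makes the grouping follow co-occurrence in the document set more closely, while lowering it makes the grouping follow the dictionary.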
Abstract:
An apparatus for retrieving structured documents includes a first categorizing unit configured to categorize components into a first component of typical descriptions and a second component of atypical descriptions, based on statistics information for the components, a second categorizing unit configured to categorize the terms into a first term whose appearance ratio in the first component exceeds a threshold and a second term whose appearance ratio in the first component is not more than the threshold, an extraction unit configured to extract a set of structured documents each having the first component including the first term and the second component from the structured documents, and a ranking unit configured to rank the set of structured documents by a retrieval score calculated based on a relation between the second term and the second component.
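The term categorization by threshold can be sketched as follows. This is only an illustration under stated assumptions: the "appearance ratio in the first component" is interpreted here as the fraction of a term's occurrences that fall inside typical-description components, and the function name, input layout, and default threshold are all hypothetical.

```python
def categorize_terms(term_counts, threshold=0.5):
    """Split terms by their appearance ratio in typical-description components.

    term_counts maps each term to (occurrences in typical components,
    total occurrences); this layout is an assumption for illustration.
    """
    first_terms, second_terms = [], []
    for term, (typical_count, total_count) in sorted(term_counts.items()):
        ratio = typical_count / total_count if total_count else 0.0
        # A ratio above the threshold gives a "first term"; a ratio that is
        # not more than the threshold gives a "second term".
        if ratio > threshold:
            first_terms.append(term)
        else:
            second_terms.append(term)
    return first_terms, second_terms


# Hypothetical counts: (occurrences in typical components, total occurrences).
first_terms, second_terms = categorize_terms(
    {"patent": (8, 10), "laser": (1, 10), "claim": (5, 10)}
)
```

A term whose ratio equals the threshold exactly lands in the second group, matching the "not more than the threshold" wording.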
Abstract:
According to one embodiment, an information search apparatus includes a generation unit, a selection unit, a search unit and a display unit. The generation unit generates recognition candidate character strings based on shapes of strokes and combinations of the shapes. The selection unit calculates reliability values for the recognition candidate character strings and selects search keys from the recognition candidate character strings. The search unit searches a database for second character strings including the search keys, and obtains one or more result character strings indicating search results of each of the search keys. The display unit displays the one or more result character strings corresponding to each of the search keys distinctively.
Abstract:
A feature-vector generation apparatus includes an input unit configured to input content data including at least one of video data and audio data, a generation unit configured to generate a feature vector, based on information indicating a time at which a characterizing state of the content data appears, the characterizing state being characterized by a change of the at least one of the video data and the audio data, and a storage unit configured to store the content data and the feature vector.
Abstract:
An information retrieval system includes speech recognition means for performing speech recognition on a spoken question to generate first text information, generation means for modifying the first text information to generate second text information as an interrogative for searching for an answer to the question, and search means for searching a document database for the answer by using the second text information.