Abstract:
To calculate a reliability that serves as an index of how reliable an evaluator of a document is, a reliability calculation apparatus (2) is provided with a reliability calculation unit (21). The unit works from first information specifying the correspondence relationships between documents targeted for evaluation, the evaluators who evaluated the documents, and the contents of the evaluations, and from second information specifying the correspondence relationships between the documents and their authors. Based on these, it specifies the evaluation given by each evaluator with respect to each author and calculates the reliability of each evaluator from the specified per-author evaluations.
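The join of the two kinds of information can be sketched as follows. This is only an illustration: the abstract does not state how the reliability itself is computed, so the `reliability` function below uses a purely hypothetical measure (the share of distinct authors among an evaluator's evaluations), and all names and data layouts are assumptions.

```python
from collections import defaultdict


def evaluations_by_author(first_info, second_info):
    """Join (document, evaluator, score) records with a {document: author} map.

    Returns {(evaluator, author): [scores...]}.
    """
    per_pair = defaultdict(list)
    for doc, evaluator, score in first_info:
        author = second_info[doc]
        per_pair[(evaluator, author)].append(score)
    return dict(per_pair)


def reliability(per_pair):
    """Hypothetical measure (an assumption, not taken from the abstract):
    distinct authors evaluated divided by the number of evaluations made."""
    authors = defaultdict(set)
    counts = defaultdict(int)
    for (evaluator, author), scores in per_pair.items():
        authors[evaluator].add(author)
        counts[evaluator] += len(scores)
    return {e: len(authors[e]) / counts[e] for e in counts}


first_info = [("d1", "e1", 1), ("d2", "e1", 1), ("d3", "e2", 1), ("d4", "e2", -1)]
second_info = {"d1": "a1", "d2": "a1", "d3": "a1", "d4": "a2"}
per_pair = evaluations_by_author(first_info, second_info)
scores = reliability(per_pair)
```

Under this toy measure, an evaluator who only ever evaluates one author scores lower than one whose evaluations are spread over several authors.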
Abstract:
Among the inverted lists of the nodes in a taxonomy, the inverted list of the highest node is a list of integer values indicating identifiers of search subject data, while the inverted list of each node other than the highest node is, in place of the identifiers, a list of integer values indicating positions in the inverted list of the node one level higher. Furthermore, the list of integer values in the inverted list of each node is divided into two or more blocks, and within each block the differential value between an integer value and the integer value directly before it is converted into a bit string of a variable-length integer code.
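A minimal sketch of the two ideas in this abstract, under stated assumptions: the abstract does not name a specific variable-length integer code, so a 7-bit variable-byte code is used here as a stand-in, and the first value of each block is stored as-is because it has no predecessor inside the block. The function names are hypothetical, and the lists are assumed to be sorted and non-negative.

```python
def varint_encode(n):
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_blocks(values, block_size):
    """Split a sorted list into blocks and delta-encode within each block."""
    blocks = []
    for start in range(0, len(values), block_size):
        buf = bytearray()
        prev = 0  # assumption: the first value of a block is stored raw
        for v in values[start:start + block_size]:
            buf += varint_encode(v - prev)
            prev = v
        blocks.append(bytes(buf))
    return blocks


def decode_block(data):
    """Invert encode_blocks for a single block."""
    values, prev, cur, shift = [], 0, 0, 0
    for byte in data:
        cur |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += cur
            values.append(prev)
            cur, shift = 0, 0
    return values


def to_parent_positions(parent_list, child_ids):
    """Replace identifiers in a child list by positions in the parent list."""
    index = {v: i for i, v in enumerate(parent_list)}
    return [index[d] for d in child_ids]


root = [3, 8, 20, 300, 301, 1000]          # identifiers at the highest node
child = to_parent_positions(root, [8, 300, 1000])
blocks = encode_blocks(root, 3)
decoded = [v for b in blocks for v in decode_block(b)]
```

Because each block restarts its deltas, a block can be decoded without touching the blocks before it, and storing positions rather than identifiers keeps the integers in lower nodes small.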
Abstract:
To accurately learn a function for evaluating documents even when the training data include sample documents with missing feature values, a document evaluation apparatus is provided with: a data classification unit (3) that classifies a set of sample documents based on the missing patterns of a first feature vector; a first learning unit (4) that uses the feature values that are not missing in the first feature vector, together with evaluation values, to learn a first function for calculating a first score, which is a weighted evaluation value for each classification; a feature vector generation unit (5) that computes a feature value corresponding to each classification using the first score and generates a second feature vector having the computed feature values; and a second learning unit (6) that uses the second feature vector and the evaluation values to learn a second function for calculating a second score for evaluating the documents targeted for evaluation.
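The classification step alone might look like the sketch below. It assumes that a missing feature value is represented by `None`, which the abstract does not specify, and it leaves out both learning stages; the function names are hypothetical.

```python
from collections import defaultdict


def missing_pattern(vector):
    """Pattern of a feature vector: True where the value is missing."""
    return tuple(v is None for v in vector)


def classify_by_missing_pattern(samples):
    """Group sample indices by the missing pattern of their feature vector."""
    groups = defaultdict(list)
    for i, vector in enumerate(samples):
        groups[missing_pattern(vector)].append(i)
    return dict(groups)


def observed_features(vector):
    """Feature values that are not missing, as used by the first learner."""
    return [v for v in vector if v is not None]


samples = [
    [0.2, None, 1.0],
    [0.5, 0.1, 0.3],
    [0.9, None, 0.4],
]
groups = classify_by_missing_pattern(samples)
```

Each group then shares the same set of available features, so a separate first function can be fitted per group on `observed_features` only.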
Abstract:
The present invention more suitably determines whether a combination of words is an unexpected combination while using a smaller corpus. Disclosed is an unexpectedness determination system provided with: category identifying means which identifies a category to which a word belongs; category co-occurrence frequency identifying means which identifies a category co-occurrence frequency between two categories; and unexpectedness index calculating means which calculates an index representing a degree of unexpectedness of a combination of two words. The category identifying means identifies a first category, to which an inputted first word belongs, and a second category, to which an inputted second word belongs. The category co-occurrence frequency identifying means identifies the category co-occurrence frequencies between the first category and the categories other than the first category. The unexpectedness index calculating means then calculates an index representing the degree of unexpectedness of the combination of the first word and the second word on the basis of the identified category co-occurrence frequencies.
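One way to picture the flow is the sketch below. The abstract does not give the index formula, so the one used here is a hypothetical choice: one minus the share of the first category's co-occurrences that involve the second category. The dictionary layouts, the function names, and the handling of two words in the same category are likewise assumptions.

```python
def category_cooccurrences(c1, cooc):
    """Co-occurrence frequencies between c1 and every other category.

    cooc maps an unordered category pair, given as a 2-tuple, to a count.
    """
    out = {}
    for (a, b), n in cooc.items():
        if a == c1 and b != c1:
            out[b] = out.get(b, 0) + n
        elif b == c1 and a != c1:
            out[a] = out.get(a, 0) + n
    return out


def unexpectedness(word1, word2, word_category, cooc):
    """Hypothetical index in [0, 1]; higher means more unexpected."""
    c1, c2 = word_category[word1], word_category[word2]
    if c1 == c2:
        return 0.0  # assumption: same-category pairs are not unexpected
    freqs = category_cooccurrences(c1, cooc)
    total = sum(freqs.values())
    if total == 0:
        return 1.0  # assumption: no evidence at all counts as unexpected
    return 1.0 - freqs.get(c2, 0) / total


word_category = {"coffee": "drink", "cake": "food", "engine": "machine"}
cooc = {("drink", "food"): 9, ("drink", "machine"): 1}
low = unexpectedness("coffee", "cake", word_category, cooc)
high = unexpectedness("coffee", "engine", word_category, cooc)
```

Counting at the category level rather than the word level is what allows a small corpus to suffice: rare word pairs still inherit statistics from their categories.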
Abstract:
A linkage information output apparatus includes: a linkage information retrieval unit that, upon receiving source information, acquires from a linkage information accumulation unit the destination information linked with the source information, a frequency of occurrence of the source information, a frequency of occurrence of each piece of linked destination information, and a frequency of occurrence of the link between the source information and each piece of destination information; a recognition degree calculation unit that calculates, based on each acquired frequency of occurrence, a recognition degree of the source information, a recognition degree of each piece of acquired destination information, and a recognition degree of each link; and a high interest information narrowing unit that selects the destination information to output from among the pieces of destination information based on a combination of two or more of the recognition degree of the source information, the recognition degree of the destination information, and the recognition degree of the link.
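A rough sketch of the narrowing step follows. The abstract does not define how a recognition degree is derived from a frequency or which combination is used, so both are assumptions here: the recognition degree is taken as relative frequency, and a destination is kept when it is itself well known but its link to the source is not. The thresholds and names are hypothetical.

```python
def recognition(freq, total):
    """Assumed recognition degree: relative frequency of occurrence."""
    return freq / total if total else 0.0


def select_destinations(dest_freqs, link_freqs, total,
                        dest_min=0.1, link_max=0.05):
    """Keep destinations that are well known while the link is little known.

    Combines two of the three recognition degrees named in the abstract
    (destination and link); the thresholds are illustrative only.
    """
    selected = []
    for dest, freq in dest_freqs.items():
        dest_degree = recognition(freq, total)
        link_degree = recognition(link_freqs.get(dest, 0), total)
        if dest_degree >= dest_min and link_degree <= link_max:
            selected.append(dest)
    return selected


dest_freqs = {"A": 30, "B": 2, "C": 40}
link_freqs = {"A": 1, "B": 1, "C": 20}
picked = select_destinations(dest_freqs, link_freqs, total=100)
```

Here only "A" survives: "B" is too obscure on its own, and the link to "C" is already widely recognized.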
Abstract:
When gathering words through a dictionary growth process, a dictionary growth unit (102) stores, in a gathering process memory unit (107), information indicating through what process of input and output each word has been gathered. Then, a clustering unit (103) classifies the words gathered by the dictionary growth process into clusters on the basis of the information recorded in the gathering process memory unit (107). Next, for each cluster into which the words have been classified, a type determination unit (104) determines whether the words constituting the cluster are of the same type as a seed word or of a different type, on the basis of the information recorded in the gathering process memory unit (107). In addition, an output unit (105) associates information indicating each gathered word, the cluster to which the word belongs, and whether the cluster is of the same type as the seed word or of a different type, and displays the associated information.
Abstract:
A boundary word identification unit (103) identifies a boundary word belonging to a plurality of categories among words gathered in dictionary growth processing. Then, a category membership degree calculation unit (104) calculates, for each category to which the boundary word belongs, a category membership degree indicating a degree to which the boundary word belongs to the category on the basis of information recorded in a gathering process memory unit (108). Next, a category update unit (105) determines the category to which the boundary word belongs on the basis of the category membership degree calculated by the category membership degree calculation unit (104) and updates information stored in a gathered-by-category word memory unit (109) so that the determination result is reflected.
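The three units can be pictured with the sketch below. The abstract does not say what the gathering process memory records or how the membership degree is computed, so this assumes a log of (word, category) gathering events and takes the degree as the share of a word's events that fall in each category. All names are hypothetical.

```python
from collections import Counter


def boundary_words(words_by_category):
    """Words that appear in more than one category."""
    seen = Counter(w for words in words_by_category.values() for w in set(words))
    return {w for w, n in seen.items() if n > 1}


def membership_degrees(word, categories, gathering_log):
    """Assumed degree: share of the word's gathering events per category."""
    counts = Counter(c for w, c in gathering_log if w == word and c in categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def update_categories(words_by_category, gathering_log):
    """Keep each boundary word only in its highest-degree category."""
    updated = {c: set(ws) for c, ws in words_by_category.items()}
    for word in boundary_words(words_by_category):
        owners = {c for c, ws in words_by_category.items() if word in ws}
        degrees = membership_degrees(word, owners, gathering_log)
        if not degrees:
            continue  # no recorded evidence: leave the word where it is
        best = max(degrees, key=degrees.get)
        for c in owners - {best}:
            updated[c].discard(word)
    return updated


words_by_category = {"fruit": {"apple", "orange"}, "company": {"apple", "sony"}}
gathering_log = [("apple", "fruit"), ("apple", "company"), ("apple", "company")]
updated = update_categories(words_by_category, gathering_log)
```

In this toy run "apple" is gathered twice as a company and once as a fruit, so it is kept under "company" and removed from "fruit".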