Abstract:
An information processing apparatus includes a lexical analysis unit that generates a training word string, a group generation unit that generates a plurality of training word groups, a matrix generation unit that generates, for each training word group, a training matrix in which a plurality of words and respective semantic vectors of the words are associated, a classification unit that calculates, for a word of each position of the training word string, a probability of the word corresponding to a specific word, using the training matrices generated by the matrix generation unit and a determination model that uses a convolutional neural network, and an optimization processing unit that updates parameters of the determination model, such that the probability of the word labeled as corresponding to the specific word is high, among the probabilities of the words of the respective positions of the training word string calculated by the classification unit.
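The group and matrix generation steps can be illustrated with a minimal sketch. The abstract does not say how the word groups are formed, so the sliding window below is an assumption, as are the names `word_groups` and `training_matrix` and the zero vector used for words without a known semantic vector. The convolutional determination model and the parameter update are not shown.

```python
def word_groups(words, size):
    """Assumed grouping: every run of `size` consecutive words in the string."""
    return [words[i:i + size] for i in range(len(words) - size + 1)]


def training_matrix(group, vectors, dim):
    """Associate each word of a group with its semantic vector.

    Words missing from `vectors` get a zero vector (an assumption).
    """
    zero = [0.0] * dim
    return [(word, vectors.get(word, zero)) for word in group]


# Hypothetical two-dimensional semantic vectors for a tiny vocabulary.
vectors = {"company": [0.2, 0.7], "acquired": [0.9, 0.1]}
words = ["company", "acquired", "startup"]
matrices = [training_matrix(g, vectors, dim=2) for g in word_groups(words, 2)]
```

Each entry of `matrices` would then be one input to the determination model.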
Abstract:
An information processing apparatus includes a lexical analysis unit that generates a training word string, a pair generation unit that generates a plurality of training word pairs, a matrix generation unit that generates, for each training word pair, a training matrix in which a plurality of words and respective semantic vectors of the words are associated, a classification unit that calculates, for a word of each position of the training word string, a probability of the word corresponding to a specific word, using the training matrices generated by the matrix generation unit and a determination model that uses a convolutional neural network, and an optimization processing unit that updates parameters of the determination model, such that the probability of the word labeled as corresponding to the specific word is high, among the probabilities of the words of the respective positions of the training word string calculated by the classification unit.
Abstract:
Provided is a text processing system capable of classifying a plurality of texts into groups whose overviews can be grasped, and of classifying texts that semantically have an entailment relation into the same group even if the texts are not determined to have the entailment relation. Entailment recognition means 71 performs entailment recognition between given texts. Group generation means 72 selects an individual text and generates a group whose members are the texts entailing the selected text. Group integration means 73 integrates groups when the groups satisfy a predetermined condition based on the degree of overlap of members between the groups.
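A minimal sketch of group generation and group integration follows. The abstract specifies neither the entailment recognizer nor the overlap condition, so everything named here is an assumption: `toy_entails` is a hypothetical word-subset stand-in for real entailment recognition, and the overlap coefficient with a 0.5 threshold is one possible "predetermined condition".

```python
def generate_groups(texts, entails):
    """For each selected text t, group the texts that entail t (t included)."""
    return [{s for s in texts if s == t or entails(s, t)} for t in texts]


def overlap(a, b):
    """Overlap coefficient: shared members over the smaller group's size."""
    return len(a & b) / min(len(a), len(b))


def integrate_groups(groups, threshold=0.5):
    """Merge pairs of groups until no pair meets the overlap threshold."""
    merged = [set(g) for g in groups]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlap(merged[i], merged[j]) >= threshold:
                    merged[i] |= merged[j]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


def toy_entails(s, t):
    """Hypothetical stand-in: s entails t if every word of t occurs in s."""
    return set(t.split()) <= set(s.split())


texts = [
    "the screen is broken and cracked",
    "the screen is broken",
    "the battery drains fast",
    "the battery drains",
]
groups = integrate_groups(generate_groups(texts, toy_entails))
```

With this toy data the four per-text groups collapse into one group about the screen and one about the battery.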
Abstract:
A text processing system that is able to appropriately determine textual entailment between sentences with high coverage is provided. The text processing system is configured to execute: processing of extracting, based on a structure representing a first sentence and a structure representing a second sentence, at least one common substructure that is a partial structure of the same type common to the first sentence and the second sentence; processing of extracting at least one of a feature amount representing a dependency relationship between the common substructures in the first and second sentences and a feature amount representing a dependency relationship between a common substructure in the first and second sentences and a substructure different from the common substructure; and processing of determining an entailment relationship between the first sentence and the second sentence by using the extracted feature amount.
Abstract:
A text mining device includes: an analysis unit which acquires, from data including text and one or more attributes including an attribute name and an attribute value and associated with the text, the attributes as analysis viewpoints, analyzes the data using the respective analysis viewpoints to obtain an analysis result from each analysis viewpoint, and generates result vectors of the respective analysis viewpoints; a similarity acquisition unit which acquires a vector similarity between the result vectors of the plural analysis viewpoints; and a recommendation unit which extracts and outputs a combination of the analysis viewpoints as a recommendation candidate on the basis of the vector similarity.
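The similarity acquisition and recommendation steps might look like the sketch below. It assumes the result vectors already exist and have equal length, uses cosine similarity as the vector similarity, and ranks pairs from most to least similar. None of these choices is stated in the abstract, which does not say which similarity is used or whether similar or dissimilar viewpoints are recommended. The names `cosine` and `rank_viewpoint_pairs` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_viewpoint_pairs(result_vectors):
    """Return viewpoint pairs with their similarity, most similar first."""
    pairs = [
        ((a, b), cosine(result_vectors[a], result_vectors[b]))
        for a, b in combinations(sorted(result_vectors), 2)
    ]
    return sorted(pairs, key=lambda item: item[1], reverse=True)


# Hypothetical result vectors, one per analysis viewpoint (attribute name).
result_vectors = {"age": [1, 0, 1], "region": [1, 0, 1], "product": [0, 1, 0]}
ranking = rank_viewpoint_pairs(result_vectors)
```

The top entry of `ranking` would be the first recommendation candidate under these assumptions.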
Abstract:
A method for classifying a new instance including a text document by using training instances with class including labeled data and zero or more training instances with class including unlabeled data, comprising: estimating a word distribution for each class by using the labeled data and the unlabeled data; estimating a background distribution and a degree of interpolation between the background distribution and the word distribution by using the labeled data and the unlabeled data; calculating two probabilities, namely the probability that a word is generated from the word distribution and the probability that the word is generated from the background distribution; combining the two probabilities by using the interpolation; combining the resulting probabilities of all words to estimate a document probability for the class that indicates the document is generated from the class; and classifying the new instance into the class for which the document probability is the highest.
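The interpolation and classification steps can be sketched as follows. This is a simplified illustration, not the claimed estimation procedure: the distributions are estimated from labeled data only with add-one smoothing, and the interpolation weight `lam` is fixed by hand, whereas the abstract estimates both from labeled and unlabeled data. All function names are hypothetical.

```python
import math
from collections import Counter


def word_distribution(docs, vocab, alpha=1.0):
    """Smoothed word distribution over `vocab` from tokenized documents."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def log_doc_probability(doc, class_dist, background, lam):
    """Sum of log interpolated word probabilities; unknown words are skipped."""
    return sum(
        math.log(lam * class_dist[w] + (1 - lam) * background[w])
        for w in doc
        if w in background
    )


def classify(doc, class_dists, background, lam=0.7):
    """Pick the class with the highest document probability."""
    return max(
        class_dists,
        key=lambda c: log_doc_probability(doc, class_dists[c], background, lam),
    )


labeled = {
    "sports": [["goal", "match", "team"], ["team", "win"]],
    "tech": [["code", "bug", "team"], ["code", "release"]],
}
all_docs = [doc for docs in labeled.values() for doc in docs]
vocab = sorted({w for doc in all_docs for w in doc})
background = word_distribution(all_docs, vocab)
class_dists = {c: word_distribution(docs, vocab) for c, docs in labeled.items()}
```

Calling `classify(["goal", "team"], class_dists, background)` then compares the per-class document probabilities. Working in log space avoids underflow when documents are long.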
Abstract:
An entailment evaluation device includes: a generation unit which generates first information indicating at least the order of occurrence of events of first and second simple sentences included in a hypothesis text and generates second information indicating at least the order of occurrence of events of third and fourth simple sentences included in a target text, the third simple sentence being related to the first simple sentence, the fourth simple sentence being related to the second simple sentence; a calculation unit which obtains a calculation result by comparing, based on the first and second information, the order of occurrence of events of the first and second simple sentences and the order of occurrence of events of the third and fourth simple sentences; and a determination unit which determines, based on at least the calculation result, whether or not the target text entails the hypothesis text.
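The comparison of event orders can be illustrated with a small sketch. It assumes each simple sentence's event has already been assigned an integer time rank (lower means earlier) and that a separate content check has aligned the sentences. The names `occurrence_order`, `order_consistent`, and `entails` are hypothetical, and combining the two checks with a logical AND is only one way to base the decision "at least" on the calculation result.

```python
def occurrence_order(first_rank, second_rank):
    """Describe the order of two events given their time ranks."""
    if first_rank < second_rank:
        return "before"
    if first_rank > second_rank:
        return "after"
    return "same"


def order_consistent(hypothesis_ranks, target_ranks):
    """True if the aligned event pairs occur in the same order in both texts."""
    return occurrence_order(*hypothesis_ranks) == occurrence_order(*target_ranks)


def entails(content_matches, hypothesis_ranks, target_ranks):
    """Assumed decision rule: content must match and event order must agree."""
    return content_matches and order_consistent(hypothesis_ranks, target_ranks)


# Hypothesis: "He ate lunch and then left." (eat = 1, leave = 2)
# Target A:   "He left after eating lunch."  (eat = 1, leave = 2)
# Target B:   "He left before eating lunch." (eat = 2, leave = 1)
target_a_entails = entails(True, (1, 2), (1, 2))
target_b_entails = entails(True, (1, 2), (2, 1))
```

Under these assumptions only target A is judged to entail the hypothesis, even though both targets mention the same two events.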
Abstract:
A similar data search device includes: an inverted index generating unit which determines size ranges of sets of search targets for each of inverted indexes so that the number of sets of search targets is not smaller than a specified number and generates inverted indexes by dividing the sets of search targets according to the determined size ranges; an unnecessary inverted index identifying unit which determines, based on a size of a set of search conditions and a threshold value specified for a similarity between sets, a condition necessary for the similarity to be no smaller than the threshold value, and identifies, as an inverted index unnecessary for searches, any inverted index other than those inverted indexes containing a set whose minimum size value satisfies the condition; and a data search unit which conducts a search on a non-identified inverted index.
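A minimal sketch of size-partitioned inverted indexes with pruning follows. It assumes Jaccard similarity, for which a set of size s can reach similarity t with a query of size q only if t*q <= s <= q/t. The abstract does not name the similarity measure, and the overlap test on each index's size range is a simplification of its condition on the minimum size value. All names are hypothetical.

```python
from collections import defaultdict


def build_indexes(sets_by_id, min_sets):
    """Split sets into size ranges of at least `min_sets` sets, one index each."""
    ordered = sorted(sets_by_id, key=lambda i: len(sets_by_id[i]))
    buckets, current = [], []
    for i in ordered:
        # Close a bucket only at a size boundary so the ranges do not overlap.
        if len(current) >= min_sets and len(sets_by_id[i]) > len(sets_by_id[current[-1]]):
            buckets.append(current)
            current = []
        current.append(i)
    if current:
        if buckets and len(current) < min_sets:
            buckets[-1].extend(current)  # too few left: fold into last range
        else:
            buckets.append(current)
    indexes = []
    for ids in buckets:
        postings = defaultdict(set)
        for i in ids:
            for element in sets_by_id[i]:
                postings[element].add(i)
        sizes = [len(sets_by_id[i]) for i in ids]
        indexes.append({"min": min(sizes), "max": max(sizes), "postings": dict(postings)})
    return indexes


def search(indexes, sets_by_id, query, threshold):
    """Return ids whose Jaccard similarity to `query` is at least `threshold`."""
    low, high = threshold * len(query), len(query) / threshold
    hits = []
    for index in indexes:
        if index["max"] < low or index["min"] > high:
            continue  # no set in this size range can reach the threshold
        candidates = set()
        for element in query:
            candidates |= index["postings"].get(element, set())
        for i in candidates:
            s = sets_by_id[i]
            if len(s & query) / len(s | query) >= threshold:
                hits.append(i)
    return sorted(hits)


sets_by_id = {
    "a": {1}, "b": {1, 2}, "c": {1, 2, 3}, "d": {1, 2, 3, 4},
    "e": set(range(1, 9)), "f": set(range(1, 10)),
}
indexes = build_indexes(sets_by_id, min_sets=2)
hits = search(indexes, sets_by_id, {1, 2, 3}, threshold=0.7)
```

In this example the indexes covering sizes 1 to 2 and 8 to 9 are skipped entirely, and only the index for sizes 3 to 4 is scanned. The sketch assumes a non-empty query and a threshold greater than zero.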
Abstract:
A reasoning system that enables reasoning when there is a shortage of knowledge. An input unit receives a start state and an end state. A rule candidate generation unit identifies a first state, obtained by tracking one or more known rules from the start state, and a second state, obtained by backtracking one or more known rules from the end state, respectively. The generation unit generates a rule candidate relating to the first state and the second state or generates a rule candidate relating to the first state and a rule candidate relating to the second state. A rule selection unit selects, based on feasibility of the generated rule candidate, which is calculated based on one or more known rules, the generated rule candidate as a new rule. A derivation unit derives the end state from the start state, based on one or more known rules and the new rule.
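The rule candidate generation and selection can be sketched as below, with rules modeled as (premise, conclusion) pairs over single states. The feasibility score is a made-up heuristic that prefers candidates bridging states with no known outgoing or incoming rule; the abstract says only that feasibility is calculated from the known rules. For simplicity the start and end states themselves are also allowed as candidate endpoints. All names are hypothetical.

```python
def reachable(start, rules):
    """All states reachable from `start` by following (premise, conclusion) rules."""
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for premise, conclusion in rules:
            if premise == state and conclusion not in seen:
                seen.add(conclusion)
                frontier.append(conclusion)
    return seen


def gap_feasibility(rules):
    """Hypothetical score: fewer known rules around a candidate is better."""
    def score(candidate):
        premise, conclusion = candidate
        outgoing = sum(1 for p, _ in rules if p == premise)
        incoming = sum(1 for _, c in rules if c == conclusion)
        return -(outgoing + incoming)
    return score


def propose_rule(start, end, rules, feasibility):
    """Return a new rule linking forward and backward states, or None."""
    forward = reachable(start, rules)
    if end in forward:
        return None  # known rules already suffice
    backward = reachable(end, [(c, p) for p, c in rules])
    candidates = [(f, b) for f in forward for b in backward]
    return max(candidates, key=feasibility)


rules = [("rain", "wet_road"), ("slippery", "accident")]
new_rule = propose_rule("rain", "accident", rules, gap_feasibility(rules))
derivable = "accident" in reachable("rain", rules + [new_rule])
```

Here the selected candidate links "wet_road" to "slippery", after which the end state can be derived from the start state.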
Abstract:
A document classification method includes a first step for calculating smoothing weights for each word and a fixed class, a second step for calculating a smoothed second-order word probability, and a third step for classifying a document, including calculating the probability that the document belongs to the fixed class.