Abstract:
A term synonym acquisition apparatus includes: a first generating unit which generates a context vector of an input term in an original language and a context vector of each synonym candidate in the original language; a second generating unit which generates a context vector of an auxiliary term in an auxiliary language that is different from the original language, where the auxiliary term specifies a sense of the input term; a combining unit which generates a combined context vector based on the context vector of the input term and the context vector of the auxiliary term; and a ranking unit which compares the combined context vector with the context vector of each synonym candidate to generate ranked synonym candidates in the original language.
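The abstract does not say how the two context vectors are combined or compared. Below is a minimal Python sketch under several assumptions. Context vectors are sparse dicts. Combination is a weighted sum of L2-normalized vectors, controlled by a hypothetical parameter `alpha`. Comparison uses cosine similarity. The auxiliary-term vector is assumed to have already been mapped into original-language context features, for example through a bilingual dictionary; that mapping is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def normalize(v):
    """Scale a sparse vector to unit L2 norm (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(w * w for w in v.values()))
    return {k: w / n for k, w in v.items()} if n else dict(v)


def combine(input_vec, aux_vec, alpha=0.5):
    # Hypothetical combination rule: weighted sum of the normalized vectors.
    # Assumes aux_vec is already expressed in original-language context features.
    a, b = normalize(input_vec), normalize(aux_vec)
    return {k: (1 - alpha) * a.get(k, 0.0) + alpha * b.get(k, 0.0) for k in set(a) | set(b)}


def rank_synonyms(input_vec, aux_vec, candidates, alpha=0.5):
    """Return (candidate, score) pairs sorted by similarity to the combined vector."""
    combined = combine(input_vec, aux_vec, alpha)
    scored = [(term, cosine(combined, vec)) for term, vec in candidates.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)
```

For an ambiguous input term such as "bank" with contexts `{"money": 2, "river": 2}`, an auxiliary vector dominated by "river" pushes a candidate whose contexts mention rivers above one whose contexts mention loans.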
Abstract:
A term translation acquisition apparatus includes: a creation unit which creates a statistical model based on the context vectors of a set of input terms, wherein the set includes at least two terms that are in the same source language and describe the same concept; and a ranking unit which uses the created statistical model to score terms in a target language that are considered translation candidates for the concept.
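The abstract leaves the form of the statistical model open. One possibility is sketched below. It fits a diagonal Gaussian to the source-language context vectors: a per-dimension mean and variance, with a hypothetical variance floor `var_floor`. It then ranks target-language candidates by log-likelihood. The sketch assumes all vectors are dense lists that have already been projected into a shared feature space, for example through a bilingual dictionary; that projection is not shown.

```python
import math


def fit_diag_gaussian(vectors, var_floor=1e-3):
    """Fit a per-dimension mean and variance to a set of equal-length vectors."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(vec, model):
    """Log-density of vec under the diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(vec, means, variances)
    )


def rank_translations(source_vectors, candidates):
    """Score target-language candidates with a model built from >= 2 source terms."""
    if len(source_vectors) < 2:
        raise ValueError("need at least two source-language terms for the concept")
    model = fit_diag_gaussian(source_vectors)
    scored = [(t, log_likelihood(v, model)) for t, v in candidates.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)
```

Modeling the whole set of source terms, rather than a single term, lets dimensions on which the terms agree (low variance) weigh more heavily in the score.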
Abstract:
A synonymous expression assessment device includes: synonymy assessment means for receiving input binary relations, each consisting of a nominal and a predicate, and assessing whether the input binary relations are synonymous using the similarity between the input nominals and the similarity between the input predicates; and inter-predicate similarity computation means for computing the similarity between the input predicates from the distribution of occurrence frequencies, in a document set, of nominals that form binary relations with each input predicate, where the computation uses the distribution of only those nominals that belong to the same concept type as the input nominal.
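A minimal Python sketch of the type-filtered predicate similarity and the final synonymy decision follows. It makes several assumptions not stated in the abstract. The document set is given as `(nominal, predicate)` occurrence pairs. Concept types come from a lookup dict. Nominal similarity is supplied by the caller through a hypothetical `nominal_sim` function. Predicate similarity is the cosine of nominal-count distributions. Both similarities are compared against hypothetical thresholds.

```python
import math
from collections import Counter


def cosine(c1, c2):
    """Cosine similarity between two count distributions."""
    dot = sum(c1[k] * c2.get(k, 0) for k in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def predicate_similarity(p1, p2, pairs, concept_type, target_type):
    """Compare two predicates by the nominals they co-occur with, counting only
    nominals whose concept type equals target_type."""
    dist = {p1: Counter(), p2: Counter()}
    for nominal, pred in pairs:
        if pred in dist and concept_type.get(nominal) == target_type:
            dist[pred][nominal] += 1
    return cosine(dist[p1], dist[p2])


def is_synonymous(rel1, rel2, pairs, concept_type, nominal_sim,
                  nom_threshold=0.5, pred_threshold=0.5):
    """rel1, rel2: (nominal, predicate) binary relations."""
    (n1, p1), (n2, p2) = rel1, rel2
    target_type = concept_type.get(n1)  # filter by the input nominal's concept type
    return (nominal_sim(n1, n2) >= nom_threshold and
            predicate_similarity(p1, p2, pairs, concept_type, target_type) >= pred_threshold)
```

In a toy corpus where "buy" also occurs with the place name "Tokyo", the type filter removes that occurrence when the input nominal is a product, so "buy" and "purchase" compare only on product nominals.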
Abstract:
A text mining system includes: an analysis target search unit which judges whether a commonality in expressions exists among text data; an analysis viewpoint generation unit which generates an analysis viewpoint for extracting expressions from the target data; a positive example set identification unit which identifies, in the target data, a positive example set containing expressions that match the generated analysis viewpoint; a characteristic quantity calculation unit which calculates a characteristic quantity indicating the degree to which an expression in the target data characterizes the positive example set; and a characteristic expression ranking unit which extracts, as characteristic expressions, expressions whose calculated characteristic quantity is equal to or greater than a predetermined threshold and ranks the extracted characteristic expressions. The analysis target search unit extracts analysis viewpoints among which the difference in the ranks assigned to the characteristic expressions is equal to or greater than a predetermined threshold.
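The abstract does not define the characteristic quantity. The sketch below covers only the positive-example-set, characteristic-quantity, and ranking steps; the search across viewpoint rankings is not shown. It assumes documents are represented as sets of expressions and an analysis viewpoint is a single expression. As a stand-in characteristic quantity it uses lift, P(e | positive set) / P(e | all documents), with a hypothetical `threshold`.

```python
def positive_set(docs, viewpoint):
    """Documents (sets of expressions) that match the viewpoint expression."""
    return [d for d in docs if viewpoint in d]


def characteristic_ranking(docs, viewpoint, threshold=1.5):
    """Rank expressions whose lift over the positive set is >= threshold."""
    pos = positive_set(docs, viewpoint)
    if not pos:
        return []
    vocab = set().union(*docs) - {viewpoint}
    scores = {}
    for e in vocab:
        p_pos = sum(e in d for d in pos) / len(pos)
        p_all = sum(e in d for d in docs) / len(docs)  # > 0: e comes from docs
        scores[e] = p_pos / p_all
    kept = [(e, s) for e, s in scores.items() if s >= threshold]
    # Highest score first; ties broken alphabetically for a stable ranking.
    return [e for e, _ in sorted(kept, key=lambda p: (-p[1], p[0]))]
```

With the viewpoint "battery", expressions that occur mainly in battery-related documents (for example "short" or "heat") score above 1, while expressions spread evenly across the collection score about 1 and fall below the threshold.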
Abstract:
A textual entailment recognition apparatus (2) includes: a vector generation unit (21) that generates, for each of a first text and a second text, a vector for each predicate-argument structure using words other than those indicating the argument types of the predicate in that predicate-argument structure; a combination identification unit (22) that compares the vectors generated for the predicate-argument structures of the first text with those generated for the predicate-argument structures of the second text, and identifies combinations of predicate-argument structures of the first text and predicate-argument structures of the second text based on the comparison results; and an entailment determination unit (23) that obtains a feature amount for each identified combination and determines, based on the obtained feature amounts, whether the first text entails the second text.
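The three units can be sketched in Python as follows. This is a minimal illustration under several assumptions. A predicate-argument structure is a dict whose `args` keys are argument-type labels and whose values are argument words. Vectors are bag-of-words counts that leave out those labels. Each structure of the second text is paired with its most similar structure of the first text. The only feature is that pair's cosine similarity. The decision is a hypothetical rule: the smallest similarity must reach `threshold`. A real system would likely use richer features and a trained classifier.

```python
import math
from collections import Counter


def pas_vector(pas):
    """Bag-of-words vector from the predicate and its argument words.
    Argument-type labels (the keys of pas["args"]) are deliberately excluded."""
    return Counter([pas["pred"], *pas["args"].values()])


def cosine(c1, c2):
    dot = sum(c1[k] * c2.get(k, 0) for k in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def identify_combinations(pas_list1, pas_list2):
    """Pair each structure of text 2 with its most similar structure of text 1."""
    if not pas_list1:
        return []
    vecs1 = [pas_vector(p) for p in pas_list1]
    combos = []
    for j, p2 in enumerate(pas_list2):
        v2 = pas_vector(p2)
        sims = [cosine(v1, v2) for v1 in vecs1]
        i = max(range(len(sims)), key=sims.__getitem__)
        combos.append((i, j, sims[i]))
    return combos


def entails(pas_list1, pas_list2, threshold=0.6):
    features = [sim for _, _, sim in identify_combinations(pas_list1, pas_list2)]
    return bool(features) and min(features) >= threshold
```

Because the argument-type labels are excluded, "John buys a car" matches a structure annotated with different role labels (for example `agent`/`object` versus `subj`/`obj`) as long as the predicate and argument words agree.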
Abstract:
A text mining apparatus, a text mining method, and a program are provided that reduce the influence of computer processing errors on mining results when text mining is performed on a plurality of text data pieces that include a text data piece generated by computer processing. The text mining apparatus 1 includes: an inherent portion extraction unit 6 that, for each of the plurality of text data pieces, extracts an inherent portion of that piece relative to another of the text data pieces; an inherent confidence setting unit 7 that, for each inherent portion of each text data piece, sets an inherent confidence indicating the confidence of that inherent portion, using the confidence set for each of the text data pieces; and a mining processing unit 8 that performs text mining on each inherent portion of each text data piece using the inherent confidence.
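A minimal Python sketch of the three units follows, under several assumptions. Text data pieces come in pairs (for example, a manually written call memo and a speech-recognition transcript of the same call), each given as a token list with a piece-level confidence. The inherent portion is the token-level set difference. The inherent confidence is simply the confidence of the piece the portion came from; this is a hypothetical rule, since the abstract does not specify one. The mining step is a confidence-weighted frequency count.

```python
from collections import Counter


def inherent_portion(tokens, other_tokens):
    """Tokens of one piece that do not appear in the other piece."""
    other = set(other_tokens)
    return [t for t in tokens if t not in other]


def inherent_confidence(piece_confidence):
    # Hypothetical rule: an inherent portion inherits its piece's confidence.
    return piece_confidence


def mine(pairs):
    """pairs: list of ((tokens_a, conf_a), (tokens_b, conf_b)).
    Returns confidence-weighted counts of inherent tokens."""
    counts = Counter()
    for (tok_a, conf_a), (tok_b, conf_b) in pairs:
        for tokens, conf, other in ((tok_a, conf_a, tok_b), (tok_b, conf_b, tok_a)):
            weight = inherent_confidence(conf)
            for t in inherent_portion(tokens, other):
                counts[t] += weight
    return counts
```

Under this rule, a word that appears only in a low-confidence transcript, and so may be a recognition error, contributes less to the mining result than a word that appears only in the high-confidence memo.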
Abstract:
A text processing apparatus includes a segment determination unit 36 and a descriptive content determination unit 33. A first text set as the analysis target is called the analysis target text, and a homogeneous segment is a segment, contained in another first text, that is similar to a segment of the analysis target text. For each homogeneous segment, the segment determination unit 36 determines whether its content is included in a second text. Based on this determination result, the descriptive content determination unit 33 determines whether each segment of the analysis target text should be described in a corresponding second text.
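A Python sketch of this two-step decision follows, under stated assumptions. First texts (for example, call transcripts) and second texts (for example, the reports written from them) are paired by index. Each text is a list of segments, and each segment is a token list. Similarity is Jaccard overlap. The hypothetical thresholds `sim_threshold` and `ratio` decide, respectively, what counts as homogeneous and how many homogeneous segments must appear in their own second texts.

```python
def jaccard(a, b):
    """Jaccard overlap between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def is_included(segment, second_text, threshold=0.5):
    """True if some segment of the second text is similar enough to this segment."""
    return any(jaccard(segment, s) >= threshold for s in second_text)


def should_describe(target_index, first_texts, second_texts, sim_threshold=0.5, ratio=0.5):
    """Return one boolean per segment of first_texts[target_index]."""
    decisions = []
    for seg in first_texts[target_index]:
        votes = []
        for k, other in enumerate(first_texts):
            if k == target_index:
                continue
            for h in other:
                if jaccard(seg, h) >= sim_threshold:  # h is a homogeneous segment
                    votes.append(is_included(h, second_texts[k]))
        # Describe the segment if enough homogeneous segments were described elsewhere.
        decisions.append(bool(votes) and sum(votes) / len(votes) >= ratio)
    return decisions
```

In effect, segments whose counterparts in other documents were usually carried into the corresponding reports are flagged as content that should also appear in the analysis target's report.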
Abstract:
Provided is a text mining device that properly analyzes differences between a plurality of related document data. The device includes: an element extracting section 140 that extracts language elements from each of two or more related document data; a differential processing section 150 that extracts differences between the document data by comparing, across the document data, the elements extracted by the element extracting section 140; and a statistical processing section 170 that performs statistical processing on the differences extracted by the differential processing section 150. The differential processing section 150 includes: an element associating section 151 that compares the extracted elements across the document data and associates elements that are in an identical, similar, synonymous, or analogous relation; and a differential element extracting section 152 that extracts elements that have no counterpart in the associations made by the element associating section 151.
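A minimal Python sketch of sections 140 to 170 follows, under several assumptions. Element extraction is lowercase whitespace tokenization. Association covers only identity and a hypothetical synonym dictionary, not the similar or analogous relations. Each element is matched at most once, greedily. Statistical processing is a frequency count of the unmatched elements.

```python
from collections import Counter


def extract_elements(text):
    """Toy element extractor: lowercase whitespace tokens."""
    return text.lower().split()


def associate(elems_a, elems_b, synonyms):
    """Greedily pair elements that are identical or listed as synonyms."""
    def related(x, y):
        return x == y or y in synonyms.get(x, ()) or x in synonyms.get(y, ())

    pairs, used_b = [], set()
    for i, x in enumerate(elems_a):
        for j, y in enumerate(elems_b):
            if j not in used_b and related(x, y):
                pairs.append((i, j))
                used_b.add(j)
                break
    return pairs


def differential_elements(elems_a, elems_b, synonyms):
    """Elements of each side that have no counterpart in the association."""
    pairs = associate(elems_a, elems_b, synonyms)
    matched_a = {i for i, _ in pairs}
    matched_b = {j for _, j in pairs}
    only_a = [x for i, x in enumerate(elems_a) if i not in matched_a]
    only_b = [y for j, y in enumerate(elems_b) if j not in matched_b]
    return only_a, only_b


def difference_statistics(doc_pairs, synonyms):
    """Count how often each differential element occurs across document pairs."""
    counts = Counter()
    for a, b in doc_pairs:
        only_a, only_b = differential_elements(
            extract_elements(a), extract_elements(b), synonyms)
        counts.update(only_a + only_b)
    return counts
```

Associating synonyms before taking the difference keeps a pure wording change (for example "PC" versus "computer") from being counted as a difference between the documents.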
Abstract:
A text mining apparatus, a text mining method, and a program are provided that accurately discriminate the inherent portions of each of a plurality of text data pieces that include a text data piece generated by computer processing. The text mining apparatus 1 performs text mining on a plurality of text data pieces, including a text data piece generated by computer processing. A confidence is set for each of the text data pieces. The text mining apparatus 1 includes an inherent portion extraction unit 6 that extracts an inherent portion of each text data piece relative to another of the text data pieces, using the confidence set for each of the text data pieces.
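The abstract does not say how the confidence enters the extraction. One plausible scheme is sketched below in Python, and the scoring rule is a hypothetical assumption. Each piece is a token list with a piece-level confidence. A token of piece i scores `conf_i * (1 - c)`, where `c` is the highest confidence among the other pieces that also contain the token, or 0 if none do. Tokens scoring at least a hypothetical `threshold` are extracted as inherent.

```python
def inherent_scores(pieces):
    """pieces: list of (tokens, confidence). Returns one {token: score} dict per piece."""
    results = []
    for i, (tokens, conf_i) in enumerate(pieces):
        scores = {}
        for t in set(tokens):
            # Highest confidence of any other piece that also contains t (0 if none).
            shared_conf = max(
                (c for j, (toks, c) in enumerate(pieces) if j != i and t in toks),
                default=0.0,
            )
            # Hypothetical rule: a match in a low-confidence piece is weak
            # evidence that t is shared, and a low-confidence piece's own
            # tokens are weak evidence of being inherent.
            scores[t] = conf_i * (1.0 - shared_conf)
        results.append(scores)
    return results


def extract_inherent(pieces, threshold=0.5):
    """Sorted inherent tokens for each piece."""
    return [sorted(t for t, s in scores.items() if s >= threshold)
            for scores in inherent_scores(pieces)]
```

With a memo at confidence 1.0 and a speech-recognition transcript at 0.4, a word found only in the transcript scores 0.4. A strict threshold therefore excludes it as a possible recognition error, while a word found only in the memo scores 1.0 and is kept.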