摘要:
In case plural pieces of data are analyzed, parts of these pieces of data including a difference which should be compared and analyzed with priority are analyzed exhaustively, with suppressing a cost of analyzing.A text mining system includes an analysis target data pair search unit which judges whether there is a commonality in expressions among pieces of text data, the pieces of text data being included in plural pieces of analysis target data, respectively, an analysis viewpoint generation unit which generates an analysis viewpoint which is a condition to extract an expression from each of the pieces of analysis target data with a commonality in such a way that characteristic expression lists, each of which is a set of characteristic expressions satisfying a predetermined condition in text data included in the pieces of analysis target data, are different among the pieces of analysis target data, a positive example set identification unit which identifies a positive example set including an expression matching the generated analysis viewpoint in each of the pieces of analysis target data, a characteristic quantity calculation unit which calculates a characteristic quantity showing a degree of characterizing the positive example set for each of expressions in each of the pieces of analysis target data, and a characteristic expression ranking unit which extracts expressions each having the calculated characteristic quantity being equal to or greater than a predetermined threshold as characteristic expressions and provides ranks for the extracted characteristic expressions in descending order of the calculated characteristic quantity, wherein the analysis target data pair search unit extracts the analysis viewpoint for the pieces of analysis target data among which a difference in ranks provided for each of the characteristic expressions is equal to or greater than a predetermined threshold.
摘要:
Disclosed are a text mining system, text mining method, and recording medium for suppressing increase in cost of analysis for an analyst even if, when analyzing a plurality of data to be analyzed, the data are to be integrally analyzed. The text mining system comprises a data set generation unit for generating a data set to be analyzed that includes data to be analyzed that include text data; and a data set search unit for searching for a data set to be analyzed of which the feature representation coverage exceeds a value given beforehand, or the cost of analysis does not exceed a value given beforehand from data sets to be analyzed generated by the data set generation unit; wherein the feature representation coverage is the ratio of the number of feature representations included in a feature representation list which is a group of feature representations, which are representations satisfying predetermined conditions from text data within the data set to be analyzed, to the number of feature representations in all data to be analyzed; and the cost of analysis is defined on the basis of the number of feature representations included in the data set to be analyzed.
摘要:
A text mining apparatus, a text mining method, and a program are provided that accurately discriminate inherent portions of each of a plurality of text data pieces including a text data piece generated by computer processing.A text mining apparatus 1 to be used performs text mining using, as targets, a plurality of text data pieces including a text data piece generated by computer processing. Confidence is set for each of the text data pieces. The text mining apparatus 1 includes an inherent portion extraction unit 6 that extracts an inherent portion of each text data piece relative to another of the text data pieces, using the confidence set for each of the text data pieces.
摘要:
A text processing apparatus is provided with a segment determination unit 36 and a descriptive content determination unit 33. The segment determination unit 36 determines, with respect to a homogeneous segment that is similar to segments constituting a first text which is set as an analysis target (analysis target text) and that is included in another first text, whether the content thereof is included in a second text. The descriptive content determination unit 33 determines whether each segment constituting the analysis target text should be described in a corresponding second text, based on the determination result.
摘要:
Disclosed are a text mining method, device, and program capable of performing text mining with a specific topic as an object with high precision. An element identification unit calculates a feature degree, which is an index for indicating a degree that within a text set of interest, which is a set of text that is to be analyzed, an element of the text appears. An output unit identifies distinctive elements within the text set of interest on the basis of the calculated feature degree and outputs the identified elements. The element identification unit corrects the feature degree on the basis of a topic relatedness degree, which is a value indicating a degree related to a topic of analysis, which is a topic for which each text portion of the text being analyzed has been partitioned into predetermined units that are to be analyzed.
摘要:
An information analysis device (30) comprises a relevant portion identification unit (31) that compares analyzed target text with topic-related text that is written about the same event as the analyzed target text and includes information related to a specific topic, and that specifies a portion of the analyzed target text related to the topic-related text; a potential topic word extraction unit (32) that extracts a word of the specific portion; and a statistical model generation unit (33) that generates a statistical model that estimates a degree of appearance of a word on a specific topic of the analyzed target text. The statistical model generation unit (33) generates a statistical model such that degrees of appearance in a specific topic of the topic-related text word and of the extracted word are higher than those of other words.
摘要:
A text mining apparatus, a text mining method, and a program are provided that accurately discriminate inherent portions of each of a plurality of text data pieces including a text data piece generated by computer processing.A text mining apparatus 1 to be used performs text mining using, as targets, a plurality of text data pieces including a text data piece generated by computer processing. Confidence is set for each of the text data pieces. The text mining apparatus 1 includes an inherent portion extraction unit 6 that extracts an inherent portion of each text data piece relative to another of the text data pieces, using the confidence set for each of the text data pieces.
摘要:
A text mining system including an analysis target search unit which judges whether a commonality in expressions among text data exists, an analysis viewpoint generation unit which generates an analysis viewpoint to extract an expression from the target data, a positive example set identification unit which identifies a positive example set including an expression matching the generated analysis viewpoint in the target data, a characteristic quantity calculation unit which calculates a characteristic quantity showing a degree of characterizing the positive example set of expressions in the target data, and a characteristic expression ranking unit which extracts expressions having the calculated characteristic quantity equal to or greater than a predetermined threshold as characteristic expressions and ranks the extracted characteristic expressions, and the target search unit extracts the analysis viewpoint among which a difference in ranks provided for the characteristic expressions is equal to or greater than a predetermined threshold.
摘要:
A text mining apparatus, a text mining method, and a program are provided that enable the influence that computer processing errors have on mining results to be reduced during text mining performed on a plurality of text data pieces including a text data piece generated by computer processing. A text mining apparatus 1 to be used includes an inherent portion extraction unit 6 that, for each of a plurality of text data pieces including a text data piece generated by computer processing, extracts an inherent portion of the text data piece relative to another of the text data pieces, an inherent confidence setting unit 7 that, for each inherent portion of each of the text data pieces, sets inherent confidence indicating confidence of the inherent portion, using the confidence that has been set for each of the text data pieces, and a mining processing unit 8 that performs text mining on each inherent portion of each of the text data pieces, using the inherent confidence.
摘要:
A text processing apparatus is provided with a segment determination unit 36 and a descriptive content determination unit 33. The segment determination unit 36 determines, with respect to a homogeneous segment that is similar to segments constituting a first text which is set as an analysis target (analysis target text) and that is included in another first text, whether the content thereof is included in a second text. The descriptive content determination unit 33 determines whether each segment constituting the analysis target text should be described in a corresponding second text, based on the determination result.