Abstract:
A text visualization system allows a user to ascertain the relationship between viewpoints across categories when clustering documents that include texts of a plurality of categories. Each of a plurality of documents includes texts of each of a plurality of categories. For each of the categories, a text and an element text that entails the text are set among the texts included in the plurality of documents. A text display displays a plurality of texts of each of one or more categories among the plurality of categories. In response to a reception unit receiving a designation of a text of a specific category, the text display extracts, from a plurality of texts of another category, a text that entails an element text of the other category included in a document including an element text that entails the text of the specific category, and displays the extracted text.
Abstract:
An entailment evaluation device includes: a generation unit which generates first information indicating at least the order of occurrence of events of first and second simple sentences included in a hypothesis text and generates second information indicating at least the order of occurrence of events of third and fourth simple sentences included in a target text, the third simple sentence being related to the first simple sentence, the fourth simple sentence being related to the second simple sentence; a calculation unit which obtains a calculation result by comparing, based on the first and second information, the order of occurrence of events of the first and second simple sentences and the order of occurrence of events of the third and fourth simple sentences; and a determination unit which determines, based on at least the calculation result, whether or not the target text entails the hypothesis text.
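The order comparison described above can be illustrated with a minimal sketch. It assumes that each text has already been split into simple sentences, that each simple sentence carries a position in the order of occurrence of its event, and that an alignment from hypothesis sentences to related target sentences is given. The names `order_sign`, `order_consistent`, and the dictionary layout are hypothetical. The abstract states that the determination uses at least this result, so the check below is one signal, not a complete entailment decision.

```python
from itertools import combinations


def order_sign(order, a, b):
    """Return -1, 0, or 1 depending on whether event a occurs before, with, or after event b."""
    return (order[a] > order[b]) - (order[a] < order[b])


def order_consistent(hyp_order, tgt_order, alignment):
    """Check that every aligned pair of events occurs in the same relative order.

    hyp_order / tgt_order map a simple-sentence id to its event position.
    alignment maps each hypothesis sentence id to its related target sentence id.
    All three structures are assumed inputs for this illustration.
    """
    return all(
        order_sign(hyp_order, a, b) == order_sign(tgt_order, alignment[a], alignment[b])
        for a, b in combinations(list(alignment), 2)
    )


# Hypothesis: event h1 occurs before event h2.
hyp = {"h1": 0, "h2": 1}
align = {"h1": "t1", "h2": "t2"}
same_order = order_consistent(hyp, {"t1": 0, "t2": 1}, align)
reversed_order = order_consistent(hyp, {"t1": 1, "t2": 0}, align)
```

Here `same_order` is true and `reversed_order` is false, so a target text that reports the two events in the opposite order would fail this particular check.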
Abstract:
A similar data search device includes: an inverted index generating unit which determines size ranges of sets of search targets for each of inverted indexes so that the number of sets of search targets is not smaller than a specified number and generates inverted indexes by dividing the sets of search targets according to the determined size ranges; an unnecessary inverted index identifying unit which determines, based on a size of a set of search conditions and a threshold value specified for a similarity between sets, a condition necessary for the similarity to be no smaller than the threshold value, and identifies, as an inverted index unnecessary for searches, any inverted index other than those inverted indexes containing a set whose minimum size value satisfies the condition; and a data search unit which conducts a search on a non-identified inverted index.
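The size-based pruning described above can be sketched as follows. The abstract does not name the similarity measure, so this sketch assumes Jaccard similarity, for which a match at threshold t (with 0 < t <= 1) requires the target size to lie between t·|Q| and |Q|/t. The function names, the dictionary layout of a partition, and the bucket-building policy are all hypothetical.

```python
from collections import defaultdict


def build_partitions(sets, min_per_index):
    """Split sets into size-ordered partitions, each with its own inverted index."""
    ordered = sorted(enumerate(sets), key=lambda pair: len(pair[1]))
    buckets, current = [], []
    for set_id, items in ordered:
        # Close a bucket once it is large enough and the set size changes.
        if len(current) >= min_per_index and len(items) != len(current[-1][1]):
            buckets.append(current)
            current = []
        current.append((set_id, items))
    if current:
        if buckets and len(current) < min_per_index:
            buckets[-1].extend(current)  # fold a short tail into the last bucket
        else:
            buckets.append(current)

    partitions = []
    for bucket in buckets:
        index = defaultdict(set)
        for set_id, items in bucket:
            for item in items:
                index[item].add(set_id)
        partitions.append({
            "min_size": len(bucket[0][1]),
            "max_size": len(bucket[-1][1]),
            "index": index,
            "sets": dict(bucket),
        })
    return partitions


def search(partitions, query, threshold):
    """Return ids of sets whose Jaccard similarity to query is >= threshold."""
    lower = threshold * len(query)   # smallest size that can still match
    upper = len(query) / threshold   # largest size that can still match
    hits = []
    for part in partitions:
        if part["max_size"] < lower or part["min_size"] > upper:
            continue  # this inverted index is unnecessary for the query
        candidates = set()
        for item in query:
            candidates |= part["index"].get(item, set())
        for set_id in candidates:
            target = part["sets"][set_id]
            if len(query & target) / len(query | target) >= threshold:
                hits.append(set_id)
    return sorted(hits)


data = [{1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4},
        set(range(5, 14)), set(range(1, 11))]
parts = build_partitions(data, min_per_index=2)
found = search(parts, {1, 2, 3}, threshold=0.5)
```

With this data the partition holding the sets of size 9 and 10 is skipped entirely, because no set larger than 6 elements can reach a Jaccard similarity of 0.5 with a 3-element query.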
Abstract:
A text processing system that is able to appropriately determine textual entailment between sentences with high coverage is provided. The text processing system is configured to execute: processing of extracting, based on a structure representing a first sentence and a structure representing a second sentence, at least one common substructure that is a partial structure of a same type common to the first sentence and the second sentence; processing of extracting at least one of a feature amount representing a dependency relationship between the at least one common substructure in the first and second sentences and a feature amount representing a dependency relationship between the common substructure in the first and second sentences and a substructure different from the common substructure; and processing of determining an entailment relationship between the first sentence and the second sentence by using the extracted feature amount.
Abstract:
A text mining device includes: an analysis unit which acquires, from data including text and one or more attributes including an attribute name and an attribute value and associated with the text, the attributes as analysis viewpoints, analyzes the data using the respective analysis viewpoints to obtain an analysis result from each analysis viewpoint, and generates result vectors of the respective analysis viewpoints; a similarity acquisition unit which acquires a vector similarity between the result vectors of the plural analysis viewpoints; and a recommendation unit which extracts and outputs a combination of the analysis viewpoints as a recommendation candidate on the basis of the vector similarity.
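The similarity and recommendation steps can be sketched as below. The abstract does not specify how result vectors are built or which similarity is used, so this sketch takes the result vectors as given and assumes cosine similarity. Whether similar or dissimilar viewpoint pairs make better recommendations is also not stated, so the hypothetical `most_similar` flag leaves that choice to the caller.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_pairs(result_vectors, top_k=1, most_similar=True):
    """Rank pairs of analysis viewpoints by the similarity of their result vectors."""
    scored = [
        (cosine(result_vectors[a], result_vectors[b]), a, b)
        for a, b in combinations(sorted(result_vectors), 2)
    ]
    scored.sort(key=lambda entry: entry[0], reverse=most_similar)
    return [(a, b) for _, a, b in scored[:top_k]]


# Hypothetical result vectors, one per attribute name used as a viewpoint.
vectors = {"age": [1, 0, 1], "region": [1, 0, 1], "product": [0, 1, 0]}
closest = recommend_pairs(vectors, top_k=1)
```

For these vectors `closest` is the pair of "age" and "region", whose result vectors point in the same direction.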
Abstract:
A classification model with a high precision ratio at a high recall ratio is learned. A classification model learning system (100) includes a learning data storage unit (110) and a learning unit (130). The learning data storage unit (110) stores pieces of learning data each of which has been classified as a positive example or a negative example. The learning unit (130) learns, by using the pieces of learning data, a classification model in such a way that a precision ratio of classification by the classification model is made larger under a constraint of a minimum value of a recall ratio of classification by the classification model.
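The objective described above, maximizing the precision ratio subject to a minimum recall ratio, can be illustrated with a simplified stand-in. The abstract describes learning the model itself under this constraint; the sketch below instead fixes a scorer and only selects a decision threshold, which is a swapped-in, much simpler technique. The function name and data are hypothetical, and at least one positive example is assumed.

```python
def pick_threshold(scores, labels, min_recall):
    """Return (threshold, precision, recall) with the highest precision
    among thresholds whose recall is at least min_recall, or None."""
    positives = sum(labels)  # assumed to be greater than zero
    best = None
    for threshold in sorted(set(scores)):
        predicted = [score >= threshold for score in scores]
        true_pos = sum(1 for p, y in zip(predicted, labels) if p and y)
        false_pos = sum(1 for p, y in zip(predicted, labels) if p and not y)
        recall = true_pos / positives
        if recall < min_recall:
            continue  # violates the recall constraint
        precision = true_pos / (true_pos + false_pos)
        if best is None or precision > best[1]:
            best = (threshold, precision, recall)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
strict = pick_threshold(scores, labels, min_recall=1.0)
relaxed = pick_threshold(scores, labels, min_recall=0.6)
```

Requiring full recall forces the threshold down to 0.6 with a precision of 0.75, while relaxing the floor to 0.6 allows a threshold of 0.8 with a precision of 1.0. This shows how the recall constraint bounds the achievable precision.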
Abstract:
Provided is a text processing system which, when an attribute corresponding to one tabulation axis is set, is capable of generating a text group which will produce non-obvious tabulation results when cross-tabulation is performed using that attribute. When the respective attribute values of an attribute corresponding to a tabulation axis in cross-tabulation, and a document associated with any one of those attribute values, are input, text extraction means 71 extracts a portion not including the attribute value of the attribute from each text obtained by dividing the document into predetermined units. Group generation means 72 performs entailment recognition between the extracted texts and groups texts having an entailment relation.
Abstract:
A classification dictionary generation apparatus includes: a lower threshold storage unit that stores lower threshold information that determines a lower threshold of dimensional values of a classification dictionary for classifying a category of a document; and a control unit that generates the classification dictionary based on learning data whose category is known, wherein the control unit generates, based on the lower threshold information stored in the lower threshold storage unit, the classification dictionary in which all of the dimensional values are equal to or larger than the lower threshold.
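One way to keep every dimensional value at or above a lower threshold is to clip the values after each learning update. The abstract does not say how the constraint is enforced, so the sketch below is only an assumption: it treats the dictionary as a weight vector, uses a perceptron-style update, and projects the weights onto the lower bound. The function name and the training data are hypothetical.

```python
def train_with_lower_bound(samples, lower, epochs=5, rate=1.0):
    """Learn a weight vector whose entries never fall below `lower`.

    samples is a list of (features, label) pairs with label +1 or -1.
    """
    dim = len(samples[0][0])
    weights = [max(0.0, lower)] * dim  # start inside the feasible region
    for _ in range(epochs):
        for features, label in samples:
            score = sum(w * x for w, x in zip(weights, features))
            if label * score <= 0:  # misclassified or on the boundary
                weights = [w + rate * label * x for w, x in zip(weights, features)]
                # Enforce the lower threshold on every dimension.
                weights = [max(w, lower) for w in weights]
    return weights


training = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
dictionary = train_with_lower_bound(training, lower=-0.5)
```

In this run the second dimension would drop to -1.0 without the clipping step; with it, the learned values are 1.0 and -0.5, so none falls below the threshold.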
Abstract:
A text visualization system which allows a user to efficiently ascertain a result of clustering of texts is provided. A clustering system (1) includes a representative text display unit (51), a reception unit (55), and an element text display unit (52). The clustering system (1) is accessibly connected to a storage that stores a plurality of texts and information indicating a representative text and an element text that entails the representative text among the plurality of texts. The representative text display unit (51) displays a plurality of representative texts. The reception unit (55) receives a designation of a specific representative text among the plurality of representative texts. The element text display unit (52) extracts, in response to receiving the designation of the specific representative text, an element text that entails the designated specific representative text from the plurality of texts, and displays the extracted element text.
Abstract:
A similar sentence set generation unit 81 groups sentences representing a same concept or event from a set of analysis target sentences, to generate a similar sentence set. A similar sentence set extraction unit 82 extracts, using one or more specific sentence extractors each capable of extracting a specific sentence belonging to a specific classification from the set of analysis target sentences, one or more sentences not extracted by any of the specific sentence extractors from among the sentences belonging to the similar sentence set, as an exclusion similar sentence set.
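The extraction step above can be read as a filter over already-grouped sentences. The sketch below assumes the similar sentence sets are given, models each specific sentence extractor as a predicate that returns true for sentences it would extract, and keeps the sentences no extractor picks up. The function name, the predicate interface, and the example sentences are hypothetical.

```python
def exclusion_similar_sets(similar_sets, extractors):
    """For each similar sentence set, keep the sentences no extractor matches."""
    result = []
    for group in similar_sets:
        leftover = [s for s in group if not any(extract(s) for extract in extractors)]
        if leftover:
            result.append(leftover)
    return result


def mentions_price(sentence):
    """Stand-in extractor for one specific classification."""
    return "price" in sentence.lower()


def mentions_delay(sentence):
    """Stand-in extractor for another specific classification."""
    return "late" in sentence.lower()


groups = [
    ["The price is too high.", "It costs too much."],
    ["Delivery was late.", "The parcel arrived late."],
]
remaining = exclusion_similar_sets(groups, [mentions_price, mentions_delay])
```

Only "It costs too much." survives: it belongs to the same similar sentence set as the price complaint, yet neither extractor matches it.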