Abstract:
A method for stabilizing a knowledge graph includes: generating a knowledge graph in which identical entities in an input list of semantic relations between entities are represented as a single node, based on the names and types of the entities; computing, on the knowledge graph, semantic similarities between all candidate pairs of entities of the same type by comparing, for each candidate pair, the types of relations associated with each entity in the pair and the counterpart entities of those relations; and selecting, based on the semantic similarities, a representative entity from each semantically similar entity pair on the knowledge graph and merging the other entity of the pair into the representative entity. The method further includes computing weighted values for the relations between the entities by using graph analysis and statistical information, and adding the weighted values to the knowledge graph.
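The merging steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed method: entities are modeled as `(name, type)` tuples, similarity is taken to be the Jaccard overlap of `(relation, counterpart)` sets, the representative is assumed to be the entity with more relations, and the names `build_graph`, `similarity`, and `merge_similar` are hypothetical. The relation weighting step is not covered.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(triples):
    """Map each (name, type) entity to its set of (relation, counterpart) edges.

    Entities with the same name and type share one dictionary key, so they
    collapse into a single node.
    """
    graph = defaultdict(set)
    for subj, relation, obj in triples:
        graph[subj].add((relation, obj))
        # Assumption: record the reverse direction so objects also have edges.
        graph[obj].add((relation + "^-1", subj))
    return graph


def similarity(graph, a, b):
    """Jaccard overlap of the (relation, counterpart) sets of two entities."""
    union = graph[a] | graph[b]
    return len(graph[a] & graph[b]) / len(union) if union else 0.0


def merge_similar(graph, threshold=0.5):
    """Return {merged_entity: representative} for similar same-type pairs."""
    merged = {}
    for a, b in combinations(sorted(graph), 2):
        if a[1] != b[1] or a in merged or b in merged:
            continue  # compare only unmerged entities of the same type
        if similarity(graph, a, b) >= threshold:
            # Assumption: the entity with more relations is the representative.
            rep, other = (a, b) if len(graph[a]) >= len(graph[b]) else (b, a)
            merged[other] = rep
    return merged


triples = [
    (("IBM", "Company"), "founded_in", ("1911", "Year")),
    (("IBM", "Company"), "headquartered_in", ("Armonk", "City")),
    (("IBM", "Company"), "makes", ("Watson", "Product")),
    (("I.B.M.", "Company"), "founded_in", ("1911", "Year")),
    (("I.B.M.", "Company"), "headquartered_in", ("Armonk", "City")),
]
graph = build_graph(triples)
merged = merge_similar(graph)
```

In this toy input the two company nodes share two of three distinct edges, so the less connected one is folded into the other.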
Abstract:
An electronic document processing apparatus includes: a document set storage unit storing hash tables including hash values of documents to be processed; a content extraction unit for extracting body contents from a newly input electronic document; and a sentence separation unit for separating sentences from the extracted body contents. The apparatus further includes a duplicate document determination unit for converting the separated sentences into unique hash values by a hash algorithm, determining whether each of the separated sentences is a duplicate sentence depending on whether or not there is a collision between the converted hash values and the hash values in the hash tables of the document set storage unit, and determining whether the electronic document is a duplicate document based on the ratio of duplicate sentences to all of the sentences in the electronic document.
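A minimal sketch of the sentence-hash duplicate check described above might look like the following. The abstract does not name a hash algorithm, a sentence splitter, or a ratio threshold, so SHA-1, a punctuation-based regular expression, and a threshold of 0.8 are all assumptions, as are the names `DocumentSet`, `split_sentences`, and `sentence_hash`. Body-content extraction is assumed to have happened already.

```python
import hashlib
import re


def split_sentences(body):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", body) if s.strip()]


def sentence_hash(sentence):
    """Hash a sentence after lowercasing and collapsing whitespace."""
    normalized = " ".join(sentence.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class DocumentSet:
    """Stores sentence hashes of known documents (the 'hash table')."""

    def __init__(self, threshold=0.8):
        self.hashes = set()
        self.threshold = threshold  # assumed duplicate-sentence ratio cutoff

    def duplicate_ratio(self, body):
        """Fraction of sentences whose hash is already in the store."""
        sentences = split_sentences(body)
        if not sentences:
            return 0.0
        hits = sum(sentence_hash(s) in self.hashes for s in sentences)
        return hits / len(sentences)

    def is_duplicate(self, body):
        return self.duplicate_ratio(body) >= self.threshold

    def add(self, body):
        self.hashes.update(sentence_hash(s) for s in split_sentences(body))


store = DocumentSet(threshold=0.5)
store.add("The cat sat. The dog ran. Birds fly.")
```

A new document is flagged when the share of its sentences already present in the store reaches the threshold; a real system would also need to handle hash collisions between distinct sentences.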
Abstract:
A personalized search apparatus includes: a model generating unit for generating a user favorites analysis model based on directory grouping information about directories stored in a user terminal and user behavior information; and a user favorites analysis model DB for storing the generated user favorites analysis model. Further, the personalized search apparatus includes a search engine for searching for a file relevant to an input query using an information search engine installed in the user terminal to generate search results; and a personalized search engine for re-ranking the search results generated by the search engine based on the user favorites analysis model to generate personalized search results.
Abstract:
An apparatus for verifying training data using machine learning includes: a training data separation unit for separating provided initial training data into N sets of training data and N sets of verification data, where N is a natural number; a machine learning unit for performing machine learning on the separated training data to generate a training model; an automatic tagging unit for automatically tagging an original text of the verification data using the generated training model to provide automatic tagging results; and an error determination unit for comparing the verification data to the automatic tagging results to determine error candidates of the training data.
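The verification loop described above resembles N-fold cross-validation, and can be sketched as follows. This is an illustration only: the abstract does not specify the learner or the data format, so a most-frequent-tag lookup stands in for the machine learning unit, the data is assumed to be a flat list of `(token, tag)` pairs, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def split_folds(data, n):
    """Yield n (training, verification) pairs; fold i holds out items j with j % n == i."""
    for i in range(n):
        verification = [x for j, x in enumerate(data) if j % n == i]
        training = [x for j, x in enumerate(data) if j % n != i]
        yield training, verification


def train(training):
    """Stand-in learner (assumption): remember the most frequent tag per token."""
    counts = defaultdict(Counter)
    for token, tag in training:
        counts[token][tag] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def find_error_candidates(data, n):
    """Tag each held-out fold automatically and report disagreements with its labels."""
    candidates = []
    for training, verification in split_folds(data, n):
        model = train(training)
        for token, gold in verification:
            predicted = model.get(token)  # None for tokens unseen in training
            if predicted is not None and predicted != gold:
                candidates.append((token, gold, predicted))
    return candidates


data = [
    ("Paris", "LOC"), ("Paris", "LOC"), ("Paris", "PER"),
    ("Paris", "LOC"), ("Paris", "LOC"), ("Paris", "LOC"),
]
candidates = find_error_candidates(data, n=3)
```

Each candidate is a label that a model trained on the remaining data disagrees with; it is a prompt for human review rather than a confirmed error.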
Abstract:
A question type and domain identifying apparatus includes: a question type identifier for recognizing the number of words in a user's question to identify whether the user's question is a query for information searching or a question for question answering (Q&A); a question domain distributor for assigning one of a plurality of preset domain-specialized Q&A engines as the Q&A engine for the user's question, based on the recognized number of words; and a Q&A engine block, including the domain-specialized Q&A engines, for selectively performing information searching or Q&A on the user's question in response to the assignment made by the question domain distributor.
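The word-count test performed by the question type identifier can be sketched as below. The abstract gives no cutoff and does not say how words are counted, so whitespace tokenization and a cutoff of three words are assumptions, and `identify_type` and `route` are hypothetical names. How a specific domain engine is chosen is not specified in the abstract and is left out.

```python
def count_words(text):
    """Count whitespace-separated words (assumed tokenization)."""
    return len(text.split())


def identify_type(text, max_query_words=3):
    """Treat short inputs as search queries and longer ones as Q&A questions."""
    return "search" if count_words(text) <= max_query_words else "qa"


def route(text, search_engine, qa_engine, max_query_words=3):
    """Send the input to the handler that matches its identified type."""
    if identify_type(text, max_query_words) == "search":
        return search_engine(text)
    return qa_engine(text)


# Placeholder handlers standing in for real engines.
result = route(
    "Who invented the knowledge graph?",
    search_engine=lambda q: ("search", q),
    qa_engine=lambda q: ("qa", q),
)
```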
Abstract:
A method for automatically extracting product information includes searching documents based on product names, and extracting, from the retrieved documents, sentences that state advantages and disadvantages of the products having those names. The method further includes grouping the extracted sentences by similar content; selecting representative sentences from the grouped sentences; and calculating a weight for each of the selected representative sentences.
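The grouping, representative selection, and weighting steps could be sketched as follows. The abstract does not say how similarity, representatives, or weights are defined, so everything here is an assumption: similarity is Jaccard overlap of word sets, the representative is the shortest sentence in a group, and the weight is the group's share of all extracted sentences. Document search and advantage/disadvantage extraction are assumed to have produced the input list.

```python
import re


def tokens(sentence):
    """Lowercased set of alphanumeric words in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_sentences(sentences, threshold=0.5):
    """Greedy grouping: join the first group whose seed sentence is similar enough."""
    groups = []
    for sentence in sentences:
        for group in groups:
            if jaccard(tokens(sentence), tokens(group[0])) >= threshold:
                group.append(sentence)
                break
        else:
            groups.append([sentence])
    return groups


def summarize(sentences, threshold=0.5):
    """Return (representative, weight) per group; the weights sum to 1."""
    groups = group_sentences(sentences, threshold)
    total = len(sentences)
    return [(min(g, key=len), len(g) / total) for g in groups]


summary = summarize([
    "battery life is great",
    "the battery life is great",
    "screen is dim",
])
```

Opinions repeated across many documents end up with higher weights, which gives one plausible reading of the weighting step.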
Abstract:
The present invention relates to a method and device for generating ontology instances, which classify documents into structured documents and unstructured documents and automatically generate ontology instances from them. The method includes: collecting documents corresponding to classes of an ontology from the Web; if the collected documents are unstructured documents, extracting inter-entity relationship information from the unstructured documents; if the collected documents are structured documents, extracting inter-entity relationship information from the structured documents; generating ontology instances from the extracted inter-entity relationship information; and mapping the generated ontology instances to the corresponding classes of the ontology.