Abstract:
A method of identifying location data in a data set comprises generating a data sample from the data set, training a plurality of models with the data sample to identify the location data in the data set, and applying the data set to the trained models to determine the location data within the data set. The plurality of models includes one or more first models to identify primary attributes of the location data indicating a geographical area and one or more second models to identify secondary attributes of the location data used to determine corresponding primary attributes.
Abstract:
Relationships are extracted, using natural language processing, between weather condition descriptors and adverse event descriptors, drawn from one or more descriptor lists, that appear within unstructured data sources. A medical condition descriptor may also be used to further extract relationships between the weather condition descriptors and the adverse event descriptors. A data object is generated, according to a data model, based on the extracted relationships between the descriptors. A set of candidate unstructured documents containing the extracted relationships is retrieved and filtered by selecting the unstructured documents that include a precautionary measure descriptor. The filtered precautionary measure descriptors are presented to a user in a summarized message sent to a user device.
Abstract:
A method for detecting solutions to a problem using content in online discussion sources. The method includes receiving a request that identifies a problem and searching multiple online discussion sources for content related to the problem. Responsive to finding content related to the problem, the method searches the multiple online discussion sources for a plurality of solutions to the problem. Responsive to finding a plurality of solutions to the problem, the method forms groups containing the solutions from each of the multiple online discussion sources. The method then determines a likelihood of solving the problem for each of the groups and ranks the groups based on the determined likelihood. The method then determines that the rank of at least one group meets a threshold value, wherein the threshold value is based on a confidence in the likelihood of solving the problem.
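The group, score, rank, and threshold steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed method: the function name `rank_solution_groups`, the tuple layout of `posts`, and the use of a helpful-vote ratio as the likelihood score are all assumptions introduced here, since the abstract does not say how the likelihood is computed.

```python
from collections import defaultdict


def rank_solution_groups(posts, threshold):
    """Group solutions across sources, score each group, rank, and filter.

    posts: iterable of (source, solution_label, helpful_votes, total_votes).
    The helpful-vote ratio is a stand-in likelihood score (an assumption).
    Returns (ranked, passing): ranked is a list of (label, score) sorted by
    descending score; passing lists the labels whose score meets threshold.
    """
    helpful = defaultdict(int)
    total = defaultdict(int)

    # Form one group per solution label, pooling votes from every source.
    for _source, label, up, n in posts:
        helpful[label] += up
        total[label] += n

    # Score each group; a group with no votes gets a score of 0.0.
    scores = {
        label: (helpful[label] / total[label] if total[label] else 0.0)
        for label in total
    }

    # Rank the groups, then keep those that meet the threshold.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    passing = [label for label, score in ranked if score >= threshold]
    return ranked, passing


if __name__ == "__main__":
    sample_posts = [
        ("forum_a", "reinstall driver", 8, 10),
        ("forum_b", "reinstall driver", 4, 10),
        ("forum_a", "reboot", 1, 10),
    ]
    print(rank_solution_groups(sample_posts, threshold=0.5))
```

In a real system the score would more likely come from a trained model or from textual evidence in the discussion threads; the vote ratio is used here only to keep the sketch self-contained.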
Abstract:
An uninterrupted reading experience can be provided by calculating a vocabulary level for a user in a first language and comparing difficulty levels of words within a document in the first language to the vocabulary level of the user in the first language. Each word of the document having a difficulty level that exceeds the vocabulary level of the user in the first language can be selected.
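The comparison step in this abstract can be illustrated with a short sketch. It assumes the user's vocabulary level has already been calculated and that word difficulty comes from a lookup table; the `WORD_DIFFICULTY` table, its numeric scale, and the function name `select_difficult_words` are hypothetical and not taken from the source.

```python
# Hypothetical difficulty table: a higher number means a harder word.
WORD_DIFFICULTY = {
    "the": 1,
    "cat": 1,
    "sat": 1,
    "on": 1,
    "veranda": 6,
    "ephemeral": 8,
}


def select_difficult_words(text, user_level, difficulty=None, default=1):
    """Return the words in text whose difficulty exceeds user_level.

    Words are lowercased and stripped of surrounding punctuation. Words
    missing from the table get the default difficulty (an assumption).
    Each selected word is reported once, in order of first appearance.
    """
    table = WORD_DIFFICULTY if difficulty is None else difficulty
    selected = []
    for raw in text.split():
        word = raw.strip(".,;:!?\"'()").lower()
        if not word or word in selected:
            continue
        if table.get(word, default) > user_level:
            selected.append(word)
    return selected


if __name__ == "__main__":
    sentence = "The cat sat on the ephemeral veranda."
    print(select_difficult_words(sentence, user_level=5))
```

What happens to the selected words afterwards (for example, showing a definition inline) is outside this sketch, because the abstract only states that the words are selected.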
Abstract:
A mechanism is provided for automatically detecting and cleansing erroneous concepts in an aggregated knowledge base. A graph data structure representing a concept present in a portion of natural language content is generated. The graph data structure is analyzed to determine whether or not it comprises one or more concept conflicts in association with a set of nodes in the graph data structure, where a concept conflict is associated with the set of nodes if two or more nodes represent separate and distinct concepts. Responsive to determining that there are one or more concept conflicts due to there being two or more nodes representing separate and distinct concepts, the two or more nodes are split into separate distinct concepts within the knowledge base.
Abstract:
A method, computer system, and computer program product for determining the reliability of a claim are provided. The present invention may include receiving input data from a user. The present invention may also include analyzing the claim associated with the received input data to determine a reliability score associated with the input data, wherein the claim is semantically similar to the received input data. The present invention may further include generating, from a prediction model, the reliability score for the claim associated with the received input data. The present invention may also include presenting the reliability score for the claim associated with the received input data to the user.
Abstract:
Merging synonymous entities from multiple structured sources into a dataset includes receiving a first set of paired terms from a first authoritative source for a domain and a second set of paired terms from a second authoritative source for the domain. The first set of paired terms is compared to the second set of paired terms with a similarity assessment based on a clustering statistical algorithm to identify paired terms from the first set of paired terms that share a synonymous term with one or more paired terms from the second set of paired terms. The paired terms associated with the synonymous term are merged and a dataset is generated that associates a normalized version of the synonymous term with any terms included in the merged paired terms.
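The merge step in this abstract can be sketched as follows. Note the substitution: the abstract describes a similarity assessment based on a clustering statistical algorithm, whereas this sketch uses exact matching of normalized terms with a union-find structure, which is a much simpler stand-in. The function name `merge_paired_terms` and the choice of the alphabetically first term as the normalized name are assumptions made for illustration.

```python
def merge_paired_terms(first_pairs, second_pairs):
    """Merge paired terms from two sources that share a term.

    Terms are normalized by lowercasing and collapsing whitespace. Pairs
    that share a normalized term end up in the same group. Returns a dict
    mapping a canonical term (alphabetically first, an assumption) to the
    sorted list of all terms in its group.
    """
    def norm(term):
        return " ".join(term.lower().split())

    parent = {}

    def find(x):
        # Find the group root, compressing the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each pair links its two terms; shared terms link pairs across sources.
    for a, b in list(first_pairs) + list(second_pairs):
        union(norm(a), norm(b))

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)

    return {min(members): sorted(members) for members in groups.values()}


if __name__ == "__main__":
    source_one = [("Heart Attack", "myocardial infarction")]
    source_two = [
        ("MI", "Myocardial  Infarction"),
        ("aspirin", "acetylsalicylic acid"),
    ]
    print(merge_paired_terms(source_one, source_two))
```

Exact matching misses near-synonyms and spelling variants, which is presumably what the statistical similarity assessment in the abstract is meant to handle.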