Abstract:
A computer-implemented method of extracting key phrases from a document is disclosed comprising the steps of accessing a repository comprising linked subjects, the repository comprising first and second data structures representing the relationship between said subjects using different representation criteria; pruning the first data structure by removing links between subjects based on a further relationship between said subjects in the second data structure; matching phrases in said document to subjects in the pruned first data structure; further pruning the pruned first data structure by removing unmatched subjects that are not linked to matched subjects; determining a ranking for each matched subject; and selecting key phrases using the determined subject rankings. A computer program for implementing the steps of this method when executed on a computer is also disclosed.
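The steps in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the first data structure is assumed to be a list of subject-to-subject links, the second a mapping from each subject to a set of categories, the pruning rule is assumed to be "drop links whose endpoints share no category", matching is a naive substring test, and the ranking is the number of remaining links. The function name `extract_key_phrases` and all sample data are hypothetical.

```python
def extract_key_phrases(text, links, categories, top_k=3):
    """Return up to top_k matched subjects, best ranked first.

    links:      iterable of (subject, subject) pairs (assumed first structure)
    categories: dict subject -> set of category names (assumed second structure)
    """
    subjects = set(categories)

    # Step 1: prune links whose endpoints share no category (assumed criterion).
    pruned = {
        (a, b)
        for a, b in links
        if categories.get(a, set()) & categories.get(b, set())
    }

    # Step 2: match subjects against the document (naive substring match).
    lowered = text.lower()
    matched = {s for s in subjects if s.lower() in lowered}

    # Step 3: keep matched subjects plus unmatched subjects linked to a matched one.
    neighbours = set()
    for a, b in pruned:
        if a in matched:
            neighbours.add(b)
        if b in matched:
            neighbours.add(a)
    kept = matched | neighbours
    kept_links = [(a, b) for a, b in pruned if a in kept and b in kept]

    # Step 4: rank each matched subject by its number of remaining links.
    rank = {s: sum(s in link for link in kept_links) for s in matched}

    # Step 5: select the highest-ranked subjects (ties broken alphabetically).
    return sorted(matched, key=lambda s: (-rank[s], s))[:top_k]


# Hypothetical sample data, for illustration only.
links = [
    ("graph theory", "tree"),
    ("graph theory", "banana"),
    ("tree", "forest"),
    ("banana", "fruit"),
]
categories = {
    "graph theory": {"math"},
    "tree": {"math", "botany"},
    "forest": {"math", "botany"},
    "banana": {"food"},
    "fruit": {"food"},
}
text = "Graph theory studies structures such as a tree. A banana is unrelated."
print(extract_key_phrases(text, links, categories, top_k=1))
```

In this sample the link between "graph theory" and "banana" is pruned because the two subjects share no category, so it no longer contributes to either subject's rank.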
Abstract:
A method of determining main text in a mark-up document is provided, which comprises determining a length of each paragraph in the mark-up document; and determining one or more main paragraphs of the mark-up document based upon the length of the paragraphs in the mark-up document.
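A length-based selection of this kind might look like the following sketch. The abstract does not say how length is turned into a decision, so the rule used here is an assumption: a paragraph counts as main text when its character length is at least a given fraction of the longest paragraph. The mark-up is assumed to be HTML with `<p>` elements; the class `ParagraphCollector`, the function `main_paragraphs`, and the `ratio` parameter are hypothetical names.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text content of each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # list of text chunks while inside a <p>

    def _flush(self):
        if self._buffer is not None:
            # Join chunks and normalise runs of whitespace.
            self.paragraphs.append(" ".join("".join(self._buffer).split()))
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # tolerate a missing </p> before the next <p>
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def main_paragraphs(markup, ratio=0.5):
    """Return paragraphs at least `ratio` times as long as the longest one."""
    collector = ParagraphCollector()
    collector.feed(markup)
    collector.close()
    lengths = [len(p) for p in collector.paragraphs]
    if not lengths:
        return []
    cutoff = ratio * max(lengths)  # assumed selection rule
    return [p for p, n in zip(collector.paragraphs, lengths) if n >= cutoff]


page = (
    "<p>Home | About</p>"
    "<p>This is the long body paragraph of the article, with real content.</p>"
    "<p>Copyright</p>"
)
print(main_paragraphs(page))
```

Short navigation and footer paragraphs fall below the cutoff, so only the body paragraph is returned for this sample page.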
Abstract:
An event occurring in a particular geographic region is identified based on disseminated information containing public commentary in the particular geographic region. Attributes that are related to the event are identified, and sentiment words relating to the identified event are extracted from the disseminated information, where the extracted sentiment words are in a local language of the particular geographic region. A sentiment trend visualization is generated that depicts a trend of sentiments of at least a particular one of the identified attributes, wherein the sentiments are based on the sentiment words for at least the particular attribute.
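The data behind such a trend visualization could be assembled roughly as follows. This sketch covers only the aggregation step and rests on several assumptions: the event and its attributes are already identified, each post is a `(date, text)` pair, sentiment words come from a small local-language lexicon mapping words to scores, and tokenisation is a plain whitespace split. The function `sentiment_trend`, the Spanish sample words, and their scores are hypothetical.

```python
from collections import defaultdict


def sentiment_trend(posts, attributes, lexicon):
    """Return {attribute: [(date, mean sentiment score), ...]} sorted by date.

    posts:      iterable of (date_string, text) pairs
    attributes: attribute words related to the event
    lexicon:    dict of local-language sentiment word -> numeric score
    """
    scores = defaultdict(lambda: defaultdict(list))
    for day, text in posts:
        tokens = text.lower().split()  # naive whitespace tokenisation
        score = sum(lexicon.get(token, 0) for token in tokens)
        for attribute in attributes:
            if attribute in tokens:
                scores[attribute][day].append(score)
    return {
        attribute: [(day, sum(vals) / len(vals)) for day, vals in sorted(days.items())]
        for attribute, days in scores.items()
    }


# Hypothetical posts and lexicon, for illustration only.
posts = [
    ("2024-05-01", "trafico malo"),
    ("2024-05-01", "trafico terrible malo"),
    ("2024-05-02", "trafico bueno"),
    ("2024-05-02", "precio bueno"),
]
lexicon = {"malo": -1, "terrible": -2, "bueno": 1}
trend = sentiment_trend(posts, ["trafico", "precio"], lexicon)
for attribute, series in trend.items():
    print(attribute, series)
```

Each per-attribute series is a list of dated averages, which is the shape a line chart of sentiment over time would need.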
Abstract:
Embodiments of the present disclosure may include methods, systems, and machine-readable and executable instructions and/or logic. An example method for creating a handwritten character font library can include receiving a set of standard characters at a computing device, and deriving a group of character components from the set of standard characters. A subset of characters is selected from the set of standard characters, the subset collectively including substantially all of the group of character components. Handwritten characters corresponding to the subset of characters are received at the computing device, and handwritten character components corresponding to the group of character components are extracted from the handwritten characters. A set of handwritten characters is then constructed from the received handwritten characters and/or the handwritten character components.
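Choosing a small subset of characters that together contain every component is an instance of the set cover problem. The abstract does not name a selection algorithm, so the sketch below swaps in the standard greedy set-cover heuristic as one plausible approach. The function `select_covering_subset` and the sample component decomposition are hypothetical and serve only as an illustration.

```python
def select_covering_subset(decomposition):
    """Greedy set cover: pick characters until every component is covered.

    decomposition: dict character -> set of components it contains
    """
    uncovered = set().union(*decomposition.values())
    chosen = []
    while uncovered:
        # Pick the character covering the most still-uncovered components;
        # sorting first makes tie-breaking deterministic.
        best = max(
            sorted(decomposition),
            key=lambda ch: len(decomposition[ch] & uncovered),
        )
        chosen.append(best)
        uncovered -= decomposition[best]
    return chosen


# Hypothetical decomposition of five characters into components.
decomposition = {
    "好": {"女", "子"},
    "妈": {"女", "马"},
    "吗": {"口", "马"},
    "如": {"女", "口"},
    "字": {"宀", "子"},
}
subset = select_covering_subset(decomposition)
print(subset)
```

A writer would then only need to supply handwriting for the characters in `subset`, and the remaining characters could be assembled from the extracted components. The greedy heuristic does not guarantee the smallest possible subset.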