摘要:
A method and system for ranking objects based on relationships with objects of a different object type is provided. The ranking system defines an equation for each attribute of each type of object. The equations define the attribute values and are based on relationships between the attribute and the attributes associated with the same type of object and different types of objects. The ranking system iteratively calculates the attribute values for the objects using the equations until the attribute values converge on a solution. The ranking system then ranks objects based on attribute values.
摘要:
The invention provides a method of interactively crawling data records on a web page. Users may select various data records of interest on a web page to generate templates to search for similar data items on the same web page or on different web pages. A tree matching algorithm may be used to compare and extract data matching the generated template.
摘要:
A method and system for ranking messages of discussion threads based on relationships between messages and authors is provided. The ranking system defines an equation for attributes of a message and an author. The equations define the attribute values and are based on relationships between the attribute and the attributes associated with the same type of object, and different types of objects. The ranking system iteratively calculates the attribute values for the objects using the equations until the attribute values converge on a solution. The ranking system then ranks the messages based on attribute values.
摘要:
A system that identifies a person associated with a document is provided. The system retrieves a name associated with a document and reduces the name to a canonical form. The system then compares the canonical form of the name to the canonical form of the names of known persons. If a match is not found, then the system indicates that the person whose name is associated with the document is a previously unknown person. If a match is found, then the system compares attributes of the document with attributes of documents associated with the matching known person. If those attributes are similar, then the system indicates that the person whose name is associated with the document is the matching known person. Otherwise, the system indicates that the person whose name is associated with the document is a previously unknown person.
摘要:
A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the “important” words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.
摘要:
A clustering architecture that dynamically groups the search result documents into clusters labeled by phrases extracted from the search result snippets. Documents related to the same topic usually share a common vocabulary. The words are first clustered based on their co-occurrences and each cluster forms a potentially interesting topic. Keywords are chosen and then clustered by counting co-occurrences of pairs of keywords. Documents are assigned to relevant topics based on the feature vectors of the clusters.
摘要:
A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
摘要:
Systems and methods for mining service requests for product support are described. In one aspect, unstructured service requests are converted to one or more structured answer objects. Each structured answer object includes hierarchically structured historic problem diagnosis data. In view of a product problem description, a set of the one or more structured answer data objects is identified. Each structured solution data object in the set includes term(s) and/or phrase(s) related to the product problem description. Historic and hierarchically structured problem diagnosis data from the set is provided to an end-user for product problem diagnosis.
摘要:
Methods for determining a predictive rating are disclosed. In an embodiment, an active user is compared to a set of clusters. One or more of the clusters are determined to be most similar to the active user. From the one or more clusters, K users are determined to be most similar to the active user. Prior ratings for an item by the K users may be used to predict a rating for the item for the active user.
摘要:
Computer-readable media having computer-executable instructions and apparatuses provide a keyphrase navigation map (KNM) for a document page. Keyphrases are extracted from the document page. Keyphrase clusters are subsequently formed by a measure of relevancy, and a salient keyphrase is determined for each cluster. A thumbnail is formed with tags corresponding to the salient keyphrases. A selected tag is expanded with associated keyphrases. An associated keyphrase may be further selected in order to facilitate the navigation of the document page. The displayed tags on the thumbnail are positioned in accordance with locations of associated keyphrases in the document page.