摘要:
A system that identifies a person associated with a document is provided. The system retrieves a name associated with a document and reduces the name to a canonical form. The system then compares the canonical form of the name to the canonical form of the names of known persons. If a match is not found, then the system indicates that the person whose name is associated with the document is a previously unknown person. If a match is found, then the system compares attributes of the document with attributes of documents associated with the matching known person. If those attributes are similar, then the system indicates that the person whose name is associated with the document is the matching known person. Otherwise, the system indicates that the person whose name is associated with the document is a previously unknown person.
摘要:
A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the “important” words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.
摘要:
A clustering architecture that dynamically groups the search result documents into clusters labeled by phrases extracted from the search result snippets. Documents related to the same topic usually share a common vocabulary. The words are first clustered based on their co-occurrences and each cluster forms a potentially interesting topic. Keywords are chosen and then clustered by counting co-occurrences of pairs of keywords. Documents are assigned to relevant topics based on the feature vectors of the clusters.
摘要:
A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
摘要:
Systems and methods for mining service requests for product support are described. In one aspect, unstructured service requests are converted to one or more structured answer objects. Each structured answer object includes hierarchically structured historic problem diagnosis data. In view of a product problem description, a set of the one or more structured answer data objects is identified. Each structured solution data object in the set includes term(s) and/or phrase(s) related to the product problem description. Historic and hierarchically structured problem diagnosis data from the set is provided to an end-user for product problem diagnosis.
摘要:
A method and system for measuring the similarity of objects based on relationships with objects of the same type and different types and similarities of those objects to other objects is provided. In one embodiment, the similarity system defines intra-type and inter-type similarity functions for each type of object. The similarity system may combine the intra-type and inter-type similarity functions for a certain type into an overall similarity function for that type. After defining the similarity functions, the similarity system collects attribute values for the objects, which may include relationship data between objects of the same type, referred to as intra-type relationships, and relationships between objects of different types, referred to as inter-type relationships. After collecting the attribute values for the objects, the similarity system solves the intra-type and inter-type similarity functions by iteratively calculating the similarities for the objects until the similarities converge on a solution.
摘要:
An implicit links enhancement system and method for search engines that generates implicit links obtained from mining user access logs to facilitate enhanced local searching of web sites and intranets. Embodiments of the implicit links search enhancement system and method includes extracting implicit links by mining users' access patterns and then using a modified link analysis algorithm to re-rank search results obtained from traditional search engines. More specifically, embodiments of the method include extracting implicit links from a user access log, generating an implicit links graph from the extracted implicit links, and computing page rankings using the implicit links graph. The implicit links are extracted from the log using a two-item sequential pattern mining technique. Search results obtained from a search engine are re-ranked based on an implicit links analysis performed using an updated implicit links graph, a modified re-ranking formula, and at least one re-ranking technique.
摘要:
An implicit links enhancement system and method for search engines that generates implicit links obtained from mining user access logs to facilitate enhanced local searching of web sites and intranets. Embodiments of the implicit links search enhancement system and method includes extracting implicit links by mining users' access patterns and then using a modified link analysis algorithm to re-rank search results obtained from traditional search engines. More specifically, embodiments of the method include extracting implicit links from a user access log, generating an implicit links graph from the extracted implicit links, and computing page rankings using the implicit links graph. The implicit links are extracted from the log using a two-item sequential pattern mining technique. Search results obtained from a search engine are re-ranked based on an implicit links analysis performed using an updated implicit links graph, a modified re-ranking formula, and at least one re-ranking technique.
摘要:
An implicit links enhancement system and method for search engines that generates implicit links obtained from mining user access logs to facilitate enhanced local searching of web sites and intranets. The implicit links search enhancement system and method includes extracting implicit links by mining users' access patterns and then using a modified link analysis algorithm to re-rank search results obtained from traditional search engines. More specifically, the implicit links search enhancement method includes extracting implicit links from a user access log, generating an implicit links graph from the extracted implicit links, and computing page rankings using the implicit links graph. The implicit links are extracted from the log using a two-item sequential pattern mining technique. Search results obtained from a search engine are re-ranked based on an implicit links analysis performed using an updated implicit links graph, a modified re-ranking formula, and at least one re-ranking technique.
摘要:
An implicit links enhancement system and method for search engines that generates implicit links obtained from mining user access logs to facilitate enhanced local searching of web sites and intranets. The implicit links search enhancement system and method includes extracting implicit links by mining users' access patterns and then using a modified link analysis algorithm to re-rank search results obtained from traditional search engines. More specifically, the implicit links search enhancement method includes extracting implicit links from a user access log, generating an implicit links graph from the extracted implicit links, and computing page rankings using the implicit links graph. The implicit links are extracted from the log using a two-item sequential pattern mining technique. Search results obtained from a search engine are re-ranked based on an implicit links analysis performed using an updated implicit links graph, a modified re-ranking formula, and at least one re-ranking technique.