摘要:
A summary system for evaluating summaries of documents and for generating summaries of documents based on normalized probabilities of portions of the documents is provided. A summarization system generates a summary by selecting sentences for the summary based on their normalized probabilities as derived from a document model. An evaluation system evaluates the effectiveness of a summary based on a normalized probability for the summary that is derived from a document model.
摘要:
A summary system for evaluating summaries of documents and for generating summaries of documents based on normalized probabilities of portions of the document. A summarization system generates a summary by selecting sentences for the summary based on their normalized probabilities as derived from a document model. An evaluation system evaluates the effectiveness of a summary based on a normalized probability for the summary that is derived from a document model.
摘要:
A method and system for identifying information about people is provided. The information system identifies groups of people that have relationships based on their relationships to documents or more generally to objects. The information system initially is provided with an indication of which people have which relationships to which documents. The information system then identifies clusters of people based on having a relationship to the same objects. The information system may also identify clusters of related objects associated with a cluster of people. When a user wants to identify information about a person, the user can provide the name of that person to the information system. The information system then can retrieve and display the names of the other people who are in the same cluster as the person.
摘要:
A method and system for prioritizing communications based on classifications of sentences within the communications is provided. A sentence classification system may classify sentences of communications according to various classifications such as “sentence mode.” The sentence classification system trains a sentence classifier using training data and then classifies sentences using the trained sentence classifier. After the sentences of a communication are classified, a document ranking system may generate a rank for the communication based on the classifications of the sentences within the communication. The document ranking system trains a document rank classifier using training data and then calculates the rank of communications using the trained document rank classifier.
摘要:
A method and system for ranking documents of search results based on information richness and diversity of topics. A ranking system determines the information richness of each document within a search result. The ranking system groups documents of a search result based on their relatedness, meaning that they are directed to similar topics. The ranking system ranks the documents to ensure that the highest ranking documents may include at least one document covering each topic, that is, one document from each of the groups. The ranking system selects the document from each group that has the highest information richness of the documents within the group. When the documents are presented to a user in rank order, the user will likely find on the first page of the search result documents that cover a variety of topics, rather than just a single popular topic.
摘要:
A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.
摘要:
A clustering architecture that dynamically groups the search result documents into clusters labeled by phrases extracted from the search result snippets. Documents related to the same topic usually share a common vocabulary. The words are first clustered based on their co-occurrences and each cluster forms a potentially interesting topic. Keywords are chosen and then clustered by counting co-occurrences of pairs of keywords. Documents are assigned to relevant topics based on the feature vectors of the clusters.
摘要:
A method and system for identifying information about people is provided. The information system identifies groups of people that have relationships based on their relationships to documents or more generally to objects. The information system initially is provided with an indication of which people have which relationships to which documents. The information system then identifies clusters of people based on having a relationship to the same objects. The information system may also identify clusters of related objects associated with a cluster of people. When a user wants to identify information about a person, the user can provide the name of that person to the information system. The information system then can retrieve and display the names of the other people who are in the same cluster as the person.
摘要:
A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.
摘要:
A method is provided to ubiquitously promote products or services provided by a plurality of advertisers across a plurality of content-based applications. The method may include receiving data from the plurality of advertisers describing the products or services provided by the plurality of advertisers. The received data is stored in an advertisement database where a user may subsequently query the advertisement database to locate data promoting a product or service. Additionally, terms and payment data are extracted from the received advertisement data to efficiently promote the products or services provided by the advertiser across the plurality of content-based applications.