Abstract:
A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.
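The abstract does not say how similarity is measured. The following is a minimal sketch that assumes bag-of-words cosine similarity and a hypothetical blocking threshold of 0.5. The names `term_vector`, `cosine`, and `should_block` are illustrative and are not taken from the patent.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies over lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def should_block(message: str, confidential_docs: list, threshold: float = 0.5) -> bool:
    """Hypothetical policy: block the message if it is at least `threshold`
    similar to any known confidential document."""
    msg_vec = term_vector(message)
    return any(cosine(msg_vec, term_vector(doc)) >= threshold for doc in confidential_docs)


confidential = ["Project Falcon launch date is March and the price is 40 dollars"]
print(should_block("The Project Falcon launch date is March", confidential))  # True
print(should_block("Lunch on Friday?", confidential))  # False
```

In a real deployment the threshold would be tuned on labeled traffic. Weighted schemes such as TF-IDF, or near-duplicate fingerprints, could replace raw term counts.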
Abstract:
A clustering architecture is provided that dynamically groups search result documents into clusters labeled by phrases extracted from the search result snippets. Documents related to the same topic usually share a common vocabulary, so words are first clustered based on their co-occurrences, and each word cluster forms a potentially interesting topic. Keywords are chosen and then clustered by counting co-occurrences of pairs of keywords. Documents are then assigned to relevant topics based on the feature vectors of the clusters.
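The abstract leaves keyword selection and the clustering rule unspecified. This sketch rests on several assumptions. Keywords are words whose document frequency lies between `min_df` and `max_df_ratio` of the snippets. Two keywords are linked when they co-occur in at least `min_cooc` snippets, and each topic is a connected component of those links. A cluster's feature vector is simply its keyword set. Topics are labeled by keywords rather than by extracted phrases, and all parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_snippets(snippets, min_df=2, max_df_ratio=0.5, min_cooc=2):
    """Group search-result snippets into keyword topics (hypothetical parameters)."""
    docs = [set(s.lower().split()) for s in snippets]
    df = Counter(w for d in docs for w in d)
    # Keyword selection: skip rare words and words that appear in most snippets.
    keywords = sorted(w for w, c in df.items() if min_df <= c <= max_df_ratio * len(docs))
    # Count how many snippets contain each pair of keywords.
    pair_counts = Counter()
    for d in docs:
        present = sorted(k for k in keywords if k in d)
        pair_counts.update(combinations(present, 2))
    # Union-find: keywords linked by frequent co-occurrence end up in one topic.
    parent = {k: k for k in keywords}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count >= min_cooc:
            parent[find(a)] = find(b)
    groups = {}
    for k in keywords:
        groups.setdefault(find(k), set()).add(k)
    topics = [g for g in groups.values() if len(g) > 1]
    # Assign each snippet to the topic whose keyword set (its feature vector) overlaps most.
    assignment = {}
    for i, d in enumerate(docs):
        scores = [len(d & t) for t in topics]
        if scores and max(scores) > 0:
            assignment[i] = scores.index(max(scores))
    return topics, assignment


snippets = [
    "jaguar car speed engine",
    "jaguar car engine dealer",
    "jaguar cat jungle habitat",
    "jaguar cat habitat prey",
]
topics, assignment = cluster_snippets(snippets)
print([sorted(t) for t in topics])  # [['car', 'engine'], ['cat', 'habitat']]
print(assignment)  # {0: 0, 1: 0, 2: 1, 3: 1}
```

The upper document-frequency bound matters here. The query word "jaguar" appears in every snippet, and without the bound it would merge both senses into a single topic.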
Abstract:
A method and system for identifying information about people is provided. The information system identifies groups of related people based on their relationships to documents or, more generally, to objects. The information system is initially given an indication of which people have which relationships to which documents. It then identifies clusters of people who have a relationship to the same objects, and it may also identify clusters of related objects associated with each cluster of people. When a user wants information about a person, the user can provide that person's name to the information system, which can then retrieve and display the names of the other people in the same cluster.
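Here is a minimal sketch of the people-clustering step. It assumes two people are linked when they share at least `min_shared` objects, and that a cluster is a connected component of that graph. The abstract does not fix the clustering algorithm, so this rule is an assumption. The function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cluster_people(relations, min_shared=1):
    """relations: iterable of (person, obj) pairs, e.g. 'authored' or 'edited'."""
    objects_of = defaultdict(set)
    for person, obj in relations:
        objects_of[person].add(obj)
    # Two people are linked when they relate to at least `min_shared` common objects.
    adjacent = defaultdict(set)
    for a, b in combinations(sorted(objects_of), 2):
        if len(objects_of[a] & objects_of[b]) >= min_shared:
            adjacent[a].add(b)
            adjacent[b].add(a)
    # Clusters are the connected components of that graph (depth-first search).
    clusters, seen = [], set()
    for start in sorted(objects_of):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            person = stack.pop()
            if person not in component:
                component.add(person)
                stack.extend(adjacent[person] - component)
        seen |= component
        clusters.append(component)
    return clusters, objects_of


def cluster_objects(cluster, objects_of):
    """Objects associated with a cluster of people."""
    return set().union(*(objects_of[p] for p in cluster))


def related_people(name, clusters):
    """Names of the other people in the same cluster as `name`."""
    for cluster in clusters:
        if name in cluster:
            return sorted(cluster - {name})
    return []


relations = [("alice", "spec.doc"), ("bob", "spec.doc"), ("bob", "design.doc"),
             ("carol", "design.doc"), ("dave", "budget.xls")]
clusters, objects_of = cluster_people(relations)
print(related_people("alice", clusters))  # ['bob', 'carol']
print(related_people("dave", clusters))   # []
```

Connected components are transitive. Alice and Carol share no document, yet they land together through Bob. Raising `min_shared` is one way to make the clusters tighter.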
Abstract:
An implicit links enhancement system and method for search engines is provided that generates implicit links by mining user access logs in order to improve local searching of web sites and intranets. The system and method extract implicit links by mining users' access patterns and then use a modified link analysis algorithm to re-rank search results obtained from traditional search engines. More specifically, the method extracts implicit links from a user access log, generates an implicit links graph from the extracted links, and computes page rankings using that graph. The implicit links are extracted from the log using a two-item sequential pattern mining technique. Search results obtained from a search engine are re-ranked based on an implicit links analysis performed using an updated implicit links graph, a modified re-ranking formula, and at least one re-ranking technique.
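The pipeline has three stages: mine the log, build the graph, then rank and re-rank. The following is a minimal sketch of them. It assumes a two-item pattern (A, B) means page B was visited within `window` steps after page A in the same session, with support counted per session. PageRank is the standard unweighted version. The re-ranking formula is hypothetical and stands in for the patent's modified formula, which the abstract does not give: it averages each page's position in the original results with its position under implicit-link PageRank.

```python
from collections import Counter, defaultdict


def mine_implicit_links(sessions, window=1, min_support=2):
    """Two-item sequential patterns: (A, B) where B follows A within `window`
    steps in a session. Support = number of sessions containing the pattern."""
    support = Counter()
    for pages in sessions:
        patterns = set()
        for i, src in enumerate(pages):
            for dst in pages[i + 1:i + 1 + window]:
                if dst != src:
                    patterns.add((src, dst))
        support.update(patterns)
    return {pair: count for pair, count in support.items() if count >= min_support}


def pagerank(links, damping=0.85, iters=50):
    """Standard PageRank by power iteration over the implicit links graph."""
    nodes = sorted({page for pair in links for page in pair})
    n = len(nodes)
    out = defaultdict(list)
    for src, dst in links:
        out[src].append(dst)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in nodes if p not in out)
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for src, dsts in out.items():
            for dst in dsts:
                new[dst] += damping * rank[src] / len(dsts)
        rank = new
    return rank


def rerank(results, rank, alpha=0.5):
    """Hypothetical re-ranking formula: weighted average of a page's position in the
    original results and its position when ordered by implicit-link PageRank."""
    by_rank = sorted(results, key=lambda p: -rank.get(p, 0.0))
    rank_pos = {p: i for i, p in enumerate(by_rank)}
    orig_pos = {p: i for i, p in enumerate(results)}
    return sorted(results, key=lambda p: alpha * orig_pos[p] + (1 - alpha) * rank_pos[p])


sessions = [["home", "faq", "contact"], ["home", "faq"],
            ["news", "faq", "contact"], ["home", "news"]]
links = mine_implicit_links(sessions)
print(sorted(links.items()))  # [(('faq', 'contact'), 2), (('home', 'faq'), 2)]
print(rerank(["home", "contact", "faq"], pagerank(links)))  # ['contact', 'home', 'faq']
```

Combining rank positions rather than raw scores avoids mixing the search engine's relevance scale with PageRank's probability scale. That is one reasonable choice among several.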
Abstract:
A method and system for classifying display pages based on automatically generated summaries of those pages is provided. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the page that are most closely related to its primary topic. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences that represent that primary topic. Once the summary is generated, the classification system may apply conventional classification techniques, such as a Naïve Bayesian classifier or a support vector machine, to the summary in order to classify the web page.
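The summarize-then-classify flow can be sketched as follows. This version assumes two simple summarization signals combined with equal weights: word-frequency (centroid) overlap and overlap with the page title. The classifier is a multinomial Naïve Bayes with add-one smoothing. Both signals and the weighting are illustrative, not the techniques named in the patent. The training data is hypothetical.

```python
import math
import re
from collections import Counter


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def summarize(title, sentences, k=2):
    """Pick the k sentences with the best combined score from two simple,
    hypothetical techniques: centroid (word-frequency) overlap and title overlap."""
    centroid = Counter(w for s in sentences for w in words(s))
    peak = max(centroid.values(), default=1)
    title_words = set(words(title))

    def score(sentence):
        ws = words(sentence)
        if not ws:
            return 0.0
        centroid_score = sum(centroid[w] for w in ws) / (len(ws) * peak)
        title_score = len(title_words & set(ws)) / max(len(title_words), 1)
        return 0.5 * centroid_score + 0.5 * title_score

    top = set(sorted(sentences, key=score, reverse=True)[:k])
    return [s for s in sentences if s in top]  # keep page order


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.priors = Counter(labels)
        self.counts = {c: Counter() for c in self.labels}
        for doc, label in zip(docs, labels):
            self.counts[label].update(words(doc))
        self.vocab_size = len({w for c in self.labels for w in self.counts[c]})
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        def log_posterior(label):
            total = sum(self.counts[label].values())
            score = math.log(self.priors[label] / self.n_docs)
            for w in words(doc):
                score += math.log((self.counts[label][w] + 1) / (total + self.vocab_size))
            return score
        return max(self.labels, key=log_posterior)


train = ["team wins the match in final minute", "striker scores goal in league match",
         "new phone chip improves battery life", "software update adds phone security"]
clf = NaiveBayes().fit(train, ["sports", "sports", "tech", "tech"])
page = ["Cookie notice: we use cookies.", "The new phone has a larger battery.",
        "Battery life lasts two days on this phone.", "Share this page."]
summary = summarize("Phone battery review", page)
print(summary)  # ['The new phone has a larger battery.', 'Battery life lasts two days on this phone.']
print(clf.predict(" ".join(summary)))  # tech
```

The cookie notice and the share link are dropped from the summary, so the classifier sees only topical text. This is the motivation for classifying summaries rather than whole pages.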
Abstract:
A method and system for generating a projection matrix that projects data from a high-dimensional space to a low-dimensional space is provided. The system establishes an objective function based on a maximum margin criterion matrix. The system is then provided with data samples that lie in the high-dimensional space and each belong to a class. For each data sample, the system incrementally derives the leading eigenvectors of the maximum margin criterion matrix from the leading eigenvectors derived for the previous data sample. The derived eigenvectors compose the projection matrix, which can be used to project data samples from the high-dimensional space into the low-dimensional space.
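The maximum margin criterion matrix is $C = S_b - S_w$: the between-class scatter minus the within-class scatter. The projection uses its leading eigenvectors. The abstract does not give the patent's incremental update rule, so this sketch substitutes a simpler scheme. After each sample it updates running per-class statistics (Welford-style) and rebuilds $C$. It then runs a few steps of orthogonal iteration, warm-started from the previous sample's eigenvectors. A Gershgorin shift keeps the iteration on the largest eigenvalues of the possibly indefinite $C$. Rebuilding $C$ costs $O(d^2)$ per sample, so the sketch suits small `dim` only. The class and method names are hypothetical.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class IncrementalMMC:
    """Track the leading eigenvectors of C = S_b - S_w as samples arrive.

    Simplified stand-in: running class statistics plus a few warm-started
    orthogonal-iteration steps per sample, not the patent's own update rule."""

    def __init__(self, dim, k, steps=3):
        self.dim, self.k, self.steps, self.n = dim, k, steps, 0
        self.stats = {}  # class -> [count, mean, M2 = sum of deviation outer products]
        # Arbitrary starting basis: the first k coordinate axes.
        self.W = [[1.0 if i == j else 0.0 for i in range(dim)] for j in range(k)]

    def _update_stats(self, x, label):
        d = self.dim
        s = self.stats.setdefault(label, [0, [0.0] * d, [[0.0] * d for _ in range(d)]])
        s[0] += 1
        before = [xi - mi for xi, mi in zip(x, s[1])]
        s[1] = [mi + bi / s[0] for mi, bi in zip(s[1], before)]
        after = [xi - mi for xi, mi in zip(x, s[1])]
        for i in range(d):  # Welford-style scatter update
            for j in range(d):
                s[2][i][j] += before[i] * after[j]
        self.n += 1

    def _criterion_matrix(self):
        d, n = self.dim, self.n
        mean = [sum(c * m[i] for c, m, _ in self.stats.values()) / n for i in range(d)]
        C = [[0.0] * d for _ in range(d)]
        for count, m, M2 in self.stats.values():
            p, diff = count / n, [mi - gi for mi, gi in zip(m, mean)]
            for i in range(d):
                for j in range(d):
                    # p * (between-class outer product) - p * (class covariance M2 / count)
                    C[i][j] += p * diff[i] * diff[j] - M2[i][j] / n
        return C

    def partial_fit(self, x, label):
        self._update_stats(x, label)
        C = self._criterion_matrix()
        # Gershgorin shift makes C + shift*I positive definite with the same
        # eigenvectors, so power steps converge to C's largest-eigenvalue directions.
        shift = 1.0 + max(sum(abs(v) for v in row) for row in C)
        for _ in range(self.steps):
            basis = []
            for w in self.W:  # start from the previous sample's eigenvectors
                v = [cv + shift * wi for cv, wi in zip(matvec(C, w), w)]
                for u in basis:  # Gram-Schmidt against vectors already fixed
                    proj = dot(v, u)
                    v = [vi - proj * ui for vi, ui in zip(v, u)]
                norm = math.sqrt(dot(v, v))
                basis.append([vi / norm for vi in v])
            self.W = basis
        return self

    def transform(self, x):
        """Project x with the current projection matrix (rows of W)."""
        return [dot(w, x) for w in self.W]


data = [((-2, -1), "a"), ((2, -1), "b"), ((-2, 1), "a"), ((2, 1), "b"),
        ((-2.5, 0), "a"), ((2.5, 0), "b"), ((-1.5, 0), "a"), ((1.5, 0), "b")]
model = IncrementalMMC(dim=2, k=1)
for x, label in data:
    model.partial_fit(x, label)
print(model.W)  # close to [[1.0, 0.0]], the class-separating axis
```

The two classes differ only along the first axis, and their within-class spread is largest along the second. The criterion therefore favors the first axis, and the tracked eigenvector settles there.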
Abstract:
A system for calculating the importance of web pages is provided. The web pages are organized hierarchically into collections. The system calculates the importance of each collection based on inter-collection links, that is, links from a web page in one collection to a web page in another collection. The system then calculates the importance of the web pages in the collections with a high calculated importance based on the links between the web pages in those collections, using, for example, a conventional page rank algorithm. The system may also calculate the importance of the web pages in each collection with a low calculated importance separately, based on the links between the web pages within that collection, again using, for example, a conventional page rank algorithm.
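The three-level computation can be sketched as follows. The sketch rests on three assumptions. A collection is one host name, an illustrative grouping in place of the hierarchy the patent describes. "High importance" means the `top` collections by collection-level PageRank. The pages of all high-importance collections are ranked together. Each low-importance collection is ranked on its own internal links. The helper names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def pagerank(nodes, edges, damping=0.85, iters=50):
    """Conventional PageRank by power iteration; edges must stay within `nodes`."""
    nodes = sorted(nodes)
    n = len(nodes)
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        dangling = sum(rank[p] for p in nodes if p not in out)
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for src, dsts in out.items():
            for dst in dsts:
                new[dst] += damping * rank[src] / len(dsts)
        rank = new
    return rank


def collection_of(url):
    """Hypothetical grouping: one collection per host name."""
    return urlsplit(url).netloc


def hierarchical_rank(links, top=1):
    pages = {p for link in links for p in link}
    sites = {collection_of(p) for p in pages}
    # Step 1: collection importance from inter-collection links only.
    inter = {(collection_of(a), collection_of(b)) for a, b in links
             if collection_of(a) != collection_of(b)}
    site_rank = pagerank(sites, inter)
    important = set(sorted(sites, key=lambda s: (-site_rank[s], s))[:top])
    # Step 2: one PageRank over the pages of all high-importance collections.
    page_rank = pagerank({p for p in pages if collection_of(p) in important},
                         [(a, b) for a, b in links
                          if collection_of(a) in important and collection_of(b) in important])
    # Step 3: each low-importance collection is ranked separately on its own links.
    for site in sites - important:
        members = {p for p in pages if collection_of(p) == site}
        page_rank.update(pagerank(members, [(a, b) for a, b in links
                                            if collection_of(a) == site == collection_of(b)]))
    return site_rank, page_rank


links = [("http://a.com/1", "http://a.com/2"), ("http://a.com/2", "http://a.com/1"),
         ("http://b.com/x", "http://a.com/1"), ("http://c.com/y", "http://a.com/2"),
         ("http://b.com/x", "http://b.com/z")]
site_rank, page_rank = hierarchical_rank(links)
print(max(site_rank, key=site_rank.get))  # a.com
```

Page scores from different PageRank runs are normalized within their own graph. They are comparable inside one collection group but not directly across groups.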
Abstract:
A method and system for clustering documents based on generalized sentence patterns of the documents' topics is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes each document's topic; the generalized sentence is an abstraction of the words of that sentence. The GSP system then identifies clusters of documents based on the patterns of their generalized sentences, placing documents in the same cluster when the generalized sentence representations of their topics have similar patterns.
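In this sketch, generalization is a hypothetical lookup table that maps words to broader classes. A real system might use part-of-speech tags, named-entity types, or hypernyms instead. Pattern similarity is measured with Python's `difflib.SequenceMatcher` ratio over the generalized token sequences. Clustering is single-pass: each document joins the first cluster whose representative pattern is at least `threshold` similar. The table, the threshold, and the function names are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical abstraction table: word -> generalized class.
CONCEPTS = {
    "microsoft": "COMPANY", "google": "COMPANY", "ibm": "COMPANY",
    "acquires": "ACQUIRE", "buys": "ACQUIRE", "purchases": "ACQUIRE",
    "startup": "FIRM", "firm": "FIRM",
    "beat": "WIN", "beats": "WIN", "wins": "WIN",
    "lakers": "TEAM", "celtics": "TEAM", "yankees": "TEAM",
}
STOPWORDS = {"a", "an", "the"}


def generalize(sentence):
    """Map a topic sentence to its generalized form (stopwords dropped)."""
    words = [w.strip(".,").lower() for w in sentence.split()]
    return [CONCEPTS.get(w, w) for w in words if w not in STOPWORDS]


def cluster_by_gsp(topic_sentences, threshold=0.8):
    """Single-pass clustering of documents by generalized sentence pattern."""
    clusters = []  # each entry: (representative pattern, [doc ids])
    for doc_id, sentence in topic_sentences.items():
        pattern = generalize(sentence)
        for rep, members in clusters:
            if SequenceMatcher(None, rep, pattern).ratio() >= threshold:
                members.append(doc_id)
                break
        else:  # no similar pattern yet: start a new cluster
            clusters.append((pattern, [doc_id]))
    return clusters


docs = {"d1": "Microsoft acquires a startup.", "d2": "Google buys the firm.",
        "d3": "The Lakers beat the Celtics.", "d4": "Celtics beat the Lakers."}
for pattern, members in cluster_by_gsp(docs):
    print(pattern, members)
# ['COMPANY', 'ACQUIRE', 'FIRM'] ['d1', 'd2']
# ['TEAM', 'WIN', 'TEAM'] ['d3', 'd4']
```

"Microsoft acquires a startup" and "Google buys the firm" share no content words. They still cluster together because both generalize to the same pattern, which is the point of abstracting the topic sentence.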