Abstract:
A system for calculating the importance of web pages is provided. The web pages are organized hierarchically into collections. The system calculates the importance of each collection based on inter-collection links from a web page in one collection to a web page in another collection. The system then calculates the importance of web pages in the collections with a high calculated importance based on links between the web pages in those collections using, for example, a conventional page rank algorithm. The system may also calculate the importance of web pages in each collection with a low calculated importance separately based on the links between the web pages in the collection using, for example, a conventional page rank algorithm.
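A minimal sketch of the two-level computation described above, assuming that collections are given as a page-to-collection mapping and that "high importance" means the `num_high` top-ranked collections. The cutoff, the function names, and the dangling-node handling are illustrative assumptions, not details from the abstract. Plain power-iteration PageRank stands in for the "conventional page rank algorithm".

```python
def pagerank(nodes, edges, damping=0.85, iters=100):
    """Plain power-iteration PageRank over a list of (src, dst) edges."""
    if not nodes:
        return {}
    n = len(nodes)
    out = {v: [] for v in nodes}
    for s, d in edges:
        out[s].append(d)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for d in out[v]:
                new[d] += damping * rank[v] / len(out[v])
        rank = new
    return rank


def hierarchical_rank(collection_of, links, num_high=1):
    """collection_of maps page -> collection; links are (src_page, dst_page)."""
    collections = sorted(set(collection_of.values()))
    # 1. Collection-level graph built only from inter-collection links.
    inter = [(collection_of[s], collection_of[d]) for s, d in links
             if collection_of[s] != collection_of[d]]
    coll_rank = pagerank(collections, inter)
    # 2. Hypothetical cutoff: the num_high top-ranked collections count as "high".
    high = set(sorted(collections, key=coll_rank.get, reverse=True)[:num_high])
    # Pages of all high-importance collections are ranked together.
    high_pages = sorted(p for p, c in collection_of.items() if c in high)
    high_set = set(high_pages)
    page_rank = pagerank(high_pages, [(s, d) for s, d in links
                                      if s in high_set and d in high_set])
    # 3. Each low-importance collection is ranked separately on its internal links.
    for c in collections:
        if c in high:
            continue
        pages = sorted(p for p, pc in collection_of.items() if pc == c)
        internal = [(s, d) for s, d in links
                    if collection_of[s] == c and collection_of[d] == c]
        page_rank.update(pagerank(pages, internal))
    return coll_rank, page_rank
```

Because each low-importance collection is ranked in isolation, its scores sum to 1 on their own and are only comparable within that collection.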
Abstract:
A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
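A minimal sketch of clustering by generalized sentence pattern. It assumes each document's topic sentence is already known and that a small hypothetical lexicon (`WORD_CLASS`) supplies the word abstractions. For simplicity it groups documents whose patterns are identical. The abstract only requires a "similar" pattern, which a fuller version might measure with, for example, an edit distance between patterns.

```python
from collections import defaultdict

# Hypothetical lexicon mapping concrete words to abstract classes; a real
# system would derive these abstractions from a linguistic resource.
WORD_CLASS = {
    "microsoft": "<COMPANY>", "google": "<COMPANY>",
    "acquires": "<ACQUIRE>", "buys": "<ACQUIRE>",
    "startup": "<ORG>", "firm": "<ORG>",
    "launches": "<RELEASE>", "releases": "<RELEASE>",
    "browser": "<SOFTWARE>", "editor": "<SOFTWARE>",
}
STOPWORDS = {"a", "an", "the", "new"}


def generalize(sentence):
    """Replace each content word by its class; unknown words are kept as-is."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return tuple(WORD_CLASS.get(w, w) for w in words if w not in STOPWORDS)


def cluster_by_pattern(topic_sentences):
    """Group document ids whose generalized topic sentences are identical."""
    clusters = defaultdict(list)
    for doc_id, sentence in topic_sentences.items():
        clusters[generalize(sentence)].append(doc_id)
    return dict(clusters)
```

For example, "Microsoft acquires a startup" and "Google buys a firm" both generalize to `("<COMPANY>", "<ACQUIRE>", "<ORG>")` and therefore fall into the same cluster.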
Abstract:
A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.
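A minimal sketch of the similarity check, assuming bag-of-words cosine similarity as the content-similarity measure and a hypothetical blocking threshold of 0.5. The abstract names neither the measure nor the threshold.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def should_block(outgoing, confidential_docs, threshold=0.5):
    """Return (block?, best score) against every known confidential document."""
    out_vec = term_vector(outgoing)
    best = max((cosine(out_vec, term_vector(doc)) for doc in confidential_docs),
               default=0.0)
    return best >= threshold, best
```

Comparing against the single most similar confidential document means that one close match is enough to hold a message.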
Abstract:
A peer-to-peer advertisement platform is provided to ubiquitously promote products or services supplied by advertisers across content-based applications executing on nodes in a peer-to-peer network. The platform may include a registration component to register nodes in the platform, an advertisement submission component to receive advertisement data from the advertisers, and a distribution component to distribute the advertisement data to the registered nodes. The platform also includes a money-sharing component that rewards each node based on a contribution level assigned to that node. In this way, the peer-to-peer advertisement platform stores the advertisement data locally at the registered nodes and shares a portion of the revenue generated from the advertisement data with those nodes.
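A minimal in-process sketch of the four components, assuming the money-sharing component splits a fixed fraction of revenue (the hypothetical `node_share`) in proportion to each node's contribution level. Class and method names are illustrative; real networking, persistence, and the actual revenue policy are out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    contribution: float                      # contribution level assigned to the node
    ads: list = field(default_factory=list)  # advertisement data stored locally


class AdPlatform:
    def __init__(self, node_share=0.5):
        self.nodes = {}
        self.node_share = node_share  # hypothetical fraction of revenue paid to nodes

    def register(self, node_id, contribution):
        """Registration component."""
        self.nodes[node_id] = Node(node_id, contribution)

    def submit_ad(self, ad):
        """Submission plus distribution: replicate the ad to every registered node."""
        for node in self.nodes.values():
            node.ads.append(ad)

    def share_revenue(self, revenue):
        """Money-sharing component: pay out in proportion to contribution level."""
        pool = revenue * self.node_share
        total = sum(n.contribution for n in self.nodes.values())
        if total == 0:
            return {nid: 0.0 for nid in self.nodes}
        return {nid: pool * n.contribution / total for nid, n in self.nodes.items()}
```

With nodes of contribution 1 and 3 and a 50% share, revenue of 100 yields payouts of 12.5 and 37.5.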
Abstract:
A technique for representing queries, and for determining the similarity of queries, based on an autoregressive integrated moving average (“ARIMA”) model is provided. A query analysis system represents each query by the ARIMA coefficients of its frequency over time. The query analysis system may estimate the frequency of a query for a desired past or future interval based on its frequency in some initial intervals. The query analysis system may also determine the similarity of a pair of queries based on the similarity of their ARIMA coefficients. The query analysis system may use various metrics, such as a correlation metric, to determine the similarity of the ARIMA coefficients.
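A simplified sketch, assuming NumPy is available. It fits only the autoregressive part of an ARIMA(p, d, 0) model by ordinary least squares on the differenced frequency series. It does not perform full ARIMA estimation with moving-average terms; a statistics library would normally handle that. The correlation metric is Pearson correlation of the coefficient vectors, and the names and the default `p=3, d=1` are illustrative assumptions.

```python
import numpy as np


def ar_coefficients(freqs, p=3, d=1):
    """Least-squares AR(p) coefficients of the d-times differenced series."""
    x = np.diff(np.asarray(freqs, dtype=float), n=d)
    # Column k holds the lag-(k+1) values x[t-k-1] for t = p .. len(x)-1.
    X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
    y = x[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


def query_similarity(freqs_a, freqs_b, p=3, d=1):
    """Pearson correlation between two queries' coefficient vectors."""
    a = ar_coefficients(freqs_a, p, d)
    b = ar_coefficients(freqs_b, p, d)
    return float(np.corrcoef(a, b)[0, 1])
```

Because differencing removes offsets and least squares is unaffected by scaling, a query whose frequency is 10 times another's plus a constant gets the same coefficients and a similarity of about 1.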
Abstract:
A method and system for calculating the importance of persons based on interpersonal relationships, and for prioritizing communications based on the importance of their participants, is provided. A prioritization system identifies relationships between persons and identifies the importance of a person to other persons based on these relationships. Once the prioritization system has identified the importance of persons, it can prioritize communications based on the importance of the senders or recipients.
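A minimal sketch, assuming relationships are encoded as positive weights on directed pairs (read here, hypothetically, as "a directs communication to b") and using weighted PageRank as a stand-in for the importance calculation. The abstract does not name a specific algorithm.

```python
def person_importance(relationships, damping=0.85, iters=100):
    """Weighted PageRank; relationships maps (a, b) -> positive weight."""
    people = sorted({p for pair in relationships for p in pair})
    n = len(people)
    out_weight = {p: 0.0 for p in people}
    for (a, _), w in relationships.items():
        out_weight[a] += w
    rank = {p: 1.0 / n for p in people}
    for _ in range(iters):
        # People with no outgoing relationships spread their rank evenly.
        dangling = sum(rank[p] for p in people if out_weight[p] == 0)
        new = {p: (1 - damping + damping * dangling) / n for p in people}
        for (a, b), w in relationships.items():
            new[b] += damping * rank[a] * w / out_weight[a]
        rank = new
    return rank


def prioritize(messages, importance):
    """Sort (sender, subject) messages by sender importance, highest first."""
    return sorted(messages, key=lambda m: importance.get(m[0], 0.0), reverse=True)
```

Prioritizing by recipients instead would only change the sort key, for example to the maximum importance among a message's recipients.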
Abstract:
The described systems, methods and data structures are directed to ranking Web pages with hierarchical considerations. The hierarchical structures and the linking relationships of the World Wide Web are used to provide a page importance ranking for Web searches. The linking relationships are aggregated to a high level node at each of the hierarchical structures. A link graph analysis is performed on the aggregated linking relationships to determine the importance of each node. The importance of each node may be propagated to pages associated with that node. For each page, the importance of that page and the importance of the node associated with the page are used to calculate the page importance ranking.
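A minimal sketch, assuming the host name of each URL serves as the high-level node, weighted PageRank serves as the link graph analysis, and page and node importance are combined as a product. A page's local score is taken to be its (smoothed) share of its host's in-links. The combination rule and local score are illustrative assumptions, not details from the abstract.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_of(url):
    return urlparse(url).netloc


def pagerank(nodes, weighted_edges, damping=0.85, iters=100):
    """Weighted PageRank; weighted_edges maps (src, dst) -> weight."""
    n = len(nodes)
    out = defaultdict(float)
    for (s, _), w in weighted_edges.items():
        out[s] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if out[v] == 0)
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for (s, d), w in weighted_edges.items():
            new[d] += damping * rank[s] * w / out[s]
        rank = new
    return rank


def hierarchical_page_rank(links):
    """links: list of (src_url, dst_url)."""
    pages = sorted({p for link in links for p in link})
    hosts = sorted({host_of(p) for p in pages})
    # 1. Aggregate cross-host page links into weighted host-to-host links.
    host_edges = defaultdict(float)
    for s, d in links:
        if host_of(s) != host_of(d):
            host_edges[(host_of(s), host_of(d))] += 1.0
    # 2. Link analysis on the aggregated graph gives each host's importance.
    host_rank = pagerank(hosts, host_edges)
    # 3. Local page score: the page's share of its host's in-links (+1 smoothing).
    inlinks = defaultdict(int)
    for _, d in links:
        inlinks[d] += 1
    host_total = defaultdict(int)
    for p in pages:
        host_total[host_of(p)] += inlinks[p] + 1
    # 4. Propagate host importance to its pages and combine (here: a product).
    return {p: host_rank[host_of(p)] * (inlinks[p] + 1) / host_total[host_of(p)]
            for p in pages}
```

Because each host's local shares sum to 1, the final page scores still sum to 1 across the whole graph.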
Abstract:
Systems and methods for related term suggestion are described. In one aspect, relationships among two or more multi-type data objects are identified. These data objects include at least one object of a first type and at least one object of a second type that is different from the first type. The multi-type data objects are iteratively clustered based on these relationships to generate reinforced clusters.
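A simplified sketch of iterative clustering over two object types, assuming the types are, hypothetically, queries and terms linked by (query, term) pairs. On each pass, objects of one type are described by the clusters of the objects they link to, and objects whose descriptions have a Jaccard similarity at or above a hypothetical threshold are merged. The pass then repeats for the other type until both clusterings stop changing. This illustrates mutual reinforcement under those assumptions and is not the described system's algorithm.

```python
def single_link_groups(signatures, threshold):
    """Union objects whose signature sets have Jaccard similarity >= threshold."""
    objs = sorted(signatures)
    parent = {o: o for o in objs}

    def find(o):
        while parent[o] != o:
            parent[o] = parent[parent[o]]  # path halving
            o = parent[o]
        return o

    for i, a in enumerate(objs):
        for b in objs[i + 1:]:
            sa, sb = signatures[a], signatures[b]
            union = sa | sb
            if union and len(sa & sb) / len(union) >= threshold:
                parent[find(a)] = find(b)
    # Relabel each group with a small integer, in sorted object order.
    roots = {}
    return {o: roots.setdefault(find(o), len(roots)) for o in objs}


def reinforced_clusters(links, threshold=0.5, max_iter=10):
    """links: set of (type1_object, type2_object) pairs, e.g. (query, term)."""
    ones = sorted({a for a, _ in links})
    twos = sorted({b for _, b in links})
    labels2 = {b: i for i, b in enumerate(twos)}  # start: each type-2 object alone
    labels1 = {}
    for _ in range(max_iter):
        # Describe type-1 objects by the clusters of the type-2 objects they link to...
        sig1 = {a: frozenset(labels2[b] for x, b in links if x == a) for a in ones}
        new1 = single_link_groups(sig1, threshold)
        # ...then type-2 objects by the clusters of the type-1 objects linking to them.
        sig2 = {b: frozenset(new1[a] for a, y in links if y == b) for b in twos}
        new2 = single_link_groups(sig2, threshold)
        if new1 == labels1 and new2 == labels2:
            break  # both clusterings are stable
        labels1, labels2 = new1, new2
    return labels1, labels2
```

Terms that end up in the same cluster are natural candidates to suggest for one another.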
Abstract:
Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
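A minimal sketch of the pieces named above, under several assumptions. Time dependence is tested as high autocorrelation at a known `period`. Time-dependent queries are forecast with a seasonal naive forecast that repeats the last cycle. An "event" is a value exceeding the trailing-window mean by `k` standard deviations, with a floor of `min_sd`. Causal precedence is scored as the fraction of target events preceded by a lead-query event within `max_lag` intervals. All thresholds and names are hypothetical.

```python
import statistics


def is_time_dependent(freqs, period, threshold=0.5):
    """Time-dependent if the autocorrelation at lag `period` is at least threshold."""
    mean = statistics.fmean(freqs)
    var = sum((f - mean) ** 2 for f in freqs)
    if var == 0:
        return False
    cov = sum((freqs[t] - mean) * (freqs[t - period] - mean)
              for t in range(period, len(freqs)))
    return cov / var >= threshold


def seasonal_forecast(freqs, period):
    """Forecast the next interval of a periodic query by repeating the last cycle."""
    return freqs[-period]


def detect_events(freqs, window=3, k=2.0, min_sd=1.0):
    """Indices where frequency exceeds the trailing mean by k (floored) std devs."""
    events = []
    for t in range(window, len(freqs)):
        past = freqs[t - window:t]
        mu = statistics.fmean(past)
        sd = max(statistics.pstdev(past), min_sd)
        if freqs[t] > mu + k * sd:
            events.append(t)
    return events


def precedence_score(lead_events, target_events, max_lag=2):
    """Fraction of target events preceded by a lead event within max_lag intervals."""
    if not target_events:
        return 0.0
    hits = sum(any(0 < t - s <= max_lag for s in lead_events) for t in target_events)
    return hits / len(target_events)
```

A lead query with a high precedence score could then supply the forecast signal for a time-independent query: a new event in the lead query suggests an upcoming event in the target.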