Abstract:
Techniques are provided through which “suspicious” web pages may be identified automatically. A “suspicious” web page possesses characteristics that indicate some manipulation to artificially inflate the position of the web page within ranked search results. Web pages may be represented as nodes within a graph. Links between web pages may be represented as directed edges between the nodes. “Snapshots” of the current state of a network of interlinked web pages may be automatically generated at different times. In the time interval between snapshots, the state of the network may change. By comparing an earlier snapshot to a later snapshot, such changes can be identified. Extreme changes, which are deemed to vary significantly from the normal range of expected changes, can be detected automatically. Web pages relative to which these extreme changes have occurred may be marked as suspicious web pages which may merit further investigation or action.
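The snapshot comparison described above can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the abstract: it assumes each snapshot is a list of (source, target) link pairs, uses the change in inlink count as the measured change, and treats a z-score above a threshold as "extreme". The function names and the `z_threshold` parameter are hypothetical.

```python
from statistics import mean, pstdev

def in_degrees(edges):
    """Count incoming links per page from (source, target) pairs."""
    counts = {}
    for source, target in edges:
        counts[target] = counts.get(target, 0) + 1
        counts.setdefault(source, 0)  # pages with no inlinks still appear
    return counts

def suspicious_pages(earlier_edges, later_edges, z_threshold=3.0):
    """Flag pages whose inlink growth between snapshots is an outlier.

    Assumption: "extreme" means a z-score above z_threshold relative to
    the changes observed across all pages in the two snapshots.
    """
    before = in_degrees(earlier_edges)
    after = in_degrees(later_edges)
    pages = set(before) | set(after)
    deltas = {p: after.get(p, 0) - before.get(p, 0) for p in pages}
    mu = mean(deltas.values())
    sigma = pstdev(deltas.values())
    if sigma == 0:
        return set()  # every page changed by the same amount
    return {p for p, d in deltas.items() if (d - mu) / sigma > z_threshold}

# Ten pages link in a ring; in the later snapshot all of them also link to "spam".
ring = [(f"p{i}", f"p{(i + 1) % 10}") for i in range(10)]
later = ring + [(f"p{i}", "spam") for i in range(10)]
flagged = suspicious_pages(ring, later)
```

A real system would likely track several change measures and use a more robust notion of the normal range; the z-score here only stands in for that idea.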
Abstract:
An approach is provided for identifying transient links on a Web page by crawling the Web page twice in succession, separated by a brief interval, and comparing the links found in each crawl. The approach ensures that transient links are not crawled and archived, thereby saving resources for crawling valid links that lead to useful information.
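As a rough sketch of the comparison step, assume each crawl yields the list of link URLs found on the page, and assume a link is transient when it appears in only one of the two crawls. Both the function names and that definition of "transient" are illustrative assumptions; the abstract does not specify them.

```python
def find_transient_links(first_crawl_links, second_crawl_links):
    """Return links that appear in exactly one of two consecutive crawls."""
    first, second = set(first_crawl_links), set(second_crawl_links)
    return first ^ second  # symmetric difference

def find_stable_links(first_crawl_links, second_crawl_links):
    """Return links present in both crawls; only these would be followed."""
    return set(first_crawl_links) & set(second_crawl_links)

# Hypothetical page whose session link changes on every fetch.
crawl_1 = ["/about", "/news", "/s?id=1"]
crawl_2 = ["/about", "/news", "/s?id=2"]
transient = find_transient_links(crawl_1, crawl_2)
stable = find_stable_links(crawl_1, crawl_2)
```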
Abstract:
Techniques are provided for generating descriptions of matching resources in a manner that takes into account the kind, quality, and relevance of the available sources of information about the matching resources. For example, after the search engine identifies matching resources based on the query terms, the search engine determines the kinds of available sources of information about each matching resource. For each matching resource, based on the kinds of available sources of information about the matching resource, one of a plurality of processes is selected to generate a description for the matching resource. Using the content-sensitive description generation techniques described herein, a single result set may include abstracts that were generated using several different processes, where the difference in process corresponds to a difference in the kind, quality, and relevance of the available sources of information about each matching resource.
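The per-resource process selection might look like the sketch below. The source kinds (`editorial_summary`, `meta_description`, `page_text`), their priority order, and the function names are all hypothetical; the abstract only says that a process is chosen according to the kinds of information available.

```python
def snippet_around(text, query_terms, width=60):
    """Cut a window of text starting near the first query term found."""
    lower = text.lower()
    for term in query_terms:
        pos = lower.find(term.lower())
        if pos != -1:
            start = max(0, pos - width // 2)
            return text[start:start + width].strip()
    return text[:width].strip()  # no term found: fall back to the opening text

def generate_description(resource, query_terms):
    """Pick a description process based on which sources the resource has.

    Returns (process_name, description). The priority order is an assumption.
    """
    summary = resource.get("editorial_summary")
    if summary:
        return "editorial", summary
    meta = resource.get("meta_description")
    if meta and any(t.lower() in meta.lower() for t in query_terms):
        return "meta", meta
    return "snippet", snippet_around(resource.get("page_text", ""), query_terms)

results = [
    {"editorial_summary": "A curated guide to Oslo."},
    {"meta_description": "Travel deals", "page_text": "Cheap flights to Oslo"},
]
descriptions = [generate_description(r, ["oslo"]) for r in results]
```

In this sketch a single result set ends up with descriptions produced by two different processes, which is the behavior the abstract describes.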
Abstract:
A computer-implemented approach for organizing input listings from various sources of input listings. Input listings are organized by mapping the input listings to consolidated listings that correspond to the input listings. The mapping of the input listings is based on various techniques such as a Stock Keeping Unit (SKU) item-listing-to-consolidated-listing matching technique, a name/title item-listing-to-consolidated-listing matching technique, and a model item-listing-to-consolidated-listing matching technique.
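One way to picture the three matching techniques is as a cascade that tries each in turn. The sketch below assumes listings are dictionaries with optional `sku`, `title`, and `model` fields and tries the techniques in the order the abstract lists them; the field names, the ordering, and the exact-match comparison are illustrative assumptions.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivially different titles match."""
    return " ".join(text.lower().split())

def match_listing(listing, consolidated):
    """Map one input listing to a consolidated listing.

    Returns (consolidated_id, technique), or (None, None) if nothing matches.
    """
    sku = listing.get("sku")
    if sku:
        for c in consolidated:
            if c.get("sku") == sku:
                return c["id"], "sku"
    title = listing.get("title")
    if title:
        for c in consolidated:
            if normalize(c.get("title") or "") == normalize(title):
                return c["id"], "title"
    model = listing.get("model")
    if model:
        for c in consolidated:
            if (c.get("model") or "").lower() == model.lower():
                return c["id"], "model"
    return None, None

catalog = [
    {"id": 1, "sku": "A-100", "title": "Acme Blender 500", "model": "BL500"},
    {"id": 2, "sku": "B-200", "title": "Acme Toaster", "model": "TS20"},
]
by_sku = match_listing({"sku": "B-200", "title": "toaster (new)"}, catalog)
by_title = match_listing({"title": "acme   blender 500"}, catalog)
by_model = match_listing({"title": "Toaster by Acme", "model": "ts20"}, catalog)
```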
Abstract:
A system for detecting artificial promotion of a resource, including a search engine operative to index a set of incoming links (“inlinks”) which reference the resource, a log module coupled with the search engine and configured to store log data associated with the set of inlinks, a partitioning module coupled with the log module and operative to partition the set of inlinks into a plurality of groups of inlinks based on at least one partitioning scheme, a statistics module coupled with the partitioning module and operative to compute a statistic associated with the inlinks within each of the plurality of groups of inlinks, and a computation module coupled with the statistics module and operative to process the computed statistic associated with the inlinks of each of the plurality of groups of inlinks and compute a metric associated with the set of inlinks, where the metric indicates a level of uniformity of a distribution of values of the respective computed statistics among the plurality of groups of inlinks, and where the search engine places a list of search results, generated in response to a search query, in a pattern based on the metric.
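The partition, statistic, and metric stages can be illustrated with a small sketch. It assumes the partitioning scheme is "group inlinks by source host", the per-group statistic is the inlink count, and the uniformity metric is normalized Shannon entropy (1.0 for a perfectly even spread, near 0 when one group dominates). None of these choices comes from the abstract; they are stand-ins for whatever scheme, statistic, and metric the system actually uses.

```python
import math
from collections import Counter
from urllib.parse import urlsplit

def partition_by_host(inlink_urls):
    """Group inlinks by source host and count the inlinks in each group."""
    groups = Counter()
    for url in inlink_urls:
        groups[urlsplit(url).hostname or ""] += 1
    return groups

def uniformity(group_stats):
    """Normalized Shannon entropy of the per-group statistic, in [0, 1]."""
    values = [v for v in group_stats.values() if v > 0]
    if len(values) < 2:
        return 0.0  # assumption: too few groups to measure spread
    total = sum(values)
    entropy = -sum((v / total) * math.log(v / total) for v in values)
    return entropy / math.log(len(values))

# Inlinks spread evenly over four hosts versus concentrated on one host.
even = [f"http://site{i}.example/page" for i in range(4)]
skewed = ["http://farm.example/p%d" % i for i in range(9)] + ["http://blog.example/post"]
even_score = uniformity(partition_by_host(even))
skewed_score = uniformity(partition_by_host(skewed))
```

How the score maps to a ranking decision is left open here, since the abstract only states that result placement is based on the metric.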