Abstract:
A computing device is described herein that is configured to select a subset of keywords from a plurality of keywords based at least on measures of competition associated with the keywords and to suggest the selected subset for bidding. The plurality of keywords is relevant to at least one advertising target. The computing device calculates a measure of competition for a respective keyword based on a number of bidders for the respective keyword and on a number of available advertisement slots in search results provided responsive to queries for the respective keyword.
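As a minimal sketch of the selection step, the snippet below assumes the measure of competition is simply the number of bidders divided by the number of available advertisement slots. The abstract only says the measure is based on these two quantities, so the ratio, the `Keyword` type, and the function names are illustrative assumptions.

```python
from typing import NamedTuple


class Keyword(NamedTuple):
    text: str
    bidders: int  # number of advertisers bidding on the keyword
    slots: int    # ad slots available in search results for the keyword


def competition(kw: Keyword) -> float:
    """Assumed measure: bidders per available slot (higher means more contested)."""
    if kw.slots <= 0:
        return float("inf")  # no slots at all: treat as maximally contested
    return kw.bidders / kw.slots


def suggest_keywords(keywords, k):
    """Return the k least-contested keywords under the assumed measure."""
    return sorted(keywords, key=competition)[:k]


candidates = [
    Keyword("running shoes", bidders=40, slots=8),
    Keyword("trail running shoes", bidders=12, slots=8),
    Keyword("buy shoes online", bidders=30, slots=4),
]
suggested = suggest_keywords(candidates, 2)
```

Here `suggested` holds the two keywords with the lowest bidder-to-slot ratio. A real system would likely combine this measure with relevance to the advertising target.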
Abstract:
Method for creating a graph representing web browsing behavior, including receiving web browsing behavior data from one or more web browsers; adding a node on the graph for each web page listed in the web browsing behavior data; adding a first link connecting two or more nodes on the graph, wherein the first link represents a hyperlink for accessing a web page; calculating the amount of time for which each web page is accessed; determining a number of units of time in the calculated amount of time; adding one or more virtual nodes to the graph based on the number of units of time; and adding a second link connecting two or more virtual nodes on the graph, wherein the second link represents a virtual hyperlink for accessing a web page.
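The steps in this abstract can be sketched as follows. The input format, the 10-second time unit, and the choice to chain one virtual node per time unit are assumptions made for illustration; the abstract does not fix any of them.

```python
import math
from collections import defaultdict


def build_browsing_graph(visits, unit_seconds=10.0):
    """visits: iterable of (source_url, target_url, seconds_spent_on_source)."""
    nodes = set()
    hyperlinks = set()          # "first links": real hyperlinks between pages
    dwell = defaultdict(float)  # total time spent on each page
    for source, target, seconds in visits:
        nodes.update((source, target))
        hyperlinks.add((source, target))
        dwell[source] += seconds

    virtual_nodes = []
    virtual_links = []          # "second links": virtual hyperlinks
    for page in sorted(dwell):
        # Assumed rule: one virtual node per started unit of time.
        units = math.ceil(dwell[page] / unit_seconds)
        chain = [(page, i) for i in range(units)]
        virtual_nodes.extend(chain)
        virtual_links.extend(zip(chain, chain[1:]))
    return nodes, hyperlinks, virtual_nodes, virtual_links


visits = [("a.example", "b.example", 25.0), ("b.example", "c.example", 8.0)]
nodes, hyperlinks, virtual_nodes, virtual_links = build_browsing_graph(visits)
```

With these inputs, `a.example` spans three time units and receives a chain of three virtual nodes, while `b.example` receives a single virtual node and no virtual links.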
Abstract:
Many search engines attempt to understand and predict a user's search intent after the submission of search queries. Predicting search intent allows search engines to tailor search results to particular information needs of the user. Unfortunately, current techniques passively predict search intent after a query is submitted. Accordingly, one or more systems and/or techniques for actively predicting search intent from user browsing behavior data are disclosed herein. For example, search patterns of a user browsing a web page and shortly thereafter performing a query may be extracted from user browsing behavior. Queries within the search patterns may be ranked based upon a search trigger likelihood that content of the web page motivated the user to perform the query. In this way, query suggestions having a high search trigger likelihood and a diverse range of topics may be generated and/or presented to users of the web page.
Abstract:
An anti-spam tool works with a web browser to detect spam webpages locally on a client machine. The anti-spam tool can be implemented either as a plug-in module or as an integral part of the browser, and manifested as a toolbar. The tool can perform an anti-spam action whenever a webpage is accessed through the browser, and does not require direct involvement of a search engine. A spam detection module installed on the computing device determines whether a webpage being accessed, or a link contained in that webpage, is spam by comparing the URL of the webpage or the link with a spam list. The spam list can be downloaded from a remote search engine server, stored locally, and updated from time to time. A two-level indexing technique is also introduced to improve the efficiency of the anti-spam tool's use of the spam list.
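The abstract does not say what the two index levels are. One plausible reading, sketched below purely as an assumption, keys the first level by host name and the second by path, so most lookups stop after a single dictionary probe. The function names and example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def build_index(spam_urls):
    """Build an assumed two-level index: host -> set of spam paths."""
    index = {}
    for url in spam_urls:
        parts = urlsplit(url)
        index.setdefault(parts.netloc.lower(), set()).add(parts.path or "/")
    return index


def is_spam(url, index):
    """Level 1: look up the host. Level 2: look up the path for that host."""
    parts = urlsplit(url)
    paths = index.get(parts.netloc.lower())
    return paths is not None and (parts.path or "/") in paths


spam_index = build_index([
    "http://spam.example/win",
    "http://spam.example/free",
    "http://bad.example/",
])
```

A browser plug-in could call `is_spam` on the page URL and on each link it contains, refreshing `spam_index` whenever a new spam list is downloaded.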
Abstract:
The page ranking technique described herein employs a Markov Skeleton Mirror Process (MSMP), which is a particular case of Markov Skeleton Processes, to model and calculate page importance scores. Given a web graph and its metadata, the technique builds an MSMP model on the web graph. It first estimates the stationary distribution of an EMC and treats it as the transition probability. It next computes the mean staying time using the metadata. Finally, it calculates the product of the transition probability and the mean staying time, which is the stationary distribution of the MSMP. This product is regarded as the page importance.
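The three steps above can be sketched with a small power iteration. The transition matrix, the staying times, and the final normalization so that the scores sum to 1 are assumptions for illustration; how the technique derives either input from the web graph metadata is not specified in the abstract.

```python
def stationary_distribution(P, iterations=200):
    """Power iteration on a row-stochastic matrix P given as a list of rows."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def page_importance(P, mean_staying_time):
    """Multiply the stationary distribution by the mean staying time per page."""
    pi = stationary_distribution(P)
    raw = [p * t for p, t in zip(pi, mean_staying_time)]
    total = sum(raw)
    return [r / total for r in raw]  # assumed normalization


# Hypothetical three-page chain and staying times in seconds.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
]
staying = [10.0, 30.0, 20.0]
scores = page_importance(P, staying)
```

For this chain the stationary distribution is (0.4, 0.4, 0.2), so the longer staying time on the second page lifts its normalized score to about 0.6.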
Abstract:
An anti-spam technique for protecting search engine ranking is based on mining search engine optimization (SEO) forums. The anti-spam technique collects webpages such as SEO forum posts from a list of suspect spam websites, and extracts suspicious link exchange URLs and corresponding link information from the collected webpages. A search engine ranking penalty is then applied to the suspicious link exchange URLs. The penalty is at least partially determined by the link information associated with the respective suspicious link exchange URL. To detect more suspicious link exchange URLs, the technique may propagate one or more levels from a seed set of suspicious link exchange URLs generated by mining SEO forums.
Abstract:
This application describes a system and method for estimating user intent towards categories of content. The estimation of user intent may be based at least in part on a score for prior user actions and a decay function that is applied to that score to provide an estimate of current user intent. The estimate represents current user intent for time periods in which user actions towards a category of content are negligible or non-existent.
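As a minimal sketch, the decay function could be exponential with a fixed half-life. Both the exponential form and the seven-day default are assumptions; the abstract only states that some decay function is applied to the prior-action score.

```python
def current_intent(score, elapsed_days, half_life_days=7.0):
    """Decay a prior-action score to estimate present intent (assumed form)."""
    return score * 0.5 ** (elapsed_days / half_life_days)


# A score of 8.0 earned two half-lives ago.
estimate = current_intent(8.0, elapsed_days=14.0)
```

Under this assumed form the estimate never drops abruptly to zero, which suits periods in which the user takes no further action towards the category.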
Abstract:
Some implementations provide techniques for determining which URLs to select for crawling from a pool of URLs. For example, the selection of URLs for crawling may be made based on maintaining a high coverage of the known URLs and/or high discoverability of the World Wide Web. Some implementations provide a multi-level coverage strategy for crawling selection. Further, some implementations provide techniques for discovering unseen URLs.
Abstract:
Method for determining webpage importance, including receiving web browsing behavior data of one or more users; creating a model of the web browsing behavior data; calculating a stationary probability distribution of the model; and correlating the stationary probability distribution to the webpage importance.
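One way to read these steps, offered only as a sketch, is to estimate transition probabilities from observed click sequences and then iterate to a stationary distribution. The session format, the reset probability of 0.15, and the handling of pages with no observed outgoing clicks are all assumptions not stated in the abstract.

```python
from collections import Counter, defaultdict


def transition_model(sessions):
    """Estimate page-to-page transition probabilities from click sequences."""
    counts = defaultdict(Counter)
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: c / sum(out.values()) for dst, c in out.items()}
        for src, out in counts.items()
    }


def stationary(model, reset=0.15, iterations=100):
    """Iterate to a stationary distribution with an assumed uniform reset."""
    pages = sorted(set(model) | {d for out in model.values() for d in out})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        nxt = {p: 0.0 for p in pages}
        for p in pages:
            out = model.get(p)
            if out:
                for d, prob in out.items():
                    nxt[d] += (1 - reset) * rank[p] * prob
                leak = reset * rank[p]
            else:
                leak = rank[p]  # no observed exits: all mass restarts
            for q in pages:
                nxt[q] += leak / n
        rank = nxt
    return rank


sessions = [["home", "news", "home"], ["home", "shop"], ["news", "home"]]
importance = stationary(transition_model(sessions))
```

The resulting probabilities sum to 1 and can be read directly as relative importance scores for the pages seen in the data.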
Abstract:
A calculate importance system calculates the global importance of a web page based on a “mean hitting time.” Hitting time of a target web page is a measure of the minimum number of transitions needed to land on the target web page. Mean hitting time of a target web page is an average number of such transitions for all possible starting web pages. The calculate importance system calculates a global importance score for a web page based on the reciprocal of a mean hitting time. A search engine may rank web pages of a search result based on a combination of relevance of the web pages to the search request and global importance of the web pages based on a mean hitting time.
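Taking the abstract's definition literally (hitting time as the minimum number of transitions), the score can be sketched with a breadth-first search over reversed links. Skipping starting pages that cannot reach the target, and scoring an unreachable target as 0, are assumptions made here; the link graph is hypothetical.

```python
from collections import deque


def mean_hitting_time(links, target):
    """links: dict mapping each page to the list of pages it links to."""
    reverse = {}
    for src, dsts in links.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)

    # BFS backwards from the target gives the minimum transitions per start page.
    dist = {target: 0}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        for pred in reverse.get(page, []):
            if pred not in dist:
                dist[pred] = dist[page] + 1
                queue.append(pred)

    steps = [d for page, d in dist.items() if page != target]
    return sum(steps) / len(steps) if steps else float("inf")


def global_importance(links, target):
    """Reciprocal of the mean hitting time; 0.0 if no page reaches the target."""
    return 1.0 / mean_hitting_time(links, target)


links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
score_c = global_importance(links, "c")
score_d = global_importance(links, "d")
```

Page `c` is reachable in 1, 1, and 2 transitions from `b`, `d`, and `a`, so its mean hitting time is 4/3 and its score 0.75, while `d` has no inbound links and scores 0.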