Abstract:
Methods and computer-readable media for searching a large volume of documents are provided. In embodiments, a plurality of related documents is consolidated by a web host into a synthetic search document. The synthetic search document includes a set of descriptive information for each web page consolidated into it. Each set of descriptive information is associated with a subpart identifier that includes information allowing a search engine to provide a link to the individual document. Web pages consolidated into a synthetic search document may be edited to include an indication that the web page is not to be individually searched or indexed by a search engine. Similarly, the synthetic search document may be designated as such by information included in it.
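As a rough illustration of the consolidation described above, the following Python sketch models a synthetic search document as a list of description entries, each keyed by a subpart identifier. The names `SubpartEntry`, `SyntheticSearchDocument`, `consolidate`, and `find_links` are hypothetical, and using the page URL as the subpart identifier is an assumption made only for this example; the abstract does not specify a format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubpartEntry:
    # Assumption: the subpart identifier is simply the URL of the individual page.
    subpart_id: str
    description: str


@dataclass
class SyntheticSearchDocument:
    # Flag marking this document as a synthetic search document.
    is_synthetic: bool = True
    entries: list[SubpartEntry] = field(default_factory=list)


def consolidate(pages: dict[str, str]) -> SyntheticSearchDocument:
    """Build one synthetic document from a mapping of page URL -> description."""
    doc = SyntheticSearchDocument()
    for url, description in pages.items():
        doc.entries.append(SubpartEntry(subpart_id=url, description=description))
    return doc


def find_links(doc: SyntheticSearchDocument, term: str) -> list[str]:
    """Return the subpart identifiers whose description contains the term."""
    needle = term.lower()
    return [e.subpart_id for e in doc.entries if needle in e.description.lower()]
```

A search engine that indexes only the synthetic document could then call `find_links(doc, "boots")` to recover links to the individual pages that match.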
Abstract:
A politeness manager estimates traffic to web sites based on historical log data generated and sent by plug-ins or toolbars on client web browsers. The historical log data details the dates and times at which the web browsers visit different web sites, and is used to determine the timeframes in which specific web sites are busy and those in which they are not. Crawl rates for different timeframes for a web site are determined based on the historical log data, and web crawlers are scheduled to crawl the web site according to the crawl rates to minimize the chance that web crawler requests cause the site to crash.
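One way to picture the crawl-rate step is the minimal Python sketch below. It assumes the timeframes are hours of the day and that the crawl rate falls linearly as the observed visit count approaches the busiest hour; both the function names and the `max_rate`/`min_rate` parameters are hypothetical, since the abstract does not state how rates are derived from the log data.

```python
from collections import Counter
from datetime import datetime


def hourly_visits(visit_times):
    """Count logged visits per hour of day (0-23)."""
    return Counter(t.hour for t in visit_times)


def crawl_rates(visit_times, max_rate=10.0, min_rate=1.0):
    """Map each hour of day to a crawl rate (requests per minute).

    Assumption: the rate decreases linearly from max_rate (no observed
    traffic) to min_rate (the busiest observed hour).
    """
    counts = hourly_visits(visit_times)
    peak = max(counts.values(), default=0)
    rates = {}
    for hour in range(24):
        load = counts.get(hour, 0) / peak if peak else 0.0
        rates[hour] = max_rate - (max_rate - min_rate) * load
    return rates
```

A scheduler could look up `crawl_rates(log)[datetime.now().hour]` before dispatching requests to the site, so that crawling slows during the hours the log shows to be busy.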
Abstract:
Embodiments of our technology provide a method, system, and media for presenting relevant information incident to attempting to present information that is unavailable by way of a website. One embodiment of the method includes receiving a request to present a desired web page, determining that the desired web page is unavailable for presentation, determining search criteria associated with the request, dynamically generating a second web page that includes search results that were obtained based on the search criteria, and presenting the second web page on a display device.
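The flow in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed method: the search criteria are taken from the words in the requested URL path, `pages` stands in for the site's available content, and `search` is a caller-supplied function; all of these names are hypothetical.

```python
import re
from html import escape
from urllib.parse import urlparse

# Assumption: file extensions carry no search meaning and are dropped.
IGNORED_WORDS = {"html", "htm", "php"}


def search_criteria_from_url(url):
    """Derive search terms from the path of the requested URL."""
    path = urlparse(url).path
    words = re.split(r"[^A-Za-z0-9]+", path)
    return [w.lower() for w in words if w and w.lower() not in IGNORED_WORDS]


def handle_request(url, pages, search):
    """Return the requested page, or a generated results page if unavailable.

    pages:  dict mapping URL -> HTML of available pages.
    search: function taking a list of terms and returning a list of URLs.
    """
    if url in pages:
        return pages[url]
    terms = search_criteria_from_url(url)
    results = search(terms)
    items = "".join(
        f'<li><a href="{escape(link)}">{escape(link)}</a></li>' for link in results
    )
    heading = escape(" ".join(terms))
    return f"<h1>Page not found</h1><p>Results for: {heading}</p><ul>{items}</ul>"
```

Deriving the criteria from the URL is only one option; the referring page or the user's earlier query could serve the same purpose.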
Abstract:
A client application installed on end user computers generates metadata from the content of web pages visited by end users and provides the metadata to a search engine. When an end user visits a web page, the end user's computer downloads and displays the web page to the end user. The client application may simultaneously access the web page content and generate this metadata in the form of a content signature of the web page from the web page content. The client application then provides the content signature to a search engine. The search engine may employ content signatures to identify new web pages to crawl and index. Additionally, the search engine may employ content signatures to identify changes to web pages and determine the crawl frequency of web pages.
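A minimal Python sketch of the signature exchange might look like this. It assumes the content signature is a SHA-256 hash of the whitespace-normalized page text, which is a choice made for this example since the abstract does not name a signature scheme; `content_signature` and `SignatureIndex` are hypothetical names.

```python
import hashlib


def content_signature(content: str) -> str:
    """Client side: hash the page content after collapsing whitespace.

    Assumption: SHA-256 over normalized text stands in for the signature.
    """
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SignatureIndex:
    """Search-engine side: remember the latest signature reported per URL."""

    def __init__(self):
        self._latest = {}

    def report(self, url: str, signature: str) -> str:
        """Record a signature and classify the page as new, changed, or unchanged."""
        previous = self._latest.get(url)
        self._latest[url] = signature
        if previous is None:
            return "new"
        return "unchanged" if previous == signature else "changed"
```

A page reported as `"new"` is a candidate for crawling and indexing, and how often a URL comes back as `"changed"` could feed into its crawl frequency.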
Abstract:
A system and related techniques generate a survey to capture user feedback about the quality of search results, in a continuous context with the user's Web page or other search activity. According to embodiments, a survey frame inviting the user to answer a set of search questions may be presented within the same set of page frames that display the search results, so that the user may choose to answer the survey while still viewing their search results, or selected Web sites or other hits. According to further embodiments, rather than being presented within the frame structure of a page, the survey may be presented from within a browser toolbar extension, side-by-side or otherwise arranged within the environment of the user's search activity. Unlike other feedback-gathering platforms, which may force the user to navigate to a new page to view and respond to questions, or which transmit email questionnaires after the fact, the invention in one regard prompts the user into a dialogue to supply feedback about their search experience while still within the contextual workflow of that experience, and while still able to view or review the results or content they have received. User distraction is therefore minimized while feedback quality may be improved. The user feedback, which rates the quality or accuracy of the search results or search experience, may in embodiments be stored and used to train search intelligence, or for other purposes.