-
Publication No.: US09043306B2
Publication date: 2015-05-26
Application No.: US12861788
Filing date: 2010-08-23
Applicant: Fabrice Canel, Junaid Ahmed, Thomas Francis McElroy, Walter Sun, Kumar Chellapilla, Abhishek Singh, Vishnu Challam
Inventor: Fabrice Canel, Junaid Ahmed, Thomas Francis McElroy, Walter Sun, Kumar Chellapilla, Abhishek Singh, Vishnu Challam
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30109, G06F17/30336, G06F17/30867, G06F17/30899
Abstract: A client application installed on end user computers generates metadata from the content of web pages visited by end users and provides the metadata to a search engine. When an end user visits a web page, the end user's computer downloads and displays the web page to the end user. The client application may simultaneously access the web page content and generate this metadata in the form of a content signature of the web page from the web page content. The client application then provides the content signature to a search engine. The search engine may employ content signatures to identify new web pages to crawl and index. Additionally, the search engine may employ content signatures to identify changes to web pages and determine the crawl frequency of web pages.
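The content-signature idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the signature is computed, so the SHA-256 hash over whitespace-normalized text, the `content_signature` function, and the `SignatureStore` class are all assumptions made for illustration, not the patented method.

```python
import hashlib
import re


def content_signature(page_text: str) -> str:
    """Return a hex digest of the page text (hypothetical signature scheme).

    Whitespace is collapsed and case is folded so that trivial formatting
    differences do not change the signature.
    """
    normalized = re.sub(r"\s+", " ", page_text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SignatureStore:
    """Search-engine side: remembers the last signature reported per URL."""

    def __init__(self) -> None:
        self._latest: dict[str, str] = {}

    def report(self, url: str, signature: str) -> str:
        """Record a signature and classify the page as new, changed, or unchanged."""
        previous = self._latest.get(url)
        self._latest[url] = signature
        if previous is None:
            return "new"  # candidate for a first crawl
        return "unchanged" if previous == signature else "changed"
```

A crawler scheduler could, for example, count how often a URL comes back as "changed" and crawl frequently changing pages more often.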
-
Publication No.: US09116990B2
Publication date: 2015-08-25
Application No.: US12789020
Filing date: 2010-05-27
Applicant: Walter Sun, Thomas Arthur Ledbetter, Vinay Sudhir Deshpande, Yinzhe Yu, Lin Guo, Abhishek Singh, Junaid Ahmed, Jay Kumar Goyal, Jingfeng Li, Brahm Kiran Singh
Inventor: Walter Sun, Thomas Arthur Ledbetter, Vinay Sudhir Deshpande, Yinzhe Yu, Lin Guo, Abhishek Singh, Junaid Ahmed, Jay Kumar Goyal, Jingfeng Li, Brahm Kiran Singh
CPC classification: G06F17/30864, G06F17/30719, G06Q10/00
Abstract: Methods, systems, and computer-storage media for improving the freshness, or the apparent freshness, of search results are described. In an embodiment, the first portion of search results presented on a search results page are based on responsiveness to the search query and a second portion of results describe only recently published documents that are responsive to the search query. In an embodiment, a more recent version of the document, which is not directly used to determine responsiveness, is used to build the caption for a search result. Another way to make search results appear fresh is to include a publication time within the search result caption. In one embodiment, the publication time is generated by calculating a point in time between when a document is first added to a search index and the previous time the search engine visited the site where the document was found.
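The last sentence of this abstract describes estimating a publication time as a point between two known timestamps. A small sketch follows; the abstract does not say which point is chosen, so the default midpoint (`fraction=0.5`) and the function name `estimate_publication_time` are assumptions.

```python
from datetime import datetime


def estimate_publication_time(
    previous_site_visit: datetime,
    first_indexed: datetime,
    fraction: float = 0.5,
) -> datetime:
    """Pick a point between the previous site visit and the first indexing time.

    The document was absent at previous_site_visit and present at
    first_indexed, so it was published somewhere in between. fraction=0.5
    (the midpoint) is an illustrative choice, not taken from the patent.
    """
    if first_indexed < previous_site_visit:
        raise ValueError("first_indexed must not precede previous_site_visit")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return previous_site_visit + (first_indexed - previous_site_visit) * fraction
```

With a previous visit at 08:00 and first indexing at 12:00 on the same day, the default estimate is 10:00.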
-
Publication No.: US09026519B2
Publication date: 2015-05-05
Application No.: US13205809
Filing date: 2011-08-09
IPC classification: G06F17/30
CPC classification: G06F17/30598, G06F17/30371, G06F17/3071, G06F17/30864, G06F17/30867, G06F17/30991, G06F17/30997
Abstract: Methods, systems, and media are provided for delivering clustered search results for recent and non-recent events by maintaining the identification (ID) numbers of the respective clustered documents beyond the “fresh” life span of the clustered documents. When clusters are formed according to similar content, an ID number and associated attributes are assigned to each of the clusters. This provides a mechanism to track and retrieve the respective clusters for subsequent delivery of search results. The respective ID numbers of the clusters are maintained, even after the documents are no longer considered “fresh.” These similar-content clusters are further subdivided according to publication date. This provides individual subdivided clusters for similar content events that occurred at different time spans, which are delivered along with individual non-clustered search results in a SERP.
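The subdivision step in this abstract (splitting a similar-content cluster by publication date) can be sketched as below. The `Doc` record, the `subdivide_by_date` function, and the seven-day gap threshold are hypothetical; the abstract does not state how time spans are delimited.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Doc:
    url: str
    cluster_id: int   # persistent ID assigned when the content cluster was formed
    published: date


def subdivide_by_date(
    docs: list[Doc], max_gap: timedelta = timedelta(days=7)
) -> dict[int, list[list[Doc]]]:
    """Split each content cluster into sub-clusters of nearby publication dates.

    A new sub-cluster starts whenever two consecutive documents in the same
    cluster are published more than max_gap apart (an assumed rule).
    """
    result: dict[int, list[list[Doc]]] = {}
    for doc in sorted(docs, key=lambda d: (d.cluster_id, d.published)):
        groups = result.setdefault(doc.cluster_id, [])
        if groups and doc.published - groups[-1][-1].published <= max_gap:
            groups[-1].append(doc)
        else:
            groups.append([doc])
    return result
```

Because the cluster ID is kept on every document, older sub-clusters remain retrievable after their documents stop counting as fresh.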
-
Publication No.: US08255385B1
Publication date: 2012-08-28
Application No.: US13053772
Filing date: 2011-03-22
Applicant: Walter Sun, Yipeng Li, Xiao Zhang, Junaid Ahmed
Inventor: Walter Sun, Yipeng Li, Xiao Zhang, Junaid Ahmed
CPC classification: G06F17/3089
Abstract: Methods and systems for determining an adaptive crawl rate for a Web crawler based on historical publication data from a Web source are provided. A frequency of publication of the Web source is determined over a specified period of time, and an adaptive crawl rate is calculated using the frequency of publication. The Web crawler is then deployed at the calculated adaptive crawl rate.
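One way to read the adaptive crawl rate described here is as a crawl interval derived from how often the source published within an observation window. The sketch below assumes exactly that; the formula (window length divided by publication count), the clamping bounds, and the name `adaptive_crawl_interval` are illustrative assumptions rather than the patent's calculation.

```python
from datetime import datetime, timedelta


def adaptive_crawl_interval(
    publication_times: list[datetime],
    window_start: datetime,
    window_end: datetime,
    min_interval: timedelta = timedelta(minutes=5),
    max_interval: timedelta = timedelta(days=7),
) -> timedelta:
    """Return how long to wait between crawls of one Web source.

    The interval is the average time between publications observed in
    [window_start, window_end), clamped to [min_interval, max_interval].
    """
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    count = sum(1 for t in publication_times if window_start <= t < window_end)
    if count == 0:
        return max_interval  # nothing published: crawl as rarely as allowed
    interval = (window_end - window_start) / count
    return max(min_interval, min(max_interval, interval))
```

A source that published 12 items in a 24-hour window would be crawled roughly every two hours under these assumptions.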
-
Publication No.: US09424356B2
Publication date: 2016-08-23
Application No.: US12964092
Filing date: 2010-12-09
Applicant: Walter Sun, Junaid Ahmed, Yipeng Li, Peter Bailey, Nikhil Dandekar, Sasi Parthasarathy, Xin Chen, Xiao Zhang
Inventor: Walter Sun, Junaid Ahmed, Yipeng Li, Peter Bailey, Nikhil Dandekar, Sasi Parthasarathy, Xin Chen, Xiao Zhang
IPC classification: G06F17/30
CPC classification: G06F17/30312, G06F17/3053, G06F17/30613, G06F17/30864, G06F17/30867, G06F17/30876
Abstract: Methods, systems, and computer-readable media are provided for updating a search index with new uniform resource locators (URLs) and spiking URLs with increased user interest. History data, provided from browser applications residing on users' computers that indicate URLs accessed by the users, is parsed to identify new/previously unknown URLs. The history data also indicates URLs in which there is increased interest based on a number of recent hits as compared to an average number of hits determined over time. Author postings of new URLs to social networking sites and a quality rating of the authors may also be used to identify and filter new URLs. Search indexes are updated with the new and spiking URL data. As such, lag time between posting of new URLs and spiking of URL interest and inclusion of this data in a search index is greatly decreased.
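The two detection steps in this abstract (finding previously unknown URLs and flagging URLs whose recent hits exceed their long-run average) might look like the following sketch. The function names, the 3x ratio, and the minimum-hit floor are assumptions chosen for illustration.

```python
from statistics import mean


def new_urls(history_urls: list[str], indexed_urls: set[str]) -> list[str]:
    """Return URLs from browser history that the index does not know yet.

    Order of first appearance is preserved and duplicates are dropped.
    """
    seen = set(indexed_urls)
    result = []
    for url in history_urls:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def is_spiking(
    hits_per_interval: list[int], ratio: float = 3.0, min_recent_hits: int = 10
) -> bool:
    """Compare the most recent interval's hits with the average of earlier ones.

    hits_per_interval is ordered oldest first. A URL is flagged when the
    latest count is at least min_recent_hits and more than ratio times the
    earlier average (both thresholds are assumed values).
    """
    if len(hits_per_interval) < 2:
        return False  # no baseline to compare against
    *earlier, recent = hits_per_interval
    return recent >= min_recent_hits and recent > ratio * mean(earlier)
```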
-
Publication No.: US09292607B2
Publication date: 2016-03-22
Application No.: US13196008
Filing date: 2011-08-02
Applicant: Sasi Parthasarathy, Junaid Ahmed, Walter Sun, Jingfeng Li, Paul Alexander Dow, Yajie Siamwalla
Inventor: Sasi Parthasarathy, Junaid Ahmed, Walter Sun, Jingfeng Li, Paul Alexander Dow, Yajie Siamwalla
IPC classification: G06F17/30
CPC classification: G06F17/30867, G06F17/30876
Abstract: Methods, systems, and computer-readable media are provided for updating a search index with new uniform resource locators (URLs) and with metadata for new and known URLs. Data associated with communications made by users using a social network is received. The communications include a URL therein that a user has shared, posted, or otherwise communicated to one or more other users using the social network. When the URL is not found in a search index it is identified as a new URL and is added to the search index. A measure of a trending interest, or virality, of the URL is determined from the data. The determined virality is associated with the URL in a search index as metadata. The virality is useable to inform a ranking of the URL against a plurality of other URLs for identification and presentation as a search result in a search engine results page.
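A rough sketch of the flow in this abstract: shared URLs are added to the index when missing, a virality value is stored as metadata, and ranking takes it into account. Using the raw share count as virality, the placeholder relevance of 0.0 for new URLs, and the logarithmic boost in `rank` are all assumptions; the abstract does not define the measure or the ranking formula.

```python
import math
from collections import Counter


def update_index_with_shares(
    index: dict[str, dict[str, float]], shared_urls: list[str]
) -> list[str]:
    """Add unseen shared URLs to the index and store virality as metadata.

    index maps URL -> metadata. Virality here is simply the number of shares
    observed in the batch (an assumed measure). Returns the newly added URLs.
    """
    added = []
    for url, shares in Counter(shared_urls).items():
        if url not in index:
            index[url] = {"relevance": 0.0}  # placeholder until the page is crawled
            added.append(url)
        index[url]["virality"] = float(shares)
    return added


def rank(index: dict[str, dict[str, float]], weight: float = 0.5) -> list[str]:
    """Order URLs by relevance plus a dampened virality boost (assumed formula)."""
    def score(url: str) -> float:
        meta = index[url]
        return meta.get("relevance", 0.0) + weight * math.log1p(meta.get("virality", 0.0))
    return sorted(index, key=score, reverse=True)
```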
-
Publication No.: US20120150833A1
Publication date: 2012-06-14
Application No.: US13196008
Filing date: 2011-08-02
Applicant: Sasi Parthasarathy, Junaid Ahmed, Walter Sun, Jingfeng Li, Paul Alexander Dow, Yajie Siamwalla
Inventor: Sasi Parthasarathy, Junaid Ahmed, Walter Sun, Jingfeng Li, Paul Alexander Dow, Yajie Siamwalla
CPC classification: G06F17/30867, G06F17/30876
Abstract: Methods, systems, and computer-readable media are provided for updating a search index with new uniform resource locators (URLs) and with metadata for new and known URLs. Data associated with communications made by users using a social network is received. The communications include a URL therein that a user has shared, posted, or otherwise communicated to one or more other users using the social network. When the URL is not found in a search index it is identified as a new URL and is added to the search index. A measure of a trending interest, or virality, of the URL is determined from the data. The determined virality is associated with the URL in a search index as metadata. The virality is useable to inform a ranking of the URL against a plurality of other URLs for identification and presentation as a search result in a search engine results page.
-
Publication No.: US08244701B2
Publication date: 2012-08-14
Application No.: US13169807
Filing date: 2011-06-27
Applicant: Walter Sun, Jay Kumar Goyal, Pratibha Permandla, Yinzhe Yu, Jingfeng Li
Inventor: Walter Sun, Jay Kumar Goyal, Pratibha Permandla, Yinzhe Yu, Jingfeng Li
CPC classification: G06F17/30702, G06F17/30867
Abstract: Systems and methods for applying user behavior data to improve search query result ranking are provided. Upon receiving an update file indicating that recent, significant user behavior data is available for a document associated with an inverted index, the update file is published periodically and frequently to an index server. After filtering out the relevant update information from the update file, the index server extracts identifiers of the documents having the associated user behavior data. The update file and the identifier of the documents are utilized to update an in-memory index containing representations of metadata indicative of the user behavior. The in-memory index is continuously updated and utilized to serve search query results in response to user search queries. Search query results from the in-memory index are ranked using the user behavior data prior to serving. Thus, results associated with recent, significant user-behavior metadata receive prominent placement on the search results page.
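The update-and-rank loop in this abstract can be approximated as follows. The class name `InMemoryBehaviorIndex`, the use of click counts as the behavior signal, and the additive `click_weight` scoring are hypothetical simplifications; the abstract does not specify the file format, the signal, or the ranking function.

```python
class InMemoryBehaviorIndex:
    """Per-index-server store of user-behavior metadata (illustrative only)."""

    def __init__(self, served_doc_ids: set[str], click_weight: float = 0.01) -> None:
        self._served = set(served_doc_ids)   # documents held by this server
        self._clicks: dict[str, int] = {}    # doc_id -> recent click count
        self._click_weight = click_weight

    def apply_update(self, update_records: list[tuple[str, int]]) -> list[str]:
        """Keep only records for documents this server holds and store them.

        update_records is a list of (doc_id, clicks) pairs standing in for
        one published update file. Returns the doc_ids that were updated.
        """
        applied = []
        for doc_id, clicks in update_records:
            if doc_id in self._served:
                self._clicks[doc_id] = clicks
                applied.append(doc_id)
        return applied

    def rank(self, candidates: list[tuple[str, float]]) -> list[str]:
        """Order (doc_id, base_score) candidates using the behavior data."""
        def score(item: tuple[str, float]) -> float:
            doc_id, base_score = item
            return base_score + self._click_weight * self._clicks.get(doc_id, 0)
        return [doc_id for doc_id, _ in sorted(candidates, key=score, reverse=True)]
```

Calling `apply_update` each time a new update file arrives keeps the ranking responsive to recent behavior without rebuilding the inverted index.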