-
Publication No.: US20080307189A1
Publication date: 2008-12-11
Application No.: US11811619
Filing date: 2007-06-11
Applicants: Anton Mityagin, Kumar Chellapilla, Denis Charles
Inventors: Anton Mityagin, Kumar Chellapilla, Denis Charles
IPC classification: G06F12/00
CPC classification: G06F17/30011
Abstract: Multiple Bloom filters are generated to partition data between first and second disjoint data sets of elements. Each element in the first data set is assigned to a bucket of a first set of buckets, and each element in the second data set is assigned to a bucket of a second set of buckets. A Bloom filter is generated for each bucket of the first set of buckets. The Bloom filter generated for a bucket indicates that each element assigned to that bucket is part of the first data set, and that each element assigned to a corresponding bucket of the second set of buckets is not part of the first data set. Additionally, a Bloom filter corresponding to a subsequently received element can be determined and used to identify whether that subsequently received element is part of the first data set or the second data set.
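The bucket-and-filter scheme in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how a bucket's filter is made to exclude the elements of the corresponding second-set bucket; the sketch assumes one simple possibility, retrying with a different hash seed until no second-set element in that bucket tests positive. The names `BloomFilter`, `build_partition`, and `in_first_set`, and all sizes, are hypothetical.

```python
import hashlib


def _hash(item: str, salt: str, mod: int) -> int:
    """Deterministic salted hash of a string into the range [0, mod)."""
    digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % mod


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits: int, num_hashes: int, seed: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.seed = seed
        self.bits = 0

    def _positions(self, item: str):
        return [_hash(item, f"{self.seed}-{i}", self.num_bits)
                for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def bucket_of(item: str, num_buckets: int) -> int:
    """Both data sets use the same bucket assignment (an assumption)."""
    return _hash(item, "bucket", num_buckets)


def build_partition(first, second, num_buckets=4, num_bits=64,
                    num_hashes=3, max_tries=1000):
    """Build one filter per bucket that accepts every first-set element
    in the bucket and rejects every second-set element in the bucket."""
    first_buckets = [[] for _ in range(num_buckets)]
    second_buckets = [[] for _ in range(num_buckets)]
    for x in first:
        first_buckets[bucket_of(x, num_buckets)].append(x)
    for y in second:
        second_buckets[bucket_of(y, num_buckets)].append(y)

    filters = []
    for fb, sb in zip(first_buckets, second_buckets):
        for seed in range(max_tries):
            bf = BloomFilter(num_bits, num_hashes, seed)
            for x in fb:
                bf.add(x)
            # Assumed strategy: reseed until no second-set element passes.
            if not any(y in bf for y in sb):
                filters.append(bf)
                break
        else:
            raise RuntimeError("no suitable filter found; increase num_bits")
    return filters


def in_first_set(item: str, filters) -> bool:
    """Classify an item known to belong to one of the two sets."""
    return item in filters[bucket_of(item, len(filters))]
```

A subsequently received element is classified by locating its bucket and testing that bucket's filter. The answer is only meaningful for elements that belong to one of the two original sets.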
-
Publication No.: US07743013B2
Publication date: 2010-06-22
Application No.: US11811619
Filing date: 2007-06-11
Applicants: Anton Mityagin, Kumar Chellapilla, Denis Charles
Inventors: Anton Mityagin, Kumar Chellapilla, Denis Charles
IPC classification: G06F17/30
CPC classification: G06F17/30011
Abstract: Multiple Bloom filters are generated to partition data between first and second disjoint data sets of elements. Each element in the first data set is assigned to a bucket of a first set of buckets, and each element in the second data set is assigned to a bucket of a second set of buckets. A Bloom filter is generated for each bucket of the first set of buckets. The Bloom filter generated for a bucket indicates that each element assigned to that bucket is part of the first data set, and that each element assigned to a corresponding bucket of the second set of buckets is not part of the first data set. Additionally, a Bloom filter corresponding to a subsequently received element can be determined and used to identify whether that subsequently received element is part of the first data set or the second data set.
-
Publication No.: US08768919B2
Publication date: 2014-07-01
Application No.: US13599543
Filing date: 2012-08-30
Applicants: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
Inventors: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
CPC classification: G06F17/30864
Abstract: A human or hand-labeled ranking of URL results for a search query is compared against actual click data for the respective query/URL pairs (e.g., which URLs were actually clicked on by users when the URLs were presented to users when the search query was run in the real world). The human ranking or ordering of the URL results (e.g., pre-existing relevance ranking) for the query can then be adjusted, if necessary, based upon the real world click data (e.g., click relevance ranking). The modified pre-existing relevance ranking can be used in providing future search results.
-
Publication No.: US20110016116A1
Publication date: 2011-01-20
Application No.: US12893107
Filing date: 2010-09-29
Applicants: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
Inventors: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
IPC classification: G06F17/30
CPC classification: G06F17/30864
Abstract: A human or hand-labeled ranking of URL results for a search query is compared against actual click data for the respective query/URL pairs (e.g., which URLs were actually clicked on by users when the URLs were presented to users when the search query was run in the real world). The human ranking or ordering of the URL results (e.g., pre-existing relevance ranking) for the query can then be adjusted, if necessary, based upon the real world click data (e.g., click relevance ranking). The modified pre-existing relevance ranking can be used in providing future search results.
-
Publication No.: US20090248657A1
Publication date: 2009-10-01
Application No.: US12056302
Filing date: 2008-03-27
Applicants: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
Inventors: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
CPC classification: G06F17/30864
Abstract: Mislabeled URLs are identified and corrected based upon a click relevance ranking computed from user data comprising user click information. The click relevance ranking is formed by applying a set of relevance ordering rules to user log data aggregated by query and URL and by mapping the results of the relevance ordering rules into a linear ordering. For a given query, the aggregated user log data comprises a relative total number of impressions, a relative total number of clicks received and a rank associated with the query/URL pair at the time of the total number of impressions and total number of clicks received. The click relevance ranking is used to identify and correct mislabeled query/URL pairs of other rankings according to a number of disclosed methods.
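As a rough illustration of comparing a hand-labeled ranking against click data, the Python sketch below orders the URLs for one query by click-through rate and flags pairs whose human labels disagree with that ordering. The abstract does not disclose the actual relevance ordering rules; click-through rate is used here only as a stand-in, and `UrlStats`, `click_ranking`, `discordant_pairs`, and the label scale are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UrlStats:
    """Aggregated log data for one query/URL pair (hypothetical layout)."""
    url: str
    impressions: int
    clicks: int
    human_label: int  # higher means judged more relevant (assumed scale)


def ctr(s: UrlStats) -> float:
    """Click-through rate, a stand-in for the undisclosed ordering rules."""
    return s.clicks / s.impressions if s.impressions else 0.0


def click_ranking(stats):
    """Linear ordering of URLs from most to least clicked per impression."""
    return [s.url for s in sorted(stats, key=ctr, reverse=True)]


def discordant_pairs(stats):
    """Pairs (human_preferred, click_preferred) where the two rankings
    disagree; these are candidates for label review."""
    pairs = []
    for i, a in enumerate(stats):
        for b in stats[i + 1:]:
            if a.human_label > b.human_label and ctr(a) < ctr(b):
                pairs.append((a.url, b.url))
            elif a.human_label < b.human_label and ctr(a) > ctr(b):
                pairs.append((b.url, a.url))
    return pairs
```

A production system would also need to correct for position bias, since the abstract notes that the rank at which a URL was shown is part of the aggregated data. This sketch ignores that factor.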
-
Publication No.: US20120323907A1
Publication date: 2012-12-20
Application No.: US13599543
Filing date: 2012-08-30
Applicants: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
Inventors: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
IPC classification: G06F17/30
CPC classification: G06F17/30864
Abstract: A human or hand-labeled ranking of URL results for a search query is compared against actual click data for the respective query/URL pairs (e.g., which URLs were actually clicked on by users when the URLs were presented to users when the search query was run in the real world). The human ranking or ordering of the URL results (e.g., pre-existing relevance ranking) for the query can then be adjusted, if necessary, based upon the real world click data (e.g., click relevance ranking). The modified pre-existing relevance ranking can be used in providing future search results.
-
Publication No.: US08290945B2
Publication date: 2012-10-16
Application No.: US12893107
Filing date: 2010-09-29
Applicants: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
Inventors: Kumar Chellapilla, Anton Mityagin, Xuanhui Wang
CPC classification: G06F17/30864
Abstract: A human or hand-labeled ranking of URL results for a search query is compared against actual click data for the respective query/URL pairs (e.g., which URLs were actually clicked on by users when the URLs were presented to users when the search query was run in the real world). The human ranking or ordering of the URL results (e.g., pre-existing relevance ranking) for the query can then be adjusted, if necessary, based upon the real world click data (e.g., click relevance ranking). The modified pre-existing relevance ranking can be used in providing future search results.
-
Publication No.: US09104960B2
Publication date: 2015-08-11
Application No.: US13163857
Filing date: 2011-06-20
CPC classification: G06N7/005, G06Q30/0242
Abstract: Methods, systems, and computer-storage media having computer-usable instructions embodied thereon for calculating event probabilities are provided. The event may be a click probability. Event probabilities are calculated using a system optimized for runtime model accuracy with an operable learning algorithm. Bin counting techniques are used to calculate event probabilities based on a count of event occurrences and non-event occurrences. Linear parameters, such as counts of clicks and non-clicks, may also be used in the system to allow for runtime adjustments.
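The bin counting idea can be sketched in a few lines of Python: keep a count of clicks and non-clicks per bin and estimate the click probability from those counts. The additive smoothing terms and the `BinCounter` class are assumptions made for illustration; the abstract does not specify how bins are keyed or how sparse bins are handled.

```python
from collections import defaultdict


class BinCounter:
    """Estimates an event probability per bin from event/non-event counts."""

    def __init__(self, prior_clicks: float = 1.0, prior_non_clicks: float = 1.0):
        # Assumed pseudo-counts that keep sparse bins away from 0 and 1.
        self.prior_clicks = prior_clicks
        self.prior_non_clicks = prior_non_clicks
        self.counts = defaultdict(lambda: [0, 0])  # bin -> [clicks, non_clicks]

    def observe(self, bin_key, clicked: bool) -> None:
        """Record one impression for the bin, clicked or not."""
        self.counts[bin_key][0 if clicked else 1] += 1

    def probability(self, bin_key) -> float:
        """Smoothed estimate: (clicks + a) / (clicks + non_clicks + a + b)."""
        clicks, non_clicks = self.counts.get(bin_key, (0, 0))
        numerator = clicks + self.prior_clicks
        denominator = (clicks + non_clicks
                       + self.prior_clicks + self.prior_non_clicks)
        return numerator / denominator
```

Because the estimate depends only on two running counts per bin, it can be updated at runtime by incrementing a counter, with no retraining step.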
-
Publication No.: US09043306B2
Publication date: 2015-05-26
Application No.: US12861788
Filing date: 2010-08-23
Applicants: Fabrice Canel, Junaid Ahmed, Thomas Francis McElroy, Walter Sun, Kumar Chellapilla, Abhishek Singh, Vishnu Challam
Inventors: Fabrice Canel, Junaid Ahmed, Thomas Francis McElroy, Walter Sun, Kumar Chellapilla, Abhishek Singh, Vishnu Challam
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30109, G06F17/30336, G06F17/30867, G06F17/30899
Abstract: A client application installed on end user computers generates metadata from the content of web pages visited by end users and provides the metadata to a search engine. When an end user visits a web page, the end user's computer downloads and displays the web page to the end user. The client application may simultaneously access the web page content and generate this metadata in the form of a content signature of the web page from the web page content. The client application then provides the content signature to a search engine. The search engine may employ content signatures to identify new web pages to crawl and index. Additionally, the search engine may employ content signatures to identify changes to web pages and determine the crawl frequency of web pages.
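A content signature can be as simple as a hash of the page text. The Python sketch below assumes a SHA-256 digest over whitespace-normalized content on the client side, and a dictionary of last-seen signatures on the search engine side; the abstract does not specify the signature algorithm, and `content_signature` and `SignatureIndex` are hypothetical names.

```python
import hashlib


def content_signature(page_content: str) -> str:
    """Client side: hash the page text after collapsing whitespace
    (the normalization step is an assumption)."""
    normalized = " ".join(page_content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SignatureIndex:
    """Search engine side: remembers the last signature seen per URL."""

    def __init__(self):
        self.signatures = {}

    def report(self, url: str, signature: str) -> str:
        """Return 'new', 'changed', or 'unchanged' for a reported page."""
        previous = self.signatures.get(url)
        self.signatures[url] = signature
        if previous is None:
            return "new"        # candidate for first crawl and indexing
        if previous != signature:
            return "changed"    # candidate for a re-crawl
        return "unchanged"
```

How often a URL is reported as changed could then feed a crawl frequency estimate, which the abstract mentions but this sketch does not attempt.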
-
Publication No.: US08244752B2
Publication date: 2012-08-14
Application No.: US12106857
Filing date: 2008-04-21
Applicants: Greg Buehrer, Kumar Chellapilla, Jack W. Stokes
Inventors: Greg Buehrer, Kumar Chellapilla, Jack W. Stokes
CPC classification: H04L47/10
Abstract: A method for classifying search query traffic can involve receiving a plurality of labeled sample search query traffic and generating a feature set partitioned into human physical limit features and query stream behavioral features. A model can be generated using the plurality of labeled sample search query traffic and the feature set. Search query traffic can be received and the model can be utilized to classify the received search query traffic as generated by a human or automatically generated.
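The partition of the feature set into physical limit features and behavioral features might look like the following Python sketch of the feature extraction step. The specific features (query volume, queries per minute, click rate, distinct query ratio) are assumptions chosen for illustration and are not taken from the abstract; the trained model itself is not shown, and `QueryEvent` and `extract_features` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class QueryEvent:
    """One logged query from a single source (hypothetical layout)."""
    timestamp: float  # seconds since the start of the log window
    query: str
    clicked: bool


def extract_features(events):
    """Return features for one traffic source, split into the two groups
    named in the abstract. The individual features are illustrative."""
    if not events:
        raise ValueError("at least one event is required")
    n = len(events)
    times = [e.timestamp for e in events]
    # Floor the window at one second to avoid dividing by zero.
    minutes = max((max(times) - min(times)) / 60.0, 1.0 / 60.0)

    physical_limit = {
        "num_queries": n,
        "queries_per_minute": n / minutes,
    }
    behavioral = {
        "click_rate": sum(e.clicked for e in events) / n,
        "distinct_query_ratio": len({e.query for e in events}) / n,
    }
    return {"physical_limit": physical_limit, "behavioral": behavioral}
```

Feature dictionaries like these, paired with human or automated labels, would form the training rows for whatever classifier is chosen.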