-
Publication No.: US20110078131A1
Publication date: 2011-03-31
Application No.: US12569978
Filing date: 2009-09-30
Applicant: Ji-Rong Wen , Yu Chen , Guomao Xin , Yunxiao Ma , Yi Liu , Zhicheng Dou , Qing Yu , Shuming Shi
Inventor: Ji-Rong Wen , Yu Chen , Guomao Xin , Yunxiao Ma , Yi Liu , Zhicheng Dou , Qing Yu , Shuming Shi
CPC classification: G06F16/951
Abstract: Described is the running of search-related experiments on a full (or partial) offline snapshot copy of the search engine documents of an actual production system. A snapshot experimentation subsystem runs experimental web-search code on the offline data. This includes running experimental index-building code to build an experimental index (e.g., to test a new document feature) and/or running experimental search-related code, such as code that ranks search results according to experimental ranking code, implements an experimental search strategy, and/or generates experimental captions.
-
Publication No.: US20110137886A1
Publication date: 2011-06-09
Application No.: US12632821
Filing date: 2009-12-08
Applicant: Ji-Rong Wen , Guomao Xin , Yunxiao Ma , Yu Chen , Qing Yu , Yi Liu , Zhicheng Dou , Shuming Shi
Inventor: Ji-Rong Wen , Guomao Xin , Yunxiao Ma , Yu Chen , Qing Yu , Yi Liu , Zhicheng Dou , Shuming Shi
IPC classification: G06F17/30
CPC classification: G06F16/951
Abstract: Described is a data-centric web search engine technology/architecture, in which document metadata, including offline-extracted metadata, is used as part of a search indexing and ranking pipeline. A web data management component receives crawled documents and extracts document metadata from the documents. An indexing component uses the document metadata to build an index for the documents. A serving component uses the index and the document metadata to serve content, e.g., search results. Also described is the use of query metadata extracted from queries of a query log for use in the pipeline.
-
Publication No.: US20130086024A1
Publication date: 2013-04-04
Application No.: US13248894
Filing date: 2011-09-29
Applicant: Yi Liu , Yu Chen , Qing Yu , Ji-Rong Wen
Inventor: Yi Liu , Yu Chen , Qing Yu , Ji-Rong Wen
IPC classification: G06F17/30
CPC classification: G06F16/951 , G06F16/3338
Abstract: Systems, methods, devices, and media are described to facilitate the training and employing of a three-class classifier for post-execution search query reformulation. In some embodiments, the classifier is trained through a supervised learning process, based on a training set of queries mined from a query log. Query reformulation candidates are determined for each query in the training set, and searches are performed using each reformulation candidate and the un-reformulated training query. The resulting document lists are analyzed to determine ranking and topic drift features, and to calculate a quality classification. The features and classification for each reformulation candidate are used to train the classifier in an offline mode. In some embodiments, the classifier is employed in an online mode to dynamically perform query reformulation on user-submitted queries.
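The abstract for US20130086024A1 mentions topic drift features and a three-class quality label but does not define either. The following is a minimal Python sketch under stated assumptions: topic drift is approximated as the Jaccard distance between the top-k result lists, and the three classes are taken to be "better", "unchanged", and "worse". The function names, the `k` cutoff, and the `margin` threshold are all hypothetical and do not come from the patent.

```python
def topic_drift(original_docs, reformulated_docs, k=10):
    """Jaccard distance between top-k result sets (0 = identical, 1 = disjoint).

    Assumption: the patent does not specify how topic drift is measured.
    """
    a, b = set(original_docs[:k]), set(reformulated_docs[:k])
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def quality_class(original_score, reformulated_score, margin=0.05):
    """Hypothetical three-class label: 1 = better, 0 = unchanged, -1 = worse.

    The scores stand for any ranking-quality metric computed on each result
    list; the margin is an illustrative threshold, not a value from the source.
    """
    delta = reformulated_score - original_score
    if delta > margin:
        return 1
    if delta < -margin:
        return -1
    return 0
```

In such a setup, each reformulation candidate would contribute one training row: its feature values (for example `topic_drift`) paired with its `quality_class` label.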
-
Publication No.: US08645370B2
Publication date: 2014-02-04
Application No.: US12972259
Filing date: 2010-12-17
Applicant: Qing Yu , Shuming Shi , Zhiwei Li , Ji-Rong Wen , Wei-Ying Ma
Inventor: Qing Yu , Shuming Shi , Zhiwei Li , Ji-Rong Wen , Wei-Ying Ma
IPC classification: G06F17/30
CPC classification: G06F17/30864 , G06F17/30265
Abstract: A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.
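The abstract for US08645370B2 does not say how the image score is computed. As a rough illustration only, the Python sketch below scores image text (for example alt text or a caption) by the fraction of query terms it contains. The term-overlap formula and the names `tokenize` and `image_score` are assumptions, not the patented method.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def image_score(image_text, query):
    """Fraction of distinct query terms that appear in the image text.

    Assumption: a simple overlap measure standing in for the unspecified
    relevance calculation in the abstract.
    """
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    image_terms = set(tokenize(image_text))
    return len(query_terms & image_terms) / len(query_terms)
```

A document-level relevance score could then combine this value with an ordinary text score, though the weighting between the two is not described in the abstract.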
-
Publication No.: US08112421B2
Publication date: 2012-02-07
Application No.: US11781220
Filing date: 2007-07-20
Applicant: Nan Sun , Qing Yu , Shuming Shi , Ji-Rong Wen
Inventor: Nan Sun , Qing Yu , Shuming Shi , Ji-Rong Wen
IPC classification: G06F17/30
CPC classification: G06F17/30675
Abstract: A learning system for a search ranking function model may include a computer program that iteratively refines the model using new queries and associated documents from an unlabeled training set. The unlabeled training set may include a set of queries for which the associated documents have not been labeled as “relevant” or otherwise labeled. The new queries may be selected based on a similarity to and an accuracy of each neighbor from a labeled training set, such as a labeled validation set. Upon selection, the documents associated with the new queries may be labeled. The new queries and their associated documents may be accumulated into a labeled training set, and a refined model may be learned based on the augmented labeled training set. The model may be iteratively refined until it is determined that the model is adequate.
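The selection step in the abstract for US08112421B2 (choosing new queries by their similarity to labeled neighbors and by how accurately the model handles those neighbors) can be sketched in Python as below. This is a sketch under assumptions: queries are represented as numeric feature vectors, similarity is cosine similarity, and queries close to poorly handled validation queries are prioritized. None of these choices, nor the names `cosine` and `select_queries`, are stated in the source.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_queries(unlabeled, validation, budget):
    """Pick up to `budget` unlabeled queries to send for labeling.

    unlabeled:  {query_id: feature_vector}
    validation: list of (feature_vector, accuracy), accuracy in [0, 1]
    Assumption: priority = sum of similarity * (1 - accuracy) over the
    validation queries, so queries near weak spots of the model rank first.
    """
    def priority(vec):
        return sum(cosine(vec, v) * (1.0 - acc) for v, acc in validation)

    ranked = sorted(unlabeled, key=lambda q: priority(unlabeled[q]), reverse=True)
    return ranked[:budget]
```

In an iterative loop, the selected queries would be labeled, added to the training set, and the model retrained until it is judged adequate.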
-
Publication No.: US07877384B2
Publication date: 2011-01-25
Application No.: US11681161
Filing date: 2007-03-01
Applicant: Qing Yu , Shuming Shi , Zhiwei Li , Ji-Rong Wen , Wei-Ying Ma
Inventor: Qing Yu , Shuming Shi , Zhiwei Li , Ji-Rong Wen , Wei-Ying Ma
IPC classification: G06F17/30
CPC classification: G06F17/30864 , G06F17/30265
Abstract: A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.
-
Publication No.: US20090024607A1
Publication date: 2009-01-22
Application No.: US11781220
Filing date: 2007-07-20
Applicant: Nan Sun , Qing Yu , Shuming Shi , Ji-Rong Wen
Inventor: Nan Sun , Qing Yu , Shuming Shi , Ji-Rong Wen
IPC classification: G06F7/00
CPC classification: G06F17/30675
Abstract: A learning system for a search ranking function model may include a computer program that iteratively refines the model using new queries and associated documents from an unlabeled training set. The unlabeled training set may include a set of queries for which the associated documents have not been labeled as “relevant” or otherwise labeled. The new queries may be selected based on a similarity to and an accuracy of each neighbor from a labeled training set, such as a labeled validation set. Upon selection, the documents associated with the new queries may be labeled. The new queries and their associated documents may be accumulated into a labeled training set, and a refined model may be learned based on the augmented labeled training set. The model may be iteratively refined until it is determined that the model is adequate.
-
Publication No.: US20110087660A1
Publication date: 2011-04-14
Application No.: US12972259
Filing date: 2010-12-17
Applicant: Qing Yu , Shuming Shi , Zhiwei Li , Ji-Rong Wen , Wei-Ying Ma
Inventor: Qing Yu , Shuming Shi , Zhiwei Li , Ji-Rong Wen , Wei-Ying Ma
IPC classification: G06F17/30
CPC classification: G06F17/30864 , G06F17/30265
Abstract: A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.
-
Publication No.: US20080215561A1
Publication date: 2008-09-04
Application No.: US11681161
Filing date: 2007-03-01
Applicant: Qing Yu , Shuming Shi , Zhiwei Li , Ji-Rong Wen , Wei-Ying Ma
Inventor: Qing Yu , Shuming Shi , Zhiwei Li , Ji-Rong Wen , Wei-Ying Ma
IPC classification: G06F17/30
CPC classification: G06F17/30864 , G06F17/30265
Abstract: A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.
-
Publication No.: US08364816B2
Publication date: 2013-01-29
Application No.: US11871810
Filing date: 2007-10-12
Applicant: Chuanxiong Guo , Jiahe H. Wang , Qing Yu , Yongguang Zhang , Yunxin Liu
Inventor: Chuanxiong Guo , Jiahe H. Wang , Qing Yu , Yongguang Zhang , Yunxin Liu
IPC classification: G06F15/16
CPC classification: H04L61/20 , G06F17/30241 , G06F17/3087
Abstract: A network address mapping system is described. The network address mapping system can identify a set of Web pages, collect information from the Web pages indicating geographical locations (“geolocations”), and correlate the geolocations with the network addresses from which the identified Web pages are served. The collected information can be weighted based on various factors, such as its relative position in a Web page. The collected information can then be used to identify a geolocation. The network address mapping system can deduce geolocations for portions of ranges of network addresses based on the score, and can infer geolocations for other portions based on the deduced geolocations. This mapping can then be stored in a database and provided as a geomapping service. The network address mapping system is able to map network addresses to geographical locations. Thereafter, when a user's client computing device accesses a Web server, the Web server can easily and accurately determine a geographical location by querying the database storing the mapping or a geomapping service.
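The weighting and deduction steps in the abstract for US08364816B2 can be illustrated with a short Python sketch. The assumptions here are loud ones: each observation is a (server IP, location string, relative position in the page) triple, an earlier mention weighs more (`1.0 - position`), and addresses are grouped into /24 blocks. The abstract names relative position as one weighting factor but specifies none of these details, and `deduce_geolocations` is a hypothetical name. Only the standard-library `ipaddress` module is used.

```python
from collections import defaultdict
import ipaddress


def deduce_geolocations(observations):
    """Map each /24 address block to its highest-scoring location.

    observations: iterable of (ip, location, position), where position is the
    relative offset of the location mention in the page, in [0, 1].
    Assumption: weight = 1.0 - position, summed per block and location.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for ip, location, position in observations:
        # strict=False lets us pass a host address and get its enclosing block.
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        scores[block][location] += 1.0 - position
    return {str(block): max(votes, key=votes.get) for block, votes in scores.items()}
```

The resulting dictionary plays the role of the stored mapping; inferring locations for blocks with no observations, which the abstract also mentions, is left out of this sketch.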