-
Publication No.: US08645288B2
Publication date: 2014-02-04
Application No.: US12959060
Filing date: 2010-12-02
Applicant: Taifeng Wang, Bin Gao, Tie-Yan Liu
Inventor: Taifeng Wang, Bin Gao, Tie-Yan Liu
IPC classification: G06F15/18
CPC classification: G06F17/30873, G06F17/30867
Abstract: Some implementations provide techniques for selecting web pages for inclusion in an index. For example, some implementations apply regularization to select a subset of the crawled web pages for indexing based on link relationships between the crawled web pages, features extracted from the crawled web pages, and user behavior information determined for at least some of the crawled web pages. Further, in some implementations, the user behavior information may be used to sort a training set of crawled web pages into a plurality of labeled groups. The labeled groups may be represented in a directed graph that indicates relative priorities for being selected for indexing.
-
Publication No.: US20130097011A1
Publication date: 2013-04-18
Application No.: US13273924
Filing date: 2011-10-14
Applicant: Taifeng Wang, Tie-Yan Liu, Bin Gao, Tao Qin
Inventor: Taifeng Wang, Tie-Yan Liu, Bin Gao, Tao Qin
IPC classification: G06Q30/02
CPC classification: G06Q30/02
Abstract: An advertisement perception predictor may forecast the effectiveness of an online advertisement in a web page by predicting whether the online advertisement may be perceived by a consumer. The advertisement perception predictor may use a perception model that is trained for determining perception probability values of online advertisements. The perception model may be applied to an online advertisement to determine a perception probability value for the online advertisement. The perception probability value may indicate the likelihood that a consumer will view the online advertisement.
-
Publication No.: US20130173398A1
Publication date: 2013-07-04
Application No.: US13340195
Filing date: 2011-12-29
Applicant: Taifeng Wang, Tie-Yan Liu
Inventor: Taifeng Wang, Tie-Yan Liu
IPC classification: G06Q30/02
CPC classification: G06Q30/0256
Abstract: Implementations for providing menu-based advertising are disclosed. Based on user input entered into a search query field on a search page, a search engine front-end determines non-search engine information pages that are relevant to that input. A suggestion menu is then displayed on the search page. The suggestion menu includes interactive elements that cause a client device to retrieve the non-search engine information pages associated with those elements. The interactive elements may be advertisements, and the suggestion menu may also be used to display search query suggestions.
-
Publication No.: US08229968B2
Publication date: 2012-07-24
Application No.: US12055777
Filing date: 2008-03-26
Applicant: Taifeng Wang, Tie-Yan Liu, Minghao Liu, Zhi Chen
Inventor: Taifeng Wang, Tie-Yan Liu, Minghao Liu, Zhi Chen
CPC classification: G06F12/0875, G06F17/30902, G06F17/30958
Abstract: Embodiments are described for caching and accessing Directed Acyclic Graph (DAG) data to and from a computing device of a DAG distributed execution engine during the processing of an iterative algorithm. In accordance with one embodiment, a method includes processing, in the computing device, a first subgraph of a plurality of subgraphs from a distributed storage system. The first subgraph is processed with associated input values in the computing device to generate first output values in an iteration. The method further includes storing a second subgraph, which is a duplicate of the first subgraph, in a cache of the device. Moreover, if the device is to process the first subgraph in each of one or more subsequent iterations, the method also includes processing the second subgraph with the first output values to generate second output values.
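As a rough illustration of the caching idea in the abstract of US08229968B2, the Python sketch below keeps a local duplicate of a subgraph after the first read and reuses it in later iterations, feeding each iteration's output values into the next. The `Worker` class, the dict-based `storage`, and the simple value-propagation update are all assumptions made for this example; none of them are taken from the patent.

```python
class Worker:
    """Hypothetical compute node of an iterative graph job (illustrative only)."""

    def __init__(self, storage):
        self.storage = storage      # stands in for the distributed storage system
        self.cache = {}             # local cache of subgraph duplicates
        self.remote_reads = 0       # counts reads that had to go to storage

    def get_subgraph(self, key):
        # First access: read from storage and keep a duplicate in the cache.
        # Later accesses: serve the cached duplicate.
        if key not in self.cache:
            self.remote_reads += 1
            self.cache[key] = dict(self.storage[key])
        return self.cache[key]

    def iterate(self, key, values, iterations):
        # Assumes every node has at least one out-edge and that every
        # edge target appears in `values`.
        for _ in range(iterations):
            edges = self.get_subgraph(key)
            new_values = {node: 0.0 for node in values}
            for src, dsts in edges.items():
                share = values[src] / len(dsts)
                for dst in dsts:
                    new_values[dst] += share
            values = new_values     # outputs become the next iteration's inputs
        return values
```

With this sketch, running several iterations over the same subgraph key touches the simulated storage only once; every later iteration works on the cached copy.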
-
Publication No.: US09589056B2
Publication date: 2017-03-07
Application No.: US13080510
Filing date: 2011-04-05
Applicant: Taifeng Wang, Tie-Yan Liu, Xiaodong Fan
Inventor: Taifeng Wang, Tie-Yan Liu, Xiaodong Fan
IPC classification: G06F17/30
CPC classification: G06F17/30867, G06F17/30312, G06F17/3089
Abstract: Techniques for determining user information needs and selecting data based on user information needs are described herein. The present disclosure describes extracting topics of interest to users from multiple sources, including search log data and social network websites, and assigning a budget to each topic to stipulate the quota of data to be selected for that topic. The present disclosure also describes calculating similarities between gathered data and the topics, and selecting the top related data for each topic, subject to the limit of the budget. A search engine may use the techniques described here to select data for its index.
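The budgeted selection described in the abstract of US09589056B2 could be sketched roughly as follows. This is a minimal illustration, assuming bag-of-words vectors and cosine similarity as the similarity measure; the function names `cosine` and `select_by_budget` and the input layout are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_by_budget(topics, budgets, docs):
    """Pick, for each topic, the most similar documents up to its budget.

    topics:  {topic_name: topic text}
    budgets: {topic_name: maximum number of documents to select}
    docs:    {doc_id: document text}
    """
    doc_vecs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    selected = {}
    for name, text in topics.items():
        topic_vec = Counter(text.lower().split())
        scores = {d: cosine(topic_vec, vec) for d, vec in doc_vecs.items()}
        # Keep only documents with some overlap, best match first.
        ranked = sorted((d for d in scores if scores[d] > 0),
                        key=scores.get, reverse=True)
        selected[name] = ranked[:budgets[name]]
    return selected
```

A topic with a budget of 1 receives only its single best-matching document, while a topic with a larger budget may receive fewer documents than its quota if not enough related documents exist.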
-
Publication No.: US20120259850A1
Publication date: 2012-10-11
Application No.: US13083353
Filing date: 2011-04-08
Applicant: Tie-Yan Liu, Taifeng Wang
Inventor: Tie-Yan Liu, Taifeng Wang
IPC classification: G06F17/30
CPC classification: G06F17/30705, G06F17/30864
Abstract: Efficient search query clustering using tripartite graphs may enable a search engine developer to model information needs of users while expending fewer computing resources. The efficient clustering of search queries may involve multiple computing devices receiving a subgraph of a multi-partite graph that encompasses search queries, as well as receiving a global center vector table that includes cluster center entries for query clusters. At each computing device, the received global center vector table may be filtered to eliminate one or more cluster center entries that are irrelevant to the search queries. Subsequently, the search queries may be clustered into the query clusters by at least using the filtered global center vector table at each of the computing devices. In some instances, one or more comparisons between search queries and the cluster center entries in the global center vector table during the clustering may be eliminated.
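The filtering step in the abstract of US20120259850A1 can be illustrated with a small sketch. It assumes that queries and cluster centers are sparse vectors keyed by some shared dimension (here, clicked URLs) and that a center is "irrelevant" to a worker when it shares no dimension with any of that worker's queries. Both assumptions, and the names `filter_centers`, `dot`, and `assign`, are made up for this example rather than taken from the patent.

```python
def filter_centers(centers, queries):
    """Drop cluster centers that share no dimension with any local query."""
    local_dims = set()
    for vec in queries.values():
        local_dims.update(vec)
    return {cid: vec for cid, vec in centers.items()
            if not local_dims.isdisjoint(vec)}


def dot(a, b):
    """Sparse dot product; iterates over the shorter vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(k, 0.0) for k, w in a.items())


def assign(queries, centers):
    """Assign each query to the center with the highest dot product.

    A query with zero similarity to every center falls back to the
    first center in iteration order (or None if there are no centers).
    """
    return {
        qid: max(centers, key=lambda c: dot(qvec, centers[c]), default=None)
        for qid, qvec in queries.items()
    }
```

Because a filtered-out center has zero similarity to every local query, assigning against the filtered table gives the same result as assigning against the full table for any query that overlaps at least one center, while skipping the comparisons against irrelevant entries.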
-
Publication No.: US20110295855A1
Publication date: 2011-12-01
Application No.: US12790942
Filing date: 2010-05-31
Applicant: Taifeng Wang, Tie-Yan Liu
Inventor: Taifeng Wang, Tie-Yan Liu
IPC classification: G06F17/30
CPC classification: G06F17/30584
Abstract: Systems, methods, and devices for sorting and processing various types of graph data are described herein. Partitioning graph data into master data and associated slave data allows for sorting of the graph data by sorting the master data. In another embodiment, promoting a data bucket having a first data bucket size to a data bucket having a second data bucket size greater than the first data bucket size upon reaching a memory limit allows for the reduction of temporary files output by the data bucket.
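One possible reading of the master/slave partitioning in the abstract of US20110295855A1 is that only compact sort keys (the master data) are compared, and the bulkier associated records (the slave data) are reordered to follow them. The sketch below illustrates that reading only; treating node IDs as master data and adjacency lists as slave data is an assumption, and `sort_graph` is a hypothetical helper. The bucket-promotion embodiment is not covered here.

```python
def sort_graph(masters, slaves):
    """Sort graph records by master key; slave payloads follow their master.

    masters: list of sort keys (for example, node IDs)
    slaves:  list of payloads (for example, adjacency lists), where
             slaves[i] belongs to masters[i]
    """
    if len(masters) != len(slaves):
        raise ValueError("each master needs exactly one slave payload")
    # Only the master keys are compared; payloads are never inspected.
    order = sorted(range(len(masters)), key=masters.__getitem__)
    return [masters[i] for i in order], [slaves[i] for i in order]
```

The comparison cost then depends only on the size of the keys, not on the size of the payloads they carry.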
-
Publication No.: US08423547B2
Publication date: 2013-04-16
Application No.: US13083353
Filing date: 2011-04-08
Applicant: Tie-Yan Liu, Taifeng Wang
Inventor: Tie-Yan Liu, Taifeng Wang
IPC classification: G06F17/30
CPC classification: G06F17/30705, G06F17/30864
Abstract: Efficient search query clustering using tripartite graphs may enable a search engine developer to model information needs of users while expending fewer computing resources. The efficient clustering of search queries may involve multiple computing devices receiving a subgraph of a multi-partite graph that encompasses search queries, as well as receiving a global center vector table that includes cluster center entries for query clusters. At each computing device, the received global center vector table may be filtered to eliminate one or more cluster center entries that are irrelevant to the search queries. Subsequently, the search queries may be clustered into the query clusters by at least using the filtered global center vector table at each of the computing devices. In some instances, one or more comparisons between search queries and the cluster center entries in the global center vector table during the clustering may be eliminated.
-
Publication No.: US20130091013A1
Publication date: 2013-04-11
Application No.: US13268078
Filing date: 2011-10-07
Applicant: Taifeng Wang, Tie-Yan Liu
Inventor: Taifeng Wang, Tie-Yan Liu
IPC classification: G06Q30/02
CPC classification: G06Q30/0241
Abstract: Techniques for providing targeted social advertisements in a social network are described. A targeted social advertisement application detects a commercial intent of a user and retrieves input from friends in the social network. In an implementation, a user interface includes a pane to display a comment with the commercial intent submitted by the user in the social network, the commercial intent being detected for a potential product. The user interface also includes a voting pane to display a plurality of candidate products targeted towards the commercial intent of the user for the potential product. One or more command buttons are on the voting pane to prompt voting as recommendations for the plurality of candidate products from friends of the user.
-
Publication No.: US20120143844A1
Publication date: 2012-06-07
Application No.: US12958611
Filing date: 2010-12-02
Applicant: Taifeng Wang, Tie-Yan Liu, Bin Gao
Inventor: Taifeng Wang, Tie-Yan Liu, Bin Gao
IPC classification: G06F17/30
CPC classification: G06F16/951
Abstract: Some implementations provide techniques for determining which URLs to select for crawling from a pool of URLs. For example, the selection of URLs for crawling may be made based on maintaining a high coverage of the known URLs and/or high discoverability of the World Wide Web. Some implementations provide a multi-level coverage strategy for crawling selection. Further, some implementations provide techniques for discovering unseen URLs.