DISTRIBUTED DATA REORGANIZATION FOR PARALLEL EXECUTION ENGINES
    11.
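    Invention application
    DISTRIBUTED DATA REORGANIZATION FOR PARALLEL EXECUTION ENGINES (Status: under examination, published)

    Publication (announcement) No.: US20100281078A1

    Publication (announcement) date: 2010-11-04

    Application No.: US12433880

    Filing date: 2009-04-30

    IPC classification: G06F7/00 G06F17/30 G06F3/048

    CPC classification: G06F16/217 G06F16/24532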

    Abstract: A distributed data reorganization system and method for mapping and reducing raw data containing a plurality of data records. Embodiments of the distributed data reorganization system and method operate in a general-purpose parallel execution environment that uses an arbitrary communication directed acyclic graph. The vertices of the graph accept multiple data inputs and generate multiple data outputs, and may be of different types. Embodiments of the distributed data reorganization system and method include a plurality of distributed mappers that use mapping criteria supplied by a developer to map the plurality of data records to data buckets. The mapped data records and data bucket identifications are input for a plurality of distributed reducers. Each distributed reducer groups together data records having the same data bucket identification and then uses merge logic supplied by the developer to reduce the grouped data records to obtain reorganized data.
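
    Illustrative sketch (not taken from the patent): the map-to-bucket and group-then-merge flow described in the abstract can be pictured on a single machine as below. The names map_records, reduce_buckets, and reorganize are hypothetical, each list in partitions merely stands in for one distributed mapper, and the communication graph itself is not modeled.

```python
from collections import defaultdict


def map_records(records, bucket_of):
    """Mapper: tag each record with the bucket id chosen by the
    developer-supplied mapping criteria `bucket_of`."""
    return [(bucket_of(record), record) for record in records]


def reduce_buckets(mapped, merge):
    """Reducer: group records sharing a bucket id, then apply the
    developer-supplied merge logic `merge` to each group."""
    grouped = defaultdict(list)
    for bucket_id, record in mapped:
        grouped[bucket_id].append(record)
    return {bucket_id: merge(group) for bucket_id, group in grouped.items()}


def reorganize(partitions, bucket_of, merge):
    """Run one mapper per partition, then reduce the combined output.
    A single reducer is used here for simplicity (an assumption)."""
    mapped = []
    for partition in partitions:
        mapped.extend(map_records(partition, bucket_of))
    return reduce_buckets(mapped, merge)


# Example: bucket words by first letter, merge each bucket by counting.
partitions = [["apple", "avocado", "banana"], ["blueberry", "cherry"]]
counts = reorganize(partitions, bucket_of=lambda word: word[0], merge=len)
```

    Passing the mapping criteria and merge logic as plain callables mirrors the abstract's point that both are supplied by the developer rather than fixed by the system.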

    DATA CACHING FOR DISTRIBUTED EXECUTION COMPUTING
    12.
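    Invention application
    DATA CACHING FOR DISTRIBUTED EXECUTION COMPUTING (Status: in force)

    Publication (announcement) No.: US20090249004A1

    Publication (announcement) date: 2009-10-01

    Application No.: US12055777

    Filing date: 2008-03-26

    IPC classification: G06F12/00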

    Abstract: Embodiments are described for caching and accessing Directed Acyclic Graph (DAG) data to and from a computing device of a DAG distributed execution engine during the processing of an iterative algorithm. In accordance with one embodiment, a method includes processing, in the computing device, a first subgraph of a plurality of subgraphs from a distributed storage system. The first subgraph is processed with associated input values in the computing device to generate first output values in an iteration. The method further includes storing a second subgraph, which is a duplicate of the first subgraph, in a cache of the device. Moreover, if the device is to process the first subgraph in each of one or more subsequent iterations, the method also includes processing the second subgraph with the first output values to generate second output values.

    Task-Based Advertisement Delivery
    13.
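    Invention application
    Task-Based Advertisement Delivery (Status: under examination, published)

    Publication (announcement) No.: US20130097027A1

    Publication (announcement) date: 2013-04-18

    Application No.: US13272844

    Filing date: 2011-10-13

    IPC classification: G06Q30/02

    CPC classification: G06Q30/02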

    Abstract: A task guidance tool that displays instructional steps and associated advertisements may facilitate the accomplishment of a task by users who are otherwise unfamiliar with the task. The task guidance tool may be developed from input data mined from various sources. The task guidance tool may display a series of step pages in which each step page includes instructions for accomplishing a corresponding step of the task. Further, one or more step pages of the task guidance tool may be provided with selected advertisements that are displayed with the step instructions.

    User Information Needs Based Data Selection
    14.
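    Invention application
    User Information Needs Based Data Selection (Status: in force)

    Publication (announcement) No.: US20120259831A1

    Publication (announcement) date: 2012-10-11

    Application No.: US13080510

    Filing date: 2011-04-05

    IPC classification: G06F7/00 G06F17/30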

    Abstract: Techniques for determining user information needs and selecting data based on user information needs are described herein. The present disclosure describes extracting topics of interest to users from multiple sources, including search log data and social network websites, and assigning a budget to each topic to stipulate the quota of data to be selected for that topic. The present disclosure also describes calculating similarities between gathered data and the topics, and selecting the data most related to each topic, subject to the limit of the budget. A search engine may use the techniques described here to select data for its index.
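
    Illustrative sketch (not taken from the patent): budgeted per-topic selection can be pictured as ranking documents by similarity to each topic and keeping at most the topic's quota. The abstract does not name a similarity measure, so Jaccard overlap of term sets is an assumption here, as are the names jaccard and select_by_budget and the choice to skip documents with zero similarity.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_by_budget(topics, budgets, documents):
    """For each topic, keep the documents most similar to it,
    never exceeding that topic's budget.

    topics:    {topic name: set of terms}
    budgets:   {topic name: maximum number of documents to keep}
    documents: {document id: set of terms}
    """
    selected = {}
    for topic, terms in topics.items():
        scored = [(jaccard(terms, doc_terms), doc_id)
                  for doc_id, doc_terms in documents.items()]
        # Drop unrelated documents, then rank by descending similarity.
        related = sorted((pair for pair in scored if pair[0] > 0.0),
                         key=lambda pair: pair[0], reverse=True)
        selected[topic] = [doc_id for _, doc_id in related[:budgets[topic]]]
    return selected


# Example: one topic with room for a single document.
topics = {"cooking": {"recipe", "oven", "bake"}}
documents = {
    "d1": {"bake", "oven", "bread"},
    "d2": {"recipe", "football"},
    "d3": {"football", "score"},
}
picked = select_by_budget(topics, {"cooking": 1}, documents)
```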

    Graph-processing techniques for a MapReduce engine
    15.
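    Granted invention patent
    Graph-processing techniques for a MapReduce engine (Status: in force)

    Publication (announcement) No.: US08224825B2

    Publication (announcement) date: 2012-07-17

    Application No.: US12790942

    Filing date: 2010-05-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30584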

    Abstract: Systems, methods, and devices for sorting and processing various types of graph data are described herein. Partitioning graph data into master data and associated slave data allows for sorting of the graph data by sorting the master data. In another embodiment, promoting a data bucket having a first data bucket size to a data bucket having a second data bucket size greater than the first data bucket size upon reaching a memory limit allows for the reduction of temporary files output by the data bucket.

    PAGE SELECTION FOR INDEXING
    16.
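    Invention application
    PAGE SELECTION FOR INDEXING (Status: in force)

    Publication (announcement) No.: US20120143792A1

    Publication (announcement) date: 2012-06-07

    Application No.: US12959060

    Filing date: 2010-12-02

    IPC classification: G06F17/30 G06F15/18

    CPC classification: G06F17/30873 G06F17/30867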

    Abstract: Some implementations provide techniques for selecting web pages for inclusion in an index. For example, some implementations apply regularization to select a subset of the crawled web pages for indexing based on link relationships between the crawled web pages, features extracted from the crawled web pages, and user behavior information determined for at least some of the crawled web pages. Further, in some implementations, the user behavior information may be used to sort a training set of crawled web pages into a plurality of labeled groups. The labeled groups may be represented in a directed graph that indicates relative priorities for being selected for indexing.

    Semi-Supervised Page Importance Ranking
    17.
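    Invention application
    Semi-Supervised Page Importance Ranking (Status: under examination, published)

    Publication (announcement) No.: US20110295845A1

    Publication (announcement) date: 2011-12-01

    Application No.: US12789278

    Filing date: 2010-05-27

    IPC classification: G06F17/30

    CPC classification: G06F16/951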

    Abstract: Importance ranking of web pages is performed by defining a graph-based regularization term based on document features, edge features, and a web graph of a plurality of web pages, and deriving a loss term based on human feedback data. The graph-based regularization term and the loss term are combined to obtain a global objective function. The global objective function is optimized to obtain parameters for the document features and edge features and to produce static rank scores for the plurality of web pages. Further, the plurality of web pages is ordered based on the static rank scores.

    USER INTENT STRENGTH AGGREGATING BY DECAY FACTOR
    18.
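    Invention application
    USER INTENT STRENGTH AGGREGATING BY DECAY FACTOR (Status: under examination, published)

    Publication (announcement) No.: US20120253930A1

    Publication (announcement) date: 2012-10-04

    Application No.: US13078300

    Filing date: 2011-04-01

    IPC classification: G06Q30/00

    CPC classification: G06Q30/0251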

    Abstract: This application describes a system and method for estimating user intent towards categories of content. The estimation of user intent may be based at least in part on a score for prior user actions and a decay function that is applied to that score to provide an estimate of current user intent. The estimate represents current user intent for time periods in which user actions towards a category of content are negligible or non-existent.
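
    Illustrative sketch (not taken from the patent): applying a decay function to scores of prior actions might look like the following. The abstract does not specify the decay function, so the exponential half-life form, the 7-day default, and the names decayed_score and current_intent are all assumptions.

```python
def decayed_score(score, elapsed_days, half_life_days=7.0):
    """Shrink a past action's score by an assumed exponential decay:
    the score halves every `half_life_days`."""
    if elapsed_days < 0:
        raise ValueError("elapsed_days must be non-negative")
    return score * 0.5 ** (elapsed_days / half_life_days)


def current_intent(actions, now_day, half_life_days=7.0):
    """Aggregate decayed scores of prior actions for one content category.

    actions: iterable of (day the action happened, score) pairs.
    """
    return sum(
        decayed_score(score, now_day - day, half_life_days)
        for day, score in actions
    )


# Example: two past actions, evaluated on day 14 with no newer activity.
estimate = current_intent([(0, 8.0), (7, 4.0)], now_day=14)
```

    Because the estimate depends only on stored scores and elapsed time, it remains defined for periods with no new actions, which is the situation the abstract highlights.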

    Calculating web page importance based on web behavior model
    19.
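    Granted invention patent
    Calculating web page importance based on web behavior model (Status: in force)

    Publication (announcement) No.: US08103599B2

    Publication (announcement) date: 2012-01-24

    Application No.: US12237392

    Filing date: 2008-09-25

    IPC classification: G06F17/00 G06F17/20

    CPC classification: G06F17/30864 G06Q30/02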

    Abstract: A method for determining webpage importance, including: receiving web browsing behavior data of one or more users; creating a model of the web browsing behavior data; calculating a stationary probability distribution of the model; and correlating the stationary probability distribution to the webpage importance.
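
    Illustrative sketch (not taken from the patent): one simple way to picture "a model of the browsing data" with a stationary distribution is a discrete-time Markov chain estimated from observed page-to-page transitions and solved by power iteration. The abstract does not say which model is used, so this choice, the names transition_probs and stationary_distribution, and the uniform redistribution for pages with no observed outgoing transition are assumptions. Power iteration converges only when the chain is irreducible and aperiodic.

```python
from collections import defaultdict


def transition_probs(sessions):
    """Estimate transition probabilities from browsing sessions,
    where each session is an ordered list of visited pages."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(out.values()) for dst, n in out.items()}
        for src, out in counts.items()
    }


def stationary_distribution(probs, iterations=200):
    """Approximate the stationary distribution by power iteration."""
    pages = set(probs)
    for out in probs.values():
        pages.update(out)
    dist = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        nxt = {page: 0.0 for page in pages}
        for src, mass in dist.items():
            out = probs.get(src)
            if out:
                for dst, p in out.items():
                    nxt[dst] += mass * p
            else:
                # Assumption: a page with no observed outgoing
                # transition spreads its mass uniformly.
                for page in pages:
                    nxt[page] += mass / len(pages)
        dist = nxt
    return dist


# Example: the resulting probabilities serve as importance scores.
sessions = [["a", "b", "c", "a", "c", "a"]]
importance = stationary_distribution(transition_probs(sessions))
```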

    Calculating global importance of documents based on global hitting times
    20.
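    Granted invention patent
    Calculating global importance of documents based on global hitting times (Status: lapsed)

    Publication (announcement) No.: US07930303B2

    Publication (announcement) date: 2011-04-19

    Application No.: US11742276

    Filing date: 2007-04-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30864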

    Abstract: A calculate importance system calculates the global importance of a web page based on a “mean hitting time.” Hitting time of a target web page is a measure of the minimum number of transitions needed to land on the target web page. Mean hitting time of a target web page is an average number of such transitions for all possible starting web pages. The calculate importance system calculates a global importance score for a web page based on the reciprocal of a mean hitting time. A search engine may rank web pages of a search result based on a combination of relevance of the web pages to the search request and global importance of the web pages based on a global hitting time.
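
    Illustrative sketch (not taken from the patent): reading the abstract literally, the hitting time from a starting page is the minimum number of link transitions needed to reach the target, which is a shortest-path length. The sketch below computes that by breadth-first search over reversed links, averages over all other pages, and takes the reciprocal. The patent's full text may define hitting time differently (for example over a random walk). The names mean_hitting_time and importance, and the treatment of unreachable targets as infinite hitting time, are assumptions.

```python
import math
from collections import deque


def mean_hitting_time(links, target):
    """Average shortest-path length to `target` over all other pages.

    links: {page: list of pages it links to}
    Returns math.inf if some page cannot reach the target (assumption).
    """
    pages = set(links)
    reverse = {}
    for src, dsts in links.items():
        for dst in dsts:
            pages.add(dst)
            reverse.setdefault(dst, []).append(src)

    # Breadth-first search backwards from the target.
    hops = {target: 0}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        for pred in reverse.get(page, []):
            if pred not in hops:
                hops[pred] = hops[page] + 1
                queue.append(pred)

    starts = [page for page in pages if page != target]
    if not starts or any(page not in hops for page in starts):
        return math.inf
    return sum(hops[page] for page in starts) / len(starts)


def importance(links, target):
    """Global importance as the reciprocal of the mean hitting time."""
    mean = mean_hitting_time(links, target)
    return 0.0 if math.isinf(mean) else 1.0 / mean


# Example link graph: page "a" is reachable from every other page.
links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
score_a = importance(links, "a")
```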
