Identifying similar files in an environment having multiple client computers
    1.
    Granted patent
    Legal status: In force

    Publication No.: US08489612B2

    Publication date: 2013-07-16

    Application No.: US12409978

    Filing date: 2009-03-24

    IPC classification: G06F17/30

    CPC classification: G06N5/02 G06F17/3015

    Abstract: To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing.
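
The request/compare/respond flow in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say how signatures are computed or which comparison metric is used, so the shingle-hash signature, the Jaccard metric, the 0.5 threshold, and all function names here are assumptions chosen for the example.

```python
import hashlib


def signature(data: bytes, k: int = 8) -> frozenset:
    """Assumed signature: the set of SHA-1 digests of every k-byte shingle."""
    if len(data) < k:
        return frozenset({hashlib.sha1(data).hexdigest()})
    return frozenset(
        hashlib.sha1(data[i:i + k]).hexdigest()
        for i in range(len(data) - k + 1)
    )


def jaccard(a: frozenset, b: frozenset) -> float:
    """Assumed comparison metric: Jaccard similarity of two signatures."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def handle_request(local_files: dict, comparison_sig: frozenset,
                   threshold: float = 0.5) -> dict:
    """Client side: compare local signatures against the comparison signature
    and return {file name: score} for files that meet the threshold."""
    response = {}
    for name, data in local_files.items():
        score = jaccard(signature(data), comparison_sig)
        if score >= threshold:
            response[name] = score
    return response


def coordinator_query(clients: dict, comparison_data: bytes,
                      threshold: float = 0.5) -> dict:
    """Coordinator side: send the same request to every client and collect
    the responses. Clients are modeled as in-memory dicts, not remote hosts."""
    sig = signature(comparison_data)
    return {
        client_id: handle_request(files, sig, threshold)
        for client_id, files in clients.items()
    }
```

Only the signature of the comparison file travels to the clients in this sketch, so each client can do its comparison locally and return just the matching file names and scores.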


    IDENTIFYING SIMILAR FILES IN AN ENVIRONMENT HAVING MULTIPLE CLIENT COMPUTERS
    2.
    Patent application
    Legal status: In force

    Publication No.: US20100250480A1

    Publication date: 2010-09-30

    Application No.: US12409978

    Filing date: 2009-03-24

    IPC classification: G06N5/02 G06F17/30 G06Q10/00

    CPC classification: G06N5/02 G06F17/3015

    Abstract: To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing.


    Storing update data using a processing pipeline
    5.
    Granted patent
    Legal status: In force

    Publication No.: US08311982B2

    Publication date: 2012-11-13

    Application No.: US12703858

    Filing date: 2010-02-11

    IPC classification: G06F17/30

    Abstract: A system has a processing pipeline with a plurality of processing stages, where each of the processing stages has one or plural processors, and where the processing stages are individually and independently scalable. A first of the processing stages of the processing pipeline provides a received data update into an update data structure, where the update data structure is accessible to process a query received by the system. One or more additional of the processing stages transforms the update data structure to allow for merging of the transformed update data structure into a database, where the transformed update data structure is accessible to process the query. Content of the transformed update data structure is stored into the database.
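
The staged flow in this abstract can be sketched as a toy in-memory pipeline. This is a sketch under stated assumptions, not the patented design: the abstract does not name the stages or the transformation, so the `ingest`/`transform`/`merge` methods, the choice of sorting as the transformation, and the key-value update format are all hypothetical. Per-stage processors and independent scaling are not modeled.

```python
class Pipeline:
    """Toy pipeline: updates stay queryable at every stage before the merge."""

    def __init__(self):
        self.ingested = []        # stage 1 output: raw update batches
        self.sorted_batches = []  # later stage output: batches sorted by key
        self.database = {}        # final store

    def ingest(self, updates):
        """First stage: place a received update (list of (key, value) pairs)
        into an update data structure that queries can already read."""
        self.ingested.append(list(updates))

    def transform(self):
        """Additional stage (assumed to be a sort): order each batch by key
        so it can be merged into the database."""
        while self.ingested:
            batch = self.ingested.pop(0)
            # sorted() is stable, so later writes to the same key stay later.
            self.sorted_batches.append(sorted(batch, key=lambda kv: kv[0]))

    def merge(self):
        """Store the content of the transformed batches into the database,
        oldest batch first so newer values overwrite older ones."""
        while self.sorted_batches:
            for key, value in self.sorted_batches.pop(0):
                self.database[key] = value

    def query(self, key):
        """Return the newest value for key, checking un-merged updates
        (newest first) before falling back to the database."""
        for batches in (self.ingested, self.sorted_batches):
            for batch in reversed(batches):
                for k, v in reversed(batch):
                    if k == key:
                        return v
        return self.database.get(key)
```

The point of the sketch is that `query` consults the intermediate structures, so an update becomes visible as soon as the first stage has accepted it rather than only after the merge.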


    STORING UPDATE DATA USING A PROCESSING PIPELINE
    6.
    Patent application
    Legal status: In force

    Publication No.: US20110196880A1

    Publication date: 2011-08-11

    Application No.: US12703858

    Filing date: 2010-02-11

    IPC classification: G06F17/30 G06F7/00 G06F17/00

    Abstract: A system has a processing pipeline with a plurality of processing stages, where each of the processing stages has one or plural processors, and where the processing stages are individually and independently scalable. A first of the processing stages of the processing pipeline provides a received data update into an update data structure, where the update data structure is accessible to process a query received by the system. One or more additional of the processing stages transforms the update data structure to allow for merging of the transformed update data structure into a database, where the transformed update data structure is accessible to process the query. Content of the transformed update data structure is stored into the database.


    Scheduling Data Analysis Operations In A Computer System
    7.
    Patent application
    Legal status: In force

    Publication No.: US20100251256A1

    Publication date: 2010-09-30

    Application No.: US12413969

    Filing date: 2009-03-30

    IPC classification: G06F9/46

    Abstract: A technique includes receiving identifiers from a plurality of nodes. Each identifier identifies an associated data object, and at least some of the data objects are replicated on different nodes. The technique includes scheduling analysis of the data objects on the nodes based at least in part on a distribution of replicas of the data objects among the nodes and modeled performances of the nodes.
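
One way to picture scheduling that accounts for both replica distribution and modeled node performance is the greedy sketch below. It is an illustration only: the abstract does not specify a scheduling algorithm, so the fewest-replicas-first ordering, the per-object cost model, and the names `schedule`, `reports`, and `cost` are assumptions.

```python
def schedule(reports: dict, cost: dict) -> dict:
    """Assign each data object to exactly one node that holds a replica.

    reports: {node: set of object identifiers reported by that node}
    cost:    {node: modeled seconds to analyze one object on that node}
    Returns  {node: list of object identifiers to analyze there}.
    """
    # Replica distribution: which nodes hold each object.
    holders = {}
    for node, identifiers in reports.items():
        for obj in identifiers:
            holders.setdefault(obj, []).append(node)

    load = {node: 0.0 for node in reports}  # modeled busy time per node
    plan = {node: [] for node in reports}

    # Place the least-replicated objects first, since they have the fewest
    # placement options; ties are broken by identifier for determinism.
    for obj in sorted(holders, key=lambda o: (len(holders[o]), o)):
        # Pick the replica holder with the earliest modeled finish time.
        best = min(holders[obj], key=lambda n: (load[n] + cost[n], n))
        plan[best].append(obj)
        load[best] += cost[best]
    return plan
```

With three nodes where `n3` is modeled as five times slower, an object replicated on all three lands on one of the faster nodes, so each object is analyzed once and slow nodes are used only when they hold the sole replica.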


    Resource assignment for jobs in a system having a processing pipeline that satisfies a data freshness query constraint
    8.
    Granted patent
    Legal status: In force

    Publication No.: US09389913B2

    Publication date: 2016-07-12

    Application No.: US13383594

    Filing date: 2010-07-08

    IPC classification: G06F9/50 G06F17/30

    Abstract: A set of jobs to be scheduled is identified (402) in a system including a processing pipeline having plural processing stages that apply corresponding different processing to a data update to allow the data update to be stored. The set of jobs is based on one or both of the data update and a query that is to access data in the system. The set of jobs is scheduled (404) by assigning resources to perform the set of jobs, where assigning the resources is subject to at least one constraint selected from at least one constraint associated with the data update and at least one constraint associated with the query.
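
Constraint-driven resource assignment of this kind can be sketched with a simple time-budget model. This is a hedged illustration, not the claimed method: the abstract does not define the constraints or the resource model, so the `Job` fields, the assumption that work divides evenly across CPUs, and the `assign_resources` function are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: float    # estimated CPU-seconds for the job (assumed to be known)
    budget: float  # seconds allowed by the update or query constraint


def assign_resources(jobs, total_cpus):
    """Give each job the fewest CPUs that meet its time budget.

    Assumes work is perfectly parallelizable, so n CPUs finish a job in
    work / n seconds. Returns {job name: CPU count}, or None when the
    constraints cannot all be met with total_cpus.
    """
    plan = {}
    for job in jobs:
        plan[job.name] = max(1, math.ceil(job.work / job.budget))
    if sum(plan.values()) > total_cpus:
        return None  # infeasible: some constraint would have to be relaxed
    return plan
```

For example, a sort job with 40 CPU-seconds of work and a 10-second budget needs 4 CPUs, and a merge job with 30 CPU-seconds and a 20-second budget needs 2, so the pair fits in a pool of 8 CPUs but not in a pool of 5.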


    RESOURCE ASSIGNMENT FOR JOBS IN A SYSTEM HAVING A PROCESSING PIPELINE
    9.
    Patent application
    Legal status: In force

    Publication No.: US20140223444A1

    Publication date: 2014-08-07

    Application No.: US13383594

    Filing date: 2010-07-08

    IPC classification: G06F9/50

    Abstract: A set of jobs to be scheduled is identified (402) in a system including a processing pipeline having plural processing stages that apply corresponding different processing to a data update to allow the data update to be stored. The set of jobs is based on one or both of the data update and a query that is to access data in the system. The set of jobs is scheduled (404) by assigning resources to perform the set of jobs, where assigning the resources is subject to at least one constraint selected from at least one constraint associated with the data update and at least one constraint associated with the query.
