Abstract:
To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing.
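The client-side comparison step can be illustrated with a minimal sketch. The abstract does not say how signatures are computed or which comparison metric is used, so the chunk-hash signature, the Jaccard metric, the threshold, and the names `signature`, `jaccard`, and `handle_request` are all assumptions made for illustration.

```python
import hashlib


def signature(data, chunk_size=8):
    """Hypothetical signature: the set of SHA-1 hashes of fixed-size chunks."""
    chunks = (data[i:i + chunk_size] for i in range(0, len(data), chunk_size))
    return frozenset(hashlib.sha1(chunk).hexdigest() for chunk in chunks)


def jaccard(a, b):
    """Assumed comparison metric: overlap of two signatures, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def handle_request(local_files, comparison_signature, threshold=0.5):
    """Client side: return names of local files similar to the comparison file.

    local_files maps a file name to its contents (bytes). The returned list
    stands in for the response sent back to the coordinator.
    """
    return sorted(
        name
        for name, data in local_files.items()
        if jaccard(signature(data), comparison_signature) >= threshold
    )
```

In this sketch the coordinator would send the same `comparison_signature` to every client and collect one such list per client, so only signatures and file names cross the network rather than file contents.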
Abstract:
Multiple file system events are detected on one or more nodes of a file system, each file system event corresponding to an operation that is to be performed on the file system. Each of the multiple file system events is durably recorded as an entry in a journal of the file system before the corresponding operation is performed or completed. A programmatic component that is external to the file system can process entries from the journal, and in response, the processed entries can be expired from the journal.
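The record-before-perform ordering and the expiry step can be sketched as follows. This is a rough illustration under stated assumptions, not the patented design: the `Journal` class, its JSON-lines file format, and the method names are hypothetical, and "durably recorded" is approximated with `os.fsync`.

```python
import json
import os


class Journal:
    """Hypothetical append-only journal stored as one JSON object per line."""

    def __init__(self, path):
        self.path = path
        self.next_id = 0

    def record(self, event):
        """Durably append an entry; call this before performing the operation."""
        entry_id = self.next_id
        self.next_id += 1
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({**event, "id": entry_id}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the entry to stable storage
        return entry_id

    def entries(self):
        """Read all unexpired entries, e.g. for an external consumer."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def expire(self, processed_ids):
        """Drop entries that the external component reports as processed."""
        remaining = [e for e in self.entries() if e["id"] not in processed_ids]
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for entry in remaining:
                f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)  # atomically swap in the trimmed journal
```

A caller would invoke `record` for each detected event, perform the operation afterwards, and let an external consumer read `entries()` and then call `expire` with the ids it has handled.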
Abstract:
In at least some examples, a system includes a distributed database and control logic to enable updates and queries to the distributed database. The control logic applies a plurality of identifiers to the updates and queries to maintain distinct fault domains in the distributed database.
Abstract:
A system has a processing pipeline with a plurality of processing stages, where each of the processing stages has one or plural processors, and where the processing stages are individually and independently scalable. A first of the processing stages of the processing pipeline provides a received data update into an update data structure, where the update data structure is accessible to process a query received by the system. One or more additional processing stages transform the update data structure to allow for merging of the transformed update data structure into a database, where the transformed update data structure is accessible to process the query. Content of the transformed update data structure is stored into the database.
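The key property described here is that an update is visible to queries at every stage, before it reaches the database. A minimal single-process sketch of that idea follows. The `Pipeline` class, its three stage methods, and the choice of sorting as the transformation are assumptions for illustration, and the independent scaling of stages is not modeled.

```python
class Pipeline:
    """Hypothetical three-stage pipeline: ingest, transform, merge."""

    def __init__(self):
        self.ingested = []     # update data structure: (key, value) in arrival order
        self.transformed = []  # sorted batches that are ready to merge
        self.database = {}     # final store

    def ingest(self, key, value):
        """First stage: place a received update into the update data structure."""
        self.ingested.append((key, value))

    def transform(self):
        """Later stage: deduplicate and sort pending updates into one batch."""
        if self.ingested:
            latest = dict(self.ingested)  # a later update to the same key wins
            self.transformed.append(sorted(latest.items()))
            self.ingested = []

    def merge(self):
        """Final stage: store the content of transformed batches in the database."""
        for batch in self.transformed:
            self.database.update(batch)
        self.transformed = []

    def query(self, key):
        """Answer from the newest data first: raw updates, then batches, then DB."""
        for k, v in reversed(self.ingested):
            if k == key:
                return v
        for batch in reversed(self.transformed):
            for k, v in batch:
                if k == key:
                    return v
        return self.database.get(key)
```

The query path checks the newest structures first so that a fresh update is never hidden by an older value already merged into the database.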
Abstract:
A technique includes receiving identifiers from a plurality of nodes. Each identifier identifies an associated data object, and at least some of the data objects are replicated on different nodes. The technique further includes scheduling analysis of the data objects on the nodes based at least in part on a distribution of replicas of the data objects among the nodes and modeled performances of the nodes.
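One way to combine replica distribution with modeled node performance is a greedy assignment, sketched below. The abstract does not specify the scheduling algorithm, so the greedy strategy, the `schedule` function, and the use of "objects analyzed per second" as the performance model are all assumptions.

```python
def schedule(replicas, node_speed):
    """Assign each data object to one of the nodes that holds a replica of it.

    replicas:   object id -> list of node names holding a replica
    node_speed: node name -> modeled throughput in objects per second (assumed)
    Returns a dict mapping object id -> chosen node.
    """
    load = {node: 0.0 for node in node_speed}  # estimated busy time per node
    assignment = {}
    # Place the least-replicated objects first, since they have the fewest options.
    for obj in sorted(replicas, key=lambda o: (len(replicas[o]), o)):
        # Pick the replica holder with the earliest estimated finish time.
        best = min(
            replicas[obj],
            key=lambda node: (load[node] + 1.0 / node_speed[node], node),
        )
        assignment[obj] = best
        load[best] += 1.0 / node_speed[best]
    return assignment
```

For example, with `replicas = {"x": ["n1"], "y": ["n1", "n2"], "z": ["n1", "n2"]}` and `node_speed = {"n1": 1.0, "n2": 2.0}`, object `x` must go to `n1`, and the faster node `n2` absorbs both `y` and `z`.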
Abstract:
A set of jobs to be scheduled is identified (402) in a system including a processing pipeline having plural processing stages that apply corresponding different processing to a data update to allow the data update to be stored. The set of jobs is based on one or both of the data update and a query that is to access data in the system. The set of jobs is scheduled (404) by assigning resources to perform the set of jobs, where assigning the resources is subject to at least one constraint selected from at least one constraint associated with the data update and at least one constraint associated with the query.
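The constraint-driven resource assignment can be sketched as follows. This is only an illustration: the abstract does not name the constraints, so deadlines stand in for them here, and the `Job` class, the `schedule` function, and the assumption of perfectly parallel work are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: float      # total work in worker-seconds (assumed perfectly parallel)
    deadline: float  # seconds allowed, taken from an update or query constraint


def schedule(jobs, total_workers):
    """Give each job the fewest workers that meet its deadline.

    Jobs with the tightest deadline are considered first. A job that cannot
    be satisfied with the remaining workers gets 0 and must be deferred.
    """
    free = total_workers
    plan = {}
    for job in sorted(jobs, key=lambda j: j.deadline):
        needed = math.ceil(job.work / job.deadline)
        if needed <= free:
            plan[job.name] = needed
            free -= needed
        else:
            plan[job.name] = 0
    return plan
```

With eight workers and jobs `Job("ingest", 10, 5)`, `Job("query-scan", 9, 3)`, and `Job("merge", 40, 10)`, the query scan receives three workers, ingest receives two, and the merge job is deferred because it would need four of the remaining three.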
Abstract:
A data processing system includes a plurality of processing stages. In response to a query, a membership structure is accessed to determine whether partially processed data from a particular one of the processing stages.