摘要:
To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing.
摘要:
To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing.
摘要:
Multiple file system events are detected on one or more nodes of a file system, each file system event corresponding to an operation that is to be performed on the file system. Each of the multiple system events are durably recorded as an entry for a journal of the file system prior to either performance or completion of the corresponding operation. A programmatic component that is external to the file system can process entries from the journal, and in response, the entries can be expired from the journal.
摘要:
In at least some examples, a system includes a distributed database and control logic to enable updates and queries to the distributed database. The control logic applies a plurality of identifiers to the updates and queries to maintain distinct fault domains in the distributed database.
摘要:
A data processing system includes a plurality of processing stages. In response to a query, a membership structure is accessed to determine whether partially processed data from a particular one of the processing stages.
摘要:
A technique receiving identifiers from a plurality of nodes. Each identifier identifies an associated data object, and at least some of the data objects being replicated on different nodes. The technique includes scheduling analysis of the data objects on the nodes based at least in part on a distribution of replicas of the data objects among the nodes and modeled performances of the nodes.
摘要:
A data processing system includes a plurality of processing stages. In response to a query, a membership structure is accessed to determine whether partially processed data from a particular one of the processing stages.
摘要:
A technique receiving identifiers from a plurality of nodes. Each identifier identifies an associated data object, and at least some of the data objects being replicated on different nodes. The technique includes scheduling analysis of the data objects on the nodes based at least in part on a distribution of replicas of the data objects among the nodes and modeled performances of the nodes.
摘要:
A processing subsystem has plural processing stages, where output of one of the plural processing stages is provided to another of the processing stages. Resources are dynamically assigned to the plural processing stages.
摘要:
A processing subsystem has plural processing stages, where output of one of the plural processing stages is provided to another of the processing stages. Resources are dynamically assigned to the plural processing stages.