Identifying similar files in an environment having multiple client computers
    1.
    Granted patent
    Status: in force

    Publication number: US08489612B2

    Publication date: 2013-07-16

    Application number: US12409978

    Filing date: 2009-03-24

    IPC classification: G06F17/30

    CPC classification: G06N5/02 G06F17/3015

    Abstract: To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing.
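
    Below is a minimal sketch of the client-side comparison described in the abstract. The patent fixes neither the signature nor the metric, so both are assumptions here. A file's signature is taken to be the set of SHA-1 hashes of every 8-byte window, and the comparison metric is Jaccard similarity with a hypothetical 0.5 threshold. `signature`, `jaccard` and `handle_request` are illustrative names.

```python
import hashlib


def signature(data: bytes, window: int = 8) -> set:
    """Hypothetical file signature: SHA-1 hashes of every `window`-byte slice."""
    if len(data) < window:
        return {hashlib.sha1(data).hexdigest()}
    return {
        hashlib.sha1(data[i:i + window]).hexdigest()
        for i in range(len(data) - window + 1)
    }


def jaccard(a: set, b: set) -> float:
    """Assumed comparison metric: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def handle_request(local_files: dict, comparison: bytes,
                   threshold: float = 0.5) -> dict:
    """Client side: compare each local file's signature with the comparison
    file's signature and return {file name: score} for the similar subset."""
    target = signature(comparison)
    response = {}
    for name, data in local_files.items():
        score = jaccard(signature(data), target)
        if score >= threshold:
            response[name] = score
    return response


files = {
    "a.txt": b"the quick brown fox jumps over the lazy dog",
    "b.txt": b"completely unrelated content here!!",
}
print(handle_request(files, b"the quick brown fox jumps over the lazy cat"))
```

    Each client would run `handle_request` over its own files and send the returned mapping to the coordinator. The coordinator can then merge the responses from all clients.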

    IDENTIFYING SIMILAR FILES IN AN ENVIRONMENT HAVING MULTIPLE CLIENT COMPUTERS
    2.
    Patent application publication
    Status: in force

    Publication number: US20100250480A1

    Publication date: 2010-09-30

    Application number: US12409978

    Filing date: 2009-03-24

    IPC classification: G06N5/02 G06F17/30 G06Q10/00

    CPC classification: G06N5/02 G06F17/3015

    Abstract: To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing.

    SYSTEM AND METHOD FOR DISPLAYING DOCUMENTS
    3.
    Patent application publication
    Status: pending (published)

    Publication number: US20110202886A1

    Publication date: 2011-08-18

    Application number: US12705585

    Filing date: 2010-02-13

    IPC classification: G06F3/048

    CPC classification: G06F16/353

    Abstract: A computer system that includes a graphical user interface used to organize a group of documents is provided. The system includes a processor that is adapted to execute machine-readable instructions. The system also includes a storage device that is adapted to store data. The data includes a plurality of documents and instructions that are executable by the processor to generate the graphical user interface. The graphical user interface includes a cluster map that includes the results of a clustering algorithm applied to the documents. The graphical user interface also includes a principal documents screen that includes a principal document that is identified by weighting each of the documents in a cluster based, at least in part, on an occurrence of representative terms in the document. The representative terms are terms that have been identified by the clustering algorithm as being more effective for distinguishing between documents that belong to different clusters.
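
    The following sketch illustrates the principal-document weighting using the simplest possible weight. That weight is an assumption: the total number of times any representative term occurs in a document, with lowercased, whitespace-split tokens. `principal_document` is a hypothetical name. The clustering step that produces the representative terms is not shown.

```python
from collections import Counter


def principal_document(cluster: dict, representative_terms: set) -> str:
    """Return the id of the cluster member with the highest weight, where the
    (assumed) weight is the total count of representative-term occurrences."""
    def weight(text: str) -> int:
        counts = Counter(text.lower().split())
        return sum(counts[term] for term in representative_terms)

    return max(cluster, key=lambda doc_id: weight(cluster[doc_id]))


cluster = {
    "d1": "Disk backup failed; retry backup",
    "d2": "Lunch menu for Friday",
    "d3": "restore disk from backup after disk failure",
}
print(principal_document(cluster, {"backup", "restore", "disk"}))  # prints d3
```

    A real system would more likely weight terms (for example by how well each one separates clusters) rather than count them equally. The abstract only says the weighting is based at least in part on term occurrence.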

    Copying a differential data store into temporary storage media in response to a request
    4.
    Granted patent
    Status: in force

    Publication number: US09141621B2

    Publication date: 2015-09-22

    Application number: US12432807

    Filing date: 2009-04-30

    IPC classification: G06F7/00 G06F17/30 G06F3/06

    Abstract: A plurality of differential data stores are stored in persistent storage media. In response to receiving a first request to store a particular data object, one of the differential data stores that are stored in the persistent storage media is selected, wherein selecting the one differential data store is according to a criterion relating to compression of data objects in the differential data stores. The selected differential data store is copied into temporary storage media, where the copying is not delayed after receiving the first request to await receipt of more requests. The particular data object is inserted into the copy of the selected differential data store in the temporary storage media, where the inserting is performed without having to retrieve more data from the selected differential store in the persistent storage media. The selected differential data store in the persistent storage media is replaced with the copy of the selected differential data store in the temporary storage media that has been modified.
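
    Here is a minimal in-memory sketch of this store path. It assumes that each differential data store is a list of byte strings. It also assumes that the compression criterion is how much the store's zlib-compressed size grows when the new object is added. A dictionary stands in for the persistent media and a list copy for the temporary media. `store_object` and `compressed_size` are hypothetical names.

```python
import zlib


def compressed_size(objects: list) -> int:
    """Size of a store when all of its objects are compressed together."""
    return len(zlib.compress(b"".join(objects)))


def store_object(persistent: dict, obj: bytes) -> str:
    """Store `obj` in the differential data store whose compressed size grows
    least when `obj` is added (an assumed compression criterion)."""
    def growth(name: str) -> int:
        current = persistent[name]
        return compressed_size(current + [obj]) - compressed_size(current)

    chosen = min(persistent, key=growth)
    # Copy the chosen store into "temporary storage" immediately, without
    # waiting for further requests to arrive.
    temp_copy = list(persistent[chosen])
    # Insert into the copy; nothing more is read from persistent storage.
    temp_copy.append(obj)
    # Replace the persistent store with the modified copy.
    persistent[chosen] = temp_copy
    return chosen


stores = {
    "logs": [b"ERROR disk full on node 1\n" * 20],
    "images": [bytes(range(256)) * 4],
}
print(store_object(stores, b"ERROR disk full on node 2\n" * 20))
```

    Grouping similar objects in the same store lets the compressor reuse their shared content. This is why the growth criterion steers the log line toward the log store.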

    DATA PROCESSING APPARATUS AND METHOD OF PROCESSING DATA
    5.
    Patent application publication
    Status: in force

    Publication number: US20090112945A1

    Publication date: 2009-04-30

    Application number: US12257659

    Filing date: 2008-10-24

    IPC classification: G06F17/30

    CPC classification: G06F17/30162 G06F11/1451

    Abstract: Data processing apparatus comprising: a chunk store containing specimen data chunks, a manifest store containing a plurality of manifests, each of which represents at least a part of a data set and each of which comprises at least one reference to at least one of said specimen data chunks, a sparse chunk index containing information on only some specimen data chunks, the processor being operable to: process input data into input data chunks; identify manifests having at least one reference to one of said specimen data chunks that corresponds to one of said input data chunks and on which there is information contained in the sparse chunk index; and prioritize the identified manifests for subsequent operation.
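
    The manifest-identification and prioritization step might be sketched as follows. The sketch assumes fixed 4-byte chunks identified by SHA-256 hashes. It also assumes a sparse index given as a dictionary from chunk hash to the manifests that reference that chunk. Fixed-size chunking is chosen only for brevity. `chunk`, `chunk_id` and `prioritize_manifests` are illustrative names.

```python
import hashlib
from collections import Counter


def chunk(data: bytes, size: int = 4) -> list:
    """Fixed-size chunking (a simplification; the abstract does not say how
    chunk boundaries are chosen)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_id(piece: bytes) -> str:
    return hashlib.sha256(piece).hexdigest()


def prioritize_manifests(data: bytes, sparse_index: dict) -> list:
    """Order manifests by how many distinct input chunks they reference,
    counting only chunks that have an entry in the sparse index."""
    hits = Counter()
    for cid in {chunk_id(piece) for piece in chunk(data)}:
        for manifest in sparse_index.get(cid, []):
            hits[manifest] += 1
    return [manifest for manifest, _ in hits.most_common()]


# The sparse index holds entries for only some specimen chunks.
sparse_index = {
    chunk_id(b"AAAA"): ["m1", "m2"],
    chunk_id(b"BBBB"): ["m1"],
    chunk_id(b"ZZZZ"): ["m3"],
}
print(prioritize_manifests(b"AAAABBBBCCCC", sparse_index))  # prints ['m1', 'm2']
```

    Ranking by hit count is one plausible prioritization. The abstract states only that the identified manifests are prioritized for subsequent operation.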

    Data processing apparatus and method of processing data
    6.
    Granted patent
    Status: in force

    Publication number: US08332404B2

    Publication date: 2012-12-11

    Application number: US12257659

    Filing date: 2008-10-24

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30162 G06F11/1451

    Abstract: Data processing apparatus comprising: a chunk store containing specimen data chunks, a manifest store containing a plurality of manifests, each of which represents at least a part of a data set and each of which comprises at least one reference to at least one of said specimen data chunks, a sparse chunk index containing information on only some specimen data chunks, the processor being operable to: process input data into input data chunks; identify manifests having at least one reference to one of said specimen data chunks that corresponds to one of said input data chunks and on which there is information contained in the sparse chunk index; and prioritize the identified manifests for subsequent operation.

    COPYING A DIFFERENTIAL DATA STORE INTO TEMPORARY STORAGE MEDIA IN RESPONSE TO A REQUEST
    7.
    Patent application publication
    Status: in force

    Publication number: US20100280997A1

    Publication date: 2010-11-04

    Application number: US12432807

    Filing date: 2009-04-30

    IPC classification: G06F17/30

    Abstract: A plurality of differential data stores are stored in persistent storage media. In response to receiving a first request to store a particular data object, one of the differential data stores that are stored in the persistent storage media is selected, wherein selecting the one differential data store is according to a criterion relating to compression of data objects in the differential data stores. The selected differential data store is copied into temporary storage media, where the copying is not delayed after receiving the first request to await receipt of more requests. The particular data object is inserted into the copy of the selected differential data store in the temporary storage media, where the inserting is performed without having to retrieve more data from the selected differential store in the persistent storage media. The selected differential data store in the persistent storage media is replaced with the copy of the selected differential data store in the temporary storage media that has been modified.

    Data processing apparatus and method of processing data
    8.
    Granted patent
    Status: in force

    Publication number: US08959089B2

    Publication date: 2015-02-17

    Application number: US12988365

    Filing date: 2008-04-25

    IPC classification: G06F17/30 G06F11/14

    CPC classification: G06F11/1453

    Abstract: One embodiment is a data processing apparatus that has a chunk store containing specimen data chunks, a manifest store containing a plurality of manifests, each of which represents at least a part of previously processed data and includes at least one reference to at least one of the specimen data chunks, and a sparse chunk index containing information on only some specimen data chunks. Input data is processed into a plurality of input data segments. Each manifest of the first set has at least one reference to one of said specimen data chunks that corresponds to one of the input data chunks of a first input data segment. Specimen data chunks corresponding to other input data chunks of the first input data segment are identified by using the identified first set of manifests and at least one manifest identified when processing previous data.

    Data processing apparatus and method of processing data
    9.
    Granted patent
    Status: in force

    Publication number: US08099573B2

    Publication date: 2012-01-17

    Application number: US12256329

    Filing date: 2008-10-22

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    CPC classification: G06F11/1451

    Abstract: Data processing apparatus comprising: a chunk store containing specimen data chunks, a manifest store containing at least one manifest that represents at least a part of a data set and that comprises at least one reference to at least one of said specimen data chunks, a sparse chunk index containing information on only those specimen data chunks having a predetermined characteristic, the processing apparatus being operable to process input data into input data chunks and to use the sparse chunk index to identify at least one of said at least one manifest that includes at least one reference to one of said specimen data chunks that corresponds to one of said input data chunks having the predetermined characteristic.
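
    One way to realize a "predetermined characteristic" is hash sampling, and the sketch below uses that approach. It assumes, hypothetically, that a chunk qualifies when the lowest `bits` bits of its SHA-256 hash are zero. Under that rule only about one chunk in 2**bits is entered into the sparse index, and only qualifying input chunks are looked up. All function names are illustrative.

```python
import hashlib


def chunk_hash(piece: bytes) -> int:
    return int.from_bytes(hashlib.sha256(piece).digest(), "big")


def has_characteristic(h: int, bits: int = 3) -> bool:
    """Hypothetical predetermined characteristic: the lowest `bits` bits of
    the hash are zero, which samples roughly one chunk in 2**bits."""
    return (h & ((1 << bits) - 1)) == 0


def build_sparse_index(manifests: dict, bits: int = 3) -> dict:
    """Index only the specimen chunks that have the characteristic."""
    index = {}
    for name, pieces in manifests.items():
        for piece in pieces:
            h = chunk_hash(piece)
            if has_characteristic(h, bits):
                index.setdefault(h, set()).add(name)
    return index


def find_manifests(input_chunks: list, index: dict, bits: int = 3) -> set:
    """Consult the sparse index only for input chunks with the characteristic."""
    found = set()
    for piece in input_chunks:
        h = chunk_hash(piece)
        if has_characteristic(h, bits):
            found |= index.get(h, set())
    return found


specimens = {"m1": [bytes([i]) for i in range(256)]}
index = build_sparse_index(specimens)
print(len(index), "of 256 specimen chunks indexed")
```

    Because the same test is applied to specimen chunks and input chunks, a duplicate input chunk that qualifies is always found in the index. Non-qualifying chunks never reach the index, which keeps it small.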

    BATCHING REQUESTS FOR ACCESSING DIFFERENTIAL DATA STORES
    10.
    Patent application publication
    Status: pending (published)

    Publication number: US20100281077A1

    Publication date: 2010-11-04

    Application number: US12432804

    Filing date: 2009-04-30

    IPC classification: G06F17/30

    CPC classification: G06F16/2471 G06F16/27

    Abstract: Data objects are selectively stored across a plurality of differential data stores, where selection of the differential data stores for storing respective data objects is according to a criterion relating to compression of the data objects in each of the data stores, and where the differential data stores are stored in persistent storage media. Plural requests for accessing the differential data stores are batched, and one of the differential data stores is selected to page into temporary storage from the persistent storage media. The batched plural requests for accessing the selected differential data store that has been paged into the temporary storage are executed.
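
    Below is a minimal sketch of the batching idea, assuming insert-only requests and a dictionary standing in for the persistent media. Requests are grouped by target store, and each store is paged in once to serve all of its batched requests. Serving the store with the most pending requests first is an assumed policy that the abstract does not state. `execute_batched` is a hypothetical name.

```python
from collections import defaultdict


def execute_batched(persistent: dict, requests: list) -> int:
    """Apply queued (store name, object) insert requests, paging each store
    into temporary storage at most once. Returns the number of page-ins."""
    batches = defaultdict(list)
    for store, obj in requests:          # batch requests by target store
        batches[store].append(obj)
    page_ins = 0
    # Assumed policy: serve the store with the most pending requests first.
    for store in sorted(batches, key=lambda s: len(batches[s]), reverse=True):
        temp = list(persistent[store])   # page the store into temporary storage
        page_ins += 1
        temp.extend(batches[store])      # execute every batched request for it
        persistent[store] = temp         # write the modified store back
    return page_ins


stores = {"s1": [], "s2": [b"x"]}
print(execute_batched(stores, [("s1", b"a"), ("s2", b"b"), ("s1", b"c")]))  # prints 2
```

    Three requests cost only two page-ins here because both requests for "s1" are served by a single page-in.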