Abstract:
Technology is disclosed for improving the storage efficiency and communication efficiency for a storage client device by maximizing the cache hit rate and minimizing data requests to the storage server. The storage server provides a duplication list to the storage client device. The duplication list contains references (e.g. storage addresses) to data blocks that contain duplicate data content. The storage client uses the duplication list to improve the cache hit rate. The duplication list is pruned to contain references to data blocks relevant to the storage client device. The storage server can prune the duplication list based on a working set of storage objects for a client. Alternatively, the storage server can prune the duplication list based on content characteristics, e.g. duplication degree and access frequency. Duplicate blocks to which the client does not have access can be excluded from the duplication list.