Just-in-time analytics on large file systems
    Granted invention patent
    Just-in-time analytics on large file systems (legal status: in force)

    Publication number: US09244975B2

    Publication date: 2016-01-26

    Application number: US13328810

    Filing date: 2011-12-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30442

    Abstract: As file systems reach the petabyte scale, users and administrators are increasingly interested in acquiring high-level analytical information for file management and analysis. Two particularly important tasks are processing aggregate and top-k queries, which, unfortunately, hierarchical file systems such as ext3 and NTFS cannot answer quickly. Existing pre-processing-based solutions, e.g., file system crawling and index building, consume a significant amount of time and space (for generating and maintaining the indexes), a cost that in many cases cannot be justified given how infrequently such solutions are used. User interests can often be sufficiently satisfied by approximate (i.e., statistically accurate) answers. A just-in-time, sampling-based system can, after only a small number of disk accesses, produce extremely accurate answers for a broad class of aggregate and top-k queries over a file system without requiring any prior knowledge of that file system. The system is efficient, accurate and scalable.
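The abstract does not spell out how the sampling works. As an illustration only, the Python sketch below shows one standard way to approximate an aggregate such as the total file count from a handful of directory reads: a random root-to-leaf descent in the style of Knuth's tree-size estimator. Each visited directory's file count is scaled by the inverse probability of reaching it, which makes a single descent an unbiased estimate. The `Dir` class, the in-memory tree, and all function names are hypothetical. A real system would read directories from disk, and the patented method may differ from or refine this estimator (for example, for variance reduction or for top-k queries).

```python
import random
from dataclasses import dataclass, field


@dataclass
class Dir:
    """Hypothetical in-memory stand-in for a directory: the number of files
    it directly contains plus its subdirectories. A real system would read
    these from disk instead."""
    files: int
    subdirs: list["Dir"] = field(default_factory=list)


def true_file_count(d: Dir) -> int:
    """Exact COUNT of files by full traversal (what a crawler would do)."""
    return d.files + sum(true_file_count(s) for s in d.subdirs)


def random_descent(root: Dir, rng: random.Random) -> float:
    """One random root-to-leaf walk. Returns an unbiased estimate of the
    total number of files under root (Knuth-style estimator)."""
    estimate = 0.0
    weight = 1.0  # inverse probability of reaching the current directory
    node = root
    while True:
        estimate += weight * node.files
        if not node.subdirs:
            return estimate
        # Pick one child uniformly; the reach probability shrinks by 1/len.
        weight *= len(node.subdirs)
        node = rng.choice(node.subdirs)


def estimate_file_count(root: Dir, descents: int, seed: int = 0) -> float:
    """Average several independent descents to reduce variance."""
    rng = random.Random(seed)
    return sum(random_descent(root, rng) for _ in range(descents)) / descents
```

For example, on `Dir(1, [Dir(10), Dir(0, [Dir(3), Dir(7)])])`, which holds 21 files, a single descent returns 21, 13 or 29 depending on the path taken, and the expected value over paths is exactly 21. Each descent reads only one directory per level, so averaging a modest number of descents trades a small, quantifiable error for far fewer disk accesses than a full crawl.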
