Storing files in a parallel computing system based on user or application specification
    1.
    Granted invention patent (in force)

    Publication number: US09298733B1

    Publication date: 2016-03-29

    Application number: US13536289

    Filing date: 2012-06-28

    IPC classification: G06F17/30

    CPC classification: G06F17/302 G06F17/30224

    Abstract: Techniques are provided for storing files in a parallel computing system based on a user specification. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a specification from the distributed application indicating how the plurality of files should be stored, and by storing one or more of the plurality of files in one or more storage nodes of a multi-tier storage system based on the specification. The plurality of files comprise a plurality of complete files and/or a plurality of sub-files. The specification can optionally be processed by a daemon executing on one or more nodes in the multi-tier storage system. The specification indicates how the plurality of files should be stored, for example by identifying one or more storage nodes where the plurality of files should be stored.

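    The placement step described in this abstract can be pictured with a minimal sketch. It assumes the specification is a simple mapping from file name to storage node name; the `StorageNode` class, the tier labels, and the `store_by_spec` function are hypothetical names chosen for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical in-memory stand-in for one node of a multi-tier store."""
    name: str
    tier: str                                  # e.g. "flash" or "disk" (assumed labels)
    files: dict = field(default_factory=dict)  # filename -> bytes


def store_by_spec(files, spec, nodes, default_node):
    """Place each file on the node named by the spec, else on default_node.

    files: {filename: bytes}
    spec:  {filename: node name}, supplied by the application (assumed shape)
    nodes: {node name: StorageNode}
    Returns {filename: node name} recording where each file landed.
    """
    placement = {}
    for fname, data in files.items():
        target = spec.get(fname, default_node)
        nodes[target].files[fname] = data
        placement[fname] = target
    return placement
```

    In this sketch a file that the specification does not mention falls back to `default_node`; a real system could just as well reject it or apply a policy.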

    Methods and apparatus for multi-resolution replication of files in a parallel computing system using semantic information
    2.
    Granted invention patent (in force)

    Publication number: US09165014B1

    Publication date: 2015-10-20

    Application number: US13536358

    Filing date: 2012-06-28

    IPC classification: G06F17/30

    CPC classification: G06F17/30215 G06F17/30244

    Abstract: Techniques are provided for storing files in a parallel computing system using different resolutions. A method is provided for storing at least one file generated by a distributed application in a parallel computing system. The file comprises one or more of a complete file and a sub-file. The method comprises the steps of obtaining semantic information related to the file; generating a plurality of replicas of the file with different resolutions based on the semantic information; and storing the file and the plurality of replicas of the file in one or more storage nodes of the parallel computing system. The different resolutions comprise, for example, a variable number of bits and/or a different subset of data elements from the file. A plurality of the sub-files can be merged to reproduce the file.

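    The two kinds of reduced resolution named in this abstract (fewer bits, or a subset of the data elements) can be sketched as follows. The shape of `semantic_info` and the function name `make_replicas` are assumptions made for illustration; the abstract does not say how the semantic information is encoded.

```python
def make_replicas(values, semantic_info):
    """Build lower-resolution replicas of a list of unsigned integers.

    semantic_info is a hypothetical dict with optional keys:
      "strides":    keep every k-th element (a subset of the data elements)
      "keep_bits":  keep only the top bits of each value (fewer bits)
      "total_bits": width of each value in bits (defaults to 16)
    Returns {replica label: list of values}.
    """
    replicas = {}
    for k in semantic_info.get("strides", ()):
        replicas[f"stride{k}"] = values[::k]
    total = semantic_info.get("total_bits", 16)
    for bits in semantic_info.get("keep_bits", ()):
        drop = total - bits
        # Zero the low-order bits so the value keeps its magnitude.
        replicas[f"bits{bits}"] = [(v >> drop) << drop for v in values]
    return replicas
```

    The full-resolution file would be stored alongside these replicas, so a reader could pick the cheapest copy that is accurate enough for its purpose.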

    Storing files in a parallel computing system based on user-specified parser function
    3.
    Granted invention patent (in force)

    Publication number: US08868576B1

    Publication date: 2014-10-21

    Application number: US13536369

    Filing date: 2012-06-28

    IPC classification: G06F17/30

    CPC classification: G06F17/3056 G06F17/30091

    Abstract: Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage, and by storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files, and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.

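    A rough sketch of the parser idea follows: the application hands over a function that decides whether a file is stored and what metadata is kept for later search. The parser's signature (returning a metadata dict, or `None` to reject a file) and all names below are assumptions for illustration only.

```python
def store_with_parser(files, parser, storage, metadata_index):
    """Run the user-supplied parser on each file before storing it.

    parser(name, data) returns a metadata dict to accept the file, or
    None to reject it (an assumed convention, not taken from the patent).
    """
    for name, data in files.items():
        meta = parser(name, data)
        if meta is None:
            continue                    # file fails the parser's requirements
        storage[name] = data
        metadata_index[name] = meta     # kept with the file for later search


def example_parser(name, data):
    """Hypothetical parser: accept non-empty files, record size and header."""
    if not data:
        return None
    return {"size": len(data), "header": data.split(b"\n", 1)[0].decode()}


def search(metadata_index, key, value):
    """Return the sorted names of stored files whose metadata[key] == value."""
    return sorted(n for n, m in metadata_index.items() if m.get(key) == value)
```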

    Methods and apparatus for capture and storage of semantic information with sub-files in a parallel computing system
    4.
    Granted invention patent (in force)

    Publication number: US08949255B1

    Publication date: 2015-02-03

    Application number: US13536384

    Filing date: 2012-06-28

    IPC classification: G06F17/30

    CPC classification: G06F17/301

    Abstract: Techniques are provided for storing files in a parallel computing system using sub-files with semantically meaningful boundaries. A method is provided for storing at least one file generated by a distributed application in a parallel computing system. The file comprises one or more of a complete file and a plurality of sub-files. The method comprises the steps of obtaining a user specification of semantic information related to the file; providing the semantic information as a data structure description to a data formatting library write function; and storing the semantic information related to the file with one or more of the sub-files in one or more storage nodes of the parallel computing system. The semantic information provides a description of data in the file. The sub-files can be replicated based on semantically meaningful boundaries.

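    One way to picture a write function that receives a data structure description is sketched below with Python's standard `struct` module. The description format (a list of field name and format code pairs) and the functions `write_subfile` and `read_subfile` are hypothetical; the abstract does not name a particular formatting library.

```python
import struct


def _layout(description):
    # Little-endian, unpadded layout built from the field format codes.
    return "<" + "".join(code for _, code in description)


def write_subfile(records, description):
    """Pack whole records and keep the description with the payload.

    description: list of (field name, struct format code),
                 e.g. [("x", "d"), ("id", "i")] (assumed shape).
    Because only whole records are packed, the sub-file always ends on a
    record boundary, which is a semantically meaningful boundary.
    """
    fmt = _layout(description)
    payload = b"".join(struct.pack(fmt, *rec) for rec in records)
    return {"description": description, "payload": payload}


def read_subfile(subfile):
    """Decode a sub-file using only the description stored with it."""
    fmt = _layout(subfile["description"])
    names = [name for name, _ in subfile["description"]]
    return [dict(zip(names, vals))
            for vals in struct.iter_unpack(fmt, subfile["payload"])]
```

    Storing the description next to the payload means a sub-file can be interpreted, or replicated record by record, without consulting the application that wrote it.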

    Storing files in a parallel computing system using list-based index to identify replica files
    5.
    Granted invention patent (in force)

    Publication number: US09087075B1

    Publication date: 2015-07-21

    Application number: US13536331

    Filing date: 2012-06-28

    IPC classification: G06F17/30

    CPC classification: G06F17/30212

    Abstract: Improved techniques are provided for storing files in a parallel computing system using a list-based index to identify file replicas. A file and at least one replica of the file are stored in one or more storage nodes of the parallel computing system. An index for the file comprises at least one list comprising a pointer to a storage location of the file and a storage location of the at least one replica of the file. The file comprises one or more of a complete file and one or more sub-files. The index may also comprise a checksum value for one or more of the file and the replica(s) of the file. The checksum value can be evaluated to validate the file and/or the file replica(s). A query can be processed using the list.

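    The list-based index with checksums can be sketched as a mapping from file name to a list of (location, checksum) pairs. SHA-256, the location strings, and the function names are assumptions for illustration; the abstract does not specify a checksum algorithm or index layout.

```python
import hashlib


def checksum(data):
    # SHA-256 is an assumed choice; the abstract names no algorithm.
    return hashlib.sha256(data).hexdigest()


def build_index(name, locations, data):
    """Index entry: one list holding the file's location and each replica's
    location, each paired with the checksum of the expected content."""
    digest = checksum(data)
    return {name: [(loc, digest) for loc in locations]}


def valid_locations(index, name, read):
    """Return the locations whose stored bytes still match the checksum.

    read(location) -> bytes is a caller-supplied accessor (hypothetical).
    """
    return [loc for loc, digest in index[name] if checksum(read(loc)) == digest]
```

    A query for a file walks its list and can skip any copy that fails validation, falling back to the next replica.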

    Small file aggregation in a parallel computing system
    6.
    Granted invention patent (in force)

    Publication number: US08825652B1

    Publication date: 2014-09-02

    Application number: US13536315

    Filing date: 2012-06-28

    IPC classification: G06F17/30

    CPC classification: G06F17/302

    Abstract: Techniques are provided for small file aggregation in a parallel computing system. An exemplary method for storing a plurality of files generated by a plurality of processes in a parallel computing system comprises aggregating the plurality of files into a single aggregated file, and generating metadata for the single aggregated file. The metadata comprises an offset and a length of each of the plurality of files in the single aggregated file. The metadata can be used to unpack one or more of the files from the single aggregated file.

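    The offset-and-length metadata described in this abstract can be illustrated with a minimal in-memory sketch. The function names `aggregate` and `unpack` are hypothetical, and a real system would write to a file system rather than build a bytes object.

```python
def aggregate(files):
    """Concatenate small files into one blob and record where each one sits.

    files: {filename: bytes}
    Returns (blob, metadata) where metadata maps filename -> (offset, length).
    """
    blob = bytearray()
    metadata = {}
    for name, data in files.items():
        metadata[name] = (len(blob), len(data))  # offset is the current end
        blob += data
    return bytes(blob), metadata


def unpack(blob, metadata, name):
    """Recover one file from the aggregated blob using its offset and length."""
    offset, length = metadata[name]
    return blob[offset:offset + length]
```

    Because each entry carries its own offset and length, a single file can be extracted without scanning or unpacking the others.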

    Cooperative storage of shared files in a parallel computing system with dynamic block size
    7.
    Granted invention patent (in force)

    Publication number: US09183211B1

    Publication date: 2015-11-10

    Application number: US13730080

    Filing date: 2012-12-28

    IPC classification: G06F17/30

    CPC classification: G06F17/30 G06F17/30091

    Abstract: Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log-structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).

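    The block-size rule given as an example in this abstract (total data divided by the number of processes) can be worked through in a small single-process simulation. This sketch only models the outcome of the exchange, not the message passing itself; rounding up when the total does not divide evenly, and the function name `rebalance`, are assumptions.

```python
def rebalance(buffers):
    """Simulate the result of the data exchange between parallel processes.

    buffers: list of bytes, one per process, in rank order (at least one).
    Returns (block_size, blocks), where blocks[i] is what process i writes.
    Every block has block_size bytes, except that trailing blocks may be
    shorter or empty when the total is not an exact multiple.
    """
    n = len(buffers)
    total = sum(len(b) for b in buffers)
    block = -(-total // n)           # ceiling division (assumed rounding)
    stream = b"".join(buffers)       # stands in for the inter-process exchange
    return block, [stream[i * block:(i + 1) * block] for i in range(n)]
```

    For example, three processes holding 6, 1, and 2 bytes give a total of 9 bytes and a block size of 3, so the first process hands off half of its data and each process writes one 3-byte block.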

    Request queues for interactive clients in a shared file system of a parallel computing system
    8.
    Granted invention patent (in force)

    Publication number: US09110695B1

    Publication date: 2015-08-18

    Application number: US13730112

    Filing date: 2012-12-28

    IPC classification: G06F9/455

    Abstract: Interactive requests are processed from users of log-in nodes. A metadata server node is provided for use in a file system shared by one or more interactive nodes and one or more batch nodes. The interactive nodes comprise interactive clients to execute interactive tasks, and the batch nodes execute batch jobs for one or more batch clients. The metadata server node comprises a virtual machine monitor; an interactive client proxy to store metadata requests from the interactive clients in an interactive client queue; a batch client proxy to store metadata requests from the batch clients in a batch client queue; and a metadata server to store the metadata requests from the interactive client queue and the batch client queue in a metadata queue based on an allocation of resources by the virtual machine monitor. The metadata requests can be prioritized, for example, based on one or more of a predefined policy and predefined rules.

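    The merging of the two client queues into one metadata queue can be sketched as a weighted round-robin. Modeling the virtual machine monitor's resource allocation as two integer shares is an assumption made for illustration, as are the function name and the default 3:1 ratio; the abstract leaves the allocation policy open.

```python
from collections import deque


def merge_queues(interactive, batch, interactive_share=3, batch_share=1):
    """Drain both client queues into one metadata queue, round by round.

    In each round, up to interactive_share requests are taken from the
    interactive queue, then up to batch_share from the batch queue. The
    shares are a hypothetical stand-in for the resource allocation.
    """
    if interactive_share < 1 or batch_share < 1:
        raise ValueError("shares must be positive")
    interactive, batch = deque(interactive), deque(batch)
    metadata_queue = []
    while interactive or batch:
        for queue, share in ((interactive, interactive_share), (batch, batch_share)):
            for _ in range(share):
                if not queue:
                    break
                metadata_queue.append(queue.popleft())
    return metadata_queue
```

    With these defaults, interactive requests are served ahead of most batch requests without starving the batch queue, since every round still admits at least one batch request when one is waiting.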