DETERMINING OPTIMAL DATA ACCESS FOR DEEP LEARNING APPLICATIONS ON A CLUSTER

    Publication No.: US20230014344A1

    Publication date: 2023-01-19

    Application No.: US17305735

    Filing date: 2021-07-14

    Abstract: A computer-implemented method, a computer program product, and a computer system for determining optimal data access for deep learning applications on a cluster. A server determines candidate cache locations for one or more compute nodes in the cluster. The server fetches a mini-batch of a dataset located at a remote storage service into the candidate cache locations. The server collects information about the time periods taken to complete a job on the one or more nodes, where the job is executed against the fetched mini-batch at the candidate cache locations and against the mini-batch at the remote storage location. The server selects a cache location from among the candidate cache locations and the remote storage location. The server fetches the data of the dataset from the remote storage service to the selected cache location, and the one or more nodes execute the job against the fetched data of the dataset at that cache location.
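The selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `select_data_location`, the callables `run_job` and `fetch_mini_batch`, and the rule "pick the location with the shortest measured job time" are all assumptions made for illustration, since the abstract does not state the selection criterion.

```python
import time
from typing import Callable, Dict, Sequence, Tuple


def select_data_location(
    run_job: Callable[[str], None],
    candidate_locations: Sequence[str],
    remote_location: str,
    fetch_mini_batch: Callable[[str], None] = lambda location: None,
    timer: Callable[[], float] = time.perf_counter,
) -> Tuple[str, Dict[str, float]]:
    """Time one mini-batch job per location and return the fastest one.

    All parameter names are hypothetical. `run_job(location)` is assumed to
    execute the job against the mini-batch stored at `location`, and
    `fetch_mini_batch(location)` is assumed to copy the mini-batch from the
    remote storage service into a candidate cache location.
    """
    timings: Dict[str, float] = {}

    # Probe every candidate cache location with a freshly fetched mini-batch.
    for location in candidate_locations:
        fetch_mini_batch(location)
        start = timer()
        run_job(location)
        timings[location] = timer() - start

    # Probe the remote storage location directly (no fetch needed).
    start = timer()
    run_job(remote_location)
    timings[remote_location] = timer() - start

    # Assumed criterion: the shortest completion time wins.
    best = min(timings, key=timings.get)
    return best, timings
```

In this sketch the caller would then fetch the full dataset to the returned location (unless it is the remote location itself) and run the real job there. The `timer` parameter exists only so that the measurement can be replaced by a simulated clock.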

    SELECTING AND RESIZING CURRENTLY EXECUTING JOB TO ACCOMMODATE EXECUTION OF ANOTHER JOB

    Publication No.: US20170220379A1

    Publication date: 2017-08-03

    Application No.: US15240399

    Filing date: 2016-08-18

    IPC classification: G06F9/48 G06F9/50

    Abstract: A job execution scheduling system and associated methods are provided for accommodating a request for additional computing resources to execute a job that is currently being executed or a request for computing resources to execute a new job. The job execution scheduling system may utilize a decision function to determine one or more currently executing jobs to select for resizing. Resizing a currently executing job may include de-allocating one or more computing resources from the currently executing job and allocating the de-allocated resources to the job for which the request was received. In this manner, the request for additional computing resources is accommodated, while at the same time, the one or more jobs from which computing resources were de-allocated continue to be executed using a reduced set of computing resources.
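The resizing idea in this abstract can be sketched in Python as follows. The abstract does not define the decision function, so the one used here (reclaim from the running jobs with the largest surplus above a per-job minimum) is purely an assumption, as are the names `Job`, `surplus`, and `resize_for_request`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    allocated: int    # computing resources currently held by the job
    minimum: int = 1  # hypothetical floor below which the job cannot shrink


def surplus(job: Job) -> int:
    """Assumed decision function: resources a job can give up and keep running."""
    return job.allocated - job.minimum


def resize_for_request(running: List[Job], requested: int) -> Dict[str, int]:
    """De-allocate `requested` resources from running jobs for another job.

    Returns a mapping of job name to the number of resources taken from it.
    Resized jobs stay in `running` with a reduced allocation.
    """
    if sum(surplus(job) for job in running) < requested:
        raise ValueError("not enough reclaimable resources")

    taken: Dict[str, int] = {}
    remaining = requested
    # Shrink the jobs with the largest surplus first.
    for job in sorted(running, key=surplus, reverse=True):
        if remaining <= 0:
            break
        give = min(surplus(job), remaining)
        if give > 0:
            job.allocated -= give
            taken[job.name] = give
            remaining -= give
    return taken
```

The resources reported in the returned mapping would then be allocated to the job that issued the request, while every resized job continues on its smaller allocation. A real scheduler would likely weigh priority, progress, or deadlines in the decision function; none of that is specified in the abstract.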

    SELECTING AND RESIZING CURRENTLY EXECUTING JOB TO ACCOMMODATE EXECUTION OF ANOTHER JOB

    Type: granted invention patent (in force)

    Publication No.: US09448842B1

    Publication date: 2016-09-20

    Application No.: US15010079

    Filing date: 2016-01-29

    IPC classification: G06F9/455 G06F9/46 G06F9/50

    Abstract: A job execution scheduling system and associated methods are provided for accommodating a request for additional computing resources to execute a job that is currently being executed or a request for computing resources to execute a new job. The job execution scheduling system may utilize a decision function to determine one or more currently executing jobs to select for resizing. Resizing a currently executing job may include de-allocating one or more computing resources from the currently executing job and allocating the de-allocated resources to the job for which the request was received. In this manner, the request for additional computing resources is accommodated, while at the same time, the one or more jobs from which computing resources were de-allocated continue to be executed using a reduced set of computing resources.
