SYSTEMS AND METHODS OF RESOURCE CONFIGURATION OPTIMIZATION FOR MACHINE LEARNING WORKLOADS

    Publication No.: US20210357256A1

    Publication Date: 2021-11-18

    Application No.: US16874479

    Filing Date: 2020-05-14

    Abstract: Systems and methods are provided for optimally allocating resources used to perform multiple tasks/jobs, e.g., machine learning training jobs. The possible resource configurations, or candidates, that can be used to perform such jobs are generated. A first batch of training jobs can be run using randomly selected resource configuration candidates. Subsequent batches of training jobs may be performed using other resource configuration candidates that have been selected using an optimization process, e.g., Bayesian optimization. Upon reaching a stopping criterion, the resource configuration that yields a desired optimization metric, e.g., the fastest job completion time, can be selected and used to execute the remaining training jobs.
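The batched search loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: the candidate list, the fake cost model in `run_training_job`, and the greedy exploration of remaining candidates (standing in for true Bayesian optimization) are all assumptions made for the example.

```python
import random

# Hypothetical candidate resource configurations: (num_workers, gb_memory).
CANDIDATES = [(1, 4), (2, 8), (4, 8), (4, 16), (8, 16), (8, 32)]

def run_training_job(config):
    """Stand-in for executing one training job; returns completion time (s).
    A fake cost model: more workers/memory -> faster, plus some noise."""
    workers, mem = config
    return 100.0 / (workers * mem ** 0.5) + random.uniform(0.0, 0.5)

def select_configuration(candidates, first_batch=2, max_trials=5, seed=42):
    """Run a randomly chosen first batch, then explore further candidates
    until the stopping criterion (here, a trial budget) is reached, and
    return the configuration with the fastest observed completion time."""
    random.seed(seed)
    observed = {}
    # First batch of jobs uses randomly selected candidates.
    for cfg in random.sample(candidates, first_batch):
        observed[cfg] = run_training_job(cfg)
    # Subsequent batches: a real system would pick the next candidate via
    # Bayesian optimization; trying the rest in order is a placeholder.
    for cfg in candidates:
        if len(observed) >= max_trials:
            break  # stopping criterion reached
        if cfg not in observed:
            observed[cfg] = run_training_job(cfg)
    return min(observed, key=observed.get)

best = select_configuration(CANDIDATES)
```

In the real system the "next candidate" choice is where the optimization process earns its keep; the rest of the loop structure (random seeding batch, budget-based stop, argmin over observed metrics) is the part sketched faithfully here.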

    FINE-GRAINED AND COARSE-GRAINED CONGESTION WINDOW SYNCHRONIZATION FOR TCP

    Publication No.: US20240259324A1

    Publication Date: 2024-08-01

    Application No.: US18161519

    Filing Date: 2023-01-30

    IPC Classification: H04L47/27 H04L47/193

    Abstract: Systems and methods are provided for improved TCP congestion control designed to address "mixed coarse-grained/fine-grained signal" scenarios. A TCP sender of the present technology achieves this improvement by leveraging two TCP congestion windows for a TCP connection: (1) a "fine-grained TCP signal-dependent congestion window," which is adjusted in response to "fine-grained" TCP congestion signals (as intelligently classified/defined by the present technology); and (2) a "coarse-grained TCP signal-dependent congestion window," which is adjusted in response to "coarse-grained" TCP congestion signals (as intelligently classified/defined by the present technology). With these two novel/unique congestion windows at its disposal, the TCP sender can then dynamically (and intelligently) select an appropriate congestion window for dictating packet transmission for a TCP connection (e.g., the contemporaneously smaller congestion window). The TCP sender can also dynamically (and intelligently) synchronize the two congestion windows in order to ensure smoother transitions between utilized congestion windows.
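The two-window scheme can be illustrated with a small sketch. This is not the patented implementation: the specific decrease factors, the additive increase, and the averaging-based `synchronize` policy are assumptions chosen only to show the structure (two windows, the smaller one dictating transmission, periodic synchronization).

```python
class DualWindowSender:
    """Illustrative dual-congestion-window sender (parameters assumed)."""

    def __init__(self, init_cwnd=10):
        self.fine_cwnd = init_cwnd    # reacts to fine-grained signals (e.g., ECN marks)
        self.coarse_cwnd = init_cwnd  # reacts to coarse-grained signals (e.g., packet loss)

    def on_fine_grained_signal(self):
        # Gentler multiplicative decrease for frequent, fine-grained feedback.
        self.fine_cwnd = max(2, int(self.fine_cwnd * 0.8))

    def on_coarse_grained_signal(self):
        # Conventional halving for coarse-grained signals such as loss.
        self.coarse_cwnd = max(2, self.coarse_cwnd // 2)

    def on_ack(self):
        # Additive increase applied to both windows.
        self.fine_cwnd += 1
        self.coarse_cwnd += 1

    def effective_cwnd(self):
        # Transmission is dictated by the contemporaneously smaller window.
        return min(self.fine_cwnd, self.coarse_cwnd)

    def synchronize(self):
        # One possible policy: pull both windows toward the smaller one
        # so that switching between them is smooth.
        target = self.effective_cwnd()
        self.fine_cwnd = (self.fine_cwnd + target) // 2
        self.coarse_cwnd = (self.coarse_cwnd + target) // 2
```

The key structural point is that signal handlers update only their own window, while `effective_cwnd` performs the dynamic selection at transmission time.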

    BANDWIDTH ESTIMATE FILTERING BASED ON PACKET LOSS PATTERNS

    Publication No.: US20230344733A1

    Publication Date: 2023-10-26

    Application No.: US17728859

    Filing Date: 2022-04-25

    Abstract: Systems and methods are provided for effectuating a filtering technique that can enable available bandwidth (e.g., on a network path) to be estimated in the presence of moderate losses caused by certain queue management techniques. When packet losses exist on a network path due to certain types of packet queue transmission mechanisms, methods, or models, a bump detection algorithm (BDA) can be used to perform bandwidth estimation. When a pattern of packet loss is identified (from its signature) as one with which the BDA can accurately estimate available bandwidth on a network path, the BDA-based bandwidth estimate may be used to place/route and load-balance network traffic, to engage in other network traffic engineering or network-related action(s), or to be reported out. Otherwise, the bandwidth estimate is suppressed and not used.
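The gating logic, estimate first classified by loss signature and then either used or suppressed, can be sketched as below. The toy classifier (a single contiguous loss run counts as a usable "bump" signature, scattered losses do not) is an assumption for illustration; the actual signature matching in the patent is more sophisticated.

```python
def classify_loss_pattern(loss_indicators):
    """Toy signature check: count contiguous runs of lost packets.
    One run -> the kind of pattern the filter accepts; otherwise reject."""
    runs = 0
    in_run = False
    for lost in loss_indicators:
        if lost and not in_run:
            runs += 1
            in_run = True
        elif not lost:
            in_run = False
    return "bump" if runs == 1 else "other"

def filtered_bandwidth_estimate(raw_estimate_mbps, loss_indicators):
    """Report the estimate only when the loss pattern matches the accepted
    signature; otherwise suppress it (None) rather than act on it."""
    if classify_loss_pattern(loss_indicators) == "bump":
        return raw_estimate_mbps
    return None
```

The point of the filter is the `None` branch: a plausible-looking number derived under the wrong loss regime is worse than no number, so the estimate is dropped rather than fed to traffic placement.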

    TOKEN BUCKET WITH ACTIVE QUEUE MANAGEMENT

    Publication No.: US20230198910A1

    Publication Date: 2023-06-22

    Application No.: US17554935

    Filing Date: 2021-12-17

    Abstract: Systems and methods are provided for a new type of quality of service (QoS) primitive at a network device that has better performance than traditional QoS primitives. The QoS primitive may comprise a token bucket with active queue management (TBAQM). Particularly, the TBAQM may: receive a data packet that is processed by the token bucket; adjust the tokens associated with the token bucket, where tokens are added based on a configured rate and subtracted in association with processing the data packet; and determine the number of tokens associated with the token bucket, such that when the token bucket has zero tokens, a first action is initiated on the data packet, and when the token bucket has more than zero tokens, a marking probability is determined based on the number of tokens and a second action is initiated based on the marking probability.
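The per-packet flow above can be illustrated with a short sketch. The class name, the linear marking curve (probability rises as the bucket drains), and the concrete actions ("drop" when empty, "mark" vs. "forward" otherwise) are assumptions for the example, not details taken from the patent claims.

```python
import random

class TBAQM:
    """Sketch of a token bucket with AQM-style probabilistic marking."""

    def __init__(self, rate_tokens_per_sec, capacity):
        self.rate = rate_tokens_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last_time = 0.0

    def on_packet(self, now, cost=1):
        # Add tokens at the configured rate, capped at bucket capacity.
        elapsed = now - self.last_time
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_time = now
        if self.tokens <= 0:
            return "drop"  # first action: bucket is empty
        # Second action: mark (e.g., ECN) with a probability that grows
        # as the bucket drains -- the active-queue-management behavior.
        mark_prob = 1.0 - self.tokens / self.capacity
        self.tokens -= cost  # subtract tokens for processing this packet
        return "mark" if random.random() < mark_prob else "forward"
```

With a full bucket the marking probability is zero, so the primitive behaves like a plain token bucket under light load and only begins signaling congestion as tokens deplete.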

    SYSTEMS AND METHODS OF RESOURCE CONFIGURATION OPTIMIZATION FOR MACHINE LEARNING WORKLOADS

    Publication No.: US20220292303A1

    Publication Date: 2022-09-15

    Application No.: US17199294

    Filing Date: 2021-03-11

    Abstract: Systems and methods can be configured to determine a plurality of computing resource configurations used to perform machine learning model training jobs. A computing resource configuration can comprise: a first tuple including the numbers of worker nodes and parameter server nodes, and a second tuple including resource allocations for the worker nodes and parameter server nodes. At least one machine learning training job can be executed using a first computing resource configuration having a first set of values associated with the first tuple. While executing the machine learning training job, the resource usage of the worker nodes and parameter server nodes caused by a second set of values associated with the second tuple can be monitored, and it can be determined whether to adjust the second set of values. Whether a stopping criterion is satisfied can be determined, and one of the plurality of computing resource configurations can be selected.
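The two-tuple configuration and the in-flight adjustment of the second tuple can be sketched as follows. The field names, the usage threshold, and the scale factor are all illustrative assumptions; only the structure (node counts held fixed, per-node allocations adjusted from monitored usage) comes from the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComputeConfig:
    """First tuple: node counts; second tuple: per-node allocations."""
    node_counts: tuple   # (num_workers, num_parameter_servers)
    allocations: tuple   # ((worker_cpu, worker_gb), (ps_cpu, ps_gb))

def adjust_allocations(cfg, usage_fraction, threshold=0.9, step=1.2):
    """If monitored usage exceeds the threshold, scale the second tuple
    up while leaving the first tuple (node counts) unchanged."""
    if usage_fraction <= threshold:
        return cfg  # usage acceptable; no adjustment needed
    scaled = tuple(tuple(round(v * step, 2) for v in pair)
                   for pair in cfg.allocations)
    return ComputeConfig(cfg.node_counts, scaled)
```

Separating the tuples this way matters because changing node counts means relaunching the job, whereas per-node allocations can often be adjusted while the training job is running.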

    DEEP LEARNING AUTOTUNING TASK OPTIMIZATION

    Publication No.: US20220129315A1

    Publication Date: 2022-04-28

    Application No.: US17077962

    Filing Date: 2020-10-22

    IPC Classification: G06F9/50 G06N20/00

    Abstract: Systems and methods are provided for improving autotuning procedures. For example, the system can implement a task launcher, a scheduler, and an agent to launch, schedule, and execute decomposed autotuning stages, respectively. The scheduling policy implemented by the scheduler may go beyond a simple scheduling policy (e.g., a FIFO-based policy), which produces high queuing delay. By leveraging autotuning-specific domain knowledge, the system may reduce the queuing delay and improve the resource utilization otherwise found in traditional systems.

    CONTEXT-AWARE AND STATELESS DEEP LEARNING AUTOTUNING FRAMEWORK

    Publication No.: US20220198317A1

    Publication Date: 2022-06-23

    Application No.: US17125626

    Filing Date: 2020-12-17

    IPC Classification: G06N20/00 G06F9/48

    Abstract: Systems and methods are provided for improving autotuning procedures using stateless processing with a remote key-value store. For example, the system can implement a task launcher, a scheduler, and an agent to launch, schedule, and execute decomposed autotuning stages, respectively. The scheduling policy implemented by the scheduler may go beyond a simple scheduling policy (e.g., a FIFO-based policy), which produces high queuing delay. Compared to traditional systems, leveraging autotuning-specific domain knowledge reduces queuing delay and improves resource utilization.
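The stateless-stage idea can be illustrated with a small sketch in which a plain dictionary stands in for the remote key-value store. The stage names, key layout, and toy timing model are all assumptions for the example; the structural point is that each decomposed stage reads and writes all of its context through the store, so any agent can execute any stage.

```python
# A dict stands in for the remote key-value store; in a real deployment
# this would be a networked store shared by all agents.
kv_store = {}

def stage_build(job_id, candidates):
    """Stage 1: record the candidate set for a tuning job in the store."""
    kv_store[f"{job_id}/candidates"] = list(candidates)

def stage_measure(job_id):
    """Stage 2: read candidates from the store, record toy timings.
    No local state is carried over from stage_build."""
    cands = kv_store[f"{job_id}/candidates"]
    kv_store[f"{job_id}/timings"] = {c: 100.0 / c for c in cands}

def stage_select(job_id):
    """Stage 3: pick the fastest candidate from the stored timings."""
    timings = kv_store[f"{job_id}/timings"]
    best = min(timings, key=timings.get)
    kv_store[f"{job_id}/best"] = best
    return best
```

Because every stage is a pure function of the store's contents, a scheduler is free to place the stages on different agents or reorder independent jobs, which is what enables the non-FIFO, domain-aware scheduling the abstract describes.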