DETERMINING MEMORY REQUIREMENTS FOR LARGE-SCALE ML APPLICATIONS TO FACILITATE EXECUTION IN GPU-EMBEDDED CLOUD CONTAINERS
Abstract:
We disclose a system that executes an inferential model in video RAM (VRAM) embedded in a set of graphics processing units (GPUs). The system obtains execution parameters for the inferential model specifying a number of signals, a number of training vectors, a number of observations, and a desired data precision. It also obtains one or more formulae for computing memory usage for the inferential model based on these execution parameters. Next, the system uses the formulae and the execution parameters to compute an estimated memory footprint for the inferential model. It then uses the estimated memory footprint to determine the number of GPUs required to execute the inferential model and generates code that executes the model in parallel while efficiently using the available memory across those GPUs. Finally, the system uses the generated code to execute the inferential model on the set of GPUs.
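The following is a minimal sketch of the estimation and sizing steps described in the abstract. The memory formula, parameter names, byte sizes, and per-GPU VRAM figure are illustrative assumptions for a hypothetical model, not taken from the patent; in practice, the formulae would be supplied per inferential model.

```python
import math

# Assumed byte widths for the supported data precisions (illustrative).
BYTES_PER_PRECISION = {"float16": 2, "float32": 4, "float64": 8}


def estimated_memory_footprint(num_signals: int,
                               num_training_vectors: int,
                               num_observations: int,
                               precision: str) -> int:
    """Return an estimated memory footprint in bytes.

    Assumed formula: the model stores a training matrix
    (signals x training vectors) plus an observation matrix
    (signals x observations) at the requested precision.
    """
    bytes_per_value = BYTES_PER_PRECISION[precision]
    training_bytes = num_signals * num_training_vectors * bytes_per_value
    observation_bytes = num_signals * num_observations * bytes_per_value
    return training_bytes + observation_bytes


def required_gpu_count(footprint_bytes: int, vram_per_gpu_bytes: int) -> int:
    """Smallest number of GPUs whose combined VRAM covers the footprint."""
    return math.ceil(footprint_bytes / vram_per_gpu_bytes)


if __name__ == "__main__":
    # Hypothetical execution parameters and a 16 GiB-per-GPU assumption.
    footprint = estimated_memory_footprint(
        num_signals=1_000,
        num_training_vectors=50_000,
        num_observations=200_000,
        precision="float32",
    )
    gpus = required_gpu_count(footprint, vram_per_gpu_bytes=16 * 2**30)
    print(f"Estimated footprint: {footprint / 2**30:.2f} GiB -> {gpus} GPU(s)")
```

The resulting GPU count would then drive the code-generation step, which partitions the model's matrices across the selected GPUs so each device's VRAM is used efficiently.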