-
Publication No.: US12093101B2
Publication Date: 2024-09-17
Application No.: US17682917
Application Date: 2022-02-28
Applicant: Meta Platforms Technologies, LLC
Inventor: Vlad Fruchter, Nishant Sitapara, Javid Jaffari, Shrirang Madhav Yardi, Bardia Zandian
CPC classification number: G06F1/28, G02B27/017, G05F1/66
Abstract: Systems and methods for peak power control include control circuitry which identifies a condition for a device. The control circuitry can apply the condition for the device to one or more models maintained for a plurality of device processing units of the device to determine one or more performance characteristics for the plurality of processing units. The control circuitry can distribute power credits to the plurality of device processing units of the device according to the determined performance characteristics for the plurality of device processing units, to manage a respective peak power for each respective device processing unit according to a number of the power credits distributed to the respective device processing unit.
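The abstract says credits are distributed "according to the determined performance characteristics", but it does not specify a policy. The Python sketch below is one illustration under stated assumptions. It assumes each unit's performance characteristic has already been reduced to a single non-negative weight, for example a predicted demand from its per-unit model. It then splits an integer credit budget in proportion to those weights, using largest-remainder rounding. The function name `distribute_power_credits` and the proportional policy are hypothetical, not the claimed method.

```python
from math import floor


def distribute_power_credits(total_credits: int, weights: dict[str, float]) -> dict[str, int]:
    """Split an integer credit budget across processing units in proportion to weights.

    Hypothetical policy: proportional allocation with largest-remainder rounding,
    so the credits handed out always sum exactly to total_credits.
    """
    if total_credits < 0:
        raise ValueError("total_credits must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one unit needs a positive weight")

    # Ideal fractional share of the budget for each unit.
    shares = {unit: total_credits * w / total_weight for unit, w in weights.items()}
    # Give every unit the integer part of its share first.
    credits = {unit: floor(s) for unit, s in shares.items()}
    leftover = total_credits - sum(credits.values())
    # Hand the remaining credits to the units with the largest fractional remainders.
    by_remainder = sorted(shares, key=lambda u: shares[u] - credits[u], reverse=True)
    for unit in by_remainder[:leftover]:
        credits[unit] += 1
    return credits
```

Suppose the budget is 10 credits and the weights are `{"cpu": 3.0, "gpu": 5.0, "dsp": 2.0}`. The sketch then gives 3, 5 and 2 credits respectively. The abstract does not describe how a unit's credit count becomes an actual peak-power limit, so that step is left out here.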
-
Publication No.: US20240220255A1
Publication Date: 2024-07-04
Application No.: US18525172
Application Date: 2023-11-30
Applicant: Meta Platforms Technologies, LLC
Inventor: Reza Tusi, Tomonari Tohara, David Vakrat, Javid Jaffari, Yuan Liu
IPC: G06F9/30
CPC classification number: G06F9/3013, G06F9/30036
Abstract: In one embodiment, a computing system may set data to a first group of registers. The first group of registers may be configured to be accessed during a single operation cycle. The system may set a number of patterns to a second group of registers. Each pattern of the number of patterns may include an array of indices for the data stored in the first group of registers. The system may select, for a first vector register associated with a vector engine, a first pattern from the patterns stored in the second group of registers. The system may load a first portion of the data from the first group of registers to the first vector register based on the first pattern selected for the first vector register from the patterns stored in the second group of registers.
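The pattern-driven load can be pictured as an index-based gather. The minimal Python sketch below assumes three things. First, the first register group is modeled as a flat list that is fully readable in one cycle. Second, each stored pattern is a list of indices into that list. Third, lane *i* of the vector register receives the element at `pattern[i]`. The names `load_with_pattern`, `DATA` and `PATTERNS` and the register contents are all hypothetical.

```python
from typing import Sequence


def load_with_pattern(
    data_regs: Sequence[int],
    patterns: Sequence[Sequence[int]],
    pattern_id: int,
) -> list[int]:
    """Fill one vector register by gathering from the first register group.

    data_regs:  flattened contents of the first register group.
    patterns:   contents of the second register group; each entry is an index array.
    pattern_id: which stored pattern is selected for this vector register.
    """
    pattern = patterns[pattern_id]
    for idx in pattern:
        if not 0 <= idx < len(data_regs):
            raise IndexError(f"pattern index {idx} is outside the data registers")
    # Lane i of the vector register receives data_regs[pattern[i]].
    return [data_regs[idx] for idx in pattern]


# Hypothetical contents: 8 data elements and 4 stored patterns of width 4.
DATA = [10, 11, 12, 13, 14, 15, 16, 17]
PATTERNS = [
    [0, 1, 2, 3],  # low half, in order
    [4, 5, 6, 7],  # high half, in order
    [0, 2, 4, 6],  # even elements
    [3, 2, 1, 0],  # low half, reversed
]

# Different vector registers can select different patterns over the same data.
VECTOR_REGS = {
    "v0": load_with_pattern(DATA, PATTERNS, 2),
    "v1": load_with_pattern(DATA, PATTERNS, 3),
}
```

Storing the patterns in registers, rather than hard-coding permutations, lets the same data be rearranged in several ways without reloading it from memory.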
-
Publication No.: US20250138822A1
Publication Date: 2025-05-01
Application No.: US18666540
Application Date: 2024-05-16
Applicant: Meta Platforms Technologies, LLC
Inventor: Reza Tusi, Shiyu Liu, Anthony Mai, Vlad Fruchter, Javid Jaffari
IPC: G06F9/30
Abstract: A computer-implemented method may include dividing, by a computer processor, binary code into a plurality of chunks, wherein the binary code includes a plurality of instructions. The method may additionally include clustering, by the computer processor, similar chunks of the plurality of chunks. The method may also include performing, by the computer processor, compression of the binary code, the compression being tailored to one or more clusters of the similar chunks. Various other methods, systems, and computer-readable media are also disclosed.
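The abstract does not say how chunks are formed, how similarity is measured or which compressor is used. The Python sketch below therefore fills each of those in with a stand-in of its own:

- **Chunking:** the code is cut into fixed-size chunks.
- **Similarity:** byte-position matching is used as the similarity measure.
- **Clustering:** chunks are clustered greedily against a threshold.
- **Compression:** zlib's preset-dictionary feature (`zdict`) is used, with each cluster's first chunk as the dictionary. This is how the compression is "tailored" to a cluster in this sketch.

All function names are hypothetical.

```python
import zlib


def split_chunks(code: bytes, chunk_size: int) -> list[bytes]:
    """Divide a binary blob into fixed-size chunks (the last one may be shorter)."""
    return [code[i:i + chunk_size] for i in range(0, len(code), chunk_size)]


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of byte positions that match, measured over the longer chunk."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / longest


def cluster_chunks(chunks: list[bytes], threshold: float) -> list[list[int]]:
    """Greedy clustering: a chunk joins the first cluster whose representative is similar enough."""
    clusters: list[list[int]] = []
    for i, chunk in enumerate(chunks):
        for members in clusters:
            if similarity(chunks[members[0]], chunk) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no match: this chunk starts (and represents) a new cluster
    return clusters


def compress_clusters(chunks: list[bytes], clusters: list[list[int]]):
    """Compress every chunk with a zlib preset dictionary taken from its cluster's representative."""
    dicts = [chunks[members[0]] for members in clusters]
    encoded: list[tuple[int, bytes]] = [(0, b"")] * len(chunks)
    for cid, members in enumerate(clusters):
        for i in members:
            comp = zlib.compressobj(level=9, zdict=dicts[cid])
            encoded[i] = (cid, comp.compress(chunks[i]) + comp.flush())
    return dicts, encoded


def decompress_chunks(dicts: list[bytes], encoded: list[tuple[int, bytes]]) -> list[bytes]:
    """Inverse of compress_clusters: decompress each chunk with its cluster's dictionary."""
    out = []
    for cid, payload in encoded:
        d = zlib.decompressobj(zdict=dicts[cid])
        out.append(d.decompress(payload) + d.flush())
    return out
```

Chunks that closely resemble their cluster's dictionary need few literal bytes in the compressed stream, which is the intuition behind clustering before compressing. Real instruction streams would likely call for a smarter, instruction-aware chunking and similarity measure.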
-
Publication No.: US20240220259A1
Publication Date: 2024-07-04
Application No.: US18525083
Application Date: 2023-11-30
Applicant: Meta Platforms Technologies, LLC
Inventor: Tomonari Tohara, Vignesh Vivekraja, Alagappan Valliappan, Andrey Bushev, Javid Jaffari
IPC: G06F9/30
CPC classification number: G06F9/30178, G06F9/30038, G06F9/30134
Abstract: In one embodiment, a computing system may set data to a first group of registers. The first group of registers may be configured to be accessed during a single operation cycle. The system may set a number of patterns to a second group of registers. Each pattern of the number of patterns may include an array of indices for the data stored in the first group of registers. The system may select, for a first vector register associated with a vector engine, a first pattern from the patterns stored in the second group of registers. The system may load a first portion of the data from the first group of registers to the first vector register based on the first pattern selected for the first vector register from the patterns stored in the second group of registers.
-
Publication No.: US20240220281A1
Publication Date: 2024-07-04
Application No.: US18525443
Application Date: 2023-11-30
Applicant: Meta Platforms Technologies, LLC
Inventor: Vignesh Vivekraja, Tomonari Tohara, Reza Tusi, Abuduwaili Tuoheti, Weiping Liu, Javid Jaffari
IPC: G06F9/445, G06N3/0464
CPC classification number: G06F9/44505, G06N3/0464
Abstract: In one embodiment, a method includes accessing a computational graph representing computations to be executed on a computing system comprising a plurality of Execution Units (EUs), identifying a set of candidate mapped-graphs for the computational graph, where each node in a candidate mapped-graph is mapped to an EU capable of calculating the node, ensuring that each edge from a first node to a second node in each candidate mapped-graph satisfies memory constraints, determining an expected cost for executing each candidate mapped-graph using mapped-EUs in the candidate mapped-graph for calculating respective nodes, and selecting a candidate mapped-graph with a least expected cost from the set of candidate mapped-graphs.
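The selection procedure can be sketched in Python as a brute-force search. Each combination of capable EUs per node is one candidate mapped-graph. A candidate is dropped if any edge violates a memory constraint, and the cheapest surviving candidate is kept. Exhaustive enumeration is a simplifying assumption that only suits small graphs; the abstract does not say how candidates are generated. The graph, EU names, memory rule and cost model below are likewise invented for illustration.

```python
from itertools import product
from typing import Callable, Optional

Mapping = dict[str, str]  # node name -> execution unit (EU) name


def select_mapped_graph(
    nodes: list[str],
    edges: list[tuple[str, str]],
    capable: dict[str, list[str]],
    edge_ok: Callable[[str, str], bool],
    cost: Callable[[Mapping], float],
) -> tuple[Optional[Mapping], float]:
    """Return the feasible node->EU mapping with the least expected cost (None if none is feasible)."""
    best: Optional[Mapping] = None
    best_cost = float("inf")
    # Every combination of capable EUs is one candidate mapped-graph.
    for choice in product(*(capable[n] for n in nodes)):
        mapping = dict(zip(nodes, choice))
        # Discard candidates where any producer->consumer edge breaks the memory constraint.
        if not all(edge_ok(mapping[src], mapping[dst]) for src, dst in edges):
            continue
        c = cost(mapping)
        if c < best_cost:
            best, best_cost = mapping, c
    return best, best_cost


# Hypothetical example: a three-node graph and three EUs.
NODES = ["conv", "relu", "fc"]
EDGES = [("conv", "relu"), ("relu", "fc")]
CAPABLE = {"conv": ["mac", "cpu"], "relu": ["vector", "cpu"], "fc": ["mac", "cpu"]}
EXEC_COST = {
    ("conv", "mac"): 2, ("conv", "cpu"): 10,
    ("relu", "vector"): 1, ("relu", "cpu"): 3,
    ("fc", "mac"): 2, ("fc", "cpu"): 6,
}


def edge_ok(src_eu: str, dst_eu: str) -> bool:
    # Assumed memory constraint: the CPU cannot read the MAC array's local buffer directly.
    return not (src_eu == "mac" and dst_eu == "cpu")


def total_cost(mapping: Mapping) -> float:
    # Assumed cost model: per-node execution cost plus 1 for every edge that crosses EUs.
    run = sum(EXEC_COST[(n, eu)] for n, eu in mapping.items())
    moves = sum(1 for s, d in EDGES if mapping[s] != mapping[d])
    return run + moves
```

With these made-up numbers, `select_mapped_graph(NODES, EDGES, CAPABLE, edge_ok, total_cost)` picks `conv` and `fc` on the MAC unit and `relu` on the vector unit. A practical system would probably prune the search rather than enumerate every combination.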
-
Publication No.: US20240220256A1
Publication Date: 2024-07-04
Application No.: US18525217
Application Date: 2023-11-30
Applicant: Meta Platforms Technologies, LLC
Inventor: Reza Tusi, Tomonari Tohara, Vignesh Vivekraja, Javid Jaffari
CPC classification number: G06F9/3013, G06F9/3887
Abstract: In one embodiment, a computing system may load data from a memory unit into a number of registers according to a first order by which the data is arranged. The registers may be configured to be accessed during a single operation cycle. The system may determine a second order for the data based on one or more subsequent operations to process the data. The system may read the data from the registers according to the second order during one or more operation cycles. The data read from the registers may be arranged in the second order. The system may transmit the data arranged in the second order to an execution unit configured to execute the one or more subsequent operations to process the data arranged in the second order.
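Here is a small Python illustration of "load in one order, read in another". Data arrives from memory in row-major order and fills the registers in that order. The next operation is assumed to need the data column by column, as a transpose would, so the second order walks the registers column-wise. The read emits a fixed number of elements per cycle. The transpose, the `per_cycle` width and all function names are hypothetical choices.

```python
def load_registers(memory: list[int], num_regs: int, lanes: int) -> list[list[int]]:
    """Fill registers in the memory's (first) order: register r holds memory[r*lanes:(r+1)*lanes]."""
    if len(memory) != num_regs * lanes:
        raise ValueError("memory size must equal num_regs * lanes")
    return [memory[r * lanes:(r + 1) * lanes] for r in range(num_regs)]


def transpose_order(num_regs: int, lanes: int) -> list[tuple[int, int]]:
    """A second order suited to a hypothetical transpose: walk lane by lane across registers."""
    return [(r, c) for c in range(lanes) for r in range(num_regs)]


def read_in_order(
    regs: list[list[int]], order: list[tuple[int, int]], per_cycle: int
) -> list[list[int]]:
    """Read (register, lane) positions in the requested order, per_cycle elements per cycle.

    Each inner list is what would be sent to the execution unit in one cycle.
    """
    flat = [regs[r][c] for r, c in order]
    return [flat[i:i + per_cycle] for i in range(0, len(flat), per_cycle)]
```

For example, `load_registers(list(range(6)), 2, 3)` gives `[[0, 1, 2], [3, 4, 5]]`. Reading that in transpose order with two elements per cycle produces `[0, 3]`, `[1, 4]` and `[2, 5]`: one column per cycle, with no extra pass over memory.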
-
Publication No.: US20240220574A1
Publication Date: 2024-07-04
Application No.: US18525466
Application Date: 2023-11-30
Applicant: Meta Platforms Technologies, LLC
Inventor: Shiyu Liu, Soroush Heidari, Tomonari Tohara, Reza Tusi, Javid Jaffari
IPC: G06F17/16
CPC classification number: G06F17/16
Abstract: A method implemented by a digital signal processor (DSP) including application-specific processing engines is provided. The method includes accessing, by the application-specific processing engines, a configurable microcode. The configurable microcode includes a set of instructions configured to cause the application-specific processing engines to execute a matrix-based arithmetic algorithm. The method includes executing, by the application-specific processing engines, and based on the configurable microcode, the matrix-based arithmetic algorithm. Executing the matrix-based arithmetic algorithm includes receiving, by the application-specific processing engines, one or more input matrices, performing, by the application-specific processing engines, a plurality of computations based on the one or more input matrices by iteratively executing one or more of a predetermined set of arithmetic operations until the execution of the matrix-based arithmetic algorithm is completed, and generating, by the application-specific processing engines, an output corresponding to the completed execution of the matrix-based arithmetic algorithm.
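A toy Python interpreter can illustrate the idea of configurable microcode drawn from a predetermined operation set. Several parts of this sketch are assumptions:

- **Loop structure:** the engine's loop nest is fixed to a matrix-product shape (i, j, k).
- **Microcode:** the microcode only supplies an accumulator seed and a short instruction list over named scratch registers.
- **Operation set:** the "predetermined set" is just add, mul, min and max.

Swapping the microcode turns an ordinary matrix multiply into a min-plus product without changing the engine. The register names, the microcode format and `run_microcode` are hypothetical.

```python
import operator

# Predetermined set of arithmetic operations the hypothetical engine supports.
OPS = {"add": operator.add, "mul": operator.mul, "min": min, "max": max}


def run_microcode(microcode: dict, a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Execute a matrix-shaped algorithm whose inner step is defined by microcode.

    microcode = {"init": accumulator seed, "body": [(op, dst, src1, src2), ...]}
    Registers "a" and "b" hold the current operands; "acc" holds the running result.
    """
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions differ")
    m = len(b[0])
    out = [[microcode["init"]] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            regs = {"acc": microcode["init"]}
            # Iterate the microcode body once per inner-dimension step until done.
            for p in range(k):
                regs["a"], regs["b"] = a[i][p], b[p][j]
                for op, dst, x, y in microcode["body"]:
                    regs[dst] = OPS[op](regs[x], regs[y])
            out[i][j] = regs["acc"]
    return out


# Standard matrix multiply: acc += a * b.
MATMUL = {"init": 0, "body": [("mul", "t", "a", "b"), ("add", "acc", "acc", "t")]}
# Min-plus (tropical) product, e.g. for shortest paths: acc = min(acc, a + b).
MIN_PLUS = {"init": float("inf"), "body": [("add", "t", "a", "b"), ("min", "acc", "acc", "t")]}
```

`run_microcode(MATMUL, [[1, 2], [3, 4]], [[5, 6], [7, 8]])` yields the usual product `[[19, 22], [43, 50]]`.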
-
Publication No.: US20240220273A1
Publication Date: 2024-07-04
Application No.: US18527004
Application Date: 2023-12-01
Applicant: Meta Platforms Technologies, LLC
Inventor: Vignesh Vivekraja, Tomonari Tohara, Reza Tusi, Abuduwaili Tuoheti, Javid Jaffari, Vlad Fruchter, David Vakrat, Ohad Meitav
CPC classification number: G06F9/3893, G06F9/3001, G06F9/3012
Abstract: In one embodiment, a system comprises a processor and a non-transitory memory coupled to the processor, the memory comprising instructions executable by the processor. The processor, comprising an internal memory; a Multiply-Accumulate (MAC) array; a first vector register array; a second vector register array; and a third vector register array, is operable when executing a first instruction among the instructions to feed a weight vector array from the second vector register array to the MAC array, broadcast an input activation vector to the MAC array, multiply an input activation value broadcast to the MAC unit from the input activation vector and a weight value fed to the MAC unit from the weight vector array at each MAC unit in the MAC array, and store a partial output activation vector to the third vector register array, wherein the partial output activation vector is the output of the MAC array.
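Below is one possible reading of the MAC-array step, written in Python. The array is modeled as a grid of rows × columns. Element *r* of the input activation vector is broadcast along row *r*, and the weight vector array supplies one weight per MAC unit. Each unit multiplies its activation by its weight, and the columns are summed into a partial output vector. The abstract does not state where the activation vector comes from or how column results are combined. Assuming the activations come from the first vector register array and that columns accumulate into an existing partial result is this sketch's choice; `mac_array_step` is a hypothetical name.

```python
from typing import Optional


def mac_array_step(
    activation: list[float],
    weights: list[list[float]],
    partial: Optional[list[float]] = None,
) -> list[float]:
    """One instruction's worth of work on a rows x cols MAC array.

    activation: input activation vector; element r is broadcast to every unit in row r.
    weights:    weight vector array; weights[r][c] is fed to MAC unit (r, c).
    partial:    previous partial output vector to accumulate into (zeros if omitted).
    Returns the new partial output activation vector (one entry per column).
    """
    rows = len(activation)
    if len(weights) != rows:
        raise ValueError("need one weight vector per activation element")
    cols = len(weights[0])
    out = list(partial) if partial is not None else [0] * cols
    for r in range(rows):
        a = activation[r]  # value broadcast along row r
        for c in range(cols):
            out[c] += a * weights[r][c]  # multiply-accumulate performed by unit (r, c)
    return out
```

With `activation = [1, 2]` and `weights = [[1, 0, 2], [3, 1, 0]]`, the step yields `[7, 2, 2]`. This is the vector-matrix product, which a convolution layer would build up over many such steps.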
-
Publication No.: US20250044850A1
Publication Date: 2025-02-06
Application No.: US18886572
Application Date: 2024-09-16
Applicant: Meta Platforms Technologies, LLC
Inventor: Vlad Fruchter, Nishant Sitapara, Javid Jaffari, Shrirang Madhav Yardi, Bardia Zandian
Abstract: Systems and methods for peak power control include control circuitry which identifies a condition for a device. The control circuitry can apply the condition for the device to one or more models maintained for a plurality of device processing units of the device to determine one or more performance characteristics for the plurality of processing units. The control circuitry can distribute power credits to the plurality of device processing units of the device according to the determined performance characteristics for the plurality of device processing units, to manage a respective peak power for each respective device processing unit according to a number of the power credits distributed to the respective device processing unit.
-
Publication No.: US20240220779A1
Publication Date: 2024-07-04
Application No.: US18527063
Application Date: 2023-12-01
Applicant: Meta Platforms Technologies, LLC
Inventor: Vignesh Vivekraja, Tomonari Tohara, Reza Tusi, Abuduwaili Tuoheti, Javid Jaffari, Vlad Fruchter, David Vakrat, Ohad Meitav
IPC: G06N3/0464, G06F7/544, G06F17/15, H03H17/02
CPC classification number: G06N3/0464, G06F7/5443, G06F17/153, H03H17/02
Abstract: In one embodiment, a system comprises a processor and a non-transitory memory coupled to the processor, the memory comprising instructions executable by the processor. The processor, comprising an internal memory; a Multiply-Accumulate (MAC) array; a first vector register array; a second vector register array; and a third vector register array, is operable when executing instructions to transfer weights for M filters and an input activation tensor from an external memory to the internal memory, insert paddings to the input activation tensor in the internal memory based on first configuration parameters, configure the MAC array to a required shape based on second configuration parameters for convolution operations between the input activation tensor and the M filters, and calculate a row of the output activation tensor by performing the convolution operations on corresponding R rows of the input activation tensor with the M filters, wherein R is a filter height.
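The row-at-a-time convolution can be written out directly in Python. The input tensor is zero-padded. Each output row is computed from the R padded input rows it depends on, using M filters of shape R × S × C. The sketch assumes an H × W × C layout held in nested lists, symmetric zero padding and a single stride. It does not model the MAC-array shape configuration at all. The function names and the test tensors are hypothetical.

```python
def pad_input(x: list[list[list[float]]], pad: int) -> list[list[list[float]]]:
    """Zero-pad an H x W x C tensor by `pad` on every spatial side."""
    h, w, ch = len(x), len(x[0]), len(x[0][0])
    padded = [[[0] * ch for _ in range(w + 2 * pad)] for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = list(x[i][j])
    return padded


def conv_output_row(
    padded: list[list[list[float]]],
    filters: list[list[list[list[float]]]],
    out_row: int,
    stride: int = 1,
) -> list[list[float]]:
    """Compute one output row from the R padded input rows starting at out_row * stride.

    filters is M x R x S x C (R = filter height). Returns W_out pixels, each a length-M vector.
    """
    m_count = len(filters)
    r_h, s_w, ch = len(filters[0]), len(filters[0][0]), len(filters[0][0][0])
    w_out = (len(padded[0]) - s_w) // stride + 1
    top = out_row * stride
    rows = padded[top:top + r_h]  # the R input rows this output row depends on
    result = []
    for ox in range(w_out):
        left = ox * stride
        pixel = []
        for m in range(m_count):
            acc = 0
            for r in range(r_h):
                for s in range(s_w):
                    for c in range(ch):
                        acc += rows[r][left + s][c] * filters[m][r][s][c]
            pixel.append(acc)
        result.append(pixel)
    return result
```

Computing a whole output row from just R input rows means only those rows need to sit in internal memory at once. This is one plausible motivation for organizing the work by rows.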
-