-
1.
Publication No.: US20240220572A1
Publication Date: 2024-07-04
Application No.: US18092183
Filing Date: 2022-12-30
Abstract: A compute engine is configured to perform self-attention computations by delaying performance of a division operation of a softmax computation, the performance including iteratively computing a first matrix multiplication of a given row vector of a first matrix and each column vector of a second matrix while determining a first scalar element representing a maximum value of the iterative first matrix multiplications; iteratively subtracting a corresponding determined first scalar element from a result of each computed first matrix multiplication and computing an elementwise exponential function based on a result of the subtraction operation to generate a plurality of elements of a given row vector of a fourth matrix; iteratively computing a second matrix multiplication of a given row vector of the fourth matrix and each column vector of a third matrix while summing the given row vectors of the fourth matrix; and computing a row vector of an output matrix.
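Illustrative sketch (not taken from the patent): the steps in the abstract above can be read as a softmax whose division is applied once at the end, after the multiplication by the third matrix. The pure-Python function below computes one output row under that reading. The names `attention_row`, `q_row`, `K_cols`, and `V_cols` are hypothetical, and the three loops run sequentially here, whereas the abstract describes the maximum and the row sum being gathered during the matrix multiplications.

```python
import math

def attention_row(q_row, K_cols, V_cols):
    """One output row of softmax(q K) V with the division deferred to the end.

    q_row  : a row vector of the first matrix (length d)
    K_cols : column vectors of the second matrix (n columns of length d)
    V_cols : column vectors of the third matrix (m columns of length n)
    """
    # Step 1: dot product of the row with every column, tracking the maximum.
    scores = []
    row_max = -math.inf
    for col in K_cols:
        s = sum(a * b for a, b in zip(q_row, col))
        scores.append(s)
        row_max = max(row_max, s)

    # Step 2: subtract the maximum and exponentiate. This is the
    # unnormalised row of the "fourth matrix".
    exps = [math.exp(s - row_max) for s in scores]

    # Step 3: multiply the unnormalised row by each column of the third
    # matrix, and sum the row to obtain the softmax denominator.
    denom = sum(exps)
    unnormalised = [sum(e * v for e, v in zip(exps, col)) for col in V_cols]

    # Step 4: the delayed division, applied to the output row only.
    return [u / denom for u in unnormalised]
```

Under these assumptions the result matches a conventional softmax followed by the second multiplication, because dividing every weight by the same denominator commutes with the weighted sum.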
-
Publication No.: US12014264B2
Publication Date: 2024-06-18
Application No.: US17005488
Filing Date: 2020-08-28
Inventors: Zhanying He, Bin Xu, Honghui Yuan
CPC Classification: G06N3/063, G06F7/485, G06F7/4876, G06F7/556, G06F7/57, G06F2207/4824
Abstract: A data processing circuit is disclosed. The data processing circuit relates to the field of digital circuits, and includes a first computing circuit and an input control circuit. The first computing circuit includes one or more computing sub-circuits. Each computing sub-circuit includes a first addition operation circuit, a multiplication operation circuit, a first comparison operation circuit, and a first nonlinear operation circuit. The first nonlinear operation circuit includes at least one of an exponential operation circuit and a logarithmic operation circuit. The input control circuit is configured to: control the first computing circuit to read input data and an input parameter, and control, according to a received first instruction, the operation circuit in the computing sub-circuit included in the first computing circuit, to perform an operation on the input data and the input parameter.
-
Publication No.: US12013865B1
Publication Date: 2024-06-18
Application No.: US18181594
Filing Date: 2023-03-10
IPC Classification: G06F16/24, G06F7/556, G06F16/2458
CPC Classification: G06F16/2477, G06F7/556
Abstract: Aspects of the invention include techniques for decomposing trend and seasonality components in a forecast of parametric time series data. A non-limiting example method includes receiving time series data that includes a plurality of values taken over a first period of time. A forecast is generated using the time series data. The forecast can include one or more predicted values over a second period of time. The forecast is decomposed into N components and 2^N coalitions are defined for the N components. A coalition value is determined for each coalition of the 2^N coalitions.
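Illustrative sketch (not taken from the patent): assuming a coalition is any subset of the N components, there is one coalition per subset, which gives 2^N of them including the empty one. The function below enumerates them and assigns each a value. The name `coalition_values` and the choice of value (the mean of the summed component contributions over the forecast horizon) are hypothetical; the abstract does not say how a coalition value is computed.

```python
from itertools import combinations

def coalition_values(components):
    """Map every coalition (a tuple of component names) to a coalition value.

    components: dict of component name -> list of per-step contributions,
                all lists having the same length.
    The value used here, the mean of the summed contributions, is an
    illustrative assumption and not the patent's definition.
    """
    names = sorted(components)
    horizon = len(next(iter(components.values()))) if components else 0
    values = {}
    for size in range(len(names) + 1):
        # combinations() yields each subset of the given size exactly once.
        for coalition in combinations(names, size):
            total = [sum(components[n][t] for n in coalition)
                     for t in range(horizon)]
            values[coalition] = sum(total) / horizon if horizon else 0.0
    return values
```

For two components such as a trend and a seasonal term, this yields four coalitions: the empty one, each component alone, and both together.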
-
Publication No.: US20240134603A1
Publication Date: 2024-04-25
Application No.: US18396437
Filing Date: 2023-12-26
Applicant: Intel Corporation
IPC Classification: G06F7/499, G06F7/556, G06N3/0464, G06N3/08
CPC Classification: G06F7/49931, G06F7/556, G06N3/0464, G06N3/08
Abstract: The techniques described in the detailed description above enable the manufacturing of circuits with increased performance and efficiency when performing division by a constant number. One embodiment provides circuitry including an input circuit to receive an input value including a plurality of bits, a logarithmic tree coupled with the input circuit, the logarithmic tree configured to compute an array of values based on a plurality of multi-bit groups of the plurality of bits of the input value, each value in the array of values includes a modulus of a corresponding multi-bit group with respect to the constant, a binary array adder to compute a quotient of the division operation based on the array of values, the input value, and the constant, and an output circuit to output the quotient.
-
Publication No.: US20240061650A1
Publication Date: 2024-02-22
Application No.: US17820766
Filing Date: 2022-08-18
Applicant: Apple Inc.
Inventors: Liang-Kai Wang, Ian R. Ollmann, Anthony Y. Tai
CPC Classification: G06F7/556, G06F7/4873
Abstract: Techniques are disclosed relating to polynomial approximation of the base-2 logarithm. In some embodiments, floating-point circuitry is configured to perform an approximation of a base-2 logarithm operation and provide a fixed unit of least precision (ULP) error over a range of inputs. In some embodiments, the floating-point circuitry includes a set of parallel pipelines for polynomial approximation, where the output is chosen from a particular pipeline based on a determination of whether the input operand is in a first subset of a range of inputs. Disclosed techniques may advantageously provide fixed ULP error for an entire input operand range for the floating-point base-2 logarithmic function with minimal area and energy footprint, relative to traditional techniques.
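Illustrative sketch (not taken from the patent): a software analogue of a polynomial base-2 logarithm with two evaluation paths selected by input range. Everything here is an assumption made for illustration: the names `approx_log2` and `_log2_poly`, the truncated series ln(m) = 2(s + s^3/3 + s^5/5 + ...) with s = (m-1)/(m+1), and the choice of [sqrt(0.5), sqrt(2)) as the "first subset". It does not reproduce the patent's polynomials, pipelines, or ULP guarantees, and it only handles finite positive inputs.

```python
import math

_SQRT_HALF = math.sqrt(0.5)
_INV_LN2 = 1.0 / math.log(2.0)

def _log2_poly(m):
    """Approximate log2(m) for m near 1 with an odd polynomial in s."""
    s = (m - 1.0) / (m + 1.0)
    s2 = s * s
    # Truncated series for ln(m) / (2 s), evaluated with Horner's rule.
    poly = 1.0 + s2 * (1.0 / 3 + s2 * (1.0 / 5 + s2 * (1.0 / 7 + s2 * (1.0 / 9))))
    return 2.0 * s * poly * _INV_LN2

def approx_log2(x):
    """Approximate log2(x) for a finite x > 0."""
    if x <= 0.0:
        raise ValueError("log2 is undefined for non-positive inputs")
    # Path A (assumed "first subset"): inputs already close to 1 go
    # straight to the polynomial, with no exponent term to add.
    if _SQRT_HALF <= x < 2.0 * _SQRT_HALF:
        return _log2_poly(x)
    # Path B: split x = m * 2**e, move m into [sqrt(0.5), sqrt(2)),
    # then add the integer exponent to the polynomial result.
    m, e = math.frexp(x)  # m is in [0.5, 1)
    if m < _SQRT_HALF:
        m *= 2.0
        e -= 1
    return e + _log2_poly(m)
```

Keeping the reduced argument within [sqrt(0.5), sqrt(2)) bounds |s| below about 0.172, which is why so few series terms are enough in this sketch.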
-
Publication No.: US11823043B2
Publication Date: 2023-11-21
Application No.: US16688602
Filing Date: 2019-11-19
Abstract: Aspects described herein provide a method of processing data in a machine learning model, including: receiving first domain input data; transforming the first domain input data to second domain input data via a domain transformation function; providing the second domain input data to a first layer of a machine learning model; processing the second domain input data in the first layer of the machine learning model according to a set of layer weights; and outputting second domain output data from the first layer of the machine learning model.
-
Publication No.: US11604270B2
Publication Date: 2023-03-14
Application No.: US16712864
Filing Date: 2019-12-12
Applicant: THALES
Inventors: Thierry Mazeau, Patrick Garrec
Abstract: A radar system configured to determine radar-ground distance measurements. The radar system includes transmission and reception means configured to transmit two radiofrequency signals towards the ground and to receive the signals obtained by the reflection of the two transmitted signals by the ground, and computation means configured to determine the frequential representations of the transmitted signals and of the received signals and determine a frequential quantity as a function of the frequential representations. The computation means are further configured to: sample the frequential quantity over a determined number of samples, which provides a sampled signal; determine a number of frequency measurements as a function of a constant distance measurement accuracy value; determine frequency measurements by applying to the sampled signal a spectral decomposition by fast Fourier transform using a decimation of the sampled signal in a ratio dependent on the distance measurement accuracy value; and determine a distance measurement corresponding to each frequency measurement.
-
Publication No.: US20230037227A1
Publication Date: 2023-02-02
Application No.: US17381124
Filing Date: 2021-07-20
Abstract: Apparatus and methods are disclosed for performing matrix operations, including operations suited to neural network and other machine learning accelerators and applications, using dual exponent formats. Disclosed matrix formats include single exponent bounding box floating-point (SE-BBFP) and dual exponent bounding box floating-point (DE-BBFP) formats. Shared exponents are determined for each element based on whether the element is used as a row of a matrix tile or a column of a matrix tile, for example, for a dot product operation. Computing systems suitable for employing such neural networks include computers having general-purpose processors, neural network accelerators, or reconfigurable logic devices, such as field-programmable gate arrays (FPGAs). Certain techniques disclosed herein can provide improved system performance while reducing memory and network bandwidth used.
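Illustrative sketch (not taken from the patent): a generic shared-exponent (block floating-point) quantiser, in which a block of values keeps one exponent and small integer mantissas, and a dot product is accumulated in integers. The names `quantize_block` and `block_dot` and the `mantissa_bits` parameter are hypothetical. The sketch gives the row operand and the column operand each their own shared exponent, which only loosely echoes the row/column distinction in the abstract; it does not implement the SE-BBFP or DE-BBFP encodings.

```python
import math

def quantize_block(values, mantissa_bits=8):
    """Return (shared_exponent, integer_mantissas) for a block of floats."""
    largest = max((abs(v) for v in values), default=0.0)
    if largest == 0.0:
        return 0, [0] * len(values)
    # frexp gives largest = f * 2**e with 0.5 <= f < 1, so every value in
    # the block has magnitude below 2**e.
    _, shared_exp = math.frexp(largest)
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to signed integers and clamp to the mantissa range.
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return shared_exp, mantissas

def block_dot(row, col, mantissa_bits=8):
    """Dot product using integer mantissas and one exponent per operand."""
    row_exp, row_m = quantize_block(row, mantissa_bits)
    col_exp, col_m = quantize_block(col, mantissa_bits)
    acc = sum(a * b for a, b in zip(row_m, col_m))  # integer accumulation
    # Apply both shared exponents once, after the accumulation.
    return math.ldexp(acc, row_exp + col_exp - 2 * (mantissa_bits - 1))
```

Because the exponents are applied once per dot product rather than once per element, the inner loop in this sketch needs only integer multiplies and adds.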
-
Publication No.: US11507782B2
Publication Date: 2022-11-22
Application No.: US16824834
Filing Date: 2020-03-20
Inventors: Wenbin Yang, Jinpeng Liu, WuiChak Wong, Sanping Li, Zhen Jia
Abstract: A method for determining a model compression rate comprises determining a near-zero importance value subset from an importance value set associated with a machine learning model, a corresponding importance value in the importance value set indicating an importance degree of a corresponding input of a processing layer of the machine learning model, importance values in the near-zero importance value subset being closer to zero than other importance values in the importance value set; determining a target importance value from the near-zero importance value subset, the target importance value corresponding to a turning point of a magnitude of the importance values in the near-zero importance value subset; determining a proportion of importance values in the importance value set that are less than the target importance value; and determining the compression rate for the machine learning model based on the determined proportion.
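Illustrative sketch (not taken from the patent): one way to walk through the four steps in the abstract above. The selection rule for the near-zero subset (the smallest magnitudes, controlled by a hypothetical `near_zero_fraction` parameter), the turning-point heuristic (the value just after the largest jump between consecutive sorted magnitudes), and the use of the proportion itself as the compression rate are all assumptions; the abstract does not specify any of them.

```python
def compression_rate(importances, near_zero_fraction=0.75):
    """Estimate a compression rate from a set of importance values."""
    magnitudes = sorted(abs(v) for v in importances)
    if len(magnitudes) < 2:
        return 0.0

    # Step 1: near-zero subset, taken as the smallest magnitudes
    # (assumed selection rule).
    k = max(2, int(len(magnitudes) * near_zero_fraction))
    subset = magnitudes[:k]

    # Step 2: target value at the turning point, taken as the value right
    # after the largest jump between neighbours (assumed heuristic).
    gaps = [subset[i + 1] - subset[i] for i in range(len(subset) - 1)]
    target = subset[gaps.index(max(gaps)) + 1]

    # Step 3: proportion of all importance values below the target.
    below = sum(1 for m in magnitudes if m < target)

    # Step 4: use that proportion directly as the rate (assumption).
    return below / len(magnitudes)
```

For the values 0.01, 0.9, 0.02, 0.0, 0.6, 0.03, 0.02, 0.7, the largest jump in the subset falls between 0.03 and 0.6, so five of the eight values lie below the target.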
-
Publication No.: US11461275B2
Publication Date: 2022-10-04
Application No.: US16692840
Filing Date: 2019-11-22
Applicant: Apple Inc.
Inventors: Lars M. Lindberg, Ali Sazegari
Abstract: Methods for lossy and lossless pre-processing of image data are disclosed. In one embodiment, a method for lossy pre-processing of image data may include, at a computing device: receiving the image data, where the image data includes a model having a mesh, the mesh includes vertices defining a surface, the vertices include attribute vectors, and the attribute vectors include values. The method also includes quantizing the values of the attribute vectors to produce modified values, where a precision of the modified values is determined based on a largest power determined using a largest exponent of the values, and encoding pairs of the modified values into two corresponding units of information. The method also includes, for each pair of the pairs of the modified values, serially storing the two corresponding units of information as a data stream into a buffer, and compressing the data stream in the buffer.
-