Abstract:
Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor comprises: fetch circuitry to fetch a single multiply-accumulate (MAC) instruction having fields to indicate an opcode, a destination, a first source vector having a first element width, and a second source vector having a second element width that is smaller than the first element width; decode circuitry to decode the fetched single MAC instruction; and a single instruction multiple data (SIMD) execution circuit to execute the single MAC instruction and perform multiply-accumulate operations within each processing lane of a plurality of processing lanes, the multiply-accumulate operations in each processing lane including: multiplying a subset of elements of the first source vector by corresponding elements of the second source vector to produce a corresponding subset of products, and accumulating the subset of products with an accumulation data element corresponding to the processing lane to generate a result data element corresponding to the processing lane, each result data element having a width greater than both the first element width and the second element width.
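The per-lane behavior described above can be illustrated with a minimal sketch. It is not the claimed circuit: the function name `mac_lanes`, the parameter `elems_per_lane`, and the choice of four elements per lane are assumptions made for illustration, and Python integers are unbounded, so the specific element and accumulator widths are not modeled.

```python
def mac_lanes(src1, src2, acc, elems_per_lane=4):
    """Hypothetical model of a per-lane multiply-accumulate.

    src1, src2: flat lists of integer elements (src2 elements are assumed
                to be the narrower ones; widths are not enforced here).
    acc:        one accumulation element per processing lane.
    Returns one result element per lane.
    """
    if len(src1) != len(src2):
        raise ValueError("source vectors must have the same element count")
    if len(src1) != len(acc) * elems_per_lane:
        raise ValueError("element count must equal lanes * elems_per_lane")

    results = []
    for lane, lane_acc in enumerate(acc):
        start = lane * elems_per_lane
        total = lane_acc
        # Multiply the lane's subset of elements pairwise, then accumulate.
        for i in range(start, start + elems_per_lane):
            total += src1[i] * src2[i]
        results.append(total)
    return results


if __name__ == "__main__":
    # Two lanes, four elements each (illustrative values only).
    print(mac_lanes([1, 2, 3, 4, 5, 6, 7, 8],
                    [1, 1, 1, 1, 2, 2, 2, 2],
                    [10, 20]))
```

In hardware the result element is wider than either input so that the sum of several products does not overflow; the sketch sidesteps that by using unbounded integers.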
Abstract:
The invention relates to a processing method in a convolutional neural network accelerator (1) comprising a grid (2) of unit processing blocks (10), each associated with a set (13) of respective local memories and performing computing operations on data stored in its local memories, wherein: during respective processing cycles, unit blocks (10) receive and/or transmit data from or to neighboring unit blocks in at least one direction selected, depending on said data, from among the vertical and horizontal directions in the grid; and during those same cycles, unit blocks perform a computing operation on data stored in their local memories during at least one earlier processing cycle.
Abstract:
A method is described for determining the efficacy of a treatment method for a patient having a tissue disorder. The method comprises: administering a treatment to the patient; measuring a hazard score at two or more points in time after administering the treatment; and determining the efficacy of the treatment based on the hazard scores. Each hazard score is measured by: obtaining an image of a tissue of the patient, wherein the image comprises a plurality of patient image voxels; identifying voxels of the patient image that are damaged by the disorder as damaged patient image voxels; obtaining a hazard atlas of the disorder in the tissue, wherein the hazard atlas comprises a plurality of voxels, each voxel representing a hazard value of the extent of deficit caused by damage from the disorder to the tissue at that location; and computing a hazard score for the patient, wherein the score is the integration of all damaged patient image voxels, each weighted by the hazard value corresponding to its location, and wherein the hazard score determines the patient's prognosis. Further, a hazard atlas and methods for generating a hazard atlas are described.
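The hazard score computation described above amounts to a weighted sum over damaged voxels, which can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `hazard_score`, the flattened list representation of the image and atlas, and the sample values are all hypothetical, and the abstract does not specify how scores at different time points are compared.

```python
def hazard_score(damaged_mask, hazard_atlas):
    """Hypothetical hazard score: sum of atlas hazard values over damaged voxels.

    damaged_mask: flat list of booleans, True where the patient voxel is damaged.
    hazard_atlas: flat list of hazard values, aligned voxel-for-voxel with the mask.
    """
    if len(damaged_mask) != len(hazard_atlas):
        raise ValueError("mask and atlas must cover the same voxels")
    # Only damaged voxels contribute, each weighted by its atlas hazard value.
    return sum(h for damaged, h in zip(damaged_mask, hazard_atlas) if damaged)


if __name__ == "__main__":
    atlas = [0.5, 0.9, 0.25, 0.1]            # illustrative hazard values
    early = [True, True, True, False]        # damaged voxels at first time point
    later = [True, False, True, False]       # damaged voxels at second time point
    print(hazard_score(early, atlas), hazard_score(later, atlas))
```

Comparing the scores from two or more time points, as in the usage lines, is one way the change after treatment could be inspected; the criterion for efficacy itself is left open here.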
Abstract:
A signal processing apparatus (10) comprises: a fixed-point compute unit (30) for operating on fractional 2's complement values without a mantissa; a memory unit (26) for storing a constant complex value to be used by the compute unit (30) to perform a complex arithmetic operation; and a mapping unit (62) configured to map a complex value stored by the memory unit (26) and having a particular imaginary component, such as -1, to a value of unity, i.e. +1.