Abstract:
Embodiments include a method for managing power in a computer system including a main processor and an active memory device including powered units, the active memory device in communication with the main processor by a memory link, the powered units including a processing element. The method includes the main processor executing a program on a program thread, encountering a first section of code to be executed by the active memory device, changing, by a first command, a power state of a powered unit on the active memory device based on the main processor encountering the first section of code, the first command including a store command. The method also includes the processing element executing the first section of code at a second time, changing a power state of the main processor from a power use state to a power saving state based on the processing element executing the first section.
Abstract:
Embodiments relate to sequential location accesses in an active memory device that includes memory and a processing element. An aspect includes a method for sequential location accesses that includes receiving from the memory a first group of data values associated with a queue entry at the processing element. A tag value associated with the queue entry and specifying a position from which to extract a first subset of the data values is read. The queue entry is populated with the first subset of the data values starting at the position specified by the tag value. The processing element determines whether a second subset of the data values in the first group of data values is associated with a subsequent queue entry, and populates a portion of the subsequent queue entry with the second subset of the data values.
Abstract:
A heterogeneous processing system includes a compiler for performing power-constrained code generation and scheduling of work in the heterogeneous processing system. The compiler produces source code that is executable by a computer. The compiler performs a method. The method includes dividing a power budget for the heterogeneous processing system into a discrete number of power tokens. Each of the power tokens has an equal value of units of power. The method also includes determining a power requirement for executing a code segment on a processing element of the heterogeneous processing system. The determining is based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, at least one of the power tokens to satisfy the power requirement.
Abstract:
Embodiments relate to address generation in an active memory device that includes memory and a processing element. An aspect includes a method for address generation in the active memory device. The method includes reading a base address value and an offset address value from a register file group of the processing element. The processing element determines a virtual address based on the base address value and the offset address value. The processing element translates the virtual address into a physical address and accesses a location in the memory based on the physical address.
Abstract:
A multichip module with a vertical stack of a logic chip, a translator chip, and at least one memory chip. The multichip module includes a logic chip, a translator chip over and vertically connecting to the logic chip, and at least one memory chip above and vertically connecting to the translator chip where the translator chip is one of a chip with active devices or a passive chip.
Abstract:
Techniques for refinement of data pipelines are provided. An original file of serialized objects is received, and an original pipeline comprising a plurality of transformations is identified based on the original file. A first computing cost is determined for a first transformation of the plurality of transformations. The first transformation is modified using a predefined optimization, and a second cost of the modified first transformation is determined. Upon determining that the second cost is lower than the first cost, the first transformation is replaced, in the original pipeline, with the optimized first transformation.
Abstract:
A method of text classification includes generating a text embedding vector representing a text sample and applying weights of a regression layer to the text embedding vector to generate a first data model output vector. The method also includes generating a plurality of prototype embedding vectors associated with a respective classification labels and comparing the plurality of prototype embedding vectors to the text embedding vector to generate a second data model output vector. The method further includes assigning a particular classification label to the text sample based on the first data model output vector, the second data model output vector, and one or more weighting values.
Abstract:
Embodiments for implementing partial synchronization between compute processes based on threshold specification in a computing environment. One or more compute processes may be synchronized in one of a plurality of types of computing platforms using a barrier having a barrier release condition based on a threshold of one or more measures. The barrier is defined according to one or more parameters. The one or more compute processes may be released via the barrier upon exceeding the threshold of the one or more measures.
Abstract:
An approach is disclosed that locates a named data element by a local node. A name corresponding to the named data element is received, the named data element exists in a Coordination Namespace allocated in a memory area that is distributed amongst a set of nodes that include the local node and remote nodes. A predicted node identifier is received and then the named data element is requested from the predicted node based on the predicted node identifier.
Abstract:
A method, system and memory controller for implementing memory hierarchy placement decisions in a memory system including direct routing of arriving data into a main memory system and selective injection of the data or computed results into a processor cache in a computer system. A memory controller, or a processing element in a memory system, selectively drives placement of data into other levels of the memory hierarchy. The decision to inject into the hierarchy can be triggered by the arrival of data from an input output (IO) device, from computation, or from a directive of an in-memory processing element.