Abstract:
Method, systems and apparatuses may provide for technology that identifies first data and second data to be stored in a data storage. Each of the first data and the second data are in a first data format. Some technology may also interleave the first data with the second data. The interleaved first and second data are in a second data format. The second data format is different from the first data format.
Abstract:
Various embodiments include nested emulation for a source application and source emulator. Duplicate source ISA libraries redirect the source emulator library calls to a target library, thereby forcing the native emulator through proper emulation channels between first and second ISAs. Other embodiments concern accelerating dynamic linking by determining certain function calls that, rather than being processed through emulation of PLT code, are instead directly called without the need for PLT code translation. Some embodiments address both nested emulation and accelerated dynamic linking but other embodiments include one of nested emulation and accelerated dynamic linking. Other embodiments are described herein.
Abstract:
Methods, apparatus, systems, and articles of manufacture to process a machine learning model in a web-browser environment are disclosed. An example apparatus includes a graph builder to accumulate machine learning operations as a graph. A tensor manager is to, in response to a request to access a tensor that is not yet available and associated with the machine learning operations, identify the graph based on the tensor. A graph cache manager is to determine whether a condensed graph corresponding to the identified graph is available. A graph condenser is to, in response to the graph cache manager determining that the condensed graph is not available, generate the condensed graph. A graph executor is to execute the condensed graph to create the tensor. The tensor manager is to provide the tensor as a response to the request to access the tensor.
Abstract:
Apparatuses, methods and storage media associated with multiple processor modes execution are described herein. In embodiments, an apparatus may include a processor with a plurality of processor modes, including a first processor mode to address a first address space, and a second processor mode to address a second address space, the second address space including the first address space. The apparatus may further include a signal handler to handle a signal from a kernel, in the first processor mode; and a signal handler wrapper to switch the processor to the second processor mode on delivery of the signal from the kernel, save a current extra context of the second processor mode from the second register file to a user stack, switch the processor back to the first processor mode, then invoke the signal handler to handle the signal. Other embodiments may be described or claimed.
Abstract:
Systems and methods may provide translation cache closure and consistent data recovery in dynamic code generating system. An apparatus may group translation cache together and restore a translation cache snapshot as a whole. Chaining between translations may be maintained during saving and restoration.
Abstract:
Method, systems and apparatuses may provide for technology that identifies first data and second data to be stored in a data storage. Each of the first data and the second data are in a first data format. Some technology may also interleave the first data with the second data. The interleaved first and second data are in a second data format. The second data format is different from the first data format.
Abstract:
An example system includes: interface circuitry; programmable circuitry; and instructions to cause the programmable circuitry to: reserve first memory addresses of a host system, the first memory addresses reserved for emulation of a guest system, the guest system based on a first instruction set architecture that is different from a second instruction set architecture of the host system; reserve second memory addresses of the host system that are contiguous with the first memory addresses, the second memory addresses reserved for a first emulated memory access instruction associated with an overflow in the guest system; reserve third memory addresses of the host system for a second emulated memory access instruction associated with an underflow in the guest system; and set memory access privileges of the second and third memory addresses to prevent at least one of a read, a write, or an execution access for the second and third memory addresses.
Abstract:
An example system includes: interface circuitry; programmable circuitry; and instructions to cause the programmable circuitry to: reserve first memory addresses of a host system, the first memory addresses reserved for emulation of a guest system, the guest system based on a first instruction set architecture that is different from a second instruction set architecture of the host system; reserve second memory addresses of the host system that are contiguous with the first memory addresses, the second memory addresses reserved for a first emulated memory access instruction associated with an overflow in the guest system; reserve third memory addresses of the host system for a second emulated memory access instruction associated with an underflow in the guest system; and set memory access privileges of the second and third memory addresses to prevent at least one of a read, a write, or an execution access for the second and third memory addresses.
Abstract:
Systems, apparatuses and methods may provide technology for optimizing an inference neural network model that performs asymmetric quantization by generating a quantized neural network, wherein model weights of the neural network are quantized as signed integer values, and wherein an input layer of the neural network is configured to quantize input values as unsigned integer values, generating a weights accumulation table based on the quantized model weights and a kernel size for the neural network, and generating an output restoration function for an output layer of the neural network based on the weights accumulation table and the kernel size. The technology may also perform per-input channel quantization. The technology may also perform mixed-precision auto-tuning.
Abstract:
One embodiment provides a device. The device includes a processor; a memory; and translator logic. The processor is to execute a host instruction set. The translator logic is to determine whether an offset is a constant and whether the offset is greater than zero and less than a maximum offset in response to receiving a guest memory access instruction that contains a base address plus or minus the offset, the maximum offset related to at least one of a host instruction set architecture (ISA) and a guest ISA.