HYBRID INSTRUMENTATION FRAMEWORK FOR MULTICORE LOW POWER PROCESSORS

    Publication No.: US20200065215A1

    Publication date: 2020-02-27

    Application No.: US16670681

    Filing date: 2019-10-31

    Abstract: Techniques are provided for redundant execution by a better processor for intensive dynamic profiling after initial execution by a constrained processor. In an embodiment, a system of computer(s) receives a request to profile particular runtime aspects of an original binary executable. Based on the particular runtime aspects and without accessing source logic, the system statically rewrites the original binary executable into a rewritten binary executable that invokes telemetry instrumentation that makes observations of the particular runtime aspects and emits traces of those observations. A first processing core having low power (capacity) performs a first execution of the rewritten binary executable to make first observations and emit first traces of the first observations. Afterwards, a second processing core performs a second (redundant) execution of the original binary executable based on the first traces. The second execution generates a detailed dynamic performance profile based on the second execution.

    Matrix multiplication at memory bandwidth

    Publication No.: US10521225B2

    Publication date: 2019-12-31

    Application No.: US15638168

    Filing date: 2017-06-29

    Abstract: Techniques related to matrix multiplication at memory bandwidth are disclosed. Computing device(s) perform multiplication of a first matrix with a second matrix to generate a third matrix. A first register stores contiguous element values of the first matrix. Furthermore, a second register stores a first set of contiguous element values of the second matrix, and a third register stores a second set of contiguous element values of the second matrix. The first set and the second set correspond to a first row and a second row, respectively, of the second matrix. The first row and the second row are contiguous rows. A single instruction is executed to cause at least a partial computation of contiguous element values of the third matrix. The single instruction causes multiplication of element values stored in the first register with element values stored in the second and third registers and grouped accumulation of the products.
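
    The register layout in this abstract can be sketched in plain Python. This is an illustrative emulation only, not the patented instruction: `fused_step` and `matmul` are hypothetical names, the "registers" are ordinary lists, and the sketch assumes the first register holds two contiguous elements of a row of the first matrix while the second and third registers hold two contiguous rows of the second matrix.

```python
def fused_step(reg_a, reg_b0, reg_b1, acc):
    """Emulate one hypothetical multiply-and-accumulate instruction.

    reg_a  : two contiguous elements of row i of A, [A[i][k], A[i][k+1]]
    reg_b0 : contiguous elements of row k of B
    reg_b1 : contiguous elements of row k+1 of B
    acc    : partial sums for contiguous elements of row i of C
    """
    return [c + reg_a[0] * b0 + reg_a[1] * b1
            for c, b0, b1 in zip(acc, reg_b0, reg_b1)]


def matmul(a, b):
    """Compute C = A x B by repeating the fused step over pairs of rows of B."""
    inner, cols = len(b), len(b[0])
    # Assumption for this sketch: the inner dimension is even.
    assert inner % 2 == 0, "sketch assumes an even inner dimension"
    c = []
    for row in a:
        acc = [0] * cols
        for k in range(0, inner, 2):
            acc = fused_step(row[k:k + 2], b[k], b[k + 1], acc)
        c.append(acc)
    return c
```

    Each call to `fused_step` produces a partial result for a contiguous run of elements of the third matrix, which mirrors the "at least a partial computation" wording of the abstract.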

    MEMORY COHERENCE IN A MULTI-CORE, MULTI-LEVEL, HETEROGENEOUS COMPUTER ARCHITECTURE
    Type: patent application; status: under examination (published)

    Publication No.: US20160328326A1

    Publication date: 2016-11-10

    Application No.: US14705806

    Filing date: 2015-05-06

    Abstract: Techniques are described for memory coherence in a multi-core system with a heterogeneous memory architecture comprising one or more hardware-managed caches and one or more software-managed caches. According to one embodiment, a set of one or more buffers are allocated in memory, and each respective buffer is associated with a respective metadata tag. The metadata tag may be used to store metadata that identifies a state associated with the respective buffer. The multi-core system may enforce coherence for the one or more hardware-managed caches and the one or more software-managed caches based on the metadata stored in the metadata tag for each respective buffer in the set of one or more buffers. The multi-core system may read the metadata to determine whether a particular buffer is in a hardware-managed or a software-managed cacheable state. Based on the current state of the particular buffer, the multi-core system may perform coherence operations.
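
    A minimal sketch of the tag-driven check described in this abstract follows. All names (`State`, `Buffer`, `coherence_ops`) are hypothetical, and the specific operations returned are assumptions chosen for illustration; the abstract states only that coherence operations depend on the state recorded in each buffer's metadata tag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    HW_CACHEABLE = "hardware-managed"
    SW_CACHEABLE = "software-managed"


@dataclass
class Buffer:
    data: bytearray
    tag: State = State.HW_CACHEABLE  # metadata tag recording the buffer's state


def coherence_ops(buf: Buffer, target: State) -> List[str]:
    """Read the tag and return the (assumed) operations needed to reach `target`."""
    if buf.tag is target:
        return []  # already in the requested state, nothing to do
    if target is State.SW_CACHEABLE:
        ops = ["flush hardware-managed cache lines covering the buffer"]
    else:
        ops = ["write back and invalidate the software-managed cache copy"]
    buf.tag = target  # update the metadata tag after the transition
    return ops
```

    Keeping the state in a per-buffer tag means the decision needs only a metadata read, not a scan of either cache.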

    Hybrid instrumentation framework for multicore low power processors

    Publication No.: US11030073B2

    Publication date: 2021-06-08

    Application No.: US16670681

    Filing date: 2019-10-31

    Abstract: Techniques are provided for redundant execution by a better processor for intensive dynamic profiling after initial execution by a constrained processor. In an embodiment, a system of computer(s) receives a request to profile particular runtime aspects of an original binary executable. Based on the particular runtime aspects and without accessing source logic, the system statically rewrites the original binary executable into a rewritten binary executable that invokes telemetry instrumentation that makes observations of the particular runtime aspects and emits traces of those observations. A first processing core having low power (capacity) performs a first execution of the rewritten binary executable to make first observations and emit first traces of the first observations. Afterwards, a second processing core performs a second (redundant) execution of the original binary executable based on the first traces. The second execution generates a detailed dynamic performance profile based on the second execution.

    SCALABLE DISTRIBUTED COMPUTATION FRAMEWORK FOR DATA-INTENSIVE COMPUTER VISION WORKLOADS

    Publication No.: US20200036954A1

    Publication date: 2020-01-30

    Application No.: US16590289

    Filing date: 2019-10-01

    Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, a first image and a second image. The first image comprises a first set of pixels and the second image comprises a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels. The method further comprises overlaying each of the first shifted image and the second shifted image with the second image, such that each pixel of the first shifted image and second shifted image has a corresponding pixel in the second image. The method further comprises creating, at the first node, a first disparity map that indicates, for each pixel of the first shifted image, a level of similarity between the pixel of the first shifted image and the corresponding pixel in the second image and creating, at the second node, a second disparity map that indicates, for each pixel of the second shifted image, a level of similarity between the pixel of the second shifted image and the corresponding pixel in the second image.
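
    The per-node work in this abstract can be sketched as follows. The function names, the rightward shift direction, the zero fill value, and the use of negative absolute difference as the "level of similarity" are all assumptions made for illustration; the abstract does not specify a similarity measure. Images are plain lists of rows of integer intensities.

```python
def shift_right(image, d, fill=0):
    """Shift every pixel of every row d columns to the right, padding with `fill`."""
    width = len(image[0])
    if not 0 <= d <= width:
        raise ValueError("shift must be between 0 and the image width")
    return [[fill] * d + row[:width - d] for row in image]


def similarity_map(first, second, d):
    """Work of one node: shift `first` by d pixels and compare with `second`.

    Returns one score per pixel; 0 means identical, more negative means
    less similar (an assumed measure, not taken from the abstract).
    """
    shifted = shift_right(first, d)
    return [[-abs(p - q) for p, q in zip(srow, qrow)]
            for srow, qrow in zip(shifted, second)]


def distribute(first, second, shifts):
    """Stand-in for the distributed system: one map per shift amount."""
    return {d: similarity_map(first, second, d) for d in shifts}
```

    In a real deployment each shift amount would be assigned to a different node; the dictionary comprehension in `distribute` only emulates that assignment on one machine.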

    MATRIX MULTIPLICATION AT MEMORY BANDWIDTH
    Type: patent application

    Publication No.: US20190004794A1

    Publication date: 2019-01-03

    Application No.: US15638168

    Filing date: 2017-06-29

    Abstract: Techniques related to matrix multiplication at memory bandwidth are disclosed. Computing device(s) perform multiplication of a first matrix with a second matrix to generate a third matrix. A first register stores contiguous element values of the first matrix. Furthermore, a second register stores a first set of contiguous element values of the second matrix, and a third register stores a second set of contiguous element values of the second matrix. The first set and the second set correspond to a first row and a second row, respectively, of the second matrix. The first row and the second row are contiguous rows. A single instruction is executed to cause at least a partial computation of contiguous element values of the third matrix. The single instruction causes multiplication of element values stored in the first register with element values stored in the second and third registers and grouped accumulation of the products.

    Disk drive failure prediction with neural networks

    Publication No.: US11579951B2

    Publication date: 2023-02-14

    Application No.: US16144912

    Filing date: 2018-09-27

    Abstract: Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the RNN LSTM model is tuned using a set of predefined hyper-parameters. The preprocessing, which is performed during the training and evaluation phase as well as later during the prediction phase, involves using predefined values for a set of parameters to generate the set of enhanced feature sequences from raw sensor readings. The enhanced feature sequences are generated to maintain a desired healthy/failed disk ratio, and use only samples leading up to a last-valid-time sample in order to honor a pre-specified heads-up-period alert requirement.
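
    One possible reading of the preprocessing step is sketched below. The function names and parameters (`heads_up`, `length`, `ratio`) are hypothetical, and the interpretation is an assumption: readings from a failed disk are cut off `heads_up` samples before the failure, fixed-length sequences are taken from what remains, and healthy sequences are subsampled to hold a chosen healthy/failed ratio.

```python
import random


def windows(readings, length):
    """All contiguous subsequences of `length` readings (empty if too short)."""
    return [readings[i:i + length] for i in range(len(readings) - length + 1)]


def failed_disk_sequences(readings, heads_up, length):
    """Sequences from a failed disk that end at or before the last valid sample.

    `readings` is ordered oldest to newest and ends at the failure; the
    trailing `heads_up` samples are dropped so that a prediction made from
    any returned sequence leaves at least that much warning time.
    """
    last_valid = max(0, len(readings) - heads_up)
    return windows(readings[:last_valid], length)


def balance(healthy, failed, ratio, seed=0):
    """Subsample healthy sequences to at most `ratio` per failed sequence."""
    keep = min(len(healthy), int(ratio * len(failed)))
    return random.Random(seed).sample(healthy, keep)
```

    For example, ten readings with `heads_up=3` and `length=4` leave seven usable samples and therefore four sequences.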

    Techniques for accurately estimating the reliability of storage systems

    Publication No.: US11416324B2

    Publication date: 2022-08-16

    Application No.: US15930779

    Filing date: 2020-05-13

    Abstract: Techniques are described herein for accurately measuring the reliability of storage systems. Rather than relying on a series of approximations, which may produce highly optimistic estimates, the techniques described herein use a failure distribution derived from a disk failure data set to derive reliability metrics such as mean time to data loss (MTTDL) and annual durability. A new framework for modeling storage system dynamics is described herein. The framework facilitates theoretical analysis of the reliability. The model described herein captures the complex structure of storage systems considering their configuration, dynamics, and operation. Given this model, a simulation-free analytical solution to the commonly used reliability metrics is derived. The model may also be used to analyze the long-term reliability behavior of storage systems.

    TECHNIQUES FOR ACCURATELY ESTIMATING THE RELIABILITY OF STORAGE SYSTEMS

    Publication No.: US20200371855A1

    Publication date: 2020-11-26

    Application No.: US15930779

    Filing date: 2020-05-13

    Abstract: Techniques are described herein for accurately measuring the reliability of storage systems. Rather than relying on a series of approximations, which may produce highly optimistic estimates, the techniques described herein use a failure distribution derived from a disk failure data set to derive reliability metrics such as mean time to data loss (MTTDL) and annual durability. A new framework for modeling storage system dynamics is described herein. The framework facilitates theoretical analysis of the reliability. The model described herein captures the complex structure of storage systems considering their configuration, dynamics, and operation. Given this model, a simulation-free analytical solution to the commonly used reliability metrics is derived. The model may also be used to analyze the long-term reliability behavior of storage systems.
