APPARATUS AND METHOD FOR MEMORY-HIERARCHY AWARE PRODUCER-CONSUMER INSTRUCTIONS
    3.
    发明申请
    APPARATUS AND METHOD FOR MEMORY-HIERARCHY AWARE PRODUCER-CONSUMER INSTRUCTIONS 审中-公开
    用于记忆级别生产者消费者指令的装置和方法

    公开(公告)号:US20140208031A1

    公开(公告)日:2014-07-24

    申请号:US13994724

    申请日:2011-12-21

    IPC分类号: G06F12/08 G06T1/60

    摘要: An apparatus and method are described for efficiently transferring data from a producer core to a consumer core within a central processing unit (CPU). For example, one embodiment of a method comprises: A method for transferring a chunk of data from a producer core of a central processing unit (CPU) to consumer core of the CPU, comprising: writing data to a buffer within the producer core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the fill buffer to a cache accessible by both the producer core and the consumer core; and upon the consumer core detecting that data is available in the cache, providing the data to the consumer core from the cache upon receipt of a read signal from the consumer core.

    摘要翻译: 描述了一种用于在中央处理单元(CPU)内有效地将数据从生产者核心传送到消费者核心的装置和方法。 例如,一种方法的一个实施例包括:一种用于将数据块从中央处理单元(CPU)的生产者核心转移到CPU的消费者核心的方法,包括:将数据写入到所述CPU的生产者核心内的缓冲器 CPU直到指定数据量被写入; 在检测到指定量的数据被写入时,响应地产生驱逐周期,使得将数据从填充缓冲器传送到可由生产者核心和消费者核心访问的高速缓存的逐出循环; 并且在消费者核心检测到数据在高速缓存中可用时,在从消费者核心接收到读取信号时从高速缓存提供数据给消费者核心。

    METHOD AND APPARATUS FOR CUTTING SENIOR STORE LATENCY USING STORE PREFETCHING
    4.
    发明申请
    METHOD AND APPARATUS FOR CUTTING SENIOR STORE LATENCY USING STORE PREFETCHING 有权
    使用商店预购切割高级商店的方法和装置

    公开(公告)号:US20140223105A1

    公开(公告)日:2014-08-07

    申请号:US13993508

    申请日:2011-12-30

    IPC分类号: G06F9/38 G06F12/08

    摘要: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires.

    摘要翻译: 根据本文公开的实施例,提供了使用商店预取来切割高级商店延迟的方法,系统,机制,技术和装置。 例如,在一个实施例中,这种装置可以包括集成电路或乱序处理器装置,其处理不一致的指令并对高速缓存执行按顺序的要求。 这样的集成电路或不按顺序的处理器装置还包括用于接收存储指令的装置; 用于执行所述存储指令的地址生成和转换以计算由所述存储指令访问的存储器的物理地址的装置; 以及用于在存储指令退出之前基于所述存储指令和所计算的物理地址来执行用于高速缓存行的预取的装置。

    Method and apparatus for cutting senior store latency using store prefetching
    5.
    发明授权
    Method and apparatus for cutting senior store latency using store prefetching 有权
    使用存储预取来切割高级存储延迟的方法和装置

    公开(公告)号:US09405545B2

    公开(公告)日:2016-08-02

    申请号:US13993508

    申请日:2011-12-30

    IPC分类号: G06F12/08 G06F9/38

    摘要: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires.

    摘要翻译: 根据本文公开的实施例,提供了使用商店预取来切割高级商店延迟的方法,系统,机制,技术和装置。 例如,在一个实施例中,这种装置可以包括集成电路或乱序处理器装置,其处理不一致的指令并对高速缓存执行按顺序的要求。 这样的集成电路或不按顺序的处理器装置还包括用于接收存储指令的装置; 用于执行所述存储指令的地址生成和转换以计算由所述存储指令访问的存储器的物理地址的装置; 以及用于在存储指令退出之前基于所述存储指令和所计算的物理地址来执行用于高速缓存行的预取的装置。

    Methods and apparatus for efficient communication between caches in hierarchical caching design
    6.
    发明授权
    Methods and apparatus for efficient communication between caches in hierarchical caching design 有权
    用于层次化缓存设计中高速缓存之间高效通信的方法和设备

    公开(公告)号:US09411728B2

    公开(公告)日:2016-08-09

    申请号:US13994399

    申请日:2011-12-23

    摘要: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing efficient communication between caches in hierarchical caching design. For example, in one embodiment, such means may include an integrated circuit having a data bus; a lower level cache communicably interfaced with the data bus; a higher level cache communicably interfaced with the data bus; one or more data buffers and one or more dataless buffers. The data buffers in such an embodiment being communicably interfaced with the data bus, and each of the one or more data buffers having a buffer memory to buffer a full cache line, one or more control bits to indicate state of the respective data buffer, and an address associated with the full cache line. The dataless buffers in such an embodiment being incapable of storing a full cache line and having one or more control bits to indicate state of the respective dataless buffer and an address for an inter-cache transfer line associated with the respective dataless buffer. In such an embodiment, inter-cache transfer logic is to request the inter-cache transfer line from the higher level cache via the data bus and is to further write the inter-cache transfer line into the lower level cache from the data bus.

    摘要翻译: 根据本文公开的实施例,提供了用于在分级缓存设计中实现高速缓存之间的有效通信的方法,系统,机制,技术和装置。 例如,在一个实施例中,这种装置可以包括具有数据总线的集成电路; 与数据总线可通信地接口的低级缓存; 与数据总线可通信地接口的更高级别的缓存; 一个或多个数据缓冲器和一个或多个无数据缓冲器。 这种实施例中的数据缓冲器与数据总线可通信地接口,并且一个或多个数据缓冲器中的每一个具有缓冲存储器以缓冲全高速缓存线,一个或多个控制位以指示各个数据缓冲器的状态,以及 与完整缓存行相关联的地址。 在这种实施例中的无数据缓冲器不能存储完整的高速缓存行并且具有一个或多个控制位以指示相应无数据缓冲器的状态和与相应无数据缓冲器相关联的高速缓存间传输线的地址。 在这样的实施例中,高速缓存间传输逻辑是经由数据总线从高级缓存请求高速缓存间传输线,并且进一步将数据总线上的缓存间传输线写入低级缓存。

    METHODS AND APPARATUS FOR EFFICIENT COMMUNICATION BETWEEN CACHES IN HIERARCHICAL CACHING DESIGN
    7.
    发明申请
    METHODS AND APPARATUS FOR EFFICIENT COMMUNICATION BETWEEN CACHES IN HIERARCHICAL CACHING DESIGN 有权
    用于分层缓存设计中的高速缓存之间的有效通信的方法和设备

    公开(公告)号:US20130326145A1

    公开(公告)日:2013-12-05

    申请号:US13994399

    申请日:2011-12-23

    IPC分类号: G06F12/08

    摘要: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing efficient communication between caches in hierarchical caching design. For example, in one embodiment, such means may include an integrated circuit having a data bus; a lower level cache communicably interfaced with the data bus; a higher level cache communicably interfaced with the data bus; one or more data buffers and one or more dataless buffers. The data buffers in such an embodiment being communicably interfaced with the data bus, and each of the one or more data buffers having a buffer memory to buffer a full cache line, one or more control bits to indicate state of the respective data buffer, and an address associated with the full cache line. The dataless buffers in such an embodiment being incapable of storing a full cache line and having one or more control bits to indicate state of the respective dataless buffer and an address for an inter-cache transfer line associated with the respective dataless buffer. In such an embodiment, inter-cache transfer logic is to request the inter-cache transfer line from the higher level cache via the data bus and is to further write the inter-cache transfer line into the lower level cache from the data bus.

    摘要翻译: 根据本文公开的实施例,提供了用于在分级缓存设计中实现高速缓存之间的有效通信的方法,系统,机制,技术和装置。 例如,在一个实施例中,这种装置可以包括具有数据总线的集成电路; 与数据总线可通信地接口的低级缓存; 与数据总线可通信地接口的更高级别的缓存; 一个或多个数据缓冲器和一个或多个无数据缓冲器。 这种实施例中的数据缓冲器与数据总线可通信地接口,并且一个或多个数据缓冲器中的每一个具有缓冲存储器以缓冲全高速缓存线,一个或多个控制位以指示各个数据缓冲器的状态,以及 与完整缓存行相关联的地址。 在这种实施例中的无数据缓冲器不能存储完整的高速缓存行并且具有一个或多个控制位以指示相应无数据缓冲器的状态和与相应无数据缓冲器相关联的高速缓存间传输线的地址。 在这样的实施例中,高速缓存间传输逻辑是经由数据总线从高级缓存请求高速缓存间传输线,并且进一步将数据总线上的缓存间传输线写入低级缓存。

    Early energy measurement
    9.
    发明授权
    Early energy measurement 有权
    早期能量测量

    公开(公告)号:US07991085B2

    公开(公告)日:2011-08-02

    申请号:US11427651

    申请日:2006-06-29

    申请人: Ron Shalev

    发明人: Ron Shalev

    IPC分类号: H04L27/28

    摘要: In a described implementation of early energy measurement, a wireless device adjusts a receiver gain during each current symbol time responsive to a signal energy level measured in a previous symbol time.

    摘要翻译: 在所描述的早期能量测量的实现中,无线设备响应于在先前符号时间中测量的信号能级而调整在每个当前符号时间期间的接收机增益。

    ENERGY AND AREA OPTIMIZED HETEROGENEOUS MULTIPROCESSOR FOR CASCADE CLASSIFIERS
    10.
    发明申请
    ENERGY AND AREA OPTIMIZED HETEROGENEOUS MULTIPROCESSOR FOR CASCADE CLASSIFIERS 审中-公开
    能源和区域优化分类器的异构多重分配器

    公开(公告)号:US20160275043A1

    公开(公告)日:2016-09-22

    申请号:US14662089

    申请日:2015-03-18

    IPC分类号: G06F15/80 G06F9/38

    摘要: In one embodiment, a heterogeneous multicore processor is described that is optimized to execute multi-stage computer vision algorithms such as cascade classifier workloads. In such embodiment the heterogeneous processor includes at least one SIMD core, such as a vector processor core, coupled with one or more scalar cores. In one embodiment the heterogeneous multiprocessor executes multi-stage compute operations, where the SIMD core computes a first set of stages and the one or more scalar cores compute the second set of stages. In one embodiment, a process for designing a heterogeneous multicore processor is disclosed which optimizes the ratio of scalar to SIMD cores based on execution time of the multi-stage compute operation in relation to processor die area consumed by a processor configuration having the ratio.

    摘要翻译: 在一个实施例中,描述了被优化以执行诸如级联分类器工作负载的多级计算机视觉算法的异构多核处理器。 在这种实施例中,异构处理器包括与一个或多个标量核耦合的至少一个SIMD核,例如向量处理器核。 在一个实施例中,异构多处理器执行多阶段计算操作,其中SIMD核心计算第一组阶段,并且一个或多个标量核心计算第二组阶段。 在一个实施例中,公开了一种用于设计异构多核处理器的过程,其基于多级计算操作的执行时间相对于具有该比率的处理器配置消耗的处理器管芯面积来优化标量到SIMD核的比率。