Hardware compression and decompression engine

    Publication No.: US11500638B1

    Publication Date: 2022-11-15

    Application No.: US16739464

    Filing Date: 2020-01-10

    Applicant: Apple Inc.

    Abstract: A method and system for compressing and decompressing data is disclosed. A compression command may initiate the prefetching of first data, which may be stored in a first buffer. Multiple words of the first data may be read from the first buffer and used to generate a plurality of compressed packets, each of which includes a command specifying a type of packet. The compressed packets may be combined into a group and multiple groups may be combined and stored in a second buffer. A decompression command may initiate the prefetching of second data, which is stored in the first buffer. A portion of the second data may be read from the first buffer and used to generate a group of compressed packets. Multiple output words may be generated dependent upon the group of compressed packets.
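The packet-and-group flow described in the abstract can be sketched in Python. The packet commands (ZERO/MATCH/RAW), the dictionary scheme, and the group size below are illustrative assumptions, not details from the patent:

```python
# Hypothetical sketch: each input word becomes a packet tagged with a command,
# and packets are combined into fixed-size groups before buffering.
from collections import namedtuple

Packet = namedtuple("Packet", ["command", "payload"])

def compress_words(words, dictionary):
    """Turn each word into a tagged packet (ZERO / MATCH / RAW)."""
    packets = []
    for w in words:
        if w == 0:
            packets.append(Packet("ZERO", None))                  # all-zero word: no payload
        elif w in dictionary:
            packets.append(Packet("MATCH", dictionary.index(w)))  # dictionary hit: small index
        else:
            packets.append(Packet("RAW", w))                      # no match: store word verbatim
    return packets

def group_packets(packets, group_size=4):
    """Combine packets into groups, as the engine combines them before buffering."""
    return [packets[i:i + group_size] for i in range(0, len(packets), group_size)]
```

Decompression would walk each group, dispatch on the command of each packet, and emit output words accordingly.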

    22. Completing load and store instructions in a weakly-ordered memory model (Granted Patent, In Force)

    Publication No.: US09535695B2

    Publication Date: 2017-01-03

    Application No.: US13750942

    Filing Date: 2013-01-25

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to completion of load and store instructions in a weakly-ordered memory model. In one embodiment, a processor includes a load queue and a store queue and is configured to associate queue information with a load instruction in an instruction stream. In this embodiment, the queue information indicates a location of the load instruction in the load queue and one or more locations in the store queue that are associated with one or more store instructions that are older than the load instruction. The processor may determine, using the queue information, that the load instruction does not conflict with a store instruction in the store queue that is older than the load instruction. The processor may remove the load instruction from the load queue while the store instruction remains in the store queue. The queue information may include a wrap value for the load queue.
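The wrap value mentioned at the end of the abstract resolves relative ages in a circular queue. A minimal sketch, assuming each queue entry's age is a (wrap, index) pair:

```python
# Sketch (assumed details) of using a wrap value to order entries in a circular
# queue: an entry allocated before the queue pointer wrapped compares as older
# even though its raw index may be numerically larger.
def is_older(entry_a, entry_b):
    """True if entry_a was allocated before entry_b. Entries are (wrap, index)."""
    wrap_a, idx_a = entry_a
    wrap_b, idx_b = entry_b
    if wrap_a != wrap_b:
        return wrap_a < wrap_b   # earlier wrap generation is older
    return idx_a < idx_b         # same generation: lower index is older
```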


    23. Hierarchical clock control using hysterisis and threshold management (Granted Patent, In Force)

    Publication No.: US09494997B2

    Publication Date: 2016-11-15

    Application No.: US14305514

    Filing Date: 2014-06-16

    Applicant: Apple Inc.

    Abstract: In some embodiments, a system may include a sub-hierarchy clock control. In some embodiments, the system may include a master unit. The master unit may include an interface unit electrically coupled to a slave unit. The interface unit may monitor, during use, usage requests of the slave unit by the master unit. In some embodiments, the interface unit may turn off clocks to the slave unit during periods of nonuse. In some embodiments, the interface unit may determine if a predetermined period of time elapses before turning on clocks to the slave unit such that turning off the slave unit resulted in the system achieving greater efficiency. In some embodiments, the interface unit may maintain, during use, power to the slave unit during periods of nonuse. The interface unit may maintain power to the slave unit during periods of nonuse such that data stored in the slave unit is preserved.
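The hysteresis/threshold behavior can be sketched as an idle counter that must cross a threshold before the clock is gated, so short idle gaps do not cause wasteful off/on churn. The threshold value and single-cycle model are assumptions, not from the patent:

```python
# Illustrative sketch of threshold-based clock gating in the interface unit.
class ClockGate:
    def __init__(self, idle_threshold=8):
        self.idle_threshold = idle_threshold
        self.idle_cycles = 0
        self.clock_on = True

    def tick(self, slave_busy):
        """Advance one cycle; gate the clock once idle time crosses the threshold."""
        if slave_busy:
            self.idle_cycles = 0
            self.clock_on = True          # usage request: clocks back on
        else:
            self.idle_cycles += 1
            if self.idle_cycles >= self.idle_threshold:
                self.clock_on = False     # long enough idle: turn clocks off
        return self.clock_on
```

Power to the slave would remain on throughout, so its stored state is preserved across gated periods.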


    24. HIERARCHICAL CLOCK CONTROL USING HYSTERISIS AND THRESHOLD MANAGEMENT (Patent Application, In Force)

    Publication No.: US20150362978A1

    Publication Date: 2015-12-17

    Application No.: US14305514

    Filing Date: 2014-06-16

    Applicant: Apple Inc.

    Abstract: In some embodiments, a system may include a sub-hierarchy clock control. In some embodiments, the system may include a master unit. The master unit may include an interface unit electrically coupled to a slave unit. The interface unit may monitor, during use, usage requests of the slave unit by the master unit. In some embodiments, the interface unit may turn off clocks to the slave unit during periods of nonuse. In some embodiments, the interface unit may determine if a predetermined period of time elapses before turning on clocks to the slave unit such that turning off the slave unit resulted in the system achieving greater efficiency. In some embodiments, the interface unit may maintain, during use, power to the slave unit during periods of nonuse. The interface unit may maintain power to the slave unit during periods of nonuse such that data stored in the slave unit is preserved.


    25. Prefetching across page boundaries in hierarchically cached processors (Granted Patent, In Force)

    Publication No.: US09047198B2

    Publication Date: 2015-06-02

    Application No.: US13689696

    Filing Date: 2012-11-29

    Applicant: Apple Inc.

    Abstract: Processors and methods for preventing lower level prefetch units from stalling at page boundaries. An upper level prefetch unit closest to the processor core issues a preemptive request for a translation of the next page in a given prefetch stream. The upper level prefetch unit sends the translation to the lower level prefetch units prior to the lower level prefetch units reaching the end of the current page for the given prefetch stream. When the lower level prefetch units reach the boundary of the current page, instead of stopping, these prefetch units can continue to prefetch by jumping to the next physical page number provided in the translation.
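The boundary jump can be sketched as follows, assuming a 4 KiB page and that the pre-translated next physical page number has already been handed down from the upper-level prefetch unit:

```python
# Sketch of the cross-page handoff: instead of stalling at the page boundary,
# the lower-level prefetcher jumps to the pre-translated next physical page.
PAGE_SIZE = 4096  # assumed 4 KiB pages

def next_prefetch_addr(phys_addr, stride, next_page_phys):
    """Advance the prefetch address; jump to the pre-translated next page
    when the stride crosses the current page boundary."""
    page_base = phys_addr & ~(PAGE_SIZE - 1)
    candidate = phys_addr + stride
    if candidate < page_base + PAGE_SIZE:
        return candidate                          # still inside the current page
    offset = candidate - (page_base + PAGE_SIZE)  # carry the offset into the new page
    return next_page_phys + offset
```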


    26. Cache policies for uncacheable memory requests (Granted Patent, In Force)

    Publication No.: US09043554B2

    Publication Date: 2015-05-26

    Application No.: US13725066

    Filing Date: 2012-12-21

    Applicant: Apple Inc.

    CPC classification number: G06F12/0811 G06F12/0815 G06F12/0888

    Abstract: Systems, processors, and methods for keeping uncacheable data coherent. A processor includes a multi-level cache hierarchy, and uncacheable load memory operations can be cached at any level of the cache hierarchy. If an uncacheable load misses in the L2 cache, then allocation of the uncacheable load will be restricted to a subset of the ways of the L2 cache. If an uncacheable store memory operation hits in the L1 cache, then the hit cache line can be updated with the data from the memory operation. If the uncacheable store misses in the L1 cache, then the uncacheable store is sent to a core interface unit. Multiple contiguous store misses are merged into larger blocks of data in the core interface unit before being sent to the L2 cache.
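The merging of contiguous store misses in the core interface unit can be sketched as coalescing adjacent (address, size) ranges into larger blocks; the buffer model and byte granularity are assumptions:

```python
# Sketch of merging contiguous uncacheable store misses into larger blocks
# before they are sent on to the L2 cache.
def merge_stores(stores):
    """Merge (addr, size) stores that are contiguous into larger blocks."""
    merged = []
    for addr, size in sorted(stores):
        if merged and merged[-1][0] + merged[-1][1] == addr:
            prev_addr, prev_size = merged[-1]
            merged[-1] = (prev_addr, prev_size + size)  # extend the previous block
        else:
            merged.append((addr, size))
    return merged
```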


    Coprocessors with Bypass Optimization, Variable Grid Architecture, and Fused Vector Operations

    Publication No.: US20250094381A1

    Publication Date: 2025-03-20

    Application No.: US18959080

    Filing Date: 2024-11-25

    Applicant: Apple Inc.

    Abstract: In an embodiment, a coprocessor may include a plurality of processing element circuits arranged in a first grid, where a given coprocessor instruction of an instruction set for the coprocessor is defined to cause evaluation of a second plurality of processing element circuits arranged in a second grid, where the second grid includes more processing element circuits than the first grid. The coprocessor may further include a scheduler circuit configured to issue instruction operations to the plurality of processing element circuits, where the scheduler circuit is configured to issue a given instruction operation corresponding to the given coprocessor instruction a plurality of times to complete the given coprocessor instruction, wherein different issuances of the given instruction operation are configured to cause respective different portions of the evaluation defined by the given coprocessor instruction to be performed.
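The reissue scheme can be sketched as slicing the larger logical grid into physical-grid-sized row ranges, one per issuance of the instruction operation; the grid dimensions below are invented for illustration:

```python
# Sketch of a partial-grid implementation completing a full-grid instruction:
# each issuance covers a different slice of the logical grid's rows.
def issue_plan(logical_rows, physical_rows):
    """Return the row range covered by each issuance of the instruction op."""
    plan = []
    for start in range(0, logical_rows, physical_rows):
        plan.append((start, min(start + physical_rows, logical_rows)))
    return plan
```

A full-grid implementation degenerates to a single issuance covering all rows.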

    28. DSB Operation with Excluded Region (Patent Application)

    Publication No.: US20220083338A1

    Publication Date: 2022-03-17

    Application No.: US17469504

    Filing Date: 2021-09-08

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor included in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even if one or more load/store operations directed to addresses within the exclusion region remain outstanding when the first processor responds.
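The completion condition can be sketched as a predicate over the addresses of outstanding operations; the flat address list and half-open range representation are assumptions:

```python
# Sketch of the exclusion-region check: the barrier may be reported complete
# once every outstanding load/store *outside* the excluded range has drained.
def barrier_complete(outstanding_addrs, excl_start, excl_end):
    """True if all outstanding ops target addresses inside [excl_start, excl_end)."""
    return all(excl_start <= addr < excl_end for addr in outstanding_addrs)
```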

    29. UNIFIED ADDRESS TRANSLATION (Patent Application)

    Publication No.: US20210064539A1

    Publication Date: 2021-03-04

    Application No.: US16874997

    Filing Date: 2020-05-15

    Applicant: Apple Inc.

    Abstract: A system and method for efficiently transferring address mappings and data access permissions corresponding to the address mappings. A computing system includes at least one processor and memory for storing a page table. In response to receiving a memory access operation comprising a first address, the address translation unit is configured to identify a data access permission based on a permission index corresponding to the first address, and access data stored in a memory location of the memory identified by a second address in a manner defined by the retrieved data access permission. The address translation unit is configured to access a table to identify the data access permission, and is configured to determine the permission index and the second address based on the first address. A single permission index may correspond to different permissions for different entities within the system.
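The permission-index indirection can be sketched with per-entity lookup tables, so the same index resolves to different permissions for different requesters; the table contents and entity names are invented for illustration:

```python
# Sketch of permission-index resolution: a translation yields a physical
# address plus a small permission index, and each entity resolves that index
# through its own table.
PERMISSION_TABLES = {
    "cpu": {0: "rw", 1: "r",  2: "none"},  # hypothetical per-entity tables
    "gpu": {0: "r",  1: "rw", 2: "none"},
}

def resolve_permission(entity, perm_index):
    """Look up what a permission index means for a given requesting entity."""
    return PERMISSION_TABLES[entity][perm_index]
```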

    Coprocessors with Bypass Optimization, Variable Grid Architecture, and Fused Vector Operations

    Publication No.: US20200272597A1

    Publication Date: 2020-08-27

    Application No.: US16286170

    Filing Date: 2019-02-26

    Applicant: Apple Inc.

    Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.
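The bypass indication can be sketched as a per-instruction mask over execution stages, where flagged stages are disabled rather than evaluated; the stage names and mask encoding are assumptions:

```python
# Sketch of the bypass indication: bit i of the mask set means the output of
# STAGES[i] is unused for this instruction, so that stage is skipped.
STAGES = ["multiply", "accumulate", "saturate"]  # hypothetical stage names

def active_stages(bypass_mask):
    """Return the stages that actually evaluate under the given bypass mask."""
    return [stage for i, stage in enumerate(STAGES)
            if not (bypass_mask >> i) & 1]   # stage output is needed: evaluate it
```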
