NEURAL PROCESSING DEVICE
    1.
    Invention publication

    Publication No.: US20230385198A1

    Publication date: 2023-11-30

    Application No.: US18448102

    Filing date: 2023-08-10

    Applicant: Rebellions Inc.

    IPC classification: G06F12/084

    CPC classification: G06F12/084 G06F2212/622

    Abstract: A neural processing device is provided. The neural processing device comprises: a processing unit configured to perform calculations, an L0 memory configured to receive data from the processing unit and provide data to the processing unit, and an LSU (Load/Store Unit) configured to perform load and store operations of the data, wherein the LSU comprises: a neural core load unit configured to issue a load instruction of the data, a neural core store unit configured to issue a store instruction for transmitting and storing the data, and a sync ID logic configured to provide a sync ID to the neural core load unit and the neural core store unit to thereby cause a synchronization signal to be generated for each sync ID.

    ITERATOR REGISTER FOR STRUCTURED MEMORY
    3.
    Invention application

    Publication No.: US20170249992A1

    Publication date: 2017-08-31

    Application No.: US15461262

    Filing date: 2017-03-16

    Applicant: Intel Corporation

    IPC classification: G11C15/00

    Abstract: Loading data from a computer memory system is disclosed. A memory system is provided, wherein some or all data stored in the memory system is organized as one or more pointer-linked data structures. One or more iterator registers are provided. A first pointer chain is loaded, having two or more pointers leading to a first element of a selected pointer-linked data structure to a selected iterator register. A second pointer chain is loaded, having two or more pointers leading to a second element of the selected pointer-linked data structure to the selected iterator register. The loading of the second pointer chain reuses portions of the first pointer chain that are common with the second pointer chain. Modifying data stored in a computer memory system is disclosed. A memory system is provided. One or more iterator registers are provided, wherein the iterator registers each include two or more pointer fields for storing two or more pointers that form a pointer chain leading to a data element. A local state associated with a selected iterator register is generated by performing one or more register operations relating to the selected iterator register and involving pointers in the pointer fields of the selected iterator register. A pointer-linked data structure is updated in the memory system according to the local state.
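
    The pointer-chain reuse described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the names Node, IteratorRegister, load and derefs are hypothetical and do not come from the patent, and the pointer-linked data structure is modeled as a small tree of dictionaries. The register keeps the chain of pointers to the last loaded element, so a second load follows only the pointers that differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Interior node (children) or leaf (value) of a pointer-linked tree."""
    children: dict = field(default_factory=dict)
    value: object = None


class IteratorRegister:
    """Hypothetical register caching the pointer chain to the current element."""

    def __init__(self, root):
        self.root = root
        self.keys = []    # path keys of the cached chain
        self.chain = []   # node pointers reached after each key
        self.derefs = 0   # pointer dereferences performed so far

    def load(self, path):
        # Length of the prefix shared with the previously loaded chain.
        common = 0
        while (common < len(path) and common < len(self.keys)
               and path[common] == self.keys[common]):
            common += 1
        # Keep the shared pointers and drop the rest.
        self.keys = self.keys[:common]
        self.chain = self.chain[:common]
        node = self.chain[-1] if self.chain else self.root
        # Follow only the pointers that are not already cached.
        for key in path[common:]:
            node = node.children[key]
            self.derefs += 1
            self.keys.append(key)
            self.chain.append(node)
        return node.value


def build_tree():
    """Build a three-level tree with two leaves under the same parent."""
    leaf_a, leaf_b = Node(value="a"), Node(value="b")
    mid = Node(children={0: leaf_a, 1: leaf_b})
    top = Node(children={0: mid})
    return Node(children={0: top})


reg = IteratorRegister(build_tree())
first = reg.load([0, 0, 0])    # follows three pointers
second = reg.load([0, 0, 1])   # reuses two cached pointers, follows one
```

    In this sketch the second load costs one dereference instead of three, because the first two pointers of the chain are shared with the first load.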

    Set selection of a set-associative storage container
    7.
    Invention grant
    Status: In force

    Publication No.: US09495300B2

    Publication date: 2016-11-15

    Application No.: US15067305

    Filing date: 2016-03-11

    IPC classification: G06F12/08 G06F12/12

    Abstract: A computer-implemented method includes generating a vector that is a random number. Two or more residue functions are applied to the vector to produce a state signal including a different number of states. A set status of a set-associative storage container in a computer system is determined. The set status identifies whether each set of the set-associative storage container is enabled or disabled. One of the state signals is selected that has a same number of states as a number of the sets that are enabled. The selected state signal is mapped to the sets that are enabled to assign each of the states of the selected state signal to a corresponding one of the sets that are enabled. A set selection of the set-associative storage container is output based on the mapping to randomly select one of the sets that are enabled from the set-associative storage container.

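    The steps in this abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name select_set, the vector_bits parameter, and the choice of plain modulo as the residue function are all assumptions made for the example.

```python
import random


def select_set(enabled, vector_bits=16, rng=random):
    """Pick one enabled set index; `enabled` holds one boolean per set."""
    enabled_sets = [i for i, on in enumerate(enabled) if on]
    if not enabled_sets:
        raise ValueError("no set is enabled")
    # Step 1: generate a random vector (here a vector_bits-wide integer).
    vector = rng.getrandbits(vector_bits)
    # Step 2: apply one residue function per possible count of enabled sets;
    # `vector % n` yields a state signal with n states (assumed residue form).
    state_signals = {n: vector % n for n in range(1, len(enabled) + 1)}
    # Step 3: select the signal whose state count equals the enabled count.
    state = state_signals[len(enabled_sets)]
    # Step 4: map that state onto the enabled sets and output the selection.
    return enabled_sets[state]


# Set 1 is disabled, so only sets 0, 2 and 3 can be chosen.
chosen = select_set([True, False, True, True])
```

    With a plain modulo, the selection is only approximately uniform unless the number of possible vector values is a multiple of the number of enabled sets; a wider vector reduces that bias.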

    Photonics-optimized processor system
    8.
    Invention grant
    Status: In force

    Publication No.: US09495295B1

    Publication date: 2016-11-15

    Application No.: US14822778

    Filing date: 2015-08-10

    IPC classification: G06F12/08 G11C7/10 G06F15/78

    Abstract: A photonics-optimized multi-processor system may include a plurality of processor chips, each of the processor chips comprising at least one input/output (I/O) component. The multi-processor system may also include first and second photonic components. The at least one I/O component of at least one of the processor chips may be configured to directly drive the first photonic component and receive a signal from the second photonic component. A total latency from any one of the processor chips to data at any global memory location may not be dominated by a round trip speed-of-light propagation delay. A number of the processor chips may be at least 10,000, and the processor chips may be packaged into a total volume of no more than 8 m³. A density of the processor chips may be greater than 1,000 chips per cubic meter.


    Non-Temporal Write Combining Using Cache Resources
    9.
    Invention application
    Status: In force

    Publication No.: US20160314069A1

    Publication date: 2016-10-27

    Application No.: US14691971

    Filing date: 2015-04-21

    IPC classification: G06F12/08 G06F12/12

    Abstract: A method and apparatus for performing non-temporal write combining using existing cache resources is disclosed. In one embodiment, a method includes executing a first thread on a processor core, the first thread including a first block initialization store (BIS) instruction. A cache query may be performed responsive to the BIS instruction, and if the query results in a cache miss, a cache line may be installed in a cache in an unordered dirty state in which it is exclusively owned by the first thread. The first BIS instruction and one or more additional BIS instructions may write data from the first processor core into the first cache line. After a cache coherence response is received, the state of the first cache line may be changed to an ordered dirty state in which it is no longer exclusive to the first thread.

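    The cache line life cycle in this abstract can be sketched as a small state machine in Python. This is a toy model under stated assumptions: the names LineState, Cache, bis_store and coherence_response are hypothetical, the 64-byte line size is assumed, and raising an error when another thread touches an unordered dirty line is a stand-in for behavior the abstract does not specify.

```python
from enum import Enum


class LineState(Enum):
    UNORDERED_DIRTY = "unordered dirty"   # exclusive to the installing thread
    ORDERED_DIRTY = "ordered dirty"       # no longer exclusive to that thread


class Cache:
    """Toy cache keyed by line base address (names and sizes are assumed)."""
    LINE_BYTES = 64

    def __init__(self):
        self.lines = {}

    def bis_store(self, thread_id, addr, byte):
        base = addr - addr % self.LINE_BYTES
        line = self.lines.get(base)
        if line is None:
            # Cache miss: install the line in the unordered dirty state,
            # exclusively owned by the thread that issued the BIS store.
            line = {"state": LineState.UNORDERED_DIRTY,
                    "owner": thread_id,
                    "data": bytearray(self.LINE_BYTES)}
            self.lines[base] = line
        elif (line["state"] is LineState.UNORDERED_DIRTY
              and line["owner"] != thread_id):
            # Assumption: the abstract does not say what happens here.
            raise PermissionError("line is exclusive to another thread")
        # Combine this write into the line.
        line["data"][addr - base] = byte

    def coherence_response(self, addr):
        # The coherence response moves the line to the ordered dirty state.
        base = addr - addr % self.LINE_BYTES
        line = self.lines[base]
        line["state"] = LineState.ORDERED_DIRTY
        line["owner"] = None


cache = Cache()
cache.bis_store(thread_id=1, addr=0x100, byte=0xAA)   # miss: installs the line
cache.bis_store(thread_id=1, addr=0x101, byte=0xBB)   # combines into same line
cache.coherence_response(0x100)                       # line becomes ordered
```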

    ROLE BASED CACHE COHERENCE BUS TRAFFIC CONTROL
    10.
    Invention application
    Status: Under examination (published)

    Publication No.: US20160246721A1

    Publication date: 2016-08-25

    Application No.: US14626913

    Filing date: 2015-02-19

    IPC classification: G06F12/08 G06F9/46 G06F9/455

    Abstract: A method for controlling cache snoop and/or invalidate coherence traffic for specific caches based on transaction attributes is described. A memory management unit (MMU) determines one or more transaction attributes for a cache coherence transaction from a requesting processor. A routing module identifies a cachability domain and/or shareability domain based on the transaction attributes and routes the cache coherence transaction to one or more caches in the cachability domain and/or shareability domain. Instead of coherence traffic being routed to all caches on a coherence bus, coherence traffic is selectively routed based on transaction attributes such as an address space identifier (ASID), a virtual machine identifier (VMID), a secure bit (NS), a hypervisor identifier (HYP), etc.

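    The selective routing in this abstract can be sketched in Python as follows. This is an illustration only: the names Attributes, CacheAgent and route_snoop are hypothetical, only the VMID and NS attributes are modeled, and the rule that a secure-only cache ignores non-secure transactions is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attributes:
    """Transaction attributes, as an MMU might report them (assumed subset)."""
    vmid: int          # virtual machine identifier
    non_secure: bool   # NS bit


@dataclass
class CacheAgent:
    name: str
    vmids: frozenset            # virtual machines this cache serves
    secure_only: bool = False   # assumed: holds secure-world data only


def route_snoop(attrs, caches, requester):
    """Return the names of caches that should receive the transaction."""
    targets = []
    for cache in caches:
        if cache.name == requester:
            continue   # never snoop the requesting cache itself
        if attrs.vmid not in cache.vmids:
            continue   # outside the shareability domain of this VM
        if cache.secure_only and attrs.non_secure:
            continue   # assumed rule for the NS bit
        targets.append(cache.name)
    return targets


caches = [
    CacheAgent("L1-0", frozenset({1})),
    CacheAgent("L1-1", frozenset({1, 2})),
    CacheAgent("L1-2", frozenset({2})),
    CacheAgent("L1-sec", frozenset({1}), secure_only=True),
]
targets = route_snoop(Attributes(vmid=1, non_secure=True), caches, "L1-0")
```

    Here only one of the four caches receives the snoop, rather than every cache on the coherence bus.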