Executing multiple programs simultaneously on a processor core

    Publication No.: US11531552B2

    Publication Date: 2022-12-20

    Application No.: US15425632

    Filing Date: 2017-02-06

    IPC Classification: G06F9/38 G06F9/30

    Abstract: Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
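    The two allocation modes in this abstract can be illustrated with a minimal sketch. The even split and the fixed round-robin order are assumptions made for illustration, since the abstract does not name a policy; the function names `spatial_partition` and `temporal_schedule` are likewise hypothetical.

```python
def spatial_partition(num_units, contexts):
    """Split resource units (e.g. cache ways) evenly between contexts.

    Assumption: an even, static split. The abstract only says that
    resources are spatially allocated, not how.
    """
    share = num_units // len(contexts)
    return {
        ctx: list(range(i * share, (i + 1) * share))
        for i, ctx in enumerate(contexts)
    }


def temporal_schedule(contexts, num_cycles):
    """Give each clock cycle to exactly one context.

    Assumption: fixed round-robin order, so every context is
    guaranteed a turn and cannot be starved by the others.
    """
    return [contexts[cycle % len(contexts)] for cycle in range(num_cycles)]


if __name__ == "__main__":
    print(spatial_partition(8, ["ctx0", "ctx1"]))
    print(temporal_schedule(["ctx0", "ctx1"], 4))
```

    In both cases a context's share is fixed in advance, which is one simple way to obtain the guaranteed access the abstract describes.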

    Coupling wide memory interface to wide write back paths

    Publication No.: US10963379B2

    Publication Date: 2021-03-30

    Application No.: US15887640

    Filing Date: 2018-02-02

    Abstract: Systems and methods are disclosed for performing wide memory operations for a wide data cache line. In some examples of the disclosed technology, a processor having two or more execution lanes includes a data cache coupled to memory, a wide memory load circuit that concurrently loads two or more words from a cache line of the data cache, and a writeback circuit situated to send a respective word of the concurrently-loaded words to a selected execution lane of the processor, either into an operand buffer or bypassing the operand buffer. In some examples, a sharding circuit is provided that allows bitwise, byte-wise, and/or word-wise manipulation of memory operation data. In some examples, wide cache loads allow for concurrent execution of plural execution lanes of the processor.
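    As a rough software analogy for the wide load and writeback described above, the sketch below splits one cache line into words and routes each word to a selected lane. The little-endian word layout, the `lane_for_word` routing table, and the function name `wide_load` are assumptions for illustration only; the operand buffer and sharding circuit are not modeled.

```python
def wide_load(cache_line, word_bytes, lane_for_word):
    """Load every word of a cache line at once and route words to lanes.

    cache_line:    bytes object holding the whole line
    word_bytes:    size of one word in bytes
    lane_for_word: hypothetical routing table {word index: lane index}
    Returns {lane index: word value} for the routed words.
    """
    # All words are extracted together, modeling the concurrent load.
    words = [
        int.from_bytes(cache_line[i:i + word_bytes], "little")
        for i in range(0, len(cache_line), word_bytes)
    ]
    # Writeback step: send each selected word to its execution lane.
    return {lane_for_word[i]: w for i, w in enumerate(words) if i in lane_for_word}


if __name__ == "__main__":
    line = bytes(range(8))  # an 8-byte line holding two 4-byte words
    print(wide_load(line, 4, {0: 1, 1: 0}))
```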

Commit logic and precise exceptions in explicit dataflow graph execution architectures

    Publication No.: US20200089503A1

    Publication Date: 2020-03-19

    Application No.: US16224600

    Filing Date: 2018-12-18

    IPC Classification: G06F9/38 G06F9/30

    Abstract: Systems and methods are disclosed for executing instructions with a block-based processor. Instructions can be executed in any order as their dependencies arrive, but the individual instructions are committed in a serial fashion. Further, exception handling can be performed by storing transient state for an instruction block and resuming by restoring the transient state. This allows programmers to see intermediate state for the instruction block before the subject block has committed. In one example of the disclosed technology, a method of operating a processor executing a block-based instruction set architecture includes executing at least one instruction encoded for an instruction block, responsive to determining that an individual instruction of the instruction block can commit, advancing a commit frontier for the instruction block to include all instructions in the instruction block that can commit, and committing one or more instructions inside the advanced commit frontier.
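    One way to picture the commit frontier is as an index into the block's instructions in program order. The sketch below assumes the frontier covers the longest contiguous prefix of instructions that can commit, which is an interpretation of the abstract and not a statement of the claimed method; `advance_commit_frontier` is a hypothetical name.

```python
def advance_commit_frontier(can_commit, frontier):
    """Advance the frontier past every instruction that can now commit.

    can_commit: list of booleans in program order; instructions may
                become ready in any order (out-of-order execution)
    frontier:   index of the first instruction not yet inside the frontier
    Assumption: commit is serial, so the frontier stops at the first
    instruction that cannot commit yet.
    """
    while frontier < len(can_commit) and can_commit[frontier]:
        frontier += 1
    return frontier


if __name__ == "__main__":
    ready = [True, True, False, True]  # instruction 3 finished early
    frontier = advance_commit_frontier(ready, 0)
    print(frontier)  # instructions 0 and 1 are inside the frontier
    ready[2] = True  # instruction 2 finishes later
    print(advance_commit_frontier(ready, frontier))
```

    Instruction 3 completes before instruction 2 in this example but is only admitted to the frontier once instruction 2 can commit, which matches the serial commit order the abstract describes.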

    Performance modeling and analysis of microprocessors using dependency graphs

    Publication No.: US11734480B2

    Publication Date: 2023-08-22

    Application No.: US16224718

    Filing Date: 2018-12-18

    IPC Classification: G06F30/30 G06F11/34

    Abstract: Embodiments described herein are directed to a microarchitecture modeling tool configured to model and analyze a microarchitecture using a dependency graph. The dependency graph may be generated based on an execution trace of a program and a microarchitecture definition that specifies various features and/or characteristics of the microarchitecture on which the execution trace is based. The dependency graph includes vertices representing different microarchitectural events. The vertices are coupled via edges representing a particular dependency therebetween. The edges are associated with a cost for performing microarchitectural event(s) corresponding to the vertices coupled thereto. The dependency graph also takes into account various policies for structural hazards of the microarchitecture. The microarchitecture modeling tool analyzes the costs associated with each of the edges to determine a design metric of the microarchitecture. A user is enabled to modify various features of the dependency graph to analyze different design choices and/or optimizations to the microarchitecture.
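    A small sketch of analyzing edge costs in such a dependency graph follows. It assumes the design metric of interest is the critical-path length (the costliest chain of dependent events), which the abstract does not specify; the event names and the function `critical_path_cost` are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def critical_path_cost(edges):
    """Return the cost of the most expensive dependency chain.

    edges: list of (src_event, dst_event, cost) tuples forming a DAG,
           where dst_event depends on src_event.
    """
    preds = {}  # vertex -> {predecessor: edge cost}
    for src, dst, cost in edges:
        preds.setdefault(src, {})
        preds.setdefault(dst, {})[src] = cost

    longest = {}  # vertex -> cost of the longest path ending there
    order = TopologicalSorter({v: set(p) for v, p in preds.items()}).static_order()
    for v in order:  # predecessors are always visited first
        longest[v] = max((longest[u] + c for u, c in preds[v].items()), default=0)
    return max(longest.values(), default=0)


if __name__ == "__main__":
    # Two instructions; exec1 needs the result of exec0 (cost 3).
    trace_edges = [
        ("fetch0", "exec0", 1), ("exec0", "commit0", 2),
        ("fetch0", "fetch1", 1), ("fetch1", "exec1", 1),
        ("exec0", "exec1", 3), ("exec1", "commit1", 2),
    ]
    print(critical_path_cost(trace_edges))
```

    Changing an edge cost and recomputing is a simple stand-in for the kind of what-if analysis of design choices that the abstract mentions.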

    Enabling peripheral device messaging via application portals in processor-based devices

    Publication No.: US11366769B1

    Publication Date: 2022-06-21

    Application No.: US17185855

    Filing Date: 2021-02-25

    IPC Classification: G06F13/12 G06F9/30

    Abstract: Enabling peripheral device messaging via application portals in processor-based devices is disclosed herein. In one embodiment, a processor-based device comprises a processing element (PE) including an application portal configured to logically operate as a message store, and that is exposed as an application portal address within an address space visible to a peripheral device that is communicatively coupled to the processor-based device. Upon receiving a message directed to the application portal address from the peripheral device, an application portal control circuit enqueues the message in the application portal. In some embodiments, the PE may further provide a dequeue instruction that may be executed as part of the application, and that results in a top element of the application portal being dequeued and transmitted to the application. Some embodiments may provide further mechanisms for sending success and/or failure notifications, and/or for informing the application that the message has been enqueued.
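    The enqueue and dequeue behavior can be modeled in software as a bounded queue. In the sketch below the class name `ApplicationPortal`, the fixed capacity, the FIFO ordering, and the boolean success/failure return value are all assumptions for illustration; the abstract describes hardware circuits and does not specify these details.

```python
from collections import deque


class ApplicationPortal:
    """Software model of a portal acting as a message store."""

    def __init__(self, address, capacity):
        self.address = address    # portal address visible to the peripheral
        self.capacity = capacity  # assumed bound on stored messages
        self._queue = deque()

    def receive(self, target_address, message):
        """Model the control circuit: enqueue a message sent to this portal.

        Returns True on success and False on failure (wrong address or
        portal full), standing in for a success/failure notification.
        """
        if target_address != self.address or len(self._queue) >= self.capacity:
            return False
        self._queue.append(message)
        return True

    def dequeue(self):
        """Model the dequeue instruction: hand the oldest message to the app."""
        return self._queue.popleft() if self._queue else None


if __name__ == "__main__":
    portal = ApplicationPortal(address=0x1000, capacity=2)
    print(portal.receive(0x1000, "packet ready"))
    print(portal.dequeue())
```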

    Reach matrix scheduler circuit for scheduling instructions to be executed in a processor

    Publication No.: US11803389B2

    Publication Date: 2023-10-31

    Application No.: US16738362

    Filing Date: 2020-01-09

    IPC Classification: G06F9/38

    CPC Classification: G06F9/3838 G06F9/3836

    Abstract: A reach matrix scheduler circuit for scheduling instructions to be executed in a processor is disclosed. The scheduler circuit includes an N×R matrix wake-up circuit, where ‘N’ is the instruction window size of the scheduler circuit, and ‘R’ is the “reach” within the instruction window of the matrix wake-up circuit, with ‘R’ being less than ‘N’. A grant line associated with each instruction request entry in the N×R matrix wake-up circuit is coupled to ‘R’ other instruction entries among the ‘N’ instruction entries. When a producer instruction in an instruction request entry is ready for issuance, the grant line associated with the instruction request entry is activated so that any other instruction entries coupled to the grant line (i.e., within the “reach” of the instruction request entry) that consume the produced value generated by the producer instruction are “woken-up” and subsequently indicated as ready to be issued.
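    The effect of a limited reach can be shown with a short sketch. It assumes that an entry's grant line is coupled to the next R entries in a circular window, and that each consumer waits on a single producer; both are simplifying assumptions, and `wake_up` and `consumers_of` are hypothetical names not taken from the patent.

```python
def wake_up(n, r, producer, consumers_of):
    """Return the entries woken when `producer` is granted issue.

    n:            instruction window size (N)
    r:            reach of each grant line (R < N)
    producer:     index of the entry whose grant line is activated
    consumers_of: {producer index: iterable of consumer entry indices}
    Assumption: a grant line reaches the next r entries, wrapping around.
    """
    reach = {(producer + k) % n for k in range(1, r + 1)}
    # Only consumers coupled to the grant line (inside the reach) wake up.
    return sorted(e for e in consumers_of.get(producer, ()) if e in reach)


if __name__ == "__main__":
    # Window of 8 entries, reach of 3: entry 6 can wake entries 7, 0, 1.
    print(wake_up(8, 3, 6, {6: [7, 0, 2]}))
```

    In this model entry 2 consumes the value from entry 6 but lies outside the reach, so it is not woken by the grant line; how such consumers are handled is not covered by the abstract.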