-
Publication No.: US20170139714A1
Publication date: 2017-05-18
Application No.: US15357943
Application date: 2016-11-21
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah
IPC: G06F9/30
CPC classification number: G06F9/30123 , G06F9/30043 , G06F9/3009 , G06F9/30134 , G06F9/30138 , G06F9/3016 , G06F9/34 , G06F9/342 , G06F9/3808 , G06F9/3824 , G06F9/3826 , G06F9/383 , G06F9/3838 , G06F9/384 , G06F9/3851 , G06F9/3853 , G06F9/3857 , G06F9/3885 , G06F9/3891 , G06F9/462 , G06F9/4843
Abstract: A unified architecture for dynamic generation, execution, synchronization and parallelization of complex instruction formats includes a virtual register file, register cache and register file hierarchy. A self-generating and synchronizing dynamic and static threading architecture provides efficient context switching.
-
12.
Publication No.: US11294680B2
Publication date: 2022-04-05
Application No.: US16671109
Application date: 2019-10-31
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah
Abstract: A microprocessor implemented method is disclosed. The method includes mapping a plurality of instructions in a guest address space to corresponding instructions in a native address space. The method further includes, for each of one or more guest branch instructions in said native address space fetched during execution, performing the following: determining a youngest prior guest branch target stored in a guest branch target register, determining a branch target for a respective guest branch instruction by adding an offset value for said respective guest branch instruction to said youngest prior guest branch target, where said offset value is adjusted to account for a difference in address in said guest address space between an instruction at a beginning of a guest instruction block and a branch instruction in said guest instruction block. The method further includes creating an entry in said guest branch target register for said branch target.
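The target computation described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `GuestBranchTargetRegister`, the function `resolve_guest_branch`, and the reading that the offset adjustment is the branch's distance from the start of its guest block are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GuestBranchTargetRegister:
    """Hypothetical model: guest branch targets in the order they were created."""
    targets: List[int] = field(default_factory=list)

    def youngest(self) -> int:
        # The most recently recorded guest branch target.
        return self.targets[-1]

    def record(self, target: int) -> None:
        self.targets.append(target)


def resolve_guest_branch(gbtr: GuestBranchTargetRegister,
                         displacement: int,
                         branch_distance_from_block_start: int) -> int:
    """Return the guest-space target of a branch and record it.

    Assumption: the youngest prior target is the start of the current guest
    block, so the branch's own displacement is adjusted by how far the branch
    sits from that block start.
    """
    adjusted_offset = displacement + branch_distance_from_block_start
    target = gbtr.youngest() + adjusted_offset
    gbtr.record(target)  # create an entry for the new branch target
    return target


gbtr = GuestBranchTargetRegister([0x1000])
first = resolve_guest_branch(gbtr, displacement=0x20, branch_distance_from_block_start=0x8)
print(hex(first))
```

Each resolved target becomes the "youngest prior" target for the next branch, so a chain of relative branches can be followed without translating back to guest addresses at every step.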
-
Publication No.: US09965281B2
Publication date: 2018-05-08
Application No.: US15357943
Application date: 2016-11-21
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah
IPC: G06F9/30
CPC classification number: G06F9/30123 , G06F9/30043 , G06F9/3009 , G06F9/30134 , G06F9/30138 , G06F9/3016 , G06F9/34 , G06F9/342 , G06F9/3808 , G06F9/3824 , G06F9/3826 , G06F9/383 , G06F9/3838 , G06F9/384 , G06F9/3851 , G06F9/3853 , G06F9/3857 , G06F9/3885 , G06F9/3891 , G06F9/462 , G06F9/4843
Abstract: A unified architecture for dynamic generation, execution, synchronization and parallelization of complex instruction formats includes a virtual register file, register cache and register file hierarchy. A self-generating and synchronizing dynamic and static threading architecture provides efficient context switching.
-
14.
Publication No.: US11163720B2
Publication date: 2021-11-02
Application No.: US16371831
Application date: 2019-04-01
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah
Abstract: An execution unit to execute instructions using a time-lag sliced architecture (TLSA). The execution unit includes a first computation unit and a second computation unit, where each of the first computation unit and the second computation unit includes a plurality of logic slices arranged in order, where each of the plurality of logic slices except a lattermost logic slice is coupled to an immediately following logic slice to provide an output of that logic slice to the immediately following logic slice, where the immediately following logic slice is to execute with a time lag with respect to its immediately previous logic slice. Further, each of the plurality of logic slices of the second computation unit is coupled to a corresponding logic slice of the first computation unit to receive an output of the corresponding logic slice of the first computation unit.
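One way to picture the slice coupling in this abstract is a pair of sliced adders, where each slice hands its carry to the next slice one step later and the second unit consumes each slice of the first unit's result as soon as it exists. The sketch below is only a software analogy under assumed parameters (`SLICE_BITS`, `NUM_SLICES`, and the function `tlsa_chained_add` are hypothetical), not a description of the claimed hardware.

```python
SLICE_BITS = 8          # assumed slice width
NUM_SLICES = 4          # assumed number of slices per computation unit
MASK = (1 << SLICE_BITS) - 1


def split(value):
    """Split a value into NUM_SLICES chunks, least significant first."""
    return [(value >> (i * SLICE_BITS)) & MASK for i in range(NUM_SLICES)]


def join(parts):
    """Reassemble chunks produced by split()."""
    return sum(p << (i * SLICE_BITS) for i, p in enumerate(parts))


def add_slice(a_part, b_part, carry_in):
    """One logic slice: returns (sum bits, carry for the following slice)."""
    total = a_part + b_part + carry_in
    return total & MASK, total >> SLICE_BITS


def tlsa_chained_add(a, b, c):
    """Compute (a + b) + c modulo 2**(SLICE_BITS * NUM_SLICES) on two units.

    Returns (result, schedule), where schedule lists (time_step, unit, slice).
    """
    a_parts, b_parts, c_parts = split(a), split(b), split(c)
    first_out, second_out, schedule = [], [], []
    carry1 = carry2 = 0
    for t in range(NUM_SLICES + 1):
        if t >= 1:
            # Unit 2, slice t-1: uses the output unit 1 produced one step ago.
            i = t - 1
            s, carry2 = add_slice(first_out[i], c_parts[i], carry2)
            second_out.append(s)
            schedule.append((t, 2, i))
        if t < NUM_SLICES:
            # Unit 1, slice t: lags slice t-1 by one step and takes its carry.
            s, carry1 = add_slice(a_parts[t], b_parts[t], carry1)
            first_out.append(s)
            schedule.append((t, 1, t))
    return join(second_out), schedule


result, schedule = tlsa_chained_add(0x12345678, 0x0FEDCBA9, 0xFFFFFFFF)
print(hex(result), schedule)
```

In this model the dependent addition finishes one step after the first addition, instead of waiting for all slices of the first result before starting.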
-
Publication No.: US10585670B2
Publication date: 2020-03-10
Application No.: US15944655
Application date: 2018-04-03
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah
Abstract: A processor architecture includes a register file hierarchy to implement virtual registers that provide a larger set of registers than those directly supported by an instruction set architecture to facilitate multiple copies of the same architecture register for different processing threads, where the register file hierarchy includes a plurality of hierarchy levels. The processor architecture further includes a plurality of execution units coupled to the register file hierarchy.
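As a rough illustration of the idea in this abstract, the sketch below models virtual registers keyed by (thread, architectural register) and held in a two-level hierarchy. The class name `RegisterFileHierarchy`, the two-level split, and the least-recently-used spill policy are assumptions chosen for the example; the abstract does not specify them.

```python
from collections import OrderedDict


class RegisterFileHierarchy:
    """Hypothetical two-level hierarchy of virtual registers."""

    def __init__(self, level0_capacity):
        self.level0 = OrderedDict()   # small, fast level, kept in LRU order
        self.level1 = {}              # larger backing level
        self.level0_capacity = level0_capacity

    def write(self, thread_id, arch_reg, value):
        key = (thread_id, arch_reg)   # each thread gets its own copy of arch_reg
        self.level1.pop(key, None)    # drop any stale spilled copy
        self.level0[key] = value
        self.level0.move_to_end(key)
        self._spill()

    def read(self, thread_id, arch_reg):
        key = (thread_id, arch_reg)
        if key in self.level0:
            self.level0.move_to_end(key)
            return self.level0[key]
        value = self.level1.pop(key)  # raises KeyError if never written
        self.level0[key] = value      # promote to the fast level
        self._spill()
        return value

    def _spill(self):
        # Move least recently used entries down when level 0 overflows.
        while len(self.level0) > self.level0_capacity:
            old_key, old_value = self.level0.popitem(last=False)
            self.level1[old_key] = old_value


rf = RegisterFileHierarchy(level0_capacity=2)
rf.write(0, "r1", 10)
rf.write(1, "r1", 20)   # same architectural register, different thread
rf.write(0, "r2", 30)   # forces thread 0's r1 down to level 1
print(rf.read(0, "r1"), rf.read(1, "r1"))
```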
-
16.
Publication No.: US20190227982A1
Publication date: 2019-07-25
Application No.: US16371831
Application date: 2019-04-01
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah
Abstract: An execution unit to execute instructions using a time-lag sliced architecture (TLSA). The execution unit includes a first computation unit and a second computation unit, where each of the first computation unit and the second computation unit includes a plurality of logic slices arranged in order, where each of the plurality of logic slices except a lattermost logic slice is coupled to an immediately following logic slice to provide an output of that logic slice to the immediately following logic slice, where the immediately following logic slice is to execute with a time lag with respect to its immediately previous logic slice. Further, each of the plurality of logic slices of the second computation unit is coupled to a corresponding logic slice of the first computation unit to receive an output of the corresponding logic slice of the first computation unit.
-
Publication No.: US10048964B2
Publication date: 2018-08-14
Application No.: US14569543
Application date: 2014-12-12
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah
Abstract: In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching. The method further includes, in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning the store retirement buffer for a first match; and forwarding data from the first match to the subsequent load.
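The search-and-forward step in this abstract can be sketched as below. The names `StoreEntry` and `StoreRetirementBuffer` are hypothetical, and the scan direction (youngest store toward oldest, so that the first match is the most recent store to that address) is an assumption; the abstract only says the first match is used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    address: int
    data: int


class StoreRetirementBuffer:
    """Hypothetical model: stores kept in original program order."""

    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []

    def add_store(self, address: int, data: int) -> None:
        # Appending preserves program order: oldest first, youngest last.
        self.entries.append(StoreEntry(address, data))

    def forward_to_load(self, load_address: int) -> Optional[int]:
        # Assumed scan direction: youngest to oldest, stop at the first match.
        for entry in reversed(self.entries):
            if entry.address == load_address:
                return entry.data
        return None  # no match: the load must read from memory instead


srb = StoreRetirementBuffer()
srb.add_store(0x100, 1)
srb.add_store(0x200, 2)
srb.add_store(0x100, 3)   # second store to the same address
print(srb.forward_to_load(0x100))
```

Because the buffer is ordered, the scan alone picks the forwarding entry when several stores match; no separate disambiguation pass is modeled here.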
-
18.
Publication No.: US20180137081A1
Publication date: 2018-05-17
Application No.: US15853323
Application date: 2017-12-22
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah
CPC classification number: G06F15/8007 , G06F7/483 , G06F7/5318 , G06F7/5338 , G06F7/5443 , G06F9/3001 , G06F9/30109 , G06F9/3012 , G06F9/30123 , G06F9/30141 , G06F9/3016 , G06F9/30181 , G06F9/30189 , G06F9/3824 , G06F9/3828 , G06F9/3838 , G06F9/3851 , G06F9/3853 , G06F9/3867 , G06F9/3885 , G06F9/3887 , G06F9/3889 , G06F9/3891 , G06F15/80
Abstract: A matrix of execution blocks forms a set of rows and columns. The rows support parallel execution of instructions and the columns support execution of dependent instructions. The matrix of execution blocks processes a single block of instructions specifying parallel and dependent instructions.
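The row and column roles in this abstract can be illustrated with a small software analogy. In the sketch below, `run_block` is a hypothetical helper: entries within one row are independent of each other, while each column passes its value down to the entry beneath it. Representing operations as one-argument Python callables is an assumption made for brevity.

```python
from typing import Callable, List, Optional, Sequence

Op = Optional[Callable[[int], int]]


def run_block(matrix: Sequence[Sequence[Op]], inputs: Sequence[int]) -> List[int]:
    """Evaluate a block of instructions laid out as rows and columns.

    Each column carries one dependency chain; None means "pass through".
    """
    values = list(inputs)
    for row in matrix:
        # Entries of a row do not depend on each other, so real hardware
        # could evaluate them in parallel; here they are simply mapped.
        values = [op(v) if op is not None else v for op, v in zip(row, values)]
    return values


block = [
    [lambda x: x + 1, lambda x: x * 2],   # row 0: two independent operations
    [lambda x: x * 3, None],              # row 1: depends on row 0, per column
]
print(run_block(block, [1, 5]))
```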
-
Publication No.: US09891924B2
Publication date: 2018-02-13
Application No.: US14214176
Application date: 2014-03-14
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah
CPC classification number: G06F9/3836 , G06F9/3802 , G06F9/3838 , G06F9/3851 , G06F9/3853 , G06F9/3857 , G06F9/3863
Abstract: A method for implementing a reduced size register view data structure in a microprocessor. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of multiplexers to access ports of a scheduling array to store the instruction blocks as a series of chunks.
-
Publication No.: US09891915B2
Publication date: 2018-02-13
Application No.: US14281663
Application date: 2014-05-19
Applicant: Intel Corporation
Inventor: Mohammad A. Abdallah , Ravishankar Rao
IPC: G06F12/00 , G06F9/30 , G06F12/1045
CPC classification number: G06F9/30043 , G06F12/1054
Abstract: A microprocessor implemented method for resolving dependencies for a load instruction in a load store queue (LSQ) is disclosed. The method comprises initiating a computation of a virtual address corresponding to the load instruction in a first clock cycle. It also comprises transmitting early calculated lower address bits of the virtual address to a load store queue (LSQ) in the same cycle as the initiating. Finally, it comprises performing a partial match in the LSQ responsive to and using the lower address bits to find a prior aliasing store, wherein the prior aliasing store stores to a same address as the load instruction.
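The partial match described in this abstract can be sketched as a two-step lookup. The constant `LOW_BITS`, the function names, and the later confirmation against the full address are assumptions for illustration; the abstract itself only describes the early partial match on the lower address bits.

```python
from typing import List, Optional, Tuple

LOW_BITS = 12                       # assumed number of early-available bits
LOW_MASK = (1 << LOW_BITS) - 1

Store = Tuple[int, int]             # (address, data), oldest first


def partial_match(store_queue: List[Store], load_low_bits: int) -> List[int]:
    """Indices of prior stores whose low address bits match, youngest first."""
    return [i for i in range(len(store_queue) - 1, -1, -1)
            if (store_queue[i][0] & LOW_MASK) == load_low_bits]


def resolve_load(store_queue: List[Store], load_address: int) -> Optional[int]:
    """Return data from the youngest prior aliasing store, or None."""
    # Step 1 (early): only the low bits of the load address are needed.
    candidates = partial_match(store_queue, load_address & LOW_MASK)
    # Step 2 (assumed): confirm candidates once the full address is known.
    for i in candidates:
        if store_queue[i][0] == load_address:
            return store_queue[i][1]
    return None


queue = [(0x10000ABC, 1), (0x20000ABC, 2), (0x10000DEF, 3)]
print(partial_match(queue, 0xABC), resolve_load(queue, 0x10000ABC))
```

The early step narrows the candidates before the full virtual address exists, which is why two stores with the same low bits but different upper bits both appear in the partial result.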
-