APPARATUS AND METHOD FOR TASK-SWITCHABLE SYNCHRONOUS HARDWARE ACCELERATORS
    1.
    Invention application (under examination, published)

    Publication number: WO2014105152A1

    Publication date: 2014-07-03

    Application number: PCT/US2013/046911

    Filing date: 2013-06-20

    CPC classification number: G06F9/3861 G06F9/30054 G06F9/30189 G06F9/3881

    Abstract: A processor comprising: execution logic to execute a first thread including an accelerator invocation instruction to invoke an accelerator command; an accelerator to execute an accelerator thread in response to the accelerator command, the accelerator to store state data associated with the accelerator thread in an application memory area in memory, wherein prior to executing the accelerator thread, the accelerator is to lock entries in a translation lookaside buffer (TLB) associated with the accelerator thread to prevent an exception which might otherwise result.

    APPARATUS AND METHOD FOR RETRIEVING ELEMENTS FROM A LINKED STRUCTURE
    2.
    Invention application (under examination, published)

    Publication number: WO2017112489A1

    Publication date: 2017-06-29

    Application number: PCT/US2016/066694

    Filing date: 2016-12-14

    CPC classification number: G06F9/30043 G06F9/30065 G06F9/3455

    Abstract: An apparatus and method are described for retrieving elements from a linked structure. For example, one embodiment of an apparatus comprises: a decode unit to decode a first instruction, the first instruction to utilize a current address value, an end address value, and an offset; and an execution unit to execute the first instruction to cause the execution unit to compare the current address value with the end address value, the execution unit to perform no additional operation with respect to the first instruction if the current address value is equal to the end address value; and if the current address value is not equal to the end address value, then the execution unit to add the offset value to the current address value to identify a next address pointer within an element structure, the execution unit to further set the current address value equal to the next address pointer.
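
    Illustrative sketch (not taken from the patent): the step described in the abstract can be modeled in software roughly as follows. The dictionary standing in for memory, the function names `next_element` and `walk`, and the reading that the next pointer is loaded from the address `current + offset` are all assumptions made for illustration.

```python
from typing import Dict, List


def next_element(memory: Dict[int, int], current: int, end: int, offset: int) -> int:
    """Model one execution of the hypothetical 'advance' instruction."""
    if current == end:
        # Current address equals the end address: perform no additional operation.
        return current
    # Add the offset to the current address to locate the next-pointer field
    # inside the element, then make that pointer the new current address.
    return memory[current + offset]


def walk(memory: Dict[int, int], start: int, end: int, offset: int) -> List[int]:
    """Collect the address of every element visited before reaching `end`."""
    visited = []
    current = start
    while current != end:
        visited.append(current)
        current = next_element(memory, current, end, offset)
    return visited


# Toy memory: three elements at addresses 100, 200, 300; each stores its
# next pointer 8 bytes into the element; address 0 marks the end.
toy_memory = {108: 200, 208: 300, 308: 0}
elements = walk(toy_memory, start=100, end=0, offset=8)
```

    Here `elements` holds the three element addresses in list order. A real implementation would read the pointer from memory in hardware; the dictionary lookup only mimics that load.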

    PROCESSING CORE HAVING SHARED FRONT END UNIT
    3.
    Invention application (under examination, published)

    Publication number: WO2014105207A1

    Publication date: 2014-07-03

    Application number: PCT/US2013/048694

    Filing date: 2013-06-28

    Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective microcode and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads. Each of the plurality of processing units also comprises: i) at least one set of functional units corresponding to a complete instruction set offered by the processor, the at least one set of functional units to execute its respective processing unit's received microcode; ii) registers coupled to the at least one set of functional units to store operands and resultants of the received microcode; iii) data fetch circuitry to fetch input operands for the at least one functional units' execution of the received microcode.

    INSTRUCTION AND LOGIC FOR DETECTING NUMERIC ACCUMULATION ERROR
    4.
    Invention application (under examination, published)

    Publication number: WO2018063705A1

    Publication date: 2018-04-05

    Application number: PCT/US2017/049339

    Filing date: 2017-08-30

    Abstract: A processor includes circuitry to decode at least one instruction and an execution unit. The decoded instruction may compute a floating point result. The execution unit includes circuitry to execute the instruction to determine the floating point result, compute the amount of precision lost in a mantissa of the floating point result, compare the amount of precision lost to a numeric accumulation error precision threshold, determine whether a numeric accumulation error occurred based on the comparison, and write a value to a flag. The amount of precision lost corresponds to a plurality of bits lost in the mantissa of the floating point result. The value to be written to the flag may be based on the determination that the numeric accumulation error occurred. The flag may be for notification that the numeric accumulation error occurred.
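
    Illustrative sketch (not taken from the patent): one way to picture the check is a software addition that estimates how many mantissa bits of the smaller operand are shifted out during exponent alignment and raises a flag when that count exceeds a threshold. The function name `add_with_error_flag`, the exponent-difference estimate, and the "greater than" comparison are assumptions. The abstract does not say how the lost precision is computed or compared. Finite double-precision inputs are assumed.

```python
import math

MANTISSA_BITS = 53  # significand width of an IEEE 754 double


def add_with_error_flag(acc: float, x: float, threshold_bits: int):
    """Return (result, estimated_lost_bits, flag) for acc + x."""
    result = acc + x
    if acc == 0.0 or x == 0.0:
        # Adding zero cannot discard any mantissa bits.
        return result, 0, False
    # frexp returns (mantissa, exponent); only the exponents are needed.
    _, exp_acc = math.frexp(acc)
    _, exp_x = math.frexp(x)
    # Estimate: the smaller operand is shifted right by the exponent gap,
    # so up to that many of its low-order bits fall off the mantissa.
    lost_bits = min(MANTISSA_BITS, abs(exp_acc - exp_x))
    flag = lost_bits > threshold_bits
    return result, lost_bits, flag


# Adding 1.0 to 2**60 leaves the accumulator unchanged, and the flag is set.
total, lost, accumulation_error = add_with_error_flag(2.0 ** 60, 1.0, threshold_bits=20)
```

    The estimate is an upper bound, because trailing zero bits in the smaller operand are counted as lost even though dropping them changes nothing.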

    USER-LEVEL FORK AND JOIN PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
    5.
    Invention application (under examination, published)

    Publication number: WO2016160125A1

    Publication date: 2016-10-06

    Application number: PCT/US2016/016700

    Filing date: 2016-02-05

    Abstract: A processor of an aspect includes a plurality of processor elements, and a first processor element. The first processor element may perform a user-level fork instruction of a software thread. The first processor element may include a decoder to decode the user-level fork instruction. The user-level fork instruction is to indicate at least one instruction address. The first processor element may also include a user-level thread fork module. The user-level fork module, in response to the user-level fork instruction being decoded, may configure each of the plurality of processor elements to perform instructions in parallel. Other processors, methods, systems, and instructions are disclosed.
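
    Illustrative sketch (not taken from the patent): as a loose software analogy, a fork hands the same entry point to several workers that run in parallel, and a join waits for all of them. Python threads stand in for the processor elements, and a callable stands in for the instruction address. The names `user_fork` and `user_join` are hypothetical.

```python
import threading
from typing import Callable, List, Tuple


def user_fork(entry: Callable[[int], int], num_elements: int) -> Tuple[List[threading.Thread], list]:
    """Start `num_elements` workers that all begin executing at `entry`."""
    results = [None] * num_elements

    def run(element_id: int) -> None:
        # Each worker writes only to its own slot, so no lock is needed.
        results[element_id] = entry(element_id)

    threads = [threading.Thread(target=run, args=(i,)) for i in range(num_elements)]
    for t in threads:
        t.start()
    return threads, results


def user_join(threads: List[threading.Thread]) -> None:
    """Block until every forked worker has finished."""
    for t in threads:
        t.join()


workers, squares = user_fork(lambda element_id: element_id * element_id, 4)
user_join(workers)
```

    After the join, `squares` holds one result per worker. The abstract describes hardware that configures processor elements directly, without operating system involvement. The thread-based version only shows the control flow.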

    INSTRUCTIONS AND LOGIC TO PROVIDE ATOMIC RANGE OPERATIONS
    7.
    Invention application (under examination, published)

    Publication number: WO2016160248A1

    Publication date: 2016-10-06

    Application number: PCT/US2016/020394

    Filing date: 2016-03-02

    CPC classification number: G06F9/3001 G06F9/30018 G06F9/3004

    Abstract: Instructions and logic provide atomic range operations in a multiprocessing system. In one embodiment an atomic range modification instruction specifies an address for a set of range indices. The instruction locks access to the set of range indices and loads the range indices to check the range size. The range size is compared with a size sufficient to perform the range modification. If the range size is sufficient to perform the range modification, the range modification is performed and one or more modified range indices of the set of range indices is stored back to memory. Otherwise an error signal is set when the range size is not sufficient to perform said range modification. Access to the set of range indices is unlocked responsive to completion of the atomic range modification instruction. Embodiments may include atomic increment next instructions, add next instructions, decrement end instructions, and/or subtract end instructions.
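
    Illustrative sketch (not taken from the patent): the lock, check, modify, unlock sequence in the abstract resembles the following software model of an "add next" style operation. The `Range` class, its field names, and the return convention are assumptions. A mutex stands in for whatever locking the hardware would use.

```python
import threading
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class Range:
    next_index: int  # first unclaimed index
    end_index: int   # one past the last index in the range
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)


def atomic_add_next(r: Range, count: int) -> Tuple[bool, int]:
    """Claim `count` indices from the front of the range.

    Returns (ok, first_claimed_index). ok=False models the error signal.
    """
    with r.lock:                              # lock access to the range indices
        size = r.end_index - r.next_index     # load the indices, check the size
        if size < count:
            return False, r.next_index        # too small: signal error, no change
        start = r.next_index
        r.next_index += count                 # store the modified index back
        return True, start
    # Leaving the `with` block releases the lock on every path.


work = Range(next_index=0, end_index=10)
first = atomic_add_next(work, 4)
second = atomic_add_next(work, 4)
third = atomic_add_next(work, 4)  # only 2 indices remain, so this fails
```

    The third call fails and leaves the range untouched, matching the abstract's statement that the modification is performed only when the range size is sufficient.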

    APPARATUS AND METHOD FOR FAST FAILURE HANDLING OF INSTRUCTIONS
    9.
    Invention application (under examination, published)

    Publication number: WO2014105164A1

    Publication date: 2014-07-03

    Application number: PCT/US2013/047387

    Filing date: 2013-06-24

    Abstract: A processor is described comprising: instruction failure logic to perform a plurality of operations in response to a detected instruction execution failure, the instruction failure logic to be used for instructions which have complex failure modes and which are expected to have a failure frequency above a threshold, wherein the operations include: detecting an instruction execution failure and determining a reason for the failure; storing failure data in a destination register to indicate the failure and to specify details associated with the failure; and allowing application program code to read the failure data and responsively take one or more actions responsive to the failure, wherein the instruction failure logic performs its operations without invocation of an exception handler or switching to a low level domain on a system which employs hierarchical protection domains.
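
    Illustrative sketch (not taken from the patent): the idea of reporting failure through a destination register rather than an exception can be mimicked by an operation that returns a status word for the caller to inspect. The status-word layout (bit 0 as failure flag, upper bits as reason code), the `Reason` values, and the `enqueue_instr` operation are all hypothetical.

```python
from enum import IntEnum
from typing import Tuple


class Reason(IntEnum):
    NONE = 0
    QUEUE_FULL = 1
    BAD_OPERAND = 2


def encode_status(failed: bool, reason: Reason) -> int:
    """Pack the failure flag (bit 0) and the reason code (bits 1 and up)."""
    return (int(reason) << 1) | int(failed)


def decode_status(word: int) -> Tuple[bool, Reason]:
    """Unpack a status word as application code would after the instruction."""
    return bool(word & 1), Reason(word >> 1)


def enqueue_instr(queue: list, capacity: int, item) -> int:
    """Hypothetical instruction: never raises, always returns a status word."""
    if item is None:
        return encode_status(True, Reason.BAD_OPERAND)
    if len(queue) >= capacity:
        return encode_status(True, Reason.QUEUE_FULL)
    queue.append(item)
    return encode_status(False, Reason.NONE)


q = []
ok_word = enqueue_instr(q, capacity=1, item="a")
full_word = enqueue_instr(q, capacity=1, item="b")
failed, reason = decode_status(full_word)
if failed and reason is Reason.QUEUE_FULL:
    q.clear()  # the application chooses its own recovery action
```

    The caller branches on the returned word directly, which corresponds to the abstract's point that no exception handler or switch to a lower-level protection domain is involved.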

    APPARATUS AND METHOD FOR A HYBRID LATENCY-THROUGHPUT PROCESSOR
    10.
    Invention application (under examination, published)

    Publication number: WO2014105128A1

    Publication date: 2014-07-03

    Application number: PCT/US2013/046166

    Filing date: 2013-06-17

    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
