Patent search: ap:"Mark J Charney" (page 1)

1.

Granted invention patent
Packed data operation mask register arithmetic combination processors, methods, systems, and instructions [In force]

Publication number: US09760371B2

Publication date: 2017-09-12

Application number: US13976885

Filing date: 2011-12-22

Applicants: Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney

Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney

IPC classification: G06F9/30

CPC classification: G06F9/3001, G06F9/30014, G06F9/30018, G06F9/30036

Abstract: A method of an aspect includes receiving a packed data operation mask register arithmetic combination instruction. The packed data operation mask register arithmetic combination instruction indicates a first packed data operation mask register, indicates a second packed data operation mask register, and indicates a destination storage location. An arithmetic combination of at least a portion of bits of the first packed data operation mask register and at least a corresponding portion of bits of the second packed data operation mask register is stored in the destination storage location in response to the packed data operation mask register arithmetic combination instruction. Other methods, apparatus, systems, and instructions are disclosed.
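
The abstract describes arithmetically combining corresponding bits of two mask registers and storing the result. The following is a minimal Python sketch built on three assumptions. First, addition stands in for the "arithmetic combination". Second, mask registers are modeled as plain integers. Third, the result keeps only the low-order portion, so any carry out is discarded. `mask_add` and `MASK_BITS` are hypothetical names and do not come from the patent.

```python
MASK_BITS = 16  # assumed mask register width for this sketch (hypothetical)


def mask_add(k1: int, k2: int, width: int = MASK_BITS) -> int:
    """Add the low `width` bits of two mask-register values.

    Only the corresponding low-order portion of each source is combined
    (assumption), and the sum is truncated to the same width, so a carry
    out of the top bit is lost.
    """
    portion = (1 << width) - 1
    return ((k1 & portion) + (k2 & portion)) & portion


if __name__ == "__main__":
    print(hex(mask_add(0x00FF, 0x0001)))       # 0x100
    print(hex(mask_add(0xFFFF, 0x0001)))       # 0x0 (carry discarded)
    print(hex(mask_add(0xF0, 0x10, width=8)))  # 0x0
```

Under the same assumptions, replacing `+` with another arithmetic operator would model a different combination.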

2.

Invention patent application
Systems, Apparatuses, and Methods for Strided Loads [Pending, published]

Publication number: US20170192781A1

Publication date: 2017-07-06

Application number: US14984124

Filing date: 2015-12-30

Applicants: Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Jason W. Brandt, Mark J. Charney, Ashish Jha, Milind B. Girkar, Bret L. Toll, Evgeny V. Stupachenko, Sergey Y. Ostanevich

Inventors: Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Jason W. Brandt, Mark J. Charney, Ashish Jha, Milind B. Girkar, Bret L. Toll, Evgeny V. Stupachenko, Sergey Y. Ostanevich

IPC classification: G06F9/30

CPC classification: G06F9/3016, G06F9/30036, G06F9/30043, G06F9/30098, G06F9/30109, G06F9/30112, G06F9/30192, G06F9/3455

Abstract: Detailed herein are systems, apparatuses, and methods for strided loads. In an embodiment, an apparatus includes a decoder to decode an instruction, wherein the instruction to include fields a starting source memory address operand and a starting destination register operand; and execution circuitry to execute the decoded instruction to extract data elements of a defined number of types from contiguous memory beginning at the starting source memory address and, for each type, store the extracted data elements in a packed data register dedicated to that type beginning with starting destination register operand.
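
The load pattern in this abstract amounts to de-interleaving an array of records, such as x, y, z coordinates, into one register per field. The following is a minimal Python sketch under these assumptions:

- Memory is a flat list of equal-width elements.
- Registers are lists.
- The per-type destination registers are numbered consecutively after the starting destination register.

`strided_load` and its parameters are hypothetical names.

```python
from typing import List, Sequence


def strided_load(memory: Sequence[int], start: int, num_types: int,
                 elems_per_reg: int) -> List[List[int]]:
    """Split interleaved records into one packed register per element type.

    memory: flat, contiguous memory modeled as a sequence of elements
    start: starting source memory address (an element index here)
    num_types: number of interleaved element types (e.g. 3 for x, y, z)
    elems_per_reg: packed data register capacity, in elements

    Entry 0 of the result stands for the starting destination register and
    entry t for the t-th register after it (assumed consecutive).
    """
    needed = start + num_types * elems_per_reg
    if needed > len(memory):
        raise IndexError("load would run past the end of memory")
    # Type t lives at start + t, start + t + num_types, ... (stride = num_types).
    return [
        [memory[start + t + i * num_types] for i in range(elems_per_reg)]
        for t in range(num_types)
    ]


if __name__ == "__main__":
    # Four interleaved (x, y, z) records starting at address 0.
    mem = [10, 20, 30, 11, 21, 31, 12, 22, 32, 13, 23, 33]
    for reg in strided_load(mem, start=0, num_types=3, elems_per_reg=4):
        print(reg)
```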

3.

Granted invention patent
Apparatus and method of improved extract instructions [In force]

Publication number: US09588764B2

Publication date: 2017-03-07

Application number: US13976998

Filing date: 2011-12-23

Applicants: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

IPC classification: G06F9/30

CPC classification: G06F9/30149, G06F9/3001, G06F9/30014, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/3013, G06F9/30145

Abstract: An apparatus is described that includes instruction execution circuitry to execute first, second, third, and fourth instructions, the first and second instructions select a first group of input vector elements from one of multiple first non-overlapping sections of respective first and second input vectors. Each of the multiple first non-overlapping sections have a same bit width as the first group. Both the third and fourth instructions select a second group of input vector elements from one of multiple second non overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of multiple second non overlapping sections have a same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups at a first granularity and second granularity.


4.

Granted invention patent
Fusible instructions and logic to provide OR-test and AND-test functionality using multiple test sources [In force]

Publication number: US09483266B2

Publication date: 2016-11-01

Application number: US13843020

Filing date: 2013-03-15

Applicants: Maxim Loktyukhin, Robert Valentine, Julian C. Horn, Mark J. Charney

Inventors: Maxim Loktyukhin, Robert Valentine, Julian C. Horn, Mark J. Charney

IPC classification: G06F9/30, G06F9/38

CPC classification: G06F9/3822, G06F9/30029, G06F9/30058, G06F9/30094, G06F9/3836

Abstract: Fusible instructions and logic provide OR-test and AND-test functionality on multiple test sources. Some embodiments include a processor decode stage to decode a test instruction for execution, the instruction specifying first, second and third source data operands, and an operation type. Execution units, responsive to the decoded test instruction, perform one logical operation, according to the specified operation type, between data from the first and second source data operands, and perform a second logical operation between the data from the third source data operand and the result of the first logical operation to set a condition flag. Some embodiments generate the test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the test instruction through a just-in-time compiler. Some embodiments also fuse the test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set.

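
The flag computation in this abstract can be modeled as a small Python function. This sketch assumes the following:

- The second logical operation is AND, as in the x86 TEST instruction.
- The condition flag behaves like a zero flag.
- Operands are unsigned integers of at most `width` bits.

`fused_test` is a hypothetical name. The sketch reproduces only the resulting flag, not the decode-time fusion itself.

```python
def fused_test(src1: int, src2: int, src3: int, op: str, width: int = 64) -> bool:
    """Return the condition flag produced by a three-source test.

    First logical operation (chosen by `op`): src1 AND src2, or src1 OR src2.
    Second logical operation (assumed AND, as in x86 TEST): result & src3.
    The flag is modeled as a zero flag: True when the final value is zero.
    """
    limit = (1 << width) - 1
    if op == "and":
        first = src1 & src2
    elif op == "or":
        first = src1 | src2
    else:
        raise ValueError(f"unsupported operation type: {op!r}")
    return (first & src3 & limit) == 0


if __name__ == "__main__":
    # Stand-in for the fused test + conditional branch pair in the abstract.
    a, b, mask = 0b0100, 0b0001, 0b0011
    if fused_test(a, b, mask, "or"):
        print("branch taken: no bit of (a | b) is set in mask")
    else:
        print("branch not taken")
```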

5.

Invention patent application
METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT DETECTION FUNCTIONALITY [In force]

Publication number: US20140189308A1

Publication date: 2014-07-03

Application number: US13731006

Filing date: 2012-12-29

Applicants: Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Brett L. Toll, Mark J. Charney, Milind B. Girkar

Inventors: Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Brett L. Toll, Mark J. Charney, Milind B. Girkar

IPC classification: G06F9/30

CPC classification: G06F9/30021, G06F9/30018, G06F9/30036, G06F9/30109, G06F9/30145, G06F9/30185, G06F9/3838, G06F9/3887

Abstract: Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store an offset for a data element in a memory. A destination register has corresponding data fields, each of these data fields to store a variable second plurality of bits to store a conflict mask having a mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with every less significant data field to determine if they hold a matching offset, and in corresponding conflict masks in the destination register, set any mask bits corresponding to a less significant data field with a matching offset. Vector address conflict detection can be used with variable sized elements and to generate conflict masks to resolve dependencies in gather-modify-scatter SIMD operations.

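
The comparison this abstract describes can be written as a scalar Python model. The model assumes that data field 0 is the least significant field and that offsets are plain integers. `vector_conflict` is a hypothetical name. The behavior appears similar to the AVX-512CD `VPCONFLICTD`/`VPCONFLICTQ` instructions, although the abstract does not say so. Hardware would perform all comparisons in parallel rather than in a loop.

```python
from typing import List, Sequence


def vector_conflict(offsets: Sequence[int]) -> List[int]:
    """Build one conflict mask per data field.

    Bit j of mask i is set when field j is less significant than field i
    (j < i) and holds the same offset.
    """
    masks = []
    for i, off in enumerate(offsets):
        mask = 0
        for j in range(i):  # only less significant fields are compared
            if offsets[j] == off:
                mask |= 1 << j
        masks.append(mask)
    return masks


if __name__ == "__main__":
    # Offsets a gather-modify-scatter loop might compute, e.g. histogram bins.
    offsets = [7, 3, 7, 7, 5]
    print([bin(m) for m in vector_conflict(offsets)])
```

A lane whose mask is zero has no earlier conflicting lane. A nonzero mask lists the earlier lanes whose updates must land first.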

6.

Invention patent application
METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT RESOLUTION WITH VECTOR POPULATION COUNT FUNCTIONALITY [In force]

Publication number: US20140189307A1

Publication date: 2014-07-03

Application number: US13731005

Filing date: 2012-12-29

Applicants: Robert Valentine, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Brett L. Toll

Inventors: Robert Valentine, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Brett L. Toll

IPC classification: G06F9/30

CPC classification: G06F9/30145, G06F7/607, G06F9/30014, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/3836, G06F9/3887, H03M7/20

Abstract: Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations.

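
The Python sketch below shows one way the population counts could drive a gather-modify-scatter loop. The popcount of a lane's conflict mask equals the number of earlier lanes with the same offset, so lanes can be processed in rounds ordered by that count. This round-based scheduling is an assumption for illustration. The abstract says only that the counts yield iteration counts and completion masks. `vector_popcount`, `completion_masks`, and `histogram_update` are hypothetical names. The conflict masks are supplied as input, computed as described for entry 5.

```python
from typing import List, Sequence


def vector_popcount(fields: Sequence[int]) -> List[int]:
    """Count the bits set to one in each data field."""
    return [bin(f).count("1") for f in fields]


def completion_masks(counts: Sequence[int]) -> List[int]:
    """Mask k selects the lanes whose count equals k (one mask per iteration)."""
    iterations = max(counts, default=-1) + 1
    return [
        sum(1 << lane for lane, c in enumerate(counts) if c == k)
        for k in range(iterations)
    ]


def histogram_update(hist: List[int], offsets: Sequence[int],
                     conflicts: Sequence[int]) -> None:
    """Increment hist[offset] for every lane, in gather-modify-scatter rounds.

    Lanes sharing an offset have distinct conflict-mask popcounts (0, 1, 2, ...),
    so each round touches every address at most once.
    """
    for mask in completion_masks(vector_popcount(conflicts)):
        lanes = [lane for lane in range(len(offsets)) if (mask >> lane) & 1]
        gathered = [hist[offsets[lane]] for lane in lanes]   # gather
        for lane, value in zip(lanes, gathered):
            hist[offsets[lane]] = value + 1                  # modify + scatter


if __name__ == "__main__":
    offsets = [7, 3, 7, 7, 5]
    conflicts = [0, 0, 1, 5, 0]  # bit j of mask i: earlier lane j has the same offset
    hist = [0] * 8
    histogram_update(hist, offsets, conflicts)
    print(hist)
```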

7.

Invention patent application
INSTRUCTION EXECUTION UNIT THAT BROADCASTS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY [In force]

Publication number: US20130339664A1

Publication date: 2013-12-19

Application number: US13976003

Filing date: 2011-12-23

Applicants: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney

IPC classification: G06F9/30

CPC classification: G06F9/30145, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/30109, G06F9/3887

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The first data structure is four times as large as the second data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second instruction to create a second replication data structure.

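
The replication step can be sketched in Python. The sketch assumes a 512-bit destination and compares a 256-bit structure of 64-bit values with a 64-bit structure of 32-bit values. That is one pair of sizes satisfying the abstract's 2x and 4x ratios, but the abstract itself does not fix the widths. `broadcast` and `DEST_BITS` are hypothetical names.

```python
from typing import List, Sequence

DEST_BITS = 512  # assumed destination vector width (hypothetical)


def broadcast(structure: Sequence[int], elem_bits: int,
              dest_bits: int = DEST_BITS) -> List[int]:
    """Replicate a packed data structure until it fills the destination.

    structure: the packed data values to replicate, least significant first
    elem_bits: width of each data value in bits
    """
    limit = 1 << elem_bits
    if any(not 0 <= v < limit for v in structure):
        raise ValueError("data value does not fit in elem_bits")
    struct_bits = len(structure) * elem_bits
    if struct_bits == 0 or dest_bits % struct_bits:
        raise ValueError("structure must evenly divide the destination")
    return list(structure) * (dest_bits // struct_bits)


if __name__ == "__main__":
    # First instruction: 4 x 64-bit values (a 256-bit structure).
    print(broadcast([1, 2, 3, 4], elem_bits=64))
    # Second instruction: 2 x 32-bit values (a 64-bit structure): values half
    # as wide, and a structure one quarter the size of the first.
    print(broadcast([5, 6], elem_bits=32))
```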

8.

Invention patent application
APPARATUS AND METHOD OF IMPROVED EXTRACT INSTRUCTIONS [In force]

Publication number: US20130275730A1

Publication date: 2013-10-17

Application number: US13976998

Filing date: 2011-12-23

Applicants: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

IPC classification: G06F9/30

CPC classification: G06F9/30149, G06F9/3001, G06F9/30014, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/3013, G06F9/30145

Abstract: An apparatus is described that includes instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction select a first group of input vector elements from one of multiple first non overlapping sections of respective first and second input vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction select a second group of input vector elements from one of multiple second non overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups of the first and third instructions at a first granularity, where, respective resultants produced therewith are respective resultants of the first and third instructions. The masking circuitry is also to mask the first and second groups of the second and fourth instructions at a second granularity, where, respective resultants produced therewith are respective resultants of the second and fourth instructions.

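
The four-instruction scheme can be modeled in Python by treating vectors as integers. This sketch assumes:

- a 512-bit source vector;
- 128-bit groups for the first and second instructions, and 256-bit groups for the third and fourth;
- 32-bit masking granularity for the first and third instructions, and 64-bit granularity for the second and fourth;
- zeroing or merging of masked-off lanes, as in AVX-512.

The abstract gives none of these widths. `extract`, `mask_lanes`, and `masked_extract` are hypothetical names.

```python
VEC_BITS = 512  # assumed input vector width (hypothetical)


def extract(src: int, section_bits: int, index: int) -> int:
    """Select non-overlapping section `index` (section_bits wide) of `src`."""
    if not 0 <= index < VEC_BITS // section_bits:
        raise ValueError("section index out of range")
    return (src >> (index * section_bits)) & ((1 << section_bits) - 1)


def mask_lanes(value: int, width_bits: int, gran_bits: int, mask: int,
               old: int = 0, zeroing: bool = True) -> int:
    """Apply a write mask with one mask bit per `gran_bits`-wide lane.

    Unselected lanes become zero (zeroing) or keep the `old` destination
    contents (merging).
    """
    lane_ones = (1 << gran_bits) - 1
    result = 0
    for lane in range(width_bits // gran_bits):
        field = lane_ones << (lane * gran_bits)
        if (mask >> lane) & 1:
            result |= value & field
        elif not zeroing:
            result |= old & field
    return result


def masked_extract(src: int, section_bits: int, index: int, gran_bits: int,
                   mask: int, old: int = 0, zeroing: bool = True) -> int:
    """Extract a section, then mask the resultant at the chosen granularity."""
    group = extract(src, section_bits, index)
    return mask_lanes(group, section_bits, gran_bits, mask, old, zeroing)


if __name__ == "__main__":
    # Source vector whose sixteen 32-bit elements hold the values 0..15.
    src = sum(i << (32 * i) for i in range(16))
    # First / second instructions: 128-bit group, masked per 32 / 64 bits.
    print(hex(masked_extract(src, 128, 1, 32, 0b0101)))
    print(hex(masked_extract(src, 128, 1, 64, 0b01)))
    # Fourth instruction: 256-bit group masked per 64 bits.
    print(hex(masked_extract(src, 256, 1, 64, 0b1111)))
```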

9.

Granted invention patent
Apparatus and method of improved permute instructions [In force]

Publication number: US09658850B2

Publication date: 2017-05-23

Application number: US13976993

Filing date: 2011-12-23

Applicants: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

IPC classification: G06F9/30

CPC classification: G06F9/30029, G06F9/30018, G06F9/30032, G06F9/30036

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
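
The routing-plus-masking behavior can be modeled in Python under these assumptions:

- The vector is 512 bits wide.
- The three instructions use element widths of 16, 32, and 64 bits. The abstract says only "three available bit widths".
- An index vector selects the source element for each output location.
- Masked-off lanes are zeroed or merged.
- Out-of-range indices wrap modulo the element count.

All function names are hypothetical.

```python
from typing import List, Sequence

VEC_BITS = 512  # assumed vector width (hypothetical)


def split(vec: int, elem_bits: int) -> List[int]:
    """Split a vector (modeled as an int) into elements, least significant first."""
    ones = (1 << elem_bits) - 1
    return [(vec >> (i * elem_bits)) & ones for i in range(VEC_BITS // elem_bits)]


def join(elems: Sequence[int], elem_bits: int) -> int:
    """Inverse of split."""
    return sum(e << (i * elem_bits) for i, e in enumerate(elems))


def masked_permute(src: int, indices: Sequence[int], mask: int, elem_bits: int,
                   old: int = 0, zeroing: bool = True) -> int:
    """Route src element indices[i] into output location i, then apply the mask.

    The mask has one bit per output element, so its granularity follows
    elem_bits (for example 16, 32 or 64).
    """
    src_e, old_e = split(src, elem_bits), split(old, elem_bits)
    n = len(src_e)
    if len(indices) != n:
        raise ValueError("need one index per output element")
    out = []
    for i, idx in enumerate(indices):
        if (mask >> i) & 1:
            out.append(src_e[idx % n])  # any input location may source lane i (wrap assumed)
        else:
            out.append(0 if zeroing else old_e[i])
    return join(out, elem_bits)


if __name__ == "__main__":
    src = join(list(range(10, 18)), 64)  # eight 64-bit elements
    rev = masked_permute(src, list(range(7, -1, -1)), 0x0F, 64)
    print(split(rev, 64))
```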

10.

Granted invention patent
Apparatus and method of improved insert instructions [In force]

Publication number: US09619236B2

Publication date: 2017-04-11

Application number: US13976992

Filing date: 2011-12-23

Applicants: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

IPC classification: G06F9/30

CPC classification: G06F9/30181, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/3013, G06F9/30167, G06F9/3802

Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and, mask the second and fourth instructions at a second resultant vector granularity.
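
The insert-then-mask sequence can be modeled in Python by treating vectors as integers. The sketch assumes a 512-bit resultant, 128- and 256-bit groups, and 32- or 64-bit masking granularity with zeroing or merging. None of these widths appear in the abstract. `insert` and `masked_insert` are hypothetical names.

```python
VEC_BITS = 512  # assumed resultant vector width (hypothetical)


def insert(base: int, group: int, group_bits: int, index: int) -> int:
    """Copy `base`, replacing non-overlapping section `index` with `group`."""
    if not 0 <= index < VEC_BITS // group_bits:
        raise ValueError("section index out of range")
    section = ((1 << group_bits) - 1) << (index * group_bits)
    return (base & ~section) | ((group << (index * group_bits)) & section)


def masked_insert(base: int, group: int, group_bits: int, index: int,
                  gran_bits: int, mask: int, old: int = 0,
                  zeroing: bool = True) -> int:
    """Insert, then mask the resultant vector with one bit per gran_bits lane."""
    value = insert(base, group, group_bits, index)
    lane_ones = (1 << gran_bits) - 1
    result = 0
    for lane in range(VEC_BITS // gran_bits):
        field = lane_ones << (lane * gran_bits)
        if (mask >> lane) & 1:
            result |= value & field
        elif not zeroing:
            result |= old & field  # merging keeps the old destination lane
    return result


if __name__ == "__main__":
    base = sum(i << (32 * i) for i in range(16))  # 32-bit elements 0..15
    group = (1 << 128) - 1                        # 128-bit group of all ones
    r = masked_insert(base, group, 128, 1, gran_bits=32, mask=0xFFFF)
    print([(r >> (32 * i)) & 0xFFFFFFFF for i in range(16)])
```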
