-
Publication No.: US20140149724A1
Publication date: 2014-05-29
Application No.: US14170397
Filing date: 2014-01-31
Applicant: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
Inventor: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
IPC classification: G06F9/30
CPC classification: G06F9/30181, G06F9/3001, G06F9/30014, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/30047, G06F9/30145, G06F9/30149, G06F9/30185, G06F9/30192, G06F9/34
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
-
Publication No.: US20130305020A1
Publication date: 2013-11-14
Application No.: US13976707
Filing date: 2011-09-30
Applicant: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
Inventor: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
IPC classification: G06F9/30
CPC classification: G06F9/30145, G06F9/3001, G06F9/30014, G06F9/30018, G06F9/30025, G06F9/30032, G06F9/30036, G06F9/30047, G06F9/30149, G06F9/30181, G06F9/30185, G06F9/30192, G06F9/34
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
-
Publication No.: US20130326192A1
Publication date: 2013-12-05
Application No.: US13995430
Filing date: 2011-12-22
Applicant: Elmoustapha Ould-Ahmed-Vall, Milind Baburao Girkar, Robert C. Valentine, Suleyman Sair, Jesus Corbal San Adrian
Inventor: Elmoustapha Ould-Ahmed-Vall, Milind Baburao Girkar, Robert C. Valentine, Suleyman Sair, Jesus Corbal San Adrian
IPC classification: G06F9/30
CPC classification: G06F9/30098, G06F9/30032, G06F9/30036
Abstract: Embodiments of systems, apparatuses, and methods for performing a mask broadcast instruction in a computer processor are described. In some embodiments, the execution of a mask broadcast instruction causes a broadcast of a data element of the source operand to a destination register of the destination operand according to the broadcast size.
-
Publication No.: US10241792B2
Publication date: 2019-03-26
Application No.: US13993068
Filing date: 2011-12-30
Abstract: A processor core that includes a hardware decode unit and an execution engine unit. The hardware decode unit to decode a vector frequency expand instruction, wherein the vector frequency expand instruction includes a source operand and a destination operand, wherein the source operand specifies a source vector register that includes one or more pairs of a value and run length that are to be expanded into a run of that value based on the run length. The execution engine unit to execute the decoded vector frequency expand instruction which causes a set of one or more source data elements in the source vector register to be expanded into a set of destination data elements comprising more elements than the set of source data elements and including at least one run of identical values which were run length encoded in the source vector register.
-
Publication No.: US20140019714A1
Publication date: 2014-01-16
Application No.: US13993068
Filing date: 2011-12-30
IPC classification: G06F9/30
CPC classification: G06F9/30145, G06F9/30018, G06F9/30025, G06F9/30036, H03M7/46, H03M7/6005
Abstract: A processor core that includes a hardware decode unit and an execution engine unit. The hardware decode unit to decode a vector frequency expand instruction, wherein the vector frequency expand instruction includes a source operand and a destination operand, wherein the source operand specifies a source vector register that includes one or more pairs of a value and run length that are to be expanded into a run of that value based on the run length. The execution engine unit to execute the decoded vector frequency expand instruction which causes a set of one or more source data elements in the source vector register to be expanded into a set of destination data elements comprising more elements than the set of source data elements and including at least one run of identical values which were run length encoded in the source vector register.
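The expansion described in the abstract above can be sketched in scalar Python. This is only an illustration under stated assumptions, not the patented hardware behavior: the function name `frequency_expand` is hypothetical, the source register is modeled as a flat list, and the pair layout (value first, run length second) is assumed because the abstract does not fix it.

```python
def frequency_expand(src):
    """Expand (value, run length) pairs into runs of identical values.

    Assumed layout (not stated in the abstract): src is a flat list
    [v0, n0, v1, n1, ...] with each value followed by its run length.
    """
    if len(src) % 2 != 0:
        raise ValueError("source must hold complete (value, run length) pairs")
    dest = []
    for value, run_length in zip(src[0::2], src[1::2]):
        if run_length < 0:
            raise ValueError("run length must be non-negative")
        # Each pair becomes a run of `run_length` copies of `value`.
        dest.extend([value] * run_length)
    return dest


if __name__ == "__main__":
    print(frequency_expand([7, 3, 2, 1]))
```

A real vector register has a fixed number of lanes, so a hardware implementation would also have to deal with results that do not fit in the destination; this sketch ignores that limit.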
-
Publication No.: US09459866B2
Publication date: 2016-10-04
Application No.: US13993058
Filing date: 2011-12-30
Applicant: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
Inventor: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
CPC classification: G06F9/30036, G06F9/30018, G06F9/30025, G06F9/30032, G06F9/3016, H03M7/46, H03M7/6005
Abstract: A processor core that includes a hardware decode unit to decode a vector frequency compress instruction that includes a source operand and a destination operand. The source operand specifying a source vector register that includes a plurality of source data elements including one or more runs of identical data elements that are each to be compressed in a destination vector register as a value and run length pair. The destination operand identifies the destination vector register. The processor core also includes an execution engine unit to execute the decoded vector frequency compress instruction which causes, for each source data element, a value to be copied into the destination vector register to indicate that source data element's value. One or more runs of the source data elements equal are encoded in the destination vector register as the predetermined compression value followed by a run length for that run.
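One possible reading of the compression described in the abstract above is sketched below in scalar Python. The assumptions are mine and not confirmed by the source: the function name `frequency_compress` is hypothetical, the predetermined compression value is taken to be 0 by default, only runs of that value are run length encoded, and every other element is copied through unchanged.

```python
def frequency_compress(src, compress_value=0):
    """Run length encode runs of `compress_value`; copy other elements as-is.

    Assumption: a run of the predetermined value is stored as the value
    followed by the run length, as the abstract's last sentence suggests.
    """
    dest = []
    i = 0
    while i < len(src):
        if src[i] == compress_value:
            # Find the end of the current run of the compression value.
            j = i
            while j < len(src) and src[j] == compress_value:
                j += 1
            dest.extend([compress_value, j - i])  # value, then run length
            i = j
        else:
            dest.append(src[i])  # ordinary element: copied unchanged
            i += 1
    return dest


if __name__ == "__main__":
    print(frequency_compress([5, 0, 0, 0, 7, 0]))
```

Note that under this reading a single isolated occurrence of the compression value grows from one element to two, so the scheme only pays off when the value tends to appear in runs.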
-
Publication No.: US09864602B2
Publication date: 2018-01-09
Application No.: US13977229
Filing date: 2011-12-30
Applicant: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal San Andrian, Suleyman Sair, Bret L. Toll, Zeev Sperber, Amit Gradstein, Asaf Rubenstein
Inventor: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal San Andrian, Suleyman Sair, Bret L. Toll, Zeev Sperber, Amit Gradstein, Asaf Rubenstein
IPC classification: G06F9/30
CPC classification: G06F9/30032, G06F9/30036
Abstract: A method of an aspect includes receiving a masked packed rotate instruction. The instruction indicates a first source packed data including a plurality of packed data elements, a packed data operation mask having a plurality of mask elements, at least one rotation amount, and a destination storage location. A result packed data is stored in the destination storage location in response to the instruction. The result packed data includes result data elements that each correspond to a different one of the mask elements in a corresponding relative position. Result data elements that are not masked out by the corresponding mask element include one of the data elements of the first source packed data in a corresponding position that has been rotated. Result data elements that are masked out by the corresponding mask element include a masked out value. Other methods, apparatus, systems, and instructions are disclosed.
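The per-element behavior described in the abstract above can be modeled with a short Python sketch. Several details are assumptions for illustration only: the function name `masked_rotate_left` is hypothetical, a single rotation amount is shared by all elements, the rotation is to the left, elements are 32 bits wide by default, and masked-out elements receive a caller-supplied value (0 by default).

```python
def masked_rotate_left(src, mask, amount, width=32, masked_value=0):
    """Rotate each unmasked element left by `amount` bits.

    src          -- list of unsigned integers, one per packed data element
    mask         -- list of 0/1 flags, one per element (1 = not masked out)
    masked_value -- value stored for masked-out elements (assumed to be 0)
    """
    if len(src) != len(mask):
        raise ValueError("src and mask must have the same number of elements")
    full = (1 << width) - 1
    amount %= width  # rotating by the element width is a no-op
    result = []
    for elem, keep in zip(src, mask):
        if keep:
            elem &= full
            rotated = ((elem << amount) | (elem >> (width - amount))) & full
            result.append(rotated)
        else:
            result.append(masked_value)
    return result


if __name__ == "__main__":
    print([hex(x) for x in masked_rotate_left([0x80000001, 0x1], [1, 0], 1)])
```

A merging variant, in which masked-out elements keep the previous destination contents instead of a fixed value, could be modeled by passing the old destination list and selecting from it in the `else` branch.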
-
Publication No.: US20140317377A1
Publication date: 2014-10-23
Application No.: US13993058
Filing date: 2011-12-30
Applicant: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
Inventor: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
IPC classification: G06F9/30
CPC classification: G06F9/30036, G06F9/30018, G06F9/30025, G06F9/30032, G06F9/3016, H03M7/46, H03M7/6005
Abstract: A processor core that includes a hardware decode unit to decode a vector frequency compress instruction that includes a source operand and a destination operand. The source operand specifying a source vector register that includes a plurality of source data elements including one or more runs of identical data elements that are each to be compressed in a destination vector register as a value and run length pair. The destination operand identifies the destination vector register. The processor core also includes an execution engine unit to execute the decoded vector frequency compress instruction which causes, for each source data element, a value to be copied into the destination vector register to indicate that source data element's value. One or more runs of the source data elements equal are encoded in the destination vector register as the predetermined compression value followed by a run length for that run.
-
Publication No.: US20140040604A1
Publication date: 2014-02-06
Application No.: US13977229
Filing date: 2011-12-30
Applicant: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal San Andrian, Suleyman Sair, Bret L. Toll, Zeev Sperber, Amit Gradstein, Asaf Rubenstein
Inventor: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal San Andrian, Suleyman Sair, Bret L. Toll, Zeev Sperber, Amit Gradstein, Asaf Rubenstein
IPC classification: G06F9/30
CPC classification: G06F9/30032, G06F9/30036
Abstract: A method of an aspect includes receiving a masked packed rotate instruction. The instruction indicates a first source packed data including a plurality of packed data elements, a packed data operation mask having a plurality of mask elements, at least one rotation amount, and a destination storage location. A result packed data is stored in the destination storage location in response to the instruction. The result packed data includes result data elements that each correspond to a different one of the mask elements in a corresponding relative position. Result data elements that are not masked out by the corresponding mask element include one of the data elements of the first source packed data in a corresponding position that has been rotated. Result data elements that are masked out by the corresponding mask element include a masked out value. Other methods, apparatus, systems, and instructions are disclosed.
-
Publication No.: US20130339661A1
Publication date: 2013-12-19
Application No.: US13991858
Filing date: 2011-12-30
Applicant: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
Inventor: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
IPC classification: G06F9/30
CPC classification: G06F9/30018, G06F9/30036, H03M7/46
Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing the set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements, this is repeated until all valid data is processed.
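The loop described in the abstract above can be approximated by a scalar Python model. This is a sketch under assumptions, not the vector implementation: the function name `decompress_zero_rle` is hypothetical, the source is a plain list, element 0 is treated as the least significant mask bit, and the encoding is assumed to be a zero marker followed by the number of zeros it stands for.

```python
def decompress_zero_rle(src):
    """Decode a stream in which each run of zeros is stored as 0, count.

    Each loop iteration mirrors one pass of the abstract's procedure:
    compare with zero, find the non-zero prefix, copy it, read the zero
    count, then shift the remaining source elements down.
    """
    data = list(src)
    result = []
    while data:
        # First mask: True where the element equals zero.
        zero_mask = [elem == 0 for elem in data]
        # "Trailing zeros" of the mask = number of leading non-zero elements.
        prefix = next((i for i, is_zero in enumerate(zero_mask) if is_zero),
                      len(data))
        # Second mask selects that prefix; copy it to the result.
        result.extend(data[:prefix])
        if prefix == len(data):
            break  # no zero marker left, all valid data processed
        if prefix + 1 >= len(data):
            raise ValueError("zero marker without a run length")
        run_length = data[prefix + 1]  # number of RLE zeros
        result.extend([0] * run_length)
        # Shift the source past the marker and its count.
        data = data[prefix + 2:]
    return result


if __name__ == "__main__":
    print(decompress_zero_rle([5, 0, 3, 7, 0, 1]))
```

The hardware version works on fixed-width registers with mask and shift operations; the list slicing here only stands in for those steps.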