-
Publication No.: US20190042496A1
Publication date: 2019-02-07
Application No.: US16140472
Filing date: 2018-09-24
Applicant: Simon N. Peffers, Kirk S. Yap, Sean Gulley, Vinodh Gopal, Wajdi Feghali
Inventor: Simon N. Peffers, Kirk S. Yap, Sean Gulley, Vinodh Gopal, Wajdi Feghali
Abstract: Apparatus, systems and methods for implementing delayed decompression schemes. As a burst of packets comprising compressed packets and uncompressed packets is received over an interconnect link, the packets are buffered in a receive buffer without decompression. Subsequently, the packets are forwarded from the receive buffer to a consumer such as a processor core, with the compressed packets being decompressed prior to reaching the processor core. Under a first delayed decompression approach, packets are decompressed when they are read from the receive buffer in conjunction with forwarding the uncompressed packet (or uncompressed data contained therein) to the consumer. Under a second delayed decompression scheme, the packets are read from the receive buffer and forwarded to a decompressor using a first datapath width matching the width of the packets, decompressed, and then forwarded to the consumer using a second datapath width matching the width of the uncompressed data.
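The first delayed decompression approach described in the abstract can be illustrated with a minimal software sketch. This is not the patented hardware: the `ReceiveBuffer` class, its method names, and the use of `zlib` as the codec are assumptions made purely for illustration. The point it shows is that packets are stored exactly as received and are only decompressed when read out for the consumer.

```python
import zlib
from collections import deque


class ReceiveBuffer:
    """Hypothetical receive buffer: stores packets as received, decompresses on read."""

    def __init__(self):
        self._packets = deque()

    def receive(self, payload: bytes, compressed: bool) -> None:
        # Buffer the packet exactly as it arrived; no decompression on this path.
        self._packets.append((compressed, payload))

    def read(self) -> bytes:
        # Decompression is deferred until the packet is forwarded to the consumer.
        compressed, payload = self._packets.popleft()
        return zlib.decompress(payload) if compressed else payload


buf = ReceiveBuffer()
buf.receive(zlib.compress(b"A" * 64), compressed=True)  # compressed packet in the burst
buf.receive(b"plain", compressed=False)                 # uncompressed packet in the burst
first = buf.read()
second = buf.read()
```

Because the buffer holds compressed payloads, a burst occupies less buffer space than it would if every packet were expanded on arrival.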
-
Publication No.: US20190045031A1
Publication date: 2019-02-07
Application No.: US16014690
Filing date: 2018-06-21
Applicant: Wajdi Feghali, Vinodh Gopal, Kirk Yap, Sean Gulley, Simon Peffers
Inventor: Wajdi Feghali, Vinodh Gopal, Kirk Yap, Sean Gulley, Simon Peffers
IPC classification: H04L29/06, H04L12/863
Abstract: Methods and apparatus for low-latency link compression schemes. Under the schemes, packets or messages are dynamically selected for compression in view of current transmit queue levels. The latency incurred during compression and decompression is not added to the data-path, but sits on the side of the transmit queue. The system monitors the queue depth and, accordingly, initiates compression jobs based on the depth. Different compression levels may be dynamically selected and used based on queue depth. Under various schemes, either packets or messages are enqueued in the transmit queue or pointers to such packets and messages are enqueued. Additionally, packets/messages may be compressed prior to being enqueued, or after being enqueued, wherein an original uncompressed packet is replaced with a compressed packet. Compressed and uncompressed packets may be stored in queues or buffers and transmitted using different numbers of transmit cycles based on their compression ratios. The schemes may be implemented to improve the effective bandwidth of various types of links, including serial links, bus-type links, and socket-to-socket links in multi-socket systems.
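The idea of selecting a compression level from the current transmit queue depth can be sketched as follows. The thresholds, the function names `pick_level` and `enqueue`, and the choice of `zlib` are all hypothetical; the abstract does not specify them. The sketch covers only the variant in which a packet is compressed before being enqueued.

```python
import zlib
from collections import deque


def pick_level(depth: int) -> int:
    """Map queue depth to a zlib level; the thresholds are illustrative assumptions."""
    if depth < 4:
        return 0  # queue is nearly empty: send uncompressed, add no latency
    if depth < 16:
        return 1  # moderate backlog: fast, light compression
    return 6      # deep backlog: spend more effort to save link bandwidth


def enqueue(queue: deque, payload: bytes) -> None:
    # Decide based on the depth observed at enqueue time.
    level = pick_level(len(queue))
    if level == 0:
        queue.append((False, payload))
    else:
        queue.append((True, zlib.compress(payload, level)))


tx_queue = deque()
for _ in range(6):
    enqueue(tx_queue, b"payload " * 32)
flags = [compressed for compressed, _ in tx_queue]
```

With these assumed thresholds, the first four packets go out uncompressed and compression starts only once a backlog has formed, which is when the link is the bottleneck.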
-
Publication No.: US08363828B2
Publication date: 2013-01-29
Application No.: US12368196
Filing date: 2009-02-09
Applicant: Vinodh Gopal, Kirk Yap, Gilbert Wolrich, Wajdi Feghali, Robert Ottavi, Sean Gulley
Inventor: Vinodh Gopal, Kirk Yap, Gilbert Wolrich, Wajdi Feghali, Robert Ottavi, Sean Gulley
IPC classification: G06F21/00
CPC classification: G09C1/00, H04L9/0637, H04L2209/12
Abstract: An embodiment includes at least one processing unit to perform at least first and second sets of diffusion-related operations to produce a resulting block from a data block, and that includes at least one stage and at least one other stage. The at least one stage is to select one of first operands and second operands input to the at least one other stage. The first and second operands are associated with the first and second sets of operations, respectively. The at least one other stage involves arithmetic and logical operations common to both the first and second sets of operations. At least one other processing unit is to perform at least one set of cryptographic-related operations (different, at least in part, from the first and second sets of operations) on at least one of (1) another block to produce the data block and (2) the resulting block.
-
Publication No.: US20100205455A1
Publication date: 2010-08-12
Application No.: US12368196
Filing date: 2009-02-09
Applicant: Vinodh Gopal, Kirk Yap, Gilbert Wolrich, Wajdi Feghali, Robert Ottavi, Sean Gulley
Inventor: Vinodh Gopal, Kirk Yap, Gilbert Wolrich, Wajdi Feghali, Robert Ottavi, Sean Gulley
IPC classification: H04L9/00
CPC classification: G09C1/00, H04L9/0637, H04L2209/12
Abstract: An embodiment includes at least one processing unit to perform at least first and second sets of diffusion-related operations to produce a resulting block from a data block, and that includes at least one stage and at least one other stage. The at least one stage is to select one of first operands and second operands input to the at least one other stage. The first and second operands are associated with the first and second sets of operations, respectively. The at least one other stage involves arithmetic and logical operations common to both the first and second sets of operations. At least one other processing unit is to perform at least one set of cryptographic-related operations (different, at least in part, from the first and second sets of operations) on at least one of (1) another block to produce the data block and (2) the resulting block.
-
Publication No.: US20160191238A1
Publication date: 2016-06-30
Application No.: US14582707
Filing date: 2014-12-24
Applicant: Kirk Yap, Gilbert Wolrich, Sudhir Satpathy, Sean Gulley, Vinodh Gopal, Sanu Mathew, Wajdi Feghali
Inventor: Kirk Yap, Gilbert Wolrich, Sudhir Satpathy, Sean Gulley, Vinodh Gopal, Sanu Mathew, Wajdi Feghali
IPC classification: H04L9/08
CPC classification: H04L9/0822, G09C1/00, H04L9/0631, H04L2209/122
Abstract: Embodiments of an invention for SMS4 acceleration hardware are disclosed. In an embodiment, an apparatus includes SMS4 hardware and key transformation hardware. The SMS4 hardware is to execute a round of encryption and a round of key expansion. The key transformation hardware is to transform a key so that the SMS4 hardware can execute a round of decryption.
-
Publication No.: US20190243780A1
Publication date: 2019-08-08
Application No.: US16380114
Filing date: 2019-04-10
Applicant: Vinodh Gopal, Simon N. Peffers
Inventor: Vinodh Gopal, Simon N. Peffers
IPC classification: G06F12/1027, G06F12/0815, G06F12/1009, G06F12/0811
CPC classification: G06F12/1027, G06F12/0811, G06F12/0815, G06F12/1009, G06F2212/401
Abstract: Methods and apparatus for scalable application-customized memory compression. Data is selectively stored in system memory in compressed formats or in an uncompressed format using a plurality of compression schemes. A compression ID is used to identify the compression scheme (or no compression) to be used and is included with read and write requests submitted to a memory controller. For memory writes, the memory controller dynamically compresses data written to memory cache lines using compression algorithms (or no compression) identified by the compression ID. For memory reads, the memory controller dynamically decompresses data stored in memory cache lines in compressed formats using decompression algorithms identified by the compression ID. Page tables and TLB entries are augmented to include a compression ID field. The format of memory cache lines includes a compression metabit indicating whether the data in the cache line is compressed. Support for DMA reads and writes from IO devices such as GPUs using selective memory compression is also provided.
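The interplay between the compression ID and the per-line metabit can be modelled in software as a rough sketch. Everything here is an assumption for illustration: the `MemoryController` class, the ID-to-codec table, the use of `zlib` and `bz2` as stand-in schemes, and the policy of storing a line uncompressed when compression does not shrink it.

```python
import bz2
import zlib

# Hypothetical mapping of compression IDs to (compress, decompress) pairs.
CODECS = {
    0: None,                              # ID 0: no compression
    1: (zlib.compress, zlib.decompress),
    2: (bz2.compress, bz2.decompress),
}


class MemoryController:
    """Toy model: each stored line carries a metabit saying whether it is compressed."""

    def __init__(self):
        self._lines = {}  # address -> (metabit, stored bytes)

    def write(self, addr: int, data: bytes, comp_id: int) -> None:
        codec = CODECS[comp_id]
        if codec is not None:
            packed = codec[0](data)
            # Assumed policy: keep the compressed form only if it is smaller.
            if len(packed) < len(data):
                self._lines[addr] = (True, packed)
                return
        self._lines[addr] = (False, data)

    def read(self, addr: int, comp_id: int) -> bytes:
        metabit, stored = self._lines[addr]
        if not metabit:
            return stored  # metabit clear: data is stored uncompressed
        return CODECS[comp_id][1](stored)

    def is_compressed(self, addr: int) -> bool:
        return self._lines[addr][0]


mc = MemoryController()
mc.write(0x1000, b"\x00" * 64, comp_id=1)  # highly compressible line
mc.write(0x1040, b"abc", comp_id=1)        # too short to benefit from compression
```

The metabit lets the read path skip decompression for lines that were stored as-is, even though the request still carries a non-zero compression ID.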
-
Publication No.: US20140095845A1
Publication date: 2014-04-03
Application No.: US13631807
Filing date: 2012-09-28
Applicant: Vinodh Gopal, Wajdi Feghali, Gilbert Wolrich, Kirk Yap
Inventor: Vinodh Gopal, Wajdi Feghali, Gilbert Wolrich, Kirk Yap
IPC classification: G06F9/30
Abstract: An apparatus and method are described for performing efficient Boolean operations in a pipelined processor which, in one embodiment, does not natively support three operand instructions. For example, a processor according to one embodiment of the invention comprises: a set of registers for storing packed operands; Boolean operation logic to execute a single instruction which uses three or more source operands packed in the set of registers, the Boolean operation logic to read at least three source operands and an immediate value to perform a Boolean operation on the three source operands, wherein the Boolean operation comprises: combining a bit read from each of the three operands to form an index to the immediate value, the index identifying a bit position within the immediate value; reading the bit from the identified bit position of the immediate value; and storing the bit from the identified bit position of the immediate value in a destination register.
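The immediate-indexed Boolean operation described in the abstract amounts to using an 8-bit immediate as a truth table. The following is a minimal bit-serial sketch; the function name `ternary_logic`, the 64-bit default width, and the choice of which operand supplies the most significant index bit are assumptions, since the abstract does not fix them.

```python
def ternary_logic(a: int, b: int, c: int, imm8: int, width: int = 64) -> int:
    """Apply the 3-input Boolean function encoded in imm8 to each bit position."""
    result = 0
    for i in range(width):
        # Combine one bit from each operand into a 3-bit index (a is assumed MSB).
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        # The indexed bit of the immediate is the output bit for this position.
        bit = (imm8 >> index) & 1
        result |= bit << i
    return result


a, b, c = 0b1100_1010, 0b1010_0110, 0b0111_0001
xor3 = ternary_logic(a, b, c, 0x96)      # truth table of a ^ b ^ c
majority = ternary_logic(a, b, c, 0xE8)  # truth table of majority(a, b, c)
select = ternary_logic(a, b, c, 0xCA)    # truth table of (a & b) | (~a & c)
```

Any of the 256 possible three-input Boolean functions can be selected by changing only the immediate, which is why a single instruction can replace sequences of two-operand logic instructions.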
-
Publication No.: US07961877B2
Publication date: 2011-06-14
Application No.: US11610886
Filing date: 2006-12-14
Applicant: Vinodh Gopal, Erdinc Ozturk, Matt Bace, Wajdi Feghali, Robert P. Ottavi
Inventor: Vinodh Gopal, Erdinc Ozturk, Matt Bace, Wajdi Feghali, Robert P. Ottavi
CPC classification: G06F7/723
Abstract: The present disclosure provides a system and method for performing modular exponentiation. The method may include dividing a first polynomial into a plurality of segments and generating a first product by multiplying the plurality of segments of the first polynomial with a second polynomial. The method may also include generating a second product by shifting the contents of an accumulator with a factorization base. The method may further include adding the first product and the second product to yield a first intermediate result and reducing the first intermediate result to yield a second intermediate result. The method may also include generating a public key based on, at least in part, the second intermediate result. Of course, many alternatives, variations and modifications are possible without departing from this embodiment.
-
Publication No.: US20100153829A1
Publication date: 2010-06-17
Application No.: US12336029
Filing date: 2008-12-16
Applicant: Vinodh Gopal, Erdinc Ozturk, Gilbert Wolrich, Wajdi Feghali
Inventor: Vinodh Gopal, Erdinc Ozturk, Gilbert Wolrich, Wajdi Feghali
CPC classification: G06F7/724, H03M13/091
Abstract: In one embodiment, circuitry is provided to generate a residue based at least in part upon operations and a data stream generated based at least in part upon a packet. The operations may include at least one iteration of at least one reduction operation including (a) multiplying a first value with at least one portion of the data stream, and (b) producing a reduction by adding at least one other portion of the data stream to a result of the multiplying. The operations may include at least one other reduction operation including (c) producing another result by multiplying with a second value at least one portion of another stream based at least in part upon the reduction, (d) producing a third value by adding at least one other portion of the another stream to the another result, and (e) producing the residue by performing a Barrett reduction based at least in part upon the third value.
-
Publication No.: US07607068B2
Publication date: 2009-10-20
Application No.: US11469222
Filing date: 2006-08-31
IPC classification: G11C29/00
CPC classification: G06F11/1076, G06F2211/1054, G06F2211/1057
Abstract: The present disclosure provides an apparatus and method for generating a Galois-field syndrome. One exemplary method may include loading a first data byte from a first storage device to a first register and loading a second data byte from a second storage device to a second register; ANDing the most significant bit (MSB) of the first data byte and a Galois-field polynomial to generate a first intermediate output; XORing each bit of the first intermediate output with the least significant bits (LSBs) of the first data byte to generate a second intermediate output; MUXing the second intermediate output with each bit of the first data byte to generate a third intermediate output; XORing each bit of the third intermediate output with each bit of the second data byte to generate a fourth intermediate output; and generating a RAID Q syndrome based on, at least in part, the fourth intermediate output. Of course, many alternatives, variations and modifications are possible without departing from this embodiment.
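The AND/XOR steps in the abstract resemble the usual multiply-by-2 step in GF(2^8) followed by accumulation of the next data byte. A minimal software sketch of a RAID Q syndrome computed that way is shown below. The reduction polynomial 0x1D (that is, x^8 + x^4 + x^3 + x^2 + 1), the generator 2, and the function names are assumptions chosen because they are common in RAID-6 implementations; the abstract does not state them, and the MUX stage is not modelled.

```python
GF_POLY = 0x1D  # assumed low byte of the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul2(x: int) -> int:
    """Multiply a byte by 2 in GF(2^8)."""
    # AND the MSB (replicated across the byte) with the polynomial ...
    mask = GF_POLY if (x >> 7) & 1 else 0
    # ... then XOR the result with the remaining bits shifted left by one.
    return ((x << 1) & 0xFF) ^ mask


def raid_q(data_bytes: list[int]) -> int:
    """Q = D0 + 2*D1 + 4*D2 + ... over GF(2^8), evaluated by Horner's rule."""
    q = 0
    for d in reversed(data_bytes):
        q = gf_mul2(q) ^ d  # multiply the running value by 2, then add the next byte
    return q


q_single = raid_q([0x01])     # one disk: Q equals the data byte
q_pair = raid_q([0x05, 0x03]) # 0x05 ^ gf_mul2(0x03)
```

In a real array the same computation is applied to every byte offset of a stripe, with `data_bytes` holding the byte at that offset from each data disk.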