-
Publication number: US20180129504A1
Publication date: 2018-05-10
Application number: US15804655
Application date: 2017-11-06
Applicant: Advanced Micro Devices, Inc.
Inventor: Lee W. Howes, Benedict R. Gaster, Michael C. Houston
CPC classification number: G06F9/3851, G06F8/458, G06F9/3009
Abstract: A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
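The yield-and-resume behavior described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the list-based instruction stream, the "barrier" opcode name, and the function run_workgroup are all hypothetical, and the round-robin scheduling order is an assumption.

```python
def run_workgroup(stream, num_workitems):
    """Run every workitem over `stream` on one simulated processor.

    Each workitem has its own program counter. On a "barrier" instruction
    the running workitem leaves its counter pointing at the instruction
    after the barrier and yields, so the next workitem can run.
    """
    pcs = [0] * num_workitems      # one program counter per workitem
    trace = []                     # (workitem, instruction) in execution order
    while any(pc < len(stream) for pc in pcs):
        for wi in range(num_workitems):
            while pcs[wi] < len(stream):
                op = stream[pcs[wi]]
                pcs[wi] += 1       # now points at the instruction after `op`
                if op == "barrier":
                    break          # yield the processor to the next workitem
                trace.append((wi, op))
    return trace


trace = run_workgroup(["a", "barrier", "b"], num_workitems=2)
```

In this sketch both workitems execute "a" before either executes "b", because each one gives up the processor when it reaches the barrier.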
-
752.
Publication number: US20180121386A1
Publication date: 2018-05-03
Application number: US15354560
Application date: 2016-11-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Angel E. Socarras, Michael Mantor, YunXiao Zou, Bin He
IPC: G06F15/80, G06F9/30, G06F12/0875, G06F12/0891
CPC classification number: G06F15/8007, G06F9/3001, G06F9/30105, G06F9/3012, G06F9/30123, G06F9/3828, G06F9/3851, G06F9/3887, G06F9/3891, G06F12/0875, G06F12/0891, G06F2212/604
Abstract: A super single instruction, multiple data (SIMD) computing structure and a method of executing instructions in the super-SIMD are disclosed. The super-SIMD structure is capable of executing more than one instruction from a single thread or multiple threads and includes a plurality of vector general purpose registers (VGPRs), a first arithmetic logic unit (ALU) coupled to the plurality of VGPRs, a second ALU coupled to the plurality of VGPRs, and a destination cache (Do$) that is coupled via bypass and forwarding logic to the first ALU and the second ALU and receives an output of the first ALU and the second ALU. The Do$ holds multiple instruction results to extend an operand bypass network, saving the power of read and write transactions. A compute unit (CU) and a small CU including a plurality of super-SIMDs are also disclosed.
-
Publication number: US20180121204A1
Publication date: 2018-05-03
Application number: US15845641
Application date: 2017-12-18
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Jaewoong CHUNG, David S. CHRISTIE, Michael P. HOHMUTH, Stephan DIESTELHORST, Martin POHLACK, Luke YEN
CPC classification number: G06F9/3842, G06F9/3004, G06F9/30087, G06F9/3834, G06F9/3857, G06F9/3859, G06F9/467
Abstract: A processing core of a plurality of processing cores is configured to execute a speculative region of code as a single atomic memory transaction with respect to one or more others of the plurality of processing cores. In response to determining an abort condition for an issued one of a plurality of program instructions, and in response to determining that the issued program instruction is not part of a mispredicted execution path, the processing core is configured to abort an attempt to execute the speculative region of code.
-
Publication number: US20180114290A1
Publication date: 2018-04-26
Application number: US15331278
Application date: 2016-10-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Timour T. Paltashev, Michael Mantor, Rex Eldon McCrary
Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.
-
Publication number: US09953687B1
Publication date: 2018-04-24
Application number: US15299709
Application date: 2016-10-21
Applicant: Advanced Micro Devices, Inc.
Inventor: John J. Wuu, Ryan Freese, Russell J. Schreiber
IPC: G11C11/41, G11C11/419, G11C7/12, H03K19/0185, G11C7/06, G11C7/22
CPC classification number: G11C7/12, G11C5/14, G11C7/08, G11C7/222, G11C7/225, H03K19/00323, H03K19/018507
Abstract: An interlock circuit utilizes a single combinatorial pseudo-dynamic logic gate to take inputs from two voltage domains at the same time without requiring either input to be level shifted. The interlock design allows hold timing to be met across a large voltage range of both supplies in a dual-voltage supply environment while not significantly hurting setup time by having much lower latency than the latency of a level shifter.
-
Publication number: US20180109561A1
Publication date: 2018-04-19
Application number: US15298049
Application date: 2016-10-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Andrew G. Kegel
IPC: H04L29/06
CPC classification number: H04L67/10, G06F21/57, G06F21/575, H04L9/3234, H04L2209/127
Abstract: Systems, apparatuses, and methods for implementing trusted cluster attestation techniques are disclosed. A cluster includes multiple computing devices connected together and at least one cluster security module. The cluster security module collects measurement logs and attestations from N computing devices, with N being a positive integer greater than one. The cluster security module also maintains a log and calculates an attestation for its own hardware and/or software. The cluster security module combines the logs from the N computing devices and the log of the cluster security module into an aggregate log, with N+1 logs combined into the aggregate log. Then, the cluster security module generates a single attestation for the cluster to represent the cluster as a whole. The cluster security module is configured to provide the single attestation and aggregate log to an external device responsive to receiving a challenge request from the external device.
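The log aggregation step can be sketched in Python as follows. This is only an illustration under stated assumptions: a plain SHA-256 digest stands in for a real attestation (which would normally be a signed quote from trusted hardware), and the names attest and cluster_attestation are hypothetical, not taken from the patent.

```python
import hashlib


def attest(log):
    """Stand-in attestation: a SHA-256 digest over the log entries (assumption)."""
    digest = hashlib.sha256()
    for entry in log:
        digest.update(entry.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def cluster_attestation(device_logs, module_log):
    """Combine N device logs and the module's own log, then attest the result."""
    # N + 1 logs: one per computing device plus the security module's own log.
    aggregate = [list(log) for log in device_logs] + [list(module_log)]
    flat = [entry for log in aggregate for entry in log]
    return aggregate, attest(flat)


aggregate_log, single_attestation = cluster_attestation(
    device_logs=[["bootloader", "kernel"], ["bootloader", "kernel"]],
    module_log=["security-module-firmware"],
)
```

An external challenger that receives the aggregate log and the single attestation could recompute the digest over the log and compare it, which is the check this toy version supports.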
-
Publication number: US20180088948A1
Publication date: 2018-03-29
Application number: US15273916
Application date: 2016-09-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Anupama Rajesh Rasale, Dibyendu Das, Ashutosh Nema, Md Asghar Ahmad Shahid, Prathiba Kumar
CPC classification number: G06F9/30036, G06F9/30032, G06F9/30043, G06F9/3455, G06F15/8007, G06F15/8053
Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.
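The load-then-shuffle sequence can be modeled with a small Python sketch in which lists stand in for vector registers. The register width of 4, the stride-2 access pattern, and the helper names vector_load, shuffle, and gather_stride2_and_double are assumptions chosen for illustration; real hardware would use vector load and permute instructions.

```python
WIDTH = 4  # assumed number of lanes in one vector register


def vector_load(memory, base):
    """Load WIDTH contiguous elements starting at `base`."""
    return memory[base:base + WIDTH]


def shuffle(reg_a, reg_b, indices):
    """Pick lanes from the concatenation of two registers into one register."""
    both = reg_a + reg_b
    return [both[i] for i in indices]


def gather_stride2_and_double(memory, base):
    """Operate on every other element without a per-element gather."""
    lo = vector_load(memory, base)              # first contiguous vector load
    hi = vector_load(memory, base + WIDTH)      # second contiguous vector load
    packed = shuffle(lo, hi, [0, 2, 4, 6])      # consolidate the strided operands
    return [2 * x for x in packed]              # vector operation on one register


result = gather_stride2_and_double(list(range(10, 18)), base=0)
```

Here two contiguous loads plus one shuffle replace four separate scalar loads of the non-sequential operands.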
-
758.
Publication number: US20180084270A1
Publication date: 2018-03-22
Application number: US15271055
Application date: 2016-09-20
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Ihab Amer, Gabor Sines, Edward Harold, Jinbo Qiu, Lei Zhang, Yang Liu, Zhen Chen, Ying Luo, Shu-Hsien Wu, Zhong Cai
IPC: H04N19/513, H04N19/172, H04N19/105
CPC classification number: H04N19/57, H04N19/433
Abstract: A processing apparatus is provided that includes an encoder configured to encode current frames of video data using previously encoded reference frames and perform motion searches within a search window about each of a plurality of co-located portions of a reference frame. The processing apparatus also includes a processor configured to determine, prior to performing the motion searches, at which locations of the reference frame to reload the search window according to a threshold number of search window reloads using predicted motions of portions of the reference frame corresponding to each of the locations. The processor is also configured to cause the encoder to reload the search window at the determined locations of the reference frame and, for each of the remaining locations of the reference frame, slide the search window in a first direction indicated by the location of the next co-located portion of the reference frame.
-
Publication number: US20180081818A1
Publication date: 2018-03-22
Application number: US15268974
Application date: 2016-09-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Shuai Che, Jieming Yin
IPC: G06F12/0897
CPC classification number: G06F12/0897, G06F2212/1024, G06F2212/60
Abstract: A method and apparatus for transmitting data includes determining whether to apply a mask to a cache line that includes a first type of data and a second type of data for transmission based upon a first criterion. The second type of data is filtered from the cache line, and the first type of data along with an identifier of the applied mask is transmitted. The first type of data and the identifier are received, and the second type of data is combined with the first type of data to recreate the cache line based upon the received identifier.
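A minimal Python sketch of the mask-and-recreate round trip follows. Several details are assumptions made for illustration only: the eight-word cache line, the MASKS table, the idea that the filtered "second type" of data is a known fill value (zero here), and the selection rule in choose_mask. None of these names or choices come from the patent.

```python
# Hypothetical mask table: 1 keeps a word (first type), 0 filters it (second type).
MASKS = {
    0: (1, 1, 1, 1, 1, 1, 1, 1),   # identity mask, nothing filtered
    1: (1, 0, 1, 0, 1, 0, 1, 0),
    2: (1, 1, 1, 1, 0, 0, 0, 0),
}
FILL = 0  # assumed value of the filtered words


def choose_mask(line):
    """Pick the mask that drops the most words without losing information."""
    best = 0
    for mask_id, mask in MASKS.items():
        lossless = all(keep or word == FILL for word, keep in zip(line, mask))
        if lossless and sum(mask) < sum(MASKS[best]):
            best = mask_id
    return best


def compress(line, mask_id):
    """Sender side: return the mask identifier and only the kept words."""
    return mask_id, [word for word, keep in zip(line, MASKS[mask_id]) if keep]


def decompress(mask_id, payload):
    """Receiver side: rebuild the full line from the identifier and payload."""
    words = iter(payload)
    return [next(words) if keep else FILL for keep in MASKS[mask_id]]


line = [7, 0, 9, 0, 3, 0, 5, 0]
mask_id, payload = compress(line, choose_mask(line))
restored = decompress(mask_id, payload)
```

In this toy case only four of the eight words travel with the mask identifier, and the receiver reconstructs the original line from the identifier alone.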
-
Publication number: US20180081625A1
Publication date: 2018-03-22
Application number: US15271077
Application date: 2016-09-20
Applicant: Advanced Micro Devices, Inc.
Inventor: XuHong Xiong, Pingping Shao, ZhongXiang Luo, ChenBin Wang
IPC: G06F5/14, G11C21/00, G06F12/0811, G06F13/16, G06F5/06
CPC classification number: G06F5/14, G06F5/065, G06F12/0811, G06F13/1673, G06F2205/067, G06F2205/126, G06F2212/283, G11C21/00
Abstract: A system and method for managing data in a ring buffer is disclosed. The system includes a legacy ring buffer functioning as an on-chip ring buffer, a supplemental buffer for storing data in the ring buffer, a preload ring buffer that is on-chip and capable of receiving preload data from the supplemental buffer, a write controller that determines where to write data that is write requested by a write client of the ring buffer, and a read controller that controls a return of data to a read client pursuant to a read request to the ring buffer.
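One possible reading of this arrangement is sketched below in Python. The class name SpillingRing, the capacities, and the policy of spilling to the supplemental buffer only when the on-chip ring is full are assumptions; the abstract does not specify how the write and read controllers make their decisions.

```python
from collections import deque


class SpillingRing:
    """Toy model: on-chip ring, supplemental overflow buffer, on-chip preload ring."""

    def __init__(self, on_chip_capacity, preload_capacity):
        self.on_chip = deque()        # legacy on-chip ring buffer
        self.supplemental = deque()   # larger overflow storage
        self.preload = deque()        # on-chip staging area fed from supplemental
        self.on_chip_capacity = on_chip_capacity
        self.preload_capacity = preload_capacity

    def _refill_preload(self):
        # Move the oldest spilled entries on chip ahead of future reads.
        while self.supplemental and len(self.preload) < self.preload_capacity:
            self.preload.append(self.supplemental.popleft())

    def write(self, item):
        """Write controller: decide where the new entry goes."""
        spilled = bool(self.supplemental or self.preload)
        if not spilled and len(self.on_chip) < self.on_chip_capacity:
            self.on_chip.append(item)
        else:
            # Once anything has spilled, later writes follow it to keep FIFO order.
            self.supplemental.append(item)
        self._refill_preload()

    def read(self):
        """Read controller: return the oldest entry."""
        if self.on_chip:
            item = self.on_chip.popleft()
        elif self.preload:
            item = self.preload.popleft()
        else:
            raise IndexError("ring buffer is empty")
        self._refill_preload()
        return item


ring = SpillingRing(on_chip_capacity=2, preload_capacity=1)
for value in range(5):
    ring.write(value)
drained = [ring.read() for _ in range(5)]
```

The sketch preserves first-in, first-out order even though three of the five entries pass through the supplemental and preload buffers.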
-