-
1.
Publication No.: US11769041B2
Publication date: 2023-09-26
Application No.: US16177218
Filing date: 2018-10-31
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Sateesh Lagudu , Lei Zhang , Allen H. Rush
CPC classification number: G06N3/063 , G06F7/5443 , G06F17/16 , G06N20/00
Abstract: Systems, apparatuses, and methods for implementing a low latency long short-term memory (LSTM) machine learning engine using sequence interleaving techniques are disclosed. A computing system includes at least a host processing unit, a machine learning engine, and a memory. The host processing unit detects a plurality of sequences which will be processed by the machine learning engine. The host processing unit interleaves the sequences into data blocks and stores the data blocks in the memory. When the machine learning engine receives a given data block, the machine learning engine performs, in parallel, a plurality of matrix multiplication operations on the plurality of sequences in the given data block and a plurality of coefficients. Then, the outputs of the matrix multiplication operations are coupled to one or more LSTM layers.
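The interleaving idea in the abstract can be sketched briefly: timesteps from several sequences are packed round-robin into fixed-size data blocks, so one matrix multiply against the shared coefficients advances all sequences in parallel. This is an illustrative NumPy sketch; the block layout and all names are assumptions, not the patented design:

```python
import numpy as np

def interleave_sequences(sequences, block_size):
    """Interleave timesteps from several sequences into fixed-size data blocks.

    sequences: list of arrays, each of shape (T, F) -- T timesteps, F features.
    Returns blocks of shape (block_size, F) whose consecutive rows come from
    different sequences (hypothetical layout; the patent does not fix one).
    """
    _, F = sequences[0].shape
    stacked = np.stack(sequences, axis=1)        # (T, N, F): timestep-major
    flat = stacked.reshape(-1, F)                # round-robin over sequences
    return [flat[i:i + block_size] for i in range(0, len(flat), block_size)]

def block_matmul(block, weights):
    """One parallel pass: every row of the block is multiplied by the same
    coefficient matrix, so all interleaved sequences advance together."""
    return block @ weights                       # (block_size, H)

rng = np.random.default_rng(0)
seqs = [rng.standard_normal((4, 3)) for _ in range(2)]   # 2 sequences, 4 steps
blocks = interleave_sequences(seqs, block_size=4)
W = rng.standard_normal((3, 5))                          # 3 features -> 5 hidden
outs = [block_matmul(b, W) for b in blocks]              # fed to LSTM layers
```

In the patented system the per-block outputs would then feed one or more LSTM layers; the sketch stops at the batched matrix multiply.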
-
2.
Publication No.: US11503310B2
Publication date: 2022-11-15
Application No.: US16176826
Filing date: 2018-10-31
Applicant: ATI TECHNOLOGIES ULC , ADVANCED MICRO DEVICES, INC.
Inventor: Lei Zhang , David Glen , Kim A. Meinerth
IPC: H04N19/186 , H04N19/156 , H04N19/124 , G06T5/00 , G06T7/90 , H04N19/70
Abstract: A device includes an encoder, decoder, codec, or combination thereof and inline hardware conversion units that are operative to convert stored image data into either an HDR/WCG format or an SDR/SCG format during the conversion process. Each of the inline hardware conversion units is operative to perform the conversion process independent of another read operation with the memory that stores the image data to be converted. In one example, an encoding unit is operative to perform a write operation with a memory to store the converted image data after completing the conversion process. In another example, a decoding unit is operative to perform a read operation with the memory to retrieve the image data from the memory before initiating the conversion process. In another example, an encoder/decoder unit is operative to perform at least one of the read operation and the write operation.
-
3.
Publication No.: US10432988B2
Publication date: 2019-10-01
Application No.: US15130885
Filing date: 2016-04-15
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Lei Zhang , Gabor Sines , Khaled Mammou , David Glen , Layla A. Mah , Rajabali M. Koduri , Bruce Montag
IPC: G06F15/16 , H04N21/2343 , H04L29/06 , H04N21/2368 , H04N21/236 , H04N21/414 , H04N21/422 , H04N21/434 , H04N21/43 , H04N21/437
Abstract: Virtual Reality (VR) systems, apparatuses, and methods of processing data are provided which include predicting, at a server, a user viewpoint of a next frame of video data based on received user feedback information sensed at a client, rendering a portion of the next frame using the prediction, encoding the portion, formatting the encoded portion into packets, and transmitting the video data. At the client, the encoded and packetized A/V data is received and depacketized. The portion of video data and corresponding audio data is decoded and controlled to be displayed and aurally provided in synchronization. Latency may be minimized by utilizing handshaking between hardware components and/or software components such as a 3D server engine, one or more client processors, a video encoder, a server NIC, a video decoder, a client NIC, and a 3D client engine.
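The server-side viewpoint prediction step can be illustrated with a minimal sketch. Plain linear extrapolation from the last two feedback samples is an assumed stand-in; the patent leaves the server's prediction model open:

```python
def predict_viewpoint(prev, curr, lookahead):
    """Linearly extrapolate the client's head pose for the next frame.

    prev/curr: (yaw, pitch) angles from the latest two client feedback
    packets; lookahead: number of frame intervals to predict ahead.
    """
    return tuple(c + (c - p) * lookahead for p, c in zip(prev, curr))

# Head turned 2 deg in yaw and 1 deg in pitch between samples; predict one
# frame ahead so the server can render only the portion likely to be visible.
predicted = predict_viewpoint(prev=(10.0, 0.0), curr=(12.0, 1.0), lookahead=1)
```

The predicted pose would then drive which portion of the next frame is rendered and encoded before transmission.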
-
4.
Publication No.: US10368087B2
Publication date: 2019-07-30
Application No.: US15271055
Filing date: 2016-09-20
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Ihab Amer , Gabor Sines , Edward Harold , Jinbo Qiu , Lei Zhang , Yang Liu , Zhen Chen , Ying Luo , Shu-Hsien Wu , Zhong Cai
IPC: H04N19/513 , H04N19/105 , H04N19/172 , H04N19/57 , H04N19/433
Abstract: A processing apparatus is provided that includes an encoder configured to encode current frames of video data using previously encoded reference frames and perform motion searches within a search window about each of a plurality of co-located portions of a reference frame. The processing apparatus also includes a processor configured to determine, prior to performing the motion searches, which locations of the reference frame to reload the search window according to a threshold number of search window reloads using predicted motions of portions of the reference frame corresponding to each of the locations. The processor is also configured to cause the encoder to reload the search window at the determined locations of the reference frame and, for each of the remaining locations of the reference frame, slide the search window in a first direction indicated by the location of the next co-located portion of the reference frame.
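The reload-versus-slide decision described above can be sketched as a simple budgeted selection. Ranking locations by predicted motion magnitude is an assumption for illustration; the abstract only fixes that reloads are limited to a threshold number and the remaining locations slide the window:

```python
def choose_reload_locations(predicted_motions, max_reloads):
    """Plan which co-located portions get a full search-window reload.

    predicted_motions: list of (location_index, predicted_motion_magnitude)
    in raster order. Locations with the largest predicted motion (where a
    slid window is most likely to miss the true match) are reloaded, up to
    max_reloads; every other location reuses the window slid toward the
    next co-located portion of the reference frame.
    """
    ranked = sorted(predicted_motions, key=lambda lm: lm[1], reverse=True)
    reload_set = {loc for loc, _ in ranked[:max_reloads]}
    return [(loc, "reload" if loc in reload_set else "slide")
            for loc, _ in predicted_motions]

motions = [(0, 1.0), (1, 9.5), (2, 0.4), (3, 7.2)]   # hypothetical magnitudes
plan = choose_reload_locations(motions, max_reloads=2)
```

The encoder would then execute the plan in order, paying the reload cost only at the two high-motion locations.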
-
5.
Publication No.: US20190028752A1
Publication date: 2019-01-24
Application No.: US15657613
Filing date: 2017-07-24
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Lei Zhang , Sateesh Lagudu , Allen Rush , Razvan Dan-Dobre
IPC: H04N21/4143 , G06F3/14 , H04N7/14 , H04N7/15
Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the portion of processing elements which are shared include a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
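The sharing-by-context-switch idea can be modeled in a toy sketch: the same MAC resources serve either mode, and a switch reprograms them only at a frame (or sub-frame) boundary. All names and the "program as coefficients" simplification are illustrative assumptions:

```python
class SharedMacEngine:
    """Toy model of processing elements shared by a codec and an inference
    engine: one MAC array, reprogrammed per mode by a context switch."""

    def __init__(self):
        self.mode = None
        self.scale = None

    def context_switch(self, mode, scale, at_boundary=True):
        # The abstract allows switching at frame or sub-frame boundaries;
        # switching mid-block would corrupt in-flight work.
        if not at_boundary:
            raise RuntimeError("context switch only at frame/sub-frame boundary")
        self.mode, self.scale = mode, scale

    def mac(self, a, b):
        # The same multiply-accumulate hardware serves either operating mode.
        return sum(x * y for x, y in zip(a, b)) * self.scale

eng = SharedMacEngine()
eng.context_switch("inference", scale=1.0)   # e.g. dot products for a layer
result = eng.mac([1, 2], [3, 4])
eng.context_switch("codec", scale=0.5)       # e.g. scaled SAD-style sums
```

Silicon-area savings come from there being only one `SharedMacEngine`, not one per function, which is the point of the combination.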
-
6.
Publication No.: US20150092856A1
Publication date: 2015-04-02
Application No.: US14043427
Filing date: 2013-10-01
Applicant: ATI Technologies ULC , Advanced Micro Devices, Inc.
Inventor: Khaled Mammou , Ihab Amer , Gabor Sines , Lei Zhang , Michael Schmit , Daniel Wong
IPC: H04N19/597 , H04N19/583 , H04N19/51 , H04N19/56
CPC classification number: H04N19/51 , H04N19/46 , H04N19/52 , H04N19/521 , H04N19/56 , H04N19/597
Abstract: The present disclosure is directed to a system and method for exploiting camera and depth information associated with rendered video frames, such as those rendered by a server operating as part of a cloud gaming service, to more efficiently encode the rendered video frames for transmission over a network. The method and system of the present disclosure can be used in a server operating in a cloud gaming service to reduce, for example, the latency, downstream bandwidth, and/or computational processing power associated with playing a video game over its service. The method and system of the present disclosure can be further used in other applications where camera and depth information of a rendered or captured video frame is available.
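One concrete way camera and depth information helps an encoder is motion-vector seeding: a pixel can be back-projected with its depth, moved by the known camera motion, and re-projected, giving a candidate motion vector without a blind search. A pinhole-camera sketch under that assumption (the patent does not commit to this exact formulation):

```python
import numpy as np

def depth_motion_vector(pixel, depth, K, R, t):
    """Derive a candidate motion vector from renderer camera and depth data.

    pixel: (x, y) image coordinates; depth: scene depth at that pixel;
    K: 3x3 camera intrinsics; (R, t): camera motion between the two frames.
    """
    x, y = pixel
    p_cam = np.linalg.inv(K) @ np.array([x, y, 1.0]) * depth  # back-project
    p_new = R @ p_cam + t                                     # apply motion
    q = K @ p_new
    q = q[:2] / q[2]                                          # re-project
    return q - np.array([x, y], dtype=float)                  # candidate MV

# Hypothetical camera: 100 px focal length, principal point at (64, 64).
K = np.array([[100.0, 0.0, 64.0],
              [0.0, 100.0, 64.0],
              [0.0,   0.0,  1.0]])
mv = depth_motion_vector((64, 64), 2.0, K, np.eye(3),
                         np.array([0.1, 0.0, 0.0]))
```

Seeding the search this way is what lets the encoder spend less computation and latency than an exhaustive motion search would.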
-
7.
Publication No.: US12120364B2
Publication date: 2024-10-15
Application No.: US18094161
Filing date: 2023-01-06
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Lei Zhang , Gabor Sines , Khaled Mammou , David Glen , Layla A. Mah , Rajabali M. Koduri , Bruce Montag
IPC: G06F15/16 , H04L65/70 , H04L65/75 , H04L67/131 , H04L69/24 , H04N21/2343 , H04N21/236 , H04N21/2368 , H04N21/414 , H04N21/422 , H04N21/43 , H04N21/434 , H04N21/437
CPC classification number: H04N21/2343 , H04L65/70 , H04L65/762 , H04L67/131 , H04L69/24 , H04N21/23605 , H04N21/2368 , H04N21/41407 , H04N21/42202 , H04N21/43072 , H04N21/4341 , H04N21/4343 , H04N21/437
Abstract: A device and method for processing Virtual Reality (VR) data is disclosed. The method comprises transmitting feedback information from the device to a server, wherein the feedback information is captured in the device, receiving data from the server to be presented on the device based on the feedback information, wherein the data includes video data and audio data where the video data is a frame of video data in a sequence of frames and the audio data is the corresponding audio data of the frame, decoding the video data and corresponding audio data of the frame, and controlling the presentation of the video data and corresponding audio data on the device such that the video data is synchronized with the corresponding audio data.
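The final synchronization step can be sketched as a timestamp comparison at presentation time. The 15 ms default tolerance is an illustrative lip-sync bound, not a value from the patent:

```python
def schedule_presentation(video_pts, audio_pts, tolerance_ms=15):
    """Decide whether a frame's decoded video and audio may be presented.

    Presents both only when their timestamps agree within the tolerance;
    otherwise reports which stream must wait so they stay in sync.
    """
    drift = video_pts - audio_pts
    if abs(drift) <= tolerance_ms:
        return "present"
    return "delay_video" if drift < 0 else "delay_audio"

# Within tolerance: present both together.
ok = schedule_presentation(video_pts=1000, audio_pts=1005)
# Video timestamp lags audio by 40 ms: hold audio back is wrong -- hold the
# earlier stream, i.e. delay video presentation until audio catches up? No:
# video is *behind* audio, so video must wait for its matching audio slot.
late = schedule_presentation(video_pts=1000, audio_pts=1040)
```

A real device would apply this per frame as decoding completes, which is the "controlling the presentation … in synchronization" step of the claim.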
-
8.
Publication No.: US20230156250A1
Publication date: 2023-05-18
Application No.: US18094161
Filing date: 2023-01-06
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Lei Zhang , Gabor Sines , Khaled Mammou , David Glen , Layla A. Mah , Rajabali M. Koduri , Bruce Montag
IPC: H04N21/2343 , H04N21/2368 , H04N21/236 , H04N21/414 , H04N21/422 , H04N21/434 , H04N21/437 , H04L69/24 , H04N21/43 , H04L65/70 , H04L65/75 , H04L67/131
CPC classification number: H04N21/2343 , H04N21/2368 , H04N21/23605 , H04N21/41407 , H04N21/42202 , H04N21/4341 , H04N21/437 , H04N21/4343 , H04L69/24 , H04N21/43072 , H04L65/70 , H04L65/762 , H04L67/131
Abstract: Virtual Reality (VR) processing devices and methods are provided for transmitting user feedback information comprising at least one of user position information and user orientation information, receiving encoded audio-video (A/V) data, which is generated based on the transmitted user feedback information, separating the A/V data into video data and audio data corresponding to a portion of a next frame of a sequence of frames of the video data to be displayed, decoding the portion of the next frame of the video data and the corresponding audio data, providing the audio data for aural presentation, and controlling the portion of the next frame of the video data to be displayed in synchronization with the corresponding audio data.
-
9.
Publication No.: US20220129752A1
Publication date: 2022-04-28
Application No.: US17571045
Filing date: 2022-01-07
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Sateesh Lagudu , Lei Zhang , Allen Rush
IPC: G06N3/08 , G06N3/063 , G06N3/04 , G06F1/3296
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
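The channel-blocked accumulation described above can be sketched with NumPy. The filters are kept 1x1 so the sketch stays short (block-overlap handling for larger kernels is omitted), and all names are illustrative rather than the patented partitioning:

```python
import numpy as np

def conv_3d_blocks(input_chw, kernels, block_c):
    """Process a multi-channel input in channel-partitioned 3D blocks.

    input_chw: (C, H, W) input; kernels: (F, C, 1, 1) 1x1 filters.
    Channels are processed block_c at a time: each block is "loaded into
    internal memory", its partial outputs are summed across the block's
    channels, and only one accumulate per block touches the output buffer,
    mirroring the bandwidth-saving idea in the abstract.
    """
    C, H, W = input_chw.shape
    F = kernels.shape[0]
    out = np.zeros((F, H, W))                 # stands in for external memory
    for c0 in range(0, C, block_c):           # one 3D block of channels
        block = input_chw[c0:c0 + block_c]    # load block into internal memory
        for f in range(F):
            # sum across the block's channels BEFORE writing externally
            partial = np.tensordot(kernels[f, c0:c0 + block_c, 0, 0],
                                   block, axes=(0, 0))
            out[f] += partial                 # one accumulate per block
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 4, 4))            # 8 channels of 4x4 input
k = rng.standard_normal((2, 8, 1, 1))         # 2 output features
blocked = conv_3d_blocks(x, k, block_c=4)
direct = np.tensordot(k[:, :, 0, 0], x, axes=(1, 0))   # unblocked reference
```

The blocked result matches the direct convolution; only the external-memory traffic pattern differs, which is the point of the partitioning.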
-
10.
Publication No.: US20190325305A1
Publication date: 2019-10-24
Application No.: US16117302
Filing date: 2018-08-30
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Lei Zhang , Sateesh Lagudu , Allen Rush
Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch first data and broadcast it to the other inference cores of the inference accelerator engine. Each inference core also fetches second data unique to the respective inference core. The inference cores then perform computations on the first and second data in order to implement the machine learning model.
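The bandwidth benefit of the broadcast mapping can be shown with back-of-the-envelope accounting: shared data crosses the external memory interface once instead of once per core. The byte counts below are illustrative placeholders, not figures from the patent:

```python
def external_traffic(num_cores, shared_bytes, unique_bytes, broadcast=True):
    """Model external-memory traffic for one layer under two mappings.

    With broadcasting, one core fetches the shared data (e.g. activations)
    once and forwards it on-chip to its peers; without it, every core
    fetches its own copy. Each core always fetches its core-unique data
    (e.g. its slice of the weights).
    """
    shared_traffic = shared_bytes if broadcast else shared_bytes * num_cores
    return shared_traffic + unique_bytes * num_cores

with_bcast = external_traffic(4, shared_bytes=1000, unique_bytes=250)
without = external_traffic(4, shared_bytes=1000, unique_bytes=250,
                           broadcast=False)
```

For four cores the broadcast mapping moves 2000 bytes against 5000 without it, which is the kind of reduction the control unit's mapping choice targets.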
-