Abstract:
Disclosed herein are an apparatus and method for managing cache memory. The apparatus includes one or more processors and executable memory for storing at least one program executed by the one or more processors. The at least one program reads an s1-tag and an s2-tag of the cache memory upon receiving an access request address for reading data in response to a request to access the cache memory, checks whether the access request address matches the value of the s1-tag and the value of the s2-tag, and reads the data from data memory when the access request address matches both the value of the s1-tag and the value of the s2-tag.
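The two-step tag comparison can be illustrated with a short sketch. The C code below is a minimal illustration only, not the disclosed apparatus: the field widths (S1_TAG_BITS, S2_TAG_BITS, INDEX_BITS), the cache_line_t layout, and the cache_read helper are hypothetical, chosen only to show a read that returns data when the access request address matches both the s1-tag and the s2-tag of the selected line.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical address layout: an s2-tag field, an s1-tag field, and an
 * index; the field widths are assumptions made for illustration. */
#define S1_TAG_BITS 10u
#define S2_TAG_BITS 10u
#define INDEX_BITS   8u

typedef struct {
    uint32_t s1_tag;   /* first-stage tag stored for the cache line  */
    uint32_t s2_tag;   /* second-stage tag stored for the cache line */
    uint32_t data;     /* cached data word (placeholder)             */
    bool     valid;
} cache_line_t;

static cache_line_t tag_and_data_memory[1u << INDEX_BITS];

/* Returns true (hit) only when the address matches BOTH the s1-tag and the
 * s2-tag of the selected line, mirroring the two-tag check in the abstract. */
static bool cache_read(uint32_t addr, uint32_t *out)
{
    uint32_t index  = addr & ((1u << INDEX_BITS) - 1u);
    uint32_t s1_req = (addr >> INDEX_BITS) & ((1u << S1_TAG_BITS) - 1u);
    uint32_t s2_req = (addr >> (INDEX_BITS + S1_TAG_BITS)) & ((1u << S2_TAG_BITS) - 1u);

    const cache_line_t *line = &tag_and_data_memory[index];
    if (line->valid && line->s1_tag == s1_req && line->s2_tag == s2_req) {
        *out = line->data;           /* both tags match: read data memory */
        return true;
    }
    return false;                    /* any mismatch: treat as a miss     */
}

int main(void)
{
    /* Populate one line and perform a read with a matching address. */
    uint32_t addr = (3u << (INDEX_BITS + S1_TAG_BITS)) | (5u << INDEX_BITS) | 7u;
    tag_and_data_memory[7] = (cache_line_t){ .s1_tag = 5, .s2_tag = 3,
                                             .data = 0xABCD, .valid = true };
    uint32_t value = 0;
    bool hit = cache_read(addr, &value);
    printf("hit=%d value=0x%X\n", hit, value);
    return 0;
}
```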
Abstract:
Disclosed herein is a method for outer-product-based matrix multiplication for a floating-point data type. The method includes receiving first floating-point data and second floating-point data and performing matrix multiplication on the first floating-point data and the second floating-point data, wherein the result value of the matrix multiplication is calculated based on the suboperation result values of floating-point units.
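As a rough illustration of the outer-product formulation, the C sketch below accumulates one rank-1 update of C per column of A and row of B. The matrix sizes, the float data type, and the matmul_outer_product function are assumptions for illustration only; each multiply-add stands in for a floating-point-unit suboperation whose result is accumulated into the final product.

```c
#include <stdio.h>

#define M 2
#define K 3
#define N 2

/* Outer-product formulation: C += a_k * b_k^T for each k, where a_k is the
 * k-th column of A and b_k is the k-th row of B. */
static void matmul_outer_product(const float A[M][K], const float B[K][N],
                                 float C[M][N])
{
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            C[i][j] = 0.0f;

    for (int k = 0; k < K; ++k)               /* one outer product per k    */
        for (int i = 0; i < M; ++i)
            for (int j = 0; j < N; ++j)
                C[i][j] += A[i][k] * B[k][j]; /* suboperation: multiply-add */
}

int main(void)
{
    const float A[M][K] = { {1.0f, 2.0f, 3.0f}, {4.0f, 5.0f, 6.0f} };
    const float B[K][N] = { {1.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 1.0f} };
    float C[M][N];

    matmul_outer_product(A, B, C);
    for (int i = 0; i < M; ++i)
        printf("%6.1f %6.1f\n", C[i][0], C[i][1]);
    return 0;
}
```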
Abstract:
Disclosed herein are a prefetching device and method for an artificial intelligence processor. The prefetching method includes prefetching data, stored in external off-chip memory, into internal on-chip memory in the artificial intelligence processor and storing information including an address value and a total amount of matrix operation data in at least one control and status register as a kernel program is executed; extracting a matrix operation instruction from among the instructions provided from an instruction cache of the off-chip memory; determining whether prefetching is enabled based on a result of extracting the matrix operation instruction; when prefetching is enabled, determining a number of blocks to be prefetched based on the information stored in the at least one control and status register; and determining a bus burst value corresponding to the determined number of blocks and transmitting the bus burst value as a data request signal through a bus interface.
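A minimal software sketch of the prefetch decision is given below. The prefetch_csr_t fields (base_addr, total_bytes, block_bytes), the issue_prefetch function, and the burst-length formula are hypothetical assumptions; the sketch only shows the flow described above: an extracted matrix operation instruction enables prefetching, the block count is derived from the control-and-status-register contents, and a corresponding burst value is issued as the data request.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical control-and-status-register contents written while the kernel
 * program executes; field names and sizes are assumptions for illustration. */
typedef struct {
    uint32_t base_addr;    /* address of matrix operation data in off-chip memory */
    uint32_t total_bytes;  /* total amount of matrix operation data               */
    uint32_t block_bytes;  /* size of one prefetch block                          */
} prefetch_csr_t;

/* Decide how many blocks to prefetch from the CSR contents, then derive a
 * bus burst value for the data request issued through the bus interface. */
static void issue_prefetch(const prefetch_csr_t *csr, int matmul_decoded)
{
    if (!matmul_decoded) {             /* no matrix instruction extracted: prefetch disabled */
        printf("prefetch disabled\n");
        return;
    }
    uint32_t blocks = (csr->total_bytes + csr->block_bytes - 1u) / csr->block_bytes;
    uint32_t burst  = blocks * (csr->block_bytes / 4u);  /* burst length in bus words (assumed 32-bit bus) */

    printf("prefetch request: addr=0x%08X blocks=%u burst=%u\n",
           (unsigned)csr->base_addr, (unsigned)blocks, (unsigned)burst);
}

int main(void)
{
    prefetch_csr_t csr = { .base_addr = 0x80000000u,
                           .total_bytes = 4096u, .block_bytes = 1024u };
    issue_prefetch(&csr, 1);   /* a matrix operation instruction was extracted */
    return 0;
}
```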
Abstract:
Disclosed herein is an apparatus for fast Sample Adaptive Offset (SAO) filtering based on a convolution method for decoding of a video. According to an embodiment, the apparatus may include: an input stream provider for sequentially providing a window buffer with pixels read from a buffer that stores input data related to an SAO filter; a window buffer for defining the provided pixels as one or more windows and for delivering the pixels on a defined window basis to one or more calculation logics; and one or more calculation logics for calculating an offset for the pixels input on the window basis and for outputting a corrected pixel by adding the calculated offset to a target pixel.
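The window-based offset correction can be sketched in C as follows. This is a simplified, hypothetical example, not the disclosed hardware: the 3-pixel window width (WIN), the sao_filter_window function, and the placeholder edge_offset table are assumptions, and the classification is a reduced horizontal edge-offset case used only to show an offset being computed per window and added to the target pixel.

```c
#include <stdint.h>
#include <stdio.h>

#define WIN 3   /* assumed window: left neighbor, target pixel, right neighbor */

/* Hypothetical offset table indexed by edge category; real SAO offsets are
 * signalled in the bitstream, so these values are placeholders. */
static const int8_t edge_offset[5] = { 0, 2, 1, -1, -2 };

/* Classify the target pixel against its two neighbors (a simplified
 * horizontal edge-offset case) and return the corrected pixel. */
static uint8_t sao_filter_window(const uint8_t win[WIN])
{
    int left = win[0], cur = win[1], right = win[2];
    int sign = (cur > left) - (cur < left) + (cur > right) - (cur < right);
    int category = 0;
    if (sign == -2) category = 1;        /* local minimum */
    else if (sign == -1) category = 2;   /* concave edge  */
    else if (sign ==  1) category = 3;   /* convex edge   */
    else if (sign ==  2) category = 4;   /* local maximum */

    int corrected = cur + edge_offset[category];  /* add offset to target pixel */
    if (corrected < 0) corrected = 0;             /* clip to 8-bit sample range */
    if (corrected > 255) corrected = 255;
    return (uint8_t)corrected;
}

int main(void)
{
    /* Slide a 3-pixel window across one row, mirroring the window buffer
     * feeding the calculation logic one window at a time. */
    const uint8_t row[8] = { 10, 12, 11, 15, 15, 14, 20, 18 };
    for (int i = 1; i < 7; ++i) {
        uint8_t win[WIN] = { row[i - 1], row[i], row[i + 1] };
        printf("%u ", (unsigned)sao_filter_window(win));
    }
    printf("\n");
    return 0;
}
```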