Transfer descriptor for memory access commands

    Publication No.: US09977619B2

    Publication date: 2018-05-22

    Application No.: US14934707

    Filing date: 2015-11-06

    Inventor: Mankit Lo

    IPC classification: G06F3/06

    Abstract: A computer system processes instructions that include an instruction code, source type, source address, destination type, and destination address. The source and destination types may indicate a memory device, in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. A transfer descriptor referenced by a destination address may be executed to determine the location at which the result of the operation will be stored.
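The dispatch described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented design: the flag value `DESCRIPTOR_FLAG`, the `Instruction` fields, the `"copy"`/`"negate"` opcodes, and the modelling of a transfer descriptor as a Python callable are all assumptions made for the example.

```python
from dataclasses import dataclass

MEMORY = 0x00           # hypothetical type code: the address refers to a memory device
DESCRIPTOR_FLAG = 0x80  # hypothetical flag bit: the address names a transfer descriptor


@dataclass
class Instruction:
    opcode: str
    src_type: int
    src_addr: int
    dst_type: int
    dst_addr: int


def execute(instr, memory, descriptors):
    """Run one instruction. `descriptors` maps an address to a callable (assumed model)."""
    # Source side: run the descriptor for an intermediate result, or read memory directly.
    if instr.src_type & DESCRIPTOR_FLAG:
        value = descriptors[instr.src_addr](memory)
    else:
        value = memory[instr.src_addr]

    # Apply the operation named by the instruction code.
    if instr.opcode == "negate":
        value = -value
    elif instr.opcode != "copy":
        raise ValueError(f"unknown opcode: {instr.opcode}")

    # Destination side: run the descriptor to find where the result is stored.
    if instr.dst_type & DESCRIPTOR_FLAG:
        dst = descriptors[instr.dst_addr](memory)
    else:
        dst = instr.dst_addr
    memory[dst] = value
    return dst
```

With a source descriptor that sums two cells and a destination descriptor that reads a pointer cell, a single instruction computes the sum and stores it at the pointed-to location.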

    Calculating trigonometric functions using a four input dot product circuit

    Publication No.: US09875084B2

    Publication date: 2018-01-23

    Application No.: US15141625

    Filing date: 2016-04-28

    IPC classification: G06F7/548

    CPC classification: G06F7/548 G06F7/483

    Abstract: A circuit is disclosed that uses a four-element dot product circuit (DP4) to approximate an argument t=x/pi for an input x. The argument is then input to a trigonometric function such as SinPi() or CosPi(). The DP4 circuit calculates x times a representation of the reciprocal of pi. The bits of the reciprocal of pi that are used are selected based on the magnitude of the exponent of x. The DP4 circuit includes four multipliers, two intermediate adders, and a final adder. The outputs of the multipliers, intermediate adders, and final adder are adjusted such that the output of the final adder is a value of the argument t that will provide an accurate output when input to the trigonometric function.
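The DP4 data path (four multipliers, two intermediate adders, one final adder) can be illustrated in software. The sketch below is a simplification under stated assumptions: it pairs the vector (x, x, x, x) with four bit slices of the double-precision value of 1/pi, and it omits the exponent-based bit selection and the output adjustments that the abstract describes. The function names are hypothetical.

```python
import math


def dp4(a, b):
    """Four-element dot product: four multipliers, two intermediate adders, one final adder."""
    m0, m1, m2, m3 = (a[i] * b[i] for i in range(4))
    s0 = m0 + m1        # first intermediate adder
    s1 = m2 + m3        # second intermediate adder
    return s0 + s1      # final adder


def inv_pi_slices(bits_per_slice=14):
    """Split the double-precision value of 1/pi into four non-overlapping bit slices."""
    rest = 1.0 / math.pi
    slices = []
    for k in range(1, 4):
        scale = 2.0 ** (bits_per_slice * k)
        part = math.floor(rest * scale) / scale   # next group of bits
        slices.append(part)
        rest -= part                              # exact: removes only those bits
    slices.append(rest)                           # remaining low-order bits
    return slices


def arg_over_pi(x):
    """Approximate t = x / pi as a DP4 of (x, x, x, x) with the slices of 1/pi."""
    return dp4([x] * 4, inv_pi_slices())
```

The result `t` can then be passed to a SinPi-style function, here emulated as `math.sin(math.pi * t)`.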

    EFFICIENT TILE-BASED RASTERIZATION

    Document type: Patent application

    Legal status: In force

    Publication No.: US20120044245A1

    Publication date: 2012-02-23

    Application No.: US13188359

    Filing date: 2011-07-21

    IPC classification: G06T15/00

    CPC classification: G06T11/40 G06T15/005

    Abstract: An apparatus and method for rasterizing a primitive in a graphics system is disclosed. In one example of the invention, the method includes scanning a first row of tiles, one tile at a time, starting from a first point and scanning in a first direction. Immediately after scanning the first row of tiles, the method includes moving from the first point to a second point in an orthogonal direction relative to the first row. Immediately after moving from the first point to the second point, the method includes scanning a second row of tiles, one tile at a time, starting from the second point and scanning in the first direction. By scanning rows in the same direction immediately prior to and after moving from one row to another, cache utilization is improved.
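The traversal order in this abstract amounts to scanning every tile row in the same direction, each row starting directly below the start of the previous one. A minimal sketch, assuming a rectangular grid of tiles addressed by hypothetical `(col, row)` indices:

```python
def scan_tiles(cols, rows):
    """Yield (col, row) tile coordinates, scanning every row in the same direction."""
    for row in range(rows):        # step orthogonally from the row's start point
        for col in range(cols):    # always scan in the first direction
            yield (col, row)
```

For a 3x2 grid this visits (0,0), (1,0), (2,0), then returns to column 0 for the second row rather than reversing direction as a serpentine scan would.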

    Post-rendering anti-aliasing with a smoothing filter

    Document type: Granted patent

    Legal status: In force

    Publication No.: US07920148B2

    Publication date: 2011-04-05

    Application No.: US11786223

    Filing date: 2007-04-10

    IPC classification: G09G5/00

    CPC classification: G06T15/503

    Abstract: A system to apply a smoothing filter during anti-aliasing at a post-rendering stage. An embodiment of the system includes a three-dimensional renderer, an edge detector, and a smoothing filter. The three-dimensional renderer is configured to render a three-dimensional scene. The edge detector is coupled to the three-dimensional renderer. The edge detector is configured to read values of a depth buffer and to apply edge detection criteria to the values of the depth buffer in order to detect an object edge within the three-dimensional scene. The smoothing filter is coupled to the edge detector. The smoothing filter is configured to read values of a color buffer and to apply a smoothing coefficient to the values of the color buffer. The values of the color buffer include a pixel sample at the detected object edge.
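The two stages named in this abstract (edge detection on the depth buffer, then smoothing of the color buffer at detected edges) can be sketched as below. The specific edge criterion (a depth jump larger than a threshold) and the blend with horizontal neighbours are assumptions chosen for illustration; the abstract does not state which criteria or coefficients are used.

```python
def detect_edges(depth, threshold):
    """Mark pixels whose depth differs from a right or lower neighbour by more than threshold."""
    h, w = len(depth), len(depth[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and abs(depth[y][x] - depth[ny][nx]) > threshold:
                    edges[y][x] = True
                    edges[ny][nx] = True
    return edges


def smooth_edges(color, edges, coeff):
    """Blend each edge pixel with the mean of its horizontal neighbours (assumed filter)."""
    h, w = len(color), len(color[0])
    out = [row[:] for row in color]
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                left = color[y][max(x - 1, 0)]
                right = color[y][min(x + 1, w - 1)]
                out[y][x] = (1 - coeff) * color[y][x] + coeff * (left + right) / 2
    return out
```

Pixels away from a depth discontinuity are copied unchanged, so only object edges are softened.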

    THIN-LINE DETECTION APPARATUS AND METHOD

    Document type: Patent application

    Legal status: In force

    Publication No.: US20090122076A1

    Publication date: 2009-05-14

    Application No.: US11938223

    Filing date: 2007-11-09

    IPC classification: G09G5/00

    Abstract: An apparatus and method for detecting and handling thin lines in a raster image includes reading depth values for each pixel of an n×m block of pixels surrounding a substantially central pixel. Differences are then calculated for selected depth values of the n×m block of pixels to yield multiple difference values. These difference values may then be compared with multiple pre-computed difference values associated with thin lines pre-determined to pass through the n×m block of pixels. If the difference values of the pixel block substantially match the difference values of one of the pre-determined thin lines, the pixel block may be deemed to describe a thin line. The apparatus and method may preclude application of an anti-aliasing filter to the substantially central pixel of the pixel block in the event it describes a thin line.
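The matching step can be sketched as a comparison of a block's depth differences against pre-computed templates. Several details here are assumptions, since the abstract leaves them open: differences are taken between every pixel and the central pixel, "substantially match" is modelled as a per-value tolerance, and the templates are supplied by the caller.

```python
def center_differences(block):
    """Differences between each pixel's depth and the central pixel's depth, row-major."""
    cy, cx = len(block) // 2, len(block[0]) // 2
    center = block[cy][cx]
    return [d - center for row in block for d in row]


def is_thin_line(block, templates, tolerance):
    """True if the block's differences match any pre-computed template within tolerance."""
    diffs = center_differences(block)
    for template in templates:
        if all(abs(a - b) <= tolerance for a, b in zip(diffs, template)):
            return True
    return False


# Hypothetical templates for a 3x3 block: a line one depth unit in front of the background.
HORIZONTAL = center_differences([[1, 1, 1], [0, 0, 0], [1, 1, 1]])
VERTICAL = center_differences([[1, 0, 1], [1, 0, 1], [1, 0, 1]])
```

A caller would skip the anti-aliasing filter for the central pixel whenever `is_thin_line` returns true, which keeps thin lines from being blurred away.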

    Zero coefficient skipping convolution neural network engine

    Publication No.: US10242311B2

    Publication date: 2019-03-26

    Application No.: US15671860

    Filing date: 2017-08-08

    Inventor: Mankit Lo

    Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result is written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
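The zero-skipping loop order described here (one coefficient at a time, applied to a shifted copy of the whole tile) can be sketched for the 2D case. This is an illustrative software model, assuming a "valid" output region and no kernel flip, as is common in neural-network convolution; the function name is hypothetical.

```python
def conv2d_zero_skip(tile, kernel):
    """Valid-mode 2D convolution (no kernel flip) that visits only non-zero coefficients."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(tile) - kh + 1, len(tile[0]) - kw + 1
    acc = [[0] * ow for _ in range(oh)]        # accumulation buffer
    for r in range(kh):
        for c in range(kw):
            coeff = kernel[r][c]
            if coeff == 0:
                continue                        # zero skipping: no work for this coefficient
            # Shift the tile by the coefficient's (row, column) index and accumulate.
            for y in range(oh):
                for x in range(ow):
                    acc[y][x] += coeff * tile[y + r][x + c]
    return acc
```

For a kernel with only two non-zero entries out of four, the inner accumulation runs twice instead of four times, which is where the saving for sparse kernels comes from.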

    Hardware access counters and event generation for coordinating multithreaded processing

    Publication No.: US09928117B2

    Publication date: 2018-03-27

    Application No.: US14966867

    Filing date: 2015-12-11

    Inventor: Mankit Lo

    IPC classification: G06F9/46 G06F9/52 G06F9/30

    Abstract: A computer system includes a hardware synchronization component (HSC). Multiple concurrent threads of execution issue instructions to update the state of the HSC. Multiple threads may update the state in the same clock cycle, and a thread does not need to receive control of the HSC prior to updating its state. Instructions referencing the state that are received during the same clock cycle are aggregated, and the state is updated according to the number of the instructions. The state is evaluated with respect to a threshold condition. If the condition is met, the HSC outputs an event to a processor. The processor then identifies a thread impacted by the event and takes a predetermined action based on the event (e.g., blocking, branching, or unblocking the thread).
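The aggregate-then-compare behaviour can be modelled with a small counter class. This is a toy software model under stated assumptions: the state is a single integer, every instruction is an increment, the threshold condition is "greater than or equal", and the event is a plain string. None of these specifics come from the abstract.

```python
class HardwareSyncCounter:
    """Toy model of a counter that many threads update without first acquiring it."""

    def __init__(self, threshold):
        self.value = 0
        self.threshold = threshold

    def clock_cycle(self, num_increments):
        """Apply all increments issued in one clock cycle as a single aggregated update.

        Returns an event name if the threshold condition is met, otherwise None.
        """
        self.value += num_increments       # update by the number of instructions received
        if self.value >= self.threshold:
            return "threshold_reached"     # event delivered to the processor
        return None
```

A processor model would map the returned event to an action on the affected thread, for example unblocking a thread that waits for three producers to finish.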

    Tile-based compression and decompression for graphic applications

    Document type: Granted patent

    Legal status: In force

    Publication No.: US09460525B2

    Publication date: 2016-10-04

    Application No.: US13919691

    Filing date: 2013-06-17

    Abstract: Systems and methods for tile-based compression are disclosed. Image data, such as a frame, may be divided into tiles. The tiles may be sized based on a size of a line buffer. Tiles are compressed and decompressed individually. As portions of the image frame are updated, corresponding updated tiles may be compressed and stored. Likewise, as tiles are accessed they may be decompressed and streamed to a requesting device. In some embodiments, a decoder operable to decompress tiles may be interposed between a memory device and a requesting device. Data encoding one or more compressed tiles may be grouped to enable decompression at a rate of four pixels per clock cycle. Methods for compressing image data including both RGB and RGBα components are disclosed.
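The per-tile independence described here can be sketched with a generic compressor. Note the substitution: `zlib` stands in for the patent's codec, which the abstract does not specify, and the frame layout (a list of equal-length `bytes` rows, one byte per pixel) and function names are assumptions for the example.

```python
import zlib


def split_into_tiles(frame, tile_w, tile_h):
    """Map (tile_col, tile_row) to the raw bytes of that tile."""
    tiles = {}
    for ty in range(0, len(frame), tile_h):
        for tx in range(0, len(frame[0]), tile_w):
            rows = [row[tx:tx + tile_w] for row in frame[ty:ty + tile_h]]
            tiles[(tx // tile_w, ty // tile_h)] = b"".join(rows)
    return tiles


def compress_tiles(tiles):
    """Compress every tile on its own so each can be updated or read independently."""
    return {key: zlib.compress(data) for key, data in tiles.items()}


def read_tile(store, key):
    """Decompress a single tile on access, leaving the others untouched."""
    return zlib.decompress(store[key])
```

Because each tile is a separate compressed unit, updating one region of the frame only requires recompressing the tiles it overlaps.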

    Low power and low memory single-pass multi-dimensional digital filtering

    Document type: Granted patent

    Legal status: In force

    Publication No.: US09077313B2

    Publication date: 2015-07-07

    Application No.: US13274129

    Filing date: 2011-10-14

    IPC classification: G06F17/15 H03H17/02 G06F17/10

    Abstract: Disclosed are new approaches to multi-dimensional filtering with a reduced number of memory reads and writes. In one embodiment, a filter includes first and second coefficients. A block of data having width and height each equal to the number of one of the first or second coefficients is read from a memory device. Arrays of values from the block are filtered using the first filter coefficients, and the results are filtered using the second coefficients. The final result may be optionally blended with another data value and written to a memory device. Registers store results of filtering with the first coefficients. The block of data may be read from a location including a source coordinate. The final result of filtering may be written to a destination coordinate obtained by rotating and/or mirroring the source coordinate. The orientation of arrays filtered using the first coefficients varies according to a rotation mode.
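The two-stage filtering of a single block can be sketched as follows. This is a minimal model under assumptions: the first coefficients are applied along rows, the intermediate "registers" are a plain list, and blending, rotation, and mirroring are left out. The function name is hypothetical.

```python
def filter_block(block, first, second):
    """Filter one block: each row with `first`, then the row results with `second`."""
    if len(block) != len(second) or any(len(row) != len(first) for row in block):
        raise ValueError("block must be len(second) rows by len(first) columns")
    # "Registers": one partial result per row after the first pass.
    registers = [sum(c * v for c, v in zip(first, row)) for row in block]
    # Second pass over the registers yields a single output sample.
    return sum(c * r for c, r in zip(second, registers))
```

The block is read once and the intermediate row results never go back to memory, which is the source of the reduced read and write count the abstract mentions.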
