Apparatus and method for inhibiting roundoff error in a floating point argument reduction operation

    Publication number: US10019232B2

    Publication date: 2018-07-10

    Application number: US15140739

    Filing date: 2016-04-28

    Applicant: ARM LIMITED

    Inventor: Jørn Nystad

    CPC classification number: G06F7/49915 G06F7/483

    Abstract: An apparatus and method are provided for inhibiting roundoff error in a floating point argument reduction operation. The apparatus has reciprocal estimation circuitry that is responsive to a first floating point value to determine a second floating point value that is an estimated reciprocal of the first floating point value. During this determination, the second floating point value has both its magnitude and its error bound constrained in dependence on a specified value N. Argument reduction circuitry then performs an argument reduction operation using the first and second floating point values as inputs, in order to generate a third floating point value. The use of the specified value N to constrain both the magnitude and the error bound of the second floating point value causes roundoff error to be inhibited in the third floating point value that is generated by the argument reduction operation. This enables such an argument reduction operation to be used as part of a more complex computation, such as a logarithm computation, with the inhibiting of roundoff error in the argument reduction result allowing the overall result to exhibit small relative error across the whole representable input range.
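
The effect described in this abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the patented circuit: the input is restricted to [1, 2), the reciprocal estimate `r` is forced to be a multiple of 2^-N with error at most about 2^-(N+1), and the hypothetical helper names `estimate_reciprocal` and `reduce_argument` are ours. Under those constraints the reduced argument x*r - 1 is exactly representable as a double, so the reduction step adds no roundoff.

```python
from fractions import Fraction

def estimate_reciprocal(x, n=8):
    """Hypothetical estimate of 1/x for x in [1, 2).

    The result is a multiple of 2**-n (magnitude constraint) and lies
    within roughly 2**-(n+1) of the true reciprocal (error-bound constraint).
    """
    scale = 1 << n
    return round(scale / x) / scale

def reduce_argument(x, n=8):
    """Return (r, reduced, exact) where reduced = x*r - 1.

    The product is formed in exact rational arithmetic, standing in for a
    fused multiply-add, and then converted to a double. With the constraints
    on r, the conversion loses nothing.
    """
    r = estimate_reciprocal(x, n)
    exact = Fraction(x) * Fraction(r) - 1
    return r, float(exact), exact

if __name__ == "__main__":
    r, reduced, exact = reduce_argument(1.2345678901234567)
    print(r, reduced, Fraction(reduced) == exact)
```

In a logarithm computation, such a reduced value could feed an identity like log(x) = log1p(x*r - 1) - log(r), with log(r) taken from a small table; that usage is our reading of the abstract rather than a detail it states.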

    Clipping of graphics primitives
    Type: Granted invention patent
    Status: In force

    Publication number: US09530241B2

    Publication date: 2016-12-27

    Application number: US14536070

    Filing date: 2014-11-07

    Applicant: ARM Limited

    CPC classification number: G06T15/30 G06T1/20 G06T1/60 G06T15/005 G06T2210/52

    Abstract: Techniques for performing clipping of graphics primitives 60 with respect to a clipping boundary 65 are described. The clipping step 10 may be performed separately for each tile of a graphics frame to be rendered, after a primitive list for the tile has been read from a primitive memory 38. Clipping may be performed only for larger primitives whose size exceeds a given threshold. Clipping of a primitive 60 to the clipping boundary 65 may be performed inexactly so that only a single clipped primitive is generated which may extend beyond the clipping boundary. A clipped primitive generated by clipping may be used for a depth function calculation of a primitive setup operation and not for an edge determination.

    Encoding instructions identifying first and second architectural register numbers

    Publication number: US10331449B2

    Publication date: 2019-06-25

    Application number: US15003828

    Filing date: 2016-01-22

    Applicant: ARM LIMITED

    Abstract: Various encoding schemes are discussed for more efficiently encoding instructions which identify first and second architectural register numbers. In a first example, constraining the first architectural register number to be greater than the second architectural register number frees up encodings for use in encoding other operations. In a second example, the first and second architectural register numbers may take any value, but one of a first type of processing operation and a second type of processing operation is selected depending on a comparison of the first and second architectural register numbers.
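
The second example in this abstract can be sketched as follows. This is a hedged illustration under our own assumptions: 5-bit register fields (`REG_BITS`), two distinct registers per instruction, operations whose operand order does not matter, and equal register fields left unassigned. The function names `encode` and `decode` are hypothetical. The relative order of the two fields carries one extra bit of information, so no separate opcode bit is needed to choose between the two operation types.

```python
REG_BITS = 5                      # assumed width of each register field
REG_MASK = (1 << REG_BITS) - 1

def encode(op_type, ra, rb):
    """Pack two distinct register numbers; field order selects the operation.

    op_type 0 is stored with the larger register number first,
    op_type 1 with the smaller register number first.
    """
    if op_type not in (0, 1):
        raise ValueError("op_type must be 0 or 1")
    if ra == rb or not (0 <= ra <= REG_MASK and 0 <= rb <= REG_MASK):
        raise ValueError("registers must be distinct and in range")
    hi, lo = max(ra, rb), min(ra, rb)
    first, second = (hi, lo) if op_type == 0 else (lo, hi)
    return (first << REG_BITS) | second

def decode(word):
    """Recover (op_type, larger_reg, smaller_reg) by comparing the fields."""
    first = (word >> REG_BITS) & REG_MASK
    second = word & REG_MASK
    if first == second:
        raise ValueError("equal register fields are unassigned in this sketch")
    op_type = 0 if first > second else 1
    return op_type, max(first, second), min(first, second)

if __name__ == "__main__":
    print(decode(encode(0, 3, 7)), decode(encode(1, 3, 7)))
```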

    DATA PROCESSING SYSTEMS
    Type: Invention patent application
    Status: Under examination (published)

    Publication number: US20170024847A1

    Publication date: 2017-01-26

    Application number: US15208459

    Filing date: 2016-07-12

    Applicant: ARM Limited

    CPC classification number: G06T1/20 G06T15/005

    Abstract: A graphics processing unit 3 includes a rasteriser 25, a thread spawner 40, a programmable execution unit 41, a varying interpolator 42, a texture mapper 43, and a blender 29. The programmable execution unit 41 is able to communicate with the varying interpolator 42, the texture mapper 43 and the blender 29 to request processing operations by those graphics-specific accelerators. In addition, these graphics-specific accelerators are able to communicate directly with each other and with the thread spawner 40, independently of the programmable execution unit 41. This allows certain graphics processing operations to be performed using direct communication between the graphics-specific accelerators of the graphics processing unit, instead of executing instructions in the programmable execution unit to trigger the performance of those operations by the graphics-specific accelerators.

    Comparison of wide data types
    Type: Granted invention patent

    Publication number: US10474427B2

    Publication date: 2019-11-12

    Application number: US15743008

    Filing date: 2016-05-25

    Applicant: ARM Limited

    Inventor: Jørn Nystad

    Abstract: There is provided an apparatus and method for comparing wide data types. The apparatus comprises processing circuitry to perform a plurality of comparison operations in order to compare a first value and a second value, each of the first value and the second value having a length greater than N bits, and each comparison operation operating on a corresponding N bits of the first and second values. The plurality of comparison operations are chained to form a sequence such that each comparison operation is arranged to output an accumulated comparison result incorporating the comparison results of any previous comparison operations in the sequence, and such that for each comparison operation other than a final comparison operation in the sequence the accumulated comparison result is provided for use as an input by a next comparison operation in the sequence.
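
The chained comparison in this abstract can be sketched in Python as below. The sketch assumes unsigned values, N = 32 bits per step, chunks processed from least to most significant, and a three-way result encoded as -1, 0 or 1; none of those choices is stated in the abstract, and the names `compare_chunk` and `compare_wide` are hypothetical.

```python
def compare_chunk(a_chunk, b_chunk, accumulated):
    """One N-bit comparison step.

    If the chunks differ, this step decides the result, because it is more
    significant than every earlier step. If they are equal, the accumulated
    result from the earlier steps is passed through unchanged.
    """
    if a_chunk < b_chunk:
        return -1
    if a_chunk > b_chunk:
        return 1
    return accumulated

def compare_wide(a, b, total_bits, n=32):
    """Compare two unsigned total_bits-wide values using chained n-bit steps."""
    mask = (1 << n) - 1
    accumulated = 0                       # "equal so far"
    for shift in range(0, total_bits, n):
        accumulated = compare_chunk((a >> shift) & mask,
                                    (b >> shift) & mask,
                                    accumulated)
    return accumulated

if __name__ == "__main__":
    print(compare_wide(2**100, 2**100 - 1, 128))
```

Each step takes only the previous accumulated result as extra input, which is what lets the steps be issued as a sequence of ordinary N-bit operations.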

    Data processing systems
    Type: Granted invention patent

    Publication number: US10089709B2

    Publication date: 2018-10-02

    Application number: US15208459

    Filing date: 2016-07-12

    Applicant: ARM Limited

    Abstract: A graphics processing unit 3 includes a rasterizer 25, a thread spawner 40, a programmable execution unit 41, a varying interpolator 42, a texture mapper 43, and a blender 29. The programmable execution unit 41 is able to communicate with the varying interpolator 42, the texture mapper 43 and the blender 29 to request processing operations by those graphics-specific accelerators. In addition, these graphics-specific accelerators are able to communicate directly with each other and with the thread spawner 40, independently of the programmable execution unit 41. This allows certain graphics processing operations to be performed using direct communication between the graphics-specific accelerators of the graphics processing unit, instead of executing instructions in the programmable execution unit to trigger the performance of those operations by the graphics-specific accelerators.

    Accumulation of floating-point values

    Publication number: US09959092B2

    Publication date: 2018-05-01

    Application number: US15060778

    Filing date: 2016-03-04

    Applicant: ARM LIMITED

    Inventor: Jørn Nystad

    CPC classification number: G06F7/485 G06F7/49968 G06F7/49973

    Abstract: An apparatus and method for generating a sum of floating-point input values are provided. To sum the values, multiple partial sum floating-point values are maintained, and the partial sum to which an input value may be added is selected by a least significant portion of the exponent of the input value. If the exponent of the input value is equal to the exponent of the value stored in the selected partial sum, a mantissa sum of the input value and the stored partial sum value replaces the mantissa value of the selected partial sum value. If the exponent of the input value is larger than the exponent of the value stored in the selected partial sum, the selected partial sum value is replaced with the input value. An associative and deterministic summation is thus provided.
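
A minimal Python sketch of the scheme in this abstract follows. Several details are our assumptions rather than statements from the abstract: there are 64 partial sums (`NUM_SLOTS`), mantissas are held as unbounded signed integers so the stored exponent never changes, an input whose exponent is smaller than the stored one is simply dropped, inputs are finite, and the partial sums are combined at the end with `math.fsum`. The function name `accumulate` is hypothetical.

```python
import math

NUM_SLOTS = 64   # assumed number of partial sums (selected by low exponent bits)
MANT_BITS = 53   # mantissa width of a Python float (IEEE 754 double)

def accumulate(values, num_slots=NUM_SLOTS):
    """Order-independent sum of finite floats using exponent-indexed slots."""
    slots = [None] * num_slots            # each entry: (exponent, mantissa sum)
    for x in values:
        if x == 0.0:
            continue
        frac, exp = math.frexp(x)         # x == frac * 2**exp, 0.5 <= |frac| < 1
        mant = int(frac * (1 << MANT_BITS))   # exact signed integer mantissa
        idx = exp % num_slots             # least significant part of the exponent
        stored = slots[idx]
        if stored is None or exp > stored[0]:
            slots[idx] = (exp, mant)      # larger exponent: replace the slot
        elif exp == stored[0]:
            slots[idx] = (exp, stored[1] + mant)  # equal exponent: add mantissas
        # Smaller exponent: dropped. This branch is an assumption of the sketch.
    terms = [math.ldexp(float(m), e - MANT_BITS)
             for (e, m) in (s for s in slots if s is not None)]
    return math.fsum(terms)

if __name__ == "__main__":
    data = [1.0, 1e30, 1.5, -2.0, 3.25, 2.0 ** 65]
    print(accumulate(data) == accumulate(data[::-1]))
```

Each slot ends up holding the largest exponent seen for its index and the integer sum of all mantissas with that exponent, so the result does not depend on the order of the inputs. In this sketch a dropped value is at least 2^64 times smaller than the value that displaced it.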

    GRAPHICS PROCESSING
    Type: Invention patent application

    Publication number: US20220392146A1

    Publication date: 2022-12-08

    Application number: US17805387

    Filing date: 2022-06-03

    Applicant: Arm Limited

    Abstract: There is provided an instruction, or instructions, that can be included in a program to perform a ray tracing operation, with individual execution threads in a group of execution threads executing the program performing the ray tracing operation for a respective ray in a corresponding group of rays, such that the group of rays performs the ray tracing operation together. The instruction(s), when executed by the execution threads, will cause one or more rays from the group of plural rays to be tested for intersection with a set of primitives. A result of the ray-primitive intersection testing can then be returned for the traversal operation.

    Forward killing of threads corresponding to graphics fragments obscured by later graphics fragments

    Publication number: US10789768B2

    Publication date: 2020-09-29

    Application number: US16128807

    Filing date: 2018-09-12

    Applicant: ARM Limited

    Abstract: A graphics processing apparatus comprises fragment generating circuitry to generate graphics fragments corresponding to graphics primitives, thread processing circuitry to perform threads of processing corresponding to the fragments, and forward kill circuitry to trigger a forward kill operation to prevent further processing of a target thread of processing corresponding to an earlier graphics fragment when the forward kill operation is enabled for the target thread and the earlier graphics fragment is determined to be obscured by one or more later graphics fragments. The thread processing circuitry supports enabling of the forward kill operation for a thread including at least one forward kill blocking instruction having a property indicative that the forward kill operation should be disabled for the given thread, when the thread processing circuitry has not yet reached a portion of the thread including the at least one forward kill blocking instruction.

    Apparatus and method for performing division

    Publication number: US10230376B2

    Publication date: 2019-03-12

    Application number: US15168436

    Filing date: 2016-05-31

    Applicant: ARM LIMITED

    Inventor: Jørn Nystad

    Abstract: An apparatus and method are provided, the apparatus comprising: storage circuitry to store an input data value; divider circuitry to split the input data value into at least one sub-value in dependence on a number of lanes for a current iteration, each sub-value occupying a lane, and to operate on each sub-value to generate a quotient corresponding to the division of that sub-value by a divisor, wherein the divisor is an odd integer; remainder circuitry to operate on each sub-value to generate a remainder corresponding to the remainder of dividing that sub-value by the divisor; concatenation circuitry to concatenate each quotient to produce a concatenated division value, and to concatenate each remainder to produce a concatenated remainder value, in each subsequent iteration, the input data value being formed from the concatenated remainder value of a preceding iteration; and output circuitry to output, after a plurality of iterations, a result of adding the concatenated division values produced by said plurality of iterations.
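
The iteration in this abstract can be sketched in Python as follows. The 32-bit input width and the lane schedule of 4, then 2, then 1 lanes are assumptions chosen for illustration, and `divide_by_lanes` is a hypothetical name. The sketch relies on the identity x = d * Q + R, where Q and R are the concatenated per-lane quotients and remainders, so each iteration only has to divide the previous iteration's concatenated remainder.

```python
def divide_by_lanes(x, divisor, width=32, lane_schedule=(4, 2, 1)):
    """Return (quotient, remainder) of x / divisor using per-lane division.

    Each iteration splits the current value into equal lanes, divides every
    lane by the divisor, and concatenates the quotients and the remainders.
    The concatenated remainders become the next iteration's input, and the
    concatenated quotients are summed across iterations.
    """
    if divisor <= 0 or divisor % 2 == 0:
        raise ValueError("divisor must be a positive odd integer")
    if not 0 <= x < (1 << width):
        raise ValueError("x does not fit in the given width")
    total = 0
    value = x
    for lanes in lane_schedule:
        lane_bits = width // lanes
        mask = (1 << lane_bits) - 1
        quotients = 0
        remainders = 0
        for i in range(lanes):
            sub = (value >> (i * lane_bits)) & mask
            q, r = divmod(sub, divisor)
            quotients |= q << (i * lane_bits)    # concatenate lane quotients
            remainders |= r << (i * lane_bits)   # concatenate lane remainders
        total += quotients
        value = remainders
    # The last iteration uses a single lane, so value is now below the divisor.
    return total, value

if __name__ == "__main__":
    print(divide_by_lanes(0x01000000, 3), divmod(0x01000000, 3))
```

The odd-divisor check mirrors the abstract; the arithmetic in this sketch does not itself depend on the divisor being odd.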