Abstract:
A system and method for producing a fused instruction are described. In one embodiment, a first instruction and a second instruction that are both simple instructions (e.g., each performs only one operation) and that are dependent are fused together to create the fused instruction. The fused instruction has an opcode that represents the operation performed by the first instruction and the operation performed by the second instruction. The fused instruction has three source operands and one destination operand. Two of the three source operands are the two source operands of the first instruction, and the third source operand is the source operand of the second instruction that is not the destination operand of the first instruction. The destination operand of the fused instruction is the destination operand of the second instruction. An execution unit that can execute a fused instruction in one clock cycle is also disclosed. In one embodiment, the execution unit has two arithmetic logic units (“ALUs”), each of which performs one of the two operations of the fused instruction. The result of the first ALU is input into the second ALU to produce the desired result.
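The operand mapping described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `Instr`, `Fused`, `fuse`, `execute` and the `OPS` table are hypothetical, registers are modeled as a dictionary, and the sketch does not check whether the intermediate register of the first instruction is still needed later.

```python
from dataclasses import dataclass

# Hypothetical set of simple (single-operation) ALU operations.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
}


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    src1: str
    src2: str


@dataclass(frozen=True)
class Fused:
    op: tuple            # (operation of first, operation of second)
    dst: str             # destination of the second instruction
    srcs: tuple          # (first.src1, first.src2, remaining source of second)
    first_is_left: bool  # True if the first result is the second ALU's left input


def fuse(first, second):
    """Return a Fused instruction if `second` depends on `first`, else None."""
    if first.op not in OPS or second.op not in OPS:
        return None  # not a simple instruction in this model
    uses_left = second.src1 == first.dst
    uses_right = second.src2 == first.dst
    if uses_left == uses_right:
        return None  # no dependence, or both sources are the first result
    third = second.src2 if uses_left else second.src1
    return Fused((first.op, second.op), second.dst,
                 (first.src1, first.src2, third), uses_left)


def execute(fused, regs):
    """Model two chained ALUs: the first ALU's result feeds the second ALU."""
    a, b, c = (regs[r] for r in fused.srcs)
    t = OPS[fused.op[0]](a, b)  # first ALU
    if fused.first_is_left:
        result = OPS[fused.op[1]](t, c)  # second ALU
    else:
        result = OPS[fused.op[1]](c, t)
    regs[fused.dst] = result
    return result
```

For example, fusing `add r3, r1, r2` with `sub r5, r4, r3` in this model yields a fused instruction with sources `r1`, `r2`, `r4` and destination `r5`.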
Abstract:
There is provided a graphics processing system that includes a main processing unit and a graphics processing unit (GPU). The main processing unit puts rendering commands generated using a graphics library in the queue of a command buffer in a main memory. In this process, the library function offered by the graphics library is converted into the rendering commands, without any rendering attributes retained in the library. The GPU reads and executes the rendering commands stacked in the command buffer, and generates rendering data in a frame buffer.
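The producer/consumer arrangement described here can be sketched as a toy model. Everything below is an assumption made for illustration: the command names, the `set_color` and `draw_point` library functions, and the `ToyGPU` class are hypothetical, and a Python `deque` stands in for the command buffer in main memory. The point is that each library call is turned directly into a command and the library itself keeps no rendering state.

```python
from collections import deque

# Stands in for the command buffer queue in main memory.
command_buffer = deque()


def set_color(r, g, b):
    # Library function: converted straight into a rendering command.
    # The library does not remember a "current color".
    command_buffer.append(("SET_COLOR", (r, g, b)))


def draw_point(x, y):
    # Library function: again only enqueues a command.
    command_buffer.append(("DRAW_POINT", (x, y)))


class ToyGPU:
    """Consumer side: reads queued commands and writes a frame buffer."""

    def __init__(self, width, height):
        self.frame_buffer = [[(0, 0, 0)] * width for _ in range(height)]
        self.color = (255, 255, 255)  # rendering state lives on the GPU side

    def run(self, commands):
        while commands:
            name, args = commands.popleft()
            if name == "SET_COLOR":
                self.color = args
            elif name == "DRAW_POINT":
                x, y = args
                self.frame_buffer[y][x] = self.color


# Usage: the main processing unit enqueues, the GPU model drains the queue.
set_color(255, 0, 0)
draw_point(1, 2)
gpu = ToyGPU(4, 4)
gpu.run(command_buffer)
```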
Abstract:
A method for use in computer graphics includes establishing a surface that is represented by at least one polygon that includes a plurality of vertices, establishing one or more light sources that are configured to illuminate the surface, for each vertex of the polygon, computing at least one vector quantity that represents an aggregation of a visual attribute and a direction of each of the one or more light sources, and interpolating the computed vector quantities across the polygon to provide at least one interpolated vector quantity value for each of a plurality of pixels included in the polygon. A storage medium stores a computer program, and an apparatus includes a display and a processor-based system.
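One possible reading of the per-vertex aggregation and per-pixel interpolation is sketched below. The abstract does not say which visual attribute is aggregated or how; this sketch assumes the attribute is a scalar light intensity, aggregates by summing intensity-weighted unit directions, and interpolates across a triangle with barycentric weights. The function names are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length (or the zero vector if v is zero)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v) if n else (0.0, 0.0, 0.0)


def aggregate_light_vector(vertex, lights):
    """Sum of intensity-weighted unit directions from the vertex to each light.

    `lights` is a list of (position, intensity) pairs. Weighting by intensity
    is an assumption; the source only says an attribute and a direction are
    aggregated.
    """
    total = [0.0, 0.0, 0.0]
    for position, intensity in lights:
        direction = normalize(tuple(p - v for p, v in zip(position, vertex)))
        for i in range(3):
            total[i] += intensity * direction[i]
    return tuple(total)


def interpolate(vertex_vectors, weights):
    """Blend the three per-vertex vectors at one pixel using barycentric weights."""
    return tuple(
        sum(w * vec[i] for w, vec in zip(weights, vertex_vectors))
        for i in range(3)
    )
```

The aggregation runs once per vertex, while `interpolate` runs once per pixel, so the per-pixel cost does not grow with the number of lights in this model.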
Abstract:
A method for use in computer graphics includes establishing a surface that is represented by at least one polygon that includes a plurality of vertices, establishing one or more light sources that are configured to illuminate the surface, and computing an aggregate light source position for each vertex of the polygon, wherein the computation for each vertex includes averaging directions from the vertex to the one or more light sources. A storage medium stores a computer program, and an apparatus includes a display and a processor-based system.
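A minimal sketch of computing an aggregate light source position for one vertex follows. The abstract only states that directions from the vertex to the lights are averaged; placing the aggregate light along the averaged direction at the mean light distance is an assumption of this sketch, as is the function name `aggregate_light_position`.

```python
import math


def aggregate_light_position(vertex, light_positions):
    """Average the unit directions to each light, then place one aggregate
    light along that averaged direction at the mean distance (assumption)."""
    dir_sum = [0.0, 0.0, 0.0]
    dist_sum = 0.0
    count = 0
    for position in light_positions:
        offset = [p - v for p, v in zip(position, vertex)]
        dist = math.sqrt(sum(c * c for c in offset))
        if dist == 0.0:
            continue  # a light exactly at the vertex has no direction
        for i in range(3):
            dir_sum[i] += offset[i] / dist
        dist_sum += dist
        count += 1
    if count == 0:
        return tuple(vertex)
    avg_dir = [c / count for c in dir_sum]  # not renormalized in this sketch
    avg_dist = dist_sum / count
    return tuple(v + d * avg_dist for v, d in zip(vertex, avg_dir))
```

With a vertex at the origin and lights at `(2, 0, 0)` and `(0, 2, 0)`, the averaged direction is `(0.5, 0.5, 0)` and the mean distance is 2, so this model places the aggregate light at `(1, 1, 0)`.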
Abstract:
A method and apparatus for performing N bit by 2*N (or 2*N−1) bit signed multiplication using two N bit multiply instructions. According to one aspect of the invention, a method for performing signed multiplication of A times B (where B has N bits and A has N*2 bits) is described. In this method, A_high and A_low respectively represent the most and least significant halves of A. According to this method, A_low is logically shifted right by one bit to generate A_low >> 1. Then, A_low >> 1 is multiplied by B using signed multiplication to generate a first partial result. In addition, a second partial result is generated by performing signed multiplication of A_high times B. One or both of the first and second partial results are shifted to align the first and second partial results for addition, and then the addition is performed to generate a final result representing A multiplied by B.
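The steps above can be modeled in a few lines. This is a sketch under assumptions, not the patented implementation: `N = 16` is chosen arbitrarily, `smul_n` is a hypothetical stand-in for an N bit signed multiply instruction, and the alignment chosen here (first partial result shifted left by 1, second by N) is one way to line the partial results up. Because the low bit of A is discarded by the logical shift, this model computes the product for A with its least significant bit cleared, which matches the "2*N−1 bit" wording.

```python
N = 16  # assumed word size of the multiply instruction


def smul_n(a, b):
    """Model of an N bit signed multiply; rejects operands that do not fit."""
    lo, hi = -(1 << (N - 1)), (1 << (N - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operand does not fit in N signed bits")
    return a * b


def mul_2n_by_n(a, b):
    """Multiply a signed 2N bit value `a` by a signed N bit value `b`
    using two N bit signed multiplies."""
    a_high = a >> N                 # signed most significant half
    a_low = a & ((1 << N) - 1)      # least significant half, as unsigned bits
    a_low_shifted = a_low >> 1      # logical shift: now fits as a non-negative signed value
    first = smul_n(a_low_shifted, b)   # first partial result
    second = smul_n(a_high, b)         # second partial result
    # Align the partial results and add them.
    return (second << N) + (first << 1)
```

The shift is what makes the low half usable: an unsigned N bit value can exceed the signed N bit range, but after a one-bit logical right shift its top bit is zero, so a signed multiply treats it as non-negative.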
Abstract:
A method comprises decoding a single instruction having a first operand identifying a plurality of bytes of packed data and a second operand identifying a corresponding plurality of byte masks. Each of the plurality of byte masks identified by the second operand of the single decoded instruction is analyzed, and select bytes of the plurality of bytes identified by the first operand are moved to an implicitly defined location based, at least in part, on the analysis of the individual byte masks.
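The behavior of such a byte-masked move can be sketched in software. The abstract does not say how each mask is analyzed or where the implicit location comes from; this sketch assumes the most significant bit of each mask byte selects the corresponding data byte and passes the implicit destination in as a hypothetical `implicit_addr` parameter into a `bytearray` that stands in for memory.

```python
def masked_byte_move(data, masks, memory, implicit_addr):
    """Write data[i] to memory[implicit_addr + i] only where masks[i]
    has its most significant bit set (an assumption of this sketch)."""
    if len(data) != len(masks):
        raise ValueError("data and masks must have the same number of bytes")
    for i, (byte, mask) in enumerate(zip(data, masks)):
        if mask & 0x80:
            memory[implicit_addr + i] = byte
    return memory


# Usage: only bytes 0 and 2 are selected; the other locations keep 0xFF.
memory = bytearray([0xFF] * 8)
masked_byte_move(bytes([1, 2, 3, 4]), bytes([0x80, 0x00, 0xFF, 0x7F]), memory, 2)
```

Unselected bytes leave their destination locations untouched, which is what distinguishes this from an unconditional packed store.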
Abstract:
A method and apparatus for performing N bit by 2*N (or 2*N-1) bit signed multiplication using two N bit multiply instructions. According to one aspect of the invention, a method for performing signed multiplication of A times B (where B has N bits and A has N*2 bits) is described. In this method, A_high and A_low respectively represent the most and least significant halves of A. According to this method, A_low is logically shifted right by one bit to generate A_low >> 1. Then, A_low >> 1 is multiplied by B using signed multiplication to generate a first partial result. In addition, a second partial result is generated by performing signed multiplication of A_high times B. One or both of the first and second partial results is shifted to align the first and second partial results for addition, and then the addition is performed to generate a final result representing A multiplied by B.