Method for performing cacheline polling utilizing a store and reserve instruction
    1.
    Invention grant
    Status: In force

    Publication number: US09390015B2

    Publication date: 2016-07-12

    Application number: US11377505

    Application date: 2006-03-16

    Applicant: Charles R. Johns

    Inventor: Charles R. Johns

    Abstract: A method, system, apparatus, and article of manufacture for performing cacheline polling utilizing a store and reserve instruction are disclosed. In accordance with one embodiment of the present invention, a first process initially requests an action to be performed by a second process. A reservation is set at a cacheable memory location via a store operation. The first process reads the cacheable memory location via a load operation to determine whether or not the requested action has been completed by the second process. The load operation of the first process is stalled until the reservation on the cacheable memory location is lost. After the requested action has been completed, the reservation in the cacheable memory location is reset by the second process.

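    The sketch below (C11 threads and atomics; names such as completion_flag and wait_for_completion are illustrative, not from the patent) shows the conventional cacheline-polling pattern this patent targets: the requester keeps re-loading a cacheable flag until the worker's store invalidates the line. The patented store-and-reserve behavior, in which the load stalls until the reservation is lost instead of busy-waiting, is a hardware feature and is only described in the comments.

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static atomic_int completion_flag = 0;   /* the cacheable memory location */

    /* "Second process": performs the requested action, then updates the flag. */
    static void *worker(void *arg)
    {
        (void)arg;
        /* ... perform the requested action ... */
        atomic_store_explicit(&completion_flag, 1, memory_order_release);
        return NULL;
    }

    /* "First process": conventional polling re-loads the line until it changes.
     * With the patented store-and-reserve, this loop would instead be a single
     * load that stalls until the worker's store causes the reservation to be
     * lost. */
    static void wait_for_completion(void)
    {
        while (atomic_load_explicit(&completion_flag, memory_order_acquire) == 0)
            ;                                /* busy-wait on the cacheline */
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);   /* request the action */
        wait_for_completion();
        pthread_join(t, NULL);
        puts("requested action completed");
        return 0;
    }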

    Runtime extraction of data parallelism
    2.
    Invention grant
    Status: In force

    Publication number: US08572359B2

    Publication date: 2013-10-29

    Application number: US12649860

    Application date: 2009-12-30

    IPC class: G06F9/30

    Abstract: Mechanisms for extracting data dependencies during runtime are provided. The mechanisms execute a portion of code having a loop and generate, for the loop, a first parallel execution group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The mechanisms further execute the first parallel execution group and determine, for each iteration in the subset of iterations, whether the iteration has a data dependence. Moreover, the mechanisms commit store data to system memory only for stores performed by iterations in the subset of iterations for which no data dependence is determined. Store data of stores performed by iterations in the subset of iterations for which a data dependence is determined is not committed to the system memory.

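    A minimal software model of the commit rule described above, assuming a toy loop body (a[idx[i]] = a[i]), a fixed group size, and a store-store conflict test only; re-execution of the conflicting iterations, which a real implementation would need, is omitted. All names and the data are illustrative.

    #include <stdio.h>

    #define N     8
    #define GROUP 4                          /* iterations per parallel execution group */

    int main(void)
    {
        int a[N]   = {1, 2, 3, 4, 5, 6, 7, 8};
        int idx[N] = {1, 0, 3, 3, 5, 4, 7, 6};   /* idx[2] == idx[3]: a dependence */

        for (int base = 0; base < N; base += GROUP) {
            int staged[GROUP];               /* speculative store data, one per iteration */
            int ok[GROUP];

            /* "Execute" the group: iteration i would store a[i] into a[idx[i]]. */
            for (int i = 0; i < GROUP; i++)
                staged[i] = a[base + i];

            /* Dependence check: an iteration conflicts if an earlier iteration
             * in the same group stores to the same location. */
            for (int i = 0; i < GROUP; i++) {
                ok[i] = 1;
                for (int j = 0; j < i; j++)
                    if (idx[base + j] == idx[base + i])
                        ok[i] = 0;
            }

            /* Commit store data only for iterations with no detected dependence. */
            for (int i = 0; i < GROUP; i++)
                if (ok[i])
                    a[idx[base + i]] = staged[i];
        }

        for (int i = 0; i < N; i++)
            printf("%d ", a[i]);
        putchar('\n');
        return 0;
    }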

    Arithmetic decoding acceleration
    3.
    Invention grant
    Status: Lapsed

    Publication number: US08520740B2

    Publication date: 2013-08-27

    Application number: US12874564

    Application date: 2010-09-02

    IPC class: H04N7/12

    Abstract: Mechanisms are provided for decoding context-adaptive binary arithmetic coding (CABAC) encoded data. The mechanisms receive, in a first single instruction multiple data (SIMD) vector register of the data processing system, CABAC encoded data of a bit stream. The CABAC encoded data includes a value to be decoded and bit stream state information. The mechanisms receive, in a second SIMD vector register of the data processing system, CABAC decoder context information. The mechanisms process the value, the bit stream state information, and the CABAC decoder context information in a non-recursive manner to generate a decoded value, updated bit stream state information, and updated CABAC decoder context information. The mechanisms store, in a third SIMD vector register, a result vector that combines the decoded value, updated bit stream state information, and updated CABAC decoder context information. The mechanisms use the decoded value to generate a video output on the data processing system.

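    For orientation only, the C sketch below is a plain scalar binary arithmetic decoder step fed a few arbitrary bytes. It is not the H.264/AVC CABAC state machine (the real rLPS tables, context adaptation, and renormalization rules are omitted), but it shows the serial per-bin coupling between range, offset, and bit-stream state that the patent's SIMD register layout and non-recursive formulation are meant to break.

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t range;      /* current interval width          */
        uint32_t offset;     /* value taken from the bit stream */
        const uint8_t *buf;  /* remaining bit-stream bytes      */
        size_t pos, len;
    } bitstream_state_t;

    typedef struct {
        uint16_t p_zero;     /* probability of bin == 0, in 1/65536 units (not adapted) */
    } context_t;

    /* Decode one bin, renormalize, and return the decoded bin. */
    static int decode_bin(bitstream_state_t *bs, context_t *ctx)
    {
        uint32_t split = (uint32_t)(((uint64_t)bs->range * ctx->p_zero) >> 16);
        int bin;

        if (bs->offset < split) {            /* bin == 0 branch */
            bin = 0;
            bs->range = split;
        } else {                             /* bin == 1 branch */
            bin = 1;
            bs->offset -= split;
            bs->range  -= split;
        }

        /* Renormalize: pull in a byte whenever the range becomes too small. */
        while (bs->range < (1u << 16)) {
            bs->range <<= 8;
            bs->offset = (bs->offset << 8) |
                         (bs->pos < bs->len ? bs->buf[bs->pos++] : 0);
        }
        return bin;
    }

    int main(void)
    {
        static const uint8_t data[] = {0x3f, 0xa2, 0x17, 0x55};
        bitstream_state_t bs = {0xffffffu, 0x3fa217u, data, 3, sizeof data};
        context_t ctx = {40000};

        for (int i = 0; i < 8; i++)
            printf("%d", decode_bin(&bs, &ctx));
        putchar('\n');
        return 0;
    }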

    Efficient Communication of Producer/Consumer Buffer Status
    4.
    Invention application
    Status: Under examination (published)

    Publication number: US20120317372A1

    Publication date: 2012-12-13

    Application number: US13593030

    Application date: 2012-08-23

    IPC class: G06F12/00

    CPC class: G06F15/17337

    Abstract: A mechanism is provided for efficient communication of producer/consumer buffer status. With the mechanism, devices in a data processing system notify each other of updates to the head and tail pointers of a shared buffer region, using the devices' signal notification channels, when the devices perform operations on the shared buffer region. Thus, when a producer device that produces data to the shared buffer region writes data to the shared buffer region, an update to the head pointer is written to a signal notification channel of a consumer device. When a consumer device reads data from the shared buffer region, the consumer device writes a tail pointer update to a signal notification channel of the producer device. In addition, channels may operate in a blocking mode so that the corresponding device is kept in a low power state until an update is received over the channel.

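    A two-thread C11 sketch of the notification scheme, assuming a single producer and a single consumer and modeling each signal notification channel as an atomic word that the peer polls; the blocking/low-power mode mentioned in the abstract cannot be expressed with plain atomics and is not shown. Buffer size and all names are illustrative.

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    #define BUF_SIZE 8

    static int buffer[BUF_SIZE];               /* shared buffer region           */
    static atomic_uint consumer_channel = 0;   /* consumer receives head updates */
    static atomic_uint producer_channel = 0;   /* producer receives tail updates */

    static void *producer(void *arg)
    {
        (void)arg;
        unsigned head = 0, tail = 0;
        for (int v = 1; v <= 16; v++) {
            /* Wait until the consumer's last tail update shows a free slot. */
            do {
                tail = atomic_load_explicit(&producer_channel, memory_order_acquire);
            } while (head - tail == BUF_SIZE);

            buffer[head % BUF_SIZE] = v;       /* produce into the shared buffer */
            head++;
            /* Write the updated head pointer to the consumer's channel. */
            atomic_store_explicit(&consumer_channel, head, memory_order_release);
        }
        return NULL;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        unsigned head = 0, tail = 0;
        for (int n = 0; n < 16; n++) {
            /* Wait until the producer's last head update shows available data. */
            do {
                head = atomic_load_explicit(&consumer_channel, memory_order_acquire);
            } while (head == tail);

            printf("consumed %d\n", buffer[tail % BUF_SIZE]);
            tail++;
            /* Write the updated tail pointer to the producer's channel. */
            atomic_store_explicit(&producer_channel, tail, memory_order_release);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }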

    Multithreaded Programmable Direct Memory Access Engine
    5.
    Invention application
    Status: In force

    Publication number: US20120246354A1

    Publication date: 2012-09-27

    Application number: US13488856

    Application date: 2012-06-05

    IPC class: G06F13/28

    Abstract: A mechanism is provided for programming a direct memory access engine that operates as a multithreaded processor. A plurality of programs is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the plurality of programs located in the local memory is to be executed. The direct memory access engine executes two or more of the plurality of programs without intervention by the host processor. As each of the two or more programs completes execution, the direct memory access engine sends a completion notification to the host processor indicating that the program has completed execution.

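    A host-side C model of the programming flow in the abstract: the host places several DMA "programs" in local memory, issues a single execute request, and receives one completion notification per program. The descriptor layout, the memcpy standing in for the transfer, and the callback used as the notification are assumptions; a real engine would interleave the programs as hardware threads rather than run them in a loop.

    #include <stdio.h>
    #include <string.h>

    typedef struct {            /* one DMA program descriptor in local memory */
        const char *name;
        char       *dst;
        const char *src;
        size_t      len;
    } dma_program_t;

    /* Models the engine: runs every queued program without further host
     * intervention and posts a completion notification for each one. */
    static void dma_engine_execute(dma_program_t *programs, int count,
                                   void (*notify_host)(const dma_program_t *))
    {
        for (int i = 0; i < count; i++) {
            memcpy(programs[i].dst, programs[i].src, programs[i].len);  /* the transfer */
            notify_host(&programs[i]);                 /* completion notification */
        }
    }

    static void on_completion(const dma_program_t *p)
    {
        printf("host: program '%s' completed\n", p->name);
    }

    int main(void)
    {
        char region_a[16] = {0}, region_b[16] = {0};
        dma_program_t programs[] = {
            {"copy-a", region_a, "hello", 6},
            {"copy-b", region_b, "world", 6},
        };
        /* The host's single request: execute the programs now in local memory. */
        dma_engine_execute(programs, 2, on_completion);
        printf("host: results '%s' '%s'\n", region_a, region_b);
        return 0;
    }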

    Mechanisms for priority control in resource allocation
    6.
    Invention grant
    Status: In force

    Publication number: US08180941B2

    Publication date: 2012-05-15

    Application number: US12631407

    Application date: 2009-12-04

    CPC class: G06F13/362

    Abstract: Mechanisms for priority control in resource allocation are provided. With these mechanisms, when a unit makes a request to a token manager, the unit identifies the priority of its request as well as the resource it desires to access and the unit's resource access group (RAG). This information is used to set a value of a storage device associated with the resource, priority, and RAG identified in the request. When the token manager generates and grants a token to the RAG, the token is in turn granted to a unit within the RAG based on the priority of the pending requests identified in the storage devices associated with the resource and RAG. Priority pointers are utilized to provide a round-robin fairness scheme between high and low priority requests within the RAG for the resource.

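    A small C model of one grant decision inside a resource access group (RAG), assuming per-unit request latches held in two bitmasks (high and low priority) and a single round-robin priority pointer. The abstract does not spell out the exact fairness rule between the two classes, so this sketch simply serves high priority first and rotates within each class; the unit count and field names are illustrative.

    #include <stdio.h>

    #define UNITS 8

    typedef struct {
        unsigned hi_pending;   /* one bit per unit: high-priority request latched */
        unsigned lo_pending;   /* one bit per unit: low-priority request latched  */
        int      ptr;          /* round-robin priority pointer                    */
    } rag_state_t;

    /* Pick a requesting unit from 'mask', scanning round-robin from 'start'. */
    static int pick(unsigned mask, int start)
    {
        for (int i = 0; i < UNITS; i++) {
            int u = (start + i) % UNITS;
            if (mask & (1u << u))
                return u;
        }
        return -1;
    }

    /* Called when the token manager grants a token to this RAG: choose the
     * unit that receives it and clear that unit's request latch. */
    static int grant_token(rag_state_t *rag)
    {
        unsigned *mask = &rag->hi_pending;
        int u = pick(rag->hi_pending, rag->ptr);
        if (u < 0) {                       /* no high-priority request pending */
            mask = &rag->lo_pending;
            u = pick(rag->lo_pending, rag->ptr);
        }
        if (u < 0)
            return -1;                     /* nothing pending in this RAG */
        *mask &= ~(1u << u);
        rag->ptr = (u + 1) % UNITS;        /* advance the priority pointer */
        return u;
    }

    int main(void)
    {
        rag_state_t rag = {0x24, 0x03, 0};   /* units 2,5 high; units 0,1 low */
        for (int u; (u = grant_token(&rag)) >= 0; )
            printf("token granted to unit %d\n", u);
        return 0;
    }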

    PARALLEL LOOP MANAGEMENT
    7.
    Invention application
    Status: Lapsed

    Publication number: US20120023316A1

    Publication date: 2012-01-26

    Application number: US12843224

    Application date: 2010-07-26

    IPC class: G06F9/30 G06F9/32

    Abstract: The illustrative embodiments comprise a method, data processing system, and computer program product having a processor unit for processing instructions with loops. A processor unit creates, from the instructions, a first group of instructions having a first set of loops and a second group of instructions having a second set of loops. The first set of loops has a different order of parallel processing from the second set of loops. A processor unit processes the first group. The processor unit monitors terminations in the first set of loops during processing of the first group. The processor unit determines whether the number of terminations being monitored in the first set of loops is greater than a selectable number of terminations. In response to a determination that the number of terminations is greater than the selectable number of terminations, the processor unit ceases processing the first group and processes the second group.

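    A C sketch of the monitoring and switch policy only, using a toy task (counting the rows of a matrix that contain a key) with two loop orders standing in for the two instruction groups; TERMINATION_LIMIT plays the role of the selectable number of terminations. The task, the matrix, and the limit are illustrative.

    #include <stdio.h>

    #define ROWS 6
    #define COLS 8
    #define TERMINATION_LIMIT 2          /* the selectable number of terminations */

    /* First group: row-major scan that breaks out of the inner loop on a match;
     * every break counts as an early loop termination. Returns -1 if the number
     * of terminations exceeds the limit. */
    static int search_rows(int m[ROWS][COLS], int key, int *terminations)
    {
        int rows_with_key = 0;
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++) {
                if (m[r][c] == key) {
                    rows_with_key++;
                    (*terminations)++;   /* inner loop terminated early */
                    break;
                }
            }
            if (*terminations > TERMINATION_LIMIT)
                return -1;               /* cease processing the first group */
        }
        return rows_with_key;
    }

    /* Second group: same result via a column-major scan with no early exits. */
    static int search_cols(int m[ROWS][COLS], int key)
    {
        int found[ROWS] = {0};
        for (int c = 0; c < COLS; c++)
            for (int r = 0; r < ROWS; r++)
                if (m[r][c] == key)
                    found[r] = 1;
        int rows_with_key = 0;
        for (int r = 0; r < ROWS; r++)
            rows_with_key += found[r];
        return rows_with_key;
    }

    int main(void)
    {
        int m[ROWS][COLS] = {{0}};
        m[0][1] = m[2][3] = m[4][0] = m[5][7] = 7;

        int terminations = 0;
        int rows_with_key = search_rows(m, 7, &terminations);
        if (rows_with_key < 0) {
            printf("%d terminations > limit %d: switching to the second group\n",
                   terminations, TERMINATION_LIMIT);
            rows_with_key = search_cols(m, 7);
        }
        printf("rows containing the key: %d\n", rows_with_key);
        return 0;
    }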

    Parallel Execution Unit that Extracts Data Parallelism at Runtime
    8.
    Invention application
    Status: In force

    Publication number: US20110161642A1

    Publication date: 2011-06-30

    Application number: US12649805

    Application date: 2009-12-30

    IPC class: G06F9/32

    Abstract: Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than the total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for the iterations is stored in corresponding store caches of the processor. Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores for which no data dependence was determined is committed to memory.

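    To make the store-cache wording concrete, the C stand-in below buffers each iteration's stores as (address, value) pairs in its own small cache, flags any iteration whose store addresses collide with an earlier iteration's, and commits only the unflagged caches to memory. The cache depth, the store targets, and the store-store-only conflict rule are assumptions for illustration, not details from the patent.

    #include <stdio.h>

    #define ITERS       4     /* iterations in the parallel execution group */
    #define CACHE_SLOTS 2     /* store-cache entries per iteration          */

    typedef struct {
        int addr[CACHE_SLOTS];
        int data[CACHE_SLOTS];
        int n;
    } store_cache_t;

    int main(void)
    {
        int memory[16] = {0};
        store_cache_t cache[ITERS];
        int flagged[ITERS] = {0};
        const int target[ITERS][CACHE_SLOTS] = {{0, 4}, {1, 5}, {2, 4}, {3, 6}};

        /* "Execute" the group: each iteration buffers its stores in its own cache. */
        for (int i = 0; i < ITERS; i++) {
            for (int s = 0; s < CACHE_SLOTS; s++) {
                cache[i].addr[s] = target[i][s];      /* where the store goes      */
                cache[i].data[s] = 100 * (i + 1) + s; /* what the iteration stores */
            }
            cache[i].n = CACHE_SLOTS;
        }

        /* Dependency check: flag iteration i if any of its store addresses was
         * also stored to by an earlier iteration in the group. */
        for (int i = 1; i < ITERS; i++)
            for (int j = 0; j < i; j++)
                for (int a = 0; a < cache[i].n; a++)
                    for (int b = 0; b < cache[j].n; b++)
                        if (cache[i].addr[a] == cache[j].addr[b])
                            flagged[i] = 1;

        /* Commit only the store caches of iterations with no dependence. */
        for (int i = 0; i < ITERS; i++) {
            if (flagged[i]) {
                printf("iteration %d: dependence detected, not committed\n", i);
                continue;
            }
            for (int s = 0; s < cache[i].n; s++)
                memory[cache[i].addr[s]] = cache[i].data[s];
        }

        for (int a = 0; a < 8; a++)
            printf("mem[%d]=%d ", a, memory[a]);
        putchar('\n');
        return 0;
    }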

    Data Parallel Function Call for Determining if Called Routine is Data Parallel
    9.
    Invention application
    Status: Lapsed

    Publication number: US20110161623A1

    Publication date: 2011-06-30

    Application number: US12649751

    Application date: 2009-12-30

    IPC class: G06F9/38 G06F15/76 G06F9/02

    Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.

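    A C sketch of the runtime check, assuming each callable routine is wrapped in a descriptor that carries an is_data_parallel flag plus the matching entry point, and that a data-parallel call site inspects that flag at runtime to choose between one whole-vector call and a per-element scalar fallback. The descriptor layout, the 4-wide vector, and all function names are illustrative.

    #include <stdio.h>

    #define WIDTH 4

    typedef struct {
        int is_data_parallel;                    /* flag checked at runtime   */
        union {
            void (*vector_fn)(int *v, int n);    /* data-parallel entry point */
            int  (*scalar_fn)(int x);            /* scalar entry point        */
        } u;
    } routine_t;

    static void square_vector(int *v, int n) { for (int i = 0; i < n; i++) v[i] *= v[i]; }
    static int  negate_scalar(int x)         { return -x; }

    /* A data-parallel call site: check the target's flag, then dispatch. */
    static void call_from_data_parallel_code(const routine_t *target, int *vec, int n)
    {
        if (target->is_data_parallel) {
            target->u.vector_fn(vec, n);         /* one call covers the whole vector */
        } else {
            for (int i = 0; i < n; i++)          /* scalar target: call per element  */
                vec[i] = target->u.scalar_fn(vec[i]);
        }
    }

    int main(void)
    {
        routine_t square = {1, {.vector_fn = square_vector}};
        routine_t negate = {0, {.scalar_fn = negate_scalar}};

        int vec[WIDTH] = {1, 2, 3, 4};
        call_from_data_parallel_code(&square, vec, WIDTH);
        call_from_data_parallel_code(&negate, vec, WIDTH);

        for (int i = 0; i < WIDTH; i++)
            printf("%d ", vec[i]);
        putchar('\n');
        return 0;
    }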

    Livelock resolution
    10.
    Invention grant
    Status: In force

    Publication number: US07861022B2

    Publication date: 2010-12-28

    Application number: US12393469

    Application date: 2009-02-26

    Abstract: A mechanism is provided for resolving livelock conditions in a multiple processor data processing system. When a bus unit detects a timeout condition, or a potential timeout condition, the bus unit activates a livelock resolution request signal. A livelock resolution unit receives livelock resolution requests from the bus units and signals an attention to a control processor. The control processor performs actions to attempt to resolve the livelock condition. Once a bus unit that issued a livelock resolution request has managed to successfully issue its command, it deactivates its livelock resolution request. If all livelock resolution request signals are deactivated, then the control processor instructs the bus and all bus units to resume normal activity. On the other hand, if the control processor determines that a predetermined amount of time has passed without any progress being made, it determines that a hang condition has occurred.

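    A control-flow sketch in C of the resolution protocol: bus-unit requests are latched as bits in a mask, the control processor keeps taking corrective action while any bit is set, normal activity resumes once the mask clears, and a hang is declared if nothing clears within a fixed number of rounds. The corrective action here simply lets one requester complete per round, and the unit count and hang limit are illustrative.

    #include <stdio.h>

    #define HANG_LIMIT 10                    /* rounds without progress => hang */

    int main(void)
    {
        unsigned livelock_requests = 0x05;   /* bus units 0 and 2 timed out */
        int round = 0;

        while (livelock_requests != 0) {
            if (++round > HANG_LIMIT) {
                printf("no progress after %d rounds: hang condition\n", HANG_LIMIT);
                return 1;
            }
            /* Corrective action (simulated): each round the lowest-numbered
             * requesting unit gets to issue its command successfully and then
             * deactivates its livelock resolution request. */
            for (unsigned bit = 1; bit != 0; bit <<= 1) {
                if (livelock_requests & bit) {
                    livelock_requests &= ~bit;
                    printf("round %d: a bus unit completed and dropped its request\n",
                           round);
                    break;
                }
            }
        }
        printf("all requests deactivated: bus units resume normal activity\n");
        return 0;
    }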