摘要:
In some embodiments, a motion estimation search window cache is adaptively re-organized according to frame properties including a frame width and a number of reference frames corresponding to the current frame to be encoded/decoded. The cache reorganization may include an adaptive mapping of reference frame locations to search window cache allocation units (addresses). In some embodiments, a search window is shaped as a quasi-rectangle with truncated upper left and lower right corners, having a full-frame horizontal extent. A search range is defined in a central region of the search window, and is laterally bounded by the truncated corners.
摘要:
A method for processing a plurality of sub-blocks in a block of video is disclosed. The method generally includes the steps of (A) intra predicting a first group of the sub-blocks in a first quadrant of the block, (B) intra predicting a second group of the sub-blocks in a second quadrant of the block after starting the intra predicting of the first group and (C) intra predicting a third group of the sub-blocks in the first quadrant after starting the intra predicting of the second group, wherein the first group and the third group together account for all of the sub-blocks in the first quadrant.
摘要:
Described systems and methods allow a reduction in the memory bandwidth required in video coding (decoding/encoding) applications. According to a first aspect, the data assigned to each memory word is chosen to correspond to a 2D subarray of a larger array such as a macroblock. An array memory word organization allows reducing both the average and worst-case bandwidth required to retrieve predictions from memory in video coding applications, particularly for memory word sizes (memory bus widths) larger than the size of typical predictions. According to a second aspect, two or more 2D subarrays such as video predictions are retrieved from memory simultaneously as part of a larger 2D array, if retrieving the larger array requires fewer clock cycles than retrieving the subarrays individually. Allowing the combination of multiple predictions in one memory access operation can lead to a reduction in the average bandwidth required to retrieve predictions from memory.
摘要:
A memory controller for a multi-bank random access memory (RAM) such as SDRAM includes a transaction slicer for slicing complex client transactions into simple slices, and a command scheduler for re-ordering preparatory memory commands such as activate and precharge in an order that can be different from the order of the corresponding client transactions. The command scheduler may also re-order memory access commands such as read and write. The slicing and out-of-order command scheduling allow a reduction in memory latency. The data transfer to and from clients can be kept in order.
摘要:
An integrated circuit is designed by interconnecting pre-designed data-driven cores (intellectual property, functional blocks). Hardware description language (e.g. Verilog or VHDL) and software language (e.g. C or C++) code for interconnecting the cores is automatically generated by software tools from a central circuit specification. The central specification recites pre-designed hardware cores (intellectual property) and the interconnections between the cores. HDL and software language test benches, and timing constraints are also automatically generated from the central specification. The automatic generation of code simplifies the interconnection of pre-existing cores for the design of complex integrated circuits.
摘要:
Pre-designed and verified data-driven hardware cores (intellectual property, functional blocks) are assembled to generate large systems on a single chip. Token transfer between cores is achieved upon synchronous assertion, over dedicated connections, of a one-bit ready signal by the transmitter and a one-bit request signal by the receiver. The ready-request signal handshake is necessary and sufficient for token transfer. There are no combinational paths through the cores, and no latches or master controller are used. The architecture and interface allow a significant simplification in the design and verification of large systems integrated on a single chip.
摘要:
According to some embodiments, a multithreaded microcontroller includes a thread control unit comprising thread control hardware (logic) configured to perform a number of multithreading system calls essentially in real time, e.g. in one or a few clock cycles. System calls can include mutex lock, wait condition, and signal instructions. The thread controller includes a number of thread state, mutex, and condition variable registers used for executing the multithreading system calls. Threads can transition between several states including free, run, ready and wait. The wait state includes interrupt, condition, mutex, I-cache, and memory substrates. A thread state transition controller controls thread states, while a thread instructions execution unit executes multithreading system calls and manages thread priorities to avoid priority inversion. A thread scheduler schedules threads according to their priorities. A hardware thread profiler including global, run and wait profiler registers is used to monitor thread performance to facilitate software development.
摘要:
Described systems and methods allow a reduction in the memory bandwidth required in video coding (decoding/encoding) applications. According to a first aspect, the data assigned to each memory word is chosen to correspond to a 2D subarray of a larger array such as a macroblock. An array memory word organization allows reducing both the average and worst-case bandwidth required to retrieve predictions from memory in video coding applications, particularly for memory word sizes (memory bus widths) larger than the size of typical predictions. According to a second aspect, two or more 2D subarrays such as video predictions are retrieved from memory simultaneously as part of a larger 2D array, if retrieving the larger array requires fewer clock cycles than retrieving the subarrays individually. Allowing the combination of multiple predictions in one memory access operation can lead to a reduction in the average bandwidth required to retrieve predictions from memory.
摘要:
According to some embodiments, a multithreaded microcontroller includes a thread control unit comprising thread control hardware (logic) configured to perform a number of multithreading system calls essentially in real time, e.g. in one or a few clock cycles. System calls can include mutex lock, wait condition, and signal instructions. The thread controller includes a number of thread state, mutex, and condition variable registers used for executing the multithreading system calls. Threads can transition between several states including free, run, ready and wait. The wait state includes interrupt, condition, mutex, I-cache, and memory substrates. A thread state transition controller controls thread states, while a thread instructions execution unit executes multithreading system calls and manages thread priorities to avoid priority inversion. A thread scheduler schedules threads according to their priorities. A hardware thread profiler including global, run and wait profiler registers is used to monitor thread performance to facilitate software development.
摘要:
According to some embodiments, a multithreaded microcontroller includes a thread control unit comprising thread control hardware (logic) configured to perform a number of multithreading system calls essentially in real time, e.g. in one or a few clock cycles. System calls can include mutex lock, wait condition, and signal instructions. The thread controller includes a number of thread state, mutex, and condition variable registers used for executing the multithreading system calls. Threads can transition between several states including free, run, ready and wait. The wait state includes interrupt, condition, mutex, I-cache, and memory substates. A thread state transition controller controls thread states, while a thread instructions execution unit executes multithreading system calls and manages thread priorities to avoid priority inversion. A thread scheduler schedules threads according to their priorities. A hardware thread profiler including global, run and wait profiler registers is used to monitor thread performance to facilitate software development.