摘要:
A plurality of target field programmable gate arrays are interconnected in accordance with a connection topology and map portions of a target system. A control module is coupled to the plurality of target field programmable gate arrays. A balanced clock distribution network is configured to distribute a reference clock signal, and a balanced reset distribution network is coupled to the control module and configured to distribute a reset signal to the plurality of target field programmable gate arrays. The control module and the balanced reset distribution network are cooperatively configured to initiate and control a simulation of the target system with the plurality of target field programmable gate arrays. A plurality of local clock control state machines reside in the target field programmable gate arrays. The local clock control state machines are coupled to the balanced clock distribution network and obtain the reference clock signal therefrom. The plurality of local clock control state machines are configured to generate a set of synchronized free-running and stoppable clocks to maintain cycle-accurate and cycle-reproducible execution of the simulation of the target system. A method is also provided.
摘要:
A method, system and computer program product are disclosed for using a Field Programmable Gate Array (FPGA) to simulate operations of a device under test (DUT). The DUT includes a device memory having a number of input ports, and the FPGA is associated with a target memory having a second number of input ports, the second number being less than the first number. In one embodiment, a given set of inputs is applied to the device memory at a frequency Fd and in a defined cycle of time, and the given set of inputs is applied to the target memory at a frequency Ft. Ft is greater than Fd and cycle accuracy is maintained between the device memory and the target memory. In an embodiment, a cycle accurate model of the DUT memory is created by separating the DUT memory interface protocol from the target memory storage array.
摘要:
A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, raising a stall signal to delay the main processor, determining whether there is enough space in the co-processor for the non-implemented instruction, and if there is enough space for said instruction, reconfiguring the instruction set of the co-processor by adding the non-implemented instruction to the co-processor instruction set. The stall signal is cleared and the instruction is executed.
摘要:
A method of (and system for) of transporting a sideband signal through a physical layer of an extended bridge, includes on a first node of the extended bridge, providing an interface to a sideband component coupled to a side of the extended bridge, encoding a first data stream being output from the sideband component with a unique header to identify the data output from the sideband component, and multiplexing the first data stream from the sideband component with a second data stream from a principal signal port, and outputting the multiplexed first and second data streams to another node of the extended bridge.
摘要:
A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, raising a stall signal to delay the main processor, determining whether there is enough space in the co-processor for the non-implemented instruction, and if there is enough space for said instruction, reconfiguring the instruction set of the co-processor by adding the non-implemented instruction to the co-processor instruction set. The stall signal is cleared and the instruction is executed.
摘要:
A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, raising a stall signal to delay the main processor, determining whether there is enough space in the co-processor for the non-implemented instruction, and if there is enough space for said instruction, reconfiguring the instruction set of the co-processor by adding the non-implemented instruction to the co-processor instruction set. The stall signal is cleared and the instruction is executed.
摘要:
Techniques are provided for hardware-accelerated relational joins. A first table comprising one or more rows is processed through a hardware accelerator. At least one join column in at least one of the one or more rows of the first table is hashed to set at least one bit in at least one bit vector. A second table comprising one or more rows is processed through a hardware accelerator. At least one join column in at least one of the one or more rows of the second table is hashed to generate at least one hash value. At least one bit vector is probed using the at least one hash value. A joined row is constructed responsive to the probing step. The row-construction step is performed in the hardware accelerator.
摘要:
Techniques are provided for hardware-accelerated relational joins. A first table comprising one or more rows is processed through a hardware accelerator. At least one join column in at least one of the one or more rows of the first table is hashed to set at least one bit in at least one bit vector. A second table comprising one or more rows is processed through a hardware accelerator. At least one join column in at least one of the one or more rows of the second table is hashed to generate at least one hash value. At least one bit vector is probed using the at least one hash value. A joined row is constructed responsive to the probing step. The row-construction step is performed in the hardware accelerator.
摘要:
A system for instruction memory storage and processing in a computing device having a processor, the system is based on backwards branch control information and comprises a dynamic loop buffer (DLB) which is a tagless array of data organized as a direct-mapped structure; a DLB controller having a primary memory unit partitioned into a plurality of banks for controlling the state of the instruction memory system and accepting a program counter address as an input, the DLB controller outputs distinct signals. The system further comprises an address register located in the memory of the computing device, it is a staging register for the program counter address and an instruction fetch process that takes two cycles of the processor clock; and a bank select unit for serving as a program counter address decoder to accept the program counter address and to output a bank enable signal for selecting a bank in a primary memory unit, and a decoded address for access within the selected bank.
摘要:
Generating a data feed specific parser circuit is provided. An input of a number of bytes of feed data associated with a particular data feed that the data feed specific parser circuit is to process is received. A feed format specification file that describes a data format of the particular data feed is parsed to generate an internal data structure of the feed format specification file. A minimum number of parallel pipeline stages in the data feed specific parser circuit to process the number of bytes of feed data associated with the particular data is determined based on the generated internal data structure of the feed format specification file. Then, a description of the data feed specific parser circuit with the determined number of parallel pipeline stages is generated.