Abstract:
In one embodiment, a processor includes a performance monitor including a last branch record (LBR) stack to store a call stack to an event of interest, where the call stack is collected responsive to a trigger for the event. The processor further includes logic to control the LBR stack to operate in a call stack mode such that an entry to a call instruction for a leaf function is cleared on return from the leaf function. Other embodiments are described and claimed.
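The call stack mode described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `LbrCallStack`, the default depth of 16, and the policy of dropping the oldest entry when the stack is full are all hypothetical and are not taken from the abstract.

```python
class LbrCallStack:
    """Toy software model of an LBR stack operating in call stack mode."""

    def __init__(self, depth=16):
        self.depth = depth      # assumed number of LBR entries
        self.entries = []       # (from_addr, to_addr) pairs, oldest first

    def on_call(self, from_addr, to_addr):
        # Record the call; dropping the oldest entry when full is an assumption.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append((from_addr, to_addr))

    def on_return(self):
        # Clear the entry of the call that is returning, so calls to
        # leaf functions that have already completed do not linger.
        if self.entries:
            self.entries.pop()

    def snapshot(self):
        # Collected when the trigger for the event of interest fires.
        return list(self.entries)


lbr = LbrCallStack(depth=4)
lbr.on_call(0x100, 0x200)   # main -> f
lbr.on_call(0x210, 0x300)   # f -> leaf
lbr.on_return()             # leaf returns, its entry is cleared
lbr.on_call(0x220, 0x400)   # f -> g, event of interest occurs inside g
call_stack = lbr.snapshot()
```

After this sequence the snapshot holds only the `main -> f` and `f -> g` entries, which is the call stack leading to the event.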
Abstract:
Embodiments of the present invention perform efficient decoding of variable length codes statically defined by a coding standard for a wide range of source data. According to the disclosed method, special data structures (decoding tables) are created. A bit set size is associated with each decoding table. Each decoding table contains a decoded value, actual code length, reference to another table (from the set of created tables), and validity indicator for each bit combination that can be formed from the number of bits equal to the bit set size. An active decoding table is selected. Then the number of bits equal to the bit set size associated with the active decoding table is read from a bit stream. The active decoding table is indexed with the actual value of the bits read to obtain the decoded value, actual code length, reference to another table, and validity indicator. The validity indicator is then checked to determine whether the decoded value obtained is valid. If the decoded value is indicated to be invalid, the decoding table that is referenced by the currently active table is selected to become active, and the decoding process continues. Otherwise, the bit stream is adjusted in accordance with the actual code length obtained and the bit set sizes associated with the decoding tables that were active during the decoding. The decoded value is then returned.
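The chained-table lookup described above can be sketched as follows. Everything concrete here is an assumption made for illustration: the four-symbol code (A=0, B=10, C=110, D=111), the table layout, the names `Entry`, `TABLES`, `decode_one` and `decode_all`, and the use of a string of '0'/'1' characters as the bit stream.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    value: Optional[str]        # decoded value (meaningful only if valid)
    code_len: int               # actual (total) code length in bits
    next_table: Optional[int]   # reference to another table
    valid: bool                 # validity indicator


# table id -> (bit set size, entries indexed by the value of the bits read)
TABLES = {
    0: (2, [
        Entry("A", 1, None, True),    # 00: code "0" plus one excess bit
        Entry("A", 1, None, True),    # 01: code "0" plus one excess bit
        Entry("B", 2, None, True),    # 10
        Entry(None, 0, 1, False),     # 11: prefix only, continue in table 1
    ]),
    1: (1, [
        Entry("C", 3, None, True),    # 11 + 0
        Entry("D", 3, None, True),    # 11 + 1
    ]),
}


def decode_one(bits, pos):
    """Decode one symbol starting at bit offset pos; return (value, new_pos)."""
    table_id = 0      # the initially active table
    bits_read = 0     # sum of the bit set sizes of the tables used so far
    while True:
        size, entries = TABLES[table_id]
        start = pos + bits_read
        # Zero padding at the end of the stream is a simplification.
        chunk = bits[start:start + size].ljust(size, "0")
        bits_read += size
        entry = entries[int(chunk, 2)]
        if entry.valid:
            excess = bits_read - entry.code_len   # bits given back to the stream
            return entry.value, pos + bits_read - excess
        table_id = entry.next_table               # referenced table becomes active


def decode_all(bits):
    out, pos = [], 0
    while pos < len(bits):
        value, pos = decode_one(bits, pos)
        out.append(value)
    return "".join(out)


decoded = decode_all("0101101110")   # codes: 0, 10, 110, 111, 0
```

The stream position is advanced by the bits actually read and then corrected by the excess, which mirrors the adjustment step in the abstract. A truncated stream can be mis-decoded because of the padding, so a real decoder would need an explicit end-of-stream check.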
Abstract:
Efficient performance monitoring for symmetric multi-threading systems is applicable to systems that have limited performance monitoring resources and enables efficient resource sharing on a per-execution-unit basis. The shared performance monitoring unit is programmed to reset its counter and to start performance monitoring if only one execution unit is requesting this operation. If several requests are pending, an attempt is made to program the performance monitoring unit to collect performance data for the subset of execution units that the hardware is capable of supporting. Upon a request to stop performance monitoring, the previously allocated indicator may be removed, and the performance monitoring unit may be programmed to stop operating if there are no more active or pending requests. If performance monitoring was inactive for the current execution unit, the request may be discarded, and no performance data may be returned.
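The start and stop handling described above can be modeled roughly as below. This is a sketch under several assumptions not stated in the abstract: a single shared counter, a hypothetical hardware limit `max_units`, first-come-first-served promotion of pending requests, and the names `SharedPmu`, `request_start`, `request_stop` and `tick`.

```python
class SharedPmu:
    """Toy model of one performance monitoring unit shared by execution units."""

    def __init__(self, max_units):
        self.max_units = max_units   # assumed hardware limit on monitored units
        self.active = set()          # units with an allocated indicator
        self.pending = []            # requests waiting for hardware capacity
        self.counter = 0
        self.running = False

    def _program(self):
        # A sole requester gets a freshly reset counter.
        if not self.active and len(self.pending) == 1:
            self.counter = 0
        # Otherwise serve the subset of requests the hardware can support.
        while self.pending and len(self.active) < self.max_units:
            self.active.add(self.pending.pop(0))
        self.running = bool(self.active)

    def request_start(self, unit):
        if unit in self.active or unit in self.pending:
            return
        self.pending.append(unit)
        self._program()

    def tick(self, events=1):
        # Stand-in for hardware events being counted.
        if self.running:
            self.counter += events

    def request_stop(self, unit):
        if unit in self.pending:
            self.pending.remove(unit)   # never became active: no data
            return None
        if unit not in self.active:
            return None                 # monitoring inactive: request discarded
        self.active.remove(unit)        # remove the allocated indicator
        data = self.counter
        self._program()                 # stops the unit if nothing is left
        return data


pmu = SharedPmu(max_units=1)
pmu.request_start("cpu0")
pmu.tick(5)
pmu.request_start("cpu1")          # stays pending, hardware supports one unit
first = pmu.request_stop("cpu0")   # returns the count, then cpu1 is promoted
```

Returning `None` for an inactive unit corresponds to discarding the request without returning performance data.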
Abstract:
The method disclosed may be used together with any prefix oriented decoding method to enable faster decoding of variable length codes when a subset of most frequently used codes with relatively short prefixes may be determined. An embodiment of the present invention reads a number of bits, not less than the maximal possible length of a code, from a bit stream. Then a predetermined number of bits is selected and used as an index to a data structure that contains at least a decoded value and a validity indicator, along with other pre-decoded data, namely: prefix type and length, maximal code length for a group of codes, actual code length, the number of bits to return to the bit stream, etc. The validity indicator is used to determine whether to proceed with the decoding operation, or obtain the valid decoded value from the data structure and return excess bits to the bit stream. If the decoded value is indicated to be invalid, the decoding operation is continued, and a decoding method that estimates the length of the code prefix and the number of significant bits corresponding to the length estimated is applied to the bits initially read from the bit stream.
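As an illustration of the fast path described above, the sketch below places a small lookup table in front of a prefix-oriented decoder. The choice of unsigned Exp-Golomb codes as the prefix-oriented code, the limits `MAX_CODE_LEN = 9` and `INDEX_BITS = 3`, and all function names are assumptions made for this example; the abstract does not name a specific code or table size.

```python
MAX_CODE_LEN = 9   # assumed maximal code length (up to 4 leading zeros)
INDEX_BITS = 3     # assumed number of bits used to index the fast table


def prefix_decode(window):
    """Prefix-oriented fallback: estimate the prefix length, then the value."""
    zeros = len(window) - len(window.lstrip("0"))   # prefix length
    code_len = 2 * zeros + 1
    if code_len > len(window):
        raise ValueError("code longer than the bits read")
    return int(window[zeros:code_len], 2) - 1, code_len


def build_fast_table():
    """One (valid, value, code_len) entry per INDEX_BITS-bit pattern."""
    table = []
    for i in range(1 << INDEX_BITS):
        pattern = format(i, f"0{INDEX_BITS}b")
        zeros = len(pattern) - len(pattern.lstrip("0"))
        code_len = 2 * zeros + 1
        if code_len <= INDEX_BITS:    # the whole code fits in the index bits
            table.append((True, int(pattern[zeros:code_len], 2) - 1, code_len))
        else:
            table.append((False, None, 0))
    return table


FAST_TABLE = build_fast_table()


def fast_decode_one(bits, pos):
    # Read at least the maximal code length (zero padded at end of stream).
    window = bits[pos:pos + MAX_CODE_LEN].ljust(MAX_CODE_LEN, "0")
    valid, value, code_len = FAST_TABLE[int(window[:INDEX_BITS], 2)]
    if not valid:
        # Continue with the slower method on the bits already read.
        value, code_len = prefix_decode(window)
    # Advancing by code_len only returns the excess bits to the stream.
    return value, pos + code_len


def fast_decode_stream(bits):
    values, pos = [], 0
    while pos < len(bits):
        value, pos = fast_decode_one(bits, pos)
        values.append(value)
    return values


# Exp-Golomb codes for 0, 1, 2, 3, 7: "1", "010", "011", "00100", "0001000"
decoded = fast_decode_stream("1010011001000001000")
```

Short, frequent codes (here the values 0 to 2) are resolved with a single table access, while longer codes take the fallback path without re-reading the stream.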
Abstract:
Embodiments of the present invention provide for concurrent instruction execution in heterogeneous computer systems by forming a parallel execution context whenever a first software thread encounters a parallel execution construct. The parallel execution context may comprise a reference to instructions to be executed concurrently, a reference to data on which said instructions may depend, and a parallelism level indicator whose value specifies the number of times said instructions are to be executed. The first software thread may then signal other software threads to begin concurrent execution of the instructions referenced in said context. Each software thread may then decrease the parallelism level indicator, copy the data referenced in the parallel execution context to said thread's private memory location, and modify said data to account for the new location. Software threads may be executed by a processor and operate on behalf of other processing devices or remote computer systems.
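A rough sketch of such a parallel execution context, using ordinary Python threads, is shown below. The names `ParallelContext`, `claim`, `worker` and `run_parallel`, the `"slot"` field written into the private copy, and the use of thread start-up as the signalling mechanism are all assumptions for illustration and are not taken from the abstract.

```python
import copy
import threading


class ParallelContext:
    """Reference to the instructions, reference to the data, parallelism level."""

    def __init__(self, func, data, level):
        self.func = func      # instructions to be executed concurrently
        self.data = data      # data the instructions depend on
        self.level = level    # number of executions still to be claimed
        self.lock = threading.Lock()

    def claim(self):
        # Decrease the parallelism level indicator; None means nothing is left.
        with self.lock:
            if self.level == 0:
                return None
            self.level -= 1
            return self.level


def worker(ctx, results):
    while True:
        index = ctx.claim()
        if index is None:
            return
        private = copy.deepcopy(ctx.data)   # copy into thread-private storage
        private["slot"] = index             # adjust the copy for its new location
        results[index] = ctx.func(private)


def run_parallel(func, data, level, n_threads=4):
    ctx = ParallelContext(func, data, level)
    results = [None] * level
    # Starting the threads stands in for the first thread signalling the others.
    threads = [threading.Thread(target=worker, args=(ctx, results))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


shared = {"base": 10}
out = run_parallel(lambda d: d["base"] + d["slot"], shared, level=5)
```

Because each thread works on its own deep copy, the shared data referenced by the context is left unmodified.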
Abstract:
One embodiment provides an apparatus. The apparatus includes collector circuitry to capture processor trace (PT) data from a PT driver. The PT data includes a first target instruction pointer (TIP) packet including a first runtime target address of an indirect branch instruction of an executing target application. The apparatus further includes decoder circuitry to extract the first TIP packet from the PT data and to decode the first TIP packet to yield the first runtime target address. The apparatus further includes control flow validator circuitry to determine whether a control flow transfer to the first runtime target address corresponds to a control flow violation based, at least in part, on a control flow graph (CFG). The CFG includes a plurality of nodes, each node including a start address of a first basic block, an end address of the first basic block, and a next possible address of a second basic block or a not found tag.
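The validator step can be sketched in software as a lookup against the CFG nodes. This sketch assumes the TIP packet has already been decoded into an integer target address and that the address of the indirect branch is known; it does not parse real PT packets. The names `CfgNode`, `NOT_FOUND` and `check_transfer`, and the decision to report an "unknown" verdict for the not found tag, are hypothetical choices.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence

NOT_FOUND = None   # stand-in for the "not found" tag


@dataclass(frozen=True)
class CfgNode:
    start: int                              # start address of the basic block
    end: int                                # end address of the basic block
    next_addrs: Optional[FrozenSet[int]]    # possible next block starts, or NOT_FOUND


def check_transfer(nodes: Sequence[CfgNode], branch_addr: int, target: int) -> str:
    """Classify a transfer from branch_addr to the runtime target address."""
    for node in nodes:
        if node.start <= branch_addr <= node.end:
            if node.next_addrs is NOT_FOUND:
                return "unknown"      # assumed handling of the not found tag
            return "ok" if target in node.next_addrs else "violation"
    return "unknown"                  # branch is outside every known block


cfg = [
    CfgNode(0x1000, 0x100F, frozenset({0x2000, 0x3000})),
    CfgNode(0x2000, 0x201F, NOT_FOUND),
]
verdict = check_transfer(cfg, 0x100C, 0x4000)   # target not among the successors
```

A linear scan keeps the example short; a sorted list or interval tree would be the natural choice for a large CFG.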