Abstract:
An associative cache memory, comprising: an array of storage elements arranged as M sets by N ways; an allocation unit that allocates the storage elements in response to memory accesses that miss in the cache memory, wherein each memory access selects a set and has an associated memory access type (MAT) of a plurality of predetermined MATs, and each valid storage element has an associated MAT; and a mapping that includes, for each MAT, a MAT priority. In response to a memory access that misses in the array, the allocation unit: determines a most eligible way and a second most eligible way of the selected set for replacement based on a replacement policy; and replaces the second most eligible way rather than the most eligible way when the MAT priority of the most eligible way is greater than the MAT priority of the second most eligible way.
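The victim selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MAT names, the priority values, and the age-based (LRU-like) eligibility order are hypothetical, since the abstract says only that a mapping and a replacement policy exist.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical MAT names and priorities (not taken from the abstract).
MAT_PRIORITY: Dict[str, int] = {"load": 3, "store": 2, "prefetch": 1}


@dataclass
class Way:
    mat: str  # MAT associated with the valid storage element
    age: int  # larger means less recently used (assumed replacement state)


def eligibility_order(ways: List[Way]) -> List[int]:
    """Return way indices from most to least eligible, assuming an LRU-like policy."""
    return sorted(range(len(ways)), key=lambda i: ways[i].age, reverse=True)


def choose_victim(ways: List[Way]) -> int:
    """Pick the way to replace in the selected set."""
    first, second = eligibility_order(ways)[:2]
    # Spare the most eligible way when its MAT priority is higher
    # than that of the second most eligible way.
    if MAT_PRIORITY[ways[first].mat] > MAT_PRIORITY[ways[second].mat]:
        return second
    return first


selected_set = [Way("load", 9), Way("prefetch", 7), Way("store", 1), Way("load", 0)]
victim = choose_victim(selected_set)
```

Here way 0 is the oldest but holds a higher-priority MAT than way 1, so way 1 is replaced instead.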
Abstract:
A set-associative cache memory, comprising: an array of storage elements arranged as M sets by N ways; an allocation unit that allocates the storage elements in response to memory accesses that miss in the cache memory, wherein each memory access selects a set; and, for each parcel of a plurality of parcels, a parcel specifier that specifies: a subset of ways of the N ways included in the parcel, wherein the subsets of ways of parcels associated with a selected set are mutually exclusive; and a replacement scheme associated with the parcel from among a plurality of predetermined replacement schemes. For each memory access, the allocation unit: selects the parcel specifier in response to the memory access; and uses the replacement scheme associated with the parcel to allocate into the subset of ways of the selected set included in the parcel.
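A minimal sketch of parcel-based allocation follows. The parcel names ("data", "code"), the two schemes (LRU-like and random), and the rule that the access kind selects the parcel specifier are all assumptions made for illustration; the abstract does not say how a specifier is selected or which schemes are predetermined.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# A replacement scheme picks one way out of the parcel's subset.
Scheme = Callable[[Sequence[int], List[int]], int]

_rng = random.Random(0)  # seeded only to keep the sketch repeatable


def pick_lru(ways: Sequence[int], ages: List[int]) -> int:
    """Choose the way with the largest age (least recently used)."""
    return max(ways, key=lambda w: ages[w])


def pick_random(ways: Sequence[int], ages: List[int]) -> int:
    """Choose any way of the subset at random."""
    return _rng.choice(list(ways))


@dataclass(frozen=True)
class ParcelSpecifier:
    ways: Tuple[int, ...]  # subset of the N ways included in the parcel
    scheme: Scheme         # replacement scheme associated with the parcel


# Hypothetical parcels for an 8-way set.
PARCELS: Dict[str, ParcelSpecifier] = {
    "data": ParcelSpecifier((0, 1, 2, 3, 4, 5), pick_lru),
    "code": ParcelSpecifier((6, 7), pick_random),
}


def check_mutually_exclusive(parcels: Dict[str, ParcelSpecifier]) -> None:
    """Raise if two parcels share a way."""
    seen = set()
    for name, parcel in parcels.items():
        if seen & set(parcel.ways):
            raise ValueError(f"parcel {name!r} overlaps another parcel")
        seen |= set(parcel.ways)


def allocate(access_kind: str, ages: List[int]) -> int:
    """Select the parcel specifier for the access and apply its scheme."""
    parcel = PARCELS[access_kind]
    return parcel.scheme(parcel.ways, ages)


check_mutually_exclusive(PARCELS)
ages = [4, 9, 1, 0, 3, 2, 8, 7]  # per-way ages of the selected set
data_victim = allocate("data", ages)
code_victim = allocate("code", ages)
```

Because each scheme only ever sees its own parcel's ways, an allocation for one parcel cannot evict a line held in another.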
Abstract:
The present invention provides a method and a caching node entity for ensuring that at least a predetermined number of copies of a content object are kept stored in a network comprising a plurality of cache nodes for storing copies of content objects. The invention uses ranking state values, deletable or non-deletable, which, when assigned to copies of content objects, indicate whether a copy is deletable or non-deletable. At least one copy of each content object is assigned the value non-deletable. The value for a copy of a content object in one cache node of the network, said copy being a candidate for the value non-deletable, is changed from deletable to non-deletable if a certain condition is fulfilled.
Abstract:
A computing method includes accepting a definition of a computing task (68), which includes multiple atomic Processing Elements (PEs - 76) having execution dependencies (80). Each execution dependency specifies that a respective first PE is to be executed before a respective second PE. The computing task is compiled for concurrent execution on a multiprocessor device (32), which includes multiple processors (44) that are capable of executing a first number of the PEs simultaneously, by arranging the PEs, without violating the execution dependencies, in an invocation data structure (90) including a second number of execution sequences (98) that is greater than one but does not exceed the first number. The multiprocessor device is invoked to run software code that executes the execution sequences in parallel responsively to the invocation data structure, so as to produce a result of the computing task.
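One way to picture the compilation step is the sketch below. It is not the patented method: the round-robin assignment over a topological order, the `waits` table for dependencies that cross sequences, and the sequential `simulate` loop standing in for parallel execution are all assumptions introduced here. It uses `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


def compile_task(
    deps: Dict[str, Set[str]], num_sequences: int, max_parallel: int
) -> Tuple[List[List[str]], Dict[str, Set[str]]]:
    """Arrange PEs into execution sequences.

    deps maps each PE to the PEs that must execute before it.
    """
    # The number of sequences is greater than one and does not exceed
    # the number of PEs the device can execute simultaneously.
    if not 1 < num_sequences <= max_parallel:
        raise ValueError("need 1 < num_sequences <= max_parallel")
    order = list(TopologicalSorter(deps).static_order())
    sequences: List[List[str]] = [[] for _ in range(num_sequences)]
    home: Dict[str, int] = {}
    for i, pe in enumerate(order):
        home[pe] = i % num_sequences
        sequences[home[pe]].append(pe)
    # Dependencies inside one sequence are honored by its order;
    # those that cross sequences are recorded as explicit waits.
    waits = {
        pe: {d for d in deps.get(pe, ()) if home[d] != home[pe]} for pe in order
    }
    return sequences, waits


def simulate(sequences: List[List[str]], waits: Dict[str, Set[str]]) -> List[str]:
    """Sequential stand-in for running the sequences in parallel."""
    done: Set[str] = set()
    trace: List[str] = []
    cursor = [0] * len(sequences)
    total = sum(len(seq) for seq in sequences)
    while len(trace) < total:
        for s, seq in enumerate(sequences):
            if cursor[s] < len(seq) and waits[seq[cursor[s]]] <= done:
                pe = seq[cursor[s]]
                trace.append(pe)
                done.add(pe)
                cursor[s] += 1
    return trace


deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
sequences, waits = compile_task(deps, num_sequences=2, max_parallel=4)
trace = simulate(sequences, waits)
```

Each sequence is a slice of one topological order, so the earliest unexecuted PE is always runnable and the simulation cannot deadlock.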
Abstract:
A formalized method and a design system are described for part of the design decisions, related to memory, involved while designing an essentially digital device. The method and system determine an optimized memory organization starting from a representation of said digital device, the representation describing the functionality of the digital device and comprising data access instructions on basic groups, which are groups of scalar signals. The method and system determine optimized scheduling intervals of said data access instructions such that execution of said functionality with the digital device is guaranteed to be within a predetermined cycle budget, the determining of the optimized scheduling intervals comprising optimizing access conflicts with respect to an evaluation criterion related to the memory cost of said digital device. An optimized memory organization is selected in accordance with the optimized scheduling intervals and the optimized access conflicts.
Abstract:
Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
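The counter-based coordination between the helper thread and the main thread can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: a dictionary stands in for the cache, copying an item into it stands in for a prefetch, and the `window` parameter is a hypothetical bound on how far the helper may run ahead. As a simplification, the main thread waits for the helper instead of taking a cache miss.

```python
import threading
from typing import Dict, List, Tuple


def run_with_helper(data: List[int], window: int = 4) -> Tuple[int, int]:
    """Sum data in a main thread while a helper 'prefetches' ahead of it."""
    cond = threading.Condition()
    counts = {"main": 0, "helper": 0, "max_lead": 0}
    cache: Dict[int, int] = {}  # toy stand-in for the hardware cache

    def helper() -> None:
        for i in range(len(data)):
            with cond:
                # Compare the two counters: stay at most `window` items ahead
                # so prefetched data is still cached when the main thread arrives.
                cond.wait_for(lambda: counts["helper"] - counts["main"] < window)
                cache[i] = data[i]  # stand-in for a prefetch
                counts["helper"] += 1
                lead = counts["helper"] - counts["main"]
                counts["max_lead"] = max(counts["max_lead"], lead)
                cond.notify_all()

    worker = threading.Thread(target=helper)
    worker.start()
    total = 0
    for i in range(len(data)):
        with cond:
            # Simplification: block until the item has been prefetched.
            cond.wait_for(lambda: i in cache)
            total += cache.pop(i)
            counts["main"] += 1
            cond.notify_all()
    worker.join()
    return total, counts["max_lead"]


total, max_lead = run_with_helper(list(range(100)), window=4)
```

The `max_lead` value returned alongside the sum shows that the helper never got more than `window` items ahead of the main thread.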
Abstract:
The apparatus includes a virtual multiprocessor context, one or more virtual processing element contexts, and configuration logic. The virtual multiprocessor context prescribes the resources and controls a configuration state of the virtual multiprocessor. The one or more virtual processing element contexts each exclusively correspond to one of the one or more virtual processing elements. Each virtual processing element context has first logic, for prescribing whether the corresponding virtual processing element is permitted to configure the resources, and second logic, for prescribing a subset of the resources that is allocated to that virtual processing element. The configuration logic is coupled to the virtual multiprocessor context and the one or more virtual processing element contexts. The configuration logic detects whether the one of the one or more virtual processing elements is permitted to configure the resources, updates the virtual multiprocessor context to direct that the virtual multiprocessor enter the configuration state, and configures the resources by updating a prescribed virtual processing element context.
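The flow of the configuration logic can be modeled loosely in software as below. The class names, field names, and resource labels are hypothetical, and two behaviors are assumptions not stated in the abstract: rejecting resources the multiprocessor does not have, and leaving the configuration state once the update is done.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Set


@dataclass
class VPEContext:
    may_configure: bool                             # "first logic": permission to configure
    resources: Set[str] = field(default_factory=set)  # "second logic": allocated subset


@dataclass
class VMPContext:
    resources: FrozenSet[str]  # resources of the virtual multiprocessor
    configuring: bool = False  # configuration state


def configure(
    vmp: VMPContext,
    vpes: Dict[int, VPEContext],
    requester: int,
    target: int,
    subset: Iterable[str],
) -> bool:
    """Model the configuration logic; returns False if the requester lacks permission."""
    # Detect whether the requesting VPE is permitted to configure the resources.
    if not vpes[requester].may_configure:
        return False
    wanted = set(subset)
    if not wanted <= vmp.resources:  # assumption: unknown resources are rejected
        raise ValueError("subset contains resources the multiprocessor lacks")
    vmp.configuring = True           # direct the VMP into the configuration state
    vpes[target].resources = wanted  # update the prescribed VPE context
    vmp.configuring = False          # assumption: leave the state afterward
    return True


vmp = VMPContext(frozenset({"tc0", "tc1", "tlb0", "tlb1"}))
vpes = {0: VPEContext(may_configure=True), 1: VPEContext(may_configure=False)}
granted = configure(vmp, vpes, requester=0, target=1, subset={"tc1", "tlb1"})
denied = configure(vmp, vpes, requester=1, target=0, subset={"tc0"})
```

In this run the first call succeeds and assigns two resources to VPE 1, while the second is refused because VPE 1 lacks permission.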