Abstract:
Techniques are described herein for performing efficient matrix multiplication in architectures with scratchpad memories or associative caches using asymmetric allocation of space for the different matrices. The system receives a left matrix and a right matrix. In an embodiment, the system allocates, in a scratchpad memory, asymmetric memory space for tiles for each of the two matrices as well as a dot product matrix. The system proceeds with then performing dot product matrix multiplication involving the tiles of the left and the right matrices, storing resulting dot product values in corresponding allocated dot product matrix tiles. The system then proceeds to write the stored dot product values from the scratchpad memory into main memory.
Abstract:
Techniques provide for hardware accelerated data movement between main memory and an on-chip data movement system that comprises multiple core processors that operate on the tabular data. The tabular data is moved to or from the scratch pad memories of the core processors. While the data is in-flight, the data may be manipulated by data manipulation operations. The data movement system includes multiple data movement engines, each dedicated to moving and transforming tabular data from main memory data to a subset of the core processors. Each data movement engine is coupled to an internal memory that stores data (e.g. a bit vector) that dictates how data manipulation operations are performed on tabular data moved from a main memory to the memories of a core processor, or to and from other memories. The internal memory of each data movement engine is private to the data movement engine. Tabular data is efficiently copied between internal memories of the data movement system via a copy ring that is coupled to the internal memories of the data movement system and/or is coupled to a data movement engine. Also, a data movement engine internally broadcasts data to other data movement engines, which then transfer the data to respective core processors. Partitioning may also be performed by the hardware of the data movement system. Techniques are used to partition data “in flight”. The data movement system also generates a column of row identifiers (RIDs). A row identifier is a number treated as identifying a row or element's position within a column. Row identifiers each identifying a row in column are also generated.
Abstract:
A method, apparatus, and system for determining a data distribution is provided by using an adaptive resolution histogram. In an embodiment, the adaptive resolution histogram is created using a trie, wherein node values in the trie represent frequency distributions and node positions define associated keys or key prefixes. Keys are derived from input data such as database records that are streamed from a record source. These keys may be processed as received to build the trie in parallel with the production of the input data. To provide adaptive resolution, new child nodes may only be created in the trie when a node value is incremented beyond a predetermined threshold. In this manner, the histogram adjusts the allocation of nodes according to the actual distribution of the data. The completed adaptive resolution histogram may be used for various tasks such as partitioning for balanced parallel processing of the input data.
Abstract:
Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels and the second image comprising a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image. The second number of pixels is different from the first number of pixels. The method further comprises overlaying each of the first shifted image and the second shifted image with the second image, such that each pixel of the first shifted image and second shifted image has a corresponding pixel in the second image. The method further comprises creating, at the first node, a first disparity map that indicates, for each pixel of the first shifted image, a level of similarity between the pixel of the first shifted image and the corresponding pixel in the second image and creating, at the second node, a second disparity map that indicates, for each pixel of the second shifted image, a level of similarity between the pixel of the second shifted image and the corresponding pixel in the second image.
Abstract:
Techniques are described herein for efficient movement of data from a source memory to a destination memory. In an embodiment, in response to a particular memory location being pushed into a first register within a first register space, the first set of electronic circuits accesses a descriptor stored at the particular memory location. The descriptor indicates a width of a column of tabular data, a number of rows of tabular data, and one or more tabular data manipulation operations to perform on the column of tabular data. The descriptor also indicates a source memory location for accessing the tabular data and a destination memory location for storing data manipulation result from performing the one or more data manipulation operations on the tabular data. Based on the descriptor, the first set of electronic circuits determines control information indicating that the one or more data manipulation operations are to be performed on the tabular data and transmits the control information, using a hardware data channel, to a second set of electronic circuits to perform the one or more operations. Based on the control information, the second set of electronic circuits retrieve the tabular data from source memory location and apply the one or more data manipulation operations to generate the data manipulation result. The second set of electronic circuits cause the data manipulation result to be stored at the destination memory location.
Abstract:
A method for storing XML documents a hybrid navigation/streaming format is provided to allow efficient storage and processing of queries on the XML data that provides the benefits of both navigation and streaming and ameliorates the disadvantages of each. Each XML document to be stored is independently analyzed to determine a combination of navigable and streamable storage format that optimizes the processing of the data for anticipated access patterns.
Abstract:
A method and apparatus are described for summarizing a document. For each node in the document that satisfies a marking criteria, a start and end mark pair is stored in a summary in document order. The start mark specifies a location in the document where the node starts, and the end mark specifies a location in the document where the node ends. When evaluating a query for a hierarchical path, the document is streamed into memory until the mark of a tag matches a start mark in the summary. If that tag does not fit within the path, then streaming of the document may resume at the end mark, thereby skipping the node during streaming evaluation. Translation information may be used to indicate a logical position relative to the marks in the summary when the document is modified.
Abstract:
A two-level cache to facilitate resolving resource path expressions for a hierarchy of resources is described, which includes a system-wide shared cache and a session-level cache. The shared cache is organized as a hierarchy of hash tables that mirrors the structure of a repository hierarchy. A particular hash table in a shared cache includes information for the child resources of a particular resource. A database management system that manages a shared cache may control the amount of memory used by the cache by implementing a replacement policy for the cache based on one or more characteristics of the resources in the repository. The session-level cache is a single level cache in which information for target resources of resolved path expressions may be tracked. In the session-level cache, the resource information is associated with the entire path expression of the associated resource.