ASYMMETRIC ALLOCATION OF SRAM AND DATA LAYOUT FOR EFFICIENT MATRIX MULTIPLICATION

    Publication No.: US20190095399A1

    Publication Date: 2019-03-28

    Application No.: US15716225

    Filing Date: 2017-09-26

    Abstract: Techniques are described herein for performing efficient matrix multiplication in architectures with scratchpad memories or associative caches, using asymmetric allocation of space for the different matrices. The system receives a left matrix and a right matrix. In an embodiment, the system allocates, in a scratchpad memory, asymmetric memory space for tiles of each of the two matrices as well as of a dot product matrix. The system then performs dot product matrix multiplication on the tiles of the left and right matrices, storing the resulting dot product values in the corresponding allocated dot product matrix tiles. Finally, the system writes the stored dot product values from the scratchpad memory into main memory.
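
    A minimal sketch of the general idea, not the patented design: tiled matrix multiplication in which the left, right, and dot product tiles are given deliberately asymmetric shapes so that all three fit a fixed scratchpad budget. The tile dimensions and scratchpad size below are illustrative assumptions.

```python
import numpy as np

SCRATCHPAD_BYTES = 64 * 1024       # assumed scratchpad capacity
TM, TK, TN = 8, 256, 16            # asymmetric tiles: left 8x256, right 256x16, result 8x16

def tiled_matmul(left, right):
    m, k = left.shape
    k2, n = right.shape
    assert k == k2
    # The three asymmetric tiles must fit in the scratchpad together.
    assert (TM * TK + TK * TN + TM * TN) * left.itemsize <= SCRATCHPAD_BYTES
    out = np.zeros((m, n), dtype=left.dtype)
    for i in range(0, m, TM):
        for j in range(0, n, TN):
            # Dot product tile held in the "scratchpad" until it is complete.
            acc = np.zeros((min(TM, m - i), min(TN, n - j)), dtype=left.dtype)
            for p in range(0, k, TK):
                a = left[i:i + TM, p:p + TK]      # tile of the left matrix
                b = right[p:p + TK, j:j + TN]     # tile of the right matrix
                acc += a @ b
            out[i:i + TM, j:j + TN] = acc         # write back to "main memory"
    return out

a = np.random.rand(100, 1000)
b = np.random.rand(1000, 50)
assert np.allclose(tiled_matmul(a, b), a @ b)
```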

    Row identification number generation in database direct memory access engine

    Publication No.: US10176114B2

    Publication Date: 2019-01-08

    Application No.: US15362693

    Filing Date: 2016-11-28

    Abstract: Techniques provide for hardware-accelerated data movement between main memory and an on-chip data movement system that comprises multiple core processors that operate on tabular data. The tabular data is moved to or from the scratchpad memories of the core processors. While the data is in flight, it may be manipulated by data manipulation operations. The data movement system includes multiple data movement engines, each dedicated to moving and transforming tabular data from main memory to a subset of the core processors. Each data movement engine is coupled to an internal memory that stores data (e.g., a bit vector) that dictates how data manipulation operations are performed on tabular data moved from main memory to the memories of a core processor, or to and from other memories. The internal memory of each data movement engine is private to that engine. Tabular data is efficiently copied between internal memories of the data movement system via a copy ring that is coupled to the internal memories of the data movement system and/or to a data movement engine. Also, a data movement engine can internally broadcast data to other data movement engines, which then transfer the data to their respective core processors. Partitioning may also be performed by the hardware of the data movement system, allowing data to be partitioned "in flight". The data movement system also generates a column of row identifiers (RIDs); a row identifier is a number that identifies the position of a row or element within a column.
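
    A small software-only sketch (not the hardware engine described above) of row identifier (RID) column generation: each RID is a number giving a row's position within a column, and a block of rows continues from a base RID. The base_rid and stride parameters are assumptions for the example.

```python
def generate_rid_column(num_rows, base_rid=0, stride=1):
    """Return a column of RIDs, each identifying a row's position within the column."""
    return [base_rid + i * stride for i in range(num_rows)]

# Example: RIDs for the second 4-row block of a column, continuing from the first block.
print(generate_rid_column(4, base_rid=4))   # [4, 5, 6, 7]
```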

    Adaptive resolution histogram

    Publication No.: US10146806B2

    Publication Date: 2018-12-04

    Application No.: US14621204

    Filing Date: 2015-02-12

    Abstract: A method, apparatus, and system for determining a data distribution is provided by using an adaptive resolution histogram. In an embodiment, the adaptive resolution histogram is created using a trie, wherein node values in the trie represent frequency distributions and node positions define associated keys or key prefixes. Keys are derived from input data such as database records that are streamed from a record source. These keys may be processed as received to build the trie in parallel with the production of the input data. To provide adaptive resolution, new child nodes may only be created in the trie when a node value is incremented beyond a predetermined threshold. In this manner, the histogram adjusts the allocation of nodes according to the actual distribution of the data. The completed adaptive resolution histogram may be used for various tasks such as partitioning for balanced parallel processing of the input data.
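
    A minimal sketch of the trie-based approach described above, under illustrative assumptions (binary key prefixes, 8-bit keys, a split threshold of 4): each node's count is the frequency of keys under its prefix, and child nodes are only created once the count passes the threshold, so resolution adapts to the actual data distribution.

```python
THRESHOLD = 4      # assumed split threshold
KEY_BITS = 8       # assumed key width in bits

class Node:
    """Trie node: its position encodes a key prefix, its count the frequency."""
    def __init__(self, depth=0):
        self.count = 0
        self.depth = depth
        self.children = None        # child nodes are created lazily

def insert(root, key):
    node = root
    while True:
        node.count += 1
        if node.children is None:
            # Refine this bucket only once enough keys fall under its prefix.
            if node.count <= THRESHOLD or node.depth == KEY_BITS:
                return
            node.children = [Node(node.depth + 1), Node(node.depth + 1)]
        # Descend according to the next bit of the key.
        bit = (key >> (KEY_BITS - 1 - node.depth)) & 1
        node = node.children[bit]

# Skewed input: the trie only grows child nodes where keys actually cluster.
root = Node()
for key in [3, 3, 3, 3, 3, 3, 200, 201]:
    insert(root, key)
print(root.count, root.children[0].count, root.children[1].count)   # 8 2 2
```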

    SCALABLE DISTRIBUTED COMPUTATION FRAMEWORK FOR DATA-INTENSIVE COMPUTER VISION WORKLOADS

    Publication No.: US20180288384A1

    Publication Date: 2018-10-04

    Application No.: US15471710

    Filing Date: 2017-03-28

    Abstract: Techniques described herein provide methods and systems for scalable distribution of computer vision workloads. In an embodiment, a method comprises receiving, at each of a first node and a second node of a distributed system of nodes, two images. The first image comprises a first set of pixels, and the second image comprises a second set of pixels. The method further comprises shifting, at the first node, each pixel of the first set of pixels of the first image in a uniform direction by a first number of pixels to form a first shifted image, and shifting, at the second node, each pixel of the first set of pixels of the first image in the uniform direction by a second number of pixels to form a second shifted image; the second number of pixels is different from the first. The method further comprises overlaying each of the first shifted image and the second shifted image with the second image, such that each pixel of the first and second shifted images has a corresponding pixel in the second image. The method further comprises creating, at the first node, a first disparity map that indicates, for each pixel of the first shifted image, a level of similarity between that pixel and the corresponding pixel in the second image, and creating, at the second node, a second disparity map that indicates, for each pixel of the second shifted image, a level of similarity between that pixel and the corresponding pixel in the second image.
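
    A minimal sketch of one node's share of the work, under illustrative assumptions (grayscale images as 2D arrays, absolute pixel difference as the similarity measure): the node shifts the first image by its assigned number of pixels, overlays it on the second image, and produces a per-pixel disparity map; each node runs the same routine with a different shift.

```python
import numpy as np

def node_disparity_map(first_image, second_image, shift_pixels):
    # Shift every pixel of the first image uniformly (here: to the right).
    shifted = np.roll(first_image, shift_pixels, axis=1)
    shifted[:, :shift_pixels] = 0          # blank the columns wrapped in from the edge
    # Per-pixel similarity with the overlaid second image: a smaller absolute
    # difference means the pixels are more similar.
    return np.abs(shifted.astype(np.int32) - second_image.astype(np.int32))

# Each node evaluates a different candidate shift on the same image pair.
img_a = np.random.randint(0, 256, (4, 6))
img_b = np.roll(img_a, 2, axis=1)          # toy pair: img_b is img_a shifted by 2
map_node1 = node_disparity_map(img_a, img_b, shift_pixels=1)
map_node2 = node_disparity_map(img_a, img_b, shift_pixels=2)
print(map_node2[:, 2:].sum())              # 0: the 2-pixel shift aligns the pair
```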

    Tuple encoding aware direct memory access engine for scratchpad enabled multicore processors

    Publication No.: US10061714B2

    Publication Date: 2018-08-28

    Application No.: US15073905

    Filing Date: 2016-03-18

    Abstract: Techniques are described herein for efficient movement of data from a source memory to a destination memory. In an embodiment, in response to a particular memory location being pushed into a first register within a first register space, a first set of electronic circuits accesses a descriptor stored at the particular memory location. The descriptor indicates a width of a column of tabular data, a number of rows of tabular data, and one or more tabular data manipulation operations to perform on the column of tabular data. The descriptor also indicates a source memory location for accessing the tabular data and a destination memory location for storing the data manipulation result of performing the one or more data manipulation operations on the tabular data. Based on the descriptor, the first set of electronic circuits determines control information indicating that the one or more data manipulation operations are to be performed on the tabular data and transmits the control information, using a hardware data channel, to a second set of electronic circuits to perform the one or more operations. Based on the control information, the second set of electronic circuits retrieves the tabular data from the source memory location and applies the one or more data manipulation operations to generate the data manipulation result. The second set of electronic circuits causes the data manipulation result to be stored at the destination memory location.
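
    A software-only sketch of the descriptor-driven flow, with field and function names that are illustrative assumptions rather than the patented hardware interface: a descriptor names the column width, row count, source and destination locations, and the in-flight data manipulation operations to apply.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Descriptor:
    column_width: int                  # bytes per column value
    num_rows: int                      # number of rows of tabular data
    src: List[int]                     # stands in for the source memory location
    dst: List[int]                     # stands in for the destination memory location
    operations: List[Callable] = field(default_factory=list)

def process_descriptor(desc: Descriptor):
    # "First set of circuits": read the descriptor and derive control information.
    values = desc.src[:desc.num_rows]
    # "Second set of circuits": apply the data manipulation operations in flight.
    for op in desc.operations:
        values = [op(v) for v in values]
    # Store the data manipulation result at the destination memory location.
    desc.dst[:desc.num_rows] = values

# Example: move a 4-row column and scale each value while it is in flight.
dst = [0] * 4
process_descriptor(Descriptor(8, 4, [1, 2, 3, 4], dst, [lambda v: v * 10]))
print(dst)   # [10, 20, 30, 40]
```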

    Technique for skipping irrelevant portions of documents during streaming XPath evaluation

    Publication No.: US10037311B2

    Publication Date: 2018-07-31

    Application No.: US14231491

    Filing Date: 2014-03-31

    CPC classification number: G06F17/2247 G06F16/835 G06F17/22

    Abstract: A method and apparatus are described for summarizing a document. For each node in the document that satisfies a marking criterion, a pair of start and end marks is stored in a summary in document order. The start mark specifies a location in the document where the node starts, and the end mark specifies a location in the document where the node ends. When evaluating a query for a hierarchical path, the document is streamed into memory until a tag's location matches a start mark in the summary. If that tag does not fit within the path, then streaming of the document may resume at the end mark, thereby skipping the node during streaming evaluation. Translation information may be used to indicate a logical position relative to the marks in the summary when the document is modified.
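
    A minimal sketch of the skipping idea, under illustrative assumptions (byte offsets as marks, a flat list as the summary): each node's start and end marks are stored in document order, and when a node's tag does not fit the queried path, streaming resumes at that node's end mark.

```python
def stream_with_skips(document, summary, path_matches):
    """summary: (tag_name, start_mark, end_mark) tuples stored in document order."""
    results, pos = [], 0
    for tag, start, end in summary:
        if start < pos:
            continue                  # this node lies inside a region already skipped
        if path_matches(tag):
            results.append(document[start:end])   # stream and evaluate this node
        else:
            pos = end                 # resume streaming at the end mark, skipping the node
    return results

# Example: skip <b> subtrees (and everything inside them) while looking for <a> nodes.
doc = "<a>1</a><b><a>2</a></b><a>3</a>"
summary = [("a", 0, 8), ("b", 8, 23), ("a", 11, 19), ("a", 23, 31)]
print(stream_with_skips(doc, summary, lambda t: t == "a"))   # ['<a>1</a>', '<a>3</a>']
```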

    Efficient file access in a large repository using a two-level cache

    Publication No.: US09256607B2

    Publication Date: 2016-02-09

    Application No.: US13692014

    Filing Date: 2012-12-03

    CPC classification number: G06F17/30097 G06F12/0811 G06F12/084 G06F17/30929

    Abstract: A two-level cache for resolving resource path expressions over a hierarchy of resources is described; it includes a system-wide shared cache and a session-level cache. The shared cache is organized as a hierarchy of hash tables that mirrors the structure of the repository hierarchy. A particular hash table in the shared cache holds information for the child resources of a particular resource. A database management system that manages a shared cache may control the amount of memory used by the cache by implementing a replacement policy based on one or more characteristics of the resources in the repository. The session-level cache is a single-level cache in which information for the target resources of resolved path expressions may be tracked. In the session-level cache, the resource information is associated with the entire path expression of the associated resource.
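
    A minimal sketch of the two caches, with class names and structure that are illustrative assumptions: a shared cache organized as a hierarchy of hash tables mirroring the repository hierarchy, and a flat session-level cache keyed by the full path expression of a resolved resource.

```python
class SharedCacheNode:
    """One hash table per resource, holding entries for its child resources."""
    def __init__(self, resource_id):
        self.resource_id = resource_id
        self.children = {}            # hash table: child name -> SharedCacheNode

    def resolve(self, path_components):
        node = self
        for name in path_components:
            node = node.children.get(name)
            if node is None:
                return None           # not cached; fall back to the repository
        return node.resource_id

class SessionCache:
    """Single-level cache keyed by the entire path expression."""
    def __init__(self):
        self.by_path = {}

    def lookup(self, path_expr):
        return self.by_path.get(path_expr)

    def remember(self, path_expr, resource_id):
        self.by_path[path_expr] = resource_id

# Usage: consult the session cache for the whole path first; on a miss, walk the
# shared hierarchy component by component, then remember the resolved target.
shared_root = SharedCacheNode(resource_id=1)
shared_root.children["docs"] = SharedCacheNode(resource_id=2)
shared_root.children["docs"].children["a.xml"] = SharedCacheNode(resource_id=3)

session = SessionCache()
path = "/docs/a.xml"
rid = session.lookup(path) or shared_root.resolve(path.strip("/").split("/"))
session.remember(path, rid)
print(rid)   # 3
```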

