DISTRIBUTED COMPUTING IN R
    2.
    Patent application

    Publication No.: US20180041562A1

    Publication date: 2018-02-08

    Application No.: US15230050

    Filing date: 2016-08-05

    IPC classification: H04L29/08 H04L12/24

    Abstract: Examples disclosed herein relate to distributed computing in R. Some examples disclosed herein may include identifying a distributed multivariate apply (dmapply) operation and an invocation of a distributed computing backend and determining a function referenced in the dmapply operation. A distributed backend driver associated with the invoked distributed computing backend may translate the determined function to a function native to an R application programming interface (API) of the invoked distributed computing backend and may provide the translated function to the invoked distributed computing backend to perform the translated function on a distributed data set referenced in the dmapply operation.
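
The driver-based translation described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `BackendDriver` class, the `DRIVERS` registry, and the backend names `backend_a` and `backend_b` are all hypothetical, and plain Python callables stand in for the functions native to each backend's R API.

```python
from typing import Callable, Dict, List


class BackendDriver:
    """Hypothetical driver that maps generic function names to backend-native ones."""

    def __init__(self, name: str, native_functions: Dict[str, Callable]):
        self.name = name
        self.native_functions = native_functions

    def translate(self, function_name: str) -> Callable:
        # Translate the function referenced in the dmapply call into the
        # backend's native equivalent; fail loudly if there is none.
        try:
            return self.native_functions[function_name]
        except KeyError:
            raise ValueError(
                f"{self.name} has no native form of {function_name!r}"
            ) from None

    def run(self, native_function: Callable, partitions: List[list]) -> list:
        # Stand-in for handing the function to the backend: apply it to
        # each partition of the distributed data set in turn.
        return [native_function(partition) for partition in partitions]


# Assumed registry of drivers, keyed by the backend the caller invokes.
DRIVERS = {
    "backend_a": BackendDriver("backend_a", {"sum": sum, "max": max}),
    "backend_b": BackendDriver("backend_b", {"sum": lambda p: float(sum(p))}),
}


def dmapply(function_name: str, partitions: List[list], backend: str) -> list:
    driver = DRIVERS[backend]                  # driver for the invoked backend
    native = driver.translate(function_name)   # translate to the native function
    return driver.run(native, partitions)      # perform it on the data set


print(dmapply("sum", [[1, 2], [3, 4, 5]], backend="backend_a"))  # [3, 12]
```

The caller names the function and the backend only; which concrete callable runs is decided by the driver, which is the separation the abstract describes.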

    Processing streaming data with open executors
    4.
    Granted patent
    Status: in force

    Publication No.: US09348580B2

    Publication date: 2016-05-24

    Application No.: US14416610

    Filing date: 2012-10-26

    Abstract: Processing streaming data with open executors includes receiving input data at a computation dataflow station where the computation dataflow station contains a computation file and an open executor that accepts code plug-ins, converting contents of the computation file into a program string with the code plug-ins from a system library, and launching the program string together with the input data to calculate an output with a graphics processing unit.
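
As a rough illustration of the "program string" step, the Python sketch below assembles a program from plug-in source held in a library and launches it on input data. Everything here is assumed for illustration: the `SYSTEM_LIBRARY` contents, the `build_program` and `launch` helpers, and the use of Python's `exec` on the CPU. The abstract's graphics processing unit execution is not modelled.

```python
# Hypothetical system library: plug-in name -> plug-in source code.
SYSTEM_LIBRARY = {
    "scale": "def scale(x):\n    return x * 2\n",
    "offset": "def offset(x):\n    return x + 1\n",
}


def build_program(computation: str, plugin_names) -> str:
    # Concatenate the requested plug-ins, then wrap the contents of the
    # computation file (here a single expression) in a compute() function.
    parts = [SYSTEM_LIBRARY[name] for name in plugin_names]
    parts.append(f"def compute(value):\n    return {computation}\n")
    return "\n".join(parts)


def launch(program: str, inputs) -> list:
    # Execute the program string in a fresh namespace and feed it the inputs.
    namespace = {}
    exec(program, namespace)
    return [namespace["compute"](value) for value in inputs]


program = build_program("offset(scale(value))", ["scale", "offset"])
print(launch(program, [1, 2, 3]))  # [3, 5, 7]
```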

    Singular value decompositions
    5.
    Granted patent

    Publication No.: US10762101B2

    Publication date: 2020-09-01

    Application No.: US15340218

    Filing date: 2016-11-01

    IPC classification: G06F16/25 G06F16/215

    Abstract: In one example in accordance with the present disclosure, a system comprises a computing node. The computing node comprises: a memory, and a processor to: execute a database in the memory, and invoke, with the database, singular value decomposition (SVD) on a data set. To invoke SVD, the processor may sparsify, with the database, the data set to produce a sparse data set, iteratively decompose, with the database, the data set to produce a set of eigenvalues, solve, with the database, a linear system to produce a set of eigenvectors, and multiply, with the database, the eigenvectors with the data set to produce a data set of reduced dimension.
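
The pipeline in this abstract (sparsify, iterate to obtain eigenvalues and eigenvectors, multiply to reduce dimension) can be sketched in plain Python. This is a simplified stand-in, not the patented method: it uses power iteration on the Gram matrix, which yields the leading eigenvalue and eigenvector together instead of solving a separate linear system, and it runs in memory with no database. The function names and the threshold are assumptions.

```python
import math


def sparsify(rows, threshold):
    # Zero out entries whose magnitude is below the threshold.
    return [[x if abs(x) >= threshold else 0.0 for x in row] for row in rows]


def gram(rows):
    # Compute A^T A for a row-major matrix A.
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def top_eigenpair(matrix, iterations=200):
    # Power iteration: repeatedly multiply by the matrix and normalise.
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


def reduce_dimension(rows, vector):
    # Project every row onto the eigenvector (one output column per row).
    return [sum(a * b for a, b in zip(row, vector)) for row in rows]


data = [[3.0, 0.01], [0.02, 1.0]]
sparse = sparsify(data, threshold=0.1)
eigenvalue, eigenvector = top_eigenpair(gram(sparse))
singular_value = math.sqrt(eigenvalue)  # approximately 3.0 for this data
reduced = reduce_dimension(sparse, eigenvector)
```

For the small matrix above, sparsification leaves a diagonal matrix, so the leading singular value is close to 3 and the reduced data set keeps one column.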

    QUERIES BASED ON RANGES OF HASH VALUES
    6.
    Patent application

    Publication No.: US20180268030A1

    Publication date: 2018-09-20

    Application No.: US15762586

    Filing date: 2015-09-25

    IPC classification: G06F17/30

    Abstract: A system includes a database client, and a distributed database comprising database nodes. The distributed database may receive a database query from the client, determine that the query comprises a range of hash values of a table partition stored by a node of the distributed database, and determine that the range of hash values is not stored by other nodes of the distributed database. Responsive to determining that the range of hash values of the query is stored by the node and not by the other nodes, the database may generate an optimized distributed execution plan that includes the node that stores the range of hash values and excludes the nodes that do not include the range of hash values.
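
The node-pruning idea can be shown with a short Python sketch. The `Node` type, the inclusive range representation, and the `plan_nodes` helper are assumptions made for illustration; a real planner would produce a full execution plan and not just a list of node names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    hash_range: Tuple[int, int]  # inclusive (low, high) hash values stored


def plan_nodes(nodes: List[Node], query_range: Tuple[int, int]) -> List[str]:
    low, high = query_range
    # Keep only the nodes whose stored hash range overlaps the queried
    # range; every other node is excluded from the plan.
    return [
        node.name
        for node in nodes
        if node.hash_range[0] <= high and low <= node.hash_range[1]
    ]


NODES = [Node("a", (0, 99)), Node("b", (100, 199)), Node("c", (200, 299))]

print(plan_nodes(NODES, (120, 150)))  # ['b']
```

A query whose hash range falls entirely inside one partition touches a single node, so the other nodes do no work for it.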

    PARALLELIZING SQL ON DISTRIBUTED FILE SYSTEMS
    7.
    Patent application
    Status: under examination (published)

    Publication No.: US20170011090A1

    Publication date: 2017-01-12

    Application No.: US15114328

    Filing date: 2014-03-31

    IPC classification: G06F17/30

    Abstract: Example embodiments relate to parallelizing structured query language (SQL) on distributed file systems. In example embodiments, a subquery of a distributed file system is received from a query engine, where the subquery is one of multiple subqueries that are scheduled to execute on a cluster of server nodes. At this stage, a user defined function that comprises local, role-based functionality is executed, where the partitioned magic table triggers parallel execution of the user defined function. The execution of the UDF determines a sequence number based on a quantity of the cluster of server nodes and retrieves nonconsecutive chunks from a file of the distributed file system, where each of the nonconsecutive chunks is offset by the sequence number.
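
One plausible reading of the chunk-retrieval step is a strided assignment: with n nodes, node k reads chunks k, k + n, k + 2n, and so on. The Python sketch below illustrates that reading only. The `chunks_for_node` helper, the in-memory byte string standing in for a distributed file, and the chunk size are assumptions, and the abstract does not confirm this exact scheme.

```python
from typing import List


def chunks_for_node(
    data: bytes, chunk_size: int, node_index: int, node_count: int
) -> List[bytes]:
    # Number of chunks in the file, rounding up for a short final chunk.
    total_chunks = (len(data) + chunk_size - 1) // chunk_size
    # Each node starts at its own index and strides by the node count,
    # so the chunks it reads are nonconsecutive and never overlap with
    # those of another node.
    return [
        data[i * chunk_size:(i + 1) * chunk_size]
        for i in range(node_index, total_chunks, node_count)
    ]


file_contents = b"abcdefghij"
for node in range(3):
    print(node, chunks_for_node(file_contents, 2, node, 3))
# 0 [b'ab', b'gh']
# 1 [b'cd', b'ij']
# 2 [b'ef']
```

Because every node can compute its own chunk list from its index and the node count alone, no coordination between nodes is needed to split the file.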

    VISUALIZING TOPICS WITH BUBBLES INCLUDING PIXELS
    8.
    Patent application
    Status: under examination (published)

    Publication No.: US20160371350A1

    Publication date: 2016-12-22

    Application No.: US15114198

    Filing date: 2014-04-30

    IPC classification: G06F17/30 G06T11/60

    Abstract: Selected topics are identified from records based on scoring candidate terms in the records according to a user-specified metric and at least one further metric selected from among frequencies of occurrence of records pertaining to the respective candidate terms, and negativity of sentiment expressed with respect to the candidate terms in the records. A visualization is generated that includes bubbles representing the respective topics, the bubbles including pixels representing corresponding records, where a given one of the bubbles has a shape dependent upon a number of records represented by the given bubble and a time interval represented by the given bubble. Visual indicators are assigned to the pixels in the given bubble according to values of an attribute expressed in the corresponding records for the topic represented by the given bubble.

    Accessing electronic databases
    9.
    Granted patent

    Publication No.: US10909119B2

    Publication date: 2021-02-02

    Application No.: US15202636

    Filing date: 2016-07-06

    Abstract: Examples disclosed herein relate to accessing electronic databases. Some examples disclosed herein may include partitioning a computation task into subtasks. A processing node of a computation engine may generate a database query for retrieving an electronic data segment associated with at least one of the subtasks from a database. The database query may include pre-processing instructions for a database management system (DBMS) associated with the database to pre-process the electronic data segment before providing the electronic data segment to the processing node. The pre-processing instructions may include at least one of: filtering, projection, join, aggregation, count, and user-defined instructions. The generated query may be provided to the DBMS.
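
The idea of pushing pre-processing into the query can be sketched with Python's standard `sqlite3` module. The `readings` table, its columns, and the `build_query` and `run_subtask` helpers are hypothetical, and SQLite stands in for whatever DBMS the computation engine talks to.

```python
import sqlite3

# In-memory database with a small, made-up table of sensor readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", -5.0)],
)


def build_query(sensor: str):
    # The query carries the pre-processing instructions: filtering (WHERE),
    # projection (only the value column is read), count and aggregation
    # (COUNT, AVG). The DBMS does this work before returning anything.
    sql = (
        "SELECT COUNT(*), AVG(value) FROM readings "
        "WHERE sensor = ? AND value >= 0"
    )
    return sql, (sensor,)


def run_subtask(connection: sqlite3.Connection, sensor: str) -> dict:
    # One subtask per sensor: the processing node receives a single
    # pre-aggregated row instead of the raw data segment.
    sql, params = build_query(sensor)
    count, mean = connection.execute(sql, params).fetchone()
    return {"sensor": sensor, "count": count, "mean": mean}


# The computation task is partitioned into one subtask per sensor.
results = [run_subtask(conn, sensor) for sensor in ("a", "b")]
print(results)
```

Each subtask transfers one row over the connection regardless of how many raw readings the sensor has, which is the benefit of letting the DBMS pre-process the segment.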

    Parallelizing SQL on distributed file systems

    Publication No.: US10534770B2

    Publication date: 2020-01-14

    Application No.: US15114328

    Filing date: 2014-03-31

    Abstract: Example embodiments relate to parallelizing structured query language (SQL) on distributed file systems. In example embodiments, a subquery of a distributed file system is received from a query engine, where the subquery is one of multiple subqueries that are scheduled to execute on a cluster of server nodes. At this stage, a user defined function that comprises local, role-based functionality is executed, where the partitioned magic table triggers parallel execution of the user defined function. The execution of the UDF determines a sequence number based on a quantity of the cluster of server nodes and retrieves nonconsecutive chunks from a file of the distributed file system, where each of the nonconsecutive chunks is offset by the sequence number.