Abstract:
A multi-subgraph matching method and apparatus, and a device are provided. After receiving a plurality of query graphs, the multi-subgraph matching apparatus groups the plurality of query graphs based on a hash value of each query graph, to generate a plurality of groups of query graphs. Query graphs whose hash values fall within a same value range belong to a same group. Then, the multi-subgraph matching apparatus matches the plurality of groups of query graphs with a data graph in parallel, to obtain matching results between the plurality of query graphs and the data graph. According to the multi-subgraph matching method in this application, both grouping efficiency and subgraph matching efficiency can be increased.
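The grouping and parallel matching steps described above can be sketched as follows. This is a minimal illustration, not the claimed method: the bucket width, the function names, and the caller-supplied `hash_fn` and `match_fn` are all assumptions, and a real matcher would perform subgraph isomorphism rather than whatever simple predicate is passed in.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_hash_range(query_graphs, hash_fn, bucket_width):
    """Put query graphs whose hash values fall in the same value range together.

    hash_fn and bucket_width are hypothetical; the abstract does not specify them.
    """
    groups = defaultdict(list)
    for graph in query_graphs:
        # Integer division maps every hash in [k*w, (k+1)*w) to bucket k.
        groups[hash_fn(graph) // bucket_width].append(graph)
    return dict(groups)


def match_groups_in_parallel(groups, data_graph, match_fn):
    """Match each group against the data graph on its own worker thread."""
    def match_group(group):
        return [(graph, match_fn(graph, data_graph)) for graph in group]

    with ThreadPoolExecutor() as pool:
        per_group = list(pool.map(match_group, groups.values()))
    # Flatten the per-group results into one list of (query graph, result) pairs.
    return [pair for group in per_group for pair in group]
```

Here a graph could be represented as a `frozenset` of edges, with `len` as a stand-in hash and a subset test as a stand-in matcher; both are placeholders chosen only to make the sketch easy to run.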
Abstract:
A stored-procedure execution method includes: receiving an execution request, where the execution request is used to request execution of a task including at least two stored procedures; requesting at least two threads, and dispatching each stored procedure in the task to one of the at least two threads for execution; receiving Structured Query Language (SQL) statements sent by the at least two threads when the at least two threads execute the stored procedures included in the task; grouping and caching the received SQL statements based on a same access characteristic; and, for an SQL statement cache group that satisfies a preset trigger condition, calling an SQL statement execution engine to execute an SQL statement in the SQL statement cache group.
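The grouping-and-caching step can be sketched as a small thread-safe cache. The class name, the `access_key` function, and the size-based trigger are assumptions made for illustration; the abstract only says that statements are grouped by a shared access characteristic and executed when a preset trigger condition is met.

```python
import threading
from collections import defaultdict


class SqlStatementCache:
    """Groups SQL statements by an access characteristic and flushes full groups.

    execute_batch stands in for the SQL statement execution engine, and
    trigger_size is one possible (assumed) form of the preset trigger condition.
    """

    def __init__(self, execute_batch, access_key, trigger_size):
        self._execute_batch = execute_batch
        self._access_key = access_key
        self._trigger_size = trigger_size
        self._groups = defaultdict(list)
        self._lock = threading.Lock()

    def submit(self, sql):
        """Called by worker threads as they emit SQL statements."""
        key = self._access_key(sql)
        with self._lock:
            group = self._groups[key]
            group.append(sql)
            # Detach the group once it satisfies the trigger condition.
            ready = self._groups.pop(key) if len(group) >= self._trigger_size else None
        if ready is not None:
            # Run the batch outside the lock so other threads can keep submitting.
            self._execute_batch(key, ready)
```

Using the accessed table name as the key is one plausible choice, since statements touching the same table can often be batched together.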
Abstract:
Embodiments of the present invention provide a data object processing method and apparatus, which can divide a data object into one or more blocks; calculate a sample compression ratio of each block, aggregate neighboring consecutive blocks with a same sample compression ratio characteristic into one data segment, and obtain the sample compression ratio of each of the data segments; and select, according to a length range to which a length of each of the data segments belongs and a compression ratio range to which the sample compression ratio of each of the data segments belongs, an expected length to divide the data segment into data chunks, where the sample compression ratio of each of the data segments uniquely belongs to one of the compression ratio ranges, and the length of each of the data segments uniquely belongs to one of the length ranges.
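The segmentation steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: `zlib` is used as the sampling compressor, and the ratio boundaries, the length boundary, and the table of expected chunk lengths are invented placeholder values, not figures from the source.

```python
import zlib

RATIO_BOUNDARIES = (0.5, 0.9)        # assumed compression ratio range boundaries
LENGTH_BOUNDARY = 1024 * 1024        # assumed boundary between two length ranges
# Hypothetical table: (length range index, ratio range index) -> expected chunk length.
EXPECTED_LENGTH = {
    (0, 0): 32 * 1024, (0, 1): 16 * 1024, (0, 2): 8 * 1024,
    (1, 0): 64 * 1024, (1, 1): 32 * 1024, (1, 2): 16 * 1024,
}


def sample_ratio(block, sample_size=1024):
    """Estimate a block's compression ratio from a prefix sample."""
    sample = block[:sample_size]
    return len(zlib.compress(sample)) / len(sample)


def ratio_range(ratio):
    """Map a ratio to exactly one range index (0, 1, or 2)."""
    return sum(ratio >= boundary for boundary in RATIO_BOUNDARIES)


def segment_blocks(data, block_size):
    """Merge neighboring blocks that share a ratio range into data segments."""
    segments = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        idx = ratio_range(sample_ratio(block))
        if segments and segments[-1][0] == idx:
            segments[-1][1].extend(block)
        else:
            segments.append((idx, bytearray(block)))
    return [(idx, bytes(buf)) for idx, buf in segments]


def expected_chunk_length(segment_length, ratio_idx):
    """Select the expected chunk length from the segment's two range indices."""
    length_idx = 0 if segment_length < LENGTH_BOUNDARY else 1
    return EXPECTED_LENGTH[(length_idx, ratio_idx)]
```

Because each ratio and each length maps to exactly one range index, every segment resolves to a single table entry, which mirrors the "uniquely belongs" condition in the text.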
Abstract:
A data query method and apparatus, and a database system, where the method includes receiving a data query request, generating an original query plan according to the data query request, obtaining a candidate query plan set according to the original query plan, restructuring a join predicate in the original query plan, determining basic information of a restructured join predicate in the original query plan, determining a constraint condition of the restructured join predicate in the original query plan, determining an equal-cost query plan for the original query plan according to the basic information of the restructured join predicate in the original query plan and the constraint condition of the restructured join predicate in the original query plan, and performing querying according to the equal-cost query plan for the original query plan. Hence, data query performance can be improved.
Abstract:
A metadata updating method based on columnar storage in a distributed file system includes acquiring to-be-updated metadata in a data table, splitting data records of the data table into multiple row groups on a row basis, converting the data table into global file metadata and multiple row group files, where each row group file includes an actual data block, a data index block, a local metadata block, a metadata index block, and a file footer, determining whether the to-be-updated metadata belongs to the global file metadata, updating local metadata when the to-be-updated metadata does not belong to the global file metadata, and adding an updated local metadata block, an updated metadata index block, and an updated file footer to the multiple row group files according to the updated local metadata. Dynamically updating metadata in this way reduces the time and computing resources needed to execute an updating operation of this type.
Abstract:
A distributed database synchronization method and system. A distributed database includes a master server cluster and a backup server cluster, where the master server cluster includes a first master node and a second master node, and the backup server cluster includes a first backup node and a second backup node. The method includes: generating a hash tree of the master server cluster and a hash tree of the backup server cluster; determining a range hash tree of the second master node and a range hash tree of the second backup node that have inconsistent range hash values; determining a data unit to be synchronized in the second master node and a data unit to be synchronized in the second backup node; and performing data synchronization. Because data units to be synchronized are determined separately and simultaneously in multiple nodes, efficiency of data synchronization is improved.
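The comparison of hash trees can be illustrated with a simplified two-level sketch for one master/backup node pair. The tree layout, the fixed range size, the positional list of data units, and the function names are assumptions; the source does not describe how the trees are built, and the sketch assumes both nodes hold the same number of data units.

```python
import hashlib


def _combine(hashes):
    """Hash a sequence of child hashes into one parent hash."""
    digest = hashlib.sha256()
    for child in hashes:
        digest.update(child)
    return digest.digest()


def build_hash_tree(units, range_size):
    """Build unit hashes, per-range hashes, and a root hash for one node."""
    unit_hashes = [hashlib.sha256(unit).digest() for unit in units]
    range_hashes = [
        _combine(unit_hashes[i:i + range_size])
        for i in range(0, len(unit_hashes), range_size)
    ]
    return {"root": _combine(range_hashes), "ranges": range_hashes, "units": unit_hashes}


def units_to_sync(master_units, backup_units, range_size):
    """Return indices of data units that differ, descending only into bad ranges."""
    master = build_hash_tree(master_units, range_size)
    backup = build_hash_tree(backup_units, range_size)
    if master["root"] == backup["root"]:
        return []  # identical trees: nothing to synchronize
    out_of_sync = []
    for r, (m_hash, b_hash) in enumerate(zip(master["ranges"], backup["ranges"])):
        if m_hash == b_hash:
            continue  # consistent range: skip all of its data units
        start = r * range_size
        for i in range(start, min(start + range_size, len(master_units))):
            if master["units"][i] != backup["units"][i]:
                out_of_sync.append(i)
    return out_of_sync
```

Running this comparison independently for each node pair is what would allow the data units to be determined separately and simultaneously across nodes.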
Abstract:
Embodiments of the present invention disclose a method and apparatus of cache management for a non-volatile storage device. The method embodiment includes: determining a size relationship between a capacity sum of a clean page subpool and a dirty page subpool and a cache capacity; determining, when the capacity sum is equal to the cache capacity, whether identification information of a to-be-accessed page is in a history list of clean pages or a history list of dirty pages; and when it is determined that the identification information of the to-be-accessed page is in the history list of clean pages, adding a first adjustment value to a clean page subpool capacity threshold; and when the identification information of the to-be-accessed page is in the history list of dirty pages, subtracting a second adjustment value from the clean page subpool capacity threshold.
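The threshold adjustment can be sketched as a single function. The parameter names, the default adjustment values of 1, and the clamping of the result to the cache capacity are assumptions added for illustration; the abstract only specifies the direction of each adjustment.

```python
def adjust_clean_threshold(threshold, page_id, clean_history, dirty_history,
                           clean_size, dirty_size, cache_capacity,
                           first_adjustment=1, second_adjustment=1):
    """Return the new clean page subpool capacity threshold.

    The adjustment applies only when the two subpools together fill the cache.
    """
    if clean_size + dirty_size != cache_capacity:
        return threshold
    if page_id in clean_history:
        # A recently evicted clean page is wanted again: favor the clean subpool.
        threshold += first_adjustment
    elif page_id in dirty_history:
        # A recently evicted dirty page is wanted again: favor the dirty subpool.
        threshold -= second_adjustment
    # Clamping is an assumption, not stated in the source.
    return max(0, min(cache_capacity, threshold))
```

The two history lists play a role similar to ghost lists in adaptive replacement schemes: a hit in one of them suggests that the corresponding subpool was too small.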
Abstract:
A method for scheduling a data flow task and an apparatus. The method includes: preprocessing a data flow task to obtain at least one subtask; classifying the subtask into a central processing unit (CPU) task group, a graphics processing unit (GPU) task group, or a to-be-determined task group; allocating the subtask to a working node; when the subtask belongs to the CPU task group, determining that a CPU executes the subtask; when the subtask belongs to the GPU task group, determining that a GPU executes the subtask; or when the subtask belongs to the to-be-determined task group, determining, according to the costs of executing the subtask on a CPU and on a GPU, which running platform (the CPU or the GPU) executes the subtask, where the cost includes the duration of executing the subtask.
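The platform decision for a single subtask can be sketched as follows. The group labels, the function name, and the tie-breaking rule in favor of the CPU are assumptions; the cost values would come from measured or estimated execution durations.

```python
def choose_platform(task_group, cpu_cost=None, gpu_cost=None):
    """Pick the running platform for one subtask.

    task_group is "cpu", "gpu", or "undetermined" (hypothetical labels).
    Costs are only consulted for the to-be-determined group.
    """
    if task_group == "cpu":
        return "cpu"
    if task_group == "gpu":
        return "gpu"
    if task_group == "undetermined":
        if cpu_cost is None or gpu_cost is None:
            raise ValueError("both costs are required for an undetermined subtask")
        # Lower cost (for example, shorter duration) wins; ties go to the CPU.
        return "cpu" if cpu_cost <= gpu_cost else "gpu"
    raise ValueError(f"unknown task group: {task_group!r}")
```

For example, an undetermined subtask estimated at 12 ms on the CPU and 5 ms on the GPU would be assigned to the GPU.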