Abstract:
Embodiments for processing data with multiple machine learning models are provided. Input data is received. The input data is caused to be evaluated by a first machine learning model to generate a first inference result. The first inference result is compared to at least one quality of service (QoS) parameter. Based on the comparison of the first inference result to the at least one QoS parameter, the input data is caused to be evaluated by a second machine learning model to generate a second inference result.
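The cascade described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the models, the confidence-based QoS parameter, and all names are assumptions for demonstration.

```python
# Sketch of cascaded inference: a cheap model answers first, and only
# results that fail the QoS check are escalated to a heavier model.
# min_confidence stands in for the QoS parameter.

def cascade_infer(x, fast_model, slow_model, min_confidence=0.9):
    """Return an inference result, escalating when QoS is not met."""
    label, confidence = fast_model(x)
    # Compare the first inference result to the QoS parameter.
    if confidence >= min_confidence:
        return label, confidence, "fast"
    # QoS not met: evaluate the input with the second model.
    label, confidence = slow_model(x)
    return label, confidence, "slow"

# Toy stand-in models for demonstration only.
fast = lambda x: ("cat", 0.6 if x < 0 else 0.95)
slow = lambda x: ("dog", 0.99)

confident = cascade_infer(1, fast, slow)   # served by the fast model
escalated = cascade_infer(-1, fast, slow)  # escalated to the slow model
```

The escalation criterion here is a single confidence threshold; a real deployment could compare latency, accuracy, or any other QoS parameter in the same place.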
Abstract:
Embodiments for providing failure tolerance to containerized applications by one or more processors are provided. A layered filesystem is initialized to maintain checkpoint information of stateful processes in separate and exclusive layers on individual containers. A most recent checkpoint layer is transferred from a main container exclusively to an additional node to maintain an additional, shadow container.
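The layered-checkpoint idea can be sketched in a few lines. This is an illustrative data-structure model only; the class and method names are assumptions, and a real layered filesystem would persist layers on disk rather than in memory.

```python
# Sketch of layered checkpointing: each checkpoint is an immutable
# layer, and only the most recent layer is shipped to a shadow
# container on another node.

class LayeredCheckpointFS:
    def __init__(self):
        self.layers = []  # oldest first; each layer is a dict of state

    def checkpoint(self, state):
        """Record process state as a new, separate, exclusive layer."""
        self.layers.append(dict(state))

    def latest_layer(self):
        """The only layer that needs to travel to the shadow node."""
        return self.layers[-1]

    def materialize(self):
        """Union of all layers, with newer entries winning."""
        merged = {}
        for layer in self.layers:
            merged.update(layer)
        return merged

# Main container checkpoints twice; only the newest layer is
# transferred to the shadow container on the additional node.
main = LayeredCheckpointFS()
main.checkpoint({"counter": 1, "phase": "init"})
main.checkpoint({"counter": 2})

shadow = LayeredCheckpointFS()
shadow.checkpoint(main.latest_layer())  # transfer most recent layer only
```

Because each checkpoint is an exclusive layer, the transfer cost is bounded by the size of the newest layer rather than the full process state.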
Abstract:
A computer implemented method for optimizing the performance of a workflow includes associating each of a plurality of workflow nodes in a workflow with a data cache and managing the data cache on a local storage device on one of one or more compute nodes. A scheduler can request execution of the tasks of a given one of the plurality of workflow nodes on the one of the one or more compute nodes that hosts the data cache associated with the given one of the plurality of workflow nodes. Each of the plurality of workflow nodes is permitted to access a distributed filesystem that is visible to each of the compute nodes. The data cache stores data produced by the tasks of the given one of the plurality of workflow nodes.
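The cache-affinity placement described above reduces to a simple lookup: tasks of a workflow node run on whichever compute node hosts that workflow node's cache. A minimal sketch, with illustrative node names:

```python
# Sketch of cache-affinity scheduling: each workflow node is bound to
# a data cache hosted on one compute node, and the scheduler places
# that workflow node's tasks on the hosting compute node.

cache_host = {          # workflow node -> compute node hosting its cache
    "extract": "node-a",
    "train": "node-b",
}

def schedule(workflow_node):
    """Request execution on the compute node that hosts the cache."""
    return cache_host[workflow_node]

placement = schedule("train")  # tasks of "train" land where its cache lives
```

The distributed filesystem remains the fallback path: every workflow node can still read any data, but the scheduler steers tasks toward local cache hits.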
Abstract:
A computer-implemented method of providing data transformation includes installing one or more data transformation plugins in a dataset made accessible for processing an end user's workload. A dataset-specific policy for the accessible dataset is ingested. A data transformation of the accessible dataset is executed by invoking one or more of the data transformation plugins to the accessible dataset based on the dataset-specific policy to generate a transformed dataset. The user's workload is deployed to provide data access for processing using the transformed dataset in accordance with a data governance policy.
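The plugin-and-policy flow can be sketched as below. The plugin names, the policy shape, and the record fields are all illustrative assumptions, not the embodiments' actual interfaces.

```python
# Sketch of policy-driven transformation: plugins installed for a
# dataset are invoked in the order named by an ingested
# dataset-specific policy, producing a transformed dataset.

PLUGINS = {
    "redact_ssn": lambda rows: [{**r, "ssn": "***"} for r in rows],
    "drop_email": lambda rows: [
        {k: v for k, v in r.items() if k != "email"} for r in rows
    ],
}

def transform(dataset, policy):
    """Apply each plugin named in the dataset-specific policy, in order."""
    for plugin_name in policy["transforms"]:
        dataset = PLUGINS[plugin_name](dataset)
    return dataset

data = [{"name": "Ada", "ssn": "123-45-6789", "email": "a@example.com"}]
policy = {"transforms": ["redact_ssn", "drop_email"]}
out = transform(data, policy)  # workload sees only the transformed dataset
```

The user's workload is then deployed against `out` rather than the raw dataset, which is how the governance policy is enforced at access time.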
Abstract:
A computer-implemented method, a computer program product, and a computer system for determining optimal data access for deep learning applications on a cluster. A server determines candidate cache locations for one or more compute nodes in the cluster. The server fetches a mini-batch of a dataset located at a remote storage service into the candidate cache locations. The server collects information about the time periods for completing a job on the one or more compute nodes, where the job is executed against the fetched mini-batch at the candidate cache locations and against the mini-batch at the remote storage location. The server selects a cache location from among the candidate cache locations and the remote storage location. The server fetches the data of the dataset from the remote storage service to the cache location, and the one or more compute nodes execute the job against the fetched data of the dataset at the cache location.
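The selection step reduces to timing the mini-batch job at each candidate location (plus remote storage) and keeping the fastest. In this sketch the timings are synthetic stand-ins for measured runs, and the location names are invented:

```python
# Sketch of cache-location selection: run the job over a mini-batch at
# each candidate cache location and at the remote storage service,
# collect completion times, and pick the fastest location.

def select_cache_location(timings):
    """timings: location -> seconds to complete the mini-batch job."""
    return min(timings, key=timings.get)

measured = {
    "node1:/local-ssd": 1.2,
    "node2:/local-ssd": 1.5,
    "remote-storage": 4.8,
}
best = select_cache_location(measured)
# The full dataset is then fetched to `best` before the real job runs.
```

Benchmarking on a mini-batch keeps the probing cost small relative to fetching the whole dataset to every candidate.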
Abstract:
A job execution scheduling system and associated methods are provided for accommodating a request for additional computing resources to execute a job that is currently being executed or a request for computing resources to execute a new job. The job execution scheduling system may utilize a decision function to determine one or more currently executing jobs to select for resizing. Resizing a currently executing job may include de-allocating one or more computing resources from the currently executing job and allocating the de-allocated resources to the job for which the request was received. In this manner, the request for additional computing resources is accommodated, while at the same time, the one or more jobs from which computing resources were de-allocated continue to be executed using a reduced set of computing resources.
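The resize mechanism can be sketched as below. The decision function used here (shrink the job with the most CPUs, breaking ties toward lower priority) is an illustrative choice, not the system's actual function, and all names are assumptions.

```python
# Sketch of job resizing: a decision function selects a currently
# executing job, some of its resources are de-allocated, and the freed
# resources are granted to the requesting job.

def pick_job_to_resize(jobs):
    """Illustrative decision function over running jobs."""
    return max(jobs, key=lambda j: (j["cpus"], -j["priority"]))

def resize(job, cpus_needed):
    """De-allocate up to cpus_needed CPUs, keeping the job alive with >= 1."""
    freed = min(cpus_needed, job["cpus"] - 1)
    job["cpus"] -= freed
    return freed

running = [
    {"name": "etl", "cpus": 8, "priority": 1},
    {"name": "train", "cpus": 4, "priority": 2},
]
victim = pick_job_to_resize(running)  # "etl" holds the most CPUs
granted = resize(victim, 3)           # 3 CPUs freed for the new request
```

The key property of the abstract is preserved: the shrunk job keeps running on a reduced set of resources rather than being preempted outright.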
Abstract:
In an approach for storage, search, acquisition, and composition of a digital artifact, a processor obtains the digital artifact in a digital marketplace platform. The digital artifact is a collection of digital data with automatically generated and verifiable provenance and usage data. A processor transforms the digital artifact to define an access privilege. A processor shares the digital artifact in the digital marketplace platform by providing a view of a catalogue including the digital artifact. A processor authorizes a usage request based on the access privilege. A processor rewards a source of the digital artifact based on the usage of the digital artifact.
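The authorize-and-reward portion of the flow can be sketched minimally. The artifact fields, reward scheme, and user names are illustrative assumptions; the actual platform would also carry the verifiable provenance and usage data.

```python
# Sketch of the authorize-and-reward flow: a usage request is checked
# against the artifact's access privilege, and the artifact's source
# is rewarded on each authorized use.

artifact = {"id": "art-1", "source": "alice", "privilege": {"bob", "carol"}}
rewards = {"alice": 0}

def request_usage(artifact, user):
    """Authorize a usage request based on the access privilege."""
    if user not in artifact["privilege"]:
        return False                      # access privilege not granted
    rewards[artifact["source"]] += 1      # reward the artifact's source
    return True

allowed = request_usage(artifact, "bob")  # authorized, source rewarded
denied = request_usage(artifact, "eve")   # outside the privilege set
```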
Abstract:
Embodiments for providing intelligent data replication and distribution in a computing environment are provided. Data access patterns of one or more queries issued to a plurality of data partitions may be forecasted. Data may be dynamically distributed and replicated to one or more existing data partitions, or to additional data partitions of the plurality of data partitions, according to the forecasting.
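One way to read the forecast-then-replicate loop is sketched below. The moving-average forecast, the hot threshold, and the fixed replica counts are illustrative simplifications, not the embodiments' actual model.

```python
# Sketch of forecast-driven replication: forecast each partition's
# query rate from recent history, then assign extra replicas to
# partitions predicted to be hot.

def forecast(history, window=3):
    """Moving-average forecast of per-partition query counts."""
    return {p: sum(h[-window:]) / min(window, len(h))
            for p, h in history.items()}

def plan_replication(history, hot_threshold=100):
    """Map each partition to a replica count based on the forecast."""
    predicted = forecast(history)
    return {p: 3 if rate >= hot_threshold else 1
            for p, rate in predicted.items()}

history = {"p1": [90, 120, 150], "p2": [10, 12, 8]}
plan = plan_replication(history)  # p1 is forecast hot, p2 is not
```

A real system would re-run this plan periodically so that replica placement tracks shifting access patterns.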