Abstract:
Performance prediction systems and methods for an Internet of Things (IoT) platform and its applications include obtaining input(s) comprising one of (i) user requests and (ii) sensor observations from sensor(s); invoking Application Programming Interfaces (APIs) of the platform based on the input(s); identifying open flow (OF) and closed flow (CF) requests of system(s) connected to the platform; identifying workload characteristics of the OF and CF requests to obtain segregated OF requests, segregated CF requests, and a combination of open and closed flow requests; executing performance tests with the APIs based on the workload characteristics; measuring resource utilization of the system(s) and computing service demands of resource(s) from the measured utilization and the user requests processed by the platform per unit time; executing the performance tests with the invoked APIs based on the volume of workload characteristics pertaining to the application(s); and predicting, using a queuing network, the performance of the application(s) for the volume of workload characteristics.
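The step from measured utilization to a queuing-network prediction can be illustrated with a minimal sketch. It assumes the Service Demand Law (D_k = U_k / X), a product-form open network for OF requests, and exact single-class Mean Value Analysis for CF requests; the abstract does not say which queuing model is used, and the function names and resource names here are hypothetical. The combined open and closed case is not covered.

```python
def service_demands(utilization, throughput):
    """Service Demand Law: D_k = U_k / X for each resource k.

    utilization: dict of resource name -> measured utilization (0..1)
    throughput:  requests processed per unit time during the test
    """
    return {name: u / throughput for name, u in utilization.items()}


def open_flow_response_time(demands, arrival_rate):
    """Open network: R = sum_k D_k / (1 - lambda * D_k)."""
    total = 0.0
    for name, d in demands.items():
        rho = arrival_rate * d  # utilization of resource k at this rate
        if rho >= 1.0:
            raise ValueError(f"resource {name!r} saturates at this arrival rate")
        total += d / (1.0 - rho)
    return total


def closed_flow_mva(demands, users, think_time=0.0):
    """Exact single-class MVA; returns (throughput, response_time)."""
    queue = {name: 0.0 for name in demands}
    throughput, response = 0.0, 0.0
    for n in range(1, users + 1):
        # Residence time at k grows with the queue seen on arrival.
        residence = {k: d * (1.0 + queue[k]) for k, d in demands.items()}
        response = sum(residence.values())
        throughput = n / (response + think_time)
        queue = {k: throughput * r for k, r in residence.items()}
    return throughput, response


# Hypothetical measurements from one performance test run.
demands = service_demands({"cpu": 0.4, "disk": 0.2}, throughput=20.0)
print(open_flow_response_time(demands, arrival_rate=25.0))
print(closed_flow_mva(demands, users=50, think_time=1.0))
```

Service demands are measured once at a modest load and then reused to predict behavior at larger workload volumes, which is why the sketch separates the measurement step from the two prediction functions.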
Abstract:
The present disclosure describes a method and system for non-uniform intensity mapping using a high-performance enterprise computing system with enhanced precision cooling, which enables extended over-clocking and over-voltage operation. A Kalman filter embedded in the processor predicts and corrects the input data flux for real-time use while accounting for over-clocking and over-voltage.
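The predict-and-correct cycle of a Kalman filter can be shown with a one-dimensional sketch. This is only a textbook scalar filter with a constant-state model, not the disclosed embedded design; the noise variances, the initial values, and the function name are assumptions chosen for illustration.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a constant-state model.

    Returns the corrected estimate after each measurement.
    """
    x, p = x0, p0  # state estimate and its variance (assumed initial values)
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, so only uncertainty grows.
        p = p + process_var
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy readings of a quantity whose true value is near 5.
print(kalman_1d([5.2, 4.9, 5.1, 4.8, 5.0], process_var=1e-4, measurement_var=0.1))
```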
Abstract:
Works in the literature fail to leverage embedding access patterns and the access/storage capabilities of memory units, which, when combined, can yield high-speed heterogeneous systems by dynamically re-organizing embedding table partitions across hardware during inference. A method and system for optimal deployment of embedding tables across a heterogeneous memory architecture for high-speed recommendation inference is disclosed, which dynamically partitions and organizes embedding tables across fast memory architectures to reduce access time. Partitions are chosen to take advantage of the past access patterns of those tables, so that frequently accessed data is available in the fast memory most of the time. Partitioning and replication are used to co-optimize memory access time and resources. Dynamic organization of embedding tables changes the location of embeddings and therefore needs an efficient mechanism to track whether a required embedding is present in the fast memory, along with its current address, for faster look-up; this tracking is performed using a spline-based learned index.
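The idea of placing frequently accessed embeddings in fast memory and tracking their current address can be sketched as follows. This is a simplified illustration: it uses a plain dictionary as the presence and address index in place of the spline-based learned index named in the abstract, it ignores replication, and the names `partition_by_frequency` and `lookup` are hypothetical.

```python
from collections import Counter


def partition_by_frequency(access_log, fast_capacity):
    """Pick the most frequently accessed rows for fast memory.

    Returns a mapping of row id -> slot in fast memory.
    """
    counts = Counter(access_log)
    hot_rows = [row for row, _ in counts.most_common(fast_capacity)]
    return {row: slot for slot, row in enumerate(hot_rows)}


def lookup(row_id, fast_index, fast_memory, slow_memory):
    """Return (embedding, tier). The index answers both 'is it in fast
    memory?' and 'at which slot?' in a single step."""
    slot = fast_index.get(row_id)
    if slot is not None:
        return fast_memory[slot], "fast"
    return slow_memory[row_id], "slow"


# Hypothetical embedding table and past access pattern.
slow_memory = {1: [0.1], 2: [0.2], 3: [0.3]}
fast_index = partition_by_frequency([3, 3, 3, 1, 1, 2], fast_capacity=2)
fast_memory = [slow_memory[row] for row in sorted(fast_index, key=fast_index.get)]
print(lookup(3, fast_index, fast_memory, slow_memory))
print(lookup(2, fast_index, fast_memory, slow_memory))
```

Re-running `partition_by_frequency` on a newer access log and rebuilding `fast_memory` models the dynamic re-organization; the index must be rebuilt at the same time because slots change.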
Abstract:
The present disclosure provides systems and methods for performance evaluation of Input/Output (I/O) intensive enterprise applications. Representative workloads may be generated for enterprise applications using synthetic benchmarks that can be used across multiple platforms with different storage systems. I/O traces are captured for an application of interest at low concurrencies; features that significantly affect performance are extracted, fed to a synthetic benchmark, and replayed on a target system, thereby accurately recreating the behavior of the application. Statistical methods are used to extrapolate the extracted features to predict performance at higher concurrency levels without generating traces at those concurrency levels. The method does not require deploying the application or database on the target system, since the performance of the system depends on access patterns rather than on the actual data. Identical access patterns are re-created using only a replica of the database files of the same size as in the real database.
Abstract:
This disclosure relates generally to correlation filters, and more particularly to the design of a correlation filter. In one embodiment, a system for designing a correlation filter in a multi-processor system includes a multi-core processor coupled to a first memory and one or more co-processors coupled to one or more respective second memories. The multi-core processor partitions each of a plurality of frames associated with media content into a plurality of pixel-columns, and systematically stores said pixel-columns width-wise in a plurality of temporary matrices by a plurality of threads of the multi-core processor. The plurality of temporary matrices are transferred by the multi-core processor to the one or more respective second memories in a plurality of streams simultaneously in an asynchronous mode. A plurality of filter harmonics of the correlation filter are computed by performing compute operations involving at least the plurality of temporary matrices, to obtain the correlation filter.
Abstract:
The present disclosure generally relates to a system and method for predicting performance of a multi-threaded application, and particularly, to a system and method for predicting performance of the multi-threaded application in the presence of resource bottlenecks. In one embodiment, a system for predicting performance of a multi-threaded software application is disclosed. The system may include one or more processors and a memory storing processor-executable instructions for configuring a processor to: represent one or more queuing networks corresponding to resources, the resources being employed to run the multi-threaded application; detect, based on the one or more queuing networks, a concurrency level associated with encountering of a first resource bottleneck; determine, based on the concurrency level, performance metrics associated with the multi-threaded application; and predict the performance of the multi-threaded application based on the performance metrics.
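One way to estimate the concurrency level at which the first bottleneck appears is asymptotic bound analysis on a closed queuing network. This is a minimal sketch under that assumption; the abstract does not specify this particular technique, and the resource names, demand values, and think time below are hypothetical.

```python
def bottleneck_concurrency(demands, think_time=0.0):
    """Return (bottleneck resource, N*), where
    N* = (sum of service demands + think time) / largest demand.

    Below N* throughput is limited by the number of threads or users;
    above it, the resource with the largest demand caps throughput.
    """
    name, d_max = max(demands.items(), key=lambda kv: kv[1])
    n_star = (sum(demands.values()) + think_time) / d_max
    return name, n_star


def throughput_upper_bound(demands, users, think_time=0.0):
    """Asymptotic bound: X(N) <= min(N / (D + Z), 1 / D_max)."""
    d_max = max(demands.values())
    return min(users / (sum(demands.values()) + think_time), 1.0 / d_max)


# Hypothetical per-request demands (seconds) on hardware and software resources.
demands = {"cpu": 0.02, "disk": 0.05, "lock": 0.01}
print(bottleneck_concurrency(demands, think_time=0.92))
print(throughput_upper_bound(demands, users=100, think_time=0.92))
```

Treating a software resource such as a lock as one more queuing station is an assumption made here so that hardware and software bottlenecks can be compared on the same scale.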
Abstract:
This disclosure relates generally to methods and systems for providing exactly-once transaction semantics for fault-tolerant FPGA-based transaction systems. The systems comprise middleware components at both the server and the client end. The server comprises Hosts and FPGAs. The FPGAs control transaction execution (the application processing logic also resides in the FPGA) and provide fault tolerance with high performance by means of a modified TCP implementation. The Hosts buffer and persist transaction records for failure recovery and for achieving exactly-once transaction semantics. The monitoring and fault-detecting components are distributed across the FPGAs and Hosts. Exactly-once transaction semantics is implemented without sacrificing performance by switching between a high-performance mode and a conservative mode depending on component failures. PCIe switches for connectivity between FPGAs and Hosts ensure that the FPGAs remain available even if Hosts fail. When the FPGAs provide more processing elements and memory, the Hosts may be eliminated.
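The role of persisted transaction records in exactly-once semantics can be illustrated with a common deduplication pattern: a retransmitted request is answered from the stored record instead of being executed again. This is a software sketch of that general pattern only, not the disclosed FPGA and Host design; the class name is hypothetical and an in-memory dictionary stands in for persistent storage.

```python
class ExactlyOnceProcessor:
    """Applies each transaction id at most once and replays stored results."""

    def __init__(self, apply_fn):
        self._apply = apply_fn
        self._records = {}  # stand-in for persisted transaction records

    def submit(self, txn_id, payload):
        # A retry after a timeout or failover carries the same id, so the
        # stored result is returned and the effect is not applied twice.
        if txn_id in self._records:
            return self._records[txn_id]
        result = self._apply(payload)
        self._records[txn_id] = result
        return result


# Hypothetical application logic: add an amount to an account balance.
balance = [100]

def credit(amount):
    balance[0] += amount
    return balance[0]

processor = ExactlyOnceProcessor(credit)
print(processor.submit("txn-1", 25))  # executed
print(processor.submit("txn-1", 25))  # retransmission, replayed from the record
```

In a real system the effect and its record must be made durable atomically; a crash between the two steps in this sketch would break the guarantee.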
Abstract:
A method and system are provided for pre-deployment performance estimation of input-output intensive workloads. In particular, the present application provides a method and system for predicting the performance of an input-output intensive distributed enterprise application on multiple storage devices without deploying the application and the complete database in the target environment. The present method comprises generating the input-output traces of an application on a source system with varying concurrencies; replaying the generated traces from the source system on a target system to which the application needs to be migrated; gathering performance data in the form of resource utilization, throughput and response time from the target system; and extrapolating the data gathered from the target system in order to accurately predict the performance of multi-threaded input-output intensive applications in the target system for higher concurrencies.
Abstract:
Disclosed is a system and method for parallelizing a grid search technique to facilitate determination of PK-PD parameters. The method may comprise determining a number of grids. The method may further comprise creating grid points based upon the number of grids (N) and a number of parameters (p). The method may further comprise distributing the grid points amongst a number of threads. The method may further comprise evaluating an objective function at each grid point in order to compute the objective function value associated with that grid point. Further, the method may comprise identifying the grid point having the minimum objective function value. The grid point having the minimum objective function value may indicate the estimated initial PK-PD parameters.
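The steps above (create N^p grid points, distribute them among threads, evaluate the objective, take the minimum) can be sketched as follows. The parameter bounds, the quadratic objective, and the function names are hypothetical stand-ins; a real PK-PD objective would come from the model being fitted.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def make_grid(bounds, n):
    """n evenly spaced values per parameter, giving n**p grid points."""
    axes = []
    for low, high in bounds:
        step = (high - low) / (n - 1) if n > 1 else 0.0
        axes.append([low + i * step for i in range(n)])
    return list(itertools.product(*axes))


def best_of_chunk(objective, chunk):
    """Evaluate every point in one thread's share; return (value, point)."""
    return min(((objective(pt), pt) for pt in chunk), key=lambda t: t[0])


def parallel_grid_search(objective, bounds, n, threads):
    points = make_grid(bounds, n)
    # Round-robin distribution of grid points amongst the threads.
    chunks = [points[i::threads] for i in range(threads)]
    chunks = [c for c in chunks if c]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partial = list(pool.map(lambda c: best_of_chunk(objective, c), chunks))
    # Reduce the per-thread minima to the global minimum.
    return min(partial, key=lambda t: t[0])


# Hypothetical two-parameter objective with its minimum at (1, 2).
def objective(params):
    x, y = params
    return (x - 1.0) ** 2 + (y - 2.0) ** 2

print(parallel_grid_search(objective, bounds=[(0.0, 2.0), (0.0, 4.0)], n=5, threads=4))
```

In CPython, threads do not speed up a pure-Python objective because of the global interpreter lock; swapping in `ProcessPoolExecutor` with a picklable objective is one option when the evaluations are CPU-bound.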