Abstract:
The disclosure herein describes a method and a system for message-based communication and failure recovery for an FPGA middleware framework. A combination of an FPGA and a middleware framework provides high-throughput, low-latency messaging and can reduce development time, as most of the components can be re-used. Further, the message-based communication architecture built on an FPGA framework performs middleware activities that enable reliable communication using TCP/UDP between different platforms regardless of their deployment. The proposed FPGA middleware framework provides reliable communication over UDP modeled on TCP, as well as failure recovery with minimum latency during a failover of an active FPGA framework during its operation, by using a passive FPGA kept in real-time, dynamic synchronization with the active FPGA.
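As a non-authoritative illustration of the active/passive arrangement described above, the following Python sketch models heartbeat-driven failover with continuous state synchronization. All names (ActiveNode, PassiveNode, the timeout constant) are hypothetical; the disclosure implements this in FPGA logic rather than software.

```python
import time

# Minimal sketch, assuming a snapshot-doubles-as-heartbeat protocol; the
# disclosure realizes this in FPGA hardware, so every name is illustrative.

HEARTBEAT_TIMEOUT = 0.05  # assumed: passive side takes over after this silence

class PassiveNode:
    def __init__(self):
        self.state = {}                        # mirror of the active node's state
        self.last_heartbeat = time.monotonic()
        self.is_active = False

    def on_sync(self, snapshot):
        """Receive a state snapshot (which also serves as a heartbeat)."""
        self.state = dict(snapshot)
        self.last_heartbeat = time.monotonic()

    def poll(self):
        """Promote to active if the heartbeat window has elapsed."""
        if not self.is_active and time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.is_active = True              # failover from last synchronized state
        return self.is_active

class ActiveNode:
    def __init__(self, peer):
        self.peer = peer
        self.state = {"next_seq": 0}           # e.g. sequence numbers for reliable UDP

    def send(self, payload):
        self.state["next_seq"] += 1            # TCP-like bookkeeping over UDP
        self.peer.on_sync(self.state)          # keep the passive node synchronized
        return (self.state["next_seq"], payload)

passive = PassiveNode()
active = ActiveNode(passive)
active.send(b"order#1")
time.sleep(HEARTBEAT_TIMEOUT * 2)              # simulate the active node failing
assert passive.poll() and passive.state["next_seq"] == 1
```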
Abstract:
This disclosure relates generally to methods and systems for providing exactly-once transaction semantics for fault-tolerant FPGA-based transaction systems. The systems comprise middleware components at both the server and the client end. The server comprises Hosts and FPGAs. The FPGAs control transaction execution (the application processing logic also resides in the FPGA) and provide fault tolerance with high performance by means of a modified TCP implementation. The Hosts buffer and persist transaction records for failure recovery and for achieving exactly-once transaction semantics. The monitoring and fault-detecting components are distributed across the FPGAs and Hosts. Exactly-once transaction semantics is implemented without sacrificing performance by switching between a high-performance mode and a conservative mode depending on component failures. PCIe switches for connectivity between FPGAs and Hosts ensure that FPGAs remain available even if Hosts fail. When FPGAs provide sufficient processing elements and memory, the Hosts may be eliminated.
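A minimal Python sketch of the mode switch and duplicate suppression implied above; the record format, the synchronous-versus-asynchronous persistence split, and all identifiers are assumptions, since the disclosure places this logic in FPGAs and Hosts.

```python
# Minimal sketch, assuming duplicate suppression by transaction id and a
# persistence policy that tightens in conservative mode; names are illustrative.

class TransactionServer:
    def __init__(self):
        self.results = {}          # txn_id -> result, suppresses re-execution
        self.buffer = []           # records awaiting asynchronous persistence
        self.durable = []          # records already persisted (Host storage)
        self.conservative = False  # flips on a detected component failure

    def on_component_failure(self):
        self.conservative = True   # trade throughput for safety until recovery
        self.flush()

    def flush(self):
        self.durable.extend(self.buffer)   # stand-in for writes to stable storage
        self.buffer.clear()

    def execute(self, txn_id, operation):
        if txn_id in self.results:         # retried request: exactly-once means
            return self.results[txn_id]    # returning the cached result, not re-running
        result = operation()               # application processing logic
        if self.conservative:
            self.durable.append((txn_id, result))  # persist before acknowledging
        else:
            self.buffer.append((txn_id, result))   # high-performance: persist later
        self.results[txn_id] = result
        return result

server = TransactionServer()
# A duplicate of "t1" returns the original result rather than re-executing:
assert server.execute("t1", lambda: 42) == server.execute("t1", lambda: 99)
```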
Abstract:
This disclosure relates generally to a method and system for latency-optimized heterogeneous deployment of convolutional neural networks (CNNs). State-of-the-art methods for optimal deployment of convolutional neural networks achieve reasonable accuracy; however, the same level of accuracy is not attained for unseen networks. The disclosed method provides an automated and unified framework that optimally partitions the CNN and maps the partitions to hardware accelerators, yielding a latency-optimized deployment configuration. The method provides an optimal partitioning of the CNN for deployment on heterogeneous hardware platforms by searching for the network-partition and hardware pair optimized for latency while accounting for the communication cost between hardware devices. The method employs a performance-model-based optimization algorithm to optimally deploy components of a deep learning pipeline across the right heterogeneous hardware for high performance.
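To make the search concrete, here is an illustrative (not the disclosed) optimization in Python: an exhaustive search over layer-to-device assignments that minimizes per-layer compute latency plus a communication cost whenever adjacent layers cross devices. The layer names, device set, and all latency figures are assumed.

```python
from itertools import product

# Illustrative latency-optimized partitioning; every figure below is an
# assumed stand-in for the disclosure's performance models.

layers = ["conv1", "conv2", "conv3", "fc"]
devices = ["cpu", "gpu", "fpga"]
compute_ms = {  # assumed per-layer latency on each accelerator
    "conv1": {"cpu": 5.0, "gpu": 1.2, "fpga": 0.9},
    "conv2": {"cpu": 7.0, "gpu": 1.5, "fpga": 1.1},
    "conv3": {"cpu": 6.0, "gpu": 1.4, "fpga": 1.3},
    "fc":    {"cpu": 2.0, "gpu": 0.8, "fpga": 1.6},
}
transfer_ms = 0.7  # assumed cost whenever consecutive layers change device

def total_latency(assignment):
    latency = sum(compute_ms[l][d] for l, d in zip(layers, assignment))
    latency += sum(transfer_ms for a, b in zip(assignment, assignment[1:]) if a != b)
    return latency

best = min(product(devices, repeat=len(layers)), key=total_latency)
print("partition:", dict(zip(layers, best)), "latency:", total_latency(best), "ms")
```

A real deployment would replace the exhaustive enumeration with the disclosure's performance-model-based optimization, since the assignment space grows exponentially with network depth.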
Abstract:
Disclosed are a system and method for parallelizing a grid search technique to facilitate determination of PK-PD parameters. The method may comprise determining a number of grids (N). The method may further comprise creating grid points based upon the number of grids (N) and a number of parameters (p). The method may further comprise distributing the grid points amongst a number of threads. The method may further comprise evaluating an objective function at each grid point in order to compute an objective function value associated with each of the grid points. Further, the method may comprise identifying a grid point having the minimum objective function value. The grid point having the least objective function value may indicate the estimated initial PK-PD parameters.
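The sketch below illustrates the steps of this method in Python: build the N**p grid, distribute the points amongst parallel workers, evaluate the objective at each point, and keep the minimizer. The one-compartment PK objective and all numeric figures are assumptions standing in for the disclosed PK-PD model.

```python
import itertools
import numpy as np
from concurrent.futures import ProcessPoolExecutor

# Illustrative parallel grid search; the objective and all figures are
# assumptions, not the disclosed PK-PD model.

t_obs = np.array([0.5, 1.0, 2.0, 4.0, 8.0])      # sampling times (h), assumed
c_obs = np.array([8.1, 6.5, 4.2, 1.8, 0.4])      # observed concentrations, assumed

def objective(point):
    c0, k_el = point                              # p = 2 parameters
    predicted = c0 * np.exp(-k_el * t_obs)        # C(t) = C0 * exp(-k_el * t)
    return float(np.sum((predicted - c_obs) ** 2))

N = 50                                            # number of grids per parameter
axes = [np.linspace(1.0, 20.0, N),                # candidate C0 values
        np.linspace(0.05, 2.0, N)]                # candidate elimination rates
grid_points = list(itertools.product(*axes))      # N**p grid points in total

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:           # distribute points amongst workers
        values = list(pool.map(objective, grid_points, chunksize=256))
    best = grid_points[int(np.argmin(values))]    # minimum objective function value
    print("estimated initial PK parameters (C0, k_el):", best)
```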
Abstract:
State-of-the-art techniques used for document processing, and particularly for processing images for data extraction, have the disadvantage of a large computational load and memory footprint. The disclosure herein generally relates to text processing and, more particularly, to a method and system for generating a data model for text extraction from documents. The system prunes a pretrained base model using a Lottery Ticket Hypothesis (LTH) algorithm to generate an LTH-pruned data model. The system further trims the LTH-pruned data model to obtain a structured pruned data model, which involves discarding filters whose filter sparsity exceeds a filter-sparsity threshold. The structured pruned data model is then trained from a teacher model using a Knowledge Distillation algorithm, wherein the resultant data model obtained after training the structured pruned data model forms the data model for text detection.
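The two compression steps above can be sketched as follows in PyTorch; the sparsity threshold, distillation temperature, and loss weighting are illustrative assumptions, not values from the disclosure.

```python
import torch
import torch.nn.functional as F

# Minimal sketch, assuming per-filter sparsity = fraction of near-zero weights,
# and a standard soft-target distillation loss; all constants are assumptions.

def filters_to_keep(conv_weight, sparsity_threshold=0.9, eps=1e-6):
    """Indices of filters whose sparsity does not exceed the threshold."""
    flat = conv_weight.view(conv_weight.size(0), -1)       # [out_ch, in_ch*k*k]
    sparsity = (flat.abs() < eps).float().mean(dim=1)      # per-filter sparsity
    return torch.nonzero(sparsity <= sparsity_threshold).squeeze(1)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend of softened teacher targets and hard-label cross entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Example: a filter zeroed out by LTH pruning is discarded during trimming.
w = torch.randn(8, 3, 3, 3)
w[2] = 0.0                                                 # fully sparse filter
assert 2 not in filters_to_keep(w).tolist()
```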
Abstract:
The present disclosure provides systems and methods for performance evaluation of Input/Output (I/O) intensive enterprise applications. Representative workloads may be generated for enterprise applications using synthetic benchmarks that can be used across multiple platforms with different storage systems. I/O traces are captured for an application of interest at low concurrencies, and features that significantly affect performance are extracted, fed to a synthetic benchmark, and replayed on a target system, thereby accurately recreating the behavior of the application. Statistical methods are used to extrapolate the extracted features to predict performance at higher concurrency levels without generating traces at those concurrency levels. The method does not require deploying the application or database on the target system, since performance of the system depends on access patterns rather than on actual data. Identical access patterns are re-created using only replicas of the database files, of the same size as in the real database.
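An illustrative Python fragment of the feature-extraction and extrapolation ideas; the trace schema, the three features shown, and the linear fit are assumptions rather than the disclosed statistical method.

```python
import numpy as np

# Minimal sketch: extract workload features from a low-concurrency I/O trace,
# then extrapolate a feature to a higher concurrency. All fields and figures
# are assumed for illustration.

# Each record: (offset_bytes, size_bytes, is_write), captured at low concurrency.
trace = [(4096, 8192, 0), (12288, 8192, 0), (65536, 4096, 1), (69632, 4096, 1)]

def extract_features(records):
    offsets = np.array([r[0] for r in records])
    sizes = np.array([r[1] for r in records])
    writes = np.array([r[2] for r in records])
    sequential = np.mean(np.diff(offsets) == sizes[:-1])   # next offset contiguous?
    return {"mean_size": float(sizes.mean()),
            "write_ratio": float(writes.mean()),
            "seq_fraction": float(sequential)}

print(extract_features(trace))

# Extrapolation: fit outstanding I/Os measured at concurrencies 1, 2, 4 and
# predict concurrency 16 without tracing at that level (figures assumed).
concurrency = np.array([1, 2, 4])
outstanding = np.array([1.0, 1.9, 3.7])
slope, intercept = np.polyfit(concurrency, outstanding, 1)
print("predicted outstanding I/Os at 16 threads:", slope * 16 + intercept)
```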
Abstract:
State-of-the-art techniques provide dedicated High-Level Synthesis (HLS) performance estimator tools that can give insights into performance bottlenecks, stall rate, stall causes, etc., in HLS designs. These estimators often limit themselves to simple loop topologies and limited pragma use, which makes them unreliable for large designs with complex datapaths. Embodiments herein provide a method and system for non-intrusive profiling of HLS-based applications. The method provides a cycle-accurate, fine-grained performance profiling framework that is non-intrusive and provides an end-to-end profile of the design. Such a profiling tool can help the designer or a Design Space Exploration (DSE) tool quickly identify performance bottlenecks and take a guided approach towards tuning them.
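As a toy illustration of the kind of metric such a profiler reports (per-stage stall rates in a pipelined design), the following Python model counts back-pressure and starvation stalls in a two-stage pipeline; the stage structure, FIFO depth, and readiness probability are assumptions with no connection to any specific HLS tool.

```python
import random

# Toy two-stage pipeline model illustrating stall-rate accounting; every
# parameter is an assumption made purely for illustration.

random.seed(0)
FIFO_DEPTH = 4
cycles = 1000
fifo = 0
stalls = {"producer": 0, "consumer": 0}

for _ in range(cycles):
    if fifo < FIFO_DEPTH:
        fifo += 1                  # producer writes one item into the FIFO
    else:
        stalls["producer"] += 1    # back-pressure: downstream FIFO is full
    if fifo > 0:
        if random.random() < 0.7:  # consumer ready 70% of cycles (assumed)
            fifo -= 1
    else:
        stalls["consumer"] += 1    # starvation: no input available this cycle

for stage, count in stalls.items():
    print(f"{stage}: stall rate {count / cycles:.1%}")
```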
Abstract:
This disclosure relates generally to data meta model and meta file generation for feature engineering and the training of machine learning models therefrom. Conventional methods neither facilitate identification of relevant data for feature engineering nor standardize the solution for use across domains. Embodiments of the present disclosure provide systems and methods wherein datasets from various sources/domains are utilized for meta file generation based on mapping of each dataset with a data meta model according to its domain; the meta file comprises metadata and information pertaining to the action(s) being performed. Further, functions are generated using the meta file, and the functions are assigned to the corresponding data characterized in the meta file. The functions are then invoked to generate a feature vector set, and machine learning model(s) are trained using the feature vector set. Implementation of the generated data meta model enables re-use of feature engineering code.
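A minimal Python sketch of the meta-file-driven flow described above: fields are mapped to types and actions, and the assigned functions are invoked to build feature vectors. The field names, type system, and the two feature functions are assumptions for illustration.

```python
import statistics

# Minimal sketch, assuming the meta file reduces to field -> (type, action)
# mappings and that actions resolve to feature functions; all names are assumed.

meta_file = {                       # produced by mapping a dataset to the meta model
    "domain": "retail",
    "fields": {
        "price":   {"type": "numeric",     "action": "normalize"},
        "segment": {"type": "categorical", "action": "one_hot"},
    },
}

def normalize(values):
    mu, sd = statistics.mean(values), statistics.pstdev(values) or 1.0
    return [(v - mu) / sd for v in values]

def one_hot(values):
    levels = sorted(set(values))
    return [[int(v == level) for level in levels] for v in values]

FUNCTION_REGISTRY = {"normalize": normalize, "one_hot": one_hot}

def build_feature_vectors(dataset, meta):
    """Invoke the function assigned to each field by the meta file."""
    return {field: FUNCTION_REGISTRY[spec["action"]](dataset[field])
            for field, spec in meta["fields"].items()}

dataset = {"price": [10.0, 12.0, 9.0], "segment": ["a", "b", "a"]}
print(build_feature_vectors(dataset, meta_file))  # feeds the ML training step
```

Because the functions are looked up from the meta file rather than hard-coded, the same feature engineering code can be re-used across domains, which is the point of the data meta model.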
Abstract:
Systems and methods for benchmark-based cross-platform service demand prediction include generation of performance mimicking benchmarks (PMBs) that require only application-level profiling and provide a representative value of the service demand of an application under consideration on a production platform, thereby eliminating the need to actually deploy the application under consideration on the production platform. The PMBs require only a representative estimate of the service demand of the application under test and can be reused to represent multiple applications. The PMBs are generated based on a skeletal benchmark corresponding to the technology stack used by the application under test, and an input file generated from application profiling that provides pre-defined lower-level method calls, data-flow sequences between the multiple tiers of the application under test, and the send and receive network calls made by the application under consideration.
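A minimal sketch of how a skeletal benchmark might replay a profiling-derived input file; the event schema, the spin-wait standing in for CPU service demand, and the no-op network sink are all assumptions.

```python
import time

# Minimal sketch, assuming the input file reduces to a sequence of events:
# profiled method calls become timed busy-work, and send/receive events mark
# the network calls between tiers. The schema below is an assumption.

profile_events = [
    ("compute", 0.003),   # seconds of CPU work for a profiled method call
    ("send",    2048),    # bytes sent to the next tier
    ("receive", 4096),    # bytes expected back from the next tier
    ("compute", 0.001),
]

def replay(events, network):
    for kind, arg in events:
        if kind == "compute":
            deadline = time.perf_counter() + arg
            while time.perf_counter() < deadline:
                pass                      # spin to mimic the CPU service demand
        else:
            network(kind, arg)            # plug real sockets in on a test bed

replay(profile_events, lambda kind, nbytes: None)  # no-op network for a dry run
```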
Abstract:
A method and system are provided for pre-deployment performance estimation of input-output intensive workloads. Particularly, the present application provides a method and system for predicting the performance of an input-output intensive distributed enterprise application on multiple storage devices without deploying the application and the complete database in the target environment. The present method comprises generating the input-output traces of an application on a source system with varying concurrencies; replaying the generated traces from the source system on a target system to which the application is to be migrated; gathering performance data in the form of resource utilization, throughput, and response time from the target system; and extrapolating the data gathered from the target system in order to accurately predict the performance of multi-threaded input-output intensive applications on the target system at higher concurrencies.
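By way of an illustrative (assumed) extrapolation step, the sketch below fits target-system measurements taken at low concurrencies and projects utilization and throughput to higher concurrencies, capping utilization at 100%; the figures and the linear model are stand-ins for the disclosed statistical treatment.

```python
import numpy as np

# Minimal sketch, assuming linear growth of utilization and throughput with
# concurrency until the bottleneck resource saturates; all figures are assumed.

concurrency = np.array([1, 2, 4, 8])                  # measured on the target system
cpu_util    = np.array([0.06, 0.11, 0.23, 0.44])      # fraction of CPU busy
throughput  = np.array([120.0, 235.0, 460.0, 900.0])  # requests per second

u_slope, u_icpt = np.polyfit(concurrency, cpu_util, 1)
x_slope, x_icpt = np.polyfit(concurrency, throughput, 1)
peak_xput = throughput[-1] / cpu_util[-1]             # throughput at 100% utilization

for n in (16, 32):
    util = min(u_slope * n + u_icpt, 1.0)             # utilization saturates at 1.0
    xput = x_slope * n + x_icpt if util < 1.0 else peak_xput
    print(f"concurrency {n}: predicted utilization {util:.0%}, throughput {xput:.0f}/s")
```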