-
1.
Publication No.: US20240420464A1
Publication Date: 2024-12-19
Application No.: US18742019
Filing Date: 2024-06-13
Applicant: Tata Consultancy Services Limited
Inventor: Shruti Kunal KUNDE , Ravi Kumar SINGH , Chaman BANOLIA , Rekha SINGHAL , Balamuralidhar PURUSHOTHAMAN , Shailesh Shankar DESHPANDE
IPC: G06V20/10 , G06V10/26 , G06V10/762 , G06V10/764 , G06V10/766 , G06V10/77
Abstract: The disclosure addresses problems associated with the systematic integration of multi-modal data for effective training, and with handling the large volumes of data that result from the high resolution of the multiple modalities. Embodiments herein provide a method and a system for distributed training of a multi-modal data fusion transformer. A distributed training approach, called Distributed Architecture for Fusion-Transformer Training Acceleration (DAFTA), is proposed for processing large multimodal remote sensing data. DAFTA can handle any combination of remote sensing modalities. Additionally, similarity in the feature space is leveraged to optimize the training process and to achieve, with a reduced data set, training results equivalent to those obtained with the complete data set. The proposed approach provides a systematic and efficient method for managing large remote sensing data and enables accurate and timely insights for various applications.
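The abstract's idea of reaching a reduced data set via feature-space similarity can be illustrated with a minimal sketch. The function name, the use of plain k-means, and the one-representative-per-cluster rule are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def reduce_by_similarity(features, k, iters=10):
    """Hypothetical data-reduction step: cluster feature vectors and keep
    one representative sample per cluster (not the patented algorithm)."""
    # Deterministic init: spread initial centroids across the data set.
    idx = np.linspace(0, len(features) - 1, k).astype(int)
    centroids = features[idx].astype(float)
    for _ in range(iters):  # plain k-means on the feature vectors
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Keep the sample nearest each centroid as that cluster's representative.
    dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)
    reps = [int(dists[:, c].argmin()) for c in range(k)]
    return sorted(set(reps))

# Two tight clusters of feature vectors; the reduced set keeps one from each.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
kept = reduce_by_similarity(features, k=2)
```

Under these assumptions, training would then proceed on `features[kept]`, a reduced set intended to stand in for the full one.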
-
2.
Publication No.: US20230409967A1
Publication Date: 2023-12-21
Application No.: US18140219
Filing Date: 2023-04-27
Applicant: Tata Consultancy Services Limited
Inventor: Dheeraj CHAHAL , Surya Chaitanya Venkata PALEPU , Mayank MISHRA , Ravi Kumar SINGH , Rekha SINGHAL
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: State-of-the-art methods require that the size of a DL model, or of its gradients, be less than the maximum data-item size of the storage used as a communication channel for model training on a serverless platform. Embodiments of the present disclosure provide a method and system for training large DL models via a serverless architecture using a communication channel even when the gradients are larger than the maximum size of one data item allowed by the channel. Gradients generated by each worker during the current training instance are chunked into segments and stored in the communication channel. Corresponding segments from each worker are aggregated by aggregators and stored back. Each of the aggregated segments is read by each worker to generate an aggregated model to be used during the successive training instance. Optimization techniques are used for reading from and writing to the channel, resulting in significant improvement in the performance and cost of training.
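The chunk-aggregate-read-back flow in the abstract can be sketched as follows. An in-memory dict stands in for the communication channel, and `MAX_ITEM` is a hypothetical maximum data-item size; the function names and segment-keying scheme are assumptions for illustration, not the disclosed implementation.

```python
import numpy as np

MAX_ITEM = 4   # illustrative channel limit: elements per stored data item
channel = {}   # stand-in for the storage-based communication channel

def write_gradients(worker_id, grads):
    """Chunk a worker's gradient vector into segments and store each one."""
    for start in range(0, len(grads), MAX_ITEM):
        channel[(worker_id, start // MAX_ITEM)] = grads[start:start + MAX_ITEM]

def aggregate(num_workers, num_segments):
    """Aggregator role: average corresponding segments across workers,
    then store the aggregated segments back into the channel."""
    for seg in range(num_segments):
        stacked = np.stack([channel[(w, seg)] for w in range(num_workers)])
        channel[("agg", seg)] = stacked.mean(axis=0)

def read_aggregated(num_segments):
    """Worker role: read the aggregated segments and reassemble the update."""
    return np.concatenate([channel[("agg", s)] for s in range(num_segments)])

# Two workers, gradient vectors of 8 elements -> 2 segments each.
g0 = np.arange(8, dtype=float)         # worker 0 gradients
g1 = np.arange(8, dtype=float) * 3.0   # worker 1 gradients
write_gradients(0, g0)
write_gradients(1, g1)
aggregate(num_workers=2, num_segments=2)
agg = read_aggregated(num_segments=2)  # elementwise mean of g0 and g1
```

In this sketch the segment size, not the model size, is what must fit within the channel's data-item limit, which mirrors the constraint the abstract says the method removes.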
-