-
Publication No.: US20250148768A1
Publication date: 2025-05-08
Application No.: US18937628
Application date: 2024-11-05
Applicant: NEC Laboratories America, Inc.
Inventor: Kai Li, Deep Patel, Renqiang Min, Wentao Bao
Abstract: Methods and systems for action detection include encoding a text feature of an input textual description of an action using a visual language model (VLM). A video feature of an input video is encoded using the VLM. The action in the video is recognized, based on the text feature and the video feature, to localize the action within the video. A person performing the action is located within the video using the VLM.
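Illustrative note: the abstract does not say how the text and video features are compared. A minimal sketch, assuming per-frame feature vectors and cosine similarity against the text feature, could localize the action as the longest run of matching frames. The names `cosine`, `localize_action`, and `threshold` are hypothetical, and the VLM encoders themselves are not modeled here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def localize_action(text_feat, frame_feats, threshold=0.5):
    """Return (start, end) inclusive frame indices of the longest run of
    frames whose similarity to the text feature reaches the threshold,
    or None if no frame matches. The threshold is an assumed parameter."""
    best = None
    start = None
    # A trailing None acts as a sentinel that closes any open run.
    for i, feat in enumerate(list(frame_feats) + [None]):
        hit = feat is not None and cosine(text_feat, feat) >= threshold
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best
```

In practice the features would come from the model's text and video encoders; here they are plain lists of floats.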
-
Publication No.: US20250008132A1
Publication date: 2025-01-02
Application No.: US18755150
Application date: 2024-06-26
Applicant: NEC Laboratories America, Inc.
Inventor: Biplob Debnath, Deep Patel, Srimat Chakradhar, Christoph Reich
IPC: H04N19/33, H04N19/124, H04N19/176, H04N19/186, H04N19/625
Abstract: Systems and methods are provided for encoding and decoding images using differentiable JPEG compression, including converting images from RGB color space to YCbCr color space to obtain luminance and chrominance channels, and applying chroma subsampling to the chrominance channels to reduce resolution. The YCbCr image is divided into pixel blocks and a DCT is performed on the pixel blocks to obtain DCT coefficients. The DCT coefficients are quantized using a scaled quantization table to reduce precision, and the quantized DCT coefficients are encoded using lossless entropy coding, forming a compressed JPEG file. The file is decoded by reversing the lossless entropy coding to obtain the quantized DCT coefficients, which are dequantized using the scaled quantization table to restore the precision. The dequantized DCT coefficients are converted back to a spatial domain using an IDCT, the chrominance channels are upsampled to original resolution, and the YCbCr image is converted back to the RGB color space.
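Illustrative note: the color conversion, DCT, and quantization steps named in the abstract can be sketched for a single 8x8 block as follows. This is a minimal sketch under stated assumptions, not the patented method: it uses the standard JFIF RGB to YCbCr coefficients, an orthonormal DCT, and plain rounding (so it does not attempt the differentiable part), and it omits chroma subsampling and entropy coding. The function names and the flat quantization table used in the usage note are hypothetical.

```python
import math

N = 8  # JPEG block size

def rgb_to_ycbcr(r, g, b):
    """JFIF full-range RGB -> YCbCr conversion for one pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def _basis(n, k):
    """Orthonormal DCT-II basis value for sample n and frequency k."""
    scale = math.sqrt((1 if k == 0 else 2) / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct2(block):
    """2-D DCT of an 8x8 block (list of 8 rows of 8 numbers)."""
    return [[sum(block[x][y] * _basis(x, u) * _basis(y, v)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse of dct2: back from DCT coefficients to the spatial domain."""
    return [[sum(coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def quantize(coeffs, table, scale):
    """Divide by the scaled table and round; this is the lossy step."""
    return [[round(c / (q * scale)) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]

def dequantize(levels, table, scale):
    """Multiply the integer levels back by the scaled table."""
    return [[lv * q * scale for lv, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]
```

A round trip on one block would be `idct2(dequantize(quantize(dct2(block), table, s), table, s))`, with `table` an 8x8 list such as `[[16] * 8 for _ in range(8)]` (a flat table chosen only for illustration) and `s` the table scale factor.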
-
Publication No.: US20240161473A1
Publication date: 2024-05-16
Application No.: US18504469
Application date: 2023-11-08
Applicant: NEC Laboratories America, Inc.
Inventor: Kai Li, Deep Patel, Erik Kruus, Renqiang Min
IPC: G06V10/774, G06V10/75, G06V20/40, G16H15/00
CPC classification number: G06V10/7753, G06V10/751, G06V20/44, G16H15/00
Abstract: Methods and systems for training a model include performing spatial augmentation on an unlabeled input video to generate spatially augmented video. Temporal augmentation is performed on the input video to generate temporally augmented video. Predictions are generated, using a model that was pre-trained on a labeled dataset, for the unlabeled input video, the spatially augmented video, and the temporally augmented video. Parameters of the model are adapted using the predictions while enforcing spatial consistency, temporal consistency, and historical consistency. The model may be used for action recognition in a healthcare context, with recognition results being used for determining whether patients are performing a rehabilitation exercise correctly.
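Illustrative note: the abstract does not define the consistency terms. One possible reading, sketched below, penalizes the squared difference between the prediction on the original video and the predictions on each augmented view, plus the difference from a running average of past predictions. The mean-squared-error form, the exponential moving average, and all names (`mse`, `ema_update`, `consistency_loss`, `momentum`) are assumptions made for illustration.

```python
def mse(p, q):
    """Mean squared difference between two prediction vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def ema_update(history, pred, momentum=0.9):
    """Running average of past predictions (assumed form of 'history')."""
    return [momentum * h + (1 - momentum) * p for h, p in zip(history, pred)]

def consistency_loss(p_orig, p_spatial, p_temporal, p_history):
    """Sum of three agreement terms: original vs. spatially augmented,
    original vs. temporally augmented, original vs. historical average."""
    return (mse(p_orig, p_spatial)
            + mse(p_orig, p_temporal)
            + mse(p_orig, p_history))
```

The loss is zero when all four predictions agree and grows as any view drifts from the prediction on the original video.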
-
Publication No.: US20240378892A1
Publication date: 2024-11-14
Application No.: US18654620
Application date: 2024-05-03
Applicant: NEC Laboratories America, Inc.
Inventor: Iain Melvin, Alexandru Niculescu-Mizil, Deep Patel
Abstract: Systems and methods for optimizing multi-camera multi-entity artificial intelligence tracking systems. Visual and location information of entities from video feeds received from multiple cameras can be obtained by employing an entity detection model and re-identification model. Likelihood scores that entity detections belong to an entity track can be predicted from the visual and location information. The entity detections predicted into entity tracks can be processed by employing combinatorial optimization of the likelihood scores by identifying assumptions from the likelihood scores, entity detections, and the entity tracks, filtering the assumptions with unsatisfiable problems to obtain a filtered assumptions set, and optimizing an answer set by utilizing the filtered assumptions set and the likelihood scores to maximize an overall score and obtain optimized entity tracks. Multiple entities can be monitored by utilizing the optimized entity tracks.
-
Publication No.: US20240161313A1
Publication date: 2024-05-16
Application No.: US18505732
Application date: 2023-11-09
Applicant: NEC Laboratories America, Inc.
Inventor: Deep Patel, Alexandru Niculescu-Mizil, Iain Melvin, Seonghyeon Moon
CPC classification number: G06T7/248, G06T7/292, G06V10/82, G06V20/41, G06V20/52, G06V40/10, G16H15/00, G16H40/67, G06T2207/10016, G06T2207/20081, G06T2207/20084, G06T2207/30004, G06T2207/30196, G06V2201/03
Abstract: Methods and systems for tracking movement include performing person detection in frames from multiple video streams to identify detection images. Visual and location information from the detection images are combined to generate scores for pairs of detection images across the multiple video streams and across frames of respective video streams. A pairwise detection graph is generated using the detection images as nodes and the scores as weighted edges. Movement of an individual is tracked based on a constrained answer set programming problem, with constraints determined based on matching scores and logical assumptions. An action responsive to the tracked movement is performed. Tracking of movement of a patient in a healthcare facility can be used to inform treatment decisions by healthcare professionals.
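Illustrative note: the pairwise detection graph described in the abstract can be sketched as a dictionary of weighted edges between detections. The scoring formula below (a weighted mix of cosine similarity on appearance features and a Gaussian falloff on ground-plane distance) is an assumption, as are the field names `cam`, `frame`, `feat`, and `pos` and the rule that two detections from the same camera and frame are never linked. The answer set programming step is not modeled.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pair_score(d1, d2, w_visual=0.5, sigma=1.0):
    """Combine visual and location agreement into one score (assumed form)."""
    visual = cosine(d1["feat"], d2["feat"])
    dist = math.dist(d1["pos"], d2["pos"])
    location = math.exp(-(dist ** 2) / (2 * sigma ** 2))
    return w_visual * visual + (1 - w_visual) * location

def build_graph(dets):
    """Nodes are detection indices; edges map (i, j) to a score."""
    edges = {}
    for i, j in combinations(range(len(dets)), 2):
        a, b = dets[i], dets[j]
        if a["cam"] == b["cam"] and a["frame"] == b["frame"]:
            continue  # assumed: same camera and frame means different people
        edges[(i, j)] = pair_score(a, b)
    return edges
```

A solver would then choose which edges to keep as identity links, subject to the constraints the abstract mentions.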
-
Publication No.: US20250148624A1
Publication date: 2025-05-08
Application No.: US18934512
Application date: 2024-11-01
Applicant: NEC Laboratories America, Inc.
Inventor: Deep Patel, Iain Melvin, Alexandru Niculescu-Mizil
Abstract: Systems and methods are provided for a multi-entity tracking transformer model (MCTR). To train the MCTR, track embeddings and detection embeddings of video feeds obtained from multiple cameras are processed with a tracking module to generate updated track embeddings. The updated track embeddings can be associated with the detection embeddings to generate track-detection associations (TDA) for each camera view and camera frame with an association module. A cost module can calculate a differentiable loss from the TDA by combining a detection loss, a track loss and an auxiliary track loss. A model trainer can train the MCTR using the differentiable loss and contiguous video segments sampled from a training dataset to track multiple objects with multiple cameras.
-
Publication No.: US20240046606A1
Publication date: 2024-02-08
Application No.: US18363175
Application date: 2023-08-01
Applicant: NEC Laboratories America, Inc.
Inventor: Kai Li, Renqiang Min, Deep Patel, Erik Kruus, Xin Hu
IPC: G06V10/62, G06V20/40, G06V10/82, G06V10/774, G06V10/776, G06V10/77
CPC classification number: G06V10/62, G06V20/41, G06V20/46, G06V10/82, G06V10/774, G06V10/776, G06V10/7715
Abstract: Methods and systems for temporal action localization include processing a video stream to identify an action and a start time and a stop time for the action using a neural network model that separately processes information of appearance and motion modalities from the video stream using transformer branches that include a self-attention and a cross-attention between the appearance and motion modalities. An action is performed responsive to the identified action.
-
Publication No.: US20240275996A1
Publication date: 2024-08-15
Application No.: US18439291
Application date: 2024-02-12
Applicant: NEC Laboratories America, Inc.
Inventor: Biplob Debnath, Deep Patel, Srimat Chakradhar, Oliver Po, Christoph Reich
IPC: H04N19/42, G06N20/00, H04N7/18, H04N19/119, H04N19/124, H04N19/14, H04N19/154, H04N19/156, H04N19/172, H04N19/176, H04N19/177, H04N19/463, H04N19/61
CPC classification number: H04N19/42, G06N20/00, H04N7/183, H04N19/119, H04N19/124, H04N19/14, H04N19/154, H04N19/156, H04N19/172, H04N19/176, H04N19/177, H04N19/463, H04N19/61
Abstract: Systems and methods are provided for optimizing video compression using end-to-end learning, including capturing, using an edge device, raw video frames from a video clip and determining maximum network bandwidth. A control network implemented on the edge device predicts optimal codec parameters based on dynamic network conditions and content of the video clip. The video clip is encoded with the predicted codec parameters using a differentiable surrogate model of a video codec, which also propagates gradients from a server-side vision model to adjust the codec parameters. A server decodes the video clip and analyzes it with a deep vision model located on the server. A feedback mechanism transmits analysis from the deep vision model back to the control network to facilitate end-to-end training of the system, and the encoding parameters are adjusted based on that analysis.
-
Publication No.: US20240275983A1
Publication date: 2024-08-15
Application No.: US18439341
Application date: 2024-02-12
Applicant: NEC Laboratories America, Inc.
Inventor: Biplob Debnath, Christoph Reich, Deep Patel, Srimat Chakradhar
IPC: H04N19/146, G06V20/40, G06V20/58, H04N19/124, H04N19/154
CPC classification number: H04N19/146, G06V20/49, G06V20/58, H04N19/124, H04N19/154
Abstract: Systems and methods are provided for optimizing video compression for remote vehicle control, including capturing video and sensor data from a vehicle using a plurality of sensors and high-resolution cameras, and analyzing the captured video to identify critical regions within frames of the video using an attention-based module. Current network bandwidth is assessed and future bandwidth availability is predicted. Video compression parameters are predicted based on an analysis of the video and an assessment of the current network bandwidth using a control network, and the video is compressed based on the predicted parameters with an adaptive video compression module. The compressed video and sensor data are transmitted to a remote-control center, and the received video and sensor data are decoded at the remote-control center. The vehicle is autonomously or remotely controlled from the remote-control center based on the decoded video and sensor data.
-
Publication No.: US20240273902A1
Publication date: 2024-08-15
Application No.: US18439242
Application date: 2024-02-12
Applicant: NEC Laboratories America, Inc.
Inventor: Deep Patel, Giovanni Milione, Kai Li, Farley Lai, Erik Kruus
IPC: G06V20/40, G06V10/776, G06V40/20, G16H30/40
CPC classification number: G06V20/44, G06V10/776, G06V40/20, G16H30/40
Abstract: Methods and systems of training a machine learning model include identifying an object or person related to an action in a first video. The object or person is copied from the first video to a second video to generate a third video. A machine learning model is trained using the first video and the third video.
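Illustrative note: the copy step in the abstract can be sketched as a per-frame masked paste. This is a minimal sketch assuming both videos have the same frame count and frame size and that a binary mask marking the object or person is already available for each frame of the first video; how the mask is obtained is not covered. The name `paste_object` and the nested-list frame layout are hypothetical.

```python
def paste_object(src_video, src_masks, dst_video):
    """Build a third video: pixels under the mask come from the first
    (source) video, all other pixels come from the second video.

    Each video is a list of frames; each frame is a list of rows of pixel
    values. Each mask has the same shape as a frame and holds 0 or 1.
    """
    third = []
    for src, mask, dst in zip(src_video, src_masks, dst_video):
        frame = [
            [s if m else d for s, m, d in zip(s_row, m_row, d_row)]
            for s_row, m_row, d_row in zip(src, mask, dst)
        ]
        third.append(frame)
    return third
```

The first and third videos would then both be fed to training, as the abstract states.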