1.
Publication No.: US20230239499A1
Publication Date: 2023-07-27
Application No.: US18011922
Application Date: 2022-05-27
Applicant: Google LLC
Inventor: Mohammad Babaeizadeh, Chelsea Breanna Finn, Dumitru Erhan, Mohammad Taghi Saffar, Sergey Vladimir Levine, Suraj Nair
IPC: H04N19/59, H04N19/117, H04N19/176, H04N19/42, G06V10/70
CPC classification number: H04N19/59, H04N19/117, H04N19/176, H04N19/42, G06V10/70
Abstract: One aspect provides a machine-learned video prediction model configured to receive and process one or more previous video frames to generate one or more predicted subsequent video frames, wherein the machine-learned video prediction model comprises a convolutional variational auto encoder, and wherein the convolutional variational auto encoder comprises an encoder portion comprising one or more encoding cells and a decoder portion comprising one or more decoding cells.
-
2.
Publication No.: US20240289981A1
Publication Date: 2024-08-29
Application No.: US18173557
Application Date: 2023-02-23
Applicant: Google LLC
Inventor: Wei-Cheng Kuo, Fred Bertsch, Wei Li, Anthony J. Piergiovanni, Mohammad Taghi Saffar, Anelia Angelova
IPC: G06T7/73, G06F40/126, G06F40/40, G06V10/77, G06V10/80
CPC classification number: G06T7/73, G06F40/126, G06F40/40, G06V10/7715, G06V10/806
Abstract: Generally, the disclosure is directed to generalized object localization, where the located object is specified by a natural language (NL) query. More specifically, the embodiments include a unified generalized visual localization architecture. The architecture achieves enhanced performance on the following three tasks: referring expression comprehension, object localization, and object detection. The embodiments employ machine-learned NL models and/or image models. The architecture is able to understand and answer natural localization questions about an image, output multiple boxes, provide no output if the object is not present (e.g., a null result), and solve general detection tasks.
-