Patent search ap:("Google LLC") AND inv:"Chen Sun" Page 1

1.

Invention application
Systems and Methods for Improved Video Understanding (Status: In force)

Publication (announcement) number: US20240428586A1

Publication (announcement) date: 2024-12-26

Application number: US18827088

Application date: 2024-09-06

Applicant: Google LLC

Inventor: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid

IPC: G06V20/40, G06N20/00

Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of spatiotemporal representations from the video data, the plurality of spatiotemporal representations comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of spatiotemporal representations as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

2.

Invention publication
ACTION LOCALIZATION IN VIDEOS USING LEARNED QUERIES (Status: Under examination, published)

Publication (announcement) number: US20240346824A1

Publication (announcement) date: 2024-10-17

Application number: US18634794

Application date: 2024-04-12

Applicant: Google LLC

Inventor: Alexey Alexeevich Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lucic, Cordelia Luise Schmid, Anurag Arnab

IPC: G06V20/40, G06T7/73, G06V10/62, G06V10/764, G06V10/77, G06V10/774, G06V10/776, G06V10/82

CPC classification number: G06V20/46, G06T7/73, G06V10/62, G06V10/764, G06V10/7715, G06V10/774, G06V10/776, G06V10/82, G06T2207/10016, G06T2207/20081, G06T2207/20084

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing action localization on an input video. In particular, a system maintains a set of query vectors and uses the input video and the set of query vectors to generate an action localization output for the input video. The action localization output includes, for each of one or more agents depicted in the video, data specifying, for each of one or more video frames in the video, a respective bounding box in the video frame that depicts the agent and a respective action from a set of actions that is being performed by the agent in the video frame.

3.

Invention application
Identifying Information Using Referenced Text (Status: Under examination, published)

Publication (announcement) number: US20180232344A1

Publication (announcement) date: 2018-08-16

Application number: US15950335

Application date: 2018-04-11

Applicant: Google LLC

Inventor: Chen Sun, Yifan Xu

IPC: G06F17/22, G06F17/30, G06F17/24

CPC classification number: G06F17/2247, G06F16/951, G06F17/248

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining summary content for resources in a domain. In one aspect, a method includes accessing a first resource belonging to a particular domain, selecting an anchor in the first resource linking to a second resource belonging to the particular domain, identifying particular text content in the first resource that is subordinate to the anchor, determining that the second resource includes the particular text content that is subordinate to the anchor, based on determining that the second resource includes the particular text content that is subordinate to the anchor, generating a domain template for the particular domain, the domain template specifying a location of the particular text content in the second resource, and determining, for each respective resource belonging to the particular domain having a structure matching the domain template, respective text content for the respective resource.

4.

Granted invention
Multimodal image classifier using textual and visual embeddings (Status: In force)

Publication (announcement) number: US11907337B2

Publication (announcement) date: 2024-02-20

Application number: US17046313

Application date: 2019-11-18

Applicant: Google LLC

Inventor: Ariel Fuxman, Aleksei Timofeev, Zhen Li, Chun-Ta Lu, Manan Shah, Chen Sun, Krishnamurthy Viswanathan, Chao Jia

IPC: G06K9/62, G06K9/46, G06F18/24, G06F18/214, G06F18/2413

CPC classification number: G06F18/24, G06F18/214, G06F18/24147

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.

5.

Granted invention
Action localization in images and videos using relational features (Status: In force)

Publication (announcement) number: US11163989B2

Publication (announcement) date: 2021-11-02

Application number: US16637960

Application date: 2019-08-06

Applicant: Google LLC

Inventor: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick

IPC: G06K9/00, G06K9/46, G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization in images and videos. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform image processing and video processing operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.

6.

Invention application
Dense Video Object Captioning from Disjoint Vision (Status: In force)

Publication (announcement) number: US20250053753A1

Publication (announcement) date: 2025-02-13

Application number: US18448508

Application date: 2023-08-11

Applicant: Google LLC

Inventor: Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Luise Schmid

IPC: G06F40/40, G06T7/246, G06V10/22, G06V10/774, G06V10/776, G06V20/40

Abstract: Provided are a new task and model for dense video object captioning—detecting, tracking, and captioning trajectories of all objects in a video. This task unifies spatial and temporal understanding of the video, and requires fine-grained language description. Example implementations of the proposed model for dense video object captioning can be trained end-to-end and can include different models for spatial localization, tracking, and captioning. As such, some example implementations of the present disclosure can train the proposed model with a mixture of disjoint tasks, and leverage diverse, large-scale datasets which supervise different parts of an example proposed model. This results in noteworthy zero-shot performance.

7.

Invention application
Systems and Methods for Improved Video Understanding (Status: In force)

Publication (announcement) number: US20240428587A1

Publication (announcement) date: 2024-12-26

Application number: US18827133

Application date: 2024-09-06

Applicant: Google LLC

Inventor: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid

IPC: G06V20/40, G06N20/00

Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

8.

Granted invention
Systems and methods for improved video understanding (Status: In force)

Publication (announcement) number: US12112538B2

Publication (announcement) date: 2024-10-08

Application number: US17370522

Application date: 2021-07-08

Applicant: Google LLC

Inventor: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Luise Schmid

IPC: G06V20/40, G06N20/00

CPC classification number: G06V20/41, G06N20/00, G06V20/46, G06V20/49

Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

9.

Invention publication
Identifying Information Using Referenced Text (Status: Under examination, published)

Publication (announcement) number: US20230229714A1

Publication (announcement) date: 2023-07-20

Application number: US18150739

Application date: 2023-01-05

Applicant: Google LLC

Inventor: Chen Sun, Yifan Xu

IPC: G06F16/951, G06F40/143, G06F40/186

CPC classification number: G06F16/951, G06F40/143, G06F40/186

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining summary content for resources in a domain. In one aspect, a method includes accessing a first resource belonging to a particular domain, selecting an anchor in the first resource linking to a second resource belonging to the particular domain, identifying particular text content in the first resource that is subordinate to the anchor, determining that the second resource includes the particular text content that is subordinate to the anchor, based on determining that the second resource includes the particular text content that is subordinate to the anchor, generating a domain template for the particular domain, the domain template specifying a location of the particular text content in the second resource, and determining, for each respective resource belonging to the particular domain having a structure matching the domain template, respective text content for the respective resource.

10.

Granted invention
Identifying information using referenced text (Status: In force)

Publication (announcement) number: US11580177B2

Publication (announcement) date: 2023-02-14

Application number: US17065256

Application date: 2020-10-07

Applicant: Google LLC

Inventor: Chen Sun, Yifan Xu

IPC: G06F16/951, G06F40/186, G06F40/143

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining summary content for resources in a domain. In one aspect, a method includes accessing a first resource belonging to a particular domain, selecting an anchor in the first resource linking to a second resource belonging to the particular domain, identifying particular text content in the first resource that is subordinate to the anchor, determining that the second resource includes the particular text content that is subordinate to the anchor, based on determining that the second resource includes the particular text content that is subordinate to the anchor, generating a domain template for the particular domain, the domain template specifying a location of the particular text content in the second resource, and determining, for each respective resource belonging to the particular domain having a structure matching the domain template, respective text content for the respective resource.
