-
Publication number: US12217382B2
Publication date: 2025-02-04
Application number: US18527528
Filing date: 2023-12-04
Applicant: Google LLC
Inventor: Junjie Ke, Feng Yang, Qifei Wang, Yilin Wang, Peyman Milanfar
Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on fixed input image size and effectively predicts quality on a native-resolution image. A native-resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) maps patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) distinguishes patches coming from different scales in the multi-scale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.
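The hashed-grid idea in the abstract can be sketched in a few lines: patches from every scale are mapped onto one fixed grid so a single spatial-embedding table serves arbitrary native resolutions, while a separate scale table marks which scale each patch came from. This is a minimal illustration, not Google's implementation; the grid size, patch size, and embedding width are assumed values.

```python
import numpy as np

GRID = 10   # fixed hash-grid side (assumed value)
PATCH = 32  # patch side length (assumed value)
DIM = 8     # embedding width (assumed value)

rng = np.random.default_rng(0)
spatial_table = rng.normal(size=(GRID, GRID, DIM))  # learned in practice
scale_table = rng.normal(size=(4, DIM))             # one row per scale

def hash_to_grid(idx, n):
    """Map patch index idx in [0, n) onto a cell of the fixed grid."""
    return min(int(idx / n * GRID), GRID - 1)

def tokenize(height, width, scale_id):
    """One position-and-scale-embedded token per patch at this scale."""
    rows, cols = height // PATCH, width // PATCH
    tokens = []
    for i in range(rows):
        for j in range(cols):
            emb = (spatial_table[hash_to_grid(i, rows), hash_to_grid(j, cols)]
                   + scale_table[scale_id])
            tokens.append(emb)
    return np.stack(tokens)

# The native-resolution view and a downscaled view share the same tables,
# so corresponding patch positions hash to the same grid cell.
full = tokenize(640, 480, scale_id=0)
half = tokenize(320, 240, scale_id=1)
```

Because both views hash position (0, 0) to the same grid cell, their first tokens differ only by the scale embedding, which is exactly what lets self-attention relate patches across scales.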
-
Publication number: US20240119555A1
Publication date: 2024-04-11
Application number: US18527528
Filing date: 2023-12-04
Applicant: Google LLC
Inventor: Junjie Ke, Feng Yang, Qifei Wang, Yilin Wang, Peyman Milanfar
CPC classification number: G06T3/0012, G06T3/40, G06T7/0002, G06T2207/20016, G06T2207/20081, G06T2207/30168
Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on fixed input image size and effectively predicts quality on a native-resolution image. A native-resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) maps patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) distinguishes patches coming from different scales in the multi-scale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.
-
Publication number: US20220415039A1
Publication date: 2022-12-29
Application number: US17762289
Filing date: 2019-11-26
Applicant: Google LLC
Inventor: Yilin Wang, Hossein Talebi, Peyman Milanfar, Feng Yang, Balineedu Adsumilli
Abstract: A trained model is retrained for video quality assessment and used to identify sets of adaptive compression parameters for transcoding user generated video content. Using transfer learning, the model, which is initially trained for image object detection, is retrained for technical content assessment and then again retrained for video quality assessment. The model is then deployed into a transcoding pipeline and used for transcoding an input video stream of user generated content. The transcoding pipeline may be structured in one of several ways. In one example, a secondary pathway for video content analysis using the model is introduced into the pipeline, which does not interfere with the ultimate output of the transcoding should there be a network or other issue. In another example, the model is introduced as a library within the existing pipeline, which would maintain a single pathway, but ultimately is not expected to introduce significant latency.
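The staged transfer learning described above (object detection, then technical content assessment, then video quality assessment) can be sketched as keeping a pretrained backbone while swapping and fine-tuning the task head at each stage. The class and method names below are illustrative stand-ins, not the patent's API.

```python
class QualityModel:
    """Toy stand-in for a model whose backbone survives each retraining."""

    def __init__(self):
        self.backbone = "imagenet-object-detection"  # stage 0: pretrained
        self.head = "object-classes"

    def retrain(self, task):
        # Transfer learning sketch: reuse the backbone's features and
        # replace only the prediction head for the new task's labels.
        self.head = task
        return self

model = QualityModel()
model.retrain("technical-content-assessment")  # stage 1
model.retrain("video-quality-assessment")      # stage 2
```

After both stages the backbone features are unchanged while the head now predicts video quality, mirroring the abstract's two successive retrainings before the model is deployed into the transcoding pipeline.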
-
Publication number: US12230024B2
Publication date: 2025-02-18
Application number: US17762289
Filing date: 2019-11-26
Applicant: Google LLC
Inventor: Yilin Wang, Hossein Talebi, Peyman Milanfar, Feng Yang, Balineedu Adsumilli
Abstract: A trained model is retrained for video quality assessment and used to identify sets of adaptive compression parameters for transcoding user generated video content. Using transfer learning, the model, which is initially trained for image object detection, is retrained for technical content assessment and then again retrained for video quality assessment. The model is then deployed into a transcoding pipeline and used for transcoding an input video stream of user generated content. The transcoding pipeline may be structured in one of several ways. In one example, a secondary pathway for video content analysis using the model is introduced into the pipeline, which does not interfere with the ultimate output of the transcoding should there be a network or other issue. In another example, the model is introduced as a library within the existing pipeline, which would maintain a single pathway, but ultimately is not expected to introduce significant latency.
-
Publication number: US11887270B2
Publication date: 2024-01-30
Application number: US17787699
Filing date: 2021-07-01
Applicant: Google LLC
Inventor: Junjie Ke, Feng Yang, Qifei Wang, Yilin Wang, Peyman Milanfar
CPC classification number: G06T3/0012, G06T3/40, G06T7/0002, G06T2207/20016, G06T2207/20081, G06T2207/30168
Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on fixed input image size and effectively predicts quality on a native-resolution image. A native-resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) maps patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) distinguishes patches coming from different scales in the multi-scale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.
-
Publication number: US20230131228A1
Publication date: 2023-04-27
Application number: US17922531
Filing date: 2020-05-19
Applicant: Google LLC
Inventor: Yilin Wang, Balineedu Adsumilli, Feng Yang
Abstract: A method includes training a first model to measure banding artefacts in an image, training a second model to deband the image, and generating a debanded image for the image using the second model. Training the first model can include selecting a first set of first training images, generating a banding edge map for a first training image, where the map includes weights that emphasize banding edges and de-emphasize true edges in the first training image, and using the map and a luminance plane of the first training image as input to the first model. Training the second model can include selecting a second set of second training images, generating a debanded training image for a second training image, generating a banding score for the debanded training image using the first model, and using the banding score in a loss function used in training the second model.
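The banding edge map described above can be illustrated with a crude gradient heuristic: weak, step-like luminance edges (typical of banding) get high weight, strong "true" edges get low weight, and the map then weights a training loss. The thresholds and the quadratic loss are assumptions for illustration, not the patent's trained first model.

```python
import numpy as np

LOW, HIGH = 0.5, 8.0  # assumed gradient-magnitude thresholds

def banding_edge_map(luma):
    """Weight map that emphasizes banding steps, de-emphasizes true edges."""
    gy, gx = np.gradient(luma.astype(float))
    mag = np.hypot(gx, gy)
    weights = np.zeros_like(mag)
    weights[(mag > LOW) & (mag < HIGH)] = 1.0  # weak steps: likely banding
    weights[mag >= HIGH] = 0.1                 # strong "true" edges
    return weights

def weighted_loss(pred, target, weights):
    # Banding-aware loss: errors on banding edges dominate training.
    return float(np.mean(weights * (pred - target) ** 2))

# A smooth ramp quantized to a few levels produces banding steps.
ramp = np.tile(np.linspace(0, 32, 64), (16, 1))
banded = np.round(ramp / 8) * 8
w = banding_edge_map(banded)
```

On the quantized ramp, only the step boundaries receive full weight, so a debanding model trained with `weighted_loss` is pushed hardest to smooth exactly those contours.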
-
Publication number: US12206914B2
Publication date: 2025-01-21
Application number: US18021636
Filing date: 2022-06-08
Applicant: Google LLC
Inventor: Yilin Wang, Balineedu Adsumilli, Junjie Ke, Hossein Talebi, Joong Yim, Neil Birkbeck, Peyman Milanfar, Feng Yang
IPC: H04N21/266, G06N3/045, H04N17/02, H04N19/154, H04N21/234, H04N21/434, H04N21/44, H04N21/466
Abstract: Methods, systems, and media for determining perceptual quality indicators of video content items are provided. In some embodiments, the method comprises: receiving a video content item; extracting a plurality of frames from the video content item; determining, using a first subnetwork of a deep neural network, a content quality indicator for each frame of the plurality of frames of the video content item; determining, using a second subnetwork of the deep neural network, a video distortion indicator for each frame of the plurality of frames of the video content item; determining, using a third subnetwork of the deep neural network, a compression sensitivity indicator for each frame of the plurality of frames of the video content item; generating a quality level for each frame of the plurality of frames of the video content item that concatenates the content quality indicator, the video distortion indicator, and the compression sensitivity indicator for that frame of the video content item; generating an overall quality level for video content item by aggregating the quality level of each frame of the plurality of frames; and causing a video recommendation to be presented based on the overall quality level of the video content item.
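The per-frame concatenation and frame-to-video aggregation in this abstract can be sketched directly. The three "subnetworks" below are stand-in functions on hand-made frame features, and the mean aggregation is an assumption; the patent's indicators come from a trained deep neural network.

```python
def content_quality(frame):
    return frame["sharpness"]            # stand-in for subnetwork 1

def distortion(frame):
    return 1.0 - frame["noise"]          # stand-in for subnetwork 2

def compression_sensitivity(frame):
    return frame["detail"]               # stand-in for subnetwork 3

def frame_quality(frame):
    # Concatenate the three indicators into one per-frame quality level.
    return (content_quality(frame), distortion(frame),
            compression_sensitivity(frame))

def video_quality(frames):
    # Aggregate per-frame levels into one overall level (mean here).
    vectors = [frame_quality(f) for f in frames]
    return sum(sum(v) for v in vectors) / (3 * len(vectors))

frames = [{"sharpness": 0.9, "noise": 0.2, "detail": 0.7},
          {"sharpness": 0.8, "noise": 0.1, "detail": 0.6}]
score = video_quality(frames)
```

The overall score is what the abstract's final step would feed into the video recommendation decision.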
-
Publication number: US20240022726A1
Publication date: 2024-01-18
Application number: US17862571
Filing date: 2022-07-12
Applicant: Google LLC
Inventor: Yilin Wang, Balineedu Adsumilli
CPC classification number: H04N19/13, G06N20/00, G06K9/6256
Abstract: A training dataset that includes a first dataset and a second dataset is received. The first dataset includes a first subset of first videos corresponding to a first context and respective first ground truth quality scores of the first videos, and the second dataset includes a second subset of second videos corresponding to a second context and respective second ground truth quality scores of the second videos. A machine learning model is trained to predict the respective first ground truth quality scores and the respective second ground truth quality scores. Training the model includes training it to obtain a global quality score for one of the videos; and training it to map the global quality score to context-dependent predicted quality scores. The context-dependent predicted quality scores include a first context-dependent predicted quality score corresponding to the first context and a second context-dependent predicted quality score corresponding to the second context.
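The two-part objective described above (a shared global score plus a mapping to per-context scores) can be sketched with per-context affine heads. The linear form, the context names, and the parameter values are all assumptions for illustration; in the patent both parts are learned from the two datasets' ground-truth scores.

```python
# Assumed per-context mapping parameters (slope, offset) per head.
CONTEXTS = {"context_a": (1.2, -0.1), "context_b": (0.9, 0.05)}

def global_score(video_features):
    # Stand-in for the model's shared, context-free quality head.
    return sum(video_features) / len(video_features)

def context_scores(video_features):
    g = global_score(video_features)
    # Map the single global score into each context's own scale.
    return {ctx: a * g + b for ctx, (a, b) in CONTEXTS.items()}

scores = context_scores([0.6, 0.8, 0.7])
```

Training then fits the shared head against both datasets and each mapping against its own context's ground truth, so one model serves both contexts.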
-
Publication number: US20230222623A1
Publication date: 2023-07-13
Application number: US17787699
Filing date: 2021-07-01
Applicant: Google LLC
Inventor: Junjie Ke, Feng Yang, Qifei Wang, Yilin Wang, Peyman Milanfar
CPC classification number: G06T3/0012, G06T3/40, G06T7/0002, G06T2207/30168, G06T2207/20081, G06T2207/20016
Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on fixed input image size and effectively predicts quality on a native-resolution image. A native-resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) maps patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) distinguishes patches coming from different scales in the multi-scale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.
-
Publication number: US20230054130A1
Publication date: 2023-02-23
Application number: US17790102
Filing date: 2019-12-31
Applicant: Google LLC
Inventor: Yilin Wang, Yue Guo, Balineedu Chowdary Adsumilli
IPC: H04N21/2343, H04N19/154, H04N19/40
Abstract: A system and methods are disclosed for optimal format selection for video players based on visual quality. The method includes generating a plurality of reference transcoded versions of a reference video, obtaining quality scores for frames of the plurality of reference transcoded versions of the reference video, generating a first training input comprising a set of color attributes, spatial attributes, and temporal attributes of the frames of the reference video, and generating a first target output for the first training input, wherein the first target output comprises the quality scores for the frames of the plurality of reference transcoded versions of the reference video. The method further includes providing the training data to train a machine learning model on (i) a set of training inputs comprising the first training input and (ii) a set of target outputs comprising the first target output.
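Assembling the training pairs described above (color, spatial, and temporal frame attributes as inputs; per-transcode quality scores as targets) can be sketched as follows. The attribute proxies (mean level, value range, motion magnitude) and field names are assumptions for illustration; the real attributes and scores come from the reference transcoded versions.

```python
def frame_attributes(frame):
    """Toy color/spatial/temporal attributes for one frame."""
    pixels = frame["pixels"]
    color = sum(pixels) / len(pixels)   # color attribute: mean level
    spatial = max(pixels) - min(pixels) # spatial attribute: contrast proxy
    temporal = abs(frame["motion"])     # temporal attribute: motion proxy
    return [color, spatial, temporal]

def build_training_example(frames, transcode_scores):
    # First training input: attribute sets for the reference frames.
    inputs = [frame_attributes(f) for f in frames]
    # First target output: quality scores of each transcoded version.
    return {"input": inputs, "target": transcode_scores}

example = build_training_example(
    [{"pixels": [10, 20, 30], "motion": 2}],
    {"480p": 0.71, "1080p": 0.93})
```

A model trained on many such pairs can then predict per-format quality from raw frame attributes alone, which is what makes quality-aware format selection possible without transcoding every candidate.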
-