Patent search: ap:("Google LLC") AND inv:"Peyman Milanfar" (results page 2)

11.

Patent application
VIDEO CODING WITH DEGRADATION OF RESIDUALS (Status: Pending, published)

Publication number: US20180302643A1

Publication date: 2018-10-18

Application number: US16010833

Application date: 2018-06-18

Applicant: GOOGLE LLC

Inventors: Debargha Mukherjee, Shunyao Li, Peyman Milanfar

IPC: H04N19/52, H04N19/182, H04N19/176, H04N19/107, H04N19/86, H04N19/91, H04N19/59, H04N19/44, H04N19/82, H04N19/124, H04N19/61

Abstract: A method for encoding a video signal using a computing device, the video signal having a plurality of frames, each frame having a plurality of blocks, and each block having a plurality of pixels. The method includes generating a residual block from an original block of a current frame and a prediction block, degrading the residual block to decrease a bit-cost for encoding the residual block, and encoding the residual block into an encoded residual block.
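The abstract names three steps: form the residual, degrade it to lower its bit cost, then encode it. It does not say how the degradation is done. The sketch below is therefore only an illustration. It assumes, hypothetically, that degradation means zeroing residual values whose magnitude falls below a threshold. It uses the count of non-zero values as a crude stand-in for bit cost. All function names are illustrative.

```python
def residual_block(original, prediction):
    """Residual = original block minus prediction block, element-wise."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]

def degrade_residual(residual, threshold):
    """Hypothetical degradation: zero out values with |v| < threshold.
    Fewer non-zero values usually means fewer bits after entropy coding."""
    return [[v if abs(v) >= threshold else 0 for v in row] for row in residual]

def nonzero_count(block):
    """Crude proxy for bit cost (an assumption, not the patent's cost model)."""
    return sum(v != 0 for row in block for v in row)

def reconstruct(prediction, residual):
    """Decoder side: prediction plus (possibly degraded) residual."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```

The trade-off is visible directly. Reconstructing with the degraded residual loses the small corrections that were zeroed. In exchange, fewer non-zero coefficients remain to encode.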

12.

Granted patent
Image upscaling (Status: In force)

Publication number: US09996902B2

Publication date: 2018-06-12

Application number: US15000670

Application date: 2016-01-19

Applicant: Google LLC

Inventors: Peyman Milanfar, Yaniv Romano

IPC: G06T3/40, G06K9/46, G06T5/00, G06T5/50

CPC classification number: G06T3/4053, G06K9/4604, G06T3/40, G06T3/403, G06T5/003, G06T5/50, G06T2207/20024, G06T2207/20081, G06T2207/20221, G06T2210/36

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for upscaling an image. One of the methods includes upscaling a low resolution image, creating first pixel subsets of the first upscaled image, creating second pixel subsets of a high resolution image, determining, for each subset in the pixel subsets, a value of a property of the pixel subset, determining, for each subset in the pixel subsets, a group of subsets to which the corresponding pixel subset belongs using the value of the property, and determining, for each of the groups of subsets, a filter to apply to each of the first pixel subsets that correspond to the pixel subsets in the group to create a final pixel subset that approximates the corresponding second pixel subset using the first pixel subset, a combination of all of the final pixel subsets representing a second upscaled image.
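The training loop in this abstract works as follows:

- Cheaply upscale the image.
- Pair each upscaled patch with the matching high-resolution pixel.
- Bucket patches by a computed property.
- Fit one filter per bucket so that filtering the upscaled patch approximates the high-resolution target.

The sketch below simplifies this heavily and is not the patented method:

- It works in 1-D with 3-tap patches.
- The property (flat vs. edge, taken from the local gradient) is a hypothetical choice.
- Each bucket's filter is fitted by ridge-regularized least squares.

All function names are illustrative.

```python
def _bucket(patch, threshold):
    """Hypothetical property: 0 = flat, 1 = edge, from the local gradient."""
    return 0 if abs(patch[2] - patch[0]) < threshold else 1

def _solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def learn_filters(upscaled, high_res, threshold, ridge=1e-6):
    """Fit one 3-tap filter per bucket: minimize sum (f . patch - target)^2."""
    stats = {}  # bucket -> (A^T A, A^T b), accumulated over all patches
    for i in range(1, len(upscaled) - 1):
        patch = upscaled[i - 1:i + 2]
        ata, atb = stats.setdefault(_bucket(patch, threshold),
                                    ([[0.0] * 3 for _ in range(3)], [0.0] * 3))
        for r in range(3):
            atb[r] += patch[r] * high_res[i]
            for c in range(3):
                ata[r][c] += patch[r] * patch[c]
    filters = {}
    for key, (ata, atb) in stats.items():
        # Small ridge term keeps the normal equations solvable for tiny buckets.
        reg = [[ata[r][c] + (ridge if r == c else 0.0) for c in range(3)]
               for r in range(3)]
        filters[key] = _solve(reg, atb)
    return filters

def apply_filters(upscaled, filters, threshold):
    """Filter each interior sample with its bucket's learned filter."""
    out = list(upscaled)  # border samples keep the cheap upscale
    for i in range(1, len(upscaled) - 1):
        patch = upscaled[i - 1:i + 2]
        f = filters.get(_bucket(patch, threshold))
        if f is not None:
            out[i] = sum(w * v for w, v in zip(f, patch))
    return out
```

Bucketing lets each filter specialize. A filter trained only on edge patches can sharpen without over-sharpening flat regions.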

13.

Patent application
Multi-scale Transformer for Image Analysis (Status: In force)

Publication number: US20250124537A1

Publication date: 2025-04-17

Application number: US18999336

Application date: 2024-12-23

Applicant: Google LLC

Inventors: Junjie Ke, Feng Yang, Qifei Wang, Yilin Wang, Peyman Milanfar

IPC: G06T3/04, G06T3/40, G06T7/00

Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids fixed input-size constraints on images and effectively predicts quality on a native-resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.
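The abstract says patch positions at every scale are hashed onto one fixed grid, and a separate scale index identifies where each patch came from. The sketch below shows one simple hashing rule, which is an assumption here: scale each patch's row and column index proportionally onto a `grid_size x grid_size` grid. Under this rule, patches at different resolutions that cover the same image region share a spatial-embedding slot. `hash_to_grid` and `token_positions` are hypothetical names. The embeddings themselves would be learned lookup tables indexed by these integers.

```python
def hash_to_grid(i, j, num_rows, num_cols, grid_size):
    """Map patch (i, j) of a num_rows x num_cols patch layout to a cell of a
    fixed grid_size x grid_size grid (proportional scaling, assumed rule)."""
    return (i * grid_size // num_rows, j * grid_size // num_cols)

def token_positions(scales, grid_size):
    """scales: list of (num_rows, num_cols) patch counts, one per scale.
    Returns (scale_index, spatial_index) per patch; the scale index would
    select the scale embedding, the spatial index the spatial embedding."""
    tokens = []
    for s, (rows, cols) in enumerate(scales):
        for i in range(rows):
            for j in range(cols):
                gi, gj = hash_to_grid(i, j, rows, cols, grid_size)
                tokens.append((s, gi * grid_size + gj))
    return tokens
```

Because the grid size is fixed, images with any aspect ratio or patch count reuse the same embedding table. This is what allows inputs at native resolution.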

14.

Published patent application
MULTILAYER LAPLACIAN RESIZER FOR COMPUTER VISION SYSTEMS (Status: Pending, published)

Publication number: US20240331091A1

Publication date: 2024-10-03

Application number: US18621434

Application date: 2024-03-29

Applicant: Google LLC

Inventors: Hossein Talebi, Zhengzhong Tu, Peyman Milanfar

IPC: G06T5/20, G06T5/50

CPC classification number: G06T5/20, G06T5/50, G06T2207/20024, G06T2207/20212

Abstract: The technology provides an image resizer that is jointly trainable with neural network classification (recognition) models, and is designed to improve classification performance. Systems and method include applying an input image to a baseline resizer to obtain a default resized image, and applying the input image to a plurality of filters. Each respective filter in the plurality is configured to perform sub-band filtering on the input image to obtain a sub-band filtered result. This includes applying the sub-band filtered result to the baseline resizer to obtain a respective resized result, and also includes applying to the respective resized result a scaling parameter, a bias parameter, and a nonlinear function to obtain a respective filtered image. The process then combines the default resized image and the respective filtered images to generate a combined resized image.
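The resizer in this abstract computes

`output = baseline(x) + sum_k act(scale_k * baseline(band_k(x)) + bias_k)`

where each `band_k` is a sub-band filter. The sketch below is a 1-D, non-trainable illustration under several assumptions:

- The baseline resizer is 2x downsampling by pair averaging.
- The sub-bands are differences of box blurs at increasing radii, a Laplacian-pyramid-style band-pass.
- The nonlinearity is `tanh`.

In the described system, the scale and bias parameters would be learned jointly with the classifier. All names are hypothetical.

```python
import math

def box_blur(x, radius):
    """Mean filter with a window of 2*radius+1, truncated at the borders."""
    if radius == 0:
        return list(x)
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def baseline_resize(x):
    """Hypothetical baseline resizer: 2x downsampling by averaging pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def laplacian_resize(x, radii, scales, biases, act=math.tanh):
    """radii: increasing blur radii; band k = blur(r[k-1]) - blur(r[k]).
    scales/biases: one per band (len(radii) - 1 entries)."""
    default = baseline_resize(x)
    combined = list(default)
    for k in range(1, len(radii)):
        band = [a - b for a, b in zip(box_blur(x, radii[k - 1]),
                                      box_blur(x, radii[k]))]
        resized = baseline_resize(band)
        filtered = [act(scales[k - 1] * v + biases[k - 1]) for v in resized]
        combined = [c + f for c, f in zip(combined, filtered)]
    return combined
```

With all scales and biases at zero, the output reduces to the baseline resize, since `tanh(0) = 0`. Training can then only add detail on top of a reasonable default.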

15.

Published patent application
METHODS, SYSTEMS, AND MEDIA FOR DETERMINING PERCEPTUAL QUALITY INDICATORS OF VIDEO CONTENT ITEMS (Status: Pending, published)

Publication number: US20230319327A1

Publication date: 2023-10-05

Application number: US18021636

Application date: 2022-06-08

Applicant: Google LLC

Inventors: Yilin Wang, Balineedu Adsumilli, Junjie Ke, Hossein Talebi, Joong Yim, Neil Birkbeck, Peyman Milanfar, Feng Yang

IPC: H04N21/234, H04N19/154, H04N21/466

CPC classification number: H04N21/23418, H04N19/154, H04N21/4668

Abstract: Methods, systems, and media for determining perceptual quality indicators of video content items are provided. In some embodiments, the method comprises: receiving a video content item; extracting a plurality of frames from the video content item; determining, using a first subnetwork of a deep neural network, a content quality indicator for each frame of the plurality of frames of the video content item; determining, using a second subnetwork of the deep neural network, a video distortion indicator for each frame of the plurality of frames of the video content item; determining, using a third subnetwork of the deep neural network, a compression sensitivity indicator for each frame of the plurality of frames of the video content item; generating a quality level for each frame of the plurality of frames of the video content item that concatenates the content quality indicator, the video distortion indicator, and the compression sensitivity indicator for that frame of the video content item; generating an overall quality level for the video content item by aggregating the quality level of each frame of the plurality of frames; and causing a video recommendation to be presented based on the overall quality level of the video content item.
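The last two steps of this abstract are concatenating three per-frame indicators and aggregating across frames. They can be sketched as follows, with these assumptions:

- The three subnetwork outputs are given as feature lists.
- A hypothetical linear head (`weights`) turns each concatenated feature vector into a per-frame scalar.
- The aggregation is a plain mean.

The abstract does not specify the head or the aggregation function, so both are illustrative choices.

```python
from statistics import mean

def frame_quality(content, distortion, compression, weights):
    """Concatenate the three indicators for one frame, then apply a
    hypothetical linear head to get a scalar quality level."""
    features = list(content) + list(distortion) + list(compression)
    return sum(w * f for w, f in zip(weights, features))

def overall_quality(per_frame, weights):
    """per_frame: iterable of (content, distortion, compression) tuples.
    Aggregates per-frame quality levels by averaging (assumed)."""
    return mean(frame_quality(c, d, k, weights) for c, d, k in per_frame)
```

A recommendation step could then compare the overall score against a threshold. That threshold would be another assumption not stated in the abstract.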

16.

Patent application
Super-Resolution Using Natural Handheld-Motion Applied to a User Device (Status: In force)

Publication number: US20210304359A1

Publication date: 2021-09-30

Application number: US17263814

Application date: 2019-08-06

Applicant: Google LLC

Inventors: Yi Hung Chen, Chia-Kai Liang, Bartlomiej Maciej Wronski, Peyman Milanfar, Ignacio Garcia Dorado

IPC: G06T3/40, G06T5/50, G06T7/33, G06N7/00

Abstract: The present disclosure describes systems and techniques for creating a super-resolution image (122) of a scene captured by a user device (102). Natural handheld motion (110) introduces, across multiple frames (204, 206, 208) of an image of a scene, sub-pixel offsets that enable the use of super-resolution computations (210) to form color planes (212, 214, 216), which are accumulated (218) and combined (220) to create a super-resolution image (122) of the scene.
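The key idea in this abstract is that hand shake shifts each frame by a sub-pixel amount. Several low-resolution frames therefore sample the scene at different positions, and those samples can be accumulated on a finer grid. The sketch below shows only the accumulate-and-normalize step, for a single 1-D color plane. It makes these assumptions:

- The per-frame offsets are already known. A real system would have to estimate them by aligning frames.
- Each sample is splatted to the nearest high-resolution bin.

`accumulate_frames` is a hypothetical name.

```python
def accumulate_frames(frames, offsets, factor):
    """frames: equal-length low-res sample lists; offsets: each frame's
    sub-pixel shift in low-res pixels; factor: upscaling factor.
    Returns the high-res plane, with None where no sample landed."""
    n_hr = len(frames[0]) * factor
    sums = [0.0] * n_hr
    weights = [0.0] * n_hr
    for frame, offset in zip(frames, offsets):
        for k, value in enumerate(frame):
            idx = round((k + offset) * factor)  # nearest high-res bin
            if 0 <= idx < n_hr:
                sums[idx] += value
                weights[idx] += 1.0
    # Normalize accumulated values by how many samples hit each bin.
    return [s / w if w > 0 else None for s, w in zip(sums, weights)]
```

Two frames offset by half a pixel fill every bin of a 2x grid. A single frame leaves gaps, which is why the handheld motion matters.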

17.

Patent application
Attribute Recognition with Image-Conditioned Prefix Language Modeling (Status: In force)

Publication number: US20250054322A1

Publication date: 2025-02-13

Application number: US18787616

Application date: 2024-07-29

Applicant: Google LLC

Inventors: Keren Ye, Yicheng Zhu, Junjie Ke, Jiahui Yu, Leonidas John Guibas, Peyman Milanfar, Feng Yang

IPC: G06V20/70, G06F40/279

Abstract: Systems and methods for attribute recognition can include obtaining an image and a text string. The text string can be processed with a language model to generate a set of candidate attributes based on sequence based prediction. The image and the candidate attributes can be processed with an image-text model to determine a likelihood that the respective candidate attribute is depicted in the image. The likelihood determination can then be utilized to determine a predicted attribute for the object of interest.
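The pipeline in this abstract has three steps: generate candidate attributes from text, score each candidate against the image, and pick the most likely one. It can be sketched with both models replaced by hypothetical callables. The softmax normalization of the scores into likelihoods is an assumption. The abstract only says that a likelihood is determined per candidate.

```python
import math

def predict_attribute(image, text, generate_candidates, score):
    """generate_candidates(text) -> attribute strings (stand-in for the
    language model); score(image, attribute) -> logit (stand-in for the
    image-text model). Both callables are hypothetical."""
    candidates = list(generate_candidates(text))
    logits = [score(image, c) for c in candidates]
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]  # stable softmax
    total = sum(exps)
    likelihoods = {c: e / total for c, e in zip(candidates, exps)}
    best = max(likelihoods, key=likelihoods.get)
    return best, likelihoods
```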

18.

Granted patent
Multi-scale transformer for image analysis (Status: In force)

Publication number: US12217382B2

Publication date: 2025-02-04

Application number: US18527528

Application date: 2023-12-04

Applicant: Google LLC

Inventors: Junjie Ke, Feng Yang, Qifei Wang, Yilin Wang, Peyman Milanfar

IPC: G06K9/00, G06T3/04, G06T3/40, G06T7/00

Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids fixed input-size constraints on images and effectively predicts quality on a native-resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.

19.

Patent application
Multi-Axis Vision Transformer (Status: In force)

Publication number: US20250022269A1

Publication date: 2025-01-16

Application number: US18902546

Application date: 2024-09-30

Applicant: Google LLC

Inventors: Yinxiao Li, Feng Yang, Peyman Milanfar, Han Zhang, Zhengzhong Tu, Hossein Talebi

IPC: G06V10/82, G06V10/77

Abstract: Provided is an efficient and scalable attention model that can be referred to as multi-axis attention. Example implementations can include two aspects: blocked local and dilated global attention. These design choices allow global-local spatial interactions on arbitrary input resolutions with only linear complexity. The present disclosure also presents a new architectural element by effectively blending the proposed multi-axis attention model with convolutions. In addition, the present disclosure proposes a simple hierarchical vision backbone, example implementations of which can be referred to as MaxViT, by simply repeating the basic building block over multiple stages. Notably, MaxViT is able to “see” globally throughout the entire network, even in earlier, high-resolution stages.
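The abstract names two ingredients: blocked local attention and dilated global attention. They differ only in how tokens are grouped before attention is run inside each group:

- **Block partitioning** gathers contiguous `p x p` windows.
- **Grid partitioning** gathers a `g x g` set of tokens spaced evenly across the whole map.

Because group size is fixed, cost grows linearly with the number of tokens. The sketch below shows only the two groupings on an `H x W` token map stored as nested lists, assuming `H` and `W` are divisible by `p` and `g`. The attention step itself is omitted. Function names are hypothetical.

```python
def block_partition(x, p):
    """Local groups: (H/p)*(W/p) non-overlapping p x p windows."""
    h, w = len(x), len(x[0])
    return [[x[bi * p + i][bj * p + j] for i in range(p) for j in range(p)]
            for bi in range(h // p) for bj in range(w // p)]

def grid_partition(x, g):
    """Dilated global groups: (H/g)*(W/g) groups of g*g tokens, each group
    taking one token per (H/g, W/g) stride across the entire map."""
    h, w = len(x), len(x[0])
    sh, sw = h // g, w // g
    return [[x[i * sh + oi][j * sw + oj] for i in range(g) for j in range(g)]
            for oi in range(sh) for oj in range(sw)]
```

In a 4x4 map, `block_partition(x, 2)` groups each 2x2 corner. `grid_partition(x, 2)` groups tokens two positions apart, so every group spans the whole image. Running both, one after the other, gives each layer local and global context.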

20.

Granted patent
Super-resolution using natural handheld-motion applied to a user device (Status: In force)

Publication number: US12182967B2

Publication date: 2024-12-31

Application number: US17263814

Application date: 2019-08-06

Applicant: Google LLC

Inventors: Yi Hung Chen, Chia-Kai Liang, Bartlomiej Maciej Wronski, Peyman Milanfar, Ignacio Garcia Dorado

IPC: G06T3/40, G06N7/01, G06T3/4053, G06T5/50, G06T7/33

Abstract: The present disclosure describes systems and techniques for creating a super-resolution image (122) of a scene captured by a user device (102). Natural handheld motion (110) introduces, across multiple frames (204, 206, 208) of an image of a scene, sub-pixel offsets that enable the use of super-resolution computations (210) to form color planes (212, 214, 216), which are accumulated (218) and combined (220) to create a super-resolution image (122) of the scene.
