Generating modified digital images utilizing a multimodal selection model based on verbal and gesture input

Invention Grant

US10817713B2 Generating modified digital images utilizing a multimodal selection model based on verbal and gesture input (In force)


Patent Title: Generating modified digital images utilizing a multimodal selection model based on verbal and gesture input
Application No.: US16192573

Application Date: 2018-11-15
Publication No.: US10817713B2

Publication Date: 2020-10-27
Inventors: Trung Bui, Zhe Lin, Walter Chang, Nham Le, Franck Dernoncourt
Applicant: Adobe Inc.
Applicant Address: US CA San Jose
Assignee: ADOBE INC.
Current Assignee: ADOBE INC.
Current Assignee Address: US CA San Jose
Agency: Keller Jolley Preece
Main IPC: G06K9/00
IPC: G06K9/00; G06N3/04; G10L15/26; G10L15/25


Abstract:

The present disclosure relates to systems, methods, and non-transitory computer readable media for generating modified digital images based on verbal and/or gesture input by utilizing a natural language processing neural network and one or more computer vision neural networks. The disclosed systems can receive verbal input together with gesture input. The disclosed systems can further utilize a natural language processing neural network to generate a verbal command based on verbal input. The disclosed systems can select a particular computer vision neural network based on the verbal input and/or the gesture input. The disclosed systems can apply the selected computer vision neural network to identify pixels within a digital image that correspond to an object indicated by the verbal input and/or gesture input. Utilizing the identified pixels, the disclosed systems can generate a modified digital image by performing one or more editing actions indicated by the verbal input and/or gesture input.
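The abstract describes a four-stage flow: parse the verbal input into a command, choose a selection model based on the verbal and/or gesture input, identify the pixels of the indicated object, and apply the requested edit to those pixels. The sketch below illustrates only that control flow and rests on stated assumptions. The patent's neural networks are replaced with toy stand-ins: a keyword lookup stands in for the natural language processing network, a flood fill from the clicked point stands in for a gesture-based vision model, and a lookup of labeled bounding boxes stands in for a text-based object selector. Every function name, the routing rule, and the grayscale-grid image format are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional

# Toy image format (assumption): a grayscale grid of 0-255 values.
Image = list[list[int]]
Mask = list[list[bool]]

ACTIONS = ("brighten", "darken", "remove")   # hypothetical edit vocabulary
DEICTIC = ("this", "that", "it", "here")     # words that defer to the gesture


@dataclass
class VerbalCommand:
    action: str            # editing action, e.g. "brighten"
    target: Optional[str]  # named object, or None if the user pointed instead


def parse_verbal_input(text: str) -> VerbalCommand:
    """Stand-in for the NLP network: keyword lookup, not a neural model."""
    words = text.lower().split()
    action = next((w for w in words if w in ACTIONS), "none")
    last = words[-1] if words else ""
    target = last if last and last not in DEICTIC and last != action else None
    return VerbalCommand(action, target)


def choose_selector(command: VerbalCommand, gesture: Optional[tuple[int, int]]) -> str:
    """Hypothetical routing rule that picks which 'vision model' to run."""
    if command.target is not None:
        return "text"      # a named object goes to the text-based selector
    if gesture is not None:
        return "gesture"   # deictic speech plus a click goes to the click-based selector
    raise ValueError("need a named object or a gesture")


def select_by_gesture(image: Image, point: tuple[int, int]) -> Mask:
    """Stand-in for a click-based model: 4-connected flood fill of equal pixels."""
    rows, cols = len(image), len(image[0])
    seed = image[point[0]][point[1]]
    mask = [[False] * cols for _ in range(rows)]
    stack = [point]
    while stack:
        r, c = stack.pop()
        if 0 <= r < rows and 0 <= c < cols and not mask[r][c] and image[r][c] == seed:
            mask[r][c] = True
            stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return mask


def select_by_text(image: Image, target: str,
                   boxes: dict[str, tuple[int, int, int, int]]) -> Mask:
    """Stand-in for a text-based detector: look up a (top, left, bottom, right) box."""
    rows, cols = len(image), len(image[0])
    if target not in boxes:
        return [[False] * cols for _ in range(rows)]
    top, left, bottom, right = boxes[target]  # bottom/right are exclusive
    return [[top <= r < bottom and left <= c < right for c in range(cols)]
            for r in range(rows)]


def apply_edit(image: Image, mask: Mask, action: str) -> Image:
    """Apply the editing action only to selected pixels; returns a new image."""
    def edit(v: int) -> int:
        if action == "brighten":
            return min(255, v + 50)
        if action == "darken":
            return max(0, v - 50)
        if action == "remove":
            return 0
        return v
    return [[edit(v) if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def modify_image(image: Image, verbal: str,
                 gesture: Optional[tuple[int, int]] = None,
                 boxes: Optional[dict[str, tuple[int, int, int, int]]] = None) -> Image:
    command = parse_verbal_input(verbal)
    if choose_selector(command, gesture) == "gesture":
        mask = select_by_gesture(image, gesture)
    else:
        mask = select_by_text(image, command.target, boxes or {})
    return apply_edit(image, mask, command.action)
```

For example, `modify_image([[10, 10, 200], [10, 10, 200], [90, 90, 90]], "remove this", gesture=(0, 2))` sets only the two connected 200-valued pixels to 0. In this sketch the routing rule lets an explicitly named object take precedence over a click. The patent may combine the two modalities differently.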

Published/Granted Literature

US20200160042A1 GENERATING MODIFIED DIGITAL IMAGES UTILIZING A MULTIMODAL SELECTION MODEL BASED ON VERBAL AND GESTURE INPUT (publication date: 2020-05-21)


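IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graphically reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)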