Collecting multimodal image editing requests

Invention Grant

US10769495B2 Collecting multimodal image editing requests (Active)


Patent Title: Collecting multimodal image editing requests
Application No.: US16052246

Application Date: 2018-08-01
Publication No.: US10769495B2

Publication Date: 2020-09-08
Inventor: Trung Huu Bui, Zhe Lin, Walter Wei-Tuh Chang, Nham Van Le, Franck Dernoncourt
Applicant: Adobe Inc.
Applicant Address: US CA San Jose
Assignee: Adobe Inc.
Current Assignee: Adobe Inc.
Current Assignee Address: US CA San Jose
Agency: SBMC
Main IPC: G06K9/62
IPC: G06K9/62; G06F3/16; G06F3/0488; G10L15/06; G06F9/451; G06F3/0482; G06F16/54; G06N3/08; G06N20/00; G06F3/0484

Abstract:

In implementations of collecting multimodal image editing requests (IERs), a user interface is generated that exposes an image pair consisting of a first image and a second image, where the second image includes at least one edit to the first image. A user simultaneously speaks a voice command and performs a user gesture that together describe the edit of the first image used to generate the second image. The user gesture and the voice command are recorded simultaneously and synchronized with timestamps. The voice command is played back, and the user transcribes their voice command based on the playback, creating an exact transcription of the voice command. Audio samples of the voice command with respective timestamps, coordinates of the user gesture with respective timestamps, and the transcription are packaged as a structured data object for use as training data to train a neural network to recognize multimodal IERs in an image editing application.
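The structured data object described in the abstract could be sketched roughly as follows. This is a minimal illustration only, not the format used in the patent: the class names, the field names (`image_pair_id`, `timestamp_ms`, `amplitude`), the millisecond time unit, the JSON serialization, and the sample values are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class AudioSample:
    timestamp_ms: int  # time since recording start (unit is an assumption)
    amplitude: float   # one audio sample value


@dataclass
class GesturePoint:
    timestamp_ms: int  # measured on the same clock as the audio samples
    x: float           # gesture coordinate on the first image
    y: float


@dataclass
class MultimodalIER:
    image_pair_id: str  # hypothetical identifier for the exposed image pair
    transcription: str  # the user's own transcription of the voice command
    audio: List[AudioSample] = field(default_factory=list)
    gesture: List[GesturePoint] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclass lists
        return json.dumps(asdict(self), sort_keys=True)


def gesture_during(ier: MultimodalIER, start_ms: int, end_ms: int) -> List[GesturePoint]:
    """Return gesture points whose timestamps fall within [start_ms, end_ms]."""
    return [p for p in ier.gesture if start_ms <= p.timestamp_ms <= end_ms]


# Illustrative values only; none of these come from the patent.
ier = MultimodalIER(
    image_pair_id="pair-0001",
    transcription="make this region brighter",
    audio=[AudioSample(0, 0.0), AudioSample(10, 0.12), AudioSample(20, -0.05)],
    gesture=[
        GesturePoint(5, 120.0, 80.0),
        GesturePoint(15, 130.0, 85.0),
        GesturePoint(40, 150.0, 90.0),
    ],
)
```

Because both streams carry timestamps from a shared clock, a helper such as `gesture_during` can recover which gesture coordinates were traced while a given stretch of the voice command was being spoken, which is the kind of alignment the synchronized recording makes possible.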

Published/Granted Literature

US20200042286A1 Collecting Multimodal Image Editing Requests Publication/Grant Date: 2020-02-06


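IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)
G06K9/62	. Methods or arrangements for recognition using electronic means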