Invention Grant
- Patent Title: Collecting multimodal image editing requests
- Application No.: US16052246
- Application Date: 2018-08-01
- Publication No.: US10769495B2
- Publication Date: 2020-09-08
- Inventor: Trung Huu Bui, Zhe Lin, Walter Wei-Tuh Chang, Nham Van Le, Franck Dernoncourt
- Applicant: Adobe Inc.
- Applicant Address: US CA San Jose
- Assignee: Adobe Inc.
- Current Assignee: Adobe Inc.
- Current Assignee Address: US CA San Jose
- Agency: SBMC
- Main IPC: G06K9/62
- IPC: G06K9/62; G06F3/16; G06F3/0488; G10L15/06; G06F9/451; G06F3/0482; G06F16/54; G06N3/08; G06N20/00; G06F3/0484

Abstract:
In implementations of collecting multimodal image editing requests (IERs), a user interface is generated that exposes an image pair including a first image and a second image, the second image including at least one edit to the first image. A user simultaneously speaks a voice command and performs a user gesture that describe an edit of the first image used to generate the second image. The user gesture and the voice command are simultaneously recorded and synchronized with timestamps. The voice command is played back, and the user transcribes their voice command based on the playback, creating an exact transcription of their voice command. Audio samples of the voice command with respective timestamps, coordinates of the user gesture with respective timestamps, and a transcription are packaged as a structured data object for use as training data to train a neural network to recognize multimodal IERs in an image editing application.
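
The packaging step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify a schema, so every name here (`AudioSample`, `GesturePoint`, `MultimodalIER`, `gesture_during`, the millisecond timestamps, the image identifiers) is a hypothetical assumption chosen for illustration, not the format used in the patent.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class AudioSample:
    # Hypothetical: one audio sample of the voice command and its timestamp.
    timestamp_ms: int
    amplitude: float


@dataclass
class GesturePoint:
    # Hypothetical: one coordinate of the user gesture and its timestamp.
    timestamp_ms: int
    x: float
    y: float


@dataclass
class MultimodalIER:
    # Hypothetical container for one collected image editing request.
    first_image_id: str
    second_image_id: str
    transcription: str
    audio: List[AudioSample] = field(default_factory=list)
    gesture: List[GesturePoint] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize the whole request as one structured data object.
        return json.dumps(asdict(self), sort_keys=True)


def gesture_during(ier: MultimodalIER, start_ms: int, end_ms: int) -> List[GesturePoint]:
    # Shared timestamps let gesture points be aligned with a span of speech.
    return [p for p in ier.gesture if start_ms <= p.timestamp_ms <= end_ms]


def build_example() -> MultimodalIER:
    # Made-up values, used only to show the shape of the object.
    return MultimodalIER(
        first_image_id="original.png",
        second_image_id="edited.png",
        transcription="make this brighter",
        audio=[AudioSample(0, 0.0), AudioSample(10, 0.25), AudioSample(20, -0.5)],
        gesture=[GesturePoint(5, 10.0, 12.0), GesturePoint(15, 11.0, 13.0),
                 GesturePoint(40, 30.0, 31.0)],
    )


if __name__ == "__main__":
    print(build_example().to_json())
```

Keeping both streams on a single clock is what makes the synchronization in the abstract usable downstream: a helper such as `gesture_during` can look up which gesture coordinates were drawn while a given part of the voice command was spoken.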
Public/Granted literature
- US20200042286A1 Collecting Multimodal Image Editing Requests (Public/Granted day: 2020-02-06)