Speaker thumbnail selection and speaker visualization in diarized transcripts for text-based video

Invention Grant

US12300272B2 Speaker thumbnail selection and speaker visualization in diarized transcripts for text-based video (status: in force)

Patent Title: Speaker thumbnail selection and speaker visualization in diarized transcripts for text-based video
Application No.: US17967697

Application Date: 2022-10-17
Publication No.: US12300272B2

Publication Date: 2025-05-13
Inventors: Lubomira Assenova Dontcheva, Xue Bai, Aseem Omprakash Agarwala, Joel Richard Brandt
Applicant: Adobe Inc.
Applicant Address: US CA San Jose
Assignee: Adobe Inc.
Current Assignee: Adobe Inc.
Current Assignee Address: US CA San Jose
Agency: Shook, Hardy & Bacon L.L.P.
Main IPC: G11B27/02
IPC: G11B27/02; G06V20/40; G06V40/16

Abstract:

Embodiments of the present invention provide systems, methods, and computer storage media for selecting the best image of a particular speaker's face in a video and visualizing it in a diarized transcript. In an example embodiment, candidate images of a face of a detected speaker are extracted from frames of a video identified by a detected face track for the face, and a representative image of the detected speaker's face is selected from the candidate images based on image quality, facial emotion (e.g., using an emotion classifier that generates a happiness score), a size factor (e.g., favoring larger images), and/or a penalty on images that appear towards the beginning or end of a face track. As such, each segment of the transcript is presented with the representative image of the speaker who spoke that segment, and/or input is accepted that changes the representative image associated with each speaker.
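
The selection and visualization steps described in the abstract can be illustrated with a minimal Python sketch. The abstract names the scoring factors but not how they are combined, so the additive weighted score, the 10% edge margin used for the face-track penalty, the `Candidate` fields, and every function name below are hypothetical. The quality and happiness scores are assumed to be produced upstream (e.g., by an image-quality model and an emotion classifier) and normalized to [0, 1].

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    """One candidate face crop from a frame covered by a detected face track (hypothetical fields)."""
    frame_index: int   # frame the crop was taken from
    track_start: int   # first frame of the face track
    track_end: int     # last frame of the face track (inclusive)
    quality: float     # image-quality score in [0, 1], assumed to come from an upstream model
    happiness: float   # happiness score in [0, 1] from an assumed emotion classifier
    face_area: float   # area of the face crop in pixels


def edge_penalty(c: Candidate, margin: float = 0.1) -> float:
    """Return 1.0 if the frame lies in the first or last `margin` fraction of its track, else 0.0."""
    span = max(c.track_end - c.track_start, 1)
    position = (c.frame_index - c.track_start) / span  # 0.0 at track start, 1.0 at track end
    return 1.0 if position < margin or position > 1.0 - margin else 0.0


def score(c: Candidate, max_area: float,
          w_quality: float = 1.0, w_happy: float = 1.0,
          w_size: float = 1.0, w_edge: float = 1.0) -> float:
    """Weighted sum of the abstract's factors; the weights and the additive form are assumptions."""
    size_factor = c.face_area / max_area if max_area > 0 else 0.0  # favor larger faces
    return (w_quality * c.quality
            + w_happy * c.happiness
            + w_size * size_factor
            - w_edge * edge_penalty(c))


def select_representative(candidates: List[Candidate]) -> Candidate:
    """Pick the highest-scoring candidate; assumes a non-empty list."""
    max_area = max(c.face_area for c in candidates)
    return max(candidates, key=lambda c: score(c, max_area))


def speaker_thumbnails(candidates_by_speaker: Dict[str, List[Candidate]],
                       overrides: Optional[Dict[str, Candidate]] = None) -> Dict[str, Candidate]:
    """Choose one image per speaker; user-selected overrides replace the automatic choice."""
    thumbs = {spk: select_representative(cs) for spk, cs in candidates_by_speaker.items()}
    thumbs.update(overrides or {})
    return thumbs


def render_transcript(segments: List[Tuple[str, str]],
                      thumbs: Dict[str, Candidate]) -> List[str]:
    """Prefix each (speaker, text) segment with a placeholder for that speaker's thumbnail."""
    lines = []
    for speaker, text in segments:
        thumb = thumbs.get(speaker)
        label = f"[frame {thumb.frame_index}]" if thumb else "[no image]"
        lines.append(f"{label} {speaker}: {text}")
    return lines


if __name__ == "__main__":
    cands = [
        Candidate(0, 0, 100, quality=0.9, happiness=0.9, face_area=1000),   # sharp and smiling, but at track start
        Candidate(50, 0, 100, quality=0.8, happiness=0.7, face_area=900),   # mid-track, smiling
        Candidate(60, 0, 100, quality=0.8, happiness=0.2, face_area=1000),  # mid-track, neutral expression
    ]
    thumbs = speaker_thumbnails({"Speaker 1": cands})
    # prints "[frame 50] Speaker 1: Welcome back." then "[no image] Speaker 2: Thanks."
    for line in render_transcript([("Speaker 1", "Welcome back."), ("Speaker 2", "Thanks.")], thumbs):
        print(line)
```

In this sketch the frame at the very start of the track loses despite its higher quality and happiness scores because of the edge penalty, which reflects the abstract's stated preference against images near the beginning or end of a face track. Passing an entry in `overrides` models the case where input changes the representative image associated with a speaker.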

Published and granted documents

US20240127855A1 Speaker thumbnail selection and speaker visualization in diarized transcripts for text-based video (publication date: 2024-04-18)

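IPC classification:

G	Physics
G11	Information storage
G11B	Information storage based on relative movement between record carrier and transducer (recording measured values in a way that does not require playback through a transducer: G01D9/00; recording or playback apparatus using mechanically marked tape, e.g. punched paper tape, or using unit records, e.g. punched or magnetically marked cards: G06K; transferring data from one type of record carrier to another: G06K1/18; circuits for coupling the output of a reproducer to a radio receiver: H04B1/20; gramophone pick-ups or similar acoustic electromechanical transducers, or circuits therefor: H04R)
G11B27/00	Editing; indexing; addressing; timing or synchronising; monitoring; measuring tape travel
G11B27/02	.Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers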