Method and Device for Processing Speech by Distinguishing Speakers

Invention Application

WO2023027308A1 Method and Device for Processing Speech by Distinguishing Speakers (Status: under examination, published)

Patent Title: METHOD AND DEVICE FOR PROCESSING SPEECH BY DISTINGUISHING SPEAKERS
Patent Title (original Korean): 발화 대상을 구분하여 음성 처리하는 방법 및 장치
Application No.: PCT/KR2022/008593

Application Date: 2022-06-17
Publication No.: WO2023027308A1

Publication Date: 2023-03-02
Inventor: 박민정, 김철귀, 유주영, 조남민
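Applicant: Samsung Electronics Co., Ltd.
Applicant Address: 129, Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16677
Assignee: Samsung Electronics Co., Ltd.
Current Assignee: Samsung Electronics Co., Ltd.
Current Assignee Address: 129, Samsung-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do 16677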
Agency: 윤앤리특허법인(유한)
Priority: KR10-2021-0113794 2021-08-27
Main IPC: G10L15/28
IPC: G10L15/28; G10L15/04; G10L15/25; H04N7/18; H04R1/40; G06F3/01; G10L15/06; G10L15/22

Abstract:

Various embodiments of the present invention disclose a method and an electronic device. The electronic device comprises: a plurality of cameras arranged at different locations; a plurality of microphones arranged at different locations; a memory; and a processor operatively connected to at least one of the plurality of cameras, the plurality of microphones, and the memory. The processor is configured to: determine, by using at least one of the plurality of cameras, whether at least one of a user wearing the electronic device or a counterpart conversing with the user is speaking; set the directivity of at least one of the plurality of microphones on the basis of the determination result; acquire audio from at least one of the plurality of microphones on the basis of the set directivity; acquire, from at least one of the plurality of cameras, an image containing the mouth shape of the user or the counterpart; and, on the basis of the acquired audio and image, process the speech of whoever is speaking in a different manner for each speaker. Various embodiments are possible.
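
The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the abstract does not say how speaking is detected, which directivity patterns exist, or what the different processing methods are. The boolean mouth-movement inputs, the directivity labels, and the processing-path names below are all hypothetical placeholders chosen only to show the decision structure (camera-based speaker determination, then microphone directivity, then speaker-dependent processing).

```python
from dataclasses import dataclass
from enum import Enum


class Speaker(Enum):
    NONE = "none"
    USER = "user"
    PARTNER = "partner"
    BOTH = "both"


def detect_speaker(user_mouth_moving: bool, partner_mouth_moving: bool) -> Speaker:
    """Classify who is speaking from camera-derived flags.

    Assumption: each flag comes from a different camera (one facing the
    wearer's mouth, one facing the counterpart). The abstract only says
    that cameras are used to determine whether someone is speaking.
    """
    if user_mouth_moving and partner_mouth_moving:
        return Speaker.BOTH
    if user_mouth_moving:
        return Speaker.USER
    if partner_mouth_moving:
        return Speaker.PARTNER
    return Speaker.NONE


# Hypothetical directivity labels; a real device would configure its
# microphone array rather than return a string.
DIRECTIVITY = {
    Speaker.USER: "toward_user",
    Speaker.PARTNER: "toward_partner",
    Speaker.BOTH: "toward_user_and_partner",
    Speaker.NONE: "omnidirectional",
}

# Hypothetical processing paths; the abstract states only that speech is
# processed differently depending on who is speaking.
PROCESSING = {
    Speaker.USER: "user_speech_path",
    Speaker.PARTNER: "partner_speech_path",
    Speaker.BOTH: "separate_then_both_paths",
    Speaker.NONE: "idle",
}


@dataclass(frozen=True)
class Plan:
    speaker: Speaker
    directivity: str
    processing: str


def plan_for_frame(user_mouth_moving: bool, partner_mouth_moving: bool) -> Plan:
    """Choose microphone directivity and processing path for one frame."""
    speaker = detect_speaker(user_mouth_moving, partner_mouth_moving)
    return Plan(speaker, DIRECTIVITY[speaker], PROCESSING[speaker])


if __name__ == "__main__":
    print(plan_for_frame(user_mouth_moving=False, partner_mouth_moving=True))
```

In this sketch the audio capture and the mouth-shape image analysis are left out entirely; only the branching that links the camera-based determination to the microphone setting and the processing path is shown.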

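IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/28	. Constructional details of speech recognition systems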