-
Publication No.: US20240357231A1
Publication Date: 2024-10-24
Application No.: US18252907
Filing Date: 2022-09-06
Inventor(s): Zhenyi LIU, Jianyong XUAN, Guozhi CAO
IPC Classes: H04N23/667, G10L15/08, G10L15/22, G10L21/028, G10L25/18, G10L25/21, G10L25/78, H04N23/63, H04N23/90, H04R1/02, H04R1/40, H04R3/00
CPC Classes: H04N23/667, G10L15/08, G10L15/22, G10L21/028, G10L25/18, G10L25/21, G10L25/78, H04N23/632, H04N23/90, H04R1/028, H04R1/406, H04R3/005, G10L2015/088, H04R2499/11
Abstract: A video processing method and an electronic device are provided. The method includes: running a camera application program on the electronic device; displaying a first image, where the first image is captured when the electronic device is in a first shooting mode; obtaining audio data captured by at least two pickup apparatuses of the electronic device; obtaining a switching instruction based on the audio data, where the switching instruction instructs the electronic device to switch from the first shooting mode to a second shooting mode; and displaying a second image, where the second image is captured when the electronic device is in the second shooting mode. With this technical solution, video recording can be completed without requiring the user to manually switch the shooting modes of the electronic device, improving the user's shooting experience.
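A minimal Python sketch of the kind of audio-driven mode switching the abstract describes is shown below. The helper names (estimate_dominant_direction, record_step) and the louder-microphone rule are illustrative assumptions, not details taken from the patent.

```python
# Sketch only: derive a shooting-mode switching instruction from multi-mic audio.
from dataclasses import dataclass

@dataclass
class Frame:
    mode: str          # e.g. "front" or "rear" shooting mode
    pixels: bytes

def estimate_dominant_direction(mic_channels):
    """Toy proxy for a switching decision: pick the louder microphone."""
    energies = [sum(abs(s) for s in ch) for ch in mic_channels]
    return "front" if energies[0] >= energies[1] else "rear"

def record_step(current_mode, mic_channels, capture):
    """Capture a frame in the current mode, then decide the next mode from audio."""
    frame = capture(current_mode)                        # first/second image
    target = estimate_dominant_direction(mic_channels)   # switching instruction
    return frame, target

if __name__ == "__main__":
    fake_capture = lambda mode: Frame(mode, b"")
    mics = [[0.1, 0.2, 0.1], [0.8, 0.9, 0.7]]            # rear microphone is louder
    _, next_mode = record_step("front", mics, fake_capture)
    print("next shooting mode:", next_mode)              # -> rear
```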
-
Publication No.: US20240355321A1
Publication Date: 2024-10-24
Application No.: US18463788
Filing Date: 2023-09-08
Inventor(s): Eui Hyeok LEE, Hyung Sik Gim, Han Woong Choi, Sung Woo Moon
CPC Classes: G10L15/063, G10L15/02, G10L15/08, G10L15/22, G10L25/78, G10L25/90, G10L2015/025, G10L2015/088
Abstract: The present disclosure relates to a device and method for generating call word learning data. The call word learning data generation device includes a processor and storage. The storage stores utterance data and an utterance phrase corresponding to the utterance data. The processor is configured to decompose the utterance data into phoneme units based on the utterance data and the utterance phrase, to receive a call word through a user input, to decompose the received call word into phoneme units, to compare phoneme data of the call word with phoneme data of the utterance data, and to generate call word learning data by combining the phoneme data matched in the comparison.
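The phoneme-matching step lends itself to a short sketch. The toy lexicon and the set-overlap matching rule below are assumptions for illustration; the patent abstract does not specify how phoneme data are compared.

```python
# Sketch only: build call-word learning data from matched phoneme units.
def to_phonemes(text, lexicon):
    """Decompose a word into phoneme units via a toy lexicon."""
    return lexicon.get(text, list(text))

def generate_call_word_data(utterances, call_word, lexicon):
    """Collect utterance segments whose phonemes overlap the call word's phonemes."""
    target = set(to_phonemes(call_word, lexicon))
    samples = []
    for audio, phrase in utterances:
        phones = to_phonemes(phrase, lexicon)
        matched = [p for p in phones if p in target]
        if matched:
            samples.append({"audio": audio, "phonemes": matched})
    return samples

if __name__ == "__main__":
    lexicon = {"hello": ["HH", "EH", "L", "OW"], "low": ["L", "OW"]}
    data = generate_call_word_data([("utt1.wav", "hello")], "low", lexicon)
    print(data)   # the utterance phonemes L, OW match the call word
```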
-
Publication No.: US12125482B2
Publication Date: 2024-10-22
Application No.: US16692150
Filing Date: 2019-11-22
Applicant: INTEL CORPORATION
CPC Classes: G10L15/22, G06F16/638, G10L15/02, G10L15/16, G10L17/00, G10L25/78, G10L2015/027, G10L2015/088, G10L2015/223
Abstract: An example apparatus for recognizing speech includes an audio receiver to receive a stream of audio. The apparatus also includes a key phrase detector to detect a key phrase in the stream of audio. The apparatus further includes a model adapter to dynamically adapt a model based on the detected key phrase. The apparatus also includes a query recognizer to detect a voice query following the key phrase in the stream of audio via the adapted model.
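As a rough illustration of the pipeline shape (key phrase detection, model adaptation, then query recognition), the sketch below represents audio as plain text for brevity. The class and method names are assumptions, not Intel's API.

```python
# Sketch only: detect a key phrase, adapt the recognizer, recognize the query.
class KeyPhraseDetector:
    def __init__(self, key_phrases):
        self.key_phrases = key_phrases

    def detect(self, audio_text):
        return next((k for k in self.key_phrases if audio_text.startswith(k)), None)

class QueryRecognizer:
    def __init__(self):
        self.bias = None

    def adapt(self, key_phrase):
        """Dynamically adapt the model based on the detected key phrase."""
        self.bias = key_phrase

    def recognize(self, audio_text):
        return {"query": audio_text, "adapted_to": self.bias}

def process(stream_text):
    detector = KeyPhraseDetector(["hey assistant", "play music"])
    recognizer = QueryRecognizer()
    key = detector.detect(stream_text)
    if key:
        recognizer.adapt(key)
        return recognizer.recognize(stream_text[len(key):].strip())
    return None

print(process("hey assistant what is the weather"))
```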
-
Publication No.: US12119019B2
Publication Date: 2024-10-15
Application No.: US17578217
Filing Date: 2022-01-18
Applicant: Google LLC
Inventor(s): Julian Maclaren, Karolis Misiunas, Vahe Tshitoyan, Brian Foo, Kelly Dobson
CPC Classes: G10L25/51, A61B5/02416, G10L15/22, G10L21/028, G10L25/78, H04R1/04, H04R1/406, H04R3/005, A61B2503/12
Abstract: Various systems, devices, and methods for social interaction measurement that preserve privacy are presented. An audio signal can be captured using a microphone. The audio signal can be processed using an audio-based machine learning model that is trained to detect the presence of speech. The audio signal can be discarded such that content of the audio signal is not stored after the audio signal is processed using the machine learning model. An indication of whether speech is present within the audio signal can be output based at least in part on processing the audio signal using the audio-based machine learning model.
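A compact sketch of the privacy-preserving flow follows: run a speech-presence model over a captured buffer, output only a boolean indication, and discard the audio. The energy-threshold detector is a stand-in assumption for the trained machine learning model mentioned in the abstract.

```python
# Sketch only: emit a speech-presence indication without retaining the audio.
def speech_present(samples, threshold=0.01):
    """Toy detector: mean absolute amplitude above a threshold."""
    return (sum(abs(s) for s in samples) / max(len(samples), 1)) > threshold

def process_and_discard(capture_fn):
    samples = capture_fn()          # capture the audio signal via a microphone
    present = speech_present(samples)
    del samples                     # content is not stored after processing
    return present                  # only the indication is output

if __name__ == "__main__":
    print(process_and_discard(lambda: [0.0, 0.2, -0.3, 0.1]))  # True
```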
-
Publication No.: US12112744B2
Publication Date: 2024-10-08
Application No.: US17684958
Filing Date: 2022-03-02
Applicant: Zhejiang University
Inventor(s): Feng Lin, Tiantian Liu, Ming Gao, Chao Wang, Zhongjie Ba, Jinsong Han, Wenyao Xu, Kui Ren
IPC Classes: G10L15/20, G01S13/88, G10L15/06, G10L15/18, G10L15/22, G10L15/28, G10L25/18, G10L25/78
CPC Classes: G10L15/20, G01S13/88, G10L15/063, G10L15/1815, G10L15/22, G10L15/28, G10L25/18, G10L25/78
Abstract: The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and second logarithmic mel-frequency spectral coefficients into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration between the target audio signal and the target millimeter-wave signal, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure enables high-accuracy speech recognition.
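The mutual-calibration-then-fusion idea can be sketched in a few lines of numpy. The sigmoid gating and simple concatenation below are assumptions; the patent's calibration and mapping modules are learned networks whose exact form the abstract does not give.

```python
# Sketch only: two feature streams cross-calibrate each other, then are fused.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def calibrate(feat_mmwave, feat_audio):
    """Mutually re-weight each stream using a gate driven by the other."""
    gate_mm = sigmoid(feat_audio)       # audio gates the millimeter-wave features
    gate_audio = sigmoid(feat_mmwave)   # millimeter-wave gates the audio features
    return feat_mmwave * gate_mm, feat_audio * gate_audio

def fuse(feat_mmwave, feat_audio):
    cal_mm, cal_audio = calibrate(feat_mmwave, feat_audio)
    return np.concatenate([cal_mm, cal_audio], axis=-1)   # mapping / fusion step

if __name__ == "__main__":
    mm = np.random.randn(40)    # stand-in log mel-frequency coefficients (mmWave)
    au = np.random.randn(40)    # stand-in log mel-frequency coefficients (audio)
    print(fuse(mm, au).shape)   # (80,) target fusion feature
```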
-
Publication No.: US20240331690A1
Publication Date: 2024-10-03
Application No.: US18194924
Filing Date: 2023-04-03
Inventor(s): Maria Battle-Miller, Christopher Day, Rima Shah
Abstract: Methods, systems, and apparatus are described herein for enhanced conferencing. A computing device may monitor user participation. One or more conference features may be activated or deactivated based on speech patterns of conference participants.
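A very rough sketch of toggling a conference feature from one simple speech pattern (talk-time share) is given below; the threshold policy and the feature names are illustrative assumptions only.

```python
# Sketch only: activate or deactivate conference features from participation data.
def update_features(talk_seconds_by_user, total_seconds, features):
    for user, seconds in talk_seconds_by_user.items():
        share = seconds / max(total_seconds, 1)
        # Example policy: prompt quiet participants, do nothing otherwise.
        features[user] = "raise_hand_prompt" if share < 0.05 else "none"
    return features

print(update_features({"alice": 120, "bob": 2}, 150, {}))
```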
-
Publication No.: US12100406B2
Publication Date: 2024-09-24
Application No.: US18344445
Filing Date: 2023-06-29
Inventor(s): Zhe Wang
CPC Classes: G10L19/012, G10L19/0204, G10L19/22, G10L19/265, G10L25/21, G10L25/78, G10L19/18
Abstract: A method for processing an audio signal includes receiving a bitstream corresponding to the audio signal; obtaining a silence insertion descriptor (SID) type of a current frame of the audio signal by decoding the bitstream; obtaining a low-band parameter of the current frame by decoding the bitstream; obtaining a low-band signal of the current frame based on the low-band parameter; obtaining, based on the SID type of the current frame, a high-band parameter of the current frame; obtaining a high-band signal of the current frame based on the high-band parameter; and obtaining a synthesis signal of the current frame based on the low-band signal and the high-band signal.
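The decode flow reads naturally as a skeleton, sketched below with placeholder arithmetic standing in for the actual parameter decoding; only the ordering of steps mirrors the abstract, and the SID type names are assumptions.

```python
# Sketch only: low-band + SID-dependent high-band decoding, then synthesis.
def decode_sid_frame(bitstream):
    sid_type = bitstream["sid_type"]              # SID type of the current frame
    low_params = bitstream["low_band_params"]     # decoded low-band parameters
    low_band = [p * 0.5 for p in low_params]      # low-band signal (placeholder math)
    # High-band parameter selection depends on the SID type.
    high_params = (bitstream["high_band_params"] if sid_type == "SID_UPDATE"
                   else bitstream.get("prev_high_band_params", [0.0]))
    high_band = [p * 0.25 for p in high_params]   # high-band signal (placeholder math)
    n = max(len(low_band), len(high_band))
    low_band += [0.0] * (n - len(low_band))
    high_band += [0.0] * (n - len(high_band))
    return [l + h for l, h in zip(low_band, high_band)]   # synthesis signal

print(decode_sid_frame({"sid_type": "SID_UPDATE",
                        "low_band_params": [1.0, 2.0],
                        "high_band_params": [0.4, 0.8]}))
```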
-
Publication No.: US12094487B2
Publication Date: 2024-09-17
Application No.: US17480740
Filing Date: 2021-09-21
Abstract: An audio system for spatializing virtual sound sources is described. A microphone array of the audio system is configured to monitor sound in a local area. A controller of the audio system identifies sound sources within the local area using the monitored sound from the microphone array and determines their locations. The controller of the audio system generates a target position for a virtual sound source based on one or more constraints. The one or more constraints include that the target position be at least a threshold distance away from each of the determined locations of the identified sound sources. The controller generates one or more sound filters based in part on the target position to spatialize the virtual sound source. A transducer array of the audio system presents spatialized audio including the virtual sound source content based in part on the one or more sound filters.
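The placement constraint (a target position at least a threshold distance from every detected source) can be sketched with a simple candidate search. The 2-D geometry and grid search below are simplifying assumptions; the actual system works in 3-D and applies sound filters to spatialize the source.

```python
# Sketch only: choose a virtual-source position far enough from all real sources.
import math

def far_enough(candidate, sources, threshold):
    return all(math.dist(candidate, s) >= threshold for s in sources)

def pick_target_position(sources, threshold, candidates):
    for c in candidates:
        if far_enough(c, sources, threshold):
            return c
    return None   # no candidate satisfies the constraint

sources = [(0.0, 1.0), (2.0, 0.0)]                 # locations of identified sound sources
grid = [(x * 0.5, y * 0.5) for x in range(-6, 7) for y in range(-6, 7)]
print(pick_target_position(sources, 1.5, grid))    # first position meeting the constraint
```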
-
Publication No.: US12094463B1
Publication Date: 2024-09-17
Application No.: US17540623
Filing Date: 2021-12-02
Inventor(s): Robert John Mars
CPC Classes: G10L15/22, G06F3/017, G10L15/08, G10L25/78, G10L2015/088, G10L2015/223, G10L15/30
Abstract: A speech-processing system may provide access to one or more virtual assistants via an audio-controlled device. A virtual assistant may be invoked by speaking a wakeword. In some cases, a default virtual assistant may be invoked when an utterance is spoken without a preceding wakeword. Such a multi-assistant speech-processing system may make an early determination that a received utterance does not include a wakeword, and begin processing the utterance prior to completion of the utterance, thereby reducing user-perceived latency. For example, the system may start a timer when a gesture and/or voice activity is detected. If no wakeword is detected within a time corresponding to the speaking durations of known wakewords, the system may determine that the utterance is to be processed according to the default virtual assistant.
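The early-decision timer can be sketched as follows. The 1.2-second maximum wakeword duration and the event representation are assumptions for illustration.

```python
# Sketch only: route an utterance early when no wakeword arrives in time.
MAX_WAKEWORD_SECONDS = 1.2   # assumed longest speaking duration of known wakewords

def route_utterance(events):
    """events: list of (timestamp_seconds, kind) where kind is
    'activity_start', 'wakeword:<name>', or 'utterance_end'."""
    start = None
    for t, kind in events:
        if kind == "activity_start":
            start = t                                  # gesture or voice activity detected
        elif kind.startswith("wakeword:") and start is not None:
            return kind.split(":", 1)[1]               # named assistant invoked
        elif start is not None and t - start > MAX_WAKEWORD_SECONDS:
            return "default_assistant"                 # early decision, lower latency
    return "default_assistant"

print(route_utterance([(0.0, "activity_start"), (2.0, "utterance_end")]))
```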
-
Publication No.: US20240304170A1
Publication Date: 2024-09-12
Application No.: US18180267
Filing Date: 2023-03-08
Inventor(s): Aaron K. Baughman, Shikhar Kwatra, Jeremy R. Fox, Jagadesh Ramaswamy Hulugundi, Raghuveer Prasad Nagar, Sarbajit K. Rakshit
IPC Classes: G10K11/178, G10L21/034, G10L25/78
CPC Classes: G10K11/17823, G10K11/17827, G10K11/17873, G10L21/034, G10L25/78, G10K2210/108, G10K2210/30231, G10K2210/3027, G10K2210/3044, G10L2025/783
Abstract: A computer-implemented method dynamically mutes irrelevant sources of noise. The method includes identifying one or more sources of noise in the vicinity of a listening device, where the listening device is associated with a user and includes a noise-canceling function. The method also includes determining a context for the user, where the context represents a subject of a conversation related to the user. The method further includes calculating a relevance score for each of the one or more sources of noise. The method includes muting, by the listening device, each source of noise whose relevance score is below a relevance threshold.
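The relevance-scored muting step is easy to sketch; the keyword-overlap score below is an illustrative assumption, since the abstract leaves the scoring model unspecified.

```python
# Sketch only: score noise sources against the conversation context and mute the low scorers.
def relevance_score(source_keywords, context_keywords):
    overlap = set(source_keywords) & set(context_keywords)
    return len(overlap) / max(len(context_keywords), 1)

def mute_irrelevant(sources, context_keywords, threshold=0.2):
    muted = []
    for name, keywords in sources.items():
        if relevance_score(keywords, context_keywords) < threshold:
            muted.append(name)      # listening device cancels this source
    return muted

sources = {"tv": ["news", "weather"], "colleague": ["deadline", "report"]}
print(mute_irrelevant(sources, ["report", "deadline", "meeting"]))  # ['tv']
```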