-
Publication No.: US20240357231A1
Publication Date: 2024-10-24
Application No.: US18252907
Filing Date: 2022-09-06
Inventor(s): Zhenyi LIU, Jianyong XUAN, Guozhi CAO
IPC Classes: H04N23/667, G10L15/08, G10L15/22, G10L21/028, G10L25/18, G10L25/21, G10L25/78, H04N23/63, H04N23/90, H04R1/02, H04R1/40, H04R3/00
CPC Classes: H04N23/667, G10L15/08, G10L15/22, G10L21/028, G10L25/18, G10L25/21, G10L25/78, H04N23/632, H04N23/90, H04R1/028, H04R1/406, H04R3/005, G10L2015/088, H04R2499/11
Abstract: A video processing method and an electronic device are provided. The method includes: running a camera application program on the electronic device; displaying a first image, where the first image is captured when the electronic device is in a first shooting mode; obtaining audio data captured by at least two pickup apparatuses of the electronic device; obtaining a switching instruction based on the audio data, where the switching instruction instructs the electronic device to switch from the first shooting mode to a second shooting mode; and displaying a second image, where the second image is captured when the electronic device is in the second shooting mode. With this technical solution, video recording can be completed without requiring the user to manually switch the shooting modes of the electronic device, improving the user's shooting experience.
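A minimal Python sketch of the kind of audio-driven mode switching the abstract describes is shown below. The helper names (estimate_dominant_direction, record_step) and the louder-microphone rule are illustrative assumptions, not details taken from the patent.

```python
# Sketch only: derive a shooting-mode switching instruction from multi-mic audio.
from dataclasses import dataclass

@dataclass
class Frame:
    mode: str          # e.g. "front" or "rear" shooting mode
    pixels: bytes

def estimate_dominant_direction(mic_channels):
    """Toy proxy for a switching decision: pick the louder microphone."""
    energies = [sum(abs(s) for s in ch) for ch in mic_channels]
    return "front" if energies[0] >= energies[1] else "rear"

def record_step(current_mode, mic_channels, capture):
    """Capture a frame in the current mode, then decide the next mode from audio."""
    frame = capture(current_mode)                        # first/second image
    target = estimate_dominant_direction(mic_channels)   # switching instruction
    return frame, target

if __name__ == "__main__":
    fake_capture = lambda mode: Frame(mode, b"")
    mics = [[0.1, 0.2, 0.1], [0.8, 0.9, 0.7]]            # rear microphone is louder
    _, next_mode = record_step("front", mics, fake_capture)
    print("next shooting mode:", next_mode)              # -> rear
```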
-
Publication No.: US20240355321A1
Publication Date: 2024-10-24
Application No.: US18463788
Filing Date: 2023-09-08
Inventor(s): Eui Hyeok LEE, Hyung Sik Gim, Han Woong Choi, Sung Woo Moon
CPC Classes: G10L15/063, G10L15/02, G10L15/08, G10L15/22, G10L25/78, G10L25/90, G10L2015/025, G10L2015/088
Abstract: The present disclosure relates to a device and method for generating call word learning data. The call word learning data generation device includes a processor and storage. The storage stores utterance data and an utterance phrase corresponding to the utterance data. The processor is configured to decompose the utterance data into phoneme units based on the utterance data and the utterance phrase, to receive a call word through a user input, to decompose the received call word into phoneme units, to compare phoneme data of the call word with phoneme data of the utterance data, and to generate call word learning data by combining the phoneme data matched in the comparison.
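The phoneme-matching step lends itself to a short sketch. The toy lexicon and the set-overlap matching rule below are assumptions for illustration; the patent abstract does not specify how phoneme data are compared.

```python
# Sketch only: build call-word learning data from matched phoneme units.
def to_phonemes(text, lexicon):
    """Decompose a word into phoneme units via a toy lexicon."""
    return lexicon.get(text, list(text))

def generate_call_word_data(utterances, call_word, lexicon):
    """Collect utterance segments whose phonemes overlap the call word's phonemes."""
    target = set(to_phonemes(call_word, lexicon))
    samples = []
    for audio, phrase in utterances:
        phones = to_phonemes(phrase, lexicon)
        matched = [p for p in phones if p in target]
        if matched:
            samples.append({"audio": audio, "phonemes": matched})
    return samples

if __name__ == "__main__":
    lexicon = {"hello": ["HH", "EH", "L", "OW"], "low": ["L", "OW"]}
    data = generate_call_word_data([("utt1.wav", "hello")], "low", lexicon)
    print(data)   # the utterance phonemes L, OW match the call word
```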
-
Publication No.: US12125482B2
Publication Date: 2024-10-22
Application No.: US16692150
Filing Date: 2019-11-22
Applicant: INTEL CORPORATION
CPC Classes: G10L15/22, G06F16/638, G10L15/02, G10L15/16, G10L17/00, G10L25/78, G10L2015/027, G10L2015/088, G10L2015/223
Abstract: An example apparatus for recognizing speech includes an audio receiver to receive a stream of audio. The apparatus also includes a key phrase detector to detect a key phrase in the stream of audio. The apparatus further includes a model adapter to dynamically adapt a model based on the detected key phrase. The apparatus also includes a query recognizer to detect a voice query following the key phrase in the stream of audio via the adapted model.
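As a rough illustration of the pipeline shape (key phrase detection, model adaptation, then query recognition), the sketch below represents audio as plain text for brevity. The class and method names are assumptions, not Intel's API.

```python
# Sketch only: detect a key phrase, adapt the recognizer, recognize the query.
class KeyPhraseDetector:
    def __init__(self, key_phrases):
        self.key_phrases = key_phrases

    def detect(self, audio_text):
        return next((k for k in self.key_phrases if audio_text.startswith(k)), None)

class QueryRecognizer:
    def __init__(self):
        self.bias = None

    def adapt(self, key_phrase):
        """Dynamically adapt the model based on the detected key phrase."""
        self.bias = key_phrase

    def recognize(self, audio_text):
        return {"query": audio_text, "adapted_to": self.bias}

def process(stream_text):
    detector = KeyPhraseDetector(["hey assistant", "play music"])
    recognizer = QueryRecognizer()
    key = detector.detect(stream_text)
    if key:
        recognizer.adapt(key)
        return recognizer.recognize(stream_text[len(key):].strip())
    return None

print(process("hey assistant what is the weather"))
```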
-
Publication No.: US12119019B2
Publication Date: 2024-10-15
Application No.: US17578217
Filing Date: 2022-01-18
Applicant: Google LLC
Inventor(s): Julian Maclaren, Karolis Misiunas, Vahe Tshitoyan, Brian Foo, Kelly Dobson
CPC Classes: G10L25/51, A61B5/02416, G10L15/22, G10L21/028, G10L25/78, H04R1/04, H04R1/406, H04R3/005, A61B2503/12
Abstract: Various systems, devices, and methods for social interaction measurement that preserve privacy are presented. An audio signal can be captured using a microphone. The audio signal can be processed using an audio-based machine learning model that is trained to detect the presence of speech. The audio signal can be discarded such that content of the audio signal is not stored after the audio signal is processed using the machine learning model. An indication of whether speech is present within the audio signal can be output based at least in part on processing the audio signal using the audio-based machine learning model.
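A compact sketch of the privacy-preserving flow follows: run a speech-presence model over a captured buffer, output only a boolean indication, and discard the audio. The energy-threshold detector is a stand-in assumption for the trained machine learning model mentioned in the abstract.

```python
# Sketch only: emit a speech-presence indication without retaining the audio.
def speech_present(samples, threshold=0.01):
    """Toy detector: mean absolute amplitude above a threshold."""
    return (sum(abs(s) for s in samples) / max(len(samples), 1)) > threshold

def process_and_discard(capture_fn):
    samples = capture_fn()          # capture the audio signal via a microphone
    present = speech_present(samples)
    del samples                     # content is not stored after processing
    return present                  # only the indication is output

if __name__ == "__main__":
    print(process_and_discard(lambda: [0.0, 0.2, -0.3, 0.1]))  # True
```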
-
Publication No.: US12112744B2
Publication Date: 2024-10-08
Application No.: US17684958
Filing Date: 2022-03-02
Applicant: Zhejiang University
Inventor(s): Feng Lin, Tiantian Liu, Ming Gao, Chao Wang, Zhongjie Ba, Jinsong Han, Wenyao Xu, Kui Ren
IPC Classes: G10L15/20, G01S13/88, G10L15/06, G10L15/18, G10L15/22, G10L15/28, G10L25/18, G10L25/78
CPC Classes: G10L15/20, G01S13/88, G10L15/063, G10L15/1815, G10L15/22, G10L15/28, G10L25/18, G10L25/78
Abstract: The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and second logarithmic mel-frequency spectral coefficients into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration between the target audio signal and the target millimeter-wave signal, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure enables high-accuracy speech recognition.
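The mutual-calibration-then-fusion idea can be sketched in a few lines of numpy. The sigmoid gating and simple concatenation below are assumptions; the patent's calibration and mapping modules are learned networks whose exact form the abstract does not give.

```python
# Sketch only: two feature streams cross-calibrate each other, then are fused.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def calibrate(feat_mmwave, feat_audio):
    """Mutually re-weight each stream using a gate driven by the other."""
    gate_mm = sigmoid(feat_audio)       # audio gates the millimeter-wave features
    gate_audio = sigmoid(feat_mmwave)   # millimeter-wave gates the audio features
    return feat_mmwave * gate_mm, feat_audio * gate_audio

def fuse(feat_mmwave, feat_audio):
    cal_mm, cal_audio = calibrate(feat_mmwave, feat_audio)
    return np.concatenate([cal_mm, cal_audio], axis=-1)   # mapping / fusion step

if __name__ == "__main__":
    mm = np.random.randn(40)    # stand-in log mel-frequency coefficients (mmWave)
    au = np.random.randn(40)    # stand-in log mel-frequency coefficients (audio)
    print(fuse(mm, au).shape)   # (80,) target fusion feature
```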
-
Publication No.: US20240331690A1
Publication Date: 2024-10-03
Application No.: US18194924
Filing Date: 2023-04-03
Inventor(s): Maria Battle-Miller, Christopher Day, Rima Shah
Abstract: Methods, systems, and apparatus are described herein for enhanced conferencing. A computing device may monitor user participation. One or more conference features may be activated or deactivated based on speech patterns of conference participants.
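A very rough sketch of toggling a conference feature from one simple speech pattern (talk-time share) is given below; the threshold policy and the feature names are illustrative assumptions only.

```python
# Sketch only: activate or deactivate conference features from participation data.
def update_features(talk_seconds_by_user, total_seconds, features):
    for user, seconds in talk_seconds_by_user.items():
        share = seconds / max(total_seconds, 1)
        # Example policy: prompt quiet participants, do nothing otherwise.
        features[user] = "raise_hand_prompt" if share < 0.05 else "none"
    return features

print(update_features({"alice": 120, "bob": 2}, 150, {}))
```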
-
Publication No.: US12100406B2
Publication Date: 2024-09-24
Application No.: US18344445
Filing Date: 2023-06-29
Inventor(s): Zhe Wang
CPC Classes: G10L19/012, G10L19/0204, G10L19/22, G10L19/265, G10L25/21, G10L25/78, G10L19/18
Abstract: A method for processing an audio signal includes receiving a bitstream corresponding to the audio signal; obtaining a silence insertion descriptor (SID) type of a current frame of the audio signal by decoding the bitstream; obtaining a low-band parameter of the current frame by decoding the bitstream; obtaining a low-band signal of the current frame based on the low-band parameter; obtaining, based on the SID type of the current frame, a high-band parameter of the current frame; obtaining a high-band signal of the current frame based on the high-band parameter; and obtaining a synthesis signal of the current frame based on the low-band signal and the high-band signal.
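The decode flow reads naturally as a skeleton, sketched below with placeholder arithmetic standing in for the actual parameter decoding; only the ordering of steps mirrors the abstract, and the SID type names are assumptions.

```python
# Sketch only: low-band + SID-dependent high-band decoding, then synthesis.
def decode_sid_frame(bitstream):
    sid_type = bitstream["sid_type"]              # SID type of the current frame
    low_params = bitstream["low_band_params"]     # decoded low-band parameters
    low_band = [p * 0.5 for p in low_params]      # low-band signal (placeholder math)
    # High-band parameter selection depends on the SID type.
    high_params = (bitstream["high_band_params"] if sid_type == "SID_UPDATE"
                   else bitstream.get("prev_high_band_params", [0.0]))
    high_band = [p * 0.25 for p in high_params]   # high-band signal (placeholder math)
    n = max(len(low_band), len(high_band))
    low_band += [0.0] * (n - len(low_band))
    high_band += [0.0] * (n - len(high_band))
    return [l + h for l, h in zip(low_band, high_band)]   # synthesis signal

print(decode_sid_frame({"sid_type": "SID_UPDATE",
                        "low_band_params": [1.0, 2.0],
                        "high_band_params": [0.4, 0.8]}))
```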
-
Publication No.: US12094487B2
Publication Date: 2024-09-17
Application No.: US17480740
Filing Date: 2021-09-21
Abstract: An audio system for spatializing virtual sound sources is described. A microphone array of the audio system is configured to monitor sound in a local area. A controller of the audio system identifies sound sources within the local area using the monitored sound from the microphone array and determines their locations. The controller of the audio system generates a target position for a virtual sound source based on one or more constraints. The one or more constraints include that the target position be at least a threshold distance away from each of the determined locations of the identified sound sources. The controller generates one or more sound filters based in part on the target position to spatialize the virtual sound source. A transducer array of the audio system presents spatialized audio including the virtual sound source content based in part on the one or more sound filters.
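The placement constraint (a target position at least a threshold distance from every detected source) can be sketched with a simple candidate search. The 2-D geometry and grid search below are simplifying assumptions; the actual system works in 3-D and applies sound filters to spatialize the source.

```python
# Sketch only: choose a virtual-source position far enough from all real sources.
import math

def far_enough(candidate, sources, threshold):
    return all(math.dist(candidate, s) >= threshold for s in sources)

def pick_target_position(sources, threshold, candidates):
    for c in candidates:
        if far_enough(c, sources, threshold):
            return c
    return None   # no candidate satisfies the constraint

sources = [(0.0, 1.0), (2.0, 0.0)]                 # locations of identified sound sources
grid = [(x * 0.5, y * 0.5) for x in range(-6, 7) for y in range(-6, 7)]
print(pick_target_position(sources, 1.5, grid))    # first position meeting the constraint
```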
-
Publication No.: US12094463B1
Publication Date: 2024-09-17
Application No.: US17540623
Filing Date: 2021-12-02
Inventor(s): Robert John Mars
CPC Classes: G10L15/22, G06F3/017, G10L15/08, G10L25/78, G10L2015/088, G10L2015/223, G10L15/30
Abstract: A speech-processing system may provide access to one or more virtual assistants via an audio-controlled device. A virtual assistant may be invoked by speaking a wakeword. In some cases, a default virtual assistant may be invoked when an utterance is spoken without a preceding wakeword. Such a multi-assistant speech-processing system may make an early determination that a received utterance does not include a wakeword, and begin processing the utterance prior to completion of the utterance, thereby reducing user-perceived latency. For example, the system may start a timer when a gesture and/or voice activity is detected. If no wakeword is detected within a time corresponding to the speaking durations of known wakewords, the system may determine that the utterance is to be processed according to the default virtual assistant.
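The early-decision timer can be sketched as follows. The 1.2-second maximum wakeword duration and the event representation are assumptions for illustration.

```python
# Sketch only: route an utterance early when no wakeword arrives in time.
MAX_WAKEWORD_SECONDS = 1.2   # assumed longest speaking duration of known wakewords

def route_utterance(events):
    """events: list of (timestamp_seconds, kind) where kind is
    'activity_start', 'wakeword:<name>', or 'utterance_end'."""
    start = None
    for t, kind in events:
        if kind == "activity_start":
            start = t                                  # gesture or voice activity detected
        elif kind.startswith("wakeword:") and start is not None:
            return kind.split(":", 1)[1]               # named assistant invoked
        elif start is not None and t - start > MAX_WAKEWORD_SECONDS:
            return "default_assistant"                 # early decision, lower latency
    return "default_assistant"

print(route_utterance([(0.0, "activity_start"), (2.0, "utterance_end")]))
```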
-
Publication No.: US20240304170A1
Publication Date: 2024-09-12
Application No.: US18180267
Filing Date: 2023-03-08
Inventor(s): Aaron K. Baughman, Shikhar Kwatra, Jeremy R. Fox, Jagadesh Ramaswamy Hulugundi, Raghuveer Prasad Nagar, Sarbajit K. Rakshit
IPC Classes: G10K11/178, G10L21/034, G10L25/78
CPC Classes: G10K11/17823, G10K11/17827, G10K11/17873, G10L21/034, G10L25/78, G10K2210/108, G10K2210/30231, G10K2210/3027, G10K2210/3044, G10L2025/783
Abstract: A computer-implemented method dynamically mutes irrelevant sources of noise. The method includes identifying one or more sources of noise in the vicinity of a listening device, where the listening device is associated with a user and includes a noise-canceling function. The method also includes determining a context for the user, where the context represents a subject of a conversation related to the user. The method further includes calculating a relevance score for each of the one or more sources of noise. The method includes muting, by the listening device, each source of noise whose relevance score is below a relevance threshold.
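The relevance-scored muting step is easy to sketch; the keyword-overlap score below is an illustrative assumption, since the abstract leaves the scoring model unspecified.

```python
# Sketch only: score noise sources against the conversation context and mute the low scorers.
def relevance_score(source_keywords, context_keywords):
    overlap = set(source_keywords) & set(context_keywords)
    return len(overlap) / max(len(context_keywords), 1)

def mute_irrelevant(sources, context_keywords, threshold=0.2):
    muted = []
    for name, keywords in sources.items():
        if relevance_score(keywords, context_keywords) < threshold:
            muted.append(name)      # listening device cancels this source
    return muted

sources = {"tv": ["news", "weather"], "colleague": ["deadline", "report"]}
print(mute_irrelevant(sources, ["report", "deadline", "meeting"]))  # ['tv']
```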