-
Publication No.: WO2023273747A1
Publication date: 2023-01-05
Application No.: PCT/CN2022/095732
Filing date: 2022-05-27
Applicant: 青岛海尔科技有限公司 , 海尔智家股份有限公司
Inventor: 郝斌
IPC: G10L21/0208 , G10L15/22 , G10L25/54 , G10L2015/223 , G10L2021/02082
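Abstract: A wake-up method and apparatus for a smart device, a storage medium, and an electronic apparatus. The method includes: obtaining, from a plurality of smart devices, the smart devices that are allowed to be woken by a wake-up signal as candidate devices; where there are multiple candidate devices, determining a target wake-up angle and a target wake-up energy for each candidate device; and determining a target device from the candidate devices according to the target wake-up angles and target wake-up energies, wherein the target device is used to respond to the wake-up signal. This addresses the problem in the related art that the smart device that should respond to a wake-up instruction is determined with low accuracy.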
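Illustrative sketch (not taken from the patent): the abstract above says only that the target device is chosen from the candidates using a wake-up angle and a wake-up energy per device. The Python below shows one way such a selection could look. The `Candidate` fields, the weighted scoring rule, and the `angle_weight` parameter are all assumptions made for illustration; the abstract does not say how the two quantities are combined.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    allowed: bool      # whether the device may be woken by this wake-up signal
    angle_deg: float   # assumed meaning: angle between device front axis and speaker
    energy_db: float   # assumed meaning: wake-up signal energy measured at the device


def pick_target(devices: Sequence[Candidate],
                angle_weight: float = 0.5) -> Optional[Candidate]:
    """Return the device that should respond, or None if no device is allowed."""
    candidates = [d for d in devices if d.allowed]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]

    # Normalise energy across the candidates so both terms lie in [0, 1].
    low = min(d.energy_db for d in candidates)
    high = max(d.energy_db for d in candidates)
    span = (high - low) or 1.0

    def score(d: Candidate) -> float:
        # Hypothetical rule: a smaller angle and a higher energy are both better.
        angle_term = 1.0 - min(abs(d.angle_deg), 180.0) / 180.0
        energy_term = (d.energy_db - low) / span
        return angle_weight * angle_term + (1.0 - angle_weight) * energy_term

    return max(candidates, key=score)


devices = [
    Candidate("fridge", allowed=True, angle_deg=10.0, energy_db=60.0),
    Candidate("tv", allowed=True, angle_deg=90.0, energy_db=55.0),
    Candidate("speaker", allowed=False, angle_deg=0.0, energy_db=80.0),
]
target = pick_target(devices)
```

In this example the speaker is filtered out before scoring because it is not allowed to respond, and the fridge is selected over the TV because it has both the smaller angle and the higher energy.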
-
Publication No.: WO2022272281A1
Publication date: 2022-12-29
Application No.: PCT/US2022/073113
Filing date: 2022-06-23
Applicant: SRI INTERNATIONAL
Inventor: KATHOL, Andreas , RICHEY, Colleen , ABRASH, Victor , KWON, Homin
IPC: G06F16/632 , G10L15/08 , G10L15/187 , G10L15/26 , G10L21/06 , G10L25/54 , G06F16/638 , G10L21/12
Abstract: Techniques are disclosed for searching audio recordings in a second language with a key phrase in a first language. For example, a system as described herein receives a first key phrase in the first language and an audio recording in the second language. The system converts the first key phrase into a second key phrase in the second language. The system processes the second key phrase to produce a second key phrase variant. The system identifies, from a graph of words in the second language generated from the audio recording, instances of the second key phrase or the second key phrase variant within the audio recording. The system displays the identified instances of the second key phrase or the second key phrase variant within the audio recording to enhance searchability of the audio recording in the second language.
-
Publication No.: WO2022178122A1
Publication date: 2022-08-25
Application No.: PCT/US2022/016788
Filing date: 2022-02-17
Applicant: CERENCE OPERATING COMPANY
Inventor: BEN GIGI, Yitshak Lior , VACHON, Caitlin
Abstract: A system for interacting with an audio stream to obtain lyric information, control playback of the audio stream, and control aspects of the audio stream. In some instances, end users can request that the audio stream play with a lead vocal track or without a lead vocal track. Obtaining lyric information includes receiving via a text to speech module an audio playback of the lyric information.
-
Publication No.: WO2021247156A1
Publication date: 2021-12-09
Application No.: PCT/US2021/028278
Filing date: 2021-04-21
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: KIKIN-GIL, Erez , PARISH, Daniel Yancy
IPC: G10L25/54 , G06F16/34 , G10L15/16 , G10L15/26 , G06T7/00 , G06F16/345 , G06K9/6256 , G06K9/6293 , G06N3/0445 , G06N3/08 , G06V10/82 , G06V20/41 , G06V20/47 , G06V2201/10 , G06V40/16 , G06V40/172 , G10L17/00 , G10L17/18 , H04N7/155
Abstract: In non-limiting examples of the present disclosure, systems, methods and devices for generating summary content are presented. Voice audio data and video data for an electronic meeting may be received. A language processing model may be applied to a transcript of the audio data and textual importance scores may be calculated. A video/image model may be applied to the video data and visual importance scores may be calculated. A combined importance score may be calculated for sections of the electronic meeting based on the textual importance scores and the visual importance scores. A meeting summary that includes summary content from sections for which combined importance scores exceed a threshold value may be generated.
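Illustrative sketch (not taken from the patent): the abstract above describes combining a textual and a visual importance score per meeting section and keeping the sections whose combined score exceeds a threshold. The abstract does not say how the scores are combined, so the weighted average, the `text_weight` parameter, and the default threshold below are assumptions.

```python
from typing import List, Sequence


def select_summary_sections(text_scores: Sequence[float],
                            visual_scores: Sequence[float],
                            threshold: float = 0.6,
                            text_weight: float = 0.5) -> List[int]:
    """Return indices of sections whose combined score exceeds the threshold."""
    if len(text_scores) != len(visual_scores):
        raise ValueError("need one textual and one visual score per section")
    selected = []
    for index, (text, visual) in enumerate(zip(text_scores, visual_scores)):
        # Hypothetical combination rule: weighted average of the two scores.
        combined = text_weight * text + (1.0 - text_weight) * visual
        if combined > threshold:
            selected.append(index)
    return selected


# Three meeting sections with made-up scores in [0, 1].
chosen = select_summary_sections([0.9, 0.2, 0.7], [0.5, 0.3, 0.6])
```

With these made-up scores the first and third sections pass the threshold and the second does not.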
-
Publication No.: WO2021244057A1
Publication date: 2021-12-09
Application No.: PCT/CN2021/074912
Filing date: 2021-02-02
Applicant: 北京搜狗智能科技有限公司
IPC: G10L15/22 , G10L15/26 , G10L15/18 , G10L15/30 , G10L25/54 , G10L15/1822 , G10L2015/223
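Abstract: Embodiments of the present invention provide an interaction method, an interaction apparatus, an earphone, and an earphone storage device. The earphone is communicatively connected to the earphone storage device and has an interaction assistant. The method includes: the earphone obtaining, from the earphone storage device, a speech recognition result for the user's speech; and invoking the interaction assistant to perform an interaction operation according to the speech recognition result. The user does not need to operate the earphone by hand, and multiple interaction functions of the earphone are realized.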
-
Publication No.: WO2016209924A1
Publication date: 2016-12-29
Application No.: PCT/US2016/038708
Filing date: 2016-06-22
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: BASYE, Kenneth John , TOTH, Arthur Richard , BARTON, William Folwell
CPC classification number: G10L15/26 , G10L13/02 , G10L13/033 , G10L15/18 , G10L15/22 , G10L17/26 , G10L25/54 , G10L2015/223 , G10L2015/225
Abstract: A system matches text-to-speech (TTS) or other output to a quality of an input spoken utterance. The system uses trained models to detect a speech quality and generates an indicator of the speech quality. The speech quality may be determined from audio or non-audio data. The indicator is sent to downstream components of the system such as a command processor or TTS system. The output of the system is then determined using the indicator of speech quality, thus customizing an output of the system to the manner in which the utterance was spoken.
-
Publication No.: WO2016003735A1
Publication date: 2016-01-07
Application No.: PCT/US2015/037484
Filing date: 2015-06-24
Applicant: DOLBY LABORATORIES LICENSING CORPORATION
Inventor: BAUER, Claus , LU, Lie , HU, Mingqing , WANG, Jun , CRUM, Poppy , WILSON, Rhonda , RADHAKRISHNAN, Regunathan
CPC classification number: G10L25/54 , G06K9/6259 , G06K9/6261 , G10L25/03
Abstract: Example embodiments disclosed herein relate to perception based multimedia processing. There is provided a method for processing multimedia data, the method includes automatically determining user perception on a segment of the multimedia data based on a plurality of clusters, the plurality of clusters obtained in association with predefined user perceptions and processing the segment of the multimedia data at least in part based on determined user perception on the segment. Corresponding system and computer program products are disclosed as well.
-
Publication No.: WO2014155526A1
Publication date: 2014-10-02
Application No.: PCT/JP2013/058791
Filing date: 2013-03-26
Applicant: 株式会社 東芝
Inventor: 舘野 剛
CPC classification number: G10H1/0041 , G06F17/30743 , G10H2240/151 , G10L25/54 , G11B27/28
Abstract: According to an embodiment, an information processing apparatus includes search means 28c and analysis means 28d. The search means 28c performs a song search on the content to be analyzed at predetermined time intervals. The analysis means 28d analyzes the playback state of the songs contained in the content based on the song search results obtained by the search means 28c at the predetermined time intervals.
-
Publication No.: WO2023285425A1
Publication date: 2023-01-19
Application No.: PCT/EP2022/069393
Filing date: 2022-07-12
Applicant: UTOPIA MUSIC AG
Inventor: WAHLGREN, Linus , FLACH, Max
Abstract: Apparatus, method, and computer program code for processing audio stream. The method includes: obtaining (202) first peaks of an audio stream, wherein the first peak comprises a first peak amplitude at a first frequency and at a first time offset from a beginning of the audio stream; for each first peak, detecting (216, 218) a second peak in a window with a predetermined offset from the first peak, wherein the second peak comprises a second peak amplitude at a second frequency and at a second time offset from the beginning of the audio stream; and for each first peak, generating (216, 222) a fingerprint hash based on the first frequency, a time difference between the first time offset and the second time offset, a frequency difference between the first frequency and the second frequency, and an amplitude difference between the first amplitude and the second amplitude.
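Illustrative sketch (not taken from the patent): the abstract above pairs each first peak with a second peak found in a window at a predetermined offset, then hashes the first frequency together with the time, frequency, and amplitude differences. The Python below assumes the peaks have already been extracted and quantized to integers. The `Peak` type, choosing the strongest peak in the window as the second peak, the window parameters, and the use of SHA-1 over packed integers are all assumptions; the abstract does not specify them.

```python
import hashlib
import struct
from typing import List, NamedTuple, Sequence, Tuple


class Peak(NamedTuple):
    time: int  # frame index measured from the beginning of the stream
    freq: int  # frequency bin
    amp: int   # quantized peak amplitude


def fingerprint_hashes(peaks: Sequence[Peak],
                       window_offset: int = 1,
                       window_length: int = 10) -> List[Tuple[int, str]]:
    """Return (first-peak time, hash) pairs, one per first peak that has a partner."""
    ordered = sorted(peaks)
    result = []
    for first in ordered:
        start = first.time + window_offset
        end = start + window_length
        in_window = [p for p in ordered if start <= p.time < end]
        if not in_window:
            continue  # no second peak available for this first peak
        # Hypothetical choice: take the strongest peak in the window.
        second = max(in_window, key=lambda p: p.amp)
        payload = struct.pack(
            "<iiii",
            first.freq,
            second.time - first.time,  # time difference
            second.freq - first.freq,  # frequency difference
            second.amp - first.amp,    # amplitude difference
        )
        result.append((first.time, hashlib.sha1(payload).hexdigest()[:16]))
    return result


peaks = [Peak(0, 100, 50), Peak(3, 120, 40), Peak(5, 90, 60), Peak(30, 80, 10)]
hashes = fingerprint_hashes(peaks)
```

Because the hash input contains only differences and the first frequency, shifting every peak by the same time offset leaves the hash values unchanged in this sketch, which is the property that lets a fingerprint match an excerpt taken from the middle of a recording.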
-
Publication No.: WO2022243778A1
Publication date: 2022-11-24
Application No.: PCT/IB2022/054124
Filing date: 2022-05-04
Applicant: COCHLEAR LIMITED
Inventor: WINDEYER, Jamon , CHEN, Henry, Hu , FRIEDING, Jan, Patrick , FUNG, Stephen
Abstract: An apparatus includes voice activity detection (VAD) circuitry configured to analyze one or more audio broadcast streams and to identify first segments of the one or more broadcast streams in which the audio data includes speech data. The apparatus further includes derivation circuitry configured to receive the first segments and, for each first segment, to derive one or more words from the speech data of the first segment. The apparatus further includes keyword detection circuitry configured to, for each first segment, receive the one or more words and to generate keyword information indicative of whether at least one word of the one or more words is among a set of stored keywords. The apparatus further includes decision circuitry configured to receive the first segments, the one or more words of each of the first segments, and the keyword information for each of the first segments and, for each first segment, to select, based at least in part on the keyword information, among a plurality of options regarding communication of information indicative of the first segment to a recipient.
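Illustrative sketch (not taken from the patent): the abstract above describes keyword detection on the words derived from each speech segment, followed by a decision among several options for communicating the segment to a recipient. The Python below covers only those last two stages and takes the derived words as given. The three `Action` options and the `urgent` keyword subset are hypothetical; the abstract does not name the options or the selection rule.

```python
from enum import Enum
from typing import Iterable, List, Set


class Action(Enum):
    NOTIFY = "communicate the segment to the recipient now"
    STORE = "keep the segment for later review"
    DROP = "do not communicate the segment"


def keyword_hits(words: Iterable[str], keywords: Set[str]) -> List[str]:
    """Return the words of a segment that are among the stored keywords."""
    stored = {k.lower() for k in keywords}
    return [w.lower() for w in words if w.lower() in stored]


def decide(words: Iterable[str], keywords: Set[str], urgent: Set[str]) -> Action:
    """Pick an option for one speech segment based on its keyword information."""
    hits = keyword_hits(words, keywords)
    if not hits:
        return Action.DROP
    # Hypothetical rule: any hit in the urgent subset triggers immediate delivery.
    urgent_lower = {u.lower() for u in urgent}
    if any(h in urgent_lower for h in hits):
        return Action.NOTIFY
    return Action.STORE


keywords = {"fire", "delay"}
urgent = {"fire"}
first = decide(["Fire", "alarm", "on", "platform"], keywords, urgent)
second = decide(["train", "delay"], keywords, urgent)
third = decide(["hello", "everyone"], keywords, urgent)
```

Here the first segment is delivered immediately, the second is stored, and the third is dropped.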