-
Publication No.: US10283130B2
Publication date: 2019-05-07
Application No.: US15392485
Filing date: 2016-12-28
Inventors: Sascha Disch, Mikko-Ville Laitinen, Ville Pulkki
IPC classification: G10L19/02, G10L21/01, G10L21/038, G10L19/18, G10L21/007, G10L19/26, G10L19/025
Abstract: An audio processor for processing an audio signal includes a target phase measure determiner for determining a target phase measure for the audio signal in a time frame, a phase error calculator for calculating a phase error using a phase of the audio signal in the time frame and the target phase measure, and a phase corrector configured for correcting the phase of the audio signal in the time frame using the phase error.
-
Publication No.: US09747927B2
Publication date: 2017-08-29
Application No.: US15119747
Filing date: 2014-08-15
Inventors: Tomoyasu Nakano, Kazuyoshi Yoshii, Masataka Goto
IPC classification: G10L21/00, G10L25/54, G10L13/02, G06F17/30, G10H1/00, G10L19/022, G10L21/01, G10L21/14, G10L25/12, G10L25/00
CPC classification: G10L25/54, G06F17/30758, G10H1/00, G10H2210/056, G10L13/02, G10L19/022, G10L21/003, G10L21/01, G10L21/10, G10L21/14, G10L25/12, G10L25/24, G10L25/90
Abstract: A system for multifaceted singing analysis for retrieval of songs or music including singing voices having some relationship in latent semantics with a singing voice included in one particular song or music. A topic analyzing processor uses a topic model to analyze a plurality of vocal symbolic time series obtained for a plurality of musical audio signals. The topic analyzing processor generates a vocal topic distribution for each of the musical audio signals whereby the vocal topic distribution is composed of a plurality of vocal topics each indicating a relationship of one of the musical audio signals with the other musical audio signals. The topic analyzing processor generates a vocal symbol distribution for each of the vocal topics whereby the vocal symbol distribution indicates occurrence probabilities for the vocal symbols. A multifaceted singing analyzing processor performs analysis of singing voices included in musical audio signals, in the multifaceted viewpoint.
-
Publication No.: US09613635B2
Publication date: 2017-04-04
Application No.: US14411094
Filing date: 2013-06-26
Applicant: Yamaha Corporation
Inventors: Norihiro Uemura, Eiji Murata
IPC classification: G10L21/04, G10H7/00, G10H1/22, G10H7/02, G10L21/01, G11B20/10, G11B27/00, G10H1/40, G10H1/00
CPC classification: G10L21/04, G10H1/00, G10H1/0066, G10H1/22, G10H1/40, G10H7/006, G10H7/02, G10H2210/385, G10H2230/041, G10H2250/541, G10H2250/621, G10H2250/635, G10H2250/645, G10L21/01, G11B20/10527, G11B27/005
Abstract: In order to play waveform data back at a variable performance tempo by using waveform data which complies with a desired reference tempo, the present invention performs a timeline-expansion/contraction control on the waveform data to be played back, according to the relationship between the performance tempo and the reference tempo. The present invention also determines whether to limit the playback of the waveform data according to the relationship between the performance tempo and the reference tempo. In the case that playback is to be limited, the present invention stops playback of the waveform data, or reduces the resolution of playback processing and continues playback of the waveform data. The present invention stops playback of the waveform data when, for example, the relationship between the performance tempo and the reference tempo is a relationship in which the waveform data would be played back at a performance tempo which would cause a processing delay or a deterioration of sound quality. As a result, it is possible to preemptively prevent a system freeze and solve problems such as the generation of music which has a slower tempo than the desired performance tempo, or the generation of music which includes the intermittent cutting out of sound due to noise, or a significant reduction to sound quality.
-
Publication No.: US20240127838A1
Publication date: 2024-04-18
Application No.: US18047572
Filing date: 2022-10-18
Abstract: A device includes one or more processors configured to input one or more segments of an input media stream into a feature extractor. The one or more processors are further configured to pass an output of the feature extractor into an utterance classifier to produce at least one representation of at least one utterance class of a plurality of utterance classes. The one or more processors are further configured to pass the output of the feature extractor and the at least one representation into a segment matcher to produce a media output segment identifier.
-
Publication No.: US10770083B2
Publication date: 2020-09-08
Application No.: US16209571
Filing date: 2018-12-04
Inventors: Sascha Disch, Mikko-Ville Laitinen, Ville Pulkki
IPC classification: G10L19/025, G10L21/038, G10L19/02, G10L19/22, G10L19/18, G10L21/007, G10L19/26, G10L21/01
Abstract: An audio processor for processing an audio signal includes a target phase measure determiner for determining a target phase measure for the audio signal in a time frame, a phase error calculator for calculating a phase error using a phase of the audio signal in the time frame and the target phase measure, and a phase corrector configured for correcting the phase of the audio signal in the time frame using the phase error.
-
Publication No.: US09666207B2
Publication date: 2017-05-30
Application No.: US14878737
Filing date: 2015-10-08
IPC classification: G10L21/0364, G06F3/16, G10L21/01
CPC classification: G10L21/0364, G06F3/162, G06F3/167, G10L21/01, G10L21/02, G10L2021/02082, H04M9/082
Abstract: Methods and systems for controlling audio communications between occupants of a vehicle are provided. In accordance with one embodiment, a system includes an interface and a processor. The interface is configured to at least facilitate receiving a request for sound transmission from a first occupant inside a vehicle to a second occupant inside the vehicle. The processor is coupled to the interface, and is configured to at least facilitate identifying respective locations of the first occupant and the second occupant, and performing the sound transmission with an adjustment for a phase difference based at least in part on the respective locations of the first occupant and the second occupant.
-
Publication No.: US20170103773A1
Publication date: 2017-04-13
Application No.: US14878737
Filing date: 2015-10-08
IPC classification: G10L21/0364, G10L21/01, G06F3/16
CPC classification: G10L21/0364, G06F3/162, G06F3/167, G10L21/01, G10L21/02, G10L2021/02082, H04M9/082
Abstract: Methods and systems for controlling audio communications between occupants of a vehicle are provided. In accordance with one embodiment, a system includes an interface and a processor. The interface is configured to at least facilitate receiving a request for sound transmission from a first occupant inside a vehicle to a second occupant inside the vehicle. The processor is coupled to the interface, and is configured to at least facilitate identifying respective locations of the first occupant and the second occupant, and performing the sound transmission with an adjustment for a phase difference based at least in part on the respective locations of the first occupant and the second occupant.
-
Publication No.: US20170061988A1
Publication date: 2017-03-02
Application No.: US15119747
Filing date: 2014-08-15
Inventors: Tomoyasu Nakano, Kazuyoshi Yoshii, Masataka Goto
CPC classification: G10L25/54, G06F17/30758, G10H1/00, G10H2210/056, G10L13/02, G10L19/022, G10L21/003, G10L21/01, G10L21/10, G10L21/14, G10L25/12, G10L25/24, G10L25/90
Abstract: A system for multifaceted singing analysis for retrieval of songs or music including singing voices having some relationship in latent semantics with a singing voice included in one particular song or music. A topic analyzing processor uses a topic model to analyze a plurality of vocal symbolic time series obtained for a plurality of musical audio signals. The topic analyzing processor generates a vocal topic distribution for each of the musical audio signals whereby the vocal topic distribution is composed of a plurality of vocal topics each indicating a relationship of one of the musical audio signals with the other musical audio signals. The topic analyzing processor generates a vocal symbol distribution for each of the vocal topics whereby the vocal symbol distribution indicates occurrence probabilities for the vocal symbols. A multifaceted singing analyzing processor performs analysis of singing voices included in musical audio signals, in the multifaceted viewpoint.
-
Publication No.: US08825186B2
Publication date: 2014-09-02
Application No.: US12190853
Filing date: 2008-08-13
Applicants: Jeff Butters, Tim Addy
Inventors: Jeff Butters, Tim Addy
CPC classification: G10L21/01
Abstract: The invention concerns digital audio processing and in particular the detection of periods where samples can be deleted or repeated unobtrusively so as to change the average sample-rate or to provide time delay modification. Differences between succeeding sample values are evaluated and compared with a threshold and samples are deleted or repeated where two or more consecutive sample value differences are less than the said threshold value.
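The detection rule in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`find_quiet_positions`, `delete_one`, `repeat_one`) are hypothetical, and two details are assumptions not stated in the abstract, namely that a "difference" is the absolute difference between adjacent samples and that the edit is applied at the first qualifying position.

```python
from typing import List, Sequence


def find_quiet_positions(samples: Sequence[float],
                         threshold: float,
                         min_run: int = 2) -> List[int]:
    """Return indices i where at least `min_run` consecutive
    sample-to-sample differences ending at i are below `threshold`."""
    positions = []
    run = 0  # length of the current run of small differences
    for i in range(1, len(samples)):
        # Assumption: "difference" means absolute difference of neighbours.
        if abs(samples[i] - samples[i - 1]) < threshold:
            run += 1
        else:
            run = 0
        if run >= min_run:
            positions.append(i)
    return positions


def delete_one(samples: Sequence[float], threshold: float) -> List[float]:
    """Drop one sample at the first quiet position (shortens the signal)."""
    positions = find_quiet_positions(samples, threshold)
    if not positions:
        return list(samples)
    i = positions[0]
    return list(samples[:i]) + list(samples[i + 1:])


def repeat_one(samples: Sequence[float], threshold: float) -> List[float]:
    """Repeat one sample at the first quiet position (lengthens the signal)."""
    positions = find_quiet_positions(samples, threshold)
    if not positions:
        return list(samples)
    i = positions[0]
    return list(samples[:i + 1]) + [samples[i]] + list(samples[i + 1:])


signal = [0.0, 1.0, 1.01, 1.02, 1.03, 2.0]
quiet = find_quiet_positions(signal, threshold=0.05)
shorter = delete_one(signal, threshold=0.05)
longer = repeat_one(signal, threshold=0.05)
```

Deleting a sample where the waveform is nearly flat raises the effective sample rate slightly, and repeating one lowers it; because neighbouring values there are almost equal, the edit introduces only a small discontinuity.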
-
Publication No.: US12131745B1
Publication date: 2024-10-29
Application No.: US18754280
Filing date: 2024-06-26
Applicant: Sanas.ai Inc.
Inventors: Lukas Pfeifenberger, Shawn Zhang
IPC classification: G10L21/007, G06F3/16, G10L13/00, G10L13/033, G10L15/02, G10L15/06, G10L15/16, G10L15/26, G10L21/003, G10L21/01, G10L21/013
CPC classification: G10L21/007, G06F3/162, G10L13/00, G10L13/033, G10L15/02, G10L15/063, G10L15/16, G10L15/26, G10L21/003, G10L21/013, G10L21/01, G10L2021/0135
Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent. The disclosed technology transforms phonetic characteristics of a source accent to match the target accent more closely for efficient and seamless accent conversion in real-time applications.