Patent search ipc:"G10L21/01" Page 1

1.

发明授权
Audio processor and method for processing an audio signal using vertical phase correction 有权

公开(公告)号：US10283130B2

公开(公告)日：2019-05-07

申请号：US15392485

申请日：2016-12-28

Applicant: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.

Inventor： Sascha Disch , Mikko-Ville Laitinen , Ville Pulkki

IPC: G10L19/02 , G10L21/01 , G10L21/038 , G10L19/18 , G10L21/007 , G10L19/26 , G10L19/025

Abstract: An audio processor for processing an audio signal includes a target phase measure determiner for determining a target phase measure for the audio signal in a time frame, a phase error calculator for calculating a phase error using a phase of the audio signal in the time frame and the target phase measure, and a phase corrector configured for correcting the phase of the audio signal in the time frame using the phase error.

2.

发明授权
System and method for multifaceted singing analysis 有权

公开(公告)号：US09747927B2

公开(公告)日：2017-08-29

申请号：US15119747

申请日：2014-08-15

Applicant: National Institute of Advanced Industrial Science and Technology

Inventor： Tomoyasu Nakano , Kazuyoshi Yoshii , Masataka Goto

IPC: G10L21/00 , G10L25/54 , G10L13/02 , G06F17/30 , G10H1/00 , G10L19/022 , G10L21/01 , G10L21/14 , G10L25/12 , G10L25/00

CPC classification number: G10L25/54 , G06F17/30758 , G10H1/00 , G10H2210/056 , G10L13/02 , G10L19/022 , G10L21/003 , G10L21/01 , G10L21/10 , G10L21/14 , G10L25/12 , G10L25/24 , G10L25/90

Abstract: A system for multifaceted singing analysis for retrieval of songs or music including singing voices having some relationship in latent semantics with a singing voice included in one particular song or music. A topic analyzing processor uses a topic model to analyze a plurality of vocal symbolic time series obtained for a plurality of musical audio signals. The topic analyzing processor generates a vocal topic distribution for each of the musical audio signals whereby the vocal topic distribution is composed of a plurality of vocal topics each indicating a relationship of one of the musical audio signals with the other musical audio signals. The topic analyzing processor generates a vocal symbol distribution for each of the vocal topics whereby the vocal symbol distribution indicates occurrence probabilities for the vocal symbols. A multifaceted singing analyzing processor performs analysis of singing voices included in musical audio signals, in the multifaceted viewpoint.

3.

发明授权
Automated performance technology using audio waveform data 有权

公开(公告)号：US09613635B2

公开(公告)日：2017-04-04

申请号：US14411094

申请日：2013-06-26

Applicant: Yamaha Corporation

Inventor： Norihiro Uemura , Eiji Murata

IPC: G10L21/04 , G10H7/00 , G10H1/22 , G10H7/02 , G10L21/01 , G11B20/10 , G11B27/00 , G10H1/40 , G10H1/00

CPC classification number: G10L21/04 , G10H1/00 , G10H1/0066 , G10H1/22 , G10H1/40 , G10H7/006 , G10H7/02 , G10H2210/385 , G10H2230/041 , G10H2250/541 , G10H2250/621 , G10H2250/635 , G10H2250/645 , G10L21/01 , G11B20/10527 , G11B27/005

Abstract: In order to play waveform data back at a variable performance tempo by using waveform data which complies with a desired reference tempo, the present invention performs a timeline-expansion/contraction control on the waveform data to be played back, according to the relationship between the performance tempo and the reference tempo. The present invention also determines whether to limit the playback of the waveform data according to the relationship between the performance tempo and the reference tempo. In the case that playback is to be limited, the present invention stops playback of the waveform data, or reduces the resolution of playback processing and continues playback of the waveform data. The present invention stops playback of the waveform data when, for example, the relationship between the performance tempo and the reference tempo is a relationship in which the waveform data would be played back at a performance tempo which would cause a processing delay or a deterioration of sound quality. As a result, it is possible to preemptively prevent a system freeze and solve problems such as the generation of music which has a slower tempo than the desired performance tempo, or the generation of music which includes the intermittent cutting out of sound due to noise, or a significant reduction to sound quality.

4.

发明授权
System and method for automatic alignment of phonetic content for real-time accent conversion 有权

公开(公告)号：US12131745B1

公开(公告)日：2024-10-29

申请号：US18754280

申请日：2024-06-26

Applicant: Sanas.ai Inc.

Inventor： Lukas Pfeifenberger , Shawn Zhang

IPC: G10L21/007 , G06F3/16 , G10L13/00 , G10L13/033 , G10L15/02 , G10L15/06 , G10L15/16 , G10L15/26 , G10L21/003 , G10L21/01 , G10L21/013

CPC classification number: G10L21/007 , G06F3/162 , G10L13/00 , G10L13/033 , G10L15/02 , G10L15/063 , G10L15/16 , G10L15/26 , G10L21/003 , G10L21/013 , G10L21/01 , G10L2021/0135

Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent. The disclosed technology transforms phonetic characteristics of a source accent to match the target accent more closely for efficient and seamless accent conversion in real-time applications.

5.

发明授权
Automatic conversion of speech into song, rap or other audible expression having target meter or rhythm 有权

公开(公告)号：US12033644B2

公开(公告)日：2024-07-09

申请号：US17479912

申请日：2021-09-20

Applicant: SMULE, INC.

Inventor： Parag Chordia , Mark Godfrey , Alexander Rae , Prerna Gupta , Perry R. Cook

IPC: G10L21/04 , G10H1/36 , G10L19/00 , G10L19/02 , G10L21/01 , G10L21/055

CPC classification number: G10L19/02 , G10H1/366 , G10L19/00 , G10L21/055 , G10H2210/051 , G10H2240/141 , G10H2250/235

Abstract: Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.

6.

发明授权
Coherent pitch and intensity modification of speech signals 有权

公开(公告)号：US09922661B2

公开(公告)日：2018-03-20

申请号：US15378100

申请日：2016-12-14

Applicant: International Business Machines Corporation

Inventor： Alexander Sorin

IPC: G10L21/003 , G10L21/01 , G10L21/013 , G10L15/08 , G10L25/24 , G10L13/033

CPC classification number: G10L21/013 , G10L13/0335 , G10L15/08 , G10L21/003 , G10L21/01 , G10L25/24 , G10L2021/0135

Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.

7.

发明授权
Pitch marking in speech processing 有权

公开(公告)号：US09685170B2

公开(公告)日：2017-06-20

申请号：US14918601

申请日：2015-10-21

Applicant: International Business Machines Corporation

Inventor： Slava Shechtman

IPC: G10L21/00 , G10L15/00 , G10L25/00 , G10L21/01 , G10L25/09 , G10L25/06 , G10L25/90

CPC classification number: G10L21/01 , G10L21/013 , G10L25/06 , G10L25/90

Abstract: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal value between the lower limit temporal value and the upper limit temporal value.

8.

发明申请
PITCH MARKING IN SPEECH PROCESSING 有权

公开(公告)号：US20170117001A1

公开(公告)日：2017-04-27

申请号：US14918601

申请日：2015-10-21

Applicant: International Business Machines Corporation

Inventor： Slava Shechtman

IPC: G10L21/01 , G10L25/06 , G10L25/90 , G10L25/09

CPC classification number: G10L21/01 , G10L21/013 , G10L25/06 , G10L25/90

Abstract: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal value between the lower limit temporal value and the upper limit temporal value.

9.

发明授权
Clock compensation techniques for audio decoding 有权
Title translation: 音频解码的时钟补偿技术

公开(公告)号：US09420332B2

公开(公告)日：2016-08-16

申请号：US11691688

申请日：2007-03-27

Applicant: Nischal Abrol , Sivaramakrishna Veerepalli , Stephen Verrall , Sandeep Singhai

Inventor： Nischal Abrol , Sivaramakrishna Veerepalli , Stephen Verrall , Sandeep Singhai

IPC: G06F17/00 , H04N21/439 , G10L21/01 , H04J3/06 , H04N21/2368 , H04N21/43 , H04N21/434

CPC classification number: H04N21/439 , G10L21/01 , H04J3/0632 , H04N21/2368 , H04N21/4305 , H04N21/4307 , H04N21/4341 , H04N21/4398

Abstract: This disclosure describes audio decoding techniques for decoding audio information that needs to be properly clocked. In accordance with this disclosure, the number of audio samples in decoded audio output can be adjusted to compensate for an estimated error the in decoder clock. That is to say, rather than adjust the decoder clock to synchronize the decoder clock to the encoder clock, this disclosure proposes adding or removing audio samples from the decoded audio output in order to ensure that the decoded audio output is properly timed. In this way, the techniques of this disclosure can eliminate the need for an adjustable or controllable clock at the decoding device, which can save cost and/or allow legacy devices that do not include an adjustable or controllable clock to decode and output audio information that needs to be properly clocked.

Abstract translation: 本公开描述了用于解码需要正确计时的音频信息的音频解码技术。根据本公开，可以调整解码音频输出中的音频样本的数量以补偿解码器时钟中的估计误差。也就是说，而不是调整解码器时钟以将解码器时钟同步到编码器时钟，本公开提议从解码的音频输出中添加或去除音频样本，以便确保解码的音频输出被正确定时。以这种方式，本公开的技术可以消除在解码设备处对可调节或可控制的时钟的需要，这可以节省成本和/或允许不包括可调节或可控时钟的传统设备来解码和输出音频信息，需要正确计时。

10.

发明申请
VOICE SIGNAL PROCESSING APPARATUS AND VOICE SIGNAL PROCESSING METHOD 审中-公开
Title translation: 语音信号处理设备和语音信号处理方法

公开(公告)号：US20160217805A1

公开(公告)日：2016-07-28

申请号：US14736289

申请日：2015-06-11

Applicant: Acer Incorporated

Inventor： Po-Jen Tu , Jia-Ren Chang , Kai-Meng Tzeng

IPC: G10L21/0332 , G10L21/01 , G10L21/0364 , G10L21/013

CPC classification number: H04R25/353 , G10L21/0364 , G10L2021/0575 , H04L27/227 , H04L27/233 , H04R2225/43

Abstract: A voice signal processing apparatus and a voice signal processing method are provided. Calculate a value of an interpolation parametric function corresponding to a sampling signal frame according to three consecutive sample values in the sampling signal frame, and calculate an interpolated value between two adjacent sampling points in a frequency-lowered signal frame according to the value of the interpolation parametric function.

Abstract translation: 提供语音信号处理装置和语音信号处理方法。根据采样信号帧中的三个连续采样值，计算与采样信号帧对应的插值参数函数的值，并根据插补值计算降频信号帧中两个相邻采样点之间的内插值参数函数。

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification