-
Publication number: US12086564B2
Publication date: 2024-09-10
Application number: US17539182
Filing date: 2021-11-30
Applicant: SoundHound, Inc.
Inventor: Dylan H. Ross
IPC: G10L15/18 , G06F40/56 , G06F40/58 , G10L15/06 , G10L19/125 , G10L19/26 , G10L21/013
CPC classification number: G06F40/56 , G06F40/58 , G10L15/06 , G10L15/18 , G10L19/125 , G10L19/265 , G10L21/013 , G10L2021/0135
Abstract: A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift. Labeling the morphed speech comprises at least one or more of transcribing the morphed speech, identifying a gender of the speaker, identifying an accent of the speaker, and identifying a noise type of the morphed speech.
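The three-step morph in this abstract can be illustrated on a list of partial frequencies rather than on real audio. The sketch below is a simplified model, not the patented implementation: it assumes a pitch shift multiplies every frequency by a ratio, a frequency shift adds a constant offset in Hz, and the second pitch shift exactly undoes the first. The abstract only says the second shift goes in the opposite direction, so the equal magnitude, the function names, and the default values are all assumptions.

```python
import random


def pitch_ratio(semitones):
    """Frequency ratio for a pitch shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def choose_direction(rng):
    """Randomly pick +1 (shift up) or -1 (shift down) for the first pitch shift."""
    return rng.choice((1, -1))


def morph_partials(partials_hz, direction, semitones=4.0, shift_hz=60.0):
    """Apply pitch shift, frequency shift, then the opposite pitch shift.

    partials_hz: frequencies of the partials of a voiced sound, in Hz.
    direction:   +1 or -1, the direction of the first pitch shift.
    semitones and shift_hz are illustrative values, not taken from the patent.
    """
    ratio = pitch_ratio(direction * semitones)
    step1 = [f * ratio for f in partials_hz]        # first pitch shift (multiplicative)
    step2 = [f + shift_hz for f in step1]           # frequency shift (additive)
    inverse = pitch_ratio(-direction * semitones)   # assumed: same size, opposite sign
    return [f * inverse for f in step2]             # second pitch shift


if __name__ == "__main__":
    rng = random.Random(0)
    direction = choose_direction(rng)
    print(direction, morph_partials([120.0, 240.0, 360.0], direction))
```

Under these assumptions each partial f ends up at f + shift_hz / ratio, so the partials are no longer integer multiples of a common fundamental, and the size of the offset depends on the randomly chosen direction.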
-
Publication number: US11978092B2
Publication date: 2024-05-07
Application number: US17518109
Filing date: 2021-11-03
Applicant: Spotify AB
Inventor: Lu Han , Rachel M. Bittner
IPC: G06Q30/0241 , G10L19/00 , G10L21/013 , G10L21/0232 , G06F3/16
CPC classification number: G06Q30/0276 , G10L19/00 , G10L21/013 , G10L21/0232 , G06F3/165 , G10L2021/0135
Abstract: A call to action processor receives an entity datapoint containing data related to an entity, a campaign objective datapoint containing data associated with a campaign objective, at least one definite script element based on the campaign objective, and entity metadata containing data associated with the entity. The call to action processor further generates at least one variable script element based on the entity metadata, and presents to a device the at least one definite script element and the at least one variable script element.
-
Publication number: US20240119954A1
Publication date: 2024-04-11
Application number: US18528244
Filing date: 2023-12-04
Applicant: Modulate, Inc.
Inventor: William Carter Huffman , Michael Pappas
IPC: G10L21/013 , G10L15/02 , G10L15/06 , G10L15/22 , G10L19/018
CPC classification number: G10L21/013 , G10L15/02 , G10L15/063 , G10L15/22 , G10L19/018 , G10L2015/025 , G10L2021/0135 , G10L25/30
Abstract: A method of building a new voice having a new timbre using a timbre vector space includes receiving timbre data filtered using a temporal receptive field. The timbre data is mapped in the timbre vector space. The timbre data is related to a plurality of different voices. Each of the plurality of different voices has respective timbre data in the timbre vector space. The method builds the new timbre using the timbre data of the plurality of different voices using a machine learning system.
-
Publication number: US11798526B2
Publication date: 2023-10-24
Application number: US17653005
Filing date: 2022-03-01
Applicant: Google LLC
Inventor: Ioannis Agiomyrgiannakis , Fergus James Henderson
IPC: G10L13/10 , G10L13/033 , G06F3/16 , G10L21/013
CPC classification number: G10L13/033 , G06F3/167 , G10L13/10 , G10L2021/0135
Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
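The source-to-voice assignment described in this abstract can be pictured as a small lookup table. The following is a minimal sketch under stated assumptions: `Voice`, `VoiceRouter`, the source identifiers, and the pitch and rate fields are hypothetical names chosen for illustration, and the function returns a description of the speech request instead of calling any real text-to-speech engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """Hypothetical bundle of voice characteristics."""
    name: str
    pitch_scale: float
    rate_scale: float


class VoiceRouter:
    """Assigns a distinct voice to each output source, then reuses it."""

    def __init__(self, voices):
        self._available = list(voices)   # voices not yet assigned to a source
        self._assigned = {}              # source id -> Voice

    def voice_for(self, source_id):
        """Return the voice for a source, assigning an unused one on first sight."""
        if source_id not in self._assigned:
            if not self._available:
                raise RuntimeError("no distinct voice left for " + source_id)
            self._assigned[source_id] = self._available.pop(0)
        return self._assigned[source_id]

    def request_speech(self, source_id, text):
        """Describe the speech output that a synthesizer would be asked to produce."""
        voice = self.voice_for(source_id)
        return {"text": text, "voice": voice.name,
                "pitch_scale": voice.pitch_scale, "rate_scale": voice.rate_scale}


if __name__ == "__main__":
    router = VoiceRouter([Voice("alto", 1.0, 1.0), Voice("bass", 0.8, 0.9)])
    print(router.request_speech("mail_app", "New message"))
    print(router.request_speech("operating_system", "Battery low"))
```

Keeping the assignment in one table means a listener hears the same voice every time a given source speaks, which is the property the abstract relies on to tell sources apart.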
-
Publication number: US20230317090A1
Publication date: 2023-10-05
Application number: US18043105
Filing date: 2022-06-01
Applicant: DWANGO CO., LTD.
Inventor: Kazuyuki HIROSHIBA , Yuri ODAGIRI , Shinya KITAOKA
IPC: G10L21/013 , G10L21/04 , G10L15/02
CPC classification number: G10L21/013 , G10L21/04 , G10L15/02 , G10L2021/0135 , G10L2015/025
Abstract: A voice conversion apparatus includes: an input unit that inputs designation of a conversion destination voice; an extraction unit that analyzes a voice signal of a conversion source voice and extracts time series data including a phoneme and a pitch; an adjustment unit that matches a height of the pitch to a height of the designated conversion destination voice; and a generation unit that inputs the phoneme and the pitch to a deep learning model that learns voice data of many people and is capable of synthesizing a designated person's voice in time-series order, and generates a voice signal obtained by synthesizing the designated conversion destination voice.
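The adjustment step, which matches the pitch height of the source to the designated target voice, could be approximated as follows. This is a sketch of one common approach and not necessarily what the application discloses: it assumes the pitch track is a list of fundamental-frequency values in Hz with 0 marking unvoiced frames, and it scales voiced frames so that their geometric mean equals a target mean. The function name and the input convention are assumptions.

```python
import math


def match_pitch_height(source_f0_hz, target_mean_f0_hz):
    """Scale a pitch track so its geometric-mean F0 equals the target's.

    source_f0_hz:      per-frame F0 in Hz; values <= 0 mark unvoiced frames.
    target_mean_f0_hz: assumed typical (geometric mean) F0 of the target voice.
    """
    voiced = [f for f in source_f0_hz if f > 0]
    if not voiced:
        return list(source_f0_hz)        # nothing to adjust
    mean_log = sum(math.log(f) for f in voiced) / len(voiced)
    ratio = target_mean_f0_hz / math.exp(mean_log)
    # A constant ratio preserves the melody (intervals) while moving its height.
    return [f * ratio if f > 0 else 0.0 for f in source_f0_hz]


if __name__ == "__main__":
    print(match_pitch_height([110.0, 0.0, 130.0, 120.0], 220.0))
```

Working in the log domain keeps musical intervals intact, so the intonation contour of the source survives while the overall register moves toward the target.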
-
Publication number: US20230197093A1
Publication date: 2023-06-22
Application number: US17558580
Filing date: 2021-12-21
Applicant: Adobe Inc. , Northwestern University
Inventor: Maxwell Morrison , Juan Pablo Caceres Chomali , Zeyu Jin , Nicholas Bryan , Bryan A. Pardo
IPC: G10L21/013 , G10L15/02 , G10L15/18 , G10L25/90 , G10L25/30 , G10L19/028 , G10L19/032 , G10L21/04 , G10L25/24 , G10L15/06
CPC classification number: G10L21/013 , G10L15/02 , G10L15/1807 , G10L25/90 , G10L25/30 , G10L19/028 , G10L19/032 , G10L21/04 , G10L25/24 , G10L15/063 , G10L2021/0135
Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
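Quantizing pitch into bins of equal width in cents means the bins are evenly spaced on a logarithmic frequency axis, since an interval in cents is 1200 times the base-2 logarithm of a frequency ratio. The sketch below illustrates that idea only. The pitch range of 50 to 800 Hz, the bin count of 48, and the function names are assumptions chosen so that each bin is 100 cents wide; they are not values from the application.

```python
import math


def hz_to_cents(freq_hz, ref_hz):
    """Interval from ref_hz up to freq_hz, in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)


def quantize_pitch(freq_hz, fmin_hz=50.0, fmax_hz=800.0, n_bins=48):
    """Return the index of the equal-width-in-cents bin containing freq_hz."""
    if freq_hz <= 0:
        raise ValueError("pitch must be positive")
    width = hz_to_cents(fmax_hz, fmin_hz) / n_bins       # bin width in cents
    clamped = min(max(freq_hz, fmin_hz), fmax_hz)        # keep inside the range
    index = int(hz_to_cents(clamped, fmin_hz) // width)
    return min(index, n_bins - 1)                        # fmax falls in the last bin


def bin_center_hz(index, fmin_hz=50.0, fmax_hz=800.0, n_bins=48):
    """Frequency at the center of a bin, useful for decoding a quantized pitch."""
    width = hz_to_cents(fmax_hz, fmin_hz) / n_bins
    return fmin_hz * 2.0 ** ((index + 0.5) * width / 1200.0)


if __name__ == "__main__":
    for f in (50.0, 100.0, 220.0, 800.0):
        k = quantize_pitch(f)
        print(f, k, round(bin_center_hz(k), 2))
```

Because the widths are equal in cents, neighbouring bin centers differ by a constant frequency ratio rather than a constant number of Hz, which gives low and high voices the same relative pitch resolution.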
-
Publication number: US20180144737A1
Publication date: 2018-05-24
Application number: US15874051
Filing date: 2018-01-18
Applicant: Google LLC
Inventor: Ioannis Agiomyrgiannakis , Fergus James Henderson
IPC: G10L13/033 , G06F3/16 , G10L21/013
CPC classification number: G10L13/033 , G06F3/167 , G10L13/10 , G10L2021/0135
Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
-
Publication number: US09870769B2
Publication date: 2018-01-16
Application number: US14955311
Filing date: 2015-12-01
Applicant: International Business Machines Corporation
Inventor: Su Liu , Yi Liu , Cheng Xu , Shi Lei Zhang
CPC classification number: G10L15/187 , G10L15/02 , G10L15/063 , G10L15/075 , G10L15/26 , G10L21/003 , G10L2015/022 , G10L2015/0635 , G10L2021/0135
Abstract: A method comprising receiving an audio input signal comprising speech, determining an accent class corresponding to the speech, identifying an accented phone pattern within the speech, replacing the accented phone pattern with an unaccented phone pattern, and generating an unaccented output signal from the unaccented phone pattern.
-
Publication number: US20170345444A1
Publication date: 2017-11-30
Application number: US15496900
Filing date: 2017-04-25
Inventor: Toshimichi Tokuda
IPC: G10L21/043 , G10L25/78 , G10L21/038 , G10L21/057 , G10L21/0364
CPC classification number: G10L21/043 , G10L21/0364 , G10L21/038 , G10L21/057 , G10L25/78 , G10L25/90 , G10L2021/0135 , G10L2025/783 , H04M1/6016 , H04M1/642 , H04M1/6505
Abstract: In a communication apparatus, an encoder compresses telephone call voice which is transmitted from another communication apparatus. A voice accumulator preserves the telephone call voice, which is compressed by the encoder, as a message. A decoder expands the telephone call voice which is preserved in the voice accumulator. A signal memory temporarily maintains the telephone call voice which is expanded by the decoder. A speech speed convertor performs speech speed conversion on the telephone call voice, which is read from the signal memory, and outputs resulting voice from a speaker. A memory monitor temporarily stops the decoder from expanding the telephone call voice in a case where the memory monitor determines that an idle capacity of the signal memory approaches a predetermined lower limit value.
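The memory monitor in this abstract behaves like back-pressure on a bounded buffer: the decoder writes frames until the free space in the signal memory gets close to a floor, then waits for the speech speed convertor to read some out. The sketch below models that with frame counts. The class and function names are hypothetical, and treating "approaches the lower limit" as "idle capacity is at or below the limit" is an assumption.

```python
from collections import deque


class SignalMemory:
    """Bounded buffer of decoded frames with a simple idle-capacity monitor."""

    def __init__(self, capacity, lower_limit):
        if not 0 <= lower_limit < capacity:
            raise ValueError("lower_limit must be in [0, capacity)")
        self.capacity = capacity
        self.lower_limit = lower_limit
        self._frames = deque()

    def idle_capacity(self):
        """Number of frames that can still be stored."""
        return self.capacity - len(self._frames)

    def decoder_should_pause(self):
        """Monitor decision: pause decoding when free space reaches the limit."""
        return self.idle_capacity() <= self.lower_limit

    def write(self, frame):
        if self.idle_capacity() == 0:
            raise OverflowError("signal memory is full")
        self._frames.append(frame)

    def read(self):
        """Frame consumed by the speed convertor, or None if the memory is empty."""
        return self._frames.popleft() if self._frames else None


def run_decoder(frames, memory):
    """Write decoded frames until the monitor asks for a pause; return the count."""
    written = 0
    for frame in frames:
        if memory.decoder_should_pause():
            break
        memory.write(frame)
        written += 1
    return written


if __name__ == "__main__":
    memory = SignalMemory(capacity=4, lower_limit=1)
    print(run_decoder(range(10), memory), memory.idle_capacity())
```

Slowed-down playback drains the memory more slowly than the decoder fills it, so a pause of this kind keeps the buffer from overflowing during speed conversion.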
-
Publication number: US09830903B2
Publication date: 2017-11-28
Application number: US14757028
Filing date: 2015-11-10
Applicant: Paul Wendell Mason
Inventor: Paul Wendell Mason
IPC: G10L13/00 , G10L13/027 , G10L13/033 , G10L13/04 , G10L21/007 , G10L25/48
CPC classification number: G10L13/027 , G10L13/0335 , G10L13/043 , G10L21/007 , G10L25/48 , G10L2021/0135
Abstract: Apparatus and methods consistent with the present invention measure one or more of the characteristics of a voice recording and use such measurements to create a synthetic voice that approximates the recorded voice and uses such created synthetic voice to verbalize the content of an electronically conveyed written message such as an SMS text message. The vocal characteristics measured may include frequency, timbre, intensity, rhythm, and rate of speech as well as others.