-
101.
Publication No.: US12051440B1
Publication date: 2024-07-30
Application No.: US18591497
Filing date: 2024-02-29
Inventors: Weijun Pan, Yidi Wang, Qinghai Zuo, Xuan Wang, Rundong Wang, Tian Luan, Jian Zhang, Zixuan Wang, Peiyuan Jiang, Qianlan Jiang
CPC classification: G10L25/60, G08G5/0095, G10L21/0388, G10L25/18, G10L25/21, G10L25/30, G10L25/93, G10L2025/937
Abstract: Disclosed are a self-attention-based speech quality measuring method and system for real-time air traffic control, including the following steps: acquiring real-time air traffic control speech data and generating speech information frames; detecting the speech information frames, discarding unvoiced information frames of the speech information frames, and generating a voiced long speech information frame; performing mel spectrogram conversion, attention extraction and feature fusion on the long speech information frame to obtain a predicted MOS value.
-
102.
Publication No.: US11996121B2
Publication date: 2024-05-28
Application No.: US17644363
Filing date: 2021-12-15
Abstract: A method, computer system, and computer program product for detecting face mask usage based on a crowd sound are provided. The present invention may include capturing an audio stream including crowd voice data. The present invention may also include analyzing the crowd voice data using a machine learning model to determine an amount of people wearing masks. The present invention may further include, in response to determining that the amount of people wearing masks does not meet a compliance threshold, displaying content to promote face mask usage.
-
103.
Publication No.: US11947924B2
Publication date: 2024-04-02
Application No.: US18369742
Filing date: 2023-09-18
Applicant: VoyagerX, Inc.
Inventors: Hyeonsoo Oh, Sedong Nam
IPC classification: G10L15/00, G06F40/47, G10L15/05, G10L15/22, G10L15/26, G10L25/57, G10L25/93, G11B27/34, H04N21/488, H04N21/8547
CPC classification: G06F40/47, G10L15/05, G10L15/22, G10L15/26, G10L25/57, G10L25/93, G11B27/34, H04N21/4884, H04N21/8547
Abstract: The present disclosure relates to systems and methods for providing subtitles for a video. The video's audio is transcribed to obtain caption text for the video. A first machine-trained model identifies sentences in the caption text. A second model identifies intra-sentence breaks within the sentences identified using the first machine-trained model. Based on the identified sentences and intra-sentence breaks, one or more words in the caption text are grouped into a clip caption to be displayed for a corresponding clip of the video.
-
104.
Publication No.: US20240105214A1
Publication date: 2024-03-28
Application No.: US18158773
Filing date: 2023-01-24
Inventors: Tsutomu UDAKA, Minoru Akiyama
Abstract: An information processing apparatus includes a processor configured to: acquire first data indicative of a temporal change of an intensity of sound emitted by an apparatus; generate second data by extracting, from the first data, a maximum value in each section of a time width corresponding to a temporal resolution at which human voice is unrecognizable and discarding values other than the maximum value; and transmit the second data to an external apparatus.
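The reduction step described in this abstract (keep only the maximum of each time section, discard the rest) can be illustrated with a minimal sketch. The function name `max_per_section` and the parameter `samples_per_section` are hypothetical; the abstract does not say how the section width is chosen or how the intensity data is represented, so a plain list of numbers and a fixed section length in samples are assumed here.

```python
def max_per_section(intensity, samples_per_section):
    """Keep one value (the maximum) per fixed-width section.

    Assumption: `intensity` is a sequence of sound-intensity samples and
    `samples_per_section` is the section width expressed in samples.
    A final, shorter section is reduced in the same way.
    """
    if samples_per_section <= 0:
        raise ValueError("samples_per_section must be positive")
    reduced = []
    for start in range(0, len(intensity), samples_per_section):
        section = intensity[start:start + samples_per_section]
        reduced.append(max(section))  # all other values are discarded
    return reduced


if __name__ == "__main__":
    first_data = [1, 5, 2, 7, 3, 0, 4]
    print(max_per_section(first_data, 3))
```

A wider section discards more detail; the abstract ties the width to a resolution at which speech can no longer be recognized, which this sketch leaves to the caller.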
-
105.
Publication No.: US20240062902A1
Publication date: 2024-02-22
Application No.: US18364078
Filing date: 2023-08-02
Abstract: An electronic device and method for chronic pulmonary disease prediction from audio input based on short-winded breath determination using artificial intelligence are disclosed. The electronic device receives an audio input associated with a user. The electronic device applies an Artificial Intelligence (AI) model to detect a short-winded breath duration that corresponds to a time duration between an end of a first spoken word and a start of a second spoken word succeeding the first spoken word. The electronic device detects a speaking pattern. The electronic device applies a recurrent neural network (RNN) model to reconstruct a set of short-winded breath audio samples. The electronic device generates an audio sample dataset and a set of audio features. The electronic device applies a modular neural network model on the generated audio sample dataset and on the generated set of audio features to determine a set of chronic obstructive pulmonary disease (COPD) metrics.
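The abstract defines the short-winded breath duration as the time between the end of one spoken word and the start of the next. The sketch below only illustrates that arithmetic. It assumes word boundaries are already available as `(start, end)` pairs in seconds, which in the patent is the job of an AI model; the function name `inter_word_gaps` is hypothetical.

```python
def inter_word_gaps(word_times):
    """Return the gap in seconds between each word and its successor.

    Assumption: `word_times` is a list of (start, end) tuples in seconds,
    sorted by start time. Overlapping words give a gap of 0.0.
    """
    gaps = []
    for (_, prev_end), (next_start, _) in zip(word_times, word_times[1:]):
        gaps.append(max(0.0, next_start - prev_end))
    return gaps


if __name__ == "__main__":
    words = [(0.0, 0.5), (0.75, 1.25), (2.0, 2.5)]
    print(inter_word_gaps(words))
```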
-
106.
Publication No.: US20240057936A1
Publication date: 2024-02-22
Application No.: US18271416
Filing date: 2022-01-12
CPC classification: A61B5/4803, G10L25/18, G10L25/21, G10L25/66, G10L25/93, G10L25/90, G10L15/05, G10L2025/937
Abstract: Methods of assessing the pathological and/or physiological state of a subject, methods of monitoring a subject with heart failure or a subject that has been diagnosed as having or being at risk of having a condition associated with dyspnea and/or fatigue, and methods of diagnosing a subject as having decompensated heart failure are provided. The methods comprise obtaining a voice recording from a word-reading test from the subject, wherein the voice recording is from a word-reading test comprising reading a sequence of words drawn from a set of n words, and analysing the voice recording, or a portion thereof. The analysing can comprise identifying a plurality of segments of the voice recording that correspond to single words or syllables; determining the value of one or more metrics selected from the breathing %, unvoicing/voicing ratio, voice pitch and correct word rate at least in part based on the identified segments; and comparing the value of the one or more metrics with one or more respective reference values. Related systems and products are also described.
-
107.
Publication No.: US11894017B2
Publication date: 2024-02-06
Application No.: US17628467
Filing date: 2019-07-25
Inventors: Ryo Masumura, Takanobu Oba, Kiyoaki Matsui
IPC classification: G10L25/93, G10L25/78, G10L15/00, G10L15/02, G10L21/0208, G06N20/20, G06N3/044, G06N3/09, G10L17/00, G10L25/84
CPC classification: G10L25/93, G06N20/20, G10L15/00, G10L15/02, G10L21/0208, G10L25/78, G06N3/044, G06N3/09, G10L17/00, G10L25/84, G10L2015/025
Abstract: A voice/non-voice determination device robust with respect to an acoustic signal in a high-noise environment is provided. The voice/non-voice determination device includes an acoustic scene classification unit including a first model which receives input of an acoustic signal and outputs acoustic scene information which is information regarding a scene where the acoustic signal is collected, a speech enhancement unit including a second model which receives input of the acoustic signal and outputs speech enhancement information which is information regarding the acoustic signal after enhancement, and a voice/non-voice determination unit including a third model which receives input of the acoustic signal, the acoustic scene information and the speech enhancement information and outputs a voice/non-voice label which is information regarding a label of either a speech section or a non-speech section.
-
108.
Publication No.: US11881228B2
Publication date: 2024-01-23
Application No.: US17121179
Filing date: 2020-12-14
IPC classification: G10L19/12, G10L19/06, G10L19/07, G10L19/083, G10L19/20, G10L19/032, G10L25/93, G10L19/00
CPC classification: G10L19/12, G10L19/032, G10L19/06, G10L19/07, G10L19/083, G10L19/20, G10L25/93, G10L2019/0016
Abstract: According to an aspect of the present invention, an encoder for encoding an audio signal has an analyzer configured for deriving prediction coefficients and a residual signal from a frame of the audio signal. The encoder has a formant information calculator configured for calculating speech-related spectral shaping information from the prediction coefficients, a gain parameter calculator configured for calculating a gain parameter from an unvoiced residual signal and the spectral shaping information, and a bitstream former configured for forming an output signal based on information related to a voiced signal frame, the gain parameter or a quantized gain parameter, and the prediction coefficients.
-
109.
Publication No.: US11881221B2
Publication date: 2024-01-23
Application No.: US17810032
Filing date: 2022-06-30
Applicant: The Notebook, LLC
IPC classification: G10L15/22, G10L25/90, G10L25/24, G10L25/78, G10L15/18, G10L25/66, G10L25/93, A61B5/00, B60K28/06, A61B5/16, G06V40/16
CPC classification: G10L15/22, A61B5/165, A61B5/4803, B60K28/06, G06V40/165, G06V40/167, G06V40/171, G06V40/174, G10L15/1815, G10L15/1822, G10L25/24, G10L25/66, G10L25/78, G10L25/90, G10L25/93, G10L2015/223, G10L2015/227
Abstract: Systems and methods are disclosed. A digitized human vocal expression of a user and digital images are received over a network from a remote device. The digitized human vocal expression is processed to determine characteristics of the human vocal expression, including: pitch, volume, rapidity, a magnitude spectrum identify, and/or pauses in speech. Digital images are received and processed to detect characteristics of the user face, including detecting if any of the following is present: a sagging lip, a crooked smile, uneven eyebrows, and/or facial droop. Using the human vocal expression characteristics and face characteristics, a determination is made as to what action is to be taken. A cepstrum pitch may be determined using an inverse Fourier transform of a logarithm of a spectrum of a human vocal expression signal. The volume may be determined using peak heights in a power spectrum of the human vocal expression.
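The cepstrum pitch mentioned at the end of this abstract (inverse Fourier transform of the logarithm of a spectrum) can be sketched with the standard library alone. This is a generic textbook illustration, not the patent's implementation. The use of the magnitude spectrum, the small constant `eps`, the 80 to 400 Hz search range, and the names `dft` and `cepstrum_pitch` are all assumptions. A naive O(n²) transform is used to avoid external dependencies, so it is only suitable for short frames.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def cepstrum_pitch(frame, sample_rate, fmin=80.0, fmax=400.0, eps=1e-10):
    """Estimate pitch in Hz from the peak of the real cepstrum.

    Assumptions: log of the magnitude spectrum (eps avoids log(0)), and a
    pitch search limited to [fmin, fmax].
    """
    log_mag = [math.log(abs(v) + eps) for v in dft(frame)]
    cepstrum = [v.real for v in dft(log_mag, inverse=True)]
    q_lo = max(1, int(sample_rate / fmax))               # shortest period
    q_hi = min(len(frame) // 2, int(sample_rate / fmin))  # longest period
    peak_q = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak_q


if __name__ == "__main__":
    # Synthetic test frame: an impulse train with a 64-sample period.
    sample_rate = 8000
    frame = [1.0 if i % 64 == 0 else 0.0 for i in range(256)]
    print(cepstrum_pitch(frame, sample_rate))
```

For the synthetic impulse train (period 64 samples at 8000 Hz) the estimate should come out at about 125 Hz. Real speech would normally be windowed and processed with an FFT library rather than this naive transform.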
-
110.
Publication No.: US11880633B2
Publication date: 2024-01-23
Application No.: US17602870
Filing date: 2020-04-14
Inventors: Toru Ogiso
CPC classification: G06F3/165, G10L25/93, G10L2025/935
Abstract: An information processing apparatus is connected to a peripheral apparatus that includes sound inputting means for outputting a sound signal representative of sound of surroundings. The information processing apparatus performs control such that, in a case where sound input is required in processing of an application determined in advance, in a state in which a sound signal accepted from the peripheral apparatus is cut off, the sound signal accepted from the peripheral apparatus is accepted and the sound signal is used only in the processing of the application determined in advance.
-