Method and apparatus for frame loss concealment in transform domain

    Publication No.: US09978400B2

    Publication date: 2018-05-22

    Application No.: US14736499

    Filing date: 2015-06-11

    Applicant: ZTE CORPORATION

    Abstract: The present document discloses a method and apparatus for compensating for a lost frame in a transform domain, comprising: calculating frequency-domain coefficients of a current lost frame using frequency-domain coefficients of one or more frames prior to the current lost frame, and performing frequency-time transform to obtain an initially compensated signal; and performing waveform adjustment, to obtain a compensated signal. Alternatively, extrapolation is performed for all or part of frequency points of the current lost frame using phases and amplitudes of corresponding frequency points of a plurality of previous frames to obtain phases and amplitudes of the corresponding frequency points of the current lost frame, to obtain frequency-domain coefficients of the corresponding frequency points, and frequency-time transform is performed to obtain a compensated signal. The above methods can be selected through a judgment algorithm to compensate for the current lost frame, thereby achieving a better compensation effect.
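
The extrapolation alternative in this abstract can be illustrated with a minimal Python sketch. It assumes plain linear extrapolation from exactly two previous frames, which the abstract does not specify; the function name `extrapolate_lost_frame` and the zero clamp on amplitude are hypothetical choices for illustration, not the patented method.

```python
import cmath


def extrapolate_lost_frame(prev2, prev1):
    """Estimate complex frequency-domain coefficients of a lost frame.

    prev2 and prev1 are lists of complex coefficients for the two frames
    preceding the lost one (prev1 is the most recent). Linear extrapolation
    is an assumption made for this sketch.
    """
    estimate = []
    for c2, c1 in zip(prev2, prev1):
        a2, p2 = cmath.polar(c2)
        a1, p1 = cmath.polar(c1)
        # Continue the amplitude trend, never going below zero.
        amp = max(0.0, 2.0 * a1 - a2)
        # Phase advance between the two frames, wrapped into (-pi, pi].
        delta = cmath.phase(cmath.exp(1j * (p1 - p2)))
        estimate.append(cmath.rect(amp, p1 + delta))
    return estimate


# One bin with constant amplitude whose phase advances 0.5 rad per frame.
older = [cmath.rect(1.0, 0.2)]
newer = [cmath.rect(1.0, 0.7)]
lost = extrapolate_lost_frame(older, newer)
```

In a real decoder the estimated coefficients would then go through the codec's own frequency-time transform, which this sketch leaves out.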

    DETERMINING WHEN A SUBJECT IS SPEAKING BY ANALYZING A RESPIRATORY SIGNAL OBTAINED FROM A VIDEO

    Publication No.: US20170294193A1

    Publication date: 2017-10-12

    Application No.: US15092287

    Filing date: 2016-04-06

    Applicant: Xerox Corporation

    Abstract: What is disclosed is a system and method for determining when a subject is speaking from a respiratory signal obtained from a video of that subject. A video of a subject is received and a respiratory signal is extracted from a time-series signal obtained by processing pixels in image frames of the video. The respiratory signal comprises an inspiratory signal and an expiratory signal. Cycle-level features are extracted from the respiratory signal and used to identify expiratory signals during which speech is likely to have occurred. The identified expiratory signals are divided into time intervals. Frame-level features are determined for each time interval and an amount of distortion in the expiratory signal for this time interval is quantified. The amount of distortion is compared to a threshold. In response to the comparison, a determination is made that speech occurred during this interval. The process repeats for all time intervals.

    SYSTEM AND METHOD TO PROVIDE CLASSIFICATION OF NOISE DATA OF HUMAN CROWD
    45.
    Invention patent application
    Status: under examination (published)

    Publication No.: US20160307582A1

    Publication date: 2016-10-20

    Application No.: US15101817

    Filing date: 2014-12-03

    IPC classification: G10L25/51 G10L25/24 G10L25/09

    Abstract: System(s) and method(s) for classifying noise data of a human crowd are disclosed. Noise data is captured from one or more sources and features are extracted by using computation techniques. The features comprise spectral domain features and time domain features. Classification models are developed by using each of the spectral domain features and the time domain features. Discriminative information with respect to the noise data is extracted by using the classification models. A performance matrix is computed for each classification model. The performance matrix comprises classified noise elements with respect to the noise data. Each classified noise element is associated with a classification performance score with respect to a spectral domain feature, a time domain feature, and fusion of features and scores. The classified noise elements provide the classification of the noise data.

    User programmable voice command recognition based on sparse features
    46.
    Granted invention patent
    Status: in force

    Publication No.: US09443508B2

    Publication date: 2016-09-13

    Application No.: US14458688

    Filing date: 2014-08-13

    Inventor: Bozhao Tan

    IPC classification: G10L15/06 G10L15/02

    Abstract: A low-power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal. The extracted sparse sound parameter information is processed using a speaker-dependent sound signature database stored in the sound recognition sensor to identify sounds or speech contained in the analog signal. The sound signature database may include several user enrollments for a sound command, each representing an entire word or multiword phrase. The extracted sparse sound parameter information may be compared to the multiple user-enrolled signatures using, for example, cosine distance, Euclidean distance, or correlation distance.
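
The comparison step named at the end of this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature vectors, the command names, the acceptance threshold of 0.2, and the helper `match_command` are all hypothetical, and the abstract does not say how the sparse features are built or how a match is accepted.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_command(features, enrollments, threshold=0.2, distance=cosine_distance):
    """Return the command whose closest enrollment is within threshold, else None.

    enrollments maps a command name to a list of enrolled feature vectors
    (several enrollments per command, as the abstract describes).
    """
    best_command, best_distance = None, float("inf")
    for command, signatures in enrollments.items():
        for signature in signatures:
            d = distance(features, signature)
            if d < best_distance:
                best_command, best_distance = command, d
    return best_command if best_distance <= threshold else None


# Hypothetical enrollments: two for one command, one for another.
enrolled = {
    "lights on": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "stop": [[0.0, 1.0, 0.0]],
}
accepted = match_command([1.0, 0.05, 0.0], enrolled)
rejected = match_command([0.0, 0.0, 1.0], enrolled)
```

Passing `distance=euclidean_distance` swaps the metric without changing the matching loop, although the threshold would then need a different scale.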

    SPEECH RECOGNITION APPARATUS AND SPEECH RECOGNITION METHOD
    47.
    Invention patent application
    Status: in force

    Publication No.: US20160217787A1

    Publication date: 2016-07-28

    Application No.: US14660886

    Filing date: 2015-03-17

    Applicant: Acer Incorporated

    Abstract: A speech recognition apparatus and a speech recognition method are provided. In the invention, whether an original voice sampling signal corresponding to a target voice frame is a consonant signal is determined according to at least one of a ratio of an energy of a low-pass sampling signal to an energy of the original voice sampling signal and a ratio value of an energy of a second consonant frequency band signal.
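
The first criterion in this abstract, the ratio of low-pass energy to total energy, can be illustrated with a short Python sketch. Several things here are assumptions not found in the abstract: the moving-average filter stands in for whatever low-pass filter the application uses, the threshold of 0.5 is arbitrary, and the decision direction (a low ratio suggests a consonant, since many consonants carry more high-frequency energy) is an interpretation. The second criterion is not sketched because the abstract does not define it fully.

```python
import math


def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


def moving_average_lowpass(samples, width=4):
    """Crude low-pass filter: mean of the current and previous samples."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def is_consonant_frame(samples, ratio_threshold=0.5, width=4):
    """Flag a frame as consonant-like when little energy survives low-pass filtering."""
    total = energy(samples)
    if total == 0.0:
        return False
    ratio = energy(moving_average_lowpass(samples, width)) / total
    return ratio < ratio_threshold


# A rapidly alternating frame versus one slow sine cycle.
hissy_frame = [(-1.0) ** i for i in range(64)]
voiced_frame = [math.sin(2.0 * math.pi * i / 64.0) for i in range(64)]
hissy_is_consonant = is_consonant_frame(hissy_frame)
voiced_is_consonant = is_consonant_frame(voiced_frame)
```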

    Voice data playback speed conversion method and voice data playback speed conversion device
    48.
    Granted invention patent
    Status: in force

    Publication No.: US09361905B2

    Publication date: 2016-06-07

    Application No.: US14763303

    Filing date: 2014-01-21

    Abstract: The present invention addresses the problem of enabling playback-speed conversion of voice data within a voice data playback device alone. The solution is a voice data playback speed conversion method and a voice data playback speed conversion device, comprising: a step of setting a reference zero cross point from an arbitrary zero cross point; a step of selecting a zero cross point temporally after the reference zero cross point within a first predetermined time range; a step of calculating a reference correlation function in a waveform from the reference zero cross point until a second predetermined time; and a step of calculating a correlation function in a waveform from each of a plurality of previously selected zero cross points until the second predetermined time. A second reference zero cross point is the zero cross point of the waveform whose correlation function has the highest concordance rate with the reference correlation function. The difference between the reference zero cross point and the second reference zero cross point is calculated as a basic cycle, and the voice data is expanded or contracted in basic cycle units to convert its playback speed.
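
A simplified Python sketch of the basic-cycle search follows. It departs from the abstract in one labeled way: instead of comparing correlation functions by a concordance rate, it directly scores each candidate waveform segment against the reference segment with normalized correlation. The names `basic_cycle`, `search_range`, and `window` are hypothetical stand-ins for the first and second predetermined times, measured in samples.

```python
import math


def zero_crossings(x):
    """Indices where the signal crosses from negative to non-negative."""
    return [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]


def normalized_correlation(a, b):
    """Correlation of two equal-length segments, scaled to [-1, 1]."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def basic_cycle(x, search_range, window):
    """Lag in samples from the reference zero crossing to its best-matching successor."""
    crossings = zero_crossings(x)
    if not crossings:
        return None
    ref = crossings[0]
    ref_segment = x[ref:ref + window]
    best_lag, best_score = None, -1.0
    for zc in crossings[1:]:
        lag = zc - ref
        if lag > search_range:
            break
        segment = x[zc:zc + window]
        if len(segment) < window:
            break
        score = normalized_correlation(ref_segment, segment)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Test tone with a 60-sample period and two upward zero crossings per period.
theta = [2.0 * math.pi * (i - 0.5) / 60.0 for i in range(240)]
tone = [math.sin(t) + math.sin(2.0 * t) for t in theta]
cycle = basic_cycle(tone, search_range=80, window=40)
```

The test tone has an upward crossing half a period after the reference, so the search must reject that candidate on waveform shape and pick the one a full period away. Speed conversion would then repeat or drop whole segments of `cycle` samples, which this sketch does not implement.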

    METHOD, TERMINAL, SYSTEM FOR AUDIO ENCODING/DECODING/CODEC
    49.
    Invention patent application
    Status: in force

    Publication No.: US20150127356A1

    Publication date: 2015-05-07

    Application No.: US14596753

    Filing date: 2015-01-14

    IPC classification: G10L19/02

    Abstract: Audio encoding methods/terminals, audio decoding methods/terminals, and audio codec systems are provided. A plurality of continuous audio signals is obtained. It is determined whether each audio signal of the plurality of audio signals includes a designated signal type, according to an audio parameter of each audio signal. A marked audio encoding stream is obtained by marking each audio signal as having or not having the designated signal type. The marking is used, at a decoding terminal, to perform an enhancement process on one or more audio signals having the designated signal type. The enhancement process is not performed on audio signals that do not have the designated signal type.

    Audio signal processing apparatus, audio signal processing method, and program
    50.
    Granted invention patent
    Status: in force

    Publication No.: US08971549B2

    Publication date: 2015-03-03

    Application No.: US13179721

    Filing date: 2011-07-11

    Applicant: Toshiyuki Sekiya

    Inventor: Toshiyuki Sekiya

    Abstract: The present disclosure provides an audio signal processing apparatus including: an amplitude detector configured to detect a noise start point of an audio signal including a noise signal by comparing an amplitude value of the audio signal with a threshold value; a frequency feature calculator configured to calculate a frequency feature representing at least a frequency characteristic of the audio signal after the noise start point; and a noise determiner configured to determine, based on the frequency feature, a leg of the audio signal after the noise start point that continuously includes high-frequency components equal to or higher than a reference frequency as a noise leg.
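
The three components in this abstract can be mimicked by a small Python sketch. It rests on assumptions the abstract does not state: the zero-crossing rate is used as a cheap stand-in for the frequency feature, the frame length of 32 samples and the reference rate of 0.3 are arbitrary, and the function names are hypothetical.

```python
import math


def noise_start(samples, threshold):
    """Index of the first sample whose magnitude exceeds threshold, or None."""
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            return i
    return None


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign (rough frequency proxy)."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return changes / max(1, len(frame) - 1)


def noise_leg(samples, threshold, frame_len=32, zcr_reference=0.3):
    """Return (start, end) of the run of high-frequency frames after the noise start."""
    start = noise_start(samples, threshold)
    if start is None:
        return None
    end = start
    while end + frame_len <= len(samples):
        if zero_crossing_rate(samples[end:end + frame_len]) < zcr_reference:
            break
        end += frame_len
    return (start, end) if end > start else None


# Quiet lead-in, a high-frequency burst, then a loud low-frequency tail.
quiet = [0.01 * math.sin(0.1 * i) for i in range(50)]
burst = [0.8 * (-1.0) ** i for i in range(64)]
tail = [0.8 * math.sin(2.0 * math.pi * k / 64.0) for k in range(64)]
leg = noise_leg(quiet + burst + tail, threshold=0.5)
```

The loud tail exceeds the amplitude threshold too, but it falls outside the leg because its zero-crossing rate is below the reference, which mirrors the role of the frequency feature in the abstract.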