NEAR-END INDICATION THAT THE END OF SPEECH IS RECEIVED BY THE FAR END IN AN AUDIO OR VIDEO CONFERENCE
    1.
    Invention application
    Status: under examination (published)

    Publication No.: WO2014052745A1

    Publication date: 2014-04-03

    Application No.: PCT/US2013/062159

    Filing date: 2013-09-27

    Abstract: Embodiments of a client device and a method for audio or video conferencing are described. An embodiment includes an offset detecting unit, a configuring unit, an estimator and an output unit. The offset detecting unit detects an offset (end) of speech input to the client device. The configuring unit determines a voice latency from the client device to each far end. For each far end, the estimator estimates, based on the voice latency, the time at which a user at that far end perceives the offset. The output unit outputs a perceivable signal indicating that a user at the far end perceives the offset, based on the time estimated for the far end. The perceivable signal helps avoid collisions between parties.
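
    The estimation step described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible model, in which the far-end perception time is the local offset time plus the one-way voice latency; the abstract does not state the actual formula. The names `FarEnd`, `estimate_perception_times` and `perceived_by_all` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class FarEnd:
    name: str               # hypothetical label for a remote party
    voice_latency_s: float  # assumed one-way voice latency, in seconds


def estimate_perception_times(offset_time_s, far_ends):
    """Estimate, per far end, when its user hears the end of speech.

    Assumption: perception time = local offset time + voice latency.
    """
    return {fe.name: offset_time_s + fe.voice_latency_s for fe in far_ends}


def perceived_by_all(now_s, perception_times):
    """True once every far end is estimated to have heard the offset."""
    return all(now_s >= t for t in perception_times.values())


# Speech ends locally at t = 10.0 s; two far ends with different latencies.
far_ends = [FarEnd("A", 0.25), FarEnd("B", 0.5)]
times = estimate_perception_times(10.0, far_ends)
```

    A client could poll `perceived_by_all` and switch on a visual or audible cue once it returns true, which is one possible reading of the "perceivable signal" in the abstract.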

    SPEAKER IDENTIFICATION USING SPATIAL INFORMATION
    2.
    Invention application
    Status: under examination (published)

    Publication No.: WO2016095218A1

    Publication date: 2016-06-23

    Application No.: PCT/CN2014/094409

    Filing date: 2014-12-19

    Abstract: A method of speaker identification for audio content in a format based on multiple channels is disclosed. The method comprises: extracting, from a first audio clip in the format, a plurality of spatial acoustic features across the multiple channels and location information, the first audio clip containing voices from a speaker (S201); constructing a first model for the speaker based on the spatial acoustic features and the location information, the first model indicating a characteristic of the voices from the speaker (S202); and identifying whether the audio content contains voices from the speaker based on the first model (S203). A corresponding system and computer program product are also disclosed.

    POSITION-DEPENDENT HYBRID DOMAIN PACKET LOSS CONCEALMENT
    3.
    Invention application
    Status: under examination (published)

    Publication No.: WO2014052746A1

    Publication date: 2014-04-03

    Application No.: PCT/US2013/062161

    Filing date: 2013-09-27

    CPC classification number: G10L19/005 G10L19/0017

    Abstract: The present document relates to audio signal processing in general, and to the concealment of artifacts that result from loss of audio packets during audio transmission over a packet-switched network, in particular. A method (200) for concealing one or more consecutive lost packets is described. A lost packet is a packet which is deemed to be lost by a transform-based audio decoder. Each of the one or more lost packets comprises a set of transform coefficients. A set of transform coefficients is used by the transform-based audio decoder to generate a corresponding frame of a time domain audio signal. The method (200) comprises determining (205) for a current lost packet of the one or more lost packets a number of preceding lost packets from the one or more lost packets; wherein the determined number is referred to as a loss position. Furthermore, the method comprises determining a packet loss concealment, referred to as PLC, scheme based on the loss position of the current packet; and determining (204, 207, 208) an estimate of a current frame of the audio signal using the determined PLC scheme (204, 207, 208); wherein the current frame corresponds to the current lost packet.
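
    The notion of a loss position can be made concrete with a short sketch. It computes, for every lost packet in a stream, the number of consecutive lost packets immediately preceding it, then selects a concealment scheme from that number. The scheme names and the threshold are placeholders chosen for illustration; the abstract does not say which schemes are used at which positions.

```python
def loss_positions(received):
    """Return the loss position of each packet.

    `received` is a sequence of booleans (True = packet arrived).
    A lost packet gets the count of consecutive lost packets directly
    before it; a received packet gets None.
    """
    positions = []
    run = 0  # length of the current run of consecutive losses
    for ok in received:
        if ok:
            positions.append(None)
            run = 0
        else:
            positions.append(run)
            run += 1
    return positions


def choose_plc_scheme(loss_position, threshold=1):
    """Pick a concealment scheme from the loss position.

    Hypothetical mapping: one scheme for the start of a loss burst,
    another for packets deeper into the burst.
    """
    if loss_position < threshold:
        return "scheme_for_early_loss"
    return "scheme_for_later_loss"


stream = [True, False, False, False, True, False]
positions = loss_positions(stream)
schemes = [None if p is None else choose_plc_scheme(p) for p in positions]
```

    Keying the choice on the loss position lets a decoder treat the first lost frame of a burst differently from later ones, which is the dependency the title refers to.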

    SEARCHING THE RESULTS OF AN AUTOMATIC SPEECH RECOGNITION PROCESS
    4.
    Invention application
    Status: under examination (published)

    Publication No.: WO2017020011A1

    Publication date: 2017-02-02

    Application No.: PCT/US2016/044878

    Filing date: 2016-07-29

    Abstract: Various disclosed implementations involve searching the results of an automatic speech recognition (ASR) process, such as an ASR process that has been performed on a recording of a monologue or of a conference, such as a teleconference or a video conference. An initial search query, including at least one search word, may be received. The initial search query may be analyzed according to phonetic similarity and semantic similarity. An expanded search query may be determined according to the phonetic similarity, the semantic similarity, or both the phonetic similarity and the semantic similarity. A search of the speech recognition results data may be performed according to the expanded search query. Some aspects of this disclosure involve playing back audio data that corresponds with such search results.
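
    Query expansion of this kind can be sketched as follows. The sketch assumes that phonetic and semantic neighbours are available as plain lookup tables; the tables, their contents and the function names are invented for illustration and say nothing about how the application actually measures similarity.

```python
# Hypothetical similarity tables; a real system would derive these from
# a pronunciation model and a semantic model respectively.
PHONETIC_NEIGHBOURS = {"write": {"right", "rite"}}
SEMANTIC_NEIGHBOURS = {"write": {"compose", "draft"}}


def expand_query(words, use_phonetic=True, use_semantic=True):
    """Expand the initial query words with similar words."""
    expanded = set(words)
    for word in words:
        if use_phonetic:
            expanded |= PHONETIC_NEIGHBOURS.get(word, set())
        if use_semantic:
            expanded |= SEMANTIC_NEIGHBOURS.get(word, set())
    return expanded


def search(transcript_words, expanded_query):
    """Return the indices of transcript words that match the expanded query."""
    return [i for i, w in enumerate(transcript_words) if w in expanded_query]


query = expand_query(["write"])
hits = search("please draft the right report".split(), query)
```

    Searching with the expanded set lets a query for "write" also find places where the recognizer produced a similar-sounding word or where the speaker used a synonym.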

    CONFERENCE SEARCHING AND PLAYBACK OF SEARCH RESULTS
    5.
    Invention application
    Status: under examination (published)

    Publication No.: WO2016126769A1

    Publication date: 2016-08-11

    Application No.: PCT/US2016/016283

    Filing date: 2016-02-03

    Abstract: Various disclosed implementations involve processing and/or playback of a recording of a conference involving a plurality of conference participants. Some implementations disclosed herein involve receiving audio data corresponding to a recording of at least one conference involving a plurality of conference participants. The audio data may include conference participant speech data from multiple endpoints, recorded separately and/or conference participant speech data from a single endpoint corresponding to multiple conference participants and including spatial information for each conference participant of the multiple conference participants. A search of the audio data may be based on one or more search parameters. The search may be a concurrent search for multiple features of the audio data. Instances of conference participant speech may be rendered to at least two different virtual conference participant positions of a virtual acoustic space.

    HARMONICITY ESTIMATION, AUDIO CLASSIFICATION, PITCH DETERMINATION AND NOISE ESTIMATION
    6.
    Invention application
    Status: under examination (published)

    Publication No.: WO2013142652A2

    Publication date: 2013-09-26

    Application No.: PCT/US2013/033232

    Filing date: 2013-03-21

    CPC classification number: G10L25/78 G10L25/18 G10L25/81 G10L25/84

    Abstract: Embodiments are described for harmonicity estimation, audio classification, pitch determination and noise estimation. Measuring harmonicity of an audio signal includes calculating a log amplitude spectrum of the audio signal. A first spectrum is derived by calculating each component of the first spectrum as a sum of components of the log amplitude spectrum at frequencies that, on a linear frequency scale, are odd multiples of the component's frequency in the first spectrum. A second spectrum is derived by calculating each component of the second spectrum as a sum of components of the log amplitude spectrum at frequencies that, on a linear frequency scale, are even multiples of the component's frequency in the second spectrum. A difference spectrum is derived by subtracting the first spectrum from the second spectrum. A measure of harmonicity is generated as a monotonically increasing function of the maximum component of the difference spectrum within a predetermined frequency range.
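
    The measure described in this abstract can be sketched in a few lines. The sketch works on a list of amplitude values indexed by frequency bin (bin 0 is DC), sums the log amplitudes at odd and at even multiples of each candidate bin, and takes the maximum of the even-minus-odd difference over a bin range. Using the identity as the monotonically increasing function, the small amplitude floor, and the names `harmonicity`, `k_min` and `k_max` are assumptions made for illustration.

```python
import math


def harmonicity(amplitudes, k_min, k_max):
    """Harmonicity measure from an amplitude spectrum.

    amplitudes: amplitude per linearly spaced frequency bin.
    k_min, k_max: inclusive bin range searched (k_min >= 1).
    """
    # Log amplitude spectrum; the floor avoids log(0) (an assumption).
    log_amp = [math.log(max(a, 1e-12)) for a in amplitudes]
    top = len(log_amp) - 1  # highest valid bin index

    best = -math.inf
    for k in range(k_min, k_max + 1):
        m_max = top // k  # largest multiple of k still inside the spectrum
        odd_sum = sum(log_amp[m * k] for m in range(1, m_max + 1, 2))
        even_sum = sum(log_amp[m * k] for m in range(2, m_max + 1, 2))
        # Difference spectrum: first (odd) spectrum subtracted from second (even).
        best = max(best, even_sum - odd_sum)
    # Identity is used as the monotonically increasing function.
    return best


# 17 bins with peaks of amplitude e at bins 4, 8, 12 and 16.
harmonic = [math.e if k > 0 and k % 4 == 0 else 1.0 for k in range(17)]
flat = [1.0] * 17
h_harmonic = harmonicity(harmonic, 1, 4)
h_flat = harmonicity(flat, 1, 4)
```

    For the harmonic test spectrum the difference is largest at bin 2, where every even multiple lands on a peak and every odd multiple lands between peaks, while the flat spectrum yields zero at every bin.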

    METHODS AND DEVICES FOR IMPROVEMENTS RELATING TO VOICE QUALITY ESTIMATION
    7.
    Invention application
    Status: under examination (published)

    Publication No.: WO2016103222A2

    Publication date: 2016-06-30

    Application No.: PCT/IB2015/059962

    Filing date: 2015-12-23

    CPC classification number: H04M3/2236 G10L25/60 H04L43/04 H04L43/0829

    Abstract: This disclosure falls within the field of voice communication systems; more specifically, it relates to the field of voice quality estimation in a packet-based voice communication system. In particular, the disclosure provides a method and device for reducing a prediction error of the voice quality estimation by considering the content of lost packets. Furthermore, this disclosure provides a method and device which use a voice quality estimating algorithm to calculate the voice quality estimate based on an input which is switchable between a first and a second input mode.

    PACKET LOSS CONCEALMENT APPARATUS AND METHOD, AND AUDIO PROCESSING SYSTEM
    8.
    Invention application
    Status: under examination (published)

    Publication No.: WO2015003027A1

    Publication date: 2015-01-08

    Application No.: PCT/US2014/045181

    Filing date: 2014-07-02

    CPC classification number: G10L19/005 G10L19/008 G10L19/0212 G10L19/167

    Abstract: The present application relates to a packet loss concealment apparatus and method, and an audio processing system. According to an embodiment, the packet loss concealment apparatus is provided for concealing packet losses in a stream of audio packets, each audio packet comprising at least one audio frame in a transmission format comprising at least one monaural component and at least one spatial component. The packet loss concealment apparatus may comprise a first concealment unit for creating the at least one monaural component for a lost frame in a lost packet and a second concealment unit for creating the at least one spatial component for the lost frame. According to the embodiment, spatial artifacts such as incorrect angle and diffuseness may be avoided as far as possible in PLC for multi-channel spatial or sound field encoded audio signals.

    AUDIO PROCESSING APPARATUS AND AUDIO PROCESSING METHOD
    9.
    Invention application
    Status: under examination (published)

    Publication No.: WO2014099319A1

    Publication date: 2014-06-26

    Application No.: PCT/US2013/072282

    Filing date: 2013-11-27

    CPC classification number: G10L15/20 G10L21/02 H04M3/568

    Abstract: An audio processing apparatus and an audio processing method are described. In one embodiment, the audio processing apparatus includes an audio masker separator for separating from a first audio signal an audio material comprising a sound other than stationary noise and utterance meaningful in semantics, as an audio masker candidate. The apparatus also includes a first context analyzer for obtaining statistics regarding contextual information of detected audio masker candidates, and a masker library builder for building a masker library or updating an existing masker library by adding, based on the statistics, at least one audio masker candidate as an audio masker into the masker library, wherein audio maskers in the masker library are to be inserted into a target position in a second audio signal to conceal defects in the second audio signal.

    ADJUSTING SPATIAL CONGRUENCY IN A VIDEO CONFERENCING SYSTEM
    10.
    Invention application
    Status: under examination (published)

    Publication No.: WO2016081412A1

    Publication date: 2016-05-26

    Application No.: PCT/US2015/060994

    Filing date: 2015-11-17

    CPC classification number: H04N7/147 H04L12/1827 H04N7/15 H04S2400/15

    Abstract: Example embodiments disclosed herein relate to spatial congruency adjustment. A method for adjusting spatial congruency in a video conference is disclosed. The method includes detecting spatial congruency between a visual scene captured by a video endpoint device and an auditory scene captured by an audio endpoint device that is positioned in relation to the video endpoint device, the spatial congruency being a degree of alignment between the auditory scene and the visual scene, comparing the detected spatial congruency with a predefined threshold and in response to the detected spatial congruency being below the threshold, adjusting the spatial congruency. Corresponding system and computer program products are also disclosed.
