Exploiting visual information for enhancing audio signals via source separation and beamforming

    Publication No.: US11295137B2

    Publication date: 2022-04-05

    Application No.: US17086561

    Filing date: 2020-11-02

    Abstract: A system for exploiting visual information for enhancing audio signals via source separation and beamforming is disclosed. The system may obtain visual content associated with an environment of a user, and may extract, from the visual content, metadata associated with the environment. The system may determine a location of the user based on the extracted metadata. Additionally, the system may load, based on the location, an audio profile corresponding to the location of the user. The system may also load a user profile of the user that includes audio data associated with the user. Furthermore, the system may cancel, based on the audio profile and user profile, noise from the environment of the user. Moreover, the system may adjust, based on the audio profile and user profile, an audio signal generated by the user so as to enhance the audio signal during a communications session of the user.
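The flow in this abstract (visual metadata, then location, then audio profile, then noise cancellation and adjustment) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the scene tags, the `AUDIO_PROFILES` table, the three-band magnitudes, and the use of simple spectral subtraction in place of the source separation and beamforming named in the title.

```python
# Hypothetical per-location noise magnitudes for three frequency bands.
AUDIO_PROFILES = {
    "cafe": [0.30, 0.20, 0.10],
    "office": [0.05, 0.05, 0.02],
}


def locate(metadata):
    """Map scene tags extracted from visual content to a location label.

    Assumption: metadata is a set of tags; the rule below is illustrative only.
    """
    return "cafe" if "espresso machine" in metadata else "office"


def enhance(band_magnitudes, metadata, user_gain):
    """Subtract the location's noise profile, then apply per-band user gains."""
    noise = AUDIO_PROFILES[locate(metadata)]
    # Spectral subtraction, clamped at zero so magnitudes stay non-negative.
    cleaned = [max(m - n, 0.0) for m, n in zip(band_magnitudes, noise)]
    # The user profile is modeled here as a per-band gain (an assumption).
    return [g * c for g, c in zip(user_gain, cleaned)]


result = enhance([1.0, 0.5, 0.1], {"espresso machine", "table"}, [1.0, 2.0, 1.0])
```

With the cafe profile selected, the third band is treated as noise only and is suppressed entirely, while the second band is boosted by the user gain.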

    Real-time emotion tracking system
    Patent type: invention grant; legal status: in force

    Publication No.: US09355650B2

    Publication date: 2016-05-31

    Application No.: US14703107

    Filing date: 2015-05-04

    CPC classification number: G10L25/63 G10L17/04 G10L17/26 G10L25/48

    Abstract: Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A plurality of segments of the audio signal is received, with the plurality of segments being sequential. Each segment of the plurality of segments is analyzed, and, for each segment, an emotional state and a confidence score of the emotional state are determined. The emotional state and the confidence score of each segment are sequentially analyzed, and a current emotional state of the audio signal is tracked throughout each of the plurality of segments. For each segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and the confidence score of the segment.
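The sequential tracking described in this abstract can be sketched as a small state tracker. The abstract does not give the decision rule, so the one used here is an assumption: the tracked state switches only when a segment reports a different state with confidence at or above a fixed `threshold`. The labels and scores are made up.

```python
def track_emotion(segments, initial="neutral", threshold=0.7):
    """Return the tracked emotional state after each sequential segment.

    segments: iterable of (emotional_state, confidence_score) pairs.
    """
    current = initial
    history = []
    for state, confidence in segments:
        # Assumed rule: change state only on a confident, differing segment.
        if state != current and confidence >= threshold:
            current = state
        history.append(current)
    return history


segments = [
    ("neutral", 0.9),
    ("angry", 0.4),   # low confidence: ignored
    ("angry", 0.8),   # confident: state changes
    ("neutral", 0.5),
    ("happy", 0.95),
]
history = track_emotion(segments)
```

Requiring a confidence floor keeps a single uncertain segment from flipping the tracked state, which is one plausible reason to carry a confidence score alongside each per-segment label.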

    Real-time emotion tracking system
    Patent type: invention grant; legal status: in force

    Publication No.: US09047871B2

    Publication date: 2015-06-02

    Application No.: US13712288

    Filing date: 2012-12-12

    CPC classification number: G10L25/63 G10L17/04 G10L17/26 G10L25/48

    Abstract: Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A plurality of segments of the audio signal is received, with the plurality of segments being sequential. Each segment of the plurality of segments is analyzed, and, for each segment, an emotional state and a confidence score of the emotional state are determined. The emotional state and the confidence score of each segment are sequentially analyzed, and a current emotional state of the audio signal is tracked throughout each of the plurality of segments. For each segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and the confidence score of the segment.

    Augmented multi-tier classifier for multi-modal voice activity detection

    Publication No.: US09892745B2

    Publication date: 2018-02-13

    Application No.: US13974453

    Filing date: 2013-08-23

    CPC classification number: G10L25/78 G06K9/00335 G10L15/24 G10L25/84

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for detecting voice activity in a media signal in an augmented, multi-tier classifier architecture. A system configured to practice the method can receive, from a first classifier, a first voice activity indicator detected in a first modality for a human subject. Then, the system can receive, from a second classifier, a second voice activity indicator detected in a second modality for the human subject, wherein the first voice activity indicator and the second voice activity indicator are based on the human subject at a same time, and wherein the first modality and the second modality are different. The system can concatenate, via a third classifier, the first voice activity indicator and the second voice activity indicator with original features of the human subject, to yield a classifier output, and determine voice activity based on the classifier output.
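The augmented third tier can be sketched as a classifier whose input is the two first-tier indicators concatenated with the original features. The abstract does not name the modalities or the classifier type, so this sketch assumes audio and visual indicators and a logistic model; the weights, bias, and feature values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def third_tier_vad(audio_indicator, visual_indicator, original_features,
                   weights, bias, threshold=0.5):
    """Combine two first-tier indicators with the original features.

    Returns (score, is_voice_active). Weights and bias are assumed to come
    from training, which is outside the scope of this sketch.
    """
    # Augmentation step: indicators are concatenated with the raw features.
    augmented = [audio_indicator, visual_indicator] + list(original_features)
    if len(weights) != len(augmented):
        raise ValueError("one weight is required per augmented input")
    score = sigmoid(sum(w * x for w, x in zip(weights, augmented)) + bias)
    return score, score >= threshold


weights, bias = [2.0, 2.0, 0.5, 0.5], -2.0
speaking = third_tier_vad(0.9, 0.8, [0.3, 0.1], weights, bias)
silent = third_tier_vad(0.1, 0.05, [0.3, 0.1], weights, bias)
```

Feeding the original features to the final classifier, rather than only the two indicators, lets it recover evidence that either first-tier classifier may have discarded.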

    System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification

    Publication No.: US09728183B2

    Publication date: 2017-08-08

    Application No.: US14936772

    Filing date: 2015-11-10

    CPC classification number: G10L15/02 G10L15/08 G10L15/16

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations. Based on the scores, the plurality of segmental classification units selects a class label and returns a result.
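A minimal sketch of the PIU-SCU ensemble idea follows. It is a simplification: each pooling interface unit applies a single pooling operation (mean or max) over the frames of a segment, the feature-selection strategies mentioned in the abstract are omitted, each segmental classification unit is a hypothetical linear scorer, and the ensemble simply sums scores. The class labels, weights, and frame values are made up.

```python
from statistics import mean

# Pooling operations available to the pooling interface units (PIUs).
POOLERS = {"mean": mean, "max": max, "min": min}


def pool(frames, op):
    """Pool per-frame feature vectors into one segment-level vector."""
    return [op(column) for column in zip(*frames)]


def linear_scores(vector, class_weights):
    """A stand-in segmental classification unit (SCU): one dot product per class."""
    return {label: sum(w * x for w, x in zip(ws, vector))
            for label, ws in class_weights.items()}


def classify_segment(frames, ensemble):
    """Sum scores over all PIU-SCU combinations and pick the best class label."""
    totals = {}
    for pool_name, class_weights in ensemble:
        vector = pool(frames, POOLERS[pool_name])
        for label, score in linear_scores(vector, class_weights).items():
            totals[label] = totals.get(label, 0.0) + score
    return max(totals, key=totals.get), totals


frames = [[0.2, 0.9], [0.4, 0.7], [0.6, 0.8]]  # three frames, two features each
scu = {"aa": [1.0, 0.0], "iy": [0.0, 1.0]}
label, totals = classify_segment(frames, [("mean", scu), ("max", scu)])
```

Varying the pooling operation per combination is what diversifies the ensemble here: the mean-pooled and max-pooled vectors differ even though both scorers share the same weights.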

    PRE-DISTORTION SYSTEM FOR CANCELLATION OF NONLINEAR DISTORTION IN MOBILE DEVICES

    Publication No.: US20180262623A1

    Publication date: 2018-09-13

    Application No.: US15978592

    Filing date: 2018-05-14

    Abstract: A pre-distortion system for improved mobile device communications via cancellation of nonlinear distortion is disclosed. The pre-distortion system may transmit an acoustic signal from a network to a device, wherein the acoustic signal includes a linear signal and a nonlinear cancellation signal that cancels at least a portion of nonlinear distortions created once a loudspeaker in the device emits the linear signal. Thus, when a loudspeaker of a mobile device is operating and nonlinear distortions are generated by the loudspeaker or adjacent components of the mobile device in close proximity to the loudspeaker, the pre-distortion system may create one or more nonlinear cancellation signals in the network. The nonlinear cancellation signal may be combined with the linear signal sent to the loudspeaker to cancel the nonlinear distortion signal created by the loudspeaker emitting acoustic sounds from the linear signal. Thus, the nonlinear cancellation signal becomes a pre-distortion signal.
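The cancellation idea can be illustrated numerically. The sketch assumes a memoryless second-order loudspeaker model, y = x + a·x², which is an illustrative choice and not taken from the patent. Adding a cancellation term of −a·x² to the linear signal before playback leaves a residual of −2a²x³ + a³x⁴, which is much smaller than the original distortion a·x² when a is small.

```python
import math

A = 0.1  # assumed distortion coefficient of the hypothetical loudspeaker model


def loudspeaker(x, a=A):
    """Toy loudspeaker: emits the input plus a second-order distortion term."""
    return x + a * x * x


def predistort(x, a=A):
    """Combine the linear signal with a nonlinear cancellation signal."""
    linear = x
    cancellation = -a * x * x
    return linear + cancellation


# One period of a full-scale sine as the test signal.
signal = [math.sin(2 * math.pi * k / 32) for k in range(32)]

# Worst-case deviation from the intended signal, without and with pre-distortion.
err_plain = max(abs(loudspeaker(x) - x) for x in signal)
err_pre = max(abs(loudspeaker(predistort(x)) - x) for x in signal)
```

Under this model the cancellation is only first-order accurate; a closer match would need higher-order terms or a measured model of the specific device.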

    PRE-DISTORTION SYSTEM FOR CANCELLATION OF NONLINEAR DISTORTION IN MOBILE DEVICES
    Patent type: invention application; legal status: in force

    Publication No.: US20160140948A1

    Publication date: 2016-05-19

    Application No.: US14543261

    Filing date: 2014-11-17

    CPC classification number: H04M9/082 G10L2021/02082

    Abstract: A pre-distortion system for improved mobile device communications via cancellation of nonlinear distortion is disclosed. The pre-distortion system may transmit an acoustic signal from a network to a device, wherein the acoustic signal includes a linear signal and a nonlinear cancellation signal that cancels at least a portion of nonlinear distortions created once a loudspeaker in the device emits the linear signal. Thus, when a loudspeaker of a mobile device is operating and nonlinear distortions are generated by the loudspeaker or adjacent components of the mobile device in close proximity to the loudspeaker, the pre-distortion system may create one or more nonlinear cancellation signals in the network. The nonlinear cancellation signal may be combined with the linear signal sent to the loudspeaker to cancel the nonlinear distortion signal created by the loudspeaker emitting acoustic sounds from the linear signal. Thus, the nonlinear cancellation signal becomes a pre-distortion signal.
