Multi-Microphone Speech Recognition Systems and Related Techniques

    Publication number: US20160358606A1

    Publication date: 2016-12-08

    Application number: US14732711

    Filing date: 2015-06-06

    Applicant: Apple Inc.

    CPC classification number: G10L15/32 G10L15/20

    Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
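
The selector stage described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the selector ranks candidates, so the sketch assumes each candidate carries a log-likelihood score that is comparable across beamformed streams. The names `Candidate`, `look_direction`, and `select_transcription` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    look_direction: str    # hypothetical label for the beam that produced it
    text: str              # transcription hypothesis
    log_likelihood: float  # recognizer score, assumed comparable across beams


def select_transcription(candidates_per_stream):
    """Return the candidate with the highest score across all streams."""
    all_candidates = [c for stream in candidates_per_stream for c in stream]
    if not all_candidates:
        raise ValueError("no transcription candidates were supplied")
    return max(all_candidates, key=lambda c: c.log_likelihood)


# Two beamformed streams, each with its own highest-likelihood candidates.
streams = [
    [Candidate("left", "turn on the lights", -12.5),
     Candidate("left", "turn on the flights", -15.0)],
    [Candidate("right", "turn of the light", -20.1)],
]
best = select_transcription(streams)
```

Here `best` is the "left" candidate with text "turn on the lights". A real selector could instead combine evidence across streams, for example by voting on identical texts.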

    Echo cancellation and control for microphone beam patterns
    32.
    Type: Granted invention patent
    Legal status: In force

    Publication number: US09516409B1

    Publication date: 2016-12-06

    Application number: US14714023

    Filing date: 2015-05-15

    Applicant: Apple Inc.

    CPC classification number: H04B3/23 H04R1/403 H04R1/406 H04R3/02 H04R2430/23

    Abstract: Systems and methods for controlling echo in audio communications between a near-end system and a far-end system are described. The system and method may intelligently assign a plurality of microphone beams to a limited number of echo cancellers for processing. The microphone beams may be classified based on generated statistics to determine beams of interest (e.g., beams with a high ratio of local-voice to echo). Based on this ranking/classification of microphone beams, beams of greater interest may be assigned to echo cancellers while less important beams may temporarily remain unprocessed until these beams become of higher importance/interest. Accordingly, a limited number of echo cancellers may be used to intelligently process a larger number of microphone beams based on interest in the beams and properties of echo cancellation performed for each beam.
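
The beam-to-canceller assignment can be sketched as a simple ranking. The sketch assumes the only statistic is a per-beam ratio of local voice to echo. The abstract also mentions properties of the echo cancellation performed for each beam, which this sketch leaves out. The function name `assign_beams` and the beam labels are hypothetical.

```python
def assign_beams(voice_to_echo, num_cancellers):
    """Split beams into those given an echo canceller and those deferred.

    voice_to_echo maps a beam label to its local-voice-to-echo ratio
    (an assumed statistic; higher means more interesting).
    """
    if num_cancellers < 0:
        raise ValueError("num_cancellers must be non-negative")
    # Rank beams from most to least interesting.
    ranked = sorted(voice_to_echo, key=voice_to_echo.get, reverse=True)
    return ranked[:num_cancellers], ranked[num_cancellers:]


ratios = {"beam0": 0.2, "beam1": 3.5, "beam2": 1.1, "beam3": 0.05}
processed, deferred = assign_beams(ratios, num_cancellers=2)
```

With these made-up ratios, `processed` is `["beam1", "beam2"]` and `deferred` is `["beam0", "beam3"]`. Re-running the assignment as the statistics change lets a deferred beam gain a canceller once it becomes more interesting.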

    System and method of double talk detection with acoustic echo and noise control
    33.
    Type: Granted invention patent
    Legal status: In force

    Publication number: US09516159B2

    Publication date: 2016-12-06

    Application number: US14880824

    Filing date: 2015-10-12

    Applicant: Apple Inc.

    Abstract: System of improving sound quality includes loudspeaker, microphone, accelerometer, acoustic-echo-cancellers (AEC), and double-talk detector (DTD). Loudspeaker outputs loudspeaker signal including downlink audio signal from far-end speaker. Microphone generates microphone uplink signal and receives at least one of: near-end speaker, ambient noise, and loudspeaker signals. Accelerometer generates accelerometer-uplink signal and receives at least one of: near-end speaker, ambient noise, and loudspeaker signals. First AEC receives downlink audio, microphone-uplink and double talk control signals, and generates AEC-microphone linear echo estimate and corrected AEC-microphone uplink signal. Second AEC receives downlink audio, accelerometer uplink and double talk control signals, and generates AEC-accelerometer linear echo estimate and corrected AEC-accelerometer uplink signal. DTD receives downlink audio signal, uplink signals, corrected uplink signals, linear echo estimates, and generates double-talk control signal. Uplink audio signal including at least one of corrected microphone-uplink signal and corrected accelerometer-uplink signal is generated. Other embodiments are described.
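
The following sketch shows how a double-talk control signal can gate an echo canceller. It is not the patent's detector: it swaps in the classic Geigel detector and a single normalized LMS (NLMS) canceller, and it omits the accelerometer path, which would get a second canceller of the same shape. All function names, the threshold, and the step size are assumptions chosen for illustration.

```python
def geigel_double_talk(mic_sample, recent_far_end, threshold=0.5):
    """Geigel test: flag double talk when the microphone sample is large
    relative to the recent peak of the far-end (downlink) signal."""
    peak = max((abs(x) for x in recent_far_end), default=0.0)
    return abs(mic_sample) > threshold * peak


def nlms_step(weights, far_history, mic_sample, adapt, mu=0.5, eps=1e-8):
    """One NLMS update. Returns (weights, echo_estimate, corrected_sample).

    far_history holds the most recent far-end samples, newest first,
    and has the same length as weights.
    """
    echo_estimate = sum(w * x for w, x in zip(weights, far_history))
    corrected = mic_sample - echo_estimate  # echo-reduced uplink sample
    if adapt:
        norm = sum(x * x for x in far_history) + eps
        weights = [w + mu * corrected * x / norm
                   for w, x in zip(weights, far_history)]
    return weights, echo_estimate, corrected


far = [1.0, 0.0]   # recent downlink samples
mic = 0.5          # microphone sample dominated by echo
double_talk = geigel_double_talk(mic, far)
weights, echo, corrected = nlms_step([0.0, 0.0], far, mic,
                                     adapt=not double_talk)
```

Freezing adaptation during double talk keeps near-end speech from corrupting the echo-path estimate, which is the role the control signal plays in this sketch.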

    Spatial audio controller
    37.
    Type: Invention patent application

    Publication number: US20220394407A1

    Publication date: 2022-12-08

    Application number: US17339864

    Filing date: 2021-06-04

    Applicant: Apple Inc.

    Abstract: A method performed by a local device that is communicatively coupled with several remote devices, the method includes: receiving, from each remote device with which the local device is engaged in a communication session, an input audio stream; receiving, for each remote device, a set of parameters; determining, for each input audio stream, whether the input audio stream is to be 1) rendered individually or 2) rendered as a mix of input audio streams based on the set of parameters; for each input audio stream that is determined to be rendered individually, spatially rendering the input audio stream as an individual virtual sound source that contains only that input audio stream; and for input audio streams that are determined to be rendered as the mix of input audio streams, spatially rendering the mix of input audio streams as a single virtual sound source that contains the mix of input audio streams.
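
The per-stream decision between individual rendering and the shared mix can be sketched as a partition of the remote devices. The abstract does not list the parameters, so the sketch assumes each device reports a hypothetical `speaking` flag and an `energy` level, and that at most `max_individual` streams get their own virtual sound source. None of these names come from the patent.

```python
def plan_rendering(stream_params, max_individual=2):
    """Return (individual, mixed) lists of device ids.

    stream_params maps a device id to a dict with the assumed keys
    "speaking" (bool) and "energy" (float).
    """
    speaking = [d for d, p in stream_params.items() if p["speaking"]]
    # Prefer the most energetic active talkers for individual sources.
    speaking.sort(key=lambda d: stream_params[d]["energy"], reverse=True)
    individual = speaking[:max_individual]
    # Every other stream is folded into the single mixed source.
    mixed = [d for d in stream_params if d not in individual]
    return individual, mixed


params = {
    "alice": {"speaking": True, "energy": 0.8},
    "bob": {"speaking": False, "energy": 0.1},
    "carol": {"speaking": True, "energy": 0.5},
    "dave": {"speaking": True, "energy": 0.3},
}
individual, mixed = plan_rendering(params)
```

With these made-up values, `individual` is `["alice", "carol"]` and `mixed` is `["bob", "dave"]`. Each id in `individual` would then be spatially rendered as its own virtual sound source, and the streams in `mixed` would be summed and rendered as one.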

    Speech model-based neural network-assisted signal enhancement

    Publication number: US10381020B2

    Publication date: 2019-08-13

    Application number: US15625966

    Filing date: 2017-06-16

    Applicant: Apple Inc.

    Abstract: Several embodiments of a digital speech signal enhancer are described that use an artificial neural network that produces clean speech coding parameters based on noisy speech coding parameters as its input features. A vocoder parameter generator produces the noisy speech coding parameters from a noisy speech signal. A vocoder model generator processes the clean speech coding parameters into estimated clean speech spectral magnitudes. In one embodiment, a magnitude modifier modifies an original frequency spectrum of the noisy speech signal using the estimated clean speech spectral magnitudes, to produce an enhanced frequency spectrum, and a synthesis block converts the enhanced frequency spectrum into time domain, as an output speech sequence. Other embodiments are also described.
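
The magnitude-modifier step in this embodiment can be illustrated as follows. The abstract does not specify how the spectrum is modified. The sketch assumes the simplest option: each frequency bin keeps the phase of the noisy spectrum and takes the estimated clean magnitude. The function name `modify_magnitudes` is hypothetical, and the neural network and vocoder stages are not modeled.

```python
import cmath


def modify_magnitudes(noisy_spectrum, clean_magnitudes):
    """Replace each bin's magnitude while keeping the noisy phase.

    noisy_spectrum: complex frequency bins of the noisy speech frame.
    clean_magnitudes: estimated clean magnitudes, one per bin.
    """
    if len(noisy_spectrum) != len(clean_magnitudes):
        raise ValueError("spectrum and magnitudes must have equal length")
    return [cmath.rect(mag, cmath.phase(bin_value))
            for bin_value, mag in zip(noisy_spectrum, clean_magnitudes)]


noisy = [3 + 4j, -2 + 0j]   # magnitudes 5 and 2
estimated = [10.0, 1.0]     # stand-in for the vocoder model output
enhanced = modify_magnitudes(noisy, estimated)
```

The first bin becomes approximately `6 + 8j` (same direction, magnitude 10) and the second approximately `-1 + 0j`. A synthesis block would then convert the enhanced spectrum back to the time domain.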

    Audio adaptation to room
    40.
    Type: Granted invention patent

    Publication number: US10244314B2

    Publication date: 2019-03-26

    Application number: US15636967

    Filing date: 2017-06-29

    Applicant: Apple Inc.

    Abstract: An audio system includes one or more loudspeaker cabinets, each having loudspeakers. The system outputs an omnidirectional sound pattern to determine the acoustic environment. Sensing logic determines an acoustic environment of the loudspeaker cabinets. The sensing logic may include an echo canceller. A playback mode processor adjusts an audio program according to a playback mode determined from the acoustic environment of the audio system. The system may produce a directional pattern superimposed on an omnidirectional pattern, if the acoustic environment is in free space. The system may aim ambient content toward a wall and direct content away from the wall, if the acoustic environment is not in free space. The sensing logic automatically determines the acoustic environment upon initial power up and when position changes of loudspeaker cabinets are detected. Accelerometers may detect position changes of the loudspeaker cabinets.
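
The playback-mode logic in this abstract reduces to two small decisions, sketched below. The sketch assumes the sensing logic has already produced a boolean for whether the cabinet is in free space. The function names and the dictionary keys are hypothetical labels, not terms from the patent.

```python
def choose_playback_mode(in_free_space):
    """Map the sensed acoustic environment to a playback configuration."""
    if in_free_space:
        return {"mode": "free_space",
                "pattern": "directional superimposed on omnidirectional"}
    return {"mode": "near_wall",
            "ambient_content": "toward wall",
            "direct_content": "away from wall"}


def should_resense(initial_power_up, position_changed):
    """Re-run environment sensing on first power-up or after the
    accelerometers report that a cabinet has moved."""
    return initial_power_up or position_changed


mode = choose_playback_mode(in_free_space=False)
resense = should_resense(initial_power_up=False, position_changed=True)
```

Here `mode["mode"]` is `"near_wall"` and `resense` is `True`, so a moved cabinet would trigger a fresh measurement before the mode is chosen again.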
