Noise suppressor
    Granted invention patent

    Publication number: US12080316B2

    Publication date: 2024-09-03

    Application number: US18168840

    Filing date: 2023-02-14

    Applicant: Spotify AB

    Abstract: Apparatus, methods, and computer-readable media are provided for processing wind noise. Processing begins by receiving an audio input. A wind noise level representative of the wind noise at the microphone array is measured using the audio input, and a determination is made, based on the wind noise level, whether to perform either (i) a wind noise suppression process on the audio input on-device, or (ii) the wind noise suppression process on the audio input on-device and an audio reconstruction process in-cloud.
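
    The two-way decision in this abstract can be illustrated with a minimal sketch. The abstract does not say how the wind noise level is computed or which levels trigger which branch, so the normalized level, the `threshold` value, and the rule that higher wind noise adds in-cloud reconstruction are all assumptions made for illustration, as are the names `Route` and `choose_route`.

```python
from enum import Enum


class Route(Enum):
    """The two processing options named in the abstract."""
    ON_DEVICE_ONLY = "on-device wind noise suppression"
    ON_DEVICE_PLUS_CLOUD = "on-device suppression plus in-cloud reconstruction"


def choose_route(wind_noise_level: float, threshold: float = 0.5) -> Route:
    """Pick a processing route from a measured wind noise level.

    Assumption (not stated in the abstract): the level is normalized to
    [0, 1], and levels above `threshold` also use in-cloud reconstruction.
    """
    if wind_noise_level > threshold:
        return Route.ON_DEVICE_PLUS_CLOUD
    return Route.ON_DEVICE_ONLY


if __name__ == "__main__":
    for level in (0.2, 0.9):
        print(level, "->", choose_route(level).value)
```

    Both branches run suppression on-device; only the additional reconstruction step differs, which is why the decision reduces to a single comparison in this sketch.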

    Training and testing utterance-based frameworks

    Publication number: US11887582B2

    Publication date: 2024-01-30

    Application number: US17173659

    Filing date: 2021-02-11

    Applicant: Spotify AB

    Inventor: Daniel Bromand

    Abstract: Systems, methods, and devices for training and testing utterance-based frameworks are disclosed. The training and testing can be conducted using synthetic utterance samples in addition to natural utterance samples. The synthetic utterance samples can be generated based on a vector space representation of natural utterances. In one method, a synthetic weight vector associated with a vector space is generated. An average representation of the vector space is added to the synthetic weight vector to form a synthetic feature vector. The synthetic feature vector is used to generate a synthetic voice sample. The synthetic voice sample is provided to the utterance-based framework as at least one of a testing or training sample.
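
    The vector arithmetic described in this abstract (average representation plus synthetic weight vector) can be sketched as follows. This is an illustration only: the abstract does not specify how the vector space is built, how weight vectors are sampled, or how a voice sample is rendered from the feature vector. The Gaussian sampling and the function names below are assumptions, and the rendering step is omitted.

```python
import random
from typing import List, Optional, Sequence


def mean_vector(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average representation: the element-wise mean of the sample vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def random_weight_vector(dim: int, scale: float = 1.0,
                         rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical sampler: draws each synthetic weight from N(0, scale)."""
    rng = rng or random.Random()
    return [rng.gauss(0.0, scale) for _ in range(dim)]


def synthetic_feature_vector(mean: Sequence[float],
                             weights: Sequence[float]) -> List[float]:
    """Add the average representation to the synthetic weight vector."""
    if len(mean) != len(weights):
        raise ValueError("mean and weights must have the same dimension")
    return [m + w for m, w in zip(mean, weights)]


if __name__ == "__main__":
    natural = [[1.0, 2.0], [3.0, 4.0]]  # toy stand-ins for natural utterances
    mean = mean_vector(natural)
    weights = random_weight_vector(len(mean), rng=random.Random(0))
    print(synthetic_feature_vector(mean, weights))
```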

    Server-based false wake word detection
    Published invention application

    Publication number: US20230237991A1

    Publication date: 2023-07-27

    Application number: US17584512

    Filing date: 2022-01-26

    Applicant: Spotify AB

    CPC classification number: G10L15/08 G10L15/22 G10L15/30 G10L2015/088

    Abstract: A wake word detector, at a server of a content delivery network (CDN) that provides audio (or other) content to a device, such as a voice-enabled device, detects false wake words in the audio content. The CDN wake word detector analyzes the audio stream to determine if the audio stream contains any audio that sounds like the wake word. If so, the CDN wake word detector can generate metadata that describes the time period, within the audio content, in which the false wake word was encountered. The metadata can include time offsets, from the start of the audio content, which can instruct a voice-enabled device to deactivate during the time period. This metadata is stored and then sent to the media-playback device when it requests the media content. The media-playback device can then instruct or inform the voice-enabled device of the presence of the false wake word. In this way, the wake word detector, at the voice-enabled device, is not activated to receive the false wake word.
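
    The time-offset metadata described in this abstract can be pictured with a small sketch. The abstract does not define a metadata format, so the `FalseWakeWindow` record, its field names, and the use of seconds are hypothetical. The sketch only shows how a device might check whether the current playback position falls inside a flagged period.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FalseWakeWindow:
    """Hypothetical metadata record: offsets in seconds from content start."""
    start_s: float
    end_s: float


def should_suppress_wake_word(position_s: float,
                              windows: Iterable[FalseWakeWindow]) -> bool:
    """Return True if playback is inside any flagged false-wake-word period."""
    return any(w.start_s <= position_s <= w.end_s for w in windows)


if __name__ == "__main__":
    # Example metadata as it might arrive alongside the requested content.
    metadata = [FalseWakeWindow(12.0, 13.5), FalseWakeWindow(95.2, 96.0)]
    for position in (5.0, 12.7, 96.0, 120.0):
        print(position, should_suppress_wake_word(position, metadata))
```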

    Cloud-based preset for media content playback

    Publication number: US11601486B2

    Publication date: 2023-03-07

    Application number: US17401850

    Filing date: 2021-08-13

    Applicant: Spotify AB

    Abstract: A system is provided for streaming media content in a vehicle. The system includes a personal media streaming appliance system configured to connect to a media delivery system and receive media content from the media delivery system at least via a cellular network. The personal media streaming appliance system includes one or more preset buttons for playing media content associated with the preset buttons. Data about the preset buttons and the media content associated with the preset buttons can be stored in the media delivery system.

    Apparatus for media entity pronunciation using deep learning

    Publication number: US11501764B2

    Publication date: 2022-11-15

    Application number: US16408887

    Filing date: 2019-05-10

    Applicant: Spotify AB

    Inventor: Daniel Bromand

    Abstract: Methods, systems, and related products for voice-enabled computer systems are described. A machine-learning model is trained to produce pronunciation output based on text input. The trained machine-learning model is used to produce pronunciation data for text input even where the text input includes numbers, punctuation, emoji, or other non-letter characters. The machine-learning model is further trained based on real-world data from users to improve pronunciation output.
