Sound source tracking system and method thereof

    Publication No.: US12046255B2

    Publication date: 2024-07-23

    Application No.: US17572100

    Filing date: 2022-01-10

    Abstract: A sound source tracking method adapted to an ongoing video conference, comprising: obtaining a streaming signal of the video conference from the Internet; performing a video conference procedure to obtain an audio signal from the streaming signal and send the audio signal to a speaker; performing an audio tracking procedure to obtain the audio signal output from the video conference procedure to the communication device and send the audio signal to a sound source tracking camera; playing the audio signal to generate a far-end sound; recording a field sound comprising at least one of the far-end sound and a local-end sound; and performing a comparing procedure to determine a shooting direction of the sound source tracking camera, wherein the shooting direction is adjusted so as not to shoot the speaker when a similarity of the far-end sound and the audio signal is greater than a threshold.
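
The comparing procedure in this abstract can be illustrated with a minimal sketch. The abstract does not say how similarity is measured, so the cosine similarity used here, the function names `normalized_similarity` and `should_avoid_speaker`, and the default threshold of 0.8 are all assumptions made for illustration. The sketch also assumes the recorded field sound and the reference audio signal are already time-aligned and of equal length.

```python
import math


def normalized_similarity(recorded, reference):
    """Cosine similarity of two equal-length sample sequences, in [-1, 1]."""
    if len(recorded) != len(reference):
        raise ValueError("sequences must have the same length")
    dot = sum(a * b for a, b in zip(recorded, reference))
    norm = math.sqrt(sum(a * a for a in recorded)) * math.sqrt(
        sum(b * b for b in reference)
    )
    # Treat silence (zero energy) as "not similar" instead of dividing by zero.
    return dot / norm if norm > 0 else 0.0


def should_avoid_speaker(field_sound, audio_signal, threshold=0.8):
    """True when the field sound mostly matches the far-end audio signal.

    A True result corresponds to the case where the camera should not
    turn toward the loudspeaker. The threshold value is hypothetical.
    """
    return normalized_similarity(field_sound, audio_signal) > threshold
```

A real system would likely also need to compensate for playback delay and room acoustics before comparing; that step is left out of this sketch.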

    METHODS AND SYSTEMS FOR COMPUTER-GENERATED VISUALIZATION OF SPEECH

    Publication No.: US20240087591A1

    Publication date: 2024-03-14

    Application No.: US18451736

    Filing date: 2023-08-17

    Applicant: SomniQ, Inc.

    Abstract: Methods, systems and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech that includes at least one segment comprises: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.
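
The mapping described in this abstract (duration to length, intensity to width, a space between adjacent objects) can be sketched as a simple layout computation. The names `Segment`, `Bar`, and `layout_segments`, and the pixel scale factors, are hypothetical; the abstract does not specify units, scales, or the shape of the objects.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    duration_s: float  # duration of the speech segment, in seconds
    intensity: float   # assumed to be normalized loudness in [0, 1]


@dataclass
class Bar:
    x: float       # start position along the time axis, in pixels
    length: float  # encodes the segment duration
    width: float   # encodes the segment intensity


def layout_segments(segments, px_per_second=100.0, max_width_px=40.0, gap_px=10.0):
    """Lay out one bar per segment, left to right, with a gap between bars."""
    bars = []
    x = 0.0
    for seg in segments:
        length = seg.duration_s * px_per_second
        width = seg.intensity * max_width_px
        bars.append(Bar(x=x, length=length, width=width))
        x += length + gap_px  # the space between adjacent objects
    return bars
```

For example, `layout_segments([Segment(0.5, 1.0), Segment(1.0, 0.5)])` places a short, wide bar followed by a longer, narrower one, separated by the configured gap.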

    PRE-WAKEWORD SPEECH PROCESSING
    Document type: Published patent application

    Publication No.: US20230386458A1

    Publication date: 2023-11-30

    Application No.: US17804544

    Filing date: 2022-05-27

    Applicant: SoundHound, Inc.

    IPC classification: G10L15/22 G10L15/08 G10L25/93

    Abstract: Methods and systems for pre-wakeword speech processing are disclosed. Speech audio, comprising command speech spoken before a wakeword, may be stored in a buffer in oldest-to-newest order. Upon detection of the wakeword, reverse acoustic models and language models, such as reverse automatic speech recognition (R-ASR), can be applied to the buffered audio in newest-to-oldest order, starting from before the wakeword. The speech is converted into a sequence of words. Natural language grammar models, such as natural language understanding (NLU), can be applied to match the sequence of words to a complete command, the complete command being associated with invoking a computer operation.
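
The buffering order described in this abstract can be sketched with a bounded queue. This only illustrates how audio stored oldest to newest can be handed to a recognizer newest to oldest once the wakeword is seen; the reverse recognition (R-ASR) and NLU stages are not implemented. The class `PreWakewordBuffer`, the function `frames_before_wakeword`, and the `is_wakeword` callback are hypothetical names.

```python
from collections import deque


class PreWakewordBuffer:
    """Fixed-size buffer holding the most recent audio frames, oldest first."""

    def __init__(self, max_frames):
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=max_frames)

    def push(self, frame):
        self._frames.append(frame)

    def newest_to_oldest(self):
        """Return buffered frames in the order a reverse recognizer would read them."""
        return list(reversed(self._frames))


def frames_before_wakeword(frames, is_wakeword, max_frames=50):
    """Buffer frames until the wakeword is detected, then return them reversed.

    Returns an empty list if no wakeword occurs in the stream.
    """
    buf = PreWakewordBuffer(max_frames)
    for frame in frames:
        if is_wakeword(frame):
            return buf.newest_to_oldest()
        buf.push(frame)
    return []
```

Strings stand in for audio frames in a quick check: with the stream `["turn", "on", "lights", "WAKE"]` and a detector that matches `"WAKE"`, the function returns `["lights", "on", "turn"]`.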

    SYSTEM AND METHOD FOR HANDLING UNSPLIT SEGMENTS IN TRANSCRIPTION OF AIR TRAFFIC COMMUNICATION (ATC)

    Publication No.: US20230352042A1

    Publication date: 2023-11-02

    Application No.: US17806565

    Filing date: 2022-06-13

    Abstract: Systems and methods are provided for a transcription system with voice activity detection (VAD). The system includes a VAD module to receive incoming audio and generate an audio segment, and a speech decoder with a split predictor to perform, in a first pass, a decode operation to transcribe text from the audio segment into a message. In the first pass, if the split predictor's content-based analysis determines that the message does not contain a split point, the speech decoder forwards the message for display. If the analysis determines that the message contains a split point, the speech decoder performs, in a second pass, a re-decode operation to transcribe text from the audio segment based on the split point, which the split predictor configures within the audio domain of the audio segment, and then forwards the message for display.
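
The two-pass control flow in this abstract can be sketched as follows. The decoder and the split predictor are passed in as callbacks because the abstract does not describe their internals; `transcribe_with_split`, `decode`, and `predict_split` are hypothetical names, and the assumption that the split point is an index into the audio sequence is made only for illustration.

```python
def transcribe_with_split(segment, decode, predict_split):
    """Decode a VAD segment, re-decoding in two parts if a split point is found.

    segment:       sequence standing in for the audio samples of one segment
    decode:        callable(audio) -> str, the speech decoder
    predict_split: callable(text, audio) -> int or None, returning an index
                   into the audio where the segment should be split
    Returns the list of messages to forward for display.
    """
    # First pass: decode the whole segment and analyse its content.
    message = decode(segment)
    split_at = predict_split(message, segment)
    if split_at is None:
        return [message]
    # Second pass: split in the audio domain and re-decode each part.
    return [decode(segment[:split_at]), decode(segment[split_at:])]
```

In a toy run where tokens stand in for audio and a second call sign marks the split point, a segment such as `["N123", "climb", "N456", "descend"]` would be forwarded as two messages instead of one.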

    APPARATUS, METHOD AND COMPUTER PROGRAM FOR ENCODING AN AUDIO SIGNAL OR FOR DECODING AN ENCODED AUDIO SCENE

    Publication No.: US20230306975A1

    Publication date: 2023-09-28

    Application No.: US18160894

    Filing date: 2023-01-27

    Abstract: There are disclosed an apparatus for generating an encoded audio scene, and an apparatus for decoding and/or processing an encoded audio scene; as well as related methods and non-transitory storage units storing instructions which, when executed by a processor, cause the processor to perform a related method. An apparatus for processing an encoded audio scene may include, in a first frame, a first soundfield parameter representation and an encoded audio signal, wherein a second frame is an inactive frame, the apparatus including: an activity detector for detecting that the second frame is the inactive frame; a synthetic signal synthesizer for synthesizing a synthetic audio signal for the second frame using the parametric description for the second frame; an audio decoder for decoding the encoded audio signal for the first frame; and a spatial renderer for spatially rendering the audio signal for the first frame using the first soundfield parameter representation and using the synthetic audio signal for the second frame, or a transcoder for generating a metadata-assisted output format including the audio signal for the first frame, the first soundfield parameter representation for the first frame, the synthetic audio signal for the second frame, and a second soundfield parameter representation for the second frame.
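
The per-frame dispatch in this abstract (decode active frames, synthesize a signal for inactive ones) can be sketched in a highly simplified form. Everything here is a stand-in: `Frame`, `decode_frames`, the `noise_level` field used as the "parametric description", the identity "decoder", and the uniform-noise synthesizer are all assumptions. Spatial rendering and the soundfield parameter representations are left out entirely.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    active: bool
    payload: Optional[List[float]] = None  # stand-in for the encoded audio signal
    noise_level: float = 0.0               # stand-in for the parametric description


def decode_frames(frames, frame_len=4, seed=0):
    """Return one list of samples per frame.

    Active frames are "decoded" (here: copied); inactive frames get a
    synthetic low-level noise signal bounded by their noise_level.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    out = []
    for frame in frames:
        if frame.active:
            out.append(list(frame.payload))
        else:
            out.append(
                [rng.uniform(-frame.noise_level, frame.noise_level) for _ in range(frame_len)]
            )
    return out
```

In this sketch the `active` flag plays the role of the activity detector's decision; a real decoder would derive it from the bitstream.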