APPARATUS, METHOD AND COMPUTER PROGRAM CODE FOR PROCESSING AUDIO STREAM

    Publication No.: US20240221777A1

    Publication Date: 2024-07-04

    Application No.: US18686266

    Filing Date: 2022-07-12

    Applicant: Utopia Music AG

    IPC Classification: G10L25/54 G10L25/18

    CPC Classification: G10L25/54 G10L25/18

    Abstract: Apparatus, method, and computer program code for processing an audio stream. The method includes: obtaining first peaks of an audio stream, wherein each first peak comprises a first peak amplitude at a first frequency and at a first time offset from a beginning of the audio stream; for each first peak, detecting a second peak in a window with a predetermined offset from the first peak, wherein the second peak comprises a second peak amplitude at a second frequency and at a second time offset from the beginning of the audio stream; and for each first peak, generating a fingerprint hash based on the first frequency, a time difference between the first time offset and the second time offset, a frequency difference between the first frequency and the second frequency, and an amplitude difference between the first peak amplitude and the second peak amplitude.
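The hashing step described in this abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Peak` type, the time-only window, the rule of picking the strongest peak in the window, the integer quantization, and the use of SHA-1 are all assumptions made for illustration. Only the four hashed fields (first frequency, time difference, frequency difference, amplitude difference) come from the abstract.

```python
import hashlib
from typing import List, NamedTuple, Optional, Tuple


class Peak(NamedTuple):
    """Hypothetical peak record; units and quantization are assumptions."""
    time: int  # time offset from the beginning of the stream, in frames
    freq: int  # frequency bin index
    amp: int   # quantized peak amplitude


def find_second_peak(first: Peak, peaks: List[Peak],
                     offset: int = 1, width: int = 5) -> Optional[Peak]:
    """Return the strongest peak in the window starting `offset` frames after
    `first` and spanning `width` frames (assumed selection rule)."""
    lo = first.time + offset
    hi = lo + width
    candidates = [p for p in peaks if lo <= p.time < hi]
    return max(candidates, key=lambda p: p.amp, default=None)


def fingerprint_hash(first: Peak, second: Peak) -> str:
    """Hash the four fields named in the abstract: first frequency, time
    difference, frequency difference, and amplitude difference."""
    fields = (
        first.freq,
        second.time - first.time,
        second.freq - first.freq,
        second.amp - first.amp,
    )
    payload = ",".join(str(v) for v in fields).encode("ascii")
    # SHA-1 truncated to 16 hex characters is an arbitrary choice for this sketch.
    return hashlib.sha1(payload).hexdigest()[:16]


def fingerprints(peaks: List[Peak]) -> List[Tuple[int, str]]:
    """Pair each first peak's time offset with its hash, skipping first peaks
    that have no second peak in their window."""
    out = []
    for first in peaks:
        second = find_second_peak(first, peaks)
        if second is not None:
            out.append((first.time, fingerprint_hash(first, second)))
    return out
```

Because the hash in this sketch uses only the first frequency and differences between the two peaks, the same pair of peaks yields the same hash wherever it occurs in the stream; the absolute time offset is stored alongside the hash rather than inside it.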

    Tool for assisting people with speech disorder

    Publication No.: US11763821B1

    Publication Date: 2023-09-19

    Application No.: US16455196

    Filing Date: 2019-06-27

    Inventor: Douglas S. McNair

    Abstract: Various tools are disclosed for providing assistive or augmentative means to enhance the fluency and accuracy of persons having speech disabilities. These technologies may automatically ascertain and dynamically improve the accuracy with which automatic speech recognition (ASR) systems recognize utterances of persons having impaired speech conditions. In an embodiment, digitized audio information about a speaker’s utterance is processed to determine a set of candidate words matching the utterance. From these candidate words, a set of concepts is determined using a finite state machine model. A pictogram representing each concept is identified and presented to the speaker so that the speaker may select the pictogram that best matches the intended meaning of the utterance. An action corresponding to the speaker’s selection may then be performed, for example displaying textual information describing the selected concept or synthesizing speech from it.

    MACHINE LEARNING MODELS FOR AUTOMATED PROCESSING OF AUDIO WAVEFORM DATABASE ENTRIES

    Publication No.: US20230238019A1

    Publication Date: 2023-07-27

    Application No.: US17580748

    Filing Date: 2022-01-21

    IPC Classification: G10L25/66 G10L25/30 G10L25/54

    CPC Classification: G10L25/66 G10L25/30 G10L25/54

    Abstract: A computer system includes memory hardware and processor hardware configured to execute stored instructions. The instructions include training a machine learning model with historical feature vector inputs, including multiple audio data entries and multiple claims data entries, to generate a condition likelihood output indicative of a specified condition associated with one of multiple historical database entities. The instructions include, for each of a set of multiple database entities, generating a feature vector input according to the audio data and the claims data associated with the entity, processing the feature vector input with the machine learning model to generate the condition likelihood output, and assigning the database entity to an identified condition subset in response to determining that the condition likelihood output is greater than a specified likelihood threshold. The instructions include transforming a user interface to display the condition likelihood output associated with the database entity.
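The scoring and assignment steps described in this abstract can be sketched in Python as follows. This is a simplified illustration, not the claimed system: the concatenated feature layout, the stand-in logistic model with hand-picked weights (used here in place of a trained model), and all function names are hypothetical. Only the overall flow comes from the abstract: build a feature vector from audio and claims data, score it, and assign the entity when the likelihood is greater than the threshold.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def build_feature_vector(audio: Sequence[float],
                         claims: Sequence[float]) -> List[float]:
    """Concatenate audio-derived and claims-derived features (assumed layout)."""
    return list(audio) + list(claims)


def make_logistic_model(weights: Sequence[float], bias: float) -> Model:
    """Stand-in for a trained model: a logistic function with fixed weights."""
    def model(x: Sequence[float]) -> float:
        z = bias + sum(w * v for w, v in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))
    return model


def assign_condition_subset(
    entities: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    model: Model,
    threshold: float = 0.5,
) -> Tuple[Dict[str, float], List[str]]:
    """Score every entity and collect those whose condition likelihood is
    strictly greater than the threshold."""
    scores: Dict[str, float] = {}
    subset: List[str] = []
    for entity_id, (audio, claims) in entities.items():
        likelihood = model(build_feature_vector(audio, claims))
        scores[entity_id] = likelihood
        if likelihood > threshold:
            subset.append(entity_id)
    return scores, subset
```

The returned `scores` dictionary holds the per-entity likelihoods that a user interface could display, while `subset` lists the entities assigned to the identified condition subset.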