Abstract:
Implementations enable conversations between operators of communication devices who use sign language and other operators who do not. A method may include receiving images of first sign language gestures captured by a camera of a first communication device, converting the first sign language gestures into first text, transmitting the first text to a second communication device, receiving second text from the second communication device, and converting the second text into images of second sign language gestures made by an avatar. The method may also include operating the camera to capture the images of the first sign language gestures and presenting the images of the second sign language gestures on a display of the first communication device. The method may further include receiving first speech captured at the second communication device, converting the first speech into third text, and then converting the third text into images of third sign language gestures made by the avatar.
Abstract:
A content server accesses an audio stream and inputs portions of the audio stream into one or more non-speech classifiers for classification. For each portion of the audio stream, each non-speech classifier generates a set of raw scores representing the likelihood that the portion includes an occurrence of the particular class of non-speech sounds associated with that classifier. The content server generates binary scores for the sets of raw scores, each set of binary scores being based on a smoothing of the respective set of raw scores. The content server applies a set of non-speech captions to portions of the audio stream in time, each set of non-speech captions being based on a different one of the sets of binary scores for the corresponding portion of the audio stream.
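The raw-score, smoothing, binary-score, and caption steps described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which smoothing is used, so a centered moving average is assumed here, and the function names, the 0.5 threshold, the one-second hop, and the "[APPLAUSE]" label are all hypothetical.

```python
from typing import List, Tuple


def smooth(raw: List[float], window: int = 3) -> List[float]:
    """Centered moving average (assumed smoothing); the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(raw)):
        lo = max(0, i - half)
        hi = min(len(raw), i + half + 1)
        out.append(sum(raw[lo:hi]) / (hi - lo))
    return out


def binarize(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Turn smoothed scores into binary scores with a fixed (assumed) threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def caption_spans(binary: List[int], label: str,
                  hop_s: float = 1.0) -> List[Tuple[float, float, str]]:
    """Merge consecutive 1s into (start_seconds, end_seconds, label) caption spans."""
    spans = []
    start = None
    for i, b in enumerate(binary):
        if b and start is None:
            start = i
        elif not b and start is not None:
            spans.append((start * hop_s, i * hop_s, label))
            start = None
    if start is not None:
        spans.append((start * hop_s, len(binary) * hop_s, label))
    return spans


# Hypothetical raw scores from one classifier, one score per one-second portion.
raw_scores = [0.0, 0.9, 0.0, 0.0, 0.8, 0.9, 0.8, 0.0]
binary_scores = binarize(smooth(raw_scores))
captions = caption_spans(binary_scores, "[APPLAUSE]")
```

In this example the isolated spike in the second portion is suppressed by the smoothing, so only the sustained run of high scores produces a caption span.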
Abstract:
Aspects relate to computer-implemented methods, systems, and processes to automatically generate audio-based display indicia of media content, including: receiving, by a processor, a plurality of media content categories including at least one feature; receiving a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories; determining a media content category of a current media content based on at least one feature of the current media content; selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determined media content category of the current media content; and applying the selected speech recognition algorithm to the current media content.
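The selection step described above amounts to a lookup from a determined category to a recognizer. A minimal sketch follows, assuming a single hypothetical "genre" feature and placeholder recognizers that return dummy transcripts; none of these names or categories come from the abstract.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]


def make_recognizer(name: str) -> Recognizer:
    """Build a placeholder recognizer; a real one would run a category-specific model."""
    def recognize(audio: bytes) -> str:
        return f"<{name} transcript of {len(audio)} bytes>"
    return recognize


# Hypothetical categories, each mapped to its own speech recognition algorithm.
RECOGNIZERS: Dict[str, Recognizer] = {
    "news": make_recognizer("news-model"),
    "sports": make_recognizer("sports-model"),
    "general": make_recognizer("general-model"),
}


def determine_category(features: Dict[str, str]) -> str:
    """Determine the media content category from a feature (assumed rule: 'genre')."""
    genre = features.get("genre", "")
    return genre if genre in RECOGNIZERS else "general"


def caption(audio: bytes, features: Dict[str, str]) -> str:
    """Select the recognizer for the determined category and apply it to the audio."""
    category = determine_category(features)
    return RECOGNIZERS[category](audio)
```

Falling back to a "general" recognizer for unknown categories is a design choice of this sketch, not something the abstract specifies.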
Abstract:
A method for providing speech recognition to a user on a mobile device is provided, the method comprising: 1) receiving, by a processor, audio data; 2) processing the audio data, by a speech recognition engine, to determine one or more corresponding texts, wherein the processing comprises querying a local language model and a local acoustic model; and 3) displaying the one or more corresponding texts on a screen of the mobile device.
Abstract:
A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions.
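The final selection step above can be illustrated with a small sketch. It assumes, hypothetically, that each region's computed properties are a set of descriptive words (for example, detected object names and colors) and that a candidate is preferred when more of its words match the scene, with the recognizer's own score as a tie-breaker. The abstract does not specify the properties or the selection rule, so both are assumptions.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Candidate:
    text: str
    acoustic_score: float  # hypothetical confidence from the speech recognizer


@dataclass
class Region:
    properties: Set[str]  # hypothetical computed properties, e.g. {"red", "ball"}


def select_candidate(candidates: List[Candidate], regions: List[Region]) -> Candidate:
    """Pick the candidate whose words best match the observed scene properties."""
    scene_terms: Set[str] = set().union(*(r.properties for r in regions))

    def key(c: Candidate):
        overlap = len(set(c.text.lower().split()) & scene_terms)
        return (overlap, c.acoustic_score)

    return max(candidates, key=key)


candidates = [
    Candidate("pick up the red bowl", 0.55),
    Candidate("pick up the red ball", 0.45),
]
regions = [Region({"red", "ball"}), Region({"table"})]
best = select_candidate(candidates, regions)
```

Here the second candidate is chosen even though its recognizer score is lower, because the observed scene contains a region described as a red ball.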
Abstract:
A tinnitus treatment system comprising a sound processing unit, a haptic stimulus unit and an audio delivery unit. The sound processing unit comprises a processor input for receiving an audio signal; and a digital signal processor operable to analyse said audio signal and generate a plurality of actuation signals therefrom which are representative of said audio signal. The digital signal processor is further operable to spectrally modify said audio signal in accordance with a predetermined modification profile to generate a modified audio signal. The haptic stimulus unit comprises an array of stimulators each of which can be independently actuated to apply a tactile stimulus to a subject; and a stimulus unit input for receiving the plurality of actuation signals generated by said digital signal processor and directing individual actuation signals to individual stimulators. The audio delivery unit comprises an audio delivery unit input for receiving the modified audio signal generated by said digital signal processor.
Abstract:
Methods and devices are described for allowing users to use portable computer devices such as smart phones to share microphone signals and/or closed captioning text generated by speech recognition processing of the microphone signals. Under user direction, the portable devices exchange messages to form a signal sharing group to facilitate their conversation.
Abstract:
A simulation method and system. A computing system receives a first audio and/or video data stream. The first audio and/or video data stream includes data associated with a first person. The computing system monitors the first audio and/or video data stream. The computing system identifies emotional attributes comprised by the first audio and/or video data stream. The computing system generates a second audio and/or video data stream associated with the first audio and/or video data stream. The second audio and/or video data stream includes the data without the emotional attributes. The computing system stores the second audio and/or video data stream.
Abstract:
A method determines a bias-reduced noise and interference estimation in a binaural microphone configuration with a right and a left microphone signal at a time frame with a target speaker active. The method includes determining the auto power spectral density estimate of the common noise formed of the noise and interference components of the right and left microphone signals, and modifying that estimate by using an estimate of the magnitude squared coherence of the noise and interference components contained in the right and left microphone signals, determined at a time frame without a target speaker active. An acoustic signal processing system and a hearing aid implement the method for determining the bias-reduced noise and interference estimation. The invention improves the noise reduction performance of speech enhancement algorithms. Further, distortions of the target speech signal and residual noise and interference components are reduced.
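For reference, the magnitude squared coherence mentioned above is conventionally defined as below. The notation is an assumption and does not come from the abstract: $x$ and $y$ stand for the noise and interference components in the left and right microphone signals, $S_{xx}$ and $S_{yy}$ for their auto power spectral densities, and $S_{xy}$ for their cross power spectral density. How the method uses this quantity to modify the common-noise estimate is not stated in the abstract and is not reconstructed here.

```latex
\gamma_{xy}^{2}(f) = \frac{\left| S_{xy}(f) \right|^{2}}{S_{xx}(f)\, S_{yy}(f)},
\qquad 0 \le \gamma_{xy}^{2}(f) \le 1
```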