Abstract:
A voiced/unvoiced speech classifier (30) includes a speech segmentor (34) which segments an input digitized speech waveform into frames of speech and a band-pass filter (36) which filters the frames of speech. A relative energy generator (38) generates a relative energy value for each filtered frame of speech and a decision parameter generator (52) including an autocorrelation calculator (54) and a pitch calculator (56) generates a decision parameter based on an autocorrelation function and a pitch frequency index for the filtered frames of speech. A normalized energy calculator (46) adjusts the threshold and then normalizes the relative energy. A comparator (60) provides a signal indicative of whether a frame of speech is voiced speech or unvoiced speech depending on a comparison of the decision parameter and the normalized relative energy value for each filtered frame of speech.
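The abstract names the ingredients (relative energy, an autocorrelation function, a pitch index, and a comparison) but not the exact decision rule. The following is a minimal sketch under stated assumptions, not the patented method: the function names, the 60 to 400 Hz pitch search range, and the two thresholds are all hypothetical, and the band-pass filter and threshold adjustment are omitted.

```python
import math

def autocorr_peak(frame, min_lag, max_lag):
    """Return (peak, lag) of the energy-normalized autocorrelation over a lag range."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0, 0  # silent frame: no periodicity to measure
    best_val, best_lag = -1.0, min_lag
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r / energy > best_val:
            best_val, best_lag = r / energy, lag
    return best_val, best_lag

def classify_frame(frame, reference_energy, sample_rate=8000,
                   peak_threshold=0.5, energy_floor=0.1):
    """Hypothetical rule: voiced if the frame is both loud enough and periodic enough.

    reference_energy is an assumed normalizer (for example, the loudest frame so far).
    Returns (is_voiced, pitch_hz); pitch_hz is None for unvoiced frames.
    """
    energy = sum(x * x for x in frame)
    relative_energy = energy / reference_energy if reference_energy > 0 else 0.0
    # Search lags corresponding to an assumed pitch range of 400 Hz down to 60 Hz.
    peak, lag = autocorr_peak(frame, sample_rate // 400, sample_rate // 60)
    voiced = relative_energy >= energy_floor and peak >= peak_threshold
    pitch_hz = sample_rate / lag if voiced and lag > 0 else None
    return voiced, pitch_hz
```

A 200 Hz sine sampled at 8 kHz has a 40-sample period, so the autocorrelation peak falls at lag 40 and the sketch reports the frame as voiced; an all-zero frame is reported as unvoiced.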
Abstract:
A method and apparatus for speech recognition involves classifying (38) a digitized speech segment according to whether the speech segment comprises voiced or unvoiced speech and utilizing that classification to generate tonal feature vectors (41) of the speech segment when the speech is voiced. The tonal feature vectors are then combined (42) with other non-tonal feature vectors (40) to provide speech feature vectors. The speech feature vectors are compared (35) with previously stored models of speech feature vectors (37) for different segments of speech to determine which previously stored model is a most likely match for the segment to be recognized.
Abstract:
The invention discloses a smart wearable device for vision enhancement and a method for realizing stereoscopic vision transposition. The device comprises a wearable device body provided with camera lenses, image sensors, an image information receiving and transmitting unit, image enhancement units, and near-to-eye optical systems. The optical axis and field angle of each near-to-eye optical system are matched with the optical axis and field angle of the corresponding camera lens, and each image sensor is arranged behind its camera lens. The real scene reaches the image sensor through an imaging device for image acquisition, and the image enhancement unit enhances the low-light image collected by the smart wearable device so that it is displayed clearly. The invention can ensure the enhancement of real stereoscopic vision in a dark environment and the interchange of remote, barrier-free, real stereoscopic vision.
Abstract:
A method (400, 600, 700) and apparatus (220) for enhancing the intelligibility of speech emitted into a noisy environment. After filtering (408) ambient noise with a filter (304) that simulates the physical blocking of noise by at least a part of a voice communication device (102), a frequency-dependent SNR of received voice audio relative to ambient noise is computed (424) on a perceptual (e.g., Bark) frequency scale. Formants are identified (426, 600, 700) and the SNR in bands including certain formants is modified (508, 510) with formant enhancement gain factors in order to improve intelligibility. A set of high-pass filter gains (338) is combined (516) with the formant enhancement gain factors, yielding combined gains which are clipped (518), scaled (520) according to a total SNR, normalized (526), smoothed across time (530) and frequency (532), and used to reconstruct (532, 534) an audio signal.
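To make the "SNR on a Bark scale" step concrete, here is a rough sketch that groups power-spectrum bins into one-Bark-wide bands and computes a per-band SNR in dB. It assumes Traunmüller's approximation for the Hz-to-Bark conversion and integer band edges; the abstract does not say which conversion or band layout is used, and the function names are hypothetical.

```python
import math

def hz_to_bark(freq_hz):
    """Traunmüller's approximation of the Bark scale (an assumed choice)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53

def band_snr_db(freqs_hz, speech_power, noise_power):
    """Sum speech and noise power per one-Bark band and return {band: SNR in dB}."""
    sums = {}
    for f, s, n in zip(freqs_hz, speech_power, noise_power):
        band = max(0, math.floor(hz_to_bark(f)))  # clamp very low frequencies to band 0
        s_acc, n_acc = sums.get(band, (0.0, 0.0))
        sums[band] = (s_acc + s, n_acc + n)
    # Bands with no speech or no noise power are skipped to avoid log of zero.
    return {band: 10.0 * math.log10(s / n)
            for band, (s, n) in sums.items() if s > 0 and n > 0}
```

For example, bins at 100 Hz and 110 Hz both fall in band 0, so their powers are pooled before the ratio is taken, while a 1000 Hz bin falls in band 8.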
Abstract:
A continuous stream of noise is created from a plurality of input signals. A smoothing spectrum estimate is continuously calculated from the continuous stream of noise. Noise is responsively removed from a selected one of the plurality of input signals using the smoothing spectrum estimate. The removal of the noise from the selected input signal is performed substantially synchronously and in time alignment with the creating of the continuous stream of noise and the calculating of the smoothing spectrum estimate.
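One common way to realize a continuously updated, smoothed noise estimate is exponential averaging followed by magnitude spectral subtraction. The sketch below assumes that technique; the abstract does not specify it. It also assumes the noise frames have already been derived from the input signals, and all names and the smoothing factor are hypothetical.

```python
def update_noise_estimate(prev_estimate, noise_spectrum, alpha=0.9):
    """Exponentially smooth the per-bin noise magnitude (alpha near 1 means slow updates)."""
    return [alpha * p + (1.0 - alpha) * n
            for p, n in zip(prev_estimate, noise_spectrum)]

def subtract_noise(signal_spectrum, noise_estimate, floor=0.0):
    """Subtract the noise estimate bin by bin, clamping at a floor to avoid negatives."""
    return [max(s - n, floor) for s, n in zip(signal_spectrum, noise_estimate)]

def denoise_stream(signal_frames, noise_frames, alpha=0.9):
    """Process frame pairs in lockstep so the estimate stays time-aligned with the signal."""
    estimate = [0.0] * len(noise_frames[0]) if noise_frames else []
    cleaned = []
    for signal, noise in zip(signal_frames, noise_frames):
        estimate = update_noise_estimate(estimate, noise, alpha)
        cleaned.append(subtract_noise(signal, estimate))
    return cleaned
```

Updating the estimate and subtracting it inside the same loop iteration is what keeps the two operations synchronous in this sketch.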
Abstract:
A device (800) performs statistical pattern recognition using model parameters that are refined by optimizing an objective function. The objective function includes a term for each of many items of training data for which recognition errors occur. Each term depends on the relative magnitude of a first score, for the recognition result for the item of training data, and a second score, calculated by evaluating the statistical pattern recognition model identified by the transcribed identity of the training data item with feature vectors extracted from that item. The objective function does not include terms for items of training data for which there is a gross discrepancy between the transcribed identity and the recognized identity. Gross discrepancies can be detected by probability score or pattern identity comparisons. Terms of the objective function are weighted based on the type of recognition error, and weights can be increased for high-priority patterns.
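The shape of such an objective can be illustrated with a small sketch. It assumes a sigmoid of the score gap as the per-item term and a fixed score margin as the gross-discrepancy test; neither choice comes from the abstract, and the dictionary keys and `gross_margin` parameter are hypothetical.

```python
import math

def error_objective(items, gross_margin=50.0):
    """Sum weighted terms over misrecognized items, skipping gross discrepancies.

    Each item is a dict with keys: recognized, transcribed, recognized_score,
    transcribed_score, and optionally weight (default 1.0).
    """
    total = 0.0
    for item in items:
        if item["recognized"] == item["transcribed"]:
            continue  # correctly recognized items contribute no term
        gap = item["recognized_score"] - item["transcribed_score"]
        if gap > gross_margin:
            continue  # assumed test: a huge gap suggests a faulty transcription
        # Assumed term: a sigmoid of the gap, scaled by the error-type weight.
        total += item.get("weight", 1.0) / (1.0 + math.exp(-gap))
    return total
```

Minimizing a sum like this with respect to the model parameters pushes the score of the transcribed model up relative to the score of the competing recognition result, with more pressure on higher-weighted patterns.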
Abstract:
An apparatus for selecting a cohort model for use in a speaker verification system includes a model generator (108) for determining a target speaker model (114) from a speech sample collected from the target speaker (106). A cohort selector (110) determines a similarity value between each of a number of predetermined existing speaker models from a model pool (112) and the target speaker model (114) and a dissimilarity value between each of the existing speaker models and any previously selected cohort models (116). An existing speaker model which is most similar to the target speaker model, but most dissimilar to previously chosen cohort models, is then chosen as another cohort model for the target speaker.
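The selection rule described above reads naturally as a greedy loop. The sketch below assumes speaker models are plain numeric vectors, uses Euclidean distance as the (dis)similarity measure, and combines the two criteria by simple addition; the abstract specifies none of these, and the function name is hypothetical.

```python
import math

def select_cohort(target, pool, size):
    """Greedily pick models close to the target but far from already-chosen cohorts.

    target: a vector; pool: dict mapping speaker name to vector; size: cohort count.
    """
    selected = []
    remaining = dict(pool)
    while remaining and len(selected) < size:
        def score(name):
            model = remaining[name]
            closeness = -math.dist(model, target)  # higher means more similar to target
            # Distance to the nearest chosen cohort; zero when none has been chosen yet.
            spread = min((math.dist(model, pool[s]) for s in selected), default=0.0)
            return closeness + spread
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected
```

With a target at the origin and pool models at 1.0, 1.1, and -1.2 on one axis, the first pick is the model at 1.0; the second pick is the model at -1.2 rather than 1.1, because the model at 1.1 is nearly redundant with the first cohort.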
Abstract:
Speech from a driver and speech from a passenger in a vehicle are selected directionally using a plurality of directional microphones. Sounds detected as coming from a passenger by a first plurality of directional microphones are suppressed from sounds detected as coming from a driver by a second plurality of directional microphones.