Abstract:
A method and apparatus are disclosed that allow people to carry on unobtrusive phone conversations in business or other settings where talking is either impossible or impolite. In the system of FIG. 1, telephone user one listens in the same manner as with a regular telephone but does not speak into the telephone microphone. Instead, user one employs a unit including a keyboard to enter the text corresponding to what he wants to say. The text is converted into synthesized speech by a text-to-speech (TTS) apparatus, and the voice output is sent to the microphone of the telephone apparatus. The telephone apparatus transmits the synthesized voice signal over a standard telephone line to a unit including a conventional telephone speaker 26 and telephone microphone. User two, the party using the telephone at the other end, hears the synthesized voice, while user one hears the actual voice of user two through the telephone speaker, unless user two is also using a system similar to that of user one. Handwritten text may also be used by employing a computer with a character recognition program as the input. In that case the handwriting is converted into synthesized sound and fed into the telephone microphone. The telephone system can be used by the hearing impaired without involving a third-party human transcriber.
Abstract:
Selection of human speech samples for a speech model of human speech is performed. The system presents a graphic representing a human speech sample on a computer display, e.g., an amplitude vs. time graph of the speech sample. In response to user input, the system marks a segment of the graphic. The marked segment represents a portion of the human speech sample. The system plays that portion back to the user so that the user can determine whether it is acceptable for inclusion in the speech model. If the user so indicates, the portion of the human speech sample represented by the marked segment is selected for inclusion in the speech model. The system also analyzes the marked portion for acoustic properties. These properties are presented to the user in a graphic of the analyzed portion, e.g., a spectral analysis of the sample graphed as a set of spectral lines. Thus, the user can select the analyzed portion for inclusion in the speech model based on the presence of desired acoustic properties.
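The marking and analysis steps above can be illustrated with a minimal sketch. It assumes the speech sample is a plain list of amplitude values at a known sample rate, and it uses a naive discrete Fourier transform to produce the spectral lines. The function names (`mark_segment`, `magnitude_spectrum`, `dominant_frequency`) are hypothetical and do not come from the abstract, which does not specify how the analysis is computed.

```python
import cmath
import math


def mark_segment(samples, sample_rate, start_s, end_s):
    """Return the samples lying between start_s and end_s (in seconds)."""
    lo = max(0, int(round(start_s * sample_rate)))
    hi = min(len(samples), int(round(end_s * sample_rate)))
    if lo >= hi:
        raise ValueError("marked segment is empty")
    return samples[lo:hi]


def magnitude_spectrum(segment, sample_rate):
    """Naive DFT of the segment.

    Returns a list of (frequency_hz, magnitude) pairs, one per spectral
    line, for bins 0 .. N/2. Quadratic in the segment length, so it is
    only suitable for short segments.
    """
    n = len(segment)
    lines = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(segment)
        )
        lines.append((k * sample_rate / n, abs(acc) / n))
    return lines


def dominant_frequency(segment, sample_rate):
    """Frequency of the strongest spectral line, ignoring the DC bin."""
    spectrum = magnitude_spectrum(segment, sample_rate)
    return max(spectrum[1:], key=lambda line: line[1])[0]
```

A caller would mark a segment, inspect `magnitude_spectrum(segment, sample_rate)` as the set of spectral lines, and keep the segment only if it shows the desired properties. Playback and the on-screen graphic are outside the scope of this sketch.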
Abstract:
A data processing system collects video and audio samples of acceptable speech production. A video camera focuses on a speaker's face and, particularly, articulation visible in the area of the mouth or other body movements associated with speech production. Video files are used to archive acceptable and unacceptable productions. These files may then be used to provide feedback about acceptable and unacceptable ways to produce speech. A speech professional or language teacher may play a model speech production and a subject speech attempt simultaneously to compare articulation, audio analysis, and appearance of articulators. A subject may play a model speech production and record a speech attempt simultaneously to attempt to mimic the appearance of articulators. Image processing may be used to create a mirror image of a video model or a current attempt or both to avoid left-right confusion.
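The mirror-image step can be sketched in a few lines. This assumes a video frame is represented as a list of pixel rows, which is an illustrative simplification; the abstract does not say how frames are stored, and `mirror_frame` is a hypothetical name.

```python
def mirror_frame(frame):
    """Flip a frame left to right by reversing every pixel row.

    `frame` is a list of rows, each row a list of pixel values.
    A new frame is returned; the input is left unchanged.
    """
    return [list(reversed(row)) for row in frame]
```

Applying the function to the model video, the current attempt, or both gives the subject a view in which left and right match what would be seen in a mirror.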
Abstract:
A method and apparatus for interfacing a device driver in real-time applications are provided. On input, the device driver is probed to determine a data sample block size supported by the device driver. The device driver delivers data samples to a buffer at each interrupt. The buffer is accessed to determine the presence of data in at least one buffer entry, or block. At each such access, a first counter is incremented to point to the next buffer entry to be accessed. One or more buffer entries are filled at each interrupt, with any data samples not sufficient to fill an entry held by the device driver until a subsequent interrupt. A second counter is incremented by the number of entries filled by the device driver. The size of each block in the buffer is incremented until the number of data samples held by the device driver between interrupts corresponds to the size of the block, so that each of the first and second counters increments by one on each access to the buffer. On output, a display cursor is synchronized with an audio signal being played by storing, in a buffer, a frame number associated with each block of data samples sent to the device driver. A head pointer tracks each entry in the buffer as the frame number is stored. As each corresponding frame is played, a tail pointer is incremented to point to the buffer entry containing the frame number of the next set of data samples to be played. If the position of the display cursor corresponds to a frame earlier than the last frame output by the device driver, the position of the cursor is repeatedly updated until it coincides with the position of the last frame output by the device driver.
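The output-side cursor synchronization can be sketched as a small circular buffer of frame numbers with head and tail pointers. This is an illustration under stated assumptions, not the disclosed implementation: the class `FrameQueue`, its methods, and `advance_cursor` are hypothetical names, and a real driver would invoke `block_played` from its completion interrupt rather than from ordinary code.

```python
class FrameQueue:
    """Circular buffer of frame numbers for blocks sent to the driver."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0            # next slot to store a frame number in
        self.tail = 0            # slot holding the next frame to be played
        self.count = 0           # blocks sent but not yet played
        self.last_played = None  # frame number most recently output

    def send_block(self, frame_number):
        """Record the frame number of a block handed to the driver."""
        if self.count == len(self.slots):
            raise BufferError("frame queue is full")
        self.slots[self.head] = frame_number
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1

    def block_played(self):
        """Note that the driver finished playing the block at the tail."""
        if self.count == 0:
            raise BufferError("no block is pending")
        self.last_played = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        self.count -= 1
        return self.last_played


def advance_cursor(cursor_frame, queue):
    """Step the cursor forward until it reaches the last played frame.

    Returns the final cursor position and the list of intermediate
    positions, each of which would correspond to one display update.
    """
    steps = []
    while queue.last_played is not None and cursor_frame < queue.last_played:
        cursor_frame += 1
        steps.append(cursor_frame)
    return cursor_frame, steps
```

Because the cursor only ever moves up to `last_played`, it cannot run ahead of the audio that has actually been output, which is the property the abstract describes.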