Abstract:
A method for communicating real-time media data between a first client and a second client across a packet-switched data network is provided. The method includes receiving an indication of a first client network address for use as a destination network address for sending media datagrams to the first client. A media datagram originated by the first client is also received. The first client network address is compared with a source network address extracted from the media datagram originated by the first client. If the source network address and the first client network address are not the same, a media datagram is sent to the first client using the source network address. If they are the same, the media datagram is sent to the first client using the first client network address.
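The address comparison described above can be sketched as a single function. This is a minimal illustration, not the claimed implementation: the `Address` tuple and the name `choose_destination` are hypothetical, and an address is assumed to be an (IP, port) pair.

```python
from typing import Tuple

# Hypothetical representation: an address is an (IP address, port) pair.
Address = Tuple[str, int]


def choose_destination(signaled: Address, observed_source: Address) -> Address:
    """Pick the address to use when sending media datagrams to the first client.

    signaled        -- the first client network address received as an indication
    observed_source -- the source address extracted from a media datagram
                       that the first client originated
    """
    if observed_source != signaled:
        # The addresses differ, so send to the address the datagram came from.
        return observed_source
    # The addresses match, so the signaled address is used as-is.
    return signaled


if __name__ == "__main__":
    signaled = ("10.0.0.5", 5004)
    print(choose_destination(signaled, ("203.0.113.7", 40000)))
    print(choose_destination(signaled, ("10.0.0.5", 5004)))
```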
Abstract:
A system and method for voice activity detection, in accordance with the invention, includes the steps of inputting data including frames of speech and noise, and deciding whether the frames of the input data include speech or noise by employing a log-likelihood ratio test statistic and pitch. The frames of the input data are tagged as most likely noise or most likely speech based on the log-likelihood ratio test statistic and pitch characteristics of the input data. The tags are counted over a plurality of frames to determine whether the input data is speech or noise.
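The tag-and-count decision can be sketched as follows. The abstract does not say how the log-likelihood ratio and pitch are combined or how many tags are needed, so the rule "speech if the ratio exceeds a threshold or the frame is pitched", the majority vote, and the names `tag_frame` and `classify_window` are all assumptions made for illustration.

```python
from typing import Sequence


def tag_frame(llr: float, has_pitch: bool, llr_threshold: float = 0.0) -> str:
    """Tag one frame as 'speech' or 'noise'.

    Assumption: a frame is most likely speech when its log-likelihood ratio
    exceeds the threshold or when pitch was detected in it.
    """
    return "speech" if (llr > llr_threshold or has_pitch) else "noise"


def classify_window(tags: Sequence[str], min_speech_fraction: float = 0.5) -> str:
    """Count the tags over several frames and decide speech versus noise.

    Assumption: the window is speech when more than min_speech_fraction
    of its frames carry the 'speech' tag.
    """
    speech_count = sum(1 for tag in tags if tag == "speech")
    return "speech" if speech_count > min_speech_fraction * len(tags) else "noise"


if __name__ == "__main__":
    # (log-likelihood ratio, pitch detected) per frame; values are made up.
    frames = [(1.2, True), (-0.5, False), (0.3, False), (-2.0, False), (0.8, True)]
    tags = [tag_frame(llr, pitched) for llr, pitched in frames]
    print(tags, classify_window(tags))
```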
Abstract:
A method and system for transforming a sampling rate in speech recognition systems, in accordance with the present invention, includes the steps of providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients; converting the cepstral vector coefficients to energy bands in logarithmic spectra; filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency; and converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency. Another method and system convert system prototypes for speech recognition systems from a reference frequency to a target frequency.
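The three conversion steps can be sketched with a plain discrete cosine transform. Several things here are assumptions not stated in the abstract: the cepstra are taken to be an orthonormal DCT-II of the log band energies, the bands are taken to be equally spaced up to half the reference frequency (real front ends typically use mel-spaced bands), and the "predetermined portion" defaults to one half. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def dct2(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II: log band energies -> cepstral coefficients."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def idct2(c: Sequence[float]) -> List[float]:
    """Inverse of dct2: cepstral coefficients -> log band energies."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for k in range(1, n)
        )
        out.append(s)
    return out


def convert_cepstra(cepstra: Sequence[float], reference_hz: float,
                    target_hz: float, portion: float = 0.5) -> List[float]:
    """Map cepstra computed at reference_hz to cepstra at target_hz."""
    log_bands = idct2(cepstra)                 # step 1: back to log spectra
    n = len(log_bands)
    band_width = (reference_hz / 2.0) / n      # assumed equal-width bands
    cutoff = portion * target_hz
    # step 2: drop bands whose upper edge lies above the cutoff
    kept = [e for b, e in enumerate(log_bands) if (b + 1) * band_width <= cutoff]
    return dct2(kept)                          # step 3: back to cepstra


if __name__ == "__main__":
    wideband = dct2([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    print(convert_cepstra(wideband, reference_hz=16000, target_hz=8000))
```

Under these assumptions the result has fewer coefficients than the input, because the modified cepstra are computed only from the bands that survive the filter.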
Abstract:
Techniques for employing improved prompts in a speech-to-speech translation system are disclosed. By way of example, a technique for use in indicating a dialogue turn in an automated speech-to-speech translation system comprises the following steps/operations. One or more text-based scripts are obtained. The one or more text-based scripts are synthesizable into one or more voice prompts. At least one of the one or more voice prompts is synthesized for playback from at least one of the one or more text-based scripts, the at least one synthesized voice prompt comprising an audible message in a language understandable to a speaker interacting with the speech-to-speech translation system, the audible message indicating a dialogue turn in the automated speech-to-speech translation system.
Abstract:
A method of audio communication between a first telephony client located behind a network address translation (NAT) server and a remote second telephony client is disclosed. A calibration datagram is sent from the first telephony client to the second telephony client on a user datagram protocol (UDP) channel identified for sending audio data. The second telephony client extracts the source address and port number to identify a reverse UDP channel for sending audio data to the first telephony client.
Abstract:
A method of audio communication utilizing media datagrams between a first telephony client located behind a network address translation (NAT) server and a remote second telephony client is disclosed. Each client utilizes a single port number for both sending and receiving media datagrams. A media datagram is sent from the first telephony client to the second telephony client on a UDP/IP channel utilizing a destination IP address and port number provided by the second telephony client. The second telephony client extracts the source IP address and source port number from the received media datagram to determine if the first telephony client is located behind a NAT server. If the first telephony client is located behind a NAT server, the extracted source IP address and port number are stored and used to send media datagrams to the first telephony client located behind the NAT server.
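The second client's bookkeeping can be sketched as a small state holder. This is an illustrative model rather than the claimed implementation: `MediaDatagram`, `RemoteClient`, and the idea that the first client advertises an endpoint during call setup are assumptions, and no real network I/O is performed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical representation: an endpoint is an (IP address, port) pair.
Endpoint = Tuple[str, int]


@dataclass(frozen=True)
class MediaDatagram:
    source: Endpoint   # source IP address and port seen on the received datagram
    payload: bytes


class RemoteClient:
    """State the second client keeps about the first client."""

    def __init__(self, advertised: Endpoint) -> None:
        # Assumption: the first client advertised this endpoint during setup.
        self.advertised = advertised
        self.nat_mapping: Optional[Endpoint] = None

    def receive(self, datagram: MediaDatagram) -> None:
        # A source that differs from the advertised endpoint is taken to mean
        # that a NAT server rewrote it; store it for the return path.
        if datagram.source != self.advertised:
            self.nat_mapping = datagram.source

    @property
    def behind_nat(self) -> bool:
        return self.nat_mapping is not None

    def return_endpoint(self) -> Endpoint:
        """Endpoint to use when sending media datagrams to the first client."""
        return self.nat_mapping if self.nat_mapping is not None else self.advertised


if __name__ == "__main__":
    peer = RemoteClient(advertised=("192.168.1.10", 5004))
    peer.receive(MediaDatagram(source=("198.51.100.4", 62000), payload=b"audio"))
    print(peer.behind_nat, peer.return_endpoint())
```

Because each client uses one port for both sending and receiving, replying to the stored source endpoint reaches the same socket that sent the original datagram.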
Abstract:
An automatic segmenter segments continuous text in a rapid, consistent, and semantically accurate manner. Two statistical methods for segmentation of continuous text are used. The first method, called "forward-backward matching", is easy and fast but can produce occasional errors in long phrases. The second method, called "statistical stack search segmenter", utilizes statistical language models to generate more accurate segmentation output at the expense of twice the execution time of the "forward-backward matching" method. In applications where speed is a major concern, "forward-backward matching" can be used, while in applications where highly accurate output is desired, "statistical stack search segmenter" is ideal.
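The abstract does not spell out how "forward-backward matching" works. The sketch below assumes one common reading: a greedy longest-match scan against a word list in each direction, keeping the result with fewer words and preferring the backward scan on a tie. The lexicon, the tie rule, and all function names are hypothetical.

```python
from typing import List, Set


def forward_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Greedy longest match scanning left to right."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Greedy longest match scanning right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def segment(text: str, lexicon: Set[str]) -> List[str]:
    """Run both scans and keep one result (assumed rule: fewer words wins,
    backward scan wins a tie)."""
    max_len = max((len(word) for word in lexicon), default=1)
    forward = forward_match(text, lexicon, max_len)
    backward = backward_match(text, lexicon, max_len)
    return forward if len(forward) < len(backward) else backward


if __name__ == "__main__":
    lexicon = {"the", "theta", "table", "bled", "own", "down", "there"}
    print(forward_match("thetabledownthere", lexicon, 5))
    print(segment("thetabledownthere", lexicon))
```

The example text shows the kind of error a single greedy scan can make: scanning forward commits to "theta" first, while scanning backward recovers "the table down there".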