Abstract:
A method of interacting with a client/server architecture using a 2G mobile phone is provided. The 2G phone includes a data channel for transmitting data and a voice channel for transmitting speech. The method includes receiving a web page from a web server pursuant to an application through the data channel and rendering the web page on the 2G phone. Speech is received from the user corresponding to at least one data field on the web page. A call is established from the 2G phone to a telephony server over the voice channel. The telephony server is remote from the 2G phone and is adapted to process speech. The telephony server obtains a speech-enabled web page from the web server corresponding to the web page provided to the 2G phone. Speech is transmitted from the 2G phone to the telephony server. The speech is processed in accordance with the speech-enabled web page to obtain textual data. The textual data is transmitted to the web server. The 2G phone obtains a new web page through the data channel and renders the new web page containing the textual data.
Abstract:
In a method of entering text into a device, a first character input is provided that is indicative of a first character of a text entry. Next, a vocalization of the text entry is captured. A probable word candidate is then identified for a first word of the vocalization based upon the first character input and an analysis of the vocalization. Finally, the probable word candidate is displayed to the user.
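One way to picture the candidate selection is as a filter over a speech recognizer's scored hypotheses. The following Python sketch is illustrative only: the `hypotheses` list, its scores, and the function name `pick_candidate` are hypothetical and are not taken from the abstract, which does not say how the character input and the vocalization analysis are combined.

```python
from typing import List, Optional, Tuple


def pick_candidate(first_char: str,
                   hypotheses: List[Tuple[str, float]]) -> Optional[str]:
    """Return the best-scoring hypothesis that starts with the typed character.

    `hypotheses` is an assumed (word, score) list standing in for the
    output of the vocalization analysis; higher scores are more likely.
    """
    matching = [(word, score) for word, score in hypotheses
                if word.lower().startswith(first_char.lower())]
    if not matching:
        return None  # no hypothesis agrees with the typed character
    return max(matching, key=lambda pair: pair[1])[0]


# Hypothetical recognizer output for one spoken word.
hypotheses = [("there", 0.41), ("their", 0.38), ("hair", 0.12), ("bear", 0.09)]
```

Under these assumptions, typing `b` before speaking would let the low-ranked hypothesis "bear" win, because the character input rules out the acoustically stronger alternatives.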
Abstract:
The present invention provides a dialog system in which the subsystems are integrated under a single technology model. In particular, each of the subsystems uses stochastic modeling to identify a probability for its respective output. The combined probabilities identify a most probable action to be taken by the dialog system given the latest input from the user and the past dialog states. An additional aspect of the present invention is an embodiment in which the subsystems communicate with one another through XML pages, thus allowing the subsystems to be distributed across a network.
Abstract:
A method and apparatus are provided for identifying patterns from a series of feature vectors representing a time-varying signal. The method and apparatus use both a frame-based model and a segment model in a unified framework. The frame-based model determines the probability of an individual feature vector given a frame state. The segment model determines the probability of sub-sequences of feature vectors given a single segment state. The probabilities from the frame-based model and the segment model are then combined to form a single path score that is indicative of the probability of a sequence of patterns. Another aspect of the invention is the use of a frame-based model and a segment model to segment feature vectors during model training. Under this aspect of the invention, the frame-based model and the segment model are used together to identify probabilities associated with different segmentations.
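The idea of a combined path score can be sketched with a toy example. The Python code below is a minimal sketch under strong assumptions that are not in the abstract: features are scalars rather than vectors, both models are unit-variance Gaussians, the segment model scores a segment through its average value, and the two models are combined by adding log-probabilities. All function and variable names are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def score_segmentation(frames, boundaries, frame_means, segment_means, var=1.0):
    """Combined log path score for one candidate segmentation.

    `boundaries` holds one (start, end) index pair per segment; segment i
    is assigned state i. Both scoring rules are assumptions for illustration.
    """
    total = 0.0
    for state, (start, end) in enumerate(boundaries):
        segment = frames[start:end]
        # Frame-based model: each feature value is scored on its own.
        total += sum(gaussian_log_pdf(x, frame_means[state], var) for x in segment)
        # Segment model: the whole sub-sequence is scored at once (via its average).
        average = sum(segment) / len(segment)
        total += gaussian_log_pdf(average, segment_means[state], var)
    return total


frames = [0.1, -0.2, 0.0, 2.9, 3.1, 3.0]
means = [0.0, 3.0]
good = score_segmentation(frames, [(0, 3), (3, 6)], means, means)
bad = score_segmentation(frames, [(0, 2), (2, 6)], means, means)
```

Comparing `good` and `bad` shows how such a score could rank alternative segmentations: the split that places the boundary where the signal actually changes receives the higher combined score.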
Abstract:
A classifier that disambiguates among entities based on a dictionary, such as a corpus of documents about those entities, is built by incorporating probabilities that an entity exists that is not in the dictionary. Given a document, the classifier associates it with an entity. By incorporating out-of-collection probabilities into the classifier, a higher level of confidence in the match between an entity and a document is achieved.
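A simple way to see the effect of an out-of-collection probability is to treat "some entity not in the dictionary" as one extra class in a Bayesian posterior. The Python sketch below is an assumption-laden illustration, not the classifier described: the likelihoods, priors, the `UNKNOWN` label, and the function name `classify` are all hypothetical.

```python
UNKNOWN = "<out of collection>"  # hypothetical label for entities missing from the dictionary


def classify(likelihoods, priors, unknown_likelihood, unknown_prior):
    """Return (best_label, posterior) over known entities plus UNKNOWN.

    `likelihoods[e]` is an assumed P(document | entity e); `priors[e]` is P(e).
    """
    joint = {entity: likelihoods[entity] * priors[entity] for entity in likelihoods}
    joint[UNKNOWN] = unknown_likelihood * unknown_prior
    total = sum(joint.values())
    posteriors = {entity: value / total for entity, value in joint.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]


priors = {"A": 0.45, "B": 0.45}
# A document that resembles entity A.
label_1, conf_1 = classify({"A": 0.02, "B": 0.001}, priors, 0.005, 0.1)
# A document that resembles neither known entity.
label_2, conf_2 = classify({"A": 0.0001, "B": 0.0002}, priors, 0.005, 0.1)
```

Without the extra class, the second document would be forced onto entity B; with it, the posterior mass shifts to `UNKNOWN`, so a match to a known entity is reported only when the evidence supports it.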
Abstract:
An auto-answer feature is implemented in SIP by configuring a receiving device to automatically acknowledge and answer an incoming call or session from a specific trusted third party. The receiving device may skip to an OK response to an INVITE request when the call is routed through the trusted third party. When the device can automatically answer the incoming call, advanced features such as Push To Talk, Information Tone, Click to Call, and Remote Monitoring may be easily implemented.
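The decision to skip straight to an OK response can be sketched as a small policy function. In the Python sketch below, the status lines `180 Ringing` and `200 OK` are standard SIP responses, but everything else is an assumption for illustration: the abstract does not say how the device recognizes that a call was routed through the trusted third party, so checking the hosts of the `Via` headers, and the names `responses_for_invite` and `trusted_hosts`, are hypothetical.

```python
def responses_for_invite(via_hosts, trusted_hosts):
    """Return the SIP responses the receiving device would send for an INVITE.

    `via_hosts` is an assumed list of host names taken from the request's
    Via headers; `trusted_hosts` is the configured set of trusted parties.
    """
    if any(host in trusted_hosts for host in via_hosts):
        # Auto-answer: no ringing phase, answer immediately.
        return ["200 OK"]
    # Ordinary call: alert the user and wait for a manual answer.
    return ["180 Ringing"]


trusted_hosts = {"ptt.example.com"}  # hypothetical trusted third party
auto = responses_for_invite(["proxy.example.net", "ptt.example.com"], trusted_hosts)
manual = responses_for_invite(["proxy.example.net"], trusted_hosts)
```

A real deployment would need an authenticated way to establish trust, since header contents alone can be forged; the sketch only shows where the branch in call handling would sit.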
Abstract:
Described is the automatic testing of the quality of an audio channel between a caller and a callee, where the channel includes a device under test, such as a VoIP or other gateway. An analyzer receives timestamps from the caller and callee during a calling session, including timestamps for when the callee initially provides audio (e.g., speech) to the caller, when the caller initially detects sound, when the caller initially provides audio to the callee, and when the callee initially detects sound. The analyzer uses the relative timing of the timestamps and the speech recognizer's outcome to determine whether the audio channel is experiencing interference or echo. When the audio includes speech, a confidence level corresponding to the accuracy of speech recognition may also establish the audio channel's quality. Random selection and timing of output may be employed, such as to vary the testing patterns during repetitive tests.
Abstract:
A statistical language model (SLM) may be iteratively refined by considering N-gram counts in new data, and blending the information contained in the new data with the existing SLM. A first group of documents is evaluated to determine the probabilities associated with the different N-grams observed in the documents. An SLM is constructed based on these probabilities. A second group of documents is then evaluated to determine the probabilities associated with each N-gram in that second group. The existing SLM is then evaluated to determine how well it explains the probabilities in the second group of documents, and a weighting parameter is calculated from that evaluation. Using the weighting parameter, a new SLM is then constructed as a weighted average of the existing SLM and the new probabilities.
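The refinement loop can be sketched in a few lines of Python. This is a minimal sketch under stated simplifications: probabilities are plain relative frequencies of bigrams rather than smoothed conditional probabilities, and the abstract does not specify how the weighting parameter is computed, so `coverage_weight` (the share of new-data probability mass on N-grams the existing model already knows) is a hypothetical stand-in. All function names are illustrative.

```python
from collections import Counter


def ngram_probs(tokens, n=2):
    """Relative frequency of each N-gram in a token sequence."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def coverage_weight(existing, new):
    """Hypothetical weight: new-data mass on N-grams the existing model has seen."""
    return sum(prob for gram, prob in new.items() if gram in existing)


def blend(existing, new, weight):
    """Weighted average: `weight` on the existing SLM, `1 - weight` on the new data."""
    grams = set(existing) | set(new)
    return {gram: weight * existing.get(gram, 0.0) + (1 - weight) * new.get(gram, 0.0)
            for gram in grams}


existing = ngram_probs("the cat sat on the mat".split())
new = ngram_probs("the cat ate the fish".split())
weight = coverage_weight(existing, new)
refined = blend(existing, new, weight)
```

Because both inputs sum to one and the weights sum to one, the blended model remains a valid distribution, and the same three steps can be repeated each time a further group of documents arrives.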
Abstract:
Technologies pertaining to inferring a view sequence of a user are described herein. A view sequence is the order in which graphical objects on a graphical user interface are viewed by a user. A view sequence with respect to graphical objects presented on a graphical user interface is inferred based upon historically observed user actions, such as selecting a link or hovering over respective graphical objects. The view sequence is inferred without the use of sensor equipment that tracks the eye movements of users.
Abstract:
The present invention relates to establishing a media channel and a signaling channel between a client and a server. The media channel uses a chosen codec and protocol for communication. Through the media channel and signaling channel, an application on the client can utilize speech services on the server.