Abstract:
This invention enables content data recorded in an arbitrary player to be used by way of a mobile terminal, such as a cellular phone. In response to an inquiry made by the mobile terminal to the player, the player transmits a response to the mobile terminal. When a user performs a predetermined operation on the mobile terminal, the mobile terminal creates a one-time password and transmits the one-time password and operation information concerning the user-performed operation to the player. The player transmits a terminal ID and the like to a service center. Upon reception of the information, the service center creates a one-time password and transmits it, together with an operation permission command, to the player. The player compares the password transmitted from the service center with the password created by the mobile terminal. When the passwords are verified, the user operation instruction is made valid.
Abstract:
A bifurcated speaker-specific and non-speaker-specific method and apparatus are provided for enabling speech-based remote control and for recognizing the speech of an unspecified speaker at extremely high recognition rates regardless of the speaker's age, sex, or individual speech mannerisms. A device main unit is provided with a speech recognition processor for recognizing speech and taking an appropriate action, and with a user terminal containing specific speaker capture and/or preprocessing capabilities. The user terminal exchanges data with the speech recognition processor using radio transmission. The user terminal may be provided with a conversion rule generator that compares the speech of a user with previously compiled standard speech feature data and, based on the result of this comparison, generates a conversion rule for converting the speaker's speech feature parameters to the corresponding standard speaker's feature information. The speech recognition processor, in turn, may reference the conversion rule developed in the user terminal and perform speech recognition based on the input speech feature parameters converted according to that rule.
Abstract:
Techniques for implementing adaptable voice activation operations for interactive speech recognition devices and instruments. Specifically, such speech recognition devices and instruments include an input sound signal power or volume detector in communication with a central CPU for bringing the CPU out of an initial sleep state upon detection of a perceived voice that exceeds a predetermined threshold volume level and is continuously perceived for at least a certain period of time. If both of these conditions are satisfied, the CPU is transitioned into an active mode so that the perceived voice can be analyzed against a set of registered key words to determine whether a "power on" command or similar instruction has been received. If so, the CPU maintains an active state and normal speech recognition processing ensues until a "power off" command is received. However, if the perceived and analyzed voice cannot be recognized, it is deemed to be background noise and the minimum threshold is selectively updated to accommodate the volume level of the perceived but unrecognized voice. Other aspects include tailoring the volume level of the synthesized voice response according to the perceived volume level as detected by the input sound signal power detector, as well as modifying the audible response volume in accordance with updated volume threshold levels.
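The wake-up decision described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `process_utterance`, the per-frame volume list, the frame-count stand-in for "a certain period of time", and the rule of raising the threshold to the loudest unrecognized frame are all assumptions made for the example.

```python
def process_utterance(volumes, text, threshold, min_frames, keywords=("power on",)):
    """Return (wake, new_threshold) for one perceived sound.

    volumes:    per-frame volume levels (hypothetical units)
    text:       recognizer output for the sound, or None if unrecognized
    threshold:  current minimum volume that counts as voice
    min_frames: frames the volume must stay above the threshold
    """
    # Conditions 1 and 2: the volume exceeds the threshold, continuously,
    # for at least min_frames consecutive frames.
    run = longest = 0
    for v in volumes:
        run = run + 1 if v > threshold else 0
        longest = max(longest, run)
    if longest < min_frames:
        return False, threshold  # CPU stays asleep, nothing is analyzed

    # CPU is now active: compare the voice against the registered key words.
    if text in keywords:
        return True, threshold  # "power on" recognized, stay active

    # Unrecognized voice is deemed background noise. Assumption: the
    # threshold is raised to the loudest frame of that noise.
    return False, max(threshold, max(volumes))
```

For example, a sustained but unrecognized sound peaking at 0.9 would raise a threshold of 0.5 to 0.9, so similar noise no longer wakes the CPU.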
Abstract:
A technique for improving voice recognition in low-cost, speech interactive devices. This technique calls for implementing an affirmative/negative discrimination unit in parallel with a word detection unit to permit comprehension of spoken commands or messages issued by binary questions when no recognizable words are found. Preferably, affirmative/negative discrimination will include either spoken vowel analysis or negative language descriptor detection of the perceived message or command. Other facets include keyword identification within the perceived message or command, confidence match level comparison or correlation table compilation in order to increase recognition accuracy of word-based recognition, volume analysis, and inclusion of ambient environment information in generating responses to perceived messages or queries.
Abstract:
A technique for improving speech recognition in low-cost, speech interactive devices. This technique calls for selectively implementing a speaker-specific word enrollment and detection unit in parallel with a word detection unit to permit comprehension of spoken commands or messages when no recognizable words are found. Preferably, specific speaker detection will be based on the speaker's own personal list of words or expressions. Other facets include complementing non-specific pre-registered word characteristic information with individual, speaker-specific verbal characteristics to improve recognition in cases where the speaker has unusual speech mannerisms or an accent, and response alteration in which speaker-specific registration functions are leveraged to provide access and permit changes to a predefined response table according to user needs and tastes. Also disclosed is the externalization and modularization of non-specific speaker recognition, action and response information to enhance adaptability of the speech recognizer without sacrificing product cost competitiveness or overall device responsiveness.
Abstract:
The invention relates to a method and apparatus for recognition processing of continuous words, that is, a group made up of a plurality of words, such that the recognition result for all of the words making up the continuous words is effectively and accurately confirmed. All of the continuous words that have been input are recognition processed, and the recognition result for all of the continuous words is output. A response from the speaker indicating an affirmative or negative judgment of the recognition result is then input and recognition processed. If the response is determined to be affirmative, the recognition result at that time is confirmed for all of the continuous words. If it is determined to be negative, then for each word from the first to the nth (third in this case) making up the continuous words, the speaker's affirmative or negative response is recognized, affirmative or negative is determined, and the recognition result at that time is confirmed as a recognition processing target word.
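One possible reading of this confirmation flow is sketched below. The function `confirm_words` and the callback `ask` (which stands in for presenting a recognition result and recognizing the speaker's yes/no reply) are hypothetical names, and treating rejected words as targets for re-recognition is an interpretation of the abstract rather than something it states outright.

```python
def confirm_words(words, ask):
    """Confirm a recognized word sequence with the speaker.

    words: recognition results for the continuous words, in order
    ask:   callable taking a prompt string and returning True for an
           affirmative reply and False for a negative one (hypothetical)

    Returns (confirmed_words, retry_positions).
    """
    # Step 1: present the whole result; one affirmative confirms everything.
    if ask(" ".join(words)):
        return list(words), []

    # Step 2: on a negative reply, check each word from the first to the nth.
    confirmed, retry = [], []
    for position, word in enumerate(words):
        if ask(word):
            confirmed.append(word)
        else:
            retry.append(position)  # assumption: these are recognized again
    return confirmed, retry
```

With three words, the speaker answers at most four questions: one for the whole sequence and one per word.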
Abstract:
This invention concerns obtaining high recognition capability when memory capacity and CPU processing ability are severely limited. When several words are selected as registration words from among a plurality of recognizable words, a recognition target speaker speaks the respective registration words, and registration word data for each registration word is created from the sound data and saved in a RAM. When the recognition target speaker speaks a registration word, sound is recognized using the registration word data, and when recognizable words other than the registration words are recognized, sound is recognized using specific speaker group sound model data. Furthermore, speaker learning processing is performed using the registration word data and the specific speaker group sound model data, and when recognizable words other than the registration words are recognized, sound is recognized using post-speaker learning data for speaker adaptation.
Abstract:
A voice model learning data creation method and apparatus make possible the inexpensive creation of a voice model in a short period of time when creating a voice model for a new word not in a preexisting database. Verbal data from several persons is selected from among the verbal data held in the database. This selected verbal data is referred to as standard speaker data and is stored in a standard speaker data storage component. The remaining verbal data in the preexisting database is designated as learning speaker data and is stored in a learning speaker data storage component. A data conversion function from the standard speaker data space to the learning speaker data space is derived. Then, the learning data for the new word is created by the data conversion function. Thus, the data which is obtained from the standard speaker speaking the new word is converted to the learning speaker data space.
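The idea of deriving a conversion function and applying it to a new word can be illustrated with a small sketch. The abstract does not say what form the function takes; the per-dimension affine map fitted by least squares below, and the names `fit_affine`, `derive_conversion`, and `convert`, are assumptions chosen only to make the steps concrete.

```python
from statistics import mean


def fit_affine(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature dimension."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx


def derive_conversion(standard, learner):
    """Derive one (slope, intercept) pair per feature dimension.

    standard, learner: feature vectors for the same words, as spoken by the
    standard speaker and by one learning speaker (hypothetical layout).
    """
    return [fit_affine(s, l) for s, l in zip(zip(*standard), zip(*learner))]


def convert(features, rule):
    """Map a standard-speaker feature vector into the learning speaker space."""
    return [a * x + b for x, (a, b) in zip(features, rule)]
```

Only the standard speakers need to record the new word; `convert` then produces learning data for each learning speaker from that recording.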
Abstract:
This invention restricts actual use of content data, such as playback, and reduces or prevents unauthorized use of content data by a unit that is not registered with an information provider. In response to an operation request to play content data recorded in a user unit, the user unit transmits current time information, the unit ID, content information, and operation information to a service center and, using these pieces of information, creates a one-time password that is valid only for a predetermined period of time. On the basis of the information transmitted from the user unit and current time information obtained from a time keeping unit of the service center, the service center similarly creates a one-time password and an operation permission command and transmits them to the user unit. The user unit compares the two one-time passwords. When the two one-time passwords match each other, the user-requested operation (playback) is executed.
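A time-limited one-time password of this kind can be sketched as follows. The abstract does not name an algorithm or mention a shared secret; the HMAC-SHA256 construction, the `secret` parameter, the 60-second `period`, and the function names are all assumptions for illustration.

```python
import hashlib
import hmac


def one_time_password(secret, unit_id, content, operation, now, period=60):
    """Derive a password valid only within the current time window.

    secret: key assumed to be shared by the user unit and the service center
    now:    current time in seconds, taken from the caller's own clock
    period: validity window in seconds (hypothetical default)
    """
    window = int(now // period)  # same value for every time in one window
    message = f"{unit_id}|{content}|{operation}|{window}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:8]


def operation_permitted(unit_password, center_password):
    """The user unit executes the operation only if both passwords match."""
    return hmac.compare_digest(unit_password, center_password)
```

Because the service center uses its own clock, the two passwords agree only when both sides compute them within the same window, which is what limits the password to a predetermined period.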