Abstract:
A technique for improving speech recognition in low-cost, speech interactive devices. This technique calls for selectively implementing a speaker-specific word enrollment and detection unit in parallel with a word detection unit to permit comprehension of spoken commands or messages when no recognizable words are found. Preferably, specific speaker detection will be based on the speaker's own personal list of words or expressions. Other facets include complementing non-specific pre-registered word characteristic information with individual, speaker-specific verbal characteristics to improve recognition in cases where the speaker has unusual speech mannerisms or an accent, and response alteration, in which speaker-specific registration functions are leveraged to provide access to, and permit changes to, a predefined response table according to user needs and tastes. Also disclosed is the externalization and modularization of non-specific speaker recognition, action and response information to enhance adaptability of the speech recognizer without sacrificing product cost competitiveness or overall device responsiveness.
Abstract:
A bifurcated speaker-specific and non-speaker-specific method and apparatus are provided for enabling speech-based remote control and for recognizing the speech of an unspecified speaker at extremely high recognition rates regardless of the speaker's age, sex, or individual speech mannerisms. A device main unit is provided with a speech recognition processor for recognizing speech and taking an appropriate action, and with a user terminal containing specific speaker capture and/or preprocessing capabilities. The user terminal exchanges data with the speech recognition processor using radio transmission. The user terminal may be provided with a conversion rule generator that compares the speech of a user with previously compiled standard speech feature data and, based on the result of this comparison, generates a conversion rule for converting the speaker's speech feature parameters to the corresponding standard speaker's feature information. The speech recognition processor, in turn, may reference the conversion rule developed in the user terminal and perform speech recognition based on the input speech feature parameters converted according to that rule.
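The conversion rule idea can be illustrated with a minimal sketch. The abstract does not say what form the rule takes, so the per-dimension mean offset used here is an assumption chosen for simplicity, and the names `make_conversion_rule` and `apply_rule` are hypothetical.

```python
# Minimal sketch, NOT the patented method: the rule is assumed to be a
# per-dimension offset between the user's and the standard speaker's
# average feature vectors.

def make_conversion_rule(user_samples, standard_samples):
    """Compare user features with standard features and return an offset rule."""
    dims = len(standard_samples[0])

    def mean(samples, d):
        return sum(s[d] for s in samples) / len(samples)

    # Offset that moves the user's average onto the standard average.
    return [mean(standard_samples, d) - mean(user_samples, d) for d in range(dims)]


def apply_rule(rule, features):
    """Convert one input feature vector toward the standard speaker's space."""
    return [f + offset for f, offset in zip(features, rule)]


# Hypothetical two-dimensional feature vectors for the same enrolled words.
user = [[1.0, 2.0], [3.0, 4.0]]
standard = [[2.0, 2.0], [4.0, 6.0]]
rule = make_conversion_rule(user, standard)
converted = apply_rule(rule, [5.0, 5.0])
```

In this sketch the rule would be generated in the user terminal and only referenced by the recognizer, which then matches `converted` against the standard speech feature data.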
Abstract:
Techniques for implementing adaptable voice activation operations for interactive speech recognition devices and instruments. Specifically, such speech recognition devices and instruments include an input sound signal power or volume detector in communication with a central CPU for bringing the CPU out of an initial sleep state upon detection of a perceived voice that exceeds a predetermined threshold volume level and is continuously perceived for at least a certain period of time. If both these conditions are satisfied, the CPU is transitioned into an active mode so that the perceived voice can be analyzed against a set of registered key words to determine if a "power on" command or similar instruction has been received. If so, the CPU maintains an active state and normal speech recognition processing ensues until a "power off" command is received. However, if the perceived and analyzed voice cannot be recognized, it is deemed to be background noise and the minimum threshold is selectively updated to accommodate the volume level of the perceived but unrecognized voice. Other aspects include tailoring the volume level of the synthesized voice response according to the perceived volume level as detected by the input sound signal power detector, as well as modifying audible response volume in accordance with updated volume threshold levels.
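The wake-up logic described above can be sketched as a small state machine. This is an illustrative sketch only: the class name `WakeDetector`, the frame-based duration check, and the rule of raising the threshold to the peak level of the unrecognized sound are all assumptions, since the abstract states only that the threshold is "selectively updated".

```python
from dataclasses import dataclass


@dataclass
class WakeDetector:
    """Sleep-state gate: wake only on sound that is loud enough for long enough."""
    threshold: float                  # minimum volume level (hypothetical units)
    min_frames: int                   # consecutive loud frames that count as sustained
    keywords: frozenset = frozenset({"power on"})
    active: bool = False

    def sustained(self, volumes):
        """Return the peak of the first min_frames consecutive loud frames, else None."""
        run = []
        for v in volumes:
            run = run + [v] if v > self.threshold else []
            if len(run) >= self.min_frames:
                return max(run)
        return None

    def process(self, volumes, recognized_text):
        """volumes: per-frame levels; recognized_text: recognizer output or None."""
        if self.active:
            if recognized_text == "power off":
                self.active = False       # back to the sleep state
            return self.active
        level = self.sustained(volumes)
        if level is None:
            return False                  # volume or duration condition not met
        if recognized_text in self.keywords:
            self.active = True            # registered keyword: enter active mode
        else:
            self.threshold = level        # assumed update: treat as background noise
        return self.active


detector = WakeDetector(threshold=10.0, min_frames=3)
detector.process([12, 15, 11, 3], None)   # sustained but unrecognized sound
```

After the last call the detector remains asleep and its threshold has been raised to 15, so the same background noise no longer triggers analysis.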
Abstract:
A technique for improving speech recognition in low-cost, speech interactive devices. This technique calls for implementing a speaker-specific word enrollment and detection unit in parallel with a word detection unit to permit comprehension of spoken commands or messages issued by binary questions when no recognizable words are found. Preferably, specific speaker detection will be based on the speaker's own personal list of words or expressions. Other facets include complementing non-specific pre-registered word characteristic information with individual, speaker-specific verbal characteristics to improve recognition in cases where the speaker has unusual speech mannerisms or an accent, and response alteration, in which speaker-specific registration functions are leveraged to provide access to, and permit changes to, a predefined response table according to user needs and tastes.
Abstract:
A technique for improving voice recognition in low-cost, speech interactive devices. This technique calls for implementing an affirmative/negative discrimination unit in parallel with a word detection unit to permit comprehension of spoken commands or messages issued by binary questions when no recognizable words are found. Preferably, affirmative/negative discrimination will include either spoken vowel analysis or negative language descriptor detection of the perceived message or command. Other facets include keyword identification within the perceived message or command, confidence match level comparison or correlation table compilation in order to increase recognition accuracy of word-based recognition, volume analysis, and inclusion of ambient environment information in generating responses to perceived messages or queries.
Abstract:
A copying machine of the present invention has a scanner unit (2) for reading an original, a line type thermal head (14) for printing read image data on a tape (16) to be used as a recording medium, a tape cartridge (51) for supplying the tape, and a tape conveying mechanism. The tape (16) has an image carrying sheet (161), the front surface of which is a printing face to be printed with an image, an adhesive layer (162) formed by applying an adhesive to the back surface of this image carrying sheet, and a release sheet (163) releasably stuck to the surface of the adhesive layer. The tape is wound into a roll and is detachably fitted into the body of the copying machine in the form of the tape cartridge (51). At both ends of the tape, conveyance border portions of a uniform width are formed. In the border portions, engaging holes ( ) are formed at regular intervals in the direction of the length of the tape. The tape conveying mechanism has sprockets (183, 184), on the outer circumferences of which projections capable of engaging with the engaging holes are formed. The tape can thus be conveyed with high precision and a desired image can be printed on it. The copying machine of the present invention can therefore easily produce a tape that carries a copy of a desired image and can be stuck to a desired place.
Abstract:
A copying machine is provided which generally includes a scanner unit (2) for reading an original image, a line type thermal head (14) for printing the read image onto a tape (16) which is used as a recording medium, a tape cartridge (51) for supplying the tape, and a tape conveying mechanism. The tape (16) has an image carrying sheet (161) with a front surface forming a printing face, an adhesive layer (162) formed by applying an adhesive to the back surface of the image carrying sheet, and a release sheet (163) releasably stuck to the surface of the adhesive layer. The tape may be rolled into a tape roll and placed in a tape cartridge (51) which is detachably fitted into the body of the copying machine. Both ends of the tape have conveyance border portions of a uniform width. Engaging holes are formed in the border portions, at regular intervals in the direction of the tape length. The tape conveying mechanism has sprockets (183, 184) with projections on their outer circumferences that are capable of engaging the engaging holes. This feature allows the tape to be conveyed with high precision.
Abstract:
The invention improves recognition rates by providing an interactive speech recognition device that performs recognition by taking situational and environmental changes into consideration, thus enabling interactions that correspond to situational and environmental changes. The invention comprises a speech analysis unit that creates a speech data pattern corresponding to the input speech; a timing circuit for generating time data, for example, as variable data; a coefficient setting unit that receives the time data from the timing circuit and generates weighting coefficients that change over time, in correspondence to the content of each recognition target speech; a speech recognition unit that receives the speech data pattern of the input speech from the speech analysis unit, that at the same time obtains from the coefficient setting unit the weighting coefficient in effect at that time for each pre-registered recognition target speech, that computes final recognition data by multiplying the recognition data corresponding to each recognition target speech by its corresponding weighting coefficient, and that recognizes the input speech based on the computed final recognition data; a speech synthesis unit for outputting speech synthesis data based on the recognition data that takes the weighting coefficient into consideration; and a drive control unit for transmitting the output from the speech synthesis unit to the outside.
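The weighting step can be illustrated with a short sketch. The phrases, the time windows in `LIKELY_HOURS`, and the two coefficient values are hypothetical; the abstract specifies only that each recognition score is multiplied by a coefficient that changes over time.

```python
from datetime import time

# Hypothetical time-of-day windows (start hour inclusive, end hour exclusive)
# in which each recognition target phrase is assumed to be most likely.
LIKELY_HOURS = {
    "good morning": (5, 11),
    "good night": (20, 24),
}


def weighting_coefficient(phrase, now, high=1.0, low=0.5):
    """Coefficient setting unit: weight depends on the phrase and the current time."""
    start, end = LIKELY_HOURS[phrase]
    return high if start <= now.hour < end else low


def recognize(raw_scores, now):
    """Multiply each raw recognition score by its coefficient and pick the best."""
    final = {p: s * weighting_coefficient(p, now) for p, s in raw_scores.items()}
    best = max(final, key=final.get)
    return best, final


# Raw scores slightly favor "good night", but at 7:00 the weighting reverses that.
raw = {"good morning": 0.70, "good night": 0.75}
best_at_7, _ = recognize(raw, time(7, 0))
best_at_22, _ = recognize(raw, time(22, 0))
```

With these assumed values, the same raw scores yield "good morning" at 7:00 and "good night" at 22:00, which is the kind of situational correction the abstract describes.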
Abstract:
Because both an output setting value and an output subject identifier, which are contained in a setting program, are stored as one record in a drawing table, an output condition designated when a printing operation is carried out, such as the layout of image data, can be set in advance of the printing operation. Because photograph numbers of image data stored in a memory card are defined in relation to output subject identifiers, the image data to be printed can be designated.
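A minimal sketch of such a drawing table follows. The field names, the identifier strings, and the shape of the setting program (a list of identifier and setting pairs) are assumptions made for illustration and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawingRecord:
    """One drawing-table record: an output subject identifier plus its setting."""
    output_subject_id: str
    output_setting: dict      # e.g. layout values (hypothetical keys)


def build_drawing_table(setting_program):
    """Store each (identifier, setting) pair from the setting program as one record."""
    return [DrawingRecord(sid, dict(setting)) for sid, setting in setting_program]


def resolve_print_jobs(drawing_table, photo_numbers):
    """Pair each record with the photograph number defined for its identifier."""
    jobs = []
    for record in drawing_table:
        photo = photo_numbers.get(record.output_subject_id)
        if photo is not None:             # skip identifiers with no photograph
            jobs.append((photo, record.output_setting))
    return jobs


# Layout is set before printing; photographs are bound to identifiers separately.
program = [("A", {"x": 0, "y": 0, "scale": 0.5}), ("B", {"x": 100, "y": 0, "scale": 0.5})]
table = build_drawing_table(program)
jobs = resolve_print_jobs(table, {"A": 12, "B": 7})
```

Keeping the layout in the table and the photograph numbers in a separate mapping lets the same preset layout be reused with different images from the memory card.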