Abstract:
A portable acoustic signal (speech signal) preprocessing (SSP) device for accessing an automatic speech/speaker recognition (ASSR) server comprises a microphone for converting sound, including speech, silence and background noise signals, to analog signals; an analog-to-digital converter for converting the analog signals to digital signals; a digital signal processor (DSP) for generating feature vector data representing the digitized speech and silence/background noise, and for generating channel characterization signals; and an acoustic coupler for converting the feature vector data and the characterization signals to acoustic signals and coupling the acoustic signals to a communication channel to access the ASSR server, which performs speech and speaker recognition at a remote location. The SSP device may also be configured to compress and encrypt data transmitted to the ASSR server, using the DSP and encryption keys stored in a memory. The ASSR server receives the preprocessed acoustic signals and performs speech/speaker recognition by setting references and selecting appropriate decoding models and algorithms to decode the acoustic signals. To do so, it models the channel transfer function from the channel characterization signals and processes the silence/background noise data, which reduces the word error rate for speech recognition and supports accurate speaker recognition. A client/server system having the portable SSP device and the ASSR server can be used to remotely activate, reset, or change personal identification numbers (PINs) or user passwords for smartcards, magnetic cards, or electronic money cards.
Abstract:
Methods and apparatus are disclosed for transmitting data, such as biometric data or Internet telephone data, in a packet network. Packets are split and interchanged prior to transmission across a packet network, such that packets that reach their destination may be processed, even in the presence of lost or delayed packets. Packets of biometric data, such as fingerprints, retinal scans or voice characteristics, or sampled voice packets are split, and optionally interchanged, prior to transmission. If some packets are lost or delayed, while some of the packets reach their destination and provide sufficient data for user identification, then the user may be authenticated without requesting the retransmission of the lost or delayed data. Likewise, if some sampled voice packets are lost or delayed while others reach their destination, then the received speech samples may be reproduced without requesting the retransmission of the lost or delayed data.
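The abstract does not say how packets are split. As one possible reading, the sketch below interleaves the samples of a frame across several sub-packets, so that losing one sub-packet removes scattered samples rather than a contiguous block. The function names, the interleaving rule, and the nearest-sample concealment step are all assumptions made for illustration, not the disclosed method.

```python
from typing import List, Optional, Sequence


def split_interleaved(samples: Sequence[float], n_packets: int) -> List[List[float]]:
    """Split one frame into n_packets sub-packets; sample i goes to packet i % n_packets."""
    if n_packets < 1:
        raise ValueError("n_packets must be at least 1")
    return [list(samples[k::n_packets]) for k in range(n_packets)]


def reassemble(packets: Sequence[Optional[Sequence[float]]],
               frame_len: int) -> List[Optional[float]]:
    """Rebuild the frame from the sub-packets that arrived; None marks a lost sub-packet."""
    n = len(packets)
    frame: List[Optional[float]] = [None] * frame_len
    for k, packet in enumerate(packets):
        if packet is None:
            continue  # lost or delayed: leave its sample positions empty
        for j, value in enumerate(packet):
            frame[k + j * n] = value
    return frame


def conceal_gaps(frame: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing sample with the previous received one (or the next, at the start)."""
    out = list(frame)
    last = None
    for i, value in enumerate(out):  # forward pass: copy the previous known sample
        if value is None:
            out[i] = last
        else:
            last = value
    nxt = None
    for i in range(len(out) - 1, -1, -1):  # backward pass: fix gaps at the very start
        if out[i] is None:
            out[i] = nxt
        else:
            nxt = out[i]
    return out


frame = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
sub_packets = split_interleaved(frame, 2)
received = [sub_packets[0], None]  # pretend the second sub-packet was lost
recovered = conceal_gaps(reassemble(received, len(frame)))
```

With two sub-packets, losing one still leaves every other sample of the frame, which is why the receiver can play back or score the data without asking for retransmission.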
Abstract:
A method for collecting data associated with the voice of a voice system user includes conducting a plurality of conversations with a plurality of voice system users. For each conversation, a speech waveform is captured and digitized, and at least one acoustic feature is extracted. The features are correlated with at least one attribute such as gender, age, accent, native language, dialect, socioeconomic classification, educational level and emotional state. Attribute data and at least one identifying indicium are stored for each user in a data warehouse, in a form that facilitates subsequent data mining. The resulting collection of stored data is then mined to provide information for modifying underlying business logic of the voice system. An apparatus suitable for carrying out the method includes a dialog management unit, an audio capture module, an acoustic front end, a processing module and a data warehouse. Appropriate method steps can be implemented by a digital computer running a suitable program stored on a program storage device.
Abstract:
A method and apparatus for authenticating (or identifying) a subject use one or a plurality of biometric measurements for authentication (or identification) without sharing any of the subject's biometric data with a party requesting authentication.
Abstract:
A structural means is provided that positions an operator's voice communication microphone in a vehicle in the vicinity of the visor without interfering with the movement and functions of the visor. The positioning is achieved by attaching one portion of a microphone holder to an escutcheon-type plate that is part of the visor retention and the visor support member, and attaching the microphone to another portion of the microphone holder so that the microphone extends to a position above the visor when the visor is in the stored position.
Abstract:
The present invention is a portable client PDA with a touch screen or other equivalent user interface and having a microphone and local central processing unit (CPU) for processing voice commands and for processing biometric data to provide user verification. The PDA also includes a memory for storing financial and personal information of the user and I/O capability for reading and writing information to various cards such as smartcards, magnetic cards, optical cards or EAROM cards. The PDA includes a Universal Card, which is a common generic smartcard with a unique imprint provided by a service provider, onto which selected financial or personal information stored in the PDA can be downloaded to perform certain consumer transactions. The PDA includes a modem, a serial port and/or a parallel port so as to provide direct communication capability with peripheral devices (such as POS and ATM terminals) and is capable of transmitting or receiving information through wireless communications such as radio frequency (RF) and infrared (IR) communication. The present invention is preferably operated in two modes, i.e., a client/server mode and a local mode. The client/server mode is periodically performed to download a temporary digital certificate (which is necessary to access selected information stored in the PDA and to write such information to the Universal Card) from a central server of the service provider of the PDA and Universal Card. Next, the local mode of operation is performed by providing the PDA with biometric data and selecting one of the pre-enrolled credit cards that are stored in the PDA. Upon biometric verification, the Universal Card is written with the selected card information, which is then used to initiate a consumer transaction. In the absence of an unexpired digital certificate, however, the selected card information will not be written to the Universal Card, even if the user has passed local biometric verification.
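The two conditions for writing to the Universal Card (an unexpired temporary certificate and a successful local biometric check) can be sketched as a simple gate. The class and function names below are hypothetical, and the certificate is reduced to an expiry time; the abstract does not describe the certificate format or how it is checked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class TemporaryCertificate:
    """Hypothetical stand-in for the temporary digital certificate; only expiry is modeled."""
    expires_at: datetime


def may_write_card(cert: Optional[TemporaryCertificate],
                   biometric_ok: bool,
                   now: datetime) -> bool:
    """Allow the card write only if the certificate is present and unexpired AND the
    local biometric verification succeeded."""
    cert_valid = cert is not None and now < cert.expires_at
    return cert_valid and biometric_ok


now = datetime(2024, 1, 1, 12, 0)
fresh = TemporaryCertificate(expires_at=now + timedelta(hours=1))
stale = TemporaryCertificate(expires_at=now - timedelta(hours=1))
```

Note that a passed biometric check alone is not enough: `may_write_card(stale, True, now)` is false, matching the last sentence of the abstract.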
Abstract:
Voice-controlled customized commands, including customization of the command to be performed (such as a number to be dialed to make a connection with an address of a corporate voice dialing system) and of the speech pattern or utterance that a user may enroll to invoke the command, can be used by other users if authorized by the enrolling user. When a current user wants to use a customized command enrolled by another user, a preferably voice-actuated command is invoked to search a database containing a page of customized commands for each user and to return the commands that the current user is authorized to access, in accordance with aliases established by the enrolling user. The returned commands are preferably presented to the current user as a menu from which the current user can make a selection and obtain execution of the authorized command.
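A minimal sketch of the lookup step follows, assuming each user's page is a list of commands and that authorization is recorded as a set of user names on each command. The abstract ties authorization to aliases set up by the enrolling user without describing their form, so the `authorized_users` field, the other names, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class CustomCommand:
    owner: str          # enrolling user
    utterance: str      # enrolled speech pattern, shown here as text
    action: str         # what to execute, e.g. a number to dial
    authorized_users: FrozenSet[str] = frozenset()  # assumed form of the authorization


def commands_available_to(current_user: str, owner: str,
                          pages: Dict[str, List[CustomCommand]]) -> List[CustomCommand]:
    """Return the commands on the owner's page that current_user may invoke."""
    return [
        cmd for cmd in pages.get(owner, [])
        if current_user == owner or current_user in cmd.authorized_users
    ]


pages = {
    "alice": [
        CustomCommand("alice", "call the lab", "dial:5550100", frozenset({"bob"})),
        CustomCommand("alice", "call home", "dial:5550111"),
    ]
}
menu_for_bob = [cmd.utterance for cmd in commands_available_to("bob", "alice", pages)]
```

The resulting list plays the role of the menu presented to the current user.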
Abstract:
Methods and apparatus are provided for processing an information signal containing content presented in accordance with at least one modality. In one aspect of the present invention, such a method comprises the steps of: (i) obtaining the information signal; (ii) performing content detection on the information signal to detect whether the information signal includes particular content presented in accordance with the at least one modality; and (iii) generating a control signal, when the particular content is detected, for use in controlling a rendering property of the particular content and/or implementation of a specific action relating to the particular content. Various illustrative embodiments in the context of speech signal processing for use in voicemail and/or cellular phone applications are provided, as well as illustrative embodiments associated with the processing of multi-modal or multimedia information signals. Also, the present invention provides for storing selectively marked information, even in the absence of content detection, such that the information may be rendered and/or used at a later time. The invention also extends to processing of text-based and markup language-based signals, e.g., XML documents.
Abstract:
A method of validating production of a biometric attribute allegedly associated with a user comprises the following steps. A first signal is generated representing data associated with the biometric attribute allegedly received in association with the user. A second signal is also generated representing data associated with at least one feature detected in association with the production of the biometric attribute allegedly received from the user. Then, the first signal and the second signal are compared to determine a correlation level between the biometric attribute and the production feature, wherein the validation of the production of the biometric attribute depends on the correlation level. Accordingly, the invention serves to provide substantial assurance that the biometric attribute offered by the user has been physically generated by the user.
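The comparison step can be illustrated with two aligned time series, for example a per-frame audio energy track (first signal) and a per-frame measurement of mouth opening (second signal). The abstract does not name a correlation measure or a threshold; the Pearson coefficient, the 0.5 threshold, and the sample values below are assumptions chosen only to make the idea concrete.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series (0.0 if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def validate_production(biometric: Sequence[float],
                        production: Sequence[float],
                        threshold: float = 0.5) -> bool:
    """Accept only if the two signals are correlated at or above the (assumed) threshold."""
    return pearson(biometric, production) >= threshold


audio_energy = [0.1, 0.9, 0.8, 0.1, 0.7]      # illustrative values, not measured data
mouth_opening = [0.0, 1.0, 0.9, 0.1, 0.8]     # moves with the audio
unrelated_motion = [1.0, 0.0, 0.1, 0.9, 0.2]  # moves against the audio
```

A played-back recording with no matching physical movement would yield a low correlation and fail the check, which is the assurance the abstract describes.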
Abstract:
A method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: a user uttering the word; decoding the uttered word; computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the word and the at least a portion of other existing words; if at least one measure is within a threshold range, indicating, to the user, results associated with the at least one measure, the results preferably including the decoded word and the other existing vocabulary word associated with the at least one measure; and the user preferably making a selection depending on the word the user intended to utter.
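As a rough sketch of the measure-and-threshold step, the example below compares phoneme sequences with edit distance and reports vocabulary words within a chosen distance of the decoded word. The abstract does not specify the acoustic similarity measure, so edit distance over phonemes is a stand-in, and the tiny lexicon and its phoneme spellings are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution or match
        prev = cur
    return prev[-1]


def find_homophones(decoded: str,
                    lexicon: Dict[str, Tuple[str, ...]],
                    max_distance: int = 0) -> List[str]:
    """Return other vocabulary words whose pronunciation is within max_distance of decoded."""
    target = lexicon[decoded]
    return sorted(
        word for word, phones in lexicon.items()
        if word != decoded and edit_distance(target, phones) <= max_distance
    )


LEXICON = {
    "to": ("T", "UW"),
    "two": ("T", "UW"),
    "too": ("T", "UW"),
    "ten": ("T", "EH", "N"),
    "tool": ("T", "UW", "L"),
}
exact_matches = find_homophones("two", LEXICON)
near_matches = find_homophones("two", LEXICON, max_distance=1)
```

The returned words, together with the decoded word, are what would be shown to the user so that the intended word can be selected.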