Abstract:
There is provided an association apparatus for associating a plurality of voice data converted from voices produced by speakers, comprising: a word/phrase similarity deriving section which derives, as a word/phrase similarity, an appearance ratio of words/phrases common among the voice data, based on a result of speech recognition processing on the voice data; a speaker similarity deriving section which derives, as a speaker similarity, a result of comparing characteristics of voices extracted from the voice data; an association degree deriving section which derives, as an association degree, a likelihood that the plurality of voice data are associated with one another, based on the derived word/phrase similarity and speaker similarity; and an association section which associates with one another those of the plurality of voice data whose derived association degree is equal to or more than a preset threshold.
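The association logic described in this abstract can be sketched as follows. The set-based word/phrase representation, the equal weighting of the two similarities, and the threshold value of 0.6 are illustrative assumptions, not details taken from the patent.

```python
def word_phrase_similarity(transcript_a: set[str], transcript_b: set[str]) -> float:
    """Appearance ratio of words/phrases common to both speech-recognition results."""
    if not transcript_a or not transcript_b:
        return 0.0
    common = transcript_a & transcript_b
    return len(common) / min(len(transcript_a), len(transcript_b))


def association_degree(word_sim: float, speaker_sim: float,
                       w_word: float = 0.5, w_speaker: float = 0.5) -> float:
    """Combine word/phrase similarity and speaker similarity into one degree.

    The weighted sum is an assumed combination rule for illustration.
    """
    return w_word * word_sim + w_speaker * speaker_sim


def associate(word_sim: float, speaker_sim: float, threshold: float = 0.6) -> bool:
    """Associate two voice data items when the degree meets the preset threshold."""
    return association_degree(word_sim, speaker_sim) >= threshold
```

For example, two recordings whose transcripts share half their words and whose voice characteristics match strongly would be associated, while two unrelated recordings would not.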
Abstract:
System, method and program for controlling a mute function on a telephone device. While the mute function is active, sound reaching a telephone or other communication device is sensed, and a determination is made if the sound includes a word. If so, an alarm is activated to alert a user that the mute function is active. If not, the alarm is not activated. In accordance with an optional feature of the present invention, speech recognition software is trained to recognize the voice or speech pattern of a specific user, and the alarm is activated only if the word was spoken by the specific user.
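The alarm decision in this abstract reduces to a small predicate. A minimal sketch follows; the function name and the default of treating any speaker as the registered user (when speaker recognition is not trained) are assumptions for illustration.

```python
def should_alert(mute_active: bool, contains_word: bool,
                 spoken_by_registered_user: bool = True) -> bool:
    """Activate the mute alarm only while the mute function is active,
    the sensed sound includes a word, and (when the optional speaker
    recognition feature is used) the word was spoken by the specific user."""
    return mute_active and contains_word and spoken_by_registered_user
```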
Abstract:
A method and apparatus for generating a voice tag (140) includes a means (110) for combining (205) a plurality of utterances (106, 107, 108) into a combined utterance (111) and a means (120) for extraction (210) of the voice tag as a sequence of phonemes having a high likelihood of representing the combined utterance, using a set of stored phonemes (115) and the combined utterance.
Abstract:
A system and method provides universal access to voice-based documents containing information formatted using MIME and HTML standards using customized extensions for voice information access and navigation. These voice documents are linked using HTML hyper-links that are accessible to subscribers using voice commands, touch-tone inputs and other selection means. These voice documents and components in them are addressable using HTML anchors embedding HTML universal resource locators (URLs) rendering them universally accessible over the Internet. This collection of connected documents forms a voice web. The voice web includes subscriber-specific documents including speech training files for speaker dependent speech recognition, voice print files for authenticating the identity of a user and personal preference and attribute files for customizing other aspects of the system in accordance with a specific subscriber.
Abstract:
A telephone network provides personalized services based on voice identification of a person making or receiving a call. For example, when a person initiates a call, the network executes a speaker identification/verification procedure to identify the person as a subscriber. The identification/verification process provides a virtual office equipment number corresponding to the identified subscriber. The central office switch servicing the outgoing call receives this virtual number and uses it to retrieve a service profile associated with the subscriber. The switch provides a personalized telephone service to the subscriber during processing of the call, using the retrieved service profile. The personalized service may include a number of unique features, such as recording call related data together with data about the identification of the person. For home incarceration or the like, the service may entail monitoring speech communications to detect certain designated words and automatically terminate the call.
Abstract:
An advanced telecommunications system is provided for the recognizing of spoken commands over a cellular telephone, satellite telephone, or personal communications network. In the cellular application, for example, a Speech Recognition System interconnects either internally with or as an external peripheral to a cellular telecommunications switch. The Speech Recognition System includes an administrative subsystem, a call processing subsystem, a speaker-dependent recognition subsystem, a speaker-independent recognition subsystem, and a data storage subsystem. The Speech Recognition System also allows for increased efficiency in the cellular telephone network by integrating with the switch or switches as a shared resource. The administrative subsystem of the Speech Recognition System is used to keep statistical logs of pertinent call information. Pre-recorded instructional messages are stored in the memory of the call processing subsystem for instructing a user on his or her progress in using the system. The speaker-independent recognition subsystem allows the user to interact with the system employing non-user specific functions. User specific functions are controlled with the speaker-dependent recognition subsystem. User specific attributes collected by the recognition subsystems are stored in the data storage subsystem.
Abstract:
A computer-implemented method is provided for quantitative performance evaluation of a call agent. The method comprises converting an audio recording of a call between the call agent and a customer to a text-based transcript and identifying at least one topic for categorizing the transcript. The method also includes retrieving a set of criteria associated with the topic. Each criterion correlates to a set of predefined questions for interrogating the transcript to evaluate the performance of the call agent with respect to the corresponding criterion. Each question captures a sub-criterion under the corresponding criterion. The method further includes inputting the predefined questions and the transcript into a trained large language model to obtain scores for respective ones of the predefined questions. Each score measures a degree of satisfaction of the performance of the call agent during the call with respect to the sub-criterion captured by the corresponding predefined question.
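The criteria-to-questions structure and per-criterion score aggregation described above can be sketched as follows. The criteria, the questions, and `score_question` (a trivial keyword heuristic standing in for the trained large language model) are hypothetical placeholders, not the patented model.

```python
# Assumed example criteria, each correlating to predefined questions
# that capture sub-criteria.
CRITERIA = {
    "courtesy": ["Did the agent greet the customer?",
                 "Did the agent thank the customer?"],
    "resolution": ["Did the agent resolve the issue?"],
}


def score_question(transcript: str, question: str) -> float:
    """Stand-in for the trained LLM: score one predefined question
    against the transcript (here, a naive last-keyword check)."""
    keyword = question.split()[-1].strip("?").lower()
    return 1.0 if keyword in transcript.lower() else 0.0


def evaluate(transcript: str) -> dict[str, float]:
    """Average the question scores under each criterion to measure
    the agent's degree of satisfaction of that criterion."""
    return {
        criterion: sum(score_question(transcript, q) for q in questions) / len(questions)
        for criterion, questions in CRITERIA.items()
    }
```

In a real system the heuristic would be replaced by prompting the trained model with each question plus the transcript and parsing the returned score.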
Abstract:
Disclosed are methods and systems for voice phishing monitoring. For instance, a method includes receiving voice data of an incoming call to a communication device from an application associated with a user account and executing on the device, identifying an entity and interaction allegedly associated with the incoming call from the voice data, determining first fraud indicator data based on a number of the incoming call and second fraud indicator data based on a correspondence of user account interaction data to the entity and/or interaction, and providing the voice data to a trained machine learning system to receive third fraud indicator data based on content and/or a voice characteristic identified from the voice data. The method may further include determining a status for the incoming call of fraudulent or confirmed based on the first, second, and third fraud indicator data, and generating a notification indicating the status for display.
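The final status determination from the three fraud indicators can be sketched as below. The majority-vote combination and the 0.5 score threshold are assumptions for illustration; the abstract does not specify how the indicators are combined.

```python
def call_status(number_flagged: bool, account_mismatch: bool,
                ml_fraud_score: float, threshold: float = 0.5) -> str:
    """Combine the first (phone-number), second (account-correspondence),
    and third (ML content/voice) fraud indicators into a call status."""
    indicators = [number_flagged, account_mismatch, ml_fraud_score >= threshold]
    # Assumed rule: two or more positive indicators mark the call fraudulent.
    return "fraudulent" if sum(indicators) >= 2 else "confirmed"
```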
Abstract:
The present disclosure relates generally to systems, methods, instructions, and other aspects describing automated transcription and associated script generation. In one aspect, a method includes facilitating a voice bot segment of a two-way communication session, where the voice bot segment is between a customer device and a non-human bot agent, and facilitating transfer of the session to a human agent device as part of a human voice segment of the two-way communication session, wherein the transfer occurs following a failure of the non-human bot agent to resolve a customer issue. The method further includes accessing survey data describing the two-way communication session, wherein the survey data is associated with successful resolution of the customer issue, and automatically processing transcript data from the two-way communication session with the survey data to identify language data from the transcript associated with resolution of the customer issue. The non-human bot agent is then dynamically updated using the language data.
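The feedback loop in this abstract can be sketched as follows. The session/survey data layout and the idea of updating the bot by adding successful agent phrases to a phrase set are illustrative assumptions standing in for "dynamically updating" the non-human bot agent.

```python
def update_bot(sessions: list[dict], surveys: dict[str, bool],
               bot_phrases: set[str]) -> set[str]:
    """Harvest human-agent language from sessions whose survey data
    confirms successful resolution, and fold it into the bot's phrases.

    sessions: [{"id": ..., "agent_lines": [...]}] transcript data (assumed shape)
    surveys:  session id -> whether the customer issue was resolved
    """
    for session in sessions:
        if surveys.get(session["id"], False):  # resolution confirmed by survey
            bot_phrases |= set(session["agent_lines"])
    return bot_phrases
```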