Abstract:
Systems and methods of script identification in audio data obtained from audio data. The audio data is segmented into a plurality of utterances. A script model representative of a script text is obtained. The plurality of utterances are decoded with the script model. A determination is made if the script text occurred in the audio data.
Abstract:
Machine learning techniques for classifying encrypted traffic with a high degree of accuracy. The techniques do not require decrypting any traffic and may not require any manually-labeled traffic samples. An automated system uses an application of interest to perform a large number of user actions of various types. The system further records, in a log, the respective times at which the actions were performed. The system further receives the encrypted traffic exchanged between the system and the application server, and records properties of this traffic in a time series. Subsequently, by correlating between the times in the log and the times at which the traffic was received, the system matches each of the user actions with a corresponding portion of the traffic, which is assumed to have been generated by the user action. The system thus automatically builds a labeled training set, which may be used to train a network-traffic classifier.
Abstract:
Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. A least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Abstract:
Systems, methods, and media for developing ontologies and analyzing communication data are provided herein. In an example implementation, the method includes: identifying terms in in a set of communication data; identifying a list of possible relations of the identified terms; scoring the possible relations according to a set of predefined merits; ranking the possible relations into a list of possible relations in descending order according to their score; and tagging relations in the set of communication data. The relations may be tagged by identifying the possible relations in the communication data in order corresponding with the list of possible relations. The possible relations that have lower rankings that conflict with higher ranking relations are not tagged. The conflicts may be determined by a predefined set of conflict criteria.
Abstract:
A method of evaluating scripts in an interpersonal communication includes monitoring a customer service interaction. At least one portion of a script is identified. At least one script requirement is determined. A determination is made whether the at least one portion of the script meets the at least one script requirement. An alert is generated indicative of a non-compliant script.
Abstract:
A method for converting speech to text in a speech analytics system is provided. The method includes receiving audio data containing speech made up of sounds from an audio source, processing the sounds with a phonetic module resulting in symbols corresponding to the sounds, and processing the symbols with a language module and occurrence table resulting in text. The method also includes determining a probability of correct translation for each word in the text, comparing the probability of correct translation for each word in the text to the occurrence table, and adjusting the occurrence table based on the probability of correct translation for each word in the text.
Abstract:
Methods, systems, and computer readable media for automated transcription model adaptation includes obtaining audio data from a plurality of audio files. The audio data is transcribed to produce at least one audio file transcription which represents a plurality of transcription alternatives for each audio file. Speech analytics are applied to each audio file transcription. A best transcription is selected from the plurality of transcription alternatives for each audio file. Statistics from the selected best transcription are calculated. An adapted model is created from the calculated statistics.
Abstract:
Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. A least one heuristic is automatedly applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction.
Abstract:
Systems and methods of script identification in audio data obtained from audio data. The audio data is segmented into a plurality of utterances. A script model representative of a script text is obtained. The plurality of utterances are decoded with the script model. A determination is made if the script text occurred in the audio data.
Abstract:
A traffic-monitoring system that monitors encrypted traffic exchanged between IP addresses used by devices and a network, and further receives the user-action details that are passed over the network. By correlating between the times at which the encrypted traffic is exchanged and the times at which the user-action details are received, the system associates the user-action details with the IP addresses. In particular, for each action specified in the user-action details, the system identifies one or more IP addresses that may be the source of the action. Based on the IP addresses, the system may identify one or more users who may have performed the action. The system may correlate between the respective action-times of the encrypted actions and the respective approximate action-times of the indicated actions. The system may hypothesize that the indicated action may correspond to one of the encrypted actions having these action-times.