AUTOMATIC CLASSIFICATION OF PHONE CALLS USING REPRESENTATION LEARNING BASED ON THE HIERARCHICAL PITMAN-YOR PROCESS

    Publication No.: EP4134839A1

    Publication date: 2023-02-15

    Application No.: EP22188804.3

    Filing date: 2022-08-04

    Applicant: Invoca, Inc.

    Inventor: MCCOURT, Michael

    Abstract: Embodiments of the disclosed technology include a representation learning model for classification of natural language text. In embodiments, a classification model comprises a feature model and a classifier. The feature model may be hierarchical in nature: data may pass through a series of representations, decreasing in specificity and increasing in generality. Intermediate levels of representation may then be used as automatically learned features to train a statistical classifier. Specifically, the feature model may be based on a hierarchical Pitman-Yor process. In embodiments, once the feature model has been expressed as a Bayesian Belief Network and some aspect of the feature model has been selected for prediction, the feature model may be attached to the classifier. In embodiments, after training, potentially using a mix of labeled and unlabeled data, the classification model can be used to classify documents such as call transcripts based on topics of conversation represented in the transcripts.
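
Illustrative sketch (not part of the abstract): the abstract names the hierarchical Pitman-Yor process but does not describe its mechanics. As background only, the single-level Pitman-Yor process can be simulated with the two-parameter Chinese restaurant process, in which a new customer joins an existing table k with weight (n_k - d) and opens a new table with weight (theta + d * K). The function name, parameter names, and seed handling below are assumptions made for illustration; the hierarchical construction and the attached classifier described in the application are not shown.

```python
import random


def sample_pitman_yor_seating(num_customers, discount, concentration, seed=0):
    """Simulate table sizes under a two-parameter Chinese restaurant process.

    discount (d) must satisfy 0 <= d < 1 and concentration (theta) must
    satisfy theta > -d. Returns a list with the number of customers per table.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must be greater than -discount")

    rng = random.Random(seed)
    table_counts = []
    for _ in range(num_customers):
        if not table_counts:
            # The first customer always opens the first table.
            table_counts.append(1)
            continue
        # Unnormalized weights: existing table k gets (n_k - d),
        # a new table gets (theta + d * K) where K is the table count.
        weights = [count - discount for count in table_counts]
        weights.append(concentration + discount * len(table_counts))
        choice = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if choice == len(table_counts):
            table_counts.append(1)
        else:
            table_counts[choice] += 1
    return table_counts


if __name__ == "__main__":
    sizes = sample_pitman_yor_seating(1000, discount=0.5, concentration=1.0)
    print(len(sizes), "tables; largest:", max(sizes))
```

A larger discount tends to produce many small tables alongside a few large ones, the heavy-tailed behavior that makes this family of priors a common choice for word and topic frequencies.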

    DIALOG ANALYSIS USING VOICE ENERGY LEVEL
    Document type: Invention publication

    Publication No.: EP4266659A1

    Publication date: 2023-10-25

    Application No.: EP23154254.9

    Filing date: 2023-01-31

    Applicant: Invoca, Inc.

    Inventor: KIRCHHOFF, Leland

    IPC classification: H04M3/51 G10L25/78

    Abstract: A computer-implemented method for analyzing whether a phone call is answered by a human agent. The computer-implemented method receives phone call audio data of the phone call, separates the phone call audio data into caller stream data and agent stream data that each include a plurality of frames, and calculates a decibel level for each frame. In response to measuring alternating groups of frames in the caller stream data and agent stream data that exceed a dialog decibel threshold, the computer-implemented method further identifies a dialog in the phone call audio data. In response to measuring decibel levels that exceed the dialog decibel threshold in corresponding frames in both the caller stream data and agent stream data, the computer-implemented method further identifies talkover in the phone call audio data. Furthermore, in response to identifying the dialog, and if a level of talkover in the phone call audio data does not exceed a talkover threshold, the computer-implemented method determines that the call is answered by the human agent.
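
Illustrative sketch (not part of the abstract): the steps described above could be approximated as follows. This is a minimal reading of the abstract under several assumptions that the abstract does not state: frames are lists of samples scaled to [-1, 1], the decibel level is the RMS level relative to full scale, "alternating groups" is interpreted as a minimum number of speaker turns, and the level of talkover is the fraction of frames in which both streams are loud. All names and default thresholds are hypothetical.

```python
import math


def frame_decibels(frame):
    """RMS level of one frame in dB relative to full scale (samples in [-1, 1])."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def answered_by_human(caller_frames, agent_frames,
                      dialog_db=-35.0, min_turns=3, max_talkover=0.3):
    """Return True if the two streams look like a dialog with limited talkover.

    dialog_db, min_turns, and max_talkover are hypothetical thresholds.
    """
    caller_loud = [frame_decibels(f) > dialog_db for f in caller_frames]
    agent_loud = [frame_decibels(f) > dialog_db for f in agent_frames]
    total = min(len(caller_loud), len(agent_loud))
    if total == 0:
        return False

    turns = []      # sequence of speakers, one entry per group of frames
    talkover = 0    # frames in which both streams exceed the threshold
    for c, a in zip(caller_loud, agent_loud):
        if c and a:
            talkover += 1
        elif c or a:
            speaker = "caller" if c else "agent"
            if not turns or turns[-1] != speaker:
                turns.append(speaker)

    has_dialog = len(turns) >= min_turns
    return has_dialog and (talkover / total) <= max_talkover


if __name__ == "__main__":
    loud, quiet = [0.5] * 160, [0.0] * 160
    caller = [loud, quiet, loud, quiet]
    agent = [quiet, loud, quiet, loud]
    print(answered_by_human(caller, agent))
```

Under this reading, a recording in which both streams are loud in every frame is rejected: no alternating groups are found, and the talkover fraction is at its maximum.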

    TOPIC-BASED SEMANTIC SEARCH OF ELECTRONIC DOCUMENTS BASED ON MACHINE LEARNING MODELS FROM BAYESIAN BELIEF NETWORKS

    Publication No.: EP4432121A1

    Publication date: 2024-09-18

    Application No.: EP23170670.6

    Filing date: 2023-04-28

    Applicant: Invoca, Inc.

    IPC classification: G06F16/33 G06F16/35 G06N7/01

    Abstract: A computer-implemented method executed using a computing device comprises digitally generating and storing a machine learning statistical topic model in computer memory, the topic model being programmed to model call transcript data representing words spoken on a call as a function of one or more topics of a set of topics, the set of topics being modeled to comprise a set of pre-seeded topics and a set of non-pre-seeded topics, and the one or more topics being modeled as a function of a probability distribution of topics; programmatically pre-seeding the topic model with a set of keyword groups, each keyword group associating a respective set of keywords with a topic of the set of pre-seeded topics; programmatically training the topic model using unlabeled training data; conjoining a classifier to the topic model to create a classifier model, the classifier defining a joint probability distribution over topic vectors and one or more observed labels; programmatically training the classifier model using labeled training data; receiving target call transcript data comprising an electronic digital representation of a verbal transcription of a target call; programmatically determining, using the classifier model, at least one of one or more topics of the target call or one or more classifications of the target call; digitally storing the target call transcript data with additional data indicating the determined one or more topics of the target call and/or the determined one or more classifications of the target call; accessing, in computer storage, a first digitally stored electronic document comprising a first text; receiving computer input specifying a search query comprising one or more search terms; processing the search query using the classifier model to output a query topic vector representing a thematic content of the search query; processing the first text using the classifier model to output and store in the computer memory a first plurality of topic vectors each representing a topic in the text; using the query topic vector and the first plurality of topic vectors, calculating a plurality of similarity values, each of the similarity values representing a similarity of the query topic vector to a particular topic vector among the first plurality of topic vectors; and outputting a visual display that specifies one or more topic vectors among the first plurality of topic vectors having one or more corresponding similarity values that are greater than a specified threshold similarity value.
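
Illustrative sketch (not part of the abstract): the final search steps, comparing a query topic vector against a document's topic vectors and keeping those above a threshold, might look like the following. The abstract does not name a similarity measure; cosine similarity is assumed here purely for illustration, and the topic vectors are taken as given rather than produced by a trained classifier model. Function names and the example vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_topic_vectors(query_vector, document_vectors, threshold):
    """Return (index, similarity) pairs whose similarity exceeds the threshold.

    Results are sorted from most to least similar.
    """
    results = []
    for index, vector in enumerate(document_vectors):
        similarity = cosine_similarity(query_vector, vector)
        if similarity > threshold:
            results.append((index, similarity))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    # Hypothetical three-topic vectors for a query and three passages of a document.
    query = [0.9, 0.1, 0.0]
    passages = [[0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.5, 0.5, 0.0]]
    for index, similarity in matching_topic_vectors(query, passages, threshold=0.8):
        print(index, round(similarity, 3))
```

The strict comparison mirrors the abstract's wording ("greater than a specified threshold similarity value"); a display layer would then render the surviving entries.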

    PITMAN-YOR PROCESS TOPIC MODELING PRE-SEEDED BY KEYWORD GROUPINGS

    Publication No.: EP4198809A1

    Publication date: 2023-06-21

    Application No.: EP22213527.9

    Filing date: 2022-12-14

    Applicant: Invoca, Inc.

    Abstract: In one embodiment, the disclosed technology involves: digitally generating and storing a machine learning statistical topic model in computer memory, the topic model being programmed to model call transcript data representing words spoken on a call as a function of one or more topics of a set of topics that includes pre-seeded topics and non-pre-seeded topics; programmatically pre-seeding the topic model with a set of keyword groups; programmatically training the topic model using unlabeled training data; conjoining a classifier to the topic model to create a classifier model; programmatically training the classifier model using labeled training data; receiving target call transcript data; programmatically determining at least one of one or more topics of the target call or one or more classifications of the target call; and digitally storing the target call transcript data with additional data indicating the determined topics and/or classifications of the target call.