Context-Aware Text Sanitization
    Published Invention Application

    Publication No.: US20240184912A1

    Publication date: 2024-06-06

    Application No.: US18060921

    Filing date: 2022-12-01

    Applicant: PayPal, Inc.

    CPC classification number: G06F21/6245 G06F40/284 G06F40/295

    Abstract: Techniques are disclosed relating to text sanitization. Given textual data, a computer system identifies tokens predicted to constitute sensitive information. Multi-field data structures (e.g., triplets) are generated for the identified tokens that include questions, answers, and corresponding context. These data structures are supplied to a pre-trained multiple-choice question (MCQ) reading comprehension model. The model outputs, for each data structure, a probability that its question and answer are accurate given the context. A post-processing module can then rank these probabilities and select the multi-field data structure with the highest probability (in some cases, a programmable threshold must also be met). The selected multi-field data structure is then used to select category information to be used in sanitizing the textual data. In this manner, a piece of sensitive data may be replaced by a label that helps retain interpretability of the sanitized text.
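
The ranking and replacement step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Triplet` class, `select_category`, `sanitize`, the `[REDACTED]` fallback, and the bracketed label format are all hypothetical names and choices, and `toy_score` merely stands in for the pre-trained MCQ reading comprehension model.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Triplet:
    question: str  # question asked about the sensitive token
    answer: str    # candidate category label
    context: str   # text surrounding the token


def select_category(
    triplets: Sequence[Triplet],
    score: Callable[[Triplet], float],
    threshold: float,
) -> Optional[str]:
    """Return the answer of the highest-probability triplet, or None
    when there are no triplets or the best one misses the threshold."""
    if not triplets:
        return None
    scored = [(score(t), t) for t in triplets]
    best_prob, best = max(scored, key=lambda pair: pair[0])
    return best.answer if best_prob >= threshold else None


def sanitize(
    text: str,
    token: str,
    triplets: Sequence[Triplet],
    score: Callable[[Triplet], float],
    threshold: float = 0.5,
    fallback: str = "[REDACTED]",  # assumption: the abstract names no fallback
) -> str:
    """Replace the sensitive token with a category label."""
    label = select_category(triplets, score, threshold)
    replacement = f"[{label}]" if label is not None else fallback
    return text.replace(token, replacement)


def toy_score(t: Triplet) -> float:
    # Stand-in for the MCQ model: fixed, made-up probabilities.
    return 0.9 if t.answer == "CREDIT_CARD" else 0.1


text = "Send the refund to 4111 1111 1111 1111 today."
token = "4111 1111 1111 1111"
question = f"What kind of information is '{token}'?"
candidates = [
    Triplet(question, "CREDIT_CARD", text),
    Triplet(question, "PHONE_NUMBER", text),
]
result = sanitize(text, token, candidates, toy_score)
```

Replacing the token with a category label rather than a generic mask is what keeps the sanitized sentence readable, which is the benefit the abstract highlights.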

    SELECTION SYSTEM FOR CONTEXTUAL PREDICTION PROCESSING VERSUS CLASSICAL PREDICTION PROCESSING

    Publication No.: US20240169152A1

    Publication date: 2024-05-23

    Application No.: US17993048

    Filing date: 2022-11-23

    CPC classification number: G06F40/284 G06F3/017 G06F40/40

    Abstract: Apparatus, methods and systems for contextual prediction processing are provided. Methods may include receiving a conversation from an entity. The conversation may include a current utterance, previous utterances and details. Methods may include using an action-topic ontology to build, using data retrieved from the current utterance, a conversation frame that corresponds to the current utterance. Methods may include merging the conversation frame with data, retrieved from the previous utterances and the details, to generate a target conversation frame. Methods may include validating the target conversation frame to prevent looping over historic data in the event that the current utterance fails to add relevant information. Methods may include generating an enhanced contextual utterance based on algorithms and the target conversation frame. The enhanced contextual utterance may be used to understand the current utterance in a context of the conversation. Methods may include returning the enhanced contextual utterance to the entity.
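
One way to picture the build, merge, and validate steps in this abstract is the sketch below. Everything in it is an assumption made for illustration: the abstract does not define the frame layout, the ontology contents, or the validation rule, so the two-slot frame, the keyword-matching `ONTOLOGY`, and the helper names are hypothetical.

```python
from typing import Dict, Optional

Frame = Dict[str, Optional[str]]

# Hypothetical action-topic ontology: a flat set of keywords per slot.
ONTOLOGY = {
    "action": {"check", "cancel", "transfer"},
    "topic": {"balance", "card", "payment"},
}


def build_frame(utterance: str) -> Frame:
    """Fill each slot with the first ontology keyword found in the utterance."""
    words = [w.strip(".,?!") for w in utterance.lower().split()]
    return {
        slot: next((w for w in words if w in keywords), None)
        for slot, keywords in ONTOLOGY.items()
    }


def merge_frames(current: Frame, history: Frame) -> Frame:
    """Prefer slots from the current utterance; fall back to history."""
    return {
        slot: current[slot] if current[slot] is not None else history.get(slot)
        for slot in ONTOLOGY
    }


def is_valid(current: Frame, target: Frame) -> bool:
    """Assumed rule: the current utterance must contribute at least one
    slot (otherwise we would just replay history), and the merged frame
    must be complete."""
    adds_info = any(v is not None for v in current.values())
    complete = all(v is not None for v in target.values())
    return adds_info and complete


def enhance(utterance: str, history: Frame) -> Optional[str]:
    """Return an enhanced contextual utterance, or None if validation fails."""
    current = build_frame(utterance)
    target = merge_frames(current, history)
    if not is_valid(current, target):
        return None
    return f"{target['action']} {target['topic']}"


history = build_frame("I want to check my balance")
follow_up = enhance("What about my card?", history)
no_new_info = enhance("Thanks", history)
```

Here the follow-up question inherits the action from the earlier utterance, while an utterance that adds nothing is rejected instead of re-running the historic frame.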

    SYSTEMS AND METHODS FOR BUILDING A DOMAIN-AGNOSTIC ANSWERING SYSTEM USING SUMMARIZATION DRIVEN SCORING

    Publication No.: US20240168983A1

    Publication date: 2024-05-23

    Application No.: US17992318

    Filing date: 2022-11-22

    CPC classification number: G06F16/345 G06F16/31 G06F40/284

    Abstract: A domain-agnostic answering system configured to: (a) receive a question and one or more documents; (b) generate summary representations of the one or more documents, each summary representation including a summary having one or more sentences and a score vector; (d) determine that a first summary representation of the summary representations is a winning candidate for extracting an answer to the question; (e) match the first summary representation to a first document in the one or more documents to obtain reference indexes of sentences in the first summary representation in portions of the first document; (f) determine a start logit vector and an end logit vector from the question and the matched first summary representation; and (g) generate a start span and an end span from the start logit vector, the end logit vector, and the score vector associated with the first summary representation, the start span and the end span representing the answer to the question.
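
Step (g) can be illustrated with a small span-selection sketch. The abstract does not say how the score vector is combined with the logits, so the additive weighting below is purely an assumption, as are the function name `best_span`, the per-token `token_scores` layout, and the `max_len` limit.

```python
from typing import Optional, Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    token_scores: Sequence[float],
    max_len: int = 8,
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) token indexes with the highest combined score.

    Assumed scoring rule: start logit + end logit + the summary score at
    both boundary tokens. Spans are inclusive, with start <= end and at
    most max_len tokens.
    """
    best: Optional[Tuple[int, int]] = None
    best_score = float("-inf")
    for i, start in enumerate(start_logits):
        last = min(i + max_len, len(end_logits))
        for j in range(i, last):
            score = start + end_logits[j] + token_scores[i] + token_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


start_logits = [0.1, 2.0, 0.3, 0.2]
end_logits = [0.0, 0.5, 1.8, 0.4]

# With a flat score vector, the logits alone decide the span.
plain = best_span(start_logits, end_logits, [0.0, 0.0, 0.0, 0.0])

# A high summary score on the last token pulls the span toward it.
weighted = best_span(start_logits, end_logits, [0.0, 0.0, 0.0, 3.0])
```

The second call shows the intended effect of the score vector: tokens from highly scored summary sentences can outrank spans that the logits alone would prefer.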

    INTELLIGENT PREDICTION OF NEXT STEP SENTENCES FROM A COMMUNICATION SESSION

    Publication No.: US20240143936A1

    Publication date: 2024-05-02

    Application No.: US17978074

    Filing date: 2022-10-31

    CPC classification number: G06F40/35 G06F40/284

    Abstract: Methods and systems provide for extracting next step sentences from a communication session. In one embodiment, the system defines a set of annotation guidelines for labeling training data; receives a set of labeled training data including sentences from a transcript of a communication session, a subset of the sentences being associated with a positive label; organizes the labeled training data and trains a model with the labeled training data, the training including, for each of the sentences, inputting the sentence into a language model and a classification head to output a number of class probabilities, and inputting a classification token representing the sentence into a classification head; uses a number of classifiers from the trained model to generate ensemble class scores; and uses the ensemble class scores to predict one or more next step sentences from the sentences in the transcript.
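
The final ensemble step can be sketched as follows. The abstract does not state how the ensemble class scores are formed, so simple averaging of positive-class probabilities is an assumption here, and the function names, the 0.5 threshold, and the probability values are hypothetical.

```python
from typing import List, Sequence


def ensemble_scores(per_classifier_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the positive-class probability of each sentence across
    classifiers. per_classifier_probs[k][i] is classifier k's probability
    that sentence i is a next step sentence."""
    count = len(per_classifier_probs)
    return [sum(column) / count for column in zip(*per_classifier_probs)]


def predict_next_steps(
    sentences: Sequence[str],
    per_classifier_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[str]:
    """Keep sentences whose ensemble score reaches the threshold."""
    scores = ensemble_scores(per_classifier_probs)
    return [s for s, p in zip(sentences, scores) if p >= threshold]


sentences = [
    "Thanks for joining the call.",
    "I will send the contract tomorrow.",
]
# Made-up outputs from three classifiers, one row per classifier.
probs = [
    [0.1, 0.9],
    [0.3, 0.7],
    [0.2, 0.8],
]
next_steps = predict_next_steps(sentences, probs)
```

Averaging smooths out disagreement between individual classifiers; other combination rules, such as majority voting, would fit the same interface.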
