MULTI-LANGUAGE DOCUMENT FIELD EXTRACTION
    Type: Published patent application

    Publication No.: US20240273290A1

    Publication date: 2024-08-15

    Application No.: US18168450

    Filing date: 2023-02-13

    Applicant: SAP SE

    Abstract: A method for multi-language document field extraction may include determining, based on a received document including a plurality of key fields and a plurality of value fields, a plurality of key-value pairs. The method also includes determining whether an encoding of a key field is within a threshold distance from a predetermined encoding of a predefined key field associated with a predefined field type. The method further includes assigning, based on determining the encoding of the key field is within the threshold distance, the predefined field type to the corresponding key-value pair. The method also includes performing a document processing operation based on each key-value pair and the predefined field type assigned to each key-value pair. Related systems and methods are provided.
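The threshold test described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name a distance metric (cosine distance is used here), a threshold value, or an encoder, and the three-dimensional vectors are hand-written stand-ins for the output of a multilingual encoder.

```python
import math
from typing import Optional

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def assign_field_type(key_encoding: list[float],
                      predefined: dict[str, list[float]],
                      threshold: float = 0.2) -> Optional[str]:
    """Return the closest predefined field type, or None if no predefined
    encoding lies within the threshold distance (hypothetical threshold)."""
    best_type, best_dist = None, float("inf")
    for field_type, reference in predefined.items():
        dist = cosine_distance(key_encoding, reference)
        if dist < best_dist:
            best_type, best_dist = field_type, dist
    return best_type if best_dist <= threshold else None

# Hypothetical predetermined encodings of predefined key fields.
PREDEFINED = {
    "invoice_number": [1.0, 0.0, 0.0],
    "total_amount": [0.0, 1.0, 0.0],
}

# Hand-written toy encodings standing in for a multilingual encoder's output.
TOY_ENCODINGS = {
    "Invoice No.": [0.95, 0.05, 0.0],
    "Rechnungsnummer": [0.9, 0.1, 0.05],
    "Gesamtbetrag": [0.1, 0.9, 0.0],
    "Lieferdatum": [0.4, 0.4, 0.8],
}

for key, encoding in TOY_ENCODINGS.items():
    print(key, "->", assign_field_type(encoding, PREDEFINED))
```

In this sketch a key whose encoding is far from every predefined encoding (here "Lieferdatum") receives no field type, which mirrors the conditional assignment in the abstract.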

    Non-lexicalized features for language identity classification using subword tokenization

    Publication No.: US12061872B2

    Publication date: 2024-08-13

    Application No.: US17237961

    Filing date: 2021-04-22

    Inventor: Philip Ogren

    Abstract: A natural language identity classifier system is described, which employs a supervised machine learning (ML) model to perform language identity classification on input text. The ML model takes, as input, non-lexicalized features of target text derived from subword tokenization of the text. Specifically, these non-lexicalized features are generated based on statistics determined for tokens identified for the input text. According to an embodiment, at least some of the non-lexicalized features are based on natural language-specific summary statistics that indicate how often tokens were found within a corpus for each natural language. Use of such summary statistics allows for generation of natural language specific conditional probability-based features. Because of the inherent interpretability of a trained non-lexicalized ML model as described herein, it is possible to modify behavior of the trained ML model by adjusting summary statistics maintained for natural language tokens and/or by adjusting data for the subword tokenizers.
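As a rough illustration of what non-lexicalized features might look like, the sketch below turns per-language token counts into numeric features that never expose token identities to the classifier. It rests on several assumptions not found in the abstract: character bigrams stand in for a real subword tokenizer, the two features (seen fraction and smoothed mean log-probability) are hypothetical choices, and the tiny corpus is invented.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Stand-in for a subword tokenizer: lowercase character bigrams."""
    t = text.lower()
    return [t[i:i + 2] for i in range(len(t) - 1)]

def build_stats(corpus: dict[str, list[str]]) -> dict[str, Counter]:
    """Per-language summary statistics: how often each token was seen."""
    return {
        lang: Counter(tok for sentence in sentences for tok in tokenize(sentence))
        for lang, sentences in corpus.items()
    }

def features(text: str, stats: dict[str, Counter]) -> dict[str, float]:
    """Numeric features per language; no token identity is passed on."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    feats = {}
    for lang, counts in stats.items():
        total = sum(counts.values())
        vocab = len(counts)
        seen = sum(1 for tok in tokens if tok in counts)
        # Add-one smoothed estimate of P(token | language).
        log_p = sum(math.log((counts[tok] + 1) / (total + vocab + 1))
                    for tok in tokens)
        feats[f"{lang}_seen_frac"] = seen / n
        feats[f"{lang}_mean_logp"] = log_p / n
    return feats

# Invented toy corpus, for illustration only.
corpus = {
    "en": ["the cat sat on the mat", "this is the house"],
    "de": ["die katze sitzt auf der matte", "das ist das haus"],
}
stats = build_stats(corpus)
f = features("the house", stats)
print(f)
```

A supervised model would then be trained on vectors like `f`. Because the features depend only on the stored counts, editing an entry in `stats` changes the features directly, which is the kind of adjustment the abstract attributes to the interpretability of the approach.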

    Systems and methods for inclusive conversational artificial intelligence

    Publication No.: US12047336B1

    Publication date: 2024-07-23

    Application No.: US18302175

    Filing date: 2023-04-18

    Applicant: Optum, Inc.

    IPC classification: H04L51/02; G06F40/263

    CPC classification: H04L51/02; G06F40/263

    Abstract: Systems and methods for dynamically customizing a virtual assistant are disclosed. The systems and methods can receive information associated with a conversation involving the virtual assistant; determine whether a channel switching condition for switching the conversation from a first channel to a second channel is satisfied, based on the information associated with the conversation; determine whether a language switching condition for switching the conversation from a first language to a second language is satisfied, based on the information associated with the conversation; determine whether a configuration switching condition for switching a first configuration of the virtual assistant to a second configuration of the virtual assistant is satisfied, based on the information associated with the conversation; and perform an action based on at least one of the determinations.
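The three independent determinations in this abstract could be organized along the following lines. This is only a sketch: the abstract does not say what the switching conditions are, so every field name, rule, and action string below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ConversationInfo:
    """Hypothetical information associated with a conversation."""
    current_channel: str
    current_language: str
    detected_language: str
    failed_turns: int
    accessibility_requested: bool

def decide_actions(info: ConversationInfo) -> list[str]:
    """Evaluate each switching condition separately and collect actions."""
    actions = []
    # Channel switching condition (assumed rule: repeated failures in chat).
    if info.current_channel == "chat" and info.failed_turns >= 3:
        actions.append("switch_channel:voice")
    # Language switching condition (assumed rule: detected language differs).
    if info.detected_language != info.current_language:
        actions.append(f"switch_language:{info.detected_language}")
    # Configuration switching condition (assumed rule: user request).
    if info.accessibility_requested:
        actions.append("switch_configuration:accessible")
    return actions

info = ConversationInfo(
    current_channel="chat",
    current_language="en",
    detected_language="es",
    failed_turns=3,
    accessibility_requested=False,
)
print(decide_actions(info))
```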

    Context-independent conversational flow

    Publication No.: US12020211B2

    Publication date: 2024-06-25

    Application No.: US17302876

    Filing date: 2021-05-14

    Applicant: ADP, INC.

    Abstract: A method, apparatus, system, and computer program code for performing a human resource operation using a context-independent conversational flow. The computer system receives an intended human resource operation from an application executing on a user device, identifies the context-independent conversational flow for performing the intended human resource operation, and calls a structured data object according to the context-independent conversational flow. The computer system interprets the structured data object to produce a business rule output, and generates a context-independent response from the business rule output. The computer system transforms the context-independent response according to a user context to produce a context-specific response, and forwards the context-specific response to a user device for display within a conversational user interface of an application.

    METHODS, SYSTEMS, AND MEDIA FOR DETERMINING PLAYLIST TITLE COHERENCE AND QUALITY

    Publication No.: US20240176816A1

    Publication date: 2024-05-30

    Application No.: US18071986

    Filing date: 2022-11-30

    Applicant: Google LLC

    Abstract: Methods, systems, and media for determining playlist title coherence and quality are provided. In some embodiments, a method for generating playlist recommendations includes: determining, using a hardware processor, a title of a playlist; generating, using the hardware processor, a byte-level representation of the title based on the title of the playlist; determining, using the hardware processor, an embedded representation of the title based on the byte-level representation; determining, using the hardware processor, a perplexity score of the title by inputting the embedded representation of the title into a trained language model, wherein the perplexity score is an output of the trained language model; and causing, using the hardware processor, a recommendation based on the perplexity score of the title to be presented.
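To make the notion of a byte-level perplexity score concrete, the sketch below scores titles with a simple byte-bigram model using add-one smoothing. This is a deliberately swapped-in technique: the abstract describes an embedded representation fed to a trained language model, which is not reproduced here, and the training titles are invented.

```python
import math
from collections import defaultdict

START = 256  # extra symbol marking the start of a title (bytes are 0..255)

class ByteBigramLM:
    """Toy byte-level bigram language model with add-one smoothing."""

    def __init__(self) -> None:
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, titles: list[str]) -> None:
        for title in titles:
            prev = START
            for b in title.encode("utf-8"):  # byte-level representation
                self.counts[prev][b] += 1
                prev = b

    def perplexity(self, title: str) -> float:
        data = title.encode("utf-8")
        if not data:
            return float("inf")
        prev, log_sum = START, 0.0
        for b in data:
            row = self.counts.get(prev, {})
            total = sum(row.values())
            # Smoothed P(byte | previous byte) over 256 possible byte values.
            log_sum += math.log((row.get(b, 0) + 1) / (total + 256))
            prev = b
        return math.exp(-log_sum / len(data))

lm = ByteBigramLM()
lm.train(["road trip songs", "workout mix", "chill study beats", "summer road trip"])

coherent = lm.perplexity("road trip mix")
noisy = lm.perplexity("xqzv jjkw")
print(coherent, noisy)
```

Lower perplexity means the title looks more like the training titles, so a recommendation step could, for example, suppress playlists whose titles score above some cutoff. The cutoff itself would be a further assumption.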

    MULTI-LINGUAL NATURAL LANGUAGE QUERY
    Type: Published patent application

    Publication No.: US20240095267A1

    Publication date: 2024-03-21

    Application No.: US17933990

    Filing date: 2022-09-21

    Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to a process to facilitate multi-lingual query interpretation. A system can comprise a memory that stores computer executable components, and a processor that executes the computer executable components stored in the memory, wherein the computer executable components can comprise an annotation component that generates one or more language invariant signals, an interpretation component that generates a complete query intent using the one or more language invariant signals, and a translation component that processes the complete query intent to an executable backend query to facilitate multi-lingual query interpretation. In one or more embodiments, the translation component can be operatively connected with the interpretation component to generate a zero-shot transfer of the one or more language invariant signals.