FAITHFUL GENERATION OF OUTPUT TEXT FOR MULTIMODAL APPLICATIONS

    Publication No.: US20250078818A1

    Publication Date: 2025-03-06

    Application No.: US18590222

    Filing Date: 2024-02-28

    Abstract: Systems and techniques are described for generating and using unimodal/multimodal generative models that mitigate hallucinations. For example, a computing device can encode input data to generate encoded representations of the input data. The computing device can obtain intermediate data including a plurality of partial sentences associated with the input data and can generate, based on the intermediate data, at least one complete sentence associated with the input data. The computing device can encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence. The computing device can generate a faithfulness score based on a comparison of the encoded representations of the input data and the at least one encoded representation of the at least one complete sentence. The computing device can re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data.
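The re-ranking step described in this abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not say how the encoded representations are compared, so cosine similarity is assumed here, and the function names and toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_by_faithfulness(input_embedding, candidates):
    """Re-rank partial sentences by a faithfulness score.

    candidates: list of (partial_sentence, completed_sentence_embedding) pairs,
    where each embedding encodes the complete sentence generated from that
    partial sentence. Assumption: the faithfulness score is the cosine
    similarity between the input embedding and the completed-sentence embedding.
    Returns (score, partial_sentence) pairs, most faithful first.
    """
    scored = [(cosine(input_embedding, emb), partial) for partial, emb in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy, hand-written embeddings; a real system would obtain them from encoders.
audio_embedding = [1.0, 0.0]
beam = [
    ("a cat", [0.0, 1.0]),
    ("a dog", [1.0, 0.0]),
    ("a dog runs", [1.0, 1.0]),
]
for score, partial in rerank_by_faithfulness(audio_embedding, beam):
    print(partial, round(score, 3))
```

Scoring the completed sentence instead of the partial one reflects the abstract's wording: the partial sentences are ranked by how faithful their completions are to the input.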

    SEMANTICALLY-AUGMENTED CONTEXT REPRESENTATION GENERATION

    Publication No.: US20230034450A1

    Publication Date: 2023-02-02

    Application No.: US17383284

    Filing Date: 2021-07-22

    Abstract: A device includes a memory configured to store instructions. The device also includes one or more processors configured to execute the instructions to provide context and one or more items of interest corresponding to the context to a dependency network encoder to generate a semantic-based representation of the context. The one or more processors are also configured to provide the context to a data dependent encoder to generate a context-based representation. The one or more processors are further configured to combine the semantic-based representation and the context-based representation to generate a semantically-augmented representation of the context.
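The abstract does not state how the two representations are combined. As a rough sketch only, the Python function below shows two common options, concatenation and a weighted sum. The function name, the `mode` and `weight` parameters, and the toy vectors are all assumptions made for illustration; the two encoders themselves are not modelled.

```python
def combine_representations(semantic_repr, context_repr, mode="concat", weight=0.5):
    """Combine a semantic-based and a context-based representation.

    mode="concat": append the context vector to the semantic vector.
    mode="sum":    element-wise weighted sum (vectors must have equal length).
    Both modes are illustrative assumptions, not taken from the patent.
    """
    if mode == "concat":
        return list(semantic_repr) + list(context_repr)
    if mode == "sum":
        if len(semantic_repr) != len(context_repr):
            raise ValueError("weighted sum needs vectors of equal length")
        return [weight * s + (1 - weight) * c
                for s, c in zip(semantic_repr, context_repr)]
    raise ValueError(f"unknown mode: {mode}")


# Stand-in outputs of the dependency network encoder and the data dependent encoder.
semantic = [1.0, 3.0]
contextual = [3.0, 1.0]
print(combine_representations(semantic, contextual))              # concatenation
print(combine_representations(semantic, contextual, mode="sum"))  # weighted sum
```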

    AUTOMATED AUDIO CAPTION CORRECTION USING FALSE ALARM AND MISS DETECTION

    Publication No.: US20250078828A1

    Publication Date: 2025-03-06

    Application No.: US18811349

    Filing Date: 2024-08-21

    Abstract: Systems and techniques are provided for natural language processing. A system generates a plurality of tokens (e.g., words or portions thereof) based on input content (e.g., text and/or speech). The system searches through the plurality of tokens to generate a first ranking of the plurality of tokens based on probability. The system generates natural language inference (NLI) scores for the plurality of tokens to generate a second ranking of the plurality of tokens based on faithfulness to the input content (e.g., whether the tokens produce statements that are true based on the input content). The system generates output text that includes at least one token selected from the plurality of tokens based on the first ranking and the second ranking.

    KNOWLEDGE-BASED AUDIO SCENE GRAPH

    Publication No.: US20240419731A1

    Publication Date: 2024-12-19

    Application No.: US18738243

    Filing Date: 2024-06-10

    Abstract: A device includes a processor configured to obtain a first audio embedding of a first audio segment and obtain a first text embedding of a first tag assigned to the first audio segment. The first audio segment corresponds to a first audio event of audio events. The processor is configured to obtain a first event representation based on a combination of the first audio embedding and the first text embedding. The processor is configured to obtain a second event representation of a second audio event of the audio events. The processor is also configured to determine, based on knowledge data, relations between the audio events. The processor is configured to construct an audio scene graph based on a temporal order of the audio events. The audio scene graph is constructed to include a first node corresponding to the first audio event and a second node corresponding to the second audio event.
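The graph construction outlined in this abstract might look roughly like the Python sketch below. Several details are assumptions, not taken from the patent: event representations are formed by concatenating the audio and text embeddings, the knowledge data is a plain dictionary keyed by tag pairs, edges link only temporally adjacent events, and the relation labels ("followed_by", "leads_to") are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AudioEvent:
    tag: str                 # text tag assigned to the audio segment
    start_time: float        # used only to order the events in time
    audio_embedding: list    # embedding of the audio segment
    text_embedding: list     # embedding of the tag

    def representation(self):
        # Assumption: "combination" is modelled as simple concatenation.
        return self.audio_embedding + self.text_embedding


def build_audio_scene_graph(events, knowledge):
    """Return (nodes, edges) for events ordered by start time.

    knowledge: dict mapping (earlier_tag, later_tag) to a relation label.
    Consecutive events with no entry fall back to a generic "followed_by" edge.
    """
    ordered = sorted(events, key=lambda e: e.start_time)
    nodes = {i: {"tag": e.tag, "repr": e.representation()}
             for i, e in enumerate(ordered)}
    edges = []
    for i in range(len(ordered) - 1):
        pair = (ordered[i].tag, ordered[i + 1].tag)
        edges.append((i, i + 1, knowledge.get(pair, "followed_by")))
    return nodes, edges


events = [
    AudioEvent("door slam", 2.0, [0.1], [0.2]),
    AudioEvent("footsteps", 0.5, [0.3], [0.4]),
]
knowledge = {("footsteps", "door slam"): "leads_to"}
nodes, edges = build_audio_scene_graph(events, knowledge)
print(nodes)
print(edges)
```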

    HALLUCINATION MITIGATION FOR GENERATIVE TRANSFORMER MODELS

    Publication No.: US20240184988A1

    Publication Date: 2024-06-06

    Application No.: US18193572

    Filing Date: 2023-03-30

    CPC classification number: G06F40/284 G06F40/253

    Abstract: Systems and techniques are provided for natural language processing. A system generates a plurality of tokens (e.g., words or portions thereof) based on input content (e.g., text and/or speech). The system searches through the plurality of tokens to generate a first ranking of the plurality of tokens based on probability. The system generates natural language inference (NLI) scores for the plurality of tokens to generate a second ranking of the plurality of tokens based on faithfulness to the input content (e.g., whether the tokens produce statements that are true based on the input content). The system generates output text that includes at least one token selected from the plurality of tokens based on the first ranking and the second ranking.
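One way to picture the two-ranking selection in this abstract is the Python sketch below. The abstract does not specify how the rankings are merged, so a simple rank-sum rule is assumed, with ties broken in favour of the more faithful token. The function names are hypothetical, and the NLI scores are hard-coded stand-ins, not output from a real NLI model.

```python
def rank_positions(scores):
    """Map each token to its 0-based rank (0 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {token: position for position, token in enumerate(ordered)}


def select_token(probabilities, nli_scores):
    """Pick the token with the best combined rank.

    probabilities: token -> model probability (basis of the first ranking).
    nli_scores:    token -> faithfulness score (basis of the second ranking).
    Assumption: rankings are merged by summing rank positions; the lower
    sum wins, and ties go to the token with the better faithfulness rank.
    """
    prob_rank = rank_positions(probabilities)
    faith_rank = rank_positions(nli_scores)
    return min(
        probabilities,
        key=lambda t: (prob_rank[t] + faith_rank[t], faith_rank[t]),
    )


# Toy candidates for the next token of an output sentence.
probabilities = {"piano": 0.5, "guitar": 0.3, "violin": 0.2}
nli_scores = {"piano": 0.1, "guitar": 0.9, "violin": 0.6}
print(select_token(probabilities, nli_scores))  # prints "guitar"
```

In this toy case the most probable token ("piano") loses to "guitar" because it ranks last on faithfulness, which is the kind of trade-off the abstract describes.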
