Self-attention-based confidence estimation of language models

    Publication No.: US12124814B2

    Publication Date: 2024-10-22

    Application No.: US17720912

    Filing Date: 2022-04-14

    Applicant: NAVER CORPORATION

    Abstract: A confidence estimation system includes: a neural network including at least one attention module including N heads configured to: generate attention matrices based on interactions between tokens for words in an input sequence of words, the input sequence of words including a word that is obscured; and determine the word that is obscured in the input sequence; and a confidence module configured to determine a confidence value indicative of a probability of the neural network correctly determining the word that is obscured, the confidence module determining the confidence value of the word that is obscured using a convolutional neural network that projects the attention matrices generated by the attention module over a multi-dimensional space, the attention matrices recording interactions between the tokens in the input sequence of words without information regarding the tokens for the words and the word that is obscured.
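The abstract does not disclose the CNN architecture, so the following is only a minimal pure-Python sketch of the idea under assumed choices. It computes one L x L attention matrix per head (row-wise softmax of scaled query-key scores), stacks the N matrices as N input channels (one possible reading of "projects the attention matrices over a multi-dimensional space"), and passes them through one convolution layer, ReLU, global average pooling, a linear layer and a sigmoid to get a value in (0, 1). All names (`attention_matrices`, `conv2d_valid`, `ConfidenceCNN`) and the layer layout are hypothetical. The weights are random and untrained, so the output is not a meaningful confidence until the head is trained, for example (an assumption) with binary cross-entropy against whether the model's prediction of the obscured word was correct. As in the abstract, the confidence head sees only attention weights, never the token embeddings.

```python
import math
import random


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrices(queries, keys):
    """One L x L matrix per head: row-wise softmax(Q K^T / sqrt(d)).

    queries, keys: N heads, each a list of L vectors of dimension d.
    """
    mats = []
    for q_head, k_head in zip(queries, keys):
        d = len(q_head[0])
        scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_head]
                  for q in q_head]
        mats.append([softmax(r) for r in scores])
    return mats


def conv2d_valid(channels, kernels, bias):
    """Multi-channel 2-D cross-correlation with 'valid' padding (one filter)."""
    k = len(kernels[0])
    size = len(channels[0]) - k + 1
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            acc = bias
            for ch, kern in zip(channels, kernels):  # sum over input channels
                for di in range(k):
                    for dj in range(k):
                        acc += kern[di][dj] * ch[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


class ConfidenceCNN:
    """Hypothetical head: conv -> ReLU -> global avg pool -> linear -> sigmoid."""

    def __init__(self, n_heads, n_filters=4, kernel_size=3, seed=0):
        rng = random.Random(seed)  # random, untrained weights
        self.kernels = [[[[rng.gauss(0, 0.1) for _ in range(kernel_size)]
                          for _ in range(kernel_size)]
                         for _ in range(n_heads)]
                        for _ in range(n_filters)]
        self.conv_bias = [0.0] * n_filters
        self.w = [rng.gauss(0, 0.1) for _ in range(n_filters)]
        self.b = 0.0

    def __call__(self, attn):
        pooled = []
        for kern, bias in zip(self.kernels, self.conv_bias):
            fmap = conv2d_valid(attn, kern, bias)
            vals = [max(0.0, v) for r in fmap for v in r]  # ReLU
            pooled.append(sum(vals) / len(vals))  # global average pooling
        logit = sum(w * p for w, p in zip(self.w, pooled)) + self.b
        return 1.0 / (1.0 + math.exp(-logit))  # confidence in (0, 1)


# Usage with random queries/keys standing in for a real model's heads.
rng = random.Random(42)
N, L, d = 4, 6, 8
Q = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(L)] for _ in range(N)]
K = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(L)] for _ in range(N)]
attn = attention_matrices(Q, K)
model = ConfidenceCNN(n_heads=N)
conf = model(attn)
print(f"confidence = {conf:.3f}")
```

Global average pooling is used here so the same head accepts sequences of any length L, provided L is at least the kernel size.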

    SELF-ATTENTION-BASED CONFIDENCE ESTIMATION OF LANGUAGE MODELS

    Publication No.: US20230334267A1

    Publication Date: 2023-10-19

    Application No.: US17720912

    Filing Date: 2022-04-14

    Applicant: NAVER CORPORATION

    Abstract: Identical to the abstract of US12124814B2 above.