ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS

    Publication No.: US20240220796A1

    Publication Date: 2024-07-04

    Application No.: US18403992

    Filing Date: 2024-01-04

    Applicant: Google LLC

    IPC Classes: G06N3/08 G06N3/045

    CPC Classes: G06N3/08 G06N3/045

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.
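
    The generation loop this patent family describes can be sketched compactly. In the sketch below, `decoder` is a hypothetical stand-in for the self-attention decoder neural network, mapping the combined token sequence to a score distribution over the vocabulary; the greedy selection rule and the `eos_id` stopping condition are illustrative assumptions, not details taken from the patent.

    ```python
    def generate(decoder, input_tokens: list[int], eos_id: int, max_steps: int = 64) -> list[int]:
        """Autoregressive decoding as described in the abstract (minimal sketch)."""
        output_tokens: list[int] = []
        for _ in range(max_steps):
            # Combined sequence = input sequence followed by the output tokens
            # that have already been generated as of this time step.
            combined = input_tokens + output_tokens
            scores = decoder(combined)  # score distribution over possible output tokens
            next_id = max(range(len(scores)), key=scores.__getitem__)  # greedy pick (assumption)
            if next_id == eos_id:  # hypothetical end-of-sequence token
                break
            output_tokens.append(next_id)
        return output_tokens
    ```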

    ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS

    Publication No.: US20240256859A1

    Publication Date: 2024-08-01

    Application No.: US18403966

    Filing Date: 2024-01-04

    Applicant: Google LLC

    IPC Classes: G06N3/08 G06N3/045

    CPC Classes: G06N3/08 G06N3/045

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.

    Attention-based decoder-only sequence transduction neural networks

    Publication No.: US11886998B2

    Publication Date: 2024-01-30

    Application No.: US18096946

    Filing Date: 2023-01-13

    Applicant: Google LLC

    IPC Classes: G06N3/08 G06N3/045

    CPC Classes: G06N3/08 G06N3/045

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.
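
    The abstracts repeatedly refer to a "self-attention decoder neural network". Its core operation, causal (masked) self-attention, is sketched below; the single-head NumPy formulation and all shapes are assumptions for illustration, not the patented architecture.

    ```python
    import numpy as np

    def causal_self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
        # x: (seq_len, d_model); wq/wk/wv: (d_model, d_head) projection matrices.
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = (q @ k.T) / np.sqrt(k.shape[-1])        # (seq_len, seq_len)
        # Causal mask: position i may attend only to positions j <= i, so the
        # decoder never looks at tokens that have not been generated yet.
        mask = np.triu(np.ones(scores.shape, dtype=bool), 1)
        scores[mask] = -np.inf
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ v                               # (seq_len, d_head)
    ```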

    Attention-based decoder-only sequence transduction neural networks

    Publication No.: US11556786B2

    Publication Date: 2023-01-17

    Application No.: US16759690

    Filing Date: 2018-10-29

    Applicant: GOOGLE LLC

    IPC Classes: G06N3/08 G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.

    TRAINING TEXT SUMMARIZATION NEURAL NETWORKS WITH AN EXTRACTED SEGMENTS PREDICTION OBJECTIVE

    Publication No.: US20210350229A1

    Publication Date: 2021-11-11

    Application No.: US17140863

    Filing Date: 2021-01-04

    Applicant: Google LLC

    IPC Classes: G06N3/08 G06N3/04 G06F40/30

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a text summarization neural network. One of the methods includes pre-training the text summarization neural network, including learning values of a plurality of network parameters through self-supervised learning using unlabeled data comprising unlabeled first texts, the pre-training including: obtaining an unlabeled first text comprising a plurality of segments; selecting one or more of the plurality of segments; processing a masked first text that excludes the one or more selected segments to generate a prediction of the one or more selected segments; and determining, based on a difference between the prediction and the one or more selected segments, an update to the current values of the plurality of network parameters; and adapting the pre-trained text summarization neural network for a specific text summarization task using labeled data comprising second texts and respective summaries of the second texts.
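
    Read operationally, the pre-training recipe above amounts to building (masked text, selected segments) training pairs from unlabeled documents. The sketch below shows one way to construct such a pair; random segment selection, the mask fraction, and the `<mask>` sentinel token are assumptions, since the abstract does not fix a selection strategy.

    ```python
    import random

    MASK_TOKEN = "<mask>"  # hypothetical sentinel standing in for an excluded segment

    def make_pretraining_example(segments: list[str], mask_fraction: float = 0.3) -> tuple[str, str]:
        # Select one or more segments of the unlabeled text (random choice is
        # an assumption; the patent leaves the selection criterion open).
        k = max(1, int(len(segments) * mask_fraction))
        selected = set(random.sample(range(len(segments)), k))
        # Masked first text: the original text with the selected segments excluded.
        masked_text = " ".join(
            MASK_TOKEN if i in selected else seg for i, seg in enumerate(segments)
        )
        # Prediction target: the selected segments themselves.
        target = " ".join(segments[i] for i in sorted(selected))
        return masked_text, target
    ```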

    Training text summarization neural networks with an extracted segments prediction objective

    Publication No.: US11803751B2

    Publication Date: 2023-10-31

    Application No.: US17140863

    Filing Date: 2021-01-04

    Applicant: Google LLC

    IPC Classes: G06N3/08 G06F40/30 G06N3/045

    CPC Classes: G06N3/08 G06F40/30 G06N3/045

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a text summarization neural network. One of the methods includes pre-training the text summarization neural network, including learning values of a plurality of network parameters through self-supervised learning using unlabeled data comprising unlabeled first texts, the pre-training including: obtaining an unlabeled first text comprising a plurality of segments; selecting one or more of the plurality of segments; processing a masked first text that excludes the one or more selected segments to generate a prediction of the one or more selected segments; and determining, based on a difference between the prediction and the one or more selected segments, an update to the current values of the plurality of network parameters; and adapting the pre-trained text summarization neural network for a specific text summarization task using labeled data comprising second texts and respective summaries of the second texts.
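
    The update step in the abstract, "determining, based on a difference between the prediction and the one or more selected segments, an update" to the parameters, is typically realized as a loss whose gradient drives the update. A token-level cross-entropy loss is one common choice; treating the difference this way is an assumption for illustration, not a detail of the patent.

    ```python
    import math

    def segment_prediction_loss(pred_probs: list[list[float]], target_ids: list[int]) -> float:
        # pred_probs[t] is the model's predicted distribution over the vocabulary
        # at target position t; target_ids[t] is the true token of the selected
        # segment. Lower loss means the masked segments were predicted better.
        return -sum(math.log(pred_probs[t][tok]) for t, tok in enumerate(target_ids)) / len(target_ids)
        # A gradient-based update to the network parameters would then follow,
        # e.g. theta <- theta - lr * grad(loss).
    ```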

    ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS

    Publication No.: US20200342316A1

    Publication Date: 2020-10-29

    Application No.: US16759690

    Filing Date: 2018-10-29

    Applicant: GOOGLE LLC

    IPC Classes: G06N3/08 G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.

    ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS

    Publication No.: US20240211752A1

    Publication Date: 2024-06-27

    Application No.: US18404014

    Filing Date: 2024-01-04

    Applicant: Google LLC

    IPC Classes: G06N3/08 G06N3/045

    CPC Classes: G06N3/08 G06N3/045

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.
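
    The abstracts leave open how an output token is "selected using the time step output". Greedy argmax (sketched earlier) is one option; another common one, shown below as an assumption rather than a claim of the patent, is to soften the scores into probabilities and sample.

    ```python
    import math
    import random

    def sample_from_scores(scores: list[float], temperature: float = 1.0) -> int:
        # Convert the raw score distribution into sampling weights via a
        # numerically stable softmax, then draw the next output token.
        logits = [s / temperature for s in scores]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        return random.choices(range(len(weights)), weights=weights, k=1)[0]
    ```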

    ATTENTION-BASED DECODER-ONLY SEQUENCE TRANSDUCTION NEURAL NETWORKS

    Publication No.: US20240211751A1

    Publication Date: 2024-06-27

    Application No.: US18403939

    Filing Date: 2024-01-04

    Applicant: Google LLC

    IPC Classes: G06N3/08 G06N3/045

    CPC Classes: G06N3/08 G06N3/045

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.

    TRAINING TEXT SUMMARIZATION NEURAL NETWORKS WITH AN EXTRACTED SEGMENTS PREDICTION OBJECTIVE

    Publication No.: US20240185065A1

    Publication Date: 2024-06-06

    Application No.: US18485950

    Filing Date: 2023-10-12

    Applicant: Google LLC

    IPC Classes: G06N3/08 G06F40/30 G06N3/045

    CPC Classes: G06N3/08 G06F40/30 G06N3/045

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a text summarization neural network. One of the methods includes pre-training the text summarization neural network, including learning values of a plurality of network parameters through self-supervised learning using unlabeled data comprising unlabeled first texts, the pre-training including: obtaining an unlabeled first text comprising a plurality of segments; selecting one or more of the plurality of segments; processing a masked first text that excludes the one or more selected segments to generate a prediction of the one or more selected segments; and determining, based on a difference between the prediction and the one or more selected segments, an update to the current values of the plurality of network parameters; and adapting the pre-trained text summarization neural network for a specific text summarization task using labeled data comprising second texts and respective summaries of the second texts.
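
    The final clause of the abstract, adapting the pre-trained network to a specific summarization task, is ordinary supervised fine-tuning on labeled (text, summary) pairs. The skeleton below makes that concrete; `model`, `loss_fn`, and `optimizer_step` are hypothetical placeholders, since the patent does not prescribe a particular training setup.

    ```python
    def fine_tune(model, labeled_pairs, loss_fn, optimizer_step, epochs: int = 3) -> None:
        for _ in range(epochs):
            for text, summary in labeled_pairs:   # second texts and their reference summaries
                predicted = model(text)           # model's candidate summary
                loss = loss_fn(predicted, summary)  # supervised summarization loss
                optimizer_step(loss)              # update the pre-trained parameters
    ```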