Translating large source code using sparse self-attention

    Publication number: US12093671B2

    Publication date: 2024-09-17

    Application number: US17731593

    Application date: 2022-04-28

    Applicant: Google LLC

    CPC classification number: G06F8/51

    Abstract: Techniques are described herein for translating source code using sparse self-attention. In various implementations, a source code snippet in a first programming language may be processed to obtain graph(s) representing snippet tokens and the relationships between them. Based on the graph(s), a subset of snippet token pairs may be identified from a superset of all possible token pairs in the source code snippet. Each token pair of the subset may include snippet tokens that are represented by nodes connected by one or more edges of the one or more graphs. A self-attention network of a translation machine learning model may be adapted to sparsely attend across the identified subset of token pairs. The source code snippet may then be processed based on the adapted translation machine learning model to generate a translation of the source code snippet in a second programming language.
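The idea of restricting attention to graph-connected token pairs can be illustrated with a minimal sketch. It is not the patented implementation: the token list, the hand-written AST-style edges, the inclusion of self-pairs, and the helper names `allowed_pairs` and `sparse_attention_weights` are all assumptions made for illustration.

```python
import math

def allowed_pairs(num_tokens, edges):
    """Return the subset of (i, j) token pairs joined by a graph edge.

    Self-pairs are included so every token can attend to itself
    (an assumption of this sketch, not stated in the abstract).
    """
    pairs = {(i, i) for i in range(num_tokens)}
    for i, j in edges:
        pairs.add((i, j))
        pairs.add((j, i))  # treat edges as undirected
    return pairs

def sparse_attention_weights(scores, pairs):
    """Row-wise softmax over `scores`, restricted to the allowed pairs."""
    n = len(scores)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cols = [j for j in range(n) if (i, j) in pairs]
        row_max = max(scores[i][j] for j in cols)  # for numerical stability
        exps = {j: math.exp(scores[i][j] - row_max) for j in cols}
        total = sum(exps.values())
        for j in cols:
            weights[i][j] = exps[j] / total
    return weights

# Hypothetical tokens for `x = y + 1` with hand-written AST-style edges.
tokens = ["x", "=", "y", "+", "1"]
edges = [(1, 0), (1, 3), (3, 2), (3, 4)]
pairs = allowed_pairs(len(tokens), edges)

# Uniform raw scores stand in for learned query-key products.
scores = [[0.0] * len(tokens) for _ in tokens]
weights = sparse_attention_weights(scores, pairs)
```

In this toy case only 13 of the 25 possible token pairs receive any attention weight, which is the kind of reduction that makes long snippets more tractable.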

    Generation and/or recommendation of tools for automating aspects of computer programming

    Publication number: US12001821B2

    Publication date: 2024-06-04

    Application number: US18119640

    Application date: 2023-03-09

    Applicant: GOOGLE LLC

    CPC classification number: G06F8/40 G06N20/00

    Abstract: Implementations are described herein for leveraging prior source code transformations to facilitate automatic creation and/or recommendation of tools for automating aspects of source code transformations captured in real time. In various implementations, a transformation made by a programmer to a source code snippet may be captured in a source code editor application in real time. Based on the transformation and the intent, one or more candidate source code transformations may be identified from one or more repositories of prior source code transformations made by one or more other programmers. The source code editor application may be caused to provide output indicative of a tool that is operable to automate one or more edits associated with both the transformation made by the programmer to the source code snippet and with one or more of the candidate source code transformations.

    Translating between programming languages using machine learning

    Publication number: US11842174B2

    Publication date: 2023-12-12

    Application number: US16506161

    Application date: 2019-07-09

    Applicant: GOOGLE LLC

    CPC classification number: G06F8/41 G06N20/00

    Abstract: Techniques are described herein for translating source code in one programming language to source code in another programming language using machine learning. In various implementations, one or more components of one or more generative adversarial networks, such as a generator machine learning model, may be trained to generate “synthetically-naturalistic” source code that can be used as a translation of source code in an unfamiliar language. In some implementations, a discriminator machine learning model may be employed to aid in training the generator machine learning model, e.g., by being trained to discriminate between human-generated (“genuine”) and machine-generated (“synthetic”) source code.

    Learning and using programming styles

    Publication number: US11748065B2

    Publication date: 2023-09-05

    Application number: US17563881

    Application date: 2021-12-28

    Applicant: GOOGLE LLC

    CPC classification number: G06F8/33 G06F8/40 G06N3/08

    Abstract: Techniques are described herein for using artificial intelligence to “learn,” statistically, a target programming style that is imposed in and/or evidenced by a code base. Once the target programming style is learned, it can be used for various purposes. In various implementations, one or more generative adversarial networks (“GANs”), each including a generator machine learning model and a discriminator machine learning model, may be trained to facilitate learning and application of target programming style(s). In some implementations, the discriminator(s) and/or generator(s) may operate on graphical input, and may take the form of graph neural networks (“GNNs”), graph attention neural networks (“GANNs”), graph convolutional networks (“GCNs”), etc., although this is not required.

    TRANSLATING LARGE SOURCE CODE USING SPARSE SELF-ATTENTION

    Publication number: US20230350657A1

    Publication date: 2023-11-02

    Application number: US17731593

    Application date: 2022-04-28

    Applicant: Google LLC

    CPC classification number: G06F8/51

    Abstract: Techniques are described herein for translating source code using sparse self-attention. In various implementations, a source code snippet in a first programming language may be processed to obtain graph(s) representing snippet tokens and the relationships between them. Based on the graph(s), a subset of snippet token pairs may be identified from a superset of all possible token pairs in the source code snippet. Each token pair of the subset may include snippet tokens that are represented by nodes connected by one or more edges of the one or more graphs. A self-attention network of a translation machine learning model may be adapted to sparsely attend across the identified subset of token pairs. The source code snippet may then be processed based on the adapted translation machine learning model to generate a translation of the source code snippet in a second programming language.

    REFACTORING AND/OR REARCHITECTING SOURCE CODE USING MACHINE LEARNING

    Publication number: US20230251856A1

    Publication date: 2023-08-10

    Application number: US17668974

    Application date: 2022-02-10

    Applicant: Google LLC

    CPC classification number: G06F8/72 G06N3/10

    Abstract: Implementations are described herein for leveraging machine learning to automate source code refactoring and/or rearchitecting. In various implementations, one or more ground truth boundaries may be removed from one or more boundaried source code files to produce one or more boundary-less source code files. One or more of the boundary-less source code files may be processed using a machine learning model to predict one or more candidate boundaries for reintroduction into the one or more boundary-less source code files. The one or more ground truth boundaries may be compared with the one or more predicted candidate boundaries. The machine learning model may be trained based on the comparing.
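The training signal described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: file boundaries stand in for whatever boundary type an implementation uses, the predicted boundaries are a hard-coded hypothetical model output, and precision and recall are one possible way to compare predictions with ground truth.

```python
def remove_boundaries(files):
    """Concatenate files into one boundary-less listing.

    Returns the merged lines and the set of line offsets at which a new
    file originally started (the ground truth boundaries).
    """
    lines, boundaries = [], set()
    for file_lines in files:
        if lines:
            boundaries.add(len(lines))  # a new file starts at this offset
        lines.extend(file_lines)
    return lines, boundaries

def compare(truth, predicted):
    """Score predicted boundaries against ground truth (precision, recall)."""
    hits = len(truth & predicted)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall

# Three hypothetical source files, each holding one small function.
files = [
    ["def a():", "    return 1"],
    ["def b():", "    return 2"],
    ["def c():", "    return 3"],
]
merged, truth = remove_boundaries(files)

# Hypothetical model output: one correct boundary and one wrong one.
predicted = {2, 5}
precision, recall = compare(truth, predicted)
```

A real system would turn the comparison into a loss used to update the model; the scores here only show what is being compared.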

    Generation and/or recommendation of tools for automating aspects of computer programming

    Publication number: US11604628B2

    Publication date: 2023-03-14

    Application number: US17123768

    Application date: 2020-12-16

    Applicant: Google LLC

    Abstract: Implementations are described herein for leveraging prior source code transformations to facilitate automatic creation and/or recommendation of tools for automating aspects of source code transformations captured in real time. In various implementations, a transformation made by a programmer to a source code snippet may be captured in a source code editor application in real time. Based on the transformation and the intent, one or more candidate source code transformations may be identified from one or more repositories of prior source code transformations made by one or more other programmers. The source code editor application may be caused to provide output indicative of a tool that is operable to automate one or more edits associated with both the transformation made by the programmer to the source code snippet and with one or more of the candidate source code transformations.

    GENERATING SYNTHETIC TRAINING DATA FOR PROGRAMMING LANGUAGE TRANSLATION

    Publication number: US20240086164A1

    Publication date: 2024-03-14

    Application number: US17940618

    Application date: 2022-09-08

    Applicant: Google LLC

    Inventor: Lucas Kramer, Bin Ni

    CPC classification number: G06F8/51 G06F8/36 G06N20/00

    Abstract: Techniques are described herein for generating synthetic paired source code snippets that are semantically equivalent but syntactically distinct. In various implementations, few shot learning may be performed to prompt a large language model, based on demonstration source code snippet(s) in syntactically constrained pseudocode, to generate additional source code snippets in the syntactically constrained pseudocode. Based on additional source code snippets in additional programming language(s), the large language model may be used to generate more training source code snippets in the syntactically constrained pseudocode. The training source code snippets in the syntactically constrained pseudocode may be programmatically translated to generate synthetic training pairs of semantically equivalent source code snippets. Each synthetic training pair of the plurality of synthetic training pairs may include training snippets in first and second programming languages, and may be usable to train a machine learning translation model to translate between the first and second programming languages.
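The final step of this abstract, programmatically translating constrained pseudocode into a pair of equivalent snippets, can be sketched as below. The tuple-based pseudocode format, the choice of Python and JavaScript as target languages, and the function names are all assumptions for illustration; the large language model stage that would generate the pseudocode is not shown.

```python
def to_python(stmt):
    """Render one pseudocode statement as Python source."""
    kind = stmt[0]
    if kind == "assign":
        _, name, expr = stmt
        return f"{name} = {expr}"
    if kind == "print":
        return f"print({stmt[1]})"
    raise ValueError(f"unsupported statement kind: {kind}")

def to_javascript(stmt):
    """Render the same pseudocode statement as JavaScript source."""
    kind = stmt[0]
    if kind == "assign":
        _, name, expr = stmt
        return f"let {name} = {expr};"
    if kind == "print":
        return f"console.log({stmt[1]});"
    raise ValueError(f"unsupported statement kind: {kind}")

def make_pair(program):
    """Translate one pseudocode program into a (Python, JavaScript) pair."""
    python_src = "\n".join(to_python(s) for s in program)
    js_src = "\n".join(to_javascript(s) for s in program)
    return python_src, js_src

# Hypothetical pseudocode program: two statements in a constrained format.
program = [("assign", "total", "2 + 3"), ("print", "total")]
python_src, js_src = make_pair(program)
```

Because both outputs are derived from the same constrained program, the pair is semantically equivalent by construction, which is what makes it usable as a training example for a translation model.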

    Refactoring and/or rearchitecting source code using machine learning

    Publication number: US11893384B2

    Publication date: 2024-02-06

    Application number: US17668974

    Application date: 2022-02-10

    Applicant: Google LLC

    CPC classification number: G06F8/72 G06N3/10

    Abstract: Implementations are described herein for leveraging machine learning to automate source code refactoring and/or rearchitecting. In various implementations, one or more ground truth boundaries may be removed from one or more boundaried source code files to produce one or more boundary-less source code files. One or more of the boundary-less source code files may be processed using a machine learning model to predict one or more candidate boundaries for reintroduction into the one or more boundary-less source code files. The one or more ground truth boundaries may be compared with the one or more predicted candidate boundaries. The machine learning model may be trained based on the comparing.
