-
Publication number: US11960867B1
Publication date: 2024-04-16
Application number: US18198674
Application date: 2023-05-17
Applicant: Google LLC
Inventor: Rishabh Singh, Hanjun Dai, Manzil Zaheer, Artem Goncharuk, Karen Davis, David Andre
CPC classification number: G06F8/436, G06F40/279, G06F40/40, G06N3/08, G06N7/01
Abstract: Using a natural language (NL) latent representation in the automated conversion of source code from a base programming language (e.g., C++) to a target programming language (e.g., Python). A base-to-NL model can be used to generate an NL latent representation by processing a base source code snippet in the base programming language. Further, an NL-to-target model can be used to generate a target source code snippet in the target programming language (that is functionally equivalent to the base source code snippet) by processing the NL latent representation. In some implementations, output(s) from the NL-to-target model indicate canonical representation(s) of variables, and in generating the target source code snippet, technique(s) are used to match those canonical representation(s) to variable(s) of the base source code snippet. In some implementations, multiple candidate target source code snippets are generated, and a subset (e.g., one) is selected based on evaluation(s).
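The last two steps of the abstract above (binding canonical variable names, then selecting among candidates) can be sketched as follows. This is only an illustration under stated assumptions: the `VAR_<i>` placeholder syntax, the positional binding to base variable names, the use of input/output pairs as the evaluation, and both function names are hypothetical and are not taken from the patent. The two models themselves are not shown; their outputs are stubbed as strings.

```python
import re
from typing import Any, List, Optional, Sequence, Tuple


def bind_canonical_names(snippet: str, base_vars: Sequence[str]) -> str:
    """Replace each hypothetical canonical name VAR_<i> with base_vars[i]."""
    return re.sub(
        r"\bVAR_(\d+)\b",
        lambda m: base_vars[int(m.group(1))],
        snippet,
    )


def select_candidate(
    candidates: List[str],
    func_name: str,
    io_pairs: List[Tuple[tuple, Any]],
) -> Optional[str]:
    """Return the first candidate whose function reproduces every I/O pair."""
    for source in candidates:
        namespace: dict = {}
        try:
            exec(source, namespace)  # assumption: candidates are trusted here
            fn = namespace[func_name]
            if all(fn(*args) == expected for args, expected in io_pairs):
                return source
        except Exception:
            continue  # a candidate that fails to run is simply skipped
    return None


# Variable names assumed to have been extracted from the base snippet.
base_vars = ["total", "count"]

# Stand-ins for candidate outputs of an NL-to-target model.
raw_candidates = [
    "def mean(VAR_0, VAR_1):\n    return VAR_0 - VAR_1\n",
    "def mean(VAR_0, VAR_1):\n    return VAR_0 / VAR_1\n",
]

candidates = [bind_canonical_names(c, base_vars) for c in raw_candidates]
chosen = select_candidate(candidates, "mean", [((10, 4), 2.5), ((9, 3), 3.0)])
```

Here the first candidate is rejected because it subtracts instead of dividing, so the second candidate is selected with the base variable names restored.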
-
Publication number: US11775271B1
Publication date: 2023-10-03
Application number: US17316331
Application date: 2021-05-10
Applicant: Google LLC
Inventor: Rishabh Singh, Artem Goncharuk, Karen Davis, David Andre
Abstract: Techniques are described herein for translating source code in one programming language to source code in another programming language using machine learning. A method includes: receiving first source code in a first higher-level programming language; processing the first source code, or an intermediate representation thereof, using a sequence-to-sequence neural network model to generate a sequence of outputs, each including a probability distribution; generating second source code in a second higher-level programming language by, for each output in the sequence of outputs: determining a highest probability in the probability distribution associated with the output; in response to the highest probability exceeding a first threshold, generating a predicted portion of the second source code based on a token that corresponds to the highest probability; and in response to the highest probability not exceeding the first threshold, generating a placeholder; and outputting the second source code.
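The thresholding step in the abstract above can be sketched as follows. This is a minimal illustration, assuming each model output is available as a token-to-probability dictionary; the `PLACEHOLDER` marker and the function name are hypothetical, and the sequence-to-sequence model itself is replaced by hand-written distributions.

```python
from typing import Dict, List

PLACEHOLDER = "<PLACEHOLDER>"  # hypothetical marker for low-confidence positions


def decode_with_placeholders(
    distributions: List[Dict[str, float]],
    threshold: float,
) -> List[str]:
    """Emit the most probable token per step, or a placeholder if unsure."""
    tokens = []
    for dist in distributions:
        token, prob = max(dist.items(), key=lambda kv: kv[1])
        # Keep the token only when its probability exceeds the threshold.
        tokens.append(token if prob > threshold else PLACEHOLDER)
    return tokens


# Hand-written stand-ins for the model's per-step probability distributions.
outputs = [
    {"def": 0.97, "class": 0.03},
    {"add": 0.55, "sum": 0.45},
    {"(": 0.99, "[": 0.01},
]

decoded = decode_with_placeholders(outputs, threshold=0.8)
```

With a threshold of 0.8, the second position is left as a placeholder because its best token has probability 0.55, while the first and third positions are filled in.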
-
Publication number: US11899566B1
Publication date: 2024-02-13
Application number: US17318436
Application date: 2021-05-12
Applicant: Google LLC
Inventor: Rishabh Singh, David Andre
CPC classification number: G06F11/3684, G06F11/3688, G06N5/04, G06N20/00
Abstract: Training and/or utilization of machine learning model(s) (e.g., neural network model(s)) in automatically generating test case(s) for source code. Techniques disclosed herein can be utilized in generating test case(s) for unit testing (or other white-box testing) and/or for functional testing (or other black-box testing). In some implementations, the machine learning model(s) can be trained on (source code, unit test) pairs. In some additional or alternative implementations, reinforcement learning techniques can be utilized to check for correctness of (base source code, target source code) pairs (e.g., by matching program execution of different branches).
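The idea of checking a base/target pair by matching execution across branches can be sketched as follows. This is an assumption-laden illustration, not the patented method: both programs are represented as Python functions, the inputs are chosen by hand to reach each branch, and `agreement_score` is a hypothetical name for a value that could serve as a reward signal. No reinforcement learning is implemented here.

```python
from typing import Any, Callable, Iterable, Tuple


def agreement_score(
    base_fn: Callable[..., Any],
    target_fn: Callable[..., Any],
    inputs: Iterable[Tuple[Any, ...]],
) -> float:
    """Fraction of inputs on which both programs return the same value."""
    cases = list(inputs)
    agree = sum(1 for args in cases if base_fn(*args) == target_fn(*args))
    return agree / len(cases)


def base_abs(x: int) -> int:
    """Stand-in for the base program, with two branches."""
    if x < 0:
        return -x
    return x


def target_abs_ok(x: int) -> int:
    """Stand-in for a correct translation."""
    return x if x >= 0 else -x


def target_abs_buggy(x: int) -> int:
    """Stand-in for a translation that lost the negative branch."""
    return x


# Inputs picked to exercise both the negative and the non-negative branch.
branch_inputs = [(-3,), (0,), (5,)]

score_ok = agreement_score(base_abs, target_abs_ok, branch_inputs)
score_buggy = agreement_score(base_abs, target_abs_buggy, branch_inputs)
```

The buggy translation agrees with the base program only on the non-negative inputs, so its score is lower than that of the correct translation.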
-
Publication number: US11693637B1
Publication date: 2023-07-04
Application number: US17319739
Application date: 2021-05-13
Applicant: Google LLC
Inventor: Rishabh Singh, Hanjun Dai, Manzil Zaheer, Artem Goncharuk, Karen Davis, David Andre
CPC classification number: G06F8/436, G06F40/279, G06F40/40, G06N3/08, G06N7/01
Abstract: Using a natural language (NL) latent representation in the automated conversion of source code from a base programming language (e.g., C++) to a target programming language (e.g., Python). A base-to-NL model can be used to generate an NL latent representation by processing a base source code snippet in the base programming language. Further, an NL-to-target model can be used to generate a target source code snippet in the target programming language (that is functionally equivalent to the base source code snippet) by processing the NL latent representation. In some implementations, output(s) from the NL-to-target model indicate canonical representation(s) of variables, and in generating the target source code snippet, technique(s) are used to match those canonical representation(s) to variable(s) of the base source code snippet. In some implementations, multiple candidate target source code snippets are generated, and a subset (e.g., one) is selected based on evaluation(s).
-
Publication number: US11656867B2
Publication date: 2023-05-23
Application number: US17945376
Application date: 2022-09-15
Applicant: Google LLC
Inventor: Rishabh Singh, David Andre, Bin Ni, Owen Lewis
Abstract: Implementations are described herein for using machine learning to perform various tasks related to migrating source code based on relatively few (“few shots”) demonstrations. In various implementations, an autoregressive language model may be conditioned based on demonstration tuple(s). In some implementations, a demonstration tuple may include a pre-migration version of a first source code snippet and a post-migration version of the first source code snippet. In other implementations, demonstration tuples may include other data, such as intermediate forms (e.g., natural language descriptions or pseudocode), input-output pairs demonstrating intended behavior, etc. The autoregressive language model may be trained on corpora of source code and natural language documentation on the subject of computer programming. A pre-migration version of a source code file may be processed based on the conditioned autoregressive language model, and a post-migration version may be generated based on the resulting output.
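One common way to condition an autoregressive language model on demonstration tuples is to place them in the prompt ahead of the code to be migrated. The sketch below assumes that approach; the `### Before` / `### After` delimiters and the function name are hypothetical, the Python 2 to Python 3 `print` migration is only an example, and no model is actually called.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demos: List[Tuple[str, str]],
    pre_migration: str,
) -> str:
    """Concatenate (before, after) demonstrations, then the snippet to migrate."""
    parts = []
    for before, after in demos:
        parts.append(f"### Before\n{before}\n### After\n{after}\n")
    # The final "After" section is left open for the model to complete.
    parts.append(f"### Before\n{pre_migration}\n### After\n")
    return "\n".join(parts)


# Demonstration tuples: pre-migration and post-migration versions of a snippet.
demos = [
    ("print 'hello'", "print('hello')"),
    ("print x", "print(x)"),
]

prompt = build_few_shot_prompt(demos, "print total")
```

The resulting `prompt` string would be passed to the language model, whose continuation after the final `### After` line is taken as the post-migration version.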