1.
Publication No.: US10901708B1
Publication Date: 2021-01-26
Application No.: US16198969
Filing Date: 2018-11-23
Applicant: Amazon Technologies, Inc.
Inventor: Russell Reas, Neela Sawant, Srinivasan Sengamedu Hanumantha Rao, Yinglong Wang, Anton Emelyanov, Shishir Sethiya
Abstract: Techniques for unsupervised learning of embeddings on source code from non-local contexts are described. Code can be processed to generate an abstract syntax tree (AST) which represents syntactic paths between tokens in the code. Once the AST(s) have been generated, the paths in the AST(s) can be crawled to identify terminals (e.g., leaf nodes in the AST) and paths between terminals can be identified. The pairs of tokens identified at the ends of each path can then be used to generate a cooccurrence matrix. For example, if X number of unique terminals are identified, a matrix of size X by X can be generated to indicate a frequency at which pairs of terminals cooccur. This cooccurrence matrix can then be used as input to existing techniques for learning vector-space embeddings, such as word2vec, GloVe, Swivel, etc.
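Illustrative sketch: The pipeline described in the abstract (parse code into an AST, collect terminals, pair the terminals at the ends of each path, count the pairs in an X by X matrix) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. It assumes Python's standard `ast` module as the parser, treats only `Name`, `arg`, and `Constant` nodes as terminals, and keeps a pair only when the tree path between the two terminals has at most `max_path_len` edges. The function names, the terminal selection, the `max_path_len` cutoff, and the symmetric counting are all assumptions introduced for this example.

```python
import ast
from itertools import combinations


def terminals_with_paths(source):
    """Return (token, path) for each terminal; path is the tuple of
    child indexes leading from the AST root down to that terminal."""
    found = []

    def visit(node, path):
        # Assumption: only these three node types count as terminals.
        if isinstance(node, ast.Name):
            found.append((node.id, path))
        elif isinstance(node, ast.arg):
            found.append((node.arg, path))
        elif isinstance(node, ast.Constant):
            found.append((repr(node.value), path))
        for i, child in enumerate(ast.iter_child_nodes(node)):
            visit(child, path + (i,))

    visit(ast.parse(source), ())
    return found


def path_length(path_a, path_b):
    """Number of edges between two terminals via their lowest common ancestor."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def cooccurrence_matrix(source, max_path_len=8):
    """Build a symmetric X-by-X count matrix over the X unique terminals."""
    terms = terminals_with_paths(source)
    vocab = sorted({token for token, _ in terms})
    index = {token: i for i, token in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (tok_a, path_a), (tok_b, path_b) in combinations(terms, 2):
        # Hypothetical cutoff: skip pairs joined by very long paths.
        if path_length(path_a, path_b) > max_path_len:
            continue
        i, j = index[tok_a], index[tok_b]
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return vocab, matrix


if __name__ == "__main__":
    vocab, matrix = cooccurrence_matrix("def add(x, y):\n    return x + y\n")
    print(vocab)
    for row in matrix:
        print(row)
```

The resulting counts could then be handed to a co-occurrence-based embedding method such as GloVe or Swivel. How the matrix is weighted or normalized before that step is not specified in the abstract and is left out here.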