-
Publication No.: US20230119613A1
Publication Date: 2023-04-20
Application No.: US17505531
Filing Date: 2021-10-19
Inventors: Zeqi LIN, Yu HU, Haiyuan CAO, Yi LIU, Jian-Guang LOU, Kuralmani ELANGO, PalaniRaj KALIYAPERUMAL, Weizhu CHEN, Kunal MUKERJEE
IPC Classification: G06F40/35, G06F40/211, G06F40/186, G06N20/00
Abstract: Examples described herein generate training data for machine learning (ML) for natural language (NL) processing (such as semantic parsing for translating NL). A formula tree is generated based on sampling both a formula grammar and NL templates. Using the formula tree, an ML training data instance pair is generated comprising a formula example and an NL example. A context example may also be used during instantiation of the formula tree. An ML model is trained with training data including the ML training data instance pair, and ML output is generated from NL input. The ML output includes, for example, a machine-interpretable formula, a database querying language command, or a general programming language instruction. Some examples support context-free grammar, probabilistic context-free grammar, and/or non-context-free production rules.
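The sampling step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the toy `GRAMMAR`, the `[Column]` reference syntax, the function names `sample_tree`, `render`, and `generate_pair`, and the use of a list of column names as the context example. Each production pairs one formula pattern with several NL templates, so a single sampled tree yields both halves of a training pair.

```python
import random

# Hypothetical toy grammar (not from the patent). Each production pairs a
# formula pattern with NL templates that share the same child slots.
GRAMMAR = {
    "EXPR": [
        {"formula": "SUM({0})", "nl": ["the sum of {0}", "total {0}"],
         "children": ["COL"]},
        {"formula": "AVERAGE({0})", "nl": ["the average of {0}", "mean {0}"],
         "children": ["COL"]},
        {"formula": "({0} - {1})",
         "nl": ["{0} minus {1}", "the difference between {0} and {1}"],
         "children": ["EXPR", "EXPR"]},
    ],
}


def sample_tree(symbol, context, rng, depth=0, max_depth=2):
    """Sample a formula tree, choosing a production and an NL template per node."""
    if symbol == "COL":
        # Leaves are instantiated from the context example (assumed: column names).
        return {"leaf": rng.choice(context["columns"])}
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Stop recursion by keeping only productions without EXPR children.
        productions = [p for p in productions if "EXPR" not in p["children"]]
    prod = rng.choice(productions)
    return {
        "prod": prod,
        "nl_template": rng.choice(prod["nl"]),
        "children": [sample_tree(c, context, rng, depth + 1, max_depth)
                     for c in prod["children"]],
    }


def render(tree):
    """Return (formula, nl) strings for a sampled tree."""
    if "leaf" in tree:
        column = tree["leaf"]
        return f"[{column}]", column
    rendered = [render(child) for child in tree["children"]]
    formula = tree["prod"]["formula"].format(*(f for f, _ in rendered))
    nl = tree["nl_template"].format(*(n for _, n in rendered))
    return formula, nl


def generate_pair(context, rng):
    """Generate one (formula example, NL example) training pair."""
    return render(sample_tree("EXPR", context, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    context = {"columns": ["Sales", "Cost"]}
    for _ in range(3):
        print(generate_pair(context, rng))
```

Choosing productions uniformly corresponds to a plain context-free grammar; attaching weights to the productions and sampling with `rng.choices` would be one way to approximate the probabilistic variant the abstract mentions.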
-
Publication No.: US20240264809A1
Publication Date: 2024-08-08
Application No.: US18165254
Filing Date: 2023-02-06
Inventors: Konstantin Andreyevich GOLOBOKOV, Zeqi LIN, Haizhen ZHANG, Yu HU, Yousef Ahmed AL-KOFAHI, Jonathan Richard MALSAN, Haiyuan CAO, Daniel Akintola FATADE
IPC Classification: G06F8/30, G06F8/41, G06F40/211
CPC Classification: G06F8/37, G06F8/42, G06F40/211
Abstract: Synthetic training data is automatically generated for training a language model to generate code examples in a code language from a natural language input. New language models may thus be created, or existing language models fine-tuned, to generate code automatically without bulk quantities of training data having to be produced manually. Instead, a many-to-many grammar mapping, which maps code grammar to natural grammar, is navigated to generate the training data. Each training data instance is generated by navigating the many-to-many grammar mapping definition to produce a mapping of a respective code expression to a respective natural language expression.
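One possible reading of navigating a many-to-many grammar mapping is sketched below. The `MAPPING` structure, its symbol names, and the `expand` function are hypothetical and not drawn from the patent; the sketch assumes that each grammar symbol links several equivalent code patterns to several natural language patterns, and that navigation means enumerating every combination.

```python
from itertools import product

# Hypothetical mapping definition (not from the patent). Each symbol links a
# set of code patterns to a set of natural language patterns; "slots" names
# the child symbols that both kinds of pattern refer to.
MAPPING = {
    "QUERY": {
        "code": ["[x for x in {SRC} if {COND}]",
                 "list(filter(lambda x: {COND}, {SRC}))"],
        "natural": ["items of {SRC} where {COND}",
                    "every element in {SRC} such that {COND}"],
        "slots": ["SRC", "COND"],
    },
    "SRC": {
        "code": ["orders"],
        "natural": ["orders", "the order list"],
    },
    "COND": {
        "code": ["x > 10", "10 < x"],
        "natural": ["x is greater than 10", "x exceeds 10"],
    },
}


def expand(symbol):
    """Return every (code expression, NL expression) pair reachable from symbol."""
    entry = MAPPING[symbol]
    slots = entry.get("slots", [])
    # Expand each child symbol first; each yields a list of (code, nl) pairs.
    child_pairs = [expand(slot) for slot in slots]
    pairs = []
    for code_pattern, nl_pattern in product(entry["code"], entry["natural"]):
        # product() with no children yields one empty combination, so leaf
        # symbols contribute their patterns unchanged.
        for combo in product(*child_pairs):
            code_args = {slot: code for slot, (code, _) in zip(slots, combo)}
            nl_args = {slot: nl for slot, (_, nl) in zip(slots, combo)}
            pairs.append((code_pattern.format(**code_args),
                          nl_pattern.format(**nl_args)))
    return pairs


if __name__ == "__main__":
    for code, text in expand("QUERY")[:4]:
        print(f"{text!r} -> {code!r}")
```

With this toy definition, four distinct code expressions are each paired with eight distinct phrasings. Exhaustive enumeration grows multiplicatively with the number of patterns, so a larger mapping would more plausibly be sampled than fully expanded.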
-