ENSURING THAT LANGUAGE MODELS FOLLOW INSTRUCTIONS INDICATED IN PROMPTS
Abstract:
Techniques for ensuring that language models follow instructions indicated in prompts are provided. In one technique, a first language model generates a response based on a prompt. A set of instructions in the prompt is identified. For each instruction in the set, a second language model determines whether the response indicates that the first language model followed the instruction. In another technique, for each prompt of a plurality of prompts: (1) a first language model generates a response based on the prompt; (2) multiple instructions are identified based on the prompt; (3) a second language model generates, based on the plurality of instructions, an output that indicates that the first language model followed each instruction; and (4) the prompt, the response, and the multiple instructions are stored in a training instance. The first language model is finetuned based on the training instances.
Public/Granted literature
Information query
Patent Agency Ranking
0/0