MULTILINGUAL TEXT-TO-IMAGE GENERATION
Abstract:
Systems and methods for image processing are provided. One aspect of the systems and methods includes obtaining a text prompt in a first language. Another aspect of the systems and methods includes encoding the text prompt using a multilingual encoder to obtain a multilingual text embedding. Yet another aspect of the systems and methods includes processing the multilingual text embedding using a diffusion prior model to obtain an image embedding, wherein the diffusion prior model is trained to process multilingual text embeddings from the first language and a second language based on training data from the first language and the second language. Yet another aspect of the systems and methods includes generating an image using a diffusion model based on the image embedding, wherein the image includes an element corresponding to the text prompt.
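The three stages described above (multilingual encoding, a diffusion prior that maps the text embedding to an image embedding, and a diffusion model that generates the image) can be sketched as a simple pipeline. The sketch below is illustrative only and makes several assumptions: the function names, `EMBED_DIM`, the step counts, and the seeds are hypothetical, and each stage is a toy stand-in rather than a trained model. In particular, the hash-based encoder does not place translations of the same prompt near each other, which a real multilingual encoder would be expected to do.

```python
import hashlib
import random
from typing import List

EMBED_DIM = 8  # hypothetical embedding size, chosen only for this sketch


def encode_multilingual(prompt: str) -> List[float]:
    """Toy stand-in for a multilingual text encoder.

    Hashes the UTF-8 bytes of the prompt into a fixed-size vector in [0, 1].
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]


def diffusion_prior(text_embedding: List[float], steps: int = 4, seed: int = 0) -> List[float]:
    """Toy stand-in for a diffusion prior.

    Starts from Gaussian noise and moves it toward the text-conditioned
    target over a few steps, yielding an "image embedding".
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in text_embedding]
    for _ in range(steps):
        x = [0.5 * xi + 0.5 * ti for xi, ti in zip(x, text_embedding)]
    return x


def diffusion_decoder(image_embedding: List[float], size: int = 4,
                      steps: int = 4, seed: int = 1) -> List[List[float]]:
    """Toy stand-in for the image diffusion model.

    Starts from random pixels and refines them step by step, conditioned on
    the image embedding. Returns a size x size grid of values in [0, 1].
    """
    rng = random.Random(seed)
    pixels = [[rng.random() for _ in range(size)] for _ in range(size)]
    for _ in range(steps):
        for r in range(size):
            for c in range(size):
                cond = image_embedding[(r * size + c) % len(image_embedding)]
                cond = min(1.0, max(0.0, cond))  # clamp conditioning to pixel range
                pixels[r][c] = 0.5 * pixels[r][c] + 0.5 * cond
    return pixels


def generate_image(prompt: str) -> List[List[float]]:
    """Chain the three stages: text -> text embedding -> image embedding -> image."""
    text_embedding = encode_multilingual(prompt)
    image_embedding = diffusion_prior(text_embedding)
    return diffusion_decoder(image_embedding)


if __name__ == "__main__":
    # Prompts in two different languages pass through the same pipeline.
    for prompt in ["a cat on a sofa", "ソファの上の猫"]:
        image = generate_image(prompt)
        print(prompt, "->", len(image), "x", len(image[0]), "pixel grid")
```

The point of the separation is that only the encoder and the prior need to handle multiple languages; the final diffusion model consumes an image embedding and is independent of the prompt language.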