TEXT-TO-IMAGE DIFFUSION MODEL WITH COMPONENT LOCKING AND RANK-ONE EDITING
Abstract:
A text-to-image machine learning model takes user input text and generates an image matching the given description. While text-to-image models currently exist, there is a desire to personalize these models on a per-user basis, including configuring the models to generate images of specific, unique user-provided concepts (supplied via images of particular objects or styles) while allowing the user to employ free-text "prompts" to modify the concepts' appearance or compose them in new roles and novel scenes. Current personalization solutions either generate images with only coarse-grained resemblance to the provided concept(s) or require fine-tuning of the entire model, which is costly and can adversely affect the model. The present description employs component locking and/or rank-one editing for personalization of text-to-image diffusion models, which can improve the fine-grained details of the concepts in the generated images, reduce the memory footprint of the model update relative to full fine-tuning, and reduce adverse effects to the model.
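As an illustrative sketch only (not the patented method, whose details are not given in the abstract), a "rank-one edit" generally refers to updating a weight matrix W by adding an outer product of two learned vectors, so that only those two vectors need to be stored rather than a full fine-tuned copy of the matrix. The vectors u and v below are hypothetical placeholders for learned update directions:

```python
import numpy as np

# Hypothetical sketch of a rank-one edit to a projection weight matrix.
# Only u and v need to be stored, not a full fine-tuned copy of W.
rng = np.random.default_rng(0)
d_out, d_in = 8, 4

W = rng.standard_normal((d_out, d_in))  # original model weights
u = rng.standard_normal((d_out, 1))     # learned output direction (hypothetical)
v = rng.standard_normal((d_in, 1))      # learned input direction (hypothetical)

W_edited = W + u @ v.T                  # rank-one update

# The stored edit costs d_out + d_in values instead of d_out * d_in
# for a fully fine-tuned matrix, and the change to W has rank at most 1.
print(np.linalg.matrix_rank(W_edited - W))  # 1
print(u.size + v.size, "vs", W.size)        # 12 vs 32
```

Because the update is confined to a low-rank perturbation of selected components (with other components locked), the rest of the model's behavior is left untouched, which is consistent with the abstract's claims of a smaller memory footprint and reduced adverse effects.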