TRAINING LANGUAGE MODELS AND PRESERVING PRIVACY

Invention Publication

US20240135103A1 TRAINING LANGUAGE MODELS AND PRESERVING PRIVACY (Status: Under Examination, Published)

Patent Title: TRAINING LANGUAGE MODELS AND PRESERVING PRIVACY
Application No.: US18173199

Application Date: 2023-02-23
Publication No.: US20240135103A1

Publication Date: 2024-04-25
Inventor: Franck Dernoncourt, Tong Sun, Thi kim phung Lai, Rajiv Bhawanji Jain, Nikolaos Barmpalios, Jiuxiang Gu
Applicant: Adobe Inc.
Applicant Address: US CA San Jose
Assignee: Adobe Inc.
Current Assignee: Adobe Inc.
Current Assignee Address: US CA San Jose
Main IPC: G06F40/295
IPC: G06F40/295; G06F40/274

Abstract:

In implementations of systems for training language models and preserving privacy, a computing device implements a privacy system to predict a next word after a last word in a sequence of words by processing input data using a machine learning model trained on training data to predict next words after last words in sequences of words. The training data describes a corpus of text associated with clients and including sensitive samples and non-sensitive samples. The machine learning model is trained by sampling a client of the clients and using a subset of the sensitive samples associated with the client and a subset of the non-sensitive samples associated with the client to update parameters of the machine learning model. The privacy system generates an indication of the next word after the last word in the sequence of words for display in a user interface.
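
The training procedure in the abstract (sample a client, then update the model from a subset of that client's sensitive samples and a subset of its non-sensitive samples) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy bigram-count "model", the function names `train_step` and `predict_next`, the subset sizes `k_sensitive` and `k_non_sensitive`, and the example corpus. The abstract does not state how privacy is preserved during the update, so this sketch shows only the sampling structure and provides no privacy guarantee.

```python
import random
from collections import defaultdict


def train_step(counts, corpus, rng, k_sensitive=1, k_non_sensitive=1):
    """One update: sample a client, then subsets of that client's samples.

    The bigram counts stand in for model parameters (an assumption).
    """
    client = rng.choice(sorted(corpus))
    sensitive = corpus[client]["sensitive"]
    non_sensitive = corpus[client]["non_sensitive"]
    batch = (
        rng.sample(sensitive, min(k_sensitive, len(sensitive)))
        + rng.sample(non_sensitive, min(k_non_sensitive, len(non_sensitive)))
    )
    for text in batch:
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1  # "parameter update" for the toy model
    return client


def predict_next(counts, sequence):
    """Return the most frequent follower of the last word, or None if unseen."""
    last = sequence.split()[-1]
    followers = counts.get(last)
    if not followers:
        return None
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(followers), key=followers.get)


# Hypothetical corpus: each client has sensitive and non-sensitive samples.
corpus = {
    "client_a": {
        "sensitive": ["my account number is private"],
        "non_sensitive": ["the meeting starts tomorrow"],
    },
    "client_b": {
        "sensitive": ["my home address is private"],
        "non_sensitive": ["the meeting starts tomorrow"],
    },
}

rng = random.Random(0)
counts = defaultdict(lambda: defaultdict(int))
sampled_clients = [train_step(counts, corpus, rng) for _ in range(20)]

next_word = predict_next(counts, "the meeting")
```

In this toy setup `next_word` is the indication of the next word that a user interface would display; a real system would replace the bigram counts with a trained language model and add whatever privacy mechanism the full specification describes.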


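IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F40/00	Handling natural language data (speech analysis or synthesis, speech recognition: see G10L)
G06F40/20	.Natural language analysis (semantic analysis of natural language: see G06F40/30)
G06F40/279	..Recognition of textual entities
G06F40/289	...Phrasal analysis, e.g. finite state techniques or chunking
G06F40/295	....Named entity recognition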