Context-Aware Text Sanitization

Invention Publication

US20240184912A1 Context-Aware Text Sanitization (under examination; published)

Patent Title: Context-Aware Text Sanitization
Application No.: US18060921

Application Date: 2022-12-01
Publication No.: US20240184912A1

Publication Date: 2024-06-06
Inventor: Yanfei Dong, Yuan Deng, Soujanya Poria
Applicant: PayPal, Inc.
Applicant Address: US CA San Jose
Assignee: PayPal, Inc.
Current Assignee: PayPal, Inc.
Current Assignee Address: US CA San Jose
Main IPC: G06F21/62
IPC: G06F21/62; G06F40/284; G06F40/295

Abstract:

Techniques are disclosed relating to text sanitization. Given textual data, a computer system identifies tokens predicted to constitute sensitive information. Multi-field data structures (e.g., triplets) are generated for the identified tokens that include questions, answers, and corresponding context. These data structures are supplied to a pre-trained multiple-choice question (MCQ) reading comprehension model. The model outputs, for each data structure, a probability that the question and answer for a given data structure, provided the context, is accurate. A post-processing module can then rank probabilities in this set of probabilities and select the multi-field data structure with the highest probability (in some cases, a programmable threshold must also be met). The selected multi-field data structure is then used to select category information to be used in sanitizing the textual data. In this manner, a piece of sensitive data may be replaced by a label that helps retain interpretability of the sanitized text.
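The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation in the application: the names `Triplet`, `build_triplets`, `select_category`, `sanitize`, and `toy_score` are hypothetical, the question template is invented, and `toy_score` is a keyword heuristic standing in for the pre-trained MCQ reading comprehension model. The `[REDACTED]` fallback used when the threshold is not met is also an assumption; the abstract does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Triplet:
    """Multi-field data structure: question, candidate answer, and context."""
    question: str
    answer: str   # candidate category label
    context: str


def build_triplets(token: str, context: str, categories: List[str]) -> List[Triplet]:
    # One triplet per candidate category; the question wording is an assumption.
    question = f'What type of information is "{token}"?'
    return [Triplet(question, category, context) for category in categories]


def select_category(
    triplets: List[Triplet],
    score: Callable[[Triplet], float],
    threshold: float,
) -> Optional[str]:
    # Rank the triplets by probability and keep the best one,
    # but only if it also meets the programmable threshold.
    best = max(triplets, key=score)
    return best.answer if score(best) >= threshold else None


def sanitize(
    text: str,
    sensitive_tokens: List[str],
    categories: List[str],
    score: Callable[[Triplet], float],
    threshold: float = 0.5,
    fallback: str = "REDACTED",  # assumption: generic label below threshold
) -> str:
    original = text  # every triplet uses the unsanitized text as context
    for token in sensitive_tokens:
        triplets = build_triplets(token, original, categories)
        category = select_category(triplets, score, threshold)
        text = text.replace(token, f"[{category or fallback}]")
    return text


def toy_score(triplet: Triplet) -> float:
    # Hypothetical stand-in for the MCQ model: a fixed keyword heuristic.
    cues = {"EMAIL": "@", "PHONE": "-"}
    cue = cues.get(triplet.answer)
    return 0.9 if cue is not None and cue in triplet.question else 0.1


if __name__ == "__main__":
    message = "Contact alice@example.com or 555-0100."
    tokens = ["alice@example.com", "555-0100"]  # assumed output of the detection step
    print(sanitize(message, tokens, ["EMAIL", "PHONE"], toy_score))
```

In this sketch, detection of the sensitive tokens is taken as given and passed in as a list. Replacing each token with a category label such as `[EMAIL]`, rather than a blank mask, is what keeps the sanitized text interpretable.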

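IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F21/00	Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
G06F21/60	.Protecting data
G06F21/62	..Protecting access to data via a platform, e.g. using keys or access control rules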