SELECTING CONDITIONALLY INDEPENDENT INPUT SIGNALS FOR UNSUPERVISED CLASSIFIER TRAINING

Invention Application

US20220245477A1 SELECTING CONDITIONALLY INDEPENDENT INPUT SIGNALS FOR UNSUPERVISED CLASSIFIER TRAINING (Status: In force)


Patent Title: SELECTING CONDITIONALLY INDEPENDENT INPUT SIGNALS FOR UNSUPERVISED CLASSIFIER TRAINING
Application No.: US17163243

Application Date: 2021-01-29
Publication No.: US20220245477A1

Publication Date: 2022-08-04
Inventor: Kave Eshghi, Victor De Vansa Vikramaratne
Applicant: Box, Inc.
Applicant Address: US CA Redwood City
Assignee: Box, Inc.
Current Assignee: Box, Inc.
Current Assignee Address: US CA Redwood City
Main IPC: G06N5/04
IPC: G06N5/04; G06N20/00; G06F21/62

Abstract:

Methods, systems, and computer program products for content management systems. An unlabeled dataset comprising documents that at least potentially comprise personally identifiable information (PII) is used to train a PII content classifier. The classifier is trained by (1) determining, based on applying a PII rule to a first portion of a document selected from the unlabeled dataset, a confidence value that the first portion of the document contains personally identifiable information; (2) selecting a second portion of the same document such that the second portion does not include the first portion; and (3) assigning, based on the confidence value, a likelihood value that corresponds to whether characteristics of the second portion indicate that the document contains personally identifiable information. The trained PII content classifier is then applied to selected portions of subject content objects to determine whether those portions contain PII.
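A minimal Python sketch of the three training steps in the abstract may help make the idea concrete. Everything specific here is a hypothetical choice for illustration, not the applicant's method. The PII rule is a US Social Security number regex (SSN_RULE). The two disjoint portions are the lines containing digits and the lines without digits (split_document). The classifier is a naive Bayes model trained with soft labels (SoftNaiveBayes). The rule's confidence on the first portion becomes the label weight for the words in the second portion. The classifier therefore learns which contextual words co-occur with rule-detected PII, without any hand-labeled data.

```python
import re
from collections import Counter
from math import exp, log

# Hypothetical PII rule: a US Social Security number pattern.
SSN_RULE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def rule_confidence(text: str) -> float:
    """Step 1 (assumed scoring): confidence that `text` contains PII."""
    return 0.95 if SSN_RULE.search(text) else 0.05


def split_document(doc: str) -> tuple[str, str]:
    """Hypothetical split into two non-overlapping portions:
    lines containing digits (seen by the rule) vs. all other lines."""
    lines = doc.splitlines()
    first = [ln for ln in lines if any(ch.isdigit() for ch in ln)]
    second = [ln for ln in lines if not any(ch.isdigit() for ch in ln)]
    return "\n".join(first), "\n".join(second)


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens; digits are dropped."""
    return re.findall(r"[a-z]+", text.lower())


class SoftNaiveBayes:
    """Multinomial naive Bayes whose training labels are probabilities."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.counts = {True: Counter(), False: Counter()}
        self.totals = {True: 0.0, False: 0.0}  # weighted token totals
        self.priors = {True: 0.0, False: 0.0}  # weighted document counts
        self.vocab: set[str] = set()

    def fit(self, examples: list[tuple[list[str], float]]) -> "SoftNaiveBayes":
        for tokens, p_pii in examples:
            # Each document counts toward both classes, weighted by p_pii.
            for label, weight in ((True, p_pii), (False, 1.0 - p_pii)):
                self.priors[label] += weight
                for tok in tokens:
                    self.counts[label][tok] += weight
                    self.totals[label] += weight
            self.vocab.update(tokens)
        return self

    def predict_proba(self, tokens: list[str]) -> float:
        """Return P(PII | tokens)."""
        n_docs = sum(self.priors.values())
        v = len(self.vocab)
        scores = {}
        for label in (True, False):
            score = log(self.priors[label] / n_docs)
            denom = self.totals[label] + self.alpha * v
            for tok in tokens:
                score += log((self.counts[label][tok] + self.alpha) / denom)
            scores[label] = score
        # Normalize the two log scores into a probability (stable softmax).
        m = max(scores.values())
        e_true, e_false = exp(scores[True] - m), exp(scores[False] - m)
        return e_true / (e_true + e_false)


def train_pii_classifier(unlabeled_docs: list[str]) -> SoftNaiveBayes:
    examples = []
    for doc in unlabeled_docs:
        first, second = split_document(doc)           # disjoint portions
        confidence = rule_confidence(first)           # step 1: rule on portion 1
        examples.append((tokenize(second), confidence))  # steps 2-3: label portion 2
    return SoftNaiveBayes().fit(examples)


# Toy unlabeled corpus (illustrative only).
docs = [
    "Employee record\nSSN: 123-45-6789\nsocial security number on file",
    "Payroll form\nEmployee social security details\nID 987-65-4321",
    "Team lunch menu\nPizza and salad\nRoom 12 at noon",
    "Quarterly roadmap\nShip the new search feature\nQ3 2021",
]
clf = train_pii_classifier(docs)
p = clf.predict_proba(tokenize("please confirm the employee social security number"))
print(f"P(PII) = {p:.2f}")  # well above 0.5 for this toy corpus
```

Because the query text contains no SSN, the rule alone would miss it. The classifier flags it from context words learned from the second portions. That is the benefit of training on a signal that is conditionally independent of the rule's input.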

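IPC Classification:

G	Physics
G06	Computing; Calculating or Counting
G06N	Computing arrangements based on specific computational models
G06N5/00	Computing arrangements using knowledge-based models
G06N5/04	.Inference methods or devices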