Invention Grant
- Patent Title: Machine-learnt field-specific standardization
- Application No.: US16528175
- Application Date: 2019-07-31
- Publication No.: US11886461B2
- Publication Date: 2024-01-30
- Inventors: Arun Kumar Jagota, Stanislav Georgiev
- Applicant: Salesforce, Inc.
- Applicant Address: US CA San Francisco
- Assignee: Salesforce, Inc.
- Current Assignee: Salesforce, Inc.
- Current Assignee Address: US CA San Francisco
- Agency: Kwan & Olynick LLP
- Main IPC: G06N20/00
- IPC: G06N20/00; G06N7/01; G06F16/25; G06F16/2455

Abstract:
A system tokenizes raw values and corresponding standardized values into raw token sequences and corresponding standardized token sequences. A machine-learning model learns standardization from token insertions and token substitutions that modify the raw token sequences to match the corresponding standardized token sequences. The system tokenizes an input value into an input token sequence. The machine-learning model determines a probability of inserting an insertion token after an insertion markable token in the input token sequence. If the probability of inserting the insertion token satisfies a threshold, the system inserts the insertion token after the insertion markable token in the input token sequence. The machine-learning model determines a probability of substituting a substitution token for a substitutable token in the input token sequence. If the probability of substituting the substitution token satisfies another threshold, the system substitutes the substitution token for the substitutable token in the input token sequence.
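The abstract does not name a model type, a tokenizer, or a way of recovering the insertions and substitutions from each raw/standardized pair. The following is a minimal sketch under stated assumptions, not the patented implementation. It assumes a lowercase whitespace tokenizer, uses Python's `difflib.SequenceMatcher` to align each raw token sequence with its standardized sequence, and substitutes simple relative-frequency estimates for the trained machine-learning model. `StandardizationModel`, the `START` marker, and the default thresholds of 0.5 are hypothetical names and values. "Satisfies a threshold" is read here as "greater than or equal to".

```python
from __future__ import annotations

from collections import Counter, defaultdict
from difflib import SequenceMatcher

START = "<s>"  # hypothetical marker: lets a token be inserted before the first token


def tokenize(value: str) -> list[str]:
    # Assumed tokenizer: lowercase, then split on whitespace.
    return value.lower().split()


class StandardizationModel:
    """Relative-frequency stand-in for the learned model (an assumption, not the patented model)."""

    def __init__(self) -> None:
        self.token_counts: Counter[str] = Counter()  # occurrences of each raw token
        self.insertions: defaultdict[str, Counter[str]] = defaultdict(Counter)
        self.substitutions: defaultdict[str, Counter[str]] = defaultdict(Counter)

    def fit(self, pairs: list[tuple[str, str]]) -> StandardizationModel:
        for raw_value, std_value in pairs:
            raw, std = tokenize(raw_value), tokenize(std_value)
            self.token_counts.update([START] + raw)
            matcher = SequenceMatcher(a=raw, b=std, autojunk=False)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "insert" and j2 - j1 == 1:
                    # std[j1] was inserted after the insertion markable token raw[i1 - 1].
                    markable = raw[i1 - 1] if i1 > 0 else START
                    self.insertions[markable][std[j1]] += 1
                elif tag == "replace" and i2 - i1 == j2 - j1:
                    # Count one-to-one token substitutions.
                    for raw_tok, std_tok in zip(raw[i1:i2], std[j1:j2]):
                        self.substitutions[raw_tok][std_tok] += 1
                # Deletions, multi-token inserts, and uneven replaces are ignored in this sketch.
        return self

    def _best(self, table: dict[str, Counter[str]], token: str) -> tuple[str | None, float]:
        # Most frequent candidate for `token` and its estimated probability.
        counts = table.get(token)
        if not counts:
            return None, 0.0
        candidate, count = counts.most_common(1)[0]
        return candidate, count / self.token_counts[token]

    def standardize(self, value: str, ins_threshold: float = 0.5,
                    sub_threshold: float = 0.5) -> str:
        out: list[str] = []
        for i, token in enumerate([START] + tokenize(value)):
            if i > 0:
                # Substitute when the probability meets the substitution threshold.
                sub, p_sub = self._best(self.substitutions, token)
                out.append(sub if sub is not None and p_sub >= sub_threshold else token)
            # Insert after the raw markable token when the probability meets the insertion threshold.
            ins, p_ins = self._best(self.insertions, token)
            if ins is not None and p_ins >= ins_threshold:
                out.append(ins)
        return " ".join(out)


# Illustrative training pairs for a hypothetical address field.
pairs = [
    ("100 Main St", "100 Main Street"),
    ("25 Oak St", "25 Oak Street"),
    ("7 Elm St", "7 Elm Street"),
    ("PO 55", "PO Box 55"),
    ("PO 9", "PO Box 9"),
]
model = StandardizationModel().fit(pairs)
print(model.standardize("40 Maple St"))  # 40 maple street
print(model.standardize("PO 77"))        # po box 77
```

In this sketch, the probability of inserting a token after a markable token is the fraction of that token's raw occurrences that were followed by the insertion. Substitution probabilities are estimated the same way. A trained classifier could replace `_best` while leaving the two threshold checks unchanged.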
Public/Granted literature:
- US20210034638A1 MACHINE-LEARNT FIELD-SPECIFIC STANDARDIZATION; Public/Granted day: 2021-02-04