-
Publication No.: US11755680B2
Publication date: 2023-09-12
Application No.: US16749234
Filing date: 2020-01-22
Applicant: salesforce.com, inc.
Inventor: Arun Kumar Jagota, Ajitesh Jain
IPC: G06F16/958, G06F40/284, G06N5/02, G06F16/901, G06F16/903, G06F18/2415
CPC classification number: G06F16/986, G06F16/901, G06F16/90344, G06F18/2415, G06F40/284, G06N5/02
Abstract: A system receives a record which includes a string and separates the string into a number of tokens, including a token and another token. The system identifies a pattern that includes an entity, another entity, and a number of entities that equals the number of tokens, and another pattern that includes the same number of entities as the number of tokens. The system determines a combined probability that combines a probability based on the number of entries in the entity's dictionary which stores the token, and another probability based on a number of character types in the other entity that match characters in the other token. If the combined probability associated with the pattern is greater than another combined probability associated with the other pattern, the system matches the record to a system record based on recognizing the token as the entity and the other token as the other entity.
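The pattern-scoring idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the entity names (`city`, `zip`), the dictionary contents, the add-one smoothing, the character-type alphabet, and the use of a product to combine the two probabilities are all assumptions made for illustration.

```python
from math import prod

# Hypothetical entity definitions (not from the source): a dictionary entity
# maps known tokens to entry counts; a shape entity is a template of
# character types (d = digit, a = letter, o = other).
CITY_DICT = {"austin": 3, "boston": 2}
ZIP_SHAPE = "ddddd"


def char_type(ch):
    """Classify a character as digit, letter, or other."""
    if ch.isdigit():
        return "d"
    if ch.isalpha():
        return "a"
    return "o"


def dictionary_probability(token, dictionary):
    """Probability from the number of dictionary entries storing the token.

    Add-one smoothing (an assumption) keeps unseen tokens above zero.
    """
    total = sum(dictionary.values())
    return (dictionary.get(token.lower(), 0) + 1) / (total + len(dictionary) + 1)


def shape_probability(token, template):
    """Fraction of template character types matched by the token's characters."""
    if len(token) != len(template):
        return 0.0
    hits = sum(char_type(c) == t for c, t in zip(token, template))
    return hits / len(template)


SCORERS = {
    "city": lambda tok: dictionary_probability(tok, CITY_DICT),
    "zip": lambda tok: shape_probability(tok, ZIP_SHAPE),
}


def best_pattern(string, patterns):
    """Return (pattern, combined probability) for the highest-scoring pattern.

    Only patterns with as many entities as there are tokens are considered.
    """
    tokens = string.split()
    candidates = [p for p in patterns if len(p) == len(tokens)]
    scored = [
        (p, prod(SCORERS[entity](tok) for entity, tok in zip(p, tokens)))
        for p in candidates
    ]
    return max(scored, key=lambda item: item[1]) if scored else None
```

For the string "Austin 78701" and the candidate patterns ("city", "zip") and ("zip", "city"), this sketch selects ("city", "zip"), so "Austin" would be recognized as the city and "78701" as the ZIP code before record matching.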
-
Publication No.: US11016959B2
Publication date: 2021-05-25
Application No.: US15884732
Filing date: 2018-01-31
Applicant: salesforce.com, inc.
Inventor: Arun Kumar Jagota, Ajitesh Jain, Dmytro Kudriavtsev
IPC: G06F16/23, G06F16/2458, G06F16/2457, G06F16/22, G06F16/2452
Abstract: A system tokenizes values stored in a field by multiple records. The system creates a trie from the tokenized values, each branch in the trie labeled with one of the tokenized values, each node storing a count indicating the number of the multiple records associated with a tokenized value sequence beginning from a root of the trie. The system tokenizes a value stored in the field by a prospective record. Beginning from the root of the trie, the system identifies each node corresponding to a token value sequence for the prospective record's tokenized value. Beginning from the most recently identified node for the prospective record's token value sequence, the system identifies each extending node which stores a count that satisfies a threshold, each identified extending node corresponding to another token value sequence. The system uses the other token value sequence to identify one of the multiple records that matches the prospective record.
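A rough Python sketch of the counted trie described in this abstract follows. The whitespace tokenizer, the lowercasing, the reading of "satisfies a threshold" as `count >= threshold`, and the names `Node`, `build_trie`, and `extensions` are assumptions for illustration, not details taken from the patent.

```python
class Node:
    """Trie node: children keyed by token, plus a count of records whose
    tokenized value starts with the sequence leading to this node."""

    def __init__(self):
        self.children = {}
        self.count = 0


def build_trie(values):
    """Build a token-level trie from the field values of existing records."""
    root = Node()
    for value in values:
        node = root
        for tok in value.lower().split():
            node = node.children.setdefault(tok, Node())
            node.count += 1
    return root


def extensions(root, value, threshold):
    """Walk the prospective value from the root, then collect every extending
    token sequence whose node count is at least the threshold."""
    node = root
    prefix = []
    for tok in value.lower().split():
        if tok not in node.children:
            return []
        node = node.children[tok]
        prefix.append(tok)

    results = []
    stack = [(node, prefix)]
    while stack:
        current, seq = stack.pop()
        for tok, child in current.children.items():
            if child.count >= threshold:
                extended = seq + [tok]
                results.append(extended)
                stack.append((child, extended))
    return results
```

With the stored values "acme corp", "acme corp inc", "acme corp inc", and "acme holdings", a prospective value of "acme" and a threshold of 2 yields the extended sequences "acme corp" and "acme corp inc"; "acme holdings" is dropped because only one record supports it. The extended sequences could then be used as lookup keys for candidate matching records.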
-
Publication No.: US11372928B2
Publication date: 2022-06-28
Application No.: US16775611
Filing date: 2020-01-29
Applicant: salesforce.com, inc.
Inventor: Arun Kumar Jagota, Ajitesh Jain, Rahul Mathias Madan, Shravani Madhavaram
IPC: G06F16/90, G06N20/00, G06F16/903, G06F16/901
Abstract: Determine first count of first records storing first value in first field, second count of second records storing second value in second field, third count of third records storing third value in third field. Determine count threshold using first, second and third counts, dispersion measure based on dispersion of values stored in second field by first records and other dispersion measure based on other dispersion of values stored in third field by first records. Train machine-learning model to determine dispersion measure threshold based on dispersion and other dispersion measures. If first count is greater than count threshold, and dispersion measure is greater than dispersion measure threshold, create match index based on first and second fields. Receive prospective record storing first value in first field, second value in second field. Use match index to identify record storing first value in first field, second value in second field as matching prospective record.
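The index-creation decision in this abstract might be sketched in Python as below. Several parts are stand-ins rather than the patented method: Shannon entropy is assumed as the dispersion measure, both thresholds are passed in as plain numbers (the abstract derives the count threshold from three counts and obtains the dispersion threshold from a trained machine-learning model), and the function and field names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy in bits, used here as an assumed dispersion measure."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())


def maybe_build_index(records, f1, v1, f2, count_threshold, dispersion_threshold):
    """Create a match index on (f1, f2) only if value v1 is frequent in f1 and
    the f2 values of those records are dispersed enough to narrow the search."""
    first = [r for r in records if r[f1] == v1]
    dispersion = entropy([r[f2] for r in first]) if first else 0.0
    if len(first) > count_threshold and dispersion > dispersion_threshold:
        index = defaultdict(list)
        for r in records:
            index[(r[f1], r[f2])].append(r)
        return dict(index)
    return None


def lookup(index, prospective, f1, f2):
    """Use the match index to find records sharing both field values."""
    return index.get((prospective[f1], prospective[f2]), [])
```

The intuition behind the two conditions is that a very common first-field value (for example, a city name shared by many records) is a poor match key on its own, and pairing it with a second field only helps when the second field's values are spread out among those records.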
-
Publication No.: US20210342353A1
Publication date: 2021-11-04
Application No.: US16862667
Filing date: 2020-04-30
Applicant: salesforce.com, inc.
Inventor: Arun Kumar Jagota, Ajitesh Jain, Rahul Mathias Madan, Shravani Madhavaram
IPC: G06F16/2455, G06N20/00
Abstract: Adaptive field-level matching is described. A system identifies first elements in a field of a prospective record for a database, and second elements in the field of a candidate record, in the database, for matching the prospective record. The system identifies features corresponding to any of the first elements that are identical to any of the second elements, any of the first elements that are absent from the second elements, and any of the second elements that are absent from the first elements. A machine-learning model uses the features to determine a field match score for the candidate record's field. Another machine-learning model weighs the field match score and weighs another field match score for another field of the candidate record to determine a record match score for the candidate record. If the record match score satisfies a threshold, the system identifies the candidate record as matching the prospective record.
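As an illustration of the two-stage scoring described in this abstract, the following Python sketch extracts the three kinds of features and combines them. The logistic function with hand-set weights stands in for the first machine-learning model, and a weighted average stands in for the second; all weights, the bias, the whitespace tokenizer, and the function names are hypothetical and are not taken from the patent.

```python
from math import exp


def field_features(prospective_value, candidate_value):
    """Count elements shared by both fields, elements only in the prospective
    field, and elements only in the candidate field."""
    first = set(prospective_value.lower().split())
    second = set(candidate_value.lower().split())
    return (len(first & second), len(first - second), len(second - first))


def field_match_score(features, weights=(1.5, -1.0, -1.0), bias=-0.5):
    """Stand-in for the first model: logistic score over the three features.
    The weights and bias are made-up values, not learned parameters."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))


def record_match_score(prospective, candidate, field_weights):
    """Stand-in for the second model: weighted average of field match scores."""
    total = sum(field_weights.values())
    return sum(
        weight * field_match_score(field_features(prospective[f], candidate[f]))
        for f, weight in field_weights.items()
    ) / total


def is_match(prospective, candidate, field_weights, threshold=0.5):
    """Identify the candidate as a match if the record score meets the threshold."""
    return record_match_score(prospective, candidate, field_weights) >= threshold
```

In a real system both stages would be trained on labeled match and non-match pairs, so that the field-level model learns how much a missing or extra element should cost for each field type and the record-level model learns how much each field matters.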
-
Publication No.: US11755582B2
Publication date: 2023-09-12
Application No.: US16862667
Filing date: 2020-04-30
Applicant: salesforce.com, inc.
Inventor: Arun Kumar Jagota, Ajitesh Jain, Rahul Mathias Madan, Shravani Madhavaram
IPC: G06F7/00, G06F16/2455, G06N20/00
CPC classification number: G06F16/24558, G06F16/24564, G06N20/00
Abstract: Adaptive field-level matching is described. A system identifies first elements in a field of a prospective record for a database, and second elements in the field of a candidate record, in the database, for matching the prospective record. The system identifies features corresponding to any of the first elements that are identical to any of the second elements, any of the first elements that are absent from the second elements, and any of the second elements that are absent from the first elements. A machine-learning model uses the features to determine a field match score for the candidate record's field. Another machine-learning model weighs the field match score and weighs another field match score for another field of the candidate record to determine a record match score for the candidate record. If the record match score satisfies a threshold, the system identifies the candidate record as matching the prospective record.
-
Publication No.: US20190155938A1
Publication date: 2019-05-23
Application No.: US16006775
Filing date: 2018-06-12
Applicant: salesforce.com, inc.
Inventor: Dmytro Kudriavtsev, Pawan Nachnani, Dmytro Kashyn, Binyuan Chen, Satya Venkata Kamuju, Harini Vaidhyanathan, Venkata Muralidhar Tejomurtula, Shouzhong Shi, Ajitesh Jain, Prabhjot Singh
Abstract: In various embodiments, a system of synchronizing data is described. The system may store data associated with a plurality of data vendors. The system may synchronize the stored data with data from a first data vendor. The received data may be parsed by identifying data values indicated by associated metadata, and modifying the data values based on a universal data format. The system may also receive synchronization requests from a user of the service. The synchronization requests may indicate requested data and a list of processing operations. The requested data may correspond to data received from multiple data vendors. The system may perform the list of processing operations and return the data. Accordingly, the system may manage data received from multiple data vendors even if the data vendors have different synchronization conditions and provide the data in different formats. The data may be analyzed and output together to a user.
-
Publication No.: US20210232637A1
Publication date: 2021-07-29
Application No.: US16775611
Filing date: 2020-01-29
Applicant: salesforce.com, inc.
Inventor: Arun Kumar Jagota, Ajitesh Jain, Rahul Mathias Madan, Shravani Madhavaram
IPC: G06F16/903, G06N20/00, G06F16/901
Abstract: Determine first count of first records storing first value in first field, second count of second records storing second value in second field, third count of third records storing third value in third field. Determine count threshold using first, second and third counts, dispersion measure based on dispersion of values stored in second field by first records and other dispersion measure based on other dispersion of values stored in third field by first records. Train machine-learning model to determine dispersion measure threshold based on dispersion and other dispersion measures. If first count is greater than count threshold, and dispersion measure is greater than dispersion measure threshold, create match index based on first and second fields. Receive prospective record storing first value in first field, second value in second field. Use match index to identify record storing first value in first field, second value in second field as matching prospective record.
-
Publication No.: US20210224482A1
Publication date: 2021-07-22
Application No.: US16749234
Filing date: 2020-01-22
Applicant: salesforce.com, inc.
Inventor: Arun Kumar Jagota, Ajitesh Jain
IPC: G06F40/284, G06K9/62, G06F16/903, G06F16/901, G06N5/02
Abstract: A system receives a record which includes a string and separates the string into a number of tokens, including a token and another token. The system identifies a pattern that includes an entity, another entity, and a number of entities that equals the number of tokens, and another pattern that includes the same number of entities as the number of tokens. The system determines a combined probability that combines a probability based on the number of entries in the entity's dictionary which stores the token, and another probability based on a number of character types in the other entity that match characters in the other token. If the combined probability associated with the pattern is greater than another combined probability associated with the other pattern, the system matches the record to a system record based on recognizing the token as the entity and the other token as the other entity.
-
Publication No.: US20190236178A1
Publication date: 2019-08-01
Application No.: US15884732
Filing date: 2018-01-31
Applicant: salesforce.com, inc.
Inventor: Arun Kumar Jagota, Ajitesh Jain, Dmytro Kudriavtsev
IPC: G06F17/30
CPC classification number: G06F16/2365, G06F16/24575, G06F16/2468
Abstract: A system tokenizes values stored in a field by multiple records. The system creates a trie from the tokenized values, each branch in the trie labeled with one of the tokenized values, each node storing a count indicating the number of the multiple records associated with a tokenized value sequence beginning from a root of the trie. The system tokenizes a value stored in the field by a prospective record. Beginning from the root of the trie, the system identifies each node corresponding to a token value sequence for the prospective record's tokenized value. Beginning from the most recently identified node for the prospective record's token value sequence, the system identifies each extending node which stores a count that satisfies a threshold, each identified extending node corresponding to another token value sequence. The system uses the other token value sequence to identify one of the multiple records that matches the prospective record.
-
Publication No.: US11586628B2
Publication date: 2023-02-21
Application No.: US17099478
Filing date: 2020-11-16
Applicant: salesforce.com, inc.
Inventor: Kaushal Bansal, Venkata Muralidhar Tejomurtula, Azeem Feroz, Dmytro Kashyn, Dmytro Kudriavtsev, Shouzhong Shi, Ajitesh Jain
IPC: G06F16/2457, G06F16/2455, G06Q30/01, G06F16/22, G06F16/81, G06F16/25
Abstract: A method for configuring the operation of the software of a data as a service (DAAS) system during run time is described. The configuring includes at least one of configuring ingestion of a vendor dataset to produce an ingested dataset and which analysis operations to perform on the vendor dataset to produce an analyzed dataset, and the configuring also includes at least one of how to search the vendor dataset based on a search query from a customer to allow the customer to locate a new record from the vendor dataset and how to match records in the vendor dataset with a match query from the customer to provide an updated record to the customer.