Systems and methods for word segmentation based on a competing neural character language model

    Publication No.: US11113468B1

    Publication Date: 2021-09-07

    Application No.: US17028023

    Application Date: 2020-09-22

    Applicant: Coupang Corp.

    Inventors: Shusi Yu, Jing Li

    Abstract: Systems and methods are provided for detecting inaccuracy in a product title, comprising: identifying, by running a string algorithm on a title associated with a product, at least one product type associated with the product; predicting, using a machine learning algorithm, at least one product type associated with the product based on the title; detecting an inaccuracy in the title based on at least one of the identification or the prediction; and outputting, to a remote device, a message indicating that the title comprises the inaccuracy. Running the string algorithm may comprise receiving a set of strings, generating a tree based on the received set of strings, receiving the title, and traversing the generated tree using the title to find a match. Using the machine learning algorithm may comprise identifying words in the title, learning a vector representation for each character n-gram of each word, and summing each character n-gram.
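The string-algorithm step in the abstract (build a tree from a set of strings, then traverse it with the title to find a match) can be illustrated with a word-level trie. This is only a minimal sketch under stated assumptions, not the patented implementation: the names `TrieNode`, `build_trie`, and `find_product_types`, the whitespace tokenization, and the lowercasing are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One node of a word-level trie (hypothetical structure for illustration)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Set only on nodes that end a complete product-type string.
        self.product_type: Optional[str] = None


def build_trie(product_types: List[str]) -> TrieNode:
    """Generate a tree from the received set of strings."""
    root = TrieNode()
    for product_type in product_types:
        node = root
        for token in product_type.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.product_type = product_type
    return root


def find_product_types(root: TrieNode, title: str) -> List[str]:
    """Traverse the tree from every token position in the title and collect matches."""
    tokens = title.lower().split()
    matches: List[str] = []
    for start in range(len(tokens)):
        node = root
        for token in tokens[start:]:
            node = node.children.get(token)
            if node is None:
                break  # no product type continues with this token
            if node.product_type is not None and node.product_type not in matches:
                matches.append(node.product_type)
    return matches


if __name__ == "__main__":
    trie = build_trie(["running shoes", "shoes", "water bottle"])
    print(find_product_types(trie, "Lightweight Running Shoes for men"))
```

A trie keeps lookup cost proportional to the title length rather than to the number of product-type strings, which is one plausible reason to prefer a tree over scanning every string against the title.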

Systems and methods for word segmentation based on a competing neural character language model

    Publication No.: US10817665B1

    Publication Date: 2020-10-27

    Application No.: US16869741

    Application Date: 2020-05-08

    Applicant: COUPANG CORP.

    Inventors: Shusi Yu, Jing Li

    Abstract: Systems and methods are provided for detecting inaccuracy in a product title, comprising: identifying, by running a string algorithm on a title associated with a product, at least one product type associated with the product; predicting, using a machine learning algorithm, at least one product type associated with the product based on the title; detecting an inaccuracy in the title based on at least one of the identification or the prediction; and outputting, to a remote device, a message indicating that the title comprises the inaccuracy. Running the string algorithm may comprise receiving a set of strings, generating a tree based on the received set of strings, receiving the title, and traversing the generated tree using the title to find a match. Using the machine learning algorithm may comprise identifying words in the title, learning a vector representation for each character n-gram of each word, and summing each character n-gram.
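The machine-learning step in the abstract (a vector for each character n-gram of each word, then a sum) can be sketched as follows. This is an illustrative sketch only: the learning step is omitted, and each n-gram vector is a deterministic pseudo-random placeholder standing in for a learned embedding. The `<` and `>` word-boundary markers, the n-gram lengths, the dimension `DIM`, and all function names are assumptions not stated in the abstract.

```python
import random
import zlib
from typing import List

DIM = 8  # embedding size, chosen arbitrarily for the sketch


def char_ngrams(word: str, n_min: int = 3, n_max: int = 4) -> List[str]:
    """Return all character n-grams of the word, padded with boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_vector(ngram: str) -> List[float]:
    """Placeholder for a learned embedding: a fixed pseudo-random vector per n-gram."""
    rng = random.Random(zlib.crc32(ngram.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def word_vector(word: str) -> List[float]:
    """Sum the vectors of all character n-grams of the word."""
    total = [0.0] * DIM
    for ngram in char_ngrams(word):
        for i, value in enumerate(ngram_vector(ngram)):
            total[i] += value
    return total


if __name__ == "__main__":
    print(char_ngrams("cup", 3, 3))
    print(word_vector("cup"))
```

Because related words share many character n-grams, their summed vectors overlap, which is why this kind of representation tends to be robust to misspellings and unseen words in product titles.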
