Regularizing Word Segmentation
    Invention application

    Publication number: US20220310061A1

    Publication date: 2022-09-29

    Application number: US17656225

    Filing date: 2022-03-23

    Applicant: Google LLC

    Abstract: A method for subword segmentation includes receiving an input word to be segmented into a plurality of subword units. The method also includes executing a subword segmentation routine to segment the input word into a plurality of subword units by accessing a trained vocabulary set of subword units and selecting the plurality of subword units from the input word by greedily finding a longest subword unit from the input word that is present in the trained vocabulary set until an end of the input word is reached.
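    The greedy routine described in the abstract can be illustrated with a short Python sketch. It is not the patented implementation: the function name `greedy_longest_match`, the toy vocabularies, and the choice to emit an `<unk>` token for a character that no vocabulary entry covers are assumptions made for illustration only. At each position the sketch tries the longest remaining substring first, shrinks it until it finds an entry in the vocabulary set, and then resumes after that match until the end of the word is reached.

```python
from typing import List, Set


def greedy_longest_match(word: str, vocab: Set[str], unk: str = "<unk>") -> List[str]:
    """Hypothetical sketch: segment `word` left to right, taking the longest
    vocabulary entry that starts at the current position."""
    pieces: List[str] = []
    start = 0
    while start < len(word):  # stop once the end of the word is reached
        match = None
        # Try the longest candidate first and shrink it until a vocabulary hit is found.
        for end in range(len(word), start, -1):
            candidate = word[start:end]
            if candidate in vocab:
                match = candidate
                break
        if match is None:
            # Assumption: a character not covered by any entry becomes an unknown token.
            pieces.append(unk)
            start += 1
        else:
            pieces.append(match)
            start += len(match)
    return pieces


if __name__ == "__main__":
    toy_vocab = {"un", "relate", "related", "d"}
    print(greedy_longest_match("unrelated", toy_vocab))
```

    With the toy vocabulary {"un", "relate", "related", "d"}, `greedy_longest_match("unrelated", toy_vocab)` returns ["un", "related"], because each step commits to the longest match available ("related" rather than "relate" followed by "d"). Since the search never backtracks, it does not always produce the fewest pieces: with the vocabulary {"a", "ab", "bcd", "c", "d"}, the word "abcd" is segmented as ["ab", "c", "d"] even though ["a", "bcd"] is shorter.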
