-
Publication number: US12087279B2
Publication date: 2024-09-10
Application number: US17656225
Application date: 2022-03-23
Applicant: Google LLC
Inventor: Bhuvana Ramabhadran, Hainan Xu, Kartik Audhkhasi, Yinghui Huang
CPC classification number: G10L15/04, G06F40/284, G06N3/04, G10L15/063, G10L15/16, G10L25/30, G10L15/02
Abstract: A method for subword segmentation includes receiving an input word to be segmented into a plurality of subword units. The method also includes executing a subword segmentation routine to segment the input word into a plurality of subword units by accessing a trained vocabulary set of subword units and selecting the plurality of subword units from the input word by greedily finding a longest subword unit from the input word that is present in the trained vocabulary set until an end of the input word is reached.
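The greedy longest-match selection described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `greedy_segment`, the `max_len` parameter, and the single-character fallback for positions where no vocabulary entry matches are all assumptions made for illustration, since the abstract does not say how such cases are handled.

```python
from typing import Iterable, List, Optional


def greedy_segment(word: str, vocab: Iterable[str],
                   max_len: Optional[int] = None) -> List[str]:
    """Split `word` into subword units by repeatedly taking the longest
    prefix of the remaining text that is present in `vocab`."""
    vocab_set = set(vocab)
    if max_len is None:
        # Longest unit in the vocabulary bounds how far ahead we need to look.
        max_len = max((len(unit) for unit in vocab_set), default=1)

    units: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shrink one character at a time.
        for end in range(min(len(word), i + max_len), i, -1):
            candidate = word[i:end]
            if candidate in vocab_set:
                units.append(candidate)
                i = end
                break
        else:
            # Assumption (not from the abstract): if nothing matches,
            # emit the single character so the loop always advances.
            units.append(word[i])
            i += 1
    return units


if __name__ == "__main__":
    example_vocab = {"un", "break", "able", "a", "b", "e", "k", "l", "n", "r", "u"}
    print(greedy_segment("unbreakable", example_vocab))
```

With the hypothetical vocabulary above, the word "unbreakable" is split into "un", "break", and "able", because each of those is the longest vocabulary entry starting at its position. Each step consumes at least one character, so the loop runs until the end of the input word is reached.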
-
Publication number: US20220310061A1
Publication date: 2022-09-29
Application number: US17656225
Application date: 2022-03-23
Applicant: Google LLC
Inventor: Bhuvana Ramabhadran, Hainan Xu, Kartik Audhkhasi, Yinghui Huang
Abstract: A method for subword segmentation includes receiving an input word to be segmented into a plurality of subword units. The method also includes executing a subword segmentation routine to segment the input word into a plurality of subword units by accessing a trained vocabulary set of subword units and selecting the plurality of subword units from the input word by greedily finding a longest subword unit from the input word that is present in the trained vocabulary set until an end of the input word is reached.
-