Publication No.: US20230419053A1
Publication Date: 2023-12-28
Application No.: US17988315
Filing Date: 2022-11-16
Applicant: Google LLC
Inventors: Jing Huang, Apurva Shah, Melvin Johnson, Viresh Ratnakar, Maxim Krikun
Abstract: Systems and methods for training a translation model based on a first text sequence in a first language, a second text sequence in a second language different from the first language, and a label based on a source of the second text sequence. In some examples, the label may comprise an Internet domain, an Internet subdomain, a uniform resource locator, a website name, or an IP address. In some examples, the label may further indicate a source of the first text sequence. In some examples, each given training example may be automatically generated by sampling the first text sequence from a first page of a given Internet domain, sampling the second text sequence from a second page of the given Internet domain, and generating the label based on all or a portion of source data of the second page.
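The automatic example-generation step in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation. The `Page` and `TrainingExample` types, the `make_label` and `sample_example` helpers, and the `granularity` parameter are all hypothetical. Each page is assumed to be pre-split into sentences and tagged with a language code. Only the domain, subdomain, and full-URL label variants are shown. The domain is approximated naively as the last two hostname labels. The two sampled sequences come from different pages and are not assumed to be translations of each other.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Page:
    # Hypothetical crawled page: its URL, a language code, and its sentences.
    url: str
    language: str
    sentences: list[str]


@dataclass
class TrainingExample:
    source_text: str  # first text sequence (first language)
    target_text: str  # second text sequence (second language)
    label: str        # derived from the source of the second sequence


def make_label(url: str, granularity: str = "domain") -> str:
    """Derive a label from a page URL (hypothetical helper)."""
    if granularity == "url":
        return url
    host = urlparse(url).hostname or ""
    if granularity == "subdomain":
        return host
    # Naive domain: last two hostname labels. This is wrong for suffixes
    # like "co.uk"; a real system might consult the Public Suffix List.
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host


def sample_example(pages, domain, src_lang, tgt_lang, rng, granularity="domain"):
    """Sample one training example from pages of a single Internet domain."""
    # Keep only non-empty pages hosted under the given domain.
    in_domain = [p for p in pages if make_label(p.url) == domain and p.sentences]
    firsts = [p for p in in_domain if p.language == src_lang]
    seconds = [p for p in in_domain if p.language == tgt_lang]
    if not firsts or not seconds:
        return None  # the domain lacks pages in one of the languages
    first_page = rng.choice(firsts)
    second_page = rng.choice(seconds)
    return TrainingExample(
        source_text=rng.choice(first_page.sentences),
        target_text=rng.choice(second_page.sentences),
        # The label is based on the second page's source data (here, its URL).
        label=make_label(second_page.url, granularity),
    )
```

Passing a seeded `random.Random` keeps sampling reproducible. The `granularity` switch mirrors the abstract's point that the label may be a domain, a subdomain, or a full URL. Website-name and IP-address labels are omitted from this sketch.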