Detecting Unintended Memorization in Language-Model-Fused ASR Systems

    Publication number: US20230335126A1

    Publication date: 2023-10-19

    Application number: US18303296

    Filing date: 2023-04-19

    Applicant: Google LLC

    CPC classification number: G10L15/197 G10L13/02 G10L15/01 G10L15/063 G10L15/16

    Abstract: A method includes inserting a set of canary text samples into a corpus of training text samples and training an external language model on the corpus of training text samples and the set of canary text samples inserted into the corpus of training text samples. For each canary text sample, the method also includes generating a corresponding synthetic speech utterance and generating an initial transcription for the corresponding synthetic speech utterance. The method also includes rescoring the initial transcription generated for each corresponding synthetic speech utterance using the external language model. The method also includes determining a word error rate (WER) of the external language model based on the rescored initial transcriptions and the canary text samples and detecting memorization of the canary text samples by the external language model based on the WER of the external language model.
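A minimal Python sketch of the WER-based check described in the abstract follows. The word-level edit distance is the standard definition of WER. The abstract does not say how the WER becomes a memorization verdict, so the decision rule here is an assumption: it compares the canary WER against a hypothetical control set of phrases that were never inserted into the training corpus, using a hypothetical `margin`. All function names are illustrative.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def corpus_wer(canaries: List[str], rescored: List[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = sum(edit_distance(c.split(), r.split()) for c, r in zip(canaries, rescored))
    words = sum(len(c.split()) for c in canaries)
    return edits / max(words, 1)


def detects_memorization(canary_wer: float, control_wer: float, margin: float = 0.1) -> bool:
    """Hypothetical rule: flag memorization when canaries are transcribed markedly
    better than control phrases the external LM never saw during training."""
    return canary_wer < control_wer - margin
```

Here `canaries` holds the inserted canary texts and `rescored` holds the initial transcriptions of their synthetic utterances after rescoring with the external language model. An unusually low WER on canaries relative to unseen phrases of similar difficulty is what would suggest the model has memorized them.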

    Realtime busyness for places
    Type: Granted invention patent

    Publication number: US11727419B2

    Publication date: 2023-08-15

    Application number: US17701170

    Filing date: 2022-03-22

    Applicant: Google LLC

    CPC classification number: G06Q30/0201 G06F16/2255

    Abstract: Real-time busyness information for a public place is computed in a privacy-sensitive way and provided for display in relation to historical busyness information. An aggregate amount of real-time location information available for a particular public place is measured (410) and used to determine (420) whether the public place is privacy-qualified. If the public place is privacy-qualified, real-time busyness information is computed (440) for the public place based on the real-time location information. Further, it is determined (450) whether the computed real-time busyness information is accuracy-qualified, based on a comparison of the real-time busyness information to historical busyness information. If both qualifications are met, the real-time busyness information is output (470) for display or to another application.
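A minimal Python sketch of the two-gate flow in the abstract follows. The thresholds `MIN_REPORTS` and `MAX_RATIO`, the normalization against a historical peak report count, and the 0 to 100 busyness scale are all hypothetical choices. The abstract states only that privacy qualification depends on the aggregate amount of real-time location information and that accuracy qualification depends on a comparison with historical busyness.

```python
from typing import Optional

# Hypothetical parameters; the abstract does not specify any values.
MIN_REPORTS = 30   # minimum aggregate location reports for privacy qualification
MAX_RATIO = 3.0    # max allowed factor between real-time and historical busyness


def realtime_busyness(
    num_reports: int,
    reports_at_peak: float,
    historical_busyness: float,
) -> Optional[float]:
    """Return real-time busyness on a 0-100 scale, or None if a qualification fails."""
    # Steps 410/420: privacy qualification on the aggregate amount of location data.
    if num_reports < MIN_REPORTS or reports_at_peak <= 0:
        return None
    # Step 440: busyness relative to a hypothetical historical peak report count.
    busyness = min(100.0, 100.0 * num_reports / reports_at_peak)
    # Step 450: accuracy qualification against historical busyness for this time slot.
    low = historical_busyness / MAX_RATIO
    high = historical_busyness * MAX_RATIO
    if not (low <= busyness <= high):
        return None
    # Step 470: both qualifications met, so the value can be output.
    return busyness
```

Returning `None` when either gate fails keeps a caller from displaying a real-time value that is either privacy-unsafe or implausible relative to history; the caller can then show historical busyness alone.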
