Audio-visual speech separation
    Granted patent

    Publication No.: US11456005B2

    Publication date: 2022-09-27

    Application No.: US16761707

    Filing date: 2018-11-21

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
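The last two steps of the abstract (a spectrogram mask per speaker, then an isolated speech spectrogram per speaker) can be illustrated with a small sketch. It assumes, as one common approach rather than anything the abstract states, that each mask is applied to the mixture spectrogram by element-wise multiplication. The function name `apply_masks` and all array values are hypothetical; a real system would predict the masks from the audio-visual embedding.

```python
import numpy as np

def apply_masks(mixture_spec, masks):
    """Return one masked spectrogram per speaker.

    Assumption: masking is element-wise multiplication of the mixture
    spectrogram by each speaker's mask (not confirmed by the abstract).
    """
    return [mask * mixture_spec for mask in masks]

# Toy 2x3 "spectrogram" (frequency bins x time frames); values are made up.
mixture = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

# Hypothetical masks for two speakers, chosen so they sum to one per bin.
mask_a = np.array([[1.0, 0.0, 0.5],
                   [0.0, 1.0, 0.5]])
mask_b = 1.0 - mask_a

isolated_a, isolated_b = apply_masks(mixture, [mask_a, mask_b])
```

Because the two toy masks sum to one in every bin, the two isolated spectrograms add back up to the mixture. That property comes from the example data, not from the method itself.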

    AUDIO-VISUAL SPEECH SEPARATION
    Patent application

    Publication No.: US20200335121A1

    Publication date: 2020-10-22

    Application No.: US16761707

    Filing date: 2018-11-21

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.

    Taking photos through visual obstructions

    Publication No.: US10412316B2

    Publication date: 2019-09-10

    Application No.: US15392452

    Filing date: 2016-12-28

    Applicant: Google LLC

    Abstract: The present disclosure relates to systems and methods for image capture. Namely, an image capture system may include a camera configured to capture images of a field of view, a display, and a controller. An initial image of the field of view from an initial camera pose may be captured. An obstruction may be determined to be observable in the field of view. Based on the obstruction, at least one desired camera pose may be determined. The at least one desired camera pose includes at least one desired position of the camera. A capture interface may be displayed, which may include instructions for moving the camera to the at least one desired camera pose. At least one further image of the field of view from the at least one desired camera pose may be captured. Captured images may be processed to remove the obstruction from a background image.

    Depth Determination for Images Captured with a Moving Camera and Representing Moving Features

    Publication No.: US20210090279A1

    Publication date: 2021-03-25

    Application No.: US16578215

    Filing date: 2019-09-20

    Applicant: Google LLC

    Abstract: A method includes obtaining a reference image and a target image each representing an environment containing moving features and static features. The method also includes determining an object mask configured to mask out the moving features and preserve the static features in the target image. The method additionally includes determining, based on motion parallax between the reference image and the target image, a static depth image representing depth values of the static features in the target image. The method further includes generating, by way of a machine learning model, a dynamic depth image representing depth values of both the static features and the moving features in the target image. The model is trained to generate the dynamic depth image by determining depth values of at least the moving features based on the target image, the object mask, and the static depth image.
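The static depth step in this abstract can be sketched under strong simplifying assumptions. The sketch assumes a purely sideways camera translation, so that parallax reduces to a per-pixel horizontal disparity and depth follows the standard stereo relation depth = focal length × baseline / disparity. The abstract does not specify this geometry. The function name, the parameters `focal_px` and `baseline_m`, and all values are hypothetical.

```python
import numpy as np

def static_depth_from_parallax(disparity, object_mask, focal_px, baseline_m):
    """Depth for static pixels only; masked-out or invalid pixels are NaN.

    Assumptions (not from the abstract): sideways camera motion, so
    depth = focal_px * baseline_m / disparity; object_mask is 1 for
    static features and 0 for moving features.
    """
    depth = np.full(disparity.shape, np.nan)
    # Keep only static pixels that have a usable (positive) disparity.
    valid = (object_mask == 1) & (disparity > 0)
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Toy 2x2 inputs; values are made up for illustration.
disparity = np.array([[10.0, 20.0],
                      [5.0, 0.0]])
object_mask = np.array([[1, 0],
                        [1, 1]])

static_depth = static_depth_from_parallax(
    disparity, object_mask, focal_px=100.0, baseline_m=0.1
)
```

The NaN entries mark pixels where parallax gives no reliable depth. In the abstract's description, a machine learning model supplies depth values for at least the moving features, using the target image, the object mask, and the static depth image as input.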

    AUDIO-VISUAL SPEECH SEPARATION
    Patent application

    Publication No.: US20230122905A1

    Publication date: 2023-04-20

    Application No.: US17951002

    Filing date: 2022-09-22

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.

    Taking photos through visual obstructions

    Publication No.: US11050948B2

    Publication date: 2021-06-29

    Application No.: US16526343

    Filing date: 2019-10-21

    Applicant: Google LLC

    Abstract: The present disclosure relates to systems and methods for image capture. Namely, an image capture system may include a camera configured to capture images of a field of view, a display, and a controller. An initial image of the field of view from an initial camera pose may be captured. An obstruction may be determined to be observable in the field of view. Based on the obstruction, at least one desired camera pose may be determined. The at least one desired camera pose includes at least one desired position of the camera. A capture interface may be displayed, which may include instructions for moving the camera to the at least one desired camera pose. At least one further image of the field of view from the at least one desired camera pose may be captured. Captured images may be processed to remove the obstruction from a background image.

    Taking Photos Through Visual Obstructions
    Patent application

    Publication No.: US20200036908A1

    Publication date: 2020-01-30

    Application No.: US16526343

    Filing date: 2019-10-21

    Applicant: Google LLC

    Abstract: The present disclosure relates to systems and methods for image capture. Namely, an image capture system may include a camera configured to capture images of a field of view, a display, and a controller. An initial image of the field of view from an initial camera pose may be captured. An obstruction may be determined to be observable in the field of view. Based on the obstruction, at least one desired camera pose may be determined. The at least one desired camera pose includes at least one desired position of the camera. A capture interface may be displayed, which may include instructions for moving the camera to the at least one desired camera pose. At least one further image of the field of view from the at least one desired camera pose may be captured. Captured images may be processed to remove the obstruction from a background image.
