Deep Saliency Prior
    Invention Application

    Publication Number: US20230015117A1

    Publication Date: 2023-01-19

    Application Number: US17856370

    Filing Date: 2022-07-01

    Applicant: Google LLC

    Abstract: Techniques for tuning an image editing operator for reducing a distractor in raw image data are presented herein. The image editing operator can access the raw image data and a mask. The mask can indicate a region of interest associated with the raw image data. The image editing operator can process the raw image data and the mask to generate processed image data. Additionally, a trained saliency model can process at least the processed image data within the region of interest to generate a saliency map that provides saliency values. Moreover, a saliency loss function can compare the saliency values provided by the saliency map for the processed image data within the region of interest to one or more target saliency values. Subsequently, the one or more parameter values of the image editing operator can be modified based at least in part on the saliency loss function.
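
    The abstract describes a tuning loop: run the operator, score saliency within the masked region, and update the operator's parameters against a saliency loss. Below is a minimal sketch of that loop, assuming PyTorch; `operator`, `saliency_model`, and the squared-error loss form are illustrative assumptions, not details from the patent.

```python
# A minimal sketch of the tuning loop, assuming PyTorch. `operator` and
# `saliency_model` are hypothetical nn.Modules; the loss form is an assumption.
import torch

def tune_operator(operator, saliency_model, raw_image, mask,
                  target_saliency=0.0, steps=100, lr=1e-2):
    """Adjust the operator's parameters so saliency inside `mask` moves toward
    `target_saliency` (a low target de-emphasizes a distractor)."""
    saliency_model.eval()
    for p in saliency_model.parameters():   # saliency model stays frozen
        p.requires_grad_(False)
    optim = torch.optim.Adam(operator.parameters(), lr=lr)
    for _ in range(steps):
        edited = operator(raw_image, mask)   # processed image data
        saliency = saliency_model(edited)    # saliency map over the edit
        # Saliency loss: compare saliency within the region of interest
        # (mask assumed binary) to the target saliency value.
        loss = (((saliency - target_saliency) * mask) ** 2).sum() / mask.sum()
        optim.zero_grad()
        loss.backward()
        optim.step()
    return operator
```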

    Analysis and visualization of subtle motions in videos

    Publication Number: US11526996B2

    Publication Date: 2022-12-13

    Application Number: US17055831

    Filing Date: 2019-06-20

    Applicant: GOOGLE LLC

    Abstract: Example embodiments allow for fast, efficient motion-magnification of video streams by decomposing image frames of the video stream into local phase information at multiple spatial scales and/or orientations. The phase information for each image frame is then scaled to magnify local motion and the scaled phase information is transformed back into image frames to generate a motion-magnified video stream. Scaling of the phase information can include temporal filtering of the phase information across image frames, for example, to magnify motion at a particular frequency. In some embodiments, temporal filtering of phase information at a frequency of breathing, cardiovascular pulse, or some other process of interest allows for motion-magnification of motions within the video stream corresponding to the breathing or the other particular process of interest. The phase information can also be used to determine time-varying motion signals corresponding to motions of interest within the video stream.
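
    The core of the technique is to extract local phase per frame, band-pass it over time at the frequency of interest, amplify it, and reconstruct. The sketch below illustrates this for a single sub-band using NumPy/SciPy; the filter design and reconstruction are simplified assumptions, whereas the patent describes a full multi-scale, multi-orientation decomposition.

```python
# A single-scale, single-orientation sketch of phase-based magnification,
# assuming NumPy/SciPy; the patent's multi-scale, multi-orientation
# decomposition is replaced here by one frequency-domain sub-band.
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_motion(frames, fs, f_lo, f_hi, alpha=10.0):
    """frames: (T, H, W) grayscale video; fs: frame rate (Hz); [f_lo, f_hi]:
    temporal band to magnify (e.g., around a breathing or pulse frequency)."""
    T, H, W = frames.shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    r = np.hypot(fx, fy)
    # One radial band restricted to positive horizontal frequencies so the
    # spatial coefficients are complex and carry a meaningful local phase.
    band = np.exp(-((r - 0.25) ** 2) / (2 * 0.1 ** 2)) * (fx > 0)
    coeffs = np.fft.ifft2(np.fft.fft2(frames) * band)   # complex sub-band
    phase = np.angle(coeffs)
    # Temporal band-pass of the phase isolates motion at the target frequency.
    b, a = butter(2, [f_lo / (fs / 2), f_hi / (fs / 2)], btype="band")
    delta = filtfilt(b, a, np.unwrap(phase, axis=0), axis=0)
    magnified = np.abs(coeffs) * np.exp(1j * (phase + alpha * delta))
    # Add the (approximately reconstructed) amplified band back to the video.
    return frames + 2.0 * np.real(magnified - coeffs)
```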

    Re-Timing Objects in Video Via Layered Neural Rendering

    Publication Number: US20230206955A1

    Publication Date: 2023-06-29

    Application Number: US17927101

    Filing Date: 2020-05-22

    Applicant: Google LLC

    CPC classification number: G11B27/005 G06V10/82 G06V20/46 G11B27/031

    Abstract: A computer-implemented method for decomposing videos into multiple layers (212, 213) that can be re-combined with modified relative timings includes obtaining video data including a plurality of image frames (201) depicting one or more objects. For each of the plurality of frames, the computer-implemented method includes generating one or more object maps descriptive of a respective location of at least one object of the one or more objects within the image frame. For each of the plurality of frames, the computer-implemented method includes inputting the image frame and the one or more object maps into a machine-learned layer renderer model (220). For each of the plurality of frames, the computer-implemented method includes receiving, as output from the machine-learned layer renderer model, a background layer illustrative of a background of the video data and one or more object layers respectively associated with one of the one or more object maps. The object layers include image data illustrative of the at least one object and one or more trace effects at least partially attributable to the at least one object such that the one or more object layers and the background layer can be re-combined with modified relative timings.
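
    Once the background and per-object layers are predicted, retiming reduces to shifting each object layer in time and alpha-compositing the stack back together. A minimal sketch of that recomposition step follows; the RGBA layer shapes and premultiplied-alpha convention are assumptions for illustration, not details from the patent.

```python
# A minimal sketch of re-compositing predicted layers with per-object time
# offsets, assuming NumPy and RGBA layers with premultiplied alpha; the array
# shapes and function names are illustrative, not from the patent.
import numpy as np

def retime_composite(background, object_layers, offsets):
    """background: (T, H, W, 3); object_layers: list of (T, H, W, 4) RGBA
    layers; offsets: per-layer frame shifts (positive delays the object)."""
    T = background.shape[0]
    out = background.copy()
    for layer, dt in zip(object_layers, offsets):
        # Shift this object's layer in time, clamping at the video boundaries.
        idx = np.clip(np.arange(T) - dt, 0, T - 1)
        shifted = layer[idx]
        rgb, alpha = shifted[..., :3], shifted[..., 3:4]
        # "Over" compositing: trace effects the model attributed to the object
        # (shadows, reflections) live in its layer and move with it.
        out = rgb + (1.0 - alpha) * out
    return out
```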

    Audio-visual speech separation
    Invention Grant

    Publication Number: US11456005B2

    Publication Date: 2022-09-27

    Application Number: US16761707

    Filing Date: 2018-11-21

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
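
    The data flow in the abstract (per-speaker visual features, an audio embedding, fusion, and per-speaker spectrogram masks) can be sketched as a small PyTorch module. Layer types and sizes below are assumptions; the abstract specifies only the sequence of steps.

```python
# A schematic sketch of the audio-visual fusion, assuming PyTorch; all layer
# choices (Conv1d streams, BLSTM fusion, sigmoid masks) are assumptions.
import torch
import torch.nn as nn

class AVSeparator(nn.Module):
    def __init__(self, face_dim=1024, spec_bins=257, hidden=256, speakers=2):
        super().__init__()
        self.visual = nn.Conv1d(face_dim, hidden, kernel_size=5, padding=2)
        self.audio = nn.Conv1d(spec_bins, hidden, kernel_size=5, padding=2)
        self.fusion = nn.LSTM(hidden * (speakers + 1), hidden,
                              batch_first=True, bidirectional=True)
        self.mask_head = nn.Linear(2 * hidden, speakers * spec_bins)
        self.speakers, self.spec_bins = speakers, spec_bins

    def forward(self, face_embs, spectrogram):
        """face_embs: (B, speakers, T, face_dim) per-frame face embeddings;
        spectrogram: (B, T, spec_bins) magnitude spectrogram of the mixture."""
        B, S, T, _ = face_embs.shape
        # Visual features per speaker from the per-frame face embeddings.
        v = [self.visual(face_embs[:, s].transpose(1, 2)).transpose(1, 2)
             for s in range(S)]
        # Audio embedding for the soundtrack's spectrogram.
        a = self.audio(spectrogram.transpose(1, 2)).transpose(1, 2)
        av = torch.cat(v + [a], dim=-1)          # audio-visual embedding
        h, _ = self.fusion(av)
        masks = torch.sigmoid(self.mask_head(h))
        masks = masks.view(B, T, self.speakers, self.spec_bins)
        # Isolated speech spectrogram per speaker: mask applied to the mixture.
        return masks.permute(0, 2, 1, 3) * spectrogram.unsqueeze(1)
```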

    Audio-Visual Speech Separation

    Invention Application

    Publication Number: US20200335121A1

    Publication Date: 2020-10-22

    Application Number: US16761707

    Filing Date: 2018-11-21

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
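
    Complementing the sketch above, the final step, turning a predicted spectrogram mask into isolated speech, can be illustrated with an STFT round trip. SciPy and the reuse of the mixture phase are assumptions for illustration; the abstract stops at the isolated spectrogram.

```python
# Recovering an isolated waveform from one speaker's predicted mask, assuming
# a SciPy STFT front end; the patent does not mandate a specific transform.
import numpy as np
from scipy.signal import stft, istft

def isolate_speaker(mixture, mask, fs=16000, nperseg=512):
    """mixture: 1-D audio signal; mask: (freq_bins, frames) values in [0, 1]
    for one speaker, matching the STFT grid (nperseg // 2 + 1 bins)."""
    _, _, spec = stft(mixture, fs=fs, nperseg=nperseg)
    # Apply the speaker's mask to the mixture spectrogram; the mixture phase
    # is reused, a common simplification when only magnitude is masked.
    isolated = mask * spec
    _, speech = istft(isolated, fs=fs, nperseg=nperseg)
    return speech
```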
