-
Publication No.: US20230086766A1
Publication date: 2023-03-23
Application No.: US17933631
Filing date: 2022-09-20
Applicant: Google LLC
Inventor: Alex Olwal , Ruofei Du
IPC: H04N21/442 , H04N21/431 , G06T19/00 , G06F3/01
Abstract: Systems and methods are related to tracking an attention of a user with respect to content presented on a virtual screen, detecting a defocus event associated with a first region of the content, and determining a next focus event associated with a second region of the content. The determination can be based at least in part on the defocus event and on the tracked attention of the user. The systems and methods can include generating, based on the determined next focus event, a marker for differentiating the second region of the content from a remainder of the content, and in response to detecting a refocus event associated with the virtual screen, triggering execution of the marker associated with the second region of the content.
-
Publication No.: US20230051409A1
Publication date: 2023-02-16
Application No.: US17444890
Filing date: 2021-08-11
Applicant: Google LLC
Inventor: Ruofei Du , Alex Olwal
Abstract: According to a general aspect, a method can include receiving a photo of a virtual conference participant, and a depth map based on the photo, and generating a plurality of synthesized images based on the photo. The plurality of synthesized images can have respective simulated gaze directions of the virtual conference participant. The method can also include receiving, during a virtual conference, an indication of a current gaze direction of the virtual conference participant. The method can further include animating, in a display of the virtual conference, an avatar corresponding with the virtual conference participant. The avatar can be based on the photo. Animating the avatar can be based on the photo, the depth map and at least one synthesized image of the plurality of synthesized images, the at least one synthesized image corresponding with the current gaze direction.
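The step of picking "at least one synthesized image corresponding with the current gaze direction" can be read as a nearest-neighbor lookup over the pre-generated gaze directions. The following is a minimal sketch under that reading, not the method claimed in the application: the (yaw, pitch) representation in degrees, the angular-distance metric, and the names `gaze_to_vector`, `angular_distance`, and `closest_synthesized_image` are all assumptions made for illustration.

```python
import math


def gaze_to_vector(yaw_deg, pitch_deg):
    """Convert a (yaw, pitch) gaze direction in degrees to a unit 3D vector."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def angular_distance(gaze_a, gaze_b):
    """Angle in radians between two (yaw, pitch) gaze directions."""
    va, vb = gaze_to_vector(*gaze_a), gaze_to_vector(*gaze_b)
    dot = sum(x * y for x, y in zip(va, vb))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))


def closest_synthesized_image(synthesized, current_gaze):
    """Return the image whose simulated gaze is nearest to current_gaze.

    `synthesized` is a hypothetical dict mapping (yaw, pitch) -> image handle.
    """
    best = min(synthesized, key=lambda g: angular_distance(g, current_gaze))
    return synthesized[best]


# Hypothetical set of pre-generated images, keyed by simulated gaze direction.
synthesized = {
    (-20, 0): "left",
    (0, 0): "center",
    (20, 0): "right",
    (0, 15): "up",
}
selected = closest_synthesized_image(synthesized, (14, 2))
```

A real system would presumably blend between neighboring images or warp using the depth map rather than snap to a single image; the lookup only illustrates how a current gaze indication could index into the synthesized set.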
-
Publication No.: US20250094137A1
Publication date: 2025-03-20
Application No.: US18468025
Filing date: 2023-09-15
Applicant: Google LLC
Inventor: Ruofei Du , Zhongyi Zhou
Abstract: A visual programming platform can leverage a machine learning-based coding system to generate an initial set of programming-language code for further graphical editing by a human user. As an example, the visual programming platform can obtain a natural language description of a task to be performed by a computational pipeline. The visual programming platform can process the natural language description of the task with a machine learning coding system that includes one or more machine-learned models to generate, as an output of the machine learning coding system, a set of pseudocode that describes performance of the task. The platform can process the set of pseudocode that describes performance of the task with a compiler to generate a set of programming-language code that defines the computational pipeline for performing the task. The visual programming platform can generate a graphical visualization of the computational pipeline defined by the set of programming-language code.
-
Publication No.: US20250054246A1
Publication date: 2025-02-13
Application No.: US18707075
Filing date: 2022-10-14
Applicant: Google LLC
Inventor: Ruofei Du , Alex Olwal
Abstract: A user can interact with sounds and speech in an environment using an augmented reality device. The augmented reality device can be configured to identify objects in the environment and display messages beside the object that are related to sounds produced by the object. For example, the messages may include sound statistics, transcripts of speech, and/or sound detection events. The disclosed approach enables a user to interact with these messages using a gaze and a gesture.
-
Publication No.: US20250045968A1
Publication date: 2025-02-06
Application No.: US18570562
Filing date: 2021-06-16
Applicant: Google LLC
Inventor: Onur G. Guleryuz , Ruofei Du , Hugues H. Hoppe , Sean Ryan Francesco Fanello , Philip Andrew Chou , Danhang Tang , Philip Davidson
Abstract: Nonlinear peri-codec optimization for image and video coding includes obtaining a source image including pixel values expressed in a first defined image sample space, and generating a neuralized image representing the source image, the neuralized image including pixel values that are expressed as neural latent space values. The input image is encoded, wherein the neural latent space values are used as pixel values in a second defined image sample space and the input image is in an operative image format of the encoder. A decoder decodes the encoded image to obtain a reconstructed image in the second defined image sample space, wherein the reconstructed image is a reconstructed neuralized image including reconstructed neural latent space values. A nonlinear post-codec image processor then obtains, in the first defined image sample space, a deneuralized reconstructed image corresponding to the source image.
-
Publication No.: US20240290025A1
Publication date: 2024-08-29
Application No.: US18588948
Filing date: 2024-02-27
Applicant: GOOGLE LLC
Inventor: Yinda Zhang , Sean Ryan Francesco Fanello , Ziqian Bai , Feitong Tan , Zeng Huang , Kripasindhu Sarkar , Danhang Tang , Di Qiu , Abhimitra Meka , Ruofei Du , Mingsong Dou , Sergio Orts Escolano , Rohit Kumar Pandey , Thabo Beeler
CPC classification number: G06T13/40 , G06T7/90 , G06T17/20 , G06V10/44 , G06T2207/10024 , G06T2207/20084
Abstract: A method comprises receiving a first sequence of images of a portion of a user, the first sequence of images being monocular images; generating an avatar based on the first sequence of images, the avatar being based on a model including a feature vector associated with a vertex; receiving a second sequence of images of the portion of the user; and based on the second sequence of images, modifying the avatar with a displacement of the vertex to represent a gesture of the avatar.
-
Publication No.: US20240212325A1
Publication date: 2024-06-27
Application No.: US18596822
Filing date: 2024-03-06
Applicant: Google LLC
Inventor: Yinda Zhang , Feitong Tan , Danhang Tang , Mingsong Dou , Kaiwen Guo , Sean Ryan Francesco Fanello , Sofien Bouaziz , Cem Keskin , Ruofei Du , Rohit Kumar Pandey , Deqing Sun
IPC: G06V10/771 , G06T7/70 , G06T17/00 , G06V10/44 , G06V10/75
CPC classification number: G06V10/771 , G06T7/70 , G06T17/00 , G06V10/44 , G06V10/751 , G06T2207/20081 , G06T2207/20084
Abstract: Systems and methods for training models to predict dense correspondences across images such as human images. A model may be trained using synthetic training data created from one or more 3D computer models of a subject. In addition, one or more geodesic distances derived from the surfaces of one or more of the 3D models may be used to generate one or more loss values, which may in turn be used in modifying the model's parameters during training.
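The abstract does not say how the geodesic distances enter the loss. One plausible form, sketched below purely as an assumption, penalizes each predicted correspondence by how far along the surface it lands from the true vertex. Geodesic distance is approximated here by shortest paths along mesh edges (Dijkstra's algorithm); the adjacency-list mesh representation and the names `geodesic_distances` and `geodesic_loss` are hypothetical.

```python
import heapq


def geodesic_distances(adjacency, source):
    """Approximate geodesic distances from `source` along mesh edges.

    `adjacency` maps a vertex id to a list of (neighbor id, edge length).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # Stale heap entry; a shorter path was already found.
        for v, w in adjacency[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def geodesic_loss(adjacency, predicted, ground_truth):
    """Mean surface distance between predicted and true vertex ids."""
    total = 0.0
    for p, g in zip(predicted, ground_truth):
        # Unreachable vertices are treated as infinitely far away.
        total += geodesic_distances(adjacency, g).get(p, float("inf"))
    return total / len(predicted)


# Toy "mesh": four vertices in a chain with unit-length edges.
chain = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0)],
}
loss = geodesic_loss(chain, predicted=[0, 3], ground_truth=[0, 1])
```

The appeal of a surface-based penalty over a Euclidean one is that two points close in 3D space (for example, a hand resting on a hip) can still be far apart on the body surface. In actual training the distances would need to be precomputed and expressed in a differentiable form, which this sketch does not attempt.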
-
Publication No.: US20230367960A1
Publication date: 2023-11-16
Application No.: US18315113
Filing date: 2023-05-10
Applicant: Google LLC
Inventor: Boris Smus , Vikas Bahirwani , Ruofei Du , Christopher Ross , Alex Olwal
IPC: G06F40/20 , G06F40/166 , G10L15/26
CPC classification number: G06F40/20 , G06F40/166 , G10L15/26
Abstract: A method performed by a computing system comprises generating text from audio data and determining an end portion of the text to include in a summarization of the text based on a length of a portion of the audio data from which the text was generated and which ends with a proposed end portion and a time value associated with the proposed end portion, the proposed end portion including a word from the text.
-
Publication No.: US20230136553A1
Publication date: 2023-05-04
Application No.: US18050329
Filing date: 2022-10-27
Applicant: GOOGLE LLC
Inventor: Alex Olwal , Ruofei Du
Abstract: Smart devices can be configured to collect and share various forms of context data about where a user is located (e.g., location), what a user will be doing (e.g., schedule), and what a user is currently doing (e.g., activity). This context data may be combined with fingerprint data (e.g., biometrics) to help identify the fingerprint data. For example, a location of a user may help associate speech detected at that location with the user. These associations may be stored in a mapping database that can be updated over time to reduce ambiguities in identification. The mappings in the database may be used to train a machine learning model to recognize fingerprints as identities, which may be useful in applications such as speaker identification.
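The idea of a mapping database that is "updated over time to reduce ambiguities" can be illustrated with a simple co-occurrence count: each time a fingerprint is observed, every identity that context data places nearby gets a vote, and repeated observations let one identity pull ahead. This is a minimal sketch under that assumption; the class `FingerprintMapping`, its methods, and the voting scheme are hypothetical and not taken from the application.

```python
from collections import Counter, defaultdict


class FingerprintMapping:
    """Toy mapping database from fingerprint ids to candidate identities."""

    def __init__(self):
        # fingerprint id -> Counter of identities seen alongside it
        self._counts = defaultdict(Counter)

    def observe(self, fingerprint_id, candidate_identities):
        """Record one sighting of a fingerprint with the identities that
        context data (location, schedule, activity) suggests were present."""
        for identity in candidate_identities:
            self._counts[fingerprint_id][identity] += 1

    def most_likely(self, fingerprint_id):
        """Return the identity most often co-observed, or None if unseen."""
        counts = self._counts.get(fingerprint_id)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


mapping = FingerprintMapping()
# First sighting is ambiguous: two people were at the location.
mapping.observe("voice_a", ["alice", "bob"])
# A later sighting with a different companion disambiguates.
mapping.observe("voice_a", ["alice", "carol"])
identity = mapping.most_likely("voice_a")
```

The accumulated (fingerprint, identity) pairs could then serve as labels for training a recognition model, as the abstract describes; that training step is outside the scope of this sketch.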
-
Publication No.: US20230132041A1
Publication date: 2023-04-27
Application No.: US18047494
Filing date: 2022-10-18
Applicant: GOOGLE LLC
Inventor: Alex Olwal , Ruofei Du
Abstract: The disclosed systems and methods correlate user behaviors with audio processing to achieve more accurate conclusions about sounds in a user's environment. These conclusions may, in turn, be used to adjust the way a device, such as AR glasses, operates or responds to the sounds. For example, audio events determined from processing speech can be correlated with behavior events determined by sensing a user to improve a speech-to-text transcript of the speech by separating, or otherwise altering, the text in the transcript by speaker.
-