-
Publication No.: US12277635B1
Publication date: 2025-04-15
Application No.: US18532470
Application date: 2023-12-07
Applicant: Google LLC
Inventor: Harshit Kharbanda, Louis Wang, Christopher James Kelley, Jessica Lee
Abstract: A multimodal search system is described. The system can receive image data from a user device. Additionally, the system can receive a prompt associated with the image data. Moreover, the system can determine, using a computer vision model, a first object in the image data that is associated with the prompt. Furthermore, the system can receive, from the user device, a user indication on whether the image data includes the first object. Subsequently, in response to receiving the user indication, the system can generate a response using a large language model.
-
Publication No.: US20240403362A1
Publication date: 2024-12-05
Application No.: US18326496
Application date: 2023-05-31
Applicant: Google LLC
Inventor: Harshit Kharbanda, Belinda Luna Zeng, Viviana Caso Corella, Aashi Jain, David William Hendon, Christopher James Kelley, Jessica Lee, Dounia Berrada, Kai Yu, Louis Wang, Thomas J. Duerig, Radu Soricut, Robin Dua
IPC: G06F16/735, G06F16/732, G06F16/783, G06T7/70, G06V10/62, G06V10/774, G06V20/40
Abstract: A multimodal search system using a video query is described. The system can receive video data captured by a camera of a user device. The video data can have a sequence of image frames. Additionally, the system can receive audio data associated with the video data captured by the user device. Moreover, the system can process, using one or more machine-learned models, the sequence of image frames to generate video embeddings related to the sequence of the image frames. The video embeddings can have a plurality of image embeddings associated with the sequence of image frames. Furthermore, the system can determine one or more video results based on the video embeddings and the audio data. Subsequently, the system can transmit, to the user device, the one or more video results.
-
Publication No.: US20250087207A1
Publication date: 2025-03-13
Application No.: US18736113
Application date: 2024-06-06
Applicant: Google LLC
Inventor: Harshit Kharbanda, Jessica Lee, Christopher James Kelley, Fabian Roth, Dounia Berrada, Samer Hassan Hassan, Afroz Mohiuddin, Misha Khalman, Ali Essam Ali Elqursh, Belinda Luna Zeng
IPC: G10L15/183, G06F16/583, G06V10/778, G06V30/14, G06V30/148, G10L15/22, G10L15/30
Abstract: The present disclosure provides computer-implemented methods, systems, and devices for responding to requests associated with an image. A computing system obtains an image, wherein the image depicts a first set of textual content. The computing system determines one or more characteristics of the first set of textual content. The computing system determines a response type from a plurality of response types based on the one or more characteristics. The computing system generates a model input, wherein the model input comprises data descriptive of the first set of textual content and a prompt associated with the response type. The computing system provides the model input as an input to a machine-learned language model. The computing system receives a second set of textual content as an output of the machine-learned language model as a result of the machine-learned language model processing the model input. The computing system provides the second set of textual content for display to a user, wherein the second set of textual content is associated with the response type.
-
Publication No.: US20240378237A1
Publication date: 2024-11-14
Application No.: US18314663
Application date: 2023-05-09
Applicant: Google LLC
Inventor: Harshit Kharbanda, Jessica Lee, Christopher James Kelley, Belinda Luna Zeng, Louis Wang
IPC: G06F16/583, G06V10/74
Abstract: Result images are retrieved based on a similarity to a query image. A set of textual inputs is processed with a machine-learned language model to obtain a language output comprising textual content, wherein the set of textual inputs comprises textual content from source documents that include the result images, and a prompt associated with the query image. The language output and the result images are provided to a user computing device. Information descriptive of an indication by a user that a first result image is visually dissimilar to the query image is received. Textual content associated with the source document that includes the first result image is removed from the set of textual inputs. The set of textual inputs is processed with the machine-learned language model to obtain a refined language output. The refined language output is provided to the user computing device.
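Illustrative sketch (not taken from the patent): the refinement step in this abstract can be pictured as dropping the text of the rejected source document and re-running the language model on what remains. The function names, the dictionary keyed by document ID, and the `echo_model` stand-in for the machine-learned language model are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple


def build_inputs(prompt: str, source_texts: Dict[str, str]) -> List[str]:
    """Assemble the set of textual inputs: the prompt, then each source text."""
    return [prompt] + list(source_texts.values())


def refine(
    prompt: str,
    source_texts: Dict[str, str],
    rejected_doc_id: str,
    language_model: Callable[[List[str]], str],
) -> Tuple[str, Dict[str, str]]:
    """Remove the rejected document's text and re-run the model.

    `rejected_doc_id` is a hypothetical identifier for the source document
    whose result image the user marked as visually dissimilar.
    """
    remaining = {
        doc_id: text
        for doc_id, text in source_texts.items()
        if doc_id != rejected_doc_id
    }
    return language_model(build_inputs(prompt, remaining)), remaining


def echo_model(inputs: List[str]) -> str:
    """Stand-in for a language model (assumption): joins its inputs."""
    return " | ".join(inputs)
```

Calling `refine("What plant is this?", sources, "doc_b", echo_model)` with two source documents returns an output built only from the prompt and the text of the remaining document.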
-
Publication No.: US12033620B1
Publication date: 2024-07-09
Application No.: US18463951
Application date: 2023-09-08
Applicant: Google LLC
Inventor: Harshit Kharbanda, Jessica Lee, Christopher James Kelley, Fabian Roth, Dounia Berrada, Samer Hassan Hassan, Afroz Mohiuddin, Mikhail Khalman, Ali Essam Ali Elqursh, Belinda Luna Zeng
IPC: G06F3/0483, G06F16/30, G06F16/33, G06F16/583, G06V10/778, G06V30/14, G06V30/148, G10L15/183, G10L15/22, G10L15/30
CPC classification number: G10L15/183, G06F16/5846, G06V10/778, G06V30/1456, G06V30/153, G10L15/22, G10L15/30
Abstract: The present disclosure provides computer-implemented methods, systems, and devices for responding to requests associated with an image. A computing system obtains an image, wherein the image depicts a first set of textual content. The computing system determines one or more characteristics of the first set of textual content. The computing system determines a response type from a plurality of response types based on the one or more characteristics. The computing system generates a model input, wherein the model input comprises data descriptive of the first set of textual content and a prompt associated with the response type. The computing system provides the model input as an input to a machine-learned language model. The computing system receives a second set of textual content as an output of the machine-learned language model as a result of the machine-learned language model processing the model input. The computing system provides the second set of textual content for display to a user, wherein the second set of textual content is associated with the response type.
-
Publication No.: US20250148782A1
Publication date: 2025-05-08
Application No.: US19015028
Application date: 2025-01-09
Applicant: Google LLC
Inventor: Jessica Lee, Christopher James Kelley, Alok Aggarwal, Harshit Kharbanda
IPC: G06V20/20, G06F16/9535, G06T11/00, G06V10/94
Abstract: Systems and methods for providing scene understanding can include obtaining a plurality of images, stitching images associated with the scene, detecting objects in the scene, and providing information associated with the objects in the scene. The systems and methods can include determining filter tags or query tags that can be selected to filter the plurality of objects, which can then be provided as information to the user to provide further insight on the scene. The information may be provided in an augmented-reality experience via text or other user-interface elements anchored to objects in the images.
-
Publication No.: US12230030B2
Publication date: 2025-02-18
Application No.: US18084710
Application date: 2022-12-20
Applicant: Google LLC
Inventor: Jessica Lee, Christopher James Kelley, Alok Aggarwal, Harshit Kharbanda
IPC: G06V20/20, G06F16/9535, G06T11/00, G06V10/94
Abstract: Systems and methods for providing scene understanding can include obtaining a plurality of images, stitching images associated with the scene, detecting objects in the scene, and providing information associated with the objects in the scene. The systems and methods can include determining filter tags or query tags that can be selected to filter the plurality of objects, which can then be provided as information to the user to provide further insight on the scene. The information may be provided in an augmented-reality experience via text or other user-interface elements anchored to objects in the images.
-
Publication No.: US20240362279A1
Publication date: 2024-10-31
Application No.: US18306638
Application date: 2023-04-25
Applicant: Google LLC
Inventor: Harshit Kharbanda, Belinda Luna Zeng, Viviana Caso Corella, Christopher James Kelley, Jessica Lee, Pendar Yousefi, Dounia Berrada, Sundeep Vaddadi, Kai Yu, Balint Miklos, Severin Heiniger, Louis Wang
IPC: G06F16/9532, G06F16/538, G06F40/40
CPC classification number: G06F16/9532, G06F16/538, G06F40/40
Abstract: A multimodal search system is described. The system can receive image data captured by a camera of a user device. Additionally, the system can receive audio data associated with the image data. The audio data can be captured by a microphone of the user device. Moreover, the system can process the image data to generate visual features. Furthermore, the system can process the audio data to generate a plurality of words. The system can generate a plurality of search terms based on the plurality of words and the visual features. Subsequently, the system can determine one or more search results associated with the plurality of search terms and provide the one or more search results as an output.
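Illustrative sketch (not taken from the patent): one simple way to picture the step of generating search terms from transcribed words and visual features. It assumes the visual features have already been reduced to text labels and that a small stopword set is supplied; both are assumptions for this example, and the function name is hypothetical.

```python
from typing import Iterable, List, Set


def generate_search_terms(
    words: Iterable[str],
    visual_labels: Iterable[str],
    stopwords: Set[str],
) -> List[str]:
    """Merge visual labels and spoken words into de-duplicated search terms.

    Visual labels come first so that what the camera sees anchors the query;
    this ordering is a design choice of the sketch, not of the source.
    """
    terms: List[str] = []
    seen: Set[str] = set()
    for token in list(visual_labels) + list(words):
        term = token.lower()
        if term in stopwords or term in seen:
            continue
        seen.add(term)
        terms.append(term)
    return terms
```

For a spoken query such as "What is this plant" over an image labeled "fern" and "plant", the sketch keeps the visual labels and drops the stopwords and the repeated word.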
-
Publication No.: US20230368527A1
Publication date: 2023-11-16
Application No.: US18084710
Application date: 2022-12-20
Applicant: Google LLC
Inventor: Jessica Lee, Christopher James Kelley, Alok Aggarwal, Harshit Kharbanda
IPC: G06V20/20, G06T11/00, G06V10/94, G06F16/9535
CPC classification number: G06V20/20, G06T11/00, G06V10/945, G06F16/9535, G06T2200/24
Abstract: Systems and methods for providing scene understanding can include obtaining a plurality of images, stitching images associated with the scene, detecting objects in the scene, and providing information associated with the objects in the scene. The systems and methods can include determining filter tags or query tags that can be selected to filter the plurality of objects, which can then be provided as information to the user to provide further insight on the scene. The information may be provided in an augmented-reality experience via text or other user-interface elements anchored to objects in the images.
-
Publication No.: US11303591B2
Publication date: 2022-04-12
Application No.: US16253586
Application date: 2019-01-22
Applicant: Google LLC
Inventor: George Cody Sumter, Christopher James Kelley, Matthew David Tait, Alok Chandel, Shane Riley Brennan
Abstract: Systems, apparatuses, and methods for managing message content are provided. In one embodiment, a method includes receiving, by one or more computing devices, a message comprising audio content and visual media content. The method further includes sending, by the one or more computing devices, a first set of data descriptive of the audio content to an audio device. The audio device is configured to communicate the audio content to a user of the audio device. The method includes sending, by the one or more computing devices, a second set of data descriptive of the visual media content to a display device. The display device is configured to display the visual media content for the user. The method further includes providing, by the one or more computing devices, a notification to the user of the audio device to view the visual media content on the display device.