-
Publication No.: US20240022809A1
Publication date: 2024-01-18
Application No.: US18446381
Filing date: 2023-08-08
Applicant: GOOGLE LLC
Inventors: Felix Weissenberger, Balint Miklos, Victor Carbune, Matthew Sharifi, Domenico Carbotta, Ray Chen, Kevin Fu, Bogdan Prisacari, Fo Lee, Mucun Lu, Neha Garg, Jacopo Sannazzaro Natta, Barbara Poblocka, Jae Seo, Matthew Miao, Thomas Qian, Luv Kothari
IPC classification: H04N23/60, G06N20/00, G10L15/22, G10L25/51, H04N5/92, H04N23/61, H04N23/62, H04N23/66, H04N23/80
CPC classification: H04N23/64, G06N20/00, G10L15/22, G10L25/51, H04N5/9201, H04N23/61, H04N23/62, H04N23/66, H04N23/80, G10L15/1822
Abstract: Implementations set forth herein relate to an automated assistant that can control a camera according to one or more conditions specified by a user. A condition can be satisfied when, for example, the automated assistant detects a particular environment feature is apparent. In this way, the user can rely on the automated assistant to identify and capture certain moments without necessarily requiring the user to constantly monitor a viewing window of the camera. In some implementations, a condition for the automated assistant to capture media data can be based on application data and/or other contextual data that is associated with the automated assistant. For instance, a relationship between content in a camera viewing window and other content of an application interface can be a condition upon which the automated assistant captures certain media data using a camera.
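As an illustration only, the condition-based capture described in this abstract might be sketched as follows. The `Frame` type, the feature labels, and the functions `feature_condition` and `capture_when` are hypothetical names introduced here; the patent record does not specify an implementation, and a real system would obtain the features from a vision model rather than from hand-written sets.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for one camera frame and its detected feature labels."""
    index: int
    features: FrozenSet[str]


def feature_condition(required: Iterable[str]) -> Callable[[Frame], bool]:
    """Build a condition that holds when every required feature is apparent in a frame."""
    required_set = frozenset(required)
    return lambda frame: required_set <= frame.features


def capture_when(frames: Iterable[Frame], condition: Callable[[Frame], bool]) -> List[int]:
    """Return the indices of frames for which the user-specified condition is satisfied."""
    return [frame.index for frame in frames if condition(frame)]


# Example: the user asks for a capture whenever a dog is jumping (assumed labels).
frames = [
    Frame(0, frozenset({"dog"})),
    Frame(1, frozenset({"dog", "jumping"})),
    Frame(2, frozenset({"person"})),
]
captured = capture_when(frames, feature_condition({"dog", "jumping"}))
```

Modelling the condition as a plain predicate keeps the capture loop independent of where the condition comes from (a spoken request, application data, or other context).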
-
Publication No.: US20220366911A1
Publication date: 2022-11-17
Application No.: US17337804
Filing date: 2021-06-03
Applicant: GOOGLE LLC
Inventors: Victor Carbune, Krishna Sapkota, Behshad Behzadi, Julia Proskurnia, Jacopo Sannazzaro Natta, Justin Lu, Magali Boizot-Roche, Márius Sajgalík, Nicolo D'Ercole, Zaheed Sabur, Luv Kothari
Abstract: Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations, without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an application in order to incorporate textual content. However, in some of these instances, certain corresponding arrangements are needed for the textual content in the document. The textual content that is derived from the spoken utterance can be arranged by the application based on an intent, vocalization features, and/or contextual features associated with the spoken utterance and/or a type of the application associated with the document, without the user expressly identifying the corresponding arrangements. In this way, the application can infer content arrangement operations from a spoken utterance that only specifies the textual content.
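A minimal sketch of the idea in this abstract, under stated assumptions: the rules below (a greeting in an email gets a trailing comma and a blank line, other text gets capitalization and terminal punctuation) are invented for illustration, as are the names `GREETINGS` and `arrange`. The patent describes inferring arrangements from intent, vocalization features, and context; this toy version uses only the text and the application type.

```python
# Hypothetical greeting prefixes used to guess that the utterance opens an email.
GREETINGS = ("hi ", "hello ", "dear ")


def arrange(text: str, app_type: str) -> str:
    """Infer simple arrangement operations for dictated text (illustrative rules only)."""
    content = text.strip()
    if not content:
        return ""
    # Capitalize the first character without altering the rest of the text.
    content = content[0].upper() + content[1:]
    # Assumed rule: an email greeting is followed by a comma and a blank line.
    if app_type == "email" and content.lower().startswith(GREETINGS):
        return content.rstrip(",") + ",\n\n"
    # Assumed rule: other content is closed with a period if it has no end punctuation.
    if content[-1] not in ".!?":
        content += "."
    return content + " "


greeting = arrange("hi Adam", "email")
message = arrange("see you at noon", "text_message")
```

The same utterance is arranged differently depending on the application type, which is the point the abstract makes about not requiring the user to dictate the formatting explicitly.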
-
Publication No.: US12106758B2
Publication date: 2024-10-01
Application No.: US17322765
Filing date: 2021-05-17
Applicant: GOOGLE LLC
Inventors: Victor Carbune, Alvin Abdagic, Behshad Behzadi, Jacopo Sannazzaro Natta, Julia Proskurnia, Krzysztof Andrzej Goj, Srikanth Pandiri, Viesturs Zarins, Nicolo D'Ercole, Zaheed Sabur, Luv Kothari
CPC classification: G10L15/26, G06F3/0488, G06N20/00, G10L15/18, G10L15/22, G10L2015/223
Abstract: Systems and methods described herein relate to determining whether to incorporate recognized text, that corresponds to a spoken utterance of a user of a client device, into a transcription displayed at the client device, or to cause an assistant command, that is associated with the transcription and that is based on the recognized text, to be performed by an automated assistant implemented by the client device. The spoken utterance is received during a dictation session between the user and the automated assistant. Implementations can process, using automatic speech recognition model(s), audio data that captures the spoken utterance to generate the recognized text. Further, implementations can determine whether to incorporate the recognized text into the transcription or cause the assistant command to be performed based on touch input being directed to the transcription, a state of the transcription, and/or audio-based characteristic(s) of the spoken utterance.
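The decision described in this abstract can be pictured with a small, hedged sketch. Everything specific here is an assumption made for illustration: the `Signals` fields, the `COMMANDS` set, the one-second pause threshold, and the way the three signals are combined. The patent record names the kinds of signals (touch input, transcription state, audio-based characteristics) but not how they are weighed.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical signals available when an utterance arrives during dictation."""
    touch_on_transcription: bool   # user is currently touching/editing the transcription
    transcription_nonempty: bool   # state of the transcription
    pause_before_ms: int           # audio-based characteristic: silence before the utterance


# Assumed set of phrases that could be read as assistant commands.
COMMANDS = {"send it", "delete that", "clear"}
LONG_PAUSE_MS = 1000  # assumed threshold


def decide(recognized_text: str, signals: Signals) -> str:
    """Return "command" or "incorporate" for the recognized text (illustrative heuristic)."""
    phrase = recognized_text.strip().lower().rstrip(".!?")
    if phrase not in COMMANDS:
        return "incorporate"
    # Assumption: touch input on the transcription suggests the user is still editing text.
    if signals.touch_on_transcription:
        return "incorporate"
    # Assumption: a command is plausible only after a pause and with text already present.
    if signals.transcription_nonempty and signals.pause_before_ms >= LONG_PAUSE_MS:
        return "command"
    return "incorporate"


after_pause = decide("Send it.", Signals(False, True, 1200))
mid_sentence = decide("send it", Signals(False, True, 100))
```

Defaulting to "incorporate" in ambiguous cases is a design choice of this sketch: wrongly inserting text is usually easier to undo than wrongly sending or deleting a message.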
-
Publication No.: US12033637B2
Publication date: 2024-07-09
Application No.: US17337804
Filing date: 2021-06-03
Applicant: GOOGLE LLC
Inventors: Victor Carbune, Krishna Sapkota, Behshad Behzadi, Julia Proskurnia, Jacopo Sannazzaro Natta, Justin Lu, Magali Boizot-Roche, Márius Šajgalík, Nicolo D'Ercole, Zaheed Sabur, Luv Kothari
CPC classification: G10L15/26, G10L15/22, G10L2015/223
Abstract: Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations, without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an application in order to incorporate textual content. However, in some of these instances, certain corresponding arrangements are needed for the textual content in the document. The textual content that is derived from the spoken utterance can be arranged by the application based on an intent, vocalization features, and/or contextual features associated with the spoken utterance and/or a type of the application associated with the document, without the user expressly identifying the corresponding arrangements. In this way, the application can infer content arrangement operations from a spoken utterance that only specifies the textual content.
-
Publication No.: US20230156322A1
Publication date: 2023-05-18
Application No.: US18097150
Filing date: 2023-01-13
Applicant: GOOGLE LLC
Inventors: Felix Weissenberger, Balint Miklos, Victor Carbune, Matthew Sharifi, Domenico Carbotta, Ray Chen, Kevin Fu, Bogdan Prisacari, Fo Lee, Mucun Lu, Neha Garg, Jacopo Sannazzaro Natta, Barbara Poblocka, Jae Seo, Matthew Miao, Thomas Qian, Luv Kothari
IPC classification: H04N23/60, G06N20/00, G10L15/22, G10L25/51, H04N5/92, H04N23/61, H04N23/62, H04N23/66, H04N23/80
CPC classification: H04N23/64, G06N20/00, G10L15/22, G10L25/51, H04N5/9201, H04N23/61, H04N23/62, H04N23/66, H04N23/80, G10L15/1822
Abstract: Implementations set forth herein relate to an automated assistant that can control a camera according to one or more conditions specified by a user. A condition can be satisfied when, for example, the automated assistant detects a particular environment feature is apparent. In this way, the user can rely on the automated assistant to identify and capture certain moments without necessarily requiring the user to constantly monitor a viewing window of the camera. In some implementations, a condition for the automated assistant to capture media data can be based on application data and/or other contextual data that is associated with the automated assistant. For instance, a relationship between content in a camera viewing window and other content of an application interface can be a condition upon which the automated assistant captures certain media data using a camera.
-
Publication No.: US20220366910A1
Publication date: 2022-11-17
Application No.: US17322765
Filing date: 2021-05-17
Applicant: GOOGLE LLC
Inventors: Victor Carbune, Alvin Abdagic, Behshad Behzadi, Jacopo Sannazzaro Natta, Julia Proskurnia, Krzysztof Andrzej Goj, Srikanth Pandiri, Viesturs Zarins, Nicolo D'Ercole, Zaheed Sabur, Luv Kothari
IPC classification: G10L15/26, G10L15/22, G10L15/18, G06F3/0488, G06N20/00
Abstract: Systems and methods described herein relate to determining whether to incorporate recognized text, that corresponds to a spoken utterance of a user of a client device, into a transcription displayed at the client device, or to cause an assistant command, that is associated with the transcription and that is based on the recognized text, to be performed by an automated assistant implemented by the client device. The spoken utterance is received during a dictation session between the user and the automated assistant. Implementations can process, using automatic speech recognition model(s), audio data that captures the spoken utterance to generate the recognized text. Further, implementations can determine whether to incorporate the recognized text into the transcription or cause the assistant command to be performed based on touch input being directed to the transcription, a state of the transcription, and/or audio-based characteristic(s) of the spoken utterance.
-
Publication No.: US20220166919A1
Publication date: 2022-05-26
Application No.: US17103805
Filing date: 2020-11-24
Applicant: Google LLC
Inventors: Felix Weissenberger, Balint Miklos, Victor Carbune, Matthew Sharifi, Domenico Carbotta, Ray Chen, Kevin Fu, Bogdan Prisacari, Fo Lee, Mucun Lu, Neha Garg, Jacopo Sannazzaro Natta, Barbara Poblocka, Jae Seo, Matthew Miao, Thomas Qian, Luv Kothari
Abstract: Implementations set forth herein relate to an automated assistant that can control a camera according to one or more conditions specified by a user. A condition can be satisfied when, for example, the automated assistant detects a particular environment feature is apparent. In this way, the user can rely on the automated assistant to identify and capture certain moments without necessarily requiring the user to constantly monitor a viewing window of the camera. In some implementations, a condition for the automated assistant to capture media data can be based on application data and/or other contextual data that is associated with the automated assistant. For instance, a relationship between content in a camera viewing window and other content of an application interface can be a condition upon which the automated assistant captures certain media data using a camera.
-
Publication No.: US20240321277A1
Publication date: 2024-09-26
Application No.: US18677629
Filing date: 2024-05-29
Applicant: GOOGLE LLC
Inventors: Victor Carbune, Krishna Sapkota, Behshad Behzadi, Julia Proskurnia, Jacopo Sannazzaro Natta, Justin Lu, Magali Boizot-Roche, Marius Sajgalik, Nicolo D'Ercole, Zaheed Sabur, Luv Kothari
CPC classification: G10L15/26, G10L15/22, G10L2015/223
Abstract: Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations, without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an application in order to incorporate textual content. However, in some of these instances, certain corresponding arrangements are needed for the textual content in the document. The textual content that is derived from the spoken utterance can be arranged by the application based on an intent, vocalization features, and/or contextual features associated with the spoken utterance and/or a type of the application associated with the document, without the user expressly identifying the corresponding arrangements. In this way, the application can infer content arrangement operations from a spoken utterance that only specifies the textual content.
-
Publication No.: US12052492B2
Publication date: 2024-07-30
Application No.: US18446381
Filing date: 2023-08-08
Applicant: GOOGLE LLC
Inventors: Felix Weissenberger, Balint Miklos, Victor Carbune, Matthew Sharifi, Domenico Carbotta, Ray Chen, Kevin Fu, Bogdan Prisacari, Fo Lee, Mucun Lu, Neha Garg, Jacopo Sannazzaro Natta, Barbara Poblocka, Jae Seo, Matthew Miao, Thomas Qian, Luv Kothari
IPC classification: H04N23/60, G06N20/00, G10L15/18, G10L15/22, G10L25/51, H04N5/92, H04N23/61, H04N23/62, H04N23/66, H04N23/80
CPC classification: H04N23/64, G06N20/00, G10L15/22, G10L25/51, H04N5/9201, H04N23/61, H04N23/62, H04N23/66, H04N23/80, G10L15/1822, G10L2015/223
Abstract: Implementations set forth herein relate to an automated assistant that can control a camera according to one or more conditions specified by a user. A condition can be satisfied when, for example, the automated assistant detects a particular environment feature is apparent. In this way, the user can rely on the automated assistant to identify and capture certain moments without necessarily requiring the user to constantly monitor a viewing window of the camera. In some implementations, a condition for the automated assistant to capture media data can be based on application data and/or other contextual data that is associated with the automated assistant. For instance, a relationship between content in a camera viewing window and other content of an application interface can be a condition upon which the automated assistant captures certain media data using a camera.
-
Publication No.: US11765452B2
Publication date: 2023-09-19
Application No.: US18097150
Filing date: 2023-01-13
Applicant: GOOGLE LLC
Inventors: Felix Weissenberger, Balint Miklos, Victor Carbune, Matthew Sharifi, Domenico Carbotta, Ray Chen, Kevin Fu, Bogdan Prisacari, Fo Lee, Mucun Lu, Neha Garg, Jacopo Sannazzaro Natta, Barbara Poblocka, Jae Seo, Matthew Miao, Thomas Qian, Luv Kothari
IPC classification: H04N23/60, G06N20/00, G10L15/22, G10L25/51, H04N5/92, H04N23/61, H04N23/62, H04N23/66, H04N23/80, G10L15/18
CPC classification: H04N23/64, G06N20/00, G10L15/22, G10L25/51, H04N5/9201, H04N23/61, H04N23/62, H04N23/66, H04N23/80, G10L15/1822, G10L2015/223
Abstract: Implementations set forth herein relate to an automated assistant that can control a camera according to one or more conditions specified by a user. A condition can be satisfied when, for example, the automated assistant detects a particular environment feature is apparent. In this way, the user can rely on the automated assistant to identify and capture certain moments without necessarily requiring the user to constantly monitor a viewing window of the camera. In some implementations, a condition for the automated assistant to capture media data can be based on application data and/or other contextual data that is associated with the automated assistant. For instance, a relationship between content in a camera viewing window and other content of an application interface can be a condition upon which the automated assistant captures certain media data using a camera.