-
11.
Publication No.: US12266359B2
Publication date: 2025-04-01
Application No.: US17902560
Filing date: 2022-09-02
Applicant: GOOGLE LLC
Inventor: Nicolo D'Ercole, Shumin Zhai, Swante Scholz, Mehek Sharma, Adrien Olczak, Akshay Kannan, Alvin Abdagic, Julia Proskurnia, Viesturs Zarins
IPC: G10L15/22, G06F16/683, G10L15/08
Abstract: Implementations described herein generally relate to generating a modification selectable element that may be provided for presentation to a user in a smart dictation session with an automated assistant. The modification selectable element may, when selected, cause a transcription, that includes textual data generated based on processing audio data that captures a spoken utterance and that is automatically arranged, to be modified. The transcription may be automatically arranged to include spacing, punctuation, capitalization, indentations, paragraph breaks, and/or other arrangement operations that are not specified by the user in providing the spoken utterance. Accordingly, a subsequent selection of the modification selectable element may cause these automatic arrangement operation(s), and/or the textual data locationally proximate to these automatic arrangement operation(s), to be modified. Implementations described herein also relate to generating the transcription and/or the modification selectable element on behalf of a third-party software application.
-
12.
Publication No.: US20240420699A1
Publication date: 2024-12-19
Application No.: US18815252
Filing date: 2024-08-26
Applicant: GOOGLE LLC
Inventor: Victor Carbune, Alvin Abdagic, Behshad Behzadi, Jacopo Sannazzaro Natta, Julia Proskurnia, Krzysztof Andrzej Goj, Srikanth Pandiri, Viesturs Zarins, Nicolo D'Ercole, Zaheed Sabur, Luv Kothari
IPC: G10L15/26, G06F3/0488, G06N20/00, G10L15/18, G10L15/22
Abstract: Systems and methods described herein relate to determining whether to incorporate recognized text, that corresponds to a spoken utterance of a user of a client device, into a transcription displayed at the client device, or to cause an assistant command, that is associated with the transcription and that is based on the recognized text, to be performed by an automated assistant implemented by the client device. The spoken utterance is received during a dictation session between the user and the automated assistant. Implementations can process, using automatic speech recognition model(s), audio data that captures the spoken utterance to generate the recognized text. Further, implementations can determine whether to incorporate the recognized text into the transcription or cause the assistant command to be performed based on touch input being directed to the transcription, a state of the transcription, and/or audio-based characteristic(s) of the spoken utterance.
-
13.
Publication No.: US20240321277A1
Publication date: 2024-09-26
Application No.: US18677629
Filing date: 2024-05-29
Applicant: GOOGLE LLC
Inventor: Victor Carbune, Krishna Sapkota, Behshad Behzadi, Julia Proskurnia, Jacopo Sannazzaro Natta, Justin Lu, Magali Boizot-Roche, Marius Sajgalik, Nicolo D'Ercole, Zaheed Sabur, Luv Kothari
CPC classification number: G10L15/26, G10L15/22, G10L2015/223
Abstract: Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations—without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an application in order to incorporate textual content. However, in some of these instances, certain corresponding arrangements are needed for the textual content in the document. The textual content that is derived from the spoken utterance can be arranged by the application based on an intent, vocalization features, and/or contextual features associated with the spoken utterance and/or a type of the application associated with the document, without the user expressly identifying the corresponding arrangements. In this way, the application can infer content arrangement operations from a spoken utterance that only specifies the textual content.
-
14.
Publication No.: US20240029728A1
Publication date: 2024-01-25
Application No.: US17902560
Filing date: 2022-09-02
Applicant: GOOGLE LLC
Inventor: Nicolo D'Ercole, Shumin Zhai, Swante Scholz, Mehek Sharma, Adrien Olczak, Akshay Kannan, Alvin Abdagic, Julia Proskurnia, Viesturs Zarins
IPC: G10L15/22, G10L15/08, G06F16/683
CPC classification number: G10L15/22, G10L15/08, G06F16/685
Abstract: Implementations described herein generally relate to generating a modification selectable element that may be provided for presentation to a user in a smart dictation session with an automated assistant. The modification selectable element may, when selected, cause a transcription, that includes textual data generated based on processing audio data that captures a spoken utterance and that is automatically arranged, to be modified. The transcription may be automatically arranged to include spacing, punctuation, capitalization, indentations, paragraph breaks, and/or other arrangement operations that are not specified by the user in providing the spoken utterance. Accordingly, a subsequent selection of the modification selectable element may cause these automatic arrangement operation(s), and/or the textual data locationally proximate to these automatic arrangement operation(s), to be modified. Implementations described herein also relate to generating the transcription and/or the modification selectable element on behalf of a third-party software application.
-