-
Publication (Announcement) No.: US20230360633A1
Publication (Announcement) Date: 2023-11-09
Application No.: US18120748
Application Date: 2023-03-13
Applicant: Amazon Technologies, Inc.
Inventor: Kevin Crews, Prasanna H. Sridhar, Ariya Rastrow, Nicholas Matthew Jutila, Andrew Oberlin, Samarth Batra, Paul Anthony Bernhardt, Veerdhawal Pande, Roland Maximilian Rolf Maas
IPC: G10L13/08, G10L13/04, G10L15/187
CPC classification number: G10L13/08, G10L13/04, G10L15/187, G10L2015/025
Abstract: Techniques for an interactive turn-based reading experience are described. A system may take turns reading content, such as a book, with a user. The system may process audio data representing a user reading a portion of the content, determine reading evaluation data, and determine how to proceed for the next turn based on the reading evaluation data. For example, based on the reading evaluation data, the system may read a portion of the content by outputting synthesized speech representing the content, may ask the user to re-read a portion of the content, or may ask the user to read a different, smaller portion of the content.
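Illustrative sketch (not part of the patent record): the abstract names three possible next turns but does not say how the reading evaluation data selects among them. The Python below is a minimal sketch of one possible policy. The `accuracy` and `attempts` fields, the threshold values, and all names are hypothetical assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class NextTurn(Enum):
    """The three next-turn outcomes named in the abstract."""
    SYSTEM_READS = "system_reads"              # system outputs synthesized speech
    USER_REREADS = "user_rereads"              # user re-reads the same portion
    USER_READS_SMALLER = "user_reads_smaller"  # user reads a different, smaller portion


@dataclass
class ReadingEvaluation:
    # Hypothetical fields; the abstract does not define the evaluation data.
    accuracy: float  # assumed fraction of words read correctly, 0.0 to 1.0
    attempts: int    # assumed number of attempts on the current portion


def choose_next_turn(ev, pass_threshold=0.9, max_attempts=2):
    """Pick the next turn from evaluation data (thresholds are assumptions)."""
    if ev.accuracy >= pass_threshold:
        return NextTurn.SYSTEM_READS
    if ev.attempts < max_attempts:
        return NextTurn.USER_REREADS
    return NextTurn.USER_READS_SMALLER
```

Under these assumed thresholds, a reader who scores well hands the turn to the system, a first weak attempt prompts a re-read, and repeated weak attempts shrink the portion.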
-
Publication (Announcement) No.: US20240029743A1
Publication (Announcement) Date: 2024-01-25
Application No.: US18206231
Application Date: 2023-06-06
Applicant: Amazon Technologies, Inc.
Inventor: Stanislaw Ignacy Pasko, Pawel Zelazko, Cagdas Bak, Eli Joshua Fidler, Michal Kowalczuk, Andrew Oberlin, Ariya Rastrow
IPC: G10L17/26, G10L15/183, G10L15/34, G10L15/22
CPC classification number: G10L17/26, G10L15/183, G10L15/34, G10L15/22
Abstract: Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
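Illustrative sketch (not part of the patent record): the abstract describes an intermediate data generator whose output carries a first-pass ASR result and audio characteristics, so a second device can continue processing without receiving the audio. The Python below is a minimal sketch of that idea. The record layout, the JSON serialization, and the three injected callables are hypothetical assumptions; the abstract does not specify any of them.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class IntermediateData:
    # Hypothetical layout; the abstract only lists kinds of content.
    lattice: list           # assumed (word, score) hypotheses from first-pass ASR
    whisper_detected: bool  # assumed whisper-detection flag
    speaker_id: str         # assumed speaker-identification label


def build_intermediate_data(audio, first_pass_asr, detect_whisper, identify_speaker):
    """Derive the intermediate record on-device and serialize it.

    The raw audio is passed only to the local analysis callables and is
    never copied into the serialized payload.
    """
    record = IntermediateData(
        lattice=first_pass_asr(audio),
        whisper_detected=detect_whisper(audio),
        speaker_id=identify_speaker(audio),
    )
    return json.dumps(asdict(record))
```

Passing the analysis steps in as callables keeps the sketch independent of any particular ASR or speaker-identification library; the only point illustrated is that the payload leaving the first device holds derived data rather than audio.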
-
Publication (Announcement) No.: US11721347B1
Publication (Announcement) Date: 2023-08-08
Application No.: US17362301
Application Date: 2021-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Stanislaw Ignacy Pasko, Pawel Zelazko, Cagdas Bak, Eli Joshua Fidler, Michal Kowalczuk, Andrew Oberlin, Ariya Rastrow
IPC: G10L17/26, G10L15/183, G10L15/34, G10L15/22
CPC classification number: G10L17/26, G10L15/183, G10L15/22, G10L15/34
Abstract: Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
-
Publication (Announcement) No.: US11670285B1
Publication (Announcement) Date: 2023-06-06
Application No.: US17102910
Application Date: 2020-11-24
Applicant: Amazon Technologies, Inc.
Inventor: Kevin Crews, Prasanna H Sridhar, Ariya Rastrow, Nicholas Matthew Jutila, Andrew Oberlin, Samarth Batra, Paul Anthony Bernhardt, Veerdhawal Pande, Roland Maximilian Rolf Maas
IPC: G06F40/40, G10L13/08, G10L13/04, G10L15/187, G10L15/02
CPC classification number: G10L13/08, G10L13/04, G10L15/187, G10L2015/025
Abstract: Techniques for an interactive turn-based reading experience are described. A system may take turns reading content, such as a book, with a user. The system may process audio data representing a user reading a portion of the content, determine reading evaluation data, and determine how to proceed for the next turn based on the reading evaluation data. For example, based on the reading evaluation data, the system may read a portion of the content by outputting synthesized speech representing the content, may ask the user to re-read a portion of the content, or may ask the user to read a different, smaller portion of the content.