-
Publication No.: US12131737B2
Publication Date: 2024-10-29
Application No.: US17910378
Filing Date: 2020-03-19
Inventor(s): Mika Sugimoto
IPC Classification: G10L15/22, B60H1/00, B60Q3/80, B60R25/31, E05F15/73, G06F40/20, G06F40/30, G10L15/18, G10L15/28
CPC Classification: G10L15/22, B60H1/00757, B60Q3/80, B60R25/31, E05F15/73, G06F40/20, G06F40/30, G10L15/1815, G10L15/28, E05Y2400/45, E05Y2400/85, E05Y2900/531, E05Y2900/548, E05Y2900/55, G10L2015/223
Abstract: A voice recognition device receives requests to control devices installed in a moving body based on instructions voiced by a user. The voice recognition device includes a speech acquisition unit, a speech data conversion unit, a control target device identification unit, a detection mode setting unit, and a control request identification unit. The speech acquisition unit acquires speech. The speech data conversion unit converts the acquired speech into speech data. The control target device identification unit analyzes the speech data to identify the control target device. The detection mode setting unit sets a detection mode for identifying the control request corresponding to the speech data in accordance with the control target device. The control request identification unit analyzes the speech data to identify the control request with respect to the control target device, based on the set detection mode.
-
Publication No.: US12131523B2
Publication Date: 2024-10-29
Application No.: US17182951
Filing Date: 2021-02-23
Applicant: Meta Platforms, Inc.
Inventor(s): Xiaohu Liu, Baiyang Liu, Rajen Subba
IPC Classification: G06V10/82, G06F3/01, G06F3/16, G06F7/14, G06F9/451, G06F16/176, G06F16/22, G06F16/23, G06F16/242, G06F16/2455, G06F16/2457, G06F16/248, G06F16/33, G06F16/332, G06F16/338, G06F16/903, G06F16/9032, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N7/01, G06N20/00, G06Q50/00, G06V10/764, G06V20/10, G06V40/20, G10L15/02, G10L15/06, G10L15/07, G10L15/16, G10L15/18, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, G10L17/22, H04L5/02, H04L12/28, H04L41/00, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/18, H04L51/216, H04L51/52, H04L67/306, H04L67/50, H04L67/5651, H04L67/75, H04W12/08, G10L13/00, G10L13/04, H04L51/046, H04L67/10, H04L67/53
CPC Classification: G06V10/82, G06F3/011, G06F3/013, G06F3/017, G06F3/167, G06F7/14, G06F9/453, G06F16/176, G06F16/2255, G06F16/2365, G06F16/243, G06F16/24552, G06F16/24575, G06F16/24578, G06F16/248, G06F16/3323, G06F16/3329, G06F16/3344, G06F16/338, G06F16/90332, G06F16/90335, G06F16/9038, G06F16/904, G06F16/951, G06F16/9535, G06F18/2411, G06F40/205, G06F40/295, G06F40/30, G06F40/40, G06N3/006, G06N3/08, G06N7/01, G06N20/00, G06Q50/01, G06V10/764, G06V20/10, G06V40/28, G10L15/02, G10L15/063, G10L15/07, G10L15/16, G10L15/1815, G10L15/1822, G10L15/183, G10L15/187, G10L15/22, G10L15/26, G10L17/06, G10L17/22, H04L5/02, H04L12/2816, H04L41/20, H04L41/22, H04L43/0882, H04L43/0894, H04L51/02, H04L51/18, H04L51/216, H04L51/52, H04L67/306, H04L67/535, H04L67/5651, H04L67/75, H04W12/08, G06F2216/13, G10L13/00, G10L13/04, G10L2015/223, G10L2015/225, H04L51/046, H04L67/10, H04L67/53
Abstract: In one embodiment, a method includes, by a client system associated with a user: receiving a user input from the user; parsing the user input to identify a request to execute a function to be performed by an assistant system of several assistant systems associated with the client system; determining whether the user is authorized to access the assistant system by comparing a voiceprint of the user to several voiceprints stored on the client system; sending, to the assistant system in response to determining that the user is authorized to access the assistant system, a request to set an assistant xbot of the assistant system into a listening mode; and receiving, from the assistant system, an indication that the assistant xbot is in the listening mode.
-
Publication No.: US20240355119A1
Publication Date: 2024-10-24
Application No.: US18305587
Filing Date: 2023-04-24
Applicant: ADOBE INC.
Inventor(s): Ioana Croitoru, Trung Huu Bui, Zhaowen Wang, Seunghyun Yoon, Franck Dernoncourt, Hailin Jin
CPC Classification: G06V20/41, G06V10/774, G06V20/49, G06V20/70, G10L15/04, G10L15/1815, G10L15/22, G10L25/57, G10L15/16
Abstract: One or more aspects of the method, apparatus, and non-transitory computer readable medium include receiving a query relating to a long video. One or more aspects of the method, apparatus, and non-transitory computer readable medium further include generating a segment of the long video corresponding to the query using a machine learning model trained to identify relevant segments from long videos. One or more aspects of the method, apparatus, and non-transitory computer readable medium further include responding to the query based on the generated segment.
-
Publication No.: US12125479B2
Publication Date: 2024-10-22
Application No.: US17667483
Filing Date: 2022-02-08
Applicant: Seam Social Labs Inc
Inventor(s): Tiasia O'Brien, Marisa Jean Dinko
CPC Classification: G10L15/1815, G10L15/063, G10L15/22, G10L15/30, G10L25/63, G10L2015/223
Abstract: A system for providing a sociolinguistic virtual assistant includes a communication device, a processing device, and a storage device. The processing device is configured to: process input data using a natural language processing algorithm; categorize the semantic data based on psych-sociological categorizations associated with the at least one user; analyze the command from the at least one user to identify a task associated with the command; generate a response based on identification of the task associated with the command; and execute the task associated with the command, using categorized semantic data, to derive a result. A method corresponding to the system is also provided.
-
Publication No.: US20240347052A1
Publication Date: 2024-10-17
Application No.: US18299462
Filing Date: 2023-04-12
Inventor(s): Nimesh SINHA, Abhi DATTASHARMA, Amit GHOSH
IPC Classification: G10L15/18, G10L15/197, G10L15/22, G10L15/30
CPC Classification: G10L15/1815, G10L15/197, G10L15/22, G10L15/30
Abstract: Disclosed is a system and method for determining an intent of a user from an utterance of the user. The system initially builds an intent determination model using a plurality of sample utterances, each assigned at least one intent class. That is, on receiving the sample utterances, the system extracts significant word pairs from each sample utterance, computes a distinction factor for each significant word pair, computes a Positive Probability and a Negative Probability for each significant word pair, and generates the intent determination model by storing each significant word pair together with its distinction factor, Positive Probability, and Negative Probability. Then, on receiving any new utterance, the system extracts significant word pairs, identifies one or more matching word pairs in the model, and determines the intent based on the distinction factor, the Positive Probability, and the Negative Probability of the one or more matched word pairs.
-
Publication No.: US12117838B1
Publication Date: 2024-10-15
Application No.: US17218621
Filing Date: 2021-03-31
CPC Classification: G05D1/0219, G05D1/0088, G05D1/0251, G05D1/0274, G06T7/73, G10L13/08, G10L15/1807, G10L15/22, G10L2015/223
Abstract: Described herein is a system for tracking objects and performing dynamic entity resolution using image data. For example, the system may build an environment map and populate the map with objects present in the environment. As a device moves about the environment, it may capture image data and, based on its position and/or the configuration of its components, may determine updated locations of objects that move in the environment. Upon receiving a query from a user, based on the location of the objects relative to the device/user, the system can interpret gestures and voice commands to infer which object is specified by the voice command. To build the environment map, the system performs object detection to generate bounding boxes associated with an object, then clusters the bounding boxes into a three-dimensional (3D) object associated with 3D coordinates. As the system tracks the object using the 3D coordinates while maintaining two-dimensional (2D) information (e.g., bounding boxes and other features), the system can use existing 2D models to process objects in 3D.
-
Publication No.: US12112751B2
Publication Date: 2024-10-08
Application No.: US17673972
Filing Date: 2022-02-17
Inventor(s): Taegu Kim, Hyeonjae Bak, Yoonju Lee, Hansin Koh, Jooyeon Kim, Gajin Song, Jaeyung Yeo
CPC Classification: G10L15/22, G10L15/063, G10L15/1815, G10L15/30, G10L2015/0635, G10L2015/223, G10L2015/228
Abstract: An electronic device, according to various embodiments, comprises a communication interface, a processor, and a memory. The memory may store instructions that, when executed, cause the processor to: obtain a user utterance; confirm context information associated with the user utterance; on the basis of the context information, select, as a target device, at least one external electronic device from among a plurality of external electronic devices; and via the communication interface, transmit at least a part of the context information to the at least one external electronic device selected as the target device. Various other embodiments are possible.
-
Publication No.: US12112742B2
Publication Date: 2024-10-08
Application No.: US17536890
Filing Date: 2021-11-29
Inventor(s): Jongsun Lee, Jongyoub Ryu, Seonghan Ryu, Eunji Lee, Jaechul Yang, Hyungtak Choi
IPC Classification: G10L15/22, G06F40/166, G06F40/30, G10L15/18
CPC Classification: G10L15/1815, G06F40/166, G06F40/30, G10L15/22, G10L2015/223
Abstract: Provided are an electronic device for correcting a speech input, and an operating method thereof. The method may include receiving a first speech signal; obtaining first text; obtaining an intent of the first speech signal and a confidence score of the intent, by inputting the first text to a natural language understanding model; identifying a plurality of correction candidate semantic elements capable of being correction targets in the first text; receiving a second speech signal; obtaining second text; identifying whether the second speech signal is a speech signal for correcting the first text; comparing the plurality of correction candidate semantic elements in the first text with a semantic element in the second text, based on the confidence score; and correcting at least one of the plurality of correction candidate semantic elements in the first text.
-
Publication No.: US12112129B2
Publication Date: 2024-10-08
Application No.: US17527167
Filing Date: 2021-11-16
Applicant: Fujitsu Limited
IPC Classification: G10L15/16, G06F18/214, G06F40/169, G06F40/226, G06N3/04, G10L15/06, G10L15/07, G10L15/18, G06F40/279, G06F40/295, G10L15/183
CPC Classification: G06F40/226, G06F18/214, G06F40/169, G06N3/04, G10L15/063, G10L15/075, G10L15/16, G10L15/18, G06F40/279, G06F40/295, G10L2015/0635, G10L15/1822, G10L15/183
Abstract: A method of training a neural network as a natural language processing (NLP) model comprises: inputting annotated training data to first architecture portions of the neural network, the first architecture portions being executed respectively in a plurality of distributed client computing devices in communication with a server computing device, the training data being derived from text data private to the client computing device in which the first architecture portion is executed, the server computing device having no access to any of the private text data; deriving from the training data, using the first architecture portions, weight matrices of numeric weights which are decoupled from the private text data; concatenating the weight matrices, in a second architecture portion of the neural network executed in the server computing device, to obtain a single concatenated weight matrix; and training, on the second architecture portion, the NLP model using the concatenated weight matrix.
-
Publication No.: US20240331686A1
Publication Date: 2024-10-03
Application No.: US18739466
Filing Date: 2024-06-11
Inventor(s): Kai Wei, Thanh Dac Tran, Grant Strimel
CPC Classification: G10L15/1815, G06N3/08, G10L15/063, G10L15/16, G10L15/22, G10L15/28, G10L2015/228
Abstract: Techniques for determining and storing relevant context information for a user input, such as a spoken input, are described. In some embodiments, context information is determined to be relevant on an audio frame basis. Context scores for different types of context data (e.g., prior dialog turn data, user profile data, device information, etc.) are determined for individual audio frames corresponding to a spoken input. Based on the corresponding context scores, the most relevant context is stored in a local context cache. The local context cache is updated as subsequent audio frames of the user input are processed. The data stored in the context cache is provided to downstream components to perform tasks such as ASR, NLU, and SLU.