- Patent title: MULTIMODAL ENTITY AND COREFERENCE RESOLUTION FOR ASSISTANT SYSTEMS
- Application number: US18623449
- Filing date: 2024-04-01
- Publication number: US20240331058A1
- Publication date: 2024-10-03
- Inventors: Shivani Poddar, Seungwhan Moon, Paul Anthony Crook, Rajen Subba
- Applicant: Meta Platforms, Inc.
- Applicant address: US CA Menlo Park
- Patentee: Meta Platforms, Inc.
- Current patentee: Meta Platforms, Inc.
- Current patentee address: US CA Menlo Park
- Main classification: G06Q50/00
- IPC classifications: G06Q50/00 ; G06F3/01 ; G06F3/16 ; G06F9/451 ; G06F9/48 ; G06F9/54 ; G06F16/332 ; G06F16/9032 ; G06F16/9536 ; G06F18/2321 ; G06F40/205 ; G06F40/242 ; G06F40/253 ; G06F40/295 ; G06F40/30 ; G06F40/35 ; G06F40/56 ; G06N3/04 ; G06N3/045 ; G06N3/047 ; G06N3/08 ; G06N20/00 ; G06Q10/109 ; G06Q30/0601 ; G06V10/20 ; G06V10/764 ; G06V10/82 ; G06V20/00 ; G06V20/20 ; G06V20/30 ; G06V20/40 ; G06V40/16 ; G06V40/20 ; G10L15/06 ; G10L15/08 ; G10L15/16 ; G10L15/18 ; G10L15/22 ; G10L15/30 ; G10L15/32 ; H04L51/18 ; H04L51/212 ; H04L51/222 ; H04L51/224 ; H04L51/52 ; H04L67/306 ; H04L67/75 ; H04N7/14
- Abstract:
In one embodiment, a method includes receiving, at a client system, an audio input, where the audio input comprises a coreference to a target object; accessing visual data from one or more cameras associated with the client system, where the visual data comprises images portraying one or more objects; resolving the coreference to the target object from among the one or more objects; resolving the target object to a specific entity; and providing, at the client system, a response to the audio input, where the response comprises information about the specific entity.
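The five steps in the abstract can be pictured with a toy sketch. This is only an illustration under stated assumptions, not the method claimed in the application: it assumes the audio input has already been transcribed to text, that a vision model has already produced a list of detected objects, that the most salient object is the coreference target, and that a small lookup table stands in for entity resolution. Every name here (`DetectedObject`, `ENTITY_INFO`, `find_coreference`, `resolve_target`, `respond`) and all catalog data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins: none of these names or values come from the patent.
COREFERENCE_WORDS = {"this", "that", "it", "these", "those"}


@dataclass
class DetectedObject:
    label: str                       # coarse class from an assumed vision model
    salience: float                  # assumed score in [0, 1], e.g. centrality in view
    entity_id: Optional[str] = None  # assumed fine-grained match, if one was found


# Toy entity catalog (invented data, for illustration only).
ENTITY_INFO = {
    "acme_cola": "Acme Cola, 330 ml can",
    "acme_water": "Acme Spring Water, 500 ml bottle",
}


def find_coreference(transcript: str) -> Optional[str]:
    """Return the first coreference-like word in the transcript, if any."""
    for token in transcript.lower().split():
        word = token.strip(".,?!")
        if word in COREFERENCE_WORDS:
            return word
    return None


def resolve_target(objects: List[DetectedObject]) -> Optional[DetectedObject]:
    """Pick the most salient object as the target (a simplifying assumption)."""
    if not objects:
        return None
    return max(objects, key=lambda o: o.salience)


def respond(transcript: str, objects: List[DetectedObject]) -> str:
    """Run the steps in order: coreference, target object, entity, response."""
    if find_coreference(transcript) is None:
        return "No coreference found in the request."
    target = resolve_target(objects)
    if target is None:
        return "No object in view to resolve the reference."
    info = ENTITY_INFO.get(target.entity_id or "")
    if info is None:
        return f"I see a {target.label}, but could not identify it."
    return f"That is {info}."
```

For example, calling `respond("What is that?", [...])` with a high-salience can tagged `acme_cola` and a lower-salience bottle would pick the can and answer with its catalog entry. A real system would presumably rely on richer signals than a single salience score, such as gaze, gesture, or dialog history, but the abstract does not specify them.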