-
Publication No.: US20240347048A1
Publication date: 2024-10-17
Application No.: US18442441
Filing date: 2024-02-15
Inventor: Takehiko KAGOSHIMA
CPC classification: G10L15/16, G10L15/28, G10L2015/088
Abstract: According to an embodiment, an information processing apparatus includes one or more hardware processors configured to function as a memory control unit, a transformation unit, a first convolutional neural network (CNN) unit, and a second CNN unit. The memory control unit reads, from a memory device, a first stride parameter used for controlling an output resolution and a first dilation parameter used for controlling an input resolution. The transformation unit transforms the first stride parameter to a second stride parameter and transforms the first dilation parameter to a second dilation parameter by using a transformation parameter. The first CNN unit executes first CNN processing of a feature vector by using at least the second stride parameter. The second CNN unit executes second CNN processing with an output vector of the first CNN unit as an input by using at least the second dilation parameter.
-
Publication No.: US12112744B2
Publication date: 2024-10-08
Application No.: US17684958
Filing date: 2022-03-02
Applicant: Zhejiang University
Inventors: Feng Lin, Tiantian Liu, Ming Gao, Chao Wang, Zhongjie Ba, Jinsong Han, Wenyao Xu, Kui Ren
IPC classification: G10L15/20, G01S13/88, G10L15/06, G10L15/18, G10L15/22, G10L15/28, G10L25/18, G10L25/78
CPC classification: G10L15/20, G01S13/88, G10L15/063, G10L15/1815, G10L15/22, G10L15/28, G10L25/18, G10L25/78
Abstract: The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and second logarithmic mel-frequency spectral coefficients into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio and millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure can implement high-accuracy speech recognition.
-
Publication No.: US12062360B2
Publication date: 2024-08-13
Application No.: US16972420
Filing date: 2019-03-12
Applicant: SONY CORPORATION
Inventors: Hiro Iwase, Yuhei Taki, Kunihito Sawai
IPC classification: G10L15/065, G10L15/08, G10L15/18, G10L15/22, G10L15/28
CPC classification: G10L15/065, G10L15/1815, G10L15/22, G10L15/28, G10L2015/088
Abstract: The present invention addresses the problem of effectively reducing the input load related to a voice trigger. There is provided an information processing device comprising a registration control unit that dynamically controls registration of startup phrases used as start triggers of a voice interaction session, in which the registration control unit temporarily additionally registers at least one of the startup phrases based on input voice. There is also provided an information processing method comprising dynamically controlling, by a processor, registration of startup phrases used as start triggers of a voice interaction session, in which the controlling further includes temporarily additionally registering at least one of the startup phrases based on input voice.
-
Publication No.: US12014736B2
Publication date: 2024-06-18
Application No.: US17413158
Filing date: 2019-10-30
Inventors: Tatsuma Sakurai, Ichitaro Kohara
CPC classification: G10L15/22, A63H5/00, A63H11/00, G10L15/28, A63H2200/00
Abstract: An information processing apparatus includes a control unit that controls an action of an autonomous operation unit. The control unit controls transitions among a plurality of states relating to speech recognition processing through the autonomous operation unit based on a detected trigger, and the states include a first active state in which an action of the autonomous operation unit is restricted and a second active state in which the speech recognition processing is performed. In a corresponding information processing method, a processor controls an action of an autonomous operation unit; the controlling includes controlling transitions among a plurality of states relating to speech recognition processing through the autonomous operation unit based on a detected trigger, and the states include a first active state in which an action of the autonomous operation unit is restricted and a second active state in which the speech recognition processing is performed.
-
Publication No.: US12006619B2
Publication date: 2024-06-11
Application No.: US17267073
Filing date: 2019-08-21
Applicant: LG ELECTRONICS INC.
Inventors: Inseong Hwang, Jeongbeom Kim
IPC classification: D06F34/34, G10L15/28, D06F105/60
CPC classification: D06F34/34, G10L15/28, D06F2105/60
Abstract: The present invention relates to a clothes-processing device comprising: a cabinet comprising a body, a body front surface fixed to the body and forming the front surface, and an introduction opening formed through the body front surface; a drum comprising a drum body disposed in the cabinet so as to store clothes and a drum introduction opening formed through the drum body to communicate with the introduction opening; a driving part for rotating the drum; a door rotatably disposed at the cabinet so as to open or close the introduction opening; a control part for controlling the driving part; and a voice recognition part disposed at the door so as to recognize a voice generated by a user and transmit a control command corresponding to the recognized voice to the control part.
-
Publication No.: US20240184340A1
Publication date: 2024-06-06
Application No.: US18543629
Filing date: 2023-12-18
Applicant: Google LLC
Inventors: James Nelson Castro, Carl Alexander Cepress, Liang Ching Tseng, Darren Torrie, Frances Maria Hui Hong Kwee, Rex Pinegar Price
IPC classification: G06F1/16, G02F1/1333, G02F1/1337, G06F3/16, G06F21/83, G10L15/28, H04L12/28, H04R1/02, H04R1/34
CPC classification: G06F1/166, G02F1/133308, G02F1/133753, G06F1/1605, G06F1/1626, G06F1/1637, G06F1/1658, G06F1/1683, G06F1/1686, G06F1/1688, G06F1/1698, G06F3/167, G06F21/83, G10L15/28, H04R1/023, H04R1/025, H04R1/028, H04R1/345, G02F1/133325, G02F1/133761, H04L12/282, H04R2499/15
Abstract: In a display assistant device, a speaker is mounted in a waveguide structure which is at least partially disposed beneath a display screen. The waveguide structure is mounted in an exterior housing which includes speaker grills distributed on a plurality of surfaces of the exterior housing, permitting sound waves from the speaker to be projected outside the exterior housing. A cover structure is disposed on top of the waveguide structure to conceal the waveguide structure and speaker within the exterior housing. The cover structure has a tilted bottom surface configured to be suspended above the waveguide structure and to be separated by a first space. Sound waves projected from an upper portion of the speaker are reflected by the tilted bottom surface and are guided through the first space to exit the device from a speaker grill portion located on a rear side of the exterior housing.
-
Publication No.: US11994917B2
Publication date: 2024-05-28
Application No.: US17889683
Filing date: 2022-08-17
Applicant: Google LLC
Inventors: Xiaoping Qin, Christen Cameron Bilger, Frederic Heckmann, Frances Kwee, Justin Leong, James Castro
IPC classification: G06F1/16, G02F1/1333, G02F1/1337, G06F3/16, G06F21/83, G10L15/28, H04L12/28, H04R1/02, H04R1/34
CPC classification: G06F1/166, G02F1/133308, G02F1/133753, G06F1/1605, G06F1/1626, G06F1/1637, G06F1/1658, G06F1/1683, G06F1/1686, G06F1/1688, G06F1/1698, G06F3/167, G06F21/83, G10L15/28, H04R1/023, H04R1/025, H04R1/028, H04R1/345, G02F1/133325, G02F1/133761, H04L12/282, H04R2499/15
Abstract: This application is directed to a speaker assembly in which a speaker is mounted in an enclosure structure. The enclosure structure exposes a speaker opening of the speaker and provides a sealed enclosure for a rear portion of the speaker, and further includes an electrically conductive portion. One or more electronic components are coupled to the electrically conductive portion of the enclosure structure (which is grounded in some implementations). The electrically conductive portion of the enclosure structure is configured to provide electromagnetic shielding for the electronic components and forms part of the sealed enclosure of the speaker. In some implementations, the electrically conductive portion of the enclosure structure is thermally coupled to the electronic components and acts as a heat sink that is configured to absorb heat generated by the electronic components and dissipate the generated heat away from the electronic components.
-
Publication No.: US11984121B2
Publication date: 2024-05-14
Application No.: US17425444
Filing date: 2020-01-17
Inventors: Akira Fukui, Hiroaki Ogawa, Yoshinori Maeda, Chie Kamada, Emiru Tsunoo, Akira Takahashi, Noriko Totsuka, Kazuya Tateishi, Yuichiro Koyama, Yuki Takeda, Hideaki Watanabe, Kan Kuroda
CPC classification: G10L15/22, G06F3/16, G10L15/28, G10L2015/221, G10L2015/223, G10L2015/225, G10L2015/228
Abstract: An information processing device presents, in response to an occurrence of a predetermined state transition, first information indicating that voice input for voice operation is possible and second information representing a domain of utterance in which voice operation is possible, and performs voice recognition for voice input by a user.
-
Publication No.: US11978453B2
Publication date: 2024-05-07
Application No.: US17347323
Filing date: 2021-06-14
Inventors: Narendra Gyanchandani, Junqing Shang, Joe Pemberton, Rushi P Desai, Liyuan Zhang, Shubham Katiyar, Lawrence Mariadas Chettiar, Artun Kutchuk, Naushad Zaveri
Abstract: Devices and techniques are generally described for a speech processing routing architecture. First input data representing an input request may be received. First data including a semantic interpretation of the input request may be determined. Metadata of the first input data may be determined. The metadata may identify an entity associated with the input request. In some examples, a query may be sent to a first component. The query may include the metadata. In some examples, second data that identifies a first skill associated with the entity may be received from the first component. In various examples, the first skill may be selected for processing the first input data based at least in part on the first data and the second data.
-
Publication No.: US20240106932A1
Publication date: 2024-03-28
Application No.: US18528013
Filing date: 2023-12-04
Inventors: Kwang-Youn KIM, Won-Nam JANG
IPC classification: H04M3/493, G06F3/04817, G06F3/0482, G06F3/04842, G06F3/0487, G06F3/16, G06F16/33, G06F16/957, G10L15/22
CPC classification: H04M3/4938, G06F3/04817, G06F3/0482, G06F3/04842, G06F3/0487, G06F3/167, G06F16/3334, G06F16/957, G10L15/22, G10L15/28
Abstract: An example electronic apparatus for providing voice recognition control includes a display and a processor, wherein the processor may be configured to obtain content including at least one object; distinguish the at least one object within the content; display an instruction text in correspondence with a non-text object among the at least one object; and select the non-text object corresponding to the instruction text if a voice command corresponding to the instruction text is inputted.