Neural Networks based Multimodal Transformer for Multi-Task User Interface Modeling

    Publication No.: US20230031702A1

    Publication Date: 2023-02-02

    Application No.: US17812208

    Filing Date: 2022-07-13

    Applicant: Google LLC

    Abstract: A method includes receiving, via a computing device, a screenshot of a display provided by a graphical user interface of the computing device. The method also includes generating, by an image-structure transformer of a neural network, a representation by fusing a first embedding based on the screenshot and a second embedding based on a layout of virtual objects in the screenshot. The method additionally includes predicting, by the neural network and based on the generated representation, a modeling task output associated with the graphical user interface. The method further includes providing, by the computing device, the predicted modeling task output.
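
    Illustrative sketch (not part of the patent record): the abstract names four steps (receive a screenshot, fuse an image embedding with a layout embedding, predict a task output, provide it) but gives no architectural detail. The Python/NumPy code below is one possible reading under stated assumptions: patch-based image tokens, bounding-box plus object-type layout tokens, a single self-attention layer as the fusion step, and a mean-pooled classification head. Every function name, dimension, and weight here is hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def embed_screenshot(screenshot, patch, w_img):
    # Assumption: the "first embedding" is built from non-overlapping patches.
    h, w, c = screenshot.shape
    rows, cols = h // patch, w // patch
    x = screenshot[: rows * patch, : cols * patch]
    x = x.reshape(rows, patch, cols, patch, c).transpose(0, 2, 1, 3, 4)
    x = x.reshape(rows * cols, patch * patch * c)
    return x @ w_img  # (num_patches, d)


def embed_layout(boxes, type_ids, w_box, type_table):
    # Assumption: the "second embedding" encodes each virtual object's
    # normalized bounding box plus a learned vector for its object type.
    return boxes @ w_box + type_table[type_ids]  # (num_objects, d)


def fuse(image_tokens, layout_tokens, w_q, w_k, w_v):
    # Assumption: fusion = one self-attention layer over the concatenated
    # image and layout tokens, with a residual connection.
    tokens = np.concatenate([image_tokens, layout_tokens], axis=0)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return tokens + attn @ v  # (num_patches + num_objects, d)


def predict_task(fused, w_head):
    # Hypothetical task head: mean-pool the fused tokens, then classify.
    return softmax(fused.mean(axis=0) @ w_head)


def run_demo(seed=0, d=8, num_classes=3):
    rng = np.random.default_rng(seed)
    screenshot = rng.random((32, 32, 3))           # stand-in for a real capture
    boxes = np.array([[0.0, 0.0, 0.5, 0.1],        # x0, y0, x1, y1 (normalized)
                      [0.1, 0.2, 0.9, 0.4],
                      [0.4, 0.8, 0.6, 0.9]])
    type_ids = np.array([0, 2, 4])                 # hypothetical object types
    img = embed_screenshot(screenshot, 16, rng.normal(size=(16 * 16 * 3, d)) * 0.01)
    lay = embed_layout(boxes, type_ids, rng.normal(size=(4, d)), rng.normal(size=(5, d)))
    w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
    fused = fuse(img, lay, w_q, w_k, w_v)
    probs = predict_task(fused, rng.normal(size=(d, num_classes)))
    return fused, probs


if __name__ == "__main__":
    fused, probs = run_demo()
    print(fused.shape, probs)
```

    In this sketch, concatenating both token sets before attention lets every image patch attend to every layout object and vice versa, which is one simple way to realize "fusing" two embeddings. The multi-task aspect in the title could be approximated by attaching several heads like `predict_task` to the same fused representation, though the abstract does not say how the tasks are handled.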
