-
Publication No.: WO2021237227A1
Publication Date: 2021-11-25
Application No.: PCT/US2021/040137
Filing Date: 2021-07-01
Applicant: INNOPEAK TECHNOLOGY, INC.
Inventor: ZHANG, Kaiyu; LIN, Yuan
Abstract: Systems and methods are provided for implementing multi-language scene text recognition. In particular, the system and method can improve automated text recognition applications by autonomously recognizing both the characters in a text and the text's language of origin. A single multi-language text recognition model applies deep learning algorithms to detect multiple languages accurately, so autonomous language detection and character recognition are integrated efficiently and seamlessly in one model. A method can involve extracting visual features corresponding to the textual content of an input image, where the input image comprises textual content and non-textual context. The extracted features can be encoded to map each visual feature to a character, thereby recognizing the textual content. Further, the language of the recognized text can be autonomously recognized based on index values corresponding to the characters.
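The last step of this abstract, recognizing the language from character index values, can be illustrated with a minimal Python sketch. It assumes a shared character dictionary in which each language occupies a contiguous block of indices; the block boundaries, the names `LANGUAGE_RANGES`, `language_of_index`, and `detect_language`, and the majority-vote rule are all hypothetical and are not taken from the patent.

```python
from collections import Counter

# Hypothetical layout of a shared character dictionary: each language owns a
# contiguous block of index values. These ranges are illustrative assumptions.
LANGUAGE_RANGES = {
    "latin": range(0, 100),
    "chinese": range(100, 6000),
    "japanese_kana": range(6000, 6200),
}


def language_of_index(index):
    """Return the language whose index block contains `index`, or None."""
    for language, block in LANGUAGE_RANGES.items():
        if index in block:
            return language
    return None


def detect_language(char_indices):
    """Majority vote over the languages of the recognized character indices."""
    votes = Counter(
        lang for lang in map(language_of_index, char_indices) if lang is not None
    )
    if not votes:
        return None  # no index fell inside any known language block
    return votes.most_common(1)[0][0]
```

A majority vote is only one plausible rule; it tolerates a few stray characters (for example, Latin digits inside Chinese text) without changing the predicted language.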
-
Publication No.: WO2022099325A1
Publication Date: 2022-05-12
Application No.: PCT/US2022/011790
Filing Date: 2022-01-10
Applicant: INNOPEAK TECHNOLOGY, INC.
Inventor: LI, Jiachen; ZHANG, Kaiyu; LIU, Rongrong; LIN, Yuan
Abstract: A system may include a backbone network configured to generate feature maps from an image, a transformer network coupled to the backbone network, and a scene text detection subsystem. The scene text detection subsystem comprises a processor and a non-transitory computer-readable medium having encoded thereon a set of instructions executable by the processor to: generate a plurality of image tokens from one or more feature maps of an input image; generate, via a transformer encoder, a set of token queries, wherein the set of token queries quantifies the attention of the textual feature of each image token in the sequence of image tokens relative to the textual features of all other image tokens; and generate, via a transformer decoder, a set of predicted text boxes of the input image.
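To make the idea of image tokens attending to one another concrete, the following Python sketch flattens a small feature map into tokens and computes scaled dot-product self-attention over them. It is a simplified illustration, not the patented architecture: the query, key, and value projections are assumed to be identity maps (a trained transformer learns them), and the function names are hypothetical.

```python
import math


def feature_map_to_tokens(feature_map):
    """Flatten an H x W x C feature map (nested lists) into H*W tokens of length C."""
    return [cell for row in feature_map for cell in row]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (an assumption).

    Returns the attention weights (one row per token, each summing to 1) and
    the attended token representations.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attention, outputs = [], []
    for query in tokens:
        # Similarity of this token to every token, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        attention.append(weights)
        # Weighted average of all tokens, one value per channel.
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return attention, outputs
```

Each row of the returned attention matrix expresses how strongly one token attends to every other token, which is the quantity the abstract describes the token queries as capturing.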
-
Publication No.: WO2022046486A1
Publication Date: 2022-03-03
Application No.: PCT/US2021/046490
Filing Date: 2021-08-18
Applicant: INNOPEAK TECHNOLOGY, INC.
Inventor: ZHANG, Kaiyu; LIN, Yuan; YIN, Junxi
Abstract: Novel tools and techniques are provided for implementing a scene text recognition model with text orientation detection or text angle detection. In various embodiments, a computing system may perform feature extraction on an input image containing text, using a convolutional layer of a convolutional neural network ("CNN") to produce a feature map, and may determine the orientation or angle of the text in the input image using a first dense layer of the CNN. If the text in the image is determined to be in the normal orientation, or once the input image has been rotated to the normal orientation, the computing system may perform feature encoding on values in the feature map, using a sequence layer of the CNN to produce an encoded feature map. The computing system may use a second dense layer of the CNN to process each encoded feature to produce a classification of the text.
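The control flow described here (detect orientation, rotate to the normal orientation if needed, then recognize) can be sketched in plain Python. The sketch assumes the orientation is reported as a number of clockwise quarter turns, and the two callables `predict_quarter_turns` and `recognize_upright` are hypothetical stand-ins for the CNN's dense layers; the patent does not specify these interfaces.

```python
def rotate_cw(image):
    """Rotate a 2-D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotate_ccw(image):
    """Rotate a 2-D grid (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def to_normal_orientation(image, quarter_turns_cw):
    """Undo `quarter_turns_cw` clockwise quarter turns detected in the image."""
    for _ in range(quarter_turns_cw % 4):
        image = rotate_ccw(image)
    return image


def recognize_with_orientation(image, predict_quarter_turns, recognize_upright):
    """Run recognition only after the image is in the normal orientation.

    `predict_quarter_turns` and `recognize_upright` are hypothetical callables
    standing in for the orientation head and the text classifier.
    """
    turns = predict_quarter_turns(image)
    if turns % 4 != 0:
        image = to_normal_orientation(image, turns)
    return recognize_upright(image)
```

Restricting the sketch to quarter turns keeps the rotation exact on a pixel grid; handling arbitrary angles would require interpolation, which is outside the scope of this illustration.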
-
Publication No.: WO2021081562A2
Publication Date: 2021-04-29
Application No.: PCT/US2021/014171
Filing Date: 2021-01-20
Applicant: INNOPEAK TECHNOLOGY, INC.
Inventor: ZHANG, Kaiyu; LIN, Yuan
Abstract: This application is directed to performing optical character recognition (OCR) using deep learning techniques. An electronic device receives an image and a language indicator that indicates that the textual content in the image corresponds to a first language. The electronic device processes the image using a multilingual text recognition model applicable to a plurality of languages. The electronic device generates a feature sequence including a plurality of probability values corresponding to the textual content of the image. The feature sequence includes a plurality of feature subsets that correspond to the plurality of languages. For each feature subset, each probability value indicates a probability that a respective textual content corresponds to a respective character in a dictionary of the corresponding language. The electronic device constructs a sparse mask based on the first language and combines the feature sequence and the sparse mask to determine the textual content.
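The sparse-mask step can be illustrated with a small Python sketch. It assumes the per-language dictionaries are concatenated into one probability vector and that the mask is combined with the feature sequence by element-wise multiplication followed by a per-step argmax; the toy dictionaries, the function names, and the combination rule are assumptions for illustration, not details confirmed by the abstract.

```python
# Hypothetical concatenated dictionary: each language owns a slice of the
# probability vector. The characters and layout are illustrative assumptions.
DICTIONARIES = {
    "english": ["a", "b", "c"],
    "greek": ["α", "β", "γ"],
}


def build_layout(dictionaries):
    """Return the flat character list and each language's (start, end) slice."""
    chars, slices = [], {}
    for language, alphabet in dictionaries.items():
        slices[language] = (len(chars), len(chars) + len(alphabet))
        chars.extend(alphabet)
    return chars, slices


def sparse_mask(language, slices, size):
    """Mask with 1.0 inside the chosen language's slice and 0.0 elsewhere."""
    start, end = slices[language]
    return [1.0 if start <= i < end else 0.0 for i in range(size)]


def decode(feature_sequence, language, dictionaries=DICTIONARIES):
    """Apply the language mask to each step's probabilities and pick the best character."""
    chars, slices = build_layout(dictionaries)
    mask = sparse_mask(language, slices, len(chars))
    text = []
    for probabilities in feature_sequence:
        masked = [p * m for p, m in zip(probabilities, mask)]
        best = max(range(len(masked)), key=masked.__getitem__)
        text.append(chars[best])
    return "".join(text)
```

With the mask applied, a high-probability character from another language's subset can no longer win the argmax, so the same model output decodes differently depending on the language indicator.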
-