-
Publication No.: US20200285878A1
Publication Date: 2020-09-10
Application No.: US16297388
Filing Date: 2019-03-08
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yan Wang, Ye Wu, Arun Sacheti
Abstract: Described herein is a mechanism for visual recognition of items or visual search using Optical Character Recognition (OCR) of text in images. Recognized OCR blocks in an image comprise position information and recognized text. The embodiments utilize a location-aware feature vector created using the position information and recognized text in each recognized block. The location-aware features of the feature vector use position information associated with the block to calculate a weight for the block. The recognized text is used to construct a tri-character gram frequency, inverse document frequency (TGF-IDF) metric using tri-character grams extracted from the recognized text. Features in the location-aware feature vector for a block are computed by multiplying the weight by the corresponding TGF-IDF metric. The location-aware feature vector for the image is the sum of the location-aware feature vectors for the individual blocks.
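The pipeline in this abstract (per-block position weight, tri-character grams, TGF-IDF, weighted per-block vectors summed over the image) can be illustrated with a minimal sketch. The abstract does not say how the position weight is computed, so the formula below (block area scaled down by distance from the image center) is an assumption, as is the smoothed IDF. `OcrBlock`, `position_weight`, `idf_table`, `block_vector`, and `location_aware_vector` are hypothetical names, not the patent's method or any library API.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class OcrBlock:
    """Hypothetical OCR block: box in normalized [0, 1] image coordinates plus its text."""
    x: float
    y: float
    w: float
    h: float
    text: str


def tri_grams(text: str) -> list[str]:
    """Overlapping tri-character grams of the lowercased text."""
    t = text.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]


def position_weight(block: OcrBlock) -> float:
    """ASSUMED weighting: block area scaled down by distance from the image center."""
    area = block.w * block.h
    cx, cy = block.x + block.w / 2, block.y + block.h / 2
    dist = math.hypot(cx - 0.5, cy - 0.5)  # 0 at the center, about 0.707 at a corner
    return area * (1.0 - dist)


def idf_table(corpus: list[str]) -> dict[str, float]:
    """ASSUMED smoothed inverse document frequency of every tri-gram seen in the corpus."""
    n = len(corpus)
    df = Counter(g for doc in corpus for g in set(tri_grams(doc)))
    return {g: math.log((1 + n) / (1 + c)) + 1.0 for g, c in df.items()}


def block_vector(block: OcrBlock, idf: dict[str, float]) -> dict[str, float]:
    """Sparse per-block features: position weight times TGF-IDF of each tri-gram."""
    grams = tri_grams(block.text)
    if not grams:
        return {}
    weight = position_weight(block)
    counts = Counter(grams)
    return {
        g: weight * (c / len(grams)) * idf[g]  # weight x (tri-gram frequency x IDF)
        for g, c in counts.items()
        if g in idf  # tri-grams never seen in the corpus are dropped
    }


def location_aware_vector(blocks: list[OcrBlock], idf: dict[str, float]) -> dict[str, float]:
    """Image-level vector: element-wise sum of the per-block vectors."""
    total: dict[str, float] = {}
    for block in blocks:
        for g, v in block_vector(block, idf).items():
            total[g] = total.get(g, 0.0) + v
    return total


corpus = ["organic whole milk", "orange juice", "whole wheat bread"]
idf = idf_table(corpus)
label = OcrBlock(0.3, 0.3, 0.4, 0.2, "Whole Milk")  # large, centered block
footer = OcrBlock(0.0, 0.9, 0.2, 0.05, "1 gal")     # small block near a corner
vec = location_aware_vector([label, footer], idf)
print(sorted(vec, key=vec.get, reverse=True)[:3])
```

Representing the vector as a sparse dictionary keyed by tri-gram keeps the sketch short; a production system would more likely hash tri-grams into a fixed-length dense vector.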
-
Publication No.: US11928875B2
Publication Date: 2024-03-12
Application No.: US16297388
Filing Date: 2019-03-08
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yan Wang, Ye Wu, Arun Sacheti
IPC: G06F16/387, G06F16/31, G06F16/35, G06F18/2411, G06N20/10, G06V10/70, G06V10/75, G06V10/80, G06V20/62, G06V30/144, G06V30/148, G06V30/18, G06V30/19, G06V30/413, G06V30/414, G06V30/10
CPC classification number: G06V30/18171, G06F16/313, G06F16/35, G06F16/387, G06F18/2411, G06N20/10, G06V10/70, G06V10/75, G06V10/806, G06V20/62, G06V30/144, G06V30/153, G06V30/158, G06V30/1916, G06V30/19173, G06V30/413, G06V30/414, G06V30/10
Abstract: Described herein is a mechanism for visual recognition of items or visual search using Optical Character Recognition (OCR) of text in images. Recognized OCR blocks in an image comprise position information and recognized text. The embodiments utilize a location-aware feature vector created using the position information and recognized text in each recognized block. The location-aware features of the feature vector use position information associated with the block to calculate a weight for the block. The recognized text is used to construct a tri-character gram frequency, inverse document frequency (TGF-IDF) metric using tri-character grams extracted from the recognized text. Features in the location-aware feature vector for a block are computed by multiplying the weight by the corresponding TGF-IDF metric. The location-aware feature vector for the image is the sum of the location-aware feature vectors for the individual blocks.
-
Publication No.: US11074289B2
Publication Date: 2021-07-27
Application No.: US15885568
Filing Date: 2018-01-31
Applicant: Microsoft Technology Licensing, LLC
Inventors: Houdong Hu, Yan Wang, Linjun Yang, Li Huang, Xi Chen, Jiapei Huang, Ye Wu, Arun K. Sacheti, Meenaz Merchant
IPC: G06F16/53, G06F16/532, G06T7/00, G06K9/62, G06K9/46, G06N3/08, G06F16/51, G06F16/56, G06F16/583, G06F16/2457
Abstract: Systems and methods can be implemented to conduct searches based on images used as queries in a variety of applications. In various embodiments, a set of visual words representing a query image is generated from features extracted from the query image and is compared with visual words of index images. A set of candidate images is generated from the index images resulting from matching one or more visual words in the comparison. A multi-level ranking is conducted to sort the candidate images of the set of candidate images, and results of the multi-level ranking are returned to a user device that provided the query image. Additional systems and methods are disclosed.
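The retrieval flow described here (quantize query features into visual words, look up index images that share those words, then rank the candidates in more than one stage) can be sketched as follows. The abstract does not specify the feature extractor, the codebook, or what each ranking level uses, so this sketch assumes pre-extracted local descriptors, nearest-centroid quantization, and a two-level ranking: a cheap count of shared visual words, then cosine similarity of a per-image global vector. `VisualWordIndex`, `nearest_word`, and `cosine` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

Vector = list[float]


def nearest_word(desc: Vector, codebook: list[Vector]) -> int:
    """Quantize a local descriptor to the index of its nearest codebook centroid."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(desc, codebook[i])),
    )


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class VisualWordIndex:
    """Inverted index from visual-word id to the ids of index images containing it."""

    def __init__(self, codebook: list[Vector]):
        self.codebook = codebook
        self.postings: dict[int, set[str]] = defaultdict(set)
        self.global_vecs: dict[str, Vector] = {}

    def words(self, local_descs: list[Vector]) -> set[int]:
        return {nearest_word(d, self.codebook) for d in local_descs}

    def add(self, image_id: str, local_descs: list[Vector], global_vec: Vector) -> None:
        self.global_vecs[image_id] = global_vec
        for w in self.words(local_descs):
            self.postings[w].add(image_id)

    def search(self, local_descs: list[Vector], global_vec: Vector,
               shortlist: int = 100, k: int = 10) -> list[str]:
        # Candidate generation: index images sharing at least one visual word with the query.
        shared: Counter = Counter()
        for w in self.words(local_descs):
            for image_id in self.postings.get(w, ()):
                shared[image_id] += 1
        # Level 1 (cheap): rank candidates by shared-word count and keep a shortlist.
        level1 = [image_id for image_id, _ in shared.most_common(shortlist)]
        # Level 2 (finer): re-rank the shortlist by cosine similarity of global vectors.
        level2 = sorted(level1, key=lambda i: cosine(global_vec, self.global_vecs[i]), reverse=True)
        return level2[:k]


codebook = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
index = VisualWordIndex(codebook)
index.add("a", [[0.1, 0.0], [9.0, 0.5]], [1.0, 0.0])
index.add("b", [[0.0, 9.0]], [0.0, 1.0])
index.add("c", [[1.0, 1.0], [0.0, 11.0]], [0.9, 0.1])
query = [[0.2, 0.1], [9.5, 0.0]]
print(index.search(query, [1.0, 0.0]))  # ['a', 'c']
```

Image "b" shares no visual word with the query, so it never becomes a candidate; the expensive second-level comparison runs only on the shortlist produced by the first level, which is the usual motivation for staging the ranking.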
-
Publication No.: US20190236167A1
Publication Date: 2019-08-01
Application No.: US15885568
Filing Date: 2018-01-31
Applicant: Microsoft Technology Licensing, LLC
Inventors: Houdong Hu, Yan Wang, Linjun Yang, Li Huang, Xi Chen, Jiapei Huang, Ye Wu, Arun K. Sacheti, Meenaz Merchant
CPC classification number: G06F16/532, G06F16/24578, G06F16/51, G06F16/56, G06F16/5838, G06K9/46, G06K9/6215, G06K9/627, G06K2209/27, G06N3/08, G06T7/97, G06T2207/20084, G06T2207/30196
Abstract: Systems and methods can be implemented to conduct searches based on images used as queries in a variety of applications. In various embodiments, a set of visual words representing a query image is generated from features extracted from the query image and is compared with visual words of index images. A set of candidate images is generated from the index images resulting from matching one or more visual words in the comparison. A multi-level ranking is conducted to sort the candidate images of the set of candidate images, and results of the multi-level ranking are returned to a user device that provided the query image. Additional systems and methods are disclosed.
-
Publication No.: US11669558B2
Publication Date: 2023-06-06
Application No.: US16368798
Filing Date: 2019-03-28
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yan Wang, Ye Wu, Houdong Hu, Surendra Ulabala, Vishal Thakkar, Arun Sacheti
IPC: G06N3/04, G06N5/02, G06N3/045, G06F16/33, G06F16/245, G06F16/248, G06V20/62, G06F18/2413, G06F17/16
CPC classification number: G06F16/3347, G06F16/245, G06F16/248, G06F18/2413, G06N3/04, G06N3/045, G06N5/02, G06V20/62, G06F17/16
Abstract: A computer-implemented technique generates a dense embedding vector that provides a distributed representation of input text. The technique includes: generating an input term-frequency (TF) vector of dimension g that includes frequency information relating to frequency of occurrence of terms in an instance of input text; using a TF-modifying component to modify the term-specific frequency information in the input TF vector by respective machine-trained weighting factors, to produce an intermediate vector of dimension g; using a projection component to project the intermediate vector of dimension g into an embedding vector of dimension k, where k is less than g. Both the TF-modifying component and the projection component use respective machine-trained neural networks. An application performs any of a retrieval-based function, a recognition-based function, a recommendation-based function, a classification-based function, etc. based on the embedding vector.
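The two-stage mapping described here (a TF vector of dimension g, reweighted term by term, then projected to a smaller dimension k) can be sketched as below. The abstract states that both components use machine-trained neural networks; this sketch simplifies the TF-modifying component to one scale factor per term and the projection component to a single tanh dense layer, and fills both with seeded random numbers in place of trained parameters. `tf_vector` and `TfEmbedder` are hypothetical names.

```python
import math
import random
from collections import Counter


def tf_vector(text: str, vocab: list[str]) -> list[float]:
    """Input TF vector of dimension g = len(vocab); out-of-vocabulary terms are ignored."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocab]


class TfEmbedder:
    """Sketch of TF reweighting (g -> g) followed by a projection (g -> k), with k < g.

    Weights are random placeholders standing in for machine-trained parameters.
    """

    def __init__(self, g: int, k: int, seed: int = 0):
        if k >= g:
            raise ValueError("embedding dimension k must be smaller than g")
        rng = random.Random(seed)
        # Hypothetical per-term weighting factors (one per vocabulary term).
        self.term_weights = [rng.uniform(0.5, 1.5) for _ in range(g)]
        # Hypothetical k x g projection matrix and bias for a single dense layer.
        self.proj = [[rng.gauss(0.0, 1.0 / math.sqrt(g)) for _ in range(g)] for _ in range(k)]
        self.bias = [0.0] * k

    def embed(self, tf: list[float]) -> list[float]:
        # TF-modifying step: scale each term frequency by its weighting factor.
        intermediate = [f * w for f, w in zip(tf, self.term_weights)]
        # Projection step: dense layer with tanh non-linearity, dimension g -> k.
        return [
            math.tanh(sum(p * x for p, x in zip(row, intermediate)) + b)
            for row, b in zip(self.proj, self.bias)
        ]


vocab = ["red", "blue", "running", "shoes", "jacket", "men", "women", "size"]
embedder = TfEmbedder(g=len(vocab), k=3)
dense = embedder.embed(tf_vector("Red running shoes red", vocab))
print(len(dense), dense)
```

Because the intermediate vector keeps dimension g, each learned factor stays attached to one vocabulary term and can be inspected directly; only the projection step mixes terms into the dense k-dimensional embedding used by downstream retrieval, recommendation, or classification.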