Abstract:
A visual search system detects one or more user-selected objects represented in an image. A first group of attributes for the user-selected objects is identified. A category for the user-selected objects is identified, and a second group of pre-defined attributes associated with the category is retrieved. The first and second groups of attributes are combined into an attribute set. The combined set of attributes is presented to the user. The user selects one or more attributes, and a search is performed to identify images matching the user-selected attributes. The images are ranked, and a subset is presented to the user.
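The flow this abstract describes can be summarized in a few functions. Below is a minimal sketch of the attribute-combination and ranking steps; all names (CATEGORY_ATTRIBUTES, build_attribute_set, rank_and_subset) are hypothetical stand-ins, not terms from the patent.

```python
# Hypothetical pre-defined attributes keyed by object category.
CATEGORY_ATTRIBUTES = {
    "dress": ["sleeve length", "neckline", "pattern", "material"],
    "sofa": ["upholstery", "seat count", "leg style"],
}

def build_attribute_set(detected_attributes, category):
    """Combine attributes detected on the object (first group) with the
    pre-defined attributes stored for its category (second group),
    deduplicated while preserving order."""
    predefined = CATEGORY_ATTRIBUTES.get(category, [])
    return list(dict.fromkeys(detected_attributes + predefined))

def rank_and_subset(candidates, selected_attributes, top_k=5):
    """Rank candidate images by how many user-selected attributes they
    match, then return the top-k subset for presentation.
    `selected_attributes` and each candidate's "attributes" are sets."""
    scored = sorted(
        candidates,
        key=lambda img: len(selected_attributes & img["attributes"]),
        reverse=True,
    )
    return scored[:top_k]

# Example: the user tapped a dress detected as red and floral.
print(build_attribute_set(["red", "floral"], "dress"))
```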
Abstract:
Non-limiting examples of the present disclosure relate to object detection processing of image content that categorically classifies specific objects within the image content. Exemplary object detection processing may be utilized to enhance visual search processing, including content retrieval and curation, among other technical advantages. An exemplary object detection model is implemented to categorically classify an object. In doing so, an exemplary object detection model may classify objects based on analysis of the specific objects within the image content, positioning of the objects within the image content, and intent associated with the image content, among other examples. The object detection model generates exemplary categorical classification(s) for specific data objects, which may be propagated to enhance processing efficiency and accuracy during visual search processing. Exemplary categorical classifications may comprise hierarchical classifications of a detected object that can be used to retrieve, curate, and surface content that is most contextually relevant to the detected object.
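To make the hierarchical-classification idea concrete, here is an illustrative sketch of expanding a detector's leaf label into a category path that downstream retrieval can key on. The taxonomy, class names, and Detection type are assumptions for illustration, not the disclosed model.

```python
from dataclasses import dataclass

# Hypothetical taxonomy mapping leaf labels to hierarchical paths.
TAXONOMY = {
    "sneaker": ["apparel", "footwear", "sneaker"],
    "handbag": ["apparel", "accessory", "handbag"],
}

@dataclass
class Detection:
    label: str        # leaf class predicted by the detector
    box: tuple        # (x, y, w, h) position within the image
    confidence: float

def classify_hierarchically(detection: Detection):
    """Expand a leaf label into the hierarchical categorical
    classification propagated to visual search processing."""
    return TAXONOMY.get(detection.label, ["unknown"])

det = Detection(label="sneaker", box=(10, 20, 80, 60), confidence=0.93)
print(classify_hierarchically(det))  # ['apparel', 'footwear', 'sneaker']
```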
Abstract:
Auxiliary content provided in addition to search results is selected and presented to aid the user in completing tasks and to increase user interaction performance. Auxiliary content is processed utilizing existing search engine categorization and identification mechanisms, thereby facilitating the determination of similarities between the auxiliary content and indexed content that is identified as being responsive to a search query. At least some of the search results identified as being responsive to the search query are compared to the auxiliary content to identify similarities, including visual similarities. Similar auxiliary content is selected to aid the user in completing tasks, and such selected auxiliary content is provided with the search results, including in a visually distinct or separated manner.
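One plausible reading of the similarity comparison is a feature-space check between auxiliary items and responsive results. The following is an illustrative sketch under that assumption, not the patented mechanism; the threshold and field names are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_auxiliary(results, auxiliary, threshold=0.8):
    """Keep auxiliary items whose feature vector is close to at least
    one search result; selected items would then be shown alongside
    the results in a visually distinct area."""
    return [
        aux for aux in auxiliary
        if any(cosine(aux["vec"], r["vec"]) >= threshold for r in results)
    ]

results = [{"vec": [1.0, 0.0]}]
auxiliary = [{"vec": [0.9, 0.1]}, {"vec": [0.0, 1.0]}]
print(len(select_auxiliary(results, auxiliary)))  # 1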
Abstract:
Systems and methods directed to returning personalized image-based search results are described. In examples, a query including an image may be received, and a personalized item embedding may be generated based on the image and user profile information associated with a user. Further, a plurality of candidate images may be obtained based on the personalized item embedding. The candidate images may then be ranked according to a predicted level of user engagement for the user, and then diversified to ensure visual diversity among the ranked images. A portion of the diversified images may then be returned in response to an image-based search.
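A minimal sketch of the retrieve-rank-diversify pipeline follows, assuming normalized embedding vectors. The fusion rule, the greedy diversification, and all parameter values are assumptions; the abstract does not specify how the personalized embedding or engagement predictor work.

```python
import numpy as np

def personalized_item_embedding(image_vec, user_vec, alpha=0.7):
    """Blend the query-image embedding with a user-profile embedding
    (a simple assumed fusion), then renormalize."""
    v = alpha * image_vec + (1 - alpha) * user_vec
    return v / np.linalg.norm(v)

def diversify(ranked, k=10, redundancy=0.95):
    """Greedy diversification over candidates already ranked by predicted
    engagement (descending): skip a candidate whose cosine similarity to
    an already-chosen image exceeds `redundancy`."""
    chosen = []
    for vec, score in ranked:
        if all(float(vec @ c) < redundancy for c, _ in chosen):
            chosen.append((vec, score))
        if len(chosen) == k:
            break
    return chosen

rng = np.random.default_rng(0)
img = rng.normal(size=8); img /= np.linalg.norm(img)
usr = rng.normal(size=8); usr /= np.linalg.norm(usr)
print(personalized_item_embedding(img, usr).shape)  # (8,)
```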
Abstract:
The description relates to diversified hybrid image annotation. One implementation includes generating first image annotations for a query image using a retrieval-based image annotation technique. Second image annotations can be generated for the query image using a model-based image annotation technique. The first and second image annotations can be integrated to generate a diversified hybrid image annotation result for the query image.
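One simple way to integrate the two annotation sources is a weighted score fusion; the sketch below shows that idea. The fusion weight and scoring are assumptions for illustration, not the disclosed integration method.

```python
def hybrid_annotations(retrieval_tags, model_tags, w_retrieval=0.5):
    """Merge retrieval-based and model-based tag->confidence maps into
    one ranked annotation result, summing weighted confidences for tags
    proposed by both sources."""
    merged = {}
    for tag, score in retrieval_tags.items():
        merged[tag] = merged.get(tag, 0.0) + w_retrieval * score
    for tag, score in model_tags.items():
        merged[tag] = merged.get(tag, 0.0) + (1 - w_retrieval) * score
    return sorted(merged.items(), key=lambda t: t[1], reverse=True)

print(hybrid_annotations({"beach": 0.9, "sunset": 0.6},
                         {"ocean": 0.8, "beach": 0.7}))
# [('beach', 0.8), ('sunset', 0.3), ('ocean', 0.4)] ranked by score
```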
Abstract:
A visual search engine is described herein. The visual search engine is configured to return information to a client computing device based upon a multimodal query received from the client computing device (wherein the multimodal query comprises an image and text). The visual search engine is further configured to interact with a user of the client computing device to disambiguate information retrieval intent of the user.
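The disambiguation interaction could be structured as follows. This is a hypothetical flow under assumed types and thresholds; the abstract does not describe how the engine decides when to ask a clarifying question.

```python
from dataclasses import dataclass

@dataclass
class MultimodalQuery:
    image_bytes: bytes  # the image part of the query
    text: str           # the text part of the query

def handle_query(query: MultimodalQuery, intent_scores: dict):
    """If no single retrieval intent dominates, return a clarifying
    question so the user can disambiguate; otherwise proceed to search."""
    best_intent = max(intent_scores, key=intent_scores.get)
    if intent_scores[best_intent] < 0.6:  # assumed ambiguity threshold
        options = sorted(intent_scores, key=intent_scores.get,
                         reverse=True)[:3]
        return {"clarify": f"Did you mean: {', '.join(options)}?"}
    return {"search_intent": best_intent}

print(handle_query(MultimodalQuery(b"", "price"),
                   {"shopping": 0.45, "identification": 0.40}))
```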
Abstract:
Representative embodiments disclose mechanisms to perform visual intent classification or visual intent detection or both on an image. Visual intent classification utilizes a trained machine learning model that classifies subjects in the image according to a classification taxonomy. The visual intent classification can be used as a pre-triggering mechanism to initiate further action, in order to substantially save processing time. Example further actions include user scenarios, query formulation, user experience enhancement, and so forth. Visual intent detection utilizes a trained machine learning model to identify subjects in an image, place a bounding box around each subject, and classify the subject according to the taxonomy. The trained machine learning model utilizes multiple feature detectors, multi-layer predictions, multilabel classifiers, and bounding box regression.
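The pre-triggering idea, running a cheap classifier before a heavier detector, can be sketched as below. The model stubs, intent labels, and actionable set are placeholders; only the gating pattern reflects the abstract.

```python
def pre_trigger(classify, detect, image,
                actionable=frozenset({"shopping", "ocr"})):
    """Run the inexpensive intent classifier first; only invoke the
    expensive detector (bounding boxes plus per-subject classes) when
    the classified intent warrants further action."""
    intent = classify(image)      # e.g. 'shopping', 'scenery', 'ocr'
    if intent not in actionable:
        return intent, []         # skip detection, saving compute
    return intent, detect(image)  # [(box, class_label, score), ...]

# Stub models for illustration:
intent, boxes = pre_trigger(lambda img: "shopping",
                            lambda img: [((5, 5, 40, 40), "shoe", 0.9)],
                            image=object())
print(intent, boxes)
```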
Abstract:
A method for using a speech signal to augment a visual search includes processing image data to determine an image search intent. Concurrently with processing the image data, the method processes the speech signal to determine at least one speech search intent. The method generates a search query by combining keywords and/or the image from the image search intent with keywords from the speech search intent. The method then performs a search based on the generated query and reports the results of the search. The method generates the image search intent by applying the image data to a knowledge base and generates the speech search intent by converting the speech to text and applying the text to a cognition service.
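The query-generation step might look like the following sketch. The intent dictionaries stand in for the outputs of the knowledge base and cognition service named above; their structure is an assumption.

```python
def build_query(image_intent, speech_intent):
    """Union the keywords from both modalities into one search query,
    deduplicated in order, keeping the image itself when the image
    search intent supplies one."""
    keywords = list(dict.fromkeys(image_intent["keywords"]
                                  + speech_intent["keywords"]))
    return {"keywords": keywords, "image": image_intent.get("image")}

q = build_query({"keywords": ["red", "jacket"], "image": "img-123"},
                {"keywords": ["under", "$50"]})
print(q)  # {'keywords': ['red', 'jacket', 'under', '$50'], 'image': 'img-123'}
```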
Abstract:
A visual search system includes a computing device, where the computing device includes an image processing engine for generating a feature vector representing a user-selected object in an image. The computing device also includes an object detection engine for locating one or more objects in the image and for determining a category of a user-selected object from the objects in the image, where the object detection engine uses the category to generate a plurality of attributes for the user-selected object. The computing device further includes a product data store for storing a plurality of tables storing one or more attributes associated with the category of the user-selected object. The computing device additionally includes an attribute generation engine for generating a plurality of attribute options and an attribute matching engine for comparing attributes and attribute options of the user-selected object with attributes and attribute options of visually similar products and images.
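A structural sketch of how these engines might compose is shown below. Every class and method name here is a hypothetical stand-in for the engines named in the abstract, not the patented design.

```python
class VisualSearchSystem:
    """Composes the engines described above; each collaborator is
    duck-typed and would be supplied by the hosting computing device."""

    def __init__(self, image_engine, detector, product_store, matcher):
        self.image_engine = image_engine    # image -> feature vector
        self.detector = detector            # image -> objects + category
        self.product_store = product_store  # category -> attribute tables
        self.matcher = matcher              # compares attribute sets

    def search(self, image, selection):
        vec = self.image_engine.embed(image)
        category = self.detector.categorize(image, selection)
        attributes = self.product_store.attributes_for(category)
        # Match the selected object's attributes against those of
        # visually similar products and images.
        return self.matcher.match(vec, attributes)
```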