Abstract:
In an example embodiment, a method for selecting text snippets to display on a computer display is provided. A universal concept graph for phrases relevant to a search domain is created, the universal concept graph representing each phrase as a node and relationships between the phrases as edges between the nodes. A result in the search domain is represented as a subgraph of the universal concept graph by extracting a portion of the universal concept graph containing phrases contained in the result. Then, a score is produced for each node of the subgraph, the score based on a graph analysis algorithm applied to the subgraph. Text snippets to display for the result are then selected based on the scores produced in the subgraph for phrases contained in the text snippets.
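The steps in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the claimed method: the abstract does not name the graph analysis algorithm, so PageRank is assumed here, and the graph contents, the substring-based phrase matching, and the function names (`phrases_in`, `extract_subgraph`, `pagerank`, `select_snippets`) are all hypothetical.

```python
def phrases_in(text, graph):
    """Return the graph phrases that occur in the text (naive substring match)."""
    lowered = text.lower()
    return {phrase for phrase in graph if phrase in lowered}


def extract_subgraph(graph, phrases):
    """Keep only the given phrases and the edges that connect them."""
    keep = set(phrases) & set(graph)
    return {node: {m for m in graph[node] if m in keep} for node in keep}


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph (assumed algorithm).

    `graph` maps each node to the set of its neighbours and must be symmetric.
    """
    nodes = sorted(graph)
    n = len(nodes)
    if n == 0:
        return {}
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes without neighbours spread their score evenly over all nodes.
        dangling = sum(score[v] for v in nodes if not graph[v])
        updated = {}
        for v in nodes:
            incoming = sum(score[u] / len(graph[u]) for u in graph[v])
            updated[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        score = updated
    return score


def select_snippets(snippets, scores, k=1):
    """Rank snippets by the summed scores of the phrases they contain."""
    def snippet_score(text):
        lowered = text.lower()
        return sum(s for phrase, s in scores.items() if phrase in lowered)
    return sorted(snippets, key=snippet_score, reverse=True)[:k]


# Hypothetical universal concept graph for a job-search domain.
UNIVERSAL = {
    "machine learning": {"neural network", "python", "statistics"},
    "neural network": {"machine learning", "deep learning"},
    "deep learning": {"neural network"},
    "python": {"machine learning"},
    "statistics": {"machine learning"},
    "cooking": set(),
}

SNIPPETS = [
    "Built machine learning models with a neural network.",
    "Wrote Python scripts for reporting.",
]

subgraph = extract_subgraph(UNIVERSAL, phrases_in(" ".join(SNIPPETS), UNIVERSAL))
scores = pagerank(subgraph)
print(select_snippets(SNIPPETS, scores, k=1))
```

In this sketch a snippet's rank is simply the sum of the node scores of the phrases it mentions, so snippets that mention well-connected phrases of the result's subgraph are preferred.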
Abstract:
A method of embedding video for text search includes extracting visual features from a video. The visual features may, for example, include appearance information, motion, audio, and/or like features. Term vectors are determined from textual descriptions associated with the video. The text may be included in a title for the video or included within the video (e.g., subtitles), for example. A feature projection is computed based on the extracted video features and a textual projection is computed based on the term vectors. A semantic embedding is computed based on the feature projection and the textual projection by jointly optimizing semantic predictability and semantic descriptiveness.
Abstract:
A system and method are disclosed for obtaining a plurality of unlabeled text documents; obtaining an initial concept; obtaining keywords from a knowledge source based on the initial concept; scoring the plurality of unlabeled documents based at least in part on the initial keywords; determining a categorization of the documents based on the scores; performing a first feature selection and creating a first vector space representation of each document in a first category and a second category, the first and second categories based on the scores, the first vector space representation serving as one or more labels for an associated unlabeled textual document; and generating the training set including a subset of the obtained unlabeled textual documents, the subset of the obtained unlabeled documents including documents belonging to the first category and documents belonging to the second category.
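The scoring and categorization steps of the abstract above might look like the following sketch. It covers only keyword scoring and the split into two categories; the feature selection and vector space representation are omitted. The scoring formula (fraction of tokens that are keywords), the thresholds, and the names `keyword_score` and `build_training_set` are assumptions made for illustration.

```python
import re


def keyword_score(text, keywords):
    """Fraction of tokens in the text that are keywords (assumed scoring rule)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in keywords for token in tokens) / len(tokens)


def build_training_set(documents, keywords, high=0.2, low=0.0):
    """Pseudo-label documents: 1 if score >= high, 0 if score <= low.

    Documents whose score falls between the two hypothetical thresholds are
    treated as ambiguous and left out of the training set.
    """
    keywords = {k.lower() for k in keywords}
    training = []
    for doc in documents:
        score = keyword_score(doc, keywords)
        if score >= high:
            training.append((doc, 1))
        elif score <= low:
            training.append((doc, 0))
    return training


# Hypothetical keywords returned by a knowledge source for the concept "football".
KEYWORDS = {"goal", "match", "league"}
DOCS = [
    "The league match ended with a late goal",
    "Parliament passed the budget bill",
    "The match between the two parties was long and tiring overall",
]
for doc, label in build_training_set(DOCS, KEYWORDS):
    print(label, doc)
```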
Abstract:
In example implementations, a plurality of re-structured versions of text is generated for each one of a plurality of different documents by applying a plurality of text summarization methods to each one of the plurality of different documents. An effectiveness score is calculated for each one of the plurality of text summarization methods to determine the text summarization method with the highest effectiveness score for an application. The plurality of re-structured versions of text for each one of the plurality of different documents that is generated by the text summarization method that has the highest effectiveness score is stored to be used in the application.
Abstract:
A protected querying technique involves creating shingles from a query and then fingerprinting the shingles. The documents to be queried are also shingled and then fingerprinted. The overlap between adjacent shingles differs between the query and the documents to be queried, with less or no overlap for the query shingles. The query fingerprints are compared to the fingerprints of the documents to be queried to determine whether there are any matches.
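A minimal sketch of the shingling and fingerprinting described in the abstract above follows. It assumes word-level shingles, a stride of one word for documents (maximal overlap), a stride equal to the shingle size for the query (no overlap), and truncated SHA-256 digests as fingerprints. None of these specifics come from the abstract, and the function names are hypothetical.

```python
import hashlib


def shingles(text, size, stride):
    """Split text into word shingles of `size` words, advancing by `stride` words."""
    words = text.lower().split()
    return [
        " ".join(words[i:i + size])
        for i in range(0, len(words) - size + 1, stride)
    ]


def fingerprint(shingle):
    """Fingerprint one shingle (assumed: first 16 hex digits of SHA-256)."""
    return hashlib.sha256(shingle.encode("utf-8")).hexdigest()[:16]


def fingerprints(text, size, stride):
    return {fingerprint(s) for s in shingles(text, size, stride)}


def matching_documents(query, documents, size=3):
    """Return {doc_id: number of shared fingerprints} for documents with a match."""
    # Query shingles do not overlap; document shingles overlap maximally.
    query_fps = fingerprints(query, size, stride=size)
    hits = {}
    for doc_id, text in documents.items():
        shared = query_fps & fingerprints(text, size, stride=1)
        if shared:
            hits[doc_id] = len(shared)
    return hits


DOCUMENTS = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "an entirely unrelated sentence about databases",
}
print(matching_documents("quick brown fox jumps over the", DOCUMENTS))
```

Because only fingerprints of non-overlapping query shingles are compared, fewer fingerprints of the query are exposed than full overlapping shingling would produce, while every query shingle that appears verbatim in a document can still match one of that document's overlapping shingles.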
Abstract:
The present disclosure relates to performing similarity metric analysis and data enrichment using knowledge sources. A data enrichment service can compare an input data set to reference data sets stored in a knowledge source to identify similarly related data. A similarity metric can be calculated corresponding to the semantic similarity of two or more data sets. The similarity metric can be used to identify data sets based on their metadata attributes and data values, enabling easier indexing and high-performance retrieval of data values. An input data set can be labeled with a category based on the data set having the best match with the input data set. The similarity of an input data set with a data set provided by a knowledge source can be used to query a knowledge source to obtain additional information about the data set. The additional information can be used to provide recommendations to the user.
Abstract:
Techniques and constructs to facilitate spelling correction of email queries can leverage features of email data to obtain candidate corrections particular to the email data being queried. The constructs may enable accurate spelling correction of email queries across languages and domains based on, for example, one or more of a language model such as a bigram language model and/or a normalized token IDF based language model, a translation model such as an edit distance translation model and/or a fuzzy match translation model, content-based features, and/or contextual features. Content-based features can include features associated with the subject line of emails, content including identified phrases, contacts, and/or the number of candidate emails returned. Contextual features can include a time window of subject match and/or contact match, a frequency of emails received from a contact, and/or device characteristics.
Abstract:
Secure information retrieval is disclosed. One example is a system including an information retriever comprising a collection of nodes that receive a hash count from a first dataset, the first dataset including a first data term, and provide the hash count to a second dataset, the second dataset including a plurality of second data terms. A hash transformer transforms the data terms based on the hash count. A modifier modifies, for a given node, the transformed data terms. An evaluator evaluates, for each node, a similarity value between the first data term and each given second data term based on shared data elements between the modified first data term and a given modified second data term associated with the given second data term. The information retriever provides to the first dataset at least one term identifier associated with a second data term.
Abstract:
A method and an apparatus for recommending music are provided. The method includes: acquiring a historical browsing record of each user account on a network service (101); establishing a browsing sequence of each user account according to the historical browsing record corresponding to each user account (102); mapping the browsing sequence of each user account to a mapping value (103); aggregating all user accounts according to the mapping value corresponding to each user account, to obtain at least one user account group (104); and recommending the network service to each user account based on a user account group to which the user account belongs (105). The method improves an accuracy rate of whether a recommended network service satisfies an interest of a user in the network service.
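Steps 101 to 105 of the abstract above can be sketched as below. The abstract does not say how a browsing sequence is mapped to a mapping value, so this sketch substitutes a simple stand-in: a SHA-256 hash of the last `window` items of the sequence. The record layout `(account, timestamp, service)` and all function names are likewise assumptions.

```python
import hashlib
from collections import defaultdict


def build_sequences(records):
    """Steps 101-102: order each account's browsed services by timestamp."""
    by_account = defaultdict(list)
    for account, timestamp, service in records:
        by_account[account].append((timestamp, service))
    return {a: [s for _, s in sorted(events)] for a, events in by_account.items()}


def mapping_value(sequence, window=2):
    """Step 103: map a sequence to a value (assumed: hash of its last items)."""
    tail = "\x1f".join(sequence[-window:])
    return hashlib.sha256(tail.encode("utf-8")).hexdigest()


def group_accounts(sequences, window=2):
    """Step 104: accounts with the same mapping value form one group."""
    groups = defaultdict(set)
    for account, sequence in sequences.items():
        groups[mapping_value(sequence, window)].add(account)
    return list(groups.values())


def recommend(sequences, groups):
    """Step 105: suggest services browsed by group peers but not by the account."""
    recommendations = {}
    for group in groups:
        for account in group:
            seen = set(sequences[account])
            pool = set()
            for other in group - {account}:
                pool.update(sequences[other])
            recommendations[account] = sorted(pool - seen)
    return recommendations


RECORDS = [
    ("u1", 1, "jazz"), ("u1", 2, "blues"), ("u1", 3, "soul"),
    ("u2", 1, "funk"), ("u2", 2, "blues"), ("u2", 3, "soul"),
    ("u3", 1, "metal"), ("u3", 2, "punk"),
]
sequences = build_sequences(RECORDS)
print(recommend(sequences, group_accounts(sequences)))
```

A locality-sensitive mapping would let accounts with similar but not identical recent history share a group; the exact-match hash used here is only the simplest choice that keeps the example deterministic.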
Abstract:
In one embodiment, a method includes receiving an identification of a location. The method further includes accessing an inverted index that comprises a plurality of records, where each record corresponds to a map tile and identifies one or more places corresponding to the map tile. At least one of the places identified in the inverted index is identified in multiple records corresponding to multiple map tiles, where the map tiles collectively define an area that circumscribes the place. The method also includes identifying based on the inverted index one or more places associated with the location.
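The inverted index in the abstract above can be illustrated with the following sketch. It assumes square tiles on a plain latitude/longitude grid, places described by bounding boxes `(south, west, north, east)`, and no handling of the antimeridian. The tile size, the data, and the function names are hypothetical.

```python
import math
from collections import defaultdict

TILE_SIZE = 1.0  # hypothetical tile edge length, in degrees


def tile_of(lat, lon, size=TILE_SIZE):
    """Return the (row, column) key of the tile containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def build_index(places, size=TILE_SIZE):
    """Map each tile to the places whose bounding box touches it.

    A large place is listed under every tile it spans, so those tiles
    together cover (circumscribe) the place.
    """
    index = defaultdict(set)
    for name, (south, west, north, east) in places.items():
        row0, col0 = tile_of(south, west, size)
        row1, col1 = tile_of(north, east, size)
        for row in range(row0, row1 + 1):
            for col in range(col0, col1 + 1):
                index[(row, col)].add(name)
    return index


def places_at(index, lat, lon, size=TILE_SIZE):
    """Look up the places recorded for the tile containing the location."""
    return sorted(index.get(tile_of(lat, lon, size), ()))


PLACES = {
    "park": (0.2, 0.2, 1.5, 2.3),  # spans several tiles
    "cafe": (1.6, 2.1, 1.7, 2.2),  # fits inside one tile
}
index = build_index(PLACES)
print(places_at(index, 1.65, 2.15))
```

A lookup only needs to compute one tile key and read one record, at the cost of storing large places redundantly in several records.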