Abstract:
A method for facilitating development of a document classification function comprises selecting a feature of a document, the feature being less than an entirety of the document; presenting the feature to a human subject; asking the human subject for a feature relevance value of the feature; and generating a classification function using the feature relevance value. The method may also include presenting the document to the human subject at the same time as the feature and asking the human subject for a document relevance value that measures the relevance of the document to a category, in which case generating the classification function also uses the document relevance value.
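As a rough illustration of the idea above, the sketch below turns human-supplied feature relevance values into a simple classification function. The averaging rule, the 0.5 threshold, whitespace tokenization, and the names `build_classifier` and `ratings` are all assumptions made for this example; the abstract does not specify how the classification function is generated.

```python
from typing import Callable, Dict


def build_classifier(
    feature_relevance: Dict[str, float], threshold: float = 0.5
) -> Callable[[str], bool]:
    """Build a classifier from human-assigned feature relevance values.

    Assumption: a document is relevant to the category when the mean
    relevance of its rated features reaches the threshold.
    """

    def classify(document: str) -> bool:
        tokens = document.lower().split()
        # Keep only the features a human subject has actually rated.
        scores = [feature_relevance[t] for t in tokens if t in feature_relevance]
        if not scores:
            return False  # no rated feature present, so no evidence of relevance
        return sum(scores) / len(scores) >= threshold

    return classify


# Hypothetical relevance values collected for the category "sports".
ratings = {"football": 0.9, "goal": 0.8, "invoice": 0.1}
is_sports = build_classifier(ratings)
```

Calling `is_sports("Late goal wins football match")` averages the ratings for "goal" and "football"; a document with no rated features is treated as not relevant.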
Abstract:
A familiarity level classifier comprises a stopwords engine for conducting a stopwords analysis of stopwords, e.g., introductory level stopwords and advanced level stopwords, in a document, e.g., a website; and a familiarity level classifier module for generating a document familiarity level based on the stopwords analysis. The classifier may be in an indexing module, a search engine, a user computer, or elsewhere in a computer network. The classifier may also include a reading level engine for conducting a reading level analysis of the document, in which case the familiarity level classifier module is configured to generate the document familiarity level also based on the reading level analysis. The classifier may also include a document features engine for conducting a feature analysis of the document, in which case the familiarity level classifier module is configured to generate the document familiarity level also based on the feature analysis.
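The stopwords analysis described above can be sketched as a count of introductory level versus advanced level stopwords. The two word lists, the majority rule, and the function names are hypothetical choices for illustration only; the reading level and document features engines are omitted.

```python
import re
from typing import Tuple

# Hypothetical stopword lists; the abstract does not enumerate them.
INTRO_STOPWORDS = {"basics", "introduction", "beginner", "overview", "simple"}
ADVANCED_STOPWORDS = {"theorem", "asymptotic", "heuristic", "empirically", "formalism"}


def stopword_analysis(text: str) -> Tuple[int, int]:
    """Count introductory level and advanced level stopwords in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    intro = sum(t in INTRO_STOPWORDS for t in tokens)
    advanced = sum(t in ADVANCED_STOPWORDS for t in tokens)
    return intro, advanced


def familiarity_level(text: str) -> str:
    """Assumed rule: the stopword class with more occurrences decides the level."""
    intro, advanced = stopword_analysis(text)
    if intro > advanced:
        return "introductory"
    if advanced > intro:
        return "advanced"
    return "unknown"
```

A fuller version could combine these counts with a reading level score and other document features before deciding the level.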
Abstract:
An improved system and method is provided for efficiently learning a network of categories using prediction. A learning engine may receive a stream of characters and incrementally segment the stream, beginning with individual characters and building up larger and larger categories. To do so, a prediction engine may be provided for predicting a target category from the stream of characters using one or more context categories. Upon predicting the target category, the edges of the network of categories may be updated. A category composer may also be provided for composing a new category from existing categories in the network of categories, and the newly composed category may then be added to the network. Advantageously, iterative episodes of prediction and learning of categories for large scale applications may result in hundreds of thousands of categories connected by millions of prediction edges.
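A toy sketch of the loop described above follows: segment the character stream into known categories, update prediction edges between adjacent categories, and compose a new category once an edge is strong enough. The class name `CategoryNetwork`, the count-based edge weights, the greedy longest-match segmentation, and the `compose_threshold` parameter are assumptions for illustration, not the method of the abstract.

```python
from collections import defaultdict
from typing import List, Optional


class CategoryNetwork:
    def __init__(self, compose_threshold: int = 3) -> None:
        self.categories = set()
        # edges[context][target] = how often target followed context
        self.edges = defaultdict(lambda: defaultdict(int))
        self.compose_threshold = compose_threshold

    def segment(self, text: str) -> List[str]:
        """Greedy longest-match segmentation into known categories."""
        out, i = [], 0
        max_len = max((len(c) for c in self.categories), default=1)
        while i < len(text):
            for n in range(min(max_len, len(text) - i), 0, -1):
                piece = text[i:i + n]
                # Single characters are always accepted as categories.
                if n == 1 or piece in self.categories:
                    self.categories.add(piece)
                    out.append(piece)
                    i += n
                    break
        return out

    def predict(self, context: str) -> Optional[str]:
        """Return the target category with the heaviest edge from context."""
        targets = self.edges.get(context)
        if not targets:
            return None
        return max(targets, key=targets.get)

    def learn_episode(self, text: str) -> None:
        """Update edges between adjacent categories and compose new ones."""
        segments = self.segment(text)
        for context, target in zip(segments, segments[1:]):
            self.edges[context][target] += 1
            if self.edges[context][target] >= self.compose_threshold:
                self.categories.add(context + target)
```

Repeated calls to `learn_episode` let frequently adjacent categories merge into longer ones, which later segmentation passes can then match directly.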
Abstract:
An improved system and method is provided for learning a weighted index to categorize objects using ranked recall. In an offline embodiment, a learning engine may learn a weighted index for classifying objects using ranked recall by training during an entire initial pass of a training sequence of a collection of objects. In an online embodiment, a learning engine may learn a weighted index for classifying objects using ranked recall by dynamically updating the weighted index as each instance of the collection of objects is categorized. Advantageously, an instance of a large collection of objects may be accurately and efficiently recalled for many large scale applications with hundreds of thousands of categories by quickly identifying a small set of candidate categories for the given instance.
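The online embodiment above might look roughly like the following: a feature-to-category index with weights, a recall step that ranks candidate categories by summed weight, and a mistake-driven update. The `WeightedIndex` class, the additive update rule, and the `learning_rate` and `top_k` parameters are assumptions made for this sketch rather than details from the abstract.

```python
from collections import defaultdict
from typing import Iterable, List


class WeightedIndex:
    def __init__(self, learning_rate: float = 1.0) -> None:
        # index[feature][category] = weight
        self.index = defaultdict(lambda: defaultdict(float))
        self.learning_rate = learning_rate

    def ranked_recall(self, features: Iterable[str], top_k: int = 3) -> List[str]:
        """Return up to top_k candidate categories, highest score first."""
        scores = defaultdict(float)
        for feature in features:
            for category, weight in self.index.get(feature, {}).items():
                scores[category] += weight
        return sorted(scores, key=scores.get, reverse=True)[:top_k]

    def update(self, features: Iterable[str], true_category: str) -> None:
        """Assumed online rule: boost weights only when the top recall is wrong."""
        features = list(features)
        ranked = self.ranked_recall(features, top_k=1)
        if ranked and ranked[0] == true_category:
            return  # already recalled correctly, leave the index unchanged
        for feature in features:
            self.index[feature][true_category] += self.learning_rate
```

Because recall only touches the index entries of the features present in an instance, the candidate set stays small even when the total number of categories is large.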
Abstract:
A video hosting service comprises video classifiers that identify content sources of content included in videos uploaded to the video hosting service. Identifying the content source allows a content owner of the content source to claim ownership of videos that include content based on the content source. Usage policies associated with the content owners, which describe how the video hosting service is to treat the videos, are applied to the uploaded videos.
Abstract:
According to a preferred embodiment, a concept learning system and method is used for classifying instances, which may include, for example, web pages or text documents. An instance is input into the system. One or more candidate concepts are recalled from a set of candidate concepts. For each recalled concept, the corresponding classifier is applied to the instance to determine whether the recalled concept is related to the instance. Samples are selected from a training set, a learning method is applied, and the set of candidate concepts is updated according to the results of the learning method.
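The recall-then-classify step described above can be sketched in a few lines. The token-based recall index, the per-concept predicate classifiers, and the name `classify_instance` are hypothetical; the learning step that updates the candidate concepts is not shown.

```python
from typing import Callable, Dict, List, Set


def classify_instance(
    tokens: List[str],
    recall_index: Dict[str, Set[str]],
    classifiers: Dict[str, Callable[[List[str]], bool]],
) -> List[str]:
    """Recall candidate concepts, then keep those whose classifier accepts."""
    candidates: Set[str] = set()
    for token in tokens:
        candidates |= recall_index.get(token, set())
    # Apply only the classifiers of recalled concepts, not of every concept.
    return sorted(c for c in candidates if classifiers[c](tokens))


# Hypothetical recall index and per-concept classifiers.
recall_index = {
    "python": {"programming", "snakes"},
    "code": {"programming"},
    "venom": {"snakes"},
}
classifiers = {
    "programming": lambda toks: "code" in toks or "compiler" in toks,
    "snakes": lambda toks: "venom" in toks or "reptile" in toks,
}
```

For the tokens `["python", "code"]`, both concepts are recalled through "python", but only the "programming" classifier accepts the instance.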