摘要:
An ontology directory service tool, computer program product and method of automatically discovering ontology file categories. A web search unit searches a network (e.g., the Internet) for semantic data files, e.g., semantic web pages. A preprocessing unit generates an ontology file from the content of each identified semantic data file. A category discovery unit identifies a domain for each ontology file and provides training sets for training ontology file classification. A classification unit trained using the training sets, classifies ontology file instances into inherent ontology categories.
摘要:
An ontology directory service tool, computer program product and method of automatically discovering ontology file categories. A web search unit searches a network (e.g., the Internet) for semantic data files, e.g., semantic web pages. A preprocessing unit generates an ontology file from the content of each identified semantic data file. A category discovery unit identifies a domain for each ontology file and provides training sets for training ontology file classification. A classification unit trained using the training sets, classifies ontology file instances into inherent ontology categories.
摘要:
Techniques for creating training sets for predictive modeling are provided. In one aspect, a method for generating training data from an unlabeled data set is provided which includes the following steps. A small initial set of data is selected from the unlabeled data set. Labels are acquired for the initial set of data selected from the unlabeled data set resulting in labeled data. The data in the unlabeled data set is clustered using a semi-supervised clustering process along with the labeled data to produce data clusters. Data samples are chosen from each of the clusters to use as the training data. The selecting, presenting, clustering and choosing steps are repeated with one or more additional sets of data selected from the unlabeled data set until a desired amount of training data has been obtained, wherein at each iteration an amount of the labeled data is increased.
摘要:
System and method for partitioning a video into a series of semantic units where each semantic unit relates to a generally complete thematic topic. A computer implemented method for partitioning a video into a series of semantic units wherein each semantic unit relates to a theme or a topic, comprises dividing a video into a plurality of homogeneous segments, analyzing audio and visual content of the video, extracting a plurality of keywords from the speech content of each of the plurality of homogeneous segments of the video, and detecting and merging a plurality of groups of semantically related and temporally adjacent homogeneous segments into a series of semantic units in accordance with the results of both the audio and visual analysis and the keyword extraction. The present invention can be applied to generate important table-of-contents as well as index tables for videos to facilitate efficient video topic searching and browsing.
摘要:
A system and method for automatic call segmentation including steps and means for automatically detecting boundaries between utterances in the call transcripts; automatically classifying utterances into target call sections; automatically partitioning the call transcript into call segments; and outputting a segmented call transcript. A training method and apparatus for training the system to perform automatic call segmentation includes steps and means for providing at least one training transcript with annotated call sections; normalizing the at least one training transcript; and performing statistical analysis on the at least one training transcript.
摘要:
A system, method, and computer program are disclosed for recognizing one or more words not listed in a dictionary database. One or more sequences of characters in the word are checked to determine a probability that the word is valid. A prefix removal process removes any prefixes from a word, and obtains information about the removed prefix. A suffix removal process removes any suffixes from the word, and obtains information about the removed suffix. A root process obtains information about a root word from the dictionary database. A combination process then determines if the prefix, the root, and the suffix can be combined into a valid word as defined by one or more combination rules, obtains one or more of the possible parts of speech of the valid word, and stores the parts of speech with the valid word in the dictionary database.
摘要:
System and method for partitioning a video into a series of semantic units where each semantic unit relates to a generally complete thematic topic. A computer implemented method for partitioning a video into a series of semantic units wherein each semantic unit relates to a theme or a topic, comprises dividing a video into a plurality of homogeneous segments, analyzing audio and visual content of the video, extracting a plurality of keywords from the speech content of each of the plurality of homogeneous segments of the video, and detecting and merging a plurality of groups of semantically related and temporally adjacent homogeneous segments into a series of semantic units in accordance with the results of both the audio and visual analysis and the keyword extraction. The present invention can be applied to generate important table-of-contents as well as index tables for videos to facilitate efficient video topic searching and browsing.
摘要:
Techniques for limiting the risk of loss of sensitive data from a mobile device are provided. In one aspect, a method for managing sensitive data on a mobile device is provided. The method includes the following steps. A sensitivity of a data item to be transferred to the mobile device is determined. It is determined whether an aggregate sensitivity of data items already present on the mobile device plus the data item to be transferred exceeds a current threshold sensitivity value for the mobile device. If the aggregate sensitivity exceeds the current threshold sensitivity value, measures are employed to ensure the aggregate sensitivity remains below the current threshold sensitivity value for the mobile device. Otherwise the data item is transferred to the mobile device.
摘要:
A real-time method and system are described for automatically extracting text from the customer-agent interaction at a contact center, analyzing the extracted text to automatically identify one or more customer issues, and performing processing by contact-center agent buddies (CABs) to generate at least one response to the customer issues.
摘要:
Techniques for limiting the risk of loss of sensitive data from a mobile device are provided. In one aspect, a method for managing sensitive data on a mobile device is provided. The method includes the following steps. A sensitivity of a data item to be transferred to the mobile device is determined. It is determined whether an aggregate sensitivity of data items already present on the mobile device plus the data item to be transferred exceeds a current threshold sensitivity value for the mobile device. If the aggregate sensitivity exceeds the current threshold sensitivity value, measures are employed to ensure the aggregate sensitivity remains below the current threshold sensitivity value for the mobile device. Otherwise the data item is transferred to the mobile device.