Abstract:
A system that facilitates organization of emails comprises a clustering component that clusters a plurality of emails and creates topics for emails by assigning key phrases extracted from emails within one or more clusters. An organization component then utilizes the key phrases to organize documents. Furthermore, the organization component can comprise a probability component that determines a probability that a document belongs to a certain topic.
Abstract:
An architecture monitors interaction data (e.g., search queries, query results, and click-through rates) and provides users with links to other users who fall into similar categories with respect to the monitored activities (e.g., links to individuals and groups that share common interests and/or profiles). A search engine can be interactively coupled with one or more social networks and maps individuals and/or groups within the respective social networks to subsets of categories associated with searches. A database stores the mapped information, which can be continuously updated and reorganized as links within the system mapping become stronger or weaker. The architecture can comprise a social network system that includes a database for mapping search-related information to an entity of a social network, and a search component for processing a search query for search results and returning a link to an entity of a social network based on the search query.
Abstract:
A system enables metadata to be gathered about a data store, from the creation and generation of the data store through its subsequent use. This metadata can include keywords related to the data store and to data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of, or communication regarding, a data store is monitored, and keywords are extracted from the usage or communication. The keywords are then written to, or otherwise associated with, metadata of the data store. During searching, keywords in the metadata are made available for matching against query terms entered by a searcher.
Abstract:
In one embodiment, datasets are stored in a catalog. The datasets are enriched by establishing relationships among the domains in different datasets. A user searches for relevant datasets by providing examples of the domains of interest. The system identifies datasets corresponding to the user-provided examples. The system then identifies connected subsets of the datasets that are directly linked or indirectly linked through other domains. The user provides known relationship examples to filter the connected subsets and to identify the connected subsets that are most relevant to the user's query. The selected connected subsets may be further analyzed by business intelligence/analytics to create pivot tables or to process the data.
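The step of finding connected subsets can be pictured as a graph traversal. The following is a minimal sketch, assuming each dataset is described only by a set of domain names and that two datasets are linked when they share a domain. The function name `connected_subsets` and the sample catalog are hypothetical and are not taken from the abstract.

```python
from collections import defaultdict

def connected_subsets(catalog):
    """Group datasets that are linked directly or indirectly via shared domains.

    catalog: dict mapping a dataset name to the set of domain names it contains
    (an assumed, simplified representation).
    """
    # Index datasets by domain so neighbours can be looked up quickly.
    by_domain = defaultdict(set)
    for name, domains in catalog.items():
        for domain in domains:
            by_domain[domain].add(name)

    seen, groups = set(), []
    for start in catalog:
        if start in seen:
            continue
        # Depth-first traversal over datasets that share at least one domain.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            for domain in catalog[node]:
                stack.extend(by_domain[domain] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical catalog: "orders" and "regions" are linked only through "customers".
catalog = {
    "orders": {"customer_id", "product_id"},
    "customers": {"customer_id", "zip_code"},
    "regions": {"zip_code", "region"},
    "weather": {"station_id"},
}
groups = connected_subsets(catalog)
```

Filtering these groups with user-supplied relationship examples would be a further step that this sketch does not attempt.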
Abstract:
A model for mapping the raw text representation of a text object to a vector space is disclosed. A function is defined for computing a similarity score given two output vectors. A loss function is defined for computing an error based on the similarity scores and the labels of pairs of vectors. The parameters of the model are tuned to minimize the loss function. The label of two vectors indicates a degree of similarity of the objects. The label may be a binary number or a real-valued number. The function for computing similarity scores may be a cosine, Jaccard, or differentiable function. The loss function may compare pairs of vectors to their labels. Each element of the output vector is a linear or non-linear function of the terms of an input vector. The text objects may be different types of documents and two different models may be trained concurrently.
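The training loop described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the disclosed model: the projection is purely linear, the similarity function is cosine, the loss is a mean squared error between score and label, and gradients are estimated by finite differences with a simple step-halving rule instead of analytic differentiation. All function names and default parameters are hypothetical.

```python
import math

def project(weights, terms):
    # Each output element is a linear function of the input term vector.
    return [sum(w * t for w, t in zip(row, terms)) for row in weights]

def cosine(u, v):
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def loss(weights, pairs):
    # Assumed loss: mean squared error between similarity score and label.
    total = 0.0
    for terms_a, terms_b, label in pairs:
        score = cosine(project(weights, terms_a), project(weights, terms_b))
        total += (score - label) ** 2
    return total / len(pairs)

def train(weights, pairs, steps=100, lr=0.5, eps=1e-6):
    weights = [row[:] for row in weights]  # leave the caller's weights untouched
    for _ in range(steps):
        base = loss(weights, pairs)
        # Forward-difference estimate of the gradient (a simplification).
        grads = []
        for row in weights:
            grad_row = []
            for j in range(len(row)):
                old = row[j]
                row[j] = old + eps
                grad_row.append((loss(weights, pairs) - base) / eps)
                row[j] = old
            grads.append(grad_row)
        candidate = [[w - lr * g for w, g in zip(row, grad_row)]
                     for row, grad_row in zip(weights, grads)]
        # Accept the step only if the loss drops; otherwise halve the step size.
        if loss(candidate, pairs) < base:
            weights = candidate
        else:
            lr *= 0.5
    return weights
```

Each training pair is a tuple `(terms_a, terms_b, label)`, where the label is either binary or real-valued, as in the abstract. Training two different models concurrently for two document types would need a separate weight matrix per side, which this sketch omits.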
Abstract:
Signal detectors are described herein. By way of example, a system for detecting signals can include a microphone signal detector, a loudspeaker signal detector, a signal discriminator and a decision component. When the microphone signal detector detects the presence of a microphone signal, the loudspeaker signal detector detects the presence of a loudspeaker signal, and the signal discriminator determines that near-end speech dominates loudspeaker echo, the decision component can confirm the presence of doubletalk. When the microphone signal detector detects the presence of a microphone signal and the signal discriminator determines that near-end speech dominates loudspeaker echo, the decision component confirms the presence of a near-end signal.
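The decision component's two rules can be written as a small truth-table function. This is a minimal sketch, assuming the three detectors each deliver a boolean per frame. The function name `classify_frame` and the fallback label `"undetermined"` are hypothetical, since the abstract does not say what is reported in the remaining cases.

```python
def classify_frame(mic_active, speaker_active, near_end_dominates):
    """Combine the three detector outputs into a single decision."""
    if mic_active and speaker_active and near_end_dominates:
        return "doubletalk"
    if mic_active and near_end_dominates:
        # Because the doubletalk rule is checked first, this branch is reached
        # only when no loudspeaker signal was detected.
        return "near-end"
    # Assumption: every other combination is left undecided.
    return "undetermined"
```

Checking the doubletalk rule first matters, because the near-end rule as stated does not mention the loudspeaker detector and would otherwise also match doubletalk frames.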
Abstract:
Electronic mail messages may be collaboratively ranked and filtered. User actions on an electronic mail message received from a sender by one or more recipients may be monitored. Statistics may be generated based on the user actions. The statistics may then be used to provide a quality ranking of the electronic mail message.
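One way to turn monitored actions into a quality ranking is a weighted average over action counts. The sketch below assumes a fixed set of action names and weights. Both `ACTION_WEIGHTS` and `quality_score` are hypothetical illustrations, and the abstract does not specify which actions or statistics are used.

```python
from collections import Counter

# Hypothetical weights: positive actions raise the score, negative ones lower it.
ACTION_WEIGHTS = {
    "reply": 2.0,
    "forward": 1.5,
    "read": 1.0,
    "delete_unread": -1.0,
    "mark_spam": -3.0,
}

def quality_score(actions):
    """Average weight of the actions recipients took on one message."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no observations yet, so stay neutral
    weighted = sum(ACTION_WEIGHTS.get(action, 0.0) * n for action, n in counts.items())
    return weighted / total

def rank_messages(actions_by_message):
    """Return message ids ordered from highest to lowest score."""
    return sorted(actions_by_message,
                  key=lambda msg: quality_score(actions_by_message[msg]),
                  reverse=True)
```

Messages whose score falls below some cutoff could then be filtered, with the cutoff chosen per deployment.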
Abstract:
A malicious behavior detection/prevention system, such as an intrusion detection system, is provided that uses active learning to classify entries into multiple classes. A single entry can correspond to either the occurrence of one or more events or the non-occurrence of one or more events. During a training phase, entries are automatically classified into one of multiple classes. After an entry is classified, a generated model for the determined class is utilized to determine how well the entry corresponds to the model. Ambiguous classifications, along with entries that do not fit the model well for the determined class, are selected and presented to a human analyst for labeling. These labels are used to further train the classifier and the models. During an evaluation phase, entries are automatically classified using the trained classifier, and a policy associated with the determined class is applied.
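The selection step of the training phase can be illustrated as follows. This is a sketch under assumptions: the classifier is taken to output per-class probabilities, ambiguity is measured as a small gap between the two most probable classes, and model fit is a score in [0, 1]. The function name and both thresholds are hypothetical.

```python
def select_for_labeling(entries, margin=0.2, min_fit=0.5):
    """Pick entries that a human analyst should label.

    entries: iterable of (entry_id, class_probs, fit) where class_probs maps a
    class name to a probability and fit says how well the entry matches the
    model of its most probable class (assumed representation).
    """
    selected = []
    for entry_id, class_probs, fit in entries:
        top = sorted(class_probs.values(), reverse=True)
        # Ambiguous: the two best classes are nearly tied.
        ambiguous = len(top) > 1 and (top[0] - top[1]) < margin
        # Poor fit: confidently classified, but unlike other members of the class.
        if ambiguous or fit < min_fit:
            selected.append(entry_id)
    return selected
```

The second condition is what lets such a scheme surface entries that the classifier assigns confidently but that still look unusual for their class.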
Abstract:
A regression-based residual echo suppression (RES) system and process for suppressing the portion of the microphone signal corresponding to a playback of a speaker audio signal that was not suppressed by an acoustic echo canceller (AEC). In general, a prescribed regression technique is used between a prescribed spectral attribute of multiple past and present, fixed-length, periods (e.g., frames) of the speaker signal and the same spectral attribute of a current period (e.g., frame) of the echo residual in the output of the AEC. This automatically takes into consideration the correlation between the time periods of the speaker signal. The parameters of the regression can be easily tracked using adaptive methods. Multiple applications of RES can be used to produce better results and this system and process can be applied to stereo-RES as well.
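A single-frequency-bin version of the regression can be sketched as below. The assumptions are loud ones: the spectral attribute is taken to be magnitude, the adaptive method is normalized LMS (one common choice, not named in the abstract), and suppression is done by subtracting the predicted residual magnitude and clipping at zero. The function name and default parameters are hypothetical.

```python
def suppress_residual(speaker_mag, residual_mag, taps=3, mu=0.5, eps=1e-8):
    """Track regression weights for one frequency bin and suppress the residual.

    speaker_mag[t]  : magnitude of the speaker signal in frame t
    residual_mag[t] : magnitude of the AEC output in frame t
    Returns (weights, suppressed magnitudes).
    """
    weights = [0.0] * taps
    suppressed = []
    for t in range(len(residual_mag)):
        # Current and past speaker frames; frames before the start count as zero.
        window = [speaker_mag[t - k] if t - k >= 0 else 0.0 for k in range(taps)]
        predicted = sum(w * x for w, x in zip(weights, window))
        error = residual_mag[t] - predicted
        # Normalized LMS update keeps the step size independent of signal level.
        energy = sum(x * x for x in window) + eps
        weights = [w + mu * error * x / energy for w, x in zip(weights, window)]
        # Subtract the predicted echo residual; magnitudes cannot go negative.
        suppressed.append(max(residual_mag[t] - predicted, 0.0))
    return weights, suppressed
```

Using several past frames in the window is what lets the regression account for correlation between time periods of the speaker signal. A full system would run one such tracker per frequency bin and would also need to avoid adapting while near-end speech is present, which this sketch ignores.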
Abstract:
A strategy is described for identifying anomalies in time-series data. The strategy involves dividing the time-series data into a plurality of collected data segments and then using a modeling technique to fit local models to the collected data segments. Large deviations of the time-series data from the local models are indicative of anomalies. In one approach, the modeling technique can use an absolute value (L1) measure of error value for all of the collected data segments. In another approach, the modeling technique can use the L1 measure for only those portions of the time-series data that are projected to be anomalous. The modeling technique can use a squared-term (L2) measure of error value for normal portions of the time-series data. In another approach, the modeling technique can use an iterative expectation-maximization strategy in applying the L1 and L2 measures.
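The segment-and-fit idea can be illustrated with the simplest possible local model. In this sketch each segment is fitted with a constant, and the median is used because it is the constant that minimizes the absolute-value (L1) error. Deviations are scaled by the segment's median absolute deviation. The function name, segment length, and threshold are hypothetical, and the mixed L1/L2 and expectation-maximization variants are not covered.

```python
from statistics import median

def find_anomalies(series, segment_len=8, threshold=5.0):
    """Return indices whose deviation from the local model is unusually large."""
    anomalies = []
    for start in range(0, len(series), segment_len):
        segment = series[start:start + segment_len]
        # Local model: the constant minimizing the sum of absolute errors.
        center = median(segment)
        deviations = [abs(x - center) for x in segment]
        # Robust scale estimate; fall back to a tiny value for flat segments.
        scale = median(deviations) or 1e-9
        for offset, deviation in enumerate(deviations):
            if deviation / scale > threshold:
                anomalies.append(start + offset)
    return anomalies

# Hypothetical series with a single spike at index 6.
series = [10, 11, 10, 12, 11, 10, 50, 11, 20, 21, 19, 20, 22, 20, 21, 20]
flagged = find_anomalies(series)
```

An L1 fit is a natural choice here because a single large spike barely moves the median, so the spike stands out clearly against the fitted level instead of dragging the model toward itself.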