摘要:
Techniques for classifying structural data with skewed distribution are disclosed. By way of example, a method classifying structural input data comprises a computer system performing the following steps. Multiple classifiers are constructed, wherein each classifier is constructed on a subset of training data, using one or more selected composite features from the subset of training data. A consensus among the multiple classifiers is computed in accordance with a voting scheme such that at least a portion of the structural input data is assigned to a particular class in accordance with the computed consensus. Such techniques for structured data classification are capable of handling skewed class distribution and partial feature coverage issues.
摘要:
A method and system for automatically and adaptively determining query execution plans for parametric queries. A first classifier trained by an initial set of training points is generated. A query workload and/or database statistics are dynamically updated. A new set of training points is collected off-line. Using the new set of training points, the first classifier is modified into a second classifier. A database query is received at a runtime subsequent to the off-line phase. The query includes predicates having parameter markers bound to actual values. The predicates are associated with selectivities. A mapping of the selectivities into a plan determines the query execution plan. The determined query execution plan is included in an augmented set of training points, where the augmented set includes the initial set and the new set.
摘要:
An interconnect structure, an interconnect structure for interconnecting first and second components, an interconnect structure for interconnecting a multiple component stack and a substrate, and a method of fabricating an interconnect structure. The interconnect structure comprising a base portion formed on a mounting surface of a first component; a pillar portion extending from the base portion and substantially perpendicularly to the mounting surface; and a head portion formed on the pillar portion and having larger lateral dimensions than the pillar portion; wherein the base portion and the pillar portion are integrally formed of a homogeneous material.
摘要:
A layered heater structure including an electrode layer and a localized tuning method for tuning the electrode layer of a layered heater structure with high precision is provided. The localized tuning method tunes the electrode layer to its proper local resistance to minimize temperature offsets on the heater surface and thus provide a desired thermal profile that is in marked contrast to conventional, non-localized resistance tuning approaches based on thickness trimming practices, such as grinding or blasting, or resistivity adjustment, such as local heat treatment.
摘要:
A method (and structure) for processing an inductive learning model for a dataset of examples, includes dividing the dataset of examples into a plurality of subsets of data and generating, using a processor on a computer, a learning model using examples of a first subset of data of the plurality of subsets of data. The learning model being generated for the first subset comprises an initial stage of an evolving aggregate learning model (ensemble model) for an entirety of the dataset, the ensemble model thereby providing an evolving estimated learning model for the entirety of the dataset if all the subsets were to be processed. The generating of the learning model using data from a subset includes calculating a value for at least one parameter that provides an objective indication of an adequacy of a current stage of the ensemble model.
摘要:
Sequence-based XML indexing aims at avoiding expensive join operations in query processing. It transforms structured XML data into sequences so that a structured query can be answered holistically through subsequence matching. Herein, there is addressed the problem of query equivalence with respect to this transformation, and thereis introduced a performance-oriented principle for sequencing tree structures. With query equivalence, XML queries can be performed through subsequence matching without join operations, post-processing, or other special handling for problems such as false alarms. There is identified a class of sequencing methods for this purpose, and there is presented a novel subsequence matching algorithm that observe query equivalence. Also introduced is a performance-oriented principle to guide the sequencing of tree structures. For any given XML dataset, the principle finds an optimal sequencing strategy according to its schema and its data distribution; there is thus presented herein a novel method that realizes this principle.
摘要:
In connection with the mining of time-evolving data streams, a general framework that mines changes and reconstructs models from a data stream with unlabeled instances or a limited number of labeled instances. In particular, there are defined herein statistical profiling methods that extend a classification tree in order to guess the percentage of drifts in the data stream without any labelled data. Exact error can be estimated by actively sampling a small number of true labels. If the estimated error is significantly higher than empirical expectations, there preferably re-sampled a small number of true labels to reconstruct the decision tree from the leaf node level.
摘要:
A wafer processing apparatus is fabricated by depositing a film electrode onto the surface of a base substrate, the structure is then overcoated with a protective coating film layer comprising at least one of a nitride, carbide, carbonitride or oxynitride of elements selected from a group consisting of B, Al, Si, Ga, refractory hard metals, transition metals, and combinations thereof. The film electrode has a coefficient of thermal expansion (CTE) that closely matches the CTE of the underlying base substrate layer as well as the CTE of the protective coating layer.
摘要:
The present invention provides an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure in which each element in the sequence is associated with a weight. A series of network events, for instance, is a weighted-sequence because each event is associated with a timestamp. Querying a large sequence database by events' occurrence patterns is a first step towards understanding the temporal causal relationships among the events. The index structure proposed herein enables the efficient retrieval from the database of all subsequences (contiguous and non-contiguous) that match a given query sequence both by events and by weights. The index structure also takes into consideration the nonuniform frequency distribution of events in the sequence data.
摘要:
Most recent research of scalable inductive learning on very large streaming dataset focuses on eliminating memory constraints and reducing the number of sequential data scans. However, state-of-the-art algorithms still require multiple scans over the data set and use sophisticated control mechanisms and data structures. There is discussed herein a general inductive learning framework that scans the dataset exactly once. Then, there is proposed an extension based on Hoeffding's inequality that scans the dataset less than once. The proposed frameworks are applicable to a wide range of inductive learners.