Abstract:
A method for multiple-label data analysis includes: obtaining labeled data points from more than one labeler; building a classifier that maximizes a measure relating the data points, labels on the data points and a predicted output label; and assigning an output label to an input data point by using the classifier.
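As a rough illustration of the three steps (collect labels from several labelers, build a classifier, assign an output label), the sketch below uses a deliberately simpler stand-in: a majority vote across labelers followed by a 1-nearest-neighbor rule. It does not implement the maximization the abstract describes, and the function names and toy data are hypothetical.

```python
from collections import Counter

def consensus_labels(annotations):
    """Majority vote per data point; `annotations` holds one list of
    labeler votes per data point. Ties go to the label seen first."""
    return [Counter(votes).most_common(1)[0][0] for votes in annotations]

def predict_1nn(points, labels, query):
    """Assign the label of the closest training point (squared Euclidean)."""
    nearest = min(
        range(len(points)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], query)),
    )
    return labels[nearest]

# Hypothetical toy data: four points, each labeled by three labelers.
points = [[0.0], [0.2], [1.0], [1.2]]
annotations = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1]]

labels = consensus_labels(annotations)
output_label = predict_1nn(points, labels, [0.9])
```

A method in the spirit of the abstract would replace the vote with a model fitted jointly over data points, labeler outputs and the predicted label.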
Abstract:
A method of training a classifier for computer-aided detection in digitized medical images includes providing a plurality of bags, each bag containing a plurality of feature samples of a single region-of-interest in a medical image, where each region-of-interest has been labeled as either malignant or healthy. Training uses candidates that are spatially adjacent to each other, modeled as a “bag”, rather than each candidate by itself. A classifier is trained on the plurality of bags of feature samples, subject to the constraint that at least one point in the convex hull of each bag, corresponding to a feature sample, is correctly classified according to the label of the associated region-of-interest. This replaces a large set of discrete constraints requiring that at least one instance in each bag be correctly classified.
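The convex-hull constraint can be made concrete with a small sketch. It assumes a linear classifier with weights `w` and bias `b` and labels coded as +1 (malignant) and -1 (healthy); these choices and all names are illustrative assumptions, not details from the abstract. It only checks the constraint for a fixed classifier and does not perform training.

```python
def hull_point(bag, weights):
    """Convex combination of a bag's feature samples.
    `weights` must be non-negative and sum to 1."""
    assert all(wt >= 0 for wt in weights)
    assert abs(sum(weights) - 1.0) < 1e-9
    dims = len(bag[0])
    return [sum(wt * x[d] for wt, x in zip(weights, bag)) for d in range(dims)]

def decision(w, b, x):
    """Signed score of a linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def bag_satisfied(bag, label, w, b):
    """True if some point in the bag's convex hull lies on the correct side.
    A linear score attains its extremes over a convex hull at one of the
    samples, so checking the samples is enough for a fixed (w, b)."""
    return max(label * decision(w, b, x) for x in bag) > 0

# Hypothetical bag of two feature samples from one region-of-interest.
bag = [[0.0, 0.0], [2.0, 2.0]]
w, b = [1.0, 1.0], -1.0

representative = hull_point(bag, [0.25, 0.75])   # a point inside the hull
score = decision(w, b, representative)
ok = bag_satisfied(bag, +1, w, b)
```

During training, the combination weights would be optimized together with the classifier, which is what lets one continuous constraint per bag stand in for many discrete ones.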
Abstract:
A method and system correlate candidate information and provide batch classification of a number of related candidates. The batch of candidates may be identified from a single data set. There may be internal correlations and/or differences among the candidates. The candidates may be classified taking into consideration the internal correlations and/or differences. The locations and descriptive features of a batch of candidates may be determined. In turn, the locations and/or descriptive features determined may be used to enhance the accuracy of the classification of some or all of the candidates within the batch. In one embodiment, the single data set analyzed is associated with an internal image of a patient and the distance between candidates is accounted for. Two different algorithms may each simultaneously classify all of the samples within a batch, one based upon probabilistic analysis and the other upon a mathematical programming approach. Alternate algorithms may be used.
Abstract:
We propose using different classifiers based on the spatial location of the object. The intuitive idea behind this approach is that several classifiers may learn local concepts better than a “universal” classifier that covers the whole feature space. The use of local classifiers ensures that the objects of a particular class have a higher degree of resemblance within that particular class. The use of local classifiers also results in memory, storage and performance improvements, especially when the classifier is kernel-based. As used herein, the term “kernel-based classifier” refers to a classifier where a mapping function (i.e., the kernel) has been used to map the original training data to a higher dimensional space where the classification task may be easier.
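A minimal sketch of the idea, assuming a 1-D location split at hypothetical `boundaries` and a nearest-centroid rule as the per-region classifier (the abstract fixes neither choice). The toy data is built so that the concept flips between regions, which a single universal classifier of this kind could not capture.

```python
from collections import defaultdict

def region_of(location, boundaries):
    """Index of the spatial region containing `location` (1-D split)."""
    return sum(location >= b for b in boundaries)

def fit_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }

def fit_local_classifiers(data, boundaries):
    """data: list of (location, feature_vector, label). One model per region."""
    by_region = defaultdict(list)
    for location, features, label in data:
        by_region[region_of(location, boundaries)].append((features, label))
    return {region: fit_centroids(s) for region, s in by_region.items()}

def predict(models, boundaries, location, features):
    """Route the object to its region's classifier, then pick the nearest centroid."""
    centroids = models[region_of(location, boundaries)]
    return min(
        centroids,
        key=lambda lab: sum((a - c) ** 2 for a, c in zip(features, centroids[lab])),
    )

# Hypothetical data: a high feature value means "pos" below location 10
# and "neg" above it.
boundaries = [10]
data = [
    (2, [0.9], "pos"), (3, [0.1], "neg"),
    (15, [0.1], "pos"), (17, [0.9], "neg"),
]
models = fit_local_classifiers(data, boundaries)
```

Each local model only stores and compares against the training data of its own region, which is where the memory and speed savings for kernel-based classifiers would come from.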
Abstract:
A list of biomarkers indicative of patient outcome is reduced. A computer program is applied to a set of biomarkers indicative of a patient outcome (e.g., prognosis, diagnosis, or treatment result). The computer program models the set of biomarkers with a subset of the biomarkers. The subset is identified without labeling based on the patient outcome. Instead, biomarker scores (e.g., sequence score) are used to identify the subset of biomarkers.
Abstract:
An incremental greedy method for feature selection is described. This method results in a final classifier that performs optimally and depends on only a few features. Generally, a small number of features is desired because the complexity of a classification method often depends on the number of features. It is well known that a large number of features may lead to overfitting on the training set, which in turn leads to poor generalization performance on new, unseen data. The incremental greedy method is based on selecting a limited subset of features from the feature space. By providing low feature dependency, the incremental greedy method requires fewer computations than a feature extraction approach, such as principal component analysis.
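One common form of incremental greedy selection is forward selection: start with no features and repeatedly add the single feature that most improves a score, stopping when nothing improves. The sketch below assumes that form and uses training accuracy of a nearest-centroid classifier as the score; both are illustrative assumptions, since the abstract does not specify the criterion.

```python
def nearest_centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier using only `features`."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, label in zip(X, y):
        predicted = min(
            classes,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )
        correct += predicted == label
    return correct / len(X)

def greedy_forward_selection(X, y, max_features, score=nearest_centroid_accuracy):
    """Add one feature at a time; stop when no candidate improves the score."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(score(X, y, selected + [f]), f) for f in remaining]
        # Highest score wins; ties go to the lowest feature index.
        top_score, top_feature = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best_score:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score

# Hypothetical data: only feature 1 separates the two classes.
X = [[1.0, 0.0, 5.0], [3.0, 0.2, 5.0], [2.0, 1.0, 5.0], [4.0, 1.2, 5.0]]
y = [0, 0, 1, 1]
selected, best = greedy_forward_selection(X, y, max_features=2)
```

In practice the score would be measured on held-out data rather than the training set, so that the selection itself does not overfit.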
Abstract:
A method including: receiving a data source selection from a user or software application, the data source including medical information of a plurality of patients; receiving, from the user or software application, a data pattern that is related to a concept to be explored in the data source; querying the data source to find information that approximately matches the data pattern; receiving the information from the data source, wherein the information includes unstructured data; assigning a classification to individual parts of the information based on each part's relationship to the data pattern; and outputting the classified information to the user or software application.
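The matching and classification steps might look like the following sketch. It assumes the "parts" are sentences of free text, that approximate matching means string similarity from Python's `difflib`, and that a fixed threshold of 0.8 separates the two classes; none of these choices comes from the abstract, and the sample note is invented.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify_parts(text, pattern, threshold=0.8):
    """Split unstructured text into sentences and label each one by how
    closely its best-matching word resembles the pattern."""
    results = []
    for sentence in (s.strip() for s in text.split(".")):
        if not sentence:
            continue
        best = max(similarity(word, pattern) for word in sentence.split())
        label = "relevant" if best >= threshold else "not relevant"
        results.append((sentence, label))
    return results

# Hypothetical unstructured note containing a misspelling of the pattern.
note = "Patient reports diabetis since 2010. No known allergies."
classified = classify_parts(note, "diabetes")
```

Approximate matching lets the misspelled "diabetis" still count as a hit for the pattern "diabetes", which an exact query would miss.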
Abstract:
CAD (computer-aided diagnosis) systems and applications for breast imaging are provided, which implement methods to automatically extract and analyze features from a collection of patient information (including image data and/or non-image data) of a subject patient, to provide decision support for various aspects of physician workflow including, for example, automated diagnosis of breast cancer and other automated decision support functions that enable decision support for, e.g., screening and staging for breast cancer. The CAD systems implement machine-learning techniques that use a set of training data obtained (learned) from a database of labeled patient cases in one or more relevant clinical domains and/or expert interpretations of such data to enable the CAD systems to “learn” to analyze patient data and make proper diagnostic assessments and decisions for assisting physician workflow.
Abstract:
Functional imaging information is used to determine a probability of residual disease given a treatment. The functional imaging information shows different characteristic levels for different regions of the tumor. The probability is output for use in planning and/or used to automatically determine dose by region. Using the probability, the dose may be distributed by region so that some regions receive a greater dose than other regions. Distributing the dose by region is more likely to treat the tumor with the same dose, allows a lesser dose to sufficiently treat the tumor, and/or allows a greater dose with little or no increase in risk to normal tissue. The dose plan may account for personalized tumors, as each patient may have distinct tumors. The probability of dose application accuracy may also be used, so that a combined treatment probability allows efficient dose planning.