Abstract:
A system and method for computer-assisted karyotyping includes a processor which receives a digitized image of metaphase chromosomes for processing in an image processing module and a classifier module. The image processing module may include a segmenting function for extracting individual chromosome images, a bend correcting function for straightening images of chromosomes that are bent or curved and a feature selection function for distinguishing between chromosome bands. The classifier module, which may be one or more trained kernel-based learning machines, receives the processed image and generates a classification of the image as normal or abnormal.
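The abstract does not say how the segmenting function isolates individual chromosomes. As a minimal sketch, assuming the digitized image has already been thresholded into a binary mask and that chromosomes do not touch or overlap, 4-connected component labeling is one common way to pull out separate objects. The function name `label_components` and the toy mask are illustrative, not taken from the source.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary image.

    mask is a list of rows containing 0 (background) or 1 (foreground).
    Returns (labels, count), where labels has the same shape as mask and
    holds 0 for background or a region index starting at 1.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                # New, unlabeled object: flood-fill it breadth-first.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Toy mask with three separate objects (assumed, for illustration only).
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
labels, count = label_components(mask)
```

Each labeled region could then be cropped and passed on to bend correction and feature selection. Touching or overlapping chromosomes would need additional handling that this sketch does not attempt.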
Abstract:
Learning machines, such as support vector machines, are used to analyze datasets to recognize patterns within the dataset using kernels that are selected according to the nature of the data to be analyzed. Where the datasets include an invariance transformation or noise, tangent vectors are defined to identify relationships between the invariance or noise and the training data points. A covariance matrix is formed using the tangent vectors, then used in generation of the kernel, which may be based on a kernel PCA map.
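The tangent-vector step can be illustrated with a small sketch. It assumes a hypothetical invariance transformation (a cyclic shift of a one-dimensional signal), approximates each tangent vector by a finite difference, and forms the uncentered second-moment matrix of those vectors. How the resulting matrix enters the kernel is not shown, since the abstract does not give that detail; the names `shift`, `tangent_vector` and `tangent_covariance` are illustrative.

```python
def shift(signal, step=1):
    """Hypothetical invariance transformation: cyclic shift by `step` samples."""
    step %= len(signal)
    return signal[-step:] + signal[:-step] if step else list(signal)


def tangent_vector(signal, transform, step=1):
    """Finite-difference approximation of the derivative of the transformed
    signal with respect to the transformation parameter, evaluated at zero."""
    moved = transform(signal, step)
    return [(m - s) / step for m, s in zip(moved, signal)]


def tangent_covariance(samples, transform):
    """Average outer product of the tangent vectors of all training points."""
    dim = len(samples[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for sample in samples:
        t = tangent_vector(sample, transform)
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += t[i] * t[j] / len(samples)
    return cov


# Two toy training points (assumed data).
samples = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
cov = tangent_covariance(samples, shift)
```

Directions with large entries in `cov` are those along which the transformation moves the training points most, which is the information a kernel built from this matrix would use.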
Abstract:
Learning machines, such as support vector machines, are used to analyze datasets to recognize patterns within the dataset using kernels that are selected according to the nature of the data to be analyzed. Where the datasets possess structural characteristics, locational kernels can be utilized to provide measures of similarity among data points within the dataset. The locational kernels are then combined to generate a decision function, or kernel, that can be used to analyze the dataset. Where an invariance transformation or noise is present, tangent vectors are defined to identify relationships between the invariance or noise and the data points. A covariance matrix is formed using the tangent vectors, then used in generation of the kernel.
Abstract:
Biomarkers are identified by analyzing gene expression data using support vector machines (SVM), recursive feature elimination (RFE) and/or linear ridge regression classifiers to rank genes according to their ability to separate prostate cancer from normal tissue. Proteins expressed by identified genes are detected in patient samples to screen, predict and monitor prostate cancer.
Abstract:
Prior to training a learning machine, a pre-processing step reduces the quantity of features to be processed using feature selection methods selected from the group consisting of recursive feature elimination (RFE), minimizing the number of non-zero parameters of the system (l0-norm minimization), evaluation of a cost function to identify a subset of features that are compatible with constraints imposed by the learning set, unbalanced correlation score, and transductive feature selection. The features remaining after feature selection are then used to train a learning machine for purposes of pattern classification, regression, clustering and/or novelty detection.
Abstract:
Identification of a determinative subset of features from within a large set of features is performed by training a support vector machine to rank the features according to classifier weights, where features are removed to determine how their removal affects the value of the classifier weights. The features having the smallest weight values are removed and a new support vector machine is trained with the remaining features. The process is repeated until a relatively small subset of features remains that is capable of accurately separating the data into different patterns or classes. The method is applied for selecting the smallest number of genes that are capable of accurately distinguishing between medical conditions such as cancer and non-cancer.
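The elimination loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes a linear SVM without a bias term, trained by plain batch subgradient descent on the regularized hinge loss, and it removes one feature per round using the squared weight as the ranking score. The hyperparameters, function names and synthetic data are all assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit weights (no bias) by batch subgradient descent on the
    L2-regularized hinge loss. Labels in y must be +1 or -1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the subgradient.
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1:
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def rfe(X, y, n_keep=1):
    """Recursive feature elimination: retrain, drop the feature with the
    smallest squared weight, repeat until n_keep features remain."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_active, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(worst))
    return active, eliminated


# Synthetic data (assumed): feature 0 carries the class signal,
# features 1 to 4 are pure noise.
rng = random.Random(0)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[2.0 * yi + 0.3 * rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(4)]
     for yi in y]
kept, eliminated = rfe(X, y)
```

With thousands of genes, removing a fraction of the lowest-ranked features per round rather than a single one keeps the number of retraining passes manageable.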
Abstract:
A system and method are provided for diagnosing diseases or conditions from digital images taken by a remote user with a smart phone or a digital camera and transmitted to an image analysis server in communication with a distributed network. The image analysis server includes a trained learning machine for classification of the images. The user-provided image is pre-processed to extract dimensional, shape and color features, then processed using the trained learning machine to classify the image. The classification result is postprocessed to generate a risk score that is transmitted to the remote user. A database associated with the server may include referral information for geographically matching the remote user with a local physician. An optional operation includes collection of financial information to secure payment for analysis services.
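As a rough sketch of the pre-processing step, the function below computes a few dimensional, shape and color descriptors from an image region. It assumes the region of interest has already been isolated as a binary mask. The feature names (`area`, `width`, `height`, `extent`, `mean_rgb`) are illustrative choices, not the feature set used by the described system.

```python
def extract_features(pixels, mask):
    """Compute simple descriptors of a masked image region.

    pixels: rows of (r, g, b) tuples; mask: rows of 0/1 marking the region.
    """
    coords = [(y, x) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not coords:
        raise ValueError("mask contains no foreground pixels")
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    height = max(ys) - min(ys) + 1          # bounding-box height in pixels
    width = max(xs) - min(xs) + 1           # bounding-box width in pixels
    area = len(coords)                      # number of region pixels
    mean_rgb = tuple(
        sum(pixels[y][x][ch] for y, x in coords) / area for ch in range(3)
    )
    return {
        "area": area,
        "width": width,
        "height": height,
        "extent": area / (width * height),  # fill ratio of the bounding box
        "mean_rgb": mean_rgb,
    }


# Toy 3x3 image with a three-pixel region (assumed data).
black = (0, 0, 0)
pixels = [
    [black, (90, 60, 30), (110, 60, 30)],
    [black, (100, 60, 30), black],
    [black, black, black],
]
mask = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
features = extract_features(pixels, mask)
```

The resulting dictionary would be flattened into a feature vector before being passed to the trained learning machine.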
Abstract:
Identification of a determinative subset of features from within a group of features is performed by training a support vector machine using training samples with class labels to determine a value of each feature, where features are removed based on their values. One or more features having the smallest values are removed and an updated kernel matrix is generated using the remaining features. The process is repeated until a predetermined number of features remain which are capable of accurately separating the data into different classes.
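The kernel matrix update can be made concrete for one simple case. Assuming a linear kernel (the abstract does not restrict the kernel type), the Gram matrix is a sum of per-feature contributions, so removing a feature amounts to subtracting that feature's outer product instead of recomputing the matrix from scratch. The function names and data below are illustrative.

```python
def linear_gram(X, features):
    """Linear-kernel Gram matrix restricted to the given feature indices."""
    return [[sum(a[f] * b[f] for f in features) for b in X] for a in X]


def remove_feature(K, X, f):
    """Update a linear Gram matrix after dropping feature f by subtracting
    that feature's contribution x_i[f] * x_j[f] from every entry."""
    n = len(X)
    return [[K[i][j] - X[i][f] * X[j][f] for j in range(n)] for i in range(n)]


# Three toy samples with three features (assumed data).
X = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
K_full = linear_gram(X, [0, 1, 2])
K_updated = remove_feature(K_full, X, 1)
```

For nonlinear kernels no such additive shortcut exists in general, and the matrix would be recomputed from the remaining features.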
Abstract:
A group of features that has been identified as “significant” in being able to separate data into classes is evaluated using a support vector machine which separates the dataset into classes one feature at a time. After separation, an extremal margin value is assigned to each feature based on the distance between the lowest feature value in the first class and the highest feature value in the second class. Separately, extremal margin values are calculated for a normal distribution within a large number of randomly drawn example sets for the two classes to determine the number of examples within the normal distribution that would have a specified extremal margin value. Using p-values calculated for the normal distribution, a desired p-value is selected. The specified extremal margin value corresponding to the selected p-value is compared to the calculated extremal margin values for the group of features. The features in the group that have a calculated extremal margin value less than the specified margin value are labeled as falsely significant.
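The procedure above can be sketched as a small Monte Carlo simulation. The sketch assumes feature values are on a scale comparable to a standard normal distribution (the abstract does not state how features are scaled), estimates the null distribution of the extremal margin from randomly drawn example sets, and flags features whose margin falls below the value corresponding to a chosen p-value. Function names, the number of draws and the toy data are assumptions.

```python
import random


def extremal_margin(class_a, class_b):
    """Lowest value in class_a minus highest value in class_b.
    Positive only when the two classes do not overlap on this feature."""
    return min(class_a) - max(class_b)


def null_margins(n_a, n_b, draws=2000, seed=0):
    """Extremal margins of randomly drawn normal example sets."""
    rng = random.Random(seed)
    return [
        extremal_margin([rng.gauss(0, 1) for _ in range(n_a)],
                        [rng.gauss(0, 1) for _ in range(n_b)])
        for _ in range(draws)
    ]


def p_value(margin, null):
    """Fraction of random draws whose margin is at least `margin`."""
    return sum(m >= margin for m in null) / len(null)


def margin_threshold(null, alpha):
    """Margin value whose empirical p-value is at most alpha."""
    ranked = sorted(null, reverse=True)
    k = max(1, int(alpha * len(ranked)))
    return ranked[k - 1]


# Two candidate features, each given as (class A values, class B values).
features = {
    "gene_1": ([0.9, 1.1, 0.8, 1.2], [-1.0, -0.9, -1.1, -0.8]),
    "gene_2": ([0.4, -0.9, 0.9, 0.1], [0.8, -0.5, 0.2, -0.1]),
}
null = null_margins(4, 4)
threshold = margin_threshold(null, alpha=0.05)
falsely_significant = [
    name for name, (a, b) in features.items()
    if extremal_margin(a, b) < threshold
]
```

Here `gene_1` separates the classes with a wide gap and is retained, while `gene_2` overlaps heavily and is labeled falsely significant. More draws give a finer-grained estimate of small p-values.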