Abstract:
Normalization of experimental fragment patterns for nucleic acid polymers having putatively known sequences starts with obtaining at least one raw fragment pattern for the experimental sample. The raw fragment pattern represents the positions of a selected nucleic acid base within the polymer as a function of migration time or distance. This raw fragment pattern is conditioned using conventional baseline correction and noise reduction techniques to yield a clean fragment pattern. The clean fragment pattern is then evaluated to determine one or more “normalization coefficients.” These normalization coefficients reflect the displacement, the stretching or shrinking, and the rate of stretching or shrinking of the clean fragment pattern, or segments thereof, that are necessary to obtain a suitably high degree of correlation between the clean fragment pattern and a standard fragment pattern. The standard fragment pattern represents the positions of the selected nucleic acid base, as a function of migration time or distance, within a standard polymer that actually has the known sequence. The normalization coefficients are then applied to the clean fragment pattern to produce a normalized fragment pattern, which is used for base-calling in a conventional manner. This method may be implemented in an apparatus comprising a computer processor programmed to determine normalization coefficients for an experimental fragment pattern. This computer may be separate from the electrophoresis apparatus or part of an integrated unit.
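The search for normalization coefficients can be illustrated with a minimal sketch. It assumes only two coefficients (a displacement and a stretch factor), an exhaustive grid search, and Pearson correlation as the measure of agreement with the standard pattern; the abstract prescribes none of these choices, and the rate-of-stretching coefficient is left out. All function names and the synthetic traces are hypothetical.

```python
import math

def gaussian_trace(centers, n, width=2.5):
    """Synthetic fragment pattern: one Gaussian peak per center (illustrative data only)."""
    return [sum(math.exp(-((i - c) ** 2) / (2 * width ** 2)) for c in centers)
            for i in range(n)]

def warp(trace, shift, stretch):
    """Move each input position x to x * stretch + shift, with linear interpolation."""
    n = len(trace)
    out = []
    for i in range(n):
        x = (i - shift) / stretch      # input position that lands on output sample i
        lo = math.floor(x)
        if lo < 0 or lo + 1 >= n:
            out.append(0.0)            # outside the recorded trace
        else:
            frac = x - lo
            out.append(trace[lo] * (1 - frac) + trace[lo + 1] * frac)
    return out

def correlation(a, b):
    """Pearson correlation of two equal-length traces (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def fit_coefficients(clean, standard, shifts, stretches):
    """Grid search (an assumed strategy) for the shift/stretch pair with the highest correlation."""
    best = (-2.0, 0.0, 1.0)            # (correlation, shift, stretch)
    for s in shifts:
        for k in stretches:
            r = correlation(warp(clean, s, k), standard)
            if r > best[0]:
                best = (r, s, k)
    return best

# Standard pattern, and an experimental pattern that runs early and compressed.
standard = gaussian_trace([20, 40, 65, 90], 120)
experimental = warp(standard, -5 / 1.1, 1 / 1.1)

r, shift, stretch = fit_coefficients(
    experimental, standard,
    shifts=range(-10, 11),
    stretches=[0.90, 0.95, 1.00, 1.05, 1.10],
)
normalized = warp(experimental, shift, stretch)   # pattern passed on to base-calling
```

A real implementation would likely fit segments separately and use a finer or gradient-based search; the grid here only keeps the example short.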
Abstract:
Data traces from four channels of an automated electrophoresis detection apparatus are aligned by identifying peaks in each of the four data traces; optionally normalizing the data traces to achieve a uniform peak height; combining the four data traces in an initial alignment; and determining coefficients of shift and stretch for selected data points within each data trace. The coefficients are determined by optimizing a cost function which reflects the extent of overlap of peaks in the combined normalized data traces to which the coefficients have been applied. The cost function is optimized when the extent of overlap is at a minimum. The coefficients are then used to generate a warp function for each data trace. These warp functions are applied to their respective data traces to produce four warped data traces which are aligned to form an aligned data set. The aligned data set may be displayed on a video screen of a sequencing apparatus, or may be used as the data set for a base-calling process.
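The overlap-based cost function can be sketched as follows. This is a simplified illustration, not the method's actual formulation: it assumes the cost is the sum of pointwise products over every pair of channels, and it optimizes only an integer shift for a single trace, with the stretch coefficient and the warp function omitted. The function names and synthetic traces are hypothetical.

```python
import math
from itertools import combinations

def gaussian_trace(centers, n, width=2.0):
    """Synthetic single-channel trace with one Gaussian peak per center."""
    return [sum(math.exp(-((i - c) ** 2) / (2 * width ** 2)) for c in centers)
            for i in range(n)]

def shift_trace(trace, shift):
    """Shift a trace by a whole number of samples, padding with zeros."""
    n = len(trace)
    return [trace[i - shift] if 0 <= i - shift < n else 0.0 for i in range(n)]

def overlap_cost(traces):
    """Assumed cost: summed pointwise product over every pair of channels.

    Peaks from different channels that sit on the same samples raise the cost,
    so a well-aligned data set (one base per position) gives a low value.
    """
    return sum(sum(x * y for x, y in zip(a, b))
               for a, b in combinations(traces, 2))

def best_shift(traces, index, candidates):
    """Try each candidate shift for one trace and keep the one with minimal overlap."""
    best_cost, best = None, 0
    for s in candidates:
        trial = list(traces)
        trial[index] = shift_trace(traces[index], s)
        cost = overlap_cost(trial)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, s
    return best, best_cost

n = 140
traces = [
    gaussian_trace([10, 50, 90], n),    # channel A
    gaussian_trace([20, 60, 100], n),   # channel C
    gaussian_trace([30, 70, 110], n),   # channel G
    gaussian_trace([48, 88, 128], n),   # channel T, recorded 8 samples late
]
shift, cost = best_shift(traces, 3, range(-10, 11))
```

In this toy data the late channel is pulled back into the gaps between the other channels' peaks. Extending the search to stretch, and to all four traces at once, would need a multi-parameter optimizer in place of the single loop.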
Abstract:
Data traces from four channels of an automated electrophoresis detection apparatus are aligned by identifying peaks in each of the four data traces; normalizing the heights of the peaks in each of the data traces to a common value to generate four normalized data traces; combining the four normalized data traces in an initial alignment; and determining coefficients of shift and stretch for selected data points within each normalized data trace. The coefficients are determined by optimizing a cost function which reflects the extent of overlap of peaks in the combined normalized data traces to which the coefficients have been applied. The cost function is optimized when the extent of overlap is at a minimum. The coefficients are then used to generate a warp function for each normalized data trace. These warp functions are applied to their respective data traces to produce four warped data traces which are aligned to form an aligned data set. The aligned data set may be displayed on a video screen of a sequencing apparatus, or may be used as the data set for a base-calling process.
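The peak identification and height normalization steps might look like the sketch below. It assumes peaks are simple local maxima above a threshold and that heights are equalized by a gain interpolated linearly between neighboring peaks; the abstract specifies neither, and the names `find_peaks` and `normalize_peak_heights` are hypothetical.

```python
import bisect
import math

def gaussian_trace(peaks, n, width=2.0):
    """Synthetic trace from (center, height) pairs (illustrative data only)."""
    return [sum(h * math.exp(-((i - c) ** 2) / (2 * width ** 2)) for c, h in peaks)
            for i in range(n)]

def find_peaks(trace, threshold=0.1):
    """Indices of local maxima above a threshold (an assumed, very simple peak detector)."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i] > threshold
            and trace[i] > trace[i - 1]
            and trace[i] >= trace[i + 1]]

def normalize_peak_heights(trace, peaks, target=1.0):
    """Scale the trace so every detected peak reaches the same height.

    The gain is target / height at each peak and is interpolated linearly
    between neighboring peaks, so the trace stays continuous.
    """
    if not peaks:
        return list(trace)
    gains = [target / trace[p] for p in peaks]

    def gain_at(i):
        if i <= peaks[0]:
            return gains[0]
        if i >= peaks[-1]:
            return gains[-1]
        j = bisect.bisect_right(peaks, i) - 1     # peaks[j] <= i < peaks[j + 1]
        t = (i - peaks[j]) / (peaks[j + 1] - peaks[j])
        return gains[j] * (1 - t) + gains[j + 1] * t

    return [v * gain_at(i) for i, v in enumerate(trace)]

trace = gaussian_trace([(20, 1.0), (50, 0.5), (80, 2.0)], 100)
peaks = find_peaks(trace)
normalized = normalize_peak_heights(trace, peaks)
```

Equalizing heights before alignment keeps tall peaks from dominating an overlap-based cost, which is presumably why the normalization precedes the coefficient search.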