Development of subspace techniques for Kaczmarz reconstruction and genetic sequence clustering
Methods from linear algebra are used to explore two problems in computational biomedical and bioinformatic science. In particular, the principles of subspace algebra are employed to develop new theory for two contemporary applications: computed tomography and genetic sequence clustering. In the first problem, Kaczmarz reconstruction, improvements in convergence rate are needed because the algorithm is used extensively in industry. A new randomization algorithm that selects measurement rows based upon relative central angles is proposed and compared to ℓ2-norm-based randomization. A second randomization algorithm for Kaczmarz is proposed, based upon a new hybrid algebraic iterative method that creates multiple small orthogonal subspaces and reduces coherent error propagation. Results from both methods are presented and show a slight advantage under coherent sampling. Significant theoretical analyses are conducted for both deterministic and statistical treatments of the simple and block Kaczmarz iterative projection methods. Theoretical convergence equivalence between Gram-Schmidt, QR factorization, and block projection is shown, with an algorithmic framework to support implementation. A proof of the theoretical expected convergence rate of Kaczmarz is presented for randomized IID N-variate random unit-vector measurements with normally distributed components, under uniform and non-uniform random row and angle selection. The theoretical formalism of Galantai and Halperin is used to prove deterministic convergence of the simple and block methods. Another proof is developed which shows that any measurement system with complete (span) coverage of the signal space will converge at an exponential rate.
The second problem is based upon a well-known challenge in biological sequence clustering and the analysis of multiple genomes for multiple species. In particular, a hypothesis for subspace clustering is proposed, developed, and tested via computational simulations using randomized sequence experiments, for the first time to our knowledge. The results indicate a strong relationship between subspace dimensions, tree depth (time), and mutation rates, based upon random sampling of orthologous biochemical sequences from random mixtures of groups from the NIH NCBI COG data.
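For readers unfamiliar with the ℓ2-norm-based randomization used as the comparison baseline in the Kaczmarz problem, the following minimal sketch may help. It assumes the standard scheme in which a measurement row is drawn with probability proportional to its squared ℓ2 norm and the current estimate is then projected onto that row's hyperplane. It does not reproduce the angle-based or hybrid subspace selection rules proposed in the thesis, whose details are not given in this abstract. The function name `randomized_kaczmarz`, its parameters, and the small test system are hypothetical and chosen only for illustration.

```python
import random


def randomized_kaczmarz(A, b, iterations=2000, seed=0):
    """Approximate a solution of A x = b by randomized Kaczmarz projections.

    Illustrative baseline only: rows are sampled with probability
    proportional to their squared l2 norm (an assumed, standard scheme).
    """
    rng = random.Random(seed)
    n = len(A[0])
    # Squared l2 norm of each row: used as the sampling weight and as
    # the denominator of the projection step.
    sq_norms = [sum(a * a for a in row) for row in A]
    row_indices = range(len(A))
    x = [0.0] * n
    for _ in range(iterations):
        # Draw one row index, weighted by squared row norm.
        i = rng.choices(row_indices, weights=sq_norms, k=1)[0]
        row = A[i]
        # Residual of the selected equation at the current estimate.
        residual = b[i] - sum(a * xi for a, xi in zip(row, x))
        # Orthogonal projection of x onto the hyperplane <row, x> = b[i].
        step = residual / sq_norms[i]
        x = [xi + step * a for xi, a in zip(x, row)]
    return x


# Hypothetical consistent, overdetermined system with known solution.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, 2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_est = randomized_kaczmarz(A, b)
```

Because the rows of this small system span the signal space and the measurements are consistent, the estimate `x_est` approaches `x_true` as the number of iterations grows, in line with the exponential convergence described above. Only the sampling line would need to change to experiment with a different row selection rule.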
ETD Collection for Tennessee State University.