#### Document Type

Dissertation

#### Publication Date

2012

#### Abstract

This thesis develops theory and associated algorithms for the subspace segmentation problem. Given a set of data $W=\{w_1,\dots,w_N\}\subset\mathbb{R}^D$ drawn from a union of subspaces, we focus on finding a nonlinear model of the form $U=\{S_i\}_{i\in I}$ nearest to $W$, where each $S_i$ is a subspace. The model is then used to classify $W$ into clusters.

Our first approach is based on the binary reduced row echelon form of the data matrix. We prove that, in the absence of noise, this approach finds the number of subspaces, their dimensions, and an orthonormal basis for each subspace $S_i$. We provide a comprehensive analysis of the theory and determine its strengths and limitations in the presence of outliers and noise.

Our second approach is based on nearness to local subspaces. It handles noise effectively, but it applies only to a special case of the general subspace segmentation problem: subspaces of equal and known dimension $d$. The approach computes a binary similarity matrix for the data points. First, a local subspace is estimated for each data point. Next, a distance matrix is generated by computing the distances between the local subspaces and the points. The distance matrix is then converted into the binary similarity matrix by applying a data-driven threshold. This transforms the problem into the segmentation of subspaces of dimension 1 instead of dimension $d$. Applied to the Hopkins 155 Dataset, the algorithm produced the best results reported to date.
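The first approach can be illustrated with a short NumPy sketch. It assumes noise-free data drawn from independent subspaces in general position, and it uses a hypothetical absolute tolerance `tol` to decide which entries count as zero. Under these assumptions, each non-pivot column of the reduced row echelon form (RREF) has nonzero entries exactly at the pivots of its own subspace. Binarizing the RREF and grouping columns by their nonzero pattern therefore recovers the number of subspaces and their dimensions, and a QR factorization of each group's pivot columns gives an orthonormal basis. The function names `rref` and `segment_rref` are illustrative. This is not the thesis's algorithm, and it does not reproduce the thesis's treatment of noise and outliers.

```python
import numpy as np


def rref(A, tol=1e-10):
    """Reduced row echelon form with partial pivoting; returns (R, pivot columns)."""
    R = A.astype(float).copy()
    rows, cols = R.shape
    pivots = []
    r = 0
    for c in range(cols):
        if r == rows:
            break
        p = r + np.argmax(np.abs(R[r:, c]))  # largest remaining entry in column c
        if abs(R[p, c]) <= tol:              # treat as zero: c is not a pivot column
            continue
        R[[r, p]] = R[[p, r]]                # swap the pivot row into position r
        R[r] /= R[r, c]                      # normalize the pivot to 1
        for i in range(rows):                # eliminate column c from every other row
            if i != r:
                R[i] -= R[i, c] * R[r]
        pivots.append(c)
        r += 1
    return R, pivots


def segment_rref(W, tol=1e-8):
    """Group the columns of W by the nonzero pattern of its binary RREF (assumes no noise)."""
    R, pivots = rref(W, tol)
    B = np.abs(R) > tol                      # binary RREF
    pivot_set = set(pivots)
    groups = {}                              # nonzero pattern (pivot rows) -> non-pivot columns
    for j in range(W.shape[1]):
        if j not in pivot_set:
            pattern = tuple(np.flatnonzero(B[:, j]))
            groups.setdefault(pattern, []).append(j)
    result = []
    for pattern, cols in groups.items():
        basis_cols = [pivots[r] for r in pattern]   # row r of R belongs to pivot column pivots[r]
        Q, _ = np.linalg.qr(W[:, basis_cols])       # orthonormal basis of this subspace
        result.append({"dim": len(pattern), "columns": basis_cols + cols, "basis": Q})
    return result


# Synthetic example: a 2-D and a 1-D subspace of R^5, columns shuffled.
rng = np.random.default_rng(0)
B1 = rng.standard_normal((5, 2))
B2 = rng.standard_normal((5, 1))
W = np.hstack([B1 @ rng.standard_normal((2, 6)), B2 @ rng.standard_normal((1, 4))])
true_labels = np.array([0] * 6 + [1] * 4)
perm = rng.permutation(10)
W, true_labels = W[:, perm], true_labels[perm]
result = segment_rref(W)
for g in result:
    print(g["dim"], sorted(g["columns"]))
```

In this sketch, a subspace represented only by its own pivot points produces no non-pivot pattern, so it is not reported.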
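The steps of the second approach (local subspace, distance matrix, thresholded binary similarity) can be sketched in NumPy as follows. The abstract does not specify several details, so this sketch makes its own assumptions. Each point's local subspace is fitted by SVD to the point and its `k` nearest Euclidean neighbours. The "data-driven threshold" is replaced by a hypothetical per-row largest-gap rule. The final grouping uses connected components of the symmetrized similarity graph, which stands in for the thesis's segmentation of one-dimensional subspaces. All function names are hypothetical.

```python
import numpy as np


def local_bases(X, d, k):
    """For each column of X (D x N), fit a d-dim subspace to it and its k nearest neighbours."""
    sq = np.sum(X**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared Euclidean distances
    bases = []
    for i in range(X.shape[1]):
        nbrs = np.argsort(dist2[i])[:k]                   # includes point i itself
        U, _, _ = np.linalg.svd(X[:, nbrs], full_matrices=False)
        bases.append(U[:, :d])                            # top-d left singular vectors
    return bases


def distance_matrix(X, bases):
    """Dm[i, j] = distance from point j to the local subspace of point i."""
    Dm = np.empty((len(bases), X.shape[1]))
    for i, B in enumerate(bases):
        residual = X - B @ (B.T @ X)                      # part of each point orthogonal to S_i
        Dm[i] = np.linalg.norm(residual, axis=0)
    return Dm


def binary_similarity(Dm):
    """Hypothetical threshold: per row, keep the points below the largest jump in sorted distance."""
    N = Dm.shape[0]
    S = np.zeros((N, N), dtype=bool)
    for i in range(N):
        order = np.argsort(Dm[i])
        cut = np.argmax(np.diff(Dm[i, order]))            # position of the largest gap
        S[i, order[:cut + 1]] = True
    return S & S.T                                        # keep only mutual similarities


def connected_components(S):
    """Label the connected components of the graph with adjacency matrix S."""
    N = S.shape[0]
    labels = -np.ones(N, dtype=int)
    current = 0
    for start in range(N):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(S[u]):
                if labels[v] < 0:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def segment(X, d, k):
    return connected_components(binary_similarity(distance_matrix(X, local_bases(X, d, k))))


# Synthetic example: 20 unit vectors in each of two orthogonal planes of R^4 (d = 2).
theta = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
zeros = np.zeros(20)
P1 = np.vstack([np.cos(theta), np.sin(theta), zeros, zeros])
P2 = np.vstack([zeros, zeros, np.cos(theta), np.sin(theta)])
X = np.hstack([P1, P2])
labels = segment(X, d=2, k=4)
print(labels)
```

In this idealized, noise-free example, the rows of the similarity matrix are identical binary vectors for points in the same subspace. The columns of each cluster therefore lie on a single line, which illustrates why the thresholded problem can be viewed as segmenting subspaces of dimension 1.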

#### Recommended Citation

Sekmen, Ali Safak, "Subspace Segmentation And High-Dimensional Data Analysis" (2012). *Computer Science Faculty Research*. 1.

http://digitalscholarship.tnstate.edu/computerscience/1