Speaker Identification: Time-Frequency Analysis With Deep Learning

Hui Chen, Tennessee State University


Speaker identification with deep learning commonly uses time-frequency representations of voice signals. This research experiments with training spectrogram-based Mel-Frequency Cepstral Coefficients (MFCCs) on different Neural Network (NN) topologies, evaluating each network's ability to separate human voice biometric features in order to identify speakers. MFCCs are commonly used as a feature extractor and combined with NNs in speech recognition systems. This research shows that MFCCs with Convolutional Neural Networks (CNNs) achieved higher accuracy in identifying speakers than the other NN topologies. This research also proposes a network for speaker identification that combines the Wigner-Ville Distribution (WVD) with deep learning. The WVD has been used for time-frequency (TF) transformation and successfully applied to other sound identification tasks, and its representations are known for their high resolution. In this research, instead of extracting features directly through MFCCs, the WVD is combined with CNNs as a feature extraction network and trained on the dataset. Although the result is inconclusive, it still provides many useful insights into the approach.
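To make the WVD time-frequency transformation more concrete, the following is a minimal NumPy sketch of a discrete Wigner-Ville distribution. It is not the implementation used in this research. The function names `analytic_signal` and `wigner_ville` are hypothetical. Computing the WVD on the analytic signal to reduce aliasing, and truncating the lag window near the signal edges, are also choices made for this sketch. Each column is the FFT over the lag m of the instantaneous autocorrelation z[t+m]·z*[t−m]. Because the lag enters twice, row k corresponds to frequency k·fs/(2N) for an N-sample signal sampled at fs.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0          # Nyquist bin kept once
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def wigner_ville(x):
    """Discrete WVD; returns an (N, N) array indexed [frequency bin, time]."""
    z = analytic_signal(np.asarray(x, dtype=float))
    n = len(z)
    tfr = np.zeros((n, n))
    for t in range(n):
        # Largest lag that stays inside the signal and avoids wrap-around collisions.
        max_lag = min(t, n - 1 - t, n // 2 - 1)
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n, dtype=complex)
        # Instantaneous autocorrelation, stored with circular (mod n) lag indexing.
        kernel[lags % n] = z[t + lags] * np.conj(z[t - lags])
        # The kernel is Hermitian in the lag, so its FFT is real up to rounding error.
        tfr[:, t] = np.real(np.fft.fft(kernel))
    return tfr


if __name__ == "__main__":
    n = np.arange(128)
    x = np.cos(2 * np.pi * 10 * n / 128)  # 10 cycles over 128 samples
    tfr = wigner_ville(x)
    print(tfr.shape, int(np.argmax(tfr[:, 64])))  # (128, 128) 20
```

For the test cosine (10/128 cycles per sample), the middle column peaks at row 20, since 20/(2·128) = 10/128. In a WVD-plus-CNN pipeline like the one described above, a 2-D array such as `tfr` would take the place of an MFCC matrix as the CNN input. For multi-component signals such as speech, however, the WVD also contains cross-terms between components. Smoothed variants are often used for this reason.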

Subject Area

Computer science

Recommended Citation

Hui Chen, "Speaker Identification: Time-Frequency Analysis With Deep Learning" (2018). ETD Collection for Tennessee State University. Paper AAI13419654.