This is an implementation of MFCC (mel-frequency cepstral coefficients) feature extraction and DTW (dynamic time warping) analysis.
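As a rough illustration of the DTW part, here is a minimal sketch of the classic dynamic-programming formulation. It assumes each clip is a list of per-frame feature vectors (for example MFCC vectors) and uses Euclidean distance between frames. The names `euclidean` and `dtw_distance` are hypothetical and are not taken from the project, which may implement DTW differently.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Return the cumulative cost of the best alignment of two sequences.

    Each sequence is a list of feature vectors, one per frame.
    The two sequences may have different lengths.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # repeat frame j of seq_b
                                 cost[i][j - 1],      # repeat frame i of seq_a
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]
```

A smaller value means the two clips are more similar after time alignment, so the same phoneme spoken faster or slower can still score as a close match.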
You need to train the system before using it. The project applies a threshold of X dB to select the phonemes in the audio clip. The default frame size is 1024 samples (about 23 ms). By default the system extracts 12 MFCC coefficients over the bandwidth. A k-NN algorithm compares the input against the training dataset. Once you have enough training data, you can use "Predict" and play a phoneme. Download the project from GitHub following this link.
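The framing, threshold and k-NN steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the 44.1 kHz sample rate is inferred from 1024 samples lasting about 23 ms, the threshold is taken to be an RMS level in dB relative to full scale, frames do not overlap, and all function names are hypothetical. The threshold value is left as a parameter.

```python
import math
from collections import Counter

SAMPLE_RATE = 44100  # assumption: 1024 samples / 44100 Hz is about 23 ms
FRAME_SIZE = 1024    # default frame length stated in the text


def frame_duration_ms(frame_size=FRAME_SIZE, sample_rate=SAMPLE_RATE):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size / sample_rate


def frame_level_db(frame):
    """RMS level of a frame in dB relative to full scale (samples in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def gate_frames(samples, threshold_db, frame_size=FRAME_SIZE):
    """Split samples into non-overlapping frames and keep those above the threshold.

    A trailing partial frame is dropped.
    """
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    return [f for f in frames if frame_level_db(f) > threshold_db]


def knn_predict(query, dataset, k, distance):
    """Return the majority label among the k training items nearest to query.

    dataset is a list of (features, label) pairs; distance is any function
    taking two feature objects and returning a non-negative number.
    Vote ties go to the label whose first neighbour is closest.
    """
    nearest = sorted(dataset, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Passing the distance function as an argument keeps the classifier independent of the feature type: a DTW distance over MFCC sequences can be plugged in for `distance` without changing `knn_predict`.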