声纹识别实验
模型:GMM(高斯混合模型)
数据集:TIMIT
GMM高斯混合模型(Gaussian mixture model)
对于说话人识别,一组$N$个说话人集合,用一系列$GMM$ 示,即每个说话人$\mathrm{s}{\mathrm{k}}$ 对应一个$\mathrm{GMM}$ 参数 $\lambda{\mathrm{k}}, \mathrm{k}=1,2, \ldots, \mathrm{N}$ 。说话人识别的目标是寻找一个说话人模型,使得给定说话人观测序列 $\mathrm{X}=x_{1}, x_{2}, \ldots, x_{t}, \ldots, x_{T}, x_{t}$ 是下标为 $\mathrm{t}$ 的特征向量(时间维度下标,表示帧,frame),在某个模型参数下的后验概率最大(the maximum a posterior probability),该模型即为给定说话人观测序列,得到的说话人模型。假设帧间是相互独立的,预测模型(说话人模型) $s_{\text {predicted }}$ 表示为:
$$
\mathrm{S}{\text {predicted }}=\underset{\mathrm{k} \in \mathcal{S}}{\arg \max } \sum{\mathrm{t}=1}^{\mathrm{T}} \log \left[\mathrm{p}\left(\boldsymbol{x}{\boldsymbol{t}} \mid \lambda{\mathrm{k}}\right)\right]$$
代码实现
训练GMM模型
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
| def train_gmm(train_features, train_labels, num_components=16): gmm_models = {} unique_labels = set(train_labels) print('training...') for label in tqdm(unique_labels): features = train_features[train_labels == label] gmm = GaussianMixture(n_components=num_components, covariance_type='diag') gmm.fit(features.reshape(features.shape[1], features.shape[0]*features.shape[2])) gmm_models[label] = gmm with open("gmm_models.pkl", "wb") as f: pickle.dump(gmm_models, f) return gmm_models
|
对每个测试数据,找与之对应的最相似的说话人模型
1 2 3 4 5 6 7 8 9 10 11 12
| def predict_speaker(gmm_models, test_features): predictions = [] print('testing...') for features in tqdm(test_features): scores = {} for label, gmm in gmm_models.items(): score = gmm.score(features) scores[label] = score predicted_speaker = max(scores, key=scores.get) predictions.append(predicted_speaker) return predictions
|
训练并预测
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
| train_features = np.load("mfcc/train_features.npy") test_features = np.load("mfcc/test_features.npy") train_labels = np.load("mfcc/train_labels.npy") test_labels = np.load("mfcc/test_labels.npy")
if load_model == True: with open("gmm_models.pkl", "rb") as f: gmm_models = pickle.load(f) else: gmm_models = train_gmm(train_features, train_labels)
predictions = predict_speaker(gmm_models, test_features)
accuracy = accuracy_score(test_labels, predictions) print("GMM预测准确率: {:.3f}%".format(accuracy * 100))
|
预测结果
