第二次实验

声纹识别实验

模型:GMM(高斯混合模型)
数据集:TIMIT

GMM高斯混合模型(Gaussian mixture model)

对于说话人识别,一组$N$个说话人集合,用一系列$GMM$ 示,即每个说话人$\mathrm{s}{\mathrm{k}}$ 对应一个$\mathrm{GMM}$ 参数 $\lambda{\mathrm{k}}, \mathrm{k}=1,2, \ldots, \mathrm{N}$ 。说话人识别的目标是寻找一个说话人模型,使得给定说话人观测序列 $\mathrm{X}=x_{1}, x_{2}, \ldots, x_{t}, \ldots, x_{T}, x_{t}$ 是下标为 $\mathrm{t}$ 的特征向量(时间维度下标,表示帧,frame),在某个模型参数下的后验概率最大(the maximum a posterior probability),该模型即为给定说话人观测序列,得到的说话人模型。假设帧间是相互独立的,预测模型(说话人模型) $s_{\text {predicted }}$ 表示为:
$$
\mathrm{S}{\text {predicted }}=\underset{\mathrm{k} \in \mathcal{S}}{\arg \max } \sum{\mathrm{t}=1}^{\mathrm{T}} \log \left[\mathrm{p}\left(\boldsymbol{x}{\boldsymbol{t}} \mid \lambda{\mathrm{k}}\right)\right]$$

代码实现

训练GMM模型

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def train_gmm(train_features, train_labels, num_components=16):
gmm_models = {}
unique_labels = set(train_labels)
print('training...')
for label in tqdm(unique_labels):
# 为每个说话人训练一个 GMM 模型
features = train_features[train_labels == label] # 把这个说话人的所有数据作为训练数据 (n, 2000, 13)
gmm = GaussianMixture(n_components=num_components, covariance_type='diag') # n_components:混合高斯模型数量, covariance_type:协方差类型
# TODO:修改fit时传入的数据,把帧数作为sample_num
gmm.fit(features.reshape(features.shape[1], features.shape[0]*features.shape[2])) # 降到2维作为输入
gmm_models[label] = gmm # N个人的GMM模型组成列表
# 保存模型
with open("gmm_models.pkl", "wb") as f:
pickle.dump(gmm_models, f)
return gmm_models

对每个测试数据,找与之对应的最相似的说话人模型

1
2
3
4
5
6
7
8
9
10
11
12
def predict_speaker(gmm_models, test_features):
predictions = []
print('testing...')
for features in tqdm(test_features): # (400, 13)
scores = {}
for label, gmm in gmm_models.items(): # 遍历每个说话人模型
# TODO:对一个语音段的每帧进行预测,把每帧的得分求和,得到该gmm的得分,再对所有gmm取最高分
score = gmm.score(features)
scores[label] = score
predicted_speaker = max(scores, key=scores.get) # 选择得分最高的说话人
predictions.append(predicted_speaker)
return predictions

训练并预测

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# 读取数据
train_features = np.load("mfcc/train_features.npy")
test_features = np.load("mfcc/test_features.npy")
train_labels = np.load("mfcc/train_labels.npy")
test_labels = np.load("mfcc/test_labels.npy")

# 训练并预测
if load_model == True:
with open("gmm_models.pkl", "rb") as f:
gmm_models = pickle.load(f)
else:
gmm_models = train_gmm(train_features, train_labels)

predictions = predict_speaker(gmm_models, test_features)

accuracy = accuracy_score(test_labels, predictions)
print("GMM预测准确率: {:.3f}%".format(accuracy * 100))

预测结果

image.png


第二次实验
http://example.com/2024/11/27/Notes/课程/语音信息处理/第二次实验/
许可协议