A Latent Diffusion Model for Heart Sound Synthesis

Yang Tan; Haojie Zhang; Yi Chang; Qingrong Wu; Kun Qian; Bin Hu; Bjorn W. Schuller; Yoshiharu Yamamoto

doi:10.1109/ISCAIT64916.2025.11010636

Yang Tan, Haojie Zhang, Yi Chang, Qingrong Wu, Kun Qian^*, Bin Hu^*, Bjorn W. Schuller, Yoshiharu Yamamoto

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

There are already many analyses of heart sounds used to develop diagnostic systems for the heart condition. However, the existing heart sound database state is not sufficient to train a large number of deep learning models and is not balanced - the normal heart sounds are always more present than abnormal heart sounds. Hence, we are interested in algorithms that can generate heart sounds as they can enhance the current database state. We enter the field of large-scale modelling in medical synthesis by proposing a latent diffusion model for heart sound generation, which can generate highly realistic heart sounds. We further guide the synthesis process through text prompts and labels, revealing the research field of prompted heart sound synthesis. In terms of experimental results, our proposed method achieves good results compared to existing mathematical models. At the same time, when the generated audio is used as an unlabelled dataset in semi-supervised learning, it shows an improvement effect on the classification model. This indicates that the heart sounds generated by the diffusion model bear great value.

Original language	English
Title of host publication	2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1314-1319
Number of pages	6
ISBN (Electronic)	9798331542856
DOI	http://doi.org/10.1109/ISCAIT64916.2025.11010636
Publication status	Published - 2025
Externally published	Yes
Event	4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025 - Xi'an, China. Duration: 21 Mar 2025 → 23 Mar 2025

Publication series

Name	2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025

Conference

Conference	4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025
Country/Territory	China
City	Xi'an
Period	21/03/25 → 23/03/25

Access to Document

10.1109/ISCAIT64916.2025.11010636

Other files and links

Link to publication in Scopus

Cite this

Tan, Y., Zhang, H., Chang, Y., Wu, Q., Qian, K., Hu, B., Schuller, B. W., & Yamamoto, Y. (2025). A Latent Diffusion Model for Heart Sound Synthesis. In 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025 (pp. 1314-1319). (2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025). Institute of Electrical and Electronics Engineers Inc. http://doi.org/10.1109/ISCAIT64916.2025.11010636

Tan, Yang ; Zhang, Haojie ; Chang, Yi et al. / A Latent Diffusion Model for Heart Sound Synthesis. 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025. Institute of Electrical and Electronics Engineers Inc., 2025. pp. 1314-1319 (2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025).

@inproceedings{d8982021ed464fd28d57fc13bab10492,

title = "A Latent Diffusion Model for Heart Sound Synthesis",

abstract = "There are already many analyses of heart sounds used to develop diagnostic systems for the heart condition. However, the existing heart sound database state is not sufficient to train a large number of deep learning models and is not balanced - the normal heart sounds are always more present than abnormal heart sounds. Hence, we are interested in algorithms that can generate heart sounds as they can enhance the current database state. We enter the field of large-scale modelling in medical synthesis by proposing a latent diffusion model for heart sound generation, which can generate highly realistic heart sounds. We further guide the synthesis process through text prompts and labels, revealing the research field of prompted heart sound synthesis. In terms of experimental results, our proposed method achieves good results compared to existing mathematical models. At the same time, when the generated audio is used as an unlabelled dataset in semi-supervised learning, it shows an improvement effect on the classification model. This indicates that the heart sounds generated by the diffusion model bear great value.",

keywords = "Data augmentation, Heart sound, Latent diffusion models, Semi-supervised learning, Sound synthesis",

author = "Yang Tan and Haojie Zhang and Yi Chang and Qingrong Wu and Kun Qian and Bin Hu and Schuller, {Bjorn W.} and Yoshiharu Yamamoto",

note = "Publisher Copyright: {\textcopyright} 2025 IEEE.; 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025 ; Conference date: 21-03-2025 Through 23-03-2025",

year = "2025",

doi = "10.1109/ISCAIT64916.2025.11010636",

language = "English",

series = "2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1314--1319",

booktitle = "2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025",

address = "United States",

}

Tan, Y, Zhang, H, Chang, Y, Wu, Q, Qian, K, Hu, B, Schuller, BW & Yamamoto, Y 2025, A Latent Diffusion Model for Heart Sound Synthesis. in 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025. 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025, Institute of Electrical and Electronics Engineers Inc., pp. 1314-1319, 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025, Xi'an, China, 21/03/25. http://doi.org/10.1109/ISCAIT64916.2025.11010636

A Latent Diffusion Model for Heart Sound Synthesis. / Tan, Yang; Zhang, Haojie; Chang, Yi et al.
2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025. Institute of Electrical and Electronics Engineers Inc., 2025. pp. 1314-1319 (2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025).

TY - GEN

T1 - A Latent Diffusion Model for Heart Sound Synthesis

AU - Tan, Yang

AU - Zhang, Haojie

AU - Chang, Yi

AU - Wu, Qingrong

AU - Qian, Kun

AU - Hu, Bin

AU - Schuller, Bjorn W.

AU - Yamamoto, Yoshiharu

PY - 2025

Y1 - 2025

AB - There are already many analyses of heart sounds used to develop diagnostic systems for the heart condition. However, the existing heart sound database state is not sufficient to train a large number of deep learning models and is not balanced - the normal heart sounds are always more present than abnormal heart sounds. Hence, we are interested in algorithms that can generate heart sounds as they can enhance the current database state. We enter the field of large-scale modelling in medical synthesis by proposing a latent diffusion model for heart sound generation, which can generate highly realistic heart sounds. We further guide the synthesis process through text prompts and labels, revealing the research field of prompted heart sound synthesis. In terms of experimental results, our proposed method achieves good results compared to existing mathematical models. At the same time, when the generated audio is used as an unlabelled dataset in semi-supervised learning, it shows an improvement effect on the classification model. This indicates that the heart sounds generated by the diffusion model bear great value.

KW - Data augmentation

KW - Heart sound

KW - Latent diffusion models

KW - Semi-supervised learning

KW - Sound synthesis

UR - http://www.scopus.com/pages/publications/105010227491

U2 - 10.1109/ISCAIT64916.2025.11010636

DO - 10.1109/ISCAIT64916.2025.11010636

M3 - Conference contribution

AN - SCOPUS:105010227491

T3 - 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025

SP - 1314

EP - 1319

BT - 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025

Y2 - 21 March 2025 through 23 March 2025

ER -

Tan Y, Zhang H, Chang Y, Wu Q, Qian K, Hu B, et al. A Latent Diffusion Model for Heart Sound Synthesis. In 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025. Institute of Electrical and Electronics Engineers Inc. 2025. p. 1314-1319. (2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025). doi: 10.1109/ISCAIT64916.2025.11010636
