Mozualization: Crafting Music and Visual Representation with Multimodal AI

Wanfang Xu; Lixiang Zhao; Haiwen Song; Xinheng Song; Zhaolin Lu; Yu Liu; Min Chen; Eng Gee Lim; Lingyun Yu

doi:10.1145/3706599.3719686

Wanfang Xu, Lixiang Zhao, Haiwen Song, Xinheng Song, Zhaolin Lu, Yu Liu, Min Chen, Eng Gee Lim, Lingyun Yu^*

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this work, we introduce Mozualization, a music generation and editing tool that creates multi-style embedded music by integrating diverse inputs, such as keywords, images, and sound clips (e.g., segments from various pieces of music or even a playful cat’s meow). Our work is inspired by the ways people express their emotions—writing mood-descriptive poems or articles, creating drawings with warm or cool tones, or listening to sad or uplifting music. Building on this concept, we developed a tool that transforms these emotional expressions into a cohesive and expressive song, allowing users to seamlessly incorporate their unique preferences and inspirations. To evaluate the tool and, more importantly, gather insights for its improvement, we conducted a user study involving nine music enthusiasts. The study assessed user experience, engagement, and the impact of interacting with and listening to the generated music.

Original language	English
Title of host publication	CHI EA 2025 - Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems
Publisher	Association for Computing Machinery
ISBN (Electronic)	9798400713958
DOI	http://doi.org/10.1145/3706599.3719686
Publication status	Published - 26 Apr 2025
Externally published	Yes
Event	2025 CHI Conference on Human Factors in Computing Systems, CHI EA 2025 - Yokohama, Japan. Duration: 26 Apr 2025 → 1 May 2025

Publication series

Name	Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference	2025 CHI Conference on Human Factors in Computing Systems, CHI EA 2025
Country/Territory	Japan
City	Yokohama
Period	26/04/25 → 1/05/25

Access to Document

10.1145/3706599.3719686

Other files and links

Link to publication in Scopus

Cite this

Xu, W., Zhao, L., Song, H., Song, X., Lu, Z., Liu, Y., Chen, M., Lim, E. G., & Yu, L. (2025). Mozualization: Crafting Music and Visual Representation with Multimodal AI. In CHI EA 2025 - Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems, Article 407 (Conference on Human Factors in Computing Systems - Proceedings). Association for Computing Machinery. http://doi.org/10.1145/3706599.3719686

@inproceedings{04e2fd4a9dec48a0b877a48ead9bfea4,

title = "Mozualization: Crafting Music and Visual Representation with Multimodal AI",

abstract = "In this work, we introduce Mozualization, a music generation and editing tool that creates multi-style embedded music by integrating diverse inputs, such as keywords, images, and sound clips (e.g., segments from various pieces of music or even a playful cat{\textquoteright}s meow). Our work is inspired by the ways people express their emotions—writing mood-descriptive poems or articles, creating drawings with warm or cool tones, or listening to sad or uplifting music. Building on this concept, we developed a tool that transforms these emotional expressions into a cohesive and expressive song, allowing users to seamlessly incorporate their unique preferences and inspirations. To evaluate the tool and, more importantly, gather insights for its improvement, we conducted a user study involving nine music enthusiasts. The study assessed user experience, engagement, and the impact of interacting with and listening to the generated music.",

keywords = "Multimodal Input, Music Editing, Music Visualization",

author = "Wanfang Xu and Lixiang Zhao and Haiwen Song and Xinheng Song and Zhaolin Lu and Yu Liu and Min Chen and Lim, {Eng Gee} and Lingyun Yu",

note = "Publisher Copyright: {\textcopyright} 2025 Copyright held by the owner/author(s).; 2025 CHI Conference on Human Factors in Computing Systems, CHI EA 2025 ; Conference date: 26-04-2025 Through 01-05-2025",

year = "2025",

month = apr,

day = "26",

doi = "10.1145/3706599.3719686",

language = "English",

series = "Conference on Human Factors in Computing Systems - Proceedings",

publisher = "Association for Computing Machinery",

booktitle = "CHI EA 2025 - Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems",

}

Xu, W, Zhao, L, Song, H, Song, X, Lu, Z, Liu, Y, Chen, M, Lim, EG & Yu, L 2025, Mozualization: Crafting Music and Visual Representation with Multimodal AI. in CHI EA 2025 - Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems., 407, Conference on Human Factors in Computing Systems - Proceedings, Association for Computing Machinery, 2025 CHI Conference on Human Factors in Computing Systems, CHI EA 2025, Yokohama, Japan, 26/04/25. http://doi.org/10.1145/3706599.3719686

Mozualization: Crafting Music and Visual Representation with Multimodal AI. / Xu, Wanfang; Zhao, Lixiang; Song, Haiwen et al.
CHI EA 2025 - Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, 2025. 407 (Conference on Human Factors in Computing Systems - Proceedings).

TY - GEN

T1 - Mozualization

T2 - 2025 CHI Conference on Human Factors in Computing Systems, CHI EA 2025

AU - Xu, Wanfang

AU - Zhao, Lixiang

AU - Song, Haiwen

AU - Song, Xinheng

AU - Lu, Zhaolin

AU - Liu, Yu

AU - Chen, Min

AU - Lim, Eng Gee

AU - Yu, Lingyun

PY - 2025/4/26

Y1 - 2025/4/26

N2 - In this work, we introduce Mozualization, a music generation and editing tool that creates multi-style embedded music by integrating diverse inputs, such as keywords, images, and sound clips (e.g., segments from various pieces of music or even a playful cat’s meow). Our work is inspired by the ways people express their emotions—writing mood-descriptive poems or articles, creating drawings with warm or cool tones, or listening to sad or uplifting music. Building on this concept, we developed a tool that transforms these emotional expressions into a cohesive and expressive song, allowing users to seamlessly incorporate their unique preferences and inspirations. To evaluate the tool and, more importantly, gather insights for its improvement, we conducted a user study involving nine music enthusiasts. The study assessed user experience, engagement, and the impact of interacting with and listening to the generated music.

AB - In this work, we introduce Mozualization, a music generation and editing tool that creates multi-style embedded music by integrating diverse inputs, such as keywords, images, and sound clips (e.g., segments from various pieces of music or even a playful cat’s meow). Our work is inspired by the ways people express their emotions—writing mood-descriptive poems or articles, creating drawings with warm or cool tones, or listening to sad or uplifting music. Building on this concept, we developed a tool that transforms these emotional expressions into a cohesive and expressive song, allowing users to seamlessly incorporate their unique preferences and inspirations. To evaluate the tool and, more importantly, gather insights for its improvement, we conducted a user study involving nine music enthusiasts. The study assessed user experience, engagement, and the impact of interacting with and listening to the generated music.

KW - Multimodal Input

KW - Music Editing

KW - Music Visualization

UR - http://www.scopus.com/pages/publications/105005750040

U2 - 10.1145/3706599.3719686

DO - 10.1145/3706599.3719686

M3 - Conference contribution

AN - SCOPUS:105005750040

T3 - Conference on Human Factors in Computing Systems - Proceedings

BT - CHI EA 2025 - Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems

PB - Association for Computing Machinery

Y2 - 26 April 2025 through 1 May 2025

ER -
