Publications

Empathetic Response in Audio-Visual Conversations Using Emotion Preference Optimization and MambaCompressor

Yeonju Kim, Se Jin Park, and Yong Man Ro

IEEE Transactions on Affective Computing (IEEE TAFFC 2026)

[paper]

Long-Form Speech Generation with Spoken Language Models

Se Jin Park*, Julian Salazar*, Aren Jansen, Keisuke Kinoshita, Yong Man Ro, and RJ Skerry-Ryan

International Conference on Machine Learning (ICML 2025)

Oral Presentation
[paper | demo | data]

MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens

Jeonghun Yeo*, Hyeongseop Rha*, Se Jin Park, and Yong Man Ro

Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings 2025)
[paper | code]

AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues

Se Jin Park, Yeonju Kim, Hyeongseop Rha, Bella Godiva, and Yong Man Ro

arXiv Preprint, 2025

[paper]

Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation

Minsu Kim*, Jeonghun Yeo*, Se Jin Park, Hyeongseop Rha, and Yong Man Ro

ACM International Conference on Multimedia (ACM MM 2024)
[paper]

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Se Jin Park*, Chae Won Kim*, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeonghun Yeo, and Yong Man Ro

Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2024)

Oral Presentation

Outstanding Paper Award

[paper | data | demo]

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Jeongsoo Choi*, Se Jin Park*, Minsu Kim*, and Yong Man Ro

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
Highlight Presentation

[paper | demo | code]

Persona Extraction Through Semantic Similarity For Emotional Support Conversation Generation

Seunghee Han, Se Jin Park, Chae Won Kim, and Yong Man Ro

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

[paper]

Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation

Se Jin Park, Minsu Kim, Jeongsoo Choi, and Yong Man Ro

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

[paper | demo]

Text-driven Talking Face Synthesis By Reprogramming Audio-driven Models

Jeongsoo Choi, Minsu Kim, Se Jin Park, and Yong Man Ro

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

[paper | demo]

Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model

Joanna Hong, Se Jin Park, and Yong Man Ro

Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
[paper]

DF-3DFace: One-to-Many Speech Synchronized 3D Facial Animation with Diffusion

Se Jin Park, Joanna Hong, Minsu Kim, and Yong Man Ro

arXiv Preprint, 2023
[paper]

SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory

Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, and Yong Man Ro

AAAI Conference on Artificial Intelligence (AAAI 2022)
Oral Presentation

[paper]

Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video

Minsu Kim*, Joanna Hong*, Se Jin Park, and Yong Man Ro

IEEE/CVF International Conference on Computer Vision (ICCV 2021)

[paper]

Speech Reconstruction with Reminiscent Sound via Visual Voice Memory

Joanna Hong, Minsu Kim, Se Jin Park, and Yong Man Ro

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP 2021)

[paper]

CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition

Minsu Kim, Joanna Hong, Se Jin Park, and Yong Man Ro

IEEE Transactions on Multimedia (TMM 2021)

[paper]


Kyung Hee University

© All rights reserved.