AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
Se Jin Park, Yeonju Kim, Hyeongseop Rha, Bella Godiva, and Yong Man Ro
arXiv Preprint, 2025
[paper]
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim*, Jeonghun Yeo*, Se Jin Park, Hyeongseop Rha, and Yong Man Ro
The Association for Computing Machinery's Annual Conference on Multimedia (ACMMM 2024)
[paper]
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park*, Chae Won Kim*, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeonghun Yeo, and Yong Man Ro
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2024)
Oral Presentation
Outstanding Paper Award
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Jeongsoo Choi*, Se Jin Park*, Minsu Kim*, and Yong Man Ro
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
Highlight Presentation
[paper | demo | code]
Persona Extraction Through Semantic Similarity For Emotional Support Conversation Generation
Seunghee Han, Se Jin Park, Chae Won Kim, and Yong Man Ro
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)
[paper]
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
Joanna Hong, Se Jin Park, and Yong Man Ro
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
[paper]
DF-3DFace: One-to-Many Speech Synchronized 3D Facial Animation with Diffusion
Se Jin Park, Joanna Hong, Minsu Kim, and Yong Man Ro
arXiv Preprint, 2023
[paper]
SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory
Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, and Yong Man Ro
AAAI Conference on Artificial Intelligence (AAAI 2022)
Oral Presentation
[paper]
Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video
Minsu Kim*, Joanna Hong*, Se Jin Park, and Yong Man Ro
IEEE/CVF International Conference on Computer Vision (ICCV 2021)
[paper]
Speech Reconstruction with Reminiscent Sound via Visual Voice Memory
Joanna Hong, Minsu Kim, Se Jin Park, and Yong Man Ro
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP 2021)
[paper]
CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition
Minsu Kim, Joanna Hong, Se Jin Park, and Yong Man Ro
IEEE Transactions on Multimedia (TMM 2021)
[paper]