EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs

Zhao, Xiangyu; Liu, Bo; Liu, Qijiong; Shi, Guangyuan; Wu, Xiao-Ming

(Submitted on 13 Oct 2023 (v1), last revised 17 May 2024 (this version, v3))

Abstract: We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge modalities, EasyGen leverages BiDiffuser, a bidirectional conditional diffusion model, to foster more efficient modality interactions. EasyGen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilitates image generation by training an adapter to align the LLM's text space with BiDiffuser's image space. Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation. The source code is available at this https URL
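
As a rough illustration of the projection-layer idea mentioned in the abstract, the sketch below maps a sequence of feature vectors from one embedding space into another with a single linear layer. It is not the authors' implementation: the function name `project`, the toy dimensions, and the hand-picked weights are all hypothetical, and a real model would learn such a layer during training rather than fix it by hand.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(features: List[Vector], weight: Matrix, bias: Vector) -> List[Vector]:
    """Apply y = W x + b to each feature vector (a plain linear projection)."""
    out_dim = len(weight)
    in_dim = len(weight[0])
    if len(bias) != out_dim:
        raise ValueError("bias length must equal the number of weight rows")
    projected = []
    for x in features:
        if len(x) != in_dim:
            raise ValueError("feature length must equal the number of weight columns")
        projected.append([
            sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)
        ])
    return projected


# Hypothetical toy sizes: 3-dim source features mapped to 2-dim target embeddings.
weight = [[1.0, 0.0, -1.0],
          [0.5, 0.5, 0.5]]
bias = [0.0, 1.0]
features = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
tokens = project(features, weight, bias)
```

Each output vector has as many entries as `weight` has rows, so the same function works for any pair of source and target dimensions as long as the shapes agree.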

Comments:	Accepted by ACL 2024, main conference
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.08949 [cs.AI]
	(or arXiv:2310.08949v3 [cs.AI] for this version)

Submission history

From: Xiangyu Zhao
[v1] Fri, 13 Oct 2023 08:38:56 GMT (11592kb,D)
[v2] Tue, 20 Feb 2024 06:54:50 GMT (4024kb,D)
[v3] Fri, 17 May 2024 08:30:18 GMT (5653kb,D)
