Update model card for PresentAgent

#11
by nielsr HF Staff - opened

This PR updates the model card for the ByteDance/MegaTTS3 repository to reflect that the repository is associated with, and hosts components for, PresentAgent: Multimodal Agent for Presentation Video Generation.

The changes include:

  • Updating the model description to focus on PresentAgent, as detailed in its paper (PresentAgent: Multimodal Agent for Presentation Video Generation).
  • Setting library_name to transformers (the repository contains Qwen2 model components) and adding the tag multimodal-agent.
  • Maintaining the existing pipeline_tag: text-to-speech, as explicitly requested in the task.
  • Providing direct links to the paper, the official GitHub repository, and the Colab demo.
  • Incorporating comprehensive sections on PresentAgent's introduction, setup and usage instructions, benchmark details, experiment results, contribution guidelines, and acknowledgements, all sourced from the official GitHub repository.
  • Retaining the repository's license and security information.
  • Updating the BibTeX Entry and Citation Info to include the PresentAgent paper's citation alongside the existing MegaTTS3 and WavTokenizer citations, reflecting their roles as integral components.
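
The metadata changes listed above end up in the YAML front matter at the top of the model card's README.md. As a rough illustration only, the sketch below renders that block for the three fields this PR names (library_name, pipeline_tag, tags). The helper `render_front_matter` is hypothetical and not part of any Hugging Face library, and the real card also carries fields such as the license that are left out here because this description does not state their values.

```python
# Hypothetical helper (not a Hugging Face API): renders a flat metadata dict
# as the YAML front matter block found at the top of a model card README.md.
def render_front_matter(metadata: dict) -> str:
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Lists are written as YAML block sequences, one item per line.
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# Only the fields named in this PR; other existing fields are intentionally omitted.
metadata = {
    "library_name": "transformers",    # set because of the Qwen2 components
    "pipeline_tag": "text-to-speech",  # kept unchanged
    "tags": ["multimodal-agent"],      # newly added tag
}

front_matter = render_front_matter(metadata)
print(front_matter)
```

The helper only handles plain strings and flat lists, which is enough for these fields; a full card would normally be edited by hand or with a YAML library.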

This update aims to provide a more accurate and comprehensive overview of the artifact hosted here, aligning it with the associated research paper.
