Dcas89 PRO

AI & ML interests

None yet

Recent Activity

reacted to prithivMLmods's post with 👍 14 days ago
Try the Hugging Face Space demo for https://huggingface.co/Logics-MLLM/Logics-Parsing, the latest multimodal VLM from the Logics Team at Alibaba Group. It enables end-to-end document parsing with precise content extraction in markdown format, and it also generates a clean HTML representation of the document while preserving its logical structure. 🤗🔥
Additionally, I've integrated one of my recent works, https://huggingface.co/prithivMLmods/Gliese-OCR-7B-Post1.0, which also excels at document comprehension.
⭐ Space / App: https://huggingface.co/spaces/prithivMLmods/VLM-Parsing
📄 Technical Report by the Logics Team, Alibaba Group: https://huggingface.co/papers/2509.19760
🖖 MM: VLM-Parsing: https://huggingface.co/collections/prithivMLmods/mm-vlm-parsing-68e33e52bfb9ae60b50602dc
⚡ Collections: https://huggingface.co/collections/prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
Other Pages:
➔ Multimodal VLMs - July'25: https://huggingface.co/collections/prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
➔ Multimodal VLMs - Aug'25: https://huggingface.co/collections/prithivMLmods/multimodal-vlms-aug25-68a56aac39fe8084f3c168bd
➔ VL caption - < Sep 15 '25: https://huggingface.co/collections/prithivMLmods/vl-caption-sep-15-25-68c7f6d737985c63c13e2391
To know more about it, visit the app page or the respective model page!
reacted to MonsterMMORPG's post with 🔥 14 days ago
Ovi - Generate Videos With Audio Like VEO 3 or SORA 2 - Run Locally - Open Source for Free
Download and install: https://www.patreon.com/posts/140393220
Quick demo tutorial: https://youtu.be/uE0QabiHmRw
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation
Project page: https://aaxwaz.github.io/Ovi/
SECourses Ovi Pro Premium App Features
Full-scale, ultra-advanced app for Ovi, an open source project that can generate videos from both text prompts and image + text prompts with real audio.
I have developed an ultra-advanced Gradio app and a much better pipeline that fully supports block swapping.
Now we can generate full-quality videos with as little as 8.2 GB VRAM.
Hopefully I will work on dynamic on-load FP8_Scaled tomorrow to reduce VRAM usage even further, so more VRAM optimizations should follow.
Our block swapping implementation is the very best one out there; I took the approach from the famous Kohya Musubi tuner.
The 1-click installer installs into a Python 3.10.11 venv and auto-downloads the models as well, so it is literally 1-click.
My installer auto-installs Torch 2.8, CUDA 12.9, and Flash Attention 2.8.3, and it supports literally all GPUs, such as RTX 3000 series, 4000 series, 5000 series, H100, B200, etc.
All generations are saved inside the outputs folder, and we support many features such as batch folder processing, number of generations, and full preset save and load.
This is a rush release (in less than a day), so there may be errors; please let me know and I will hopefully improve the app.
Look at the examples to understand how to prompt the model; that is extremely important.
An RTX 5090 can run it without any block swap, with just CPU offloading, and really fast.

Organizations

None yet