ZP_Test

community
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

MrDragonFoxย 
posted an update 7 months ago
view post
Post
4139
as a few of you know - i am working on a rather more elaborate-tts that can produce more interesting sounds in context of rp

early sneak peak is here -

MrDragonFox/mOrpheus_3B-1Base_early_preview-v1-25000

its based on orpheus - but really the model is irrelevant as i focus mostly on data augmentation / prep / pipelineing - its just the way to show progress

should be able to express fine even in a sfw context

probably the last release for a few weeks as i go back to the data pipeline and improve there ..

in the mean time, please do test and report problems or enjoyable generations you found - we have a growing discord community and i love to see what you get out of that early release !

(small colab is provided on the model page if you dont have the gpu to run that your self)
MrDragonFoxย 
posted an update 7 months ago
view post
Post
5217
yet a other audio datasets pre classified for events + audio aestetics

this time for german - 680h sampled from emilia yodas

timestamps for asr training or other fancier things available as nc in the raw repo

MrDragonFox/DE_Emilia_Yodas_680h

cc by 4.0 as by emilia yodas

raw events / transcriptions are cc by NC 4.0

MrDragonFox/DE_Emilia_Yodas_680h_raw_timestamps

the coming days i should push about 600h english + some japanese too same format
MrDragonFoxย 
posted an update 7 months ago
view post
Post
2150
did a small emotive classified test dataset for all the tts tuners out there

MrDragonFox/Elise

3h total mit - single speaker voice

dataset is a copy of an existing one just added the emotional tags over 1200 samples - should be good enough to test if emotional tags stick in your finetune
  • 1 reply
ยท
xianbaoย 
posted an update about 1 year ago
view post
Post
2579
With the open-weight release of CogVideoX-5B from THUDM, i.e. GLM team, the Video Generation Model (how about calling it VGM) field has officially became the next booming "LLM"

What does the landscape look like? What are other video generation models? This collection below is all your need.

xianbao/video-generation-models-66c350163c74f60f5c412af6

The above video is generated by @a-r-r-o-w with CogVideoX-5B, taken from a nice lookout for the field!
xianbaoย 
posted an update over 1 year ago
view post
Post
2047
Why Apache 2.0 Matters for LLMs ๐Ÿค”

@01AI_Yi recently switched from a permissive & commercially friendly license, to Apache 2.0. And the community loved it! ๐Ÿš€

@JustinLin610 also had a poll on model license and the majority votes for Apache 2.0.

Why it is a Big Deal? โฌ‡๏ธ

๐Ÿ“š Legal Simplicity: Custom licenses need costly & time-consuming legal review. Apache 2.0 is well-known & easier for legal teams to handle.

๐Ÿ‘ฉโ€๐Ÿ’ป Developer-Friendly: Legal docs are a pain for devs! Apache 2.0 is well-known and tech-friendly, making it easier for non-native developers to understand the implications too.

๐Ÿ”— Easier Integration: Apache 2.0 is compatible with many other licenses, simplifying tasks like model merging with models of different licensing requirements.

๐Ÿšซ No Permission Needed: Custom licenses often require explicit permission and additional documentation work of filling forms, creating barriers. Apache 2.0 removes this hurdle, letting devs focus on innovation.

There are a lot interesting discussions from
@JustinLin610 's poll: https://x.com/JustinLin610/status/1793559737482764375 which inspired this thread.

Any other thoughts? Let me know ^^
  • 1 reply
ยท
xianbaoย 
posted an update over 1 year ago
view post
Post
1319
DeepSeekV2 is a big deal. Not only because its significant improvements to both key components of Transformer: the Attention layer and FFN layer.

It has also completed disrupted the Chines LLM market and forcing the competitors to drop the price to 1% of the original price.

---

There are two key components in Transformer architecture: the self-attention layer, which captures relationships between tokens in context, and the Feed-Forward Network (FFN) layer, which stores knowledge.

DeepSeek V2 introduces optimizations to both:

Attention layer normally uses KV Cache to reduce repetitive compute, but it consumes significant GPU RAM, limiting concurrent requests. DeepSeek V2 introduces Multi-head Latent Attention (MLA), which stores only a small latent representation, resulting in substantial RAM savings.

DeepSeek V2 utilizes 162 experts instead of the usual 8 as in Mixtral. This approach segments experts into finer granularity for higher specialization and more accurate knowledge acquisition. Activating only a small subset of experts for each token, leads to efficient processing.

It disrupted the market by dropping API prices to $0.14 per 1M tokens. This dramatic reduction forced competitors like GLM, Ernie, and QWen to follow suit, lowering their prices to 1% of their original offerings. Now, users can access these APIs at 1/35th the cost of ChatGPT-4o.
xianbaoย 
posted an update over 1 year ago
awniย 
posted an update over 1 year ago
view post
Post
First HF social post:

pip install -U mlx
  • 2 replies
ยท
xianbaoย 
posted an update over 1 year ago
view post
Post
Welcome Bunny! A family of lightweight but powerful multimodal models from BAAI

With detailed work on dataset curation, the Bunny-3B model built upon SigLIP and Phi-2 achieves performance on par with 13B models.

Model: BAAI/bunny-phi-2-siglip-lora

  • 2 replies
ยท
xianbaoย 
posted an update over 1 year ago
view post
Post
There appears to be a huge misunderstanding regarding the licensing requirements for open sourced Chinese speaking speaking LLMs on
@huggingface


I initially shared this misconception too, but after conducting some research, I came up with the list below.

Veryimpressive!