Working?

#1
by yano2mch - opened

Currently giving this a try and so far it seems to be working. A temperature of 0.75 is too high, as I started getting weird output, but 0.7 seems promising.

I usually use some trusted 70B models, but this one was on my list. I'll give some more feedback after I use it in an RP or two.

(using the i1-Q6_K quant, in case that's important)

But the first outputs look promising... Unless it barfs really badly, I'll try it for most of tomorrow/today.

Thanks - it should be pretty good if the online merge tool works, but I didn't want to make a proper page for it until I know it's working correctly... I have really bad internet upload speed, so being able to merge online is a big plus if it works!

These first 3 writer models do seem to have one strange quirk though: they add extra space(s) at the end of some paragraphs, before the double newlines. I think I know what caused it, and hopefully the next iterations won't do this.

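In the meantime, if the stray spaces bother anyone, they're easy to strip out of saved replies with a one-line regex (a minimal sketch, and the function name is just illustrative):

```python
import re

def strip_trailing_spaces(text: str) -> str:
    # Remove spaces/tabs left at the end of a line, just before the newline.
    return re.sub(r"[ \t]+(?=\n)", "", text)

reply = "First paragraph.  \n\nSecond paragraph."
assert strip_trailing_spaces(reply) == "First paragraph.\n\nSecond paragraph."
```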

Sounds like a minor problem; I haven't seen those myself. Of course, maybe SillyTavern cleans it, or maybe it's copying from context, so if it's present a few times it will duplicate it within the replies.

So far I've had one oddity where it gave a Japanese word followed by its English translation: "指尖 (fingertip)".

I'm also suddenly seeing a lot of tildes (~), which result in strike-throughs. Again, a very minor problem. It almost seems like it doesn't know how to do hearts, since the RP got really lovey-dovey all of a sudden.

It might be worth using a small amount of min-p (e.g. 0.01 to 0.05) if you aren't already, as even the original command-r:35b model this was trained from will sometimes output odd bits of weird characters without it.

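For reference, this is roughly the idea behind min-p (a minimal NumPy sketch of the filtering step, not SillyTavern's or llama.cpp's actual implementation):

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    """Drop tokens whose probability is below min_p times the top token's.

    Rare junk tokens (stray Unicode, wrong-language characters) fall far
    below the top candidate, so they get pruned before sampling.
    """
    keep = probs >= min_p * probs.max()
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

# Toy example: temperature 0.7 applied to logits, then a 0.05 min-p floor.
rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
probs = np.exp(logits / 0.7)
probs /= probs.sum()
token = rng.choice(len(probs), p=min_p_filter(probs, min_p=0.05))
```

With min_p=0.05, any token less than 5% as likely as the top candidate is pruned, which is usually enough to stop the stray foreign characters without noticeably flattening the output.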

Using SillyTavern, I see Top-P (set to 1) but not Min-P, so I have no idea what internal setting it's using.

Regardless, I'll be interested in any revisions you have later :)

Alright, so I got the occasional blip of oddities at 0.7, while 0.65 seems to be fine, so I'd wager 0.67 or thereabouts would be safe too.

Formatting doesn't always work right (asterisks placed in the wrong spots), but that might be due to typos in the character card and it's just emulating the style.

Otherwise, I am rather enjoying the output. Seems like a very promising model :)

edit: Nope, at 0.65 I got random Arabic characters. It's rare enough that you can work around it, though...

There will be a new version in a couple of days, so hopefully that will address some of the problems. I'll link it here when it's ready.

Looking forward to it. Aside from the occasional random non-English interjection, it felt like a pretty decent model.

I've set it off training today using a much larger and more diverse dataset and 4x the trainable parameters. If all goes well I should have the new model in around 6 days.

Looking forward to it :)

https://huggingface.co/jukofyork/command-r-35b-writer-v2

It still seems a bit weird and adds spaces around newlines, but it appears to be working.
