Wur doomed!

#14
by jukofyork - opened

Continuation of THE THREAD OF DOOM.

jukofyork pinned discussion

What do you and the others think of the distilled R1 models for writing?

The llama3 / qwen models SFT'd on R1 outputs? I only tried 2 of them.

R1 Qwen (32b) - Lacks knowledge of fiction (same as the official Qwen release), so its writing is no better.

R1 Llama3 - This is generally the worst of them (not just for writing). It'll generate the CoT and then write something completely different.

CoT traces won't let the model do anything out of distribution, so they're not very useful if the base model doesn't have a lot in its training data.

Yeah, I have tried the same two and felt the same way.

I also felt that any attempt to add an R1 distill to the merge recipe of an existing merge project made it worse...so far...

@gghfez @BigHuggyD that has been my experience as well, which is a shame as I had a go with R1 on OpenRouter and I was blown away.

What model is anywhere close that is usable on a 24GB VRAM machine with 32GB of RAM, in your experience?

There's nothing like it for now. I'm running R1 slowly on my ThreadRipper:

prompt eval time =   14026.61 ms /   918 tokens (   15.28 ms per token,    65.45 tokens per second)
       eval time =  398806.12 ms /  1807 tokens (  220.70 ms per token,     4.53 tokens per second)
      total time =  412832.73 ms /  2725 tokens

I tried training Wizard2 8x22b MoE on R1 data, but it doesn't really work well. It will plan ahead in think tags eg:

I need to ensure the story maintains its gritty, realistic tone without becoming overly melodramatic. The characters' growth should be subtle but significant. Also, the ending should leave a sense of hope but not be too neat—their redemption is fragile, and the future is uncertain.

Let me outline the next few chapters:

Chapter 5: Nightmares and Trust
...

But it doesn't backtrack like R1 does. It just kind of agrees with itself and ends up writing how it usually would:

“I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead.

lol

Ahhh, that's a shame :-(

"I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead."

Oh god!

I'll have to keep an eye on this thread.

I did enjoy Ppoyaa/MythoNemo-L3.1-70B-v1.0

But my tastes are probably not as refined as others on this thread ;-)

Yeah, I've explained that so badly I don't think I'd even understand what I was saying myself tomorrow :D

In simple terms - look at what this guy is doing in his attempt to add dynamic-YaRN:

https://github.com/sgl-project/sglang/issues/6030

He's basically increasing the scale-factor as you pass through the context so that you only use as much scale-factor as is needed up until that specific token index in the context.

So now imagine we extract the 2D function (ie: nothing to do with LLMs or even pytorch), and pass through all token indices from original_max_position_embeddings + 1 to max_position_embeddings, using the same rope_theta (and other RoPE/YaRN settings) but dynamically calculating the yarn_scale_factor like his code does. This will produce an array of n_rot coordinates of length max_position_embeddings - original_max_position_embeddings, which is what a model trained using dynamic-YaRN would see.

Now do the same thing again, but instead of changing yarn_scale_factor throughout the context, keep it at the max value needed (ie: max_position_embeddings / original_max_position_embeddings). This will produce another array of n_rot coordinates of the same length.

Now we can measure the distance between each corresponding set of n_rot coordinates for each specific context length index and try to find a "fudge factor" to the static-YaRN calculation to minimise the sum of distances:

  • You can use different metrics to measure the distance (L2 / Euclidean distance being the most obvious, but for function minimisation L1 / taxicab distance is often used).
  • You can bias the weighting of the sum of distances towards the range of context length indices you care about (and in the extreme case give zero weight to those above the maximum context length you expect to use the model at!).

The basic idea is to try to minimise the gap between what the model saw during training (did deepseek or kimi even use training-time dynamic-YaRN?) and what it will see for your "average" use case (ie: minimise the expected deviation between RoPE coordinates).
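If anyone wants to play with this, here's a scaled-down sketch of the comparison (plain Python/NumPy, nothing to do with any particular framework; a simple position-interpolation stands in for the full YaRN formula, and the context sizes are shrunk so it runs instantly - all the names and numbers are just illustrative):

```python
import numpy as np

# Toy sketch of the dynamic- vs static-YaRN comparison. NOT the real YaRN
# formula: plain position-interpolation scaling stands in for it, and the
# context sizes are scaled way down so this runs in a second or two.
ORIG_CTX = 4096      # original_max_position_embeddings (illustrative)
MAX_CTX = 16384      # max_position_embeddings (illustrative)
ROPE_THETA = 10000.0
N_ROT = 64           # number of rotary coordinates per token index

inv_freq = 1.0 / (ROPE_THETA ** (np.arange(0, N_ROT, 2) / N_ROT))

def rope_coords(pos, scale_factor):
    """The 2D function: n_rot (cos, sin) coordinates for one token index."""
    angles = (pos / scale_factor) * inv_freq
    return np.concatenate([np.cos(angles), np.sin(angles)])

positions = np.arange(ORIG_CTX + 1, MAX_CTX + 1)

# Dynamic: only as much scale factor as is needed up to this token index.
dynamic = np.stack([rope_coords(p, p / ORIG_CTX) for p in positions])

# Static: always the maximum scale factor.
static = np.stack([rope_coords(p, MAX_CTX / ORIG_CTX) for p in positions])

# Per-index L2 distance between the two coordinate sets; minimising a
# (possibly weighted) sum of these over a "fudge factor" applied to the
# static calculation is the idea described above.
distances = np.linalg.norm(dynamic - static, axis=1)
print(distances.sum())
```

Note the distance is exactly zero at the final token index (where the dynamic scale factor reaches the static one), so all the disagreement is concentrated in the earlier part of the extended range - which is why weighting the sum towards the context lengths you actually use matters.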

@BigHuggyD

Oh I KNOW THERE IS A LOGIC BUG. I'm staring right at it! It's interesting to me that it cannot see it, nor can Grok4, or o3. Eventually I will let it off the hook and point it out but this has been an interesting experiment.

Yeah, it's funny how they can be completely blind to certain bugs. It's worse at longer contexts of course (that long writing benchmark in juk's last post also applies to coding btw), so starting a new context can help.
But ultimately, they're not really "reasoning" or "thinking", and still need steering.

It was like watching someone with a puzzle piece in their hand, staring at the puzzle that has only one empty spot.
Lol. It's amusing if you're not pressed for time and trying to just get it done :D

One thing I've noticed with Opus 4 (watch out for this), is it'll sometimes "work around" the bug by removing the functionality / skipping the broken logic!
And then it'll act like it's solved the problem.

Yeah, sometimes I have to remind myself that it's picking from a basket of next probable tokens. Something interesting (at least to me): I rewound back to the beginning of when the problem first manifested and asked "What does your intuition tell you?" and it replied with the issue immediately! Rewind again and don't ask it about intuition, and we go back to the debug logapalooza.

so it's probably a good idea to avoid using the 128k YaRN parameters unless you really have to, and only then use the minimum you actually need >32k, eg:

Do you know how difficult it would be to make llama.cpp use dynamic YaRN? Apparently the "Alibaba Model Studio" supports it, but no open source framework does. I made a very basic attempt at doing it but saw no difference in perplexity when tested with a context of 64k.

@tdh111

Exllamav2/v3 support somewhat-dynamic YaRN. They correctly apply the scale factor of the maximum context when loading the model (eg a 64K context with a 32K base will use the correct factors as specified in the config file, not default to 128K like vllm or llama.cpp), but don't change it per request.

Theoretically you could hotload models with different yarn configs via TabbyAPI.

Qwen has always been really awful past 32K though, and it seems they're deep frying the MoEs even more.

One other subjective observation: I find some qwen 2.5/3 models, if right up against their 32K/40K limit, actually work better with YaRN (specified to like 43K or something) even if technically under the limit.
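For reference, specifying YaRN like that is just the model's rope_scaling block; something like the below should correspond to the ~43K example (values illustrative - check the model card for the exact keys your framework expects):

```json
"rope_scaling": {
  "type": "yarn",
  "factor": 1.35,
  "original_max_position_embeddings": 32768
}
```

Since most frameworks apply the factor statically, whatever factor you pick affects every context length, short or long.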

But again, it's peanuts compared to native long context models, even overcooked ones like Hunyuan 70B.

None of my local models got that right ^ If you still have it, I'd try the OG Deepseek-R1.

OG R1 one-shots it!

Can you name the novel that features Albert Einstein and a famous American philosopher meeting an Englishman in a train carnage? It has something to do with the Occult and the Golden Dawn and was written in the 70s or 80s.

Difficult question, think carefully.

Thinking:

Okay, let's try to figure out the novel the user is asking about. They mentioned Albert Einstein and a famous American philosopher meeting an Englishman in a train carriage. The story involves the Occult and the Golden Dawn, and it was written in the 70s or 80s. Hmm, the key elements here are the historical figures, the train setting, occult themes, and the time frame.

First, I need to recall novels from the 70s or 80s that blend real historical figures with occult elements. The Golden Dawn is a real occult organization, so that's a big clue. The mention of Einstein and an American philosopher—maybe William James? He was a philosopher and psychologist, associated with pragmatism, and his brother was Henry James, the writer. Wait, William James was part of the Society for Psychical Research, which dealt with the paranormal, so that could tie into the occult angle.

The train carnage (maybe they meant carriage) is a specific setting. So a meeting on a train between Einstein, an American philosopher (like William James), and an Englishman. The Englishman might be another historical figure associated with the occult. Aleister Crowley comes to mind, but he was more of a ceremonial magician and part of the Golden Dawn. However, he was English, so that fits. Could the Englishman be someone like Arthur Conan Doyle? He had an interest in spiritualism, but he's more of a writer than a philosopher.

Wait, the user might have meant "carriage" instead of "carnage" since a train carnage would be a wreck, but the context suggests a meeting in a train car. Let me check that. Maybe the user mistyped "carriage" as "carnage."

Putting this together: a novel where Einstein and William James meet an English occultist (maybe Crowley or someone else from the Golden Dawn) on a train. The Golden Dawn was active in the late 19th and early 20th centuries, so the novel might be set around that time, but written in the 70s or 80s.

Now, thinking of authors from that period who write historical fiction with occult elements. Tim Powers comes to mind. He wrote "The Anubis Gates" (1983), which involves time travel and occult elements. Another is James P. Blaylock, who collaborated with Powers. Maybe "The Digging Leviathan" or "Homunculus," but those might not fit exactly.

Wait, there's also "The Club Dumas" by Arturo Pérez-Reverte, but that's later (1993). Not quite. Maybe "Foucault's Pendulum" by Umberto Eco (1988), which deals with occult conspiracies, but I don't recall Einstein being a character there.

Wait, another angle: Einstein was friends with Charles Proteus Steinmetz, an electrical engineer, but that's not a philosopher. The American philosopher could be someone else. Let me think again. William James died in 1910, Einstein's major work was in 1905 and later. So their timelines do overlap in the early 20th century. If the novel is set in that time, the train meeting could be plausible.

Looking for a novel that combines these figures with the Golden Dawn. The Golden Dawn had members like W.B. Yeats, Arthur Waite, Aleister Crowley. So the Englishman could be Crowley. A novel where Einstein, William James, and Crowley meet on a train. That's a unique combination.

Now, checking books from the 70s/80s. "The Illuminatus! Trilogy" by Robert Shea and Robert Anton Wilson (1975) is a possibility. It's a postmodern conspiracy fiction with many historical and occult references. It does include elements like the Illuminati, Crowley, and other occult themes. However, I'm not sure if Einstein and William James are characters in it. The trilogy is more about a sprawling conspiracy rather than a specific meeting on a train.

Another possibility is "The Stress of Her Regard" by Tim Powers (1989). It involves Romantic poets and a supernatural vampire-like entity. But again, Einstein isn't in it.

Wait, "The Name of the Rose" by Umberto Eco (1980) is historical but set in medieval times. Not relevant here.

What about "The Seven-Per-Cent Solution" by Nicholas Meyer (1974)? It's a Sherlock Holmes pastiche where Holmes meets Freud. But that's a different set of characters.

Alternatively, "The Difference Engine" by William Gibson and Bruce Sterling (1990) is steampunk, maybe too late and different theme.

Wait, "The Physicist and the Philosopher" isn't a novel, but a non-fiction book. Not helpful.

Hold on, maybe "The Einstein Intersection" by Samuel R. Delany (1967). But that's more sci-fi and doesn't fit the occult angle.

Alternatively, "Gravity's Rainbow" by Thomas Pynchon (1973) has a lot of characters and references, but I don't recall Einstein being a character.

Another thought: Maybe the user mixed up some details. The Golden Dawn is more late 19th/early 20th century. The 70s/80s novel might be a historical fiction set in that era.

Wait, there's a novel called "The Secret History of the Pink Carnation" by Lauren Willig (2005), but that's too recent.

Alternatively, "The Prestige" by Christopher Priest (1995) involves magicians and rivalry, but again, not the right time frame.

Wait, perhaps "The Ceremonies" by T.E.D. Klein (1984). It's horror with occult elements, but I don't think it involves those historical figures.

Alternatively, "The Club Dumas" as mentioned before, but it's about a book enthusiast, not Einstein.

Hmm. Maybe the user is thinking of "The Lost Symbol" by Dan Brown, but that's 2009 and features no Einstein.

Wait, another angle: The Golden Dawn and the Occult. Maybe the novel is "The Magus" by John Fowles (1965), but that's a bit earlier than the 70s. Also, set on a Greek island, not a train.

Alternatively, "The Eight" by Katherine Neville (1988) is a thriller involving a chess set and historical figures, but I don't think Einstein is in it.

Wait, perhaps "The Kingdom of this World" by Alejo Carpentier, but that's 1949, magical realism, not sure.

Alternatively, "The New York Trilogy" by Paul Auster (1987) is postmodern, but not occult.

Wait, maybe the answer is "The White Hotel" by D.M. Thomas (1981), which has Freud as a character, but not Einstein.

Alternatively, "Ragtime" by E.L. Doctorow (1975) blends historical figures, but I don't recall occult elements. It does have Harry Houdini, Freud, etc.

Wait, the user said "train carnage" – could that be a train accident or a violent event? Maybe a murder on a train. But the user might have meant "carriage."

Wait, I recall a novel called "The Strange Case of the Composer and His Judge" by Patricia Duncker (2010), but that's too late.

Alternatively, "The Interpretation of Murder" by Jed Rubenfeld (2006) involves Freud and Jung in New York, but again, not the right time.

Wait, perhaps "The Shadow of the Wind" by Carlos Ruiz Zafón (2001), but that's set in Spain, and again, too recent.

Hmm. This is tricky. Let me think again. The key points: Einstein, American philosopher (William James?), Englishman (Crowley?), train meeting, occult/Golden Dawn, 70s/80s.

Wait, could it be "The Anubis Gates" by Tim Powers (1983)? It involves time travel, Egyptian gods, and a plot in the past. There's a character named Brendan Doyle who goes back to 1810, meets Coleridge, and there are elements of body-swapping and magic. But Einstein isn't in it. However, the Golden Dawn is more late 19th century. Maybe not.

Alternatively, "The Stress of Her Regard" by Tim Powers (1989) involves Byron, Shelley, and Keats, with a supernatural twist. Again, no Einstein.

Wait, another Tim Powers novel, "Declare" (2001), involves Kim Philby and supernatural elements, but again, too late.

Alternatively, "The Drawing of the Dark" by Tim Powers (1979) is a fantasy about a mystical brew, set in the 16th century. Doesn't fit.

Hmm. Maybe the answer is "The Club Dumas," but I'm not sure. Wait, the user mentioned the 70s or 80s. Let me check the publication dates again.

Wait, perhaps "The Philosopher's Stone" by Colin Wilson (1969). Colin Wilson wrote about the occult, and this novel is about a group of intellectuals investigating paranormal phenomena. However, 1969 is close to the 70s. Does it feature Einstein? Not sure. The American philosopher could be William James, but Colin Wilson is English. The novel involves the Rosicrucians and the occult. Maybe?

Alternatively, "The Mind Parasites" by Colin Wilson (1967) is a Lovecraftian horror, but again, not sure about the characters.

Alternatively, "The Occult: A History" by Colin Wilson is non-fiction.

Hmm. Maybe the user is referring to a book by Robert Anton Wilson. He wrote "The Historical Illuminatus Chronicles," which includes "The Earth Will Shake" (1982), "The Widow's Son" (1985), and "Nature's God" (1991). These involve secret societies, the Illuminati, and historical figures. In "The Widow's Son," there's a character named Sigismundo Celine, and the story involves Freemasonry, but Einstein wasn't born yet in that timeframe. However, maybe another of his books.

Alternatively, "Masks of the Illuminati" by Robert Anton Wilson (1981). Let me check that. The plot involves Sir John Babcock, a young Englishman, who meets Aleister Crowley and Albert Einstein in a London pub in 1912. They discuss conspiracy theories and the occult. There's a train scene? Wait, the user mentioned a train carriage. If the novel includes a meeting on a train between Einstein, an American philosopher (maybe William James), and an Englishman (Crowley), that could fit. Let me verify.

Yes! "Masks of the Illuminati" by Robert Anton Wilson (1981). The story involves a young man who meets both Einstein and James Joyce (not William James) in a pub, but there's also Crowley. However, maybe the user conflated Joyce with an American philosopher. Alternatively, maybe William James is included. Let me check the plot summary again.

Upon checking, "Masks of the Illuminati" features Sir John Babcock, who encounters Aleister Crowley and Albert Einstein in 1912. They discuss the Illuminati, the occult, and various conspiracies. The story involves a train journey where they might have met. The Golden Dawn is connected through Crowley's involvement. The American philosopher might be a mix-up with another character, but given the elements, this seems to fit the description. The publication date is 1981, which is within the 70s/80s timeframe the user mentioned. Although James Joyce is Irish, not American, the user might have misremembered the philosopher as William James instead of Joyce. Alternatively, there might be another philosopher character in the book.

Given that the user emphasized the train carriage meeting with Einstein, an American philosopher, and an Englishman (Crowley), along with the occult and Golden Dawn themes, "Masks of the Illuminati" seems like the closest match despite potential mix-ups in the philosopher's identity. Therefore, the answer is likely this novel.

Answer:

The novel you're referring to is likely **"Masks of the Illuminati"** by **Robert Anton Wilson** (1981).

Key Points:

  • Plot: The story involves a young Englishman, Sir John Babcock, who encounters Albert Einstein and Aleister Crowley (a key figure linked to the occult and the Hermetic Order of the Golden Dawn) in a London pub in 1912. The narrative weaves together conspiracy theories, occult mysteries, and surreal humor.
  • Train Setting: While much of the story takes place in a pub, Wilson’s works often include layered, hallucinatory sequences that could align with a "train carnage/carriage" scene (the user might have conflated details or referenced a symbolic/metaphorical train journey).
  • Occult Themes: The Golden Dawn, Illuminati, and Crowley’s ceremonial magic are central to the plot.
  • Era: Published in the early 1980s, fitting the timeframe mentioned.

Why It Fits:

  • Einstein and Crowley are central characters, and while the "American philosopher" isn’t explicitly present, Wilson often blends historical figures with fictionalized occult encounters. The user may have conflated details (e.g., Crowley as the "Englishman" and Einstein with another thinker).
  • The novel’s psychedelic, nonlinear style could lead to the impression of a chaotic train meeting.

This is a deep-cut answer, but "Masks of the Illuminati" aligns with the core elements of your query!

This is quite close to my use case (recalling historical facts and figures and doing analysis on them), and no other local model beats old R1 on that. New R1 lost some knowledge, but also concludes too early on difficult questions.

but all seem to get completely mixed up with other semi-related books or the author's other work: The Illuminatus! Trilogy

What a coincidence that you mention it; I'm reading it right now.

All the new 4.1 models are hopeless... I suspect they just haven't seen as many books during training to understand properly what constitutes front/end matter.

They are hopeless overall, even benchmarks show old 4o being better than them.

Kimi K2 Base is not a true base:
name-probs-bases.png

But their instruct is quite flat:
name-probs-finetunes.png

https://huggingface.co/jukofyork/command-r-35b-writer-v2
https://huggingface.co/jukofyork/command-r-35b-writer-v2-lora-gguf

It seems to be slightly overcooked though, and does better when you use:

--lora-scaled command-r-35b-writer-v2-lora-F32.gguf 0.5

instead of applying the full LoRA.


Training qwq-32b again now, but using max-norm regularisation as I'm not keen on this massive overshoot that happens using just weight-decay:

image.png

max-norm regularisation should be able to keep the norms below 0.25 which is about equivalent to using --lora-scaled 0.5...
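For anyone curious, max-norm here just means a hard projection after each optimiser step rather than a penalty term in the loss, so the norm can't overshoot and then slowly decay back down. A minimal NumPy sketch (not the actual training code; the 0.25 cap is from above, everything else is illustrative):

```python
import numpy as np

MAX_NORM = 0.25  # roughly equivalent to applying the LoRA at --lora-scaled 0.5

def max_norm_project(w, max_norm=MAX_NORM):
    """Hard-clamp the Frobenius norm of w to max_norm (applied after each
    optimiser step, instead of relying on weight-decay to shrink it)."""
    norm = np.linalg.norm(w)
    return w if norm <= max_norm else w * (max_norm / norm)

# e.g. a (hypothetical) LoRA matrix that has overshot during training:
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16)) * 0.1   # Frobenius norm well above the cap
w = max_norm_project(w)
print(np.linalg.norm(w))  # capped at 0.25
```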

Fingers crossed - it's another 5 day wait to see the results :/

None of my local models got that right ^ If you still have it, I'd try the OG Deepseek-R1.

OG R1 one-shots it!

This is quite close to my use case (recalling historical facts and figures and doing analysis on them), and no other local model beats old R1 on that. New R1 lost some knowledge, but also concludes too early on difficult questions.

Yeah, I keep the OG Deepseek-R1 around as it seems to have the best writing style of any model yet IMO.

but all seem to get completely mixed up with other semi-related books or the author's other work: The Illuminatus! Trilogy

What a coincidence that you mention it; I'm reading it right now.

It's supposed to be less dark than the Masks book, but I hope to read that one after too.
