
License Conflict: llama3.1 vs CC BY-NC 4.0

#4 opened by Schilder

Hi, I’d like to report a potential license conflict in Psychotherapy-LLM/PsychoCounsel-Llama3-8B. According to the model card, it appears that part of the training data used for this model comes from datasets licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0). This combination introduces a potential license compatibility issue, since LLaMA 3.1 strictly regulates redistribution and sublicensing, while CC BY-NC 4.0 imposes a NonCommercial restriction and attribution requirement that may propagate to derivative works.

⚠️ Key incompatibilities:

LLaMA 3.1 – Sections 1(b) and 2 (Redistribution and Use; Additional Commercial Terms):
  • Does not allow relicensing under other licenses (e.g., CC BY-NC)
  • Requires derivative models to preserve original terms and restrictions
  • Commercial use is restricted based on a monthly active user (MAU) threshold (700 million MAU)

CC BY-NC 4.0 – Core Clauses:
  • Prohibits any commercial use of the dataset or derivative works
  • Requires clear attribution to dataset creators
  • Intended to propagate “non-commercial only” conditions to downstream outputs

Using CC BY-NC 4.0 data to train a model that is then distributed under LLaMA 3.1, without acknowledging or preserving the CC license conditions, may result in downstream confusion about:

  • Whether commercial use is allowed (Meta's license also restricts it, but not in the same way)
  • Whether attribution is required for the training data
  • Whether a LLaMA-licensed model can legally incorporate and redistribute material derived from a CC BY-NC–licensed dataset

While both licenses limit commercial use in different ways, the real conflict lies in how derivative rights are managed:

LLaMA 3.1 disallows relicensing and requires that its terms remain intact;
CC BY-NC 4.0 requires non-commercial terms to be inherited by derivative works.

This may create an incompatibility in which it is impossible to comply fully with both licenses at the same time.

🔹 Suggestion:

To better align the licensing structure and ensure full compliance, a few possible steps could be helpful:

1. Add a clear attribution line in the README or model card for the datasets used under CC BY-NC 4.0, including links to the dataset sources.
2. Clarify in the documentation that use of the model is also subject to the non-commercial restrictions of the training data, and not just LLaMA 3.1.
3. Consider re-evaluating the licensing arrangement if full CC BY-NC 4.0 obligations (e.g., non-commercial propagation) are incompatible with the current release.
4. For downstream clarity, state explicitly whether the model can be used commercially and under what terms.

Hope this helps! Let me know if you have any questions or need more info.

Thanks for your attention!

Looking forward to your response!

Psychotherapy with Large Language Models org

Hi, I've changed the license to CC BY-NC 4.0.

Best
