License Conflict: llama3.1 vs CC BY-NC 4.0
Hi, I’d like to report a potential license conflict in Psychotherapy-LLM/PsychoCounsel-Llama3-8B. According to the model card, part of the training data used for this model appears to come from datasets licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0), while the model itself is distributed under the LLaMA 3.1 license. This combination introduces a potential license compatibility issue, since LLaMA 3.1 strictly regulates redistribution and sublicensing, while CC BY-NC 4.0 imposes a NonCommercial restriction and an attribution requirement that may propagate to derivative works.
⚠️ Key incompatibilities:
LLaMA 3.1 – Section 2 (Sublicensing and Relicensing):
• Does not allow relicensing under other licenses (e.g., CC BY-NC)
• Requires derivative models to preserve original terms and restrictions
• Commercial use is restricted based on MAU thresholds
CC BY-NC 4.0 – Core Clauses:
• Prohibits any commercial use of the dataset or derivative works
• Requires clear attribution to dataset creators
• Intended to propagate “non-commercial only” conditions to downstream outputs
Using CC BY-NC 4.0 data to train a model that is then distributed under LLaMA 3.1, without acknowledging or preserving the CC license conditions, may result in downstream confusion about:
• Whether commercial use is allowed (Meta's license restricts it, but not in exactly the same way)
• Whether attribution is required for the training data
• Whether a LLaMA-licensed model can legally incorporate and redistribute outputs from a CC BY-NC–licensed dataset
While both licenses limit commercial use in different ways, the real conflict lies in how derivative rights are managed:
• LLaMA 3.1 disallows relicensing and requires that its terms remain intact;
• CC BY-NC 4.0 requires non-commercial terms to be inherited by derivative works.
If both conditions apply, it would be impossible to comply fully with both licenses at once, which makes them incompatible.
🔹 Suggestion:
To better align the licensing structure and ensure full compliance, a few possible steps could be helpful:
1. Add a clear attribution line in the README or model card for the datasets used under CC BY-NC 4.0, including links to the dataset sources.
2. Clarify in the documentation that use of the model is also subject to the non-commercial restrictions of the training data, and not just LLaMA 3.1.
3. Consider re-evaluating the licensing arrangement if full CC BY-NC 4.0 obligations (e.g., non-commercial propagation) are incompatible with the current release.
4. For downstream clarity, state explicitly whether the model can be used commercially and under what terms.
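In case it helps, here is a minimal sketch of how steps 1, 2, and 4 could be turned into a plain-text notice for the README or model card. The `DatasetSource` class, the `build_license_notice` function, the dataset name, and the URL are all hypothetical placeholders (the actual dataset names and links would come from your model card), and the wording of the commercial-use line is only an assumption about how you might phrase it.

```python
from dataclasses import dataclass


@dataclass
class DatasetSource:
    name: str     # hypothetical placeholder, not a real dataset name
    url: str      # hypothetical placeholder link to the dataset source
    license: str  # license label as stated by the dataset, e.g. "CC BY-NC 4.0"


def build_license_notice(base_license: str, sources: list[DatasetSource]) -> str:
    """Return a plain-text notice with attribution and commercial-use terms."""
    lines = ["License and attribution", ""]
    lines.append(f"Base model license: {base_license}")
    lines.append("Training data:")
    for src in sources:
        # Step 1: one attribution line per dataset, with a link to its source.
        lines.append(f"- {src.name} ({src.license}): {src.url}")

    # Simple heuristic (an assumption): a "-NC" marker in the license label
    # means the dataset carries a non-commercial restriction.
    non_commercial = any("-NC" in src.license.upper() for src in sources)

    lines.append("")
    if non_commercial:
        # Steps 2 and 4: state explicitly that the data terms also apply.
        lines.append(
            "Commercial use: not permitted; the model is also subject to "
            "the non-commercial terms of its training data."
        )
    else:
        lines.append(f"Commercial use: governed by {base_license}.")
    return "\n".join(lines)


notice = build_license_notice(
    "Llama 3.1 Community License",
    [DatasetSource("example-dataset", "https://example.com/dataset", "CC BY-NC 4.0")],
)
print(notice)
```

The generated text could then be pasted into the model card as-is or adapted to its existing layout.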
Hope this helps! Let me know if you have any questions or need more info.
Thanks for your attention!
Looking forward to your response!
Hi, I've changed the license to CC BY-NC 4.0.
Best