---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-14B-Instruct-1M
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- code
- Qwen
- 14B
- QWQ
- Math
- trl
---
					
						
# **Cassiopeia-Qwen-14B**

Cassiopeia-Qwen-14B is based on the Qwen 2.5 14B architecture and is designed to strengthen the reasoning capabilities of 14B-parameter models. It is optimized for general-purpose reasoning and question answering, with a focus on contextual understanding, logical deduction, and multi-step problem-solving. It has been fine-tuned using a long chain-of-thought reasoning model and specialized datasets to improve comprehension, structured responses, and conversational intelligence.
					
						
## **Key Improvements**
1. **Enhanced General Knowledge**: The model covers a broad range of domains, which helps it answer questions accurately and generate coherent responses.
2. **Improved Instruction Following**: Better at understanding and following complex instructions, generating structured responses, and maintaining coherence over extended interactions.
3. **Versatile Adaptability**: More resilient to diverse prompts, so it handles a wide range of topics and conversation styles, including open-ended and structured inquiries.
4. **Long-Context Support**: Supports up to 128K tokens of input context and can generate up to 8K tokens in a single output, which suits long inputs and detailed responses.
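
The long-context limits listed above can be turned into a simple pre-flight check before calling the model. The sketch below is illustrative only: it assumes that "128K" means 131,072 tokens and "8K" means 8,192 tokens, and that the input limit applies to the prompt alone. The names `CONTEXT_LIMIT`, `OUTPUT_LIMIT`, `fits_context`, and `clamp_max_new_tokens` are hypothetical helpers, not part of any library.

```python
# Assumed limits: the card states "128K" input and "8K" output tokens.
# Interpreting these as powers of two is an assumption, not a documented value.
CONTEXT_LIMIT = 128 * 1024  # 131072 input tokens
OUTPUT_LIMIT = 8 * 1024     # 8192 generated tokens


def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if a request stays within both assumed limits."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens <= CONTEXT_LIMIT and max_new_tokens <= OUTPUT_LIMIT


def clamp_max_new_tokens(requested: int) -> int:
    """Cap a requested generation length at the assumed output limit."""
    return max(0, min(requested, OUTPUT_LIMIT))


print(fits_context(prompt_tokens=100_000, max_new_tokens=2_048))
print(clamp_max_new_tokens(10_000))
```

In practice, `prompt_tokens` could be taken from the length of the tokenized prompt (for example, the number of entries in `input_ids`).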
					
						
## **Quickstart with transformers**

The following snippet uses `apply_chat_template` to show how to load the tokenizer and model and generate a response:
					
						
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Cassiopeia-Qwen-14B"

# Load the model; dtype and device placement are selected automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are the key principles of general-purpose AI?"
messages = [
    {"role": "system", "content": "You are a helpful assistant capable of answering a wide range of questions."},
    {"role": "user", "content": prompt}
]
# Render the conversation into the model's chat format as plain text.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
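
For multi-turn conversations, the `messages` list from the snippet above can be extended with each reply before the next call. The following is a minimal sketch of that bookkeeping, not a prescribed interface: `chat_turn` is a hypothetical helper, and `generate_fn` is a hypothetical callable that would wrap the template, generate, and decode steps. A stand-in generator is used here so the example runs without loading the model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    messages: List[Message],
    user_text: str,
    generate_fn: Callable[[List[Message]], str],
) -> str:
    """Append a user turn, obtain a reply, and record it in the history.

    generate_fn is a hypothetical callable that takes the message list and
    returns the decoded assistant reply.
    """
    messages.append({"role": "user", "content": user_text})
    reply = generate_fn(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def echo_generate(history: List[Message]) -> str:
    """Stand-in for the real model call; it only reports the history length."""
    return f"(reply to turn with {len(history)} messages)"


history: List[Message] = [
    {"role": "system", "content": "You are a helpful assistant."}
]
first = chat_turn(history, "Hello", echo_generate)
second = chat_turn(history, "Tell me more", echo_generate)
print([m["role"] for m in history])
```

Keeping the full history in one list means every call to `apply_chat_template` sees the earlier turns, at the cost of a prompt that grows with each exchange.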
					
						
## **Intended Use**
1. **General-Purpose Reasoning**:
   Designed for broad applicability, assisting with logical reasoning, answering diverse questions, and solving general knowledge problems.

2. **Educational and Informational Assistance**:
   Suitable for providing explanations, summaries, and research-based responses for students, educators, and general users.

3. **Conversational AI and Chatbots**:
   Ideal for building intelligent conversational agents that require contextual understanding and dynamic response generation.

4. **Multilingual Applications**:
   Supports global communication, translations, and multilingual content generation.

5. **Structured Data Processing**:
   Capable of analyzing and generating structured outputs, such as tables and JSON, useful for data science and automation.

6. **Long-Form Content Generation**:
   Can generate extended responses, including articles, reports, and guides, maintaining coherence over large text outputs.
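
When the model is asked for JSON, the reply often contains extra prose around the payload, so automation code usually needs a tolerant parser. The sketch below shows one possible approach using only the Python standard library; `extract_json` is a hypothetical helper name, and the sample response is invented for illustration rather than taken from real model output.

```python
import json
from typing import Any, Optional


def extract_json(response: str) -> Optional[Any]:
    """Return the first parseable JSON object or array in a response, else None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(response):
        if char in "{[":
            try:
                # raw_decode parses one JSON value starting at the given index
                # and ignores any text that follows it.
                value, _end = decoder.raw_decode(response, index)
                return value
            except json.JSONDecodeError:
                continue  # Not valid JSON from here; try the next candidate.
    return None


# Invented sample response, for illustration only.
sample = 'Here is the result:\n{"name": "Ada", "scores": [1, 2, 3]}\nHope that helps.'
parsed = extract_json(sample)
print(parsed)
```

Scanning for the first valid value keeps the helper simple; stricter pipelines may prefer to validate the parsed result against an expected schema.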
					
						
## **Limitations**
1. **Hardware Requirements**:
   Requires high-memory GPUs or TPUs due to its large parameter count and long-context support.

2. **Potential Bias in Responses**:
   While designed to be neutral, outputs may still reflect biases present in the training data.

3. **Inconsistent Outputs in Creative Tasks**:
   May produce variable results in storytelling and on highly subjective topics.

4. **Limited Real-World Awareness**:
   Has no knowledge of events after its training cutoff and no access to real-time information.

5. **Error Propagation in Extended Outputs**:
   Minor errors early in a response may affect overall coherence in long-form outputs.

6. **Prompt Sensitivity**:
   The quality of responses may depend on how well the input prompt is structured.
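
To put the hardware requirement in rough numbers, weight memory can be approximated as parameter count times bytes per parameter. The sketch below assumes a nominal 14 billion parameters (taken from the model name; the exact count may differ) and counts weights only, ignoring activations and the KV cache, which grow with context length. `estimate_weight_memory_gib` and `BYTES_PER_PARAM` are hypothetical names used for illustration.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}

# Nominal size assumed from the "14B" in the model name.
NOMINAL_PARAMS = 14e9


def estimate_weight_memory_gib(num_params: float, dtype: str) -> float:
    """Estimate memory for model weights alone, in GiB."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unsupported dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)


for dtype in BYTES_PER_PARAM:
    gib = estimate_weight_memory_gib(NOMINAL_PARAMS, dtype)
    print(f"{dtype}: {gib:.1f} GiB")
```

Actual usage will be higher than these figures, particularly with long prompts, so they are best read as a lower bound when choosing hardware.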