walidsobhie-code committed
Commit a9b0b85 · 1 Parent(s): faf9686
hf_space/Dockerfile ADDED
@@ -0,0 +1,26 @@
+ # HuggingFace Spaces Dockerfile for Stack 2.9
+ # Use this for free inference hosting on HF Spaces
+ # https://huggingface.co/docs/hub/spaces-sdks-docker
+
+ FROM python:3.11-slim
+
+ # Set environment
+ ENV PYTHONUNBUFFERED=1
+ ENV PORT=7860
+
+ # Install dependencies (the Gradio app in app.py needs gradio, torch, and transformers)
+ RUN pip install --no-cache-dir \
+     gradio \
+     torch \
+     transformers \
+     accelerate \
+     huggingface_hub
+
+ # Copy app
+ COPY app.py .
+
+ # Expose port
+ EXPOSE 7860
+
+ # Run app
+ CMD ["python", "app.py"]
hf_space/README.md ADDED
@@ -0,0 +1,27 @@
+ # Stack 2.9 - Fine-tuned Code Assistant
+
+ A 1.5B parameter code generation model, fine-tuned from Qwen2.5-Coder on Stack Overflow Q&A data.
+
+ **Model:** [my-ai-stack/Stack-2-9-finetuned](https://huggingface.co/my-ai-stack/Stack-2-9-finetuned)
+ **Live Demo:** [stack-2-9-demo](https://huggingface.co/spaces/my-ai-stack/stack-2-9-demo)
+
+ ## Quick Start
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
+ tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
+ ```
+
+ ## Hardware Requirements
+
+ | Config | GPU | VRAM |
+ |--------|-----|------|
+ | FP16 | RTX 3060+ | ~4GB |
+ | 8-bit | RTX 3060+ | ~2GB |
+ | 4-bit | Any modern GPU | ~1GB |
+
+ ## License
+
+ Apache 2.0
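Editor's note: to make the Hardware Requirements table concrete, here is a minimal sketch of the 4-bit row. It is not part of the commit and assumes `bitsandbytes` and `accelerate` are installed and that the fine-tuned model keeps the Qwen2.5 chat template.

```python
# Sketch only (not in the commit): 4-bit loading per the README's hardware table.
# Assumes: pip install transformers accelerate bitsandbytes, and a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo = "my-ai-stack/Stack-2-9-finetuned"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, quantization_config=bnb, device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```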
hf_space/app.py ADDED
@@ -0,0 +1,67 @@
+ # Stack 2.9 HuggingFace Space
+ # Fine-tuned code assistant powered by Qwen2.5-Coder-1.5B
+ import gradio as gr
+
+ app = gr.Blocks(title="Stack 2.9")
+
+ with app:
+     gr.Markdown("""
+     # 💻 Stack 2.9 - Code Assistant
+     Fine-tuned on Stack Overflow data · 1.5B parameters · Qwen2.5-Coder base
+
+     ---
+     """)
+
+     with gr.Row():
+         with gr.Column(scale=3):
+             chatbot = gr.Chatbot(label="Stack 2.9", height=500)
+             msg = gr.Textbox(
+                 label="Your message",
+                 placeholder="Ask me to write or explain code...",
+                 lines=3
+             )
+             with gr.Row():
+                 submit_btn = gr.Button("Send", variant="primary")
+                 clear_btn = gr.Button("Clear")
+
+         with gr.Column(scale=1):
+             gr.Markdown("### ⚙️ Settings")
+             temperature = gr.Slider(0.1, 1.5, 0.7, label="Temperature")
+             max_tokens = gr.Slider(64, 2048, 1024, step=64, label="Max tokens")
+             system_prompt = gr.Textbox(
+                 value="You are Stack 2.9, a helpful coding assistant.",
+                 label="System prompt",
+                 lines=2
+             )
+             gr.Markdown("### 📊 Model Info")
+             gr.Markdown("""
+             - **Base**: Qwen2.5-Coder-1.5B
+             - **Fine-tuned**: Stack Overflow Q&A
+             - **Context**: 32K tokens
+             - **License**: Apache 2.0
+             """)
+
+     def respond(message, history, system, temp, tokens):
+         import torch
+         from transformers import AutoModelForCausalLM, AutoTokenizer
+
+         model_name = "my-ai-stack/Stack-2-9-finetuned"
+         tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+         model = AutoModelForCausalLM.from_pretrained(
+             model_name,
+             torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
+             device_map="auto" if torch.cuda.is_available() else None,
+             trust_remote_code=True
+         )
+
+         messages = [{"role": "system", "content": system}, {"role": "user", "content": message}]
+         text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+         inputs = tokenizer([text], return_tensors="pt").to(model.device)
+         outputs = model.generate(**inputs, max_new_tokens=int(tokens), temperature=temp, do_sample=True, pad_token_id=tokenizer.pad_token_id)
+         response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+         # Return the updated tuple-style history so gr.Chatbot can render the exchange
+         return (history or []) + [(message, response)]
+
+     submit_btn.click(respond, inputs=[msg, chatbot, system_prompt, temperature, max_tokens], outputs=chatbot)
+     msg.submit(respond, inputs=[msg, chatbot, system_prompt, temperature, max_tokens], outputs=chatbot)
+     clear_btn.click(lambda: None, outputs=chatbot)
+
+ app.launch(server_name="0.0.0.0", server_port=7860)
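Editor's note on the Space above: `respond()` instantiates the tokenizer and model on every button click. A common alternative, sketched here under the same names (editorial, not part of the commit), is to load them once at import time and pay only for generation per request.

```python
# Sketch only (not in the commit): load the model once at startup and reuse it per turn.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "my-ai-stack/Stack-2-9-finetuned"
TOKENIZER = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
MODEL = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
    trust_remote_code=True,
)

def respond(message, history, system, temp, tokens):
    # Same tuple-style history that gr.Chatbot renders by default
    messages = [{"role": "system", "content": system}, {"role": "user", "content": message}]
    text = TOKENIZER.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = TOKENIZER(text, return_tensors="pt").to(MODEL.device)
    out = MODEL.generate(**inputs, max_new_tokens=int(tokens), temperature=temp, do_sample=True)
    reply = TOKENIZER.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return (history or []) + [(message, reply)]
```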
requirements_webui.txt ADDED
@@ -0,0 +1,2 @@
+ streamlit
+ requests
src/cli/agent.py CHANGED
@@ -31,6 +31,7 @@ class QueryIntent(Enum):
      TASK = "task"
      QUESTION = "question"
      GENERAL = "general"
+     GENERAL_HELP = "general_help"


  @dataclass
@@ -80,6 +81,7 @@ class QueryUnderstanding:
          QueryIntent.FILE_SEARCH: [
              r"find\s+(?:files?\s+)?(?:named\s+)?(.+)",
              r"search\s+for\s+(?:files?\s+)?(.+)",
+             r"grep\s+for\s+(.+)",
              r"where\s+is\s+(.+)",
              r"locate\s+(.+)",
          ],
@@ -88,24 +90,33 @@ class QueryUnderstanding:
              r"(commit|push|pull|branch)\s+(?:to\s+)?(?:the\s+)?(?:repo|repository)?",
          ],
          QueryIntent.CODE_EXECUTION: [
-             r"run\s+(?:the\s+)?(?:command\s+)?(.+)",
-             r"execute\s+(.+)",
-             r"start\s+(?:the\s+)?(?:server\s+)?(.+)",
-             r"test\s+(?:the\s+)?(.+)",
-             r"lint\s+(.+)",
-             r"format\s+(.+)",
+             r"^run\s+(?:the\s+)?(?:command\s+)?(.+)",
+             r"^execute\s+(.+)",
+             r"^start\s+(?:the\s+)?(?:server\s+)?(.+)",
+             r"^test\s+(?:the\s+)?(.+)",
+             r"^lint\s+(.+)",
+             r"^format\s+(.+)",
          ],
          QueryIntent.WEB_SEARCH: [
-             r"search\s+(?:the\s+)?web\s+for\s+(.+)",
-             r"google\s+(.+)",
-             r"look\s+up\s+(.+)",
-             r"find\s+information\s+about\s+(.+)",
+             r"^search\s+(?:the\s+)?web\s+for\s+(.+)",
+             r"^google\s+(.+)",
+             r"^look\s+up\s+(.+)",
+             r"^find\s+information\s+about\s+(.+)",
+             r"latest\s+ai\s+news",
+             r"what('s|\s+is)\s+new\s+in\s+ai",
          ],
          QueryIntent.MEMORY: [
              r"(remember|recall|what do you remember)\s+(.+)",
              r"(save|store)\s+(?:to\s+)?memory\s+(.+)",
              r"what('s| is)\s+in\s+(?:the\s+)?memory",
          ],
+         QueryIntent.GENERAL_HELP: [
+             r"list\s+(?:all\s+)?tools?",
+             r"what\s+tools\s+(?:do\s+you\s+have|can\s+you\s+do)",
+             r"help\s+me",
+             r"what\s+can\s+you\s+do",
+             r"how\s+to\s+use\s+you",
+         ],
          QueryIntent.TASK: [
              r"(create|add|new)\s+task\s+(.+)",
              r"list\s+(?:my\s+)?tasks?",
@@ -182,6 +193,7 @@ class ToolSelector:
          QueryIntent.WEB_SEARCH: ["web_search", "fetch"],
          QueryIntent.MEMORY: ["memory_recall", "memory_save", "memory_list"],
          QueryIntent.TASK: ["create_task", "list_tasks", "update_task"],
+         QueryIntent.GENERAL_HELP: [],
      }

      def select(self, intent: str, context: Dict[str, Any]) -> List[str]:
@@ -198,6 +210,7 @@ class ToolSelector:
              "memory": QueryIntent.MEMORY,
              "task": QueryIntent.TASK,
              "general": QueryIntent.GENERAL,
+             "general_help": QueryIntent.GENERAL_HELP,
          }

          tools = []
@@ -260,21 +273,73 @@ class ToolSelector:
                  r"search\s+(?:the\s+)?web\s+for\s+(.+)",
                  r"google\s+(.+)",
                  r"look\s+up\s+(.+)",
+                 r"latest\s+ai\s+news",
+                 r"what('s|\s+is)\s+new\s+in\s+ai",
              ]
              for pattern in patterns:
                  match = re.search(pattern, query, re.IGNORECASE)
                  if match:
-                     params["query"] = match.group(1).strip()
+                     # Only extract group(1) if it exists
+                     if match.groups():
+                         params["query"] = match.group(1).strip()
+                     else:
+                         # For patterns without capture groups, use full match
+                         params["query"] = match.group(0).strip()
                      break

+         elif tool_name in ("grep", "search"):
+             # Extract pattern to search for - capture full phrase
+             # Strategy: split by " in " and take second part as path
+             parts = query.split(' in ')
+             if len(parts) >= 2:
+                 # Last part is the path
+                 path_part = ' in '.join(parts[1:])
+                 # Clean up path
+                 if path_part.strip() in ['project', 'this project']:
+                     path_part = '/Users/walidsobhi/stack-2.9/src'
+                 elif path_part.strip() == 'src':
+                     path_part = '/Users/walidsobhi/stack-2.9/src'
+                 elif path_part.startswith('~') or path_part.startswith('/') or path_part.startswith('./'):
+                     pass  # Keep as-is
+                 else:
+                     path_part = '/Users/walidsobhi/stack-2.9/' + path_part.strip()
+                 params["path"] = path_part
+                 # First part is the pattern (remove grep/for/search)
+                 pattern_part = parts[0]
+                 for prefix in ['grep for', 'search for', 'find for', 'grep', 'search', 'find']:
+                     if pattern_part.strip().lower().startswith(prefix):
+                         pattern_part = pattern_part.strip()[len(prefix):].strip()
+                         break
+                 params["pattern"] = pattern_part
+             else:
+                 # Default path is workspace root
+                 params["path"] = "/Users/walidsobhi/stack-2.9/src"
+
          return params


  class ResponseGenerator:
      """Generates natural language responses."""

+     GREETING_VARIATIONS = [
+         "Sure! I can help with that.",
+         "Got it! Let me assist with that.",
+         "No problem! Here's what I found:",
+         "Alright! Here you go:",
+         "Sure thing! Let me show you:",
+     ]
+
+     HELP_RESPONSES = [
+         "I support these operations:",
+         "Here are some things I can do:",
+         "Here's my toolkit:",
+         "I can help with the following:",
+     ]
+
      def __init__(self):
          self.context_manager = create_context_manager()
+         self.last_intent = None
+         self.last_query = None

      def generate(
          self,
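Editor's note: what the new split-on-`" in "` extraction yields for a typical query, re-run standalone for illustration (not part of the commit). One limit of the heuristic is that a search pattern which itself contains `" in "` (e.g. `grep for for x in range`) gets cut at the first occurrence.

```python
# Illustration only (not in the repo): the " in "-split heuristic for grep/search params.
query = "grep for def main in src"
parts = query.split(" in ")                 # ["grep for def main", "src"]

pattern_part = parts[0]
for prefix in ["grep for", "search for", "find for", "grep", "search", "find"]:
    if pattern_part.strip().lower().startswith(prefix):
        pattern_part = pattern_part.strip()[len(prefix):].strip()
        break

path_part = " in ".join(parts[1:])
print({"pattern": pattern_part, "path": path_part})
# {'pattern': 'def main', 'path': 'src'} -- the agent then maps 'src' to /Users/walidsobhi/stack-2.9/src
```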
@@ -283,14 +348,47 @@ class ResponseGenerator:
          context: Dict[str, Any]
      ) -> str:
          """Generate response from tool results."""
+         import random
+
+         # Track intent for conversation flow
+         previous_intent = self.last_intent
+         self.last_intent = intent
+
          if not tool_results:
-             return "I couldn't find any results for your query."
+             # Handle queries that don't need tools
+             if intent == "question":
+                 return ("I can help with reading/writing files, running commands, "
+                         "git operations, web search, and more. "
+                         "Try asking me something like 'read the file README.md' "
+                         "or 'check git status'.")
+             elif intent == "general_help":
+                 greeting = random.choice(self.HELP_RESPONSES)
+                 return (f"{greeting}\n"
+                         "- Read/write/edit files\n"
+                         "- Run commands and code\n"
+                         "- Git operations (status, commit, push, pull)\n"
+                         "- Code search with grep\n"
+                         "- Web search\n"
+                         "- Manage tasks\n\n"
+                         "Examples:\n"
+                         "- 'read the file /path/to/file'\n"
+                         "- 'check git status'\n"
+                         "- 'grep for def main in ~/project/src'\n"
+                         "- 'run the command ls'\n"
+                         "- 'what is 2 + 2'")
+             elif intent == "general":
+                 # Don't repeat the same greeting
+                 if previous_intent == "general":
+                     return "What would you like me to help you with?"
+                 return "What can I help you with?"
+             return None

          responses = []
+         greeting = random.choice(self.GREETING_VARIATIONS) if tool_results else None

          for call in tool_results:
              if call.result is None:
-                 responses.append(f"I tried to use {call.tool_name} but got no result.")
+                 responses.append(f"Hmm, {call.tool_name} didn't return anything.")
                  continue

              if call.result.get("success"):
@@ -304,6 +402,16 @@ class ResponseGenerator:
                          content = content[:500] + "..."
                      responses.append(f"Here's the content:\n```\n{content}\n```")

+                 elif call.tool_name == "search":
+                     # Skip search tool if it has no matches - grep will show results
+                     if "matches" in result and result["matches"]:
+                         matches = result["matches"]
+                         resp = f"Found {len(matches)} matches:\n"
+                         for m in matches[:10]:
+                             resp += f"- {m.get('file', '?')}:{m.get('line', '?')} - {m.get('content', '')}\n"
+                         responses.append(resp)
+                     # else: skip - grep will show results
+
                  elif call.tool_name == "grep":
                      if "matches" in result:
                          matches = result["matches"]
@@ -313,7 +421,7 @@ class ResponseGenerator:
                              resp += f"- {m.get('file', '?')}:{m.get('line', '?')} - {m.get('content', '')}\n"
                          responses.append(resp)
                      else:
-                         responses.append("No matches found.")
+                         responses.append("Didn't find any matches for that.")

                  elif call.tool_name in ["git_status", "git_log"]:
                      if "files" in result:
src/cli/main.py CHANGED
@@ -107,9 +107,6 @@ class Stack29CLI:
                  response = self.agent.process(user_input)
                  print(response.content)

-                 if response.tool_calls:
-                     print(f"\n{self.YELLOW}[Tools called: {', '.join(tc.tool_name for tc in response.tool_calls)}]{self.END}")
-
              except Exception as e:
                  print(f"{self.RED}Error: {e}{self.END}")

src/cli/tools.py CHANGED
@@ -92,7 +92,8 @@ def tool_search_files(
  ) -> Dict[str, Any]:
      """Recursively search for files matching a pattern."""
      try:
-         base_path = Path(path)
+         # Expand ~ to home directory
+         base_path = Path(os.path.expanduser(path))
          if not base_path.exists():
              return {"success": False, "error": f"Path not found: {path}"}

@@ -121,7 +122,8 @@ def tool_search_files(
  def tool_grep(path: str, pattern: str, context: int = 0) -> Dict[str, Any]:
      """Search for pattern in file(s)."""
      try:
-         base_path = Path(path)
+         # Expand ~ to home directory
+         base_path = Path(os.path.expanduser(path))
          results = []

          if base_path.is_file():
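Editor's note on why both tools now call `os.path.expanduser`: `pathlib` never expands `~` on its own, so a query like `grep for X in ~/project` previously produced a path that does not exist. A small illustration, not part of the commit:

```python
# Illustration only (not in the repo): Path() keeps "~" literal unless it is expanded first.
import os
from pathlib import Path

raw = "~/stack-2.9/src"
print(Path(raw))                        # '~/stack-2.9/src' -> Path(raw).exists() is False
print(Path(os.path.expanduser(raw)))    # resolves under the user's home directory
# Path(raw).expanduser() is the pathlib-native equivalent of os.path.expanduser.
```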
web_ui.py CHANGED
@@ -3,8 +3,9 @@ Stack 2.9 - Web UI Chat
  Simple web interface using Streamlit
  """
  import streamlit as st
- from typing import List, Dict
  import os
+ import requests
+ import json

  # Configure page
  st.set_page_config(
@@ -13,10 +14,6 @@ st.set_page_config(
      layout="wide"
  )

- # Model configuration
- MODEL_NAME = os.environ.get("MODEL_NAME", "minimax-m2.5:cloud")
- PROVIDER = os.environ.get("MODEL_PROVIDER", "ollama")
-
  # Title
  st.title("💻 Stack 2.9")
  st.caption("AI Coding Assistant")
@@ -27,7 +24,7 @@ with st.sidebar:

      model = st.selectbox(
          "Model",
-         ["minimax-m2.5:cloud", "qwen2.5-coder:1.5b", "llama3"],
+         ["minimax-m2.5:cloud", "qwen2.5-coder:1.5b"],
          index=0
      )

@@ -64,11 +61,10 @@ if prompt := st.chat_input("Type your message..."):
      with st.chat_message("assistant"):
          with st.spinner("Thinking..."):
              try:
-                 import requests
-
-                 # Call Ollama API
+                 import json
+                 # Use local Ollama - your minimax is registered there
                  response = requests.post(
-                     f"http://localhost:11434/api/chat",
+                     "http://localhost:11434/api/chat",
                      json={
                          "model": model,
                          "messages": [
@@ -78,17 +74,32 @@ if prompt := st.chat_input("Type your message..."):
                          "temperature": temperature,
                          "max_tokens": max_tokens
                      },
-                     timeout=120
+                     timeout=120,
+                     stream=False
                  )

                  if response.status_code == 200:
-                     result = response.json()
-                     assistant_msg = result["message"]["content"]
+                     text = response.text.strip()
+                     # Try to parse each line until we get content
+                     assistant_msg = ""
+                     for line in text.split('\n'):
+                         if line.strip():
+                             try:
+                                 result = json.loads(line)
+                                 content = result.get("message", {}).get("content", "")
+                                 if content:
+                                     assistant_msg = content
+                                     break
+                             except:
+                                 continue
+
+                     if not assistant_msg:
+                         assistant_msg = text
                  else:
-                     assistant_msg = f"Error: {response.status_code}"
+                     assistant_msg = f"Error: {response.status_code}\n{response.text[:200]}"

              except Exception as e:
-                 assistant_msg = f"Error: {str(e)}"
+                 assistant_msg = f"Connection Error: {str(e)}\n\nMake sure Ollama is running with: ollama serve"

              st.markdown(assistant_msg)
              st.session_state.messages.append({"role": "assistant", "content": assistant_msg})
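Editor's note on the Ollama call above: `requests.post(..., stream=False)` only controls how the HTTP body is downloaded on the client, so `/api/chat` still returns newline-delimited JSON chunks, which is why the new code parses line by line. Asking Ollama itself not to stream, via `"stream": False` in the payload, returns a single JSON object; Ollama also reads sampling settings from an `options` object rather than top-level `temperature`/`max_tokens`. A minimal sketch, editorial and not part of the commit:

```python
# Sketch only (not in the commit): non-streaming Ollama chat call.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5-coder:1.5b",
        "messages": [{"role": "user", "content": "Say hello"}],
        "stream": False,                          # one JSON object instead of NDJSON chunks
        "options": {"temperature": 0.7, "num_predict": 512},
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```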