---
title: Sema Translation API
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: Enterprise-grade translation API with 200+ language support
---

# Sema Translation API 🌍

Enterprise-grade translation API supporting 200+ languages with automatic language detection, rate limiting, usage tracking, and comprehensive monitoring. Built with FastAPI and powered by the consolidated `sematech/sema-utils` model repository.

## 🚀 Features

### Core Translation
- **Automatic Language Detection**: Detects source language automatically if not provided
- **200+ Language Support**: Supports all FLORES-200 language codes
- **High-Performance Translation**: Uses CTranslate2 for optimized inference
- **Character Count Tracking**: Monitors usage for billing and analytics

### Enterprise Features
- **Rate Limiting**: 60 requests/minute, 1000 requests/hour per IP
- **Request Tracking**: Unique request IDs for debugging and monitoring
- **Usage Analytics**: Comprehensive metrics with Prometheus integration
- **Structured Logging**: JSON-formatted logs for easy parsing
- **Health Monitoring**: Detailed health checks for system monitoring
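
The per-IP limits listed above (60 requests/minute and 1000 requests/hour) could be enforced with a sliding-window counter. The README does not say how the service implements rate limiting, so the following is only a minimal standard-library sketch; the class name `SlidingWindowLimiter` and its `allow` method are hypothetical and not part of this API.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-client limiter enforcing several (limit, window) rules."""

    def __init__(self, rules=((60, 60.0), (1000, 3600.0)), clock=time.monotonic):
        # Each rule is (max_requests, window_in_seconds).
        self.rules = tuple(rules)
        self.clock = clock
        self.longest = max(window for _, window in self.rules)
        # Client key -> timestamps of accepted requests, oldest first.
        self.hits = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self.hits[client_ip]
        # Forget requests that fall outside the longest window.
        while hits and now - hits[0] >= self.longest:
            hits.popleft()
        # Reject if any rule is already at its limit.
        for limit, window in self.rules:
            recent = sum(1 for stamp in hits if now - stamp < window)
            if recent >= limit:
                return False
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps retrying does not extend its own lockout. State lives in process memory here; a multi-worker deployment would need a shared store instead.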

### Security & Reliability
- **Input Validation**: Comprehensive request validation with Pydantic
- **Error Handling**: Graceful error handling with detailed error responses
- **CORS Support**: Configurable cross-origin resource sharing
- **Future-Ready Auth**: Designed for Supabase authentication integration

### API Quality
- **OpenAPI Documentation**: Auto-generated Swagger UI and ReDoc
- **Type Safety**: Full TypeScript-compatible API schemas
- **Production Ready**: Follows FastAPI production best practices

## πŸ“ Project Structure

```
app/
├── __init__.py
├── main.py                     # Application entry point
├── api/                        # API route definitions
│   ├── __init__.py
│   └── v1/                     # Versioned API routes
│       ├── __init__.py
│       └── endpoints.py        # Route handlers
├── core/                       # Core configuration
│   ├── __init__.py
│   ├── config.py               # Settings and configuration
│   ├── logging.py              # Logging configuration
│   └── metrics.py              # Prometheus metrics
├── middleware/                 # Custom middleware
│   ├── __init__.py
│   └── request_middleware.py   # Request tracking middleware
├── models/                     # Data models
│   ├── __init__.py
│   └── schemas.py              # Pydantic models
├── services/                   # Business logic
│   ├── __init__.py
│   └── translation.py          # Translation service
└── utils/                      # Utility functions
    ├── __init__.py
    └── helpers.py              # Helper functions
```

## 🔗 API Endpoints

### Health & Monitoring
- **`GET /`** - Interactive Swagger UI documentation
- **`GET /status`** - Basic health check
- **`GET /health`** - Detailed health monitoring
- **`GET /metrics`** - Prometheus metrics
- **`GET /redoc`** - ReDoc documentation

### Translation
- **`POST /translate`** - Main translation endpoint
- **`POST /api/v1/translate`** - Versioned translation endpoint

### Request/Response Examples

**Translation Request** (`source_language` is optional and is auto-detected when omitted):
```json
{
  "text": "Habari ya asubuhi",
  "target_language": "eng_Latn",
  "source_language": "swh_Latn"
}
```

**Translation Response:**
```json
{
  "translated_text": "Good morning",
  "source_language": "swh_Latn",
  "target_language": "eng_Latn",
  "inference_time": 0.234,
  "character_count": 17,
  "timestamp": "Friday | 2024-06-21 | 14:30:25",
  "request_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
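
On the client side, the response fields shown above can be read into a typed structure. This is a sketch only: `TranslationResult`, `parse_timestamp`, and `parse_translation_response` are hypothetical names, and the timestamp layout (`Weekday | YYYY-MM-DD | HH:MM:SS`) is inferred from the single example above rather than from a documented format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TranslationResult:
    """Hypothetical client-side view of the translation response."""

    translated_text: str
    source_language: str
    target_language: str
    inference_time: float
    character_count: int
    timestamp: datetime
    request_id: str


def parse_timestamp(raw: str) -> datetime:
    # Assumed layout: "Weekday | YYYY-MM-DD | HH:MM:SS".
    # The weekday is redundant with the date, so it is ignored.
    _, date_part, time_part = (part.strip() for part in raw.split("|"))
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")


def parse_translation_response(payload: dict) -> TranslationResult:
    # A missing key raises KeyError, which surfaces unexpected responses early.
    return TranslationResult(
        translated_text=payload["translated_text"],
        source_language=payload["source_language"],
        target_language=payload["target_language"],
        inference_time=float(payload["inference_time"]),
        character_count=int(payload["character_count"]),
        timestamp=parse_timestamp(payload["timestamp"]),
        request_id=payload["request_id"],
    )
```

The parsed `timestamp` is a naive `datetime`, because the example response carries no time zone information.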

## Language Codes

This API uses FLORES-200 language codes. Some common examples:

- `eng_Latn` - English
- `swh_Latn` - Swahili
- `kik_Latn` - Kikuyu
- `luo_Latn` - Luo
- `fra_Latn` - French
- `spa_Latn` - Spanish
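
Each code above pairs a three-letter ISO 639-3 language code with a four-letter ISO 15924 script code, joined by an underscore. A client can catch obvious typos before sending a request with a simple shape check. The helper names below are hypothetical, and the check only validates the format; it does not confirm that the model supports a given language.

```python
import re

# Three lowercase letters, an underscore, then a capitalised four-letter script code.
FLORES_CODE = re.compile(r"[a-z]{3}_[A-Z][a-z]{3}")


def is_flores_code_shape(code: str) -> bool:
    """Return True if the string has the shape of a FLORES-200 code."""
    return FLORES_CODE.fullmatch(code) is not None


def split_flores_code(code: str) -> tuple[str, str]:
    """Split a well-formed code into (language, script)."""
    if not is_flores_code_shape(code):
        raise ValueError(f"not a FLORES-200 style code: {code!r}")
    language, script = code.split("_")
    return language, script
```

For example, `split_flores_code("swh_Latn")` yields the language part `swh` and the script part `Latn`, while `eng_latn` or `en` is rejected.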

## Usage Examples

### Python
```python
import requests

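# Replace the placeholder host with your Space's URL.
response = requests.post(
    "https://your-space-url/translate",
    json={
        "text": "Habari ya asubuhi",
        "target_language": "eng_Latn",  # source_language omitted: auto-detected
    },
    timeout=30,
)
response.raise_for_status()  # raise on HTTP error responses

print(response.json())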
```

### cURL
```bash
curl -X POST "https://your-space-url/translate" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Wĩ mwega?",
    "source_language": "kik_Latn",
    "target_language": "eng_Latn"
  }'
```

## Model Information

This API uses models from the consolidated `sematech/sema-utils` repository:

- **Translation Model**: `sematrans-3.3B` (CTranslate2 optimized)
- **Language Detection**: `lid218e.bin` (FastText)
- **Tokenization**: `spm.model` (SentencePiece)
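
The three components above typically fit together as detect, tokenize, translate. The sketch below shows only the glue between them, and it rests on assumptions the README does not confirm: that the fastText model returns labels of the form `__label__swh_Latn`, and that the translation model follows the NLLB convention of a source-language token in front of the input and a target-language token as the decoding prefix. All function names are hypothetical; the real service code may differ.

```python
LABEL_PREFIX = "__label__"


def label_to_flores(label: str) -> str:
    """Turn a fastText label such as '__label__swh_Latn' into 'swh_Latn'."""
    if label.startswith(LABEL_PREFIX):
        return label[len(LABEL_PREFIX):]
    return label


def build_source_tokens(pieces: list[str], source_language: str) -> list[str]:
    """Wrap SentencePiece pieces as assumed for an NLLB-style model.

    `pieces` would come from SentencePieceProcessor.encode_as_pieces(text).
    """
    return [source_language, *pieces, "</s>"]


def build_target_prefix(target_language: str) -> list[str]:
    """Decoding prefix, e.g. for ctranslate2.Translator.translate_batch(target_prefix=...)."""
    return [target_language]


def strip_target_prefix(hypothesis: list[str], target_language: str) -> list[str]:
    """Drop the leading language token before detokenizing the output pieces."""
    if hypothesis and hypothesis[0] == target_language:
        return hypothesis[1:]
    return hypothesis
```

Keeping these steps as pure functions makes them easy to test without loading the models themselves.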

## API Documentation

Once the Space is running, visit `/docs` for interactive API documentation.

---

Created by Lewis Kamau Kimaru | Sema AI