deenasun committed · Commit 1306721 · 1 Parent(s): 26288e8

update video_gen and Cloudflare upload to use avc1 codec

README.md CHANGED
@@ -14,7 +14,7 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
14
 
15
  # AI-SL API
16
 
17
- Convert text documents to American Sign Language (ASL) videos using AI.
18
 
19
  ## Features
20
 
@@ -22,15 +22,13 @@ Convert text documents to American Sign Language (ASL) videos using AI.
22
  The app accepts both text input and file uploads with flexible options:
23
 
24
  - **Text Input**: Type or paste text directly into the interface (always available)
25
- - **File Upload**: Upload documents (PDF, TXT, DOCX, EPUB) - **optional, can be enabled/disabled**
26
- - **Smart Priority**: Text input takes priority if both are provided
27
- - **Toggle Control**: Checkbox to enable/disable file upload functionality
28
 
29
  ### Video Output Options
30
 
31
  The Gradio interface provides multiple ways for users to receive and download the generated ASL videos:
32
 
33
- #### 1. R2 Cloud Storage (Recommended)
34
  - Videos are automatically uploaded to Cloudflare R2 storage
35
  - Returns a public URL that users can download directly
36
  - Videos persist and can be shared via URL
@@ -57,8 +55,8 @@ result = client.predict(
57
  api_name="/predict"
58
  )
59
 
60
- # The result contains: (json_data, video_output, download_html)
61
- json_data, video_url, download_html = result
62
 
63
  # Download the video
64
  import requests
@@ -67,11 +65,6 @@ with open("asl_video.mp4", "wb") as f:
67
  f.write(response.content)
68
  ```
69
 
70
- #### 4. Direct Download from Interface
71
- - The interface includes a styled download button
72
- - Users can right-click and "Save As" if automatic download doesn't work
73
- - Video files are named `asl_video.mp4` by default
74
-
75
  ## Example Usage
76
 
77
  ### Web Interface
@@ -79,7 +72,7 @@ with open("asl_video.mp4", "wb") as f:
79
  2. Choose input method:
80
  - **Text**: Type or paste text in the text box (always available)
81
  - **File**: Check "Enable file upload" and upload a document (optional)
82
- 3. Click "Generate ASL Video"
83
  4. Download the resulting video
84
 
85
  ### Programmatic Access with Optional File Upload
@@ -88,37 +81,32 @@ with open("asl_video.mp4", "wb") as f:
88
  from gradio_client import Client
89
 
90
  # Connect to your hosted app
91
- client = Client("https://huggingface.co/spaces/your-username/your-space")
 
92
 
93
  # Text input only (file upload disabled)
94
  result = client.predict(
95
- "Hello world! This is a test.", # Text input
96
- False, # Enable file upload (False = disabled)
97
- None, # File input (None since disabled)
98
- True, # Use R2 storage
99
  api_name="/predict"
100
  )
101
 
102
  # File input only (file upload enabled)
103
  result = client.predict(
104
- "", # Text input (empty)
105
- True, # Enable file upload (True = enabled)
106
- "document.pdf", # File input
107
- True, # Use R2 storage
108
  api_name="/predict"
109
  )
110
 
111
  # Both inputs (text takes priority)
112
  result = client.predict(
113
  "Quick text", # Text input
114
- True, # Enable file upload (True = enabled)
115
  "document.pdf", # File input
116
- True, # Use R2 storage
117
  api_name="/predict"
118
  )
119
  ```
120
 
121
- See `example_usage.py`, `example_usage_dual_input.py`, and `example_optional_file_upload.py` for complete examples of how to:
122
  - Download videos from URLs
123
  - Process base64 video data
124
  - Use the interface programmatically
@@ -147,13 +135,3 @@ Once you have the video file, you can:
147
  - Convert to different formats
148
  - Extract frames for further processing
149
  - Add subtitles or overlays
150
-
151
- ## Deployment to Hugging Face Spaces
152
-
153
- 1. Create a new Space on Hugging Face
154
- 2. Choose Gradio as the SDK
155
- 3. Upload your code files
156
- 4. Set environment variables in Space settings
157
- 5. Deploy and share your Space URL
158
-
159
- Your app will be accessible to users worldwide with flexible input options!
 
14
 
15
  # AI-SL API
16
 
17
+ Convert natural language English into American Sign Language (ASL) videos using AI.
18
 
19
  ## Features
20
 
 
22
  The app accepts both text input and file uploads with flexible options:
23
 
24
  - **Text Input**: Type or paste text directly into the interface (always available)
25
+ - **File Upload**: Upload documents (PDF, TXT, DOCX, EPUB)
 
 
26
 
27
  ### Video Output Options
28
 
29
  The Gradio interface provides multiple ways for users to receive and download the generated ASL videos:
30
 
31
+ #### 1. R2 Cloud Storage
32
  - Videos are automatically uploaded to Cloudflare R2 storage
33
  - Returns a public URL that users can download directly
34
  - Videos persist and can be shared via URL
 
55
  api_name="/predict"
56
  )
57
 
58
+ # The result contains: (json_data, video_output)
59
+ json_data, video_url = result
60
 
61
  # Download the video
62
  import requests
 
65
  f.write(response.content)
66
  ```
67
 
 
 
 
 
 
68
  ## Example Usage
69
 
70
  ### Web Interface
 
72
  2. Choose input method:
73
  - **Text**: Type or paste text in the text box (always available)
74
  - **File**: Check "Enable file upload" and upload a document (optional)
75
+ 3. Click "Submit"
76
  4. Download the resulting video
77
 
78
  ### Programmatic Access with Optional File Upload
 
81
  from gradio_client import Client
82
 
83
  # Connect to your hosted app
84
+ from gradio_client import Client, handle_file
85
+ client = Client("deenasun/ai-sl-api")
86
 
87
  # Text input only (file upload disabled)
88
  result = client.predict(
89
+ text="Hello world! This is a test.", # Text input
90
+ file=None, # File input (None since disabled)
 
 
91
  api_name="/predict"
92
  )
93
 
94
  # File input only (file upload enabled)
95
  result = client.predict(
96
+ text="", # Text input (empty)
97
+ file=handle_file("document.pdf"), # File input
 
 
98
  api_name="/predict"
99
  )
100
 
101
  # Both inputs (text takes priority)
102
  result = client.predict(
103
  "Quick text", # Text input
 
104
  "document.pdf", # File input
 
105
  api_name="/predict"
106
  )
107
  ```
108
 
109
+ See `example_usage.py` and `example_usage_dual_input.py` for complete examples of how to:
110
  - Download videos from URLs
111
  - Process base64 video data
112
  - Use the interface programmatically
 
135
  - Convert to different formats
136
  - Extract frames for further processing
137
  - Add subtitles or overlays
 
__pycache__/app.cpython-311.pyc CHANGED
Binary files a/__pycache__/app.cpython-311.pyc and b/__pycache__/app.cpython-311.pyc differ
 
__pycache__/video_gen.cpython-311.pyc CHANGED
Binary files a/__pycache__/video_gen.cpython-311.pyc and b/__pycache__/video_gen.cpython-311.pyc differ
 
app.py CHANGED
@@ -70,11 +70,41 @@ def clean_gloss_token(token):
70
  return cleaned if cleaned else None
71
 
72
 
73
  def upload_video_to_r2(video_path, bucket_name="asl-videos"):
74
  """
75
  Upload a video file to R2 and return a public URL
76
  """
77
  try:
 
 
 
 
78
  # Generate a unique filename
79
  file_extension = os.path.splitext(video_path)[1]
80
  unique_filename = f"{uuid.uuid4()}{file_extension}"
@@ -87,22 +117,26 @@ def upload_video_to_r2(video_path, bucket_name="asl-videos"):
87
  unique_filename,
88
  ExtraArgs={
89
  'ACL': 'public-read',
90
- 'ContentType': 'video/mp4',
91
  'CacheControl': 'max-age=86400', # Cache for 24 hours
92
  'ContentDisposition': 'inline' # Force inline display
93
  })
94
 
95
  # Replace the endpoint with the domain for uploading
96
- public_domain = (R2_ENDPOINT.replace('https://', '')
97
- .split('.')[0])
98
- video_url = (f"https://{public_domain}.r2.cloudflarestorage.com/"
99
- f"{bucket_name}/{unique_filename}")
 
100
 
101
- print(f"Video uploaded to R2: {video_url}")
102
- public_video_url = f"{R2_ASL_VIDEOS_URL}/{unique_filename}"
103
- print(f"Public video url: {public_video_url}")
104
 
105
- return public_video_url
 
 
 
106
 
107
  except Exception as e:
108
  print(f"Error uploading video to R2: {e}")
@@ -171,6 +205,16 @@ def determine_input_type(input_data):
171
  # Check if it's a file path (contains file extension)
172
  if any(ext in input_data.lower() for ext in ['.pdf', '.txt', '.docx', '.doc', '.epub']):
173
  return 'file_path', input_data
174
  else:
175
  return 'text', input_data.strip()
176
  elif isinstance(input_data, dict) and 'path' in input_data:
@@ -312,7 +356,7 @@ def predict_unified(input_data):
312
  "message": "Please provide text or upload a document"
313
  }, None
314
 
315
- print("Input", input_data)
316
  # Use the unified processing function
317
  result = parse_vectorize_and_search_unified_sync(input_data)
318
 
 
70
  return cleaned if cleaned else None
71
 
72
 
+ def verify_video_format(video_path):
+     """
+     Verify that a video file is in a browser-compatible format (H.264 MP4)
+     """
+     try:
+         import cv2
+         cap = cv2.VideoCapture(video_path)
+         if not cap.isOpened():
+             return False, "Could not open video file"
+
+         # Get video properties
+         fourcc = int(cap.get(cv2.CAP_PROP_FOURCC))
+         codec = "".join([chr((fourcc >> 8 * i) & 0xFF) for i in range(4)])
+
+         cap.release()
+
+         # Check if it's H.264
+         if codec in ['avc1', 'H264', 'h264']:
+             return True, f"Video is H.264 encoded ({codec})"
+         else:
+             return False, f"Video codec {codec} may not be browser compatible"
+
+     except Exception as e:
+         return False, f"Error checking video format: {e}"
+
+
99
  def upload_video_to_r2(video_path, bucket_name="asl-videos"):
100
  """
101
  Upload a video file to R2 and return a public URL
102
  """
103
  try:
104
+ # Verify video format for browser compatibility
105
+ is_compatible, message = verify_video_format(video_path)
106
+ print(f"Video format check: {message}")
107
+
108
  # Generate a unique filename
109
  file_extension = os.path.splitext(video_path)[1]
110
  unique_filename = f"{uuid.uuid4()}{file_extension}"
 
117
  unique_filename,
118
  ExtraArgs={
119
  'ACL': 'public-read',
120
+ 'ContentType': 'video/mp4; codecs="avc1.42E01E"', # H.264
121
  'CacheControl': 'max-age=86400', # Cache for 24 hours
122
  'ContentDisposition': 'inline' # Force inline display
123
  })
124
 
125
  # Replace the endpoint with the domain for uploading
126
+ if R2_ENDPOINT:
127
+ public_domain = (R2_ENDPOINT.replace('https://', '')
128
+ .split('.')[0])
129
+ video_url = (f"https://{public_domain}.r2.cloudflarestorage.com/"
130
+ f"{bucket_name}/{unique_filename}")
131
 
132
+ print(f"Video uploaded to R2: {video_url}")
133
+ public_video_url = f"{R2_ASL_VIDEOS_URL}/{unique_filename}"
134
+ print(f"Public video url: {public_video_url}")
135
 
136
+ return public_video_url
137
+ else:
138
+ print("R2_ENDPOINT is not configured")
139
+ return None
140
 
141
  except Exception as e:
142
  print(f"Error uploading video to R2: {e}")
 
205
  # Check if it's a file path (contains file extension)
206
  if any(ext in input_data.lower() for ext in ['.pdf', '.txt', '.docx', '.doc', '.epub']):
207
  return 'file_path', input_data
208
+ # Check if it's a string representation of a gradio.FileData dict
209
+ elif input_data.startswith('{') and 'gradio.FileData' in input_data:
210
+ try:
211
+ import ast
212
+ # Safely evaluate the string as a dictionary
213
+ file_data = ast.literal_eval(input_data)
214
+ if isinstance(file_data, dict) and 'path' in file_data:
215
+ return 'file_path', file_data['path']
216
+ except (ValueError, SyntaxError):
217
+ pass
218
  else:
219
  return 'text', input_data.strip()
220
  elif isinstance(input_data, dict) and 'path' in input_data:
 
356
  "message": "Please provide text or upload a document"
357
  }, None
358
 
359
+ print("Input", input_data, type(input_data))
360
  # Use the unified processing function
361
  result = parse_vectorize_and_search_unified_sync(input_data)
362
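The `determine_input_type` hunk above handles the case where a `gradio.FileData` dict arrives as a string and is parsed with `ast.literal_eval`. The following is a minimal standalone sketch of that idea, not the code from this commit: the helper name `extract_file_path` and the sample payload shape are illustrative assumptions. In the hunk as shown, a failed parse ends in `pass` without a visible `return` in that branch; the sketch instead returns `None` explicitly so a caller can fall back to treating the input as plain text.

```python
import ast
from typing import Optional


def extract_file_path(raw: str) -> Optional[str]:
    """Return the 'path' entry of a stringified FileData-style dict, else None.

    Hypothetical helper for illustration; it is not part of app.py.
    """
    text = raw.strip()
    if not (text.startswith("{") and "gradio.FileData" in text):
        return None
    try:
        # literal_eval only accepts Python literals, so no code is executed.
        data = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
    if isinstance(data, dict) and isinstance(data.get("path"), str):
        return data["path"]
    return None


# Assumed payload shape, for demonstration only.
sample = "{'path': '/tmp/gradio/document.pdf', 'meta': {'_type': 'gradio.FileData'}}"
print(extract_file_path(sample))
print(extract_file_path("Hello world"))
```

A caller could then use the returned path when it is not `None` and otherwise keep the stripped string as text input.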
 
test_h264_encoding.py ADDED
@@ -0,0 +1,152 @@
+ #!/usr/bin/env python3
+ """
+ Test script to verify H.264 video encoding for browser compatibility
+ """
+
+ import cv2
+ import numpy as np
+ import tempfile
+ import os
+ from video_gen import get_video_writer
+ from app import verify_video_format
+
+
+ def test_h264_encoding():
+     """Test H.264 video encoding and verify browser compatibility"""
+
+     print("Testing H.264 video encoding for browser compatibility...")
+
+     # Create a simple test video
+     temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.mp4')
+     temp_path = temp_file.name
+     temp_file.close()
+
+     try:
+         # Create a simple test video with colored rectangles
+         out = get_video_writer(temp_path, fps=30.0, width=640, height=480)
+
+         if not out.isOpened():
+             print("ERROR: Could not create video writer")
+             return False
+
+         # Create 3 seconds of video (90 frames at 30 fps)
+         for frame_num in range(90):
+             # Create a frame with changing colors
+             frame = np.zeros((480, 640, 3), dtype=np.uint8)
+
+             # Create a moving colored rectangle
+             color = (
+                 int(255 * (frame_num % 30) / 30),  # Red
+                 int(255 * ((frame_num + 10) % 30) / 30),  # Green
+                 int(255 * ((frame_num + 20) % 30) / 30)  # Blue
+             )
+
+             # Draw a rectangle that moves across the screen
+             x = int((frame_num % 60) * 640 / 60)
+             cv2.rectangle(frame, (x, 200), (x + 100, 300), color, -1)
+
+             # Add text
+             cv2.putText(frame, f"Frame {frame_num}", (50, 50),
+                         cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
+
+             out.write(frame)
+
+         out.release()
+
+         # Verify the video format
+         is_compatible, message = verify_video_format(temp_path)
+         print(f"Video format verification: {message}")
+
+         if is_compatible:
+             print("✅ SUCCESS: Video is H.264 encoded and browser compatible!")
+
+             # Get video info
+             cap = cv2.VideoCapture(temp_path)
+             fps = cap.get(cv2.CAP_PROP_FPS)
+             frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+             width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+             height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+             cap.release()
+
+             print(f"Video properties:")
+             print(f" - Resolution: {width}x{height}")
+             print(f" - FPS: {fps}")
+             print(f" - Frame count: {frame_count}")
+             print(f" - Duration: {frame_count/fps:.2f} seconds")
+             print(f" - File size: {os.path.getsize(temp_path)} bytes")
+
+             return True
+         else:
+             print("❌ FAILED: Video is not browser compatible")
+             return False
+
+     except Exception as e:
+         print(f"❌ ERROR: {e}")
+         return False
+     finally:
+         # Clean up
+         if os.path.exists(temp_path):
+             os.unlink(temp_path)
+             print(f"Cleaned up test file: {temp_path}")
+
+
+ def test_codec_availability():
+     """Test which video codecs are available"""
+
+     print("\nTesting available video codecs...")
+
+     codecs_to_test = [
+         ('avc1', 'H.264 (best for browsers)'),
+         ('mp4v', 'MPEG-4'),
+         ('XVID', 'XVID'),
+         ('MJPG', 'Motion JPEG'),
+         ('H264', 'H.264 alternative')
+     ]
+
+     temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.mp4')
+     temp_path = temp_file.name
+     temp_file.close()
+
+     available_codecs = []
+
+     for codec_name, description in codecs_to_test:
+         try:
+             fourcc = cv2.VideoWriter_fourcc(*codec_name)
+             out = cv2.VideoWriter(temp_path, fourcc, 30.0, (640, 480))
+
+             if out.isOpened():
+                 available_codecs.append((codec_name, description))
+                 print(f"✅ {codec_name}: {description}")
+                 out.release()
+             else:
+                 print(f"❌ {codec_name}: {description} (not working)")
+                 out.release()
+
+         except Exception as e:
+             print(f"❌ {codec_name}: {description} (error: {e})")
+
+     # Clean up
+     if os.path.exists(temp_path):
+         os.unlink(temp_path)
+
+     print(f"\nAvailable codecs: {len(available_codecs)}")
+     return available_codecs
+
+
+ if __name__ == "__main__":
+     print("=" * 60)
+     print("H.264 Video Encoding Test for Browser Compatibility")
+     print("=" * 60)
+
+     # Test available codecs
+     available_codecs = test_codec_availability()
+
+     # Test H.264 encoding
+     success = test_h264_encoding()
+
+     print("\n" + "=" * 60)
+     if success:
+         print("🎉 All tests passed! Videos should work in Chrome and Firefox.")
+     else:
+         print("⚠️ Some tests failed. Check the output above for details.")
+     print("=" * 60)
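The test script relies on `verify_video_format`, which unpacks the integer returned for `cv2.CAP_PROP_FOURCC` into a four-character tag. The sketch below isolates that bit arithmetic in pure Python so it can be checked without OpenCV. The helper names `fourcc_to_str`, `str_to_fourcc`, and `looks_like_h264` are illustrative assumptions, and the case-insensitive comparison is a deliberate simplification of the explicit `['avc1', 'H264', 'h264']` list used in the commit.

```python
def fourcc_to_str(code: int) -> str:
    """Decode a packed FourCC integer into its four-character tag."""
    # Each character occupies one byte, least significant byte first.
    return "".join(chr((code >> (8 * i)) & 0xFF) for i in range(4))


def str_to_fourcc(tag: str) -> int:
    """Pack a four-character tag into an integer, first character in the low byte."""
    if len(tag) != 4:
        raise ValueError("FourCC tags must be exactly four characters")
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(tag))


# Tags treated as H.264 here; compared case-insensitively (an assumption of this sketch).
H264_TAGS = {"avc1", "h264"}


def looks_like_h264(code: int) -> bool:
    """Return True when the decoded tag matches a known H.264 FourCC."""
    return fourcc_to_str(code).lower() in H264_TAGS


code = str_to_fourcc("avc1")
print(hex(code), fourcc_to_str(code), looks_like_h264(code))
```

Round-tripping a tag through both helpers is a quick sanity check that the byte order matches what the decoding expression in `verify_video_format` expects.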
video_gen.py CHANGED
@@ -462,14 +462,56 @@ def interpolate_keypoints(kptsA, kptsB, steps):
462
  frames.append((interp_pose, interp_left, interp_right))
463
  return frames
464
 
465
  def create_stitched_video(videoA_path, videoB_path, output_path="stitched_output.mp4"):
466
  # Extract keypoints from both videos
467
  videoA_keypoints = extract_keypoints_from_video(videoA_path)
468
  videoB_keypoints = extract_keypoints_from_video(videoB_path)
469
 
470
- # Create video writer
471
- fourcc = cv2.VideoWriter_fourcc(*'mp4v')
472
- out = cv2.VideoWriter(output_path, fourcc, 30.0, (1280, 720))
473
 
474
  # Show original A
475
  for pose, l, r in videoA_keypoints:
@@ -513,9 +555,8 @@ def create_multi_stitched_video(video_paths, output_path="multi_stitched_output.
513
  all_keypoints.append(keypoints)
514
  print(f" - Extracted {len(keypoints)} frames")
515
 
516
- # Create video writer
517
- fourcc = cv2.VideoWriter_fourcc(*'mp4v')
518
- out = cv2.VideoWriter(output_path, fourcc, 30.0, (1280, 720))
519
 
520
  total_frames = 0
521
 
 
462
  frames.append((interp_pose, interp_left, interp_right))
463
  return frames
464
 
+ def get_video_writer(output_path, fps=30.0, width=1280, height=720):
+     """
+     Create a video writer with H.264 codec for better browser compatibility.
+     Falls back to other codecs if H.264 is not available.
+     """
+     # Try H.264 codec first (best for browser compatibility)
+     try:
+         fourcc = cv2.VideoWriter_fourcc(*'avc1')
+         out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+         if out.isOpened():
+             print("Using H.264 (avc1) codec for video encoding")
+             return out
+         else:
+             out.release()
+     except Exception as e:
+         print(f"H.264 codec not available: {e}")
+
+     # Fallback to MPEG-4
+     try:
+         fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+         out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+         if out.isOpened():
+             print("Using MPEG-4 (mp4v) codec for video encoding")
+             return out
+         else:
+             out.release()
+     except Exception as e:
+         print(f"MPEG-4 codec not available: {e}")
+
+     # Final fallback to XVID
+     try:
+         fourcc = cv2.VideoWriter_fourcc(*'XVID')
+         out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+         if out.isOpened():
+             print("Using XVID codec for video encoding")
+             return out
+         else:
+             out.release()
+     except Exception as e:
+         print(f"XVID codec not available: {e}")
+
+     raise RuntimeError("No suitable video codec found")
+
+
508
  def create_stitched_video(videoA_path, videoB_path, output_path="stitched_output.mp4"):
509
  # Extract keypoints from both videos
510
  videoA_keypoints = extract_keypoints_from_video(videoA_path)
511
  videoB_keypoints = extract_keypoints_from_video(videoB_path)
512
 
513
+ # Create video writer with H.264 codec for better browser compatibility
514
+ out = get_video_writer(output_path, 30.0, 1280, 720)
 
515
 
516
  # Show original A
517
  for pose, l, r in videoA_keypoints:
 
555
  all_keypoints.append(keypoints)
556
  print(f" - Extracted {len(keypoints)} frames")
557
 
558
+ # Create video writer with H.264 codec for better browser compatibility
559
+ out = get_video_writer(output_path, 30.0, 1280, 720)
 
560
 
561
  total_frames = 0
562
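`get_video_writer` repeats the same try/open/release pattern once per codec. As a possible refactoring sketch, not what this commit does, the fallback chain can be written as a loop over candidate tags. The names `open_first_available` and `fake_open` are hypothetical; `fake_open` stands in for `cv2.VideoWriter` so the example runs without OpenCV installed.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

W = TypeVar("W")


def open_first_available(
    codecs: Iterable[str],
    open_writer: Callable[[str], Optional[W]],
) -> Tuple[str, W]:
    """Try each codec tag in order and return the first writer that opens."""
    for tag in codecs:
        try:
            writer = open_writer(tag)
        except Exception as exc:
            # Mirror the commit's behaviour: report the failure, try the next codec.
            print(f"{tag} codec not available: {exc}")
            continue
        if writer is not None:
            return tag, writer
    raise RuntimeError("No suitable video codec found")


def fake_open(tag: str) -> Optional[str]:
    """Stand-in opener: 'avc1' raises, 'mp4v' succeeds, anything else fails quietly."""
    if tag == "avc1":
        raise OSError("encoder missing")
    return f"writer<{tag}>" if tag == "mp4v" else None


print(open_first_available(["avc1", "mp4v", "XVID"], fake_open))
```

With OpenCV, `open_writer` could wrap `cv2.VideoWriter_fourcc(*tag)` and `cv2.VideoWriter(...)`, calling `release()` and returning `None` whenever `isOpened()` is false, which keeps the ordering `avc1`, `mp4v`, `XVID` in a single list.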