Space: deenasun/ai-sl-api

deenasun committed on Jun 22

Commit 1306721 (1 parent: 26288e8)

update video_gen and Cloudflare upload to use avc1 codec

Files changed (6)

README.md +13 -35
__pycache__/app.cpython-311.pyc +0 -0
__pycache__/video_gen.cpython-311.pyc +0 -0
app.py +54 -10
test_h264_encoding.py +152 -0
video_gen.py +47 -6

README.md CHANGED

@@ -14,7 +14,7 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 # AI-SL API
-Convert text documents to American Sign Language (ASL) videos using AI.
 ## Features
@@ -22,15 +22,13 @@ Convert text documents to American Sign Language (ASL) videos using AI.
 The app accepts both text input and file uploads with flexible options:
 - **Text Input**: Type or paste text directly into the interface (always available)
-- **File Upload**: Upload documents (PDF, TXT, DOCX, EPUB) - **optional, can be enabled/disabled**
-- **Smart Priority**: Text input takes priority if both are provided
-- **Toggle Control**: Checkbox to enable/disable file upload functionality
 ### Video Output Options
 The Gradio interface provides multiple ways for users to receive and download the generated ASL videos:
-#### 1. R2 Cloud Storage (Recommended)
 - Videos are automatically uploaded to Cloudflare R2 storage
 - Returns a public URL that users can download directly
 - Videos persist and can be shared via URL
@@ -57,8 +55,8 @@ result = client.predict(
     api_name="/predict"
 )
-# The result contains: (json_data, video_output, download_html)
-json_data, video_url, download_html = result
 # Download the video
 import requests
@@ -67,11 +65,6 @@ with open("asl_video.mp4", "wb") as f:
     f.write(response.content)
 ```
-#### 4. Direct Download from Interface
-- The interface includes a styled download button
-- Users can right-click and "Save As" if automatic download doesn't work
-- Video files are named `asl_video.mp4` by default
 ## Example Usage
 ### Web Interface
@@ -79,7 +72,7 @@ with open("asl_video.mp4", "wb") as f:
 2. Choose input method:
    - **Text**: Type or paste text in the text box (always available)
    - **File**: Check "Enable file upload" and upload a document (optional)
-3. Click "Generate ASL Video"
 4. Download the resulting video
 ### Programmatic Access with Optional File Upload
@@ -88,37 +81,32 @@ with open("asl_video.mp4", "wb") as f:
 from gradio_client import Client
 # Connect to your hosted app
-client = Client("https://huggingface.co/spaces/your-username/your-space")
 # Text input only (file upload disabled)
 result = client.predict(
-    "Hello world! This is a test.",  # Text input
-    False,                           # Enable file upload (False = disabled)
-    None,                            # File input (None since disabled)
-    True,                            # Use R2 storage
     api_name="/predict"
 )
 # File input only (file upload enabled)
 result = client.predict(
-    "",                              # Text input (empty)
-    True,                            # Enable file upload (True = enabled)
-    "document.pdf",                  # File input
-    True,                            # Use R2 storage
     api_name="/predict"
 )
 # Both inputs (text takes priority)
 result = client.predict(
     "Quick text",                    # Text input
-    True,                            # Enable file upload (True = enabled)
     "document.pdf",                  # File input
-    True,                            # Use R2 storage
     api_name="/predict"
 )
 ```
-See `example_usage.py`, `example_usage_dual_input.py`, and `example_optional_file_upload.py` for complete examples of how to:
 - Download videos from URLs
 - Process base64 video data
 - Use the interface programmatically
@@ -147,13 +135,3 @@ Once you have the video file, you can:
 - Convert to different formats
 - Extract frames for further processing
 - Add subtitles or overlays
-## Deployment to Hugging Face Spaces
-1. Create a new Space on Hugging Face
-2. Choose Gradio as the SDK
-3. Upload your code files
-4. Set environment variables in Space settings
-5. Deploy and share your Space URL
-Your app will be accessible to users worldwide with flexible input options!

 # AI-SL API
+Convert natural language English into American Sign Language (ASL) videos using AI.
 ## Features
 The app accepts both text input and file uploads with flexible options:
 - **Text Input**: Type or paste text directly into the interface (always available)
+- **File Upload**: Upload documents (PDF, TXT, DOCX, EPUB)
 ### Video Output Options
 The Gradio interface provides multiple ways for users to receive and download the generated ASL videos:
+#### 1. R2 Cloud Storage
 - Videos are automatically uploaded to Cloudflare R2 storage
 - Returns a public URL that users can download directly
 - Videos persist and can be shared via URL
     api_name="/predict"
 )
+# The result contains: (json_data, video_output)
+json_data, video_url = result
 # Download the video
 import requests
     f.write(response.content)
 ```
 ## Example Usage
 ### Web Interface
 2. Choose input method:
    - **Text**: Type or paste text in the text box (always available)
    - **File**: Check "Enable file upload" and upload a document (optional)
+3. Click "Submit"
 4. Download the resulting video
 ### Programmatic Access with Optional File Upload
 from gradio_client import Client
 # Connect to your hosted app
+from gradio_client import Client, handle_file
+client = Client("deenasun/ai-sl-api")
 # Text input only (file upload disabled)
 result = client.predict(
+    text="Hello world! This is a test.",  # Text input
+    file=None,                            # File input (None since disabled)
     api_name="/predict"
 )
 # File input only (file upload enabled)
 result = client.predict(
+    text="",                              # Text input (empty)
+    file=handle_file("document.pdf"),     # File input
     api_name="/predict"
 )
 # Both inputs (text takes priority)
 result = client.predict(
     "Quick text",                    # Text input
     "document.pdf",                  # File input
     api_name="/predict"
 )
 ```
+See `example_usage.py` and `example_usage_dual_input.py` for complete examples of how to:
 - Download videos from URLs
 - Process base64 video data
 - Use the interface programmatically
 - Convert to different formats
 - Extract frames for further processing
 - Add subtitles or overlays
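
The README bullet "Process base64 video data" is not illustrated anywhere in this diff. A minimal sketch follows, assuming the payload arrives either as a bare base64 string or as a `data:video/mp4;base64,...` data URI. The helper name `save_base64_video` and that payload shape are assumptions for illustration, not something the Space's API is confirmed to return.

```python
import base64
from pathlib import Path


def save_base64_video(payload: str, out_path: str = "asl_video.mp4") -> int:
    """Decode base64 video data, write it to out_path, return bytes written."""
    # Strip an optional data-URI prefix such as "data:video/mp4;base64,".
    if payload.startswith("data:"):
        _, _, payload = payload.partition(",")
    # validate=True raises on characters outside the base64 alphabet
    # instead of silently discarding them.
    data = base64.b64decode(payload, validate=True)
    Path(out_path).write_bytes(data)
    return len(data)
```

The default output name matches the `asl_video.mp4` filename used in the README's download example.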

__pycache__/app.cpython-311.pyc CHANGED

Binary files a/__pycache__/app.cpython-311.pyc and b/__pycache__/app.cpython-311.pyc differ

__pycache__/video_gen.cpython-311.pyc CHANGED

Binary files a/__pycache__/video_gen.cpython-311.pyc and b/__pycache__/video_gen.cpython-311.pyc differ

app.py CHANGED

@@ -70,11 +70,41 @@ def clean_gloss_token(token):
     return cleaned if cleaned else None
 def upload_video_to_r2(video_path, bucket_name="asl-videos"):
     """
     Upload a video file to R2 and return a public URL
     """
     try:
         # Generate a unique filename
         file_extension = os.path.splitext(video_path)[1]
         unique_filename = f"{uuid.uuid4()}{file_extension}"
@@ -87,22 +117,26 @@ def upload_video_to_r2(video_path, bucket_name="asl-videos"):
                 unique_filename,
                 ExtraArgs={
                     'ACL': 'public-read',
-                    'ContentType': 'video/mp4',
                     'CacheControl': 'max-age=86400',  # Cache for 24 hours
                     'ContentDisposition': 'inline'    # Force inline display
                 })
         # Replace the endpoint with the domain for uploading
-        public_domain = (R2_ENDPOINT.replace('https://', '')
-                         .split('.')[0])
-        video_url = (f"https://{public_domain}.r2.cloudflarestorage.com/"
-                     f"{bucket_name}/{unique_filename}")
-        print(f"Video uploaded to R2: {video_url}")
-        public_video_url = f"{R2_ASL_VIDEOS_URL}/{unique_filename}"
-        print(f"Public video url: {public_video_url}")
-        return public_video_url
     except Exception as e:
         print(f"Error uploading video to R2: {e}")
@@ -171,6 +205,16 @@ def determine_input_type(input_data):
         # Check if it's a file path (contains file extension)
         if any(ext in input_data.lower() for ext in ['.pdf', '.txt', '.docx', '.doc', '.epub']):
             return 'file_path', input_data
         else:
             return 'text', input_data.strip()
     elif isinstance(input_data, dict) and 'path' in input_data:
@@ -312,7 +356,7 @@ def predict_unified(input_data):
                 "message": "Please provide text or upload a document"
             }, None
-        print("Input", input_data)
         # Use the unified processing function
         result = parse_vectorize_and_search_unified_sync(input_data)

     return cleaned if cleaned else None
+def verify_video_format(video_path):
+    """
+    Verify that a video file is in a browser-compatible format (H.264 MP4)
+    """
+    try:
+        import cv2
+        cap = cv2.VideoCapture(video_path)
+        if not cap.isOpened():
+            return False, "Could not open video file"
+        # Get video properties
+        fourcc = int(cap.get(cv2.CAP_PROP_FOURCC))
+        codec = "".join([chr((fourcc >> 8 * i) & 0xFF) for i in range(4)])
+        cap.release()
+        # Check if it's H.264
+        if codec in ['avc1', 'H264', 'h264']:
+            return True, f"Video is H.264 encoded ({codec})"
+        else:
+            return False, f"Video codec {codec} may not be browser compatible"
+    except Exception as e:
+        return False, f"Error checking video format: {e}"
 def upload_video_to_r2(video_path, bucket_name="asl-videos"):
     """
     Upload a video file to R2 and return a public URL
     """
     try:
+        # Verify video format for browser compatibility
+        is_compatible, message = verify_video_format(video_path)
+        print(f"Video format check: {message}")
         # Generate a unique filename
         file_extension = os.path.splitext(video_path)[1]
         unique_filename = f"{uuid.uuid4()}{file_extension}"
                 unique_filename,
                 ExtraArgs={
                     'ACL': 'public-read',
+                    'ContentType': 'video/mp4; codecs="avc1.42E01E"',  # H.264
                     'CacheControl': 'max-age=86400',  # Cache for 24 hours
                     'ContentDisposition': 'inline'    # Force inline display
                 })
         # Replace the endpoint with the domain for uploading
+        if R2_ENDPOINT:
+            public_domain = (R2_ENDPOINT.replace('https://', '')
+                             .split('.')[0])
+            video_url = (f"https://{public_domain}.r2.cloudflarestorage.com/"
+                         f"{bucket_name}/{unique_filename}")
+            print(f"Video uploaded to R2: {video_url}")
+            public_video_url = f"{R2_ASL_VIDEOS_URL}/{unique_filename}"
+            print(f"Public video url: {public_video_url}")
+            return public_video_url
+        else:
+            print("R2_ENDPOINT is not configured")
+            return None
     except Exception as e:
         print(f"Error uploading video to R2: {e}")
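
Review note on the `upload_video_to_r2` hunk: the result of `verify_video_format` is printed, but `is_compatible` does not influence the upload, so the `codecs="avc1.42E01E"` hint is attached even when the file is not H.264 (for instance if the writer fell back to `mp4v` or `XVID`). One possible refinement, sketched here, is to advertise the codec string only when the check passed. `build_upload_extra_args` is a hypothetical helper, not part of this commit; the dictionary keys and values are copied from the diff.

```python
def build_upload_extra_args(is_h264: bool) -> dict:
    """Return ExtraArgs for the upload call, using the keys seen in app.py."""
    # Only claim the H.264 codec string when the file was verified as H.264;
    # otherwise fall back to the plain container type.
    content_type = 'video/mp4; codecs="avc1.42E01E"' if is_h264 else "video/mp4"
    return {
        "ACL": "public-read",
        "ContentType": content_type,
        "CacheControl": "max-age=86400",  # cache for 24 hours
        "ContentDisposition": "inline",   # display in the browser
    }
```

In the committed code this would presumably be called as `ExtraArgs=build_upload_extra_args(is_compatible)`.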
         # Check if it's a file path (contains file extension)
         if any(ext in input_data.lower() for ext in ['.pdf', '.txt', '.docx', '.doc', '.epub']):
             return 'file_path', input_data
+        # Check if it's a string representation of a gradio.FileData dict
+        elif input_data.startswith('{') and 'gradio.FileData' in input_data:
+            try:
+                import ast
+                # Safely evaluate the string as a dictionary
+                file_data = ast.literal_eval(input_data)
+                if isinstance(file_data, dict) and 'path' in file_data:
+                    return 'file_path', file_data['path']
+            except (ValueError, SyntaxError):
+                pass
         else:
             return 'text', input_data.strip()
     elif isinstance(input_data, dict) and 'path' in input_data:
                 "message": "Please provide text or upload a document"
             }, None
+        print("Input", input_data, type(input_data))
         # Use the unified processing function
         result = parse_vectorize_and_search_unified_sync(input_data)
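
Review note on the `determine_input_type` hunk: the new `elif` branch returns nothing when `ast.literal_eval` fails or the parsed value has no `path` key, so such input skips the `'text'` fallback in the `else` branch. Also, because the extension test runs first, a stringified dict whose path contains `.pdf` matches that test and the whole string is returned as the file path. A minimal sketch of the same string handling with the dict check first and an explicit fallback is below. `classify_string_input` is a hypothetical stand-alone helper; the extension list and the `gradio.FileData` marker are taken from the diff.

```python
import ast

FILE_EXTENSIONS = ('.pdf', '.txt', '.docx', '.doc', '.epub')


def classify_string_input(input_data: str):
    """Classify a string as ('file_path', path) or ('text', text)."""
    stripped = input_data.strip()
    # A stringified gradio.FileData dict: parse it and pull out the path.
    if stripped.startswith('{') and 'gradio.FileData' in stripped:
        try:
            file_data = ast.literal_eval(stripped)
        except (ValueError, SyntaxError):
            file_data = None
        if isinstance(file_data, dict) and 'path' in file_data:
            return 'file_path', file_data['path']
        # Parsing failed or no path present: treat the input as plain text.
        return 'text', stripped
    # Same substring-based extension test as the original code.
    if any(ext in stripped.lower() for ext in FILE_EXTENSIONS):
        return 'file_path', input_data
    return 'text', stripped
```

Every path through the function returns a tuple, so callers never need to handle `None`.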

test_h264_encoding.py ADDED

@@ -0,0 +1,152 @@

+#!/usr/bin/env python3
+"""
+Test script to verify H.264 video encoding for browser compatibility
+"""
+import cv2
+import numpy as np
+import tempfile
+import os
+from video_gen import get_video_writer
+from app import verify_video_format
+def test_h264_encoding():
+    """Test H.264 video encoding and verify browser compatibility"""
+    print("Testing H.264 video encoding for browser compatibility...")
+    # Create a simple test video
+    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.mp4')
+    temp_path = temp_file.name
+    temp_file.close()
+    try:
+        # Create a simple test video with colored rectangles
+        out = get_video_writer(temp_path, fps=30.0, width=640, height=480)
+        if not out.isOpened():
+            print("ERROR: Could not create video writer")
+            return False
+        # Create 3 seconds of video (90 frames at 30 fps)
+        for frame_num in range(90):
+            # Create a frame with changing colors
+            frame = np.zeros((480, 640, 3), dtype=np.uint8)
+            # Create a moving colored rectangle
+            color = (
+                int(255 * (frame_num % 30) / 30),  # Red
+                int(255 * ((frame_num + 10) % 30) / 30),  # Green
+                int(255 * ((frame_num + 20) % 30) / 30)   # Blue
+            )
+            # Draw a rectangle that moves across the screen
+            x = int((frame_num % 60) * 640 / 60)
+            cv2.rectangle(frame, (x, 200), (x + 100, 300), color, -1)
+            # Add text
+            cv2.putText(frame, f"Frame {frame_num}", (50, 50),
+                       cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
+            out.write(frame)
+        out.release()
+        # Verify the video format
+        is_compatible, message = verify_video_format(temp_path)
+        print(f"Video format verification: {message}")
+        if is_compatible:
+            print("✅ SUCCESS: Video is H.264 encoded and browser compatible!")
+            # Get video info
+            cap = cv2.VideoCapture(temp_path)
+            fps = cap.get(cv2.CAP_PROP_FPS)
+            frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+            width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+            height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+            cap.release()
+            print(f"Video properties:")
+            print(f"  - Resolution: {width}x{height}")
+            print(f"  - FPS: {fps}")
+            print(f"  - Frame count: {frame_count}")
+            print(f"  - Duration: {frame_count/fps:.2f} seconds")
+            print(f"  - File size: {os.path.getsize(temp_path)} bytes")
+            return True
+        else:
+            print("❌ FAILED: Video is not browser compatible")
+            return False
+    except Exception as e:
+        print(f"❌ ERROR: {e}")
+        return False
+    finally:
+        # Clean up
+        if os.path.exists(temp_path):
+            os.unlink(temp_path)
+            print(f"Cleaned up test file: {temp_path}")
+def test_codec_availability():
+    """Test which video codecs are available"""
+    print("\nTesting available video codecs...")
+    codecs_to_test = [
+        ('avc1', 'H.264 (best for browsers)'),
+        ('mp4v', 'MPEG-4'),
+        ('XVID', 'XVID'),
+        ('MJPG', 'Motion JPEG'),
+        ('H264', 'H.264 alternative')
+    ]
+    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.mp4')
+    temp_path = temp_file.name
+    temp_file.close()
+    available_codecs = []
+    for codec_name, description in codecs_to_test:
+        try:
+            fourcc = cv2.VideoWriter_fourcc(*codec_name)
+            out = cv2.VideoWriter(temp_path, fourcc, 30.0, (640, 480))
+            if out.isOpened():
+                available_codecs.append((codec_name, description))
+                print(f"✅ {codec_name}: {description}")
+                out.release()
+            else:
+                print(f"❌ {codec_name}: {description} (not working)")
+                out.release()
+        except Exception as e:
+            print(f"❌ {codec_name}: {description} (error: {e})")
+    # Clean up
+    if os.path.exists(temp_path):
+        os.unlink(temp_path)
+    print(f"\nAvailable codecs: {len(available_codecs)}")
+    return available_codecs
+if __name__ == "__main__":
+    print("=" * 60)
+    print("H.264 Video Encoding Test for Browser Compatibility")
+    print("=" * 60)
+    # Test available codecs
+    available_codecs = test_codec_availability()
+    # Test H.264 encoding
+    success = test_h264_encoding()
+    print("\n" + "=" * 60)
+    if success:
+        print("🎉 All tests passed! Videos should work in Chrome and Firefox.")
+    else:
+        print("⚠️  Some tests failed. Check the output above for details.")
+    print("=" * 60)
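
`verify_video_format`, which this test relies on, turns the integer from `CAP_PROP_FOURCC` into a four-character string by reading one byte at a time, lowest byte first. The pure-Python sketch below shows that round trip without OpenCV. `pack_fourcc` and `unpack_fourcc` are illustrative names; `pack_fourcc` mirrors the byte layout that `cv2.VideoWriter_fourcc` produces.

```python
def pack_fourcc(code: str) -> int:
    """Pack four ASCII characters into a little-endian 32-bit integer."""
    if len(code) != 4:
        raise ValueError("FourCC must be exactly four characters")
    return sum((ord(ch) & 0xFF) << (8 * i) for i, ch in enumerate(code))


def unpack_fourcc(fourcc: int) -> str:
    """Inverse of pack_fourcc; same expression as in verify_video_format."""
    return "".join(chr((fourcc >> 8 * i) & 0xFF) for i in range(4))


print(hex(pack_fourcc("avc1")))   # 0x31637661
print(unpack_fourcc(0x31637661))  # avc1
```

The tag a reader reports may differ in spelling from the one requested when writing, which is presumably why the check accepts `avc1`, `H264`, and `h264`.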

video_gen.py CHANGED

@@ -462,14 +462,56 @@ def interpolate_keypoints(kptsA, kptsB, steps):
         frames.append((interp_pose, interp_left, interp_right))
     return frames
 def create_stitched_video(videoA_path, videoB_path, output_path="stitched_output.mp4"):
     # Extract keypoints from both videos
     videoA_keypoints = extract_keypoints_from_video(videoA_path)
     videoB_keypoints = extract_keypoints_from_video(videoB_path)
-    # Create video writer
-    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
-    out = cv2.VideoWriter(output_path, fourcc, 30.0, (1280, 720))
     # Show original A
     for pose, l, r in videoA_keypoints:
@@ -513,9 +555,8 @@ def create_multi_stitched_video(video_paths, output_path="multi_stitched_output.
         all_keypoints.append(keypoints)
         print(f"  - Extracted {len(keypoints)} frames")
-    # Create video writer
-    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
-    out = cv2.VideoWriter(output_path, fourcc, 30.0, (1280, 720))
     total_frames = 0

         frames.append((interp_pose, interp_left, interp_right))
     return frames
+def get_video_writer(output_path, fps=30.0, width=1280, height=720):
+    """
+    Create a video writer with H.264 codec for better browser compatibility.
+    Falls back to other codecs if H.264 is not available.
+    """
+    # Try H.264 codec first (best for browser compatibility)
+    try:
+        fourcc = cv2.VideoWriter_fourcc(*'avc1')
+        out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+        if out.isOpened():
+            print("Using H.264 (avc1) codec for video encoding")
+            return out
+        else:
+            out.release()
+    except Exception as e:
+        print(f"H.264 codec not available: {e}")
+    # Fallback to MPEG-4
+    try:
+        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+        out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+        if out.isOpened():
+            print("Using MPEG-4 (mp4v) codec for video encoding")
+            return out
+        else:
+            out.release()
+    except Exception as e:
+        print(f"MPEG-4 codec not available: {e}")
+    # Final fallback to XVID
+    try:
+        fourcc = cv2.VideoWriter_fourcc(*'XVID')
+        out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+        if out.isOpened():
+            print("Using XVID codec for video encoding")
+            return out
+        else:
+            out.release()
+    except Exception as e:
+        print(f"XVID codec not available: {e}")
+    raise RuntimeError("No suitable video codec found")
 def create_stitched_video(videoA_path, videoB_path, output_path="stitched_output.mp4"):
     # Extract keypoints from both videos
     videoA_keypoints = extract_keypoints_from_video(videoA_path)
     videoB_keypoints = extract_keypoints_from_video(videoB_path)
+    # Create video writer with H.264 codec for better browser compatibility
+    out = get_video_writer(output_path, 30.0, 1280, 720)
     # Show original A
     for pose, l, r in videoA_keypoints:
         all_keypoints.append(keypoints)
         print(f"  - Extracted {len(keypoints)} frames")
+    # Create video writer with H.264 codec for better browser compatibility
+    out = get_video_writer(output_path, 30.0, 1280, 720)
     total_frames = 0
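
Review note on `get_video_writer`: the three near-identical `try` blocks could be collapsed into a loop over candidate codecs. The sketch below is one possible refactor, not the commit's code. `open_first_working` and its `make_writer` parameter are hypothetical, introduced so the selection logic can be exercised without OpenCV; in `video_gen.py`, `make_writer` would wrap `cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*code), fps, (width, height))`.

```python
from typing import Any, Callable, Iterable, Tuple

# Preference order taken from the diff: H.264 first, then MPEG-4, then XVID.
CODEC_CANDIDATES = ("avc1", "mp4v", "XVID")


def open_first_working(
    make_writer: Callable[[str], Any],
    candidates: Iterable[str] = CODEC_CANDIDATES,
) -> Tuple[str, Any]:
    """Return (code, writer) for the first candidate whose writer opens."""
    for code in candidates:
        try:
            writer = make_writer(code)
        except Exception as exc:
            print(f"{code} codec not available: {exc}")
            continue
        if writer.isOpened():
            print(f"Using {code} codec for video encoding")
            return code, writer
        writer.release()  # free the handle before trying the next codec
    raise RuntimeError("No suitable video codec found")
```

Returning the chosen code alongside the writer lets callers tell whether the output is actually H.264 or one of the fallbacks.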