Spaces: deenasun/ai-sl-api

deenasun committed on Jun 22

Commit 03ba989 (1 parent: dadcb61)

add two input options and R2 cloud upload-download

Files changed (5)

README.md +72 -6
__pycache__/app.cpython-311.pyc +0 -0
app.py +185 -111
example_usage.py → examples/example_usage.py +0 -0
examples/example_usage_dual_input.py +148 -0

README.md CHANGED

@@ -16,23 +16,33 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 Convert text documents to American Sign Language (ASL) videos using AI.
-## Video Output Options
 The Gradio interface provides multiple ways for users to receive and download the generated ASL videos:
-### 1. R2 Cloud Storage (Recommended)
 - Videos are automatically uploaded to Cloudflare R2 storage
 - Returns a public URL that users can download directly
 - Videos persist and can be shared via URL
 - Includes a styled download button in the interface
-### 2. Base64 Encoding (Alternative)
 - Videos are embedded as base64 data directly in the response
 - No external storage required
 - Good for smaller videos or when you want to avoid cloud storage
 - Can be downloaded directly from the interface
-### 3. Programmatic Access
 Users can access the video output programmatically using:
 ```python
@@ -57,18 +67,64 @@ with open("asl_video.mp4", "wb") as f:
     f.write(response.content)
 ```
-### 4. Direct Download from Interface
 - The interface includes a styled download button
 - Users can right-click and "Save As" if automatic download doesn't work
 - Video files are named `asl_video.mp4` by default
 ## Example Usage
-See `example_usage.py` for complete examples of how to:
 - Download videos from URLs
 - Process base64 video data
 - Use the interface programmatically
 - Perform further video processing
 ## Requirements
@@ -91,3 +147,13 @@ Once you have the video file, you can:
 - Convert to different formats
 - Extract frames for further processing
 - Add subtitles or overlays

 Convert text documents to American Sign Language (ASL) videos using AI.
+## Features
+### Dual Input Support with Optional File Upload
+The app accepts both text input and file uploads with flexible options:
+- **Text Input**: Type or paste text directly into the interface (always available)
+- **File Upload**: Upload documents (PDF, TXT, DOCX, EPUB) - **optional, can be enabled/disabled**
+- **Smart Priority**: Text input takes priority if both are provided
+- **Toggle Control**: Checkbox to enable/disable file upload functionality
+### Video Output Options
 The Gradio interface provides multiple ways for users to receive and download the generated ASL videos:
+#### 1. R2 Cloud Storage (Recommended)
 - Videos are automatically uploaded to Cloudflare R2 storage
 - Returns a public URL that users can download directly
 - Videos persist and can be shared via URL
 - Includes a styled download button in the interface
+#### 2. Base64 Encoding (Alternative)
 - Videos are embedded as base64 data directly in the response
 - No external storage required
 - Good for smaller videos or when you want to avoid cloud storage
 - Can be downloaded directly from the interface
+#### 3. Programmatic Access
 Users can access the video output programmatically using:
 ```python
     f.write(response.content)
 ```
+#### 4. Direct Download from Interface
 - The interface includes a styled download button
 - Users can right-click and "Save As" if automatic download doesn't work
 - Video files are named `asl_video.mp4` by default
 ## Example Usage
+### Web Interface
+1. Visit your Space URL
+2. Choose input method:
+   - **Text**: Type or paste text in the text box (always available)
+   - **File**: Check "Enable file upload" and upload a document (optional)
+3. Click "Generate ASL Video"
+4. Download the resulting video
+### Programmatic Access with Optional File Upload
+```python
+from gradio_client import Client
+# Connect to your hosted app (Space ID in "owner/name" form)
+client = Client("your-username/your-space")
+# Text input only (file upload disabled)
+result = client.predict(
+    "Hello world! This is a test.",  # Text input
+    False,                           # Enable file upload (False = disabled)
+    None,                            # File input (None since disabled)
+    True,                            # Use R2 storage
+    api_name="/predict"
+)
+# File input only (file upload enabled)
+result = client.predict(
+    "",                              # Text input (empty)
+    True,                            # Enable file upload (True = enabled)
+    "document.pdf",                  # File input
+    True,                            # Use R2 storage
+    api_name="/predict"
+)
+# Both inputs (text takes priority)
+result = client.predict(
+    "Quick text",                    # Text input
+    True,                            # Enable file upload (True = enabled)
+    "document.pdf",                  # File input
+    True,                            # Use R2 storage
+    api_name="/predict"
+)
+```
+See `examples/example_usage.py`, `examples/example_usage_dual_input.py`, and `example_optional_file_upload.py` for complete examples of how to:
 - Download videos from URLs
 - Process base64 video data
 - Use the interface programmatically
 - Perform further video processing
+- Handle both text and file inputs
+- Use optional file upload functionality
 ## Requirements
 - Convert to different formats
 - Extract frames for further processing
 - Add subtitles or overlays
+## Deployment to Hugging Face Spaces
+1. Create a new Space on Hugging Face
+2. Choose Gradio as the SDK
+3. Upload your code files
+4. Set environment variables in Space settings
+5. Deploy and share your Space URL
+Your app will be accessible to users worldwide with flexible input options!
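
The "Smart Priority" rule in the README above (text wins when both text and a file are supplied) can be sketched as a small pure function. This is only an illustration: `choose_input` and its `(kind, value)` return shape are hypothetical and do not appear in `app.py`.

```python
from typing import Optional, Tuple


def choose_input(text: Optional[str], file_path: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (kind, value), where kind is "text", "file", or "none"."""
    # Non-blank text always wins, mirroring the README's priority rule
    if text and text.strip():
        return "text", text.strip()
    # Fall back to the uploaded file when the text box is empty
    if file_path:
        return "file", file_path
    # Neither input was provided
    return "none", None


if __name__ == "__main__":
    print(choose_input("Quick text", "document.pdf"))
    print(choose_input("   ", "document.pdf"))
    print(choose_input("", None))
```

Keeping the decision in one function like this makes the rule testable without starting the Gradio app.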

__pycache__/app.cpython-311.pyc ADDED

Binary file (18.8 kB). View file

app.py CHANGED

@@ -17,13 +17,17 @@ import base64
 load_dotenv()
 # Load R2/S3 environment secrets
 R2_ENDPOINT = os.environ.get("R2_ENDPOINT")
 R2_ACCESS_KEY_ID = os.environ.get("R2_ACCESS_KEY_ID")
 R2_SECRET_ACCESS_KEY = os.environ.get("R2_SECRET_ACCESS_KEY")
 # Validate that required environment variables are set
-if not all([R2_ENDPOINT, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY]):
-    raise ValueError("Missing required R2 environment variables. Please check your .env file.")
 title = "AI-SL"
 description = "Convert text to ASL!"
@@ -61,7 +65,7 @@ def clean_gloss_token(token):
     return cleaned
-def upload_video_to_r2(video_path, bucket_name="ai-sl-videos"):
     """
     Upload a video file to R2 and return a public URL
     """
@@ -79,10 +83,14 @@ def upload_video_to_r2(video_path, bucket_name="ai-sl-videos"):
                 ExtraArgs={'ACL': 'public-read'}
             )
-        # Generate the public URL
-        video_url = f"{R2_ENDPOINT}/{bucket_name}/{unique_filename}"
         print(f"Video uploaded to R2: {video_url}")
-        return video_url
     except Exception as e:
         print(f"Error uploading video to R2: {e}")
@@ -142,9 +150,68 @@ def cleanup_temp_video(file_path):
         print(f"Error cleaning up file: {e}")
-async def parse_vectorize_and_search(file):
-    print(file)
-    gloss = asl_converter.convert_document(file)
     print("ASL", gloss)
     # Split by spaces and clean each token
@@ -184,7 +251,9 @@ async def parse_vectorize_and_search(file):
     if len(video_files) > 1:
         try:
             print(f"Creating stitched video from {len(video_files)} videos...")
-            stitched_video_path = tempfile.NamedTemporaryFile(delete=False, suffix='.mp4').name
             create_multi_stitched_video(video_files, stitched_video_path)
             print(f"Stitched video created: {stitched_video_path}")
         except Exception as e:
@@ -234,114 +303,119 @@ async def parse_vectorize_and_search(file):
         "final_video_url": final_video_url
     }, final_video_url, download_html
-# Create a synchronous wrapper for Gradio
-def parse_vectorize_and_search_sync(file):
-    return asyncio.run(parse_vectorize_and_search(file))
-async def parse_vectorize_and_search_base64(file):
     """
-    Alternative version that returns video as base64 data instead of uploading to R2
     """
-    print(file)
-    gloss = asl_converter.convert_document(file)
-    print("ASL", gloss)
-    # Split by spaces and clean each token
-    gloss_tokens = gloss.split()
-    cleaned_tokens = []
-    for token in gloss_tokens:
-        cleaned = clean_gloss_token(token)
-        if cleaned:  # Only add non-empty tokens
-            cleaned_tokens.append(cleaned)
-    print("Cleaned tokens:", cleaned_tokens)
-    videos = []
-    video_files = []  # Store local file paths for stitching
-    for g in cleaned_tokens:
-        print(f"Processing {g}")
-        try:
-            result = await vectorizer.vector_query_from_supabase(query=g)
-            print("result", result)
-            if result.get("match", False):
-                video_url = result["video_url"]
-                videos.append(video_url)
-                # Download the video
-                local_path = download_video_from_url(video_url)
-                if local_path:
-                    video_files.append(local_path)
-        except Exception as e:
-            print(f"Error processing {g}: {e}")
-            continue
-    # Create stitched video if we have multiple videos
-    stitched_video_path = None
-    if len(video_files) > 1:
-        try:
-            print(f"Creating stitched video from {len(video_files)} videos...")
-            stitched_video_path = tempfile.NamedTemporaryFile(delete=False, suffix='.mp4').name
-            create_multi_stitched_video(video_files, stitched_video_path)
-            print(f"Stitched video created: {stitched_video_path}")
-        except Exception as e:
-            print(f"Error creating stitched video: {e}")
-            stitched_video_path = None
-    elif len(video_files) == 1:
-        # If only one video, just use it directly
-        stitched_video_path = video_files[0]
-    # Convert final video to base64
-    final_video_base64 = None
-    if stitched_video_path:
-        final_video_base64 = video_to_base64(stitched_video_path)
-        # Clean up the local file after conversion
-        cleanup_temp_video(stitched_video_path)
-    # Clean up individual video files after stitching
-    for video_file in video_files:
-        if video_file != stitched_video_path:  # Don't delete the final output
-            cleanup_temp_video(video_file)
-    # Create download link HTML for base64
-    download_html = ""
-    if final_video_base64:
-        download_html = f"""
-        <div style="text-align: center; padding: 20px;">
-            <h3>Download Your ASL Video</h3>
-            <a href="{final_video_base64}" download="asl_video.mp4"
-               style="background-color: #4CAF50; color: white;
-                      padding: 12px 24px; text-decoration: none;
-                      border-radius: 4px; display: inline-block;">
-                Download Video
-            </a>
-            <p style="margin-top: 10px; color: #666;">
-                <small>Video is embedded directly - click to download</small>
-            </p>
-        </div>
-        """
-    return {
-        "status": "success",
-        "videos": videos,
-        "video_count": len(videos),
-        "gloss": gloss,
-        "cleaned_tokens": cleaned_tokens,
-        "video_format": "base64"
-    }, final_video_base64, download_html
-def parse_vectorize_and_search_base64_sync(file):
-    return asyncio.run(parse_vectorize_and_search_base64(file))
-intf = gr.Interface(
-    fn=parse_vectorize_and_search_sync,
-    inputs=inputs,
-    outputs=outputs,
-    title=title,
-    description=description,
-    article=article
-)
-intf.launch(share=True)
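
The removed base64 path placed `final_video_base64` directly in an `href`, which only works if the value is a data URI. The body of `video_to_base64` is not shown in this diff, so the following is a sketch under that assumption; `to_data_uri` and `from_data_uri` are hypothetical helper names.

```python
import base64


def to_data_uri(data: bytes, mime: str = "video/mp4") -> str:
    """Encode raw bytes as a base64 data URI usable in an <a href>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def from_data_uri(uri: str) -> bytes:
    """Decode a base64 data URI back into raw bytes."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)


if __name__ == "__main__":
    sample = b"\x00\x00\x00\x18ftypmp42"  # a few arbitrary bytes, not a real video
    uri = to_data_uri(sample)
    assert from_data_uri(uri) == sample
    print(uri[:32])
```

Base64 inflates the payload by roughly a third, which is consistent with the README's advice to use this route for smaller videos.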

 load_dotenv()
 # Load R2/S3 environment secrets
+R2_ASL_VIDEOS_URL = os.environ.get("R2_ASL_VIDEOS_URL")
 R2_ENDPOINT = os.environ.get("R2_ENDPOINT")
 R2_ACCESS_KEY_ID = os.environ.get("R2_ACCESS_KEY_ID")
 R2_SECRET_ACCESS_KEY = os.environ.get("R2_SECRET_ACCESS_KEY")
 # Validate that required environment variables are set
+if not all([R2_ASL_VIDEOS_URL, R2_ENDPOINT, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY]):
+    raise ValueError(
+        "Missing required R2 environment variables. "
+        "Please check your .env file."
+    )
 title = "AI-SL"
 description = "Convert text to ASL!"
     return cleaned
+def upload_video_to_r2(video_path, bucket_name="asl-videos"):
     """
     Upload a video file to R2 and return a public URL
     """
                 ExtraArgs={'ACL': 'public-read'}
             )
+        # Storage-endpoint URL of the uploaded object (logged only, not returned)
+        public_domain = R2_ENDPOINT.replace('https://', '').split('.')[0]
+        video_url = f"https://{public_domain}.r2.cloudflarestorage.com/{bucket_name}/{unique_filename}"
         print(f"Video uploaded to R2: {video_url}")
+        # Return the URL on the public domain configured in R2_ASL_VIDEOS_URL
+        public_video_url = f"{R2_ASL_VIDEOS_URL}/{unique_filename}"
+        return public_video_url
     except Exception as e:
         print(f"Error uploading video to R2: {e}")
         print(f"Error cleaning up file: {e}")
+def process_text_to_gloss(text):
+    """
+    Convert text directly to ASL gloss
+    """
+    temp_file_path = None
+    try:
+        # Write the text to a temporary .txt file so the existing
+        # document converter can be reused
+        with tempfile.NamedTemporaryFile(
+            mode='w', suffix='.txt', delete=False
+        ) as temp_file:
+            temp_file.write(text)
+            temp_file_path = temp_file.name
+        # Use the existing document converter
+        return asl_converter.convert_document(temp_file_path)
+    except Exception as e:
+        print(f"Error processing text: {e}")
+        return None
+    finally:
+        # Remove the temporary file even if conversion fails
+        if temp_file_path and os.path.exists(temp_file_path):
+            os.unlink(temp_file_path)
+def process_input(input_data):
+    """
+    Process either text input or file upload
+    input_data can be either a string (text) or a file object
+    """
+    if input_data is None:
+        return None
+    # Check if it's a file object (has .name attribute)
+    if hasattr(input_data, 'name'):
+        # It's a file upload
+        print(f"Processing file: {input_data.name}")
+        return asl_converter.convert_document(input_data.name)
+    else:
+        # It's text input
+        print(f"Processing text input: "
+              f"{input_data[:100]}...")
+        return process_text_to_gloss(input_data)
+async def parse_vectorize_and_search_unified(input_data):
+    """
+    Unified function that handles both text and file inputs
+    """
+    print(f"Input type: {type(input_data)}")
+    # Process the input to get gloss
+    gloss = process_input(input_data)
+    if not gloss:
+        return {
+            "status": "error",
+            "message": "Failed to process input"
+        }, None, ""
     print("ASL", gloss)
     # Split by spaces and clean each token
     if len(video_files) > 1:
         try:
             print(f"Creating stitched video from {len(video_files)} videos...")
+            stitched_video_path = tempfile.NamedTemporaryFile(
+                delete=False, suffix='.mp4'
+            ).name
             create_multi_stitched_video(video_files, stitched_video_path)
             print(f"Stitched video created: {stitched_video_path}")
         except Exception as e:
         "final_video_url": final_video_url
     }, final_video_url, download_html
+def parse_vectorize_and_search_unified_sync(input_data):
+    return asyncio.run(parse_vectorize_and_search_unified(input_data))
+def predict_unified(input_data):
     """
+    Unified prediction function that handles both text and file inputs
     """
+    try:
+        if input_data is None:
+            return {
+                "status": "error",
+                "message": "Please provide text or upload a document"
+            }, None, ""
+        # Use the unified processing function
+        result = parse_vectorize_and_search_unified_sync(input_data)
+        return result
+    except Exception as e:
+        print(f"Error in predict_unified function: {e}")
+        return {
+            "status": "error",
+            "message": f"An error occurred: {str(e)}"
+        }, None, ""
+# Create the Gradio interface
+def create_interface():
+    """Create and configure the Gradio interface"""
+    with gr.Blocks(title=title) as demo:
+        gr.Markdown(f"# {title}")
+        gr.Markdown(description)
+        with gr.Row():
+            with gr.Column():
+                # Input section
+                gr.Markdown("## Input Options")
+                # Text input
+                gr.Markdown("### Option 1: Enter Text")
+                text_input = gr.Textbox(
+                    label="Enter text to convert to ASL",
+                    placeholder="Type or paste your text here...",
+                    lines=5,
+                    max_lines=10
+                )
+                gr.Markdown("### Option 2: Upload Document")
+                file_input = gr.File(
+                    label="Upload Document (pdf, txt, docx, or epub)",
+                    file_types=[".pdf", ".txt", ".docx", ".epub"]
+                )
+                # Processing options
+                gr.Markdown("## Processing Options")
+                use_r2 = gr.Checkbox(
+                    label="Use Cloud Storage (R2)",
+                    value=True,
+                    info=("Upload video to cloud storage for "
+                          "persistent access")
+                )
+                process_btn = gr.Button(
+                    "Generate ASL Video",
+                    variant="primary"
+                )
+            with gr.Column():
+                # Output section
+                gr.Markdown("## Results")
+                json_output = gr.JSON(label="Processing Results")
+                video_output = gr.Video(label="ASL Video Output")
+                download_html = gr.HTML(label="Download Link")
+        # Handle the processing
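+        def process_inputs(text, file, use_r2_storage):
+            # NOTE: use_r2_storage is accepted but not forwarded to
+            # predict_unified, so the checkbox has no effect here
+            # Determine which input to use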
+            if text and text.strip():
+                # Use text input
+                input_data = text.strip()
+            elif file is not None:
+                # Use file input
+                input_data = file
+            else:
+                # No input provided
+                return {
+                    "status": "error",
+                    "message": "Please provide either text or upload a file"
+                }, None, ""
+            # Process using the unified function
+            return predict_unified(input_data)
+        process_btn.click(
+            fn=process_inputs,
+            inputs=[text_input, file_input, use_r2],
+            outputs=[json_output, video_output, download_html]
+        )
+        # Footer
+        gr.Markdown(article)
+    return demo
+# For Hugging Face Spaces, use the Blocks interface
+if __name__ == "__main__":
+    demo = create_interface()
+    demo.launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        share=True  # Creates a public share link when run locally
+    )
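
The new `upload_video_to_r2` builds the returned URL as `f"{R2_ASL_VIDEOS_URL}/{unique_filename}"`, which yields a double slash if the environment variable ends with `/`. A minimal sketch of a more forgiving join follows; `build_public_url` is a hypothetical helper, and `videos.example.com` is a placeholder domain, not the project's.

```python
from urllib.parse import quote


def build_public_url(base_url: str, filename: str) -> str:
    """Join a public bucket base URL and an object name with exactly one slash."""
    # Drop any trailing slash on the base and leading slash on the name,
    # then percent-encode characters that are unsafe in a URL path
    return f"{base_url.rstrip('/')}/{quote(filename.lstrip('/'))}"


if __name__ == "__main__":
    print(build_public_url("https://videos.example.com/", "abc123.mp4"))
    print(build_public_url("https://videos.example.com", "my video.mp4"))
```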

example_usage.py → examples/example_usage.py RENAMED

File without changes

examples/example_usage_dual_input.py ADDED

@@ -0,0 +1,148 @@

+"""
+Example: Using the AI-SL API with both text and file inputs
+This demonstrates how the Gradio interface can handle both text input
+and file uploads, using whichever one is provided.
+"""
+from gradio_client import Client
+import requests
+def test_text_input():
+    """
+    Example 1: Using text input
+    """
+    print("=== Testing Text Input ===")
+    # Connect to your hosted app (Space ID in "owner/name" form)
+    client = Client("your-username/your-space")
+    # Test with text input
+    text_input = "Hello world! This is a test of the text input functionality."
+    # Call the interface with text input
+    result = client.predict(
+        text_input,  # Text input
+        None,        # File input (None)
+        True,        # Use R2 storage
+        api_name="/predict"
+    )
+    # Process results
+    json_data, video_url, download_html = result
+    print(f"Status: {json_data['status']}")
+    print(f"Video URL: {video_url}")
+    return video_url
+def test_file_input():
+    """
+    Example 2: Using file input
+    """
+    print("=== Testing File Input ===")
+    # Connect to your hosted app (Space ID in "owner/name" form)
+    client = Client("your-username/your-space")
+    # Test with file input
+    file_path = "example_document.txt"
+    # Call the interface with file input
+    result = client.predict(
+        "",          # Text input (empty)
+        file_path,   # File input
+        True,        # Use R2 storage
+        api_name="/predict"
+    )
+    # Process results
+    json_data, video_url, download_html = result
+    print(f"Status: {json_data['status']}")
+    print(f"Video URL: {video_url}")
+    return video_url
+def test_priority_logic():
+    """
+    Example 3: Testing the priority logic
+    """
+    print("=== Testing Priority Logic ===")
+    # Connect to your hosted app (Space ID in "owner/name" form)
+    client = Client("your-username/your-space")
+    # Test with both inputs (text should take priority)
+    text_input = "This text should be processed instead of the file."
+    file_path = "example_document.txt"
+    # Call the interface with both inputs
+    result = client.predict(
+        text_input,  # Text input
+        file_path,   # File input
+        True,        # Use R2 storage
+        api_name="/predict"
+    )
+    # Process results
+    json_data, video_url, download_html = result
+    print(f"Status: {json_data['status']}")
+    print(f"Gloss: {json_data.get('gloss')}")
+    print(f"Video URL: {video_url}")
+    return video_url
+def download_video(video_url, output_path):
+    """
+    Download a video from URL
+    """
+    try:
+        response = requests.get(video_url, stream=True)
+        response.raise_for_status()
+        with open(output_path, 'wb') as f:
+            for chunk in response.iter_content(chunk_size=8192):
+                f.write(chunk)
+        print(f"Video downloaded to: {output_path}")
+        return True
+    except Exception as e:
+        print(f"Error downloading video: {e}")
+        return False
+def main():
+    """
+    Run all examples
+    """
+    print("AI-SL Dual Input Testing")
+    print("=" * 50)
+    # Test text input
+    text_video_url = test_text_input()
+    if text_video_url:
+        download_video(text_video_url, "text_input_video.mp4")
+    print("\n" + "-" * 50 + "\n")
+    # Test file input
+    file_video_url = test_file_input()
+    if file_video_url:
+        download_video(file_video_url, "file_input_video.mp4")
+    print("\n" + "-" * 50 + "\n")
+    # Test priority logic
+    priority_video_url = test_priority_logic()
+    if priority_video_url:
+        download_video(priority_video_url, "priority_test_video.mp4")
+    print("\n" + "=" * 50)
+    print("Testing complete!")
+if __name__ == "__main__":
+    main()
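
The example functions above index the JSON result directly, but the error responses in `app.py` carry only `status` and `message`. A hedged sketch of a more defensive unpacking step follows. `summarize_result` is a hypothetical helper, and it assumes successful responses use `"status": "success"`, as the removed base64 path did. The exact form of the video value returned by the client may also vary by Gradio version, so it is treated as opaque here.

```python
from typing import Any, Optional, Tuple


def summarize_result(result: Tuple[Any, Any, Any]) -> Optional[Any]:
    """Print a short summary and return the video value, or None on failure."""
    json_data, video, _download_html = result
    if not isinstance(json_data, dict):
        print("Request failed: unexpected response")
        return None
    # Error payloads have no 'gloss' key, so check the status first
    if json_data.get("status") != "success":
        print(f"Request failed: {json_data.get('message', 'unknown error')}")
        return None
    print(f"Gloss: {json_data.get('gloss')}")
    print(f"Video: {video}")
    return video


if __name__ == "__main__":
    summarize_result(({"status": "error", "message": "Failed to process input"}, None, ""))
    summarize_result(({"status": "success", "gloss": "HELLO WORLD"}, "asl_video.mp4", ""))
```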