load_model()
Loads the GPT-2 model and tokenizer from the specified directory paths.
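A minimal sketch of what this loading step could look like, assuming the Hugging Face `transformers` library; the `MODEL_DIR` and `TOKENIZER_DIR` paths are placeholders for the project's real configuration:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL_DIR = "./model"          # hypothetical local model directory
TOKENIZER_DIR = "./tokenizer"  # hypothetical local tokenizer directory

def load_model():
    """Load the GPT-2 model and tokenizer from local directories."""
    tokenizer = GPT2Tokenizer.from_pretrained(TOKENIZER_DIR)
    model = GPT2LMHeadModel.from_pretrained(MODEL_DIR)
    model.eval()  # inference only, no gradient updates
    return model, tokenizer
```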
lifespan()
Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.
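A sketch of the lifespan hook using FastAPI's lifespan context manager; the module-level `state` dict and the call to `load_model()` are assumptions carried over from the sketch above:

```python
from contextlib import asynccontextmanager
from fastapi import FastAPI

state = {}  # hypothetical shared state populated at startup

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: load the model once so individual requests never pay that cost.
    state["model"], state["tokenizer"] = load_model()
    yield
    # Shutdown: drop references so memory can be reclaimed.
    state.clear()

app = FastAPI(lifespan=lifespan)
```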
classify_text_sync()
Synchronously tokenizes the input text and runs it through the GPT-2 model. Returns the classification and perplexity.
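A rough sketch of the synchronous step, assuming perplexity is derived from the language-model loss and compared against a threshold; the `PERPLEXITY_THRESHOLD` value and the label names are illustrative, not the project's actual settings:

```python
import math
import torch

PERPLEXITY_THRESHOLD = 60.0  # hypothetical cut-off; the real value is project-specific

def classify_text_sync(text: str, model, tokenizer) -> dict:
    """Tokenize the text, compute GPT-2 perplexity, and map it to a label."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    perplexity = math.exp(outputs.loss.item())
    label = "AI-generated" if perplexity < PERPLEXITY_THRESHOLD else "Human-written"
    return {"classification": label, "perplexity": perplexity}
```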
classify_text()
Asynchronously runs classify_text_sync() in a thread pool for non-blocking text classification.
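One way to off-load the blocking call, using `asyncio.to_thread`; the project may instead use Starlette's `run_in_threadpool`, and the `state` dict is an assumption from the earlier sketches:

```python
import asyncio

async def classify_text(text: str) -> dict:
    """Run the blocking classification in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(
        classify_text_sync, text, state["model"], state["tokenizer"]
    )
```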
analyze_text()
POST endpoint: Accepts text input, classifies it using classify_text(), and returns the result with perplexity.
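A sketch of the endpoint; the `/analyze` route path and the `TextRequest` request model are assumptions, and `app` refers to the lifespan sketch above:

```python
from pydantic import BaseModel

class TextRequest(BaseModel):
    text: str

@app.post("/analyze")
async def analyze_text(request: TextRequest):
    """Classify the submitted text and return the label together with its perplexity."""
    return await classify_text(request.text)
```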
health()
GET endpoint: Simple health check for API liveness.
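A minimal liveness probe; the `/health` path and response shape are assumptions:

```python
@app.get("/health")
async def health():
    """Liveness check used by deployment tooling."""
    return {"status": "ok"}
```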
parse_docx(), parse_pdf(), parse_txt()
Utilities to extract and convert .docx, .pdf, and .txt file contents to plain text.
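A sketch of the three parsers, assuming `python-docx` and `pypdf` (the project may use a different PDF library such as PyPDF2):

```python
from io import BytesIO

from docx import Document   # python-docx
from pypdf import PdfReader # pypdf

def parse_docx(data: bytes) -> str:
    return "\n".join(p.text for p in Document(BytesIO(data)).paragraphs)

def parse_pdf(data: bytes) -> str:
    reader = PdfReader(BytesIO(data))
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def parse_txt(data: bytes) -> str:
    return data.decode("utf-8", errors="ignore")
```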
warmup()
Downloads the model repository and initializes the model/tokenizer using load_model().
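The warm-up step is likely a thin wrapper over the two helpers described elsewhere in this list; a sketch under that assumption:

```python
def warmup():
    """Fetch the model repository if needed, then load the model and tokenizer."""
    download_model_repo()
    return load_model()
```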
download_model_repo()
Downloads the model repository files into the designated MODEL folder.
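A sketch assuming the files come from a Hugging Face Hub repository via `huggingface_hub.snapshot_download`; the repository id and local folder are placeholders:

```python
from huggingface_hub import snapshot_download

MODEL_REPO = "your-org/gpt2-detector"  # hypothetical repository id
MODEL_DIR = "./model"                  # hypothetical local MODEL folder

def download_model_repo():
    """Fetch the model repository files into the local MODEL folder."""
    snapshot_download(repo_id=MODEL_REPO, local_dir=MODEL_DIR)
```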
get_model_tokenizer()
Checks whether the model already exists locally; downloads it if missing, otherwise loads the cached copy.
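A sketch of the cache check, reusing the hypothetical `MODEL_DIR`, `download_model_repo()`, and `load_model()` names from the sketches above:

```python
import os

MODEL_DIR = "./model"  # hypothetical local cache location

def get_model_tokenizer():
    """Download the model only when no cached copy exists, then load it."""
    if not os.path.isdir(MODEL_DIR) or not os.listdir(MODEL_DIR):
        download_model_repo()  # fetch files into MODEL_DIR
    return load_model()
```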
handle_file_upload()
Handles file uploads from the /upload route. Extracts text, classifies, and returns results.
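A sketch of the upload route; the `/upload` path and the `extract_file_contents(data, filename)` signature are assumptions:

```python
from fastapi import UploadFile

@app.post("/upload")
async def handle_file_upload(file: UploadFile):
    """Read the uploaded file, pull out its plain text, and classify it."""
    data = await file.read()
    text = extract_file_contents(data, file.filename)
    return await classify_text(text)
```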
extract_file_contents()
Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
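The extraction step presumably dispatches on the file extension; a sketch using the parser names from above:

```python
def extract_file_contents(data: bytes, filename: str) -> str:
    """Dispatch to the right parser based on the file extension."""
    name = filename.lower()
    if name.endswith(".pdf"):
        return parse_pdf(data)
    if name.endswith(".docx"):
        return parse_docx(data)
    return parse_txt(data)  # fall back to plain text
```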
handle_file_sentence()
Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.
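A sketch of this route, assuming the 10,000-character limit is enforced on the extracted text before sentence analysis; the `/upload-sentences` path and error shape are illustrative:

```python
MAX_CHARS = 10_000  # texts longer than this are rejected before analysis

@app.post("/upload-sentences")
async def handle_file_sentence(file: UploadFile):
    """Extract text from the upload and analyze it sentence by sentence."""
    text = extract_file_contents(await file.read(), file.filename)
    if len(text) > MAX_CHARS:
        return {"error": "Text too long for sentence-level analysis."}
    return await analyze_sentences(text)
```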
handle_sentence_level_analysis()
Validates and whitespace-strips each sentence, then computes an AI/human likelihood for each.
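A sketch of the per-sentence pass, reusing the hypothetical `classify_text()` helper; the result shape is an assumption:

```python
async def handle_sentence_level_analysis(sentences: list[str]) -> list[dict]:
    """Strip each sentence and attach a classification to every non-empty one."""
    results = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank fragments
        result = await classify_text(sentence)
        results.append({"sentence": sentence, **result})
    return results
```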
analyze_sentences()
Splits paragraphs into sentences, classifies each, and returns all results.
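A sketch of the splitting step with a naive regex split on terminal punctuation; the real project may use a dedicated sentence tokenizer instead, and `handle_sentence_level_analysis()` refers to the sketch above:

```python
import re

async def analyze_sentences(paragraph: str) -> list[dict]:
    """Split a paragraph into sentences, classify each, and return all results."""
    # Naive split on ., !, ? followed by whitespace; good enough for a sketch.
    sentences = re.split(r"(?<=[.!?])\s+", paragraph)
    return await handle_sentence_level_analysis(sentences)
```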
analyze_sentence_file()
Similar to handle_file_sentence(); analyzes the sentences of uploaded files.