feat: Add comprehensive input validation and error handling for API e… by aparnatonni · Pull Request #632 · AOSSIE-Org/EduAid

aparnatonni · 2026-03-27T01:57:40Z

Issue

API endpoints don't validate user inputs, causing:

Empty strings and oversized inputs accepted
No error messages explaining what went wrong
Security vulnerabilities
Hard to debug failures

What I Did

✅ Created validators.py - validates all inputs before processing
✅ Updated 17 endpoints - added validation and error handling
✅ Added tests - 50+ tests, all passing
✅ Returns clear error messages - e.g., "max_questions cannot exceed 50, got 100"

Endpoints Updated

Query generation: /get_mcq, /get_boolq, /get_shortq, /get_shortq_llm, /get_mcq_llm, /get_boolq_llm, /get_problems_llm, /get_problems
Answer prediction: /get_mcq_answer, /get_shortq_answer, /get_boolean_answer
Utilities: /get_content, /generate_gform, /get_shortq_hard, /get_mcq_hard, /get_boolq_hard, /upload

Backward Compatible

✅ All existing valid requests still work

Summary by CodeRabbit

New Features
- API endpoints now validate all input parameters before processing.
- Standardized error responses with appropriate HTTP status codes (400 for invalid input, 500 for server errors).
Documentation
- Added validation implementation guide with sample error responses and validation rules.
Tests
- Added comprehensive test suite for input validation.

…ndpoints - Create validators.py module with InputValidator class for robust input validation - Implement 9+ validation methods for text, integers, URLs, lists, and enums - Add custom ValidationError exception and standardized error response formatting - Update 17 backend endpoints with input validation and error handling - Add comprehensive test suite (test_validators.py) with 50+ test cases - Include detailed documentation (VALIDATION_IMPLEMENTATION.md) Features: - Text validation: 1-50,000 character limit with type checking - Question count validation: 1-50 question limit with range checking - URL validation: HTTP(S) URL format verification - List validation: Type and item validation with proper error messages - Enum validation: Pre-defined valid values checking - Required field validation: Mandatory parameters checking - Standardized error responses with clear, actionable error messages Endpoints updated: - Query generation: /get_mcq, /get_boolq, /get_shortq, /get_shortq_llm, /get_mcq_llm, /get_boolq_llm, /get_problems_llm, /get_problems - Answer prediction: /get_mcq_answer, /get_shortq_answer, /get_boolean_answer - Utilities: /get_content, /generate_gform, /get_shortq_hard, /get_mcq_hard, /get_boolq_hard, /upload Testing: - All 50+ validation tests passing - Input validation working correctly for all parameter types - Error handling functioning as expected Backward compatible: All existing valid requests continue to work

coderabbitai · 2026-03-27T01:57:54Z

📝 Walkthrough

Walkthrough

A new input validation framework is introduced with a validators.py module containing validation routines, error handling, and standardized error responses. The API server integrates validation across POST endpoints, enforces required fields and type constraints, and returns consistent HTTP error responses with appropriate status codes (400 for validation failures, 500 for server errors).

Changes

Cohort / File(s)	Summary
Documentation `VALIDATION_IMPLEMENTATION.md`	New documentation file describing the validation framework architecture, validation module structure, affected endpoints, validation limits, and example requests/responses.
Core Validators Module `backend/validators.py`	New module introducing `ValidationError` exception, `InputValidator` class with static methods for validating input text, question counts, URLs, structured lists/dicts, and question types, plus `create_error_response()` helper for standardized error JSON responses.
API Server Integration `backend/server.py`	Wrapped POST route handlers in try/except blocks; replaced ad-hoc request parsing with structured `InputValidator` calls; enforced required field validation for all affected endpoints (MCQ, BoolQ, ShortQ, answer prediction, utility endpoints); standardized error responses and HTTP status codes; improved safety of response extraction via `.get()` with defaults.
Test Suite `backend/test_validators.py`	New comprehensive test file covering validation behavior for input text, question counts, MediaWiki flags, URLs, question/options/QA pair lists, question types, error response construction, and edge cases (empty strings, overly long text, type mismatches, invalid formats).

Sequence Diagram

sequenceDiagram
    participant Client
    participant APIHandler as Server (Route Handler)
    participant Validators as InputValidator
    participant Backend as Backend Logic
    participant Response as Error Handler

    Client->>APIHandler: POST request with JSON payload
    APIHandler->>Validators: validate_request_data(payload, required_fields)
    alt Validation passes
        Validators->>APIHandler: Return validated & coerced data
        APIHandler->>Backend: Process validated input
        Backend->>APIHandler: Return result
        APIHandler->>Client: 200 OK with response data
    else Validation fails
        Validators->>Response: ValidationError raised
        Response->>APIHandler: create_error_response(error_msg)
        APIHandler->>Client: 400 Bad Request with error JSON
    else Server exception
        Backend->>APIHandler: Unexpected exception
        APIHandler->>Response: Log exception & create_error_response()
        Response->>APIHandler: 500 Internal Server Error
        APIHandler->>Client: 500 with error JSON
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰✨ Hop, hop, validate each input with care,
No malformed requests shall pass through the air!
Errors now graceful, with messages clear,
Bad inputs caught—no surprises to fear!
400s and 500s dance in the light,
Our API's much safer, things look quite right! 🛡️

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 51.35% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat: Add comprehensive input validation and error handling for API e…' clearly and specifically describes the main change: implementing comprehensive input validation and error handling across API endpoints.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 616-621: The current upload gate only checks file.filename
extension (allowed_extensions) and can be bypassed by renaming; update
validation to inspect the actual file content before calling
file_processor.process_file: read a small prefix or use a MIME/magic detection
library (e.g., python-magic) to detect PDF, plain text, or Office formats and
ensure the detected MIME/signature matches the expected extension type; if
detection fails or disagrees with allowed_extensions, return the same
create_error_response("Unsupported file type...") and do not call
file_processor.process_file. Ensure you reference file.filename for extension
lookup but make the content-based check authoritative.
- Around line 513-517: Remove the server-side browser launch and instead return
the constructed edit URL alongside the responder URI: stop calling
webbrowser.open_new_tab in this block, build the edit URL using result["formId"]
(e.g. "https://docs.google.com/forms/d/" + result["formId"] + "/edit"), set
edit_url to that constructed string (not jsonify of responderUri), and return a
JSON payload containing both edit_url and result["responderUri"] (use jsonify to
wrap the response). Ensure no server-side side-effect like
webbrowser.open_new_tab is performed.
- Around line 359-363: The code always forces the OAuth flow because creds is
set to None instead of loading cached credentials; after creating the store with
file.Storage("token.json") call, retrieve cached credentials by assigning creds
= store.get() before checking creds.invalid so existing valid tokens are reused;
update the logic around creds, file.Storage, client.flow_from_clientsecrets and
tools.run_flow to only invoke the OAuth flow when creds is missing or
creds.invalid.
- Around line 58-76: Change request.get_json() to request.get_json(silent=True)
so malformed JSON returns None and is handled by
InputValidator.validate_request_data (which will raise ValidationError and
produce a 400 via create_error_response) instead of being caught by the broad
Exception path; update the call in the /get_mcq handler (the block using
InputValidator.validate_request_data, InputValidator.validate_input_text,
InputValidator.validate_use_mediawiki, MCQGen.generate_mcq and
create_error_response) and apply the same silent=True pattern to all other JSON
POST handlers that call request.get_json() so HTTPExceptions from werkzeug
aren’t converted into 500s.

In `@backend/test_validators.py`:
- Around line 12-312: The tests currently only print pass/fail messages so
regressions won’t fail CI; update each negative-case block in
test_input_text_validation, test_max_questions_validation,
test_use_mediawiki_validation, test_document_url_validation,
test_question_list_validation, test_qa_pairs_validation, and
test_question_type_validation to assert that InputValidator.validate_* raises
ValidationError (use pytest.raises or explicit try/except with a final assert
False) and add positive-case assertions on returned values (e.g., result is
str/int/list or has expected length) so invalid inputs cause test failures and
valid inputs are strongly checked; keep create_error_response checks asserting
returned tuple contents/status codes rather than just printing.

In `@backend/validators.py`:
- Around line 151-182: validate_document_url currently accepts any http(s) URL;
update it to reject non-Google URLs by ensuring the URL belongs to Google
Docs/Drive before returning. Specifically, after the existing scheme and length
checks in validate_document_url, parse the URL (or simply check the host/path)
and assert it matches accepted Google patterns (e.g., host contains
"docs.google.com" or "drive.google.com" or the path pattern for Google Docs
document IDs) and raise ValidationError if not; this prevents arbitrary links
from being forwarded to docs_service.get_document_content and guarantees a clear
400 response.
- Around line 292-345: validate_qa_pairs currently ignores the optional
"options" field which leads to runtime errors in server logic that expects
options to be a list of strings for question_type == "get_mcq"; update
validate_qa_pairs to validate that if "options" is present it is a non-empty
list and each element is a non-empty string, and also require "options" for MCQ
payloads (when question_type == "get_mcq") or otherwise return a
ValidationError; specifically modify validate_qa_pairs to check
pair.get("options") is either absent (for non-MCQ) or a list of strings (each
strip()-nonempty) and surface a clear ValidationError referencing the offending
index (qa_pairs[{i}]['options']) so server code that iterates over options won't
crash.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 303287b2-12a5-4d8f-ac26-dcb04afdeb0a

📥 Commits

Reviewing files that changed from the base of the PR and between 2038116 and cd6ebc8.

📒 Files selected for processing (4)

VALIDATION_IMPLEMENTATION.md
backend/server.py
backend/test_validators.py
backend/validators.py

coderabbitai · 2026-03-27T02:08:30Z

backend/server.py

+    try:
+        data = request.get_json()
+        InputValidator.validate_request_data(data, required_fields=["input_text"])
+
+        input_text = InputValidator.validate_input_text(data.get("input_text"))
+        use_mediawiki = InputValidator.validate_use_mediawiki(data.get("use_mediawiki", 0))
+        max_questions = InputValidator.validate_max_questions(data.get("max_questions", 4))
+
+        input_text = process_input_text(input_text, use_mediawiki)
+        output = MCQGen.generate_mcq(
+            {"input_text": input_text, "max_questions": max_questions}
+        )
+        questions = output.get("questions", [])
+        return jsonify({"output": questions})
+    except ValidationError as e:
+        return create_error_response(str(e))
+    except Exception as e:
+        app.logger.exception("Error in /get_mcq: %s", e)
+        return create_error_response("Internal server error", 500)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the full context of the endpoint and error handling cat -n backend/server.py | sed -n '50,85p'

Repository: AOSSIE-Org/EduAid

Length of output: 1720

🏁 Script executed:

# Let's also search for other POST handlers mentioned in the scratchpad rg -n "def.*POST|@app.route.*POST" backend/server.py | head -20

Repository: AOSSIE-Org/EduAid

Length of output: 917

🏁 Script executed:

# Look for InputValidator.validate_request_data to understand its behavior ast-grep --pattern 'def validate_request_data($_, $_) { $$$ }'

Repository: AOSSIE-Org/EduAid

Length of output: 43

🏁 Script executed:

# Check if there are other JSON POST handlers following the same pattern rg -B 2 -A 10 "request.get_json()" backend/server.py | head -60

Repository: AOSSIE-Org/EduAid

Length of output: 2721

🏁 Script executed:

# Let's examine the InputValidator class to see how validate_request_data handles None rg -B 5 -A 15 "class InputValidator|def validate_request_data" backend/ --type py

Repository: AOSSIE-Org/EduAid

Length of output: 2090

🏁 Script executed:

# Let's look at Flask documentation behavior in code comments or understand the error handling rg -n "get_json" backend/server.py | head -20

Repository: AOSSIE-Org/EduAid

Length of output: 666

🏁 Script executed:

# Check what ValidationError is and where it's defined rg -n "ValidationError|from.*ValidationError|import.*ValidationError" backend/ --type py

Repository: AOSSIE-Org/EduAid

Length of output: 6253

🏁 Script executed:

# Look for error response helper to understand status codes rg -B 2 -A 5 "def create_error_response" backend/ --type py

Repository: AOSSIE-Org/EduAid

Length of output: 437

🏁 Script executed:

# Let's see the full validate_request_data implementation rg -A 20 "def validate_request_data" backend/validators.py

Repository: AOSSIE-Org/EduAid

Length of output: 726

🏁 Script executed:

# Check if there are any imports from werkzeug or Flask for exception handling rg -n "from werkzeug|from flask|BadRequest|HTTPException" backend/ --type py

Repository: AOSSIE-Org/EduAid

Length of output: 170

🏁 Script executed:

# Let's see if there's any test showing what happens with malformed JSON rg -B 3 -A 10 "malformed|invalid.*json|get_json" backend/test_validators.py --type py | head -40

Repository: AOSSIE-Org/EduAid

Length of output: 43

🏁 Script executed:

# Let's verify the complete validate_request_data implementation and what errors it raises cat -n backend/validators.py | sed -n '392,410p'

Repository: AOSSIE-Org/EduAid

Length of output: 859

🏁 Script executed:

# Let's search for any tests that show the actual behavior with malformed JSON cat -n backend/test_validators.py | sed -n '1,50p'

Repository: AOSSIE-Org/EduAid

Length of output: 1854

🏁 Script executed:

# Verify if Flask's BadRequest is imported or if there's error handling for get_json failures rg -B 5 -A 5 "request.get_json" backend/server.py | head -30

Repository: AOSSIE-Org/EduAid

Length of output: 1091

🏁 Script executed:

# Let's verify Flask's BadRequest behavior by checking if it's imported or mentioned rg -n "BadRequest" backend/ --type py

Repository: AOSSIE-Org/EduAid

Length of output: 43

🏁 Script executed:

# Let's look at one more endpoint to see if the same pattern exists cat -n backend/server.py | sed -n '79,100p'

Repository: AOSSIE-Org/EduAid

Length of output: 1222

🏁 Script executed:

# Check the full error handler context and understand what happens with BadRequest rg -B 10 "except ValidationError" backend/server.py | head -50

Repository: AOSSIE-Org/EduAid

Length of output: 2612

🏁 Script executed:

# Let's verify Flask's BadRequest behavior by checking documentation or any error handling # First, let's see if there's any global error handler for BadRequest rg -n "errorhandler|register_error_handler" backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 43

🏁 Script executed:

# Let's check Flask imports to confirm what's available cat -n backend/server.py | head -20

Repository: AOSSIE-Org/EduAid

Length of output: 809

Malformed JSON is returned as 500 instead of 400 due to unsilenced get_json().

request.get_json() raises BadRequest (a werkzeug HTTPException) on invalid JSON, which is caught by the broad except Exception handler returning status 500. This bypasses validation error handling that would return status 400. The same issue exists across all JSON POST handlers.

Use request.get_json(silent=True) so validate_request_data() can properly handle None and raise ValidationError for 400 responses.

🛠️ Suggested pattern

- data = request.get_json() + data = request.get_json(silent=True) InputValidator.validate_request_data(data, required_fields=["input_text"])

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

try:

data = request.get_json()

InputValidator.validate_request_data(data, required_fields=["input_text"])

input_text = InputValidator.validate_input_text(data.get("input_text"))

use_mediawiki = InputValidator.validate_use_mediawiki(data.get("use_mediawiki", 0))

max_questions = InputValidator.validate_max_questions(data.get("max_questions", 4))

input_text = process_input_text(input_text, use_mediawiki)

output = MCQGen.generate_mcq(

{"input_text": input_text, "max_questions": max_questions}

)

questions = output.get("questions", [])

return jsonify({"output": questions})

except ValidationError as e:

return create_error_response(str(e))

except Exception as e:

app.logger.exception("Error in /get_mcq: %s", e)

return create_error_response("Internal server error", 500)

try:

data = request.get_json(silent=True)

InputValidator.validate_request_data(data, required_fields=["input_text"])

input_text = InputValidator.validate_input_text(data.get("input_text"))

use_mediawiki = InputValidator.validate_use_mediawiki(data.get("use_mediawiki", 0))

max_questions = InputValidator.validate_max_questions(data.get("max_questions", 4))

input_text = process_input_text(input_text, use_mediawiki)

output = MCQGen.generate_mcq(

{"input_text": input_text, "max_questions": max_questions}

)

questions = output.get("questions", [])

return jsonify({"output": questions})

except ValidationError as e:

return create_error_response(str(e))

except Exception as e:

app.logger.exception("Error in /get_mcq: %s", e)

return create_error_response("Internal server error", 500)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 58 - 76, Change request.get_json() to request.get_json(silent=True) so malformed JSON returns None and is handled by InputValidator.validate_request_data (which will raise ValidationError and produce a 400 via create_error_response) instead of being caught by the broad Exception path; update the call in the /get_mcq handler (the block using InputValidator.validate_request_data, InputValidator.validate_input_text, InputValidator.validate_use_mediawiki, MCQGen.generate_mcq and create_error_response) and apply the same silent=True pattern to all other JSON POST handlers that call request.get_json() so HTTPExceptions from werkzeug aren’t converted into 500s.

coderabbitai · 2026-03-27T02:08:30Z

backend/server.py

+        store = file.Storage("token.json")
+        creds = None
+        if not creds or creds.invalid:
+            flow = client.flow_from_clientsecrets("credentials.json", SCOPES)
+            creds = tools.run_flow(flow, store)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file and surrounding context wc -l backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 82

🏁 Script executed:

# Read the problematic section with context (lines 350-375 to see the full flow) sed -n '350,375p' backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 1059

🏁 Script executed:

# Search for any credential loading or storage usage in the file rg -n "creds|token\.json|Storage" backend/server.py -B 2 -A 2

Repository: AOSSIE-Org/EduAid

Length of output: 633

🏁 Script executed:

# Search for the function/endpoint containing these lines sed -n '330,365p' backend/server.py

Repository: AOSSIE-Org/EduAid

Length of output: 1582

🏁 Script executed:

# Search for other OAuth usage patterns in the codebase rg "file.Storage|store.get|\.get" backend/ -B 2 -A 2

Repository: AOSSIE-Org/EduAid

Length of output: 336

🏁 Script executed:

# Look for imports to understand which library is being used head -50 backend/server.py | grep -n "import\|from"

Repository: AOSSIE-Org/EduAid

Length of output: 963

🏁 Script executed:

# Verify oauth2client file.Storage API - check if get() method exists pip install oauth2client 2>/dev/null && python << 'EOF' from oauth2client import file import inspect # Check file.Storage class storage_class = file.Storage print("file.Storage methods:") for name, method in inspect.getmembers(storage_class, predicate=inspect.ismethod): if not name.startswith('_'): print(f" - {name}") # Check for get method if hasattr(storage_class, 'get'): print("\nfile.Storage.get() exists: True") else: print("\nfile.Storage.get() exists: False") # Show the typical pattern print("\nTypical oauth2client token loading pattern:") print(" store = file.Storage('token.json')") print(" creds = store.get() # Load cached credentials") print(" if not creds or creds.invalid:") print(" # Run flow and get new credentials") EOF

Repository: AOSSIE-Org/EduAid

Length of output: 1560

🏁 Script executed:

# Check if there are other Flask routes in the same file that handle OAuth correctly rg -n "@app.route|def " backend/server.py | head -30

Repository: AOSSIE-Org/EduAid

Length of output: 1215

Missing credential loading from cache forces fresh OAuth flow on every request.

Line 360 initializes creds = None, but store.get() is never called to load cached credentials from token.json. Since the condition on line 361 evaluates to True on the first check, the OAuth flow always runs even if valid credentials exist in the cache. Add creds = store.get() immediately after line 359 to retrieve cached credentials before the validity check.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 359 - 363, The code always forces the OAuth flow because creds is set to None instead of loading cached credentials; after creating the store with file.Storage("token.json") call, retrieve cached credentials by assigning creds = store.get() before checking creds.invalid so existing valid tokens are reused; update the logic around creds, file.Storage, client.flow_from_clientsecrets and tools.run_flow to only invoke the OAuth flow when creds is missing or creds.invalid.

coderabbitai · 2026-03-27T02:08:30Z

backend/server.py

+        edit_url = jsonify(result["responderUri"])
+        webbrowser.open_new_tab(
+            "https://docs.google.com/forms/d/" + result["formId"] + "/edit"
+        )
+        return edit_url


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and read the specified lines if [ -f "backend/server.py" ]; then echo "=== Lines 513-517 from backend/server.py ===" sed -n '513,517p' "backend/server.py" echo "" echo "=== Context: Lines 505-525 ===" sed -n '505,525p' "backend/server.py" else echo "File not found at backend/server.py" # Search for server.py find . -name "server.py" -type f 2>/dev/null | head -20 fi

Repository: AOSSIE-Org/EduAid

Length of output: 1053

🏁 Script executed:

# Check if webbrowser is imported in the file rg "import webbrowser|from webbrowser" backend/server.py -A 2 -B 2

Repository: AOSSIE-Org/EduAid

Length of output: 177

🏁 Script executed:

# Get more context around this code to understand the function sed -n '475,525p' "backend/server.py"

Repository: AOSSIE-Org/EduAid

Length of output: 1904

🏁 Script executed:

# Check if there are any tests or clients that depend on this behavior rg "generate_gform|/generate_gform" --type py -i

Repository: AOSSIE-Org/EduAid

Length of output: 248

Don't open the form editor from the API process.

webbrowser.open_new_tab() runs on the server host, not in the caller's browser. In headless/container deployments this is usually a no-op or failure, and the client never receives the edit URL it actually needs. Additionally, the current code returns only the responder URI, not the constructed edit URL.

↩️ Return the URLs instead of trying to launch a browser

- edit_url = jsonify(result["responderUri"]) - webbrowser.open_new_tab( - "https://docs.google.com/forms/d/" + result["formId"] + "/edit" - ) - return edit_url + return jsonify( + { + "responder_url": result["responderUri"], + "edit_url": f"https://docs.google.com/forms/d/{result['formId']}/edit", + } + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

edit_url = jsonify(result["responderUri"])

webbrowser.open_new_tab(

"https://docs.google.com/forms/d/" + result["formId"] + "/edit"

)

return edit_url

return jsonify(

{

"responder_url": result["responderUri"],

"edit_url": f"https://docs.google.com/forms/d/{result['formId']}/edit",

}

)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 513 - 517, Remove the server-side browser launch and instead return the constructed edit URL alongside the responder URI: stop calling webbrowser.open_new_tab in this block, build the edit URL using result["formId"] (e.g. "https://docs.google.com/forms/d/" + result["formId"] + "/edit"), set edit_url to that constructed string (not jsonify of responderUri), and return a JSON payload containing both edit_url and result["responderUri"] (use jsonify to wrap the response). Ensure no server-side side-effect like webbrowser.open_new_tab is performed.

coderabbitai · 2026-03-27T02:08:30Z

backend/server.py

+        # Validate file type
+        allowed_extensions = {'pdf', 'txt', 'docx', 'doc'}
+        if not ('.' in file.filename and file.filename.rsplit('.', 1)[1].lower() in allowed_extensions):
+            return create_error_response("Unsupported file type. Allowed types: pdf, txt, docx, doc")

-    content = file_processor.process_file(file)
-
-    if content:
-        return jsonify({"content": content})
-    else:
-        return jsonify({"error": "Unsupported file type or error processing file"}), 400
+        content = file_processor.process_file(file)


⚠️ Potential issue | 🟠 Major

This upload gate trusts the filename, not the file.

Line 618 only checks the extension, so a renamed arbitrary binary still reaches file_processor.process_file(). That weakens the new validation boundary and can turn bad uploads into parser failures/500s instead of a clean 400.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 616 - 621, The current upload gate only checks file.filename extension (allowed_extensions) and can be bypassed by renaming; update validation to inspect the actual file content before calling file_processor.process_file: read a small prefix or use a MIME/magic detection library (e.g., python-magic) to detect PDF, plain text, or Office formats and ensure the detected MIME/signature matches the expected extension type; if detection fails or disagrees with allowed_extensions, return the same create_error_response("Unsupported file type...") and do not call file_processor.process_file. Ensure you reference file.filename for extension lookup but make the content-based check authoritative.

coderabbitai · 2026-03-27T02:08:30Z

backend/test_validators.py

+def test_input_text_validation():
+    """Test input text validation."""
+    print("=" * 60)
+    print("TEST: Input Text Validation")
+    print("=" * 60)
+
+    # Valid text
+    try:
+        result = InputValidator.validate_input_text("This is valid text")
+        print("✓ Valid text accepted:", repr(result))
+    except ValidationError as e:
+        print("✗ Failed:", e)
+
+    # Empty text
+    try:
+        InputValidator.validate_input_text("")
+        print("✗ Empty text should have failed")
+    except ValidationError as e:
+        print("✓ Empty text rejected:", e)
+
+    # Text with whitespace only
+    try:
+        InputValidator.validate_input_text("   ")
+        print("✗ Whitespace-only text should have failed")
+    except ValidationError as e:
+        print("✓ Whitespace-only text rejected:", e)
+
+    # Text too long
+    try:
+        long_text = "a" * 60000
+        InputValidator.validate_input_text(long_text)
+        print("✗ Too-long text should have failed")
+    except ValidationError as e:
+        print("✓ Too-long text rejected:", e)
+
+    # Non-string input
+    try:
+        InputValidator.validate_input_text(12345)
+        print("✗ Non-string input should have failed")
+    except ValidationError as e:
+        print("✓ Non-string input rejected:", e)
+
+    print()
+
+
+def test_max_questions_validation():
+    """Test max_questions validation."""
+    print("=" * 60)
+    print("TEST: Max Questions Validation")
+    print("=" * 60)
+
+    # Valid values
+    for val in [1, 5, 25, 50]:
+        try:
+            result = InputValidator.validate_max_questions(val)
+            print(f"✓ Value {val} accepted")
+        except ValidationError as e:
+            print(f"✗ Failed for {val}: {e}")
+
+    # Too small
+    try:
+        InputValidator.validate_max_questions(0)
+        print("✗ Value 0 should have failed")
+    except ValidationError as e:
+        print("✓ Value 0 rejected:", e)
+
+    # Too large
+    try:
+        InputValidator.validate_max_questions(100)
+        print("✗ Value 100 should have failed")
+    except ValidationError as e:
+        print("✓ Value 100 rejected:", e)
+
+    # Non-integer
+    try:
+        InputValidator.validate_max_questions("abc")
+        print("✗ Non-integer should have failed")
+    except ValidationError as e:
+        print("✓ Non-integer rejected:", e)
+
+    # String number (should convert)
+    try:
+        result = InputValidator.validate_max_questions("10")
+        print(f"✓ String number '10' converted to: {result}")
+    except ValidationError as e:
+        print(f"✗ Failed: {e}")
+
+    print()
+
+
+def test_use_mediawiki_validation():
+    """Test use_mediawiki validation."""
+    print("=" * 60)
+    print("TEST: MediaWiki Flag Validation")
+    print("=" * 60)
+
+    # Valid values
+    for val in [0, 1]:
+        try:
+            result = InputValidator.validate_use_mediawiki(val)
+            print(f"✓ Value {val} accepted")
+        except ValidationError as e:
+            print(f"✗ Failed for {val}: {e}")
+
+    # Invalid values
+    for val in [2, -1, 10]:
+        try:
+            InputValidator.validate_use_mediawiki(val)
+            print(f"✗ Value {val} should have failed")
+        except ValidationError as e:
+            print(f"✓ Value {val} rejected:", e)
+
+    print()
+
+
+def test_document_url_validation():
+    """Test document URL validation."""
+    print("=" * 60)
+    print("TEST: Document URL Validation")
+    print("=" * 60)
+
+    # Valid URLs
+    valid_urls = [
+        "https://docs.google.com/document/d/1ABC123/edit",
+        "http://example.com/path"
+    ]
+    for url in valid_urls:
+        try:
+            result = InputValidator.validate_document_url(url)
+            print(f"✓ URL accepted: {url[:50]}...")
+        except ValidationError as e:
+            print(f"✗ Failed: {e}")
+
+    # Invalid URLs
+    invalid_urls = [
+        ("", "empty string"),
+        ("   ", "whitespace only"),
+        ("not-a-url", "no protocol"),
+        ("ftp://example.com", "wrong protocol"),
+    ]
+    for url, desc in invalid_urls:
+        try:
+            InputValidator.validate_document_url(url)
+            print(f"✗ {desc} should have failed")
+        except ValidationError as e:
+            print(f"✓ {desc} rejected")
+
+    print()
+
+
+def test_question_list_validation():
+    """Test question list validation."""
+    print("=" * 60)
+    print("TEST: Question List Validation")
+    print("=" * 60)
+
+    # Valid list
+    try:
+        result = InputValidator.validate_question_list([
+            "What is AI?",
+            "How does ML work?"
+        ])
+        print(f"✓ Valid question list accepted: {len(result)} questions")
+    except ValidationError as e:
+        print(f"✗ Failed: {e}")
+
+    # Empty list (not allowed by default)
+    try:
+        InputValidator.validate_question_list([])
+        print("✗ Empty list should have failed")
+    except ValidationError as e:
+        print("✓ Empty list rejected:", e)
+
+    # Empty list (allowed with flag)
+    try:
+        result = InputValidator.validate_question_list([], allow_empty=True)
+        print("✓ Empty list accepted when allowed")
+    except ValidationError as e:
+        print(f"✗ Failed: {e}")
+
+    # Invalid item
+    try:
+        InputValidator.validate_question_list(["Valid", 123])
+        print("✗ Non-string item should have failed")
+    except ValidationError as e:
+        print("✓ Non-string item rejected:", e)
+
+    # Empty string item
+    try:
+        InputValidator.validate_question_list(["Valid", ""])
+        print("✗ Empty string item should have failed")
+    except ValidationError as e:
+        print("✓ Empty string item rejected:", e)
+
+    print()
+
+
+def test_qa_pairs_validation():
+    """Test QA pairs validation."""
+    print("=" * 60)
+    print("TEST: QA Pairs Validation")
+    print("=" * 60)
+
+    # Valid QA pairs
+    try:
+        result = InputValidator.validate_qa_pairs([
+            {"question": "Q1?", "answer": "A1"},
+            {"question": "Q2?", "answer": "A2"},
+        ])
+        print(f"✓ Valid QA pairs accepted: {len(result)} pairs")
+    except ValidationError as e:
+        print(f"✗ Failed: {e}")
+
+    # Missing question field
+    try:
+        InputValidator.validate_qa_pairs([
+            {"answer": "A1"}
+        ])
+        print("✗ Missing 'question' field should have failed")
+    except ValidationError as e:
+        print("✓ Missing field rejected:", e)
+
+    # Empty answer
+    try:
+        InputValidator.validate_qa_pairs([
+            {"question": "Q1?", "answer": ""}
+        ])
+        print("✗ Empty answer should have failed")
+    except ValidationError as e:
+        print("✓ Empty answer rejected:", e)
+
+    print()
+
+
+def test_question_type_validation():
+    """Test question type validation."""
+    print("=" * 60)
+    print("TEST: Question Type Validation")
+    print("=" * 60)
+
+    # Valid types
+    for qtype in ["get_shortq", "get_mcq", "get_boolq"]:
+        try:
+            result = InputValidator.validate_question_type(qtype)
+            print(f"✓ Type '{qtype}' accepted")
+        except ValidationError as e:
+            print(f"✗ Failed: {e}")
+
+    # Invalid types
+    for qtype in ["invalid", "GET_MCQ", ""]:
+        try:
+            InputValidator.validate_question_type(qtype)
+            print(f"✗ Type '{qtype}' should have failed")
+        except ValidationError as e:
+            print(f"✓ Type '{qtype}' rejected")
+
+    print()
+
+
+def test_error_response():
+    """Test error response formatting."""
+    print("=" * 60)
+    print("TEST: Error Response Formatting")
+    print("=" * 60)
+
+    response, status = create_error_response("Test error message")
+    print(f"✓ Default error response: {response}")
+    print(f"  Status code: {status}")
+
+    response, status = create_error_response("Custom error", status_code=400)
+    print(f"✓ Custom error response: {response}")
+    print(f"  Status code: {status}")
+
+    print()
+
+
+def main():
+    """Run all tests."""
+    print("\n")
+    print("╔" + "=" * 58 + "╗")
+    print("║" + " " * 15 + "INPUT VALIDATOR TEST SUITE" + " " * 17 + "║")
+    print("╚" + "=" * 58 + "╝")
+    print()
+
+    test_input_text_validation()
+    test_max_questions_validation()
+    test_use_mediawiki_validation()
+    test_document_url_validation()
+    test_question_list_validation()
+    test_qa_pairs_validation()
+    test_question_type_validation()
+    test_error_response()
+
+    print("=" * 60)
+    print("All tests completed!")
+    print("=" * 60)
+    print()
+
+
+if __name__ == "__main__":
+    main()


⚠️ Potential issue | 🟠 Major

These checks won’t fail the build when validation regresses.

The negative cases catch ValidationError and print status messages, but they never assert that the exception was required. As written, this file can exit green even if a validator starts accepting invalid input, so it doesn’t provide the coverage guarantee the PR description is relying on.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/test_validators.py` around lines 12 - 312, The tests currently only print pass/fail messages so regressions won’t fail CI; update each negative-case block in test_input_text_validation, test_max_questions_validation, test_use_mediawiki_validation, test_document_url_validation, test_question_list_validation, test_qa_pairs_validation, and test_question_type_validation to assert that InputValidator.validate_* raises ValidationError (use pytest.raises or explicit try/except with a final assert False) and add positive-case assertions on returned values (e.g., result is str/int/list or has expected length) so invalid inputs cause test failures and valid inputs are strongly checked; keep create_error_response checks asserting returned tuple contents/status codes rather than just printing.

coderabbitai · 2026-03-27T02:08:30Z

backend/validators.py

+    def validate_document_url(url: Any) -> str:
+        """
+        Validate Google Docs URL.
+
+        Args:
+            url: The URL to validate
+
+        Returns:
+            Validated URL
+
+        Raises:
+            ValidationError: If validation fails
+        """
+        if not isinstance(url, str):
+            raise ValidationError(
+                f"document_url must be a string, got {type(url).__name__}"
+            )
+
+        url = url.strip()
+
+        if not url:
+            raise ValidationError("document_url cannot be empty")
+
+        if len(url) > InputValidator.MAX_URL_LENGTH:
+            raise ValidationError(
+                f"document_url exceeds maximum length of {InputValidator.MAX_URL_LENGTH} characters"
+            )
+
+        if not url.startswith("http://") and not url.startswith("https://"):
+            raise ValidationError("document_url must be a valid HTTP(S) URL")
+
+        return url


⚠️ Potential issue | 🟠 Major

Reject non-Google URLs at the validator boundary.

validate_document_url() currently accepts any http(s) URL, so /get_content still passes arbitrary links into docs_service.get_document_content() instead of rejecting them up front. That means invalid input can bypass this validation layer and the caller gets whatever downstream error happens next, rather than the clear 400 this PR is trying to guarantee.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/validators.py` around lines 151 - 182, validate_document_url currently accepts any http(s) URL; update it to reject non-Google URLs by ensuring the URL belongs to Google Docs/Drive before returning. Specifically, after the existing scheme and length checks in validate_document_url, parse the URL (or simply check the host/path) and assert it matches accepted Google patterns (e.g., host contains "docs.google.com" or "drive.google.com" or the path pattern for Google Docs document IDs) and raise ValidationError if not; this prevents arbitrary links from being forwarded to docs_service.get_document_content and guarantees a clear 400 response.

coderabbitai · 2026-03-27T02:08:30Z

backend/validators.py

+    def validate_qa_pairs(qa_pairs: Any) -> List[Dict[str, Any]]:
+        """
+        Validate list of QA pairs.
+
+        Args:
+            qa_pairs: The QA pairs list to validate
+
+        Returns:
+            Validated QA pairs
+
+        Raises:
+            ValidationError: If validation fails
+        """
+        if not isinstance(qa_pairs, list):
+            raise ValidationError(
+                f"qa_pairs must be a list, got {type(qa_pairs).__name__}"
+            )
+
+        if len(qa_pairs) == 0:
+            raise ValidationError("qa_pairs cannot be empty")
+
+        for i, pair in enumerate(qa_pairs):
+            if not isinstance(pair, dict):
+                raise ValidationError(
+                    f"qa_pairs[{i}] must be a dictionary, got {type(pair).__name__}"
+                )
+
+            if "question" not in pair:
+                raise ValidationError(f"qa_pairs[{i}] missing 'question' field")
+
+            if "answer" not in pair:
+                raise ValidationError(f"qa_pairs[{i}] missing 'answer' field")
+
+            if not isinstance(pair["question"], str):
+                raise ValidationError(
+                    f"qa_pairs[{i}]['question'] must be a string"
+                )
+
+            if not isinstance(pair["answer"], str):
+                raise ValidationError(
+                    f"qa_pairs[{i}]['answer'] must be a string"
+                )
+
+            if not pair["question"].strip():
+                raise ValidationError(
+                    f"qa_pairs[{i}]['question'] cannot be empty"
+                )
+
+            if not pair["answer"].strip():
+                raise ValidationError(
+                    f"qa_pairs[{i}]['answer'] cannot be empty"
+                )
+
+        return qa_pairs


⚠️ Potential issue | 🟠 Major

MCQ qa_pairs still allow malformed options.

validate_qa_pairs() ignores the optional options field, but backend/server.py Lines 396-425 assume it is a list of strings when question_type == "get_mcq". A payload with options: 1 still passes validation and then fails during iteration; a missing options field degrades into a one-choice MCQ instead of a 400.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/validators.py` around lines 292 - 345, validate_qa_pairs currently ignores the optional "options" field which leads to runtime errors in server logic that expects options to be a list of strings for question_type == "get_mcq"; update validate_qa_pairs to validate that if "options" is present it is a non-empty list and each element is a non-empty string, and also require "options" for MCQ payloads (when question_type == "get_mcq") or otherwise return a ValidationError; specifically modify validate_qa_pairs to check pair.get("options") is either absent (for non-MCQ) or a list of strings (each strip()-nonempty) and surface a clear ValidationError referencing the offending index (qa_pairs[{i}]['options']) so server code that iterates over options won't crash.

coderabbitai bot reviewed Mar 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add comprehensive input validation and error handling for API e…#632

feat: Add comprehensive input validation and error handling for API e…#632
aparnatonni wants to merge 1 commit intoAOSSIE-Org:mainfrom
aparnatonni:main

aparnatonni commented Mar 27, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 27, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 27, 2026

Uh oh!

coderabbitai bot Mar 27, 2026

Uh oh!

coderabbitai bot Mar 27, 2026

Uh oh!

coderabbitai bot Mar 27, 2026

Uh oh!

coderabbitai bot Mar 27, 2026

Uh oh!

coderabbitai bot Mar 27, 2026

Uh oh!

coderabbitai bot Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

aparnatonni commented Mar 27, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

What I Did

Endpoints Updated

Backward Compatible

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aparnatonni commented Mar 27, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 27, 2026 •

edited

Loading