feat: Add comprehensive input validation and error handling for API e…#632
feat: Add comprehensive input validation and error handling for API e…#632aparnatonni wants to merge 1 commit intoAOSSIE-Org:mainfrom
Conversation
…ndpoints - Create validators.py module with InputValidator class for robust input validation - Implement 9+ validation methods for text, integers, URLs, lists, and enums - Add custom ValidationError exception and standardized error response formatting - Update 17 backend endpoints with input validation and error handling - Add comprehensive test suite (test_validators.py) with 50+ test cases - Include detailed documentation (VALIDATION_IMPLEMENTATION.md) Features: - Text validation: 1-50,000 character limit with type checking - Question count validation: 1-50 question limit with range checking - URL validation: HTTP(S) URL format verification - List validation: Type and item validation with proper error messages - Enum validation: Pre-defined valid values checking - Required field validation: Mandatory parameters checking - Standardized error responses with clear, actionable error messages Endpoints updated: - Query generation: /get_mcq, /get_boolq, /get_shortq, /get_shortq_llm, /get_mcq_llm, /get_boolq_llm, /get_problems_llm, /get_problems - Answer prediction: /get_mcq_answer, /get_shortq_answer, /get_boolean_answer - Utilities: /get_content, /generate_gform, /get_shortq_hard, /get_mcq_hard, /get_boolq_hard, /upload Testing: - All 50+ validation tests passing - Input validation working correctly for all parameter types - Error handling functioning as expected Backward compatible: All existing valid requests continue to work
📝 WalkthroughWalkthroughA new input validation framework is introduced with a Changes
Sequence DiagramsequenceDiagram
participant Client
participant APIHandler as Server (Route Handler)
participant Validators as InputValidator
participant Backend as Backend Logic
participant Response as Error Handler
Client->>APIHandler: POST request with JSON payload
APIHandler->>Validators: validate_request_data(payload, required_fields)
alt Validation passes
Validators->>APIHandler: Return validated & coerced data
APIHandler->>Backend: Process validated input
Backend->>APIHandler: Return result
APIHandler->>Client: 200 OK with response data
else Validation fails
Validators->>Response: ValidationError raised
Response->>APIHandler: create_error_response(error_msg)
APIHandler->>Client: 400 Bad Request with error JSON
else Server exception
Backend->>APIHandler: Unexpected exception
APIHandler->>Response: Log exception & create_error_response()
Response->>APIHandler: 500 Internal Server Error
APIHandler->>Client: 500 with error JSON
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 616-621: The current upload gate only checks file.filename
extension (allowed_extensions) and can be bypassed by renaming; update
validation to inspect the actual file content before calling
file_processor.process_file: read a small prefix or use a MIME/magic detection
library (e.g., python-magic) to detect PDF, plain text, or Office formats and
ensure the detected MIME/signature matches the expected extension type; if
detection fails or disagrees with allowed_extensions, return the same
create_error_response("Unsupported file type...") and do not call
file_processor.process_file. Ensure you reference file.filename for extension
lookup but make the content-based check authoritative.
- Around line 513-517: Remove the server-side browser launch and instead return
the constructed edit URL alongside the responder URI: stop calling
webbrowser.open_new_tab in this block, build the edit URL using result["formId"]
(e.g. "https://docs.google.com/forms/d/" + result["formId"] + "/edit"), set
edit_url to that constructed string (not jsonify of responderUri), and return a
JSON payload containing both edit_url and result["responderUri"] (use jsonify to
wrap the response). Ensure no server-side side-effect like
webbrowser.open_new_tab is performed.
- Around line 359-363: The code always forces the OAuth flow because creds is
set to None instead of loading cached credentials; after creating the store with
file.Storage("token.json") call, retrieve cached credentials by assigning creds
= store.get() before checking creds.invalid so existing valid tokens are reused;
update the logic around creds, file.Storage, client.flow_from_clientsecrets and
tools.run_flow to only invoke the OAuth flow when creds is missing or
creds.invalid.
- Around line 58-76: Change request.get_json() to request.get_json(silent=True)
so malformed JSON returns None and is handled by
InputValidator.validate_request_data (which will raise ValidationError and
produce a 400 via create_error_response) instead of being caught by the broad
Exception path; update the call in the /get_mcq handler (the block using
InputValidator.validate_request_data, InputValidator.validate_input_text,
InputValidator.validate_use_mediawiki, MCQGen.generate_mcq and
create_error_response) and apply the same silent=True pattern to all other JSON
POST handlers that call request.get_json() so HTTPExceptions from werkzeug
aren’t converted into 500s.
In `@backend/test_validators.py`:
- Around line 12-312: The tests currently only print pass/fail messages so
regressions won’t fail CI; update each negative-case block in
test_input_text_validation, test_max_questions_validation,
test_use_mediawiki_validation, test_document_url_validation,
test_question_list_validation, test_qa_pairs_validation, and
test_question_type_validation to assert that InputValidator.validate_* raises
ValidationError (use pytest.raises or explicit try/except with a final assert
False) and add positive-case assertions on returned values (e.g., result is
str/int/list or has expected length) so invalid inputs cause test failures and
valid inputs are strongly checked; keep create_error_response checks asserting
returned tuple contents/status codes rather than just printing.
In `@backend/validators.py`:
- Around line 151-182: validate_document_url currently accepts any http(s) URL;
update it to reject non-Google URLs by ensuring the URL belongs to Google
Docs/Drive before returning. Specifically, after the existing scheme and length
checks in validate_document_url, parse the URL (or simply check the host/path)
and assert it matches accepted Google patterns (e.g., host contains
"docs.google.com" or "drive.google.com" or the path pattern for Google Docs
document IDs) and raise ValidationError if not; this prevents arbitrary links
from being forwarded to docs_service.get_document_content and guarantees a clear
400 response.
- Around line 292-345: validate_qa_pairs currently ignores the optional
"options" field which leads to runtime errors in server logic that expects
options to be a list of strings for question_type == "get_mcq"; update
validate_qa_pairs to validate that if "options" is present it is a non-empty
list and each element is a non-empty string, and also require "options" for MCQ
payloads (when question_type == "get_mcq") or otherwise return a
ValidationError; specifically modify validate_qa_pairs to check
pair.get("options") is either absent (for non-MCQ) or a list of strings (each
strip()-nonempty) and surface a clear ValidationError referencing the offending
index (qa_pairs[{i}]['options']) so server code that iterates over options won't
crash.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 303287b2-12a5-4d8f-ac26-dcb04afdeb0a
📒 Files selected for processing (4)
VALIDATION_IMPLEMENTATION.mdbackend/server.pybackend/test_validators.pybackend/validators.py
| try: | ||
| data = request.get_json() | ||
| InputValidator.validate_request_data(data, required_fields=["input_text"]) | ||
|
|
||
| input_text = InputValidator.validate_input_text(data.get("input_text")) | ||
| use_mediawiki = InputValidator.validate_use_mediawiki(data.get("use_mediawiki", 0)) | ||
| max_questions = InputValidator.validate_max_questions(data.get("max_questions", 4)) | ||
|
|
||
| input_text = process_input_text(input_text, use_mediawiki) | ||
| output = MCQGen.generate_mcq( | ||
| {"input_text": input_text, "max_questions": max_questions} | ||
| ) | ||
| questions = output.get("questions", []) | ||
| return jsonify({"output": questions}) | ||
| except ValidationError as e: | ||
| return create_error_response(str(e)) | ||
| except Exception as e: | ||
| app.logger.exception("Error in /get_mcq: %s", e) | ||
| return create_error_response("Internal server error", 500) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the full context of the endpoint and error handling
cat -n backend/server.py | sed -n '50,85p'Repository: AOSSIE-Org/EduAid
Length of output: 1720
🏁 Script executed:
# Let's also search for other POST handlers mentioned in the scratchpad
rg -n "def.*POST|@app.route.*POST" backend/server.py | head -20Repository: AOSSIE-Org/EduAid
Length of output: 917
🏁 Script executed:
# Look for InputValidator.validate_request_data to understand its behavior
ast-grep --pattern 'def validate_request_data($_, $_) { $$$ }'Repository: AOSSIE-Org/EduAid
Length of output: 43
🏁 Script executed:
# Check if there are other JSON POST handlers following the same pattern
rg -B 2 -A 10 "request.get_json()" backend/server.py | head -60Repository: AOSSIE-Org/EduAid
Length of output: 2721
🏁 Script executed:
# Let's examine the InputValidator class to see how validate_request_data handles None
rg -B 5 -A 15 "class InputValidator|def validate_request_data" backend/ --type pyRepository: AOSSIE-Org/EduAid
Length of output: 2090
🏁 Script executed:
# Let's look at Flask documentation behavior in code comments or understand the error handling
rg -n "get_json" backend/server.py | head -20Repository: AOSSIE-Org/EduAid
Length of output: 666
🏁 Script executed:
# Check what ValidationError is and where it's defined
rg -n "ValidationError|from.*ValidationError|import.*ValidationError" backend/ --type pyRepository: AOSSIE-Org/EduAid
Length of output: 6253
🏁 Script executed:
# Look for error response helper to understand status codes
rg -B 2 -A 5 "def create_error_response" backend/ --type pyRepository: AOSSIE-Org/EduAid
Length of output: 437
🏁 Script executed:
# Let's see the full validate_request_data implementation
rg -A 20 "def validate_request_data" backend/validators.pyRepository: AOSSIE-Org/EduAid
Length of output: 726
🏁 Script executed:
# Check if there are any imports from werkzeug or Flask for exception handling
rg -n "from werkzeug|from flask|BadRequest|HTTPException" backend/ --type pyRepository: AOSSIE-Org/EduAid
Length of output: 170
🏁 Script executed:
# Let's see if there's any test showing what happens with malformed JSON
rg -B 3 -A 10 "malformed|invalid.*json|get_json" backend/test_validators.py --type py | head -40Repository: AOSSIE-Org/EduAid
Length of output: 43
🏁 Script executed:
# Let's verify the complete validate_request_data implementation and what errors it raises
cat -n backend/validators.py | sed -n '392,410p'Repository: AOSSIE-Org/EduAid
Length of output: 859
🏁 Script executed:
# Let's search for any tests that show the actual behavior with malformed JSON
cat -n backend/test_validators.py | sed -n '1,50p'Repository: AOSSIE-Org/EduAid
Length of output: 1854
🏁 Script executed:
# Verify if Flask's BadRequest is imported or if there's error handling for get_json failures
rg -B 5 -A 5 "request.get_json" backend/server.py | head -30Repository: AOSSIE-Org/EduAid
Length of output: 1091
🏁 Script executed:
# Let's verify Flask's BadRequest behavior by checking if it's imported or mentioned
rg -n "BadRequest" backend/ --type pyRepository: AOSSIE-Org/EduAid
Length of output: 43
🏁 Script executed:
# Let's look at one more endpoint to see if the same pattern exists
cat -n backend/server.py | sed -n '79,100p'Repository: AOSSIE-Org/EduAid
Length of output: 1222
🏁 Script executed:
# Check the full error handler context and understand what happens with BadRequest
rg -B 10 "except ValidationError" backend/server.py | head -50Repository: AOSSIE-Org/EduAid
Length of output: 2612
🏁 Script executed:
# Let's verify Flask's BadRequest behavior by checking documentation or any error handling
# First, let's see if there's any global error handler for BadRequest
rg -n "errorhandler|register_error_handler" backend/server.pyRepository: AOSSIE-Org/EduAid
Length of output: 43
🏁 Script executed:
# Let's check Flask imports to confirm what's available
cat -n backend/server.py | head -20Repository: AOSSIE-Org/EduAid
Length of output: 809
Malformed JSON is returned as 500 instead of 400 due to unsilenced get_json().
request.get_json() raises BadRequest (a werkzeug HTTPException) on invalid JSON, which is caught by the broad except Exception handler returning status 500. This bypasses validation error handling that would return status 400. The same issue exists across all JSON POST handlers.
Use request.get_json(silent=True) so validate_request_data() can properly handle None and raise ValidationError for 400 responses.
🛠️ Suggested pattern
- data = request.get_json()
+ data = request.get_json(silent=True)
InputValidator.validate_request_data(data, required_fields=["input_text"])📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| data = request.get_json() | |
| InputValidator.validate_request_data(data, required_fields=["input_text"]) | |
| input_text = InputValidator.validate_input_text(data.get("input_text")) | |
| use_mediawiki = InputValidator.validate_use_mediawiki(data.get("use_mediawiki", 0)) | |
| max_questions = InputValidator.validate_max_questions(data.get("max_questions", 4)) | |
| input_text = process_input_text(input_text, use_mediawiki) | |
| output = MCQGen.generate_mcq( | |
| {"input_text": input_text, "max_questions": max_questions} | |
| ) | |
| questions = output.get("questions", []) | |
| return jsonify({"output": questions}) | |
| except ValidationError as e: | |
| return create_error_response(str(e)) | |
| except Exception as e: | |
| app.logger.exception("Error in /get_mcq: %s", e) | |
| return create_error_response("Internal server error", 500) | |
| try: | |
| data = request.get_json(silent=True) | |
| InputValidator.validate_request_data(data, required_fields=["input_text"]) | |
| input_text = InputValidator.validate_input_text(data.get("input_text")) | |
| use_mediawiki = InputValidator.validate_use_mediawiki(data.get("use_mediawiki", 0)) | |
| max_questions = InputValidator.validate_max_questions(data.get("max_questions", 4)) | |
| input_text = process_input_text(input_text, use_mediawiki) | |
| output = MCQGen.generate_mcq( | |
| {"input_text": input_text, "max_questions": max_questions} | |
| ) | |
| questions = output.get("questions", []) | |
| return jsonify({"output": questions}) | |
| except ValidationError as e: | |
| return create_error_response(str(e)) | |
| except Exception as e: | |
| app.logger.exception("Error in /get_mcq: %s", e) | |
| return create_error_response("Internal server error", 500) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 58 - 76, Change request.get_json() to
request.get_json(silent=True) so malformed JSON returns None and is handled by
InputValidator.validate_request_data (which will raise ValidationError and
produce a 400 via create_error_response) instead of being caught by the broad
Exception path; update the call in the /get_mcq handler (the block using
InputValidator.validate_request_data, InputValidator.validate_input_text,
InputValidator.validate_use_mediawiki, MCQGen.generate_mcq and
create_error_response) and apply the same silent=True pattern to all other JSON
POST handlers that call request.get_json() so HTTPExceptions from werkzeug
aren’t converted into 500s.
| store = file.Storage("token.json") | ||
| creds = None | ||
| if not creds or creds.invalid: | ||
| flow = client.flow_from_clientsecrets("credentials.json", SCOPES) | ||
| creds = tools.run_flow(flow, store) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the file and surrounding context
wc -l backend/server.pyRepository: AOSSIE-Org/EduAid
Length of output: 82
🏁 Script executed:
# Read the problematic section with context (lines 350-375 to see the full flow)
sed -n '350,375p' backend/server.pyRepository: AOSSIE-Org/EduAid
Length of output: 1059
🏁 Script executed:
# Search for any credential loading or storage usage in the file
rg -n "creds|token\.json|Storage" backend/server.py -B 2 -A 2Repository: AOSSIE-Org/EduAid
Length of output: 633
🏁 Script executed:
# Search for the function/endpoint containing these lines
sed -n '330,365p' backend/server.pyRepository: AOSSIE-Org/EduAid
Length of output: 1582
🏁 Script executed:
# Search for other OAuth usage patterns in the codebase
rg "file.Storage|store.get|\.get\(\)" backend/ -B 2 -A 2Repository: AOSSIE-Org/EduAid
Length of output: 336
🏁 Script executed:
# Look for imports to understand which library is being used
head -50 backend/server.py | grep -n "import\|from"Repository: AOSSIE-Org/EduAid
Length of output: 963
🏁 Script executed:
# Verify oauth2client file.Storage API - check if get() method exists
pip install oauth2client 2>/dev/null && python << 'EOF'
from oauth2client import file
import inspect
# Check file.Storage class
storage_class = file.Storage
print("file.Storage methods:")
for name, method in inspect.getmembers(storage_class, predicate=inspect.ismethod):
if not name.startswith('_'):
print(f" - {name}")
# Check for get method
if hasattr(storage_class, 'get'):
print("\nfile.Storage.get() exists: True")
else:
print("\nfile.Storage.get() exists: False")
# Show the typical pattern
print("\nTypical oauth2client token loading pattern:")
print(" store = file.Storage('token.json')")
print(" creds = store.get() # Load cached credentials")
print(" if not creds or creds.invalid:")
print(" # Run flow and get new credentials")
EOFRepository: AOSSIE-Org/EduAid
Length of output: 1560
🏁 Script executed:
# Check if there are other Flask routes in the same file that handle OAuth correctly
rg -n "@app.route|def " backend/server.py | head -30Repository: AOSSIE-Org/EduAid
Length of output: 1215
Missing credential loading from cache forces fresh OAuth flow on every request.
Line 360 initializes creds = None, but store.get() is never called to load cached credentials from token.json. Since the condition on line 361 evaluates to True on the first check, the OAuth flow always runs even if valid credentials exist in the cache. Add creds = store.get() immediately after line 359 to retrieve cached credentials before the validity check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 359 - 363, The code always forces the OAuth
flow because creds is set to None instead of loading cached credentials; after
creating the store with file.Storage("token.json") call, retrieve cached
credentials by assigning creds = store.get() before checking creds.invalid so
existing valid tokens are reused; update the logic around creds, file.Storage,
client.flow_from_clientsecrets and tools.run_flow to only invoke the OAuth flow
when creds is missing or creds.invalid.
| edit_url = jsonify(result["responderUri"]) | ||
| webbrowser.open_new_tab( | ||
| "https://docs.google.com/forms/d/" + result["formId"] + "/edit" | ||
| ) | ||
| return edit_url |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and read the specified lines
if [ -f "backend/server.py" ]; then
echo "=== Lines 513-517 from backend/server.py ==="
sed -n '513,517p' "backend/server.py"
echo ""
echo "=== Context: Lines 505-525 ==="
sed -n '505,525p' "backend/server.py"
else
echo "File not found at backend/server.py"
# Search for server.py
find . -name "server.py" -type f 2>/dev/null | head -20
fiRepository: AOSSIE-Org/EduAid
Length of output: 1053
🏁 Script executed:
# Check if webbrowser is imported in the file
rg "import webbrowser|from webbrowser" backend/server.py -A 2 -B 2Repository: AOSSIE-Org/EduAid
Length of output: 177
🏁 Script executed:
# Get more context around this code to understand the function
sed -n '475,525p' "backend/server.py"Repository: AOSSIE-Org/EduAid
Length of output: 1904
🏁 Script executed:
# Check if there are any tests or clients that depend on this behavior
rg "generate_gform|/generate_gform" --type py -iRepository: AOSSIE-Org/EduAid
Length of output: 248
Don't open the form editor from the API process.
webbrowser.open_new_tab() runs on the server host, not in the caller's browser. In headless/container deployments this is usually a no-op or failure, and the client never receives the edit URL it actually needs. Additionally, the current code returns only the responder URI, not the constructed edit URL.
↩️ Return the URLs instead of trying to launch a browser
- edit_url = jsonify(result["responderUri"])
- webbrowser.open_new_tab(
- "https://docs.google.com/forms/d/" + result["formId"] + "/edit"
- )
- return edit_url
+ return jsonify(
+ {
+ "responder_url": result["responderUri"],
+ "edit_url": f"https://docs.google.com/forms/d/{result['formId']}/edit",
+ }
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| edit_url = jsonify(result["responderUri"]) | |
| webbrowser.open_new_tab( | |
| "https://docs.google.com/forms/d/" + result["formId"] + "/edit" | |
| ) | |
| return edit_url | |
| return jsonify( | |
| { | |
| "responder_url": result["responderUri"], | |
| "edit_url": f"https://docs.google.com/forms/d/{result['formId']}/edit", | |
| } | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 513 - 517, Remove the server-side browser
launch and instead return the constructed edit URL alongside the responder URI:
stop calling webbrowser.open_new_tab in this block, build the edit URL using
result["formId"] (e.g. "https://docs.google.com/forms/d/" + result["formId"] +
"/edit"), set edit_url to that constructed string (not jsonify of responderUri),
and return a JSON payload containing both edit_url and result["responderUri"]
(use jsonify to wrap the response). Ensure no server-side side-effect like
webbrowser.open_new_tab is performed.
| # Validate file type | ||
| allowed_extensions = {'pdf', 'txt', 'docx', 'doc'} | ||
| if not ('.' in file.filename and file.filename.rsplit('.', 1)[1].lower() in allowed_extensions): | ||
| return create_error_response("Unsupported file type. Allowed types: pdf, txt, docx, doc") | ||
|
|
||
| content = file_processor.process_file(file) | ||
|
|
||
| if content: | ||
| return jsonify({"content": content}) | ||
| else: | ||
| return jsonify({"error": "Unsupported file type or error processing file"}), 400 | ||
| content = file_processor.process_file(file) |
There was a problem hiding this comment.
This upload gate trusts the filename, not the file.
Line 618 only checks the extension, so a renamed arbitrary binary still reaches file_processor.process_file(). That weakens the new validation boundary and can turn bad uploads into parser failures/500s instead of a clean 400.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 616 - 621, The current upload gate only
checks file.filename extension (allowed_extensions) and can be bypassed by
renaming; update validation to inspect the actual file content before calling
file_processor.process_file: read a small prefix or use a MIME/magic detection
library (e.g., python-magic) to detect PDF, plain text, or Office formats and
ensure the detected MIME/signature matches the expected extension type; if
detection fails or disagrees with allowed_extensions, return the same
create_error_response("Unsupported file type...") and do not call
file_processor.process_file. Ensure you reference file.filename for extension
lookup but make the content-based check authoritative.
| def test_input_text_validation(): | ||
| """Test input text validation.""" | ||
| print("=" * 60) | ||
| print("TEST: Input Text Validation") | ||
| print("=" * 60) | ||
|
|
||
| # Valid text | ||
| try: | ||
| result = InputValidator.validate_input_text("This is valid text") | ||
| print("✓ Valid text accepted:", repr(result)) | ||
| except ValidationError as e: | ||
| print("✗ Failed:", e) | ||
|
|
||
| # Empty text | ||
| try: | ||
| InputValidator.validate_input_text("") | ||
| print("✗ Empty text should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Empty text rejected:", e) | ||
|
|
||
| # Text with whitespace only | ||
| try: | ||
| InputValidator.validate_input_text(" ") | ||
| print("✗ Whitespace-only text should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Whitespace-only text rejected:", e) | ||
|
|
||
| # Text too long | ||
| try: | ||
| long_text = "a" * 60000 | ||
| InputValidator.validate_input_text(long_text) | ||
| print("✗ Too-long text should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Too-long text rejected:", e) | ||
|
|
||
| # Non-string input | ||
| try: | ||
| InputValidator.validate_input_text(12345) | ||
| print("✗ Non-string input should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Non-string input rejected:", e) | ||
|
|
||
| print() | ||
|
|
||
|
|
||
| def test_max_questions_validation(): | ||
| """Test max_questions validation.""" | ||
| print("=" * 60) | ||
| print("TEST: Max Questions Validation") | ||
| print("=" * 60) | ||
|
|
||
| # Valid values | ||
| for val in [1, 5, 25, 50]: | ||
| try: | ||
| result = InputValidator.validate_max_questions(val) | ||
| print(f"✓ Value {val} accepted") | ||
| except ValidationError as e: | ||
| print(f"✗ Failed for {val}: {e}") | ||
|
|
||
| # Too small | ||
| try: | ||
| InputValidator.validate_max_questions(0) | ||
| print("✗ Value 0 should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Value 0 rejected:", e) | ||
|
|
||
| # Too large | ||
| try: | ||
| InputValidator.validate_max_questions(100) | ||
| print("✗ Value 100 should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Value 100 rejected:", e) | ||
|
|
||
| # Non-integer | ||
| try: | ||
| InputValidator.validate_max_questions("abc") | ||
| print("✗ Non-integer should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Non-integer rejected:", e) | ||
|
|
||
| # String number (should convert) | ||
| try: | ||
| result = InputValidator.validate_max_questions("10") | ||
| print(f"✓ String number '10' converted to: {result}") | ||
| except ValidationError as e: | ||
| print(f"✗ Failed: {e}") | ||
|
|
||
| print() | ||
|
|
||
|
|
||
| def test_use_mediawiki_validation(): | ||
| """Test use_mediawiki validation.""" | ||
| print("=" * 60) | ||
| print("TEST: MediaWiki Flag Validation") | ||
| print("=" * 60) | ||
|
|
||
| # Valid values | ||
| for val in [0, 1]: | ||
| try: | ||
| result = InputValidator.validate_use_mediawiki(val) | ||
| print(f"✓ Value {val} accepted") | ||
| except ValidationError as e: | ||
| print(f"✗ Failed for {val}: {e}") | ||
|
|
||
| # Invalid values | ||
| for val in [2, -1, 10]: | ||
| try: | ||
| InputValidator.validate_use_mediawiki(val) | ||
| print(f"✗ Value {val} should have failed") | ||
| except ValidationError as e: | ||
| print(f"✓ Value {val} rejected:", e) | ||
|
|
||
| print() | ||
|
|
||
|
|
||
| def test_document_url_validation(): | ||
| """Test document URL validation.""" | ||
| print("=" * 60) | ||
| print("TEST: Document URL Validation") | ||
| print("=" * 60) | ||
|
|
||
| # Valid URLs | ||
| valid_urls = [ | ||
| "https://docs.google.com/document/d/1ABC123/edit", | ||
| "http://example.com/path" | ||
| ] | ||
| for url in valid_urls: | ||
| try: | ||
| result = InputValidator.validate_document_url(url) | ||
| print(f"✓ URL accepted: {url[:50]}...") | ||
| except ValidationError as e: | ||
| print(f"✗ Failed: {e}") | ||
|
|
||
| # Invalid URLs | ||
| invalid_urls = [ | ||
| ("", "empty string"), | ||
| (" ", "whitespace only"), | ||
| ("not-a-url", "no protocol"), | ||
| ("ftp://example.com", "wrong protocol"), | ||
| ] | ||
| for url, desc in invalid_urls: | ||
| try: | ||
| InputValidator.validate_document_url(url) | ||
| print(f"✗ {desc} should have failed") | ||
| except ValidationError as e: | ||
| print(f"✓ {desc} rejected") | ||
|
|
||
| print() | ||
|
|
||
|
|
||
| def test_question_list_validation(): | ||
| """Test question list validation.""" | ||
| print("=" * 60) | ||
| print("TEST: Question List Validation") | ||
| print("=" * 60) | ||
|
|
||
| # Valid list | ||
| try: | ||
| result = InputValidator.validate_question_list([ | ||
| "What is AI?", | ||
| "How does ML work?" | ||
| ]) | ||
| print(f"✓ Valid question list accepted: {len(result)} questions") | ||
| except ValidationError as e: | ||
| print(f"✗ Failed: {e}") | ||
|
|
||
| # Empty list (not allowed by default) | ||
| try: | ||
| InputValidator.validate_question_list([]) | ||
| print("✗ Empty list should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Empty list rejected:", e) | ||
|
|
||
| # Empty list (allowed with flag) | ||
| try: | ||
| result = InputValidator.validate_question_list([], allow_empty=True) | ||
| print("✓ Empty list accepted when allowed") | ||
| except ValidationError as e: | ||
| print(f"✗ Failed: {e}") | ||
|
|
||
| # Invalid item | ||
| try: | ||
| InputValidator.validate_question_list(["Valid", 123]) | ||
| print("✗ Non-string item should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Non-string item rejected:", e) | ||
|
|
||
| # Empty string item | ||
| try: | ||
| InputValidator.validate_question_list(["Valid", ""]) | ||
| print("✗ Empty string item should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Empty string item rejected:", e) | ||
|
|
||
| print() | ||
|
|
||
|
|
||
| def test_qa_pairs_validation(): | ||
| """Test QA pairs validation.""" | ||
| print("=" * 60) | ||
| print("TEST: QA Pairs Validation") | ||
| print("=" * 60) | ||
|
|
||
| # Valid QA pairs | ||
| try: | ||
| result = InputValidator.validate_qa_pairs([ | ||
| {"question": "Q1?", "answer": "A1"}, | ||
| {"question": "Q2?", "answer": "A2"}, | ||
| ]) | ||
| print(f"✓ Valid QA pairs accepted: {len(result)} pairs") | ||
| except ValidationError as e: | ||
| print(f"✗ Failed: {e}") | ||
|
|
||
| # Missing question field | ||
| try: | ||
| InputValidator.validate_qa_pairs([ | ||
| {"answer": "A1"} | ||
| ]) | ||
| print("✗ Missing 'question' field should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Missing field rejected:", e) | ||
|
|
||
| # Empty answer | ||
| try: | ||
| InputValidator.validate_qa_pairs([ | ||
| {"question": "Q1?", "answer": ""} | ||
| ]) | ||
| print("✗ Empty answer should have failed") | ||
| except ValidationError as e: | ||
| print("✓ Empty answer rejected:", e) | ||
|
|
||
| print() | ||
|
|
||
|
|
||
| def test_question_type_validation(): | ||
| """Test question type validation.""" | ||
| print("=" * 60) | ||
| print("TEST: Question Type Validation") | ||
| print("=" * 60) | ||
|
|
||
| # Valid types | ||
| for qtype in ["get_shortq", "get_mcq", "get_boolq"]: | ||
| try: | ||
| result = InputValidator.validate_question_type(qtype) | ||
| print(f"✓ Type '{qtype}' accepted") | ||
| except ValidationError as e: | ||
| print(f"✗ Failed: {e}") | ||
|
|
||
| # Invalid types | ||
| for qtype in ["invalid", "GET_MCQ", ""]: | ||
| try: | ||
| InputValidator.validate_question_type(qtype) | ||
| print(f"✗ Type '{qtype}' should have failed") | ||
| except ValidationError as e: | ||
| print(f"✓ Type '{qtype}' rejected") | ||
|
|
||
| print() | ||
|
|
||
|
|
||
| def test_error_response(): | ||
| """Test error response formatting.""" | ||
| print("=" * 60) | ||
| print("TEST: Error Response Formatting") | ||
| print("=" * 60) | ||
|
|
||
| response, status = create_error_response("Test error message") | ||
| print(f"✓ Default error response: {response}") | ||
| print(f" Status code: {status}") | ||
|
|
||
| response, status = create_error_response("Custom error", status_code=400) | ||
| print(f"✓ Custom error response: {response}") | ||
| print(f" Status code: {status}") | ||
|
|
||
| print() | ||
|
|
||
|
|
||
| def main(): | ||
| """Run all tests.""" | ||
| print("\n") | ||
| print("╔" + "=" * 58 + "╗") | ||
| print("║" + " " * 15 + "INPUT VALIDATOR TEST SUITE" + " " * 17 + "║") | ||
| print("╚" + "=" * 58 + "╝") | ||
| print() | ||
|
|
||
| test_input_text_validation() | ||
| test_max_questions_validation() | ||
| test_use_mediawiki_validation() | ||
| test_document_url_validation() | ||
| test_question_list_validation() | ||
| test_qa_pairs_validation() | ||
| test_question_type_validation() | ||
| test_error_response() | ||
|
|
||
| print("=" * 60) | ||
| print("All tests completed!") | ||
| print("=" * 60) | ||
| print() | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| main() |
There was a problem hiding this comment.
These checks won’t fail the build when validation regresses.
The negative cases catch ValidationError and print status messages, but they never assert that the exception was required. As written, this file can exit green even if a validator starts accepting invalid input, so it doesn’t provide the coverage guarantee the PR description is relying on.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/test_validators.py` around lines 12 - 312, The tests currently only
print pass/fail messages so regressions won’t fail CI; update each negative-case
block in test_input_text_validation, test_max_questions_validation,
test_use_mediawiki_validation, test_document_url_validation,
test_question_list_validation, test_qa_pairs_validation, and
test_question_type_validation to assert that InputValidator.validate_* raises
ValidationError (use pytest.raises or explicit try/except with a final assert
False) and add positive-case assertions on returned values (e.g., result is
str/int/list or has expected length) so invalid inputs cause test failures and
valid inputs are strongly checked; keep create_error_response checks asserting
returned tuple contents/status codes rather than just printing.
| def validate_document_url(url: Any) -> str: | ||
| """ | ||
| Validate Google Docs URL. | ||
|
|
||
| Args: | ||
| url: The URL to validate | ||
|
|
||
| Returns: | ||
| Validated URL | ||
|
|
||
| Raises: | ||
| ValidationError: If validation fails | ||
| """ | ||
| if not isinstance(url, str): | ||
| raise ValidationError( | ||
| f"document_url must be a string, got {type(url).__name__}" | ||
| ) | ||
|
|
||
| url = url.strip() | ||
|
|
||
| if not url: | ||
| raise ValidationError("document_url cannot be empty") | ||
|
|
||
| if len(url) > InputValidator.MAX_URL_LENGTH: | ||
| raise ValidationError( | ||
| f"document_url exceeds maximum length of {InputValidator.MAX_URL_LENGTH} characters" | ||
| ) | ||
|
|
||
| if not url.startswith("http://") and not url.startswith("https://"): | ||
| raise ValidationError("document_url must be a valid HTTP(S) URL") | ||
|
|
||
| return url |
There was a problem hiding this comment.
Reject non-Google URLs at the validator boundary.
validate_document_url() currently accepts any http(s) URL, so /get_content still passes arbitrary links into docs_service.get_document_content() instead of rejecting them up front. That means invalid input can bypass this validation layer and the caller gets whatever downstream error happens next, rather than the clear 400 this PR is trying to guarantee.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/validators.py` around lines 151 - 182, validate_document_url
currently accepts any http(s) URL; update it to reject non-Google URLs by
ensuring the URL belongs to Google Docs/Drive before returning. Specifically,
after the existing scheme and length checks in validate_document_url, parse the
URL (or simply check the host/path) and assert it matches accepted Google
patterns (e.g., host contains "docs.google.com" or "drive.google.com" or the
path pattern for Google Docs document IDs) and raise ValidationError if not;
this prevents arbitrary links from being forwarded to
docs_service.get_document_content and guarantees a clear 400 response.
| def validate_qa_pairs(qa_pairs: Any) -> List[Dict[str, Any]]: | ||
| """ | ||
| Validate list of QA pairs. | ||
|
|
||
| Args: | ||
| qa_pairs: The QA pairs list to validate | ||
|
|
||
| Returns: | ||
| Validated QA pairs | ||
|
|
||
| Raises: | ||
| ValidationError: If validation fails | ||
| """ | ||
| if not isinstance(qa_pairs, list): | ||
| raise ValidationError( | ||
| f"qa_pairs must be a list, got {type(qa_pairs).__name__}" | ||
| ) | ||
|
|
||
| if len(qa_pairs) == 0: | ||
| raise ValidationError("qa_pairs cannot be empty") | ||
|
|
||
| for i, pair in enumerate(qa_pairs): | ||
| if not isinstance(pair, dict): | ||
| raise ValidationError( | ||
| f"qa_pairs[{i}] must be a dictionary, got {type(pair).__name__}" | ||
| ) | ||
|
|
||
| if "question" not in pair: | ||
| raise ValidationError(f"qa_pairs[{i}] missing 'question' field") | ||
|
|
||
| if "answer" not in pair: | ||
| raise ValidationError(f"qa_pairs[{i}] missing 'answer' field") | ||
|
|
||
| if not isinstance(pair["question"], str): | ||
| raise ValidationError( | ||
| f"qa_pairs[{i}]['question'] must be a string" | ||
| ) | ||
|
|
||
| if not isinstance(pair["answer"], str): | ||
| raise ValidationError( | ||
| f"qa_pairs[{i}]['answer'] must be a string" | ||
| ) | ||
|
|
||
| if not pair["question"].strip(): | ||
| raise ValidationError( | ||
| f"qa_pairs[{i}]['question'] cannot be empty" | ||
| ) | ||
|
|
||
| if not pair["answer"].strip(): | ||
| raise ValidationError( | ||
| f"qa_pairs[{i}]['answer'] cannot be empty" | ||
| ) | ||
|
|
||
| return qa_pairs |
There was a problem hiding this comment.
MCQ qa_pairs still allow malformed options.
validate_qa_pairs() ignores the optional options field, but backend/server.py Lines 396-425 assume it is a list of strings when question_type == "get_mcq". A payload with options: 1 still passes validation and then fails during iteration; a missing options field degrades into a one-choice MCQ instead of a 400.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/validators.py` around lines 292 - 345, validate_qa_pairs currently
ignores the optional "options" field which leads to runtime errors in server
logic that expects options to be a list of strings for question_type ==
"get_mcq"; update validate_qa_pairs to validate that if "options" is present it
is a non-empty list and each element is a non-empty string, and also require
"options" for MCQ payloads (when question_type == "get_mcq") or otherwise return
a ValidationError; specifically modify validate_qa_pairs to check
pair.get("options") is either absent (for non-MCQ) or a list of strings (each
strip()-nonempty) and surface a clear ValidationError referencing the offending
index (qa_pairs[{i}]['options']) so server code that iterates over options won't
crash.
Issue
API endpoints don't validate user inputs, causing:
What I Did
✅ Created validators.py - validates all inputs before processing
✅ Updated 17 endpoints - added validation and error handling
✅ Added tests - 50+ tests, all passing
✅ Returns clear error messages - e.g., "max_questions cannot exceed 50, got 100"
Endpoints Updated
/get_mcq,/get_boolq,/get_shortq,/get_shortq_llm,/get_mcq_llm,/get_boolq_llm,/get_problems_llm,/get_problems/get_mcq_answer,/get_shortq_answer,/get_boolean_answer/get_content,/generate_gform,/get_shortq_hard,/get_mcq_hard,/get_boolq_hard,/uploadBackward Compatible
✅ All existing valid requests still work
Summary by CodeRabbit
New Features
Documentation
Tests