Fix: robust timeout handling for ML endpoints with multiprocessing and parallel execution #596
mahek2016 wants to merge 23 commits into AOSSIE-Org:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds a thread-based timeout wrapper for model/external calls, global request-size and per-route rate limits, centralized silent JSON parsing with input validation and standardized error payloads, concurrent generation for /get_problems, yt-dlp-based transcript fetching with subprocess timeouts, and updates to multiple endpoints to return consistent JSON shapes.
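The subprocess-timeout piece of the change can be sketched roughly as follows. This is an illustrative helper, not the PR's actual code: the name `run_with_timeout` and the `(result, error)` return shape simply mirror the tuple convention described later in this review.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout):
    """Run an external command (e.g. yt-dlp) and kill it after `timeout` seconds.

    Returns (stdout, None) on success or (None, error_string) on failure.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        if proc.returncode != 0:
            return None, proc.stderr.strip() or "subprocess_error"
        return proc.stdout, None
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising
        return None, "timeout"

# Example with a harmless stand-in command instead of yt-dlp:
out, err = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout=10)
```

`subprocess.run(..., timeout=...)` terminates the child on expiry, which is what makes this approach safe for external tools, unlike the in-process thread timeouts discussed below.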
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client as Client
    participant Flask as Flask App
    participant Limiter as Rate Limiter
    participant Executor as ThreadPool Executor
    participant Worker as Worker (generator / subprocess)
    participant External as ML Model / yt-dlp / External API
    Client->>Flask: POST /get_mcq or /get_problems
    Flask->>Limiter: check per-endpoint rate limit
    alt rate limited
        Limiter-->>Client: 429 JSON
    else allowed
        Flask->>Executor: execute_with_timeout(func, timeout, args...)
        Executor->>Worker: run generator or subprocess
        Worker->>External: call ML model or yt-dlp
        alt success
            External-->>Worker: result
            Worker-->>Executor: (result, None)
            Executor-->>Flask: (result, None)
            Flask-->>Client: 200 JSON (structured success)
        else timeout
            Worker-->>Executor: (None, "timeout")
            Executor-->>Flask: (None, "timeout")
            Flask-->>Client: 504 JSON (standardized)
        else internal error
            Worker-->>Executor: (None, error)
            Executor-->>Flask: (None, error)
            Flask-->>Client: 500 JSON (standardized)
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/server.py (2)
483-487: ⚠️ Potential issue | 🔴 Critical: OAuth interactive flow will fail in server/production context.
`tools.run_flow()` opens a browser for interactive OAuth consent. This:
- Blocks the request thread indefinitely waiting for user interaction
- Fails entirely in headless server environments
- Creates race conditions if multiple requests trigger auth simultaneously
For server-to-server auth, use a service account or pre-authorize and store refresh tokens.
🔧 Suggested approach using service account

```python
from google.oauth2 import service_account

SCOPES = ["https://www.googleapis.com/auth/forms.body"]
creds = service_account.Credentials.from_service_account_file(
    "service_account_key.json", scopes=SCOPES
)
form_service = discovery.build("forms", "v1", credentials=creds, ...)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 483 - 487, The current code uses the interactive OAuth flow (client.flow_from_clientsecrets and tools.run_flow) with file.Storage and creds, which will block/fail in server environments; replace this with non-interactive server auth by loading service account credentials (use google.oauth2.service_account.Credentials.from_service_account_file with the SCOPES) and pass that creds into discovery.build when creating the Forms API client, removing the tools.run_flow/client.flow_from_clientsecrets logic and file.Storage usage; alternatively, if using a user account, load a pre-authorized refresh token from secure storage and refresh it programmatically instead of calling tools.run_flow.
644-672: ⚠️ Potential issue | 🟠 Major: Missing timeout protection and input validation on `/get_shortq_hard` (and similar `_hard` endpoints).
These endpoints call `qg.generate()` directly without `execute_with_timeout`, contradicting the PR's objective to add timeout protection to all ML endpoints. They also lack the input length validation (10-50000 chars) applied to other endpoints. Additionally, `num_questions=len(input_questions)` will be 0 if `input_questions` is empty, potentially causing unexpected behavior.
🔧 Proposed fix for /get_shortq_hard (apply similarly to _mcq_hard and _boolq_hard)
```diff
 def get_shortq_hard():
     data = request.get_json(silent=True)
     if data is None or not isinstance(data, dict):
         return jsonify({...}), 400

     input_text = data.get("input_text", "")
     use_mediawiki = data.get("use_mediawiki", 0)
+    input_questions = data.get("input_question", [])
+
+    if not isinstance(input_text, str):
+        return jsonify({"error": "input_text must be string"}), 400
+
+    input_text = input_text.strip()
+
+    if len(input_text) < 10:
+        return jsonify({"error": "Input too short"}), 400
+
+    if len(input_text) > 50000:
+        return jsonify({"error": "Input too long"}), 400
+
+    if not input_questions:
+        return jsonify({"error": "input_question is required"}), 400

     input_text = process_input_text(input_text, use_mediawiki)
-    input_questions = data.get("input_question", [])

-    output = qg.generate(...)
+    output, error = execute_with_timeout(
+        qg.generate,
+        60,
+        article=input_text,
+        num_questions=len(input_questions),
+        answer_style="sentences"
+    )
+
+    if error == "timeout":
+        return jsonify({"error": "Request timed out"}), 504
+    if error:
+        return jsonify({"error": error}), 500
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 644 - 672, The /get_shortq_hard handler calls qg.generate directly and lacks input length checks and protection against zero num_questions; wrap the qg.generate(...) call in the existing execute_with_timeout(...) helper (use the same timeout semantics used by other ML endpoints), validate processed input_text length is between 10 and 50000 characters and return a 400 JSON error if outside that range, and ensure num_questions is never zero (e.g., num_questions = max(1, len(input_questions)) or derive a sensible default) before passing it to qg.generate; preserve post-processing with make_question_harder and propagate execute_with_timeout errors the same way other endpoints do.
🧹 Nitpick comments (4)
backend/server.py (3)
580-588: Simplify redundant key check. The static analyzer flagged this pattern; using `.get()` is cleaner.
♻️ Proposed fix

```diff
-        if "options" in qapair and qapair["options"]:
-            options = qapair.get("options") or []
+        options = qapair.get("options")
+        if options:
             valid_options = [opt for opt in options if opt]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 580 - 588, The current code redundantly checks "if 'options' in qapair and qapair['options']"; replace that pattern by always reading options = qapair.get("options") or [] and proceed — compute valid_options = [opt for opt in options if opt], build choices = [qapair["answer"], *valid_options[:3]] (or ensure answer-only if no options), shuffle and create choices_list as before; update the block around qapair, options, valid_options, choices and choices_list to remove the explicit key presence check and rely on .get().
456-464: Overly broad exception handling obscures root cause. Catching bare `Exception` returns a generic 400 error for all failures, but different errors warrant different responses:
- Authentication/permission errors → 403
- Document not found → 404
- Network/server errors → 500
♻️ Proposed improvement

```diff
     try:
         content = docs_service.get_document_content(doc_id)
         return jsonify({"content": content})
-
-    except Exception:
+    except HttpError as e:
+        if e.resp.status == 404:
+            return jsonify({"error": "Document not found", "code": "not_found"}), 404
+        elif e.resp.status == 403:
+            return jsonify({"error": "Permission denied", "code": "forbidden"}), 403
+        return jsonify({"error": "Failed to retrieve document", "code": "document_fetch_error"}), e.resp.status
+    except Exception as e:
         return jsonify({
-            "error": "Failed to retrieve document content",
+            "error": f"Failed to retrieve document content: {type(e).__name__}",
             "code": "document_fetch_error"
-        }), 400
+        }), 500
```

This requires importing `HttpError` from `googleapiclient.errors`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 456 - 464, Replace the broad except Exception around the docs_service.get_document_content call with specific handlers: import HttpError from googleapiclient.errors and catch HttpError first, inspect the HTTP status (e.g., e.resp.status or e.status_code) to return 403 for 403 responses, 404 for 404 responses, and 500 for other HttpError statuses; also add explicit except FileNotFoundError to return 404 and except PermissionError to return 403, finally keep a generic except Exception that returns a 500 and logs the exception for unexpected errors so the endpoint (docs_service.get_document_content) returns appropriate status codes and preserves root-cause details.
365-378: Consider adding timeout protection to QA model calls. The `/get_mcq_answer`, `/get_shortq_answer`, and `/get_boolean_answer` endpoints call `qa_model()` and `answer.predict_boolean_answer()` without timeout protection. While these are typically faster than generation, they can still hang on pathological inputs. For consistency with the PR's timeout protection goals, consider wrapping these calls.
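A minimal thread-based wrapper in the spirit of this suggestion might look like the sketch below. The names `call_with_timeout` and `_qa_pool` are hypothetical; the `(result, error)` shape just mirrors the `execute_with_timeout` convention used elsewhere in the PR.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

# Shared pool so each request does not pay thread-creation cost
_qa_pool = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, timeout, *args, **kwargs):
    """Run fn in a worker thread, returning (result, None) or (None, error)."""
    future = _qa_pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout), None
    except FuturesTimeout:
        return None, "timeout"
    except Exception as e:
        # Surface model errors as strings, matching the tuple convention
        return None, f"{type(e).__name__}: {e}"
```

Note that a timed-out thread keeps running in the background: this bounds the client's wait, not the model's CPU use.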
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 365 - 378, Wrap calls to qa_model(...) and answer.predict_boolean_answer(...) with a timeout so they cannot hang indefinitely; specifically, in the loop that calls qa_model (the block using variables input_questions, input_options and computing options_with_answer) and in the handlers for the /get_mcq_answer, /get_shortq_answer, and /get_boolean_answer endpoints, execute the model calls via a timeout-enabled executor (e.g., concurrent.futures with future.result(timeout=...) or asyncio.wait_for) and handle TimeoutError by logging the timeout and returning a safe fallback (skip the question, return a default/empty answer, or a 504 response) so the server remains responsive. Ensure you catch and log the timeout and other exceptions around qa_model(...) and answer.predict_boolean_answer(...) and keep existing similarity-selection logic (vectorizer, cosine_similarity) intact when the call succeeds.

requirements.txt (1)
35-36: Pin `yt-dlp` to a specific version for reproducibility.
`yt-dlp` is unpinned while `Flask-Limiter==3.5.0` is pinned. Since `yt-dlp` is actively developed with frequent breaking changes, an unpinned version could cause unexpected failures in production. Given that the latest version is 2026.3.13, consider updating to a recent pinned version:
📦 Proposed fix

```diff
 Flask-Limiter==3.5.0
-yt-dlp
+yt-dlp==2026.3.13
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements.txt` around lines 35 - 36, The requirements file currently pins Flask-Limiter but leaves yt-dlp unpinned (the lines containing "Flask-Limiter==3.5.0" and "yt-dlp"); pin yt-dlp to a specific, recent stable version (for example yt-dlp==2026.3.13 or another vetted release) to ensure reproducible installs, update the requirements.txt entry for "yt-dlp" to the chosen pinned version, and commit the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 323-336: The current logic assumes mcq_out, bool_out, and
short_out are dicts and calls .get() even when execute_with_timeout may return
None (e.g., returns (None, "unknown_error")), causing AttributeError; update the
checks around the response assembly to defensively handle None/empty outputs by
using a safe accessor (e.g., replace mcq_out.get(...) with (mcq_out or
{}).get(...) or explicitly check isinstance(mcq_out, dict)) and do the same for
bool_out and short_out so that when mcq_err/bool_err/short_err is falsy but the
corresponding *_out is None or missing keys you fall back to empty lists instead
of calling .get() on None; reference the variables mcq_out, bool_out, short_out
and the function execute_with_timeout when making the change.
- Around line 76-79: The limiter is using get_remote_address so behind a reverse
proxy all requests will show the proxy IP; update the rate-limiting key to use
the real client IP by either applying werkzeug.middleware.proxy_fix.ProxyFix to
app.wsgi_app (e.g., wrap app with ProxyFix(x_for=1, ...) and mark your proxy as
trusted) or replace the key_func passed to Limiter with a small function that
returns request.headers.get('X-Forwarded-For', request.remote_addr). Ensure you
modify the Limiter instantiation (the limiter variable) to use the new key
function instead of get_remote_address and document/configure trusted proxy
behavior accordingly.
- Around line 42-68: The current execute_with_timeout spawns a new Process with
a bound method (e.g., MCQGen.generate_mcq, BoolQGen.generate_boolq,
ShortQGen.generate_shortq) which forces pickling of heavy model state
per-request; instead refactor to a worker pool pattern so models are initialized
once per worker process and tasks call those existing instances: create a Pool
or long-lived worker processes with an initializer that constructs generator
instances (MCQGen/BoolQGen/ShortQGen) in-process, then submit jobs via
apply_async or a task queue with a timeout on the result; alternatively, if you
must keep in-process concurrency, replace the per-call Process in
execute_with_timeout with thread-based timeouts or use a dedicated worker
process that receives lightweight task payloads (not bound methods) and returns
results, ensuring p.terminate is no longer used to kill model-bearing processes
to avoid GPU/resource leaks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 05c0db9e-a33a-4e5c-9b3a-28c2c3c18e48
📒 Files selected for processing (2)
backend/server.py, requirements.txt
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/server.py (1)
365-378: ⚠️ Potential issue | 🟡 Minor: Missing timeout protection and input validation for QA endpoints.
The `/get_mcq_answer`, `/get_shortq_answer`, and `/get_boolean_answer` endpoints call `qa_model()` and `answer.predict_boolean_answer()` directly without timeout protection or input text length validation. A malicious or oversized input could cause these endpoints to hang. Consider applying consistent input validation and timeout handling across all ML-backed endpoints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 365 - 378, Add input validation and timeout protection around all ML-backed endpoints (/get_mcq_answer, /get_shortq_answer, /get_boolean_answer): validate and reject overly large or empty input_text and input_questions before calling qa_model or answer.predict_boolean_answer, and wrap calls to qa_model(...) and answer.predict_boolean_answer(...) in a bounded-time execution (e.g., asyncio.wait_for or concurrent.futures with a timeout) so slow or malicious inputs raise a timeout and return a 4xx/5xx response. Update the handlers that call qa_model and answer.predict_boolean_answer to catch validation errors and timeout exceptions and return a clear error response instead of hanging.
♻️ Duplicate comments (1)
backend/server.py (1)
42-68: ⚠️ Potential issue | 🔴 Critical: Multiprocessing approach cannot efficiently handle bound methods with ML model state.
This `execute_with_timeout` implementation passes bound methods (e.g., `MCQGen.generate_mcq`) to `multiprocessing.Process`, which requires pickling the entire generator instance. The `MCQGenerator`, `BoolQGenerator`, and `ShortQGenerator` classes (from `backend/Generator/main.py`) all load heavy objects in `__init__`:
- T5ForConditionalGeneration models
- spaCy NLP pipelines
- Sense2Vec models
- NLTK FreqDist
This causes:
- Multi-GB serialization overhead per request, negating timeout benefits
- Potential `PicklingError`: PyTorch models may fail to pickle
- Resource leaks: `p.terminate()` (line 57) sends SIGTERM but doesn't guarantee GPU memory or file handle cleanup

Consider a worker pool pattern where generators are initialized once per worker process, or use thread-based timeouts with signals for CPU-bound operations.
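The suggested worker-pool pattern can be sketched with `ProcessPoolExecutor` and an initializer. The heavy generator is faked here with a trivial callable, since the real `MCQGenerator` etc. are project-specific; only the shape of the pattern is the point.

```python
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

_model = None  # one per worker process, populated by the initializer

def _init_worker():
    # In the real server this would construct MCQGenerator / BoolQGenerator /
    # ShortQGenerator once per worker; a trivial stand-in keeps this runnable.
    global _model
    _model = str.upper

def _run_task(text):
    # Only the lightweight payload crosses the process boundary, not the model
    return _model(text)

def submit_with_timeout(pool, timeout, payload):
    future = pool.submit(_run_task, payload)
    try:
        return future.result(timeout=timeout), None
    except FuturesTimeout:
        return None, "timeout"

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2, initializer=_init_worker) as pool:
        result, err = submit_with_timeout(pool, 30, "hello")
```

One trade-off to note: unlike `p.terminate()`, a timed-out task here keeps its worker busy, so a production version would need to recycle stuck workers or keep a hard-kill path for runaway jobs.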
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 42 - 68, The current execute_with_timeout wrapper spawns a new multiprocessing.Process with a bound method (e.g., MCQGen.generate_mcq) which forces pickling of heavy model-laden generator instances (MCQGenerator, BoolQGenerator, ShortQGenerator), causing huge serialization overhead, PicklingError risks, and unsafe resource cleanup on terminate; fix by replacing this pattern with a persistent worker pool where each worker process initializes its generator instances once at startup and exposes a lightweight task interface (e.g., task IDs or plain data inputs) that execute_with_timeout (or a new submit_with_timeout) uses to enqueue work, or alternatively switch to a thread-based timeout/signal approach for CPU-bound tasks—ensure you stop passing bound instances into Process/wrapper and instead reference worker-local generators by name, and update execute_with_timeout/wrapper to accept only serializable data payloads and return status via IPC (Queue) without pickling model objects.
🧹 Nitpick comments (4)
backend/server.py (4)
460-464: Broad exception catch obscures failure reasons. Catching bare `Exception` hides whether the failure was due to authentication, network issues, or an invalid document. Consider logging the actual exception for debugging while returning a generic message to the client:

```diff
-    except Exception:
+    except Exception as e:
+        app.logger.error(f"Document fetch failed for doc_id={doc_id}: {e}")
         return jsonify({
             "error": "Failed to retrieve document content",
             "code": "document_fetch_error"
         }), 400
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 460 - 464, The bare except in the document retrieval handler masks root causes; modify the except Exception block that returns jsonify({"error":"Failed to retrieve document content","code":"document_fetch_error"}) to catch/log the real exception: add logging (e.g., use the existing logger or app.logger) to record the exception and stacktrace, and optionally narrow catches to known error types (auth/network/NotFound) while still returning the same generic JSON error to the client; ensure you reference the same exception handling block around the document fetch logic so the logged error includes exception info and context (document id/request params).
47-48: Consider preserving exception type for better diagnostics. Catching bare `Exception` loses the exception type and traceback. For debugging production issues, consider capturing more context:

```diff
 except Exception as e:
-    q.put(("error", str(e)))
+    q.put(("error", f"{type(e).__name__}: {e}"))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 47 - 48, The except block that currently does `except Exception as e: q.put(("error", str(e)))` should preserve exception type and traceback for diagnostics: import the traceback module and change the payload sent to the queue from a single string to include the exception class name and the formatted traceback (e.g., include type(e).__name__ and traceback.format_exc()) so callers receiving `q.put(...)` (the error message producer in backend/server.py) get full context for debugging.
308-316: ThreadPoolExecutor wrapping multiprocessing adds unnecessary overhead. Each thread just spawns a subprocess via `execute_with_timeout`. Consider using `multiprocessing.Pool` or `concurrent.futures.ProcessPoolExecutor` directly to avoid the extra threading layer. However, this is secondary to the fundamental pickling issue with ML model instances.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 308 - 316, The ThreadPoolExecutor is just spawning subprocesses via execute_with_timeout, adding unnecessary threading overhead and exposing pickling problems for ML model instances; replace the ThreadPoolExecutor block that submits run_mcq/run_boolq/run_shortq with a process-based executor (e.g., concurrent.futures.ProcessPoolExecutor or multiprocessing.Pool) and ensure the target call uses execute_with_timeout in the worker process. Also make the tasks picklable by loading any ML model instances inside the worker process (move model initialization out of module/global scope into the body of run_mcq/run_boolq/run_shortq or implement custom __getstate__/__setstate__ for model objects) so the ProcessPoolExecutor can serialize the tasks correctly. Ensure you reference/modify run_mcq, run_boolq, run_shortq, and execute_with_timeout accordingly.
581-588: Simplify redundant key check. The condition already verifies `qapair["options"]` is truthy, making the subsequent `or []` unnecessary:

```diff
-        if "options" in qapair and qapair["options"]:
-            options = qapair.get("options") or []
-            valid_options = [opt for opt in options if opt]
+        options = qapair.get("options")
+        if options:
+            valid_options = [opt for opt in options if opt]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 581 - 588, The code redundantly uses qapair.get("options") or [] after already checking "options" in qapair and qapair["options"] is truthy; replace that line with a direct access (options = qapair["options"]) to simplify the logic and keep behavior identical, leaving the rest of the block (valid_options, choices, choices_list) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 663-667: The qg.generate(...) call in the hard-question endpoints
(/get_shortq_hard, /get_mcq_hard, /get_boolq_hard) lacks timeout protection and
input-length validation; wrap the qg.generate invocation with the same
execute_with_timeout utility used elsewhere (use the same timeout value) and
enforce the same input text length checks (10-50000 chars) before calling it;
update the handler functions that call qg.generate to validate the incoming
article text and then call execute_with_timeout(() => qg.generate(...)) so the
endpoint returns a timeout/error consistently instead of hanging.
- Around line 456-458: The endpoint currently passes a raw doc_id to
get_document_content which expects a full document URL (since
extract_document_id(url) looks for '/document/d/{id}'); construct the full URL
that matches that pattern before calling get_document_content (e.g. build a
string like f"{<docs host>}/document/d/{doc_id}" or use
docs_service.base_url/format helper if available) and pass that URL to
docs_service.get_document_content(document_url) instead of the raw doc_id.
---
Outside diff comments:
In `@backend/server.py`:
- Around line 365-378: Add input validation and timeout protection around all
ML-backed endpoints (/get_mcq_answer, /get_shortq_answer, /get_boolean_answer):
validate and reject overly large or empty input_text and input_questions before
calling qa_model or answer.predict_boolean_answer, and wrap calls to
qa_model(...) and answer.predict_boolean_answer(...) in a bounded-time execution
(e.g., asyncio.wait_for or concurrent.futures with a timeout) so slow or
malicious inputs raise a timeout and return a 4xx/5xx response. Update the
handlers that call qa_model and answer.predict_boolean_answer to catch
validation errors and timeout exceptions and return a clear error response
instead of hanging.
---
Duplicate comments:
In `@backend/server.py`:
- Around line 42-68: The current execute_with_timeout wrapper spawns a new
multiprocessing.Process with a bound method (e.g., MCQGen.generate_mcq) which
forces pickling of heavy model-laden generator instances (MCQGenerator,
BoolQGenerator, ShortQGenerator), causing huge serialization overhead,
PicklingError risks, and unsafe resource cleanup on terminate; fix by replacing
this pattern with a persistent worker pool where each worker process initializes
its generator instances once at startup and exposes a lightweight task interface
(e.g., task IDs or plain data inputs) that execute_with_timeout (or a new
submit_with_timeout) uses to enqueue work, or alternatively switch to a
thread-based timeout/signal approach for CPU-bound tasks—ensure you stop passing
bound instances into Process/wrapper and instead reference worker-local
generators by name, and update execute_with_timeout/wrapper to accept only
serializable data payloads and return status via IPC (Queue) without pickling
model objects.
---
Nitpick comments:
In `@backend/server.py`:
- Around line 460-464: The bare except in the document retrieval handler masks
root causes; modify the except Exception block that returns
jsonify({"error":"Failed to retrieve document
content","code":"document_fetch_error"}) to catch/log the real exception: add
logging (e.g., use the existing logger or app.logger) to record the exception
and stacktrace, and optionally narrow catches to known error types
(auth/network/NotFound) while still returning the same generic JSON error to the
client; ensure you reference the same exception handling block around the
document fetch logic so the logged error includes exception info and context
(document id/request params).
- Around line 47-48: The except block that currently does `except Exception as
e: q.put(("error", str(e)))` should preserve exception type and traceback for
diagnostics: import the traceback module and change the payload sent to the
queue from a single string to include the exception class name and the formatted
traceback (e.g., include type(e).__name__ and traceback.format_exc()) so callers
receiving `q.put(...)` (the error message producer in backend/server.py) get
full context for debugging.
- Around line 308-316: The ThreadPoolExecutor is just spawning subprocesses via
execute_with_timeout, adding unnecessary threading overhead and exposing
pickling problems for ML model instances; replace the ThreadPoolExecutor block
that submits run_mcq/run_boolq/run_shortq with a process-based executor (e.g.,
concurrent.futures.ProcessPoolExecutor or multiprocessing.Pool) and ensure the
target call uses execute_with_timeout in the worker process. Also make the tasks
picklable by loading any ML model instances inside the worker process (move
model initialization out of module/global scope into the body of
run_mcq/run_boolq/run_shortq or implement custom __getstate__/__setstate__ for
model objects) so the ProcessPoolExecutor can serialize the tasks correctly.
Ensure you reference/modify run_mcq, run_boolq, run_shortq, and
execute_with_timeout accordingly.
- Around line 581-588: The code redundantly uses qapair.get("options") or []
after already checking "options" in qapair and qapair["options"] is truthy;
replace that line with a direct access (options = qapair["options"]) to simplify
the logic and keep behavior identical, leaving the rest of the block
(valid_options, choices, choices_list) unchanged.
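The traceback-preserving payload described in the first item above could be sketched like this; `run_task` is a hypothetical worker body standing in for the real error-producing code in backend/server.py:

```python
import traceback

def run_task(q, func, *args):
    # Hypothetical worker body: enqueue the exception class name and the
    # formatted traceback instead of only str(e), so the consumer of the
    # queue gets full context for debugging.
    try:
        q.put(("ok", func(*args)))
    except Exception as e:
        q.put(("error", {
            "type": type(e).__name__,
            "message": str(e),
            "traceback": traceback.format_exc(),
        }))
```

Any queue-like object with a `put` method works here; the real code would pass the existing multiprocessing queue.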
Actionable comments posted: 3
♻️ Duplicate comments (1)
backend/server.py (1)
42-52: ⚠️ Potential issue | 🔴 Critical — Per-request multiprocessing with bound model methods is still a high-risk execution model.
`execute_with_timeout` spins up a new `Process` for each request while passing model-bound callables. This is expensive and can break under spawn-based start methods. Prefer long-lived worker processes (initialized once with models) and submit lightweight jobs with per-task timeouts.

```bash
#!/bin/bash
set -euo pipefail

echo "1) Locate timeout executor and process primitives"
rg -n "def execute_with_timeout|Process\(|Queue\(|set_start_method|get_context" backend/server.py
echo
echo "2) Locate generator singletons and timeout callsites"
rg -n "MCQGen =|BoolQGen =|ShortQGen =|qg =|execute_with_timeout\(" backend/server.py
echo
echo "3) AST check: nested function used as Process target"
python - <<'PY'
import ast, pathlib

p = pathlib.Path("backend/server.py")
tree = ast.parse(p.read_text())
for fn in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef) and n.name == "execute_with_timeout"]:
    nested = [n.name for n in fn.body if isinstance(n, ast.FunctionDef)]
    print("nested defs in execute_with_timeout:", nested)
    for n in ast.walk(fn):
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name) and n.func.id == "Process":
            print("Process() call line:", n.lineno)
PY
```

Also applies to: 141-145, 186-190, 232-236, 288-306, 677-683, 737-743, 796-802
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 42 - 52, The current execute_with_timeout creates a new Process per request using a nested wrapper and passes bound model methods (wrapper, Process, Queue), which is expensive and unsafe with spawn-based start; refactor to a long-lived worker pool or persistent worker Process(es) that are initialized once with model instances and accept lightweight task messages (use a manager Queue or multiprocessing.Pool/ProcessPoolExecutor with apply_async) instead of spawning per call, remove the nested wrapper as a Process target and instead send task descriptors (function name + args) or callables that the worker resolves against its local model, and update callsites (MCQGen, BoolQGen, ShortQGen, qg and all execute_with_timeout usages) to submit tasks to the worker pool and enforce per-task timeouts via the pool’s timeout support or Future.result(timeout=...) rather than creating new Process objects each request.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 47-48: Replace instances where the code enqueues raw exception
text via q.put(("error", str(e))) with a stable public error indicator (for
example q.put(("error", {"code":"internal_server_error","message":"An internal
error occurred"}))) and log the full exception details server-side using
logging.exception or logger.exception so stack traces stay in server logs only;
locate all occurrences of q.put(("error", str(e))) in server.py (the q variable
and its error enqueue sites) and apply the same mapping so API responses never
contain raw exception strings but the server logs retain the detailed exception
information.
- Around line 461-465: The error handler in the /get_content endpoint currently
catches all exceptions and returns a 400, which hides server errors; update the
exception handling in the function that serves /get_content (the try/except
block around document retrieval) to explicitly catch expected client/validation
errors (e.g., ValueError, werkzeug.exceptions.BadRequest) and return a 400 with
the existing error JSON, and for all other unexpected exceptions log the full
exception (including stack trace) and return a 500 response with code
"internal_server_error" using jsonify; ensure you reference the same response
shape and use the existing logger instead of swallowing the exception.
- Around line 338-343: The handler that returns the "All generators failed"
response currently always uses HTTP 500; update the logic in the /get_problems
response where mcq_err, bool_err, and short_err are checked so that if all three
indicate a timeout (e.g., mcq_err == "timeout" && bool_err == "timeout" &&
short_err == "timeout" or using your TIMEOUT_ERROR constant), return the JSON
error with HTTP 504 instead of 500; otherwise keep the existing 500 return path.
Reference variables: mcq_err, bool_err, short_err and the existing
jsonify({...}) return.
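For the first inline comment above, a sketch of the stable public payload plus server-side logging; the payload shape comes from the review, but `safe_enqueue` and its names are illustrative:

```python
import logging

logger = logging.getLogger("server")

PUBLIC_ERROR = {"code": "internal_server_error",
                "message": "An internal error occurred"}

def safe_enqueue(q, func, *args):
    # Hedged sketch: the enqueued (and eventually API-visible) payload
    # stays generic, while the full stack trace goes to server logs only.
    try:
        q.put(("ok", func(*args)))
    except Exception:
        logger.exception("worker failed")  # stack trace stays server-side
        q.put(("error", PUBLIC_ERROR))
```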
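The status-code rule from the third inline comment reduces to a small helper; `TIMEOUT_ERROR` mirrors the constant the comment mentions, and the argument names follow the review's `mcq_err`/`bool_err`/`short_err`:

```python
TIMEOUT_ERROR = "timeout"

def failure_status(mcq_err, bool_err, short_err):
    # Sketch of the suggested /get_problems change: 504 only when every
    # generator timed out, otherwise keep the existing 500 path.
    errors = (mcq_err, bool_err, short_err)
    if all(e == TIMEOUT_ERROR for e in errors):
        return 504
    return 500
```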
---
Duplicate comments:
In `@backend/server.py`:
- Around line 42-52: The current execute_with_timeout creates a new Process per
request using a nested wrapper and passes bound model methods (wrapper, Process,
Queue), which is expensive and unsafe with spawn-based start; refactor to a
long-lived worker pool or persistent worker Process(es) that are initialized
once with model instances and accept lightweight task messages (use a manager
Queue or multiprocessing.Pool/ProcessPoolExecutor with apply_async) instead of
spawning per call, remove the nested wrapper as a Process target and instead
send task descriptors (function name + args) or callables that the worker
resolves against its local model, and update callsites (MCQGen, BoolQGen,
ShortQGen, qg and all execute_with_timeout usages) to submit tasks to the worker
pool and enforce per-task timeouts via the pool’s timeout support or
Future.result(timeout=...) rather than creating new Process objects each
request.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/server.py (2)
475-647: ⚠️ Potential issue | 🟠 Major — Validate `question_type` and `qa_pairs` before indexing into them.
This handler only checks that the body is a dict. After that it assumes every `qapair` has the keys required by the selected branch (`question`, `answer`, and sometimes `options`), and unknown `question_type` values silently fall into the mixed-question path. Payloads like `{"qa_pairs":[{}]}` or `{"qa_pairs":"oops"}` will still become `KeyError`/`TypeError` 500s instead of the structured 400s the rest of this PR is adding.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 475 - 647, The handler currently trusts request.get_json and indexes qa_pairs/question_type directly, causing KeyError/TypeError for malformed payloads; validate that qa_pairs is a list and question_type is a string, and validate each item in qa_pairs is a dict containing required keys before building requests_list (check 'question' present for all, and for MCQ paths ensure 'answer' exists and 'options' is a list), return a 400 JSON error when validation fails; also treat unknown question_type values explicitly (return 400) instead of falling into the generic else branch so only supported types ("get_shortq","get_mcq","get_boolq") are accepted; update code around qa_pairs/question_type parsing and before the branches that reference qa_pairs (symbols: qa_pairs, question_type, get_shortq, get_mcq, get_boolq, requests_list, NEW_QUESTION).
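The validation the prompt asks for could start from a helper like this; the supported type names come from the review, while the return shape and messages are assumptions:

```python
SUPPORTED_TYPES = {"get_shortq", "get_mcq", "get_boolq"}

def validate_qa_payload(data):
    # Sketch: reject malformed bodies before building requests_list,
    # returning (ok, error) so the handler can emit a structured 400.
    if not isinstance(data, dict):
        return False, "body must be a JSON object"
    question_type = data.get("question_type")
    qa_pairs = data.get("qa_pairs")
    if not isinstance(question_type, str) or question_type not in SUPPORTED_TYPES:
        return False, "unsupported question_type"
    if not isinstance(qa_pairs, list):
        return False, "qa_pairs must be a list"
    for pair in qa_pairs:
        if not isinstance(pair, dict) or "question" not in pair:
            return False, "each qa_pair must be an object with a question"
        if question_type == "get_mcq":
            if "answer" not in pair or not isinstance(pair.get("options"), list):
                return False, "MCQ pairs need an answer and an options list"
    return True, None
```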
346-377: ⚠️ Potential issue | 🟠 Major — The answer endpoints still lack timeout protection and field type validation.
These handlers have top-level JSON validation, but `qa_model(...)` and `answer.predict_boolean_answer(...)` are called directly without the `execute_with_timeout` wrapper available elsewhere in the codebase. The input fields (`input_text`, `input_question`, `input_options`) are not type-checked before use. Slow model inference can pin worker threads indefinitely, and malformed arrays can trigger unhandled 500 errors instead of the 400/504 behavior implemented in other endpoints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 346 - 377, Validate and type-check incoming fields before use: ensure request.get_json(...) returns a dict, input_text is a string, input_questions is a list of strings, and input_options is a list of lists of strings and that len(input_questions) == len(input_options); if validation fails return 400 with code "invalid_request". Wrap any blocking model calls (qa_model(...) and any answer.predict_boolean_answer(...) usages) with the existing execute_with_timeout(...) helper and catch TimeoutError to return 504 with code "model_timeout"; also catch other exceptions from the wrapped call and return a 500/appropriate error. Use the same response key consistently (outputs vs output) when returning results. Reference: request.get_json, qa_model, answer.predict_boolean_answer, execute_with_timeout.
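A hedged sketch of the field validation described in the prompt above; the key names mirror the review, but the exact error strings and `(ok, error)` return shape are assumptions:

```python
def validate_answer_payload(data):
    # Hypothetical validator for the boolean-answer endpoint: checks that
    # input_text is a string, input_question is a list of strings, and
    # input_options is a list of lists of strings of matching length.
    if not isinstance(data, dict):
        return False, "body must be a JSON object"
    text = data.get("input_text")
    questions = data.get("input_question")
    options = data.get("input_options")
    if not isinstance(text, str):
        return False, "input_text must be a string"
    if not (isinstance(questions, list)
            and all(isinstance(q, str) for q in questions)):
        return False, "input_question must be a list of strings"
    if not (isinstance(options, list)
            and all(isinstance(o, list)
                    and all(isinstance(s, str) for s in o)
                    for o in options)):
        return False, "input_options must be a list of lists of strings"
    if len(questions) != len(options):
        return False, "input_question and input_options lengths differ"
    return True, None
```

On failure the handler would return 400 with code "invalid_request"; the model calls themselves would then go through `execute_with_timeout` as the prompt describes.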
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 653-688: The request handler currently uses len(input_questions)
directly for num_questions (passed to execute_with_timeout -> qg.generate)
without validating input_question; ensure you validate and sanitize it: check
that input_questions (from data.get("input_question")) is either missing/None or
a list, return a 400 if it is present but not a list (or coerce to an empty
list), validate that each entry is of an expected type (e.g., str) if
applicable, and enforce a hard cap like MAX_NUM_QUESTIONS (e.g., 20) before
computing num_questions; then pass this safe, bounded num_questions into
execute_with_timeout/qg.generate instead of raw len(input_questions).
- Around line 452-454: The endpoint handling the Google Doc import changed to
return {"content": ...}, breaking the extension which expects the response body
to be the raw string; update the /get_content handler (the block that builds
document_url and calls docs_service.get_document_content) to preserve backward
compatibility by returning both forms — include the nested key ({"content":
content}) and also return the raw content as the top-level JSON value (so the
previous client code that reads the response as the string still receives the
same value) or otherwise restore the previous raw-string response; ensure the
response uses docs_service.get_document_content(document_url) value in both
places so callers using data or data.content continue to work.
- Around line 131-137: The call to process_input_text(input_text, use_mediawiki)
performs a synchronous mediawikiapi.summary() outside the timeout boundary,
risking a hung Flask worker; wrap the MediaWiki enrichment in a timeout or move
the enrichment inside the execute_with_timeout call so it runs under the same
deadline. Specifically, either (A) create a small wrapper function (e.g.,
enrich_with_mediawiki) that calls process_input_text / mediawikiapi.summary and
invoke it via execute_with_timeout before calling MCQGen.generate_mcq, or (B)
expand the task passed to execute_with_timeout to perform both enrichment and
MCQGen.generate_mcq in one call; update all handlers that call
process_input_text (the occurrences around MCQGen.generate_mcq and
execute_with_timeout) to use one of these patterns so mediawikiapi.summary is
protected by a timeout.
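Option B above — folding enrichment and generation into the single task handed to `execute_with_timeout` — can be sketched as a plain function; `enrich` and `generate` stand in for `process_input_text` and `MCQGen.generate_mcq`, whose real signatures are assumptions here:

```python
def enrich_and_generate(input_text, use_mediawiki, enrich, generate, payload):
    # Both steps run inside one task, so when this function is passed to
    # execute_with_timeout the MediaWiki call shares the same deadline as
    # model generation. The callables are injected purely for illustration;
    # the real handler would call its own module-level helpers.
    if use_mediawiki:
        input_text = enrich(input_text, use_mediawiki)
    task_payload = dict(payload, input_text=input_text)
    return generate(task_payload)
```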
---
Outside diff comments:
In `@backend/server.py`:
- Around line 475-647: The handler currently trusts request.get_json and indexes
qa_pairs/question_type directly, causing KeyError/TypeError for malformed
payloads; validate that qa_pairs is a list and question_type is a string, and
validate each item in qa_pairs is a dict containing required keys before
building requests_list (check 'question' present for all, and for MCQ paths
ensure 'answer' exists and 'options' is a list), return a 400 JSON error when
validation fails; also treat unknown question_type values explicitly (return
400) instead of falling into the generic else branch so only supported types
("get_shortq","get_mcq","get_boolq") are accepted; update code around
qa_pairs/question_type parsing and before the branches that reference qa_pairs
(symbols: qa_pairs, question_type, get_shortq, get_mcq, get_boolq,
requests_list, NEW_QUESTION).
- Around line 346-377: Validate and type-check incoming fields before use:
ensure request.get_json(...) returns a dict, input_text is a string,
input_questions is a list of strings, and input_options is a list of lists of
strings and that len(input_questions) == len(input_options); if validation fails
return 400 with code "invalid_request". Wrap any blocking model calls
(qa_model(...) and any answer.predict_boolean_answer(...) usages) with the
existing execute_with_timeout(...) helper and catch TimeoutError to return 504
with code "model_timeout"; also catch other exceptions from the wrapped call and
return a 500/appropriate error. Use the same response key consistently (outputs
vs output) when returning results. Reference: request.get_json, qa_model,
answer.predict_boolean_answer, execute_with_timeout.
```python
document_url = f"https://docs.google.com/document/d/{doc_id}/edit"
content = docs_service.get_document_content(document_url)
return jsonify({"content": content})
```
This `/get_content` contract change breaks the extension client.
This route now returns `{"content": ...}`, but `extension/src/pages/text_input/TextInput.jsx:78-80` still does `setText(data || "Error in retrieving")`. Until that caller is updated to use `data.content` (or this endpoint preserves backward compatibility during rollout), Google Doc imports in the extension will stop populating the text field correctly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 452 - 454, The endpoint handling the Google
Doc import changed to return {"content": ...}, breaking the extension which
expects the response body to be the raw string; update the /get_content handler
(the block that builds document_url and calls docs_service.get_document_content)
to preserve backward compatibility by returning both forms — include the nested
key ({"content": content}) and also return the raw content as the top-level JSON
value (so the previous client code that reads the response as the string still
receives the same value) or otherwise restore the previous raw-string response;
ensure the response uses docs_service.get_document_content(document_url) value
in both places so callers using data or data.content continue to work.
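Since a JSON body cannot be both a raw string and an object at once, one pragmatic reading of the suggestion is a transitional switch that keeps the legacy raw-string body as the default until clients migrate; the `legacy` flag below is purely illustrative, not existing server code:

```python
import json

def build_content_response(content, legacy=True):
    # Hedged sketch of a rollout-safe /get_content body: the legacy shape
    # serializes the document text as a bare JSON string (what the
    # extension reads today); the new shape nests it under "content".
    body = content if legacy else {"content": content}
    return json.dumps(body)
```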
Actionable comments posted: 3
♻️ Duplicate comments (1)
backend/server.py (1)
565-568: ⚠️ Potential issue | 🟠 Major — `/get_content` response-shape change is likely breaking existing clients.
The endpoint now returns `{"content": ...}`, but the provided frontend usage (eduaid_web/src/pages/Text_Input.jsx) sets text from `data` directly. That path will receive an object instead of a string.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 565 - 568, The endpoint now wraps the document text in {"content": ...} which breaks callers expecting the raw string; change the /get_content handler so it returns the document text in the original shape (a plain string or the exact JSON previously used). Specifically, after computing document_url and calling docs_service.get_document_content(document_url) (and in the handler function that uses document_url and docs_service.get_document_content), return the content as the original value (e.g., return jsonify(content) or return Response(content, mimetype="text/plain")) instead of jsonify({"content": content}) so existing frontend code that reads data as a string continues working.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 68-71: The limiter currently trusts the X-Forwarded-For header via
the custom key_func (lambda using request.headers.get("X-Forwarded-For",
request.remote_addr)), which allows header spoofing; instead install and
configure Werkzeug's ProxyFix middleware for your Flask app to normalize
X-Forwarded-For from trusted proxies and replace the custom key_func with
flask_limiter.util.get_remote_address when constructing the Limiter (refer to
Limiter and get_remote_address), ensuring you set ProxyFix's x_for to the
correct number of trusted proxies for your deployment.
- Around line 41-50: The timeout wrapper execute_with_timeout currently uses
ThreadPoolExecutor as a context manager which calls shutdown(wait=True) on exit
and blocks when a TimeoutError occurs; update execute_with_timeout to avoid
blocking on return by creating the ThreadPoolExecutor without the with-statement
(or explicitly call executor.shutdown(wait=False, cancel_futures=True) on
timeout), then submit the task to future and on catching TimeoutError return
immediately after cancelling/triggering non-waiting shutdown; reference
ThreadPoolExecutor, future, future.cancel(), executor.shutdown(wait=False,
cancel_futures=True) (or use cancel_futures=True for Python 3.9+) to ensure the
function does not hang when a timeout occurs.
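The ProxyFix wiring from the first bullet could look like this, assuming exactly one trusted reverse proxy in front of the app (adjust `x_for` to your deployment); this is a configuration sketch, not the current server code:

```python
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
from werkzeug.middleware.proxy_fix import ProxyFix

app = Flask(__name__)
# Let Werkzeug normalize X-Forwarded-For from one trusted proxy, so
# clients cannot spoof their rate-limit identity via headers.
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1)

limiter = Limiter(
    get_remote_address,  # reads the ProxyFix-normalized remote address
    app=app,
    default_limits=["60 per minute"],
)
```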
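The non-blocking wrapper from the second bullet can be sketched as follows; the `(result, error)` return shape is an assumption about how callers in backend/server.py consume it:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def execute_with_timeout(func, timeout, *args, **kwargs):
    # No with-statement: the context manager would call shutdown(wait=True)
    # on exit and block until the stuck task finishes, defeating the timeout.
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        result = future.result(timeout=timeout)
        executor.shutdown(wait=False)
        return result, None
    except FutureTimeout:
        future.cancel()
        # Python 3.9+: drop any queued work without waiting on the worker.
        executor.shutdown(wait=False, cancel_futures=True)
        return None, "timeout"
    except Exception as e:
        executor.shutdown(wait=False)
        return None, str(e)
```

Note the timed-out worker thread still runs to completion in the background; this only stops the request handler from blocking on it.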
---
Duplicate comments:
In `@backend/server.py`:
- Around line 565-568: The endpoint now wraps the document text in {"content":
...} which breaks callers expecting the raw string; change the /get_content
handler so it returns the document text in the original shape (a plain string or
the exact JSON previously used). Specifically, after computing document_url and
calling docs_service.get_document_content(document_url) (and in the handler
function that uses document_url and docs_service.get_document_content), return
the content as the original value (e.g., return jsonify(content) or return
Response(content, mimetype="text/plain")) instead of jsonify({"content":
content}) so existing frontend code that reads data as a string continues
working.
Addressed Issues:
Fixes #592
Screenshots/Recordings:
N/A (Backend improvement – no UI changes)
Additional Notes:
This PR introduces robust timeout handling for ML endpoints to prevent indefinite execution and resource exhaustion.
Key improvements include:
This ensures better performance, stability, and user experience.
AI Usage Disclosure:
I have used the following AI models and tools:
Checklist
Summary by CodeRabbit
New Features
Improvements