fix(web): prevent Output page crash from undefined forEach payload fields #628
mahek2016 wants to merge 32 commits into AOSSIE-Org:main from
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
The PR swaps string-similarity imports to

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant RateLimiter as Rate Limiter
    participant Server as Flask Server
    participant Executor as Timeout Executor
    participant MediaWiki
    participant MCQGen as MCQ Generator
    participant BoolQGen as Bool Generator
    participant ShortQGen as Short Generator
    Client->>RateLimiter: POST /get_problems
    RateLimiter->>Server: allow / reject
    Server->>Server: validate JSON & inputs
    Server->>MediaWiki: fetch/process content (with timeout via Executor)
    MediaWiki-->>Server: content / timeout
    Server->>Executor: start parallel generators (MCQ/BoolQ/ShortQ)
    par Parallel
        Executor->>MCQGen: generate (timeout)
        Executor->>BoolQGen: generate (timeout)
        Executor->>ShortQGen: generate (timeout)
    end
    MCQGen-->>Executor: result / error
    BoolQGen-->>Executor: result / error
    ShortQGen-->>Executor: result / error
    Executor-->>Server: merged results + fallbacks
    Server-->>Client: JSON response (results or aggregated error)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/server.py (1)
Lines 609-617: ⚠️ Potential issue | 🟠 Major

Wrap each boolean question in a list before calling the predictor.

`AnswerPredictor.predict_boolean_answer()` iterates `payload["input_question"]`. Passing a bare string here makes it score one character at a time, and `if qa_response:` then turns every non-empty question into `"True"`.

Proposed fix

```diff
 for question in input_questions:
     qa_response = answer.predict_boolean_answer(
-        {"input_text": input_text, "input_question": question}
+        {"input_text": input_text, "input_question": [question]}
     )
-    if qa_response:
+    if qa_response and qa_response[0]:
         output.append("True")
     else:
         output.append("False")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 609 - 617, The loop is passing a bare string to answer.predict_boolean_answer which iterates payload["input_question"] character-by-character; change the call to pass the question as a single-item list (e.g., {"input_text": input_text, "input_question": [question]}) and then inspect the returned list element (qa_response[0]) rather than the truthiness of qa_response before appending to output (use output.append("True") or "False" based on qa_response[0], and guard for an empty qa_response).
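To see why the bare string misbehaves, here is a minimal sketch; the predictor below is a hypothetical stand-in for `AnswerPredictor.predict_boolean_answer`, not the real implementation:

```python
# Hypothetical stand-in: scores each entry of payload["input_question"].
def predict_boolean_answer(payload):
    return [question.endswith("?") for question in payload["input_question"]]

# Bare string: iteration walks characters, yielding one score per character.
scores = predict_boolean_answer(
    {"input_text": "text", "input_question": "Is AI useful?"}
)
print(len(scores))  # 13 scores, one per character

# Single-item list: one score for the whole question.
scores = predict_boolean_answer(
    {"input_text": "text", "input_question": ["Is AI useful?"]}
)
print(scores)  # [True]
```

The same character-by-character iteration is what makes the current route emit `"True"` for every non-empty question.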
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 94-102: The file sets MCQGen, BoolQGen, ShortQGen, answer, qg,
qa_model, and llm_generator to None which prevents real backends from being used
and causes 500s when routes dereference them; remove these unconditional None
assignments (or replace them with proper initializers) and ensure these symbols
are instantiated with their real classes (e.g., MCQGen = MCQGenerator(...),
BoolQGen = BoolQGenerator(...), ShortQGen = ShortQGenerator(...), qg =
QuestionGenerator(...), qa_model = QAModel(...), llm_generator =
LLMGenerator(...), answer = AnswerService(...)) during app startup or lazily
inside the route handlers for /get_mcq, /get_boolq, /get_shortq and the
corresponding _llm/_answer/_hard routes so that routes never see None before use
and any initialization errors are logged and handled gracefully.
- Around line 981-988: The call to qg.generate via execute_with_timeout uses
answer_style="true_false", but QuestionGenerator.generate only accepts "all",
"sentences", or "multiple_choice", so update the code that constructs qg (the
generator instance used here) to use a boolean-capable generator implementation
(one that supports "true_false") or, if you prefer not to add a new generator,
change the answer_style argument to one of the supported values ("all",
"sentences", or "multiple_choice"); locate the qg instantiation (the code that
creates the generator passed into qg.generate) and either replace it with a
TrueFalse-capable generator class or add handling to map/convert "true_false"
requests into a supported style before calling execute_with_timeout.
- Around line 234-248: The mock `/get_boolq` response is returning list items as
{"question","answer"} objects but the frontend (Output.jsx when questionType ===
"get_boolq") expects each item in the "output" array to be the question string;
change the `questions` variable to be a list of strings (e.g. ["AI is useful?",
...]) and keep the existing slicing (`questions = questions[:max_questions]`)
and the JSON response keys ("output", "status", "mock_mode", "warning") intact
so the UI receives an array of question strings rather than objects.
- Around line 490-519: The response currently flattens each generator to a bare
array; instead preserve the nested objects so the frontend can access
output_mcq.questions, output_boolq.Boolean_Questions and
output_shortq.questions. Update the assignments that use mcq_out, bool_out,
short_out so that when no error and isinstance(..., dict) you set
response["output_mcq"] = mcq_out or {} (and likewise response["output_boolq"] =
bool_out or {}, response["output_shortq"] = short_out or {}), leaving empty
dicts as the fallback; keep the existing all-failed error return but ensure its
output_* values are empty objects ({}), not arrays.
- Around line 43-58: The current execute_with_timeout uses ThreadPoolExecutor
and simply calls future.result(timeout=...), so on FuturesTimeoutError the
worker thread keeps running; replace this with a cancellable approach: either
switch to ProcessPoolExecutor (in execute_with_timeout) so you can
terminate/kill worker processes after timeout (call
process_executor.shutdown(...) and terminate child processes) or implement
cooperative cancellation by passing a threading.Event into the submitted
function (e.g., cancel_event) and have the target function periodically check
cancel_event.is_set(); update the submit call and function signatures
accordingly, handle FuturesTimeoutError by setting cancel_event.set() and then
shutting down the executor, and ensure executor.shutdown(wait=False) is still
called; reference execute_with_timeout, ThreadPoolExecutor, ProcessPoolExecutor,
future.result, FuturesTimeoutError, executor.shutdown, and the cooperative
cancel Event so reviewers can locate and modify the code.
- Around line 438-476: The generator wrapper functions run_mcq, run_boolq, and
run_shortq are calling MCQGen.generate_mcq / BoolQGen.generate_boolq /
ShortQGen.generate_shortq with keyword args, but those methods expect a single
payload dict; change each lambda passed to execute_with_timeout to call the
generator with a single dict payload (e.g., {'input_text': input_text,
'max_questions': max_questions_mcq} for run_mcq, analogous keys/values for
run_boolq and run_shortq) so the generators receive one argument as defined;
keep the same timeout and existing None/"not_available" checks and return
behavior.
In `@eduaid_web/src/pages/Output.jsx`:
- Around line 22-23: The current outside-click guard in the click handler (the
condition using dropdown and !event.target.closest('button')) is too broad and
prevents the dropdown from closing when any other button is clicked; narrow the
check to only ignore the specific PDF toggle instead. Modify the condition in
the Output.jsx click handler to look for a specific selector or ref for the PDF
toggle (e.g., event.target.closest('.pdf-toggle') or compare against a
pdfToggleRef/current element) instead of generic 'button', so only clicks on
that exact toggle prevent closing while all other outside clicks close the
dropdown.
- Around line 141-162: The current conditional skips iterating `output` when
`questionType === "get_mcq"`, dropping MCQs; update the logic in Output.jsx
around the `questionType` branches so that when `questionType === "get_mcq"` you
still iterate `output` (push into `combinedQaPairs`) and also include items from
`output_mcq.questions` (merge both sources), e.g., add an explicit `if
(questionType === "get_mcq")` branch that processes `output.forEach(...)` and
`output_mcq.questions.forEach(...)`, keeping the existing field mapping used for
other branches and retaining `combinedQaPairs` as the single merged array.
In `@requirements.txt`:
- Around line 31-38: Update the mammoth dependency in requirements.txt to at
least version 1.11.0 (e.g., change "mammoth" to "mammoth>=1.11.0") and rerun
dependency install so that backend code using mammoth.extract_raw_text (in
Generator/main.py) uses the patched library; after upgrading, run tests or a
quick smoke test of the .docx upload path to confirm external file access is
disabled and behavior remains correct.
---
Outside diff comments:
In `@backend/server.py`:
- Around line 609-617: The loop is passing a bare string to
answer.predict_boolean_answer which iterates payload["input_question"]
character-by-character; change the call to pass the question as a single-item
list (e.g., {"input_text": input_text, "input_question": [question]}) and then
inspect the returned list element (qa_response[0]) rather than the truthiness of
qa_response before appending to output (use output.append("True") or "False"
based on qa_response[0], and guard for an empty qa_response).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8725e66b-6869-4a93-8bbe-775ce182c4da
⛔ Files ignored due to path filters (1)
backend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
backend/Generator/main.py
backend/Generator/mcq.py
backend/server.py
eduaid_web/src/pages/Output.jsx
requirements.txt
```python
def execute_with_timeout(func, timeout, *args, **kwargs):
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)

    try:
        result = future.result(timeout=timeout)
        executor.shutdown(wait=False)  # ✅ FIX
        return result, None

    except FuturesTimeoutError:
        executor.shutdown(wait=False)  # ✅ FIX
        return None, "timeout"

    except Exception:
        executor.shutdown(wait=False)
        logging.exception("Error occurred during execution")
```
🧩 Analysis chain
🏁 Script executed:
```shell
find . -name "server.py" -path "*/backend/*" | head -5
```

Repository: AOSSIE-Org/EduAid
Length of output: 80

🏁 Script executed:

```shell
cat -n backend/server.py | sed -n '43,62p'
```

Repository: AOSSIE-Org/EduAid
Length of output: 855
🌐 Web query:
ThreadPoolExecutor future.result timeout does the function keep running Python concurrent.futures
💡 Result:
Yes, when calling future.result(timeout=...) on a ThreadPoolExecutor future in Python's concurrent.futures and the timeout expires, a TimeoutError is raised in the calling thread, but the submitted function continues running in its worker thread until completion. The timeout only limits how long the caller waits for the result; it does not cancel or terminate the underlying task. This behavior occurs because Python threads cannot be forcibly terminated from another thread (due to risks like leaving locks held). The future.cancel method only works for pending (not yet started) tasks; for running tasks, it returns False. Official documentation confirms: Future.result(timeout) waits up to the specified seconds and raises TimeoutError if not completed, but does not mention stopping the task. To stop long-running tasks, make them cooperative by checking a shared threading.Event flag periodically, or use ProcessPoolExecutor (though it also doesn't auto-terminate on timeout without extra work).
Citations:
- 1: https://www.pythontutorials.net/blog/how-to-use-concurrent-futures-with-timeouts/
- 2: https://stackoverflow.com/questions/78460097/why-cant-threadpoolexecutor-timeout-a-long-running-expression
- 3: https://docs.python.org/3/library/concurrent.futures.html
- 4: https://stackoverflow.com/questions/6509261/how-to-use-concurrent-futures-with-timeouts
This timeout only stops waiting; it does not stop the work.
After future.result(timeout=...) raises TimeoutError, the submitted function continues running in its worker thread, and executor.shutdown(wait=False) returns immediately without terminating the thread. On model endpoints this means timed-out jobs keep burning CPU/GPU after a 504 response; repeated timeouts can accumulate and snowball into resource exhaustion. To enforce a hard execution cap, the work must either check a cooperative cancellation flag (threading.Event) periodically, or use a killable mechanism like ProcessPoolExecutor (which also requires additional cancellation logic beyond shutdown).
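The cooperative-cancellation option the review suggests could look roughly like the sketch below. This is an illustration under assumptions, not the project's code: the worker functions are stand-ins, and the real generators would need their own periodic `cancel_event.is_set()` checks.

```python
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError


def execute_with_timeout(func, timeout, *args, **kwargs):
    # Pass a cancellation flag as the worker's first argument.
    cancel_event = threading.Event()
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, cancel_event, *args, **kwargs)
    try:
        return future.result(timeout=timeout), None
    except FuturesTimeoutError:
        cancel_event.set()  # ask the worker to stop at its next checkpoint
        return None, "timeout"
    except Exception:
        logging.exception("Error occurred during execution")
        return None, "error"
    finally:
        executor.shutdown(wait=False)


def generate_questions(cancel_event, chunks):
    # A cooperative worker checks the flag between units of work.
    results = []
    for chunk in chunks:
        if cancel_event.is_set():
            break  # bail out early instead of burning CPU after a 504
        results.append(chunk.upper())
    return results


def stuck_worker(cancel_event):
    # Simulates a long-running call that polls the flag periodically.
    while not cancel_event.is_set():
        time.sleep(0.05)
    return "cancelled"
```

With this shape, `execute_with_timeout(generate_questions, 5, ["a", "b"])` completes normally, while `stuck_worker` hits the timeout and then exits promptly once the flag is set, rather than running to completion in a leaked thread.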
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 43 - 58, The current execute_with_timeout
uses ThreadPoolExecutor and simply calls future.result(timeout=...), so on
FuturesTimeoutError the worker thread keeps running; replace this with a
cancellable approach: either switch to ProcessPoolExecutor (in
execute_with_timeout) so you can terminate/kill worker processes after timeout
(call process_executor.shutdown(...) and terminate child processes) or implement
cooperative cancellation by passing a threading.Event into the submitted
function (e.g., cancel_event) and have the target function periodically check
cancel_event.is_set(); update the submit call and function signatures
accordingly, handle FuturesTimeoutError by setting cancel_event.set() and then
shutting down the executor, and ensure executor.shutdown(wait=False) is still
called; reference execute_with_timeout, ThreadPoolExecutor, ProcessPoolExecutor,
future.result, FuturesTimeoutError, executor.shutdown, and the cooperative
cancel Event so reviewers can locate and modify the code.
```python
MCQGen = None
answer = None
BoolQGen = None
ShortQGen = None
qg = None
file_processor = main.FileProcessor()
mediawikiapi = MediaWikiAPI()
qa_model = pipeline("question-answering")
llm_generator = LLMQuestionGenerator()

qa_model = None
llm_generator = None
```
These None assignments disable the real backend paths.
Nothing in this module reinitializes MCQGen, BoolQGen, ShortQGen, answer, qg, qa_model, or llm_generator after setting them to None. That leaves `/get_mcq`, `/get_boolq`, and `/get_shortq` permanently in mock mode, while `/get_*_llm`, `/get_*_answer`, `/get_boolean_answer`, and the `*_hard` routes dereference None and return 500.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 94 - 102, The file sets MCQGen, BoolQGen,
ShortQGen, answer, qg, qa_model, and llm_generator to None which prevents real
backends from being used and causes 500s when routes dereference them; remove
these unconditional None assignments (or replace them with proper initializers)
and ensure these symbols are instantiated with their real classes (e.g., MCQGen
= MCQGenerator(...), BoolQGen = BoolQGenerator(...), ShortQGen =
ShortQGenerator(...), qg = QuestionGenerator(...), qa_model = QAModel(...),
llm_generator = LLMGenerator(...), answer = AnswerService(...)) during app
startup or lazily inside the route handlers for /get_mcq, /get_boolq,
/get_shortq and the corresponding _llm/_answer/_hard routes so that routes never
see None before use and any initialization errors are logged and handled
gracefully.
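The lazy-initialization option mentioned above could be sketched as follows. The helper and the stand-in class are hypothetical illustrations; the real factories would be the generator classes named in the review (e.g. `MCQGenerator`):

```python
import logging

_instances = {}


def get_or_init(name, factory):
    """Create the backend object on first use and cache it."""
    if name not in _instances:
        try:
            _instances[name] = factory()
        except Exception:
            # Initialization errors are logged, not raised as bare 500s;
            # callers still check for None before use.
            logging.exception("Failed to initialize %s", name)
            _instances[name] = None
    return _instances[name]


# Stand-in class for illustration only.
class FakeMCQGen:
    def generate_mcq(self, payload):
        return {"questions": ["sample"]}


mcq = get_or_init("mcq", FakeMCQGen)
assert mcq is get_or_init("mcq", FakeMCQGen)  # cached, built once
```

Route handlers would then call `get_or_init("mcq", MCQGenerator)` instead of reading a module-level global that may have been set to None.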
```python
# ✅ SAFE GENERATOR CALLS (FIXED WITH LAMBDA)
def run_mcq():
    if MCQGen is None:
        logging.warning("MCQGen not initialized")
        return None, "not_available"

    return execute_with_timeout(
        lambda: MCQGen.generate_mcq(
            input_text=input_text,
            max_questions=max_questions_mcq
        ),
        60
    )

def run_boolq():
    if BoolQGen is None:
        logging.warning("BoolQGen not initialized")
        return None, "not_available"

    return execute_with_timeout(
        lambda: BoolQGen.generate_boolq(
            input_text=input_text,
            max_questions=max_questions_boolq
        ),
        60
    )

def run_shortq():
    if ShortQGen is None:
        logging.warning("ShortQGen not initialized")
        return None, "not_available"

    return execute_with_timeout(
        lambda: ShortQGen.generate_shortq(
            input_text=input_text,
            max_questions=max_questions_shortq
        ),
        60
    )
```
Pass payload dicts into the generator methods.
`backend/Generator/main.py` defines `generate_mcq()`, `generate_boolq()`, and `generate_shortq()` as single-argument payload methods. Calling them with `input_text=` / `max_questions=` kwargs raises `TypeError`, so `/get_problems` can never reach the real generation path once those generators are re-enabled.
Proposed fix

```diff
 def run_mcq():
     if MCQGen is None:
         logging.warning("MCQGen not initialized")
         return None, "not_available"
     return execute_with_timeout(
-        lambda: MCQGen.generate_mcq(
-            input_text=input_text,
-            max_questions=max_questions_mcq
-        ),
-        60
+        MCQGen.generate_mcq,
+        60,
+        {
+            "input_text": input_text,
+            "max_questions": max_questions_mcq,
+        },
     )

 def run_boolq():
     if BoolQGen is None:
         logging.warning("BoolQGen not initialized")
         return None, "not_available"
     return execute_with_timeout(
-        lambda: BoolQGen.generate_boolq(
-            input_text=input_text,
-            max_questions=max_questions_boolq
-        ),
-        60
+        BoolQGen.generate_boolq,
+        60,
+        {
+            "input_text": input_text,
+            "max_questions": max_questions_boolq,
+        },
     )

 def run_shortq():
     if ShortQGen is None:
         logging.warning("ShortQGen not initialized")
         return None, "not_available"
     return execute_with_timeout(
-        lambda: ShortQGen.generate_shortq(
-            input_text=input_text,
-            max_questions=max_questions_shortq
-        ),
-        60
+        ShortQGen.generate_shortq,
+        60,
+        {
+            "input_text": input_text,
+            "max_questions": max_questions_shortq,
+        },
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 438 - 476, The generator wrapper functions
run_mcq, run_boolq, and run_shortq are calling MCQGen.generate_mcq /
BoolQGen.generate_boolq / ShortQGen.generate_shortq with keyword args, but those
methods expect a single payload dict; change each lambda passed to
execute_with_timeout to call the generator with a single dict payload (e.g.,
{'input_text': input_text, 'max_questions': max_questions_mcq} for run_mcq,
analogous keys/values for run_boolq and run_shortq) so the generators receive
one argument as defined; keep the same timeout and existing None/"not_available"
checks and return behavior.
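A minimal sketch of the payload-dict calling convention, using simplified stand-ins rather than the real classes:

```python
from concurrent.futures import ThreadPoolExecutor


def execute_with_timeout(func, timeout, *args, **kwargs):
    # Simplified: extra positional args are forwarded to the worker.
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        return executor.submit(func, *args, **kwargs).result(timeout=timeout), None
    finally:
        executor.shutdown(wait=False)


def generate_mcq(payload):
    # Single-argument payload method, matching the shape in Generator/main.py.
    words = payload["input_text"].split()
    return {"questions": words[: payload["max_questions"]]}


# Pass the dict positionally instead of exploding it into kwargs:
out, err = execute_with_timeout(
    generate_mcq, 60, {"input_text": "alpha beta gamma", "max_questions": 2}
)
```

The key point is that `execute_with_timeout` forwards `*args` untouched, so the generator receives exactly one argument: the payload dict it was written to accept.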
```python
# ✅ FINAL SAFE RESPONSE
response = {
    "output_mcq": [],
    "output_boolq": [],
    "output_shortq": []
}

if not mcq_err and isinstance(mcq_out, dict):
    response["output_mcq"] = mcq_out.get("questions", []) or []

if not bool_err and isinstance(bool_out, dict):
    response["output_boolq"] = bool_out.get("Boolean_Questions", []) or []

if not short_err and isinstance(short_out, dict):
    response["output_shortq"] = short_out.get("questions", []) or []

# ✅ If ALL failed
if mcq_err and bool_err and short_err:
    all_errors = [mcq_err, bool_err, short_err]

    status = 504 if all(e == "timeout" for e in all_errors) else 500
    code = "timeout" if status == 504 else "all_failed"

    return jsonify({
        "error": "All generators failed",
        "code": code,
        "output_mcq": [],
        "output_boolq": [],
        "output_shortq": []
    }), status
```
Preserve the nested output_* objects in the mixed response.
The Output page reads `output_mcq.questions`, `output_boolq.Boolean_Questions` plus `Text`, and `output_shortq.questions`. Flattening each generator result to a bare array makes those lookups undefined, so mixed MCQ/Boolean/Short results disappear even though the page no longer crashes.
Proposed response-shape fix

```diff
-    response = {
-        "output_mcq": [],
-        "output_boolq": [],
-        "output_shortq": []
-    }
+    response = {
+        "output_mcq": {"questions": []},
+        "output_boolq": {"Boolean_Questions": [], "Text": input_text},
+        "output_shortq": {"questions": []},
+    }

     if not mcq_err and isinstance(mcq_out, dict):
-        response["output_mcq"] = mcq_out.get("questions", []) or []
+        response["output_mcq"]["questions"] = mcq_out.get("questions", []) or []

     if not bool_err and isinstance(bool_out, dict):
-        response["output_boolq"] = bool_out.get("Boolean_Questions", []) or []
+        response["output_boolq"]["Boolean_Questions"] = bool_out.get("Boolean_Questions", []) or []
+        response["output_boolq"]["Text"] = bool_out.get("Text", input_text)

     if not short_err and isinstance(short_out, dict):
-        response["output_shortq"] = short_out.get("questions", []) or []
+        response["output_shortq"]["questions"] = short_out.get("questions", []) or []
@@
         return jsonify({
             "error": "All generators failed",
             "code": code,
-            "output_mcq": [],
-            "output_boolq": [],
-            "output_shortq": []
+            "output_mcq": {"questions": []},
+            "output_boolq": {"Boolean_Questions": [], "Text": input_text},
+            "output_shortq": {"questions": []}
         }), status
```

Based on learnings, mixed question types are intentionally supported in eduaid_web/src/pages/Output.jsx, and output_mcq plus output are distinct data sources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 490 - 519, The response currently flattens
each generator to a bare array; instead preserve the nested objects so the
frontend can access output_mcq.questions, output_boolq.Boolean_Questions and
output_shortq.questions. Update the assignments that use mcq_out, bool_out,
short_out so that when no error and isinstance(..., dict) you set
response["output_mcq"] = mcq_out or {} (and likewise response["output_boolq"] =
bool_out or {}, response["output_shortq"] = short_out or {}), leaving empty
dicts as the fallback; keep the existing all-failed error return but ensure its
output_* values are empty objects ({}), not arrays.
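The nested shape the review asks for can be factored into a small builder. This is a sketch assuming the keys named in the review (`questions`, `Boolean_Questions`, `Text`); the function name and signature are illustrative:

```python
def build_response(input_text, mcq_out, mcq_err, bool_out, bool_err,
                   short_out, short_err):
    # Start from empty-but-correctly-shaped objects so frontend lookups
    # like output_mcq.questions are never undefined.
    response = {
        "output_mcq": {"questions": []},
        "output_boolq": {"Boolean_Questions": [], "Text": input_text},
        "output_shortq": {"questions": []},
    }
    if not mcq_err and isinstance(mcq_out, dict):
        response["output_mcq"]["questions"] = mcq_out.get("questions", []) or []
    if not bool_err and isinstance(bool_out, dict):
        response["output_boolq"]["Boolean_Questions"] = (
            bool_out.get("Boolean_Questions", []) or []
        )
        response["output_boolq"]["Text"] = bool_out.get("Text", input_text)
    if not short_err and isinstance(short_out, dict):
        response["output_shortq"]["questions"] = short_out.get("questions", []) or []
    return response
```

Because every branch leaves a dict with the expected keys in place, a failed generator degrades to an empty list for that question type instead of changing the response's shape.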
```diff
 # ✅ Question generation with timeout
 generated, error = execute_with_timeout(
     qg.generate,
     60,
     article=input_text,
-    num_questions=input_questions,
+    num_questions=len(input_questions),
     answer_style="true_false"
 )
```
QuestionGenerator.generate() does not accept "true_false".
`backend/Generator/main.py` only allows `answer_style` values `"all"`, `"sentences"`, and `"multiple_choice"` here. This route will always fall into the error path even after `qg` is initialized, so it needs to call a boolean-capable generator instead of forcing an unsupported `answer_style`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/server.py` around lines 981 - 988, The call to qg.generate via
execute_with_timeout uses answer_style="true_false", but
QuestionGenerator.generate only accepts "all", "sentences", or
"multiple_choice", so update the code that constructs qg (the generator instance
used here) to use a boolean-capable generator implementation (one that supports
"true_false") or, if you prefer not to add a new generator, change the
answer_style argument to one of the supported values ("all", "sentences", or
"multiple_choice"); locate the qg instantiation (the code that creates the
generator passed into qg.generate) and either replace it with a
TrueFalse-capable generator class or add handling to map/convert "true_false"
requests into a supported style before calling execute_with_timeout.
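If the second option is taken, the mapping step could be as small as the sketch below; the supported values come from the review, while the helper name and fallback choice are assumptions:

```python
# answer_style values QuestionGenerator.generate accepts, per the review.
SUPPORTED_STYLES = {"all", "sentences", "multiple_choice"}


def normalize_answer_style(style, fallback="sentences"):
    """Map an unsupported style (e.g. "true_false") to a supported one."""
    return style if style in SUPPORTED_STYLES else fallback
```

The route would then call `qg.generate(..., answer_style=normalize_answer_style("true_false"))`, avoiding the guaranteed error path until a genuinely boolean-capable generator is wired in.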
```diff
 mammoth
 mediawikiapi
 PyMuPDF
 textblob
-llama-cpp-python
\ No newline at end of file
+Flask-Limiter==3.5.0
+yt-dlp
+llama-cpp-python
+strsimpy
```
🧩 Analysis chain
🏁 Script executed:
```shell
cat -n requirements.txt | grep -A2 -B2 mammoth
```

Repository: AOSSIE-Org/EduAid
Length of output: 151

🏁 Script executed:

```shell
head -50 backend/Generator/main.py | tail -30
```

Repository: AOSSIE-Org/EduAid
Length of output: 1075

🏁 Script executed:

```shell
sed -n '360,375p' backend/Generator/main.py
```

Repository: AOSSIE-Org/EduAid
Length of output: 544
🌐 Web query:
mammoth library directory traversal vulnerability Python docx parser
💡 Result:
Yes, the Mammoth Python library (a .docx to HTML/Markdown converter) has a directory traversal vulnerability identified as CVE-2025-11849. It affects versions from 0.3.25 up to but not including 1.11.0. The vulnerability occurs due to lack of path or file type validation when processing .docx files containing images referenced via external links (r:link attribute instead of embedded r:embed). The library resolves these URIs to local file paths, reads the files, base64-encodes them, and embeds them in the output HTML as data URIs. Attackers can craft malicious .docx files with paths like ../../etc/passwd or /dev/random to read arbitrary files or cause resource exhaustion. Fix: Upgrade to Mammoth version 1.11.0 or higher, which disables external file access by default (via externalFileAccess: false). Official sources: PyPI (pypi.org/project/mammoth/), GitHub (github.com/mwilliamson/python-mammoth). Latest version as of 2026-03-13 is 1.12.0. No relation to any "mammoth library" in the context of directory services or unrelated projects.
Citations:
- 1: https://security.snyk.io/vuln/SNYK-JS-MAMMOTH-13554470
- 2: https://advisories.gitlab.com/pkg/npm/mammoth/CVE-2025-11849/
- 3: https://nvd.nist.gov/vuln/detail/CVE-2025-11849
- 4: https://www.sentinelone.com/vulnerability-database/cve-2025-11849/
- 5: https://security.snyk.io/package/pip/mammoth/1.10.0
- 6: https://pypi.org/project/mammoth/
- 7: https://github.com/mwilliamson/python-mammoth
- 8: https://security.snyk.io/vuln/SNYK-PYTHON-MAMMOTH-13561967
Upgrade mammoth to version 1.11.0 or later to patch CVE-2025-11849.
backend/Generator/main.py lines 365-368 use mammoth.extract_raw_text() on user-uploaded .docx files. Mammoth versions before 1.11.0 are vulnerable to directory traversal (CVE-2025-11849), allowing attackers to read arbitrary files or exhaust resources via crafted .docx files with external file references. Upgrade to mammoth>=1.11.0 immediately, which disables external file access by default.
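After bumping the pin, a version gate of this shape could serve as the suggested smoke check. The helper is a hypothetical sketch; the `1.11` threshold comes from the advisory above:

```python
def is_patched(version_string):
    """True if the given mammoth version carries the CVE-2025-11849 fix."""
    parts = version_string.split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= (1, 11)
```

In practice this would be fed the installed version (e.g. via `importlib.metadata.version("mammoth")`) before exercising the .docx upload path.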
🧰 Tools
🪛 OSV Scanner (2.3.5)
[CRITICAL] 31-31: mammoth 1.9.1: Mammoth is vulnerable to Directory Traversal
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@requirements.txt` around lines 31 - 38, Update the mammoth dependency in
requirements.txt to at least version 1.11.0 (e.g., change "mammoth" to
"mammoth>=1.11.0") and rerun dependency install so that backend code using
mammoth.extract_raw_text (in Generator/main.py) uses the patched library; after
upgrading, run tests or a quick smoke test of the .docx upload path to confirm
external file access is disabled and behavior remains correct.
Addressed Issues:
Fixes #610
Additional Notes:
- Normalized all payload fields (`output`, `output_mcq.questions`, `output_boolq.Boolean_Questions`, `output_shortq.questions`) using a helper function to prevent runtime errors.
- Replaced unguarded `.forEach` calls with guarded iterations.
- Used optional chaining (`?.`) to safely access nested properties.

This improves the robustness of the Output page and prevents crashes due to incomplete or malformed payload data.
AI Usage Disclosure:
I have used the following AI models and tools:
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Refactor