
fix(web): prevent Output page crash from undefined forEach payload fields#628

Open
mahek2016 wants to merge 32 commits into AOSSIE-Org:main from mahek2016:fix/output-foreach-crash-610

Conversation


@mahek2016 mahek2016 commented Mar 26, 2026

Addressed Issues:

Fixes #610

Additional Notes:

  • Added safe JSON parsing using try/catch to handle corrupted or missing localStorage data.
  • Normalized all iterable fields (output, output_mcq.questions, output_boolq.Boolean_Questions, output_shortq.questions) using a helper function to prevent runtime errors.
  • Replaced unsafe .forEach calls with guarded iterations.
  • Used optional chaining (?.) to safely access nested properties.
  • Prevented duplicate rendering across question types.
  • Ensured consistent handling of all question types (Boolean, MCQ, Short).

This improves the robustness of the Output page and prevents crashes due to incomplete or malformed payload data.

AI Usage Disclosure:

  • [ ] This PR does not contain AI-generated code at all.
  • [x] This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools:

  • ChatGPT (for guidance, debugging, and code structuring)

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • New Features

    • Added API rate limiting, request size caps, and per-endpoint timeouts to improve stability.
    • Returned structured form links instead of opening a browser.
  • Bug Fixes

    • Hardened input validation and clearer error responses (400/404/504/500) across endpoints.
    • Improved YouTube transcript handling and temporary file management.
    • Fixed PDF dropdown click handling and resilient localStorage parsing in Output page.
  • Refactor

    • Switched string-similarity implementation and updated dependency list for compatibility.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 04c71fd0-e11a-4c6e-8c43-e04c88c3aa75

📥 Commits

Reviewing files that changed from the base of the PR and between 54c5bab and 0970c6a.

📒 Files selected for processing (1)
  • eduaid_web/src/pages/Output.jsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • eduaid_web/src/pages/Output.jsx

📝 Walkthrough

Walkthrough

The PR swaps string-similarity imports to strsimpy, tightens server infrastructure (request size cap, rate limiting, timeout executor), defers generator initialization and uses timeout/wrapped generation for endpoints, hardens request validation, fixes Output page localStorage parsing and missing-array crashes, and updates dependencies.

Changes

Cohort / File(s) Summary
String Similarity Migration
backend/Generator/main.py, backend/Generator/mcq.py
Changed the import source for NormalizedLevenshtein from similarity.normalized_levenshtein to strsimpy.normalized_levenshtein; instantiation and usage unchanged.
Server Infrastructure & Endpoint Refactoring
backend/server.py
Added MAX_CONTENT_LENGTH, ProxyFix, Flask-Limiter rate limits and handlers; introduced execute_with_timeout(); deferred generator initializations to None; replaced some endpoints with mock responses; /get_problems now validates input, runs MediaWiki and per-generator generation in parallel with timeouts and fallbacks; tightened JSON parsing/validation across many handlers; updated /getTranscript to use yt-dlp with temp files and clearer error codes.
Frontend Output Crash Fix
eduaid_web/src/pages/Output.jsx
Wrapped localStorage parsing in try/catch, added normalizeArray() to coerce missing arrays to [], refactored combining of QA pairs, and added questionType to effect deps to recompute when type changes.
Dependencies
requirements.txt
Fixed malformed google-auth line; added Flask-Limiter==3.5.0, yt-dlp, and strsimpy; cleaned trailing newline.
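The rate limiting added to backend/server.py can be illustrated with a minimal token-bucket sketch. This is a conceptual stand-in only — the PR itself uses Flask-Limiter, and the capacity/refill numbers here are illustrative:

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: conceptual stand-in for per-client rate limits."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, then spend one token if available
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Three requests allowed, no refill: the 4th and 5th are rejected
bucket = TokenBucket(capacity=3, refill_per_sec=0.0)
results = [bucket.allow() for _ in range(5)]
```

Flask-Limiter implements the same idea per client key; the point of the sketch is that requests beyond the configured budget are rejected rather than queued.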

Sequence Diagram

sequenceDiagram
    participant Client
    participant RateLimiter as Rate Limiter
    participant Server as Flask Server
    participant Executor as Timeout Executor
    participant MediaWiki
    participant MCQGen as MCQ Generator
    participant BoolQGen as Bool Generator
    participant ShortQGen as Short Generator

    Client->>RateLimiter: POST /get_problems
    RateLimiter->>Server: allow / reject
    Server->>Server: validate JSON & inputs
    Server->>MediaWiki: fetch/process content (with timeout via Executor)
    MediaWiki-->>Server: content / timeout
    Server->>Executor: start parallel generators (MCQ/BoolQ/ShortQ)
    par Parallel
        Executor->>MCQGen: generate (timeout)
        Executor->>BoolQGen: generate (timeout)
        Executor->>ShortQGen: generate (timeout)
    end
    MCQGen-->>Executor: result / error
    BoolQGen-->>Executor: result / error
    ShortQGen-->>Executor: result / error
    Executor-->>Server: merged results + fallbacks
    Server-->>Client: JSON response (results or aggregated error)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Poem

🐇 I nibbled through imports late at night,

Swapped in strsimpy to make matches right.
Timeouts hum, rate limits keep the beat,
LocalStorage now won't face defeat.
Hooray — small hops, and the app runs light!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Out of Scope Changes check — ⚠️ Warning: Changes in backend files (main.py, mcq.py, server.py) and dependency updates are unrelated to the Output page crash fix described in the title and linked issue #610. Resolution: Remove or isolate backend changes (import replacements, rate limiting, timeouts) and dependency additions into separate PRs focused on their specific objectives.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: The title correctly identifies the primary fix: preventing Output page crashes from undefined forEach operations on payload fields.
  • Linked Issues check — ✅ Passed: Code changes address all requirements from issue #610: safe localStorage parsing with try/catch, an array normalization helper, guarded iteration patterns, and optional chaining to prevent forEach crashes on undefined fields.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/server.py (1)

609-617: ⚠️ Potential issue | 🟠 Major

Wrap each boolean question in a list before calling the predictor.

AnswerPredictor.predict_boolean_answer() iterates payload["input_question"]. Passing a bare string here makes it score one character at a time, and if qa_response: then turns every non-empty question into "True".

Proposed fix
     for question in input_questions:
         qa_response = answer.predict_boolean_answer(
-            {"input_text": input_text, "input_question": question}
+            {"input_text": input_text, "input_question": [question]}
         )

-        if qa_response:
+        if qa_response and qa_response[0]:
             output.append("True")
         else:
             output.append("False")
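The character-by-character pitfall described above is easy to reproduce in isolation. In this sketch, predict_each is a hypothetical stand-in for the predictor's loop over payload["input_question"], showing why wrapping the question in a list matters:

```python
def predict_each(questions):
    # Hypothetical stand-in for the predictor's iteration over payload["input_question"]
    return [q for q in questions]

# A bare string iterates character by character:
chars = predict_each("Is AI useful?")
# Wrapping the question in a list yields one item per question:
items = predict_each(["Is AI useful?"])
```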
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 609 - 617, The loop is passing a bare string
to answer.predict_boolean_answer which iterates payload["input_question"]
character-by-character; change the call to pass the question as a single-item
list (e.g., {"input_text": input_text, "input_question": [question]}) and then
inspect the returned list element (qa_response[0]) rather than the truthiness of
qa_response before appending to output (use output.append("True") or "False"
based on qa_response[0], and guard for an empty qa_response).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 94-102: The file sets MCQGen, BoolQGen, ShortQGen, answer, qg,
qa_model, and llm_generator to None which prevents real backends from being used
and causes 500s when routes dereference them; remove these unconditional None
assignments (or replace them with proper initializers) and ensure these symbols
are instantiated with their real classes (e.g., MCQGen = MCQGenerator(...),
BoolQGen = BoolQGenerator(...), ShortQGen = ShortQGenerator(...), qg =
QuestionGenerator(...), qa_model = QAModel(...), llm_generator =
LLMGenerator(...), answer = AnswerService(...)) during app startup or lazily
inside the route handlers for /get_mcq, /get_boolq, /get_shortq and the
corresponding _llm/_answer/_hard routes so that routes never see None before use
and any initialization errors are logged and handled gracefully.
- Around line 981-988: The call to qg.generate via execute_with_timeout uses
answer_style="true_false", but QuestionGenerator.generate only accepts "all",
"sentences", or "multiple_choice", so update the code that constructs qg (the
generator instance used here) to use a boolean-capable generator implementation
(one that supports "true_false") or, if you prefer not to add a new generator,
change the answer_style argument to one of the supported values ("all",
"sentences", or "multiple_choice"); locate the qg instantiation (the code that
creates the generator passed into qg.generate) and either replace it with a
TrueFalse-capable generator class or add handling to map/convert "true_false"
requests into a supported style before calling execute_with_timeout.
- Around line 234-248: The mock `/get_boolq` response is returning list items as
{"question","answer"} objects but the frontend (Output.jsx when questionType ===
"get_boolq") expects each item in the "output" array to be the question string;
change the `questions` variable to be a list of strings (e.g. ["AI is useful?",
...]) and keep the existing slicing (`questions = questions[:max_questions]`)
and the JSON response keys ("output", "status", "mock_mode", "warning") intact
so the UI receives an array of question strings rather than objects.
- Around line 490-519: The response currently flattens each generator to a bare
array; instead preserve the nested objects so the frontend can access
output_mcq.questions, output_boolq.Boolean_Questions and
output_shortq.questions. Update the assignments that use mcq_out, bool_out,
short_out so that when no error and isinstance(..., dict) you set
response["output_mcq"] = mcq_out or {} (and likewise response["output_boolq"] =
bool_out or {}, response["output_shortq"] = short_out or {}), leaving empty
dicts as the fallback; keep the existing all-failed error return but ensure its
output_* values are empty objects ({}), not arrays.
- Around line 43-58: The current execute_with_timeout uses ThreadPoolExecutor
and simply calls future.result(timeout=...), so on FuturesTimeoutError the
worker thread keeps running; replace this with a cancellable approach: either
switch to ProcessPoolExecutor (in execute_with_timeout) so you can
terminate/kill worker processes after timeout (call
process_executor.shutdown(...) and terminate child processes) or implement
cooperative cancellation by passing a threading.Event into the submitted
function (e.g., cancel_event) and have the target function periodically check
cancel_event.is_set(); update the submit call and function signatures
accordingly, handle FuturesTimeoutError by setting cancel_event.set() and then
shutting down the executor, and ensure executor.shutdown(wait=False) is still
called; reference execute_with_timeout, ThreadPoolExecutor, ProcessPoolExecutor,
future.result, FuturesTimeoutError, executor.shutdown, and the cooperative
cancel Event so reviewers can locate and modify the code.
- Around line 438-476: The generator wrapper functions run_mcq, run_boolq, and
run_shortq are calling MCQGen.generate_mcq / BoolQGen.generate_boolq /
ShortQGen.generate_shortq with keyword args, but those methods expect a single
payload dict; change each lambda passed to execute_with_timeout to call the
generator with a single dict payload (e.g., {'input_text': input_text,
'max_questions': max_questions_mcq} for run_mcq, analogous keys/values for
run_boolq and run_shortq) so the generators receive one argument as defined;
keep the same timeout and existing None/"not_available" checks and return
behavior.

In `@eduaid_web/src/pages/Output.jsx`:
- Around line 22-23: The current outside-click guard in the click handler (the
condition using dropdown and !event.target.closest('button')) is too broad and
prevents the dropdown from closing when any other button is clicked; narrow the
check to only ignore the specific PDF toggle instead. Modify the condition in
the Output.jsx click handler to look for a specific selector or ref for the PDF
toggle (e.g., event.target.closest('.pdf-toggle') or compare against a
pdfToggleRef/current element) instead of generic 'button', so only clicks on
that exact toggle prevent closing while all other outside clicks close the
dropdown.
- Around line 141-162: The current conditional skips iterating `output` when
`questionType === "get_mcq"`, dropping MCQs; update the logic in Output.jsx
around the `questionType` branches so that when `questionType === "get_mcq"` you
still iterate `output` (push into `combinedQaPairs`) and also include items from
`output_mcq.questions` (merge both sources), e.g., add an explicit `if
(questionType === "get_mcq")` branch that processes `output.forEach(...)` and
`output_mcq.questions.forEach(...)`, keeping the existing field mapping used for
other branches and retaining `combinedQaPairs` as the single merged array.

In `@requirements.txt`:
- Around line 31-38: Update the mammoth dependency in requirements.txt to at
least version 1.11.0 (e.g., change "mammoth" to "mammoth>=1.11.0") and rerun
dependency install so that backend code using mammoth.extract_raw_text (in
Generator/main.py) uses the patched library; after upgrading, run tests or a
quick smoke test of the .docx upload path to confirm external file access is
disabled and behavior remains correct.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8725e66b-6869-4a93-8bbe-775ce182c4da

📥 Commits

Reviewing files that changed from the base of the PR and between 2038116 and 54c5bab.

⛔ Files ignored due to path filters (1)
  • backend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
  • backend/Generator/main.py
  • backend/Generator/mcq.py
  • backend/server.py
  • eduaid_web/src/pages/Output.jsx
  • requirements.txt

Comment on lines +43 to +58
def execute_with_timeout(func, timeout, *args, **kwargs):
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)

    try:
        result = future.result(timeout=timeout)
        executor.shutdown(wait=False)  # ✅ FIX
        return result, None

    except FuturesTimeoutError:
        executor.shutdown(wait=False)  # ✅ FIX
        return None, "timeout"

    except Exception:
        executor.shutdown(wait=False)
        logging.exception("Error occurred during execution")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "server.py" -path "*/backend/*" | head -5

Repository: AOSSIE-Org/EduAid

Length of output: 80


🏁 Script executed:

cat -n backend/server.py | sed -n '43,62p'

Repository: AOSSIE-Org/EduAid

Length of output: 855


🌐 Web query:

ThreadPoolExecutor future.result timeout does the function keep running Python concurrent.futures

💡 Result:

Yes, when calling future.result(timeout=...) on a ThreadPoolExecutor future in Python's concurrent.futures and the timeout expires, a TimeoutError is raised in the calling thread, but the submitted function continues running in its worker thread until completion. The timeout only limits how long the caller waits for the result; it does not cancel or terminate the underlying task. This behavior occurs because Python threads cannot be forcibly terminated from another thread (due to risks like leaving locks held). The future.cancel method only works for pending (not yet started) tasks; for running tasks, it returns False. Official documentation confirms: Future.result(timeout) waits up to the specified seconds and raises TimeoutError if not completed, but does not mention stopping the task. To stop long-running tasks, make them cooperative by checking a shared threading.Event flag periodically, or use ProcessPoolExecutor (though it also doesn't auto-terminate on timeout without extra work).



This timeout only stops waiting; it does not stop the work.

After future.result(timeout=...) raises TimeoutError, the submitted function continues running in its worker thread, and executor.shutdown(wait=False) returns immediately without terminating the thread. On model endpoints this means timed-out jobs keep burning CPU/GPU after a 504 response; repeated timeouts can accumulate and snowball into resource exhaustion. To enforce a hard execution cap, the work must either check a cooperative cancellation flag (threading.Event) periodically, or use a killable mechanism like ProcessPoolExecutor (which also requires additional cancellation logic beyond shutdown).
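A cooperative-cancellation variant of execute_with_timeout might look like the sketch below. It assumes workers accept and periodically check a cancel event; slow_job and waiting_job are hypothetical workers added for illustration, not code from this PR:

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError

def execute_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a worker thread; on timeout, signal a cancel event the worker can honor."""
    cancel_event = threading.Event()
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, cancel_event, *args, **kwargs)
    try:
        return future.result(timeout=timeout), None
    except FuturesTimeoutError:
        cancel_event.set()  # cooperative cancellation: worker stops at its next checkpoint
        return None, "timeout"
    except Exception:
        return None, "error"
    finally:
        executor.shutdown(wait=False)

def slow_job(cancel_event, steps):
    # Hypothetical worker that checks the cancel flag between units of work
    done = 0
    for _ in range(steps):
        if cancel_event.is_set():
            break
        done += 1
    return done

def waiting_job(cancel_event):
    # Hypothetical worker that blocks, but wakes as soon as cancellation is signaled
    cancel_event.wait(2)
    return "finished"
```

The caller's behavior is unchanged (result-or-error tuple), but a timed-out worker now has a way to stop instead of burning CPU to completion.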


Comment on lines +94 to +102
+MCQGen = None
+answer = None
+BoolQGen = None
+ShortQGen = None
+qg = None
 file_processor = main.FileProcessor()
 mediawikiapi = MediaWikiAPI()
-qa_model = pipeline("question-answering")
-llm_generator = LLMQuestionGenerator()
+qa_model = None
+llm_generator = None

⚠️ Potential issue | 🔴 Critical

These None assignments disable the real backend paths.

Nothing in this module reinitializes MCQGen, BoolQGen, ShortQGen, answer, qg, qa_model, or llm_generator after setting them to None. That leaves /get_mcq/get_boolq/get_shortq permanently in mock mode, while /get_*_llm, /get_*_answer, /get_boolean_answer, and the *_hard routes dereference None and 500.
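One way to address this, as the comment suggests, is lazy initialization with caching. The helper below is a hedged sketch — get_generator and the usage names are hypothetical, not the PR's code:

```python
import logging

_generators = {}

def get_generator(name, factory):
    """Create the named generator on first use, cache it, and return None if init fails."""
    if name not in _generators:
        try:
            _generators[name] = factory()
        except Exception:
            logging.exception("Failed to initialize %s", name)
            _generators[name] = None
    return _generators[name]

# Hypothetical use inside a route handler:
#   gen = get_generator("mcq", lambda: main.MCQGenerator())
#   if gen is None:
#       return jsonify({"error": "generator unavailable"}), 503
```

With this pattern, routes never dereference a bare None, startup stays fast, and an initialization failure is logged once and surfaced as a controlled 503 rather than a 500.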


Comment on lines +438 to +476
# ✅ SAFE GENERATOR CALLS (FIXED WITH LAMBDA)
def run_mcq():
    if MCQGen is None:
        logging.warning("MCQGen not initialized")
        return None, "not_available"

    return execute_with_timeout(
        lambda: MCQGen.generate_mcq(
            input_text=input_text,
            max_questions=max_questions_mcq
        ),
        60
    )

def run_boolq():
    if BoolQGen is None:
        logging.warning("BoolQGen not initialized")
        return None, "not_available"

    return execute_with_timeout(
        lambda: BoolQGen.generate_boolq(
            input_text=input_text,
            max_questions=max_questions_boolq
        ),
        60
    )

def run_shortq():
    if ShortQGen is None:
        logging.warning("ShortQGen not initialized")
        return None, "not_available"

    return execute_with_timeout(
        lambda: ShortQGen.generate_shortq(
            input_text=input_text,
            max_questions=max_questions_shortq
        ),
        60
    )

⚠️ Potential issue | 🔴 Critical

Pass payload dicts into the generator methods.

backend/Generator/main.py defines generate_mcq(), generate_boolq(), and generate_shortq() as single-argument payload methods. Calling them with input_text= / max_questions= kwargs raises TypeError, so /get_problems can never reach the real generation path once those generators are re-enabled.

Proposed fix
     def run_mcq():
         if MCQGen is None:
             logging.warning("MCQGen not initialized")
             return None, "not_available"

         return execute_with_timeout(
-            lambda: MCQGen.generate_mcq(
-                input_text=input_text,
-                max_questions=max_questions_mcq
-            ),
-            60
+            MCQGen.generate_mcq,
+            60,
+            {
+                "input_text": input_text,
+                "max_questions": max_questions_mcq,
+            },
         )

     def run_boolq():
         if BoolQGen is None:
             logging.warning("BoolQGen not initialized")
             return None, "not_available"

         return execute_with_timeout(
-            lambda: BoolQGen.generate_boolq(
-                input_text=input_text,
-                max_questions=max_questions_boolq
-            ),
-            60
+            BoolQGen.generate_boolq,
+            60,
+            {
+                "input_text": input_text,
+                "max_questions": max_questions_boolq,
+            },
         )

     def run_shortq():
         if ShortQGen is None:
             logging.warning("ShortQGen not initialized")
             return None, "not_available"

         return execute_with_timeout(
-            lambda: ShortQGen.generate_shortq(
-                input_text=input_text,
-                max_questions=max_questions_shortq
-            ),
-            60
+            ShortQGen.generate_shortq,
+            60,
+            {
+                "input_text": input_text,
+                "max_questions": max_questions_shortq,
+            },
         )

Comment on lines +490 to +519
# ✅ FINAL SAFE RESPONSE
response = {
    "output_mcq": [],
    "output_boolq": [],
    "output_shortq": []
}

if not mcq_err and isinstance(mcq_out, dict):
    response["output_mcq"] = mcq_out.get("questions", []) or []

if not bool_err and isinstance(bool_out, dict):
    response["output_boolq"] = bool_out.get("Boolean_Questions", []) or []

if not short_err and isinstance(short_out, dict):
    response["output_shortq"] = short_out.get("questions", []) or []

# ✅ If ALL failed
if mcq_err and bool_err and short_err:
    all_errors = [mcq_err, bool_err, short_err]

    status = 504 if all(e == "timeout" for e in all_errors) else 500
    code = "timeout" if status == 504 else "all_failed"

    return jsonify({
        "error": "All generators failed",
        "code": code,
        "output_mcq": [],
        "output_boolq": [],
        "output_shortq": []
    }), status

⚠️ Potential issue | 🟠 Major

Preserve the nested output_* objects in the mixed response.

The Output page reads output_mcq.questions, output_boolq.Boolean_Questions plus Text, and output_shortq.questions. Flattening each generator result to a bare array makes those lookups undefined, so mixed MCQ/Boolean/Short results disappear even though the page no longer crashes.

Proposed response-shape fix
-    response = {
-        "output_mcq": [],
-        "output_boolq": [],
-        "output_shortq": []
-    }
+    response = {
+        "output_mcq": {"questions": []},
+        "output_boolq": {"Boolean_Questions": [], "Text": input_text},
+        "output_shortq": {"questions": []},
+    }

     if not mcq_err and isinstance(mcq_out, dict):
-        response["output_mcq"] = mcq_out.get("questions", []) or []
+        response["output_mcq"]["questions"] = mcq_out.get("questions", []) or []

     if not bool_err and isinstance(bool_out, dict):
-        response["output_boolq"] = bool_out.get("Boolean_Questions", []) or []
+        response["output_boolq"]["Boolean_Questions"] = bool_out.get("Boolean_Questions", []) or []
+        response["output_boolq"]["Text"] = bool_out.get("Text", input_text)

     if not short_err and isinstance(short_out, dict):
-        response["output_shortq"] = short_out.get("questions", []) or []
+        response["output_shortq"]["questions"] = short_out.get("questions", []) or []
@@
         return jsonify({
             "error": "All generators failed",
             "code": code,
-            "output_mcq": [],
-            "output_boolq": [],
-            "output_shortq": []
+            "output_mcq": {"questions": []},
+            "output_boolq": {"Boolean_Questions": [], "Text": input_text},
+            "output_shortq": {"questions": []}
         }), status

Based on learnings, mixed question types are intentionally supported in eduaid_web/src/pages/Output.jsx, and output_mcq plus output are distinct data sources.


Comment on lines +981 to 988
 # ✅ Question generation with timeout
 generated, error = execute_with_timeout(
     qg.generate,
     60,
     article=input_text,
-    num_questions=input_questions,
+    num_questions=len(input_questions),
     answer_style="true_false"
 )

⚠️ Potential issue | 🔴 Critical

QuestionGenerator.generate() does not accept "true_false".

backend/Generator/main.py only allows answer_style values "all", "sentences", and "multiple_choice" here. This route will always fall into the error path even after qg is initialized, so it needs to call a boolean-capable generator instead of forcing an unsupported answer_style.
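A small guard along the lines the comment suggests could validate or map the requested style before calling the generator. In this sketch, resolve_answer_style and the true_false→sentences mapping are illustrative assumptions, not the project's API:

```python
# Styles QuestionGenerator.generate() is documented (in this review) to accept
SUPPORTED_STYLES = {"all", "sentences", "multiple_choice"}

# Illustrative assumption: route "true_false" requests to a supported style
STYLE_ALIASES = {"true_false": "sentences"}

def resolve_answer_style(requested):
    """Map a requested answer_style onto one the generator supports, or fail fast."""
    style = STYLE_ALIASES.get(requested, requested)
    if style not in SUPPORTED_STYLES:
        raise ValueError(f"Unsupported answer_style: {requested!r}")
    return style
```

Failing fast with a ValueError turns a silent always-error path into an explicit 400-style rejection the route can report.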


Comment on lines 31 to +38
 mammoth
 mediawikiapi
 PyMuPDF
 textblob
-llama-cpp-python
\ No newline at end of file
+Flask-Limiter==3.5.0
+yt-dlp
+llama-cpp-python
+strsimpy

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n requirements.txt | grep -A2 -B2 mammoth

Repository: AOSSIE-Org/EduAid

Length of output: 151


🏁 Script executed:

head -50 backend/Generator/main.py | tail -30

Repository: AOSSIE-Org/EduAid

Length of output: 1075


🏁 Script executed:

sed -n '360,375p' backend/Generator/main.py

Repository: AOSSIE-Org/EduAid

Length of output: 544


🌐 Web query:

mammoth library directory traversal vulnerability Python docx parser

💡 Result:

Yes, the Mammoth Python library (a .docx to HTML/Markdown converter) has a directory traversal vulnerability identified as CVE-2025-11849. It affects versions from 0.3.25 up to but not including 1.11.0. The vulnerability occurs due to lack of path or file type validation when processing .docx files containing images referenced via external links (r:link attribute instead of embedded r:embed). The library resolves these URIs to local file paths, reads the files, base64-encodes them, and embeds them in the output HTML as data URIs. Attackers can craft malicious .docx files with paths like ../../etc/passwd or /dev/random to read arbitrary files or cause resource exhaustion. Fix: Upgrade to Mammoth version 1.11.0 or higher, which disables external file access by default (via externalFileAccess: false). Official sources: PyPI (pypi.org/project/mammoth/), GitHub (github.com/mwilliamson/python-mammoth). Latest version as of 2026-03-13 is 1.12.0. No relation to any "mammoth library" in the context of directory services or unrelated projects.



Upgrade mammoth to version 1.11.0 or later to patch CVE-2025-11849.

backend/Generator/main.py lines 365-368 use mammoth.extract_raw_text() on user-uploaded .docx files. Mammoth versions before 1.11.0 are vulnerable to directory traversal (CVE-2025-11849), allowing attackers to read arbitrary files or exhaust resources via crafted .docx files with external file references. Upgrade to mammoth>=1.11.0 immediately, which disables external file access by default.

🧰 Tools
🪛 OSV Scanner (2.3.5)

[CRITICAL] 31-31: mammoth 1.9.1: Mammoth is vulnerable to Directory Traversal

(GHSA-rmjr-87wv-gf87)




Development

Successfully merging this pull request may close these issues.

[BUG][HIGH PRIORITY]: Output page crashes with Cannot read properties of undefined (reading forEach)

1 participant