fix(backend): always cleanup uploaded temp files #591
Ashvin-KS wants to merge 3 commits into AOSSIE-Org:main from
Conversation
📝 Walkthrough
FileProcessor.process_file now saves uploads under a UUID-based filename (preserving the original extension), extracts content based on extension (.txt, .pdf, .docx) inside a try block, and always removes the temporary file in a finally block that ignores FileNotFoundError.
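The walkthrough above can be sketched as follows. This is a minimal illustration rather than the PR's exact code: the class and method names follow the diff, but the `upload_folder` attribute and the stubbed extractors are assumptions.

```python
import os
import uuid


class FileProcessor:
    """Sketch of the described flow: UUID temp name, extraction, guaranteed cleanup."""

    def __init__(self, upload_folder):
        self.upload_folder = upload_folder

    def extract_text_from_pdf(self, path):
        raise NotImplementedError  # the real code parses the PDF

    def extract_text_from_docx(self, path):
        raise NotImplementedError  # the real code parses the DOCX

    def process_file(self, file):
        # UUID-based name avoids collisions and path tricks; keep the original extension.
        _, ext = os.path.splitext(file.filename)
        file_path = os.path.join(self.upload_folder, f"{uuid.uuid4().hex}{ext}")
        file.save(file_path)
        content = ""
        try:
            if file.filename.endswith('.txt'):
                with open(file_path, 'r') as f:
                    content = f.read()
            elif file.filename.endswith('.pdf'):
                content = self.extract_text_from_pdf(file_path)
            elif file.filename.endswith('.docx'):
                content = self.extract_text_from_docx(file_path)
            return content
        finally:
            # Guaranteed cleanup; a file that is already gone is not an error.
            try:
                os.remove(file_path)
            except FileNotFoundError:
                pass
```

The `finally` clause runs whether extraction succeeds or raises, which is what closes the leftover-file gap described in the issue.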
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR updates the backend file ingestion flow to ensure temporary uploaded files are cleaned up reliably, even when text extraction fails, reducing leftover files in the upload directory.
Changes:
- Wrap file content extraction in a `try`/`finally` to guarantee temp-file deletion.
- Add a defensive existence check before deleting the temp file.
backend/Generator/main.py (Outdated)

```diff
@@ -372,16 +372,18 @@ def process_file(self, file):
     file.save(file_path)
     content = ""

-    if file.filename.endswith('.txt'):
-        with open(file_path, 'r') as f:
-            content = f.read()
-    elif file.filename.endswith('.pdf'):
-        content = self.extract_text_from_pdf(file_path)
-    elif file.filename.endswith('.docx'):
-        content = self.extract_text_from_docx(file_path)
-
-    os.remove(file_path)
-    return content
+    try:
+        if file.filename.endswith('.txt'):
+            with open(file_path, 'r') as f:
+                content = f.read()
+        elif file.filename.endswith('.pdf'):
+            content = self.extract_text_from_pdf(file_path)
+        elif file.filename.endswith('.docx'):
+            content = self.extract_text_from_docx(file_path)
+        return content
+    finally:
+        if os.path.exists(file_path):
+            os.remove(file_path)
```
🧹 Nitpick comments (1)
backend/Generator/main.py (1)
384-386: Prevent cleanup errors from masking extraction failures.

The `os.remove()` in the `finally` block (lines 384-386) can raise an exception that becomes the primary exception raised to the caller. While the original extraction exception is preserved via exception chaining (`__context__`), this masks the real failure and complicates debugging. Wrap the cleanup in a try-except to catch and log deletion errors instead.

Proposed patch
```diff
+import logging
@@ class FileProcessor:
@@     def process_file(self, file):
         try:
             if file.filename.endswith('.txt'):
                 with open(file_path, 'r') as f:
                     content = f.read()
             elif file.filename.endswith('.pdf'):
                 content = self.extract_text_from_pdf(file_path)
             elif file.filename.endswith('.docx'):
                 content = self.extract_text_from_docx(file_path)
             return content
         finally:
-            if os.path.exists(file_path):
-                os.remove(file_path)
+            try:
+                os.remove(file_path)
+            except FileNotFoundError:
+                pass
+            except OSError as exc:
+                logging.warning("Temp file cleanup failed for %s: %s", file_path, exc)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 384 - 386, Wrap the file deletion in the finally block around os.remove(file_path) in a try/except so a deletion error cannot override the original extraction exception; catch Exception as e and log it (e.g., logger.exception or logging.exception) with context like "failed to remove temporary file" and the file_path, but do not re-raise so the original exception remains the primary failure; update the finally block where os.remove(file_path) is called to perform this safe-delete.
🧹 Nitpick comments (2)
backend/Generator/main.py (2)
372-393: Please add a regression test for malformed upload extraction failures.

This fix is high-value reliability behavior; a test that uploads a corrupted PDF/DOCX and asserts no leftover temp file would lock the behavior in.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 372 - 393, Add a regression test that uploads a corrupted PDF and a corrupted DOCX to the upload handler path that uses file.save, extract_text_from_pdf and extract_text_from_docx (the logic shown where safe_filename is created and file_path is removed in finally), and assert that after the handler runs (even if extract_text_from_pdf/extract_text_from_docx raises) the temporary file does not exist in upload_folder; simulate the uploaded file object (with filename and save()) or use test client multipart upload, cause extraction to fail (e.g., provide truncated/corrupted bytes) and assert no leftover file_path remains and that the code path that catches FileNotFoundError still passes.
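A sketch of the regression test this comment asks for. It assumes a FileProcessor-like API (`process_file` taking an object with `filename` and `save()`); the stand-in processor below mirrors the PR's cleanup logic so the example is self-contained, but a real test would import the actual class from `backend/Generator/main.py` instead.

```python
import os
import tempfile
import uuid


class _Processor:
    """Stand-in mirroring the PR's cleanup logic (assumption: the real
    FileProcessor behaves like this)."""

    def __init__(self, upload_folder):
        self.upload_folder = upload_folder

    def extract_text_from_pdf(self, path):
        raise ValueError("corrupted PDF")  # simulate extraction failure

    def process_file(self, file):
        _, ext = os.path.splitext(file.filename)
        file_path = os.path.join(self.upload_folder, f"{uuid.uuid4().hex}{ext}")
        file.save(file_path)
        try:
            if file.filename.endswith('.pdf'):
                return self.extract_text_from_pdf(file_path)
            return ""
        finally:
            try:
                os.remove(file_path)
            except FileNotFoundError:
                pass


class _Upload:
    """Minimal fake of a Flask/Werkzeug upload object."""

    def __init__(self, filename, data=b"truncated bytes"):
        self.filename = filename
        self._data = data

    def save(self, path):
        with open(path, "wb") as f:
            f.write(self._data)


def test_corrupted_upload_leaves_no_temp_file():
    folder = tempfile.mkdtemp()
    proc = _Processor(folder)
    try:
        proc.process_file(_Upload("broken.pdf"))
    except ValueError:
        pass  # extraction failure is expected
    assert os.listdir(folder) == []  # no leftover temp file
```

Under pytest this runs as-is; the assertion fails on any revision that skips cleanup when extraction raises.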
390-393: Prevent cleanup failures from masking the original processing error.

At line 391, non-`FileNotFoundError` `OSError`s raised in `finally` can replace the root extraction failure. Consider best-effort cleanup for the broader `OSError` with warning-level logging.

♻️ Proposed adjustment
```diff
         finally:
             try:
                 os.remove(file_path)
             except FileNotFoundError:
                 pass
+            except OSError as cleanup_error:
+                # best-effort cleanup; avoid masking extraction exceptions
+                print(f"Warning: failed to remove temp file {file_path}: {cleanup_error}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 390 - 393, The current cleanup uses try: os.remove(file_path) except FileNotFoundError: pass which can let other OSError variants raised during cleanup overwrite the original processing exception; change the cleanup in the finally block to catch OSError (not just FileNotFoundError) and log a warning instead of suppressing or re-raising it so the original error remains visible — update the os.remove(file_path) handler to: except OSError as e: logger.warning("Failed to remove %s during cleanup: %s", file_path, e, exc_info=True) (or use the module's existing logger) so cleanup is best-effort and won't mask the root error.
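Why both comments push for best-effort cleanup: in Python, an exception raised inside a `finally` block replaces the exception already in flight, and the original survives only as implicit chaining context. A minimal demonstration:

```python
def failing_work_with_bad_cleanup():
    try:
        raise ValueError("extraction failed")  # the real error
    finally:
        raise OSError("cleanup failed")        # replaces it in flight


try:
    failing_work_with_bad_cleanup()
except OSError as exc:
    # The caller sees only the cleanup error; the extraction error survives
    # solely as the implicit context of the new exception.
    assert isinstance(exc.__context__, ValueError)
```

Catching and logging `OSError` inside the `finally` block keeps the `ValueError` as the exception the caller actually sees.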
Addressed Issues:
Fixes #589
Screenshots/Recordings:
Not applicable (backend-only resource cleanup fix).
Additional Notes:
This PR fixes a temporary-file cleanup issue in upload processing: if extraction failed for an uploaded file, the temp file could remain on disk. It adds guaranteed cleanup so temporary uploaded files are removed even when extraction raises an exception.
Scope is intentionally minimal: single-issue, single-file reliability/security fix.
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Check one of the checkboxes below:
I have used the following AI models and tools: GPT-5.3-Codex via GitHub Copilot, local terminal validation.