
fix(backend): always cleanup uploaded temp files #591

Open

Ashvin-KS wants to merge 3 commits into AOSSIE-Org:main from Ashvin-KS:fix/upload-tempfile-cleanup

Conversation

Ashvin-KS (Contributor) commented Mar 16, 2026

Addressed Issues:

Fixes #589

Screenshots/Recordings:

Not applicable (backend-only resource cleanup fix).

Additional Notes:

This PR fixes a temporary-file cleanup issue in upload processing: if extraction failed for an uploaded file, the temp file could remain on disk. It adds guaranteed cleanup so temporary uploaded files are removed even when extraction raises an exception.
Scope is intentionally minimal: a single-issue, single-file reliability/security fix.
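In outline, the fix is the standard try/finally cleanup pattern. The sketch below is a minimal standalone illustration (process_upload is a placeholder name, and plain file reading stands in for the project's extraction logic), not the project code:

```python
import os
import tempfile

def process_upload(file_path: str) -> str:
    """Read the file's content, removing the temp file on every code path."""
    try:
        with open(file_path, "r") as f:
            return f.read()  # stand-in for the real extraction step
    finally:
        # Runs whether extraction succeeded or raised.
        try:
            os.remove(file_path)
        except FileNotFoundError:
            pass

# Usage: the temp file is gone after processing, success or failure.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("hello")
print(process_upload(path))  # → hello
print(os.path.exists(path))  # → False
```

With this shape, the finally clause runs on both the success and the exception path, so the temp file cannot be leaked.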

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Check one of the checkboxes below:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: GPT-5.3-Codex via GitHub Copilot, local terminal validation.

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • Bug Fixes
    • Safer file upload handling: uploads are stored with unique, collision-resistant names while preserving file type to avoid naming conflicts.
    • More robust cleanup after processing: temporary files are removed reliably, with graceful handling if already absent.
    • More consistent content extraction across supported file types.

Copilot AI review requested due to automatic review settings March 16, 2026 16:04

coderabbitai bot commented Mar 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc93c300-ccb7-49b4-a946-9b595071ea49

📥 Commits

Reviewing files that changed from the base of the PR and between 69da1d7 and 7490a65.

📒 Files selected for processing (1)
  • backend/Generator/main.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/Generator/main.py

📝 Walkthrough

Walkthrough

FileProcessor.process_file now saves uploads under a UUID-based filename (preserving original extension), extracts content based on extension (.txt, .pdf, .docx) inside a try block, and always removes the temporary file in a finally block that ignores FileNotFoundError.

Changes

Cohort / File(s): Upload processing & cleanup (backend/Generator/main.py)
Summary: Add uuid import; save uploaded files to a UUID-based safe filename (keeping the extension); branch extraction for .txt, .pdf, .docx; wrap processing in try/finally to ensure removal of the temp file and ignore FileNotFoundError during cleanup.
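The UUID-based naming described above can be sketched as follows (safe_upload_path is a hypothetical helper for illustration, not a function from backend/Generator/main.py):

```python
import os
import uuid

def safe_upload_path(upload_folder: str, original_name: str) -> str:
    """Collision-resistant target path that keeps the original extension."""
    _, ext = os.path.splitext(original_name)
    return os.path.join(upload_folder, f"{uuid.uuid4().hex}{ext}")

p1 = safe_upload_path("/tmp/uploads", "report.pdf")
p2 = safe_upload_path("/tmp/uploads", "report.pdf")
print(p1.endswith(".pdf"))  # → True
print(p1 == p2)             # → False (same original name, distinct paths)
```

Two uploads with identical original filenames land at distinct paths, which avoids both naming conflicts and the path-injection risks of trusting user-supplied names.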

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐇 I name each file with a UUID hat,
I read, try, and tidy—leave no spat,
Finally I clean, no crumbs remain,
Temp files gone — hop home again. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check ✅: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅: The title clearly and specifically summarizes the main change: adding guaranteed cleanup of temporary uploaded files.
  • Linked Issues Check ✅: The PR directly addresses issue #589 by modifying file cleanup to use a finally block that guarantees removal of temporary uploaded files on all code paths, including exception paths.
  • Out of Scope Changes Check ✅: All changes in backend/Generator/main.py are directly related to fixing the temporary file cleanup issue; no unrelated modifications were introduced.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Copilot AI left a comment

Pull request overview

This PR updates the backend file ingestion flow to ensure temporary uploaded files are cleaned up reliably, even when text extraction fails, reducing leftover files in the upload directory.

Changes:

  • Wrap file content extraction in a try/finally to guarantee temp-file deletion.
  • Add a defensive existence check before deleting the temp file.


Comment on lines 371 to 372

    @@ -372,16 +372,18 @@ def process_file(self, file):
         file.save(file_path)

Comment on lines +371 to +376

    @@ -372,16 +372,18 @@ def process_file(self, file):
         file.save(file_path)
         content = ""

    -    if file.filename.endswith('.txt'):
    -        with open(file_path, 'r') as f:
    -            content = f.read()
    -    elif file.filename.endswith('.pdf'):
    -        content = self.extract_text_from_pdf(file_path)
    -    elif file.filename.endswith('.docx'):
    -        content = self.extract_text_from_docx(file_path)
    -
    -    os.remove(file_path)
    -    return content
    +    try:
    +        if file.filename.endswith('.txt'):

Comment on lines +385 to +386

    +        if os.path.exists(file_path):
    +            os.remove(file_path)
coderabbitai bot left a comment

🧹 Nitpick comments (1)
backend/Generator/main.py (1)

384-386: Prevent cleanup errors from masking extraction failures.

The os.remove() in the finally block (lines 384-386) can raise an exception that becomes the primary exception raised to the caller. While the original extraction exception is preserved in exception chaining via __context__, this masks the real failure and complicates debugging. Wrap the cleanup in a try-except to catch and log deletion errors instead.

Proposed patch
+import logging
@@
 class FileProcessor:
@@
     def process_file(self, file):
@@
         try:
             if file.filename.endswith('.txt'):
                 with open(file_path, 'r') as f:
                     content = f.read()
             elif file.filename.endswith('.pdf'):
                 content = self.extract_text_from_pdf(file_path)
             elif file.filename.endswith('.docx'):
                 content = self.extract_text_from_docx(file_path)
             return content
         finally:
-            if os.path.exists(file_path):
-                os.remove(file_path)
+            try:
+                os.remove(file_path)
+            except FileNotFoundError:
+                pass
+            except OSError as exc:
+                logging.warning("Temp file cleanup failed for %s: %s", file_path, exc)
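The masking behavior this comment describes can be reproduced in a few lines (a standalone sketch, unrelated to the project code): an exception raised in a finally block replaces the in-flight exception, which survives only as implicit chaining on __context__.

```python
def failing_cleanup():
    try:
        raise ValueError("extraction failed")
    finally:
        raise OSError("cleanup failed")  # replaces the ValueError

try:
    failing_cleanup()
except OSError as exc:
    # The caller sees the OSError; the original error survives only
    # as implicit chaining on __context__.
    print(type(exc.__context__).__name__)  # → ValueError
```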
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` around lines 384 - 386, Wrap the file deletion in
the finally block around os.remove(file_path) in a try/except so a deletion
error cannot override the original extraction exception; catch Exception as e
and log it (e.g., logger.exception or logging.exception) with context like
"failed to remove temporary file" and the file_path, but do not re-raise so the
original exception remains the primary failure; update the finally block where
os.remove(file_path) is called to perform this safe-delete.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21c68693-b28a-41b3-ab88-be1419a2c44c

📥 Commits

Reviewing files that changed from the base of the PR and between fc3bf1a and c6e36d2.

📒 Files selected for processing (1)
  • backend/Generator/main.py

coderabbitai bot left a comment

🧹 Nitpick comments (2)
backend/Generator/main.py (2)

372-393: Please add a regression test for malformed upload extraction failures.

This fix is high-value reliability behavior; a test that uploads a corrupted PDF/DOCX and asserts no leftover temp file would lock the behavior in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` around lines 372 - 393, Add a regression test that
uploads a corrupted PDF and a corrupted DOCX to the upload handler path that
uses file.save, extract_text_from_pdf and extract_text_from_docx (the logic
shown where safe_filename is created and file_path is removed in finally), and
assert that after the handler runs (even if
extract_text_from_pdf/extract_text_from_docx raises) the temporary file does not
exist in upload_folder; simulate the uploaded file object (with filename and
save()) or use test client multipart upload, cause extraction to fail (e.g.,
provide truncated/corrupted bytes) and assert no leftover file_path remains and
that the code path that catches FileNotFoundError still passes.

390-393: Prevent cleanup failures from masking the original processing error.

At Line 391, non-FileNotFoundError OSErrors raised in finally can replace the root extraction failure. Consider best-effort cleanup for broader OSError with warning-level logging.

♻️ Proposed adjustment
         finally:
             try:
                 os.remove(file_path)
             except FileNotFoundError:
                 pass
+            except OSError as cleanup_error:
+                # best-effort cleanup; avoid masking extraction exceptions
+                print(f"Warning: failed to remove temp file {file_path}: {cleanup_error}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/main.py` around lines 390 - 393, The current cleanup uses
try: os.remove(file_path) except FileNotFoundError: pass which can let other
OSError variants raised during cleanup overwrite the original processing
exception; change the cleanup in the finally block to catch OSError (not just
FileNotFoundError) and log a warning instead of suppressing or re-raising it so
the original error remains visible — update the os.remove(file_path) handler to:
except OSError as e: logger.warning("Failed to remove %s during cleanup: %s",
file_path, e, exc_info=True) (or use the module's existing logger) so cleanup is
best-effort and won't mask the root error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fb5655a7-6dbe-470a-bac5-bce31a640fc8

📥 Commits

Reviewing files that changed from the base of the PR and between c6e36d2 and 69da1d7.

📒 Files selected for processing (1)
  • backend/Generator/main.py


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG][SECURITY]: Uploaded temporary files are not cleaned up when file extraction fails

2 participants