fix(backend): always cleanup uploaded temp files #591
Ashvin-KS wants to merge 3 commits into AOSSIE-Org:main from
Conversation
📝 Walkthrough
FileProcessor.process_file now saves uploads under a UUID-based filename (preserving the original extension), extracts content based on extension (.txt, .pdf, .docx) inside a try block, and always removes the temporary file in a finally block that ignores FileNotFoundError.
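The walkthrough above can be sketched as follows. This is a minimal illustration rather than the PR's exact code: the class and method names follow the diff, but the `upload_folder` attribute and the stubbed extractors are assumptions.

```python
import os
import uuid


class FileProcessor:
    """Sketch of the described flow: UUID temp name, extraction, guaranteed cleanup."""

    def __init__(self, upload_folder):
        self.upload_folder = upload_folder

    def extract_text_from_pdf(self, path):
        raise NotImplementedError  # the real code parses the PDF

    def extract_text_from_docx(self, path):
        raise NotImplementedError  # the real code parses the DOCX

    def process_file(self, file):
        # UUID-based name avoids collisions and path tricks; keep the original extension.
        _, ext = os.path.splitext(file.filename)
        file_path = os.path.join(self.upload_folder, f"{uuid.uuid4().hex}{ext}")
        file.save(file_path)
        content = ""
        try:
            if file.filename.endswith('.txt'):
                with open(file_path, 'r') as f:
                    content = f.read()
            elif file.filename.endswith('.pdf'):
                content = self.extract_text_from_pdf(file_path)
            elif file.filename.endswith('.docx'):
                content = self.extract_text_from_docx(file_path)
            return content
        finally:
            # Guaranteed cleanup; a file that is already gone is not an error.
            try:
                os.remove(file_path)
            except FileNotFoundError:
                pass
```

The `finally` clause runs whether extraction succeeds or raises, which is what closes the leftover-file gap described in the issue.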
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR updates the backend file ingestion flow to ensure temporary uploaded files are cleaned up reliably, even when text extraction fails, reducing leftover files in the upload directory.
Changes:
- Wrap file content extraction in a `try`/`finally` to guarantee temp-file deletion.
- Add a defensive existence check before deleting the temp file.
backend/Generator/main.py (Outdated)

```diff
@@ -372,16 +372,18 @@ def process_file(self, file):
     file.save(file_path)
     content = ""

-    if file.filename.endswith('.txt'):
-        with open(file_path, 'r') as f:
-            content = f.read()
-    elif file.filename.endswith('.pdf'):
-        content = self.extract_text_from_pdf(file_path)
-    elif file.filename.endswith('.docx'):
-        content = self.extract_text_from_docx(file_path)
-
-    os.remove(file_path)
-    return content
+    try:
+        if file.filename.endswith('.txt'):
+            with open(file_path, 'r') as f:
+                content = f.read()
+        elif file.filename.endswith('.pdf'):
+            content = self.extract_text_from_pdf(file_path)
+        elif file.filename.endswith('.docx'):
+            content = self.extract_text_from_docx(file_path)
+        return content
+    finally:
+        if os.path.exists(file_path):
+            os.remove(file_path)
```
🧹 Nitpick comments (1)
backend/Generator/main.py (1)
384-386: Prevent cleanup errors from masking extraction failures.

The `os.remove()` in the `finally` block (lines 384-386) can raise an exception that becomes the primary exception raised to the caller. While the original extraction exception is preserved via exception chaining (`__context__`), this masks the real failure and complicates debugging. Wrap the cleanup in a try-except to catch and log deletion errors instead.

Proposed patch
```diff
+import logging
@@ class FileProcessor:
@@     def process_file(self, file):
         try:
             if file.filename.endswith('.txt'):
                 with open(file_path, 'r') as f:
                     content = f.read()
             elif file.filename.endswith('.pdf'):
                 content = self.extract_text_from_pdf(file_path)
             elif file.filename.endswith('.docx'):
                 content = self.extract_text_from_docx(file_path)
             return content
         finally:
-            if os.path.exists(file_path):
-                os.remove(file_path)
+            try:
+                os.remove(file_path)
+            except FileNotFoundError:
+                pass
+            except OSError as exc:
+                logging.warning("Temp file cleanup failed for %s: %s", file_path, exc)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 384 - 386, Wrap the file deletion in the finally block around os.remove(file_path) in a try/except so a deletion error cannot override the original extraction exception; catch Exception as e and log it (e.g., logger.exception or logging.exception) with context like "failed to remove temporary file" and the file_path, but do not re-raise so the original exception remains the primary failure; update the finally block where os.remove(file_path) is called to perform this safe-delete.
🧹 Nitpick comments (2)
backend/Generator/main.py (2)
372-393: Please add a regression test for malformed upload extraction failures.

This fix is high-value reliability behavior; a test that uploads a corrupted PDF/DOCX and asserts no leftover temp file would lock the behavior in.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 372 - 393, Add a regression test that uploads a corrupted PDF and a corrupted DOCX to the upload handler path that uses file.save, extract_text_from_pdf and extract_text_from_docx (the logic shown where safe_filename is created and file_path is removed in finally), and assert that after the handler runs (even if extract_text_from_pdf/extract_text_from_docx raises) the temporary file does not exist in upload_folder; simulate the uploaded file object (with filename and save()) or use test client multipart upload, cause extraction to fail (e.g., provide truncated/corrupted bytes) and assert no leftover file_path remains and that the code path that catches FileNotFoundError still passes.
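A sketch of the regression test this comment asks for. It assumes a FileProcessor-like API (`process_file` taking an object with `filename` and `save()`); the stand-in processor below mirrors the PR's cleanup logic so the example is self-contained, but a real test would import the actual class from `backend/Generator/main.py` instead.

```python
import os
import tempfile
import uuid


class _Processor:
    """Stand-in mirroring the PR's cleanup logic (assumption: the real
    FileProcessor behaves like this)."""

    def __init__(self, upload_folder):
        self.upload_folder = upload_folder

    def extract_text_from_pdf(self, path):
        raise ValueError("corrupted PDF")  # simulate extraction failure

    def process_file(self, file):
        _, ext = os.path.splitext(file.filename)
        file_path = os.path.join(self.upload_folder, f"{uuid.uuid4().hex}{ext}")
        file.save(file_path)
        try:
            if file.filename.endswith('.pdf'):
                return self.extract_text_from_pdf(file_path)
            return ""
        finally:
            try:
                os.remove(file_path)
            except FileNotFoundError:
                pass


class _Upload:
    """Minimal fake of a Flask/Werkzeug upload object."""

    def __init__(self, filename, data=b"truncated bytes"):
        self.filename = filename
        self._data = data

    def save(self, path):
        with open(path, "wb") as f:
            f.write(self._data)


def test_corrupted_upload_leaves_no_temp_file():
    folder = tempfile.mkdtemp()
    proc = _Processor(folder)
    try:
        proc.process_file(_Upload("broken.pdf"))
    except ValueError:
        pass  # extraction failure is expected
    assert os.listdir(folder) == []  # no leftover temp file
```

Under pytest this runs as-is; the assertion fails on any revision that skips cleanup when extraction raises.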
390-393: Prevent cleanup failures from masking the original processing error.

At line 391, non-`FileNotFoundError` `OSError`s raised in `finally` can replace the root extraction failure. Consider best-effort cleanup for the broader `OSError` with warning-level logging.

♻️ Proposed adjustment
```diff
         finally:
             try:
                 os.remove(file_path)
             except FileNotFoundError:
                 pass
+            except OSError as cleanup_error:
+                # best-effort cleanup; avoid masking extraction exceptions
+                print(f"Warning: failed to remove temp file {file_path}: {cleanup_error}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 390 - 393, The current cleanup uses try: os.remove(file_path) except FileNotFoundError: pass which can let other OSError variants raised during cleanup overwrite the original processing exception; change the cleanup in the finally block to catch OSError (not just FileNotFoundError) and log a warning instead of suppressing or re-raising it so the original error remains visible — update the os.remove(file_path) handler to: except OSError as e: logger.warning("Failed to remove %s during cleanup: %s", file_path, e, exc_info=True) (or use the module's existing logger) so cleanup is best-effort and won't mask the root error.
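Why both comments push for best-effort cleanup: in Python, an exception raised inside a `finally` block replaces the exception already in flight, and the original survives only as implicit chaining context. A minimal demonstration:

```python
def failing_work_with_bad_cleanup():
    try:
        raise ValueError("extraction failed")  # the real error
    finally:
        raise OSError("cleanup failed")        # replaces it in flight


try:
    failing_work_with_bad_cleanup()
except OSError as exc:
    # The caller sees only the cleanup error; the extraction error survives
    # solely as the implicit context of the new exception.
    assert isinstance(exc.__context__, ValueError)
```

Catching and logging `OSError` inside the `finally` block keeps the `ValueError` as the exception the caller actually sees.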
Addressed Issues:
Fixes #589
Screenshots/Recordings:
Not applicable (backend-only resource cleanup fix).
Additional Notes:
This PR fixes a temporary-file cleanup issue in upload processing: if extraction failed for an uploaded file, the temp file could remain on disk. It adds guaranteed cleanup so temporary uploaded files are removed even when extraction raises an exception.
Scope is intentionally minimal: single-issue, single-file reliability/security fix.
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Check one of the checkboxes below:
I have used the following AI models and tools: GPT-5.3-Codex via GitHub Copilot, local terminal validation.