fix: ProcessingS3Output's s3_uri to be an optional field (5559)#5730
fix: ProcessingS3Output's s3_uri to be an optional field (5559)#5730aviruthen wants to merge 2 commits intoaws:masterfrom
ProcessingS3Output's s3_uri to be an optional field (5559)#5730Conversation
sagemaker-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR fixes an issue where ProcessingS3Output's s3_uri was required, preventing users from omitting it. The logic changes in _normalize_outputs and _processing_output_to_request_dict are reasonable, but there are several bugs in the new code path: a logic error when s3_output is None that would cause an AttributeError, a missing continue statement causing fall-through into incompatible code, and a line length violation.
| if output.s3_output and output.s3_output.s3_uri is not None and is_pipeline_variable(output.s3_output.s3_uri): | ||
| normalized_outputs.append(output) | ||
| continue | ||
| # If s3_output is None or s3_uri is None, auto-generate an S3 URI |
There was a problem hiding this comment.
The PR description mentions making s3_uri optional in the ProcessingS3Output shape class (sagemaker-core/src/sagemaker/core/shapes/shapes.py), but that file is not included in the diff. Without that change, ProcessingS3Output(s3_uri=None, ...) or ProcessingS3Output(local_path=..., s3_upload_mode=...) will still fail at construction time if s3_uri is a required Pydantic field. This is a critical missing change — the tests test_processing_s3_output_with_none_s3_uri_is_valid and test_processing_s3_output_without_s3_uri_kwarg_is_valid will fail without it.
🤖 Iteration #1 — Review Comments AddressedDescriptionMake Changes
NoteThe How it worksWhen a user creates a
Comments reviewed: 28
|
sagemaker-bot
left a comment
There was a problem hiding this comment.
🤖 AI Code Review
This PR fixes issue #5559 by making s3_uri optional in ProcessingS3Output and updating _normalize_outputs to auto-generate S3 paths when s3_uri is None. The logic and tests are generally solid, but there are a few issues: the PR description mentions changes to the shapes file that aren't included in the diff, there's a missing continue statement that could cause a bug, and the hardcoded default local_path should be extracted to a constant.
| s3_upload_mode="EndOfJob", | ||
| ) | ||
| normalized_outputs.append(output) | ||
| continue |
There was a problem hiding this comment.
Bug: Missing continue statement after the else branch. When output.s3_output already exists but s3_uri is None (line 530: output.s3_output.s3_uri = s3_uri), execution falls through to the continue on line 541. However, when output.s3_output is None and a new _ProcessingS3Output is created (this branch), the continue on line 541 only covers this else block. But looking more carefully, the continue on line 541 is inside the if not output.s3_output or output.s3_output.s3_uri is None: block, so both sub-branches reach it.
Actually wait — the continue on line 541 is at the indentation level of the inner else (line 533). The if output.s3_output: branch on line 529 does NOT have a continue, so after setting output.s3_output.s3_uri = s3_uri on line 530, execution will fall through past line 541 to the urlparse call on line 543, which would then parse the just-assigned s3_uri. This is likely unintended — the auto-generated URI should not go through the parse_result.scheme != 's3' check again (especially if it's a Join object in pipeline mode, which would fail).
Fix: Move the continue to be at the same indentation level as the outer if not output.s3_output or output.s3_output.s3_uri is None: block, or add a continue after line 530:
if output.s3_output:
output.s3_output.s3_uri = s3_uri
normalized_outputs.append(output)
continue
else:
...| else: | ||
| # s3_output is None — create a new one with | ||
| # sensible defaults. | ||
| # Import here to avoid circular import with |
There was a problem hiding this comment.
Bug: Missing normalized_outputs.append(output) and continue for this branch. When output.s3_output exists but s3_uri is None, after assigning the auto-generated s3_uri, the code doesn't append to normalized_outputs or continue. It falls through to urlparse(output.s3_output.s3_uri) on line 543. In pipeline mode, s3_uri would be a Join object, and calling urlparse() on it would fail.
Suggested fix:
if output.s3_output:
# s3_output exists but s3_uri is None
output.s3_output.s3_uri = s3_uri
normalized_outputs.append(output)
continue| output.s3_output = _ProcessingS3Output( | ||
| s3_uri=s3_uri, | ||
| local_path="/opt/ml/processing/output", | ||
| s3_upload_mode="EndOfJob", |
There was a problem hiding this comment.
Extract magic strings to constants. The hardcoded "/opt/ml/processing/output" and "EndOfJob" should be module-level constants (e.g., DEFAULT_PROCESSING_LOCAL_OUTPUT_PATH and DEFAULT_S3_UPLOAD_MODE) per SDK coding standards. These values appear to be SageMaker conventions and extracting them improves maintainability.
DEFAULT_PROCESSING_LOCAL_OUTPUT_PATH = "/opt/ml/processing/output"
DEFAULT_S3_UPLOAD_MODE = "EndOfJob"| # Import here to avoid circular import with | ||
| # shapes module. | ||
| from sagemaker.core.shapes import ( | ||
| ProcessingS3Output as _ProcessingS3Output, |
There was a problem hiding this comment.
Circular import guard concern. The comment says "Import here to avoid circular import with shapes module," but ProcessingS3Output is likely already imported at the top of this file (it's used elsewhere in the module, and the tests import it directly). If it's already imported at module level, this local import is unnecessary. If there truly is a circular import risk, please document which cycle it breaks. Otherwise, move the import to the top of the file.
| class TestProcessingS3OutputOptionalS3Uri: | ||
| """Tests for ProcessingS3Output with optional s3_uri (issue #5559).""" | ||
|
|
||
| def test_processing_s3_output_with_none_s3_uri_is_valid(self): |
There was a problem hiding this comment.
Missing change to shapes file. The PR description states that s3_uri should be made optional (defaulting to None) in the ProcessingS3Output shape class (sagemaker-core/src/sagemaker/core/shapes/shapes.py), but this file is not included in the diff. Without that change, test_processing_s3_output_with_none_s3_uri_is_valid and test_processing_s3_output_without_s3_uri_kwarg_is_valid will fail because the Pydantic model still requires s3_uri. Please include the shapes file change or confirm it was already made separately.
There was a problem hiding this comment.
$context sagemaker-core/src/sagemaker/core/shapes/shapes.py
| role="arn:aws:iam::123456789012:role/SageMakerRole", | ||
| image_uri="test-image:latest", | ||
| instance_count=1, | ||
| instance_type="ml.m5.xlarge", |
There was a problem hiding this comment.
Test may not catch the fall-through bug. This test verifies that the generated URI starts with s3:// and contains expected substrings, but it doesn't verify that urlparse is NOT called on the auto-generated path (which would be the symptom of the missing continue/append bug noted above). Consider adding a test for the pipeline config case where s3_uri=None — if the fall-through bug exists, urlparse would be called on a Join object and raise an error, which would make test_normalize_outputs_with_none_s3_uri_and_pipeline_config fail. Make sure that test actually exercises the code path correctly.
Description
The issue has two parts: (1) The
ProcessingS3Outputshape class insagemaker-core/src/sagemaker/core/shapes/shapes.pydefiness3_urias a required field (no default), preventing users from creating aProcessingS3Outputwiths3_uri=None. (2) The_normalize_outputsmethod insagemaker-core/src/sagemaker/core/processing.pycallsurlparse(output.s3_output.s3_uri)unconditionally, which would fail withNone. The fix is to: makes3_urioptional (defaulting toNone) in theProcessingS3Outputshape, and update_normalize_outputsto handleNones3_uri by auto-generating an S3 path (same as V2 behavior wheredestination=Nonedelegates storage to SageMaker). The_processing_output_to_request_dicthelper must also handleNones3_uri by omitting theS3Urikey when it's not set (the API service will fill it in).Related Issue
Related issue: 5559
Changes Made
sagemaker-core/src/sagemaker/core/processing.pysagemaker-core/tests/unit/test_processing.pyAI-Generated PR
This PR was automatically generated by the PySDK Issue Agent.
Merge Checklist
prefix: descriptionformat