fix: load_feature_definitions_from_dataframe() doesn't recognize pandas nullable dtyp (5675) by aviruthen · Pull Request #5732 · aws/sagemaker-python-sdk

aviruthen · 2026-04-07T19:41:36Z

Description

The issue is in sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py. The _INTEGER_TYPES and _FLOAT_TYPES sets contain only lowercase numpy dtype names (e.g., 'int64', 'float64'). Pandas nullable dtypes use different names (e.g., 'Int64', 'Float64', 'string'), so they never match, and every nullable-typed column falls through to StringFeatureDefinition. The fix adds the pandas nullable dtype names to _INTEGER_TYPES and _FLOAT_TYPES, and adds 'string' to the string-type handling in _generate_feature_definition. PR #3740 fixed this in V2, but the fix was not carried over to the V3 (sagemaker-mlops) codebase. The _DTYPE_TO_FEATURE_TYPE_MAP dict already maps 'string', but _generate_feature_definition does not use it; the sets are the active code path, so this PR fixes the sets.
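A minimal sketch of the mismatch, assuming the pre-fix sets held only lowercase numpy names (the set contents and the helper `matches_numeric_set` below are illustrative, not copied from feature_utils.py): `str()` of a default numpy-backed dtype matches the sets, while `str()` of a pandas nullable dtype does not.

```python
import pandas as pd

# Illustrative lowercase-only sets; assumed to resemble the pre-fix
# _INTEGER_TYPES / _FLOAT_TYPES, not the SDK's exact contents.
LOWERCASE_INTEGER_TYPES = {"int8", "int16", "int32", "int64"}
LOWERCASE_FLOAT_TYPES = {"float16", "float32", "float64"}


def matches_numeric_set(dtype) -> bool:
    """Return True if the dtype's string name is in one of the lowercase sets."""
    name = str(dtype)
    return name in LOWERCASE_INTEGER_TYPES or name in LOWERCASE_FLOAT_TYPES


# Default numpy-backed dtypes inferred from plain Python ints and floats.
numpy_dtypes = [pd.Series([1, 2, 3]).dtype, pd.Series([1.1, 2.2]).dtype]
# Pandas nullable (extension) dtypes report capitalized names.
nullable_dtypes = [pd.Int64Dtype(), pd.Float64Dtype()]

for dtype in numpy_dtypes + nullable_dtypes:
    print(f"{str(dtype):>8} -> matches lowercase sets: {matches_numeric_set(dtype)}")
```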

Related Issue

Related issue: 5675

Changes Made

sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py
sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py

AI-Generated PR

This PR was automatically generated by the PySDK Issue Agent.

Confidence score: 85%
Classification: bug
SDK version target: V3

Merge Checklist

Changes are backward compatible
Commit message follows prefix: description format
Unit tests added/updated
Integration tests added (if applicable)
Documentation updated (if applicable)


sagemaker-bot

🤖 AI Code Review

This PR fixes a bug where pandas nullable dtypes (Int64, Float64, string) were not recognized by load_feature_definitions_from_dataframe(), causing them to incorrectly fall through to StringFeatureDefinition. The fix adds the capitalized pandas nullable dtype names to the existing type sets. The approach is correct and tests are comprehensive, but there are a few issues: the _STRING_TYPES constant is defined but never used in _generate_feature_definition, and some test lines exceed the 100-character limit.

sagemaker-bot · 2026-04-07T21:50:48Z

sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py

+    # pandas nullable float dtypes
+    "Float32", "Float64",
+}
+_STRING_TYPES = {"object", "string"}


Bug: _STRING_TYPES is defined but never used. You've added the _STRING_TYPES = {"object", "string"} constant, but the diff doesn't show any changes to _generate_feature_definition to actually use it. Without updating the function to check dtype_name in _STRING_TYPES, the "string" dtype will still fall through to the default case. Please update _generate_feature_definition to use _STRING_TYPES for the string type check, e.g.:

elif dtype_name in _STRING_TYPES:
    return FeatureDefinition(feature_name=column, feature_type="String")

Can you confirm that the _generate_feature_definition function is also updated to use _STRING_TYPES? If not, the test_infers_string_type_with_pandas_string_dtype test would fail.
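For reference, a minimal sketch of a name-based dispatch with the explicit string branch the review asks for. The function `infer_feature_type` and the set contents are hypothetical stand-ins, and it returns plain type names rather than the SDK's FeatureDefinition objects. If unmatched dtypes default to String, as the PR description indicates, the explicit branch mainly makes the intent visible; the sketch keeps it anyway.

```python
import pandas as pd

# Hypothetical set contents: lowercase numpy names plus pandas nullable names.
_BITS = (8, 16, 32, 64)
_INTEGER_TYPES = (
    {f"int{b}" for b in _BITS}
    | {f"uint{b}" for b in _BITS}
    | {f"Int{b}" for b in _BITS}
    | {f"UInt{b}" for b in _BITS}
)
_FLOAT_TYPES = {"float16", "float32", "float64", "Float32", "Float64"}
_STRING_TYPES = {"object", "string"}


def infer_feature_type(dtype) -> str:
    """Map a pandas/numpy dtype to a Feature Store type name by its string name."""
    dtype_name = str(dtype)
    if dtype_name in _INTEGER_TYPES:
        return "Integral"
    if dtype_name in _FLOAT_TYPES:
        return "Fractional"
    if dtype_name in _STRING_TYPES:
        return "String"
    # Assumed fallback, per the PR description: anything unmatched becomes String.
    return "String"


df = pd.DataFrame({
    "id": pd.Series([1, 2, None], dtype="Int64"),
    "price": pd.Series([1.1, None, 3.3], dtype="Float64"),
    "name": pd.Series(["a", "b", None], dtype="string"),
})
print({column: infer_feature_type(df[column].dtype) for column in df.columns})
```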

sagemaker-bot · 2026-04-07T21:50:48Z

sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py

+    def test_infers_correct_types_after_convert_dtypes(self):
+        df = pd.DataFrame({
+            "id": [1, 2, 3],
+            "price": [1.1, 2.2, 3.3],


These assertion lines exceed the 100-character line length limit. Consider breaking them across multiple lines for readability:

result = next(d for d in defs if d.feature_name == "nullable_float")
assert result.feature_type == "Fractional"

sagemaker-bot · 2026-04-07T21:50:48Z

sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py

@@ -49,6 +49,78 @@ def test_returns_correct_count(self, sample_dataframe):
        defs = load_feature_definitions_from_dataframe(sample_dataframe)
        assert len(defs) == 3



Consider using pytest.mark.parametrize to consolidate the individual nullable integer dtype tests (Int8, Int16, Int32, Int64, UInt32, UInt64) into a single parameterized test. This reduces duplication and makes it easier to add new dtypes in the future:

@pytest.mark.parametrize(
    "dtype", ["Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"]
)
def test_infers_integral_type_with_pandas_nullable_int(self, dtype):
    df = pd.DataFrame({"id": pd.Series([1, 2, 3], dtype=dtype)})
    defs = load_feature_definitions_from_dataframe(df)
    assert defs[0].feature_type == "Integral"

Same applies to the Float32/Float64 tests.
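Before parametrizing, a quick sanity check can confirm that each dtype string in the proposed lists is one pandas accepts, that `str(dtype)` round-trips to the same capitalized spelling the sets rely on, and that a missing value survives as `<NA>`. A minimal sketch; the list names and the `round_trip` helper are illustrative:

```python
import pandas as pd

NULLABLE_INT_DTYPES = ["Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"]
NULLABLE_FLOAT_DTYPES = ["Float32", "Float64"]


def round_trip(dtype_strings):
    """Build a Series with one missing value per dtype; record its name and NA count."""
    results = {}
    for dtype_string in dtype_strings:
        series = pd.Series([1, 2, None], dtype=dtype_string)
        results[dtype_string] = (str(series.dtype), int(series.isna().sum()))
    return results


all_dtypes = NULLABLE_INT_DTYPES + NULLABLE_FLOAT_DTYPES
for dtype_string, (name, missing) in round_trip(all_dtypes).items():
    print(dtype_string, name, missing)
```

The same two lists can then feed `pytest.mark.parametrize` directly, so the sanity check and the tests share one definition of which dtypes are covered.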

sagemaker-bot · 2026-04-07T21:50:48Z

sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py

@@ -46,8 +46,19 @@
    "float64": "Fractional",
 }



Minor: The _DTYPE_TO_FEATURE_TYPE_MAP dict above should also be updated to include the pandas nullable dtype mappings (e.g., "Int64": "Integral", "Float64": "Fractional", "string": "String") for consistency, even if it's not the active code path. This prevents future confusion if someone tries to use the map instead of the sets.
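One way to keep the map and the sets from drifting apart, sketched here as a hypothetical alternative rather than what this PR does, is to treat the map as the single source of truth and derive the sets from it. The names `DTYPE_TO_FEATURE_TYPE` and `names_for` and the exact entries are assumptions:

```python
from typing import Dict, Set

_BITS = (8, 16, 32, 64)

# Hypothetical single source of truth: numpy names plus pandas nullable names.
DTYPE_TO_FEATURE_TYPE: Dict[str, str] = {
    **{f"{prefix}{b}": "Integral" for prefix in ("int", "uint", "Int", "UInt") for b in _BITS},
    **{name: "Fractional" for name in ("float16", "float32", "float64", "Float32", "Float64")},
    **{name: "String" for name in ("object", "string")},
}


def names_for(feature_type: str) -> Set[str]:
    """Return every dtype name the map assigns to the given feature type."""
    return {name for name, mapped in DTYPE_TO_FEATURE_TYPE.items() if mapped == feature_type}


# Sets derived from the map, so the two can never disagree.
INTEGER_TYPES = names_for("Integral")
FLOAT_TYPES = names_for("Fractional")
STRING_TYPES = names_for("String")
```

Adding a new dtype then means touching one dict entry, and any code path that uses either the map or the sets sees the same answer.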

aviruthen · 2026-04-07T22:54:22Z

🤖 Iteration #1 — Review Comments Addressed

Description

Fix load_feature_definitions_from_dataframe() to correctly recognize pandas nullable dtypes (Int64, Float64, string, etc.).

Problem

When a DataFrame uses pandas nullable dtypes (common after calling pd.DataFrame.convert_dtypes()), all numeric columns were incorrectly mapped to StringFeatureDefinition. This is because _INTEGER_TYPES and _FLOAT_TYPES only contained lowercase numpy dtype names (e.g., int64, float64), while pandas nullable dtypes use capitalized names (e.g., Int64, Float64).
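The trigger is easy to reproduce with pandas alone. A minimal sketch, using a DataFrame shaped like the one in the new test_infers_correct_types_after_convert_dtypes test (the `name` column is an illustrative addition):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "price": [1.1, 2.2, 3.3], "name": ["a", "b", "c"]})
converted = df.convert_dtypes()

# str() of each column's dtype is what a name-based lookup sees.
before = {column: str(df[column].dtype) for column in df.columns}
after = {column: str(converted[column].dtype) for column in converted.columns}
print("before:", before)
print("after: ", after)
```

After `convert_dtypes()` the numeric columns report `Int64` and `Float64`, which a lowercase-only lookup misses; string columns typically become the `string` dtype.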

Changes

sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py:

Added pandas nullable dtype mappings to _DTYPE_TO_FEATURE_TYPE_MAP for consistency
Updated _generate_feature_definition to explicitly check _STRING_TYPES so the "string" dtype is properly handled rather than falling through to the default case
_INTEGER_TYPES, _FLOAT_TYPES, and _STRING_TYPES sets (already added in prior iteration) correctly include pandas nullable dtype names

sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py:

Consolidated individual nullable integer dtype tests into a single pytest.mark.parametrize test covering Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64
Consolidated nullable float dtype tests into a single parametrized test covering Float32, Float64
Fixed assertion lines exceeding 100-character line length limit by breaking them across multiple lines

Note

This fix was previously applied in V2 via PR #3740 but was not carried over to the V3 (sagemaker-mlops) codebase.

Comments reviewed: 4
Files modified: sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py, sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py

sagemaker-mlops/src/sagemaker/mlops/feature_store/feature_utils.py: Add pandas nullable dtype support to _DTYPE_TO_FEATURE_TYPE_MAP and update _generate_feature_definition to use _STRING_TYPES
sagemaker-mlops/tests/unit/sagemaker/mlops/feature_store/test_feature_utils.py: Use pytest.mark.parametrize for nullable dtype tests, fix line length issues

fix: load_feature_definitions_from_dataframe() doesn't recognize pandas nullable dtyp (5675) · 0d68f0a


fix: address review comments (iteration #1) · 1e5ad13


aviruthen wants to merge 2 commits into aws:master from aviruthen:fix/load-feature-definitions-from-dataframe-doesn-t-5675
