GH-49232: [Python] deprecate feather python#49590
GH-49232: [Python] deprecate feather python#49590piyushka-ally wants to merge 3 commits intoapache:mainfrom
Conversation
Deprecate the pyarrow.feather module in favor of pyarrow.ipc, since Feather V2 is the Arrow IPC file format. Add FutureWarning to write_feather, read_feather, read_table, and FeatherDataset. Extract _read_table_internal to avoid double-warnings. Update docs with deprecation notices and migration guide. Co-authored-by: Isaac
…sion note - Add missing filterwarnings decorator to test_write_dataset_schema_metadata - Use tempdir fixture in test_write_feather_deprecated to avoid leaked files - Document that IPC does not compress by default unlike feather.write_feather Co-authored-by: Isaac
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
AlenkaF
left a comment
There was a problem hiding this comment.
Thanks for the PR @piyushka-ally!
I think it looks good and I just have two minor comments
Also, it might be worth syncing with Arrow C++ and other bindings so that we could get the deprecation in the same release. @pitrou is there an open PR for C++?
- Move `.. deprecated::` after `.. currentmodule::` in formats.rst - Merge write_feather, read_table, read_feather deprecation tests into a single test_feather_deprecation_warnings function Co-authored-by: Isaac
The R deprecation PR is already open at #49276. I don't see a C++ PR for #49231 yet. I'll watch for it. Happy to rebase/adjust timing if needed to align with the same release. |
AlenkaF
left a comment
There was a problem hiding this comment.
Thanks for a quick update!
I missed something, added one more comment.
|
|
||
| # --- Deprecation warning tests --- | ||
|
|
||
| @pytest.mark.pandas |
There was a problem hiding this comment.
I am not sure if pandas mark is needed here?
There was a problem hiding this comment.
The @pytest.mark.pandas is there because read_feather calls .to_pandas() internally, so the test would fail in environments without pandas installed. write_feather and read_table don't need it, but read_feather does. Should I split this back into two tests, one without the pandas mark for write_feather/read_table, and a separate pandas-marked one for read_feather? Or would you prefer a different approach?
Rationale for this change
Arrow C++ is deprecating the Feather reader/writer (#49231) so we should update the Python functions too. Feather V2 is exactly the Arrow IPC file format, making the separate
pyarrow.feathermodule redundant. Users should migrate topyarrow.ipc.What changes are included in this PR?
FutureWarningdeprecation warnings to all four public APIs inpyarrow.feather:write_feather(),read_feather(),read_table(), andFeatherDataset_read_table_internal()so thatread_feather()callingread_table()internally does not produce double warningsFeatherDataset.read_table()to use the internal function for the same reason.. deprecated:: 24.0.0directives to all four public API docstringspytestmarkfilter intest_feather.pyto suppress warnings in existing tests@pytest.mark.filterwarningsto 5 test functions intest_dataset.pythat use feather as a utilityfeather.rstwith a deprecation notice and migration guide (including a note thatpyarrow.ipcdoes not compress by default unlikefeather.write_feather)formats.rstto mark the Feather API section as deprecatedAre these changes tested?
Yes. Five new tests verify that each deprecated function emits exactly one
FutureWarning, and existing tests continue to pass with module/function-level warning filters.Are there any user-facing changes?
Yes — calling
pyarrow.feather.write_feather(),read_feather(),read_table(), orFeatherDataset()now emits aFutureWarningdirecting users topyarrow.ipcequivalents. Existing functionality is unchanged.