HBASE-30049 RestoreSnapshotHelper creates StoreFileTracker with wrong…#8013

Open
bjomobo wants to merge 1 commit into apache:master from bjomobo:HBASE-30049-fix-restore-sft

bjomobo commented Mar 31, 2026

Description

RestoreSnapshotHelper.restoreRegion() creates a StoreFileTracker from the raw Master conf, which lacks table-level settings such as hbase.store.file-tracker.impl=FILE. As a result, DefaultStoreFileTracker is used, whose doSetStoreFiles() is a no-op, so the .filelist is never updated after a restore. Regions then hit FileNotFoundException when they try to open files that were archived.

Regression introduced by HBASE-28564.
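
The configuration mismatch can be illustrated with a small stand-alone model. This is a sketch, not HBase code: `TrackerConfigModel`, `resolveTrackerImpl`, and `mergeTableConfig` are hypothetical names, plain maps stand in for the Master conf and the table descriptor values, and the fallback to `DEFAULT` when the key is unset is an assumption taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class TrackerConfigModel {
    // Key named in the PR description.
    static final String TRACKER_IMPL = "hbase.store.file-tracker.impl";

    // Assumption: an unset key resolves to the default tracker.
    static String resolveTrackerImpl(Map<String, String> conf) {
        return conf.getOrDefault(TRACKER_IMPL, "DEFAULT");
    }

    // Models the merge step: table-level values override the Master conf.
    static Map<String, String> mergeTableConfig(Map<String, String> masterConf,
                                                Map<String, String> tableValues) {
        Map<String, String> merged = new HashMap<>(masterConf);
        merged.putAll(tableValues);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> masterConf = new HashMap<>();  // no tracker setting
        Map<String, String> tableValues = new HashMap<>();
        tableValues.put(TRACKER_IMPL, "FILE");             // table-level override

        // Resolving from the raw Master conf ignores the table-level setting.
        System.out.println("raw master conf: " + resolveTrackerImpl(masterConf));
        // Resolving from the merged conf honors it.
        System.out.println("merged conf:     "
            + resolveTrackerImpl(mergeTableConfig(masterConf, tableValues)));
    }
}
```

In this model the raw conf resolves to the default tracker while the merged conf resolves to the table's setting, which is the difference the fix targets.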

Changes

  • Merge table descriptor config via StoreUtils.createStoreConfiguration() before creating the tracker in restoreRegion()
  • Move tracker creation inside the snapshotFamilyFiles != null check to avoid NullPointerException on families being removed
  • Add withColumnFamilyDescriptor() to the "Add families not present in the table" code path
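
The second change, creating the tracker only inside the `snapshotFamilyFiles != null` check, might look roughly like the following in isolation. This is a simplified model using only the standard library: `RestoreGuardModel` and `familiesNeedingTracker` are hypothetical names, and strings stand in for families and HFiles. It only shows the control flow, not the actual restoreRegion() code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestoreGuardModel {
    // Returns the on-disk families for which a tracker would be created.
    static List<String> familiesNeedingTracker(List<String> familiesOnDisk,
                                               Map<String, List<String>> snapshotFiles) {
        List<String> result = new ArrayList<>();
        for (String family : familiesOnDisk) {
            List<String> snapshotFamilyFiles = snapshotFiles.get(family);
            if (snapshotFamilyFiles != null) {
                // Family exists in the snapshot: a tracker is needed to
                // record the restored file list.
                result.add(family);
            }
            // Otherwise the family is being removed, so no tracker is
            // created for it.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> snapshotFiles = new HashMap<>();
        snapshotFiles.put("cf1", Arrays.asList("hfile-a", "hfile-b"));

        // "cf2" is on disk but absent from the snapshot, so it is skipped.
        System.out.println(
            familiesNeedingTracker(Arrays.asList("cf1", "cf2"), snapshotFiles));
    }
}
```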

New Tests

  • TestRestoreSnapshotProcedureFileBasedSFT — end-to-end restore with FILE tracker
  • TestRestoreSnapshotHelperWithFileBasedSFT — unit-level .filelist verification
  • TestRestoreSnapshotFileTrackerTableLevel — table-level FILE tracker with compaction and multi-family restore

Jira: https://issues.apache.org/jira/browse/HBASE-30049

* 3. Restore from snapshot
* 4. Verify all regions open and data matches the snapshot
*
* Before the fix, step 4 would fail with FileNotFoundException because
Contributor

This test, testRestoreSnapshotWithFileTrackerAfterDataChange, still passes without the fix because the global SFT config itself is FILE. The issue only happens when the global config is DEFAULT and the table config is FILE, as here: https://github.com/apache/hbase/pull/8013/changes#diff-35d1d555129ce54d8d45094be2f76300b7cd9d29c9a109ff95d81c626b428939R77

Author

Thanks for pointing this out. I removed the global TRACKER_IMPL=FILE setting from setupCluster() and now set it at the table descriptor level only.
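
The reviewer's point about which test setups actually exercise the bug can be sketched as a small model. The names here (`SftTestScenarioModel`, `implBeforeFix`, `implAfterFix`, `exercisesBug`) are hypothetical, and the model assumes that before the fix only the global setting is consulted, while after the fix a table-level setting takes precedence when present.

```java
public class SftTestScenarioModel {
    // Assumption: before the fix, restore consults only the global setting.
    static String implBeforeFix(String globalImpl, String tableImpl) {
        return globalImpl;
    }

    // Assumption: after the fix, a table-level setting (if any) wins.
    static String implAfterFix(String globalImpl, String tableImpl) {
        return tableImpl != null ? tableImpl : globalImpl;
    }

    // A setup exercises the bug only if the fix changes the resolved tracker.
    static boolean exercisesBug(String globalImpl, String tableImpl) {
        return !implBeforeFix(globalImpl, tableImpl)
            .equals(implAfterFix(globalImpl, tableImpl));
    }

    public static void main(String[] args) {
        // Original test setup: global FILE, so the fix makes no difference.
        System.out.println("global=FILE, table=FILE: "
            + exercisesBug("FILE", "FILE"));
        // Revised setup: global DEFAULT with a table-level FILE override.
        System.out.println("global=DEFAULT, table=FILE: "
            + exercisesBug("DEFAULT", "FILE"));
    }
}
```

Under these assumptions, only the second setup distinguishes the fixed code from the unfixed code, which is why the tests set the tracker at the table descriptor level.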

* HFiles, not the compaction output.
*/
@Test
public void testRestoreSnapshotAfterCompaction() throws Exception {
Contributor

Same for this one: since the global SFT config itself is FILE, this should work even without the fix. There also looks to be an issue with the test, because it fails at this assert: https://github.com/apache/hbase/pull/8013/changes#diff-66c1760bfcfc64f561b37dd18df6a41595e93d1c516a326de27ad6e9c631ebb2R266
Should it be failing here instead?

… config causing no-op filelist updates

RestoreSnapshotHelper.restoreRegion() creates a StoreFileTracker using
the raw Master Configuration object, which does not contain table-level
settings like hbase.store.file-tracker.impl=FILE. This causes
DefaultStoreFileTracker to be instantiated, whose doSetStoreFiles() is
a complete no-op. The .filelist is never updated after the restore moves
HFiles to the archive and creates link files for the snapshot's HFiles.

When a region subsequently opens, the stale .filelist references HFiles
that were moved to the archive, resulting in FileNotFoundException and
the region getting stuck in OPENING state indefinitely.

This is a regression introduced by HBASE-28564, which refactored
reference file creation to go through the StoreFileTracker interface.
The cloneRegion() method in the same commit correctly merges the table
descriptor config via StoreUtils.createStoreConfiguration() before
creating the tracker, but restoreRegion() was missed.

The fix applies the same pattern: merge the table descriptor and column
family descriptor configuration into the Configuration object before
passing it to StoreFileTrackerFactory.create(). This ensures the
correct StoreFileTracker implementation is resolved based on the
table-level setting.

Both locations in restoreRegion() are fixed:
1. For existing families already on disk
2. For new families added from the snapshot

bjomobo force-pushed the HBASE-30049-fix-restore-sft branch from ecafaf9 to e3e3c55 on April 1, 2026 16:31

2 participants