feat: metrics reporting for scan and commit#589

Draft
evindj wants to merge 5 commits into apache:main from evindj:metrics_core_components

Conversation

@evindj
Contributor

@evindj evindj commented Mar 11, 2026

Initial commit for addressing #533

@evindj evindj marked this pull request as draft March 11, 2026 20:33
@evindj evindj force-pushed the metrics_core_components branch from 5fa02cb to 0a85180 March 11, 2026 20:48
@wgtmac
Member

wgtmac commented Mar 20, 2026

Let me know when ready for review :)

@evindj
Contributor Author

evindj commented Mar 21, 2026

> Let me know when ready for review :)

For sure! Most likely tomorrow.

@evindj evindj force-pushed the metrics_core_components branch from eaacbe8 to 5241044 March 25, 2026 02:56
@evindj evindj force-pushed the metrics_core_components branch from 5241044 to d26fb96 March 25, 2026 03:15
@evindj evindj marked this pull request as ready for review March 25, 2026 04:08
@wgtmac
Member

wgtmac commented Mar 27, 2026

Thanks for updating this! I still need some time to get familiar with the Java implementation and its API in the REST spec before reviewing it.

Member

@wgtmac wgtmac left a comment

Thanks for working on this! I have some general questions on the design, especially on the capability of customization. IMO we can focus on the API design for now and later integrate with other classes. Please let me know what you think.

struct ICEBERG_EXPORT ScanReport {
/// \brief The fully qualified name of the table that was scanned.
std::string table_name;

Member

Why do some metrics have blank lines in between while others don't? I think we can remove all these blank lines to keep things compact.

int64_t snapshot_id = -1;

/// \brief Filter expression used in the scan, if any.
std::string filter;
Member

Should this be std::shared_ptr<Expression>?

namespace iceberg {

/// \brief Duration type for metrics reporting in milliseconds.
using DurationMs = std::chrono::milliseconds;
Member

This hard-codes the reported duration unit to milliseconds, which I think violates the spec?
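
To illustrate the concern: as far as I can tell, the REST spec's timer result carries a time unit next to the count and the total duration, so the unit is data rather than a compile-time choice. Below is a minimal sketch of one way to keep the unit explicit. The names `TimeUnit`, `TimerResult`, and `MakeTimerResult` are hypothetical and are not taken from this PR.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical names for illustration only; not part of this PR.
enum class TimeUnit { kNanoseconds, kMicroseconds, kMilliseconds, kSeconds };

// A timer result that records its own unit instead of assuming milliseconds.
struct TimerResult {
  TimeUnit time_unit = TimeUnit::kNanoseconds;
  int64_t count = 0;           // number of recorded timings
  int64_t total_duration = 0;  // accumulated time, expressed in time_unit
};

// Converts an accumulated std::chrono duration into the requested unit.
template <typename Rep, typename Period>
TimerResult MakeTimerResult(std::chrono::duration<Rep, Period> total,
                            int64_t count, TimeUnit unit) {
  using namespace std::chrono;
  int64_t value = 0;
  switch (unit) {
    case TimeUnit::kNanoseconds:
      value = duration_cast<nanoseconds>(total).count();
      break;
    case TimeUnit::kMicroseconds:
      value = duration_cast<microseconds>(total).count();
      break;
    case TimeUnit::kMilliseconds:
      value = duration_cast<milliseconds>(total).count();
      break;
    case TimeUnit::kSeconds:
      value = duration_cast<seconds>(total).count();
      break;
  }
  return TimerResult{unit, count, value};
}
```

With this shape the collector can keep measuring in whatever `std::chrono` resolution it likes, and the unit is only fixed when the result is materialized.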

std::string filter;

/// \brief Schema ID.
int32_t schema_id = -1;
Member

We are missing some fields like projectedFieldIds and projectedFieldNames from the Java implementation. I think they are required by the REST spec: https://github.com/apache/iceberg/blob/149cc464f9b7df800cc5718af725983473819504/open-api/rest-catalog-open-api.yaml#L3990-L4023
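
For concreteness, here is a rough sketch of how the two missing fields could sit next to the existing ones. The member names `projected_field_ids` and `projected_field_names` and the helper `HasConsistentProjection` are assumptions for illustration; the one-to-one pairing of ids and names is also an assumption, not something stated in this thread.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for the PR's ScanReport; only the identifying fields.
struct ScanReport {
  std::string table_name;
  int64_t snapshot_id = -1;
  std::string filter;
  int32_t schema_id = -1;
  // Assumed C++ spelling of projectedFieldIds.
  std::vector<int32_t> projected_field_ids;
  // Assumed C++ spelling of projectedFieldNames.
  std::vector<std::string> projected_field_names;
};

// Hypothetical sanity check: assumes ids and names are reported pairwise.
inline bool HasConsistentProjection(const ScanReport& report) {
  return report.projected_field_ids.size() ==
         report.projected_field_names.size();
}
```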

std::string table_name;

/// \brief The snapshot ID created by this commit.
int64_t snapshot_id = -1;
Member

Please use kInvalidSnapshotId defined in iceberg/constant.h.

/// \brief Total number of data manifests.
int64_t total_data_manifests = 0;

/// \brief Number of data manifests that were skipped.
Member

This comment is inconsistent with the field it documents.

int64_t skipped_data_files = 0;

/// \brief Number of data manifests that were skipped.
int64_t skipped_delete_files = 0;
Member

ditto

int32_t schema_id = -1;

/// \brief Total duration of the entire scan operation.
DurationMs total_duration{0};
Member

Should we remove this, since it is not defined by the Java implementation?

///
/// This variant type allows handling both report types uniformly through
/// the MetricsReporter interface.
using MetricsReport = std::variant<ScanReport, CommitReport>;
Member

If we define MetricsReport as a std::variant, we cannot support customized metrics reports. For example, engines may have more metrics to report than those defined by the Java implementation. Even the REST spec does not explicitly define which keys are required.

Instead, should we define the MetricsReport like below?

struct MetricsReport {
  std::string kind;  // can be "scan" or "commit", or whatever customized
  std::unordered_map<std::string, CounterResult> counter_results;
  std::unordered_map<std::string, TimerResult> timer_results;
};

What do you think?
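
To make the trade-off easier to discuss, here is a minimal, self-contained sketch of the map-based shape. The fields of `CounterResult` and `TimerResult`, the helper `MakeExampleScanReport`, and the key `engine.cache-hits` are all assumptions made up for illustration; the other keys are only examples.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical result types; the field layout is an assumption.
struct CounterResult {
  std::string unit;  // e.g. "count" or "bytes"
  int64_t value = 0;
};

struct TimerResult {
  std::string time_unit;  // e.g. "nanoseconds"
  int64_t count = 0;
  int64_t total_duration = 0;
};

struct MetricsReport {
  std::string kind;  // "scan", "commit", or any custom kind
  std::unordered_map<std::string, CounterResult> counter_results;
  std::unordered_map<std::string, TimerResult> timer_results;
};

// Builds a report that mixes well-known keys with an engine-specific one.
MetricsReport MakeExampleScanReport() {
  MetricsReport report;
  report.kind = "scan";
  report.counter_results["result-data-files"] = CounterResult{"count", 12};
  // Hypothetical custom key that no fixed struct would have a field for.
  report.counter_results["engine.cache-hits"] = CounterResult{"count", 7};
  report.timer_results["total-planning-duration"] =
      TimerResult{"nanoseconds", 1, 2500000};
  return report;
}
```

The flexibility comes at the cost of type safety: a misspelled key compiles fine and only shows up at the sink.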

Member

After thinking more about this, I'm fine with your current approach of defining ScanReport and CommitReport with explicit fields. MetricsReports are collected by this library, so users do not have the flexibility to customize them. We only need to make MetricsReporter customizable.
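
As a sketch of what "customize only the reporter" could look like with the variant kept as is: the report structs below are trimmed stand-ins for the PR's types, and the `Report` method signature and the `SummaryReporter` class are assumptions for illustration, not the PR's actual interface.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Trimmed stand-ins for the PR's report structs.
struct ScanReport {
  std::string table_name;
  int64_t snapshot_id = -1;
};
struct CommitReport {
  std::string table_name;
  int64_t snapshot_id = -1;
};
using MetricsReport = std::variant<ScanReport, CommitReport>;

// Assumed reporter interface; the real signature may differ.
class MetricsReporter {
 public:
  virtual ~MetricsReporter() = default;
  virtual void Report(const MetricsReport& report) = 0;
};

// Hypothetical user-defined reporter that formats one line per report.
class SummaryReporter : public MetricsReporter {
 public:
  void Report(const MetricsReport& report) override {
    last_summary_ = std::visit(
        [](const auto& r) -> std::string {
          using T = std::decay_t<decltype(r)>;
          const char* kind =
              std::is_same_v<T, ScanReport> ? "scan" : "commit";
          return std::string(kind) + " " + r.table_name + "@" +
                 std::to_string(r.snapshot_id);
        },
        report);
  }

  const std::string& last_summary() const { return last_summary_; }

 private:
  std::string last_summary_;
};
```

The set of report types stays closed, so `std::visit` forces every reporter to handle each alternative, while the sink behavior remains fully user-defined.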

Contributor Author

@evindj evindj Mar 31, 2026

I like the bag-of-keys approach because it allows customization without requiring recompilation. Let me think a bit more about the whole thing; I might reach out on Slack for a quick sync.

///
/// \param reporter_type Case-insensitive type identifier (e.g., "noop").
/// \param factory Factory function that produces the reporter.
static void Register(std::string_view reporter_type, MetricsReporterFactory factory);
Member

How do we support parity with Java's CompositeMetricsReporter? It would be useful when we want to report metrics to multiple sinks.
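
One possible shape for this, sketched under assumptions: the `MetricsReporter` interface and its `Report` signature below are placeholders rather than the PR's actual API, `MetricsReport` is reduced to a single field, and `CountingReporter` exists only to exercise the fan-out. Error isolation between sinks is deliberately left out.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Placeholder report type so the sketch is self-contained.
struct MetricsReport {
  std::string kind;
};

// Assumed reporter interface; the real signature may differ.
class MetricsReporter {
 public:
  virtual ~MetricsReporter() = default;
  virtual void Report(const MetricsReport& report) = 0;
};

// Fans each report out to every wrapped reporter, in the order given.
class CompositeMetricsReporter : public MetricsReporter {
 public:
  explicit CompositeMetricsReporter(
      std::vector<std::shared_ptr<MetricsReporter>> reporters)
      : reporters_(std::move(reporters)) {}

  void Report(const MetricsReport& report) override {
    for (const auto& reporter : reporters_) {
      if (reporter) reporter->Report(report);  // skip null entries
    }
  }

 private:
  std::vector<std::shared_ptr<MetricsReporter>> reporters_;
};

// Hypothetical sink that only counts how many reports it received.
class CountingReporter : public MetricsReporter {
 public:
  void Report(const MetricsReport&) override { ++count_; }
  int count() const { return count_; }

 private:
  int count_ = 0;
};
```

Because the composite is itself a `MetricsReporter`, it could be produced by a registered factory like any other reporter type, which would keep the `Register` API unchanged.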

@evindj
Contributor Author

evindj commented Mar 31, 2026

> Thanks for working on this! I have some general questions on the design, especially on the capability of customization. IMO we can focus on the API design for now and later integrate with other classes. Please let me know what you think.

That's fair; I just wanted to show the end-to-end picture for ease of understanding.

@evindj evindj marked this pull request as draft March 31, 2026 13:59
2 participants