
Improve error handling in crawler and add comprehensive test suite #54

Merged
YusukeHirao merged 4 commits into main from claude/update-dealer-dependency-RsQAB
Mar 11, 2026

@YusukeHirao
Member

Summary

This PR improves error handling in the crawler so that AggregateError rejections from @d-zero/dealer's deal() are emitted as individual error events, adds worker-level error handling so that a failure on one URL does not stop the crawl, and adds a test suite covering these error scenarios.

Key Changes

  • Refactored deal-level error handling: Extracted the error emission logic into a new private #emitDealErrors() method, shared by start() and startMultiple(). An AggregateError is expanded so that each inner error is emitted as a separate error event; any other error is emitted as a single event.

  • Added worker-level error handling: Wrapped the processing logic of the deal() worker callback in a try/catch block, so that an error while processing one URL is recorded and emitted as an error event instead of propagating as an unhandled rejection, and the crawler continues with the remaining URLs.

  • Comprehensive test suite: Added a new crawler.spec.ts (238 lines, 6 tests) covering:

    • AggregateError handling with multiple errors being emitted as individual events
    • Non-Error values within AggregateError being converted to Error instances
    • Regular Error handling as a single event
    • crawlEnd event emission after deal failures
    • Worker-level exception handling and continuation
    • Both start() and startMultiple() methods
  • Updated documentation: Corrected CLAUDE.md to state that the core package uses a bounded Promise pool rather than deal() for parallel processing, and clarified in ARCHITECTURE.md that deal() from @d-zero/dealer is used only by the crawler, while cli, core, and report use its Lanes for progress display.
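A minimal sketch of the #emitDealErrors() behavior described above, assuming a simple listener-based emitter. CrawlerSketch, onError(), toError(), and the choice of url: null and isMainProcess: true for deal-level errors are illustrative assumptions, not the crawler's actual API.

```typescript
// Hypothetical event shape, using the fields this PR lists for error events.
interface CrawlerErrorEvent {
  pid: number;
  isMainProcess: boolean;
  url: string | null; // assumption: deal-level errors carry no single URL
  isExternal: boolean;
  error: Error;
}

type ErrorListener = (event: CrawlerErrorEvent) => void;

// Converts any thrown value into an Error instance.
function toError(value: unknown): Error {
  return value instanceof Error ? value : new Error(String(value));
}

// Hypothetical stand-in for the crawler class; only the error path is modeled.
class CrawlerSketch {
  readonly #listeners: ErrorListener[] = [];

  constructor(private readonly pid: number) {}

  onError(listener: ErrorListener): void {
    this.#listeners.push(listener);
  }

  // Private (#emitDealErrors) in the PR; public here so it can be called directly.
  emitDealErrors(error: unknown): void {
    // AggregateError: one event per inner error. Anything else: one event.
    const failures: unknown[] = error instanceof AggregateError ? error.errors : [error];
    for (const failure of failures) {
      this.#emit(toError(failure));
    }
  }

  #emit(error: Error): void {
    const event: CrawlerErrorEvent = {
      pid: this.pid,
      isMainProcess: true, // assumption: deal-level errors surface in the main process
      url: null,
      isExternal: false,
      error,
    };
    for (const listener of this.#listeners) listener(event);
  }
}
```

Normalizing both cases to a list of failures lets the AggregateError path and the single-error path share one emit routine.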

Implementation Details

  • The #emitDealErrors() method checks whether the error is an AggregateError and, if so, iterates through its errors array, converting any non-Error value to an Error; otherwise, it treats the error as a single failure.
  • Worker errors are caught and passed to handleScrapeError() for state management, then emitted as error events with proper context (URL, external flag, etc.).
  • All error events maintain consistent structure with pid, isMainProcess, url, isExternal, and error fields.
  • Dependencies updated: @d-zero/dealer (1.6.3 → 1.7.0) and @d-zero/shared (0.20.0 → 0.20.1) across all affected packages.
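The worker-level catch can be sketched the same way. Dependencies are injected so the example runs on its own; createWorker(), WorkerDeps, ScrapeTarget, and the scrape/emitError callbacks are hypothetical names, and handleScrapeError here is only a stand-in for the crawler's real function, whose signature this PR does not show.

```typescript
// Hypothetical types; the real crawler's URL and event types may differ.
interface ScrapeTarget {
  url: string;
  isExternal: boolean;
}

interface CrawlerErrorEvent {
  pid: number;
  isMainProcess: boolean;
  url: string;
  isExternal: boolean;
  error: Error;
}

// Dependencies are injected so the sketch runs on its own.
interface WorkerDeps {
  pid: number;
  scrape: (target: ScrapeTarget) => Promise<void>;
  // Stand-in for handleScrapeError(): e.g. marks the URL as done in the link list.
  handleScrapeError: (target: ScrapeTarget, error: Error) => void;
  emitError: (event: CrawlerErrorEvent) => void;
}

// Builds a per-URL worker whose promise does not reject because of a scrape failure.
function createWorker(deps: WorkerDeps): (target: ScrapeTarget) => Promise<void> {
  return async (target) => {
    try {
      await deps.scrape(target);
    } catch (caught) {
      const error = caught instanceof Error ? caught : new Error(String(caught));
      // Update crawl state first so the URL is not left looking "in progress".
      deps.handleScrapeError(target, error);
      // Then report it with the URL context; the error does not propagate further.
      deps.emitError({
        pid: deps.pid,
        isMainProcess: true, // assumption
        url: target.url,
        isExternal: target.isExternal,
        error,
      });
    }
  };
}
```

Calling handleScrapeError() before emitting follows the order described above: state is updated first, so the progress display does not keep the failed URL pending.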

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf

claude added 4 commits March 10, 2026 08:24
- Update @d-zero/dealer from 1.6.3 to 1.7.0 in all packages (crawler,
  core, cli, report-google-sheets)
- Add catch block in crawler's deal() worker callback to handle
  per-URL errors gracefully instead of letting them propagate as
  unhandled rejections
- Handle AggregateError from deal() in start() and startMultiple()
  to emit individual error events for each failed worker

Closes #18

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf
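The first commit's deal-level handling, before the expansion logic was extracted into #emitDealErrors(), might look roughly like this. fakeDeal() is a hypothetical stand-in that runs every worker and rejects with an AggregateError if any failed; it does not reproduce @d-zero/dealer 1.7.0's implementation. start() and CrawlerEvent are likewise simplified assumptions; crawlEnd is emitted after a failure to match the behavior the new tests check.

```typescript
// Hypothetical stand-in for deal(); NOT @d-zero/dealer's implementation.
// Runs every worker to completion, then rejects with an AggregateError
// collecting the reasons of the workers that failed.
async function fakeDeal<T>(
  items: readonly T[],
  worker: (item: T) => Promise<void>,
): Promise<void> {
  const results = await Promise.allSettled(items.map((item) => worker(item)));
  const reasons = results
    .filter((result): result is PromiseRejectedResult => result.status === "rejected")
    .map((result) => result.reason);
  if (reasons.length > 0) {
    throw new AggregateError(reasons, `${reasons.length} worker(s) failed`);
  }
}

type CrawlerEvent = { type: "error"; error: Error } | { type: "crawlEnd" };

// Hypothetical start(): deal-level failures become error events,
// and crawlEnd is emitted whether or not deal() rejected.
async function start(
  urls: readonly string[],
  worker: (url: string) => Promise<void>,
  emit: (event: CrawlerEvent) => void,
): Promise<void> {
  try {
    await fakeDeal(urls, worker);
  } catch (error) {
    const failures: unknown[] = error instanceof AggregateError ? error.errors : [error];
    for (const failure of failures) {
      emit({ type: "error", error: failure instanceof Error ? failure : new Error(String(failure)) });
    }
  }
  emit({ type: "crawlEnd" });
}
```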
…or to worker catch, add tests

- Extract duplicate AggregateError expansion logic from start() and startMultiple()
  into a shared #emitDealErrors private method
- Call handleScrapeError in the worker-level catch block so errored URLs are
  properly marked as done in the LinkList (prevents stale progress state)
- Add crawler.spec.ts with 6 tests covering:
  - AggregateError expansion into individual error events
  - Non-Error values in AggregateError converted to Error instances
  - Regular Error emitted as single error event
  - crawlEnd emitted after deal failure
  - startMultiple() AggregateError handling
  - Worker-level exceptions caught and emitted as error events

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf
Align direct dependency versions with the transitive dependency
pulled in by @d-zero/dealer 1.7.0 and @d-zero/beholder 2.0.0,
eliminating duplicate resolutions in yarn.lock.

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf
- ARCHITECTURE.md: Add missing format-crawl-progress.ts to crawler module listing
- ARCHITECTURE.md: Add missing statusText column to resources table summary
- ARCHITECTURE.md: Clarify @d-zero/dealer usage (deal() for crawler only, Lanes for cli/core/report)
- ARCHITECTURE.md: Note that cli and core also depend on @d-zero/dealer for Lanes type
- CLAUDE.md: Correct core package description — uses bounded Promise pool, not deal()
- CLAUDE.md: Update analyze data flow to reflect actual implementation

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf
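CLAUDE.md now describes the core package as using a bounded Promise pool instead of deal(). The PR does not show that code; the sketch below is one common way to build such a pool, with a fixed number of runners pulling from a shared index. runWithConcurrency and its signature are hypothetical, not the core package's API.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Results keep input order. If a worker throws, the returned promise
// rejects (the other runners are not cancelled).
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

Because the runners share one thread, next++ needs no lock, and since each runner holds at most one in-flight call, concurrency never exceeds limit.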
YusukeHirao merged commit 81c9f6f into main Mar 11, 2026
3 checks passed
YusukeHirao deleted the claude/update-dealer-dependency-RsQAB branch March 11, 2026 03:54