Improve error handling in crawler and add comprehensive test suite #54
Merged
YusukeHirao merged 4 commits into main on Mar 11, 2026
- Update @d-zero/dealer from 1.6.3 to 1.7.0 in all packages (crawler, core, cli, report-google-sheets)
- Add catch block in crawler's deal() worker callback to handle per-URL errors gracefully instead of letting them propagate as unhandled rejections
- Handle AggregateError from deal() in start() and startMultiple() to emit individual error events for each failed worker

Closes #18

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf
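The worker-level catch described in this commit can be illustrated with a minimal sketch. It does not use the real `deal()` API from @d-zero/dealer; `crawlAll`, `Scrape`, and `WorkerFailure` are hypothetical names chosen for illustration, and plain `Promise.all` stands in for the dealer's scheduling.

```typescript
// Hypothetical stand-in for a per-URL scrape; not the crawler's real API.
type Scrape = (url: string) => Promise<string>;

interface WorkerFailure {
	url: string;
	error: Error;
}

// Process every URL. A failure inside one worker is reported through
// onError instead of being rethrown, so the other URLs still complete.
async function crawlAll(
	urls: readonly string[],
	scrape: Scrape,
	onError: (failure: WorkerFailure) => void,
): Promise<string[]> {
	const done: string[] = [];
	await Promise.all(
		urls.map(async (url) => {
			try {
				done.push(await scrape(url));
			} catch (error) {
				// Normalize non-Error throwables before reporting them.
				const normalized = error instanceof Error ? error : new Error(String(error));
				onError({ url, error: normalized });
			}
		}),
	);
	return done;
}
```

Because each worker swallows and reports its own failure, the outer promise resolves even when some URLs fail, which is the behavior the commit aims for in place of an unhandled rejection.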
…or to worker catch, add tests

- Extract duplicate AggregateError expansion logic from start() and startMultiple() into a shared #emitDealErrors private method
- Call handleScrapeError in the worker-level catch block so errored URLs are properly marked as done in the LinkList (prevents stale progress state)
- Add crawler.spec.ts with 6 tests covering:
  - AggregateError expansion into individual error events
  - Non-Error values in AggregateError converted to Error instances
  - Regular Error emitted as single error event
  - crawlEnd emitted after deal failure
  - startMultiple() AggregateError handling
  - Worker-level exceptions caught and emitted as error events

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf
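The expansion logic this commit moves into a shared method might look roughly like the following. This is a sketch, not the PR's actual `#emitDealErrors` implementation: `expandDealError` is a hypothetical free function, and wrapping a non-Error value that arrives outside an `AggregateError` is an assumption the commit message does not confirm.

```typescript
// Expand a rejection from a batch run into one Error per failure.
function expandDealError(error: unknown): Error[] {
	// Wrap anything that is not already an Error instance.
	const toError = (value: unknown): Error =>
		value instanceof Error ? value : new Error(String(value));

	if (error instanceof AggregateError) {
		// One entry per failed worker; non-Error values are converted.
		return error.errors.map(toError);
	}
	// A regular error yields a single entry.
	return [toError(error)];
}

// Usage: each returned Error would be emitted as its own error event.
const batch = new AggregateError([new Error("timeout"), "bad status"], "deal failed");
for (const inner of expandDealError(batch)) {
	console.log(inner.message); // prints "timeout", then "bad status"
}
```

Keeping the expansion in one place means `start()` and `startMultiple()` cannot drift apart in how they report failures.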
Align direct dependency versions with the transitive dependencies pulled in by @d-zero/dealer 1.7.0 and @d-zero/beholder 2.0.0, eliminating duplicate resolutions in yarn.lock.

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf
- ARCHITECTURE.md: Add missing format-crawl-progress.ts to crawler module listing
- ARCHITECTURE.md: Add missing statusText column to resources table summary
- ARCHITECTURE.md: Clarify @d-zero/dealer usage (deal() for crawler only, Lanes for cli/core/report)
- ARCHITECTURE.md: Note that cli and core also depend on @d-zero/dealer for Lanes type
- CLAUDE.md: Correct core package description (uses bounded Promise pool, not deal())
- CLAUDE.md: Update analyze data flow to reflect actual implementation

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf
Summary
This PR enhances error handling in the crawler to properly handle `AggregateError` exceptions from the dealer, adds a comprehensive test suite for error scenarios, and improves worker-level error handling to ensure graceful degradation.

Key Changes

- Refactored deal-level error handling: Extracted error emission logic into a new `#emitDealErrors()` method that handles both `AggregateError` (emitting each inner error as a separate event) and regular errors (emitting a single event). This method is now used by both `start()` and `startMultiple()`.
- Added worker-level error handling: Wrapped the worker's main processing logic in a try-catch block to catch and emit errors from individual URL processing, allowing the crawler to continue processing remaining URLs even when one fails.
- Comprehensive test suite: Added 238 lines of test coverage (`crawler.spec.ts`) covering:
  - `AggregateError` handling with multiple errors being emitted as individual events
  - Non-Error values in `AggregateError` being converted to Error instances
  - `crawlEnd` event emission after deal failures
  - Both `start()` and `startMultiple()` methods
- Updated documentation: Clarified in CLAUDE.md and ARCHITECTURE.md that the core package uses a bounded Promise pool rather than `deal()` for parallel processing, and noted that `@d-zero/dealer` is used for progress display via `Lanes`.

Implementation Details

- The `#emitDealErrors()` method checks if the error is an `AggregateError` and iterates through its `errors` array; otherwise it treats the error as a single failure.
- Worker-level errors are passed to `handleScrapeError()` for state management, then emitted as error events with proper context (URL, external flag, etc.).
- Error events include the `pid`, `isMainProcess`, `url`, `isExternal`, and `error` fields.
- Bumped `@d-zero/dealer` (1.6.3 → 1.7.0) and `@d-zero/shared` (0.20.0 → 0.20.1) across all affected packages.

https://claude.ai/code/session_01DZApYkRAury35FhWGY72xf
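To make the ordering in the worker-level path concrete (settle the URL's state first, then emit the event), here is a small sketch. The event field names come from the summary above; their types, the `ProgressSketch` class standing in for the crawler's link bookkeeping, and the `handleWorkerFailure` helper are all assumptions made for illustration and are not taken from the PR's code.

```typescript
// Field names follow the PR summary; the types are assumptions.
interface CrawlErrorEvent {
	pid: number | undefined;
	isMainProcess: boolean;
	url: string;
	isExternal: boolean;
	error: Error;
}

// Hypothetical stand-in for the crawler's link bookkeeping.
class ProgressSketch {
	private readonly pending = new Set<string>();

	add(url: string): void {
		this.pending.add(url);
	}

	markDone(url: string): void {
		this.pending.delete(url);
	}

	get pendingCount(): number {
		return this.pending.size;
	}
}

// Handle a failure thrown while processing one URL.
function handleWorkerFailure(
	progress: ProgressSketch,
	url: string,
	isExternal: boolean,
	thrown: unknown,
	emit: (event: CrawlErrorEvent) => void,
): void {
	// Settle the bookkeeping first so the URL no longer counts as in progress.
	progress.markDone(url);
	// Then report the failure together with its context.
	emit({
		pid: undefined, // assumption: a real crawler would supply its process id here
		isMainProcess: true,
		url,
		isExternal,
		error: thrown instanceof Error ? thrown : new Error(String(thrown)),
	});
}
```

Marking the URL as done before emitting keeps progress reporting accurate even if an error listener itself throws.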