
docs: LHCb Workflows and Commands #125

Open

AcquaDiGiorgio wants to merge 9 commits into DIRACGrid:main from AcquaDiGiorgio:issue-112-doc-lhcb-workflows

Conversation

@AcquaDiGiorgio (Author):

See #112

Apparently, GitHub does not support ELK as a Mermaid renderer...

@AcquaDiGiorgio self-assigned this on Mar 9, 2026
@AcquaDiGiorgio requested a review from aldbr on March 9, 2026 13:49
@aldbr linked an issue on Mar 9, 2026 that may be closed by this pull request
@aldbr (Contributor) left a comment:

Thanks! The approach sounds good. A few comments though:

  • I think it would be interesting to have, for each type of job:

    • left: the current XML workflow
    • right: the CWL + pre/post-process equivalent
  • Could you add a very brief description of each module, please?

  • AnalyseXMLSummary cannot work in Processing, I think, because it needs to set some file statuses. So in the new workflow, we would need to run it in a post-process step. We could potentially have a check within LbRunApp, as we said, to fail the workflow before it executes further steps.

  • BookkeepingReport and WorkflowAccounting would need modifications (they will process multiple app outputs at once; this needs to appear).

@AcquaDiGiorgio force-pushed the issue-112-doc-lhcb-workflows branch from c2ac24f to 83bf433 on March 12, 2026 10:10
@AcquaDiGiorgio requested a review from aldbr on March 12, 2026 11:42
@aldbr (Contributor) left a comment:

A few minor comments, but I think it's almost ready.

Also, we could add a note about the instances shared between the modules. Instead of sharing instances, here we could depend on files.
Example: commands would add JSON requests to a failover directory that would be read by FailoverRequest (maybe this could be generalized and used directly in the JobWrapper instead of as a Command?). What do you think?
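The file-based failover idea could be sketched roughly like this. Everything here is hypothetical: the directory name, the request keys, and both helper functions are illustrations of the pattern, not existing LHCbDIRAC code.

```python
import json
from pathlib import Path

# Hypothetical directory where commands drop their pending requests.
FAILOVER_DIR = Path("failover_requests")


def add_failover_request(operation: str, lfn: str, target_se: str) -> Path:
    """What a command would do: serialise a request as a JSON file
    so a later FailoverRequest step (or the JobWrapper) can replay it."""
    FAILOVER_DIR.mkdir(exist_ok=True)
    request = {"Operation": operation, "LFN": lfn, "TargetSE": target_se}
    path = FAILOVER_DIR / f"{operation}_{lfn.replace('/', '_')}.json"
    path.write_text(json.dumps(request))
    return path


def collect_failover_requests() -> list:
    """What FailoverRequest would do: read every pending request file."""
    return [json.loads(p.read_text()) for p in sorted(FAILOVER_DIR.glob("*.json"))]
```

The point of the sketch is the decoupling: the producing command and FailoverRequest only share a directory convention, not a Python instance.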

Also, I wonder whether UserJobFinalization is useful (I have the impression it could be reproduced if we chain the other commands correctly). What do you think?

I guess it would be easier to answer all these questions once you start the implementation, wouldn't it?

}

state Processing_New {
LbRunApp_New: LbRunApp
@aldbr (Contributor):

Here (and for each new workflow), the processing should include dirac-cwl workflow.cwl.

@AcquaDiGiorgio (Author):

I don't understand this; you mean it should be implied somewhere that LbRunApp executes dirac-cwl workflow.cwl?

@aldbr (Contributor):

Well, it would be the contrary: dirac-cwl workflow.cwl would execute LbRunApp in a CommandLineTool (or a workflow of multiple CommandLineTools with LbRunApp for MCReconstruction jobs).
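For illustration, a minimal CWL CommandLineTool wrapping LbRunApp might look roughly like this. All field values, the input/output names, and the glob pattern are assumptions for the sketch, not actual LHCb definitions:

```yaml
# Hypothetical sketch: a CommandLineTool that dirac-cwl workflow.cwl could run.
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [LbRunApp]
inputs:
  prod_conf:            # e.g. the prodconf.json discussed below (assumed name)
    type: File
    inputBinding:
      position: 1
outputs:
  app_output:
    type: File
    outputBinding:
      glob: "*.dst"     # assumed output pattern
```

For MCReconstruction jobs, several such tools would be chained in a `class: Workflow` document instead.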

@AcquaDiGiorgio (Author):

Oh yeah, that's right 👌🏻

Wouldn't this be the case for the USER job types too? Those would also require executing dirac-cwl workflow.cwl, no?

@aldbr (Contributor):

For user jobs, we would just have dirac-cwl workflow.cwl (they can execute whatever they want inside their workflow; they have total control over it).

Processing_New: Processing
PostProcessing_New: PostProcessing

state PreProcessing_New {
@aldbr (Contributor):

I think we need (only for simulation) to run getEventsToProduce, whose result we would need to include in the prodconf.json file. What do you think?

I think for both simulation and reconstruction, we might need to resolve the DDDB/CondDB tags.

Unless you see another way to compute them in the CWL workflow in LbProdRun?

@AcquaDiGiorgio (Author):

getEventsToProduce consists of multiple gConfig.getValue(...) calls that obtain:

Info from Config:

  • CPUNormalizationFactor

Info from the CE:

  • GridCE
  • CEQueue
  • maxCPUTime
  • CPUTimeLeft
  • CPUNormalizationFactor (if not known already)

And then performs some calculations.
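As a rough illustration of the kind of calculation involved, the sketch below fits a number of events into the job's CPU budget. This is a simplified, hypothetical version, not the actual LHCbDIRAC implementation (which has a different signature and extra safety margins):

```python
def events_to_produce(cpu_e, job_max_cpu_time, cpu_normalization_factor=1.0,
                      max_number_of_events=None):
    """Sketch: how many events fit in the job's CPU budget.

    cpu_e is the normalised CPU work needed to compute one event;
    job_max_cpu_time is the wall budget in seconds (e.g. 86400 for 24 h).
    """
    # Total normalised CPU work available, divided by the cost per event.
    events = int(job_max_cpu_time * cpu_normalization_factor / cpu_e)
    # Optionally cap at the production's maximum, as in the snippet below.
    if max_number_of_events is not None:
        events = min(events, max_number_of_events)
    return events
```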

getEventsToProduce is a utility function, so it can probably reside inside LbProdRun, as it is only used during the processing step; no other command uses this info (the number of events).
Also, couldn't this be computed before sending the workflow? The maxCPUTime, CPUTimeLeft and CPUNormalizationFactor values of each CE are known beforehand, no?

For the tags: currently they are obtained from the Parameters of the GaudiApplication module, so they are defined in the workflow before sending the job and are only used inside GaudiApplication.

I think for both cases we should simply modify LbProdRun. As it's fixed information, it shouldn't need to contact DIRAC: everything can be obtained from files that dirac-cwl should have access to.

What do you think?

@aldbr (Contributor):

Doesn't getEventsToProduce need CPUe, which comes from outside the node? (CPUe being the CPU work needed to compute one event.)

@AcquaDiGiorgio (Author):

It's part of the function call, `def getEventsToProduce(CPUe, CPUTime=None, ...)`, and it's obtained from gaudiAppModule.CPUe in RunApplication, so in principle it should be in the workflow.

#### RunApplication.py, lines 159-171

```python
if (
    not gaudiAppModule.stepInputData
    and gaudiAppModule.CPUe
    and gaudiAppModule.maxNumberOfEvents
    and gaudiAppModule.numberOfEvents <= 0
):
    # Here we set maxCPUTime to 24 hours, which seems reasonable
    prodInfo["input"]["n_of_events"] = getEventsToProduce(
        gaudiAppModule.CPUe, maxNumberOfEvents=gaudiAppModule.maxNumberOfEvents, jobMaxCPUTime=86400
    )
else:
    prodInfo["input"]["n_of_events"] = gaudiAppModule.numberOfEvents
```

@aldbr (Contributor):

Yes, you're right; this is indeed injected from the transformation into the XML job description.

@aldbr (Contributor):

But I wonder whether it makes sense to try to get these grid-related details from LbProdRun, which could also be executed locally. What do you think?

@AcquaDiGiorgio (Author):

Yeah, when it is executed locally it doesn't make much sense.
We could create a pre-processing command that retrieves this info and stores it in some kind of "config file" that is read by LbProdRun, mimicking workflow_commons. If it's executed locally, we could prepare this file however we like.

}

state PostProcessing_New {
AnalyseXmlSummary_New: AnalyseXmlSummary
@aldbr (Contributor):

It could be interesting to know whether any of these commands could run in parallel. Naively, it seems that UploadLogFile, WorkflowAccounting and BookkeepingReport could potentially run in parallel. Am I wrong?

@AcquaDiGiorgio (Author):

That's right, I can put a fork in the state diagram for those three commands. This also happens during reconstruction, right? For that one, I think we should also include RemoveInputData in the fork.

We will need to take that into account during the implementation, though; for now, the commands run one after another.
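The fork described above might be sketched in Mermaid like this (the step before the fork and the join target are assumptions; RemoveInputData would be added to the fork for reconstruction jobs):

```mermaid
stateDiagram-v2
    state fork_post <<fork>>
    state join_post <<join>>
    AnalyseXmlSummary --> fork_post
    fork_post --> UploadLogFile
    fork_post --> WorkflowAccounting
    fork_post --> BookkeepingReport
    UploadLogFile --> join_post
    WorkflowAccounting --> join_post
    BookkeepingReport --> join_post
    join_post --> UploadOutputData
```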

@aldbr (Contributor):

Yes 🙂


Report file usage to a DataFileUsage service.

### UploadOutputData
@aldbr (Contributor):

Also, it could be interesting to see if there is any generic piece we could extract from the LHCb modules to be ported to DIRAC (this can wait for the implementation, I think; then we can see with CTAO how to generalize, for instance).

| WorkflowAccounting | N/A | N/A | RunNumber ProdID EventType SiteName ProcessingStep CpuTime NormCpuTime InputsStats OutputStats InputEvents OutputEvents EventTime NProcs JobGroup FinalState |
| AnalyseFileAccess | XMLSummary.xml pool_xml_catalog.xml | N/A | N/A |
| UserJobFinalization | UserOutputData | request.json | JobId UserOutputSE SiteName UserOutputPath ReplicateUserOutData UserOutputLFNPrep |
| AnalyzeXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName |
@aldbr (Contributor):

Suggested change:

```diff
- | AnalyzeXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName |
+ | AnalyseXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName |
```

Development

Successfully merging this pull request may close these issues.

[Feature]: Documentation about the new LHCb workflows
