docs: LHCb Workflows and Commands #125
Conversation
aldbr left a comment
Thanks! The approach sounds good. A few comments though:
- I think it would be interesting to have, for each type of job:
  - left: the current XML workflow
  - right: the CWL + pre/post-process equivalent
- Could you add a very brief description of each module, please?
- AnalyseXMLSummary cannot work in Processing, I think, because it needs to set some file statuses. So in the new workflow, we would need to have it in a post-process step. We could potentially have a check within LbRunApp, as we said, to fail the workflow before it starts executing further steps.
- BookkeepingReport and WorkflowAccounting would need modifications (they will process multiple app outputs at once; this needs to appear).
force-pushed from c2ac24f to 83bf433
chore(docs): Rewrite lhcb-workflows explanations
aldbr left a comment
A few minor comments, but I think it's almost ready.
Also, we could add a note about the shared instances between the modules. Instead of sharing instances, here we could depend on files.
Example: commands would add JSON requests to a failover directory, which would be read by FailoverRequest (maybe this could be generalized and used directly in the JobWrapper instead of as a Command?). What do you think?
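A minimal sketch of that idea, with hypothetical names (`FAILOVER_DIR`, `record_failover_request`, `collect_failover_requests`) standing in for whatever convention we settle on:

```python
import json
from pathlib import Path

# Hypothetical convention: every command drops its pending requests here.
FAILOVER_DIR = Path("failover")


def record_failover_request(operation: str, lfns: list[str]) -> None:
    """Used by any command that could not complete an operation:
    instead of holding a shared instance, it writes a JSON request file."""
    FAILOVER_DIR.mkdir(exist_ok=True)
    index = len(list(FAILOVER_DIR.glob("*.json")))
    request = {"operation": operation, "lfns": lfns}
    (FAILOVER_DIR / f"{operation}_{index}.json").write_text(json.dumps(request))


def collect_failover_requests() -> list[dict]:
    """FailoverRequest (or the JobWrapper directly) reads every pending
    request file and forwards it to RequestManagement."""
    return [json.loads(p.read_text()) for p in sorted(FAILOVER_DIR.glob("*.json"))]
```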
Also, I wonder whether UserJobFinalization is useful (I have the impression it could be reproduced if we chain the other commands correctly). What do you think?
I guess it would be easier to answer all these questions once you start the implementation, wouldn't it?
docs/design/lhcb-workflows.md (outdated)

```
}

state Processing_New {
LbRunApp_New: LbRunApp
```
Here (and for each new workflow), the processing should include dirac-cwl workflow.cwl.
I don't understand this; do you mean it should be stated somewhere that LbRunApp executes dirac-cwl workflow.cwl?
Well, it would be the contrary: dirac-cwl workflow.cwl would execute LbRunApp in a CommandLineTool (or a workflow of multiple CommandLineTools with LbRunApp for MCReconstruction jobs).
Oh yeah, that's right 👌🏻
Wouldn't this be the case for the USER Job Types too? Those would also require executing dirac-cwl workflow.cwl, no?
For user jobs, we would just have dirac-cwl workflow.cwl (users can execute whatever they want inside their workflow; they have total control over it).
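So in terms of the job wrapper, the processing stage for both cases boils down to invoking the same command. A minimal sketch, assuming dirac-cwl is available on the PATH:

```python
import subprocess


def processing_stage(workflow_path: str = "workflow.cwl") -> None:
    # Production jobs: the CWL workflow wraps LbRunApp as a CommandLineTool
    # (or a chain of CommandLineTools for MCReconstruction).
    # User jobs: the workflow content is entirely user-defined.
    subprocess.run(["dirac-cwl", workflow_path], check=True)
```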
```
Processing_New: Processing
PostProcessing_New: PostProcessing

state PreProcessing_New {
```
I think we need (only for simulation) to run getEventsToProduce, whose result we would need to include in the prodconf.json file. What do you think?
I think for both simulation and reconstruction, we might need to resolve the DDDB/CondDB tags.
Unless you see another way to compute them in the CWL workflow in LbProdRun?
getEventsToProduce consists of multiple gConfig.getValue(...) calls that obtain:
Info from Config:
- CPUNormalizationFactor
Info from the CE:
- GridCE
- CEQueue
- maxCPUTime
- CPUTimeLeft
- CPUNormalizationFactor (if not known already)
And then performs some calculations.
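Roughly, the calculation has this shape (a simplified sketch of the logic, not the actual LHCbDIRAC code):

```python
def events_to_produce(cpu_e, cpu_time_left, cpu_normalization_factor, max_number_of_events=None):
    """Estimate how many events fit in the remaining (normalized) CPU time,
    where cpu_e is the CPU work needed to produce one event."""
    events = int(cpu_time_left * cpu_normalization_factor / cpu_e)
    if max_number_of_events:
        events = min(events, max_number_of_events)
    return events


# e.g. 86400 s left at a normalization factor of 10, 1000 CPU-units per event -> 864 events
print(events_to_produce(1000, 86400, 10, max_number_of_events=500))  # capped at 500
```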
getEventsToProduce is a utility function, so it can probably reside inside LbProdRun, as it is only used during the processing step, no other command uses this info (the number of events).
Also, couldn't this be processed before sending the workflow? The maxCPUTime, CPUTimeLeft and CPUNormalizationFactor values of each CE are known beforehand, no?
For the tags, currently they are obtained from the Parameters of the GaudiApplication Module, so they are defined in the Workflow before sending the job and only used inside GaudiApplication.
I think for both cases we should simply modify LbProdRun: as it's fixed information, it shouldn't need to contact DIRAC; everything is obtained from files that dirac-cwl should have access to.
What do you think?
Doesn't getEventsToProduce need CPUe, which comes from outside the node? (CPUe being the CPU work needed to compute 1 event.)
It's part of the function call: def getEventsToProduce(CPUe, CPUTime=None, ... )
And it's obtained from gaudiAppModule.CPUe at RunApplication, so in principle it should be in the workflow.
#### RunApplication.py 159-171
```python
if (
    not gaudiAppModule.stepInputData
    and gaudiAppModule.CPUe
    and gaudiAppModule.maxNumberOfEvents
    and gaudiAppModule.numberOfEvents <= 0
):
    # Here we set maxCPUTime to 24 hours, which seems reasonable
    prodInfo["input"]["n_of_events"] = getEventsToProduce(
        gaudiAppModule.CPUe, maxNumberOfEvents=gaudiAppModule.maxNumberOfEvents, jobMaxCPUTime=86400
    )
else:
    prodInfo["input"]["n_of_events"] = gaudiAppModule.numberOfEvents
```
Yes, you're right, this is indeed injected from the transformation into the XML job description.
But I wonder if it makes sense to try to get these grid-related details from within LbProdRun, which could also be executed locally. What do you think?
Yeah, when it is executed locally it doesn't make much sense.
We could create a pre-processing command that retrieves this info, stores it in some kind of "config file" that is read by LbProdRun, mimicking workflow_commons. If it's executed locally, we could prepare this file however we like.
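A minimal sketch of that pre-processing command, with hypothetical names (grid_context.json, gather_grid_context) and indicative configuration paths; gConfig.getValue is the regular DIRAC API:

```python
import json
from pathlib import Path

from DIRAC import gConfig

# Hypothetical file name, playing the role of workflow_commons for LbProdRun.
CONTEXT_FILE = Path("grid_context.json")


def gather_grid_context() -> None:
    """Pre-processing command: collect the grid-related values once,
    so that LbProdRun never has to contact DIRAC itself."""
    context = {
        "GridCE": gConfig.getValue("/LocalSite/GridCE", ""),
        "CEQueue": gConfig.getValue("/LocalSite/CEQueue", ""),
        "CPUNormalizationFactor": gConfig.getValue("/LocalSite/CPUNormalizationFactor", 0.0),
    }
    CONTEXT_FILE.write_text(json.dumps(context))


def read_grid_context() -> dict:
    """Read by LbProdRun; for local runs the file can be written by hand."""
    return json.loads(CONTEXT_FILE.read_text()) if CONTEXT_FILE.exists() else {}
```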
```
}

state PostProcessing_New {
AnalyseXmlSummary_New: AnalyseXmlSummary
```
It could be interesting to know whether any of these commands could run in parallel. Naively, it seems that UploadLogFile, WorkflowAccounting and BookkeepingReport could potentially run in parallel. Am I wrong?
That's right, I can put a fork in the state diagram for those 3 commands. This also happens during reconstruction, right? For that one, I think we should also include RemoveInputData in the fork.
We will need to take that into account during the implementation, though; right now the commands run one after another.
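A minimal sketch of what the fork could look like at implementation time, assuming each command is exposed as a plain callable (all names here are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor


def upload_log_file(): ...        # placeholder for UploadLogFile
def workflow_accounting(): ...    # placeholder for WorkflowAccounting
def bookkeeping_report(): ...     # placeholder for BookkeepingReport


def post_processing():
    """Run the independent commands in parallel, then continue sequentially."""
    parallel = [upload_log_file, workflow_accounting, bookkeeping_report]
    with ThreadPoolExecutor(max_workers=len(parallel)) as pool:
        futures = [pool.submit(command) for command in parallel]
        for future in futures:
            future.result()  # re-raise any failure before the next steps run
```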
```
Report file usage to a DataFileUsage service.

### UploadOutputData
```
Also, it could be interesting to see if there is any generic piece we could extract from the LHCb modules to be ported to DIRAC (this can wait for the implementation, I think; then we can see with CTAO how to generalize, for instance).
docs/design/lhcb-workflows.md (outdated)

```
| WorkflowAccounting | N/A | N/A | RunNumber ProdID EventType SiteName ProcessingStep CpuTime NormCpuTime InputsStats OutputStats InputEvents OutputEvents EventTime NProcs JobGroup FinalState |
| AnalyseFileAccess | XMLSummary.xml pool_xml_catalog.xml | N/A | N/A |
| UserJobFinalization | UserOutputData | request.json | JobId UserOutputSE SiteName UserOutputPath ReplicateUserOutData UserOutputLFNPrep |
| AnalyzeXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName |
```
Suggested change:

```diff
- | AnalyzeXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName |
+ | AnalyseXmlSummary | XMLSummary.xml | N/A | ProdId ApplicationName |
```
chore: Update processing stage to show dirac-cwl command
See #112
Apparently, GitHub does not support ELK as a mermaid renderer...