Summary
quantmsdiann currently runs DIA-NN in Generic scoring mode (the default). DIA-NN offers three scoring modes that significantly affect FDR estimation and variant peptide confidence, but none are exposed as pipeline parameters.
Background: DIA-NN Scoring Modes
| Mode | Flag | Decoy strategy | Best for |
| --- | --- | --- | --- |
| Generic | (default) | Shuffles most of the peptide sequence | Standard proteomics: maximizes protein IDs |
| Proteoforms | --proteoforms | Mutates a single residue per decoy | Amino acid substitutions, distinguishing paralogues, proteogenomics |
| Peptidoforms | --peptidoforms | Generic main q-values + extra peptidoform q-values | PTM analysis when also wanting max protein IDs |
Per DIA-NN documentation (PTMs and peptidoforms):
"If the purpose of the experiment is to identify/quantify specific PTMs, amino acid substitutions or distinguish proteins with high sequence identity, then the Peptidoforms (or Proteoforms) scoring option is recommended."
"It is only the Proteoforms mode that can be used to reliably distinguish paralogue/orthologue proteins originating due to amino acid substitutions, hence the name for this mode."
Why this matters for proteogenomics
When searching with a variant-containing FASTA (e.g., COSMIC mutations), the database contains both canonical and variant protein sequences. For missense mutations, the canonical and variant peptides often differ by a single amino acid and share every fragment ion that does not contain the substituted residue:
Canonical KRAS: LVVVGAGGVGK (WT, protein residues 6-16)
KRAS G12D variant: LVVVGADGVGK (G→D at protein position 12, the 7th residue of this peptide)
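To make the overlap concrete, here is a small sketch in plain Groovy that lists which b- and y-type fragments of the two KRAS peptides have identical sequences. The helper names `fragments` and `sharedFragments` are hypothetical, and comparing fragment sequences is only a stand-in for comparing fragment masses. In the tryptic peptide, protein residue 12 is the 7th residue, so G12D reads LVVVGADGVGK.

```groovy
// Hypothetical helper: b- and y-type fragment sequences of a peptide,
// keyed by ion name (b1..b(n-1), y1..y(n-1)).
Map<String, String> fragments(String peptide) {
    def frags = [:]
    (1..<peptide.size()).each { k ->
        frags["b${k}".toString()] = peptide.substring(0, k)
        frags["y${k}".toString()] = peptide.substring(peptide.size() - k)
    }
    return frags
}

// Hypothetical helper: names of the fragments whose sequence is identical
// in both peptides (assumes peptides of equal length).
List<String> sharedFragments(String a, String b) {
    def fa = fragments(a)
    def fb = fragments(b)
    return fa.keySet().findAll { fa[it] == fb[it] }.toList()
}

def shared = sharedFragments('LVVVGAGGVGK', 'LVVVGADGVGK')
println "${shared.size()} of 20 b/y fragments are shared: ${shared}"
```

For this pair, b1 to b6 and y1 to y4 are shared, so only the fragments that span the substituted residue can tell the two peptides apart.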
In Generic mode:
- Decoys are fully shuffled → very different from both canonical and variant
- Both the canonical and variant peptide easily beat the decoy
- The q-value does NOT specifically validate whether the single-residue difference (G vs D) is correctly assigned
- FDR may be underestimated for variant-specific peptides (confirmed by Armando et al. 2024)
In Proteoforms mode:
- Decoys are single-residue mutations → directly model the confusion between canonical and variant peptides
- The q-value provides confidence that the exact amino acid sequence is correct
- FDR is properly estimated for single-residue variants
- May slightly reduce total IDs (~5-10% fewer in some datasets)
For frameshift/nonsense variants (completely different sequences from canonical), Generic mode is sufficient. But missense variants represent ~85% of detected COSMIC variants, making Proteoforms mode important for this use case.
Proposed change
Add a new parameter diann_scoring_mode with three allowed values:
// nextflow.config
diann_scoring_mode = 'generic' // default, backward-compatible
// nextflow_schema.json
"diann_scoring_mode": {
"type": "string",
"default": "generic",
"enum": ["generic", "proteoforms", "peptidoforms"],
"description": "DIA-NN scoring mode. 'generic' maximizes IDs (default). 'proteoforms' recommended for proteogenomics/variant detection — validates single-residue differences. 'peptidoforms' provides extra peptidoform q-values for PTM analysis."
}
In the DIA-NN modules, conditionally add the flag:
def scoring_mode = params.diann_scoring_mode == 'proteoforms' ? '--proteoforms' :
    params.diann_scoring_mode == 'peptidoforms' ? '--peptidoforms' : ''
The flag should be added to all DIA-NN steps (in-silico library generation, preliminary analysis, individual analysis, final quantification) and added to each module's blocked flags list.
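A minimal sketch of how the flag mapping and the blocked-flag check could look in plain Groovy. The helper names `scoringFlag` and `findBlockedFlags` are hypothetical and not part of quantmsdiann, and the sketch assumes that user-supplied extra arguments reach the module as a single whitespace-separated string.

```groovy
// Hypothetical helper: map the proposed diann_scoring_mode value to a DIA-NN flag.
// Unknown values fail loudly instead of silently falling back to Generic mode.
String scoringFlag(String mode) {
    switch (mode) {
        case 'generic':
            return ''
        case 'proteoforms':
            return '--proteoforms'
        case 'peptidoforms':
            return '--peptidoforms'
        default:
            throw new IllegalArgumentException("Unknown diann_scoring_mode: '${mode}'")
    }
}

// Hypothetical helper: return the pipeline-managed flags that a user also
// passed through free-form extra arguments (assumed whitespace-separated).
List<String> findBlockedFlags(String extraArgs, List<String> blocked) {
    def tokens = (extraArgs ?: '').tokenize()
    return blocked.findAll { flag -> tokens.contains(flag) }
}

def blocked = ['--proteoforms', '--peptidoforms']
def clashes = findBlockedFlags('--threads 4 --proteoforms', blocked)
if (clashes) {
    println "Managed by diann_scoring_mode, remove from extra args: ${clashes}"
}
```

Failing on an unknown mode duplicates the schema's enum check on purpose, so a module used outside schema validation still cannot silently run in the wrong mode.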
Additional output columns
When --proteoforms or --peptidoforms is enabled, DIA-NN produces additional columns in the report:
- Peptidoform.Q.Value: run-specific peptidoform confidence
- Global.Peptidoform.Q.Value: global peptidoform confidence
- Lib.Peptidoform.Q.Value: library peptidoform confidence
These should be preserved in the output and documented.
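One way to guard against these columns being dropped downstream is a small header check. The sketch below is plain Groovy with a hypothetical helper name, and it assumes the report header is available as a single tab-separated line, which may not hold for every DIA-NN report format.

```groovy
// Hypothetical helper: report which peptidoform q-value columns are absent
// from a tab-separated header line.
List<String> missingPeptidoformColumns(String headerLine) {
    def expected = [
        'Peptidoform.Q.Value',
        'Global.Peptidoform.Q.Value',
        'Lib.Peptidoform.Q.Value',
    ]
    def present = headerLine.split('\t') as List
    return expected.findAll { !(it in present) }
}

// Example header with one of the three columns missing
def header = ['Run', 'Precursor.Id', 'Q.Value', 'Peptidoform.Q.Value', 'Global.Peptidoform.Q.Value'].join('\t')
println "Missing columns: ${missingPeptidoformColumns(header)}"
```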
DIA-NN version compatibility
- --proteoforms: available since DIA-NN 2.0
- --peptidoforms: available since DIA-NN 1.8
- --no-peptidoforms: available since DIA-NN 1.8 (disables automatic activation with --var-mod)
Note: Since --proteoforms requires DIA-NN >= 2.0, the pipeline should validate that the selected DIA-NN version supports the chosen scoring mode.
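The validation could be sketched as follows in plain Groovy. The minimum versions are the ones listed in this issue; the helper names are hypothetical, the `'0'` entry for `generic` is an assumption meaning "no constraint", and version strings are assumed to be purely numeric and dot-separated (e.g. `1.8.1`).

```groovy
// Compare dotted numeric version strings; missing components count as 0.
int compareVersions(String a, String b) {
    def pa = a.tokenize('.').collect { it.toInteger() }
    def pb = b.tokenize('.').collect { it.toInteger() }
    int n = Math.max(pa.size(), pb.size())
    for (int i = 0; i < n; i++) {
        int x = i < pa.size() ? pa[i] : 0
        int y = i < pb.size() ? pb[i] : 0
        if (x != y) {
            return x <=> y
        }
    }
    return 0
}

// Hypothetical helper: does the given DIA-NN version support the scoring mode?
boolean modeSupported(String mode, String diannVersion) {
    // Minimum versions as stated in this issue; 'generic' is assumed unconstrained.
    def minVersion = [generic: '0', proteoforms: '2.0', peptidoforms: '1.8']
    if (!minVersion.containsKey(mode)) {
        throw new IllegalArgumentException("Unknown diann_scoring_mode: '${mode}'")
    }
    return compareVersions(diannVersion, minVersion[mode]) >= 0
}

if (!modeSupported('proteoforms', '1.8.1')) {
    println "diann_scoring_mode 'proteoforms' requires DIA-NN >= 2.0"
}
```

Comparing components numerically rather than as strings avoids ranking a version such as 1.10 below 1.9.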
Files to modify
- nextflow.config: add diann_scoring_mode = 'generic'
- nextflow_schema.json: add parameter definition with enum and description
- modules/local/diann/insilico_library_generation/main.nf: add scoring flag + blocked list
- modules/local/diann/preliminary_analysis/main.nf: add scoring flag + blocked list
- modules/local/diann/individual_analysis/main.nf: add scoring flag + blocked list
- modules/local/diann/final_quantification/main.nf: add scoring flag + blocked list
Related