Add parentage functions by josuechinchilla · Pull Request #57 · Breeding-Insight/BIGr

josuechinchilla · 2026-04-09T17:29:12Z

Added parentage functions, updated thinSNP importFrom statements and updated documentation

…t/BIGr into ped_indels_update

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Madc2vcf updates

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…ht/BIGr into ped_indels_update

Modified Sire and Dam calls to male_parent and female_parent to be species agnostic. Modified the test files to accommodate the changes in functions. Modified the importFrom statement for thinSNP to call specific functions and resolve warnings when installing BIGr. Updated package documentation.

Copilot

Pull request overview

This PR expands BIGr’s pedigree and MADC processing capabilities by adding new parentage-related functions, a new polyRAD-based multiallelic MADC→VCF converter, and broader MADC sanity checking, along with associated documentation, tests, and CI updates.

Changes:

Added validate_pedigree() and find_parentage() for trio validation and parent assignment using Mendelian/homozygous mismatch logic.
Added check_madc_sanity() and madc2vcf_multi() plus supporting utilities (vmsg, URL handling) and updated imports/namespace/news.
Expanded and updated test coverage and CI (including external PanelHub-based integration-style tests).

Reviewed changes

Copilot reviewed 41 out of 42 changed files in this pull request and generated 16 comments.

Show a summary per file

| File | Description |
| --- | --- |
| R/validate_pedigree.R | New trio validation + correction/fill outputs for pedigrees. |
| R/find_parentage.R | New parentage assignment function (single-parent + best-pair methods). |
| R/madc2vcf_multi.R | New polyRAD pipeline entrypoint for multiallelic MADC→VCF conversion. |
| R/check_madc_sanity.R | New MADC “sanity check” helper and `check_botloci()` relocation. |
| R/get_countsMADC.R | Expanded MADC count extraction API (supports object input + collapsing matches). |
| R/utils.R | Added `vmsg()` and `url_exists()` utilities; updated globalVariables. |
| R/filterVCF.R | Added `quality.rates` outputs and modified console/output behavior. |
| R/imputation_concordance.R | Extended concordance API (plotting + print control) and updated docs. |
| R/check_ped.R | Reworked pedigree QC workflow/reporting and added interactive saving behavior. |
| R/thinSNP.R | Narrowed imports to `importFrom` for slimmer namespace. |
| tests/testthat/test-validate_pedigree.R | New unit tests for `validate_pedigree()`. |
| tests/testthat/test-find_parentage.R | New unit tests for `find_parentage()`. |
| tests/testthat/test-check_madc_sanity.R | Added tests for `check_madc_sanity()` using external fixtures. |
| tests/testthat/test-madc2vcf_multi.R | Added tests for `madc2vcf_multi()` (PanelHub fixtures). |
| tests/testthat/test-madc2vcf_targets.R | Large expansion of MADC→VCF target tests (PanelHub fixtures). |
| tests/testthat/test-madc2vcf_all.R | Large expansion of MADC→VCF all-sites tests (PanelHub fixtures). |
| tests/testthat/test-check_ped.R | Updated expectations for changed `check_ped()` behavior. |
| tests/testthat/test-imputation_concordance.R | Minor formatting update to concordance tests. |
| tests/testthat/corrected_pedigree.txt | Added corrected pedigree fixture/output artifact. |
| tests/testthat/.gitignore | Ignores corrected pedigree output in test directory. |
| NEWS.md | Added release notes for 0.7.0 / recent MADC and utility changes. |
| DESCRIPTION | Version bump to 0.7.0; added imports/suggests updates. |
| NAMESPACE | Exported new functions and added required imports. |
| man/*.Rd | Generated/updated documentation for new/changed functions. |
| .github/workflows/R-CMD-check.yaml | CI installs polyRAD and VariantAnnotation for test execution. |
| .gitignore | Added `.DS_Store`. |
| BIGr.Rproj | Added ProjectId. |
| CRAN-SUBMISSION | Removed. |
| cran-comments.md | Removed. |


Copilot · 2026-04-09T17:36:03Z

R/validate_pedigree.R

+  genos_mat <- base::as.matrix(genos, rownames = "ID")
+
+  # Homozygous-only matrix
+  genos_hom   <- data.table::copy(genos)
+  marker_cols <- base::setdiff(base::names(genos_hom), "ID")
+  for (col in marker_cols) {
+    genos_hom[base::get(col) == 1, (col) := NA_integer_]
+  }
+  genos_hom_mat <- base::as.matrix(genos_hom, rownames = "ID")


as.matrix() does not accept a rownames = argument here, and converting the full genos table (including the ID column) would coerce the entire matrix to character, breaking all downstream numeric comparisons. Build the genotype matrix from marker columns only, then set rownames(genos_mat) <- genos$ID explicitly.

Suggested change

-  genos_mat <- base::as.matrix(genos, rownames = "ID")
-
-  # Homozygous-only matrix
-  genos_hom   <- data.table::copy(genos)
-  marker_cols <- base::setdiff(base::names(genos_hom), "ID")
-  for (col in marker_cols) {
-    genos_hom[base::get(col) == 1, (col) := NA_integer_]
-  }
-  genos_hom_mat <- base::as.matrix(genos_hom, rownames = "ID")
+  marker_cols <- base::setdiff(base::names(genos), "ID")
+  genos_mat <- base::as.matrix(genos[, ..marker_cols])
+  base::rownames(genos_mat) <- genos$ID
+
+  # Homozygous-only matrix
+  genos_hom <- data.table::copy(genos)
+  for (col in marker_cols) {
+    genos_hom[base::get(col) == 1, (col) := NA_integer_]
+  }
+  genos_hom_mat <- base::as.matrix(genos_hom[, ..marker_cols])
+  base::rownames(genos_hom_mat) <- genos_hom$ID
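The coercion concern raised above can be illustrated with a small base R sketch. The toy table and its column names (`snp1`, `snp2`) are assumptions for illustration only, and a plain `data.frame` stands in for the package's input. One caveat worth checking before applying the suggestion: if `genos` is a `data.table`, data.table's own `as.matrix` method documents a `rownames` argument that accepts a column name, so the original call may behave differently from what the comment describes.

```r
# Toy genotype table; the IDs and marker names are illustrative assumptions.
genos <- data.frame(
  ID = c("ind1", "ind2", "ind3"),
  snp1 = c(0L, 1L, 2L),
  snp2 = c(2L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Converting the whole table, ID column included, yields a character matrix.
with_id <- as.matrix(genos)
is.character(with_id)

# Selecting the marker columns first keeps the matrix numeric.
marker_cols <- setdiff(names(genos), "ID")
genos_mat <- as.matrix(genos[, marker_cols, drop = FALSE])
rownames(genos_mat) <- genos$ID

# Homozygous-only copy: heterozygous calls (dosage 1) become NA.
genos_hom_mat <- genos_mat
genos_hom_mat[genos_hom_mat == 1L] <- NA_integer_
```

Building both matrices from the same `marker_cols` vector keeps their column order aligned, which matters for element-wise mismatch counts.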

Copilot · 2026-04-09T17:36:04Z

R/validate_pedigree.R

+  for (col in marker_cols) {
+    genos_hom[base::get(col) == 1, (col) := NA_integer_]
+  }
+  genos_hom_mat <- base::as.matrix(genos_hom, rownames = "ID")


Same issue as above for genos_hom_mat: as.matrix(genos_hom, rownames = "ID") will either error (unused argument) or include the ID column and coerce to character. Create the homozygous-only numeric matrix from marker columns and set rownames from genos_hom$ID.

Suggested change

-  genos_hom_mat <- base::as.matrix(genos_hom, rownames = "ID")
+  genos_hom_mat <- base::as.matrix(genos_hom[, ..marker_cols])
+  base::rownames(genos_hom_mat) <- genos_hom$ID

Copilot · 2026-04-09T17:36:04Z

R/validate_pedigree.R

+      comparisons <- base::sum(!base::is.na(cand_hom) & !base::is.na(prog_hom))
+      if (comparisons == 0) return(NA_real_)
+      (base::sum(cand_hom != prog_hom, na.rm = TRUE) / comparisons) * 100
+    })


which.min(errors) returns integer(0) when all entries are NA (e.g., when comparisons == 0 for every candidate). In that case this returns id = character(0) / error_pct = numeric(0) rather than NA. Add an explicit guard like if (all(is.na(errors))) (or length(best_idx)==0) to return NA_character_ / NA_real_.

Suggested change

-    })
+    })
+    if (base::all(base::is.na(errors))) {
+      return(base::list(id = NA_character_, error_pct = NA_real_))
+    }
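A minimal sketch of why the guard is needed, using a hypothetical helper `pick_best()` that is not part of BIGr. It assumes `errors` is a named numeric vector of mismatch percentages, one entry per candidate.

```r
# Hypothetical helper: pick the candidate with the lowest mismatch percentage.
pick_best <- function(errors) {
  # Without this guard, which.min() on an all-NA vector returns integer(0),
  # and indexing with it produces zero-length results instead of NA.
  if (all(is.na(errors))) {
    return(list(id = NA_character_, error_pct = NA_real_))
  }
  best_idx <- which.min(errors)
  list(id = names(errors)[[best_idx]], error_pct = errors[[best_idx]])
}

# Every candidate had zero comparable markers.
no_data <- c(p1 = NA_real_, p2 = NA_real_)
length(which.min(no_data))
res_empty <- pick_best(no_data)

# Mixed case: NA entries are ignored by which.min().
res_best <- pick_best(c(p1 = 12.5, p2 = NA, p3 = 3.0))
```

Returning typed `NA` values keeps the result shape stable, so callers can bind rows without special-casing empty vectors.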

Copilot · 2026-04-09T17:36:05Z

R/find_parentage.R

+    parent_genos  <- base::as.matrix(genos_hom[ID %in% parent_ids], rownames = "ID")
+    progeny_genos <- base::as.matrix(genos_hom[ID %in% progeny_candidates$ID], rownames = "ID")


as.matrix(genos_hom[...], rownames = "ID") is not a valid as.matrix() call and (even if it worked) would include the ID column and coerce to character. For mismatch calculations, subset marker columns only and set rownames from the ID column explicitly.

Suggested change

-    parent_genos  <- base::as.matrix(genos_hom[ID %in% parent_ids], rownames = "ID")
-    progeny_genos <- base::as.matrix(genos_hom[ID %in% progeny_candidates$ID], rownames = "ID")
+    parent_dt <- genos_hom[ID %in% parent_ids]
+    parent_genos <- base::as.matrix(parent_dt[, ..marker_cols])
+    base::rownames(parent_genos) <- parent_dt$ID
+    progeny_dt <- genos_hom[ID %in% progeny_candidates$ID]
+    progeny_genos <- base::as.matrix(progeny_dt[, ..marker_cols])
+    base::rownames(progeny_genos) <- progeny_dt$ID

Copilot · 2026-04-09T17:36:05Z

R/find_parentage.R

+  }
+  #### Logic for Best Pair Method ####
+  if (method == "best_pair") {
+    genos_mat    <- base::as.matrix(genos, rownames = "ID")


genos_mat <- as.matrix(genos, rownames = "ID") has the same problem: invalid argument and would include ID causing character coercion. Create a numeric marker matrix with rownames(genos_mat) <- genos$ID and exclude the ID column before comparisons.

Suggested change

-    genos_mat    <- base::as.matrix(genos, rownames = "ID")
+    genos_mat <- base::as.matrix(genos[, !("ID"), with = FALSE])
+    base::rownames(genos_mat) <- genos$ID

Copilot · 2026-04-09T17:36:07Z

tests/testthat/test-madc2vcf_all.R

+test_that("simu alfalfa",{
+
+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+
+  # External alfalfa test files
+  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")


Same nested-test structure issue here: multiple test_that() calls are nested inside an outer test_that("simu alfalfa", ...). Please refactor to avoid nested test_that() and gate network-heavy tests with skip_on_cran() (and possibly skip_on_ci() if they are slow/flaky).

Copilot · 2026-04-09T17:36:08Z

tests/testthat/test-check_madc_sanity.R

+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/test_madcs/"
+  names <- c("Columns", "FixAlleleIDs", "IUPACcodes", "LowerCase", "Indels", "ChromPos", "allNAcol", "allNArow", "RefAltSeqs", "OtherAlleles")
+
+  # raw madc
+  report <- read.csv(paste0(github_path,"/alfalfa_madc_raw.csv"))
+


These tests download MADC fixtures directly from GitHub via read.csv() but do not call skip_if_offline() / skip_on_cran(). This will fail in offline/CRAN environments. Add appropriate skips (or vendor minimal fixtures into tests/testthat/) before attempting network I/O.
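One possible shape for the refactor requested in the two test comments above, sketched with testthat's `skip_on_cran()` and `skip_if_offline()`. The helper `read_remote_fixture()` is hypothetical; the base URL and file name are copied from the test file under review. `skip_if_offline()` relies on the curl package being installed.

```r
library(testthat)

# Base URL copied from the test file under review.
github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/test_madcs/"

# Hypothetical helper: apply the skips first, then download the fixture.
read_remote_fixture <- function(file_name) {
  skip_on_cran()
  skip_if_offline("raw.githubusercontent.com")
  utils::read.csv(paste0(github_path, file_name))
}

# Each scenario is its own top-level test_that() call instead of being nested
# inside an outer test_that() block.
test_that("raw alfalfa MADC can be read", {
  report <- read_remote_fixture("alfalfa_madc_raw.csv")
  expect_s3_class(report, "data.frame")
})
```

Putting the skips inside the helper means every test that touches the network is gated the same way, and the skip is reported per test rather than hiding a whole nested group.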

Copilot · 2026-04-09T17:36:08Z

R/madc2vcf_multi.R

+#' This function converts a DArTag MADC file to a VCF using the polyRAD package's
+#' `readDArTag` and `RADdata2VCF` pipeline. It runs `check_madc_sanity` before
+#' loading the data, applies corrections for lowercase sequences and all-NA
+#' rows/columns, and sets `n.header.rows` automatically based on whether the
+#' MADC file follows the raw DArT format (6 header rows) or the fixed allele ID
+#' format (no header rows).


The documentation states the function supports raw DArT MADC format and “sets n.header.rows automatically”, but the implementation stops when FixAlleleIDs is FALSE and always calls readDArTag(..., n.header.rows = 0L). Please align the docs with actual behavior (either document that raw MADCs are rejected, or implement the advertised auto-detection).
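If auto-detection is the route taken, a rough sketch might look like the following. `guess_header_rows()` is a hypothetical helper, not part of BIGr or polyRAD. It assumes, following the roxygen text quoted in this review, that a raw DArT file shows only blank or `"*"` cells in the first six rows of its first column, while a fixed allele ID file has none. The toy vectors are made up for illustration.

```r
# Hypothetical helper: guess the number of header rows from the first column
# of an unparsed MADC table (6 for raw DArT layout, 0 otherwise).
guess_header_rows <- function(first_col) {
  top <- utils::head(as.character(first_col), 6)
  is_filler <- is.na(top) | trimws(top) %in% c("", "*")
  if (length(top) > 0 && all(is_filler)) 6L else 0L
}

# Toy first columns; real MADC files will contain different identifiers.
raw_like   <- c("*", "*", "*", "*", "*", "*", "AlleleID", "locus1|Ref_0001")
fixed_like <- c("locus1|Ref_0001", "locus1|Alt_0002")

n_raw   <- guess_header_rows(raw_like)
n_fixed <- guess_header_rows(fixed_like)
```

The returned value could then be passed as `n.header.rows`, though whether the rest of the pipeline handles raw files correctly would still need to be verified.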

Copilot · 2026-04-09T17:36:08Z

R/find_parentage.R

+  if (!"Sex" %in% base::colnames(all_parents)) {
+    warning("No 'Sex' column in parents file. All parents treated as ambiguous ('A').")
+    all_parents[, Sex := "A"]
+  }


If a Sex column exists but contains missing values (NA), toupper(NA) stays NA and the subsequent %in% c("M","A","NA") filters will drop those rows rather than treating them as ambiguous. Consider normalizing missing/blank Sex values to "A" (or explicitly including is.na(Sex) in the candidate filters) so candidate sets don't silently shrink.

Suggested change

-  }
+  }
+  all_parents[, Sex := base::as.character(Sex)]
+  all_parents[, Sex := trimws(Sex)]
+  all_parents[is.na(Sex) | Sex == "", Sex := "A"]
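The effect of the normalization can be checked with a small base R sketch. The toy `all_parents` table is an assumption for illustration and uses a plain `data.frame` rather than a `data.table`; the sex codes follow the ones named in the comment above.

```r
# Toy parents table with lowercase, missing, and blank Sex entries.
all_parents <- data.frame(
  ID  = c("p1", "p2", "p3", "p4"),
  Sex = c("m", NA, " ", "F"),
  stringsAsFactors = FALSE
)

# A missing value never matches the literal string "NA" in %in%.
na_matches <- NA_character_ %in% c("M", "A", "NA")

# Normalize: coerce, trim, map missing/blank to ambiguous, then upper-case.
sex <- trimws(as.character(all_parents$Sex))
sex[is.na(sex) | sex == ""] <- "A"
all_parents$Sex <- toupper(sex)

# Candidate male parents now include the formerly missing/blank rows.
male_candidates <- all_parents$ID[all_parents$Sex %in% c("M", "A")]
```

Normalizing once, right after the column check, avoids repeating `is.na(Sex)` in every downstream candidate filter.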

Copilot · 2026-04-09T17:36:08Z

R/check_madc_sanity.R

+#' Performs nine quick validations on an allele report:
+#' 1) **Columns** - required columns are present (`CloneID`, `AlleleID`, `AlleleSequence`);
+#' 2) **FixAlleleIDs** - first column's first up-to-6 rows are not all blank or `"*"`
+#'    *and* both `_0001` and `_0002` appear in `AlleleID`;
+#' 3) **IUPACcodes** - presence of non-ATCG characters in `AlleleSequence`;


The roxygen docs say “nine quick validations” and enumerate 1–9, but the implementation also computes an OtherAlleles check. Please update the description/return docs to reflect the actual set/count of checks returned in checks.

Cristianetaniguti and others added 30 commits October 3, 2025 15:11

indels support for madc2vcf_targets

778aefa

updated check_ped to save corrected dataframe and report

1b761b9

reorganized report and fixed language

743043a

bugfix - if hapDB padding is not matching report

0b97b46

added option to print plot or list to imputation_concordance

82279af

ignore DS_STore

6b81982

added option to print pre-filtering depth and genotyping rate

8205e4e

added calculation for Ho

31248e3

up version

757b01c

Merge branch 'check_ped_update' of https://github.com/Breeding-Insigh…

0934210

…t/BIGr into ped_indels_update

merge dev branches

e18b2c6

Merge branch 'development' into ped_indels_update

768ab93

opt messages

5c0b590

messages ok

9afb265

targets okay

c31118d

targets ok

5d54f0d

Potential fix for pull request finding

ee50981

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

d3a4061

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

f765c7c

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

87bb1fc

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Merge branch 'ped_indels_update' into madc2vcf_all_updates

7c12d49

Merge pull request #53 from Breeding-Insight/madc2vcf_all_updates

df6fe92

Madc2vcf updates

Update R/madc2vcf_targets.R

6059c10

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/check_madc_sanity.R

b09b0c1

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/get_countsMADC.R

409dbd3

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/get_countsMADC.R

e6fce19

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/check_madc_sanity.R

669ac4e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix tests

bbfbee2

Merge branch 'ped_indels_update' of https://github.com/Breeding-Insig…

38c3564

…ht/BIGr into ped_indels_update

madc2vcf_all indels support okay

55ee61a

Cristianetaniguti and others added 20 commits March 31, 2026 09:56

madc2vcf_all support indel

bf5ff4c

add support for Others

291ae8e

up version

84852da

add madc2vcf_multi

96a4ed1

fix checks

cec168d

fix checks 2

0be2e0f

add VariantAnnotation to test env

33fc87c

ignore madc2vcf_multi tests in actions

77107ba

more messages and tests

ccf9e77

bugfix

8a00c9e

update man

f2013e3

minor version up

b01c12b

added v1 of parentage function

d0e02e2

modified code to use data.table for increased efficiency

5baae79

improved assign_parentage and validate_parentage

a252f22

finalized parentage functions for diploids and test files

bf0a468

Added parentage functions and updated associated files

01e943d

updated headers and importFrom for functions along iwth namespace

8ee65a3

deleted cra check files

e5b2004

josuechinchilla requested review from alex-sandercock and Copilot April 9, 2026 17:29

Copilot started reviewing on behalf of josuechinchilla April 9, 2026 17:29

Copilot AI reviewed Apr 9, 2026

josuechinchilla wants to merge 50 commits into development from add_parentage_functions


4 participants
