ukbflow

RAP-Native R Workflow for UK Biobank Analysis

Note

🎉 2026-04 — ukbflow is now available on CRAN! Install with install.packages("ukbflow").

Overview

ukbflow provides a streamlined R workflow that runs natively on the UK Biobank Research Analysis Platform (RAP). It covers the full path from phenotype extraction and disease derivation to association analysis and publication-quality figures.

UK Biobank Data Policy (2024+): Individual-level data must remain within the RAP environment. Only summary-level outputs may be downloaded locally. All ukbflow functions are designed with this constraint in mind.

library(ukbflow)

# Simulate UKB-style data locally (on RAP: replace with extract_batch() + job_wait())
data <- ops_toy(n = 5000, seed = 2026) |>
  derive_missing()

# Derive lung cancer outcome (ICD-10 C34) and follow-up time
data <- data |>
  derive_icd10(name = "lung", icd10 = "C34",
               source = c("cancer_registry", "hes")) |>
  derive_followup(name        = "lung",
                  event_col   = "lung_icd10_date",
                  baseline_col = "p53_i0",
                  censor_date  = as.Date("2022-10-31"),
                  death_col    = "p40000_i0")

# Define exposure: ever vs. never smoker
data[, smoking_ever := factor(
  ifelse(p20116_i0 == "Never", "Never", "Ever"),
  levels = c("Never", "Ever")
)]

# Cox regression: smoking → lung cancer (3-model adjustment)
res <- assoc_coxph(data,
  outcome_col  = "lung_icd10",
  time_col     = "lung_followup_years",
  exposure_col = "smoking_ever",
  covariates   = c("p21022", "p31", "p22189"))

# Forest plot
res_df <- as.data.frame(res)
plot_forest(
  data      = res_df,
  est       = res_df$HR,
  lower     = res_df$CI_lower,
  upper     = res_df$CI_upper,
  ci_column = 2L
)
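
To make the follow-up step in the quick-start more concrete, here is a base R sketch of a common time-to-event construction: follow-up ends at the earliest of the event, death, or the administrative censoring date. This is an illustration under that assumption, not ukbflow's internal code, and the column names (`baseline`, `event_date`, `death_date`) are hypothetical.

```r
# Illustrative only: a generic follow-up construction in base R.
# Column names are hypothetical and do not come from ukbflow.
cohort <- data.frame(
  baseline   = as.Date(c("2008-01-15", "2009-06-01", "2010-03-20")),
  event_date = as.Date(c("2015-04-02", NA, NA)),
  death_date = as.Date(c(NA, "2018-09-10", NA))
)
censor_date <- as.Date("2022-10-31")

# Follow-up ends at the earliest available date; missing dates are ignored.
cohort$end_date <- pmin(cohort$event_date, cohort$death_date, censor_date,
                        na.rm = TRUE)

# The event indicator is 1 only when the event itself ended follow-up.
cohort$event <- as.integer(!is.na(cohort$event_date) &
                             cohort$event_date <= cohort$end_date)

# Duration in years, using 365.25 days per year.
cohort$followup_years <- as.numeric(cohort$end_date - cohort$baseline) / 365.25

print(cohort)
```

In this toy cohort the first participant has the event, the second is censored at death, and the third reaches the administrative censoring date.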

Installation

# From CRAN (recommended)
install.packages("ukbflow")

# Latest development version from GitHub
pak::pkg_install("evanbio/ukbflow")

# or
remotes::install_github("evanbio/ukbflow")

Requirements: R ≥ 4.1 · dxpy (dx-toolkit, required for RAP interaction)

pip install dxpy
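
The requirements above can be checked by hand before loading the package. The snippet below is a minimal base R sketch that assumes `pip install dxpy` has placed the `dx` command-line tool on your PATH; `ops_setup()` is the package's own health check, and this is not a reproduction of it.

```r
# Manual pre-flight check in base R (not ukbflow's ops_setup()).
r_ok    <- getRversion() >= "4.1.0"      # R version requirement
dx_path <- unname(Sys.which("dx"))       # "" when dx is not on PATH
dx_ok   <- nzchar(dx_path)

message("R >= 4.1: ", r_ok)
message("dx CLI found: ", dx_ok, if (dx_ok) paste0(" (", dx_path, ")") else "")
```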

Core Features

| Layer | Key Functions | Description |
| --- | --- | --- |
| Connection | `auth_login`, `auth_select_project` | Authenticate to RAP via dx-toolkit |
| Data Access | `fetch_metadata`, `extract_batch`, `job_wait` | Retrieve phenotype data from UKB dataset on RAP |
| Data Processing | `decode_names`, `decode_values`, `derive_icd10`, `derive_followup`, `derive_case` | Harmonize multi-source records; derive analysis-ready cohort |
| Association Analysis | `assoc_coxph`, `assoc_logistic`, `assoc_subgroup` | Three-model adjustment; subgroup & trend analysis |
| Genomic Scoring | `grs_bgen2pgen`, `grs_score`, `grs_standardize` | Distributed plink2 scoring on RAP worker nodes |
| Visualization | `plot_forest`, `plot_tableone` | Publication-ready figures & tables |
| Utilities | `ops_setup`, `ops_toy`, `ops_na`, `ops_snapshot`, `ops_withdraw` | Environment check, synthetic data, pipeline diagnostics, and cohort management |

Function Reference

Auth & Fetch

auth_login(), auth_status(), auth_logout(), auth_list_projects(), auth_select_project() — RAP authentication
fetch_ls(), fetch_tree(), fetch_url(), fetch_file() — RAP file system
fetch_metadata(), fetch_field() — UKB metadata shortcuts

Extract & Decode

extract_ls(), extract_pheno(), extract_batch() — phenotype extraction
decode_values() — integer codes → human-readable labels
decode_names() — field IDs → snake_case column names
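
The two decoding steps are conceptually a value lookup and a name clean-up. The base R sketch below shows the idea with a hypothetical coding table and hypothetical field titles; it does not use or reproduce the `decode_values()` / `decode_names()` implementations, whose arguments are not shown here.

```r
# Illustrative only: integer codes to labels via a named lookup vector.
# The coding table below is a hypothetical example.
coding <- c("0" = "Never", "1" = "Previous", "2" = "Current",
            "-3" = "Prefer not to answer")
raw    <- c(0, 2, 1, -3, NA)
labels <- unname(coding[as.character(raw)])   # NA stays NA

# Illustrative only: field titles to snake_case column names.
to_snake <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)   # collapse runs of non-alphanumerics
  gsub("^_+|_+$", "", x)            # trim leading/trailing underscores
}
titles <- c("Sex", "Age at recruitment", "Smoking status | Instance 0")
snake  <- to_snake(titles)

print(labels)
print(snake)
```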

Job Monitoring

job_status() — query job status by ID
job_wait() — block until job completes (with timeout)
job_path() — get output path of a completed job
job_result() — retrieve job result object
job_ls() — list recent jobs
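
Blocking until a job completes usually comes down to polling a status with a deadline. The sketch below shows that pattern in base R with a stand-in status function; `wait_for()` and `make_fake_job()` are hypothetical helpers for illustration, and the set of terminal states is an assumption rather than ukbflow's documented behaviour.

```r
# Illustrative polling loop with a timeout (not ukbflow's job_wait()).
wait_for <- function(get_status, interval = 1, timeout = 60,
                     terminal = c("done", "failed", "terminated")) {
  deadline <- Sys.time() + timeout
  repeat {
    status <- get_status()
    if (status %in% terminal) return(status)
    if (Sys.time() >= deadline) {
      stop("timed out while waiting; last status: ", status)
    }
    Sys.sleep(interval)
  }
}

# Stand-in status source: reports "running" twice, then "done".
make_fake_job <- function() {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls < 3) "running" else "done"
  }
}

result <- wait_for(make_fake_job(), interval = 0.1, timeout = 5)
print(result)
```

Checking the deadline after each status query means a job that finishes right at the limit is still reported as finished rather than as a timeout.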

Derive — Phenotypes

derive_missing() — handle "Do not know" / "Prefer not to answer"
derive_covariate() — type conversion + summary
derive_cut() — bin continuous variables
derive_selfreport() — self-reported disease status + date
derive_hes() — HES inpatient ICD-10
derive_first_occurrence() — First Occurrence fields
derive_cancer_registry() — cancer registry
derive_death_registry() — death registry
derive_icd10() — combine sources (wrapper)
derive_case() — merge self-report + ICD-10
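
Two of the simpler derivations above, handling non-informative answers and binning a continuous variable, can be pictured in base R as follows. The sketch assumes that non-informative answers are recoded to `NA`, which is one common choice and may differ from what `derive_missing()` actually does; the BMI cut points are also only an example.

```r
# Illustrative only: recode non-informative answers to NA.
smoking     <- c("Never", "Previous", "Prefer not to answer", "Current", "Do not know")
nonresponse <- c("Do not know", "Prefer not to answer")
smoking_clean <- ifelse(smoking %in% nonresponse, NA_character_, smoking)

# Illustrative only: bin a continuous variable into labelled categories.
# right = FALSE makes intervals left-closed, e.g. [25, 30).
bmi <- c(17.9, 22.4, 27.8, 33.1, 25.0)
bmi_cat <- cut(bmi,
               breaks = c(-Inf, 18.5, 25, 30, Inf),
               labels = c("Underweight", "Normal", "Overweight", "Obese"),
               right  = FALSE)

print(smoking_clean)
print(bmi_cat)
```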

Derive — Survival

derive_timing() — prevalent vs. incident classification
derive_age() — age at event
derive_followup() — follow-up end date and duration
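
The prevalent vs. incident distinction compares each participant's event date with their baseline date. The base R sketch below assumes that an event on or before baseline counts as prevalent; that boundary rule is an assumption for illustration and may not match `derive_timing()` exactly.

```r
# Illustrative only: classify events relative to baseline.
baseline   <- as.Date(c("2008-01-15", "2009-06-01", "2010-03-20"))
event_date <- as.Date(c("2005-11-30", "2014-02-11", NA))

timing <- ifelse(is.na(event_date), "none",
          ifelse(event_date <= baseline, "prevalent", "incident"))
timing <- factor(timing, levels = c("none", "prevalent", "incident"))

print(data.frame(baseline, event_date, timing))
```

Prevalent cases are typically excluded before fitting an incident-disease model, which is why this classification sits ahead of the follow-up calculation.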

Association Analysis

assoc_coxph() / assoc_cox() — Cox proportional hazards (HR)
assoc_logistic() / assoc_logit() — logistic regression (OR)
assoc_linear() / assoc_lm() — linear regression (β)
assoc_coxph_zph() — proportional hazards assumption test
assoc_subgroup() — stratified analysis + interaction LRT
assoc_trend() — dose-response trend + p_trend
assoc_competing() — Fine-Gray competing risks (SHR)
assoc_lag() — lagged exposure sensitivity analysis
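
The "three-model adjustment" pattern fits the same exposure-outcome association under increasingly rich covariate sets. The sketch below illustrates it with simulated data and logistic regression from base R so that it runs without extra packages. The specific model definitions (unadjusted, age and sex, fully adjusted) are an assumed convention, all variable names are hypothetical, and none of this reproduces the `assoc_*()` internals.

```r
# Illustrative only: three nested adjustment models on simulated data.
set.seed(1)
n <- 2000
sim <- data.frame(
  age         = rnorm(n, mean = 57, sd = 8),
  sex         = rbinom(n, 1, 0.5),
  deprivation = rnorm(n),
  exposure    = rbinom(n, 1, 0.4)
)
# Simulated outcome depends on exposure and age on the log-odds scale.
lp <- -3 + 0.6 * sim$exposure + 0.03 * (sim$age - 57)
sim$outcome <- rbinom(n, 1, plogis(lp))

models <- list(
  unadjusted = outcome ~ exposure,
  age_sex    = outcome ~ exposure + age + sex,
  full       = outcome ~ exposure + age + sex + deprivation
)

# Fit one model and return the exposure odds ratio with a 95% Wald interval.
fit_one <- function(f) {
  fit <- glm(f, data = sim, family = binomial())
  est <- coef(summary(fit))["exposure", ]
  z   <- qnorm(0.975)
  data.frame(
    OR       = exp(est[["Estimate"]]),
    CI_lower = exp(est[["Estimate"]] - z * est[["Std. Error"]]),
    CI_upper = exp(est[["Estimate"]] + z * est[["Std. Error"]]),
    p_value  = est[["Pr(>|z|)"]]
  )
}

results <- do.call(rbind, lapply(models, fit_one))
results$model <- names(models)
print(results)
```

Laying the three estimates side by side shows how much of the crude association is explained by the added covariates.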

Visualisation

plot_forest() — forest plot (PNG / PDF / JPG / TIFF, 300 dpi)
plot_tableone() — Table 1 (DOCX / HTML / PDF / PNG)

Utilities & Diagnostics

ops_setup() — environment health check (dx CLI, RAP auth, R packages)
ops_toy() — generate synthetic UKB-like data for development and testing
ops_na() — summarise missing values (NA and "") across all columns
ops_snapshot() — record pipeline checkpoints and track dataset changes
ops_snapshot_cols() — retrieve column list from a saved snapshot
ops_snapshot_diff() — compare columns between two snapshots
ops_snapshot_remove() — remove columns added after a given snapshot
ops_set_safe_cols() — define protected columns that ops_snapshot_remove will not drop
ops_withdraw() — exclude UKB withdrawn participants from a cohort
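
Counting both `NA` and empty strings as missing, as described for `ops_na()`, can be sketched in base R as below. The helper `count_missing()` and the toy data frame are hypothetical, and the output layout is not meant to match what `ops_na()` returns.

```r
# Illustrative only: per-column counts of NA and "" values.
df <- data.frame(
  id      = 1:5,
  smoking = c("Never", "", "Current", NA, "Previous"),
  bmi     = c(24.1, NA, 31.0, 27.5, NA),
  stringsAsFactors = FALSE
)

count_missing <- function(x) {
  n_na    <- sum(is.na(x))
  n_empty <- if (is.character(x)) sum(!is.na(x) & x == "") else 0L
  c(n_na = n_na, n_empty = n_empty)
}

# vapply returns a 2 x ncol matrix; transpose to one row per column.
counts <- vapply(df, count_missing, c(n_na = 0, n_empty = 0))
na_summary <- as.data.frame(t(counts))
na_summary$pct_missing <- 100 * (na_summary$n_na + na_summary$n_empty) / nrow(df)

print(na_summary)
```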

GRS Pipeline

grs_check() — validate SNP weights file
grs_bgen2pgen() — convert BGEN → PGEN on RAP (submits cloud jobs)
grs_score() — score GRS across chromosomes with plink2
grs_standardize() / grs_zscore() — Z-score standardisation
grs_validate() — OR/HR per SD, high vs low, trend, AUC/C-index
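
Z-score standardisation rescales a raw score to mean 0 and standard deviation 1, so that effect estimates read as "per SD". The base R sketch below uses simulated scores; the quintile-based high vs. low grouping is an assumed convention for illustration and is not taken from `grs_standardize()` or `grs_validate()`.

```r
# Illustrative only: standardise a simulated raw score.
set.seed(42)
grs_raw <- rnorm(1000, mean = 12.3, sd = 2.1)   # hypothetical raw scores

grs_z <- (grs_raw - mean(grs_raw)) / sd(grs_raw)

# Assumed grouping: bottom 20% = low, top 20% = high, rest = intermediate.
cuts <- quantile(grs_z, probs = c(0, 0.2, 0.8, 1))
grs_group <- cut(grs_z, breaks = cuts, include.lowest = TRUE,
                 labels = c("low", "intermediate", "high"))

print(c(mean = mean(grs_z), sd = sd(grs_z)))
print(table(grs_group))
```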

Documentation

Full vignettes and function reference:

https://evanbio.github.io/ukbflow/

Contributing

Bug reports, feature requests, and pull requests are welcome. See CONTRIBUTING.md.

License

Made with ❤️ by Yibin Zhou
