nf-core/phaseimpute
A bioinformatics pipeline to phase and impute genetic data
genomics, genotype, imputation, low-pass-sequencing, phasing
Version history
What’s Changed
This release adds two new imputation tools: Beagle5 and Minimac4.
It also updates the templates to match the latest version of nf-core (v3.5.1).
Special thanks to Kübra Narci and Diego Alvarez S. for the review of this release.
Added
- #175 - Add support for all input files in `.json` or `.yaml` format.
- #181 - Add nf-co2footprint plugin to the config file.
- #184 - Add support for `.csi` index for `.bam` files.
- #188 - Add documentation for all subworkflows.
- #210 - Add BEAGLE5 support for genotype imputation.
- #211 - Add MINIMAC4 support for genotype imputation.
Changed
- #166 - Bump version to 1.1.0dev and update `CHANGELOG.md`.
- #170 - Update TEMPLATE to nf-core tools version 3.1.2.
- #175 - Update TEMPLATE to nf-core tools version 3.2.0. Move `CHRCHECK` functions to the workflow directory.
- #182 - Add dark version of the metromap and dynamically change it in the README.
- #185 - Add `--sampleNames_file` option for `STITCH` and `QUILT`.
- #187 - Update modules and subworkflows.
- #197 - Update to nf-core/tools version 3.3.1 and update nf-test.
- #188 - Change name of `BAM_IMPUTE_GLIMPSE2` to `BAM_VCF_IMPUTE_GLIMPSE2`.
- #199 - Update all modules to latest nf-core versions.
- #201 - Update TEMPLATE to nf-core tools version 3.3.2.
- #209 - Update TEMPLATE to nf-core tools version 3.4.1.
- #234 - Bump version to 1.1.0 for release and fix version error for `BAM/VCF-CHREXTRACT`.
- #235 - Fix reviewer comments (@kubranarci)
- #236 - Fix reviewer 2 comments (@dialvarezs)
Fixed
- #166 - Fix depth type to `number` to enable float.
- #179 - Fix VCF usage in `GLIMPSE2`.
- #183 - Remove wrongfully added files in `BAM_EXTRACT_REGION_SAMTOOLS`.
- #185 - Fix CSV generation and check that all mentioned path files exist.
- #189 - Set meta map id as string to avoid error when using numbers in csv files.
- #225 - Fix `CHRCHECK` config
- #230 - Fix `test_full` config
Dependencies
| Dependency | Old version | New version |
|---|---|---|
| bcftools | 1.20 | 1.21 |
| gunzip | 1.10 | 1.13 |
| lbzip2 | | 2.5 |
| multiqc | 1.27 | 1.29 |
| r-stitch | 1.6.10 | 1.7.3 |
| shapeit5 | 1.0.0 | 5.1.1 |
| vcflib | 1.0.3 | 1.0.14 |
| beagle5 | | 5.2 |
| minimac4 | | 4.1.6 |
Full Changelog: https://github.com/nf-core/phaseimpute/compare/1.0.0...1.1.0
Initial release of nf-core/phaseimpute, created with the nf-core template. Special thanks to Matthias Hörtenhuber, Mazzalab and Sofia Stamouli for the review of this release.
Added
- #20 - Added automatic detection of vcf contigs for the reference panel and automatic renaming available.
- #22 - Add validation step for concordance analysis. Input channels changed to match inputs steps. Outdir folder organised by steps. Modules config by subworkflows.
- #26 - Added QUILT method.
- #47 - Add possibility to remove samples from reference panel. Add glimpse2 chunking method. Add full-size test parameters.
- #58 - Add external params posfile and chunks. Add glimpse2 phasing and imputation.
- #67 - Export CSVs from each step.
- #71 - Allow external panel to be used in step impute.
- #97 - Add dog reference panel and config to test pipeline with other species.
- #102 - Add dog panel test.
- #119 - Add dog test with panelprep and imputation.
- #118 - Explain how to customize arguments in the pipeline.
- #111 - Add nf-test for all subworkflow, workflow, modules and functions.
- #131 - Set normalisation as optional. Fix extension detection function. Add support for validation with vcf files. Concatenate vcf only if more than one file. Change `--phased` to `--phase` for consistency.
- #143 - Improve contigs warning and error logging. The number of chromosome contigs is summarized if above `max_chr_names`.
- #146 - Add `seed` parameter for `QUILT`.
- #164 - Add additional requirement on input schema `"uniqueEntries": ["panel", "chr"]` and `end` should be greater than `start` in regions.
Changed
- #18 - Maps and region by chromosome. Update tests config files. Correct meta map propagation. `test_impute` and `test_sim` work.
- #19 - Changed reference panel to accept a csv, update modules and subworkflows (glimpse1/2 and shapeit5)
- #40 - Add `STITCH` method. Reorganize panelprep subworkflows.
- #51 - Update all process and fix linting errors. Remove `FASTQC` added by the template.
- #56 - Move to nf-test to check the output files names generated. Fix validation and concatenation by chromosomes missing. Add dedicated GLIMPSE1 subworkflow. Fix posfile generation to be done once for glimpse and stitch.
- #68 - `QUILT` can handle external params chunks and hap-legend files.
- #78 - Separate validate step from panel preparation.
- #84 - Change depth computation to use `SAMTOOLS_DEPTH` and make separation by chromosome only if regions are specified.
- #85 - Use external params in individual tests for tools.
- #86 - Move `BCFTOOLS_CONVERT` to `VCF_SITES_EXTRACT_BCFTOOLS`.
- #88 - Improve multiQC report with more information.
- #91 - Update metro map with all steps and remove deprecated ones.
- #93 - Add support for CRAM file.
- #93 - Check contigs name at workflow level for BAM and VCF.
- #93 - Samples remove with multi-allelics records.
- #93 - Samtools merge in `BAM_REGION` subworkflow.
- #93 - Fix glimpse2_phase output file names.
- #93 - Fix fai combination to fasta.
- #96 - Simplify csv export
- #96 - Use only legend file as posfile for all imputation workflow.
- #100 - Update bcftools, samtools, … nf-core modules. All indexing is now done with the file creation for most of them.
- #101 - Set `--compute_freq` as `false` by default.
- #102 - Compute chr name from whole vcf.
- #102 - Only warn the user if some contigs are absent from files; the regions to compute are now the intersection of regions, panel, posfile, chunks, and map.
- #102 - Update all test and recompute snapshot to match new version of the phaseimpute test dataset.
- #103 - Update `GLIMPSE2_PHASE`, `GUNZIP` and `MULTIQC`
- #135 - Impute by batch of 100 individuals by default using `--batch_size` parameter. All individuals BAM files are gathered and VCF are allowed for `GLIMPSE1` and `GLIMPSE2`. Channel preprocessing of stitch is done in stitch subworkflow. Genotype likelihood computation for `GLIMPSE1` is now done outside of the subworkflow and merge the resulting vcf with all the samples. New test added to check batch separation. Improve `usage.md` documentation. Add validation to initialization of the pipeline to ensure compatibility between tools, steps and the files provided by the user.
- #139 - Update all nf-core modules.
- #146 - Remove conda CI check for PR due to Nextflow error.
- #144 - Documentation updates.
- #148 - Fix AWS fulltest github action for manual dispatch.
- #149 - Remove the map file from the AWS fulltest.
- #152 - Fix URLs in the documentation and remove tools citation in the README, use a white background for all images in the documentation.
- #153 - Update and simplify subworkflows snapshot and check only for files names (no md5sum for bam and vcf files due to timestamp).
- #157 - Add `chunk_model` as parameter for better control over `GLIMPSE2_CHUNK` and set window size in `GLIMPSE1_CHUNK` and `GLIMPSE2_CHUNK` to 4 Mb to reduce number of chunks (empirical).
- #160 - Improve `CHANGELOG.md` and add details to `usage.md`
- #158 - Remove frequency computation and phasing from full test to reduce cost and computational time.
- #164 - Rename `BAM_REGION_SAMTOOLS` to `BAM_EXTRACT_REGION_SAMTOOLS`. Remove `GLIMPSE2_SPLITREFERENCE` as it is not used. Add more steps to `test_all` profile for more exhaustivity.
- #163 - Improve configuration for demanding processes. Use Genome in a Bottle VCF benchmarking file for AWS full test. Moved from `glimpse1` to `glimpse2` for the full test profile.
- #165 - Update metro map and add logo to the documentation.
Fixed
- #15 - Changed test csv files to point to nf-core repository.
- #16 - Removed `outdir` from test config files.
- #65 - Separate stitch output by individuals.
- #75 - Set frequency computation with `VCFFIXUP` process as optional with `--compute_freq`. Use `GLIMPSE_CHUNK` on panel vcf to compute the chunk and not makewindows on fasta.
- #117 - Fix directories in CSV.
- #151 - Fix `Type not supported: class org.codehaus.groovy.runtime.GStringImpl` error due to `String` test in `getFileExtension()`.
- #158 - Fix contigs usage when regions is only a subset of the given contigs (e.g. if panel file has the 22 chr and the region file only 2 then only the 2 common will be processed). Fix `multiQC` samples names for better comprehension. Fix `-resume` errors when `ch_fasta` is used by adding `cache = 'lenient'` in necessary processes. Fix `--window-size` of `GLIMPSE_CHUNK` from `4` to `4000000`.
- #153 - Fix getFileExtension function. Fix image in `usage.md`. Fix small warnings and errors with updated language server. `def` has been added when necessary, `:` used instead of `,` in assertions, `_` added to variables not used in closures, `for` loop replaced by `.each{}`, remove unused code / input.
- #161 - Fix `VCF_SPLIT_BCFTOOLS` when only one sample present by updating `BCFTOOLS_PLUGINSPLIT` and adding `BCFTOOLS_QUERY` to get truth samples names for renaming the resulting files.
- #162 - Fix `fai` usage when provided by `genomes` parameter.
- #164 - Improve documentation writing
- #163 - Fix MULTIQC samples names (add post-processing for clean up `FILTER_CHR_DWN`, `FILTER_CHR_INP`, `GAWK_ERROR_SPL`, `GAWK_RSQUARE_SPL`). Fix output panel `publishDir`. Fix java version to `17` in `ci.yml` due to new nextflow version.
Dependencies
| Dependency | New version |
|---|---|
| bcftools | 1.20 |
| bedtools | 2.31.1 |
| gawk | 5.3.0 |
| glimpse-bio | 1.1.1 |
| glimpse-bio | 2.0.1 |
| gunzip | 1.10 |
| htslib | 1.21 |
| multiqc | 1.25.1 |
| r-quilt | 1.0.5 |
| r-stitch | 1.6.10 |
| samtools | 1.21 |
| shapeit5 | 1.0.0 |
| tabix | 1.11 |
| vcflib | 1.0.3 |
Contributors
Louis Le Nézet, Anabella Trigila, Eugenia Fontecha, Maxime U Garcia, Matias Romero Victorica, Nicolas Schcolnicov, Hemanoel Passarelli, Matthias Hörtenhuber, Sofia Stamouli