Title: | Genome-wide Association Studies for Autopolyploids |
---|---|
Description: | Designed for genome-wide association studies in autopolyploids. |
Authors: | Jeffrey B. Endelman, Umesh R. Rosyara |
Maintainer: | Jeffrey Endelman <[email protected]> |
License: | GPL-3 |
Version: | 2.13 |
Built: | 2024-12-10 06:00:18 UTC |
Source: | https://github.com/jendelman/GWASpoly |
Test markers as QTL under backward elimination
fit.QTL(data, trait, qtl, fixed = NULL)
fit.QTL(data, trait, qtl, fixed = NULL)
data |
variable inheriting from class |
trait |
name of trait |
qtl |
data frame to specify the multi-QTL model (see Details) |
fixed |
data frame to specify the fixed effects (see Details) |
qtl
is a data frame with columns "Marker" and "Model", where each row corresponds to a QTL. fixed
is a data frame with columns "Effect" and "Type": the first column is the name of the effect, which must match a column in the phenotype input file, and the second column is either "factor" or "numeric". The p-value and R2 for each marker are based on the likelihood ratio test under backward elimination, comparing the deviance to the chi-squared distribution.
data frame with partial r2 and p-values
Output a table with significant markers
get.QTL(data, traits = NULL, models = NULL, bp.window = NULL)
get.QTL(data, traits = NULL, models = NULL, bp.window = NULL)
data |
Output from |
traits |
Vector of trait names (by default, all traits) |
models |
Vector of model names (by default, all models) |
bp.window |
prune output to return only the most significant marker within this window size |
To return all significant markers (original behavior of the function), use bp.window=NULL
. Assumes input map position in bp.
Data frame with results. Score = -log10(p). Effect = marker effect (not available for the general and diplo-general models because there are multiple effects).
Compute marker significance scores
GWASpoly(data, models, traits = NULL, params = NULL, n.core = 1, quiet = F)
GWASpoly(data, models, traits = NULL, params = NULL, n.core = 1, quiet = F)
data |
Output from |
models |
Vector of model names |
traits |
Vector trait names (by default, all traits) |
params |
Optional list of params created by |
n.core |
Number of cores for parallel computing |
quiet |
TRUE/FALSE whether to suppress output charting progress |
The following marker-effect models are available:
"additive": Indicates the marker effect is proportional to the dosage of the alternate allele
"X-dom": where X can be any integer between 1 and ploidy/2 and refers to the allele dosage needed for complete dominance (e.g., "1-dom" = simplex dominance, "2-dom" = duplex dominance). The software tries both dominance patterns for a given dosage model, e.g., whether the reference or alternate allele is dominant
"diplo-general": All heterozygotes have the same effect
"diplo-additive": All heterozygotes have the same effect, constrained to be halfway between the homozygous effects
"general": There are no constraints on the effects of the different dosage levels
To specify additional model parameters, such as the inclusion of fixed effects (Q matrix) and the minimum minor allele frequency, use set.params
Variable of class GWASpoly.fitted
S4 class with genotype and phenotype data
map
data frame with columns Marker,Chrom,Position,Ref,Alt
pheno
data frame of phenotypes
geno
matrix (individuals x markers) of allele dosages (0,1,2,...ploidy)
fixed
data frame of fixed effects
ploidy
ploidy
S4 class with results from genome-wide scan
map
data frame with columns Marker,Chrom,Position,Ref,Alt
pheno
data frame of phenotypes
geno
matrix with allele dosages
fixed
data frame of fixed effects
ploidy
ploidy
K
list of covariance matrices
scores
-log10(p) results
effects
estimated marker effects
params
parameters used for the analysis
S4 class with genotypes, phenotypes, and polygenic covariance
map
data frame with columns Marker,Chrom,Position,Ref,Alt
pheno
data frame of phenotypes
geno
matrix with allele dosages
fixed
data frame of fixed effects
ploidy
ploidy
K
list of covariance matrices (one for each chromosome)
S4 class with results from genome-wide scan and detection threshold
map
data frame with columns Marker,Chrom,Position,Ref,Alt
pheno
data frame of phenotypes
geno
matrix with allele dosages
fixed
data frame of fixed effects
ploidy
ploidy
K
list of covariance matrices
scores
-log10(p) results
effects
estimated marker effects
params
parameters used for the analysis
threshold
thresholds for significance
Plot LD vs distance
LD.plot(data, max.pair = 10000, dof = 8, max.loci = NULL, position = "bp")
LD.plot(data, max.pair = 10000, dof = 8, max.loci = NULL, position = "bp")
data |
variable inheriting from class |
max.pair |
maximum number of r2 pairs for the spline |
dof |
degrees of freedom for the spline |
max.loci |
maximum number of markers to use per chromosome |
position |
"bp" or "cM" |
A monotone decreasing, convex spline is fit using R package scam
.
ggplot2 object
Create Manhattan plot
manhattan.plot(data, traits = NULL, models = NULL, chrom = NULL)
manhattan.plot(data, traits = NULL, models = NULL, chrom = NULL)
data |
Variable of class |
traits |
Vector of trait names (by default, all traits plotted) |
models |
Vector of model names (by default, all models plotted) |
chrom |
optional, to plot only one chromosome |
Results for the ref and alt versions of the dominance model are combined. If data
is the output from set.threshold
, then the threshold is displayed as a horizontal dashed line when models
contains a single model. Because the threshold varies between models, it is not drawn when multiple models are included. Although the ref and alt versions of each dominance model are slightly different (as seen with qq.plot
), they are treated as a single model for the Manhattan plot, and the average threshold is shown.
ggplot2 object
Inspect p-value inflation using a QQ plot
qq.plot(data, trait, models = NULL)
qq.plot(data, trait, models = NULL)
data |
Variable of class |
trait |
Trait name |
models |
Vector of model names (by default, all models plotted) |
ggplot2 object
Read in marker and phenotype data
read.GWASpoly(ploidy, pheno.file, geno.file, format, n.traits, delim = ",")
read.GWASpoly(ploidy, pheno.file, geno.file, format, n.traits, delim = ",")
ploidy |
Ploidy (e.g., 2 for diploid, 4 for tetraploid) |
pheno.file |
Name of the phenotype file |
geno.file |
Name of the genotype file |
format |
Format for the marker data. See details. |
n.traits |
Number of traits |
delim |
Character to indicate the delimiter in the data files (e.g., "," for csv, "\t" for tab-delimited) |
The first column of the phenotype file contains the genotype identifier, columns 2 through (n.traits + 1) contain trait values, and subsequent columns contain the levels (for factors) or numeric values (for covariates) of any fixed effects. The first three columns of the genotype file are (1) marker name, (2) chromosome, and (3) position. Optionally, columns 4 and 5 can be REF and ALT, respectively. Subsequent columns contain the marker data for each individual in the population. Marker data can be coded in one of three formats:
"numeric": markers are coded based on the dosage of the alternate allele, taking on values between 0 and ploidy
"AB": e.g., AAAB, ABBB for tetraploids
"ACGT": e.g., AAAT, GGCC for tetraploids
Only bi-allelic markers are allowed. As of version 2.02 of the package, fractional values of dosage are allowed for the "numeric" format, with missing values imputed by the population mean for each marker. The fractional values are only used for the additive genetic model; for the other models, dosages are rounded to the nearest whole number. If the input allele dosages are whole numbers, then missing values are imputed with the population mode (most frequent value) for each marker.
Variable of class GWASpoly.data
Set covariance matrix for polygenic effect
set.K(data, K = NULL, n.core = 1, LOCO = NULL)
set.K(data, K = NULL, n.core = 1, LOCO = NULL)
data |
Output from |
K |
Optional: user-supplied matrix |
n.core |
Number of cores for parallel computing |
LOCO |
TRUE/FALSE, whether to use leave-one-chromosome-out |
When LOCO
= TRUE, K is computed for each chromosome as $K=MM'$, where M is the centered genotype matrix (lines x markers), and scaled to have unit diagonal (the overall scaling is not important for GWAS). When LOCO
= FALSE, a single K matrix is computed for all markers (this was the original behavior of the function). Alternatively, the user can supply their own positive semidefinite K, with row.names that match the genotype identifiers (this option cannot be used with LOCO).
Variable of class GWASpoly.K
Set parameters
set.params( fixed = NULL, fixed.type = NULL, n.PC = 0, MAF = NULL, geno.freq = NULL, P3D = TRUE )
set.params( fixed = NULL, fixed.type = NULL, n.PC = 0, MAF = NULL, geno.freq = NULL, P3D = TRUE )
fixed |
Vector of names of fixed effects |
fixed.type |
Vector of effect types ("numeric" or "factor"), corresponding to the effects listed in "fixed" |
n.PC |
Number of principal components to include as covariates |
MAF |
Minimum minor allele frequency |
geno.freq |
Maximum genotype frequency (after applying dominance relations) |
P3D |
TRUE/FALSE whether to use the P3D approximation (variance components not re-estimated for every marker) |
The list returned by the function should be passed to GWASpoly
function.
A list with the following components
fixed |
Names of fixed effects |
fixed.type |
Types of fixed effects |
n.PC |
Number of principal components to include as covariates |
min.MAF |
Minimum minor allele frequency |
max.geno.freq |
Maximum genotype frequency (after applying dominance relations) |
P3D |
TRUE/FALSE whether to use the P3D approximation |
Set the significance threshold
set.threshold( data, method = "M.eff", level = 0.05, n.permute = 1000, n.core = 1 )
set.threshold( data, method = "M.eff", level = 0.05, n.permute = 1000, n.core = 1 )
data |
Variable of class |
method |
One of the following: "M.eff","Bonferroni","FDR","permute" |
level |
Genome-wide false positive or false discovery rate (depending on |
n.permute |
Number of permutations for method "permute" |
n.core |
Number of cores to use for multicore processing |
The default method, "M.eff", is a Bonferroni-type correction but using an effective number of markers that accounts for LD between markers (Moskvina and Schmidt, 2008). The FDR method is based on version 1.30.0 of the qvalue package.
Variable of class GWASpoly.thresh
Moskvina V, Schmidt KM (2008) On multiple-testing correction in genome-wide association studies. Genetic Epidemiology 32:567-573. doi:10.1002/gepi.20331
Convert VCF to dosage file
VCF2dosage( VCF.file, dosage.file, geno.code, ploidy, samples = NULL, min.DP = 1, max.missing, min.minor = 5 )
VCF2dosage( VCF.file, dosage.file, geno.code, ploidy, samples = NULL, min.DP = 1, max.missing, min.minor = 5 )
VCF.file |
VCF filename (can be gzipped) |
dosage.file |
CSV filename to output with allele dosage |
geno.code |
genotype code in the FORMAT field: "GT" of "DS" |
ploidy |
ploidy |
samples |
optional vector of sample names, to export subset of the population |
min.DP |
minimum per sample depth (DP) to export genotype. Default is 1, for no filtering. |
max.missing |
threshold for missing data per marker, as a proportion. |
min.minor |
minimum number of samples with the minor allele. Default is 5. |
Only bi-allelic variants supported. The "GT" option for geno.code
is the posterior maximum genotype (e.g., 0/0/1/1). "DS" represents the posterior mean dosage of the alternate allele. VCF file must conform to 4.1 or later.
Write results to file
write.GWASpoly(data, trait, filename, what = "scores", delim = ",")
write.GWASpoly(data, trait, filename, what = "scores", delim = ",")
data |
Variable of class |
trait |
Trait name |
filename |
Filename |
what |
Either "scores" or "effects" |
delim |
Delimiter to use in the output file (default is comma) |
Score = -log10(p). Effect = marker effect (not available for the general and diplo-general models).