Title: | Genetic Linkage Maps in Autopolyploids |
---|---|
Description: | Construction of genetic maps in autopolyploid full-sib populations. Uses pairwise recombination fraction estimation as the first source of information to sequentially position allelic variants in specific homologous chromosomes. For situations where pairwise analysis has limited power, the algorithm relies on the multilocus likelihood obtained through a hidden Markov model (HMM). For more detail, please see Mollinari and Garcia (2019) <doi:10.1534/g3.119.400378> and Mollinari et al. (2020) <doi:10.1534/g3.119.400620>. |
Authors: | Marcelo Mollinari [aut, cre] , Gabriel Gesteira [aut] , Cristiane Taniguti [aut] , Jeekin Lau [aut] , Oscar Riera-Lizarazu [ctb] , Guilherme Pereira [ctb] , Augusto Garcia [ctb] , Zhao-Bang Zeng [ctb] , Katharine Preedy [ctb, cph] (MDS ordering algorithm), Robert Gentleman [cph] (C code for MLE optimization in src/pairwise_estimation.cpp), Ross Ihaka [cph] (C code for MLE optimization in src/pairwise_estimation.cpp), R Foundation [cph] (C code for MLE optimization in src/pairwise_estimation.cpp), R-core [cph] (C code for MLE optimization in src/pairwise_estimation.cpp) |
Maintainer: | Marcelo Mollinari <[email protected]> |
License: | GPL-3 |
Version: | 0.4.1 |
Built: | 2024-12-02 05:31:06 UTC |
Source: | https://github.com/mmollina/MAPpoly |
Creates a new map by adding a marker in a given position in a pre-built map.
add_marker( input.map, mrk, pos, rf.matrix, genoprob = NULL, phase.config = "best", tol = 0.001, extend.tail = NULL, r.test = NULL, verbose = TRUE )
input.map |
an object of class |
mrk |
the name of the marker to be inserted |
pos |
the name of the marker after which the new marker should be added.
Alternatively, give the numeric position (between markers) where the
new marker should be added. To insert a marker at the beginning of a
map, use |
rf.matrix |
an object of class |
genoprob |
an object of class |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood configuration |
tol |
the desired accuracy (default = 10e-04, i.e., 0.001) |
extend.tail |
the length of the chain's tail that should
be used to calculate the likelihood of the map. If |
r.test |
for internal use only |
verbose |
if |
add_marker splits the input map into two sub-maps to the left and the right of the given position. Using the genotype probabilities, it computes the log-likelihood of all possible linkage phases under a two-point threshold inherited from function rf_list_to_matrix.
A list of class mappoly.map with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs to,
as given in the input file. If not available,
|
genome.pos |
physical position (usually in megabases) of the markers within the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Marcelo Mollinari, [email protected]
sub.map <- get_submap(maps.hexafake[[1]], 1:20, reestimate.rf = FALSE)
plot(sub.map, mrk.names = TRUE)
s <- make_seq_mappoly(hexafake, sub.map$info$mrk.names)
tpt <- est_pairwise_rf(s)
rf.matrix <- rf_list_to_matrix(input.twopt = tpt,
                               thresh.LOD.ph = 3,
                               thresh.LOD.rf = 3,
                               shared.alleles = TRUE)

###### Removing marker "M_1" (first) #######
mrk.to.remove <- "M_1"
input.map <- drop_marker(sub.map, mrk.to.remove)
plot(input.map, mrk.names = TRUE)
## Computing conditional probabilities using the resulting map
genoprob <- calc_genoprob(input.map)
res.add.M_1 <- add_marker(input.map = input.map, mrk = "M_1", pos = 0,
                          rf.matrix = rf.matrix, genoprob = genoprob,
                          tol = 10e-4)
plot(res.add.M_1, mrk.names = TRUE)
best.phase <- res.add.M_1$maps[[1]]$seq.ph
names.id <- names(best.phase$P)
plot_compare_haplotypes(ploidy = 6,
                        hom.allele.p1 = best.phase$P[names.id],
                        hom.allele.q1 = best.phase$Q[names.id],
                        hom.allele.p2 = sub.map$maps[[1]]$seq.ph$P[names.id],
                        hom.allele.q2 = sub.map$maps[[1]]$seq.ph$Q[names.id])

###### Removing marker "M_10" (middle or last) #######
mrk.to.remove <- "M_10"
input.map <- drop_marker(sub.map, mrk.to.remove)
plot(input.map, mrk.names = TRUE)
## Computing conditional probabilities using the resulting map
genoprob <- calc_genoprob(input.map)
res.add.M_10 <- add_marker(input.map = input.map, mrk = "M_10", pos = "M_9",
                           rf.matrix = rf.matrix, genoprob = genoprob,
                           tol = 10e-4)
plot(res.add.M_10, mrk.names = TRUE)
best.phase <- res.add.M_10$maps[[1]]$seq.ph
names.id <- names(best.phase$P)
plot_compare_haplotypes(ploidy = 6,
                        hom.allele.p1 = best.phase$P[names.id],
                        hom.allele.q1 = best.phase$Q[names.id],
                        hom.allele.p2 = sub.map$maps[[1]]$seq.ph$P[names.id],
                        hom.allele.q2 = sub.map$maps[[1]]$seq.ph$Q[names.id])
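The list returned by add_marker can be navigated with ordinary list indexing. The sketch below uses a hand-built mock list that mirrors only some of the element names documented above (info, maps, loglike); every value is invented for illustration, and the object is not produced by MAPpoly. Under those assumptions, it shows one way to compare phase configurations by their multipoint log-likelihood.

```r
# Mock object mirroring part of the documented layout of a 'mappoly.map'
# list. All values are invented; a real object comes from MAPpoly functions.
mock.map <- structure(
  list(
    info = list(ploidy = 4, n.mrk = 3,
                mrk.names = c("M_1", "M_2", "M_3")),
    maps = list(
      list(seq.num = 1:3, loglike = -120.4),
      list(seq.num = 1:3, loglike = -118.9)
    )
  ),
  class = "mock.mappoly.map"  # hypothetical class name, for the mock only
)

# Collect the multipoint log-likelihood of every phase configuration
ll <- vapply(mock.map$maps, function(m) m$loglike, numeric(1))

# Index of the configuration with the highest log-likelihood
best <- which.max(ll)
best.map <- mock.map$maps[[best]]
```

The same indexing pattern (x$maps[[i]]$loglike, x$info$mrk.names) is what the examples above rely on when they extract seq.ph from the first map in the list.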
Returns the frequency of each genotype for two-point reduction of dimensionality. The frequency is calculated for all pairwise combinations and for all possible linkage phase configurations.
cache_counts_twopt( input.seq, cached = FALSE, cache.prev = NULL, ncpus = 1L, verbose = TRUE, joint.prob = FALSE )
input.seq |
an object of class |
cached |
If |
cache.prev |
an object of class |
ncpus |
Number of parallel processes to spawn (default = 1) |
verbose |
If |
joint.prob |
If |
An object of class cache.info which contains one (conditional probabilities) or two (both conditional and joint probabilities) lists. Each list contains all pairs of dosages between parents for all markers in the sequence. The names in each list are of the form 'A-B-C-D', where: A represents the dosage in parent 1, marker k; B represents the dosage in parent 1, marker k+1; C represents the dosage in parent 2, marker k; and D represents the dosage in parent 2, marker k+1. For each list, the frequencies are computed for all possible linkage phase configurations. The frequencies for each linkage phase configuration are stored in matrices whose names represent the number of homologous chromosomes that share alleles. The rows of these matrices represent the dosages at markers k and k+1 for an individual in the offspring. See Table 3 of S3 Appendix in Mollinari and Garcia (2019) for an example.
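The 'A-B-C-D' naming scheme can be illustrated in base R. This is a minimal sketch with made-up dosages; the variable names are hypothetical and the code does not reproduce how MAPpoly builds or stores its cache internally.

```r
# Hypothetical parental dosages at two adjacent markers, k and k+1
dose.p1 <- c(k = 1L, k.plus.1 = 2L)  # parent 1
dose.p2 <- c(k = 0L, k.plus.1 = 1L)  # parent 2

# Build the 'A-B-C-D' label: P1 at k, P1 at k+1, P2 at k, P2 at k+1
key <- paste(dose.p1[["k"]], dose.p1[["k.plus.1"]],
             dose.p2[["k"]], dose.p2[["k.plus.1"]], sep = "-")

# Split a label back into its four dosages
parts <- as.integer(strsplit(key, "-", fixed = TRUE)[[1]])
names(parts) <- c("p1.k", "p1.k1", "p2.k", "p2.k1")
```

Parsing the label this way can be handy when inspecting which dosage combinations are present in a cache.info object.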
Marcelo Mollinari, [email protected] with updates by Gabriel Gesteira, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
all.mrk <- make_seq_mappoly(tetra.solcap, 1:20)
## local computation
counts <- cache_counts_twopt(all.mrk, ncpus = 1)
## load from internal file or web-stored counts
## (especially important for high ploidy levels)
counts.cached <- cache_counts_twopt(all.mrk, cached = TRUE)
Conditional genotype probabilities are calculated for each marker position and each individual given a map.
calc_genoprob(input.map, step = 0, phase.config = "best", verbose = TRUE)
input.map |
An object of class |
step |
Maximum distance (in cM) between positions at which the genotype probabilities are calculated. If step = 0, probabilities are calculated only at the marker locations. |
phase.config |
which phase configuration should be used. "best" (default) will choose the phase configuration associated with the maximum likelihood |
verbose |
if |
An object of class 'mappoly.genoprob' which has two elements: a three-dimensional array containing the probabilities of all possible genotypes for each individual at each marker position; and the marker sequence with its recombination frequencies
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## tetraploid example
probs.t <- calc_genoprob(input.map = solcap.dose.map[[1]], verbose = TRUE)
probs.t
## displaying individual 1, 36 genotypic states
## (rows) across linkage group 1 (columns)
image(t(probs.t$probs[,,1]))
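The "36 genotypic states" mentioned in the tetraploid example follow from counting the gametes each parent can transmit. The helper below is a sketch under the assumption that each parent passes on half of its homologs with no double reduction; n_geno_states is a hypothetical name and not part of MAPpoly.

```r
# Number of genotypic states in a full-sib autopolyploid cross, assuming
# each parent transmits ploidy/2 of its homologs (no double reduction).
# n.parents = 1 covers the case of a single informative parent.
n_geno_states <- function(ploidy, n.parents = 2) {
  stopifnot(ploidy %% 2 == 0, n.parents %in% c(1, 2))
  choose(ploidy, ploidy / 2)^n.parents
}

n_geno_states(4)                 # tetraploid, both parents: 36
n_geno_states(4, n.parents = 1)  # tetraploid, one informative parent: 6
n_geno_states(6)                 # hexaploid, both parents: 400
```

This count is the size of the first dimension one would expect in the probability array, which is why hexaploid computations are considerably heavier than tetraploid ones.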
Conditional genotype probabilities are calculated for each marker position and each individual given a map. In this function, the probabilities are not calculated between markers.
calc_genoprob_dist( input.map, dat.prob = NULL, phase.config = "best", verbose = TRUE )
input.map |
An object of class |
dat.prob |
an object of class |
phase.config |
which phase configuration should be used. "best" (default) will choose the phase configuration with the maximum likelihood |
verbose |
if |
An object of class 'mappoly.genoprob' which has two elements: a three-dimensional array containing the probabilities of all possible genotypes for each individual at each marker position; and the marker sequence with its recombination frequencies
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## tetraploid example
probs.t <- calc_genoprob_dist(input.map = solcap.prior.map[[1]],
                              dat.prob = tetra.solcap.geno.dist,
                              verbose = TRUE)
probs.t
## displaying individual 1, 36 genotypic states
## (rows) across linkage group 1 (columns)
image(t(probs.t$probs[,,1]))
Conditional genotype probabilities are calculated for each marker position and each individual given a map.
calc_genoprob_error( input.map, step = 0, phase.config = "best", error = 0.01, th.prob = 0.95, restricted = TRUE, verbose = TRUE )
input.map |
An object of class |
step |
Maximum distance (in cM) between positions at which the genotype probabilities are calculated. If step = 0, probabilities are calculated only at the marker locations. |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood configuration |
error |
the assumed global error rate (default = 0.01) |
th.prob |
the threshold for using global error or genotype probability distribution contained in the dataset (default = 0.95) |
restricted |
if |
verbose |
if |
An object of class 'mappoly.genoprob' which has two elements: a three-dimensional array containing the probabilities of all possible genotypes for each individual at each marker position; and the marker sequence with its recombination frequencies
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
probs.error <- calc_genoprob_error(input.map = solcap.err.map[[1]], error = 0.05, verbose = TRUE)
Conditional genotype probabilities are calculated for each marker position and each individual given a map
calc_genoprob_single_parent( input.map, step = 0, info.parent = 1, uninfo.parent = 2, global.err = 0, phase.config = "best", verbose = TRUE )
input.map |
An object of class |
step |
Maximum distance (in cM) between positions at which the genotype probabilities are calculated. If step = 0, probabilities are calculated only at the marker locations. |
info.parent |
index for informative parent |
uninfo.parent |
index for uninformative parent |
global.err |
the assumed global error rate (default = 0.0) |
phase.config |
which phase configuration should be used. "best" (default) will choose the phase configuration associated with the maximum likelihood |
verbose |
if |
An object of class 'mappoly.genoprob' which has two elements: a three-dimensional array containing the probabilities of all possible genotypes for each individual at each marker position; and the marker sequence with its recombination frequencies
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## tetraploid example
s <- make_seq_mappoly(tetra.solcap, 'seq12', info.parent = "p1")
tpt <- est_pairwise_rf(s)
map <- est_rf_hmm_sequential(input.seq = s,
                             twopt = tpt,
                             start.set = 10,
                             thres.twopt = 10,
                             thres.hmm = 10,
                             extend.tail = 4,
                             info.tail = TRUE,
                             sub.map.size.diff.limit = 8,
                             phase.number.limit = 4,
                             reestimate.single.ph.configuration = TRUE,
                             tol = 10e-2,
                             tol.final = 10e-3)
plot(map)
probs <- calc_genoprob_single_parent(input.map = map,
                                     info.parent = 1,
                                     uninfo.parent = 2,
                                     step = 1)
probs
## displaying individual 1, 6 genotypic states
## (rows) across linkage group 1 (columns)
image(t(probs$probs[,,2]))
Compute homolog probabilities for all individuals in the full-sib population given a map and conditional genotype probabilities.
calc_homologprob(input.genoprobs, verbose = TRUE)
input.genoprobs |
an object of class |
verbose |
if |
Marcelo Mollinari, [email protected]
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
## tetraploid example
w1 <- calc_genoprob(solcap.dose.map[[1]])
h.prob <- calc_homologprob(w1)
print(h.prob)
plot(h.prob, ind = 5, use.plotly = FALSE)
## using error modeling (removing noise)
w2 <- calc_genoprob_error(solcap.err.map[[1]])
h.prob2 <- calc_homologprob(w2)
print(h.prob2)
plot(h.prob2, ind = 5, use.plotly = FALSE)
Given the genotype conditional probabilities for a map, this function computes the probability profiles for all possible homolog pairing configurations in both parents.
calc_prefpair_profiles(input.genoprobs, verbose = TRUE)
input.genoprobs |
an object of class |
verbose |
if |
Marcelo Mollinari, [email protected] and Guilherme Pereira, [email protected]
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
## tetraploid example
w1 <- lapply(solcap.dose.map[1:12], calc_genoprob)
x1 <- calc_prefpair_profiles(w1)
print(x1)
plot(x1)
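To get a feel for how many homolog pairing configurations exist per parent, one can count the ways of splitting the homologs into unordered pairs. The sketch below assumes exclusively bivalent pairing; n_pairing_configs is a hypothetical helper written for this illustration and is not a MAPpoly function.

```r
# Number of ways to arrange 'ploidy' homologs into ploidy/2 unordered
# pairs (bivalents), i.e. the number of perfect matchings:
#   ploidy! / ((ploidy/2)! * 2^(ploidy/2))
n_pairing_configs <- function(ploidy) {
  stopifnot(ploidy %% 2 == 0)
  h <- ploidy / 2
  factorial(ploidy) / (factorial(h) * 2^h)
}

n_pairing_configs(4)  # tetraploid: 3 configurations per parent
n_pairing_configs(6)  # hexaploid: 15 configurations per parent
```

Under this assumption, a tetraploid profile has three pairing configurations per parent to compare, while a hexaploid has fifteen.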
Checks the consistency of a dataset
check_data_sanity(x)
x |
an object of class |
If consistent, returns 0. If not consistent, returns a vector with a number of tests, where TRUE indicates a failed test.
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
check_data_sanity(tetra.solcap)
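Because check_data_sanity returns either 0 or a vector of test flags, downstream code may want to branch on the result. The following sketch works on mocked return values only; summarise_sanity is a hypothetical helper, and the number and meaning of the individual tests are not specified here.

```r
# Hypothetical helper to summarise a result of the documented form:
# 0 if consistent, otherwise a logical vector in which TRUE marks a
# failed test.
summarise_sanity <- function(res) {
  if (is.numeric(res) && length(res) == 1 && res == 0) {
    return("consistent")
  }
  failed <- which(as.logical(res))
  paste("failed tests:", paste(failed, collapse = ", "))
}

summarise_sanity(0)                            # mocked consistent result
summarise_sanity(c(FALSE, TRUE, FALSE, TRUE))  # mocked result, two failures
```

With a real dataset, the argument would be the value returned by check_data_sanity rather than a mocked vector.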
Compare lengths, density, maximum gaps and log likelihoods in a list of maps. In order to make the maps comparable, the function uses the intersection of markers among maps.
compare_maps(...)
... |
a list of objects of class |
A data frame whose rows correspond to the maps, in the order provided in the input list
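The idea of restricting a comparison to the markers shared by all maps can be sketched in base R. The marker names below are invented, and the code only illustrates the set intersection; it is not the implementation used by compare_maps.

```r
# Marker names of three hypothetical maps
mrk.a <- c("M_1", "M_2", "M_3", "M_4")
mrk.b <- c("M_2", "M_3", "M_4", "M_5")
mrk.c <- c("M_1", "M_3", "M_4")

# Markers present in every map: fold intersect() over the list
common <- Reduce(intersect, list(mrk.a, mrk.b, mrk.c))
```

Only M_3 and M_4 appear in all three vectors, so only those two markers would enter a like-for-like comparison in this toy case.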
Simulate an autopolyploid full-sib population with one or two informative parents under random chromosome segregation.
cross_simulate( parental.phases, map.length, n.ind, draw = FALSE, file = "output.pdf", prefix = NULL, seed = NULL, width = 12, height = 6, prob.P = NULL, prob.Q = NULL )
parental.phases |
a list containing the linkage phase information for both parents |
map.length |
the map length |
n.ind |
number of individuals in the offspring |
draw |
if |
file |
name of the output file. It is ignored if
|
prefix |
prefix used in all marker names. |
seed |
random number generator seed (default = NULL) |
width |
the width of the graphics region in inches (default = 12) |
height |
the height of the graphics region in inches (default = 6) |
prob.P |
a vector indicating the proportion of preferential pairing in parent P (currently ignored) |
prob.Q |
a vector indicating the proportion of preferential pairing in parent Q (currently ignored) |
parental.phases.p and parental.phases.q are lists of vectors containing linkage phase configurations. Each vector contains the numbers of the homologous chromosomes in which the alleles are located. For instance, a vector containing 1, 3 and 4 means that the marker has three doses located in chromosomes 1, 3 and 4. For zero doses, use 0.
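The phase encoding described above maps directly to dosages. The sketch below uses an invented list of phase vectors for one tetraploid parent; the object names are hypothetical, and the code only illustrates the encoding rather than anything cross_simulate does internally.

```r
# Hypothetical linkage phases for one parent of a tetraploid at four
# markers: each vector lists the homologs that carry the allele
phases.p <- list(c(1, 3, 4), 0, 2, c(1, 2))

# Dosage per marker: number of homologs listed, with 0 meaning none
dosage <- vapply(phases.p, function(h) sum(h != 0), numeric(1))

# Same information as a marker x homolog 0/1 matrix
ploidy <- 4
hap <- t(vapply(phases.p,
                function(h) as.numeric(seq_len(ploidy) %in% h),
                numeric(ploidy)))
```

Here the first marker has dosage 3 (homologs 1, 3 and 4), the second has dosage 0, and the row sums of hap recover the dosage vector.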
For more sophisticated simulations, we strongly recommend using PedigreeSim V2.0
https://github.com/PBR/pedigreeSim
an object of class mappoly.data. See read_geno for more information
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
h.temp <- sim_homologous(ploidy = 6, n.mrk = 20)
fake.poly.dat <- cross_simulate(h.temp, map.length = 100, n.ind = 200)
plot(fake.poly.dat)
Detects which parent is informative
detect_info_par(x)
x |
an object of class |
This function creates a new map by removing markers from an existing one.
drop_marker(input.map, mrk, verbose = TRUE)
input.map |
an object of class |
mrk |
a vector containing markers to be removed from the input map, identified by their names or positions |
verbose |
if |
an object of class mappoly.map
Marcelo Mollinari, [email protected]
sub.map <- get_submap(maps.hexafake[[1]], 1:50, reestimate.rf = FALSE)
plot(sub.map, mrk.names = TRUE)
mrk.to.remove <- c("M_1", "M_23", "M_34")
red.map <- drop_marker(sub.map, mrk.to.remove)
plot(red.map, mrk.names = TRUE)
Edits a sequence ordered by reference genome positions by comparing it with another order.
edit_order(input.seq, invert = NULL, remove = NULL)
input.seq |
object of class mappoly.sequence with alternative order (not genomic order) |
invert |
vector of marker names to be inverted |
remove |
vector of marker names to be removed |
object of class mappoly.edit.order: a list containing
vector of marker names ordered according to editions ('edited_order');
vector of removed markers names ('removed');
vector of inverted markers names ('inverted').
Cristiane Taniguti, [email protected]
dat <- filter_segregation(tetra.solcap, inter = FALSE)
seq_dat <- make_seq_mappoly(dat)
seq_chr <- make_seq_mappoly(seq_dat,
                            arg = seq_dat$seq.mrk.names[which(seq_dat$chrom=="1")])
tpt <- est_pairwise_rf(seq_chr)
seq.filt <- rf_snp_filter(tpt, probs = c(0.05, 0.95))
mat <- rf_list_to_matrix(tpt)
mat2 <- make_mat_mappoly(mat, seq.filt)
seq_test_mds <- mds_mappoly(mat2)
seq_mds <- make_seq_mappoly(seq_test_mds)
edit_seq <- edit_order(input.seq = seq_mds)
Eliminate markers with identical dosage information for all individuals.
elim_redundant(input.seq, data = NULL)
input.seq |
an object of class |
data |
name of the dataset that contains sequence markers (optional, default = NULL) |
An object of class mappoly.unique.seq which is a list containing the following components:
unique.seq |
an object of class |
kept |
a vector containing the name of the informative markers |
eliminated |
a vector containing the name of the non-informative (eliminated) markers |
Marcelo Mollinari, [email protected], with minor modifications by Gabriel Gesteira, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
all.mrk <- make_seq_mappoly(hexafake, 'all')
red.mrk <- elim_redundant(all.mrk)
plot(red.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
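The notion of markers with identical dosage information across all individuals can be illustrated with a small base R sketch. The dosage matrix below is invented, and the use of duplicated() is an assumption chosen for illustration; it is not claimed to be how elim_redundant is implemented, and it ignores parental dosages.

```r
# Hypothetical dosage matrix: markers in rows, individuals in columns
dose <- rbind(M_1 = c(0, 1, 2, 1),
              M_2 = c(0, 1, 2, 1),   # same as M_1
              M_3 = c(1, 1, 0, 2),
              M_4 = c(0, 1, 2, 1))   # same as M_1

# duplicated() on a matrix flags rows that repeat an earlier row
dup <- duplicated(dose)

kept <- rownames(dose)[!dup]        # first occurrence of each pattern
eliminated <- rownames(dose)[dup]   # later copies of the same pattern
```

In this toy matrix M_1 and M_3 are kept, while M_2 and M_4 are flagged as redundant copies of M_1, mirroring the kept and eliminated components described above.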
This function considers a global error when re-estimating a genetic map using Hidden Markov models. Since this function uses the whole transition space in the HMM, its computation can take a while, especially for hexaploid maps.
est_full_hmm_with_global_error( input.map, error = NULL, tol = 0.001, restricted = TRUE, th.prob = 0.95, verbose = FALSE )
input.map |
an object of class |
error |
the assumed global error rate (default = NULL) |
tol |
the desired accuracy (default = 10e-04, i.e., 0.001) |
restricted |
if |
th.prob |
the threshold for using global error or genotype probability distribution if present in the dataset (default = 0.95) |
verbose |
if |
A list of class mappoly.map with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs to,
as given in the input file. If not available,
|
genome.pos |
physical position (usually in megabases) of the markers within the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
submap <- get_submap(solcap.dose.map[[1]], mrk.pos = 1:20, verbose = FALSE)
err.submap <- est_full_hmm_with_global_error(submap,
                                             error = 0.01,
                                             tol = 10e-4,
                                             verbose = TRUE)
err.submap
plot_map_list(list(dose = submap, err = err.submap),
              title = "estimation procedure")
This function considers dosage prior distribution when re-estimating a genetic map using Hidden Markov models
est_full_hmm_with_prior_prob( input.map, dat.prob = NULL, phase.config = "best", tol = 0.001, verbose = FALSE )
input.map |
an object of class |
dat.prob |
an object of class |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood configuration |
tol |
the desired accuracy (default = 10e-04, i.e., 0.001) |
verbose |
if |
A list of class mappoly.map with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs to,
as given in the input file. If not available,
|
genome.pos |
physical position (usually in megabases) of the markers within the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
submap <- get_submap(solcap.dose.map[[1]], mrk.pos = 1:20, verbose = FALSE)
prob.submap <- est_full_hmm_with_prior_prob(submap,
                                            dat.prob = tetra.solcap.geno.dist,
                                            tol = 10e-4,
                                            verbose = TRUE)
prob.submap
plot_map_list(list(dose = submap, prob = prob.submap),
              title = "estimation procedure")
Performs the two-point pairwise analysis between all markers in a sequence. For each pair, the function estimates the recombination fraction for all possible linkage phase configurations and associated LOD Scores.
est_pairwise_rf( input.seq, count.cache = NULL, count.matrix = NULL, ncpus = 1L, mrk.pairs = NULL, n.batches = 1L, est.type = c("disc", "prob"), verbose = TRUE, memory.warning = TRUE, parallelization.type = c("PSOCK", "FORK"), tol = .Machine$double.eps^0.25, ll = FALSE )
input.seq |
an object of class |
count.cache |
an object of class |
count.matrix |
similar to |
ncpus |
Number of parallel processes (cores) to spawn (default = 1) |
mrk.pairs |
a matrix of dimensions 2*N, containing N
pairs of markers to be analyzed. If |
n.batches |
deprecated. Not available on MAPpoly 0.3.0 or higher |
est.type |
Indicates whether to use the discrete ("disc") or the probabilistic ("prob") dosage scoring when estimating the two-point recombination fractions. |
verbose |
If |
memory.warning |
if |
parallelization.type |
one of the supported cluster types. This should be either PSOCK (default) or FORK. |
tol |
the desired accuracy. See |
ll |
will return log-likelihood instead of LOD scores. (for internal use) |
An object of class mappoly.twopt, which is a list containing the following components:
data.name |
Name of the object of class mappoly.data containing the raw data. |
n.mrk |
Number of markers in the sequence. |
seq.num |
A vector containing the (ordered) indices of markers in the sequence, according to the input file. |
pairwise |
A list of size choose(length(input.seq$seq.num), 2), where each element is a matrix. The rows are named in the format x-y, where x and y indicate how many homologues share the same allelic variant in parents P and Q, respectively (see Mollinari and Garcia, 2019, for notation). The first column indicates the LOD Score for the most likely linkage phase configuration. The second column shows the estimated recombination fraction for each configuration, and the third column indicates the LOD Score for comparing the likelihood under no linkage (r = 0.5) with the estimated recombination fraction (evidence of linkage). |
chisq.pval.thres |
Threshold used to perform the segregation tests. |
chisq.pval |
P-values associated with the performed segregation tests. |
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## Tetraploid example (first 50 markers)
all.mrk <- make_seq_mappoly(tetra.solcap, 1:50)
red.mrk <- elim_redundant(all.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
all.pairs <- est_pairwise_rf(input.seq = unique.mrks,
                             ncpus = 1,
                             verbose = TRUE)
all.pairs
plot(all.pairs, 20, 21)
mat <- rf_list_to_matrix(all.pairs)
plot(mat)
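The mrk.pairs argument is described above as a matrix of dimensions 2*N, and the pairwise component has choose(n, 2) elements when every pair is analyzed. The base-R sketch below shows one way such a pair matrix could be built. The marker indices are made up, and whether est_pairwise_rf expects exactly this layout is an assumption that should be checked against the package.

```r
# Hypothetical marker indices (not taken from any MAPpoly dataset)
mrk.ids <- c(3, 7, 12, 20, 21)

# All unordered pairs: one pair per column, so the matrix is 2 x N
all.pairs.mat <- combn(mrk.ids, 2)
dim(all.pairs.mat)

# N equals choose(n, 2), the size quoted for the 'pairwise' component
n.pairs <- choose(length(mrk.ids), 2)

# Restricting the analysis to the pairs that involve marker 20
has.20 <- apply(all.pairs.mat, 2, function(p) 20 %in% p)
sub.pairs.mat <- all.pairs.mat[, has.20, drop = FALSE]
```

The number of pairs grows quadratically with the number of markers, which is presumably why a restricted pair matrix can be useful for large sequences.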
Performs the two-point pairwise analysis between all markers in a sequence. For each pair, the function estimates the recombination fraction for all possible linkage phase configurations and associated LOD Scores.
est_pairwise_rf2( input.seq, ncpus = 1L, mrk.pairs = NULL, verbose = TRUE, tol = .Machine$double.eps^0.25 )
input.seq |
an object of class |
ncpus |
Number of parallel processes (cores) to spawn (default = 1) |
mrk.pairs |
a matrix of dimensions 2*N, containing N
pairs of markers to be analyzed. If |
verbose |
If |
tol |
the desired accuracy. See |
Unlike est_pairwise_rf, this function returns only the values associated with the best linkage phase configuration.
An object of class mappoly.twopt2
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## Tetraploid example
all.mrk <- make_seq_mappoly(tetra.solcap, 100:200)
all.pairs <- est_pairwise_rf2(input.seq = all.mrk, ncpus = 2)
m <- rf_list_to_matrix(all.pairs)
plot(m, fact = 2)
Performs the multipoint analysis proposed by Mollinari and Garcia (2019) in a sequence of markers
est_rf_hmm(
  input.seq,
  input.ph = NULL,
  thres = 0.5,
  twopt = NULL,
  verbose = FALSE,
  tol = 1e-04,
  est.given.0.rf = FALSE,
  reestimate.single.ph.configuration = TRUE,
  high.prec = TRUE
)
## S3 method for class 'mappoly.map'
print(x, detailed = FALSE, ...)
## S3 method for class 'mappoly.map'
plot(
  x,
  left.lim = 0,
  right.lim = Inf,
  phase = TRUE,
  mrk.names = FALSE,
  cex = 1,
  config = "best",
  P = "Parent 1",
  Q = "Parent 2",
  xlim = NULL,
  ...
)
input.seq |
an object of class |
input.ph |
an object of class |
thres |
LOD Score threshold used to determine if the linkage phases compared via two-point analysis should be considered. Smaller values will result in a smaller number of linkage phase configurations to be evaluated by the multipoint algorithm. |
twopt |
an object of class |
verbose |
if |
tol |
the desired accuracy (default = 1e-04) |
est.given.0.rf |
logical. If TRUE, returns a map with all recombination fractions forced to 0 (1e-5); for internal use only (default = FALSE) |
reestimate.single.ph.configuration |
logical. If |
high.prec |
logical. If |
x |
an object of the class |
detailed |
logical. if TRUE, prints the linkage phase configuration and the marker position for all maps. If FALSE (default), prints a map summary |
... |
currently ignored |
left.lim |
the left limit of the plot (in cM, default = 0). |
right.lim |
the right limit of the plot (in cM, default = Inf, i.e., will print the entire map) |
phase |
logical. If |
mrk.names |
if TRUE, marker names are displayed (default = FALSE) |
cex |
The magnification to be used for marker names |
config |
should be |
P |
a string containing the name of parent P |
Q |
a string containing the name of parent Q |
xlim |
range of the x-axis. If |
This function first enumerates a set of linkage phase configurations based on two-point recombination fraction information, using a threshold provided by the user (argument thres). Then, for each configuration, it reconstructs the genetic map using the HMM approach described in Mollinari and Garcia (2019). As a result, it returns the multipoint likelihood for each configuration in the form of a LOD Score comparing each configuration to the most likely one. It is recommended to use a small number of markers (e.g. 50 markers for hexaploids), since the number of possible linkage phase combinations bounded only by the two-point information can be huge. The result can also be quite sensitive to small changes in thres.
For a large number of markers, please see est_rf_hmm_sequential.
A list of class mappoly.map with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs to,
as given in the input file. If not available,
|
genome.pos |
physical position (usually in megabases) of the markers in the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) maps: a list of maps, one for each possible linkage phase configuration. Each map in the list is itself a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. https://doi.org/10.1534/g3.119.400378
mrk.subset <- make_seq_mappoly(hexafake, 1:10)
red.mrk <- elim_redundant(mrk.subset)
unique.mrks <- make_seq_mappoly(red.mrk)
subset.pairs <- est_pairwise_rf(input.seq = unique.mrks,
                                ncpus = 1,
                                verbose = TRUE)
## Estimating subset map with a low tolerance for the E.M. procedure
## for CRAN testing purposes
subset.map <- est_rf_hmm(input.seq = unique.mrks,
                         thres = 2,
                         twopt = subset.pairs,
                         verbose = TRUE,
                         tol = 0.1,
                         est.given.0.rf = FALSE)
subset.map
## linkage phase configuration with highest likelihood
plot(subset.map, mrk.names = TRUE, config = "best")
## the second one
plot(subset.map, mrk.names = TRUE, config = 2)
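The details above state that each configuration is reported as a LOD Score relative to the most likely one. The sketch below illustrates that comparison in base R with made-up log-likelihoods. It assumes the log-likelihoods are on the natural-log scale, which is an assumption of this example and not something the documentation confirms.

```r
# Made-up log-likelihoods for four candidate phase configurations
# (assumed to be natural-log values; this is an assumption)
loglike <- c(cfg1 = -1250.4, cfg2 = -1252.9, cfg3 = -1263.0, cfg4 = -1250.9)

# LOD of each configuration relative to the most likely one (0 = best)
lod <- (loglike - max(loglike)) / log(10)

# Sort from most to least likely
lod.sorted <- sort(lod, decreasing = TRUE)
best.cfg <- names(lod.sorted)[1]
round(lod.sorted, 2)
```

Configurations whose LOD is close to zero are nearly as well supported as the best one, which is why more than one map can be returned.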
Performs the multipoint analysis proposed by Mollinari and Garcia (2019) in a sequence of markers removing unlikely phases using sequential multipoint information.
est_rf_hmm_sequential( input.seq, twopt, start.set = 4, thres.twopt = 5, thres.hmm = 50, extend.tail = NULL, phase.number.limit = 20, sub.map.size.diff.limit = Inf, info.tail = TRUE, reestimate.single.ph.configuration = FALSE, tol = 0.1, tol.final = 0.001, verbose = TRUE, detailed.verbose = FALSE, high.prec = FALSE )
input.seq |
an object of class |
twopt |
an object of class |
start.set |
number of markers to start the phasing procedure (default = 4) |
thres.twopt |
the LOD threshold used to determine if the linkage
phases compared via two-point analysis should be considered
for the search space reduction (A.K.A. |
thres.hmm |
the LOD threshold used to determine if the linkage phases compared via hmm analysis should be evaluated in the next round of marker inclusion (default = 50) |
extend.tail |
the length of the chain's tail that should
be used to calculate the likelihood of the map. If |
phase.number.limit |
the maximum number of linkage phases of the sub-maps defined
by arguments |
sub.map.size.diff.limit |
the maximum accepted length
difference between the current and the previous sub-map defined
by arguments |
info.tail |
if |
reestimate.single.ph.configuration |
logical. If |
tol |
the desired accuracy during the sequential phase (default = 10e-02) |
tol.final |
the desired accuracy for the final map (default = 10e-04) |
verbose |
If |
detailed.verbose |
If |
high.prec |
logical. If |
This function sequentially includes markers into a map given an ordered sequence. It uses two-point information to eliminate unlikely linkage phase configurations given thres.twopt. The search is made within a window of size extend.tail. For the remaining configurations, the HMM-based likelihood is computed and the ones that pass the HMM threshold (thres.hmm) are eliminated.
A list of class mappoly.map with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs to,
as given in the input file. If not available,
|
genome.pos |
physical position (usually in megabases) of the markers in the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) maps: a list of maps, one for each possible linkage phase configuration. Each map in the list is itself a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the hmm-based multipoint likelihood |
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
mrk.subset <- make_seq_mappoly(hexafake, 1:20)
red.mrk <- elim_redundant(mrk.subset)
unique.mrks <- make_seq_mappoly(red.mrk)
subset.pairs <- est_pairwise_rf(input.seq = unique.mrks,
                                ncpus = 1,
                                verbose = TRUE)
subset.map <- est_rf_hmm_sequential(input.seq = unique.mrks,
                                    thres.twopt = 5,
                                    thres.hmm = 10,
                                    extend.tail = 10,
                                    tol = 0.1,
                                    tol.final = 10e-3,
                                    phase.number.limit = 5,
                                    twopt = subset.pairs,
                                    verbose = TRUE)
print(subset.map, detailed = TRUE)
plot(subset.map)
plot(subset.map, left.lim = 0, right.lim = 1, mrk.names = TRUE)
plot(subset.map, phase = FALSE)
## Retrieving simulated linkage phase
ph.P <- maps.hexafake[[1]]$maps[[1]]$seq.ph$P
ph.Q <- maps.hexafake[[1]]$maps[[1]]$seq.ph$Q
## Estimated linkage phase
ph.P.est <- subset.map$maps[[1]]$seq.ph$P
ph.Q.est <- subset.map$maps[[1]]$seq.ph$Q
compare_haplotypes(ploidy = 6, h1 = ph.P[names(ph.P.est)], h2 = ph.P.est)
compare_haplotypes(ploidy = 6, h1 = ph.Q[names(ph.Q.est)], h2 = ph.Q.est)
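The screening applied after each new marker is tested can be pictured with a toy function. The sketch below is only an illustration of the roles of thres.hmm and phase.number.limit as described in the arguments. The helper screen_configs is hypothetical and is not part of MAPpoly, and the rule "keep configurations whose LOD difference to the best one is below thres.hmm" is an assumed reading of the documentation.

```r
# Hypothetical helper (not a MAPpoly function): decide what happens after
# testing one new marker. 'lod.diff' holds non-negative LOD differences
# between each candidate configuration and the best one.
screen_configs <- function(lod.diff, thres.hmm = 50, phase.number.limit = 20) {
  keep <- which(lod.diff < thres.hmm)
  if (length(keep) > phase.number.limit) {
    # Too many competing phases: the marker is not inserted (assumed rule)
    return(list(insert = FALSE, keep = integer(0)))
  }
  list(insert = TRUE, keep = keep)
}

lod.diff <- c(0, 3.2, 12.8, 75.1)
res.default <- screen_configs(lod.diff)
res.strict <- screen_configs(lod.diff, thres.hmm = 10, phase.number.limit = 1)
```

With the default-like values, three configurations are carried to the next round. With the stricter values, two configurations survive the LOD filter, which exceeds the limit of one, so the marker is skipped.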
polymapR
See examples at https://rpubs.com/mmollin/tetra_mappoly_vignette.
export_data_to_polymapR(data.in)
data.in |
an object of class |
a dosage matrix
Marcelo Mollinari, [email protected]
Function to export genetic linkage map(s) generated by MAPpoly. The map(s) should be passed as a single object or a list of objects of class mappoly.map.
export_map_list(map.list, file = "map_output.csv")
map.list |
A list of objects or a single object of class |
file |
either a character string naming a file or a connection open for writing. "" indicates output to the console. |
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
export_map_list(solcap.err.map[[1]], file = "")
Computes homolog probabilities for all individuals in the full-sib population given a map and conditional genotype probabilities, and exports the results to be used for QTL mapping in the QTLpoly package.
export_qtlpoly(input.genoprobs, verbose = TRUE)
input.genoprobs |
an object of class |
verbose |
if |
Marcelo Mollinari, [email protected]
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
## tetraploid example
w1 <- calc_genoprob(solcap.dose.map[[1]])
h.prob <- export_qtlpoly(w1)
Extract the marker positions from an object of class 'mappoly.map'
extract_map(input.map, phase.config = "best")
input.map |
An object of class |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood configuration |
x <- maps.hexafake[[1]]$info$genome.pos/1e6
y <- extract_map(maps.hexafake[[1]])
plot(y~x, ylab = "Map position (cM)", xlab = "Genome Position (Mbp)")
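Marker positions in cM can be derived from the recombination fractions between adjacent markers (the seq.rf component of a map). The sketch below shows the idea in base R with made-up values. It assumes Haldane's mapping function; the documentation here does not say which mapping function extract_map uses, so treat that choice as an assumption.

```r
# Made-up recombination fractions between adjacent markers
seq.rf <- c(0.01, 0.05, 0.02, 0.10)

# Haldane's inverse mapping function: recombination fraction -> cM
# (the choice of Haldane is an assumption of this sketch)
rf_to_cM <- function(r) -50 * log(1 - 2 * r)

# First marker at 0 cM, then accumulate the interval lengths
pos.cM <- cumsum(c(0, rf_to_cM(seq.rf)))
names(pos.cM) <- paste0("M", seq_along(pos.cM))
round(pos.cM, 2)
```

A vector of n - 1 recombination fractions therefore yields n positions, the first of which is always zero.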
Filter aneuploid chromosomes from progeny individuals
filter_aneuploid(input.data, aneuploid.info, ploidy, rm_missing = TRUE)
input.data |
name of input object (class |
aneuploid.info |
data.frame with ploidy information by chromosome (columns) for each individual in progeny (rows). The chromosome and individual names must match the ones in the file used as input in mappoly. |
ploidy |
main ploidy |
rm_missing |
also remove genotype information from chromosomes with missing data (NA) in the aneuploid.info file |
object of class mappoly.data
Cristiane Taniguti, [email protected]
aneuploid.info <- matrix(4, nrow = tetra.solcap$n.ind, ncol = 12)
set.seed(8080)
aneuploid.info[sample(1:length(aneuploid.info),
                      round((4 * length(aneuploid.info)) / 100), 0)] <- 3
aneuploid.info[sample(1:length(aneuploid.info),
                      round((4 * length(aneuploid.info)) / 100), 0)] <- 5
colnames(aneuploid.info) <- paste0(1:12)
aneuploid.info <- cbind(inds = tetra.solcap$ind.names, aneuploid.info)
filt.dat <- filter_aneuploid(input.data = tetra.solcap,
                             aneuploid.info = aneuploid.info,
                             ploidy = 4)
This function removes individuals from the data set. Individuals can be user-defined or can be accessed via interactive kinship analysis.
filter_individuals( input.data, ind.to.remove = NULL, inter = TRUE, type = c("Gmat", "PCA"), verbose = TRUE )
input.data |
name of input object (class |
ind.to.remove |
individuals to be removed. If |
inter |
if |
type |
A character string specifying the procedure to be used for detecting outlier offspring. Options include "Gmat", which utilizes the genomic kinship matrix, and "PCA", which employs principal component analysis on the dosage matrix. |
verbose |
if |
Marcelo Mollinari, [email protected]
Excludes markers or individuals based on their proportion of missing data.
filter_missing( input.data, type = c("marker", "individual"), filter.thres = 0.2, inter = TRUE )
input.data |
an object of class |
type |
one of the following options:
Please note that removing individuals with a certain amount of data can change some marker parameters (such as depth) and can also change the estimated genotypes for other individuals, so be careful when removing individuals. |
filter.thres |
maximum proportion of missing data (default = 0.2). |
inter |
if |
Marcelo Mollinari, [email protected].
plot(tetra.solcap)
dat.filt.mrk <- filter_missing(input.data = tetra.solcap,
                               type = "marker",
                               filter.thres = 0.1,
                               inter = TRUE)
plot(dat.filt.mrk)
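The filtering criterion can be illustrated without the package. The sketch below computes the proportion of missing dosage calls per marker and per individual on a toy matrix and keeps the markers at or below a threshold. The matrix layout (markers in rows, individuals in columns) and the use of "at or below" are assumptions of this example, not confirmed details of filter_missing.

```r
# Toy dosage matrix: 4 markers (rows) x 5 individuals (columns), NA = missing
geno <- matrix(c( 0,  1, NA,  2,  1,
                 NA, NA,  1,  0, NA,
                  2,  2,  1,  1,  0,
                  1, NA,  0,  1,  2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("M", 1:4), paste0("Ind", 1:5)))

filter.thres <- 0.25

# Proportion of missing calls per marker and per individual
miss.mrk <- rowMeans(is.na(geno))
miss.ind <- colMeans(is.na(geno))

# Keep markers whose missing proportion does not exceed the threshold
geno.filt <- geno[miss.mrk <= filter.thres, , drop = FALSE]
```

The same comparison applied to miss.ind would select individuals instead of markers, mirroring the two values of the type argument.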
This function filters markers based on p-values of a chi-square test. The chi-square test assumes that markers follow the expected segregation patterns under Mendelian inheritance, random chromosome bivalent pairing and no double reduction.
filter_segregation(input.obj, chisq.pval.thres = NULL, inter = TRUE)
input.obj |
name of input object (class |
chisq.pval.thres |
p-value threshold used for chi-square tests (default = Bonferroni approximation with global alpha of 0.05, i.e., 0.05/n.mrk) |
inter |
if TRUE (default), plots distorted vs. non-distorted markers |
An object of class mappoly.chitest.seq, which contains a list with the following components:
keep |
markers that follow Mendelian segregation pattern |
exclude |
markers with distorted segregation |
chisq.pval.thres |
threshold p-value used for chi-square tests |
data.name |
input dataset used to perform the chi-square tests |
Marcelo Mollinari, [email protected]
mrks.chi.filt <- filter_segregation(input.obj = tetra.solcap,
                                    chisq.pval.thres = 0.05/tetra.solcap$n.mrk,
                                    inter = TRUE)
seq.init <- make_seq_mappoly(mrks.chi.filt)
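The test behind this filter can be sketched in base R. The example below uses two standard tetraploid expectations under random bivalent pairing and no double reduction (1:1 for parental doses 1 x 0, and 1:4:1 for doses 2 x 0), made-up offspring counts, and the Bonferroni threshold 0.05/n.mrk described above. It illustrates the idea only and is not the package's implementation.

```r
# Expected offspring dosage frequencies in a tetraploid cross
exp.simplex <- c("0" = 1/2, "1" = 1/2)             # parental doses 1 x 0
exp.duplex  <- c("0" = 1/6, "1" = 4/6, "2" = 1/6)  # parental doses 2 x 0

# Made-up dosage counts for three markers in 120 offspring
obs <- list(mrkA = c(62, 58),      # simplex, close to 1:1
            mrkB = c(95, 25),      # simplex, strongly distorted
            mrkC = c(18, 83, 19))  # duplex, close to 1:4:1
expd <- list(mrkA = exp.simplex, mrkB = exp.simplex, mrkC = exp.duplex)

# Chi-square goodness-of-fit p-value for each marker
pval <- mapply(function(o, e) chisq.test(o, p = e)$p.value, obs, expd)

# Bonferroni threshold with a global alpha of 0.05
thres <- 0.05 / length(pval)
keep <- names(pval)[pval >= thres]
exclude <- names(pval)[pval < thres]
```

Dividing alpha by the number of markers makes the filter conservative, so only strongly distorted markers such as mrkB end up in the exclude set.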
Function to allocate markers into linkage blocks. This is an EXPERIMENTAL FUNCTION and should be used with caution.
find_blocks( input.seq, clustering.type = c("rf", "genome"), rf.limit = 1e-04, genome.block.threshold = 10000, rf.mat = NULL, ncpus = 1, ph.thres = 3, phase.number.limit = 10, error = 0.05, verbose = TRUE, tol = 0.01, tol.err = 0.001 )
input.seq |
an object of class |
clustering.type |
if |
rf.limit |
the maximum value to consider linked markers in
case of |
genome.block.threshold |
the threshold to assume markers are in the same linkage block.
to be considered when allocating markers into blocks in case of |
rf.mat |
an object of class |
ncpus |
Number of parallel processes to spawn |
ph.thres |
the threshold used to sequentially phase markers.
Used in |
phase.number.limit |
the maximum number of linkage phases of the sub-maps.
The default is 10. See |
error |
the assumed global genotyping error rate. If |
verbose |
if |
tol |
tolerance for the C routine, i.e., the value used to evaluate convergence. |
tol.err |
tolerance for the C routine, i.e., the value used to evaluate convergence, including the global genotyping error in the model. |
a list containing 1: a list of blocks in form of mappoly.map objects; 2: a vector containing markers that were not included into blocks.
Marcelo Mollinari, [email protected]
## Not run:
## Selecting 50 markers in chromosome 5
s5 <- make_seq_mappoly(tetra.solcap, "seq5")
s5 <- make_seq_mappoly(tetra.solcap, s5$seq.mrk.names[1:50])
tpt5 <- est_pairwise_rf(s5)
m5 <- rf_list_to_matrix(tpt5, 3, 3)
fb.rf <- find_blocks(s5, rf.mat = m5, verbose = FALSE, ncpus = 2)
bl.rf <- fb.rf$blocks
plot_map_list(bl.rf)
## Merging resulting maps
map.merge <- merge_maps(bl.rf, tpt5)
plot(map.merge, mrk.names = TRUE)
## Comparing linkage phases with pre assembled map
id <- na.omit(match(map.merge$info$mrk.names, solcap.err.map[[5]]$info$mrk.names))
map.orig <- get_submap(solcap.err.map[[5]], mrk.pos = id)
p1.m <- map.merge$maps[[1]]$seq.ph$P
p2.m <- map.merge$maps[[1]]$seq.ph$Q
names(p1.m) <- names(p2.m) <- map.merge$info$mrk.names
p1.o <- map.orig$maps[[1]]$seq.ph$P
p2.o <- map.orig$maps[[1]]$seq.ph$Q
names(p1.o) <- names(p2.o) <- map.orig$info$mrk.names
n <- intersect(names(p1.m), names(p1.o))
plot_compare_haplotypes(4, p1.o[n], p2.o[n], p1.m[n], p2.m[n])
### Using genome
fb.geno <- find_blocks(s5, clustering.type = "genome", genome.block.threshold = 10^4)
plot_map_list(fb.geno$blocks)
splt <- lapply(fb.geno$blocks, split_mappoly, 1)
plot_map_list(splt)
## End(Not run)
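For the "genome" clustering type, one plausible reading of genome.block.threshold is that consecutive markers closer than the threshold fall in the same block. The base-R sketch below implements that reading on made-up positions. The rule itself is an assumption, since the argument description above is incomplete, and the actual function may group markers differently.

```r
# Made-up physical positions (bp) along one chromosome, already sorted
genome.pos <- c(1200, 4800, 9100, 35000, 36500, 90000)
genome.block.threshold <- 10000

# Start a new block whenever the gap to the previous marker exceeds
# the threshold (assumed rule, for illustration only)
block.id <- cumsum(c(TRUE, diff(genome.pos) > genome.block.threshold))
blocks <- split(genome.pos, block.id)
blocks
```

Under this rule a marker far from all its neighbors forms a block of size one, which would correspond to a marker left out of the multi-marker blocks.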
Design linkage map framework in two steps: i) estimating the recombination fractions with the HMM approach for each parent separately, using only markers segregating in a single parent (e.g. map 1 - P1:3 x P2:0, P1:2 x P2:4; map 2 - P1:0 x P2:3, P1:4 x P2:2); ii) merging both maps and re-estimating the recombination fractions.
framework_map( input.seq, twopt, start.set = 10, thres.twopt = 10, thres.hmm = 30, extend.tail = 30, inflation.lim.p1 = 5, inflation.lim.p2 = 5, phase.number.limit = 10, tol = 0.01, tol.final = 0.001, verbose = TRUE, method = "hmm" )
input.seq |
object of class |
twopt |
object of class |
start.set |
number of markers to start the phasing procedure (default = 10) |
thres.twopt |
the LOD threshold used to determine if the linkage phases compared via two-point analysis should be considered for the search space reduction (default = 10) |
thres.hmm |
the LOD threshold used to determine if the linkage phases compared via hmm analysis should be evaluated in the next round of marker inclusion (default = 30) |
extend.tail |
the length of the chain's tail that should be used to calculate the likelihood of the map (default = 30). If NULL, the function uses all markers positioned. Even if info.tail = TRUE, it uses at least extend.tail as the tail length |
inflation.lim.p1 |
the maximum accepted length difference between the current and the previous parent 1 sub-map defined by arguments info.tail and extend.tail (default = 5). If the size exceeds this limit, the marker will not be inserted. If NULL, then it will insert all markers. |
inflation.lim.p2 |
same as 'inflation.lim.p1' but for parent 2 sub-map. |
phase.number.limit |
the maximum number of linkage phases of the sub-maps defined by arguments info.tail and extend.tail (default = 10). If the size exceeds this limit, the marker will not be inserted. If Inf, then it will insert all markers. |
tol |
the desired accuracy during the sequential phase of each parental map (default = 0.01) |
tol.final |
the desired accuracy for the final parental map (default = 10e-04) |
verbose |
If TRUE (default), current progress is shown; if FALSE, no output is produced |
method |
indicates whether to use 'hmm' (Hidden Markov Models) or 'ols' (Ordinary Least Squares) to re-estimate the recombination fractions while merging the parental maps (default: 'hmm') |
list containing three mappoly.map objects: 1) map built with markers with segregation information from parent 1; 2) map built with markers with segregation information from parent 2; 3) maps in 1 and 2 merged
Marcelo Mollinari, [email protected] with documentation and minor modifications by Cristiane Taniguti [email protected]
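The first step relies on markers that segregate in only one parent. The sketch below shows how such markers could be picked from parental dosages in base R, treating a parent as non-segregating when its dosage is 0 or equal to the ploidy level, as in the examples given in the description. The dosages and object names are made up for illustration.

```r
ploidy <- 4

# Made-up parental dosages for six markers
dose.p1 <- c(M1 = 3, M2 = 2, M3 = 0, M4 = 4, M5 = 1, M6 = 2)
dose.p2 <- c(M1 = 0, M2 = 4, M3 = 3, M4 = 2, M5 = 1, M6 = 0)

# A parent is informative when its dosage is neither 0 nor the ploidy level
seg.p1 <- dose.p1 > 0 & dose.p1 < ploidy
seg.p2 <- dose.p2 > 0 & dose.p2 < ploidy

mrk.p1.only <- names(dose.p1)[seg.p1 & !seg.p2]  # candidates for the parent 1 map
mrk.p2.only <- names(dose.p1)[!seg.p1 & seg.p2]  # candidates for the parent 2 map
mrk.both    <- names(dose.p1)[seg.p1 & seg.p2]   # segregating in both parents
```

Markers in mrk.both carry information about both parents and would only come into play once the two parental maps are merged.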
These functions facilitate the conversion between recombination fractions (r) and genetic distances (d) using various mapping models. As the usage below shows, the functions starting with 'mf_' take genetic distances (d) and return recombination fractions, while those starting with 'imf_' take recombination fractions (r) and return genetic distances.
mf_k(d) mf_h(d) mf_m(d) imf_k(r) imf_h(r) imf_m(r)
d |
Numeric or numeric vector, representing genetic distances in centiMorgans (cM) for direct functions (mf_k, mf_h, mf_m). |
r |
Numeric or numeric vector, representing recombination fractions for inverse functions (imf_k, imf_h, imf_m). |
The 'mf_' prefixed functions apply different models to convert genetic distances into recombination fractions:
mf_k: Kosambi mapping function.
mf_h: Haldane mapping function.
mf_m: Morgan mapping function.
The 'imf_' prefixed functions convert recombination fractions back into genetic distances:
imf_k: Inverse Kosambi mapping function.
imf_h: Inverse Haldane mapping function.
imf_m: Inverse Morgan mapping function.
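As an illustration of the three models, the following is a minimal sketch of the textbook Kosambi, Haldane, and Morgan mapping functions with distances in centiMorgans. The helper names (`kosambi_rf`, `haldane_cm`, and so on) are hypothetical and chosen for this example; the code is not taken from the package source and may differ from it in edge-case handling.

```r
# Textbook mapping functions; hypothetical helper names, distances d in cM.
kosambi_rf <- function(d) 0.5 * tanh(d / 50)
haldane_rf <- function(d) 0.5 * (1 - exp(-d / 50))
morgan_rf  <- function(d) pmin(d / 100, 0.5)   # r = d, capped at 0.5

# Inverses: recombination fraction r (0 <= r < 0.5) back to cM.
kosambi_cm <- function(r) 50 * atanh(2 * r)
haldane_cm <- function(r) -50 * log(1 - 2 * r)
morgan_cm  <- function(r) 100 * r

d <- c(1, 10, 50)

# A round trip through each model should recover the original distances.
all.equal(kosambi_cm(kosambi_rf(d)), d)
all.equal(haldane_cm(haldane_rf(d)), d)
all.equal(morgan_cm(morgan_rf(d)), d)
```

For short distances the three models give nearly the same recombination fraction; they diverge as the distance grows because they make different assumptions about crossover interference.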
Kosambi, D.D. (1944). The estimation of map distances from recombination values. Ann Eugen., 12, 172-175.
Haldane, J.B.S. (1919). The combination of linkage values, and the calculation of distances between the loci of linked factors. J Genet, 8, 299-309.
Morgan, T.H. (1911). Random segregation versus coupling in Mendelian inheritance. Science, 34(873), 384.
This function gets the genomic position of markers in a sequence and returns an ordered data frame with the name and position of each marker
get_genomic_order(input.seq, verbose = TRUE)

## S3 method for class 'mappoly.geno.ord'
print(x, ...)

## S3 method for class 'mappoly.geno.ord'
plot(x, ...)
input.seq |
a sequence object of class |
verbose |
if |
x |
an object of the class mappoly.geno.ord |
... |
currently ignored |
Marcelo Mollinari, [email protected]
s1 <- make_seq_mappoly(tetra.solcap, "all")
o1 <- get_genomic_order(s1)
plot(o1)
s.geno.ord <- make_seq_mappoly(o1)
Given a pre-constructed map, it extracts a sub-map for a provided sequence of marker positions. Optionally, it can update the linkage phase configurations and respective recombination fractions.
get_submap( input.map, mrk.pos, phase.config = "best", reestimate.rf = TRUE, reestimate.phase = FALSE, thres.twopt = 5, thres.hmm = 3, extend.tail = 50, tol = 0.1, tol.final = 0.001, use.high.precision = FALSE, verbose = TRUE )
input.map |
An object of class |
mrk.pos |
positions of the markers that should be considered in the new map. This can be in any order |
phase.config |
which phase configuration should be used. "best" (default) will choose the configuration associated with the maximum likelihood |
reestimate.rf |
logical. If |
reestimate.phase |
logical. If |
thres.twopt |
the LOD threshold used to determine if the linkage phases compared via two-point analysis should be considered (default = 5) |
thres.hmm |
the threshold used to determine if the linkage phases compared via hmm analysis should be considered (default = 3) |
extend.tail |
the length of the tail of the chain that should
be used to calculate the likelihood of the linkage phases. If
|
tol |
the desired accuracy during the sequential phase (default = 0.1) |
tol.final |
the desired accuracy for the final map (default = 10e-04) |
use.high.precision |
logical. If |
verbose |
If |
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## selecting the six first markers in linkage group 1
## re-estimating the recombination fractions and linkage phases
submap1.lg1 <- get_submap(input.map = maps.hexafake[[1]], mrk.pos = 1:6,
                          verbose = TRUE, reestimate.phase = TRUE,
                          tol.final = 10e-3)
## no recombination fraction re-estimation: first 20 markers
submap2.lg1 <- get_submap(input.map = maps.hexafake[[1]], mrk.pos = 1:20,
                          reestimate.rf = FALSE, verbose = TRUE,
                          tol.final = 10e-3)
plot(maps.hexafake[[1]])
plot(submap1.lg1, mrk.names = TRUE, cex = .8)
plot(submap2.lg1, mrk.names = TRUE, cex = .8)
Internal function
get_tab_mrks(x)
x |
an object of class |
Gabriel Gesteira, [email protected]
Identifies linkage groups of markers using the results of two-point (pairwise) analysis.
group_mappoly( input.mat, expected.groups = NULL, inter = TRUE, comp.mat = FALSE, LODweight = FALSE, verbose = TRUE )
input.mat |
an object of class |
expected.groups |
when available, inform the number of expected linkage groups (i.e. chromosomes) for the species |
inter |
if |
comp.mat |
if |
LODweight |
if |
verbose |
logical. If |
Returns an object of class mappoly.group
, which is a list
containing the following components:
data.name |
the referred dataset name |
hc.snp |
a list containing information related to the UPGMA grouping method |
expected.groups |
the number of expected linkage groups |
groups.snp |
the groups to which each of the markers belong |
seq.vs.grouped.snp |
comparison between the genomic group information
(when available) and the groups provided by |
chisq.pval.thres |
the threshold used on the segregation test when reading the dataset |
chisq.pval |
the p-values associated with the segregation test for all markers in the sequence |
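The returned `hc.snp` component refers to UPGMA grouping. As a rough illustration of that idea only, the sketch below clusters a toy recombination fraction matrix with base R's `hclust` using average linkage (which is UPGMA) and cuts the tree into the expected number of groups. The matrix and marker names are made up, and this is not the package's implementation.

```r
# Toy recombination-fraction matrix for 6 hypothetical markers:
# markers 1-3 are tightly linked to each other, as are markers 4-6.
mrk <- paste0("M", 1:6)
rf <- matrix(0.5, 6, 6, dimnames = list(mrk, mrk))
rf[1:3, 1:3] <- 0.05
rf[4:6, 4:6] <- 0.05
diag(rf) <- 0

# UPGMA is average-linkage hierarchical clustering.
hc <- hclust(as.dist(rf), method = "average")

# Cut the dendrogram into the expected number of linkage groups.
groups <- cutree(hc, k = 2)
table(groups)
```

Using the recombination fraction directly as a dissimilarity is the simplest choice for a sketch; unlinked markers sit at 0.5 and therefore merge last.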
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## Getting first 20 markers from two linkage groups
all.mrk <- make_seq_mappoly(hexafake, c(1:20,601:620))
red.mrk <- elim_redundant(all.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
counts <- cache_counts_twopt(unique.mrks, cached = TRUE)
all.pairs <- est_pairwise_rf(input.seq = unique.mrks,
                             count.cache = counts,
                             ncpus = 1,
                             verbose = TRUE)
## Full recombination fraction matrix
mat.full <- rf_list_to_matrix(input.twopt = all.pairs)
plot(mat.full, index = FALSE)
lgs <- group_mappoly(input.mat = mat.full,
                     expected.groups = 2,
                     inter = TRUE,
                     comp.mat = TRUE, # this data has physical information
                     verbose = TRUE)
lgs
plot(lgs)
A dataset of a hypothetical autohexaploid full-sib population containing three homology groups
hexafake
An object of class mappoly.data, which contains a list with the following components:
ploidy level = 6
number of individuals = 300
total number of markers = 1500
the names of the individuals
the names of the markers
a vector containing the dosage in parent P for all n.mrk markers
a vector containing the dosage in parent Q for all n.mrk markers
a vector indicating the chromosome to which each marker belongs. Zero indicates that the marker was not assigned to any chromosome
physical position of the markers in the sequence
a matrix containing the dosage for each marker (rows) and each individual (columns). Missing data are represented by ploidy_level + 1 = 7
There are no phenotypes in this simulation
There are no phenotypes in this simulation
vector containing p-values for all markers associated with the chi-square test for the expected segregation patterns under Mendelian segregation
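Because missing dosages are coded as ploidy_level + 1 rather than NA, summaries computed directly on the dosage matrix would treat them as real values. The sketch below shows one way to recode them on a small made-up matrix; the object names and values are assumptions for illustration and are not taken from the dataset.

```r
ploidy <- 6

# Toy dosage matrix: 3 hypothetical markers (rows) x 4 individuals (columns).
# The value 7 (ploidy + 1) marks a missing call.
geno.dose <- matrix(c(0, 1, 7, 2,
                      3, 7, 3, 4,
                      1, 1, 0, 7),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("M", 1:3), paste0("Ind_", 1:4)))

# Recode the missing-data value as NA before summarising.
dose.na <- geno.dose
dose.na[dose.na == ploidy + 1] <- NA

# Proportion of missing calls per marker.
miss.rate <- rowMeans(is.na(dose.na))
miss.rate
```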
A dataset of a hypothetical autohexaploid full-sib population
containing three homology groups. This dataset contains the
probability distribution of the genotypes and 2% of missing data,
but is essentially the same dataset found in hexafake
hexafake.geno.dist
An object of class mappoly.data, which contains a list with the following components:
ploidy level = 6
number of individuals = 300
total number of markers = 1500
the names of the individuals
the names of the markers
a vector containing the dosage in parent P for all n.mrk markers
a vector containing the dosage in parent Q for all n.mrk markers
a vector indicating the sequence to which each marker belongs. Zero indicates that the marker was not assigned to any sequence
physical position of the markers in the sequence
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' are considered as missing data for dosage calling purposes
a data.frame containing the probability distribution for each combination of marker and offspring. The first two columns represent the marker and the offspring, respectively. The remaining elements represent the probability associated with each one of the possible dosages
a matrix containing the dosage for each marker (rows) and each individual (columns). Missing data are represented by ploidy_level + 1 = 7
There are no phenotypes in this simulation
There are no phenotypes in this simulation
Function to import datasets from polymapR.
import_data_from_polymapR( input.data, ploidy, parent1 = "P1", parent2 = "P2", input.type = c("discrete", "probabilistic"), prob.thres = 0.95, pardose = NULL, offspring = NULL, filter.non.conforming = TRUE, verbose = TRUE )
input.data |
a |
ploidy |
the ploidy level |
parent1 |
a character string containing the name (or pattern of genotype IDs) of parent 1 |
parent2 |
a character string containing the name (or pattern of genotype IDs) of parent 2 |
input.type |
Indicates whether the input is discrete ("discrete") or probabilistic ("probabilistic") |
prob.thres |
threshold probability to assign a dosage to offspring. If the probability
is smaller than |
pardose |
matrix of dimensions (n.mrk x 3) containing the name of the markers in the first column, and the dosage of parents 1 and 2 in columns 2 and 3. (see polymapR vignette) |
offspring |
a character string containing the name (or pattern of genotype IDs) of the offspring
individuals. If |
filter.non.conforming |
if |
verbose |
if |
See examples at https://rpubs.com/mmollin/tetra_mappoly_vignette.
Marcelo Mollinari [email protected]
Bourke PM et al: (2019) PolymapR — linkage analysis and genetic map construction from F1 populations of outcrossing polyploids. _Bioinformatics_ 34:3496–3502. doi:10.1093/bioinformatics/bty1002
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Reads objects with information related to genotype calling in polyploids. Currently, this function supports output objects created with the updog package (output of the multidog function). This function creates an object of class mappoly.data.
import_from_updog( object, prob.thres = 0.95, filter.non.conforming = TRUE, chrom = NULL, genome.pos = NULL, verbose = TRUE )
object |
the name of the object of class |
prob.thres |
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' are considered as missing data for the dosage calling purposes |
filter.non.conforming |
if |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
vector with physical position of the markers into the sequence |
verbose |
if |
An object of class mappoly.data
which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
physical position of the markers into the sequence |
prob.thres |
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' were considered as missing data in the 'geno.dose' matrix |
geno.dose |
a matrix containing the dosage for each markers (rows)
for each individual (columns). Missing data are represented by
|
geno |
a data.frame
containing the probability distribution for each combination of
marker and offspring. The first two columns represent the marker
and the offspring, respectively. The remaining elements represent
the probability associated to each one of the possible
dosages. Missing data are converted from |
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers |
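To make the role of `prob.thres` concrete, here is a minimal sketch of how a probability threshold can turn genotype probabilities into dosage calls. The probabilities are made up, and coding missing calls as ploidy + 1 follows the convention stated for the example datasets in this manual; the actual import code may work differently.

```r
prob.thres <- 0.95
ploidy <- 4

# Toy genotype probabilities for one marker: rows are 3 hypothetical
# offspring, columns are dosages 0..ploidy.
probs <- rbind(c(0.01, 0.97, 0.02, 0.00, 0.00),
               c(0.00, 0.50, 0.50, 0.00, 0.00),
               c(0.00, 0.00, 0.01, 0.03, 0.96))
colnames(probs) <- 0:ploidy

# Most likely dosage per individual (which.max gives a 1-based column index).
dose <- apply(probs, 1, which.max) - 1

# Calls whose maximum probability is below the threshold become missing,
# coded here as ploidy + 1.
dose[apply(probs, 1, max) < prob.thres] <- ploidy + 1
dose
```

Raising `prob.thres` trades more missing data for fewer uncertain dosage calls.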
Gabriel Gesteira, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
if(requireNamespace("updog", quietly = TRUE)){
  library("updog")
  data("uitdewilligen")
  mout = multidog(refmat = t(uitdewilligen$refmat),
                  sizemat = t(uitdewilligen$sizemat),
                  ploidy = uitdewilligen$ploidy,
                  model = "f1",
                  p1_id = colnames(t(uitdewilligen$sizemat))[1],
                  p2_id = colnames(t(uitdewilligen$sizemat))[2],
                  nc = 2)
  mydata = import_from_updog(mout)
  mydata
  plot(mydata)
}
Function to import phased map lists from polymapR
import_phased_maplist_from_polymapR(maplist, mappoly.data, ploidy = NULL)
maplist |
a list of phased maps obtained using function
|
mappoly.data |
a dataset used to obtain |
ploidy |
the ploidy level |
See examples at https://rpubs.com/mmollin/tetra_mappoly_vignette.
Marcelo Mollinari [email protected]
Bourke PM et al: (2019) PolymapR — linkage analysis and genetic map construction from F1 populations of outcrossing polyploids. _Bioinformatics_ 34:3496–3502. doi:10.1093/bioinformatics/bty1002
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Update the multipoint log-likelihood of a given map using the method proposed by Mollinari and Garcia (2019).
loglike_hmm(input.map, input.data = NULL, verbose = FALSE)
input.map |
An object of class |
input.data |
An object of class |
verbose |
If |
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
hexa.map <- loglike_hmm(maps.hexafake[[1]])
hexa.map
Get a subset of an object of class mappoly.rf.matrix, i.e. recombination fraction and LOD score matrices, based on a sequence of markers.
make_mat_mappoly(input.mat, input.seq)
input.mat |
an object of class |
input.seq |
an object of class |
an object of class mappoly.rf.matrix, which is a subset of 'input.mat'. See rf_list_to_matrix for details
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
# sequence with 20 markers
mrk.seq <- make_seq_mappoly(hexafake, 1:20)
mrk.pairs <- est_pairwise_rf(input.seq = mrk.seq, verbose = TRUE)
## Full recombination fraction matrix
mat <- rf_list_to_matrix(input.twopt = mrk.pairs)
plot(mat)
## Matrix subset
id <- make_seq_mappoly(hexafake, 1:10)
mat.sub <- make_mat_mappoly(mat, id)
plot(mat.sub)
Get a subset of an object of class mappoly.twopt or mappoly.twopt2, i.e. recombination fraction and LOD score statistics for all possible linkage phase combinations, based on a sequence of markers.
make_pairs_mappoly(input.twopt, input.seq)
input.twopt |
an object of class |
input.seq |
an object of class |
an object of class mappoly.twopt which is a subset of input.twopt. See est_pairwise_rf for details
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## selecting some markers along the genome
some.mrk <- make_seq_mappoly(hexafake, seq(1, 1500, 30))
all.pairs <- est_pairwise_rf(input.seq = some.mrk)
mat.full <- rf_list_to_matrix(input.twopt = all.pairs)
plot(mat.full)
## selecting two-point information for chromosome 1
mrks.1 <- make_seq_mappoly(hexafake, names(which(some.mrk$chrom == 1)))
p1 <- make_pairs_mappoly(input.seq = mrks.1, input.twopt = all.pairs)
m1 <- rf_list_to_matrix(input.twopt = p1)
plot(m1, main.text = "LG1")
Constructs a sequence of markers based on an object belonging to various specified classes. This function is versatile, supporting multiple input types and configurations for generating marker sequences.
make_seq_mappoly(
  input.obj,
  arg = NULL,
  data.name = NULL,
  info.parent = c("all", "p1", "p2"),
  genomic.info = NULL
)

## S3 method for class 'mappoly.sequence'
print(x, ...)

## S3 method for class 'mappoly.sequence'
plot(x, ...)
input.obj |
An object belonging to one of the specified classes: |
arg |
Specifies the markers to include in the sequence, accepting several formats: a string 'all' for all
markers; a string or vector of strings 'seqx' where x is the sequence number (0 for unassigned markers); a
vector of integers indicating specific markers; or a vector of integers representing linkage group numbers if
|
data.name |
Name of the |
info.parent |
Selection criteria based on parental information: |
genomic.info |
Optional and applicable only to |
x |
An object of class |
... |
Currently ignored. |
Returns an object of class 'mappoly.sequence', comprising:
"seq.num" |
Ordered vector of marker indices according to the input. |
"seq.phases" |
List of linkage phases between markers; -1 for undefined phases. |
"seq.rf" |
Vector of recombination frequencies; -1 for not estimated frequencies. |
"loglike" |
Log-likelihood of the linkage map. |
"data.name" |
Name of the 'mappoly.data' object with raw data. |
"twopt" |
Name of the 'mappoly.twopt' object with 2-point analyses; -1 if not computed. |
Marcelo Mollinari [email protected], with modifications by Gabriel Gesteira [email protected]
Mollinari, M., and Garcia, A. A. F. (2019). Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models. _G3: Genes|Genomes|Genetics_, doi:10.1534/g3.119.400378.
all.mrk <- make_seq_mappoly(hexafake, 'all')
seq1.mrk <- make_seq_mappoly(hexafake, 'seq1')
plot(seq1.mrk)
some.mrk.pos <- c(1,4,28,32,45)
some.mrk.1 <- make_seq_mappoly(hexafake, some.mrk.pos)
plot(some.mrk.1)
hexafake
A list containing three linkage groups estimated using the procedure available in [MAPpoly's tutorial](https://mmollina.github.io/MAPpoly/#estimating_the_map_for_a_given_order)
maps.hexafake
A list containing three objects of class mappoly.map
, each one
representing one linkage group in the simulated data.
Estimates loci positions using the multidimensional scaling method proposed by Preedy and Hackett (2016). The code is an adaptation of the package MDSMap, available under the GNU General Public License, Version 3, at https://CRAN.R-project.org/package=MDSMap
mds_mappoly(
  input.mat,
  p = NULL,
  n = NULL,
  ndim = 2,
  weight.exponent = 2,
  verbose = TRUE
)

## S3 method for class 'mappoly.pcmap'
print(x, ...)

## S3 method for class 'mappoly.pcmap3d'
print(x, ...)
input.mat |
an object of class |
p |
integer. The smoothing parameter for the principal curve.
If |
n |
vector of integers or strings containing loci to be omitted from the analysis |
ndim |
number of dimensions to be considered in the multidimensional scaling procedure (default = 2) |
weight.exponent |
the exponent that should be used in the LOD score values to weight the MDS procedure (default = 2) |
verbose |
if |
x |
an object of class |
... |
currently ignored |
A list containing:
M |
the input distance map |
sm |
the unconstrained MDS results |
pc |
the principal curve results |
distmap |
a matrix of pairwise distances between loci where the columns are in the estimated order |
locimap |
a data frame of the loci containing the name and position of each locus in order of increasing distance |
length |
integer giving the total length of the segment |
removed |
a vector of the names of loci removed from the analysis |
scale |
the scaling factor from the MDS |
locikey |
a data frame showing the number associated with each locus name for interpreting the MDS configuration plot |
confplotno |
a data frame showing locus name associated with each number on the MDS configuration plots |
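The intuition behind ordering markers by multidimensional scaling can be shown with a much simpler stand-in. The sketch below uses base R's classical MDS (`cmdscale`) on exact toy distances; it omits the LOD weighting and the principal curve fit that the method of Preedy and Hackett (2016) relies on, so it illustrates the idea rather than this function's algorithm. Marker names and positions are made up.

```r
# Toy example: 5 hypothetical markers at known positions (cM) on one chromosome.
true.pos <- c(M1 = 0, M2 = 10, M3 = 25, M4 = 40, M5 = 60)

# Shuffle the markers so the input order carries no information.
set.seed(1)
shuffled <- sample(true.pos)

# Pairwise map distances between markers.
d <- dist(shuffled)

# Classical MDS in one dimension recovers the relative positions
# up to reflection and translation.
coord <- cmdscale(d, k = 1)[, 1]
ord <- names(sort(coord))
ord
```

The recovered order may come out reversed relative to the true one, since a linear map has no intrinsic orientation.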
Marcelo Mollinari, [email protected], mostly adapted from MDSMap code written by Katharine F. Preedy, [email protected]
Preedy, K. F., & Hackett, C. A. (2016). A rapid marker ordering approach for high-density genetic linkage maps in experimental autotetraploid populations using multidimensional scaling. _Theoretical and Applied Genetics_, 129(11), 2117-2132. doi:10.1007/s00122-016-2761-8
s1 <- make_seq_mappoly(hexafake, 1:20)
t1 <- est_pairwise_rf(s1, ncpus = 1)
m1 <- rf_list_to_matrix(t1)
o1 <- get_genomic_order(s1)
s.go <- make_seq_mappoly(o1)
plot(m1, ord = s.go$seq.mrk.names)
mds.ord <- mds_mappoly(m1)
plot(mds.ord)
so <- make_seq_mappoly(mds.ord)
plot(m1, ord = so$seq.mrk.names)
plot(so$seq.num ~ I(so$genome.pos/1e6),
     xlab = "Genome Position",
     ylab = "MDS position")
This function merges two datasets of class mappoly.data. This can be useful when individuals of a population were genotyped using two or more techniques and have datasets in different files or formats. Please note that the datasets should contain the same number of individuals, and the individuals must be named identically in both datasets (e.g. Ind_1 in both datasets, not Ind_1 in one dataset and ind_1 or Ind.1 in the other).
merge_datasets(dat.1 = NULL, dat.2 = NULL)
dat.1 |
the first dataset of class |
dat.2 |
the second dataset of class |
An object of class mappoly.data
which contains all markers
from both datasets. It will be a list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers into the sequence |
seq.ref |
if one or both datasets originated from read_vcf, it keeps reference alleles from sequencing platform, otherwise is NULL |
seq.alt |
if one or both datasets originated from read_vcf, it keeps alternative alleles from sequencing platform, otherwise is NULL |
all.mrk.depth |
if one or both datasets originated from read_vcf, it keeps marker read depths from sequencing, otherwise is NULL |
prob.thres |
(unused field) |
geno.dose |
a matrix containing the dosage for each markers (rows)
for each individual (columns). Missing data are represented by
|
geno |
if both datasets contain genotype distribution information, the final object will contain 'geno'. This is set to NULL otherwise |
nphen |
(0) |
phen |
(NULL) |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers in both datasets |
kept |
if elim.redundant = TRUE when reading any dataset, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE when reading any dataset, holds all non-redundant markers and its equivalence to the redundant ones |
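The requirement that individuals be named identically in both datasets can be illustrated on plain dosage matrices. The sketch below is a simplified assumption of what merging involves (matching individuals by name, then stacking markers); the matrices and names are made up, and the real function handles many more fields than the dosage matrix.

```r
# Two toy dosage matrices (markers in rows, individuals in columns) for the
# same hypothetical individuals, listed in different orders.
geno.1 <- matrix(c(0, 1, 2,
                   1, 1, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("A1", "A2"), c("Ind_1", "Ind_2", "Ind_3")))
geno.2 <- matrix(c(2, 0, 1),
                 nrow = 1,
                 dimnames = list("B1", c("Ind_3", "Ind_1", "Ind_2")))

# Individual names must match exactly; stop otherwise.
stopifnot(setequal(colnames(geno.1), colnames(geno.2)))

# Align the columns of the second matrix to the first, then stack the markers.
merged <- rbind(geno.1, geno.2[, colnames(geno.1), drop = FALSE])
merged
```

A name such as ind_1 or Ind.1 in the second matrix would fail the `setequal` check, which is the situation the description warns about.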
Gabriel Gesteira, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## Loading a subset of SNPs from chromosomes 3 and 12 of sweetpotato dataset
## (SNPs anchored to Ipomoea trifida genome)
dat <- NULL
for(i in c(3, 12)){
  cat("Loading chromosome", i, "...\n")
  tempfl <- tempfile(pattern = paste0("ch", i), fileext = ".vcf.gz")
  x <- "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/sweet_sample_ch"
  address <- paste0(x, i, ".vcf.gz")
  download.file(url = address, destfile = tempfl)
  dattemp <- read_vcf(file = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2",
                      ploidy = 6, verbose = FALSE)
  dat <- merge_datasets(dat, dattemp)
  cat("\n")
}
dat
plot(dat)
Estimates the linkage phase and recombination fraction between pre-built maps and creates a new map by merging them.
merge_maps( map.list, twopt, thres.twopt = 10, genoprob.list = NULL, thres.hmm = "best", tol = 1e-04 )
map.list |
a list of objects of class |
twopt |
an object of class |
thres.twopt |
the threshold used to determine if the linkage phases compared via two-point analysis should be considered for the search space reduction (default = 10) |
genoprob.list |
a list of objects of class |
thres.hmm |
the threshold used to determine which linkage phase configurations should be returned when merging two maps. If "best" (default), returns only the best linkage phase configuration. NOTE: if merging multiple maps, it always uses the "best" linkage phase configuration at each block insertion. |
tol |
the desired accuracy (default = 1e-04) |
merge_maps uses two-point information, under a given LOD threshold, to reduce the linkage phase search space. The remaining linkage phases are tested using the genotype probabilities.
A list of class mappoly.map
with two elements:
i) info: a list containing information about the map, regardless of the linkage phase configuration:
ploidy |
the ploidy level |
n.mrk |
number of markers |
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
mrk.names |
the names of markers in the map |
seq.dose.p1 |
a vector containing the dosage in parent 1 for all markers in the map |
seq.dose.p2 |
a vector containing the dosage in parent 2 for all markers in the map |
chrom |
a vector indicating the sequence (usually chromosome) each marker belongs
as informed in the input file. If not available,
|
genome.pos |
physical position (usually in megabase) of the markers into the sequence |
seq.ref |
reference base used for each marker (i.e. A, T, C, G). If not available,
|
seq.alt |
alternative base used for each marker (i.e. A, T, C, G). If not available,
|
chisq.pval |
a vector containing p-values of the chi-squared test of Mendelian segregation for all markers in the map |
data.name |
name of the dataset of class |
ph.thres |
the LOD threshold used to define the linkage phase configurations to test |
ii) a list of maps with possible linkage phase configuration. Each map in the list is also a list containing
seq.num |
a vector containing the (ordered) indices of markers in the map, according to the input file |
seq.rf |
a vector of size ( |
seq.ph |
linkage phase configuration for all markers in both parents |
loglike |
the HMM-based multipoint likelihood |
Marcelo Mollinari
#### Tetraploid example #####
map1 <- get_submap(solcap.dose.map[[1]], 1:5)
map2 <- get_submap(solcap.dose.map[[1]], 6:15)
map3 <- get_submap(solcap.dose.map[[1]], 16:30)
full.map <- get_submap(solcap.dose.map[[1]], 1:30)
s <- make_seq_mappoly(tetra.solcap, full.map$maps[[1]]$seq.num)
twopt <- est_pairwise_rf(input.seq = s)
merged.maps <- merge_maps(map.list = list(map1, map2, map3), twopt = twopt, thres.twopt = 3)
plot(merged.maps, mrk.names = TRUE)
plot(full.map, mrk.names = TRUE)
best.phase <- merged.maps$maps[[1]]$seq.ph
names.id <- names(best.phase$P)
compare_haplotypes(ploidy = 4, best.phase$P[names.id], full.map$maps[[1]]$seq.ph$P[names.id])
compare_haplotypes(ploidy = 4, best.phase$Q[names.id], full.map$maps[[1]]$seq.ph$Q[names.id])
This function plots scatterplot(s) of physical distance (in Mbp) versus genetic
distance (in cM). Map(s) should be passed as a single object or a list of objects
of class mappoly.map.
plot_genome_vs_map( map.list, phase.config = "best", same.ch.lg = FALSE, alpha = 1/5, size = 3 )
map.list |
A list or a single object of class |
phase.config |
A vector containing which phase configuration should be
plotted. If |
same.ch.lg |
Logical. If |
alpha |
transparency factor for SNP points |
size |
size of the SNP points |
Marcelo Mollinari
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
plot_genome_vs_map(solcap.mds.map, same.ch.lg = TRUE)
plot_genome_vs_map(solcap.mds.map, same.ch.lg = FALSE, alpha = 1, size = 1/2)
This function plots the genotypic information content given
an object of class mappoly.homoprob.
plot_GIC(hprobs, P = "P1", Q = "P2")
hprobs |
an object of class |
P |
a string containing the name of parent P |
Q |
a string containing the name of parent Q |
w <- lapply(solcap.err.map[1:3], calc_genoprob)
h.prob <- calc_homologprob(w)
plot_GIC(h.prob)
This function plots genetic linkage map(s) generated by MAPpoly.
The map(s) should be passed as a single object or a list of objects of class mappoly.map.
plot_map_list( map.list, horiz = TRUE, col = "lightgray", title = "Linkage group" )
map.list |
A list of objects or a single object of class |
horiz |
logical. If FALSE, the maps are plotted vertically with the first map to the left. If TRUE (default), the maps are plotted horizontally with the first at the bottom |
col |
a vector of colors for each linkage group. (default = 'lightgray')
|
title |
a title (string) for the maps (default = 'Linkage group') |
A data.frame object containing the names of the markers and their genetic positions
Marcelo Mollinari
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## hexafake map
plot_map_list(maps.hexafake, horiz = FALSE)
plot_map_list(maps.hexafake, col = c("#999999", "#E69F00", "#56B4E9"))
## solcap map
plot_map_list(solcap.dose.map, col = "ggstyle")
plot_map_list(solcap.dose.map, col = "mp_pallet3", horiz = FALSE)
Plot object mappoly.map2
plot_mappoly.map2(x)
x |
object of class |
Plots summary statistics for a given marker
plot_mrk_info(input.data, mrk)
input.data |
an object of class |
mrk |
marker name or position in the dataset |
Marcelo Mollinari
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
plot_mrk_info(tetra.solcap.geno.dist, 2680)
plot_mrk_info(tetra.solcap.geno.dist, "solcap_snp_c2_23828")
Outputs a ggplot showing the percentage of data changed.
plot_progeny_dosage_change( map_list, error, verbose = TRUE, output_corrected = FALSE )
map_list |
a list of multiple |
error |
global error rate used in 'calc_genoprob_error()' |
verbose |
if TRUE (default), current progress is shown; if FALSE, no output is produced |
output_corrected |
logical. If FALSE (default), only the ggplot of the changed dosages is printed; if TRUE, a new corrected dosage matrix is output. |
A ggplot of the changed and imputed genotypic dosages
Jeekin Lau, with optimization by Cristiane Taniguti
x <- get_submap(solcap.err.map[[1]], 1:30, reestimate.rf = FALSE)
plot_progeny_dosage_change(list(x), error = 0.05, output_corrected = FALSE)
# output the corrected dosage matrix
corrected_matrix <- plot_progeny_dosage_change(list(x), error = 0.05, output_corrected = TRUE)
Plots mappoly.homoprob
## S3 method for class 'mappoly.homoprob'
plot( x, stack = FALSE, lg = NULL, ind = NULL, use.plotly = TRUE, verbose = TRUE, ... )
x |
an object of class |
stack |
logical. If |
lg |
indicates which linkage group should be plotted. If |
ind |
indicates which individuals should be plotted. It can be the
positions of the individuals in the dataset or their names.
If |
use.plotly |
if |
verbose |
if |
... |
unused arguments |
Plots mappoly.prefpair.profiles
## S3 method for class 'mappoly.prefpair.profiles'
plot( x, type = c("pair.configs", "hom.pairs"), min.y.prof = 0, max.y.prof = 1, thresh = 0.01, P1 = "P1", P2 = "P2", ... )
x |
an object of class |
type |
a character string indicating which type of graphic is plotted:
|
min.y.prof |
lower bound for y axis on the probability profile graphic (default = 0) |
max.y.prof |
upper bound for y axis on the probability profile graphic (default = 1) |
thresh |
threshold for chi-square test (default = 0.01) |
P1 |
a string containing the name of parent P1 |
P2 |
a string containing the name of parent P2 |
... |
unused arguments |
Returns information related to a given set of markers
print_mrk(input.data, mrks)
input.data |
an object |
mrks |
marker sequence index (integer vector) |
print_mrk(tetra.solcap.geno.dist, 1:5)
print_mrk(hexafake, 256)
Reads an external data file generated as output of saveMarkerModels.
This function creates an object of class mappoly.data.
read_fitpoly( file.in, ploidy, parent1, parent2, offspring = NULL, filter.non.conforming = TRUE, elim.redundant = TRUE, parent.geno = c("joint", "max"), thresh.parent.geno = 0.95, prob.thres = 0.95, file.type = c("table", "csv"), verbose = TRUE )
file.in |
a character string with the name of (or full path to) the input file |
ploidy |
the ploidy level |
parent1 |
a character string containing the name (or pattern of genotype IDs) of parent 1 |
parent2 |
a character string containing the name (or pattern of genotype IDs) of parent 2 |
offspring |
a character string containing the name (or pattern of genotype IDs) of the offspring
individuals. If |
filter.non.conforming |
if |
elim.redundant |
logical. If |
parent.geno |
indicates whether to use the joint probability |
thresh.parent.geno |
threshold probability to assign a dosage to parents. If the probability
is smaller than |
prob.thres |
threshold probability to assign a dosage to offspring. If the probability
is smaller than |
file.type |
indicates whether the characters in the input file are separated by 'white spaces' ("table") or by commas ("csv"). |
verbose |
if |
An object of class mappoly.data which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number of individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs to. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers within the sequence |
seq.ref |
NULL (unused in this type of data) |
seq.alt |
NULL (unused in this type of data) |
all.mrk.depth |
NULL (unused in this type of data) |
geno.dose |
a matrix containing the dosage for each marker (rows)
for each individual (columns). Missing data are represented by
|
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and their equivalence to the redundant ones |
Marcelo Mollinari
Voorrips, R.E., Gort, G. & Vosman, B. (2011) Genotype calling in tetraploid species from bi-allelic marker data using mixture models. _BMC Bioinformatics_. doi:10.1186/1471-2105-12-172
#### Tetraploid Example
ft <- "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/fitpoly.dat"
tempfl <- tempfile()
download.file(ft, destfile = tempfl)
fitpoly.dat <- read_fitpoly(file.in = tempfl, ploidy = 4, parent1 = "P1", parent2 = "P2", verbose = TRUE)
print(fitpoly.dat, detailed = TRUE)
plot(fitpoly.dat)
plot_mrk_info(fitpoly.dat, 37)
Reads an external data file. The format of the file is described in the Details
section. This function creates an object of class mappoly.data
read_geno( file.in, filter.non.conforming = TRUE, elim.redundant = TRUE, verbose = TRUE )
## S3 method for class 'mappoly.data'
print(x, detailed = FALSE, ...)
## S3 method for class 'mappoly.data'
plot(x, thresh.line = 1e-05, ...)
file.in |
a character string with the name of (or full path to) the input file which contains the data to be read |
filter.non.conforming |
if |
elim.redundant |
logical. If |
verbose |
if |
x |
an object of class |
detailed |
if available, print the number of markers per sequence (default = FALSE) |
... |
currently ignored |
thresh.line |
position of a threshold line for p-values of the segregation test (default = 1e-05) |
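As a rough illustration of what a segregation-test p-value is, the sketch below runs a chi-squared goodness-of-fit test in base R on one marker. It is not MAPpoly's internal code: the offspring counts are made up, and the marker is assumed to be simplex x nulliplex in a tetraploid cross, where offspring dosages 0 and 1 are expected in a 1:1 ratio. Using thresh.line to flag the marker is likewise only an illustration; in the plot it marks the position of a line.

```r
# Hypothetical offspring counts for one simplex x nulliplex marker
# (parent P dosage 1, parent Q dosage 0) in a tetraploid full-sib family.
obs <- c(dose0 = 48, dose1 = 52)

# Expected Mendelian ratio for this marker type: 1:1 for dosages 0 and 1.
expected.ratio <- c(0.5, 0.5)

# Chi-squared goodness-of-fit test of the observed counts.
test <- chisq.test(x = obs, p = expected.ratio)
p.value <- test$p.value

# Compare the p-value with the threshold used for the line in the plot.
thresh.line <- 1e-05
distorted <- p.value < thresh.line
print(c(p.value = p.value, distorted = distorted))
```

Markers whose p-values fall below the line deviate strongly from the expected ratio and may deserve a closer look before mapping.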
The first line of the input file contains the string ploidy followed by the ploidy level of the parents.
The second and third lines contain the strings n.ind and n.mrk followed by the number of individuals in
the dataset and the total number of markers, respectively. Lines 4 and 5 contain the strings
mrk.names and ind.names followed by the names of the markers and the names of the individuals,
respectively. Lines 6 and 7 contain the strings dosageP and dosageQ followed by a sequence of numbers
containing the dosage of all markers in parents P and Q. Line 8 contains the string seq followed by
a sequence of integers indicating the chromosome each marker belongs to. It can be any 'a priori'
information regarding the physical distance between markers. For example, these numbers could refer
to chromosomes, scaffolds or even contigs in which the markers are positioned. If this information
is not available for a particular marker, NA should be used. If this information is not available for
any of the markers, the string seq should be followed by a single NA. Line 9 contains the string
seqpos followed by the physical position of the markers within the sequence. The physical position can be
given in any unit of physical genomic distance (base pairs, for instance). However, the user should be
able to make decisions based on these values, such as the occurrence of crossing overs, etc. Line 10
should contain the string nphen followed by the number of phenotypic traits. Line 11 is skipped
(usually used as a spacer). Each of the next lines contains the name of a phenotypic trait, with no space characters,
followed by the phenotypic values. The number of these lines should equal the number of phenotypic traits.
NA represents missing values. Line 12 + nphen is skipped. Finally, the last element is a table
containing the dosage for each marker (rows) for each individual (columns). NA
represents missing values.
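The layout described above can be sketched by writing a tiny file from base R. This is only an illustration: the data are made up, the keyword spellings are copied from the description, and the spacer text on the skipped lines is an assumption. It is worth comparing the result with one of the example files shipped with the package before relying on it.

```r
# Made-up tetraploid dataset: 2 markers (M1, M2), 3 individuals (I1-I3),
# and no phenotypic traits (nphen = 0).
lines <- c(
  "ploidy 4",
  "n.ind 3",
  "n.mrk 2",
  "mrk.names M1 M2",
  "ind.names I1 I2 I3",
  "dosageP 1 2",        # dosage of M1 and M2 in parent P
  "dosageQ 0 1",        # dosage of M1 and M2 in parent Q
  "seq 1 1",            # both markers assigned to sequence 1
  "seqpos 1000 2500",   # physical positions within the sequence
  "nphen 0",
  "pheno-----",         # line 11: skipped (spacer text is an assumption)
  "geno------",         # line 12 + nphen: skipped (spacer text is an assumption)
  "1 0 1",              # dosage of M1 in I1, I2, I3
  "2 NA 3"              # dosage of M2; NA marks a missing value
)

tmp <- tempfile()
writeLines(lines, tmp)

# Read the file back and list the keyword that opens each header line.
first.fields <- vapply(strsplit(readLines(tmp)[1:10], " "), `[`, character(1), 1)
print(first.fields)
```

With nphen = 0 there are no trait lines, so the dosage table starts right after the second skipped line.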
An object of class mappoly.data which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number of individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs to. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers within the sequence |
seq.ref |
NULL (unused in this type of data) |
seq.alt |
NULL (unused in this type of data) |
all.mrk.depth |
NULL (unused in this type of data) |
geno.dose |
a matrix containing the dosage for each marker (rows)
for each individual (columns). Missing data are represented by
|
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and their equivalence to the redundant ones |
Marcelo Mollinari
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
#### Tetraploid Example
fl1 = "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/SolCAP_dosage"
tempfl <- tempfile()
download.file(fl1, destfile = tempfl)
SolCAP.dose <- read_geno(file.in = tempfl)
print(SolCAP.dose, detailed = TRUE)
plot(SolCAP.dose)
Reads an external comma-separated values (CSV) data file. The format of the file is described in the Details
section. This function creates an object of class mappoly.data.
read_geno_csv( file.in, ploidy, filter.non.conforming = TRUE, elim.redundant = TRUE, verbose = TRUE )
file.in |
a character string with the name of (or full path to) the input file containing the data to be read |
ploidy |
the ploidy level |
filter.non.conforming |
if |
elim.redundant |
logical. If |
verbose |
if |
This is an alternative and somewhat more straightforward version of the function
read_geno. The input is a standard CSV file where the rows
represent the markers, except for the first row which is used as a header.
The first five columns contain the marker names, the dosage in parents 1 and 2,
the chromosome information (i.e. chromosome, scaffold, contig, etc.) and the
position of the marker within the sequence. The remaining columns contain
the dosage of the full-sib population. A tetraploid example of such a file
can be found in the Examples section.
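The column layout described above can be sketched with base R alone. The data and the header names below are assumptions made for illustration (the description only fixes the column order), so the file is not guaranteed to match what the package's example CSV uses for its header row.

```r
# Made-up tetraploid data: 3 markers (rows) and 4 offspring individuals.
dat <- data.frame(
  snp_name = c("M1", "M2", "M3"),          # column 1: marker names
  P1 = c(1, 2, 0),                         # column 2: dosage in parent 1
  P2 = c(0, 1, 1),                         # column 3: dosage in parent 2
  sequence = c(1, 1, 2),                   # column 4: chromosome/scaffold/contig
  sequence_position = c(1000, 2500, 400),  # column 5: position within the sequence
  Ind_1 = c(0, 1, 1),                      # remaining columns: offspring dosages
  Ind_2 = c(1, 3, 0),
  Ind_3 = c(1, NA, 1),                     # NA marks a missing dosage
  Ind_4 = c(0, 2, 0)
)

# Write the table with a header row and no row names, then read it back.
csv.file <- tempfile(fileext = ".csv")
write.csv(dat, csv.file, row.names = FALSE)
back <- read.csv(csv.file)
print(back)
```

A file built this way has one header row, one row per marker, and five descriptive columns followed by one column per offspring individual.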
An object of class mappoly.data which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number of individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs to. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers within the sequence |
seq.ref |
NULL (unused in this type of data) |
seq.alt |
NULL (unused in this type of data) |
all.mrk.depth |
NULL (unused in this type of data) |
geno.dose |
a matrix containing the dosage for each marker (rows)
for each individual (columns). Missing data are represented by
|
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and their equivalence to the redundant ones |
Marcelo Mollinari, with minor changes by Gabriel Gesteira
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
#### Tetraploid Example
ft = "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/tetra_solcap.csv"
tempfl <- tempfile()
download.file(ft, destfile = tempfl)
SolCAP.dose <- read_geno_csv(file.in = tempfl, ploidy = 4)
print(SolCAP.dose, detailed = TRUE)
plot(SolCAP.dose)
Reads an external data file. The format of the file is described in the Details
section. This function creates an object of class mappoly.data
read_geno_prob( file.in, prob.thres = 0.95, filter.non.conforming = TRUE, elim.redundant = TRUE, verbose = TRUE )
file.in |
a character string with the name of (or full path to) the input file which contains the data to be read |
prob.thres |
probability threshold to associate a marker call to a
dosage. Markers with maximum genotype probability smaller than |
filter.non.conforming |
if |
elim.redundant |
logical. If |
verbose |
if |
The first line of the input file contains the string ploidy followed by the ploidy level of the parents.
The second and third lines contain the strings n.ind and n.mrk followed by the number of individuals in
the dataset and the total number of markers, respectively. Lines 4 and 5 contain the strings
mrk.names and ind.names followed by the names of the markers and the names of the individuals,
respectively. Lines 6 and 7 contain the strings dosageP and dosageQ followed by a sequence of numbers
containing the dosage of all markers in parents P and Q. Line 8 contains the string seq followed by
a sequence of integers indicating the chromosome each marker belongs to. It can be any 'a priori'
information regarding the physical distance between markers. For example, these numbers could refer
to chromosomes, scaffolds or even contigs in which the markers are positioned. If this information
is not available for a particular marker, NA should be used. If this information is not available for
any of the markers, the string seq should be followed by a single NA. Line 9 contains the string
seqpos followed by the physical position of the markers within the sequence. The physical position can be
given in any unit of physical genomic distance (base pairs, for instance). However, the user should be
able to make decisions based on these values, such as the occurrence of crossing overs, etc. Line 10
should contain the string nphen followed by the number of phenotypic traits. Line 11 is skipped
(usually used as a spacer). Each of the next lines contains the name of a phenotypic trait, with no space characters,
followed by the phenotypic values. The number of these lines should equal the number of phenotypic traits.
NA represents missing values. Line 12 + nphen is skipped. Finally, the last element is a table
containing the probability distribution for each combination of marker and offspring. The first two columns
represent the marker and the offspring, respectively. The remaining elements represent the probability
associated with each one of the possible dosages. NA represents missing data.
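The role of prob.thres can be illustrated with a small base R sketch: a dosage is called only when its probability reaches the threshold, otherwise the call is treated as missing. The probabilities and the helper call_dosage are hypothetical, and returning NA for rows with missing probabilities is an assumption; this is not the package's internal code.

```r
# Hypothetical probabilities for dosages 0..4 in a tetraploid, one row per
# marker-offspring combination.
probs <- rbind(
  c(0.01, 0.97, 0.02, 0.00, 0.00),  # confident call: dosage 1
  c(0.40, 0.55, 0.05, 0.00, 0.00),  # best call below the threshold
  c(NA, NA, NA, NA, NA)             # missing data in the input
)
prob.thres <- 0.95

# Hypothetical helper: most probable dosage, or NA when the maximum
# probability is smaller than the threshold (or the row is missing).
call_dosage <- function(p, thres) {
  if (anyNA(p) || max(p) < thres) return(NA_integer_)
  which.max(p) - 1L  # columns are dosages 0, 1, ..., ploidy
}

dose <- apply(probs, 1, call_dosage, thres = prob.thres)
print(dose)
```

Lowering prob.thres keeps more calls at the cost of accepting less certain dosages.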
An object of class mappoly.data which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs to. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
physical position of the markers within the sequence |
seq.ref |
NULL (unused in this type of data) |
seq.alt |
NULL (unused in this type of data) |
all.mrk.depth |
NULL (unused in this type of data) |
prob.thres |
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' were considered as missing data in the 'geno.dose' matrix |
geno.dose |
a matrix containing the dosage for each marker (rows)
for each individual (columns). Missing data are represented by
|
geno |
a data.frame
containing the probability distribution for each combination of
marker and offspring. The first two columns represent the marker
and the offspring, respectively. The remaining elements represent
the probability associated with each one of the possible
dosages. Missing data are converted from NA to the expected
segregation ratio using function |
n.phen |
number of phenotypic traits |
phen |
a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and their equivalence to the redundant ones |
Marcelo Mollinari
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
#### Hexaploid Example (file 'hexa_sample')
ft = "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/hexa_sample"
tempfl <- tempfile()
download.file(ft, destfile = tempfl)
SolCAP.dose.prob <- read_geno_prob(file.in = tempfl)
print(SolCAP.dose.prob, detailed = TRUE)
plot(SolCAP.dose.prob)
Reads an external VCF file and creates an object of class mappoly.data
read_vcf( file.in, parent.1, parent.2, ploidy = NA, filter.non.conforming = TRUE, thresh.line = 0.05, min.gt.depth = 0, min.av.depth = 0, max.missing = 1, elim.redundant = TRUE, verbose = TRUE, read.geno.prob = FALSE, prob.thres = 0.95 )
file.in |
a character string with the name of (or full path to) the input file which contains the data (VCF format) |
parent.1 |
a character string containing the name of parent 1 |
parent.2 |
a character string containing the name of parent 2 |
ploidy |
the species ploidy (optional, it will be automatically detected) |
filter.non.conforming |
if |
thresh.line |
threshold used for p-values on segregation test (default = 0.05) |
min.gt.depth |
minimum genotype depth to keep information.
If the genotype depth is below |
min.av.depth |
minimum average depth to keep markers (default = 0) |
max.missing |
maximum proportion of missing data to keep markers (range = 0-1; default = 1) |
elim.redundant |
logical. If |
verbose |
if |
read.geno.prob |
If genotypic probabilities are available (PL field),
generates a probability-based dataframe (default = |
prob.thres |
probability threshold to associate a marker call to a
dosage. Markers with maximum genotype probability smaller than |
This function can handle VCF files of version 4.0 or higher. The ploidy can be automatically detected, but it is highly recommended that you provide it so that mismatches can be checked. All individual and marker names will be kept as they are in the VCF file.
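To give an idea of how ploidy and dosage relate to the GT field of a VCF record, the base R sketch below parses a few made-up genotype strings. It assumes biallelic markers, with dosage counted as the number of copies of the alternative allele ("1"). It only illustrates the idea and is not how read_vcf is implemented.

```r
# Hypothetical GT strings for one biallelic marker in four samples;
# "./././." denotes a missing genotype call.
gt <- c("0/0/0/1", "0/1/1/1", "./././.", "0/0/1/1")

# Split each genotype into its alleles ("/" unphased, "|" phased).
alleles <- strsplit(gt, "[/|]")

# The number of alleles per genotype gives the ploidy.
ploidy <- unique(lengths(alleles))

# Dosage: number of copies of the alternative allele, NA for missing calls.
dosage <- vapply(alleles, function(a) {
  if (any(a == ".")) NA_integer_ else sum(a == "1")
}, integer(1))

print(ploidy)
print(dosage)
```

If unique(lengths(alleles)) returned more than one value, the samples would disagree on ploidy, which is the kind of mismatch that providing the ploidy argument helps to catch.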
An object of class mappoly.data which contains a
list with the following components:
ploidy |
ploidy level |
n.ind |
number of individuals |
n.mrk |
total number of markers |
ind.names |
the names of the individuals |
mrk.names |
the names of the markers |
dosage.p1 |
a vector containing the dosage in
parent P for all |
dosage.p2 |
a vector containing the dosage in
parent Q for all |
chrom |
a vector indicating which sequence each marker belongs to. Zero indicates that the marker was not assigned to any sequence |
genome.pos |
Physical position of the markers within the sequence |
seq.ref |
Reference base used for each marker (i.e. A, T, C, G) |
seq.alt |
Alternative base used for each marker (i.e. A, T, C, G) |
prob.thres |
(unused field) |
geno.dose |
a matrix containing the dosage for each marker (rows)
for each individual (columns). Missing data are represented by
|
geno |
a dataframe containing all genotypic probabilities columns for each
marker and individual combination (rows). Missing data are represented by
|
nphen |
(unused field) |
phen |
(unused field) |
all.mrk.depth |
DP information for all markers on VCF file |
chisq.pval |
a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers |
kept |
if elim.redundant = TRUE, holds all non-redundant markers |
elim.correspondence |
if elim.redundant = TRUE, holds all non-redundant markers and their equivalence to the redundant ones |
Gabriel Gesteira
Mollinari M., Olukolu B. A., Pereira G. da S., Khan A., Gemenet D., Yencho G. C., Zeng Z-B. (2020), Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
## Hexaploid sweetpotato: Subset of chromosome 3
fl = "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/sweet_sample_ch3.vcf.gz"
tempfl <- tempfile(pattern = 'chr3_', fileext = '.vcf.gz')
download.file(fl, destfile = tempfl)
dat.dose.vcf = read_vcf(file.in = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2")
print(dat.dose.vcf)
plot(dat.dose.vcf)
This function re-estimates the recombination fractions between all markers in a given map.
reest_rf( input.map, input.mat = NULL, tol = 0.01, phase.config = "all", method = c("hmm", "ols", "wMDS_to_1D_pc"), weight = TRUE, verbose = TRUE, high.prec = FALSE, max.rf.to.break.EM = 0.5, input.mds = NULL )
input.map |
An object of class |
input.mat |
An object of class |
tol |
tolerance for determining convergence (default = 10e-03) |
phase.config |
which phase configuration should be used. "best" will choose the maximum likelihood configuration; note that the usage above sets "all" as the default |
method |
indicates whether to use |
weight |
if |
verbose |
if |
high.prec |
logical. If |
max.rf.to.break.EM |
for internal use only. |
input.mds |
An object of class |
An updated object of class mappoly.pcmap whose order was used in the input.map
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Stam P (1993) Construction of integrated genetic-linkage maps by means of a new computer package: Joinmap. _Plant J_ 3:739–744 doi:10.1111/j.1365-313X.1993.00739.x
Provides the reverse of a given map.
rev_map(input.map)
input.map |
an object of class |
Marcelo Mollinari, [email protected]
plot_genome_vs_map(solcap.mds.map[[1]])
plot_genome_vs_map(rev_map(solcap.mds.map[[1]]))
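To illustrate what reversing a map means for marker positions, here is a small base R sketch. It assumes a map represented simply as a named vector of cumulative positions in cM, which is a simplification and not the internal structure of a mappoly.map object; the marker names and positions are made up.

```r
# Hypothetical map: cumulative positions (cM) for five markers
pos <- c(M1 = 0, M2 = 1.2, M3 = 3.5, M4 = 7.0, M5 = 7.8)

# Reverse the marker order and measure distances from the other end
rev.pos <- max(pos) - rev(pos)
rev.pos

# Distances between adjacent markers are preserved, in reverse order
all.equal(unname(diff(rev.pos)), unname(rev(diff(pos))))
```

The total map length is unchanged; only the orientation differs, which is useful when comparing a map against a reference genome that runs in the opposite direction.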
Transforms the recombination fraction list contained in an object of class mappoly.twopt or mappoly.twopt2 into a recombination fraction matrix.
rf_list_to_matrix(
  input.twopt,
  thresh.LOD.ph = 0,
  thresh.LOD.rf = 0,
  thresh.rf = 0.5,
  ncpus = 1L,
  shared.alleles = FALSE,
  verbose = TRUE
)

## S3 method for class 'mappoly.rf.matrix'
print(x, ...)

## S3 method for class 'mappoly.rf.matrix'
plot(
  x,
  type = c("rf", "lod"),
  ord = NULL,
  rem = NULL,
  main.text = NULL,
  index = FALSE,
  fact = 1,
  ...
)
input.twopt |
an object of class |
thresh.LOD.ph |
LOD score threshold for linkage phase configurations (default = 0) |
thresh.LOD.rf |
LOD score threshold for recombination fractions (default = 0) |
thresh.rf |
the threshold used for recombination fraction filtering (default = 0.5) |
ncpus |
number of parallel processes (i.e. cores) to spawn (default = 1) |
shared.alleles |
if |
verbose |
if |
x |
an object of class |
... |
currently ignored |
type |
type of matrix that should be printed. Can be one of the
following: |
ord |
the order in which the markers should be plotted (default = NULL) |
rem |
which markers should be removed from the heatmap (default = NULL) |
main.text |
a character string as the title of the heatmap (default = NULL) |
index |
|
fact |
positive integer. factor expressed as number of cells to be aggregated (default = 1, no aggregation) |
thresh.LOD.ph should be set in order to select only recombination fractions whose linkage phase configuration has a LOD score higher than thresh.LOD.ph when compared to the second most likely linkage phase configuration.
A list containing two matrices. The first one contains the filtered recombination fraction and the second one contains the information matrix
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
all.mrk <- make_seq_mappoly(hexafake, 1:20)
red.mrk <- elim_redundant(all.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
all.pairs <- est_pairwise_rf(input.seq = unique.mrks, ncpus = 1, verbose = TRUE)
## Full recombination fraction matrix
mat.full <- rf_list_to_matrix(input.twopt = all.pairs)
plot(mat.full)
plot(mat.full, type = "lod")
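The effect of the three thresholds can be sketched in base R on a toy table of marker pairs. The data frame layout, the column names (m1, m2, rf, LOD_rf, LOD_ph) and the helper to_matrix_sketch are hypothetical and do not mirror the mappoly.twopt structure; treating the comparisons as inclusive is also an assumption.

```r
# Hypothetical pairwise results: one row per marker pair
pairs <- data.frame(
  m1     = c("M1", "M1", "M2"),
  m2     = c("M2", "M3", "M3"),
  rf     = c(0.05, 0.40, 0.10),
  LOD_rf = c(8.0, 0.5, 4.0),
  LOD_ph = c(6.0, 1.0, 0.2),   # LOD of best phase vs. second best
  stringsAsFactors = FALSE
)

to_matrix_sketch <- function(pairs, thresh.LOD.ph = 0,
                             thresh.LOD.rf = 0, thresh.rf = 0.5) {
  mrk <- unique(c(pairs$m1, pairs$m2))
  mat <- matrix(NA_real_, length(mrk), length(mrk),
                dimnames = list(mrk, mrk))
  # Keep only pairs that satisfy all three criteria
  keep <- pairs$LOD_ph >= thresh.LOD.ph &
    pairs$LOD_rf >= thresh.LOD.rf &
    pairs$rf <= thresh.rf
  ok <- pairs[keep, ]
  # Fill both triangles so the matrix is symmetric
  mat[cbind(ok$m1, ok$m2)] <- ok$rf
  mat[cbind(ok$m2, ok$m1)] <- ok$rf
  mat
}

m.full <- to_matrix_sketch(pairs)
m.filt <- to_matrix_sketch(pairs, thresh.LOD.ph = 2,
                           thresh.LOD.rf = 2, thresh.rf = 0.3)
m.filt
```

With the stricter thresholds only the M1/M2 pair survives; the other cells stay NA, which is how filtered-out pairs appear as blank cells in a heatmap.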
Removes markers that do not meet the LOD and recombination fraction criteria for at least a given percentage of the pairwise marker combinations. It also removes markers with strong evidence of linkage across the whole linkage group (false positives).
rf_snp_filter( input.twopt, thresh.LOD.ph = 5, thresh.LOD.rf = 5, thresh.rf = 0.15, probs = c(0.05, 1), diag.markers = NULL, mrk.order = NULL, ncpus = 1L, diagnostic.plot = TRUE, breaks = 100 )
input.twopt |
an object of class |
thresh.LOD.ph |
LOD score threshold for linkage phase configuration (default = 5) |
thresh.LOD.rf |
LOD score threshold for recombination fraction (default = 5) |
thresh.rf |
threshold for recombination fractions (default = 0.15) |
probs |
indicates the probability corresponding to the filtering quantiles. (default = c(0.05, 1)) |
diag.markers |
A window where marker pairs should be considered. If NULL (default), all markers are considered. |
mrk.order |
marker order. Only has effect if 'diag.markers' is not NULL |
ncpus |
number of parallel processes (i.e. cores) to spawn (default = 1) |
diagnostic.plot |
if |
breaks |
number of cells for the histogram |
thresh.LOD.ph should be set in order to select only recombination fractions whose linkage phase configuration has a LOD score higher than thresh.LOD.ph when compared to the second most likely linkage phase configuration. That action usually eliminates markers that are unlinked to the set of analyzed markers.
A filtered object of class mappoly.sequence. See make_seq_mappoly for details.
Marcelo Mollinari, [email protected] with updates by Gabriel Gesteira, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
all.mrk <- make_seq_mappoly(hexafake, 1:20)
red.mrk <- elim_redundant(all.mrk)
unique.mrks <- make_seq_mappoly(red.mrk)
all.pairs <- est_pairwise_rf(input.seq = unique.mrks, ncpus = 1, verbose = TRUE)
## Full recombination fraction matrix
mat.full <- rf_list_to_matrix(input.twopt = all.pairs)
plot(mat.full)
## Removing disruptive SNPs
tpt.filt <- rf_snp_filter(all.pairs, 2, 2, 0.07, probs = c(0.15, 1))
p1.filt <- make_pairs_mappoly(input.seq = tpt.filt, input.twopt = all.pairs)
m1.filt <- rf_list_to_matrix(input.twopt = p1.filt)
plot(mat.full, main.text = "LG1")
plot(m1.filt, main.text = "LG1.filt")
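One way to picture the quantile-based filtering controlled by probs is the base R sketch below. It is only an illustration under stated assumptions: the toy recombination fraction matrix is made up (five linked markers plus one unlinked marker, M6), and counting, per marker, the pairs with rf at or below thresh.rf is an assumed criterion, not a description of the package internals.

```r
# Toy rf matrix: five markers spaced 2 cM apart (Haldane map function)
# plus one unlinked marker (rf = 0.5 with every other marker)
pos <- c(0, 2, 4, 6, 8)
d <- abs(outer(pos, pos, "-"))
mrk <- paste0("M", 1:6)
rf <- matrix(0.5, 6, 6, dimnames = list(mrk, mrk))
rf[1:5, 1:5] <- 0.5 * (1 - exp(-2 * d / 100))
diag(rf) <- NA

thresh.rf <- 0.05
probs <- c(0.2, 1)

# For each marker, count the pairs that meet the rf criterion
n.close <- rowSums(rf <= thresh.rf, na.rm = TRUE)

# Keep markers whose count falls inside the chosen quantile range
q <- unname(quantile(n.close, probs = probs))
keep <- n.close >= q[1] & n.close <= q[2]
kept.markers <- names(n.close)[keep]
kept.markers
```

Here the unlinked marker has no pair meeting the criterion, so it falls below the lower quantile and is dropped. Lowering the upper value of probs below 1 would likewise drop markers that appear linked to suspiciously many others.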
Computes the polysomic segregation frequency given a ploidy level and the dosage of the locus in both parents. It does not consider double reduction.
segreg_poly(ploidy, dP, dQ)
ploidy |
the ploidy level |
dP |
the dosage in parent P |
dQ |
the dosage in parent Q |
a vector containing the expected segregation frequency for all possible genotypic classes.
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
Serang O, Mollinari M, Garcia AAF (2012) Efficient Exact Maximum a Posteriori Computation for Bayesian SNP Genotyping in Polyploids. _PLoS ONE_ 7(2): e30906.
# autohexaploid with two and three doses in parents P and Q,
# respectively
seg <- segreg_poly(ploidy = 6, dP = 2, dQ = 3)
barplot(seg, las = 2)
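The expected frequencies under polysomic inheritance without double reduction can also be derived from first principles in base R: each parent's gamete dosage follows a hypergeometric distribution, and the offspring dosage is the sum of the two gamete dosages. The helper segreg_sketch below is a hypothetical illustration of that reasoning, not the package's implementation.

```r
segreg_sketch <- function(ploidy, dP, dQ) {
  g <- ploidy / 2          # number of homologs transmitted per gamete
  k <- 0:g                 # possible gamete dosages
  # Gamete dosage: draw g homologs out of `ploidy`, of which dP (or dQ)
  # carry the alternate allele
  gam.p <- dhyper(k, dP, ploidy - dP, g)
  gam.q <- dhyper(k, dQ, ploidy - dQ, g)
  # Offspring dosage is the sum of both gamete dosages, so its
  # distribution is the convolution of the two gamete distributions
  joint <- as.vector(outer(gam.p, gam.q))
  dose <- factor(as.vector(outer(k, k, "+")), levels = 0:ploidy)
  freq <- tapply(joint, dose, sum)
  setNames(as.numeric(freq), 0:ploidy)
}

seg.sketch <- segreg_sketch(ploidy = 6, dP = 2, dQ = 3)
round(seg.sketch, 3)
```

The frequencies sum to one, and dosage classes that cannot occur (here, dosage 6, since parent P can transmit at most two copies) receive probability zero.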
Simulate two homology groups (one for each parent) and their linkage phase configuration.
sim_homologous(ploidy, n.mrk, prob.dose = NULL, seed = NULL)
ploidy |
ploidy level. Must be an even number |
n.mrk |
number of markers |
prob.dose |
a vector indicating the proportion of markers for different dosage to be simulated (default = NULL) |
seed |
random number generator seed |
This function prevents the simulation of linkage phase configurations that are impossible to estimate via two-point methods.
a list containing the following components:
hom.allele.p |
a list of vectors
containing linkage phase configurations. Each vector contains the
numbers of the homologous chromosomes in which the alleles are
located. For instance, a vector containing |
p |
contains the indices of the starting positions of the
dosages, considering that the vectors contained in |
hom.allele.q |
Analogously to |
q |
Analogously to |
ploidy |
ploidy level |
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
h.temp <- sim_homologous(ploidy = 6, n.mrk = 20)
tetra.solcap
A list containing 12 linkage groups estimated using genomic order and dosage call
solcap.dose.map
A list containing 12 objects of class mappoly.map, each one representing one linkage group in the tetra.solcap dataset.
tetra.solcap
A list containing 12 linkage groups estimated using genomic order, dosage call and global calling error
solcap.err.map
A list containing 12 objects of class mappoly.map, each one representing one linkage group in the tetra.solcap dataset.
tetra.solcap
A list containing 12 linkage groups estimated using mds_mappoly order and dosage call
solcap.mds.map
A list containing 12 objects of class mappoly.map, each one representing one linkage group in the tetra.solcap dataset.
tetra.solcap.geno.dist
A list containing 12 linkage groups estimated using genomic order and prior probability distribution
solcap.prior.map
A list containing 12 objects of class mappoly.map, each one representing one linkage group in the tetra.solcap.geno.dist dataset.
The function splits the input map in sub-maps given a distance threshold of neighboring markers and evaluates alternative phases between the sub-maps.
split_and_rephase( input.map, twopt, gap.threshold = 5, size.rem.cluster = 1, phase.config = "best", thres.twopt = 3, thres.hmm = "best", tol.merge = 0.001, tol.final = 0.001, verbose = TRUE )
input.map |
an object of class |
twopt |
an object of class |
gap.threshold |
distance threshold between neighboring markers at which the map should be split. The default value is 5 cM |
size.rem.cluster |
the size of the marker cluster (in number of markers) from which the cluster should be removed. The default value is 1 |
phase.config |
which phase configuration should be used. "best" (default) will choose the maximum likelihood phase configuration |
thres.twopt |
the threshold used to determine if the linkage phases compared via two-point analysis should be considered for the search space reduction (default = 3) |
thres.hmm |
the threshold used to determine which linkage phase configurations should be returned when merging two maps. If "best" (default), returns only the best linkage phase configuration. NOTE: if merging multiple maps, it always uses the "best" linkage phase configuration at each block insertion. |
tol.merge |
the desired accuracy for merging maps (default = 10e-04) |
tol.final |
the desired accuracy for the final map (default = 10e-04) |
verbose |
if |
An object of class mappoly.map
Marcelo Mollinari, [email protected]
Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378
map <- get_submap(solcap.dose.map[[1]], 1:20, verbose = FALSE)
tpt <- est_pairwise_rf(make_seq_mappoly(map))
new.map <- split_and_rephase(map, tpt, 1, 1)
map
new.map
plot_map_list(list(old.map = map, new.map = new.map), col = "ggstyle")
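The splitting step governed by gap.threshold and size.rem.cluster can be pictured with a short base R sketch. It assumes a map reduced to a vector of made-up cumulative positions in cM, and it assumes that clusters with size.rem.cluster markers or fewer are discarded; the re-phasing between sub-maps, which is the core of split_and_rephase, is not reproduced here.

```r
# Hypothetical marker positions (cM) with three large gaps
pos <- c(0, 1, 2.5, 9, 9.5, 10, 17, 24, 25)
gap.threshold <- 5
size.rem.cluster <- 1

# Start a new cluster wherever the gap to the previous marker
# exceeds the threshold
cluster <- cumsum(c(TRUE, diff(pos) > gap.threshold))

# Drop clusters that are too small (assumed: size <= size.rem.cluster)
sizes <- table(cluster)
keep <- as.vector(sizes[as.character(cluster)]) > size.rem.cluster

# Remaining sub-maps, one vector of positions per cluster
sub.maps <- split(pos[keep], cluster[keep])
sub.maps
```

The isolated marker at 17 cM forms a cluster of size one and is removed; the three remaining sub-maps are the blocks whose relative phases would then be re-evaluated.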
This function generates a brief summary table of a list of mappoly.map objects.
summary_maps(map.list, verbose = TRUE)
map.list |
a list of objects of class |
verbose |
if |
a data frame containing a brief summary of all maps contained in map.list
Gabriel Gesteira, [email protected]
tetra.sum <- summary_maps(solcap.err.map)
tetra.sum
A dataset of the B2721 population, derived from a cross between two tetraploid potato varieties: Atlantic × B1829-5. The population comprises 160 offspring genotyped with the SolCAP Infinium 8303 potato array. The original data set can be found on [The Solanaceae Coordinated Agricultural Project (SolCAP) webpage](http://solcap.msu.edu/potato_infinium.shtml). The dataset also contains the genomic order of the SNPs from the Solanum tuberosum genome version 4.03. The genotype calling was performed using the fitPoly R package.
tetra.solcap
An object of class mappoly.data which contains a list with the following components:
ploidy level = 4
number individuals = 160
total number of markers = 4017
the names of the individuals
the names of the markers
a vector containing the dosage in parent P for all n.mrk markers
a vector containing the dosage in parent Q for all n.mrk markers
a vector indicating the chromosome each marker belongs to. Zero indicates that the marker was not assigned to any sequence
Physical position of the markers in the sequence
a matrix containing the dosage for each marker (rows) for each individual (columns). Missing data are represented by ploidy_level + 1 = 5
There are no phenotypes in this simulation
There are no phenotypes in this simulation
vector containing p-values for all markers associated to the chi-square test for the expected segregation patterns under Mendelian segregation
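Because missing dosages are coded as ploidy_level + 1, it can be convenient to recode them as NA before computing summaries. The base R sketch below uses a made-up dosage matrix with markers in rows and individuals in columns; it does not access the actual tetra.solcap object.

```r
ploidy <- 4

# Hypothetical dosage matrix: 2 markers (rows) x 4 individuals (columns)
geno <- matrix(c(0, 1, 5, 2,
                 1, 5, 2, 4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("mrk1", "mrk2"), paste0("ind", 1:4)))

# Recode the missing-data code (ploidy + 1) as NA
geno[geno == ploidy + 1] <- NA

# Proportion of missing calls per marker
miss.rate <- rowMeans(is.na(geno))
miss.rate
```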
A dataset of the B2721 population, derived from a cross between
two tetraploid potato varieties: Atlantic × B1829-5. The population comprises 160
offspring genotyped with the SolCAP Infinium 8303 potato array. The original data
set can be found on [The Solanaceae Coordinated Agricultural Project (SolCAP) webpage](http://solcap.msu.edu/potato_infinium.shtml).
The dataset also contains the genomic order of the SNPs from the Solanum
tuberosum genome version 4.03. The genotype calling was performed using the
fitPoly R package. Although this dataset contains the
probability distribution of the genotypes,
it is essentially the same dataset found in tetra.solcap.
tetra.solcap.geno.dist
An object of class mappoly.data which contains a list with the following components:
ploidy level = 4
number individuals = 160
total number of markers = 4017
the names of the individuals
the names of the markers
a vector containing the dosage in parent P for all n.mrk markers
a vector containing the dosage in parent Q for all n.mrk markers
a vector indicating which sequence each marker belongs to. Zero indicates that the marker was not assigned to any sequence
Physical position of the markers in the sequence
probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' are considered as missing data for the dosage calling purposes
a data.frame containing the probability distribution for each combination of marker and offspring. The first two columns represent the marker and the offspring, respectively. The remaining elements represent the probability associated to each one of the possible dosages
a matrix containing the dosage for each marker (rows) for each individual (columns). Missing data are represented by ploidy_level + 1 = 5
There are no phenotypes in this simulation
There are no phenotypes in this simulation
Adds markers that are informative in both parents using the HMM approach, evaluating the difference in LOD and gap size.
update_framework_map( input.map.list, input.seq, twopt, thres.twopt = 10, init.LOD = 30, verbose = TRUE, method = "hmm", input.mds = NULL, max.rounds = 50, size.rem.cluster = 2, gap.threshold = 4 )
input.map.list |
list containing three |
input.seq |
object of class |
twopt |
object of class |
thres.twopt |
the LOD threshold used to determine if the linkage phases compared via two-point analysis should be considered for the search space reduction (default = 10) |
init.LOD |
the LOD threshold used to determine if the marker will be included or not after hmm analysis (default = 30) |
verbose |
If TRUE (default), current progress is shown; if FALSE, no output is produced |
method |
indicates whether to use 'hmm' (Hidden Markov Models), 'ols' (Ordinary Least Squares) or 'wMDS_to_1D_pc' (weighted MDS followed by fitting a one dimensional principal curve) to re-estimate the recombination fractions after adding markers |
input.mds |
An object of class |
max.rounds |
integer defining number of times to try to fit the remaining markers in the sequence |
size.rem.cluster |
minimum number of markers that a segment must contain, after a gap is removed, for the segment to be kept in the sequence |
gap.threshold |
threshold for gap size |
object of class mappoly.map2
Marcelo Mollinari, [email protected] with documentation and minor modifications by Cristiane Taniguti [email protected]
This function takes an object of class mappoly.map and checks for removed redundant markers in the original dataset. Once redundant markers are found, they are re-added to the map in their respective equivalent positions and another HMM round is performed.
update_map(input.maps, verbose = TRUE)
input.maps |
a single map or a list of maps of class |
verbose |
if TRUE (default), shows information about each update process |
an updated map (or list of maps) of class mappoly.map, containing the original map(s) plus redundant markers
Gabriel Gesteira, [email protected]
orig.map <- solcap.err.map
up.map <- lapply(solcap.err.map, update_map)
summary_maps(orig.map)
summary_maps(up.map)