Our exciting new R package dartR, for facilitating analysis of SNP data generated from reduced representation genome sequencing, appeared recently in Molecular Ecology Resources and has had an upgrade.
We are pleased to announce that version 1.0.5 has now been uploaded to CRAN, and has a number of new features. Among them:
- Filters that take out individuals or populations now recalculate the locus metadata provided by DArT Pty Ltd, where these are affected (e.g. CallRate);
- The New Hybrids conversion script now optionally runs New Hybrids, provided you have it installed, and gives improved output of results;
- A new script has been added for assignment of individuals of unknown provenance to populations;
- A new script has been added to create a nexus file for conducting phylogenetic analysis of populations using PAUP's SVDQuartets analysis;
- The fixed difference analysis includes code for generating a p value for the observed number of fixed differences compared with those expected under simulation, for a given pair of sample sizes.
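To see why locus metadata such as CallRate must be recalculated after individuals are removed, consider the following minimal sketch in base R. It does not use dartR itself; the toy matrix `geno` and the helper `call_rate` are made up for illustration, and call rate is assumed here to mean the proportion of individuals with a non-missing genotype at each locus.

```r
# Toy genotype matrix: rows are individuals, columns are loci.
# Genotypes are coded 0, 1 or 2; NA marks a missing call.
geno <- matrix(
  c( 0,  1, NA,
     2, NA, NA,
     1,  1,  0,
    NA, NA, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("loc", 1:3))
)

# Hypothetical helper: per-locus call rate, i.e. the proportion of
# individuals with a non-missing genotype at that locus.
call_rate <- function(g) colMeans(!is.na(g))

# Call rate computed on the full dataset.
cr_before <- call_rate(geno)

# Drop the individual with no calls at all, then recompute.
kept <- geno[rownames(geno) != "ind4", , drop = FALSE]
cr_after <- call_rate(kept)

print(rbind(before = cr_before, after = cr_after))
```

The call rates stored with the original data no longer describe the filtered dataset, so any later filter on call rate would act on stale values unless they are recomputed.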
The University of Canberra has a special relationship with a company called Diversity Arrays Technology (DArT) that specialises in genotyping by sequencing, particularly for agriculture and plant breeding. DArT, co-located with the Institute for Applied Ecology (IAE), has recently moved into applying its wares in the animal domain, and has been transformational in the capacity it brings to the IAE and its genetics and genomics team.
In the interests of painless exploratory analysis, Bernd Gruber, Arthur Georges, Peter Unmack and Olly Berry have constructed an R package {dartR} for
- loading DArT™ SNP and SilicoDArT data generated from the commercial service provided by Diversity Arrays Technology Pty Ltd;
- applying filters to those data based on locus metadata such as call rate, information content or reproducibility;
- assigning individuals to populations and selecting subsets of individuals or populations;
- visualization using Principal Coordinates Analysis (PCoA); and
- providing a conduit to a range of standard data formats and R packages for analysis.
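In practice, the steps listed above might look roughly like the sketch below. It is an illustration only, not taken from the package documentation. It assumes dartR is installed and uses the example dataset `testset.gl` that ships with the package. The function names and arguments (`gl.read.dart`, `gl.filter.callrate`, `gl.pcoa`, `gl.pcoa.plot`) should be checked against the help pages of your installed version, since they may differ between releases. The file names in the comment are hypothetical and the 0.95 threshold is arbitrary.

```r
# Only run the sketch when dartR is actually installed.
dartr_available <- requireNamespace("dartR", quietly = TRUE)

if (dartr_available) {
  library(dartR)

  # With your own data you would start from a DArT report, for example:
  # gl <- gl.read.dart(filename = "report.csv", ind.metafile = "ind_metrics.csv")
  # (both file names are hypothetical). Here the bundled example data is used.
  gl <- testset.gl

  # Keep loci called in at least 95% of individuals (arbitrary threshold).
  gl <- gl.filter.callrate(gl, method = "loc", threshold = 0.95)

  # Ordinate the individuals and plot the first two axes.
  pc <- gl.pcoa(gl)
  gl.pcoa.plot(pc, gl)
}
```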
In most cases, the scripts in {dartR} are simple wrappers for scripts included in other already available packages, to provide transparent access to these packages for analyzing DArT data, and to provide some enhanced output diagnostics. Relatively few scripts provide novel analyses. We make no apologies for this, as the objective of {dartR} is to provide fundamental tools for accessing and manipulating DArT datafiles in preparation for analysis by the vast suite of packages available in R through the CRAN repository.
A summary of the capabilities of {dartR} is as follows:
- Intelligent interpretation and input of DArT comma-delimited files to a compact genlight form of the R {adegenet} package.
- Filtering loci and individuals on criteria drawn from the DArT locus metadata (such as repAvg, AvgPIC) or on computed statistics (such as call rate).
- Relabelling individuals and recoding populations into new aggregations, and deleting selected individuals or populations.
- Visualization using Principal Coordinates Analysis (PCoA) and Neighbour-joining trees.
- Translation to other analysis software and R packages (e.g. NewHybrids), to other {adegenet} objects (e.g. genind), and to standard data formats (e.g. FASTA).
- A few specific analyses not available elsewhere (e.g. fixed difference analysis, assignment analysis).
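As an illustration of what a fixed difference analysis counts, here is a simplified base R sketch. It is not dartR's implementation. It assumes genotypes coded as the number of copies of the alternate allele (0, 1 or 2) and treats a locus as a fixed difference when one population carries only the reference allele and the other only the alternate allele. The toy data and the helper `alt_freq` are invented for the example.

```r
# Toy data: rows are individuals, columns are loci; NA is a missing call.
geno <- matrix(
  c(0,  0, 2, 1,
    0,  0, 2, 0,
    2,  0, 0, 2,
    2, NA, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4))
)
pop <- c("A", "A", "B", "B")

# Hypothetical helper: alternate-allele frequency per locus,
# ignoring missing genotypes.
alt_freq <- function(g) colMeans(g, na.rm = TRUE) / 2

fA <- alt_freq(geno[pop == "A", , drop = FALSE])
fB <- alt_freq(geno[pop == "B", , drop = FALSE])

# A locus is a fixed difference when the populations share no alleles:
# frequency 0 in one population and 1 in the other.
fixed <- (fA == 0 & fB == 1) | (fA == 1 & fB == 0)
n_fixed <- sum(fixed, na.rm = TRUE)

print(names(which(fixed)))
print(n_fixed)
```

With small samples, a locus can appear fixed by chance even when both populations are polymorphic, which is why comparing the observed count with the count expected under simulation for the given sample sizes is useful.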
The package remains under development, and a beta release is available at https://github.com/green-striped-gecko/dartR. There you will find installation instructions and other guidance. The package dartR (version 1.0.5) is now available on CRAN (install.packages("dartR")).