There is a lack of consensus on how next-generation sequence (NGS) data should be considered for phylogenetic and phylogeographic estimates, with some studies excluding loci with missing data, whereas others include them, even when sequences are missing from a large number of individuals. Here, we use simulations, focusing specifically on RAD (Restriction site Associated DNA) sequences, to highlight some of the unforeseen consequences of excluding missing data from next-generation sequencing. Specifically, we show that in addition to the obvious effects associated with reducing the amount of data used to make historical inferences, the decisions we make about missing data (such as the minimum number of individuals with a sequence for a locus to be included in the study) also impact the types of loci sampled for a study. In particular, as the tolerance for missing data becomes more stringent, the mutational spectrum represented in the sampled loci becomes truncated such that loci with the highest mutation rates are disproportionately excluded. This effect is exacerbated further by factors involved in the preparation of the genomic library (i.e., the use of reduced representation libraries, as well as the coverage) and the taxonomic diversity represented in the library (i.e., the level of divergence among the individuals). We demonstrate that the intuitive appeal of being conservative by removing loci may be misguided.
Multilocus data sets now dominate phylogenetic studies, spurred by shifts in the technologies used to gather sequence data, as well as the general recognition of the value of multiple independent loci for phylogenetic study (Pamilo and Nei 1988; Cummings et al.). Next-generation sequencing technologies, RAD sequencing (RADseq; Baird et al. 2008) in particular, allow researchers to collect unprecedented amounts of multilocus sequence data irrespective of whether the taxa have any preexisting genomic resources. However, accompanying the dramatic increases in the amount of genomic data that can be readily collected across multiple species are also much larger amounts of missing data (e.g., Rubin et al. ... 2013) compared with traditional Sanger sequencing, which amplifies and generates data for each locus and individual separately.
Moreover, the nature of the missing data also differs. In studies employing traditional sequencing approaches, decisions about missing data tend to focus on whether to delete a taxon from a data matrix (Roure et al.). For example, in supermatrix studies with mixed representation of loci across taxa, the concern is whether species with limited sequence data across loci (e.g., a few mitochondrial markers) would lead to a poorly resolved phylogeny (Bininda-Emonds et al.). In contrast, the primary decision with data generated with next-generation sequencing methods is whether to delete a locus from a data matrix because of missing sequences across the individuals in a study.
With a finite number of sequencing reads spread across multiple individuals in next-generation sequencing data sets, there can be large variation among loci in the amount of missing data (Fig.). For example, an Illumina HiSeq run may generate 140 million reads. However, when those reads are spread across individuals and across loci, just by chance, each locus will have missing sequences in some individuals (even if individuals have equal concentrations of genomic DNA; Fig.).
Because of the technologies involved in constructing reduced representation libraries, missing data are also expected to be nonrandomly distributed across species, with the amount of missing data proportional to the genetic distance between taxa. For example, restriction enzymes are often used to construct the reduced representation libraries in RADtag protocols (also known as RADseq; Baird et al.). Some taxa might have null alleles generated either by mutations at enzyme-cutting sites (Fig. 1a) or by newly mutated enzyme-cutting sites that turn larger fragments into smaller ones that fall out of the size-selection range. The likelihood of such mutations will depend on the overall sequence similarity of species (Rubin et al.). Even for protocols that generate missing data on a much smaller scale, such as ultra-conserved elements (UCEs), the probability that the probes will work differs as a function of the evolutionary distance among species (Faircloth et al.).
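The argument that a finite pool of reads leaves some individuals without a sequence at some loci purely by chance can be illustrated with a short calculation. The sketch below is not the simulation used in this study. It assumes, for illustration only, that reads fall on individual-by-locus cells independently and with equal probability, so that read depth per cell is approximately Poisson. The function names, the number of individuals, the number of loci, and the minimum depth are all hypothetical; only the figure of 140 million reads comes from the text.

```python
import math


def prob_cell_missing(total_reads, n_individuals, n_loci, min_depth):
    """Probability that one individual has fewer than `min_depth` reads at one locus.

    Assumption (illustrative): reads are spread evenly and independently over
    all individual-by-locus cells, so depth per cell is Poisson distributed.
    """
    mean_depth = total_reads / (n_individuals * n_loci)
    # Poisson cumulative probability of observing 0 .. min_depth - 1 reads.
    return sum(
        math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)
        for k in range(min_depth)
    )


def prob_locus_complete(p_missing, n_individuals):
    """Probability that every individual is sequenced at a given locus."""
    return (1.0 - p_missing) ** n_individuals


if __name__ == "__main__":
    total_reads = 140_000_000   # from the text: one Illumina HiSeq run
    n_individuals = 96          # hypothetical
    min_depth = 6               # hypothetical minimum depth to call a sequence

    for n_loci in (50_000, 100_000, 200_000):  # hypothetical numbers of loci
        p = prob_cell_missing(total_reads, n_individuals, n_loci, min_depth)
        complete = prob_locus_complete(p, n_individuals)
        print(f"loci={n_loci:>7}  P(cell missing)={p:.4f}  "
              f"P(locus complete in all individuals)={complete:.4f}")
```

Under these assumptions, missing data arise even when every individual contributes equal amounts of DNA, and the chance that a locus is complete across all individuals falls as the same reads are divided among more loci. Real libraries likely vary more than a Poisson model allows, so these values are best read as an idealized baseline.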