Phylogeographic patterns of Capreolus capreolus in the centre of the Iberian peninsula

Phylogeographic patterns of Capreolus capreolus in the centre of the Iberian peninsula. One hundred and one samples of muscle tissue were obtained from roe deer in the centre of the Iberian peninsula. We compared the sequences of the control region (D–loop) of the mitochondrial DNA of these samples with those obtained in previous studies. Adding the information from microsatellite markers and derived genetic parameters to study the population structure, we found a philopatric structure, with females maintaining mitochondrial haplotype diversity, while males showed a pattern of genome homogenization. The population can thus be considered panmictic. Different times of palaeohistory of the species may explain these results: glacial–interglacial stages of the Pleistocene and the reduction and recovery of populations throughout the 20th century.


Introduction
The roe deer (Capreolus capreolus) was part of the fauna of the Iberian peninsula during the last glaciations (Gliozzi et al., 1997;Hufthammer and Aaris-Sørensen, 1998).Some authors claim that the populations followed different routes of dispersal and recolonization toward northern Europe after the Ice Age (Hewitt, 1999), while others (Lorenzini and Lovari, 2006) propose that the Iberian peninsula remained an isolated geographical area, not acting as a main source for postglacial recolonization (Sommer and Zachos, 2009).
Capreolus capreolus was present in the Middle Pleistocene of Central Spain, where paleontological remains have been found (Buitrago, 1992).In addition to functioning as a shelter in Pleistocene times, the Sierra de Guadarrama area might have played a fundamental role in safeguarding populations of roe deer (Tellería, 1999), although there are references indicating that populations of Central Spain underwent severe reductions in the 19 th and early 20 th century (Tellería and Virgós, 1997;Gortázar et al., 2000) that may have left their mark on their genetic parameters.
Among the few genetic analyses that exist on the Iberian populations, only the studies of Lorenzini and Lovari (2006), Lorenzini et al. (2014), Randi et al. (2004) and Royo et al. (2007) included some locations near the Sierra de Guadarrama.
With the aim of contributing to a comprehensive molecular analysis of the populations in the Sierra de Guadarrama we examined two types of markers, mitochondrial and nuclear (microsatellites), with the largest number of samples analysed to date in the Iberian peninsula, in order to increase our knowledge of the phylogeographic pattern of this species in Spain and dispel doubts about its heterogeneity, and to analyse the impact of isolation from recent centuries.

Study area
Samples were collected between 2002 and 2007 at 22 locations in the Sierra de Guadarrama, in the mountains north of Madrid, in the centre of the Iberian peninsula (fig. 1, table 1).Currently, four of these localities are within the National Park of the Sierra de Guadarrama (declared a National Park on June 25th, 2013) (fig.1).Sampled locations are within public forests of Pinus sylvestris and Quercus pyrenaica.

Sample collection and DNA extraction
We collected muscle tissue from 101 individuals distributed evenly throughout the study area.Samples were collected post-mortem from hunted deer or deer killed in road accidents.A small muscle biopsy was kept in absolute ethanol at -20 ºC until analysis was undertaken in the laboratory.
DNA was extracted using the standard phenol/ chloroform protocol (Sambrook et al., 1989).Agarose gels were used to test presence, concentration and possible degradation of the DNA extracted from the samples.
Chromatograms resulting from each specimen reaction were combined and primers were cleaned with Sequencher (Gene Codes Corporation).To compare the sequences obtained with those from other European populations, we downloaded Gen-Bank available sequences representing the different haplotypes cited mainly by Wiehler andTiedemann (1998), Versini et al. (2002), Randi et al. (2004) and Royo et al. (2007).We used the complete control region sequence of Capreolus pygargus and part of one of its subspecies, Capreolus pygargus ochracea, as an outgroup to root the phylogenetic tree.
Two treatments were used to analyse these mitochondrial DNA data, phylogenetic inferences and networks (using Haploviewer, http://www.cibiv.at/~greg/haploviewer) based on the haplotypes, their frequencies, and the localities where each sample was collected.For the phylogenetic inferences, we used maximum parsimony (using PAUP; Swofford, 2003).Both the complete matrix of data, and a matrix reduced to 336 characters were prepared, the latter to avoid the problem of missing data, due to the reduced length of some GenBank sequences.
A Mantel test was used to calculate the coefficient of association between genetic distance and altitude.
We tested for deviation from Hardy-Weinberg equilibrium at each locus and over all using the Fisher's exact test and Markov chain algorithms (1,000 batches, 10,000 iterations), and F IS (Weir and Cockerham, 1984) was calculated using the program Genepop 3.4 (Raymond and Rousset, 1995).To as-sess the level of genetic diversity, we calculated the mean expected and observed heterozygosities using Arlequin 3.0 (Schneider et al., 2000).In addition, we calculated the molecular coancestry coefficient and Fig. 1.Location of the study area (Sierra de Guadarrama, Madrid), in the centre of the Iberian peninsula, where the roe deer muscle samples were obtained.Circles contain the location code (table 1).
To define the genetic structure of the populations of roe deer in the study area we used the program Structure 2.1 (Pritchard and Wen, 2003).This program uses a Bayesian algorithm to group samples into genetically distinct clusters, K, based on the similarities between the genotypes of individuals.We tested K = 1-15, with 10 replicates for each K-level.The most likely number of clusters was identified using the posterior probability (Pritchard and Wen, 2003).

Mitochondrial DNA analysis
The complete D-loop region (928-930 base pairs) was sequenced for the 101 studied samples (Gen-Bank accession numbers MG760247 to MG760343).A maximum of only five gaps was necessary to align the complete data set, rendering a matrix length of 933 positions.We found 13 parsimony-informative characters in this matrix that exclusively contained the newly sequenced samples.Subsequently, sequences from GenBank were incorporated, including the different haplotypes previously found, most of them from partially sequenced D-loop, with lengths from 340-342, 427, 678 to the complete sequence, rea-ching a matrix of 201 samples and 936 characters.In this last matrix, the number of parsimony informative characters increased to 62.
The 101 individuals analysed comprised five different haplotypes, A, B, C, D and E (table 1), grouped in four clusters.Differences between haplotypes A and E were due to 9 polymorphic positions in haplotype E, probably because of some degree of heteroplasmy.To avoid spurious results and reticulations, haplotype E was eliminated from the matrix to the following data treatments.
The network constructed showed the complex relationships between the samples from different locations (fig.2).Usually in this kind of figure, all the haplotypes are included to show the relative weight of each haplotype and deduce the potential centre of dispersion.Here we have only represented the different haplotypes or those representatives of groups of haplotypes cited in the principal roe deer D-loop analyses (Wiehler and Tiedemann, 1998;Versini et al., 2002;Randi et al., 2004;Royo et al., 2007) plus the samples analysed here.This rendered a matrix of 68 haplotypes of 201 specimens.The large number of haplotypes in the Italian and French populations stands out, distributed throughout the network.The Iberian samples were situated in the centre of this representation, and their different haplotypes showed more similarity to other groups, than among them.
By analysing the phylogenetic relationship of the total matrix (201 specimens, 940 pb), the unique-   ness of some of the specimens from the Sierra de Guadarrama was confirmed and the relationships with some European haplotypes (fig.3) was again demonstrated.Furthermore, this study showed the non-monophyly (common source) of Iberian haplotypes, since some appeared more related to other European haplotypes than with others found in the Sierra de Guadarrama.Fundamentally, lineages A and B were related to others analysed from Spain and Portugal, but also to some from France and Italy.The C group was related to haplotypes from Germany, Poland, Slovakia, the Netherlands, Serbia, Italy and France, while the groups related to the unique roe deer with haplotype D were mostly from central and northern Europe (France, Germany, Italy, Serbia, Sweden and Norway).However, the most part of the groups signalled were hardly supported (moderate to low bootstrap values), since the number of differences between sequences was limited.In figure 4, centring attention on the Iberian samples, we divided the clades into four groups.The first of these groups included specimens with A and B haplotypes, where haplotype 'A' coincided exactly with one haplotype found in the study by Royo et al. (2007).This coincident haplotype of Royo et al. (2007) comes from Alcornocales (Cádiz, south of Spain).In this first cluster, there were others Iberian specimens coming from Asturias (Sueve and Picos de Europa), Asturias (Muniellos), and France (Bordeaux), although the latter location is not decisive, as reintroductions from different sources were conducted.
In our haplotype clustering C, in the group numbered 2 (fig.4), there was also an individual sequenced by Royo et al. (2007), also coming also from the Sierra de Guadarrama.Group 3 included animals presenting the 'D' haplotype, which coincided with one haplotype sequenced by Royo er al. (2007) from northern locations such as the Picos de Europa or Muniellos, related to a lineage composed of specimens from the Picos de Europa, plus others from Pobla de Segur (Lleida) and Bordeaux.The group marked 4 was composed entirely of specimens from the northern Iberian peninsula.
An analysis based on the Mantel test (fig.5) showed that genetic distance was not significantly correlated with altitude (p > 0.05), suggesting that altitude was not the principal factor influencing genetic differentiation in the 101 individuals analysed.

Microsatellite analysis
Of the 12 loci analysed, 11 were polymorphic (table 2), rendering a polymorphism equal to 0.91.The average number of alleles for the population was 4.75 ± 0.96, varying between 2 (CSSM41, CSSM43 and IDVGA29) and 13 (OarFCB304).The PIC reached its lowest value for one of the less polymorphic loci (PIC IDVGA29 = 0.1035) and the highest value for one of the most polymorphic loci (PIC BM757 = 0.7465).The observed heterozygosity ranged from 0.11 (IDVGA29) to 0.73 (BM1706), with a mean of 0.43.The mean molecular coancestry, which characterizes the degree of genetic similarity among individuals in a population, was 0.5067 regardless of the informational value of different markers; the average value considering the PIC was 0.3701.
The population deviated significantly from Hardy-Weinberg equilibrium (p < 0.05) due to 5 markers (BM757, HUJ1177, BMC1009, IDVGA8 and Oar-FCB304).These 5 markers had positive F IS (reflecting a deficit of heterozygotes).This was a general trend since most F IS values were positive.Only the microsatellites IDVGA29 and BM1706 with negative F IS , of -0.031 and -0.056, respectively, showed a weak excess of heterozygotes.The population F IS calculated by Weir and Cockerham (1984) was positive and equal to 0.1115.
The various simulations with the computer program Structure 2.1 determined that the estimated log-likelihood of the data (lnPr (X/K) was maximum when K = 1, i.e, when considering a single population.When a factorial correspondence analysis was performed, where roe deer were in the space with a number of dimensions equal to the markers analysed, the specimens differed slightly, appearing overlapped because of the genotype similarity (fig.6).The lower right area of figure 6 distinguished a unique individual, which could force the greater overlap of the other specimens that shared alleles other than this animal.This distinction is produced because this individual had a private allele at one of the loci analysed.

Discussion
The combined analysis of two types of markers increased knowledge of the roe deer population in the Sierra de Guadarrama.Data from mitochondrial genomes (maternally inherited) and nuclear (from both parents) were complementary.
Analysis of the variation in the mtDNA control region revealed a multiple relationship between roe deer from the Sierra de Guadarrama and those from the Iberian peninsula populations (Randi et al., 2004;Royo et al., 2007) and European populations (Douzery and Randi, 1997;Vernesi et al., 2002;Randi et al., 2004).This relationship could be interpreted as a multiple origin of the population, with migration coming from different lineages of roe deer to the mountains of Madrid, either naturally by range expansion or through reintroductions.If migration had been natural by expansion of the distribution range, the observed variation should be lower than in the centre of origin, as it would represent the limit distribution of these populations; however, it is inconsistent with the results obtained.The variation stands in the range that it was shown in other studies, even in the upper part of this range (Wiehler and Tiedemann, 1998;Vernesi et al., 2002;Randi et al., 2004).The possibility of re-introductions in this area is unlikely because there are no records of such strategies in the Community of Madrid (FIDA, 2008).
The most likely explanation is therefore the acceptance of the theories that argue for the existence of shelters in the Pleistocene in the Iberian peninsula (Gliozzi et al., 1997;Hufthammer and Aaris-Sørensen, 1998;Taberlet et al., 1998), followed by later recolonization routes to northern Europe (Hewitt, 1999), or the area remaining as an isolated geographical region that would not have been influenced as a source for postglacial recolonization (Lorenzini and Lovari, 2006).
It should be noted that various proposals of phylogenetic relationships between haplotypes can be obtained when the characters are taken in whole or in part.The sequences provided by Royo et al. (2007) have different lengths.In the present study, the data were handled by taking all available information (930 pb) for the analyses together with the European available data or by reducing our matrix to the number of characters in common with other information studies (up to 446 pb), thereby obtaining various proposed relationships between the detected lineages.Lineages A to D showed different clustering, with C being more closely related to D in the global analysis, while in the treatment where only the specimens from the Iberian peninsula were considered, C was apparently closer to A and B. In any case, it should be kept in mind that most of the nodes were poorly supported, and thus, if the nodes not fully supported were collapsed, no incongruence occurred.Besides, in these proposals, Table 2. Summary of basic analysis of genetic variability for the population of roe deer: H O , observed heterozygosity; H E , expected heterozygosity under the Hardy-Weinberg equilibrium; F IS , was calculated according to Weir and Cockerham (1984).Values of the average number of alleles, average heterozygosity and F IS calculated for all loci are shown in the last row.

Locus name
Allele  Royo et al. (2007).This confirms the point made in the comparative analysis of the study population with other European studies (Douzery and Randi, 1997;Vernesi et al., 2002;Randi et al., 2004).As the history of very recent populations, from an evolutionary point of view, shows this lack of support, since the number of diagnostic characters is limited, in this study we considered other markers such as microsatellites, which could elucidate the genetic structure of the population as they are more variable than mitochondrial markers.In addition, results from the population structure and Mantel test suggest that the relationship between genetic diversity and altitude is not significant, and hence it is possible to hypothesize that the species has not had sufficient time for evolutionary differentiation to occur along an altitude gradient.Analysis of the genetic variability using microsatellite markers indicated that there is probably a single genetically homogeneous population.The existence of a single population can be expected since the roe deer population has recovered recently in the mountains of Madrid (Tellería and Virgós, 1997).Only if movements of recolonization arose from different populations or if different nuclei had remained isolated for a long time would a clear genetic structure be expected.This also implies that gene flow between groups of roe deer is adequate and that, in principle, there are no major barriers that prevent adequate dispersion of individuals in the Madrid mountains.While this is a good sign, in order to preserve the population, it should be kept in mind that although the roe deer is a species with a high ecological plasticity (Tellería and Virgós 1997;Acevedo et al., 2005), it moves in small areas and is therefore highly susceptible to fragmentation and the presence of barriers (Coulon et al., 2004).
In-depth analysis of the genetic variability reveals that the population deviates from the Hardy-Weinberg equilibrium, so it is violating some of the assumptions of this principle (infinite population panmictic reproduction, absence of migration, mutation or selection).This was shown in five markers with positive values of F IS .This parameter has values ranging from -1 to 1, with negative values indicating an excess of heterozygotes and positive values indicating a deficit.Thus, the results point to some inbreeding in the population.Studies in other populations of roe deer (Wang and Schreiber, 2001;Coulon et al., 2004) also found deviations from the Hardy-Weinberg equilibrium due to a deficit of heterozygotes.In a study on the population in south-western France (Coulon et al., 2004), the same set of markers used in the present study (except NVHRT48) was analysed in 1,148 individuals in an area of 40 x 55 km.Two of the four microsatellites showing deficit of heterozygotes in the French population also showed this deficit in the Sierra de Guadarrama population.The average number of alleles and the heterozygosity values were higher than those observed here and the F IS  for this population was lower, but also positive.The results suggest that, in comparison with other studies with similar methodologies in European populations (Coulon et al., 2004), the genetic variability of the Guadarrama population is somewhat impoverished, although respect to the Spanish populations, it shows a genetic variability similar to or even superior to other more southern populations (Lorenzini et al., 2003;Royo et al., 2007).
The study by Royo et al. (2007) shared two microsatellites used in this study, NVHRT48 and BM757.These authors found four and 20 alleles for each marker respectively, while in our case, BM757, being one of the most polymorphic, showed only eight alleles and NVHRT48 was monomorphic.Although nine different populations were analysed in the study by Royo et al. (2007) without detailing the number of alleles in each of them, it should be noted that if in the rest of the Iberian peninsula four alleles were present for NVHRT48, in the population of Sierra de Guadarrama the variation was reduced to only one.
The average coefficient of coancestry helps us to analyse these results in more detail, giving an idea of how individuals are alike.Analyses in other populations of roe deer found mean coancestry values ranging between 0.277 and 0.476 (Royo et al., 2007).If we compare these results with the average molecular coancestry in our study, the value in the Madrid population was higher, indicating lower polymorphism or increased homozygosity, related to some inbreeding, probably due to the bottleneck the population underwent in the recent past (Tellería and Virgós, 1997;Gortázar et al., 2000).However, it should be kept in mind that the set of microsatellites used in the two studies and the population sizes were different, so it is more appropriate to compare the coancestry coefficient corrected by PIC.In this case, our value falls within the range found by Royo et al. (2007) and the value that these authors found for the 'central' population, including individuals from the Sierra de Guadarrama and Toledo (average coancestry coefficient of 0.373).
In conclusion, the combination of both results provided part of the complex story of the roe deer in the Iberian peninsula and especially in the Sierra de Guadarrama.On one hand, there is a philopatric structure with females maintaining the variation in mitochondrial haplotypes, clustered in four different groups without a common origin.These haplotypes are related to those from different populations of the Iberian peninsula and Europe.This complex origin leads to the consideration that no C. capreolus subspecies can be identified in the Iberian peninsula, agreeing in this case with Royo et al. (2007).On the other hand, as males contribute to the homogenisation of the genome, the population cannot thus be subdivided into different breeding units, but has to be considered as a single panmictic population.Additionally, heterozygosity was below the expected level in most markers and allelic richness was generally lower than in other populations.This provides evidence of a relative genetic impoverishment of the population.
In view of the above, the different times of the palaeohistory of the species may explain these re-sults: glacial-interglacial stages of the Pleistocene (Gliozzi et al., 1997;Hufthammer and Aaris-Sørensen, 1998) and the reduction and recovery of populations throughout the 20 th century (Tellería and Virgós, 1997;Gortázar et al., 2000).The two markers used, mitochondrial DNA of maternal inheritance and nuclear DNA of both parents, have a different heritage and unequal mutation rate, reflected in each of these two main stages in the recent evolution of the roe deer.
Our results have a strong implication for the management and conservation of roe deer in the Sierra de Guadarrama.Fortunately, in recent years, the roe deer population in the Iberian peninsula has benefited from a decline in ranching, abandonment of agriculture in foothill areas, and a significant increase in forest areas (Acevedo et al. 2005), all positive events to avoid population isolation.Even so, to preserve the populations, management must keep in mind that the species is highly susceptible to fragmentation and to barriers (Coulon et al., 2004) such as highways and fencing.Finally, taken together, our results indicate that restocking within the Sierra de Guadarrama should be prevented if we want to preserve the diversity among populations and the patterns of natural gene flow.
Genetic assessment of structure and connectivity of roe deer populations that recently recolonized a fragmented landscape could be a promising approach for future studies.

Fig. 2 .
Fig. 2. Network based on 336 base pairs of the mitochondrial D-loop of roe deer analysed (Sierra de Guadarrama) and the available sequences of several European populations.Circles represent the different haplotypes and are proportional to their frequency among the analysed data, as indicated.

Fig. 4 .
Fig. 4. Maximum parsimony phylogenetic tree with individuals analysed in this study together with the haplotypes found by Royo et al. (2007) for populations in the Iberian peninsula.The numbers in the circles indicate the four main nodes, and those located on the branches indicate bootstrap values of above 50 %.A, B, C and D are the haplotypes found in this study.

Fig. 6 .
Fig. 6.Factorial correspondence analysis of the roe deer individuals in the Sierra de Guadarrama (Madrid, Spain), determined by the alleles detected in 12 microsatellite loci.The colours of the individuals indicate the different localities.