Panel of informative SNP markers for two genetic lines of European bison : Lowland and Lowland – Caucasian

espanolDebido al cuello de botella demografico, la poblacion actual de bisonte europeo muestra un elevado grado de endogamia y una perdida significativa de variabilidad genetica. Es necesario que en los estudios realizados con estas especies especificas se apliquen metodos que permitan obtener tanta informacion genomica como sea posible en un tiempo breve. El objetivo de este estudio era definir un conjunto de marcadores de tipo PSN (polimorfismo de un solo nucleotido) que pudiera servir para analizar la diversidad genetica de las dos lineas de bisontes europeos: Lowland (LB) y Lowland–Caucasiana (LC). En el estudio se analizaron 57 individuos de la linea LB y 72 de la linea LC. Para caracterizar bien el rendimiento de los PSN en el bisonte europeo, se usaron dos micromatrices multigenicas (genochip) con densidades diferentes de marcadores: BovineSNP50 v2 BeadChip y BovineHD BeadChip. Como consecuencia de los criterios adoptados, se seleccionaron 1.421 y 22.122 marcadores, respectivamente. Sobre la base del analisis estadistico (frecuencias alelicas, prueba exacta de Fisher y prueba Z), en ultima instancia se selecciono un conjunto de 1.536 marcadores informativos de PSN para los estudios adicionales, 26 de los cuales tienen alelos privados para LB y 611, para la linea LC. La informacion obtenida en este estudio podria enriquecer aun mas y apoyar a los programas de reproduccion en un contexto de parentesco entre especimenes particulares y manadas que viven en cautividad en centros reproductivos. EnglishAs the result of a population bottleneck, the present living population of European bison shows a high level of inbreeding, and a significant loss of genetic variability. In studies on such specific species there is a need to apply methods that obtain as much information about the genome as possible in a short time. The aim of the study was to define a panel of SNP (Single Nucleotide Polymorphism) markers that could serve in genetic diversity analysis of European bison from two lines: Lowland (LB) and Lowland–Caucasian (LC). The study used 57 individuals from the LB line and 72 from the LC line. To identify well–performing SNPs in European bison, we used two microarrays with different markers densities: BovineSNP50 v2 BeadChip and BovineHD BeadChip. As a result of the adopted criteria, 1,421 and 22,122 markers, respectively, were selected. On the basis of statistical analysis (allele frequencies, Fisher’s exact test, and the Z–test), a panel of 1,536 informative SNP markers was ultimately selected for further study; 26 of these with private alleles for the LB line and 611 with private alleles for the LC line. The data obtained in this study could further enrich and support breeding programs in the context of relatedness between particular specimens and herds from captive breeding centres.


Introduction
By the 1920s, Bison bonasus were extinct in the wild.The only remaining European bison were kept in managed enclosures and amounted to just 54 individuals (29 males and 25 females).The current population of the species is derived from 12 founders: 11 individuals of the subspecies Bison bonasus bonasus and the last representative of the subspecies Bison bonasus caucasicus.After successful reintroductions, there are now two genetic lines: Lowland (LB) and Lowland-Caucasian (LC).The Lowland line, also called the Bialowieza line, is derived from seven B. b. bonasus individuals and is a closed line, meaning that only offspring of Lowland European bison may be classified as belonging to it.The Lowland-Caucasian line includes European bison whose pedigree includes the last and only male representative of B. b caucasicus (Pucek, 1991;Olech, 1999) As the result of past bottlenecks, the present population (5,249 specimens registered in 2013 in the European Bison Pedigree Book) shows a high level of inbreeding, and a significant loss of genetic variability (Olech, 2010).The European Bison Pedigree Book was created in 1924 and is published to this day.Pedigree data now provide a basis for carrying out breeding, and make it possible, among other things, to estimate the coefficient of inbreeding and kinship of the animals in the European bison breeding centres.However, in such a specific population, the pedigree, though being extremely valuable, cannot constitute the only source of information concerning the genetic value of the animals.In studies on European bison there is a need to apply methods that ensure that as much information about the genome as possible is obtained in a short time.
Over the years, a considerable number of studies have been conducted on European bison.The analyses included, among others, allozymes (Hartl & Pucek, 1994), blood groups (Sipko et al., 1995), the genes from the group of the MHC (major histocompatibility complex) (Udina & Shaikhaiev, 1998;Łopieńska et al., 2003, 2011) and microsatellites (Gralak et al., 2004;Roth et al., 2006;Nowak & Olech, 2008a).One of the more recent techniques used to estimategenetic variation involves microarrays, an approach used for several years to determine SNP (single nucleotide polymorphism) genotypes in various sites of the genome.This method allows the analysis of hundreds of thousands of markers at the same time, significantly reducing the time required to achieve a huge amount of data (Illumina®).Tokarska et al. (2009) compared effectiveness of 17 microsatellite and 960 SNP markers for paternity and identity analysis in the Lowland line of the European bison.Oleński et al. (2015) used for the first time the BovineHD microarray to find SNP markers associated with posthitis in the same genetic line.The first SNP analysis using the BovineSNP50 microarray which included both genetic lines (five individuals from LB and five individuals from LC) was performed by Kamiński et al. in 2012.The aim of the present study was to identify a panel of SNP markers (among those assayed on the Illumina BovineSNP50 and BovineHD arrays) that could be used to analyse genetic structure, identify individuals and control the origin and relatedness of the European bison, as well as identify alleles specific to the two genetic lines: Lowland (LB) and Lowland-Caucasian (LC).This is the first study using BovineHD microarray to compare genetic structure of two genetic lines of European bison.
The data obtained in this study will further supplement and confirm analysis carried out on the basis of pedigree data in the context of relatedness between particular specimens and herds from captive breeding centres, making it possible, among other things, to estimate inbreeding based on multiple sources of information.Proper management of the breeding program is important for protection of the species against increasing inbreeding and its negative impact.Currently, this program is being conducted in both the European bison breeding centres and in the wild.Most of the animals from captive breeding have a known pedigree, which contributes to the control of their origins and aids population management.However, with such a low genetic variability as occurs in European bison, pedigree information may be insufficient.In addition, in the case of animals from free roaming herds, the information on their relatedness is incomplete or impossible to determine.For this reason, the assignment of SNP markers characteristic for particular genetic lines, as well as the populations within them, is clearly a great advantage, not only in future research but also to enrich and support breeding programs.

The animals
The study used 144 samples of European bison DNA (Bison bonasus), including eight samples that were analysed on two types of microarrays (BovineSNP50 v2 962 SNPs) to control the repeatability of the results and to increase the initial pool of markers.Additional control samples constituted the DNA of domestic cattle Bos taurus.A positive genotyping result was obtained from 129 individuals (57 individuals from LB and 72 from LC).The Lowland line included 32 males and 25 females, while the Lowland-Caucasian line included 36 animals of each sex.The biological material was collected from European bison from Polish and other breeding centres, as well as from free roaming herds.To select the most representative samples of European bison species, we were guided by inter alia, the genetic line, parental lines and the participation of ancestors.In addition, to exclude P-C (parent-child) errors and P-P-C (parent-parent-child) errors, the research included related animals -one full family: mother (sample ID K233), father (sample ID K238) and offspring (sample ID K294), as well as eight pairs of father-offspring and 14 pairs of motheroffspring.These animals were selected on the basis of the pedigree book (see table 1s in supplementary material).

DNA isolation
The test material consisted of whole blood samples and soft tissues collected by the European bison Gene Bank in the Animal Genetics and Breeding Department (Warsaw University of Life Sciences) according to decision (WPN-I.6401.90.2014.EB.1) of Regional Directorate of Environmental Protection in Warsaw.DNA from blood was isolated by the magnetic method using a MagMAX TM Express (Applied Biosystems) and a MagMAX TM Total Nucleic Acid Isolation Kit (Ambion), as well as with use of QIAamp DNA Mini Kit (QIAGEN).DNA from soft tissues was isolated using a QIAamp DNA Mini Kit (QIAGEN).The quality and concentration of the isolates was checked using NanoDrop2000 (Thermo Scientific).The DNA was then normalized to a 50ng/μl concentration.Samples with a low concentration of DNA were subjected to concentration in a Concentrator 5301 (Eppendorf AG).Genotyping BovineSNP50 v2 BeadChip and BovineHD BeadChip microarrays (Illumina, Inc.San Diego, CA) were used for genotyping 54K and 777K SNPs.Analyses were performed according to the Infinium Ultra and Infinium Super protocols (Illumina), and the microarrays were scanned using HiScanSQ (Illumina).The resulting intensity reading was analysed in the GenomeStudio (Illumina) software.Using the BovineSNP50 v2 BeadChip microarray we tested 91 samples.We then genotyped 46 samples using the BovineHD BeadChip microarray, including eight samples that we repeated to verify reproducibility of the results (see table 1s in supplementary material).

Criteria of markers selection
In the selection of markers for further analysis, we took into account: call rate ≥ 90%, only those markers that were genotyped in at least 90% of individuals were included; polymorphic markers whose frequency of minor allele amounted to ≥ 0.01 -adopting such a low MAF value as a criterion ensues from the specifics of the species, whose genetic variation is extremely low; no deviations from HW (Hardy-Weinberg) equilibrium at a significance level of 0.01, no P-C errors or P-P-C errors.Markers meeting the above criteria were individually tested in the GenomeStudio (Illumina) software and re-verified for proper cluster assignment, by the analysis of the GenCall Score value in SNP Graphs (figs.1, 2).GenCall Score is a quality metric that indicates the reliability of each genotype call (GenomeStudio TM , Genotyping Module v1.0 User Guide, Illumina).Automatic verification was insufficient because the measure of reliability was developed for cattle.
We then made a listing and comparison of correctly clustered markers, selected after verification, obtained from both microarrays: BovineSNP50 v2 BeadChip and BovineHD BeadChip.For the analysis of allele frequencies, we used the number of private alleles and PE (probability of exclusion) GenAlEx 6.5 (Peakall & Smouse, 2012).Using the R environment version 2.15.3, we carried out the Fisher exact test and the Z-test on two unrelated proportions for large samples to determine the statistical significance of the differences in allele frequency between the Lowland and Lowland-Caucasian lines.The final choice was made from SNP markers for which allele frequencies were significantly different, according to both tests, between the genetic lines of the European bison.The population structure of all tested European bison was evaluated on SNPs common to both microarrays using Bayesian clustering analysis in the software STRUCTURE 2.3.4 (Pritchard et al., 2000;Falush et al., 2003).Analysis was performed under the Correlated Allele Frequencies Model and Admixture Model with 30,000 burn-in steps and 100,000 Marcov-chain Monte Carlo (MCMC) replicates for K = 1-6.Tests were conducted five times for each value of K. To determine the most likely value of K, we used the ΔK statistic (Evanno et al., 2005) Structure Harvester software (Earl & vonHoldt, 2012).

Results
In rounds of arrays preparation, the analysed Bos taurus samples performed well and showed call rates close to 99%.This assured us that the assay performance was essentially optimal and no flaw in the laboratory procedure would affect the results for bison.
From the 54,609 probes included on the BovineS-NP50 v2 BeadChip, in correctly genotyped individuals there were 51,609 markers with a call rate equal to or greater than 90%, of which 5,997 were polymorphic in European bison (MAF ≥ 0.01).Only 1,421 SNP markers met all the aforementioned criteria.The BovineHD BeadChip has 777,962 types of probes on its surface.A total of 735,667 SNPs showed a call rate equal to or greater than 90%, and 22,122 of these markers fulfilled all the conditions set.
After manual verification of SNP graphs in the GenomeStudio (Illumina) software for all the markers obtained after automatic analysis from both microarrays, we selected 806 SNPs and 15,062 SNPs respectively, of which 505 markers were present on both platforms.For these 505 markers, the genotypes of all eight samples analysed on both microarrays were identical.We found highly significant differences in allele frequency between two European bison lines in the case of 1,904 SNPs from both arrays.Finally 1,536 markers were selected for the design of a microarray specific to bison and further analyses: 47 from BovineSNP50 v2, 1,463 from BovineHD, and 26 common to both microarrays; 1,505 selected markers were distributed on autosomes and 31 SNPs on chromosome X.The number of markers on each chromosome ranged from 8 to 136 (table 1).Assuming a similar distribution of the studied SNPs in Bison bonasus and Bos taurus genomes, based on UMD3.1 cattle genome assembly, we found that the mean genomic distance for the selected SNPs was 1,443 kbp and the median distance was estimated for 211 kbp.We found that the highest median distance between SNPs was for chromosome 18 (1,092 kbp) and the lowest was for chromosome X (31 kbp).
The number of private alleles in the Lowland-Caucasian line was considerably higher than in Lowland line (611 and 26 respectively).Selected 1,536 SNPs were plotted against cattle chromosomes in figure 1s in supplementary material.We calculated the probabilities of exclusion coefficients (PE, both parents known; PE1, only one parent known; PE2, exclude both parents) were calculated for combined loci from each microarray, for pooled genetic lines, and separately.For 47 SNPs from Bovine SNP50, all coefficients obtained for the LB line were significantly lower than in the LC line and in the pooled samples.The statistical difference between the value of this rate for LC and the whole population was found for PE1 only (table 2).In contrast, all analyses of PE for 1,489 markers from Bovine HD gave a result of 1,000.
The minor allele frequency (MAF) was calculated separately for both lines to pre-estimate the genetic variability among the European bison analysed.number of polymorphic loci in both genetic lines and to firmly demonstrate the difference between them, we also included loci polymorphic in one line but monomorphic in the other.In the Lowland line, we found almost 50% of monomorphic SNPs, indicating a high level of inbreeding, which is unavoidable in a closed group.In contrast to the Lowland population, in the Lowland-Caucasian line, more than 80% of the markers had an MAF > 0.2; of these, about 50% were characterized by MAF as greater than 0.3, indicating a far greater variation between individuals in the Lowland-Caucasian line than in the Lowland line.Analysis of the genetic structure of the population carried out in STRUCTURE 2.3.4 on all tested samples also showed clear differences between genetic lines.The highest value DK pointed to the division of the population into two clusters (K = 2), dividing individuals from both lines to clearly separate groups.The results of this analysis are given in figure 4.

Discussion
The BovineSNP50BeadChip, which was designed for domestic cattle, has been successfully used to analyse the genetic structure of several wild species.In the case of goats, two subspecies: the Tatra chamois (Rupicapra rupicapra tatrica) and the Alpine chamois (Rupicapra rupicapra rupicapra), were genotyped using the above mentioned microarray (Demontis et al., 2011).In this study, 505 of among 54,000 markers were found to be polymorphic, although only 151 were correctly clustered after manual verification.Such a low number of correctly clustered markers could indicate low genetic variability, but could also be the result of species differences.In turn, a study by Haynes & Latch (2012) for the deer species Odocoileus hemionus columbianus, O. h.hemionus and O. Virginianus, obtained 21,131 genotyped markers in at least 90% of the animals tested, of which 1,068 were polymorphic.Although Odocoileus spp.are genetically more distant from domestic cattle than bison, use of the same microarray allowed to obtain a relatively high number of polymorphic markers.The evolution of the genus Bison shows that the European bison as a wild species is genetically more similar to the domestic cattle than the American bison.The reason for this is the hybridization of the aurochs, which -like the introgression of yak in Bison bison-influenced the distance of these two subspecies of Bison (Nowak & Olech, 2008b).For comparison, in studies by Tokarska et al. (2009) the 50 Lowland European bison (LB) tested gave a reading of 52,978 SNPs, of which 960 markers were polymorphic.In contrast, Kamiński et al. (2012), despite testing only 10 European bison (five LB and five LC specimens), obtained 1,337 polymorphic SNPs.The number of markers was higher due to inclusion of both genetic lines in the studies and LC is intrinsically more diverse than LB, which is also noticeable in the present study.The participation of ancestors is different in each of the genetic lines, therefore the testing of only one of them is insufficient and cannot be used to estimate the genetic structure of the entire species.Tokarska et al. (2009) presented results of paternity analysis carried out on two marker sets: the most heterozygous SNPs, and a randomly selected set of markers.They concluded that in the case of the first set, 50-60 SNPs would be needed to assign  paternity with 95% confidence and 80-90 loci for the random set.Our study for the probability of exclusion (PE) partly confirms these results.Analyses were carried out for pooled genetic lines (LB + LC), and each line separately.We checked how many loci would be needed for a 99% of confidence.For the PE (both parents known) in the Lowland line -the same genetic line as in Tokarska et al. (2009)-57 markers would be sufficient, but in the case of PE1 (only one parent known) 160 would be necessary.Other results were obtained for the Lowland-Caucasian line; only 27 SNPs in the case of PE, and 53 markers for PE1.For combined lines we estimated that 50 markers would be needed for PE, and 59 SNPs in the case of PE1.In 2015, Oleński et al. used the BovineHD BeadChip for an association study in the Lowland line.Besides reporting SNP markers significantly associated with postitis disease, the authors concluded that information from the subsets of SNPs could be a useful tool for the European bison breeding program, from a conservation and epizootic point of view.
When anticipate that our design of a specific SNP panel for European bison with characteristic markers for particular genetic lines (Lowland and Lowland-Caucasian) and parental lines will provide a key tool for the future analyses of the genetic structure of the species, specimen identification, and control of the origin and relatedness of European bison, both in captive breeding centres and in free roaming populations.Such knowledge is crucial for optimal management of breeding programs for these highly valued animals, and will contribute to their direct protection in the future.

Fig. 1 .
Fig. 1.SNP Graph showing the division of individuals into clusters corresponding to the genotypes at a given locus.The X-axis represents normalized theta (the angle deviation from a pure A signal, where 0 represents a pure A signal and 1 represents a pure B signal), and the Y-axis represents the distance of the point to the origin.Samples are divided according to their genotype.Samples lying within the left region are called AA; samples within the middle region are called AB and samples lying within the right region are called BB.

Fig. 2 .
Fig. 2. SNP Graphs showing an abnormal division into clusters.The X-axis represents normalized theta (the angle deviation from pure A signal, where 0 represents pure A signal and 1 represents pure B signal), and the Y-axis represents the distance of the point to the origin.Samples are coloured according to their genotype.Samples marked in black are classified as 'no calls'.Any ambiguous division into clusters excluded a marker from further analysis.
Figure  3shows the distribution of minor allele frequency within Lowland and Lowland-Caucasian lines.In order to minimize miscalculation arising from the different

Fig. 4 .
Fig. 4. Bayesian clustering analysis performed by STRUCTURE.Each individual is represented by one bar divided into segments, illustrating the proportion of estimated membership in each cluster.The vertical black lines separate a group of European bison from the Lowland line.

Fig. 1s .
Fig. 1s.Localization of SNPs with significant differences in allele frequency between LB and LC populations and SNPs with private alleles in each population.Black line indicates chromosome length: Freq.SNPs differing in frequency between LB and LC population; LB.SNPs with private allele in LB population; LC.SNPs with private allele in LC population; Chr.Chromosome.

Table 1 .
The number of markers per chromosome, genomic distances (according to the UMD3.1 cattle genome build): Chr.Chromosome; N. Number of SNPs per chromosome; MD.Mean distance; SD.Standard deviation.