Optimization of a crossing system using mate selection

A simple model based on a single identified quantitative trait locus (QTL) in a two-way crossing system was used to demonstrate the power of mate selection algorithms as a natural means of opportunistic line development for the optimization of crossbreeding programs over multiple generations. Mate selection automatically invoked divergent selection in the two parental lines for an over-dominant QTL, and increased the frequency of the favorable allele toward fixation in the sire-line for a fully-dominant QTL. It was concluded that an optimal strategy of line development can be found by mate selection algorithms for a given set of parameters, such as the genetic model of the QTL, the breeding objective and the initial frequency of the favorable allele in the base populations. The same framework could be used in other scenarios, such as programs involving crossing to exploit breed effects and heterosis. In contrast to classical index selection, this approach to mate selection can optimize long-term responses.


INTRODUCTION
Crossbreeding is the mating of sires of one breed or breed combination to dams of another breed or breed combination [4]. Crossbreeding is carried out for several reasons. It is used to develop new breeds or types from foundation purebreds and to introgress genes and characteristics from one breed to another [6,25]. It is widely used in commercial animal production as a means of exploiting heterosis [5,21,23]. Crossbreeding is also valuable for the averaging of breed effects, for example when an animal of intermediate body size is better suited to the length of the grazing season or to market demands, or when two traits such as lactation length and yield per day act multiplicatively to give profit, and intermediate values are superior to opposite extremes [13,24].
Optimization of selection within crossbreeding systems has been extensively studied in animal breeding. Wei proposed a selection index that combines information about crossbreds and purebreds (CCPS) to maximize the genetic response from a crossing system [27]. A number of studies have shown that selection response in crossbred performance can be increased by using CCPS [1-3,8,26,28-30]. When using CCPS within index selection for non-additive traits, two problems should be noted. Firstly, CCPS selects animals on an individual basis, whereas non-additive effects are not confined to individual animals but are expressed in the progeny and later descendants of mating pairs. Secondly, an animal production enterprise will be concerned about benefit not only in the current and next generation but also over a period in the future, whereas CCPS optimizes crossing systems only for one generation ahead, not for multiple generations. CCPS leads to the fixation of favorable alleles (unless there is overdominance), which may cause loss of heterosis effects. Therefore, an approach that selects animals based on optimal mating pairs and optimizes a crossing system over a number of generations needs to be explored in order to develop lines optimally in a crossing system. Mate selection is a breeding strategy that combines selection and mating simultaneously according to a specified objective function. Hayes and Miller found that mate selection improves total progeny performance over index selection when dominance variation is significant [9].
Optimization of a crossbreeding system involves the development of optimal lines and finding optimal mating pairs of individuals. The objectives of this paper were to illustrate the effectiveness of mate selection in the optimization of a two-way crossing system over multiple generations and to demonstrate that mate selection can in fact lead to optimal development of parental lines.

Two-way crossing system
A two-way crossbreeding system with discrete generations was simulated deterministically. A sire line and a dam line were developed either from one or from two different foundation populations, and animals from the two lines were mated to form crossbreds. There were three destinations for newborn animals in each line: purebreeding, crossing and culling. Crossing was only between males from the sire-line and females from the dam-line. Some animals in each line were selected as parents of their own purebred line under each of the selection strategies used (as defined later). All of the remaining females in the dam-line and all of the remaining males in the sire-line were available for producing crossbreds. The males and females that were not used for purebreeding or crossing were sold to the market together with all crossbreds produced.
The population size of the sire line depended on the number of sires, the number of dams mated per sire (dps) and the reproductive rate of the dam (RRD). After selecting purebreeding replacements, all males left in the sire line and all females left in the dam line were used for crossing. The number of females provided in the dam line depended on the number of males provided from the sire line and dps. The number of sires and dams needed in the dam line depended on the number of females needed for crossbreeding, dps and RRD. RRD and dps were the same in the sire line, the dam line and the crossing.

Genetic model
A simple genetic model and simple scenarios were adopted in order to demonstrate some key principles clearly. The trait considered was assumed to be affected by one bi-allelic quantitative trait locus (QTL); no polygenic effects were included in the model. The QTL had three genotypes, qq, Qq and QQ, with genotypic values $g_1 = -a$, $g_2 = d \times a$ and $g_3 = a$, where a was its additive effect and d its degree of dominance [7]. QTL genotypes were identified without error in all individuals prior to selection.
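The genotypic values defined above can be written out as a short sketch. The function names are hypothetical, and the Hardy-Weinberg mean is included only as an illustration of how a and d combine; the paper itself tracks genotype counts rather than assuming Hardy-Weinberg proportions.

```python
def genotypic_values(a, d):
    """Values of genotypes qq, Qq and QQ for additive effect a and dominance degree d."""
    return {"qq": -a, "Qq": d * a, "QQ": a}


def hardy_weinberg_mean(p, a, d):
    """Mean genotypic value at frequency p of allele Q.

    Assumes Hardy-Weinberg genotype proportions (an illustrative assumption,
    not a statement about the simulated cohorts).
    """
    q = 1.0 - p
    values = genotypic_values(a, d)
    freqs = {"qq": q * q, "Qq": 2.0 * p * q, "QQ": p * p}
    return sum(freqs[g] * values[g] for g in values)


# Parameter values used in the paper: a = 1.0, d = 1.0 (full dominance) or 2.0 (over-dominance)
mean_full = hardy_weinberg_mean(0.5, 1.0, 1.0)
mean_over = hardy_weinberg_mean(0.5, 1.0, 2.0)
```

With d = 2 the heterozygote is worth twice the favorable homozygote, which is why a population fixed for Q is not necessarily the most valuable one.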

Benefit evaluation
Benefit from this two-way crossing system was measured as the genetic merit of animals sold to the market. Let $NAS_{ij,t}$ be the number of animals sold with genotype i (i = 1 to 3) in cohort j (j = 1 to 5, denoting males of the sire line, females of the sire line, males of the dam line, females of the dam line and all crossbred animals, respectively) in generation t. Let $G_{jt}$ (j = 1 to 5) be the average performance of each cohort in generation t. $G_{jt}$ is then the mean genotypic value of the animals sold, as in equation (1):

$$G_{jt} = \frac{\sum_{i=1}^{3} NAS_{ij,t}\, g_i}{\sum_{i=1}^{3} NAS_{ij,t}} \qquad (1)$$
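A minimal sketch of the cohort average follows. It assumes the average is the mean genotypic value weighted by the numbers of animals sold per genotype; the function name and the example counts are hypothetical.

```python
def cohort_mean(nas, g):
    """Average performance of one cohort in one generation.

    nas : numbers of animals sold per genotype, ordered (qq, Qq, QQ)
    g   : genotypic values in the same order
    Returns 0.0 for an empty cohort (a convention chosen for this sketch).
    """
    total = sum(nas)
    if total == 0:
        return 0.0
    return sum(n * v for n, v in zip(nas, g)) / total


# Example with a = 1.0 and d = 2.0: values are (-1, 2, 1)
g = (-1.0, 2.0, 1.0)
crossbred_mean = cohort_mean((10, 20, 10), g)
```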

Selection strategy
Benefits under two strategies, mate selection and index selection, were compared in this study. Mate selection selects and mates animals according to merit of progeny and index selection selects animals according to their breeding value under random mating. In the current application, mate selection targeted merit not just in one generation, but merit across multiple generations.

Mate selection
A mate selection algorithm was used to find the set of animals to be selected and mated that led to maximum benefit [14]. For the current application, however, the algorithm was modified to consider simultaneous mate selection across generations. A breeding period of n generations was considered in a single round of optimization. The objective function for mate selection was the cumulative discounted performance of all animals sold over n generations (CDP), calculated with equation (2), in which dr is the discount rate. Selection and mating were optimized at cohort level rather than at the individual animal level. Selection was applied to animals from the four purebred cohorts (line by sex) simultaneously for purebred replacement and for generating crossbreds. For a particular generation t, a matrix S was optimized, with $s_{ji}$ representing the number of animals selected with genotype i (i = 1 to 3) for the jth cohort (j = 1 to 6), as in equation (3):

$$S = \begin{bmatrix} s_{11} & s_{12} & s_{13} \\ \vdots & \vdots & \vdots \\ s_{61} & s_{62} & s_{63} \end{bmatrix} \qquad (3)$$

With three genotypes formed by a single locus, there were nine possible mating combinations. For the mating combinations in a given generation, a matrix M needed to be found, with $m_{ij}$ indicating how many matings were made between males with the ith genotype and females with the jth genotype. Taking mating among parents of the sire line as an example, $S_{1i}$ was the number of sires selected with the ith genotype and $S_{2j}$ was the number of dams selected with the jth genotype, where i = 1, 2 or 3 and j = 1, 2 or 3. The sires ($S_{1i}$), the dams ($S_{2j}$) and the possible matings ($m_{ij}$) are shown in Table I.

Table I. Sires selected ($S_{1i}$), dams selected ($S_{2j}$) and possible matings ($m_{ij}$) in the mating process.
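The cumulative discounted performance objective can be sketched as below. The exact form of equation (2) is not reproduced in this text, so the sketch assumes that the performance of all animals sold in generation t is summed over cohorts and genotypes and divided by (1 + dr)^t; the exponent convention and the function name are assumptions.

```python
def cumulative_discounted_performance(nas, g, dr):
    """Discounted sum of genotypic values of animals sold over all generations.

    nas[t][j][i] : animals sold in generation t + 1, cohort j, genotype i (qq, Qq, QQ)
    g            : genotypic values ordered (qq, Qq, QQ)
    dr           : discount rate per generation (0.10 in the paper)
    Assumption: generation t is discounted by (1 + dr) ** t.
    """
    cdp = 0.0
    for t, cohorts in enumerate(nas, start=1):
        total = sum(n * v for cohort in cohorts for n, v in zip(cohort, g))
        cdp += total / (1.0 + dr) ** t
    return cdp


# Two generations, one cohort each, selling only QQ animals (value 1.0)
g = (-1.0, 2.0, 1.0)
nas = [[[0, 0, 10]], [[0, 0, 11]]]
cdp = cumulative_discounted_performance(nas, g, 0.10)
```

Under this convention, a gain realized one generation later must be about 10% larger to count equally, which is what lets the optimizer trade short-term sales against line development.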
There were three different mating matrices in the two-way crossing system: matings among parents of the sire line ($M_1$), matings among parents of the dam line ($M_2$) and matings among parents of crossbreds ($M_3$). The three matrices were collected in matrix M in equation (4):

$$M = \begin{bmatrix} M_1 & M_2 & M_3 \end{bmatrix} \qquad (4)$$

For given sires $S_{1i}$ and dams $S_{2j}$, a matrix $M_k$ (k = 1 to 3) was chosen subject to the following three restrictions:
(a) The sum of matings over the nine mating combinations was equal to the total number of dams selected and to the product of dps and the total number of sires selected, as in equation (5):

$$\sum_{i=1}^{3}\sum_{j=1}^{3} m_{ij} = \sum_{j=1}^{3} S_{2j} = dps \times \sum_{i=1}^{3} S_{1i} \qquad (5)$$

(b) For a particular genotype of sire, the total number of matings was equal to the product of dps and the number of sires with that genotype, $\sum_{j} m_{ij} = dps \times S_{1i}$.
(c) For a particular genotype of dam, the total number of matings was equal to the number of dams with that genotype. For example, for j = 1, $m_{11} + m_{21} + m_{31} = S_{21}$.

In each matrix $M_k$ (k = 1 to 3), one row and one column of elements depend on the other rows and columns. Therefore, only four elements need to be optimized in each $M_k$. The number of elements to be optimized within one generation was 18 (6 × 3) in matrix S and 12 (4 × 3) in matrix M. Figure 1 is an example of mate selection within one generation with a single QTL, where the numbers in bold are to be optimized (in Figure 1, IAF is initial allele frequency and $N_c$ is the number of candidates).

The optimal numbers of males and females selected from each genotype (selection variables) and the optimal numbers of matings between genotypes (mating variables) over the period of selection considered were determined by using a Differential Evolution (DE) algorithm [20]. The number of loci in the DE chromosome was equal to the total number of variables for selection and mating over t generations (18 × t + 12 × t = 30 × t). Figure 2 gives an example of DE variable representation in the mate selection algorithm. After the DE had been run with a population size of 30 for 5000 generations, it was run further until the difference between the current best solution and the average of the best solutions from the previous 500 generations had fallen to 0.01%. Twenty replicates of DE were conducted and the best result was retained.
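The restrictions on a single mating matrix can be illustrated with a small sketch showing why only four elements are free. Treating the last row and last column as the dependent ones is an assumption of this sketch, as are the function names and the example numbers; the paper does not state which row and column it treats as dependent.

```python
def complete_mating_matrix(free, sires, dams, dps):
    """Fill a 3x3 mating matrix from four free elements.

    free  : (m11, m12, m21, m22), the matings an optimizer would choose
    sires : sires selected per genotype, ordered (qq, Qq, QQ)
    dams  : dams selected per genotype, ordered (qq, Qq, QQ)
    dps   : dams mated per sire
    """
    if sum(dams) != dps * sum(sires):
        raise ValueError("restriction (a) violated: total dams != dps * total sires")
    m = [[0, 0, 0] for _ in range(3)]
    m[0][0], m[0][1], m[1][0], m[1][1] = free
    for i in range(2):
        # restriction (b): each sire genotype row sums to dps * sires[i]
        m[i][2] = dps * sires[i] - m[i][0] - m[i][1]
    for j in range(3):
        # restriction (c): each dam genotype column sums to dams[j]
        m[2][j] = dams[j] - m[0][j] - m[1][j]
    return m


def is_feasible(m):
    """A candidate is usable only if no mating count is negative."""
    return all(x >= 0 for row in m for x in row)


sires, dams, dps = [0, 2, 8], [60, 120, 120], 30
m = complete_mating_matrix((0, 0, 20, 40), sires, dams, dps)
```

The last row then satisfies restriction (b) automatically, because the column totals and the first two row totals already fix its sum. An optimizer such as DE would need to reject or repair candidates for which `is_feasible` returns False.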

Index selection
Index selection was conducted for generation n based on breeding values (BV) of the genotype groups, which depended on allele frequency in the prospective mates of the opposite sex. The BV is the sum of the average effects of genes [7], which changed over time as allele frequency changed under selection. Suppose $BV_{pb,t}$ and $BV_{cb,t}$ are the BV of a genotype for pure breeding at generation t and the BV of a genotype for crossbreeding at generation t, respectively (Table II). $BV_{pb,t}$ depends on the allele frequency of candidates from the purebred lines at generation t and $BV_{cb,t}$ depends on the allele frequency of candidates in the line to be crossed to at generation t. The average effects of gene substitution in the sire line and the dam line are defined for generation t, the generation in which selection occurred (1 ≤ t ≤ n). Index selection was used to select the best animals in each cohort for pure breeding and subsequently the next best for crossbreeding. Selected animals were mated randomly.

Table II. Breeding values of genotypes of QTL for pure breeding and crossbreeding in the sire line and the dam line at generation t.
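The dependence of breeding values on the allele frequency among prospective mates can be sketched with the standard single-locus expressions from quantitative genetics (as in [7]). The paper's own formulas and the entries of Table II are not reproduced in this text, so the expressions, function names and example frequencies below are assumptions. Here p is the frequency of Q among the mates: the animal's own line for pure breeding, the other line for crossbreeding.

```python
def average_effect(p_mates, a, d):
    """Average effect of gene substitution: alpha = a * (1 + d * (q - p)).

    p_mates is the frequency of allele Q among prospective mates;
    d is the degree of dominance, so the dominance deviation is d * a.
    """
    q = 1.0 - p_mates
    return a * (1.0 + d * (q - p_mates))


def breeding_values(p_mates, a, d):
    """Breeding values of genotypes qq, Qq and QQ given the mates' allele frequency."""
    q = 1.0 - p_mates
    alpha = average_effect(p_mates, a, d)
    return {"qq": -2.0 * p_mates * alpha,
            "Qq": (q - p_mates) * alpha,
            "QQ": 2.0 * q * alpha}


# A sire-line animal: pure-breeding BV uses the sire-line frequency,
# crossbreeding BV uses the dam-line frequency (example values only).
bv_pb = breeding_values(0.5, 1.0, 2.0)
bv_cb = breeding_values(0.9, 1.0, 2.0)
```

In this example the same genotype ranks differently for the two purposes: QQ has the highest pure-breeding BV at p = 0.5 but the lowest crossbreeding BV when the line to be crossed to is at p = 0.9.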

Simulation parameters
The selection period was five generations. The reproductive rate of dams (RRD) was 3, 5 or 10 and the number of dams mated to each sire (dps) was 30. The number of sires in the sire line was 10, which, together with dps and RRD, determined the population sizes of the sire line and the dam line. The weight on the QTL breeding value for crossbreeding in equation (6) was 0.49, 1.47 and 3.89 for RRD equal to 3, 5 and 10, respectively. The additive effect of the QTL (a) was 1.0. The degree of dominance of the QTL (d) was 1.0 for full dominance or 2.0 for over-dominance. The initial allele frequency (IAF) of the favorable allele of the QTL was 0.05 or 0.5. The discount rate was 10% per generation.

Allele fixation pattern
Changes in allele frequencies of candidates in the sire-line and dam-line over time under mate selection and index selection are shown in Figure 3 for the sire- and dam-line having equal IAF (where the patterns of allele frequencies in the sire- and dam-line for index selection were the same) and in Figure 4 for the two lines having different IAF. For a fully dominant QTL, allele frequency increased with generation. Index selection led to a higher allele frequency fixation rate than mate selection. Under mate selection, the allele frequency fixation rate in the sire-line was higher than that in the dam-line, and allele frequency in the dam-line was not fixed (being 0.8-0.9 at generation 4-5). For both index selection and mate selection, the rate of fixation in the sire-line was higher (fixed at generation 4) than that in the dam-line if IAF of the QTL in the sire-line was equal to or higher than that in the dam-line. If IAF in the sire-line was lower than that in the dam-line, the rate of fixation in the sire-line was still higher than that in the dam-line under mate selection, but lower than that in the dam-line under index selection.
With an over-dominant QTL having equal or similar IAF in the sire- and dam-line, the allele frequency pattern under index selection was the same in the two lines and fluctuated from generation to generation (Fig. 3). The fluctuation can be explained as follows. For the over-dominant QTL (d = 2), the breeding values of all genotypes are equal at an allele frequency of 0.75, and above this value the breeding value of genotype qq is higher than that of QQ. Index selection would therefore favor QQ animals for allele frequencies below 0.75 and favor Qq and qq animals for allele frequencies over 0.75. This is analogous to rotational crossing, and the same pattern was found in a selection optimization study by Kinghorn [10]. With mate selection, allele frequency in the sire-line increased until fixation, whereas that in the dam-line increased to 0.6 at generation 3 and then dropped to about 0.5. When IAF in the sire-line was lower than that in the dam-line, allele frequencies diverged under both index selection and mate selection. A higher allele frequency fixation rate was observed in index selection than in mate selection (Fig. 4).
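The threshold of 0.75 can be checked with a short derivation. It assumes the standard expression for the average effect of gene substitution at a single locus [7], with p the frequency of Q among prospective mates, q = 1 - p, and the dominance deviation written as d × a; the paper's own expressions are not reproduced in this text.

```latex
\begin{align*}
\alpha(p) &= a\,[\,1 + d\,(q - p)\,] = a\,[\,1 + d\,(1 - 2p)\,] \\
BV_{QQ} &= 2q\,\alpha, \qquad BV_{Qq} = (q - p)\,\alpha, \qquad BV_{qq} = -2p\,\alpha \\
\alpha(p^{*}) = 0 \;&\Longleftrightarrow\; p^{*} = \frac{1 + d}{2d} \\
d = 2 &: \; p^{*} = \tfrac{3}{4}, \qquad d = 1 : \; p^{*} = 1
\end{align*}
```

All three breeding values are proportional to α, so they coincide (at zero) when α = 0 and their ranking reverses when α changes sign. Under these assumptions, with d = 2 the ranking flips at p = 0.75, whereas with d = 1 the average effect stays positive for every p below 1, which is consistent with the steady increase in allele frequency reported for the fully dominant QTL.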
Compared with index selection, the optimal allele fixation pattern under mate selection resulted, among the culled animals, in a higher allele frequency for a completely dominant QTL and in an allele frequency closer to 0.5 for an over-dominant QTL (Fig. 5).

Optimal mating pair and genotypic allocation
Mate selection optimized breeding in two respects, because selection as well as mating were optimized with respect to maximizing the objective function. Figure 6 shows an example of the difference in genotypes of animals selected for purebreeding, crossing and culling between mate selection and index selection. For an over-dominant QTL, mate selection used QQ animals in purebreeding of the sire-line and mixed genotypes of animals in purebreeding of the dam-line while it used QQ males and Qq females for crossing. Allocation of genotypes for crossing was not optimized in index selection.
Mating pairs in mate selection were optimized. Table III shows an example of difference in mating pattern between mate selection and index selection for crossing males from the sire-line and females from the dam-line. Mating pairs in mate selection for crossing were Qq × QQ and qq × QQ, which could give the maximum genetic merit of the crossbreds. Mating patterns changed over time in index selection due to change of breeding values of genotypic groups. The objective function in this paper was to maximize profit from all individuals sold, which included the individuals sold in the current generation and all individuals in the previous generation. This requires that the percentage of individuals with a qq genotype is as small as possible and that female individuals used for crossing need to be Qq rather than qq.
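The expected merit of crossbred progeny from each mating combination can be sketched as below, assuming Mendelian segregation at the single locus and the genotypic values of the genetic model. The function names are hypothetical, and the values a = 1.0 and d = 2.0 correspond to the over-dominant case in the simulation parameters.

```python
from itertools import product

GENOTYPES = ("qq", "Qq", "QQ")


def offspring_distribution(sire, dam):
    """Mendelian genotype probabilities for one sire x dam mating.

    Genotypes are two-character strings; each of the four gamete
    combinations has probability 1/4.
    """
    dist = dict.fromkeys(GENOTYPES, 0.0)
    for s_allele, d_allele in product(sire, dam):
        n_favorable = (s_allele == "Q") + (d_allele == "Q")
        dist[GENOTYPES[n_favorable]] += 0.25
    return dist


def expected_progeny_value(sire, dam, a, d):
    """Expected genotypic value of progeny from one mating combination."""
    values = {"qq": -a, "Qq": d * a, "QQ": a}
    dist = offspring_distribution(sire, dam)
    return sum(dist[g] * values[g] for g in GENOTYPES)


# All nine mating combinations for an over-dominant QTL (a = 1.0, d = 2.0)
table = {(s, f): expected_progeny_value(s, f, 1.0, 2.0)
         for s in GENOTYPES for f in GENOTYPES}
best = max(table.values())
```

Under these assumptions, matings between opposite homozygotes produce only Qq progeny and give the highest expected value, followed by matings between a homozygote and a heterozygote.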

Superiority of mate selection over index selection
The CDP superiority of mate selection over index selection with a single fully- or over-dominant QTL, for varying IAF in the parental lines and varying RRD, is shown in Table IV. The superiority ranged from 0.7% to 12.8% for a fully-dominant QTL and from 10.6% to 30.7% for an over-dominant QTL. With a fully-dominant QTL, the superiority decreased as RRD increased. The reduction in superiority was greater for different IAF in the parental lines than for equal IAF in the parental lines. The same phenomenon was found for an over-dominant QTL. In this situation, allele frequencies in the sire-line and the dam-line diverged in index selection as well as in mate selection (Fig. 4). The reason is that most of the animals with high genotypic value were used for purebreeding in index selection, whereas mate selection culled more animals with high genotypic value than index selection. When an over-dominant QTL had different IAF in the two parental lines, the superiority of mate selection over index selection increased as RRD increased. With higher RRD, the fluctuation of allele frequencies in the two parental lines in index selection caused a greater reduction of benefit.
When the two parental lines had the same IAF, the superiority of mate selection over index selection decreased slightly as IAF increased. This means that index selection exploits a QTL more efficiently at higher IAF than at lower IAF. When the two parental lines had different IAF, higher superiority was observed when the sire-line had lower IAF than the dam-line, especially when RRD = 3. In this case, allele frequency was fixed very quickly in index selection whereas the optimal allele fixation pattern in mate selection was reduced, which led to more culling of animals with high genotypic value from the dam-line than in index selection (Fig. 4).

Table III. Example showing mating pairs (%) between sire genotype and dam genotype for crossing in mate selection and index selection with an over-dominant QTL for five generations (IAF = 0.5, RRD = 3).

DISCUSSION
The method used in this paper appears to be the first to simultaneously implement mate selection over multiple generations. It achieves this by handling cohorts of individuals rather than individuals per se [18]. Kinghorn and Shepherd carried out individual mate selection with an objective function that targets performance of grandprogeny [16]. Shepherd and Kinghorn carried out individual mate selection to exploit heterosis in both progeny and grandprogeny, using a combination of investment matings and realization matings [12]. The current method gives a basis to optimally exploit both such mating types over many generations, but only at a cohort level. Extension to an individual animal level would likely introduce difficulties with the optimization problem and a very high computational requirement. Further development is required to apply our proposed method to actual selection of individual animals.
The populations simulated in this paper had discrete generations, where animals were selected and used as parents just once. In the practice of animal breeding, populations usually have overlapping generations and parents need to be used for several years or mating cycles. A seed stock pool consists of parents in several age groups. Animals are selected from the offspring each year to replace uncompetitive parents. The mate selection algorithm proposed in this paper can be extended to overlapping generations, but in practice this will be difficult because of the large numbers of cohorts that will prevail under overlapping generations. Perhaps a combination of a cohort approach and an individual animal approach [22] could help to solve this.
Another possible extension to the current method would lead to its application to the complete foundation population without user-nomination of breeding structure and line membership. Optimal cohort mate selection across generations would then define breeding structure. This should result in (equal or) better performance due to optimization of overall breeding structures. This sort of functionality is currently available for single-generation individual mate selection [12,17] but without multi-generation application it has little power to set up optimal line-crossing structures, or, for example, to favorably target interacting multi-QTL genotypic configurations.
This paper demonstrates that mate selection naturally optimizes genotypic selection for an identified QTL in the development of crossing lines for a two-way crossing system. Mate selection exploits dominance effects more effectively than index selection in crossbreeding since it is based on expected genetic value of offspring, and later descendants over a number of generations in the current implementation. The index selection is based on additive genetic value, leading to fixation of the favorable allele and loss of heterozygosity. The effectiveness of mate selection is achieved by making selection and mating decisions simultaneously, which ensures that favorable mating pairs are produced, and by considering multiple generations. If genotypes of QTL are not perfectly known, QTL genotype probabilities can be used in the mate selection algorithm [11,15,19].
Mate selection can be used to exploit across- and within-breed dominance variation, with improvements of up to 12.5% in total progeny performance over truncation selection followed by random mating [9]. Our results show that mate selection led to 0.7-12.8% extra benefit over index selection for a fully dominant QTL with different initial allele frequencies in the sire-line and the dam-line. The amount of extra benefit in mate selection with an over-dominant QTL was up to 30.8%. The extra benefit of mate selection over index selection results from the optimal selection and allocation of animals among the purebreeding, crossing and culling destinations and from the optimal mating pairs. It is concluded that mate selection can automatically develop optimal genetic change in lines for crossing. For a fully-dominant QTL, the optimal line development pattern is that the allele frequency fixation rate in the sire-line is higher than that in the dam-line, because higher selection intensities give a greater ability to change allele frequency in the sire-line. For an over-dominant QTL, optimal allele frequency changes in the parental lines show divergence. The optimal line development pattern is affected by a variety of factors, such as breeding objective, size of QTL effect, initial allele frequency, the degree of dominance and the number of QTL identified. The current results demonstrate that mate selection can be used to find the optimal pattern for these different situations.
The objective function used in optimization has an effect on the optimal pattern of line development. The objective function used in this paper was the cumulative discounted performance of the animals culled from the two-way crossing system. Cumulative performance combines short-term benefit and long-term benefit, with a discount rate giving more emphasis to the short-term benefit. The optimal strategy of line development depended on initial allele frequencies. If we were concerned only about performance of the crossing system in the long term, the optimal configuration of crossing lines should not be influenced by the starting situation, as long as the optimal state can be reached in the time available. If the optimization is aimed at maximizing benefit from crossbreds only, the optimal line development pattern will change. An investigation made for this purpose (results not shown) showed that, for an over-dominant QTL whose initial allele frequencies in the parental lines were not high (below 0.5), the frequency of the favorable allele in the sire-line increased and that in the dam-line decreased, because providing QQ males for crossbreeding from the sire-line is easier than providing QQ females for crossbreeding from the dam-line. This investigation also showed that maximizing benefit from crossbreds only increased the superiority of mate selection over index selection.
This paper simulated genotypic selection on identified QTL for the situation where the sire-line and the dam-line had different IAF. It was found that the superiority of mate selection over index selection was high if the sire-line had lower IAF of the identified QTL than the dam-line, especially when reproductive rates of dams were low. Our proposed methods can be especially useful for exploiting non-additive QTL (probably including imprinted QTL) in lamb and beef production where crossbreeding is common and reproductive rates are low.
The genetic model used in this paper took only a single QTL into account and ignored polygenic effects. This poorly reflects a real animal population, where polygenic effects and one or more QTL effects should be included. The current results show that mate selection was better than index selection because the former fully exploits the non-additive effect of a QTL. If polygenic effects are involved, the extra benefit of mate selection will depend on the degree of non-additivity at the QTL relative to polygenic effects, and the value of mate selection would likely be lower than that shown in this paper. If multiple non-additive QTL are involved, the total benefit of mate selection is expected to increase, and the magnitude of the increase will depend on their sizes, degrees of dominance and epistasis, and allele frequencies.
Ultimately, with a good handle on the suites of interacting genes, mate selection could help lay down routes towards ideal genotypes. This would be of particular value where strong epistatic interactions exist, a situation that may well be true but is difficult to detect with current levels of information. A realistic model including polygenic effects and multiple QTL effects deserves further investigation.