 Research Article
 Open Access
 Published:
Modelling the shape of the pig scapula
Genetics Selection Evolution volume 52, Article number: 36 (2020)
Abstract
Background
The shape of pig scapula is complex and is important for sow robustness and health. To better understand the relationship between 3D shape of the scapula and functional traits, it is necessary to build a model that explains most of the morphological variation between animals. This requires point correspondence, i.e. a map that explains which points represent the same piece of tissue among individuals. The objective of this study was to further develop an automated computational pipeline for the segmentation of computed tomography (CT) scans to incorporate 3D modelling of the scapula, and to develop a genetic prediction model for 3D morphology.
Results
The surface voxels of the scapula were identified on 2143 CTscanned pigs, and point correspondence was established by predicting the coordinates of 1234 semilandmarks on each animal, using the coherent point drift algorithm. A subsequent principal component analysis showed that the first 10 principal components covered more than 80% of the total variation in 3D shape of the scapula. Using principal component scores as phenotypes in a genetic model, estimates of heritability ranged from 0.4 to 0.8 (with standard errors from 0.07 to 0.08). To validate the entire computational pipeline, a statistical model was trained to predict scapula shape based on marker genotype data. The mean prediction reliability averaged over the whole scapula was equal to 0.18 (standard deviation = 0.05) with a higher reliability in convex than in concave regions.
Conclusions
Estimates of heritability of the principal components were high and indicated that the computational pipeline that processes CT data to principal component phenotypes was associated with little error. Furthermore, we showed that it is possible to predict the 3D shape of scapula based on marker genotype data. Taken together, these results show that the proposed computational pipeline closes the gap between a point cloud representing the shape of an animal and its underlying genetic components.
Background
The need for highthroughput phenotyping in animal breeding has accelerated the development of automated segmentation of computed tomography (CT) images [1,2,3] and the generation of new precision phenotypes based on the morphology of animals [4]. Detailed 3D models would be useful for the development of new indicators for economically important traits that are not directly measurable on selection candidates, such as longevity and robustness. Including such relevant indicator phenotypes in multitrait genomic evaluations has the potential to increase the accuracy of estimated breeding values for traits under selection [4,5,6].
In human medicine, tremendous progress has already been achieved to create accurate descriptions of the shape of organs, bones, and surfaces through statistical shape modelling (SSM) [7]. More recently, the use of 3D cameras for web communication and entertainment, has accelerated the development of SSM of the face in humans [8]. In humans, SSM of the face has also been used in genetic studies that aimed at building methods for the identification of people from biological material [9, 10]. In livestock science, there is also a lot of new research that focuses on 3D imaging since it provides automatic and objective measures on live animals for breeding applications [11]. Much emphasis has been placed on replacement of phenotypes that, traditionally, have involved a large amount of manual work, for example weight [12,13,14], body condition score, and locomotion [15, 16]. Automatic highthroughput phenotyping from imaging technologies provides precise measures on already developed phenotypes, and simultaneously provides information about an individual from the 3D surface data. As such, 3D imaging has the potential to provide objective information about an individual’s condition and health. This could be used to gain a better understanding of the mechanisms that underlie animal health and robustness [4] and to develop relevant phenotypes for animal breeding [17].
The images recorded by, e.g., CT or 3D cameras are point clouds of surfaces and tissues without any annotation. To create relevant scalar phenotypes, which are the input data of the traditional mixed model equations used for animal breeding [18], these images must be annotated. The objects of interest may be rotated in relation to each other and are represented by different numbers of pixels. Hence, there is a need to create point correspondence [19], i.e. a map that explains which points represent the same piece of tissue among individuals.
In this paper, we report the construction of a computational pipeline for generating surface meshes with point correspondence based on automatic segmented CT images. Furthermore, we perform a principal component analysis (PCA) of the 3D shape and present a physical interpretation of the first principal components (PC). Finally, we build a statistical genomic prediction model for the 3D scapula surface and present the prediction accuracy of this model through a crossvalidation study.
Because of its complex shape and its importance for sow robustness and health, we used the pig scapula as a model organ [4], but our computational pipeline can, in theory, be applied to any 3D object that represents the surface of an organ, bone, or body.
Methods
Animals
The animals in this study were purebred Landrace pigs that originated from the Norsvin breeding nucleus. All the phenotypes used were part of routine records in the breeding programme and conducted in accordance to the laws and legislations for raising pigs in Norway. In total, 2143 boars were CTscanned at the boar testing station as part of the Topigs Norsvin breeding program [20]. The animals were born and raised to 25–30 kg in different nucleus herds located in Norway before being sent to the boar test station. The boar test included feed and weight records, scores for conformation traits, and CT scanning at the end of the test (at 120 kg) [21].
Genotypes
Single nucleotide polymorphism (SNP) genotyping was performed at CIGENE (University of Life Sciences, Ås, Norway) or at GeneSeek (Lincoln, NE, USA), using either the Illumina GeneSeek custom 80 K SNP chip (Lincoln, NE, USA), the Illumina Porcine SNP60 Beadchip (Illumina Inc., San Diego, CA, USA), an Illumina GeneSeek custom 50 K SNP chip (Lincoln, NE, USA), or the Illumina Porcine SNP9 Beadchip (Illumina Inc., San Diego, CA, USA). Genome positions of the SNPs were based on the Sscrofa10.2 assembly of the reference genome [22]. The genotypes were filtered [23] and imputed using the AlphaImpute software [24, 25]. After imputation, genotypes on 34,726 SNPs were available for 2088 animals.
Ethics approval
All animals were cared for according to the laws and regulations for keeping pigs in Norway (Regulation for the keeping of pigs in Norway 20030218175, 2003; Animal welfare Act 2009061997, 2009). In Norway, animal breeding is controlled by “Mattilsynet” (Norwegian Food Safety Authority), who has officially approved Norsvin for maintaining herdbooks through article 1(d) of Directive 88/661/EEC (approval date 27.07.1994). The official herd ID for the boar test station is 0403050826. Data recording and sample collection were conducted strictly in line with the laws given by Norwegian animal research authorities on the protection of animals (“Lov om dyrevelferd”). The data were obtained as part of routine data recording in commercial breeding programs. Samples collected for DNA extraction were used only for the routine diagnostic purpose of the breeding program.
Creation of phenotypes from CTimages
The full skeleton of live pigs was identified by applying a threshold value of 200 Hounsfield units to the CT scans [26]. Thereafter, the left scapula was identified and extracted [2], before modelling the 3D morphology of the scapula in Python, which involved the steps described below.
Creation of the scapula atlas
For the sake of simplicity, the surface of the left scapula was modelled for a random pig with a scapula that looked normal by visual inspection of the CT image:

(a)
The scapula was centred, rotated, and scaled to its second invariant (on scapula CT images, the width is more precisely measured than the length [4]).

(b)
The surface voxels were extracted. As an alternative to the marching cubes algorithm [27], which also has the potential to create surface meshes from e.g. 3D camera images, a standard 3D Delaunay triangularization [28] was performed. This gives a convex volumetric mesh representation of the scapula. To also represent the concave regions, tetrahedrons, which had one or more edge longer than a specified threshold (5 mm), were removed. This is similar to the concept of alpha shapes and allows the object to have a partly concave surface. Inside the triangularization, all faces belong to two adjacent tetrahedrons, while surface faces belong to only one tetrahedron. To go from a volumetric mesh to a surface mesh representation of the scapula, faces that belonged to two tetrahedrons were removed, and only the nodes that belonged to the remaining (surface) faces were extracted and defined the surface of the object. The output from this step was a dense surface mesh (Fig. 1) that consisted of 7472 surface nodes, interpolated by 14,946 faces.

(c)
To reduce the computational load and to avoid problems with multiple point correspondence, when fitting semi landmarks (see (b) in “Point correspondence” below), a sparse subset of surface nodes was selected. This was done by going through all the nodes in the dense surface mesh and calculating the local volume of the submesh (i.e. the mesh consisting of the convex hull of each node and the other nodes that shared faces with this node). For submeshes with volumes smaller than a certain threshold, the centre node was removed and a new local mesh was built. After removing nodes that were associated with the smallest local volumes, the remaining nodes constituted the scapula atlas. By doing this, the number of surface points was reduced from 7472 to 1234 semilandmarks (red dots in Fig. 1).
Point correspondence

(a)
Each scapula was centred, rotated, and scaled in the same manner as was done for the atlas and a dense surface mesh was built for the individual, as in (b) in “Creating the scapula atlas” above, where the number of nodes varied between animals.

(b)
To achieve the same number and same meaning of all points or phenotypes for all individuals, we used the coherent point drift (CPD) algorithm [29], which is implemented in Python [30]. This algorithm and improvements of this method have been widely used to achieve point correspondence for a variety of applications [31]. The CPDalgorithm finds the transformation that best aligns two pointclouds, X and Y, using Gaussian mixture models (GMM) through expectation–maximization (EM) optimization. The Ypoints are considered as GMM centroids, whereas the X’s are data points to be fitted. At the optimum, the two pointclouds are aligned and the output is the probability correspondence matrix \({\mathbf{P}}\), which contains the probability that any point from Y corresponds to any of the points in X. In our case, we used the sparse atlas mesh as X [(c) in “Creating the scapula atlas”], and the dense surface mesh for each individual as Y [(a) in “Point correspondence”]. At the optimum, we used the most probable point correspondence for all points in X to find the corresponding point in Y. Hence, the raw phenotypes were a subset (1234 points) of the dense surface mesh Y. Within the CPD algorithm, we tested three types of deformation, i.e. rigid, affine, and nonrigid [29], on a subset of 115 individuals, using default parameters. To compare the effect of different deformation modes, we compared how well all points in the dense surface point cloud for each individual were represented by the aligned subset. Based on the results from this initial test, we proceeded on the entire dataset that consisted of 2143 animals, using the affine deformation option.

(c)
To better align the fitted point clouds of 1234 x, y, and zcoordinates for each animal, the centring, scaling, and rotating procedure from (a) in “Creating the scapula atlas” was repeated. For breeding purposes, size of the scalpula is an interesting phenotype, and therefore, data was rescaled to its original size and measured in mm in an Eulerian coordinate space.

(d)
Earlier attempts to achieve point correspondence for scapula, resulted in problems with point matching in regions where surfaces on different sides of the objects are close to each other. For example, at the scapula spine, the bone is quite thin and aligned points for some individuals were fitted to the wrong side of the bone. To test whether the surface points were taken from the correct side of the spine, we measured the difference in yvalues between two node points on each side of the spine, with similar x and z coordinate values (red points in Fig. 2a). In addition, we made similar measurements on some of the thinnest parts of the scapula. On each side of the spine, we selected two pairs of points with similar x and y values and measured the distance in the z direction (green and blue points in Fig. 2a).
Generation of scalar phenotypes

(a)
To remove outliers, we summed the absolute deviations from the population mean for all 3702 phenotypes (1234 x, y, and z coordinates). Animals with a summed absolute deviation that was more than 1.5 interquartile, ranges below the first quartile or above the third quartile were identified as outliers and removed from the dataset. After this step, 2088 animals remained in the dataset.

(b)
Because the number of raw phenotypes was very large and the covariance between the variables was large, PCA [32] was used to capture the main variation between individuals. A set of raw phenotypes for 200 animals were masked before running the PCA, since these animals constituted the validation set. In the PCA transformation, the M (1888) × N (3702) matrix \({\mathbf{X}}\) was transformed into a set of linearly uncorrelated variables, so the phenotype vector for each animal, \(m\), could be expressed as:
$${\mathbf{X}}_{m} = \bar{X} + \mathop \sum \limits_{n = 1}^{N} \varvec{w}_{n} s_{nm} ,$$(1)where \(\bar{X}\) is the population mean of \({\mathbf{X}}\), \(\varvec{w}\) are the principal component loadings or coefficients, and s are the principal component scores. The first few principal component loadings or coefficients (\(\varvec{w}\)) are the new uncorrelated variables that explained the main variation in the data and were used as the final aggregated phenotypes, while the principal component score \(s_{nm}\) was used as the phenotypic value of individual \(m\) for the new aggregated trait \(n\).
Creating a statistical model
Next, we estimated the genetic parameters for the PCA scores using best linear unbiased prediction (BLUP) [33] multitrait animal models with the DMU software [34], similar to the approach used for CT traits in previous work [4]. This analysis was conducted using a pedigree that contained all animals with CT data and five generation of ancestors, for 7315 animals. To limit the genetic analyses to a feasible number of traits, we included only the first 10 principal components. The PCA score phenotypes \(s_{nm}\) were modelled in a multtrait genetic analysis as:
with herd year (\({\text{HY}}\)) of the boar’s birth, birth month (\({\text{BM}}\)), and parity number of the dam (\({\text{PN}}\)) as fixed effects, the boar’s phenotype for live weight at scanning date (\({\text{LW}}\)) as a fixed covariate, and the additive genetic effect of the animal (\({\text{a}}_{mn}\)) and the residual (\({\text{e}}_{ijklmn}\)) as random effects. The numbers of levels for \({\text{HY}}\), \({\text{BM}}\), and \({\text{PN}}\) were 148, 12, and 4, respectively, while the covariate \({\text{LW }}\) had a mean of 122.8 kg and a standard deviation of 5.2 kg.
The genetic and phenotypic variancecovariances matrices \({\mathbf{G}}\) and \({\mathbf{P}}\) were further examined to identify genetically uncorrelated linear combinations of the 10 PC traits by calculating the eigenvectors \({\mathbf{v}}\) and eigenvalues \(\lambda\) of the \({\mathbf{GP}}^{  1}\) matrix [35]. The eigenvector associated with the largest eigenvalue shows the direction of the shape space that is associated with the highest heritability estimate, whereas the eigenvector associated with the smallest eigenvalue shows the direction captured by the 10 principal components for which variation is the least associated with the underlying genetics.
Crossvalidation using a genomic prediction model
The SNP genotypes of 2088 animals (1888 in training and 200 in validation) were used to compute genomic relationships [36] with the program Gmatrix [37] as:
where \({\mathbf{Z}}\) is a matrix of standardised SNPs. Genotypes, with elements \(Z_{ij} = I_{ij}  2p_{j }\), where \(I_{ij}\) is the number of the first allele that animal \(i\) carries for SNP \(j\), with an allele frequency of \(p_{j }\). The genomic relationship matrix was fitted to the training data to predict genomic breeding values [38] using the statistical model described above (Eq. 2).
The phenotypes of the validation animals were masked, and the breeding values for all 2088 animals were estimated. Then, the genomic estimated breeding values (GEBV), \({\text{a}}_{mn}\) for each of the validation animals were added to the relevant solutions of the fixed effects in the model (\({\text{HY}}_{in} + {\text{BM}}_{jn} + {\text{PN}}_{kn} + {{\beta }} \times {\text{LW}}_{ln}\)) to constitute predicted PCA scores, \(\hat{s}_{nm}\). Since the data were extracted from a commercial breeding programme, the experimental setup was somewhat unbalanced, which could have led to unreliable estimates of fixed effects. To cope with this, we removed animals from the validation set for which the number of observations for any of their estimates of fixed effects was less than 4. By multiplying predicted PCA scores, \(\hat{s}_{nm}\), by the corresponding PCAloadings, we predicted the 3702 raw phenotypes, using the 10 first PC, as:
Goodnessoffit of the shape was quantified by the mean Euclidean distance \(d\left( {\varvec{x}_{m} ,\hat{\varvec{x}}_{m} } \right)\) between corresponding genomic predicted points, \(\hat{\varvec{x}}_{m} ,\) and true CTdata, \(\varvec{x}_{m} ,\) for each of the validation animals. To investigate predictive performance over the surface [10], we also measured the spatial prediction reliability, \(R^{2}\), using the following formula:
where the numerator is the squared Euclidean distance between true and predicted coordinates, averaged over the \(M_{v}\) validation animals, and the denominator is the mean squared Euclidean distance between observed and average coordinates values.
Visualization of results
For 3D visualization of the PCA components, we used the Mayavi [39] package. First, isonormals that pointed outwards were calculated for all faces in the mean shape mesh, and the average of isonormals of adjacent faces was used to calculate the isonormal of the nodes. The PCA loadings for each point explain how much of the variability of the raw data is explained by each of the principal components. The loadings were organized into a vector field for the 1234 3D data points that represented the scapula. Taking the dot product between this vector field and the isonormals provided a scalar field that represented how variation in the current component affects morphological changes, perpendicular to the scapula surface. This allowed for the visualization of both inward and outward deformations in heatmaps.
Results
Selection of the best deformation type
To compare how the three types of deformation, rigid, affine, and nonrigid, worked within the coherent point drift (CPD) algorithm for the type of data analyzed here, an initial test was done on a subset of 115 animals. For this subset, the left scapula surface data consisted of between 7472 and 11,228 data points per animal. After deforming the atlas onto each individual’s data, a subset of 1234 node points was extracted. To consider how well these 1234 node points represented the surface points, we calculated the Euclidean distance between all surface points and the closest node points. The median and maximum of these distances were calculated for each individual and then averaged over all individuals for each deformation type. For affine deformations, the median and maximum distances were 2.39 and 9.13 mm. For nonrigid deformations, the corresponding numbers were 2.46 and 10.29 mm, while for rigid formations, the median and maximum distances were 2.43 and 9.60 mm. Based on these statistics, the affine deformation was considered to best represent the scapula surface and was then used for the entire dataset.
Point correspondence
The coherent point drift algorithm was run successfully on the data from all 2143 animals, of which 55 were identified as outliers and removed from further analysis. The pairwise distance between pairs of node points (indicated by red, blue and green dots in Fig. 2a) on opposite sides of the scapula was measured for all animals and is shown as an empirical cumulative distribution function (ECDF) plot in Fig. 2b.
Figure 2 shows that the width at the narrowing on the scapula spine had a smooth uniform distribution with an average of 8 mm and a standard deviation of 1 mm, which indicates that surface points were fitted to the correct side for all animals. The distance for the two pointpairs on the shoulder blade had a bimodal distribution, indicating that, for about 15% and 25% of animals, surface modelling was not perfect for the supraspinous fossa (shown in green) and the infraspinous fossa (shown in blue), respectively.
Principal components
Of the 2143 animals, 2088 passed outlier removal. Of these, data on 200 animals were masked before the PCA for validation purposes. The first three PC explained 29.1, 16.5, and 11.1% of the total variation, respectively, which means that more than 50% of the observed variation was captured by the first three PC, while more than 80% of the variation was captured by the first 10 PC (Table 1). After the first few components, the proportion of explained variance per PC shrank, and with 50 PC, 85.7% of the total variance was captured (Fig. 3a).
Comparing the data from the first 10 PC with the input of PCA showed that using 10 PC (Fig. 3b) resulted in an average error of 1.4 mm for node positions. The effects of the first four PC are illustrated in Fig. 4. The first PC was associated with mass transfer between infraspinous and supraspinous fossa (see Fig. 1) and the curvature of the scapular spine in the y–z plane. The second PC was mainly dominated by size of the scapula. This can also be observed in the regression coefficients (\(\beta\) from Eq. 2) in Table 1, where the regression coefficient for live weight was much larger for this component than for the others. The third PC was mainly associated with shape of the spina scapula and also affected curvature of the entire bone in the x–z plane. The fourth PC mainly scaled the length to width and the length to thickness ratios.
Genetic parameters
To have a manageable number of traits in the genetic analysis, we focused on the first 10 PC and calculated genetic parameters using the DMU software [34]. Estimates of heritabilities of the PC were high (0.40–0.79) (Table 1), while estimates of the genetic correlation between PC ranged from − 0.2 to 0.25, with a standard error of ~ 0.1. Summing the product of the explained variance and heritability estimate for the 10 PC in Table 1 provides a rough estimate of the heritability of the observed scapula shape, and was equal to 0.47. However, this is a lower estimate, since variation in the remaining PC could also have a genetic component.
Each of the eigenvectors of the \({\mathbf{GP}}^{  1}\) matrix were associated mainly with one PC (Table 2). Ranking the eigenvectors by the size of their associated eigenvalue, the first eigenvector was dominated by PC8, which was the PC with the highest heritability (Table 1), while the subsequent eigenvectors were mainly associated with PC with gradually decreasing heritabilities. The main exceptions from this pattern were the third and fourth largest eigenvalues which are both, to some extent, associated with both PC3 and PC9.
Validation
Of the 200 validation animals, 15 were removed because reliable estimates of fixed effects were not available. For the remaining 185 animals, estimates of fixed effects were added to the GEBV and then multiplied with the corresponding PC loadings (according to Eq. 4). Then, the predicted shape, \(\hat{X}\) was compared with the true shape, \(X\), based on the average Euclidean norm of all 1234 corresponding 3Dpoints (see Fig. 5). Adding predictions of more principal components reduced the mean error (across animals) from about 2.97 mm (when population average scapula was used as prediction) to 2.64 mm when 10 PCpredictions were used. For comparison, omitting the GEBV (in Eq. 2) and using only the fixed effect estimates (indicated by blue dotted line in Fig. 5) reduced the mean error from 2.97 to 2.94 mm, only.
The spatial prediction reliability, \(R^{2}\), was calculated for each of the surface node points. The mean reliability for the full prediction model that included both GEBV and estimates of fixed effects was 0.18, with a standard deviation of 0.05 (Fig. 6b). By mapping reliabilities to the scapula, we see that the prediction reliability was slightly higher in convex than in concave regions. Specifically, some reliabilities were lower on the outside of the shoulder blade, close to the spine, both on the infraspinous and supraspinous fossa (see Fig. 6b). In comparison, using estimates of fixed effects only for shape prediction (Fig. 6a) resulted in spatial prediction reliabilities close to zero over the whole geometry.
Discussion
In this paper, we present a computational pipeline for shape modelling of 3D surfaces of organs and a predictive model for 3D shape based on genomic data. In general, estimates of heritability (Table 1) were high (mean 0.62) compared to those of most traits related to skeletal features reported for humans [40,41,42], mice [43, 44], and livestock [45,46,47]. However, it should be mentioned that most of the literature on livestock focuses on bone size or on phenotypes related to bone length (e.g. height or stature), and not on bone shape, so the quantity of relevant data for comparison is limited. The high heritabilities of overall measures in the current study show that the shape of the scapula is highly heritable and that the computational pipeline that processes raw CT images to principal components is associated with limited errors.
In spite of the relatively small number of individuals used for training (1888), we have shown that genomic predictions of the 3D shape of the scapula was achievable. Adding additional principal components to predictions improved the average fit, resulting in an average error or predictions across points and animals of 2.64 mm based on the first 10 PC, which represents the sum of errors from genomic prediction and from simplification of the data when using 10 instead of 3702 variables. In addition, there are also some inherent imprecision in the CT data (voxel size of about 1 mm^{3}), which limits the achievable predictive precision. Genomic predictions for shape had a mean reliability of 0.18, while predictions based on estimates of fixed effects only resulted in a prediction reliability close to zero. The reasons why estimates of genetics resulted in a better predictor of shape than estimates of environmental effects are: (1) the high heritability of the traits, and (2) the fact that the individuals were raised in very similar conditions and were scanned at a similar age/weight. For example, if the variation in body weight of scanned animals had been large, the predictive ability based on only estimates of fixed effects would probably have been higher.
The average spatial prediction reliability obtained here was of a similar order of magnitude as reported for facial shape predictions in humans [10]. In our data, reliability was quite spatially uniform, while for predictions of facial shape [10], reliability was relatively heterogeneous. In our data, the main area with lower spatial prediction reliabilities was on the outside of the shoulder blade, close to the spine, both on the infraspinous and supraspinous fossa side (blue regions in Fig. 6b). This is an area that has a slightly concave curvature in the y–z plane. In addition, the bone is quite thin in this region and the lowered prediction reliability might be because the CPD algorithm [29] led to conflicting correspondences between the subscapularis fossa’s surface and the infraspinatus fossa’s surface in some instances (see blue and green points in and corresponding curves in Fig. 2). This problem was also reported in an analysis of human scapula data [48], and in that study conflicting correspondences were handled first by dividing the surface into zones and then looking for correspondences between points belonging to the same zone, rather than fitting the whole point cloud in one step.
Previously, statistical shape models of scapula have been reported for humans [48,49,50], primates [51], and Felidae [52], but these were based on a much smaller number of individuals (between 15 and 57) than that used in our study. In addition, the scapula analysed here were probably more uniform in both size and shape than in these previous studies because the individuals were more homogeneous in terms of age and genetics and were raised under uniform environmental conditions. In some of these previous studies, the first principal component explained a large amount of the total variance of the data, i.e. 60% [50], 72% [49], and 99% [52], and represented mainly the sizecomponent in the data. Even after removing the sizecomponent from the total variation, fewer principal components were needed in these studies to explain a certain ratio of the total variance than the number required in our study.
Examination of genetically orthogonal linear combinations of the PC traits through eigen decomposition of the \({\mathbf{GP}}^{  1}\) matrix, showed almost a onetoone relationship between PC traits and eigenvectors, which was as expected, since estimates of genetic correlations between the PC traits were relatively low (between − 0.2 and 0.25 with a standard error of ~ 0.1). The main exceptions to this onetoone relationship were the third and fourth largest eigenvalues, which were both associated with the third and ninth principal component, which both had a heritability estimate equal to 0.68.
Implications of the chosen methods
Our aim was to build a computational pipeline that could connect 3D data of an individual to its underlying genetics. Hence, the results rely on a variety of methods and assumptions, and some justification and implications of the chosen methods are presented here.
Creating the scapula atlas
In the current study, for simplicity, a random individual was selected to create the atlas for point correspondence. Other alternatives, such as using an average atlas or multiatlas, have been shown to give a more accurate atlas [53, 54] and might have improved the point correspondence, especially in the concave regions and where the surfaces on different sides of the objects are close to each other.
Surface extraction method
In the current study, we used Delaunay triangularization and removed all tetrahedrons with any edge longer than a specified threshold. The rationale for this was to establish a more general method than, e.g., the marching cubes algorithm [27], which can also extract surfaces from less structured point clouds, such as those coming from 3D cameras. The level of the threshold chosen could, however, affect results. A high threshold could reduce the information about the concave areas, while a low threshold could result in a discontinuous surface. By using a threshold of 5 mm for a point cloud with a regular grid with a resolution equal to 0.94 × 0.94 × 1.25 mm, some smoothing occured in the most concave regions, as for example where the tendons attach to the scapular spine.
Point registration method
The CPDalgorithm was used to find corresponding points across individuals. Initial tests were conducted with default input parameters and the affine deformation was found to perform best. Optimization of parameters, which could potentially improve the achievements of the method, was not performed. In addition, as described in Methods, we used the most probable point correspondence (from matrix \({\mathbf{P}}\)) for all points in X to find the corresponding point in Y. An alternative that could have improved the method is to use \({\mathbf{P}}\) to calculate weighted coordinate points from Y. A weakness of the original CPDalgorithm is that only the Euclidean distance is considered as a measure of similarity and not the neighbourhood structure of points [31], as exemplified by locating corresponding points on the wrong side of the object (Fig. 2). Much effort has been put into improving the CPD algorithm, such that it includes more information about the local structure between neighbouring points, see e.g. [31, 55]. Testing these developments as well as other relevant point registration methods [56, 57] is beyond the scope of this paper, but will be followed up in future research.
Scaling method
In the current study, the size of the object was embedded in the definition of the phenotypes. Before doing the PCA, the surface point cloud was rescaled to its original size, measured in mm. However, the PCphenotypes were corrected for the live weight of the animal at scanning to account for different sizes of the animals and in order to keep the phenotypes as independent as possible from animal management decisions (e.g. day of scanning). An alternative for handling the effect of size would be to consider size separately from the PCA [58] and treat it as a separate scalar phenotype in the statistical model for estimating genetic parameters. Although this approach would have changed the principal components, the main conclusions would probably remain much the same, because the size component in the current dataset was relatively limited.
Training population and crossvalidation
In the current study, a set of 200 animals was masked from the dataset before calculation of the principal components, estimation of genetic parameters, and genomic prediction. This was done because our aim was to demonstrate the concept of predicting the shape of an organ, rather than to obtain realistic prediction accuracies for animal breeding, which requires relevant relatedness between training and validation populations [59]. An alternative that would have increased the precision of the estimate of prediction reliability is to use Kfold crossvalidation [60]. However, this would have changed the definition of the phenotypes and the estimates of genetic parameters between validation sets, which would have made the results more difficult to interpret.
Further use of 3D animal models
Previously [4], we focused on the shape of the scapula spine and on some overall characteristics such as length, width, and thickness of the scapula. As hypothesized, we found that the shape of the scapula was genetically correlated with the severity of shoulder lesions and with body condition score of sows at weaning (of the litter). In the work reported here, we extended the shape modelling to capture the entire 3D geometry of the scapula, which allowed us to use all morphological variation from the CTscans. This is important to better understand the relationships between morphology, health, and genetics. Furthermore, for animal breeding purposes, detailed 3D models could be used to develop relevant indicators for traits that are not possible to measure on selection candidates, such as longevity and robustness [61,62,63]. By including relevant precision phenotypes in multivariate genomic predictions, the accuracy of estimated breeding values for economically important traits will increase [5]. As such, the development of a statistical shape model for breeding can be important for animal welfare and efficient food production. In addition, the large number of CT scanned individuals in modern breeding programs, constitutes a valuable data source for verification of novel segmentation and detection algorithms. Furthermore, breeding animals are not covered by the General Data Protection Regulations, and high throughput genotyping and phenotyping from pig breeding could be used to improve reference datasets for genetic research across species, and to improve and automate diagnostic tools.
Conclusions
In this work, we constructed a statistical shape model of the pig scapula, in which the first 10 principal components covered 80% of the total shape variance observed in a large population of CTscanned pigs. In general, estimates of heritabilities of the first principal components were high and showed that the computational pipeline that processes raw CT data to principal component phenotypes was associated with little error. Furthermore, we showed that it is possible to predict the 3D shape of the scapula based on genomic data. Some limitations in the quality of the point correspondence were observed in concave areas where the surfaces of opposite sides of the scapula were close to each other. In those areas, the coherent point drift algorithm sometimes led to conflicting correspondences between the different sides of the bone, and the reliability of genomic predictions was lower than for the rest of the scapula. The statistical shape model can be used to develop precise indicator traits for phenotypes that are not possible to measure on selection candidates, such as longevity and robustness. As such, the development of detailed 3D animal models can be an important tool for improving animal welfare and for efficient food production.
Availability of data and materials
The data that support the findings of this study are available from Norsvin SA, but restrictions apply to the availability of these data, which were used under license for the current study, and thus are not publicly available. However, data are available from the authors upon reasonable request and with permission of Norsvin SA.
References
 1.
Gangsei LE, Kongsro J, Olstad K, Grindflek E, Sæbø S. Building an in vivo anatomical atlas to close the phenomic gap in animal breeding. Comput Electron Agric. 2016;127:739–43.
 2.
Gangsei LE, Kongsro J. Automatic segmentation of Computed Tomography (CT) images of domestic pig skeleton using a 3D expansion of Dijkstra’s algorithm. Comput Electron Agric. 2016;121:191–4.
 3.
Kvam J, Gangsei LE, Kongsro J, Schistad Solberg AH. The use of deep learning to automate the segmentation of the skeleton from CT volumes of pigs. Transl Anim Sci. 2018;2:324–35.
 4.
Nordbø Ø, Gangsei LE, Aasmundstad T, Grindflek E, Kongsro J. The genetic correlation between scapula shape and shoulder lesions in sows. J Anim Sci. 2018;96:1237–45.
 5.
McParland S, Lewis E, Kennedy E, Moore SG, McCarthy B, O’Donovan M, et al. Midinfrared spectrometry of milk as a predictor of energy intake and efficiency in lactating dairy cows. J Dairy Sci. 2014;97:5863–71.
 6.
Miglior F, Fleming A, Malchiodi F, Brito LF, Martin P, Baes CF. A 100year review: identification and genetic selection of economically important traits in dairy cattle. J Dairy Sci. 2017;100:10251–71.
 7.
Heimann T, Meinzer HP. Statistical shape models for 3D medical image segmentation: a review. Med Image Anal. 2009;13:543–63.
 8.
Zollhöfer M, Thies J, Garrido P, Bradley D, Beeler T, Pérez P, et al. State of the art on monocular 3D face reconstruction, tracking, and applications. Comput Graph Forum. 2018;37:523–50.
 9.
Claes P, Liberton DK, Daniels K, Rosana KM, Quillen EE, Pearson LN, et al. Modeling 3D facial shape from DNA. PLoS Genet. 2014;10:e1004224.
 10.
Lippert C, Sabatini R, Maher MC, Kang EY, Lee S, Arikan O, et al. Identification of individuals by trait prediction using wholegenome sequencing data. Proc Natl Acad Sci USA. 2017;114:10166–71.
 11.
VázquezArellano M, Griepentrog HW, Reiser D, Paraforos DS. 3D Imaging systems for agricultural applications—a review. Sensors (Basel). 2016;16:618.
 12.
Fernandes AFA, Dórea JRR, Fitzgerald R, Herring W, Rosa GJM. A novel automated system to acquire biometric and morphological measurements and predict body weight of pigs via 3D computer vision. J Anim Sci. 2019;97:496–508.
 13.
Kongsro J. Estimation of pig weight using a Microsoft Kinect prototype imaging system. Comput Electron Agric. 2014;109:32–5.
 14.
Song X, Bokkers EAM, van der Tol PPJ, Groot Koerkamp PWG, van Mourik S. Automated body weight prediction of dairy cows using 3dimensional vision. J Dairy Sci. 2018;101:4448–59.
 15.
Hansen MF, Smith ML, Smith LN, Abdul Jabbar K, Forbes D. Automated monitoring of dairy cow body condition, mobility and weight using a single 3D video capture device. Comput Ind. 2018;98:14–22.
 16.
Stavrakakis S, Li W, Guy JH, Morgan G, Ushaw G, Johnson GR, et al. Validity of the microsoft kinect sensor for assessment of normal walking patterns in pigs. Comput Electron Agric. 2015;117:1–7.
 17.
Aasmundstad T, GjerlaugEnger E, Grindflek E, Vangen O. Genetic trends of conformation traits and genetic correlations to osteochondrosis in boars. Animal. 2014;8:1045–52.
 18.
Lynch M, Walsh B. Genetics and analysis of quantitative traits. Sunderland: Sinauer Asscoiates Inc.; 1998. p. 980.
 19.
Stanković K, Booth BG, Danckaers F, Burg F, Vermaelen P, Duerinck S, et al. Threedimensional quantitative analysis of healthy foot shape: a proof of concept study. J Foot Ankle Res. 2018;11:8.
 20.
Aasmundstad T, Kongsro J, Wetten M, Dolvik NI, Vangen O. Osteochondrosis in pigs diagnosed with computed tomography: heritabilities and genetic correlations to weight gain in specific age intervals. Animal. 2013;7:1576–82.
 21.
GjerlaugEnger E, Kongsro J, Odegård J, Aass L, Vangen O. Genetic parameters between slaughter pig efficiency and growth rate of different body tissues estimated by computed tomography in live boars of Landrace and Duroc. Animal. 2012;6:9–18.
 22.
Groenen MAM, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, et al. Analyses of pig genomes provide insight into porcine demography and evolution. Nature. 2012;491:393–8.
 23.
Iversen MW, Nordbø Ø, GjerlaugEnger E, Grindflek E, Lopes MS, Meuwissen T. Effects of heterozygosity on performance of purebred and crossbred pigs. Genet Sel Evol. 2019;51:8.
 24.
Antolín R, Nettelblad C, Gorjanc G, Money D, Hickey JM. A hybrid method for the imputation of genomic data in livestock populations. Genet Sel Evol. 2017;49:30.
 25.
Hickey JM, Kinghorn BP, Tier B, van der Werf JH, Cleveland MA. A phasing and imputation method for pedigreed populations that results in a singlestage genomic evaluation. Genet Sel Evol. 2012;44:9.
 26.
Kongsro J, Gangsei LE, KarlssonDrangsholt TM, Grindflek E. Genetic parameters of in vivo primal cuts and body composition (PigAtlas) in pigs measured by computed tomography (CT)1. Transl Anim Sci. 2017;1:599–606.
 27.
Lorensen WE, Cline HE. Marching cubes: a high resolution 3D surface construction algorithm. ACM SIGGRAPH Comput Graph. 1987;21:163–9.
 28.
Jones E, Oliphant T, Peterson P. SciPy: Open source scientific tools for Python. 2001. http://www.scipy.org/.
 29.
Myronenko A, Song X. Pointset registration: coherent point drift. IEEE Trans Pattern Anal Mach Intell. 2010;32:2262–75.
 30.
Khallaghi S. PyCPD: Tutorial on the coherent point drift algorithm. 2017. http://siavashk.github.io/2017/05/14/coherentpointdrift/.
 31.
Peng L, Li G, Xiao M, Xie L. Robust CPD Algorithm for nonrigid point set tegistration based on structure information. PLoS One. 2016;11:e0148483.
 32.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikitlearn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.
 33.
Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31:423–47.
 34.
Madsen P, Jensen J. A user’s guide to DMU. A package for analysing multivariate mixed models. Version 6, release 5.2. 2013. http://dmu.agrsci.dk/DMU/Doc/Current/dmuv6_guide.5.2.pdf.
 35.
Klingenberg CP, Leamy LJ. Quantitative genetics of geometric shape in the mouse mandible. Evolution. 2001;55:2342–52.
 36.
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.
 37.
Su G, Madsen P. User’s Guide for Gmatrix Ver2. A program for computing Genomic relationship matrix; 2014. http://dmu.agrsci.dk/Gmatrix/Doc/Current/.
 38.
Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genomewide dense marker maps. Genetics. 2001;157:1819–29.
 39.
Ramachandran P, Varoquaux G. Mayavi: 3D Visualization of scientific data. Comput Sci Eng. 2011;13:40–51.
 40.
Havill LM, Mahaney MC, Binkley T, Specker BL. Effects of genes, sex, age, and activity on BMC, bone size, and areal and volumetric BMD. J Bone Miner Res. 2007;22:737–46.
 41.
Jian WX, Long JR, Deng HW. High heritability of bone size at the hip and spine in Chinese. J Hum Genet. 2004;49:87–91.
 42.
Tsagkrasoulis D, Hysi P, Spector T, Montana G. Heritability maps of human face morphology through largescale automated threedimensional phenotyping. Sci Rep. 2017;7:45885.
 43.
Navarro N, Maga AM. Does 3D phenotyping yield substantial insights in the genetics of the mouse mandible shape? G3(Bethesda). 2016;6:1153–63.
 44.
Percival CJ, Liberton DK, de Villena F, Spritz R, Marcucio R, Hallgrímsson B. Genetics of murine craniofacial morphology: diallel analysis of the eight founders of the collaborative cross. J Anat. 2016;228:96–112.
 45.
Hintz RL, Hintz HF, Van Vleck L. Estimation of heritabilities for weight, height and front cannon bone circumference of thoroughbreds. J Anim Sci. 1978;47:1243–5.
 46.
Utrera AR, Van Vleck LD. Heritability estimates for carcass traits of cattle: a review. Genet Mol Res. 2004;3:380–94.
 47.
Vargas CA, Elzo MA, Chase CC, Chenoweth PJ, Olson TA. Estimation of genetic parameters for scrotal circumference, age at puberty in heifers, and hip height in Brahman cattle. J Anim Sci. 1998;76:2536–41.
 48.
Mayya M, Poltaretskyi S, Hamitouche C, Chaoui J. Scapula statistical shape model construction based on watershed segmentation and elastic registration. In Proceedings of the 2013 IEEE 10th International Symposium on Biomedical Imaging: 7–11 April 2013; San Francisco. 2013. p. 101–4.
 49.
Casier SJ, Van den Broecke R, Van Houcke J, Audenaert E, De Wilde LF, Van Tongel A. Morphologic variations of the scapula in 3dimensions: a statistical shape model approach. J Shoulder Elbow Surg. 2018;27:2224–31.
 50.
Mutsvangwa T, Burdin V, Schwartz C, Roux C. An atomated statistical shape model developmental pipeline: application to the human scapula and humerus. IEEE Trans Biomed Eng. 2015;62:1098–107.
 51.
Yang YM, Rueckert D, Bull AMJ. Predicting the shapes of bones at a joint: application to the shoulder. Comput Methods Biomech Biomed Engin. 2008;11:19–30.
 52.
Zhang KY, WiktorowiczConroy A, Hutchinson JR, Doube M, Klosowski M, Shefelbine SJ, et al. 3D Morphometric and posture sudy of felid scapulae using statistical shape modelling. PLoS One. 2012;7:e34619.
 53.
Rohlfing T, Brandt R, Menzel R, Maurer CR. Evaluation of atlas selection strategies for atlasbased image segmentation with application to confocal microscopy images of bee brains. NeuroImage. 2004;21:1428–42.
 54.
Aljabar P, Heckemann RA, Hammers A, Hajnal JV, Rueckert D. Multiatlas based segmentation of brain images: atlas selection and its effect on accuracy. NeuroImage. 2009;46:726–38.
 55.
Ma J, Zhao J, Yuille AL. Nonrigid point set registration by preserving global and local structures. IEEE Trans Image Proces. 2016;25:53–64.
 56.
Pomerleau F, Colas F, Siegwart R. A review of point cloud registration algorithms for mobile robotics. Found Trends Robotics. 2015;4:1–104.
 57.
Zhu H, Guo B, Zou K, Li Y, Yuen KV, Mihaylova L, et al. A review of point set registration: from pairwise registration to groupwise registration. Sensors. 2019;19:1191.
 58.
Klingenberg CP. Evolution and development of shape: integrating quantitative approaches. Nat Rev Genet. 2010;11:623–35.
 59.
Daetwyler HD, Calus MPL, PongWong R, Campos G, Hickey JM. Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics. 2013;193:347–65.
 60.
Borra S, DiCiaccio A. Measuring the prediction error. A comparison of crossvalidation, bootstrap and covariance penalty methods. Comput Stat Data Anal. 2010;54:2976–89.
 61.
Buttchereit N, Stamer E, Junge W, Thaller G. Genetic parameters for energy balance, fat/protein ratio, body condition score and disease traits in German Holstein cows. J Anim Breed Genet. 2012;129:280–8.
 62.
König S, May K. Invited review: phenotyping strategies and quantitativegenetic background of resistance, tolerance and resilience associated traits in dairy cattle. Animal. 2019;13:897–908.
 63.
Aasmundstad T, Olsen D, Sehested E, Vangen O. The genetic relationships between conformation assessment of gilts and sow production and longevity. Livest Sci. 2014;167:33–40.
Acknowledgements
The author thanks R. Sagevik and J. Kongsro for kindly providing segmented CT data, H. Hamland for providing genotype data and M. Knudsen for helping with illustrations. The author also wants to thank E Grindflek, LE Gangsei, AH Gaustad, T.Ådnøy, THE Meuwissen and AB Gjuvsland for valuable discussions as well as MW Iversen for language editing.
Funding
This study was supported by the Research Council of Norway under the BIONÆR program, project number 282252.
Author information
Affiliations
Contributions
ØN designed the study, performed the data analysis, wrote the paper and approved the final manuscript. The author read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
All animals were cared for according to the laws and regulations for keeping pigs in Norway (Regulation for the keeping of pigs in Norway 20030218175, 2003; Animal welfare Act 2009061997, 2009). In Norway, animal breeding is controlled by “Mattilsynet” (Norwegian Food Safety Authority), who has officially approved Norsvin for maintaining herdbooks through article 1(d) of Directive 88/661/EEC (approval date 27.07.1994). The official herd ID for our testing station is 0403050826. Data recording and sample collection were conducted strictly in line with the laws given by Norwegian animal research authorities on the protection of animals (“Lov om dyrevelferd”). The data was obtained as part of routine data recording in commercial breeding programs. Samples collected for DNA extraction were used only for the routine diagnostic purpose of the breeding program.
Consent for publication
Not applicable.
Competing interests
The author declares that he has no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Nordbø, Ø. Modelling the shape of the pig scapula. Genet Sel Evol 52, 36 (2020). https://doi.org/10.1186/s12711020005555
Received:
Accepted:
Published: