When the frequency of alleles for a gene is constant over time that gene is said to be genetic?

Introduction

In his letter to the Editor of Science, Hardy [1908] showed that under specific conditions the simple two-allele system [A, a] has the property that the allele frequencies [p, q] determine the genotype frequencies [AA, aa, Aa], which obey the well-known relation p² + q² + 2pq = 1. Independently, the German physician Weinberg [1908] arrived at similar results. In the 1930s the synthetic theory proposed that under the Hardy–Weinberg [HW] conditions genetic systems attain an “equilibrium” state characterized by genotype frequency proportions obtained directly from the allele frequencies, as stated in the HW relation above. This definition of equilibrium in genetic systems became part of the well-known HW Principle [Hartl and Clarke, 2007], which formally states: “If a genetic population is such that [1] organisms are diploid, [2] reproduction is sexual, [3] generations do not overlap, [4] mating is random, [5] the size of the population is significantly large, [6] allele frequencies are equal in the sexes, and [7] there is no migration, mutation, or selection, then the genotype frequencies in the population are given by weighted products of the allele frequencies. In the case of the one locus-two allele system the allele frequencies [[A, a] = [p, q]] give directly the genotype frequencies [[AA, aa, Aa] = [p², q², 2pq]].” Under the above conditions it is easy to demonstrate the following corollary: “For a population satisfying the HW conditions the allele frequencies are constant in time.” The notion of HW equilibrium adopted in the synthetic theory derives from the above corollary.
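The HW relation and its corollary can be checked numerically. The following is a minimal sketch, not taken from the original study: the function names are illustrative assumptions, and the allele-frequency update simply counts gametes under random mating in an infinite population.

```python
def genotype_frequencies(p):
    """Return the HW genotype frequencies (AA, Aa, aa) for frequency p of allele A."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def next_allele_frequency(f_AA, f_Aa):
    """Frequency of A among gametes: AA parents pass A always, Aa parents half the time."""
    return f_AA + 0.5 * f_Aa


p = 0.3  # illustrative value
f_AA, f_Aa, f_aa = genotype_frequencies(p)
p_next = next_allele_frequency(f_AA, f_Aa)
print(f_AA, f_Aa, f_aa, p_next)
```

The three genotype frequencies sum to one, and the allele frequency computed for the next generation equals the initial one, which is the content of the corollary for the idealized infinite population.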

Attempting to establish a conceptual “bridge” with the definition of mechanical equilibrium in Newtonian mechanics, the synthetic theory adopts the idea that the existence of a time-invariant observable leads to the hypothesis that the genetic population is in equilibrium, with no net “external forces” acting on the system [Stephens, 2004; Hartl and Clarke, 2007]. From this simple reasoning it became broadly accepted that these external forces should be represented by selection, and the theory therefore derives the well-known definition of evolution based on the variation of the allele frequencies. This vision of biological evolution is strongly supported by the well-defined concept of mechanical equilibrium of [classical] physical systems: if the sum of all external forces is null, the physical system is said to be isolated and its [macroscopic] mechanical state [well characterized by appropriate mechanical variables of its constituents] is unaltered. In this situation the system is said to be in mechanical equilibrium. This notion results from the combined use of Newton’s well-known first and second principles.

In the case of genetic systems, assuming that selection “forces” are not acting on the population, the system would be free of external forces and would be in a “genetic state” for which the allele frequencies are constant in time. The idea of a genetic equilibrium state related to zero “external forces” would follow immediately. Although this theoretical construct may be appealing, it is important to note that it is no more than an analogy, and one that is very difficult to establish as formally as is done in classical mechanics: in classical mechanics the notion of a force acting on a system [as a result of a fundamental physical interaction] has the formal status of an [operational] definition. In fact, from the formal point of view, the first two Newtonian Principles are definitions, and therefore cannot be proven: the first principle is equivalent to the definition of [inertial] mass and the second can be seen as a prescription to calculate the force acting on the system. The notion of mechanical equilibrium emerges from the first two principles in the following way: according to the second principle [the sum of all forces acting on the mechanical system is defined as F = m·a, where the force F is understood to be the cause of the transformation of the state of movement, necessarily external to the system, and the acceleration is identified with the system’s observable response], if the resultant of all forces is null then the system should behave in time according to the first principle, namely driven by its own inertia. In this case we say the system is in a special mechanical state called equilibrium, which is a very appealing notion since according to the same first principle the system should then remain in this state [eternally] unless an external action takes the system out of its state of mechanical equilibrium.
Therefore, the idea of mechanical equilibrium is related to the impossibility of changing the mechanical state of the system by means of the system itself. In other words, the state of a mechanical system can only be altered through the interaction of the system of interest with another one. The interaction with another system is described by the third principle; therefore, the three principles cover both the definition and identification of the equilibrium state [combined use of the first two principles] and how the interaction with an external agent takes the system out of an equilibrium state [mediated by the third principle]. As definitions of inertia and force, the first two principles are validated through results from specific applications. Only the third principle, the Action–Reaction Principle, has a more fundamental role since it is related to the [universal] law of conservation of momentum and intrinsic symmetries of the physical system. The formal results of classical mechanics cannot be derived solely from the first two principles. The third law is at the very foundation of the theory since it addresses the physical phenomenon itself [the interaction of two physical systems obeys basic principles].

In genetic systems viewed in analogy with classical mechanical systems, natural selection appears as the definition of a cause external to the system and capable of changing the system’s state [assumed to be defined by allelic frequencies], such that if natural selection is absent the system should evolve according to its own “inertia.” Therefore, if the analogy is taken in full, natural selection should be treated as a formal definition to be validated through concrete applications. In this line of thought, the biological law [or natural principle] analogous to the third law of classical mechanics is still to be found.

The above arguments should be sufficient to understand that the notion of genetic equilibrium in the framework of the synthetic theory is [at most] related to the concept of statistical equilibrium. Its relevance as a natural principle should be supported by experiments. In this respect it is interesting to note that in his original study, Hardy [1908] carefully chose the word “stability” and not equilibrium to describe the statistical invariance of allele frequencies. This is more than a semantic difference because a system can be stable without [necessarily] being at equilibrium, as is the case, for example, of metastable states in non-equilibrium thermodynamics; stability and equilibrium therefore do not necessarily refer to the same physical properties of dynamic systems [Kivelson and Reiss, 1999].

As it appears in textbooks on population genetics, the canonical χ² statistical test is currently used to compare observed and estimated proportions of alleles [Hartl and Clarke, 2007]. The basic idea is that if the measured number of genotypes in the population is statistically close enough [χ² < 3.8414…, corresponding to p > 0.05 for one degree of freedom] to the theoretically expected number [given by the HW proportions], then the population is said to be in HW equilibrium; in other words, the χ²-test is used as an indicator of deviation from randomness. Nevertheless, the set of properties listed in the HW theorem constitutes sufficient but not necessary conditions for the HW proportions and for the invariance of the allele frequencies. Therefore, in rigorous terms, merely satisfying the statistical condition χ² < 3.8414… cannot guarantee random mating or any of the other conditions [or premises] of the HW principle. To prove this statement it suffices to find a counterexample, namely a genetic system for which at least one of the conditions of the HW theorem is violated and which nevertheless satisfies the condition χ² < 3.8414… Analytical counterexamples and a possible generalization of the HW principle have already been studied by Li [1988] and Stark [1976, 1980, 2005, 2006a,b] for the case of infinite populations under non-random mating.
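As an illustration of the test described above, the χ² statistic for a one-locus, two-allele system can be computed from the three genotype counts. This is a minimal sketch under stated assumptions: the function name `hw_chi_square` is hypothetical, and the allele frequency is estimated from the same counts, which is the usual textbook procedure rather than anything specified in the original study.

```python
def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square distance between observed genotype counts and HW expectations."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    # Skip classes with zero expectation to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


print(hw_chi_square(100, 200, 100))  # counts exactly in HW proportions
print(hw_chi_square(120, 160, 120))  # heterozygote deficit
```

A value below the critical value 3.8414 only says that the counts are close to the HW proportions; as argued in the text, it does not by itself establish that the HW premises hold.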

The χ²-test is used in experimental trials with genetic systems as a tool with the specific goal of identifying whether the genetic system is in a state of equilibrium characterized [and defined] by the conditions stated in the HW principle. This strategy, adopted in many studies [Salanti et al., 2005; Rodriguez et al., 2009], would be logically correct if the HW theorem stated necessary and sufficient conditions. The conclusion that the genetic system is in a state of [HW] equilibrium because the measured χ² < 3.8414… [or any other value] lacks logical foundation. In fact, its usefulness and limitations for the study of genetic systems are currently under investigation [Rohlfs and Weir, 2008; Engels, 2009].

To explore more deeply the possibilities of the system through counterexamples and other statistical properties relevant to the basic concept of genetic equilibrium, we performed numerical experiments with the simplest genetic system of two alleles and three genotypes [AA, Aa, aa] for very large but finite populations. The numerical simulations show that the stable ensemble distribution of χ² leads to inconclusiveness about random/non-random mating for populations of any finite size. The constancy of the allele frequencies can only be observed for rigorously infinite populations under strict HW conditions. As a consequence we present strong arguments and evidence supporting the conclusion that for any finite genetic population under the HW conditions the time evolution of the allele frequencies is dynamically neutral, with a corresponding equilibrium state attainable only in the infinite time horizon, and therefore unattainable and not well characterized by isolated observations. On the other hand, in the case of non-random mating the allele frequencies obey a stable dynamics with a well-characterized stable equilibrium state. As a result we propose that the problem of characterizing equilibrium states in genetic systems should be addressed in dynamical terms, where the use of statistical quantities like χ² would acquire a secondary [auxiliary and not conclusive] role. Therefore, in order to address the question of equilibrium states in genetic systems we should move toward a different perspective by looking at the genetic population as a dynamic system whose main signatures are coded in the time series of useful observables. To make our point clearer we analyze the time evolution of the two-allele, one-locus system through numerical simulations.

Methods

Numerical experiments were performed for the genetic system defined by a population of N individuals composed of three subpopulations with NAA, NAa, Naa individuals such that N = NAA + NAa + Naa. At each time step Nc reproducing couples [2Nc = εN surviving/reproducing individuals composing the effective population, 0 < ε < 1] were chosen from the population according to a prescribed probability distribution [PAA, PAa, Paa] that assigns to an individual a stationary probability of being chosen among the individuals of the same genotype. Each individual is chosen according to the probability distribution [Pij], and two successively chosen individuals form one reproducing couple. This process continues until the total number of chosen individuals reaches the value 2Nc, composing the reproductive population [or effective population]. Reproduction gives birth to a new generation composed of mNc individuals. To avoid a geometric explosion of the population size [m > 2] the model fixes an upper bound Nmax for the population size: if mNc > Nmax then the excess individuals [[mNc − Nmax] individuals] are randomly chosen and eliminated from the population. Each generation step is completed when the couples reproduce, giving birth to the next generation of Nmax individuals. The generation step is taken as the discrete time unit of the model [see Figure 1].

Figure 1. Diagram of the numerical experiment that generates the time evolution of the genetic population. This figure illustrates one reproductive cycle. From the n-th generation the algorithm randomly selects Nc couples [2Nc reproducing individuals] corresponding to a fraction ε of the total population N. Couples are selected according to specific rules [random or not] and reproduction takes place with a mean number m of descendants per couple. When the population grows at a geometric rate [m > 2] a prescribed number Nmax of individuals composes the surviving population defined as the [n + 1]-th generation.

The case of random mating corresponds to equiprobable individuals; the probabilities Pij of choosing an individual of a given genotype are approximately given by the instantaneous frequencies, Pij ≈ fij = Nij/N; in the limit N → ∞ and Nij → ∞ the fractions fij converge to the a priori probabilities Pij. For N → ∞ under the remaining HW conditions the frequencies fij are time invariant and relate to the allele frequencies through the HW proportions. For each chosen couple the model specifies the number of descendants, which can be deterministic and equal for every couple or chosen probabilistically from the values 1, 2, 3, or 4. In the latter case the population has a stable mean reproductive capacity. Clearly, if the mean reproductive capacity is m = 2 then the size of the population fluctuates around a constant mean value; if m > 2 the population size grows geometrically, and [after reproduction] a fixed number Nmax of individuals is randomly chosen as viable [composing the next generation] in order to avoid the exponential explosion of the population size; if m < 2 the population trivially goes to extinction in finite time.

The numerical simulation starts by prescribing an initial population of N individuals with the genotype profile [NAA, NAa, Naa] at [t = 0]; the input parameters are the maximum population size Nmax, a set of four stationary probabilities assigned to the four possible values of the reproductive capacity of each couple [determining the mean reproductive capacity m], a probability matrix used to fix mating chances among the individuals [random mating being a special case], the fraction ε of surviving/reproducing individuals, and the total number of generation steps [the simulation duration]. It is important to note that during the process of mating the condition of random mating depends strongly on the number of [still] uncoupled individuals, because when two individuals are chosen to form a couple they are removed from the population, so that the next couple is chosen among a smaller number of individuals. Clearly this may introduce a sampling effect in the mating process. Therefore, in order to minimize these effects and guarantee homogeneous mating rules for all individuals, a fraction of the [already large] population has to be disregarded. For instance, random mating is better assured the larger the values imposed for the reproductive capacity and for the death rate. The parameter ε specifies the fraction of the population that survives and reproduces; clearly this parameter has to be chosen carefully in combination with the parameter m in order to avoid extinction of the population during the numerical simulation.
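The generation step described in this section can be sketched as follows. This is only an illustrative reconstruction and not the authors' C++ program: all names are hypothetical, parents are drawn by genotype with replacement (which ignores the removal of already coupled individuals discussed above), and offspring genotypes follow ordinary Mendelian segregation, which the text does not spell out.

```python
import random

GENOTYPES = ("AA", "Aa", "aa")


def offspring_genotype(mother, father, rng):
    """Mendelian segregation: each parent passes one of its two alleles at random."""
    alleles = rng.choice(mother) + rng.choice(father)
    return "Aa" if alleles in ("Aa", "aA") else alleles


def generation_step(counts, eps, n_max, offspring_probs, rng, mating_probs=None):
    """Advance one generation; counts maps each genotype to its number of individuals."""
    n = sum(counts.values())
    n_couples = int(eps * n) // 2  # 2 * Nc = eps * N reproducing individuals
    # Random mating (assumed default): genotypes drawn with their current frequencies.
    weights = mating_probs or [counts[g] / n for g in GENOTYPES]
    parents = rng.choices(GENOTYPES, weights=weights, k=2 * n_couples)
    children = []
    for i in range(0, 2 * n_couples, 2):
        # Number of descendants of this couple, drawn from the values 1, 2, 3, 4.
        n_kids = rng.choices((1, 2, 3, 4), weights=offspring_probs)[0]
        children.extend(
            offspring_genotype(parents[i], parents[i + 1], rng) for _ in range(n_kids)
        )
    if len(children) > n_max:
        # Cap the population: keep n_max randomly chosen offspring.
        children = rng.sample(children, n_max)
    return {g: children.count(g) for g in GENOTYPES}


rng = random.Random(1)
start = {"AA": 1000, "Aa": 1000, "aa": 1000}
# eps = 0.6 and 4 descendants per couple are illustrative choices.
new_counts = generation_step(start, eps=0.6, n_max=3000,
                             offspring_probs=(0, 0, 0, 1), rng=rng)
print(new_counts)
```

Passing `mating_probs=[0.23, 0.54, 0.23]` (ordered as AA, Aa, aa) would replace the frequency-based weights with fixed ones, which is one possible reading of a non-random mating rule.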

As outputs we measured the genotype frequencies, the allele frequencies, and the value of χ² over the total population at each time step. The computational platform gives access to the time series of those quantities, or allows one to fix a time step tobs and observe the same quantities for a statistical ensemble of populations that evolve from t = 0 with the same initial conditions. In that way it is possible to study the statistical properties of the system over the system’s history or over the statistical [abstract] ensemble of populations for one fixed generation at time tobs. For clarity, and to keep the focus on our main objective of discussing the notion of equilibrium in genetic systems, we concentrate on the effects of the finite size of populations and of random/non-random mating conditions.

The source program was written in C++ on a 64-bit Linux platform. The executable file is available upon request.

Results

To study the fluctuations of χ² we avoid the problem of sampling by calculating the distribution of possible values of χ² over a statistical ensemble ZL[N] made up of L copies of identical populations of size N. For each population of the ensemble we evolve the system until a fixed time point, where the instantaneous value of χ² is measured. As a result we obtain the ZL[N]-ensemble distribution of possible values of χ² for populations of size N at fixed time. As an example we present the results of numerical experiments for N = 3.0 × 10⁶ and L = 10⁵. Initially all populations of the ensemble are identical, with the same genotype distribution Naa = NAA = NAa = 1.0 × 10⁶. At each time step the number of reproducing individuals is fixed at Nrep = 1.8 × 10⁶ in such a way that the imbalance of the mating probabilities [due to sampling over the finite population] is minimized at each time step. A geometric reproduction rate is imposed to guarantee the same value of Nrep at each time step. The measurement of χ² is made for each population of the ensemble of 10⁵ populations at generation 30. Here we present the results of two typical simulations. Figure 2 shows the ensemble distributions of χ² for populations under the random mating condition and for one example of a non-random mating condition in which the mating probabilities used to choose reproducing couples are fixed as P[AA] = 0.23, P[aa] = 0.23, P[Aa] = 0.54 [these values are chosen just for the sake of presentation; indistinguishable distributions can also be obtained for different values of the mating probabilities]. As can be clearly observed, the two distributions are almost indistinguishable. As we consider finite populations, the distribution of χ² over the ensemble is not the [canonical] one

$$P(\chi^2) = \frac{(\chi^2)^{k/2-1}\, e^{-\chi^2/2}}{2^{k/2}\,\Gamma(k/2)}$$

[where Γ is the gamma function and k is the number of degrees of freedom; for the two-allele, one-locus system k = 1] applicable to the case of infinite populations. In fact, if the population is finite then the largest possible value of the variable χ² is the population size N, and therefore the appropriate distribution function, the χ²_N-distribution, should be null for χ² > N. As a consequence, for finite populations the critical value χ²_c ≈ 3.8414 corresponds to a p-value p > 0.05.
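For reference, the canonical χ² density for k degrees of freedom and its upper-tail probability for k = 1 can be evaluated with the standard library alone. This is a sketch for the infinite-population case only; the function names are illustrative, and the k = 1 tail uses the fact that a χ² variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi2_pdf(x, k=1):
    """Canonical chi-square density with k degrees of freedom, for x > 0."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def chi2_tail_k1(x):
    """P(chi2 > x) for k = 1, i.e. P(|Z| > sqrt(x)) for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2))


print(chi2_pdf(1.0))
print(chi2_tail_k1(3.8414))
```

The second value is close to 0.05, which is the sense in which 3.8414 is the critical value for the canonical distribution.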

Figure 2. Distributions P[χ²] over the statistical ensemble of 10⁵ populations composed of 3.0 × 10⁶ individuals each. The values of χ² are measured at fixed generation [simulation time unit] 30. The case of random mating [full dots] and the case of non-random mating [empty triangles] are well fitted by exponential distributions for χ² > 0.4, with a small deviation from the exponential function for small χ² values. Both distributions are very stable with respect to the size of the statistical ensemble. The distributions are used to evaluate the probability of false negatives [for the case of random mating] and the probability of false positives [for the case of non-random mating].

In the numerical experiment related to Figure 2 both distributions have a well-fitted exponential tail for χ² > 0.4. With the help of the ensemble distribution it is possible to estimate the probability of false negatives in the case of random mating as P[χ² > 3.8414] ≈ 0.146 if we accept the critical value 3.8414… We note that this probability varies slowly in the range N = 10⁴–10⁶ even if the statistical ensemble is sufficiently large. Moreover, it is very difficult to obtain the exact distribution for finite populations due to slow convergence with respect to the ensemble size. It is clear that the convergence to the χ²-distribution for infinite populations [P[χ² > 3.8414] ≈ 0.05] is faster if we consider random samples over large populations. Nevertheless, the main conclusions presented here are unaltered whether the χ²-distributions are considered over the total population or over samples. As a result, even if the population is known to be under the HW conditions there is always an irreducible probability of at least 0.05 [for the ideal case of inferences with respect to infinite populations] that the χ² estimator leads to false negatives. More dramatic results are obtained for the case of non-random mating. The ensemble distribution P[χ²] is very similar to the case of random mating, with a pronounced exponential tail, and the probability of a false positive can be estimated as P[χ² < 3.8414] ≈ 0.848 [or P[χ² < 3.8414] ≈ 0.95 in the case of the distribution over random samples]. Therefore, if the population is subject to an external bias [not detectable by other means] resulting in non-random mating, there is a significant probability of obtaining measurements that lead to false positives.
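The random-sample case mentioned above can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration and not the ensemble simulation of the study: it draws independent samples of genotypes directly from HW proportions (it does not evolve whole populations over generations), uses hypothetical names and a fixed seed, and counts how often the χ² statistic exceeds the critical value.

```python
import random


def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square distance between observed genotype counts and HW expectations."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e
               for o, e in zip((n_AA, n_Aa, n_aa), expected) if e > 0)


def ensemble_tail_fraction(p, sample_size, n_copies, critical=3.8414, seed=0):
    """Fraction of HW-distributed random samples whose chi-square exceeds critical."""
    rng = random.Random(seed)
    q = 1 - p
    weights = (p * p, 2 * p * q, q * q)
    exceed = 0
    for _ in range(n_copies):
        draw = rng.choices(("AA", "Aa", "aa"), weights=weights, k=sample_size)
        chi2 = hw_chi_square(draw.count("AA"), draw.count("Aa"), draw.count("aa"))
        if chi2 > critical:
            exceed += 1
    return exceed / n_copies


# Illustrative sizes, far smaller than the ensemble used in the study.
fraction = ensemble_tail_fraction(p=0.5, sample_size=500, n_copies=2000)
print(fraction)
```

For samples generated under HW proportions the estimated fraction should lie near 0.05, the false-negative rate that the text describes as irreducible for the ideal infinite-population case.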

In conclusion, the χ² criterion can only be considered a poor estimator of the conditions [HW or not] to which the genetic system is subject. In other words, from the simple observation that χ²
