Using Hardy-Weinberg Equilibrium

Let’s start with a problem. Your task is to try to prevent the spread of the genetic disorder cystic fibrosis. How do you go about the process of prevention?

Let’s begin with what you know. For starters, this disease is controlled by one gene with two alleles. It is an autosomal recessive disorder, meaning that a person has to inherit two broken copies of the gene, one from each parent, to have the disease. The working allele, the one that doesn’t give a person cystic fibrosis, we will call A. The non-working allele, we will call a.

In the case of this cystic fibrosis gene, a person can have three genotypes, or combinations of alleles: AA, Aa, or aa. It is easy to tell who has the aa genotype as these people are the ones with the symptoms of the disease. What is more difficult is discerning the difference between the AA and Aa genotypes.

Back to the problem at hand. What do we need to know to stop the spread of this disease?

This is a recessive condition we are talking about. One of the alleles masks the other. Fortunately for the carriers, the good allele masks the bad allele, but herein lies our problem. How do we identify the carriers? Genetic disorders are heritable conditions, meaning they are passed on from parents to offspring. If two people with the AA genotype get together and make babies, their children will have the AA genotype. This is because each parent donates one of their alleles. This allele then takes its place in the child’s genome next to the allele donated by the other parent. We have now produced the child’s genome. In this scenario, both parents only had the A allele to give away, so the A allele is all the child can get.

It gets more complicated with carriers of genetic diseases. Let’s say we have a parent with the AA genotype and a parent with the Aa genotype. The resulting child will be healthy (read: no disease), but there is a chance the child will be a carrier. There is also a chance that this particular a allele will not be passed on. So, parent 1 will give either one of their A alleles to the child, but parent 2, the carrier, will give either an A or the a allele. There is a 50% chance that the child will then receive the Aa genotype. This has everything to do with whether the carrier parent passes on either A or a.

The real problem here is when two carriers decide to have a baby. These are two healthy people, most definitely not thinking of cystic fibrosis while starting their family. But there is a bit of danger here. Carrier parent 1 (Aa genotype) will give the child either allele A or a. Carrier parent 2 (Aa genotype) will do the same thing. With two carriers, we have the greatest range of possibilities on this spectrum. Their child will have one of three genotypes: AA (a 25% chance), Aa (50%), or aa (25%).
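If you like to see this sort of thing spelled out, here is a minimal sketch in Python of the three crosses described so far. The function name offspring_genotypes is made up for this post, and the sketch assumes each parent passes on either of their two alleles with equal chance.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return the probability of each child genotype for a one-gene cross.

    Hypothetical helper for illustration: each parent is a two-letter
    genotype string such as "AA", "Aa", or "aa".
    """
    # Pair every allele from parent 1 with every allele from parent 2.
    # Sorting the pair makes "aA" and "Aa" count as the same genotype.
    combos = Counter(
        "".join(sorted(allele1 + allele2))
        for allele1, allele2 in product(parent1, parent2)
    )
    total = sum(combos.values())
    return {genotype: count / total for genotype, count in combos.items()}


print(offspring_genotypes("AA", "AA"))  # only AA children
print(offspring_genotypes("AA", "Aa"))  # half AA, half carriers
print(offspring_genotypes("Aa", "Aa"))  # AA, Aa, and aa are all possible
```

The last cross is the one to worry about: it is the only one of the three where an aa child can show up.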

The disease rears its head when two carriers procreate. If we want to control the spread of cystic fibrosis, we will want to know how common carriers are.

This is a difficult problem. Carriers of autosomal recessive conditions have no idea that they are carriers. They look and feel just like people who are completely free of the disease. This is our core problem. We cannot simply ask for a show of hands and find out that way how many people are in danger of passing on the disease.

We need to devise a way to estimate the number of carriers in a population if we have any hope of preventing the disease.

Enter Godfrey Hardy and Wilhelm Weinberg, a British mathematician and a German physician, respectively. They devised a way to solve this problem:

p^2 + 2pq + q^2 = 1

Above is the equation for the Hardy-Weinberg Equilibrium. This is our tool for finding out how many carriers are in our population. Let me take a minute to explain what’s going on here. Here, p and q stand in for our alleles: p goes with A and q goes with a. Why don’t we just call p and q A and a then? Well, it’s because they represent something slightly different than simply the alleles themselves.

Instead, we have to think about the genes in terms of the gene pool of the population. Let’s imagine something that may be slightly horrific. For every person in our population, we break them open and take their genes. If the person has the AA genotype, then we take their two A alleles and we throw them into the pool. If someone has the aa genotype, they contribute two a alleles to the pool. The carriers will contribute one of each.

Now, we have begun to think about genes the way population geneticists tend to think about them. We are not concerned with the genotypes of individual members of our population, but instead with the total of alleles in the gene pool. The gene pool reflects the total number of A alleles in the population plus the number of a alleles in the population.

Time for an oversimplified example for the sake of illustration. Pretend we are studying cystic fibrosis within a population of people living on a small, isolated island. 1000 people are living on this island. Each person has two alleles for this cystic fibrosis gene. This means that the gene pool for this population contains 2000 alleles. If 1500 of these alleles are As and 500 of them are as, then 75% of the alleles in our gene pool are A and the remaining 25% are a. Given that there are only 2 alleles, if we know how many copies of one allele are in our gene pool, we know that everything else has to be the other. If we express these percentages as proportions of our whole, then we have our values of p and q.

So, p = 0.75 and q = 0.25.
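Here is a rough sketch of that gene pool count in Python. The genotype counts (562 AA, 376 Aa, 62 aa) are hypothetical numbers picked only so that the pool comes out to the 1500 A and 500 a alleles used above, and the function name allele_frequencies is made up for this post.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the proportions of A and a alleles in the gene pool."""
    # Every person throws two alleles into the pool.
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    # AA people contribute two A alleles, carriers contribute one.
    count_A = 2 * n_AA + n_Aa
    # aa people contribute two a alleles, carriers contribute one.
    count_a = 2 * n_aa + n_Aa
    return count_A / total_alleles, count_a / total_alleles


# Hypothetical island of 1000 people: 562 AA, 376 Aa, 62 aa.
p, q = allele_frequencies(562, 376, 62)
print(p, q)  # 0.75 0.25
```

Notice that p and q always add up to 1, which is why knowing one of them is enough.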

Great! We can now plug this into our equation.

(0.75)^2 + 2(0.75)(0.25) + (0.25)^2 = 0.5625 + 0.375 + 0.0625 = 1

So what does this mean?

Remember, the equation is 1=p^2+2pq+q^2, so let’s define these terms. If p represents the proportion of A alleles in our gene pool, then p^2 represents the proportion of people in our population with the AA genotype. Then q^2 represents the proportion of people with the aa genotype, and 2pq represents the proportion with the Aa genotype. Fantastic. If we know what is in the gene pool, we can make predictions about the population. In this scenario, on our island, roughly 56% of people will have nothing to do with the disease: they neither exhibit it nor carry it. About 6% will have cystic fibrosis, and about 38% are the carriers. We can learn something we did not previously know here. We now have an idea of how prevalent the disease may be in the future by knowing how many people may be able to pass it on. This is key in trying to treat the population for this genetic disorder.
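The same arithmetic can be sketched in a few lines of Python. The function name genotype_frequencies is made up for this post, and the sketch assumes the island population really does follow the Hardy-Weinberg proportions.

```python
def genotype_frequencies(p):
    """Return the expected genotype proportions for an A-allele frequency p."""
    q = 1 - p  # with only two alleles, everything that is not A is a
    return {
        "AA": p ** 2,      # unaffected, not a carrier
        "Aa": 2 * p * q,   # carrier
        "aa": q ** 2,      # has the disease
    }


frequencies = genotype_frequencies(0.75)
for genotype, proportion in frequencies.items():
    print(f"{genotype}: {proportion:.2%}")
```

The three proportions add up to 1, which is just the equation itself.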

However, you may have spotted a slight problem with this. If finding out how many carriers are in our population is our goal, then there must be easier ways of discerning this. Knowing exactly how many alleles of each type are in the gene pool presents a difficult problem. To do this, we would have to genetically screen our entire population to get an accurate snapshot of the gene pool. By then we would already have the answer to our question, which would render the Hardy-Weinberg equation pointless.

I hear you shouting at your screen in frustration at me. Before you close your browser window, let me explain that you never have that information to begin with. I know this. That example was a demonstration of the underlying mathematics at play here.

In reality, as a doctor on an island of 1000 people, you have to work with the information available to you. So what information do we have? For starters, we know how many people have cystic fibrosis. Our unknowns are related to our inability to discern carriers from non-carriers.

This is an autosomal recessive condition, so even without testing everyone’s genes (not practical here), we can still solve this problem.

The people with cystic fibrosis are homozygous recessive. We know this to be a fact, so we know the genotype of all the people with the condition. Let’s go back to the equation. Remember how q^2 represents the proportion of individuals who are homozygous for the a allele? We can take the proportion of homozygotes here and work backward. We know that 6.25% of the population is afflicted, so we know q^2 = 0.0625. Take the square root of this, and we have our q value: 0.25. Subtract q from 1, and we have p: 0.75. From there, we can follow through with the rest as we did above: 2pq = 2(0.75)(0.25) = 0.375, so about 375 of our 1000 islanders are carriers. Now we have a prediction for the total number of carriers in our population.
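Working backward like this is easy to sketch in Python. The function name estimate_carriers is made up for this post, and the estimate only holds under the assumption that the population follows the Hardy-Weinberg proportions.

```python
import math


def estimate_carriers(affected_fraction, population_size):
    """Estimate carriers from the fraction of people who have the disease.

    Assumes the affected people are exactly the aa genotype, so
    affected_fraction equals q^2.
    """
    q = math.sqrt(affected_fraction)  # frequency of the a allele
    p = 1 - q                         # frequency of the A allele
    carrier_fraction = 2 * p * q      # proportion with the Aa genotype
    return carrier_fraction, carrier_fraction * population_size


# Our island: 6.25% of 1000 people have cystic fibrosis.
fraction, carriers = estimate_carriers(0.0625, 1000)
print(f"Carrier proportion: {fraction:.3f}")
print(f"Estimated carriers: {carriers:.0f}")
```

All the doctor needs is a count of affected people and a headcount of the island.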
