An Overview of Genome-Wide Association Studies

Genome-wide association studies (GWAS) are observational studies that scan the entire genome in an attempt to find associations (connections) between specific areas of DNA (loci) and certain traits, such as common, chronic diseases. These associations have the potential to affect people in a number of ways.

Identifying genetic risk factors for a disease might lead to early detection or even prevention. GWAS may also improve treatment by allowing researchers to design therapies based on the specific underlying biology of a condition (precision medicine) rather than relying on the one-size-fits-all approach common to many of these conditions.

How GWAS Can Change Our Understanding of Genetic Disease

At the current time, much of our genetic understanding of disease relates to uncommon conditions associated with single specific gene mutations, such as cystic fibrosis.

The potential impact of GWAS is significant, as these studies may reveal previously unknown variations in a number of genes in the genome at large that are associated with a wide range of common, complex chronic conditions.

A quick example of this is that GWAS have already been used to identify three genes that account for 74% of the attributable risk for age-related macular degeneration, a condition that had not been previously considered a genetic disease.

Genome-Wide Association Studies (GWAS) Overview

Before going into the details of genome-wide association studies (GWAS), it's helpful to define these studies from a big-picture standpoint.

GWAS may ultimately identify the (often several) genes responsible for a number of common, chronic medical conditions that had previously been attributed to environmental or lifestyle factors alone. By knowing which genes raise the risk of a condition, doctors could screen the people at risk (or offer them prevention strategies) while sparing people not at risk the inevitable side effects and false positives associated with screening.

Learning about genetic associations with common diseases can also help researchers uncover the underlying biology. For most diseases, treatments primarily target symptoms and take a one-size-fits-all approach. By understanding the biology, researchers can design treatments that get to the root of the problem and are tailored to the individual.

History of Genetics and Disease

Genome-wide association studies were first performed in 2002, and the completion of the Human Genome Project in 2003 made these studies fully possible. Before GWAS, understanding of the genetic basis of disease was limited primarily to "single gene" conditions with very large effects (such as cystic fibrosis or Huntington's disease) and large genetic changes (such as the extra chromosome 21 in Down syndrome). Finding the specific genes associated with a disease was a great challenge, because researchers could usually examine only a few specific genes at a time.

Unlike "single gene" conditions, it's likely that there are many genes from many different regions associated with most complex chronic diseases.

Single Nucleotide Polymorphisms (SNPs) and Genetic Variation

Genome-wide association studies look for specific loci (single nucleotide polymorphisms) across the entire genome that may be associated with a trait (such as a disease). More than 99% of the human genome is identical among all humans. The remaining portion, less than 1%, contains variations between people that may occur anywhere in our DNA.

Single nucleotide polymorphisms (SNPs) are only one type of genetic variation found in the genome but are the most common.

Genome-wide association studies look for these specific loci, or SNPs (pronounced "snips"), to see if some are more common in people with a particular disease.

A SNP is a position in DNA that varies by a single nucleotide, or base pair. Nucleotides are the building blocks, or "letters," of the genetic code.

There are only four bases: A (adenine), C (cytosine), G (guanine), and T (thymine). Despite this "alphabet" of only four letters, the possible combinations are almost limitless and contribute to the differences in traits between people.

How Many SNPs Exist in the Human Genome?

There are roughly 3 billion nucleotides (base pairs) in the human genome, of which roughly one in 1,000 is a SNP. Each individual's genome contains between four million and five million SNPs.

Minor and Major SNPs

SNP alleles are classified as major or minor depending on how common they are in a particular population. For example, if 80% of the copies of a given position in a population carried an A (adenine) and 20% carried a T (thymine), A would be considered the major (common) allele and T the minor allele.

The different versions of a SNP are called alleles, and most SNPs have two possible alleles. The term "minor allele frequency" simply refers to the frequency of the less common allele in a population.
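Minor allele frequency is usually estimated by counting alleles rather than people, since each person carries two copies of each SNP. The sketch below assumes a SNP with just two alleles and uses made-up genotype counts chosen to match the 80%/20% example; the function name is hypothetical.

```python
def minor_allele_frequency(hom_a: int, het: int, hom_b: int) -> float:
    """Frequency of the less common allele at a SNP with two alleles, A and B.

    hom_a: people carrying two copies of A
    het:   people carrying one copy of A and one of B
    hom_b: people carrying two copies of B
    """
    count_a = 2 * hom_a + het   # copies of allele A
    count_b = 2 * hom_b + het   # copies of allele B
    total = count_a + count_b   # two alleles per person
    if total == 0:
        raise ValueError("no genotypes supplied")
    return min(count_a, count_b) / total


# Hypothetical sample of 100 people: 64 AA, 32 AT, 4 TT.
print(minor_allele_frequency(64, 32, 4))  # 0.2 (T is the minor allele)
```

Counting alleles this way means a heterozygous person (AT) adds one copy to each allele, which is why 32 heterozygotes plus 4 TT homozygotes give 40 T alleles out of 200.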

Some rare diseases are caused by a single, rare variant in one gene; Huntington's disease, for example, results from an expanded CAG repeat in the HTT gene rather than from a SNP. With most common, complex diseases such as type 2 diabetes or heart disease, there may instead be many relatively common SNPs involved.

Locations of SNPs

SNPs are found in different functional regions of the genome, and the region in which a SNP lies influences the effect it may have. SNPs may lie in:

  • The coding sequence of a gene
  • A non-coding region of a gene (such as an intron)
  • Between genes (intergenic)

When a SNP is found within the coding sequence of a gene, it may affect the protein coded for by that gene, changing its structure so that the change has a harmful (deleterious) effect, a beneficial effect, or no effect at all.

Each set of three nucleotides (a codon) codes for one amino acid. There is redundancy in the genetic code, however, so even if one nucleotide changes, the same amino acid may still be placed in the protein.

A change in an amino acid may or may not change the structure and function of a protein; when it does, the protein may be impaired to different degrees. (Each combination of three bases specifies either one of the 20 standard amino acids or a "stop" signal that ends the protein.)

SNPs that fall in a non-coding region or between genes may still affect biological function, for example by playing a regulatory role in the expression of nearby genes (such as by altering transcription factor binding).

Types of SNPs in Coding Regions

Within the coding region of a gene, there are different types of SNPs as well.

  • Synonymous: A synonymous SNP will not change the amino acid.
  • Nonsynonymous: With nonsynonymous SNPs, there will be a change in the amino acid, but these can be of two different types.

Types of nonsynonymous SNPs include:

  • Missense mutations: These mutations replace one amino acid with another, which may leave the protein working normally or may result in a protein that does not function properly or does not function at all.
  • Nonsense mutations: These mutations create a premature stop codon, resulting in a shortened (truncated) protein.
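The difference between synonymous, missense, and nonsense changes can be checked mechanically by translating the original and altered codon with the standard genetic code. This sketch builds the standard codon table from DNA letters (with "*" marking a stop codon); the function name and example codons are illustrative, it handles only single-base changes within one codon, and it adds a "stop-loss" label for the case where a stop codon is lost, which falls outside the two types described above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons listed in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base change within one codon (DNA letters A/C/G/T)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons that differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid thanks to redundancy in the code
    if alt_aa == "*":
        return "nonsense"     # premature stop codon
    if ref_aa == "*":
        return "stop-loss"    # a stop codon is lost
    return "missense"         # one amino acid replaced by another


print(classify_coding_snp("GAG", "GAA"))  # synonymous (glutamic acid -> glutamic acid)
print(classify_coding_snp("GAG", "GTG"))  # missense (glutamic acid -> valine)
print(classify_coding_snp("TAC", "TAA"))  # nonsense (tyrosine -> stop)
```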

SNPs vs. Mutations

The terms mutation and SNP (variation) are sometimes used interchangeably, although the term mutation is more often used to describe rare genetic variants; SNP is usually used to describe common genetic variations.

Germ Cell vs. Somatic Mutations

With the recent addition of targeted therapies for cancer (drugs that target specific genetic changes or mutations in cancer cells that drive the growth of a tumor), discussing gene mutations can be very confusing. The types of mutations found in cancer cells are most often somatic or acquired mutations.

Somatic or acquired mutations occur in the process of a cell becoming a cancer cell and are present only in the cells in which they originate (for example, cancerous lung cells). Since they are acquired after birth, they are not inherited or passed down from one generation to another.

When these acquired changes or mutations involve the change in a single base, they are usually referred to as a single nucleotide alteration instead of a SNP.

Germ cell or hereditary mutations, in contrast, are mutations or other genetic changes in DNA that are present from birth (conception) and can be inherited.

GWAS focus on genetic variations that are inherited, and therefore on germ cell (hereditary) variants.

How SNPs May Affect Biology

Many SNPs have little direct impact on biology but can serve as very useful markers for finding the region of the genome that does. While SNPs may occur within a gene, they are more commonly found in non-coding regions.

When certain SNPs are found to be associated with a trait on genome-wide association studies, researchers then use further tests to examine the area of DNA near the SNP. In doing so, they may then identify a gene or genes that are associated with a trait.

An association alone does not prove that a SNP (or a particular gene near a SNP) causes a trait; further evaluation is needed. Scientists may look at the protein generated by the gene to assess its function (or dysfunction). In doing so, it is sometimes possible to figure out the underlying biology that leads to that disease.

Genotype and Phenotype

When talking about SNPs and traits, it's helpful to define two more terms. Scientists have long known that genetic variations are related to phenotypes.

  • Genotypes refer to genetic variations, such as variations in SNPs.
  • Phenotypes refer to traits (for example, eye color or hair color) but may also include diseases, behavioral characteristics, and much more.

As an analogy, researchers using GWAS might look for SNPs (genetic variations) associated with a predisposition to having blond or brown hair. As with any finding in a genome-wide association study, an association (correlation) between genotype (SNPs in this case) and a trait (hair color) does not necessarily mean that the genetic findings cause the trait.

SNPs and Human Disease

It's important to note that with common diseases, a single SNP is not usually the cause of a disease on its own. Rather, several SNPs (or the genes near them) usually contribute to a disease, each to a different degree and in different ways.

In addition, variations in SNPs are usually combined with other genetic factors and environmental/lifestyle risk factors. Some SNPs may be associated with more than one disease, as well.

Not all SNPs are "bad"; some (as has been found with inflammatory bowel disease) may reduce rather than increase the risk of a disease. Findings such as this may lead researchers to better treatments, by learning about the protein coded for by the gene and trying to mimic its actions with a medication.
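One simple way to express the idea that several SNPs each contribute a little, some raising risk and some lowering it, is an additive score (often called a polygenic risk score): count each person's copies of each effect allele and weight it by that SNP's estimated effect, typically a log odds ratio from a GWAS. This is a minimal sketch under that additive assumption; the SNP names and weights are hypothetical, and it ignores environmental and lifestyle factors as well as interactions between variants.

```python
import math

# Hypothetical effect sizes (log odds ratios) for three made-up SNPs.
# A negative weight marks a protective allele.
EFFECT_SIZES = {"snp_A": 0.18, "snp_B": 0.10, "snp_C": -0.25}


def additive_risk_score(effect_allele_counts: dict[str, int],
                        effect_sizes: dict[str, float]) -> float:
    """Sum over SNPs of (copies of the effect allele, 0-2) x (log odds ratio)."""
    score = 0.0
    for snp, weight in effect_sizes.items():
        # Simplification: a SNP missing from the genotype counts as 0 copies.
        copies = effect_allele_counts.get(snp, 0)
        if copies not in (0, 1, 2):
            raise ValueError(f"{snp}: expected 0, 1, or 2 copies, got {copies}")
        score += copies * weight
    return score


person_1 = {"snp_A": 2, "snp_B": 1, "snp_C": 0}  # risk alleles only
person_2 = {"snp_A": 1, "snp_B": 0, "snp_C": 2}  # two protective copies
for name, genotype in (("person_1", person_1), ("person_2", person_2)):
    score = additive_risk_score(genotype, EFFECT_SIZES)
    # Under this model, exp(score) is the odds relative to someone scoring 0.
    print(f"{name}: score = {score:.2f}, odds multiplier = {math.exp(score):.2f}")
```

The design choice here is deliberately simple: adding log odds ratios assumes each SNP acts independently, which is a common starting point but not a full description of how variants and environment combine.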

How They Are Done: Methods and Results

Genome-wide association studies may have different designs depending on the question to be answered. When looking at common medical conditions (such as type 2 diabetes), researchers gather one group of people with the disease and another group without it; in this case, the disease is the phenotype. GWAS then test whether there are any associations between genotype (in the form of SNPs) and the phenotype (the disease).


The first step in performing these studies is to obtain DNA samples from the participants, usually through a blood sample or a cheek swab. The sample is processed to isolate the DNA from cells and other components. The isolated DNA is then placed on a chip that can be scanned in an automated machine.

Scanning and Statistical Analysis of Variations

The DNA samples are then scanned across the genome, typically at hundreds of thousands to millions of SNPs, to see whether specific SNPs (variations) are seen more often in the disease group than in the comparison group. If differences are found, statistical analysis is done to estimate whether the differences between the two groups are statistically significant.

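In other words, the results are analyzed to estimate how likely it is that a difference this large would arise by chance alone if the genetic variation were not related to the disease or trait. The results are often displayed in a Manhattan plot, which shows each SNP's position along the chromosomes against the strength of its association (the negative logarithm of its p-value), so that strongly associated regions stand out.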
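As a rough illustration of the comparison described above, the sketch below runs one common test, an allelic chi-square test on a 2×2 table of allele counts in cases and controls, and reports an odds ratio, a p-value, and the −log10(p) value a Manhattan plot would display. The function name, the counts, and the use of 5 × 10⁻⁸ as the cutoff (a widely used genome-wide significance threshold) are assumptions for illustration; real studies often use regression models that also adjust for factors such as ancestry.

```python
import math

# A widely used genome-wide significance cutoff; assumed here for illustration.
GENOME_WIDE_THRESHOLD = 5e-8


def allelic_test(case_risk: int, case_other: int,
                 control_risk: int, control_other: int) -> tuple[float, float, float]:
    """Allelic chi-square test on a 2x2 table of allele counts.

    Counts are alleles, not people: each person contributes two alleles.
    Uses 1 degree of freedom and no continuity correction.
    Returns (odds_ratio, chi_square, p_value).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    row_cases, row_controls = a + b, c + d
    col_risk, col_other = a + c, b + d
    if 0 in (row_cases, row_controls, col_risk, col_other):
        raise ValueError("each row and column of the table needs a nonzero total")
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / (row_cases * row_controls * col_risk * col_other)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return odds_ratio, chi_square, p_value


# Hypothetical SNP: 1,000 cases and 1,000 controls (2,000 alleles per group),
# with the risk allele at 30% in cases and 25% in controls.
odds_ratio, chi_square, p = allelic_test(600, 1400, 500, 1500)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.1e}, -log10(p) = {-math.log10(p):.2f}")
print("genome-wide significant:", p < GENOME_WIDE_THRESHOLD)
```

With these made-up counts, the SNP would pass a single-test cutoff of p < 0.05 but fall well short of the genome-wide threshold, sitting at a height of about 3.4 on a Manhattan plot.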

Further Analysis and Follow-Up Confirmation

When evaluating findings, researchers use databases of genotype and phenotype (such as the GWAS Catalog) to compare known reference sequences with those that are found. The International HapMap Project (2005) provided the groundwork that, along with the completion of the Human Genome Project, has made these studies possible.

If variations are detected, they are said to be associated with a disease but not necessarily the cause of a disease, and further tests are conducted to look more closely at the area of the genome in the region where the SNPs were found.

This often involves sequencing (reading the order of base pairs in DNA) either the specific region of interest or all of the protein-coding regions of the genome (whole-exome sequencing).

Comparison to Other Genetic Tests

Most rare genetic diseases are caused by a mutation in a single gene, but many different variations (mutations) can occur within that same gene.

For example, a few thousand variations within the BRCA1 and BRCA2 genes fall under the term "BRCA mutation." Linkage analysis, which tracks how variants are inherited within families, can be used to look for these variations. It is not, however, very helpful when looking at common, complex diseases.

Limitations

As with most medical tests, there are limitations to genome-wide association studies. Some of these include:

  • Genetic limitations: Not all genetic risk for disease comes from common variants. For example, some conditions are caused by very rare variants, and others by larger changes in the genome.
  • False negatives: GWAS may not detect all variants that are involved in a particular medical condition, and therefore may give an incomplete picture of the associations involved.
  • False positives: Associations may be detected between loci and disease that are due to chance rather than a true connection between the two. One of the bigger concerns for some people is that an association found by GWAS may have no real relevance to disease.
  • Errors: There is always potential for error in genome-wide association studies, and it can occur at multiple points: poor sampling, errors in isolating DNA and applying it to a chip, and machine errors during automated scanning. Once the data are available, errors in interpretation can occur as well. Careful quality control at each step of the process is a must.
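Part of the false-positive problem is simple arithmetic: a GWAS tests a very large number of SNPs, so a cutoff that is reasonable for one test lets through many chance findings. The sketch below shows the expected number of chance "hits" when no SNP is truly associated, and the Bonferroni correction (dividing the acceptable error rate by the number of tests), which is one standard way the conventional 5 × 10⁻⁸ cutoff is derived. The figure of one million roughly independent tests and the function names are assumptions for illustration.

```python
def expected_chance_hits(n_tests: int, alpha: float) -> float:
    """Expected number of SNPs passing `alpha` when none is truly associated.

    With no true associations, p-values are uniform on [0, 1], so each test
    passes with probability `alpha`.
    """
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test cutoff keeping the chance of any false positive <= family_alpha."""
    return family_alpha / n_tests


N_TESTS = 1_000_000  # assumed number of roughly independent SNP tests

print(f"{expected_chance_hits(N_TESTS, 0.05):,.0f}")    # 50,000 chance hits at p < 0.05
cutoff = bonferroni_threshold(N_TESTS)
print(f"{cutoff:.0e}")                                   # 5e-08
print(f"{expected_chance_hits(N_TESTS, cutoff):.2f}")   # 0.05 expected chance hits
```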

These studies are also affected by sample size: smaller studies have less statistical power and are therefore less likely to detect true associations.
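To see why sample size matters, the sketch below holds a hypothetical allele-frequency difference fixed (30% in cases vs. 25% in controls) and computes an allelic chi-square p-value for three study sizes. The frequencies, group sizes, function name, and the 5 × 10⁻⁸ genome-wide cutoff are assumptions for illustration.

```python
import math


def allelic_p_value(case_freq: float, control_freq: float, people_per_group: int) -> float:
    """Allelic chi-square p-value (1 df) for equal-sized case and control groups.

    Assumes 0 < frequency < 1 so that no row or column of the table is empty.
    """
    alleles = 2 * people_per_group          # two alleles per person
    a = round(case_freq * alleles)          # risk alleles in cases
    c = round(control_freq * alleles)       # risk alleles in controls
    b, d = alleles - a, alleles - c         # other alleles in each group
    if a + c == 0 or b + d == 0:
        raise ValueError("allele frequencies must leave both alleles observed")
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi_square / 2))


for size in (100, 1_000, 10_000):
    p = allelic_p_value(0.30, 0.25, size)
    print(f"{size:>6} per group: p = {p:.1e}, genome-wide significant: {p < 5e-8}")
```

With these made-up numbers, the smallest study does not even reach p < 0.05, and only the largest clears the genome-wide cutoff, even though the underlying difference is identical in all three.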

Potential Impact and Clinical Applications

Genome-wide association studies have the potential to impact disease in many ways, from determining risk, to prevention, to designing personalized treatments, and much more. Perhaps the greatest potential of these studies, however, is their role in helping scientists figure out the underlying biology of common, complex medical conditions.

At the current time, many if not most of the treatments we have for disease are designed to help with the symptoms of the disease.

Genome-wide association studies (along with follow-up studies such as analysis of rare variants and whole-genome sequencing) allow researchers to study the biological mechanisms that cause these diseases in the first place, setting the stage for the development of treatments that address the cause rather than simply treat the symptoms.

Such treatments are in theory more likely to be effective while causing fewer side effects.

Susceptibility and Thus Early Detection of Disease

At the current time, many of the tests used to screen for medical conditions are based on the average risk of individuals. With some conditions, it's not cost-effective and could actually cause more harm than good to screen everyone.

By learning whether a person is more or less susceptible to a condition, screening could be tailored to that individual, whether that means screening more often, starting at an earlier age, using a different test, or perhaps not screening at all.

Susceptibility to Risk Factors

Not all people are equally affected by toxins in the environment. For example, it's thought that women may be more susceptible to carcinogens in tobacco. Determining a person's susceptibility to exposures could not only help scientists look at prevention mechanisms, but may guide the public in other ways.

A possible example is that of coffee. Many studies have looked at coffee and the risk of various cancers and other diseases, with conflicting results. It could be that the answer depends on the particular person, and that drinking coffee may have positive effects for some people but harmful effects for others due to variations in their genomes.

Pharmacogenomics

The field of pharmacogenomics is already using GWAS findings to help predict an individual's response to a particular medication. Variations in a person's genetic makeup can affect how effective a drug will be, how it is metabolized in the body, and what side effects may occur. Testing can now help some people predict which antidepressants may be more effective.

Coumadin (warfarin) is a blood thinner that can be challenging to dose appropriately. If the dose is too low, it can be ineffective in preventing blood clots, potentially leading to pulmonary emboli, heart attacks, or ischemic strokes. On the other side of the spectrum, when the dose is too high (too much blood thinner) the result can be equally catastrophic, with people bleeding, for example, into their brain (hemorrhagic stroke).

Researchers were able to use GWAS to demonstrate variations in several genes that have a very significant influence on Coumadin dosing. This finding led to the development of genetic tests that can be used in the clinic to assist doctors in prescribing the proper dose of the drug.

Diagnosis and Treatment of Viral Diseases

Some people are more susceptible to certain viral infections than others, and it's known that people respond differently to treatments. The combination of GWAS and next-generation sequencing may help bring answers to both of those issues.

For example, genetic variation may increase susceptibility to HPV infection and cervical cancer. Knowing who is more susceptible could aid doctors in recommending both prevention and screening. Another example in which GWAS could be very helpful is in hepatitis C treatment, as people may respond very differently to treatments currently available.

Estimating Prognosis

Even with treatment, some people who appear to have a very similar diagnosis may have very different outcomes from a disease. GWAS may help identify who will respond well and who will not. Someone with a poor prognosis may need to be treated more aggressively, whereas a person with a very good prognosis may need less treatment; knowing this ahead of time might spare that person side effects.

Examples of GWAS Successes in Medicine

As of 2018, over 10,000 loci for common diseases (or other traits) had been identified, and that number continues to increase rapidly. There are several examples of how these studies may change the face of medicine.

Some of these discoveries are already changing our understanding of common diseases.

Macular Degeneration

One of the first eye-opening findings of genome-wide association studies was with regard to age-related macular degeneration, a leading cause of blindness in older adults in the United States. Prior to GWAS, macular degeneration was considered largely an environmental/lifestyle disease with little genetic basis.

GWAS determined that three genes account for 74% of the attributable risk for the disease. Not only was this surprising in a condition that had not previously been thought of as a genetic disease, but these studies also helped reveal the biological basis of the disease through a variation in the gene for complement factor H (CFH). This gene codes for a protein that regulates inflammation.

Knowing this, scientists can hopefully design treatments that are aimed at the cause rather than symptoms.

Inflammatory Bowel Disease

GWAS have identified a large number of loci associated with the development of inflammatory bowel diseases (ulcerative colitis and Crohn's disease) and have also found a variant that appears to protect against the development of ulcerative colitis. By studying the protein made by this gene, scientists hope to design a medication that could likewise control or prevent the disease.

Many Other Medical Conditions

There are many more common medical conditions in which GWAS has made important findings. Just a few of these include:

A Word From Verywell

Genome-wide association studies have already improved our understanding of many common diseases. Following the clues in these studies that point to the underlying biological mechanisms of disease has the potential to transform not only treatment but possibly prevention of these conditions in the future.
