Boston University School of Medicine, Department of Genetics & Genomics (GMED)
Associations from the FHS Offspring Cohort 100K Scan

Analysis of the FHS 100K Genome Scan Data

Our analysis of the Framingham Heart Study (FHS) Offspring Cohort leverages the study's family-based design. Associations were found using two analytical strategies, PBAT and LME/FBAT. These analysis methods are described below.

PBAT

PBAT employs a screening step in which only those SNPs with the greatest power, based on estimated offspring genotypes and measured offspring phenotypes, are tested for transmission to offspring. The method is described in Van Steen et al., 2005 and was utilized to study obesity in Herbert et al., 2006. PBAT is especially well suited to family-based genome-wide association studies because the screening step helps to control for the large number of SNPs tested for association in these types of studies.

PBAT was developed by Drs. Nan Laird and Christoph Lange of Harvard School of Public Health.

PBAT Analysis Strategy

PBAT performs genomic screening and testing using a single data set, and can incorporate correlated, longitudinal data. The analysis is divided into two statistically independent parts: a screening step, and a test step. In the screening step, the expected offspring genotypes, computed from the parental genotypes assuming Mendelian transmission, are used to select those SNPs most likely to affect the offspring phenotype and to select the most powerful genetic model for detecting an association. Missing parental genotypes can be inferred using sufficient statistics.
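As an illustration of the expected offspring genotypes used in the screening step, the following minimal sketch computes the Mendelian distribution of an offspring genotype from two fully genotyped parents. It assumes a biallelic SNP coded as the number of copies (0, 1, or 2) of one allele; the function names are hypothetical and are not part of PBAT, and the handling of missing parental genotypes is not shown.

```python
from fractions import Fraction
from itertools import product


def transmission_probs(genotype):
    """Probability that a parent with genotype 0/1/2 transmits 0 or 1 copies."""
    p = Fraction(genotype, 2)
    return {0: 1 - p, 1: p}


def offspring_distribution(father, mother):
    """Mendelian distribution of the offspring genotype given both parents."""
    dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    pairs = product(transmission_probs(father).items(),
                    transmission_probs(mother).items())
    for (a, pa), (b, pb) in pairs:
        dist[a + b] += pa * pb
    return dist


def expected_offspring_genotype(father, mother):
    """Expected allele count in the offspring, conditional on the parents."""
    dist = offspring_distribution(father, mother)
    return sum(g * p for g, p in dist.items())
```

For two heterozygous parents the expected offspring genotype is 1, and for a heterozygous and a homozygous (0 copies) parent it is 1/2. Because this quantity depends only on the parents, it carries no information about which allele was actually transmitted.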

Because the screening step uses only parental genotypes to predict offspring phenotypes, it is statistically independent of the subsequent test for association, which uses measured offspring genotypes. The test step assumes alleles are transmitted from parent to child stochastically and tests whether measured offspring genotypes are distributed independently of offspring phenotypes. The method has been evaluated in extensive simulation studies [Van Steen et al., 2005, Lange et al., 2003a, Lange et al., 2003b]. A proof of the statistical independence of the screening and test steps is presented in the supplementary online methods of Herbert et al., 2006. The subset of SNPs selected in the screening step is formally tested for association using FBAT [Horvath et al., 2001, Lange et al., 2002] in the test step. FBAT correlates offspring genotypes with offspring phenotypes, using parental information to control for population admixture.

The screening step estimates the power of FBAT to detect an association in the test step and makes use of all available parental information. FBAT [Laird et al., 2000] is a generalization of the transmission disequilibrium test [Spielman et al., 1993] and assesses whether the over- or under-transmission of an allele is correlated with offspring phenotypes. Since the allele that is transmitted from the parent to the offspring is selected stochastically, the test step is statistically independent of the screening step, which conditions only on the parental genotypes. Thus, while estimates of genetic effect size in the screening step can be biased by population stratification, the power estimates from the screening step do not bias the significance level of any subsequently computed FBAT statistic in the test step.
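The idea of testing whether over- or under-transmission is correlated with the phenotype can be sketched for the simplest case. The code below is a simplified illustration, not the FBAT or PBAT implementation: it assumes parent-offspring trios with complete parental genotypes, one offspring per family, an additive coding (0, 1, or 2 copies of an allele), and a user-chosen trait offset. The function name and data layout are hypothetical.

```python
import math


def fbat_z(trios, trait_offset):
    """FBAT-style Z statistic for trios under an additive coding.

    trios: iterable of (father, mother, child, phenotype) tuples,
           with genotypes coded 0/1/2.
    trait_offset: value subtracted from each phenotype (an assumption here;
                  for example, a sample mean).
    Returns None when no family is informative.
    """
    u = 0.0  # sum of phenotype-weighted transmission deviations
    v = 0.0  # variance of u under Mendelian transmission
    for father, mother, child, phenotype in trios:
        t = phenotype - trait_offset
        expected = (father + mother) / 2
        # Each heterozygous parent contributes a Bernoulli(1/2) transmission.
        variance = ((father == 1) + (mother == 1)) * 0.25
        u += t * (child - expected)
        v += t * t * variance
    if v == 0:
        return None
    return u / math.sqrt(v)
```

Families in which neither parent is heterozygous contribute nothing to either sum, which is why only informative families matter for the test. The child's deviation from its Mendelian expectation is the only place where the measured offspring genotype enters.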

Following the guidelines we developed in Van Steen et al., 2005, the 10 SNPs with the highest power estimates are tested for association using the offspring genotypes in the FBAT test. All other SNPs are discarded from the analysis. So instead of performing ~100,000 tests for association with a trait using offspring genotypes (as would otherwise be required in this study), we perform only 10, alleviating an otherwise severe multiple comparison problem [Van Steen et al., 2005]. Since the power estimates from the screening step do not bias the significance level of any subsequently computed FBAT statistic in the test step [Van Steen et al., 2005, Lange et al., 2003a, Lange et al., 2003b], the FBAT results for the selected 10 SNPs have to be adjusted for only 10 comparisons. A SNP that reaches significance after this adjustment is also significant at a genome-wide level [Van Steen et al., 2005]. While choosing to evaluate the top 10 SNPs by power from the screening step seems somewhat arbitrary, it has proven to be a useful threshold in both real and simulated analyses. The analysis strategy just described is implemented in the software package PBAT [Lange et al., 2004].
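The select-then-test logic can be sketched as follows. This is an illustration under stated assumptions rather than PBAT's own procedure: the adjustment is taken to be a Bonferroni correction at an overall level of 0.05, neither of which is specified in the text, and the function name and input dictionaries are hypothetical.

```python
def select_and_test(power_by_snp, fbat_pvalue_by_snp, top_k=10, alpha=0.05):
    """Keep the top_k SNPs by screening power, then test only those.

    power_by_snp: SNP id -> power estimate from the screening step.
    fbat_pvalue_by_snp: SNP id -> FBAT p-value from the test step.
    Returns SNP id -> True/False for significance after adjustment.
    """
    ranked = sorted(power_by_snp, key=power_by_snp.get, reverse=True)
    selected = ranked[:top_k]
    if not selected:
        return {}
    # Assumed Bonferroni adjustment over the selected SNPs only.
    threshold = alpha / len(selected)
    return {snp: fbat_pvalue_by_snp[snp] <= threshold for snp in selected}
```

Under these assumptions the per-SNP threshold is 0.05 / 10 = 0.005, compared with 0.05 / 100,000 = 5e-7 if every SNP were tested.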

 

LME/FBAT

A second, more liberal approach for finding associations uses a linear mixed-effects regression model in the screening step, followed by a transmission test implemented in FBAT in the second step. In this case genotype data from both parents and offspring are used in the screening step, so the two steps are not statistically independent. The goal is to provide a more robust estimate of the population effect of a given genotype by using all available information. In contrast, the PBAT screening step does not use measured offspring genotypes.

Inclusion of a second step that is transmission based, FBAT, ensures that positive associations do not result from admixture. This analytic strategy is likely to result in fewer false negatives than PBAT but more false positives. Associations obtained by LME/FBAT are therefore not considered to reach genome-wide significance and merely represent associations that may be worthy of replication in other populations.

LME/FBAT was developed by Dr. Alan Herbert of Boston University.

Screening. Associations were determined using a linear mixed-effects model with sex and age category as fixed effects. Since there were repeated measures on individuals and on families, we used participant ID and nuclear family ID as grouping variables in these models.
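One way to write a screening model of this kind is shown below as a sketch. The genotype term and the nesting of participants within nuclear families are assumptions, since the text names only the fixed effects and the grouping variables. Here f indexes nuclear families, i participants, and t exams; c(f,i,t) is the age category at exam t, and g is a coded genotype.

```latex
y_{fit} = \beta_0
        + \beta_{\mathrm{sex}}\,\mathrm{sex}_{fi}
        + \alpha_{c(f,i,t)}
        + \beta_g\, g_{fi}
        + a_f + b_{fi} + \varepsilon_{fit},
\qquad
a_f \sim N(0, \sigma_a^2),\quad
b_{fi} \sim N(0, \sigma_b^2),\quad
\varepsilon_{fit} \sim N(0, \sigma^2)
```

The random intercepts a_f and b_{fi} account for the correlation among members of the same family and among repeated measurements on the same participant, respectively.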

Testing using FBAT. The analysis was performed using FBAT as implemented in the program PBAT. The median phenotypic values were adjusted for sex and median age for the 6 exams. Two genetic models were examined, a recessive model (which for a biallelic SNP is equivalent to a dominant model for the other allele) and an additive model. Only analyses with more than 20 informative families were considered because asymptotic p-values for small sample size are unreliable under the assumption that the test statistic has a chi-square distribution.
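The equivalence between the recessive model for one allele and the dominant model for the other, and the informative-family filter, can be illustrated with a short sketch. It assumes genotypes coded as the count (0, 1, or 2) of a reference allele and families with two genotyped parents, and it treats a family as informative when the coded offspring genotype can take more than one value given the parents. The function names are hypothetical and do not correspond to PBAT options.

```python
from itertools import product

MIN_INFORMATIVE_FAMILIES = 20  # analyses need more than this many families


def additive(g):
    return g


def recessive(g):
    return 1 if g == 2 else 0


def dominant(g):
    return 1 if g >= 1 else 0


def possible_offspring(father, mother):
    """Offspring genotypes with nonzero probability under Mendelian transmission."""
    alleles = {0: (0,), 1: (0, 1), 2: (1,)}
    return {a + b for a, b in product(alleles[father], alleles[mother])}


def is_informative(father, mother, coding):
    """True if the coded offspring genotype is not fixed by the parents."""
    return len({coding(g) for g in possible_offspring(father, mother)}) > 1


def enough_informative_families(parent_pairs, coding):
    """Apply the 'more than 20 informative families' rule from the text."""
    n = sum(is_informative(f, m, coding) for f, m in parent_pairs)
    return n > MIN_INFORMATIVE_FAMILIES
```

Recoding a genotype g for the other allele gives 2 - g, and recessive(g) equals 1 - dominant(2 - g) for every genotype, so the two codings differ only by a constant and a sign and lead to the same test. Note that informativeness depends on the model: a heterozygous parent paired with a parent carrying no copies is informative under the additive coding but not under the recessive one.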

Estimates of the False-Positive Error Rate for PBAT and LME/FBAT

False-positive error rates were estimated by simulation. Simulations were performed by permuting participant identifiers while maintaining the same phenotypic data and family structure. Data were analyzed using a linear mixed-effects model after reordering the data from each exam by age category. This approach enabled the use of multiple measurements for each individual. The median measurement for each individual was analyzed using the family-based association test FBAT. Results were compared to those obtained with PBAT. No false positives were found using the Van Steen criterion applied to the top 20 and top 100 SNPs by power.
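A single permutation replicate might be generated along the following lines. This sketch assumes that permuting identifiers amounts to reassigning genotype records among participants while phenotype records and pedigree links keep their original identifiers; the exact procedure used in the study is not spelled out here, and the function name and data layout are hypothetical.

```python
import random


def permute_identifiers(genotypes_by_id, seed=None):
    """Reassign genotype records to participant identifiers at random.

    genotypes_by_id: participant id -> genotype record.
    Phenotypes and family structure are untouched because they stay keyed
    by the original identifiers; only the genotype assignment changes.
    """
    rng = random.Random(seed)
    ids = list(genotypes_by_id)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    return {new_id: genotypes_by_id[old_id]
            for new_id, old_id in zip(ids, shuffled)}
```

Any association detected in a permuted data set is a false positive by construction, so repeating the full analysis over many replicates gives an estimate of the false-positive rate.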

Expected false-positive rate for PBAT & LME/FBAT per 100,000 SNPs analyzed

Method      95% LL    Mean    95% UL
LME/FBAT      3.00    5.00      8.00
PBAT          0.32    0.33      0.34

LL: Lower limit. UL: Upper limit.

The false-positive rate for LME/FBAT is higher than with PBAT but remains low, with a mean of 5 expected false positives per 100,000 SNPs analyzed. The increased rate of false-positive errors represents a reasonable tradeoff, as LME/FBAT results in a 15-fold increase in the number of associations detected as compared to PBAT. These data are shown in the next section.

Summary of Associations

The following table shows the number of associations found using PBAT and LME/FBAT as of September 1st, 2006.

Associations identified by PBAT & LME/FBAT

Phenotype        PBAT   LME/FBAT   Both
BMI                 5         47      2
DBP                 0         24      0
LDL                 0         18      0
SBP                 1         23      0
Triglycerides       1         15      1
VLDL                0         32      0
Cholesterol         0         36      0
Glucose             4         22      2
HDL                 5         27      4
TOTAL              16        244      9
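As a quick arithmetic check, the column totals and the roughly 15-fold difference between the two methods can be recomputed from the per-phenotype counts in the table above. The variable names are illustrative only.

```python
# Per-phenotype counts from the table: (PBAT, LME/FBAT, Both)
associations = {
    "BMI": (5, 47, 2),
    "DBP": (0, 24, 0),
    "LDL": (0, 18, 0),
    "SBP": (1, 23, 0),
    "Triglycerides": (1, 15, 1),
    "VLDL": (0, 32, 0),
    "Cholesterol": (0, 36, 0),
    "Glucose": (4, 22, 2),
    "HDL": (5, 27, 4),
}

# Sum each column across phenotypes.
pbat_total, lme_total, both_total = (
    sum(column) for column in zip(*associations.values())
)

# Ratio of LME/FBAT associations to PBAT associations.
fold_increase = lme_total / pbat_total
```

The totals come to 16, 244, and 9, and the ratio is 15.25.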

Several SNPs have been identified as being associated with multiple phenotypes. These are listed in the following table. In some cases the association with multiple phenotypes reflects the high degree of correlation between the phenotypes.

SNPs found to be associated with multiple phenotypes

dbSNP ID     Phenotype 1   Phenotype 2   Cytoband   Physical Location   Correlation (Exam 3)   Correlation (Exam 6)
rs10489535   cholesterol   TG            1p36.31    6278091             0.34                   0.28
rs815160     TG            HDL           1q31.1     186988747           -0.45                  -0.41
rs1171381    TG            HDL           1q31.1     187037999           -0.45                  -0.41
rs2827196    DBP           SBP           21q21.1    22287002            0.77                   0.58
rs1550922    glucose       BMI           3p14.2     59634698            0.29                   0.28
rs10510468   VLDL          HDL           3p24.3     16913627            -0.43                  -
rs2286983    cholesterol   BMI           3q26.31    173775186           0.19                   0.01
rs39317      cholesterol   LDL           7q31.2     116560255           0.89                   0.88
rs752658     cholesterol   LDL           10q26.2    130245784           0.89                   0.88
rs287474     cholesterol   LDL           13q21.33   68173961            0.89                   0.88
rs287354     cholesterol   LDL           13q21.33   68137953            0.89                   0.88
rs966376     cholesterol   HDL           18q12.1    27086836            -0.01                  0.14

Correlations between the two phenotypes at Exams 3 and 6 are shown in the two rightmost columns. Individuals on lipid-lowering or hypoglycemic medications were excluded from these analyses.

Other Analysis Methods

To facilitate the application of other data analysis methods to these data, all data have been returned to the NHLBI and are available to all qualified investigators without restriction. More information about requesting genetic and phenotypic data from the FHS can be found at http://www.nhlbi.nih.gov/about/framingham/policies/index.htm

References

Herbert, A., N.P. Gerry, M.B. McQueen, I.M. Heid, A. Pfeufer, T. Illig, H.E. Wichmann, T. Meitinger, D. Hunter, F.B. Hu, G. Colditz, A. Hinney, J. Hebebrand, K. Koberwitz, X. Zhu, R. Cooper, K. Ardlie, H. Lyon, J.N. Hirschhorn, N.M. Laird, M.E. Lenburg, C. Lange, and M.F. Christman. A common genetic variant is associated with adult and childhood obesity. Science. 2006. 312(5771): p. 279-83.

Van Steen K, McQueen MB, Herbert A, Raby B, Lyon H, Demeo DL, Murphy A, Su J, Datta S, Rosenow C, et al: Genomic screening and replication using the same data set in family-based association testing. Nat Genet 2005, 37:683-691.

Lange C, DeMeo D, Silverman EK, Weiss ST, Laird NM: PBAT: tools for family-based association studies. Am J Hum Genet 2004, 74:367-369.

Lange C, Lyon H, DeMeo D, Raby B, Silverman EK, Weiss ST: A new powerful non-parametric two-stage approach for testing multiple phenotypes in family-based association studies. Hum Hered 2003, 56:10-17.

Lange C, DeMeo D, Silverman EK, Weiss ST, Laird NM: Using the noninformative families in family-based association tests: a powerful new testing strategy. Am J Hum Genet 2003, 73:801-811.

Lange C, DeMeo DL, Laird NM: Power and design considerations for a general class of family-based association tests: quantitative traits. Am J Hum Genet 2002, 71:1330-1341.

Horvath S, Xu X, Laird NM: The family based association test method: strategies for studying general genotype-phenotype associations. Eur J Hum Genet 2001, 9:301-306.

Laird, N.M., Horvath, S., Xu, X. (2000) Genet Epidemiol 19 Suppl 1, S36.

Spielman, R. S., McGinnis, R. E., and Ewens, W. J. (1993). Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 52, 506-516.