Q1. What is the outcome of each of the three CVD risk calculators in the Qatar Biobank sample?
The output of each risk calculator was categorised according to the AHA 2013 guideline into the following categories: high risk (> 20%), moderate risk (10% to 20%), and low risk (< 10%).
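The categorisation rule stated above can be sketched as a small function. This is only an illustration of the thresholds as written in this section; the function name `categorise_risk` is hypothetical, and the treatment of the exact boundary values (10% and 20% counted as moderate) is an assumption based on the wording "10% to 20%".

```python
def categorise_risk(risk_percent: float) -> str:
    """Map a calculated CVD risk (in percent) to the category used in this section.

    Assumption: a risk of exactly 10% or exactly 20% counts as moderate.
    """
    if risk_percent > 20:
        return "High risk"
    if risk_percent >= 10:
        return "Moderate risk"
    return "Low risk"


# Example use on a few illustrative (not observed) risk values.
for value in (4.2, 10.0, 20.0, 27.5):
    print(value, categorise_risk(value))
```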
The table below shows the results from the pooled cohort equation risk calculator.
Pooled cohort equation Risk prevalence
Frequency Percent Valid Percent Cumulative Percent
Valid High risk 16 2.3 2.3 2.3
Low risk 627 90.7 90.7 93.1
Moderate risk 48 6.9 6.9 100.0
Total 691 100.0 100.0
The table below shows the results after combining the moderate-risk and high-risk categories. The categories were combined because the sample is skewed toward younger participants (mean age = 38).
Pooled cohort equation Risk prevalence combining high and moderate
Frequency Percent
Valid High & Moderate risk 64 9.3
Low risk 627 90.7
Total 691 100.0
The outcome of the Framingham lipid equation risk calculator is shown in the table below.
Framingham Lipid equation Risk prevalence
Frequency Percent Valid Percent Cumulative Percent
Valid High risk 54 7.8 7.8 7.8
Low risk 554 80.2 80.2 88.0
Moderate risk 83 12.0 12.0 100.0
Total 691 100.0 100.0
Framingham Lipid equation Risk prevalence combining high and moderate
Frequency Percent
Valid High & Moderate risk 137 19.8
Low risk 554 80.2
Total 691 100.0
The outcome of the Framingham BMI equation risk calculator is shown in the table below.
Framingham BMI equation Risk prevalence
Frequency Percent Valid Percent Cumulative Percent
Valid High risk 69 10.0 10.0 10.0
Low risk 523 75.7 75.7 85.7
Moderate risk 99 14.3 14.3 100.0
Total 691 100.0 100.0
Framingham BMI equation Risk prevalence combining high and moderate
Frequency Percent
Valid High & Moderate risk 168 24.3
Low risk 523 75.7
Total 691 100.0
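The combined "High & Moderate" rows in the tables above follow directly from the three-category frequency tables. A minimal sketch of that arithmetic is given here; the counts are copied from the tables, while the dictionary layout and the helper name `combined_prevalence` are illustrative choices.

```python
# Counts copied from the three frequency tables above (N = 691 in each).
counts = {
    "Pooled cohort equation": {"High": 16, "Moderate": 48, "Low": 627},
    "Framingham lipid": {"High": 54, "Moderate": 83, "Low": 554},
    "Framingham BMI": {"High": 69, "Moderate": 99, "Low": 523},
}


def combined_prevalence(table):
    """Return (count, percent) for the merged high-and-moderate category."""
    total = sum(table.values())
    combined = table["High"] + table["Moderate"]
    return combined, round(100 * combined / total, 1)


results = {name: combined_prevalence(table) for name, table in counts.items()}
for name, (count, percent) in results.items():
    print(f"{name}: {count} participants, {percent}%")
```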
Q2. What is the prevalence of CVD risk in the Qatari population within specified subgroups?
The following results show the prevalence for each calculator by age group and gender.
Age group * pooled cohort equation Risk prevalence (Crosstabulation)
Gender: Total
Risk Categories Total
High risk Low risk Moderate risk
agegroup2 >60 Count 10 6 15 31
% within age group 32.3% 19.4% 48.4% 100.0%
% within Risk Categories 62.5% 1.0% 31.3% 4.5%
% of Total 1.4% 0.9% 2.2% 4.5%
20-30 Count 0 231 0 231
% within age group 0.0% 100.0% 0.0% 100.0%
% within Risk Categories 0.0% 36.8% 0.0% 33.4%
% of Total 0.0% 33.4% 0.0% 33.4%
31-40 Count 0 168 2 170
% within age group 0.0% 98.8% 1.2% 100.0%
% within Risk Categories 0.0% 26.8% 4.2% 24.6%
% of Total 0.0% 24.3% 0.3% 24.6%
41-50 Count 1 140 8 149
% within age group 0.7% 94.0% 5.4% 100.0%
% within Risk Categories 6.3% 22.3% 16.7% 21.6%
% of Total 0.1% 20.3% 1.2% 21.6%
51-60 Count 5 82 23 110
% within age group 4.5% 74.5% 20.9% 100.0%
% within Risk Categories 31.3% 13.1% 47.9% 15.9%
% of Total 0.7% 11.9% 3.3% 15.9%
Total Count 16 627 48 691
% within age group 2.3% 90.7% 6.9% 100.0%
% within Risk Categories 100.0% 100.0% 100.0% 100.0%
% of Total 2.3% 90.7% 6.9% 100.0%
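The three percentage rows in each crosstabulation cell (within age group, within risk category, and of total) are row, column, and grand-total percentages of the same count. The sketch below recomputes them for the pooled cohort equation table above; the counts are copied from that table, and the data layout and function name are illustrative.

```python
# Counts from the age group x pooled cohort equation crosstabulation above.
# Column order: High risk, Low risk, Moderate risk.
crosstab = {
    ">60": [10, 6, 15],
    "20-30": [0, 231, 0],
    "31-40": [0, 168, 2],
    "41-50": [1, 140, 8],
    "51-60": [5, 82, 23],
}

grand_total = sum(sum(row) for row in crosstab.values())
column_totals = [sum(col) for col in zip(*crosstab.values())]


def cell_percentages(age_group, column):
    """Row, column and grand-total percentages for one cell of the crosstab."""
    count = crosstab[age_group][column]
    row_total = sum(crosstab[age_group])
    return {
        "within_age_group": round(100 * count / row_total, 1),
        "within_risk_category": round(100 * count / column_totals[column], 1),
        "of_total": round(100 * count / grand_total, 1),
    }


print(cell_percentages(">60", 0))  # high-risk cell of the >60 age group
```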
Pooled cohort equation Risk prevalence * Gender Crosstabulation
Gender Total
f m
Risk Categories High risk Count 0 16 16
% of Total 0.0% 2.3% 2.3%
Low risk Count 291 336 627
% of Total 42.1% 48.6% 90.7%
Moderate risk Count 4 44 48
% of Total 0.6% 6.4% 6.9%
Total Count 295 396 691
% of Total 42.7% 57.3% 100.0%
Age group * Framingham Lipid equation Risk prevalence (Crosstabulation)
Risk categories Total
High risk Low risk Moderate risk
agegroup2 >60 Count 19 3 9 31
% within age group 61.3% 9.7% 29.0% 100.0%
% within Risk categories 35.2% 0.5% 10.8% 4.5%
% of Total 2.7% 0.4% 1.3% 4.5%
20-30 Count 0 231 0 231
% within age group 0.0% 100.0% 0.0% 100.0%
% within Risk categories 0.0% 41.7% 0.0% 33.4%
% of Total 0.0% 33.4% 0.0% 33.4%
31-40 Count 1 164 5 170
% within age group 0.6% 96.5% 2.9% 100.0%
% within Risk categories 1.9% 29.6% 6.0% 24.6%
% of Total 0.1% 23.7% 0.7% 24.6%
41-50 Count 8 112 29 149
% within age group 5.4% 75.2% 19.5% 100.0%
% within Risk categories 14.8% 20.2% 34.9% 21.6%
% of Total 1.2% 16.2% 4.2% 21.6%
51-60 Count 26 44 40 110
% within age group 23.6% 40.0% 36.4% 100.0%
% within Risk categories 48.1% 7.9% 48.2% 15.9%
% of Total 3.8% 6.4% 5.8% 15.9%
Total Count 54 554 83 691
% within age group 7.8% 80.2% 12.0% 100.0%
% within Risk categories 100.0% 100.0% 100.0% 100.0%
% of Total 7.8% 80.2% 12.0% 100.0%
Framingham Lipid equation Risk prevalence * Gender Crosstabulation
Gender Total
f m
LipidRiskcategories High risk Count 4 50 54
% of Total 0.6% 7.2% 7.8%
Low risk Count 284 270 554
% of Total 41.1% 39.1% 80.2%
Moderate risk Count 7 76 83
% of Total 1.0% 11.0% 12.0%
Total Count 295 396 691
% of Total 42.7% 57.3% 100.0%
Age group * Framingham BMI equation Risk prevalence (Crosstabulation)
Gender: Total
Risk categories Total
High risk Low risk Moderate risk
agegroup2 >60 Count 26 0 5 31
% within age group 83.9% 0.0% 16.1% 100.0%
% within Risk categories 37.7% 0.0% 5.1% 4.5%
% of Total 3.8% 0.0% 0.7% 4.5%
20-30 Count 0 231 0 231
% within age group 0.0% 100.0% 0.0% 100.0%
% within Risk categories 0.0% 44.2% 0.0% 33.4%
% of Total 0.0% 33.4% 0.0% 33.4%
31-40 Count 0 165 5 170
% within age group 0.0% 97.1% 2.9% 100.0%
% within Risk categories 0.0% 31.5% 5.1% 24.6%
% of Total 0.0% 23.9% 0.7% 24.6%
41-50 Count 10 96 43 149
% within age group 6.7% 64.4% 28.9% 100.0%
% within Risk categories 14.5% 18.4% 43.4% 21.6%
% of Total 1.4% 13.9% 6.2% 21.6%
51-60 Count 33 31 46 110
% within age group 30.0% 28.2% 41.8% 100.0%
% within Risk categories 47.8% 5.9% 46.5% 15.9%
% of Total 4.8% 4.5% 6.7% 15.9%
Total Count 69 523 99 691
% within age group 10.0% 75.7% 14.3% 100.0%
% within Risk categories 100.0% 100.0% 100.0% 100.0%
% of Total 10.0% 75.7% 14.3% 100.0%
Framingham BMI equation Risk prevalence * Gender Crosstabulation
Gender Total
f m
Risk categories High risk Count 6 63 69
% of Total 0.9% 9.1% 10.0%
Low risk Count 268 255 523
% of Total 38.8% 36.9% 75.7%
Moderate risk Count 21 78 99
% of Total 3.0% 11.3% 14.3%
Total Count 295 396 691
% of Total 42.7% 57.3% 100.0%
Q3. What is the level of agreement between the three calculators?
Agreement between the three risk calculators was assessed in two ways: intra-class correlations (ICC) and Cohen's kappa index.
Intra-Class Correlations (ICC)
ICCs were computed on (a) the raw calculated risk scores (Table 1) and (b) the risk classes (1 = Low, 2 = Medium, 3 = High) resulting from each calculator (Table 2).
Table 1: Intra-class Correlation Coefficient
Intra-class Correlation (b) 95% Confidence Interval F Test with True Value 0
Lower Bound Upper Bound Value df1 df2 Sig
Single Measures .869 (a) .853 .884 20.950 690 1380 .000
Average Measures .952 (c) .946 .958 20.950 690 1380 .000
Two-way mixed effects model where people effects are random and measures effects are fixed.
a. The estimator is the same, whether the interaction effect is present or not.
b. Type C intraclass correlation coefficients using a consistency definition-the between-measure variance is excluded from the denominator variance.
c. This estimate is computed assuming the interaction effect is absent, because it is not estimable otherwise.
Table 2: Intra-class Correlation Coefficient
Intra-class Correlation (b) 95% Confidence Interval F Test with True Value 0
Lower Bound Upper Bound Value df1 df2 Sig
Single Measures .744 (a) .715 .770 9.705 690 1380 .000
Average Measures .897 (c) .883 .910 9.705 690 1380 .000
Two-way mixed effects model where people effects are random and measures effects are fixed.
a. The estimator is the same, whether the interaction effect is present or not.
b. Type C intraclass correlation coefficients using a consistency definition-the between-measure variance is excluded from the denominator variance.
c. This estimate is computed assuming the interaction effect is absent, because it is not estimable otherwise.
In both Table 1 and Table 2, the ICCs (two-way mixed, consistency, single and average measures; McGraw & Wong, 1996) indicate overall consistency between the three CVD risk calculators in their ratings of CVD risk across individuals in the study sample. The resulting ICCs (0.744 to 0.952) were in, or at the threshold of, the excellent range (ICC of 0.75 or above; Cicchetti, 1994: Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment. 1994; 6(4):284-290.), indicating that the risk calculators had a high degree of agreement and suggesting that CVD risk was rated similarly across calculators. The high ICC suggests that the risk calculators introduced a minimal amount of measurement error, and therefore statistical power for subsequent analyses is not substantially reduced. CVD risk ratings were therefore deemed suitable for use in the hypothesis tests of the present study (Hallgren, 2012: Hallgren KA. Computing inter-rater reliability for observational data: An overview and tutorial. Tutor Quant Methods Psychol. 2012; 8(1):23-34.).
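For readers who want to trace the consistency-type ICC values in Tables 1 and 2, the following sketch computes the single-measures and average-measures coefficients from a subjects-by-calculators matrix using the usual two-way ANOVA mean squares. It is an illustration of the standard formulas, not the SPSS procedure itself; the function names are hypothetical. The second helper expresses the same coefficients through the reported F value (F = MS_rows / MS_error) with k = 3 calculators.

```python
def icc_consistency(ratings):
    """Return (single-measures, average-measures) consistency ICC.

    ratings: list of rows, one row per subject, one column per calculator.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (subjects = rows, calculators = columns).
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


def icc_from_f(f_value, k):
    """The same two coefficients written in terms of F = MS_rows / MS_error."""
    return (f_value - 1) / (f_value + k - 1), (f_value - 1) / f_value


# F values and k = 3 taken from Tables 1 and 2.
print(icc_from_f(20.950, 3))
print(icc_from_f(9.705, 3))
```

Under these formulas, the F values reported in Tables 1 and 2 reproduce the tabulated single and average measures to three decimals.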
Cohen's Kappa Index
An agreement analysis was performed to assess the degree to which the CVD risk calculators consistently assigned categorical risk classes to individuals in the study sample. Kappa was computed for each pair of CVD risk calculators (Tables 3 to 8) and then averaged to provide a single index of agreement (Light, 1971: Light RJ. Measures of response agreement for qualitative data: Some generalizations and alternatives. Psychological Bulletin. 1971; 76(5):365-377.). The kappa values for the individual pairs indicate substantial agreement between the Framingham lipid and Framingham BMI risk categories (κ = 0.71) (Landis & Koch, 1977: Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977; 33(1):159-174. [PubMed: 843571]), agreement at the boundary between fair and moderate for the Framingham lipid and pooled cohort equation risk categories (κ = 0.39), and fair agreement between the Framingham BMI and pooled cohort equation risk categories (κ = 0.29). The overall average kappa indicated moderate agreement (κ = 0.46), which is reasonably close, but not identical, to the agreement level obtained using the ICC.
Table 3: Framingham lipid risk categories * Framingham BMI risk categories Crosstabulation
Count
F2RSClass Total
High Low Medium
F1RSClass High 48 2 4 54
Low 0 514 40 554
Medium 21 7 55 83
Total 69 523 99 691
Table 4: Symmetric Measures
Value Asymp. Std. Error (a) Approx. T (b) Approx. Sig.
Measure of Agreement Kappa .709 .029 24.368 .000
N of Valid Cases 691
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Table 5: Framingham lipid risk categories * pooled cohort equation risk categories Crosstabulation
Count
POOLED COHORT EQUATION RISKClass Total
High Low Medium
FRAMINGHAM LIPID RISK CATEGORIESClass High 16 1 37 54
Low 0 554 0 554
Medium 0 72 11 83
Total 16 627 48 691
Table 6: Symmetric Measures
Value Asymp. Std. Error (a) Approx. T (b) Approx. Sig.
Measure of Agreement Kappa .393 .036 14.173 .000
N of Valid Cases 691
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Table 7: Framingham BMI risk categories * pooled cohort equation risk categories
Count
POOLED COHORT EQUATION RISKClass Total
High Low Medium
FRAMINGHAM BMI EQUATION RISKClass High 16 15 38 69
Low 0 520 3 523
Medium 0 92 7 99
Total 16 627 48 691
Table 8: Symmetric Measures
Value Asymp. Std. Error (a) Approx. T (b) Approx. Sig.
Measure of Agreement Kappa .288 .032 11.088 .000
N of Valid Cases 691
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
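The kappa values in Tables 4, 6 and 8 can be traced back to the crosstabulations in Tables 3, 5 and 7. The sketch below applies the standard unweighted Cohen's kappa formula to those counts and averages the three pairwise values, in the spirit of Light's approach. The counts are copied from the tables; the function and variable names are illustrative.

```python
def cohens_kappa(table):
    """Unweighted Cohen's kappa from a square table of counts (rows vs columns)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = sum(table[i][i] for i in range(len(table))) / n
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Category order in both rows and columns: High, Low, Medium.
lipid_vs_bmi = [[48, 2, 4], [0, 514, 40], [21, 7, 55]]  # Table 3
lipid_vs_pce = [[16, 1, 37], [0, 554, 0], [0, 72, 11]]  # Table 5
bmi_vs_pce = [[16, 15, 38], [0, 520, 3], [0, 92, 7]]    # Table 7

kappas = [cohens_kappa(t) for t in (lipid_vs_bmi, lipid_vs_pce, bmi_vs_pce)]
light_kappa = sum(kappas) / len(kappas)  # arithmetic mean of the pairwise kappas
print([round(k, 3) for k in kappas], round(light_kappa, 2))
```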
Predictive Modeling
We analyzed and validated the predictive power of each risk calculator using two predictive modeling techniques: binary logistic regression and a decision tree (using CHAID algorithm). The logistic regression is a parametric technique that is widely used by bio-medical researchers, whereas the decision tree is a machine learning technique that is non-parametric and that is most known for its easy interpretation and understandability.
Framingham lipid risk calculator Logistic Regression
Table 1: Omnibus Tests of Model Coefficients
Chi-square df Sig.
Step 1 Step 502.601 6 .000
Block 502.601 6 .000
Model 502.601 6 .000
Table 1 indicates that the logistic regression model for the Framingham lipid risk categories is statistically significant (p < 0.001), and the pseudo-R-square values in Table 2 show that it explains a high proportion of the variability in those categories. The non-significant Hosmer-Lemeshow test (Table 3) supports the goodness of fit of the model.
Table 2: Model Summary
Step -2 Log likelihood Cox & Snell R Square Nagelkerke R Square
1 185.615 (a) .517 .820
a. Estimation terminated at iteration number 9 because parameter estimates changed by less than .001.
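The two pseudo-R-square values in the model summary can be related to the model chi-square in Table 1 and the -2 log likelihood above. The sketch below uses the standard Cox & Snell and Nagelkerke definitions and assumes that the model chi-square equals the drop in -2 log likelihood from the intercept-only model; the function name is hypothetical.

```python
import math


def pseudo_r_squared(neg2ll_model, model_chi_square, n):
    """Return (Cox & Snell R^2, Nagelkerke R^2).

    Assumes model_chi_square = (-2LL of the null model) - (-2LL of the fitted model).
    """
    neg2ll_null = neg2ll_model + model_chi_square
    cox_snell = 1 - math.exp(-model_chi_square / n)
    max_cox_snell = 1 - math.exp(-neg2ll_null / n)  # upper bound of Cox & Snell
    return cox_snell, cox_snell / max_cox_snell


# Values from Table 1 (chi-square = 502.601) and Table 2 (-2LL = 185.615), N = 691.
cox_snell, nagelkerke = pseudo_r_squared(185.615, 502.601, 691)
print(round(cox_snell, 3), round(nagelkerke, 3))
```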
Table 3: Hosmer and Lemeshow Test
Step Chi-square df Sig.
1 4.788 8 .780
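The significance value of the Hosmer-Lemeshow test is the upper-tail probability of a chi-square distribution with 8 degrees of freedom. For an even number of degrees of freedom this tail has a closed form, which the sketch below uses so that no statistics library is needed; the function name is hypothetical, and a library routine would normally be preferred in practice.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Chi-square = 4.788 with df = 8, as reported in Table 3.
p_value = chi_square_sf_even_df(4.788, 8)
print(round(p_value, 3))
```

A p-value well above 0.05 means that the test does not reject the hypothesis that predicted and observed frequencies agree.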
Table 4 indicates a high accuracy of CVD classification using FRAMINGHAM LIPID RISK CATEGORIES.
Table 4: Classification Table (a)
Observed Predicted
F1RiskClass01 Percentage Correct
0 1
Step 1 F1RiskClass01 0 532 22 96.0
1 24 113 82.5
Overall Percentage 93.3
a. The cut value is .500
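The percentages in the classification table follow from its four cell counts. The sketch below recomputes them; it assumes that class 1 is the combined high-and-moderate category (its total of 137 matches the combined count reported earlier), and the labels "sensitivity" and "specificity" are a conventional reading of the two row percentages rather than terms used in the SPSS output.

```python
def classification_metrics(tn, fp, fn, tp):
    """Percent correct per observed class and overall, from a 2x2 classification table."""
    return {
        "specificity": round(100 * tn / (tn + fp), 1),  # observed 0 predicted as 0
        "sensitivity": round(100 * tp / (tp + fn), 1),  # observed 1 predicted as 1
        "accuracy": round(100 * (tn + tp) / (tn + fp + fn + tp), 1),
    }


# Cell counts from Table 4 (rows = observed class, columns = predicted class).
lipid = classification_metrics(tn=532, fp=22, fn=24, tp=113)
print(lipid)
```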
Table 5 shows that all variables are significant for CVD risk classification and gives the coefficients that relate the independent variables to the risk calculated by the Framingham lipid CVD risk calculator.
Table 5: Variables in the Equation
B S.E. Wald df Sig. Exp(B)
Step 1 (a) Gender01 3.613 .612 34.808 1 .000 37.062
Age .290 .036 66.440 1 .000 1.336
SBP .063 .016 14.969 1 .000 1.066
TRTBP 2.339 .540 18.782 1 .000 10.366
TCL .020 .005 13.780 1 .000 1.020
HDL -.118 .022 27.760 1 .000 .889
Constant -24.483 3.206 58.331 1 .000 .000
a. Variable(s) entered on step 1: Gender01, Age, SBP, TRTBP, TCL, HDL.
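To show how the coefficients in Table 5 are used, the sketch below converts each B into an odds ratio (the Exp(B) column) and computes a predicted probability for one participant. The coefficients are copied from Table 5. The example participant is entirely hypothetical, and the coding and units of the inputs (for instance Gender01 = 1 for male, TRTBP = 1 if treated for blood pressure, lipids in mg/dL) are assumptions, since this section does not state them.

```python
import math

# Coefficients (B) and constant copied from Table 5.
COEFFICIENTS = {
    "Gender01": 3.613,
    "Age": 0.290,
    "SBP": 0.063,
    "TRTBP": 2.339,
    "TCL": 0.020,
    "HDL": -0.118,
}
CONSTANT = -24.483


def odds_ratios():
    """Exp(B) for each predictor; small differences from Table 5 come from rounded B."""
    return {name: math.exp(b) for name, b in COEFFICIENTS.items()}


def predicted_probability(values):
    """Logistic model probability of being in class 1 for one participant."""
    logit = CONSTANT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)
    return 1 / (1 + math.exp(-logit))


# Hypothetical participant; coding and units are assumed, not taken from the study.
example = {"Gender01": 1, "Age": 55, "SBP": 130, "TRTBP": 0, "TCL": 200, "HDL": 45}
probability = predicted_probability(example)
predicted_class = 1 if probability >= 0.5 else 0  # cut value .500 as in Table 4
print(round(probability, 3), predicted_class)
```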
Figure 1 and Table 6 indicate that the CVD classification based on the Framingham lipid risk categories is highly robust, with an AUC of 0.982 (95% CI: 0.974 to 0.990; p < 0.001).
Table 6: Area Under the Curve
Test Result Variable(s): Predicted probability
Area Std. Error (a) Asymptotic Sig. (b) Asymptotic 95% Confidence Interval
Lower Bound Upper Bound
.982 .004 .000 .974 .990
a. Under the nonparametric assumption
b. Null hypothesis: true area = 0.5
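The AUC can be read as the probability that a randomly chosen class-1 participant receives a higher predicted probability than a randomly chosen class-0 participant. The sketch below computes it by direct pairwise comparison on a tiny made-up data set, and shows that a normal-approximation interval (estimate plus or minus 1.96 standard errors) is consistent with the bounds reported above. Both function names and the toy scores are illustrative; the study data are not reproduced here.

```python
def auc_from_scores(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def normal_ci(estimate, std_error, z=1.96):
    """Normal-approximation confidence interval around an estimate."""
    return estimate - z * std_error, estimate + z * std_error


# Toy, made-up predicted probabilities and classes (not study data).
toy_auc = auc_from_scores([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])

# Area and standard error as reported in the table above.
lower, upper = normal_ci(0.982, 0.004)
print(toy_auc, round(lower, 3), round(upper, 3))
```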
FRAMINGHAM BMI EQUATION RISK Logistic Regression
Omnibus Tests of Model Coefficients
Chi-square df Sig.
Step 1 Step 576.818 5 .000
Block 576.818 5 .000
Model 576.818 5 .000
Model Summary
Step -2 Log likelihood Cox & Snell R Square Nagelkerke R Square
1 189.718 (a) .566 .845
a. Estimation terminated at iteration number 9 because parameter estimates changed by less than .001.
Hosmer and Lemeshow Test
Step Chi-square df Sig.
1 1.874 8 .985
Classification Table (a)
Observed Predicted
F2RiskClass01 Percentage Correct
0 1
Step 1 F2RiskClass01 0 502 21 96.0
1 26 142 84.5
Overall Percentage 93.2
a. The cut value is .500
Variables in the Equation
B S.E. Wald df Sig. Exp(B)
Step 1 (a) Gender01 4.192 .595 49.595 1 .000 66.150
Age .333 .038 76.410 1 .000 1.395
SBP .084 .017 23.909 1 .000 1.087
TRTBP 1.742 .592 8.652 1 .003 5.710
BMI .218 .043 25.433 1 .000 1.243
Constant -36.120 4.187 74.430 1 .000 .000
a. Variable(s) entered on step 1: Gender01, Age, SBP, TRTBP, BMI.
Area Under the Curve
Test Result Variable(s): Predicted probability
Area Std. Error (a) Asymptotic Sig. (b) Asymptotic 95% Confidence Interval
Lower Bound Upper Bound
.984 .004 .000 .977 .991
a. Under the nonparametric assumption
b. Null hypothesis: true area = 0.5
POOLED COHORT EQUATION RISK Logistic Regression
Omnibus Tests of Model Coefficients
Chi-square df Sig.
Step 1 Step 296.505 6 .000
Block 296.505 6 .000
Model 296.505 6 .000
Model Summary
Step -2 Log likelihood Cox & Snell R Square Nagelkerke R Square
1 129.920 (a) .349 .758
a. Estimation terminated at iteration number 9 because parameter estimates changed by less than .001.
Hosmer and Lemeshow Test
Step Chi-square df Sig.
1 4.094 8 .849
Classification Table (a)
Observed Predicted
PCERiskClass01 Percentage Correct
0 1
Step 1 PCERiskClass01 0 616 11 98.2
1 17 47 73.4
Overall Percentage 95.9
a. The cut value is .500
Variables in the Equation
B S.E. Wald df Sig. Exp(B)
Step 1 (a) Gender01 2.572 .771 11.124 1 .001 13.098
Age .295 .044 45.464 1 .000 1.343
TCL .026 .006 20.643 1 .000 1.026
HDL -.115 .029 16.058 1 .000 .891
SBP .056 .018 10.129 1 .001 1.058
TRTBP 2.002 .568 12.409 1 .000 7.407
Constant -27.031 3.738 52.299 1 .000 .000
a. Variable(s) entered on step 1: Gender01, Age, TCL, HDL, SBP, TRTBP.
Area Under the Curve
Test Result Variable(s): Predicted probability
Area Std. Error (a) Asymptotic Sig. (b) Asymptotic 95% Confidence Interval
Lower Bound Upper Bound
.982 .005 .000 .971 .992
a. Under the nonparametric assumption
b. Null hypothesis: true area = 0.5
The prevalence of all highlighted risk categories was lower than reported in the literature; this is attributed to the age distribution of the sample, which is skewed toward younger participants. All the variables are age dependent except body mass index (BMI), which showed a high prevalence of obesity (33.7%) and overweight (38.5%). The calculator most affected by BMI is the Framingham BMI calculator. Accordingly, Cohen's kappa index shows only fair agreement between the Framingham BMI calculator and the pooled cohort equation calculator: BMI is one of the factors driving the outcome of the former, whereas age is the dominant factor for the other calculators. This interpretation is based on a decision tree analysis, which showed that age is the main significant variable for all three calculators. Because most participants in the sample are young, the moderate and high cardiovascular risk categories were merged. The merging decision rested on the expectation that participants with moderate cardiovascular risk at a young age will develop high risk as they age. The merged data showed closer agreement with the cardiovascular disease prevalence reported in the literature.
The future prospect of this study is to recalibrate the calculators to best fit the Qatari population. This could be done through five-year and ten-year follow-up of the participants to record their CVD outcomes, after which the recalibrated calculator could be used in clinical practice.