Exploratory Analysis: Predicting Heart Disease Using Clinical Variables

Data Set: Predicting Heart Disease Using Clinical Variables

Background: I chose this data set to analyze how different clinical variables relate to heart disease. The data include variables such as age, sex, cholesterol level, blood pressure, and the classification of the individual’s chest pain.

Goal: To identify significant relationships between these clinical variables and the presence of heart disease.

The exploratory questions that were formulated are:

  • Is there a significant correlation between age and heart disease?
  • Is there a significant relationship between blood pressure and cholesterol levels?
  • Is there a correlation between sex and heart disease?

One interesting feature of this data set is that sex is coded as ‘0’ and ‘1’, but the data set never states which sex each number represents. This led to a further exploratory question:

  • Which sex do 0 and 1 represent, given that heart disease usually occurs sooner in men?

Methods: The exploratory questions were answered in Python, using the Spyder IDE, with a range of plots and statistical tests.

Results:

The scatterplot shows a fairly strong positive relationship between blood pressure and cholesterol.
The box plot shows that, in general, cholesterol levels tend to be higher in those with heart disease.
The faceted scatterplots show the relationship between cholesterol level and blood pressure, grouped by the severity of the individual’s chest pain.
The box plots show a clear difference in the range and median of ages between individuals with and without heart disease. This suggests a relationship between age and the presence of heart disease.
The box plot displays the ages of all individuals with heart disease, grouped by sex. Given that heart disease generally occurs earlier in men, it is reasonable to assume that ‘0’ represents women and ‘1’ represents men.
The cross-tabulation between heart disease and sex shows a higher level of heart disease among sex ‘1’, which is assumed to be men.
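
The two sex-related summaries above could be computed along the following lines. This is only a minimal sketch using pandas: the column names `sex`, `age`, and `target` (1 = heart disease present) are assumptions, and the small `toy` frame is made up purely to show the call pattern. It is not the real data.

```python
import pandas as pd


def disease_rate_by_sex(df, sex_col="sex", target_col="target"):
    """Row-normalised cross-tabulation: for each sex code, the share
    of individuals without (0) and with (1) heart disease."""
    return pd.crosstab(df[sex_col], df[target_col], normalize="index")


def median_age_with_disease(df, sex_col="sex", age_col="age", target_col="target"):
    """Median age of individuals with heart disease, per sex code."""
    diseased = df[df[target_col] == 1]
    return diseased.groupby(sex_col)[age_col].median()


# Made-up rows for illustration only; column names are hypothetical.
toy = pd.DataFrame({
    "sex":    [0, 0, 0, 1, 1, 1, 1],
    "age":    [62, 66, 58, 48, 52, 55, 60],
    "target": [1, 1, 0, 1, 1, 1, 0],
})

print(disease_rate_by_sex(toy))
print(median_age_with_disease(toy))
```

Normalising by row makes the two sex codes comparable even when one group is much larger than the other, which a table of raw counts would hide.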

Observations and Analysis:

A t-test was performed to test the conclusion that there is a significant relationship between age and the presence of heart disease. The test gave a p-value of 0.026. Since this value is less than 0.05, the null hypothesis of equal mean age in the two groups can be rejected, and we can conclude that there is a statistically significant relationship between the two variables.
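
For readers who want to run a similar test, here is a minimal sketch using `scipy.stats.ttest_ind`. The column names `age` and `target` are assumptions, the ages in `toy_ages` are made up, and the choice of Welch's variant (`equal_var=False`) is also an assumption; it is not necessarily the variant used for the result above.

```python
import pandas as pd
from scipy import stats


def age_ttest(df, age_col="age", target_col="target"):
    """Two-sample t-test comparing mean age with vs. without heart disease."""
    with_hd = df.loc[df[target_col] == 1, age_col]
    without_hd = df.loc[df[target_col] == 0, age_col]
    # equal_var=False selects Welch's t-test, which does not assume
    # that the two groups have the same variance.
    t_stat, p_value = stats.ttest_ind(with_hd, without_hd, equal_var=False)
    return t_stat, p_value


# Made-up ages for illustration only; column names are hypothetical.
toy_ages = pd.DataFrame({
    "age":    [60, 65, 58, 70, 63, 45, 50, 48, 52, 47],
    "target": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
})

t_stat, p_value = age_ttest(toy_ages)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A positive t statistic here means the group with heart disease is older on average. The p-value only indicates whether that difference is likely to be due to chance.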

A Pearson correlation test was performed to test the conclusion that there is a significant relationship between cholesterol and blood pressure. The test gave a p-value of 0.00435. Since this value is less than 0.05, the null hypothesis of no linear correlation can be rejected, and we can conclude that there is a statistically significant relationship between those two variables.
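
A correlation test of this kind can be sketched with `scipy.stats.pearsonr`, which returns both the coefficient r and its p-value. The column names `resting_bp` and `cholesterol` are hypothetical, and the values in `toy_bp` are made up for illustration.

```python
import pandas as pd
from scipy import stats


def bp_chol_correlation(df, bp_col="resting_bp", chol_col="cholesterol"):
    """Pearson correlation between blood pressure and cholesterol.

    Returns (r, p_value): r measures the strength and direction of the
    linear relationship; p_value tests the null hypothesis r == 0.
    """
    r, p_value = stats.pearsonr(df[bp_col], df[chol_col])
    return r, p_value


# Made-up measurements for illustration only; column names are hypothetical.
toy_bp = pd.DataFrame({
    "resting_bp":  [110, 120, 130, 140, 150, 160],
    "cholesterol": [180, 210, 200, 240, 250, 270],
})

r, p_value = bp_chol_correlation(toy_bp)
print(f"r = {r:.2f}, p = {p_value:.4f}")
```

It is worth reporting r alongside the p-value: a small p-value shows that the correlation is unlikely to be zero, while r itself indicates how strong the relationship is.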

Future Directions: I would be interested in finding data that include genetic predisposition and examining how it correlates with the presence of heart disease.

