Document Format: You must submit your project report in a single file through Canvas. The
acceptable formats are Microsoft Word (*.docx) or PDF (*.pdf), with no exceptions. The submission
page is in the Semester Project module. Projects submitted in multiple parts, in a format other than
Word or PDF, or via email/hardcopy will be rejected.
Style Requirements: The Semester Project module contains a sample project that would
receive a 100% grade. Your report should be formatted similarly.
The first page of your report must be a title page containing your name, the course and
section number, the title “Semester Project,” and the submission date.
Use a font suitable for an official business document. Any standard typeface is
acceptable as long as it is readable and presents a professional appearance (Calibri and
Times New Roman are good examples, but not the only possibilities). The size should
be no smaller than 12 points, and the color should be black.
Do not include any borders, decorative images/illustrations, or watermarking.
Embed all graphics directly into your project file. I will not accept separate files
containing graphics.
Data Set: All students will use the same data set: Spring 2020 Semester Project Raw Data. The
data set is located in the StatCrunch MTH 245 Homework Group. The data come from a
Stellenbosch University (South Africa) master’s thesis that studied blood chemistry in Type 2
diabetic patients. The variables of interest are random blood glucose (RBG) (measured in
millimoles per deciliter) and glycosylated hemoglobin (HBA1C) (measured as a percentage of
total red blood cells).
Technology Requirements: Except where required to build graphs or charts, all numerical
calculations must be performed using StatCrunch. Do not use a graphing calculator, Excel,
standard normal tables, or any other method for your numerical calculations.
Graphics Requirements: All graphics must be constructed using StatCrunch, Excel, or another
computer-based graphics program. Hand-drawn plots, cell phone pictures of graphics, etc., are
not acceptable. All graphics must include an informative title and (except for boxplots) correct
labels for both axes. Orient all boxplots horizontally.
Rounding Rules: In Section 1 histograms, round all upper and lower class bounds to three
decimal places. In the remaining sections, round all calculated sample statistics to four decimal
places and all p-values to three decimal places. Add trailing zeroes to any rounded value as
needed.
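As an illustration of the rounding rules above, the following Python sketch shows how fixed-point formatting keeps trailing zeroes. The function names are hypothetical and the values are made up; the project's numerical calculations must still come from StatCrunch.

```python
def format_class_bound(value: float) -> str:
    """Section 1 histogram class bounds: three decimal places."""
    return f"{value:.3f}"


def format_statistic(value: float) -> str:
    """Calculated sample statistics: four decimal places."""
    return f"{value:.4f}"


def format_p_value(value: float) -> str:
    """p-values: three decimal places."""
    return f"{value:.3f}"


# Made-up values, used only to show the trailing zeroes.
print(format_class_bound(5))     # 5.000
print(format_statistic(12.5))    # 12.5000
print(format_p_value(0.04))      # 0.040
```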
Required Content: Organize your report in five separate sections using the following numbers
and titles. The required elements for each section are as follows:
Section 1 Visual Data Assessment. Create a histogram for each variable of interest, RBG and
HBA1C. For RBG, use a “Start at:” value of 0.000 and a “Width:” value of 5.000; for HBA1C,
use a “Start at:” value of 2.000 and a “Width:” value of 2.000. It is not necessary to display
frequency counts above the bars. For each histogram, include a paragraph that answers
each of the following questions:
a. Is the histogram symmetric, left-skewed, or right-skewed?
b. How many peaks does the histogram have, and in which class(es) are they located
(must include the correct lower and upper bounds for each class listed)?
c. Does the histogram have any gaps between classes? If so, where are they?
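To make the "Start at:" and "Width:" settings concrete, here is a minimal Python sketch of how class bounds and frequencies follow from those two values. It assumes each class includes its lower bound and excludes its upper bound; check how StatCrunch treats boundary values before relying on that. The function names and the sample values are hypothetical and are not taken from the project data set.

```python
import math


def class_bounds(start, width, max_value):
    """Return (lower, upper) bounds for classes covering start through max_value."""
    n_classes = math.floor((max_value - start) / width) + 1
    return [(start + i * width, start + (i + 1) * width) for i in range(n_classes)]


def class_frequencies(values, start, width):
    """Count how many values fall in each class (lower bound included, upper excluded)."""
    if min(values) < start:
        raise ValueError("every value must be at or above the starting bound")
    bounds = class_bounds(start, width, max(values))
    counts = [0] * len(bounds)
    for value in values:
        counts[math.floor((value - start) / width)] += 1
    return list(zip(bounds, counts))


# Hypothetical RBG-like values, NOT from the project data set.
sample = [3.2, 7.9, 8.4, 12.0, 14.7, 21.3]
for (lower, upper), count in class_frequencies(sample, start=0.0, width=5.0):
    print(f"{lower:.3f} to {upper:.3f}: {count}")
```

In this made-up sample the class from 15.000 to 20.000 has a count of zero, which is what a gap between classes looks like in the frequency table.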
Section 2 Descriptive Statistics.
a. For each variable, find the mean, range, variance, standard deviation, and five-number summary. Display these numbers in a format that is easy to understand.
b. Construct a regular boxplot for each variable. For each boxplot, include a brief
statement containing an assessment of whether the data appear to be symmetric,
left-skewed, or right-skewed.
c. For each variable, construct a modified boxplot and use it to identify potential
outliers. If any exist, list them by value; if none exist, say so.
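The quantities in Section 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not a substitute for StatCrunch: the quartiles use the "inclusive" interpolation method of the standard library, which may differ slightly from the method StatCrunch uses, and potential outliers are taken to be values beyond the usual 1.5 × IQR fences of a modified boxplot. The function names and sample values are hypothetical.

```python
import statistics


def describe(values):
    """Mean, range, sample variance, sample standard deviation, five-number summary."""
    ordered = sorted(values)
    # ASSUMPTION: "inclusive" quartiles; StatCrunch may use another convention.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "range": ordered[-1] - ordered[0],
        "variance": statistics.variance(ordered),
        "std_dev": statistics.stdev(ordered),
        "five_number": (ordered[0], q1, median, q3, ordered[-1]),
    }


def potential_outliers(values):
    """Values outside the 1.5 * IQR fences used by a modified boxplot."""
    _, q1, _, q3, _ = describe(values)["five_number"]
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Made-up values, NOT from the project data set.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = describe(sample)
for name in ("mean", "range", "variance", "std_dev"):
    print(f"{name}: {summary[name]:.4f}")
print("five-number summary:", summary["five_number"])
print("potential outliers:", potential_outliers(sample))
```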
Section 3 Confidence Intervals. Construct a 95% confidence interval for the mean of each
variable (two intervals total). Use the algebraic format for each interval (lower bound < μ < upper bound). State the distribution you used for each interval (t or normal).
Section 4 Hypothesis Test. Using the p-value method, conduct a formal hypothesis test of the claim that the mean RBG of Type 2 diabetics is 13.5 mmol/dl or higher. Use α = 0.01. Include the following in your written summary of the results:
a. Your null and alternative hypotheses in the proper format using standard notation.
b. The type of distribution you used (t or normal).
c. The p-value and its logical relationship to α (≤ or >).
d. Your decision regarding the null hypothesis: reject or fail to reject.
e. A statement interpreting your decision: reject/fail to reject (or support/fail to
support) the original claim that the mean RBG of Type 2 diabetics is 13.5 mmol/dl or
higher.
Note: Section 4 only applies to RBG. There is no hypothesis test related to HBA1C.
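The mechanics behind Sections 3 and 4 can be sketched in Python as shown below. The sketch rests on several assumptions that are not part of the assignment: it reads the claim "13.5 mmol/dl or higher" as the null hypothesis, so the test is left-tailed, and it uses the standard normal distribution for both the critical value and the p-value, which is only a reasonable stand-in for the t distribution when the sample is large. The function names and sample values are hypothetical, and the tiny sample is there only to show the arithmetic.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Interval for the mean in the form lower < mu < upper."""
    n = len(values)
    sample_mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # ASSUMPTION: normal critical value (large-sample stand-in for t).
    critical = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = critical * standard_error
    return sample_mean - margin, sample_mean + margin


def left_tailed_mean_test(values, claimed_mean, alpha):
    """Test H0: mu >= claimed_mean against H1: mu < claimed_mean."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    test_statistic = (statistics.fmean(values) - claimed_mean) / standard_error
    # ASSUMPTION: normal approximation for the left-tail p-value.
    p_value = statistics.NormalDist().cdf(test_statistic)
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return test_statistic, p_value, decision


# Made-up RBG-like values, NOT from the project data set.
sample = [12.1, 15.3, 14.8, 13.9, 16.2, 11.7, 14.4, 15.0]
lower, upper = mean_confidence_interval(sample)
print(f"{lower:.4f} < mu < {upper:.4f}")
statistic, p_value, decision = left_tailed_mean_test(sample, 13.5, alpha=0.01)
print(f"test statistic = {statistic:.4f}, p-value = {p_value:.3f}, {decision}")
```

The decision line mirrors item c above: the null hypothesis is rejected only when the p-value is less than or equal to α.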
Section 5 Correlation/Regression Analysis.
a. Construct a linear regression model with RBG as the predictor and HBA1C as the
response. State the equation in correct algebraic format as shown in the course notes.
b. Create a scatter plot of the data with a plot of the least-squares line included.
(StatCrunch should generate this when you calculate the model in 5a.) The plot
must include an informative title and correct labels for both axes.
c. Use the coefficient of determination to identify the percentage of the variation in
HBA1C explained by the variation in RBG.
d. Identify the following points (list them as ordered pairs in the form (RBG, HBA1C)).
If none exist, say so.
1) Outliers (all points with studentized residuals greater than 3.000 or less
than −3.000).
2) High-leverage points (all points with leverage greater than 0.0171).
e. Using Cook’s Distance, examine the outliers and high-leverage points you found in
5d (if any) to determine if any are likely to be influential (Cook’s D > 1.000). If none
of the points appear to be influential, say so.
f. Conduct a formal hypothesis test at α = 0.05 to determine if there is sufficient
evidence of a correlation between RBG and HBA1C. Include the following:
1) The p-value and its logical relationship to α (≤ or >).
2) Your decision regarding the null hypothesis: reject or fail to reject.
3) A statement regarding the sufficiency of the evidence for a linear relationship
between RBG and HBA1C.
g. State whether the equation in 5a satisfies the following LINE criteria (assume the
residuals are independent):
Linear Relationship: Based on the model’s visual fit to the data, determine if a
linear model is appropriate.
Normally-Distributed Residuals: Determine if the residuals fit a normal distribution
using a residual histogram, a boxplot and a Q-Q plot.
Equal Variances of the Residuals: Assess the residuals for constant variance using a
plot of the residuals versus RBG.
h. Use the results from 5f and 5g to determine if the model you built in 5a provides
valid estimates of HBA1C as a function of RBG. Justify your decision.
i. Provide a valid point estimate of the mean HBA1C value when RBG = 20.000. Use
the regression model you constructed in 5a or calculate the estimate using the
HBA1C column by itself, whichever is appropriate.
j. Provide a valid 95% confidence interval estimate of the mean HBA1C value when
RBG = 20.000. Use the regression model you constructed in 5a or calculate the
estimate using HBA1C by itself, whichever is appropriate.
k. If you use the regression model from 5a to calculate the estimate in 5i, calculate a
95% prediction interval estimate of a new observation of HBA1C when RBG = 20.000.
If the model in 5a is invalid, include a statement that a prediction interval
estimate is not applicable.
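The regression quantities named in Section 5 can be illustrated with a short Python sketch. It is a hand-rolled least-squares fit under stated assumptions, not StatCrunch's procedure: the studentized residuals here are internally studentized, which may not match the version StatCrunch reports, and Cook's Distance uses two model parameters (intercept and slope). All function names and data values are hypothetical.

```python
import math
import statistics


def regression_diagnostics(x, y):
    """Least-squares line plus leverage, studentized residuals, and Cook's D."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))  # residual standard error

    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # ASSUMPTION: internally studentized residuals.
    studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    # ASSUMPTION: p = 2 parameters (intercept and slope) in Cook's Distance.
    cooks_d = [(r ** 2 / 2) * h / (1 - h) for r, h in zip(studentized, leverage)]

    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - sse / sst,
        "leverage": leverage,
        "studentized": studentized,
        "cooks_d": cooks_d,
    }


def flag_points(x, y, fit, resid_cut=3.0, leverage_cut=0.0171, cooks_cut=1.0):
    """Ordered pairs exceeding the cutoffs given in 5d and 5e."""
    pairs = list(zip(x, y))
    return {
        "outliers": [p for p, r in zip(pairs, fit["studentized"]) if abs(r) > resid_cut],
        "high_leverage": [p for p, h in zip(pairs, fit["leverage"]) if h > leverage_cut],
        "influential": [p for p, d in zip(pairs, fit["cooks_d"]) if d > cooks_cut],
    }


# Made-up (predictor, response) values, NOT from the project data set.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = regression_diagnostics(x, y)
print(f"y-hat = {fit['intercept']:.4f} + {fit['slope']:.4f}x")
print(f"r-squared = {fit['r_squared']:.4f}")
print(f"point estimate at x = 20: {fit['intercept'] + fit['slope'] * 20:.4f}")
print(flag_points(x, y, fit))
```

The leverage cutoff of 0.0171 is tied to the size of the project data set, so every point in this five-point example exceeds it; the example only shows where each cutoff is applied.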