Statistics for Nursing
HTHSCI 2S03 Tutorial Assignment #1
1. The food frequency questionnaire (FFQ) is a tool often used in epidemiological studies to assess food consumption. A person is asked to record the number of servings per day typically eaten over the past year for each of over 100 individual food items. A food-composition table is then used to compute intakes of key nutrients (e.g., protein, fat, calories) by aggregating the responses for individual foods. The FFQ is inexpensive to administer but is considered less accurate than the diet record (DR), the ‘gold standard’ for dietary epidemiological studies. For the DR, the participant writes down the amount of each specific food eaten over the past week in a food diary, and a nutritionist uses a special computer program to compute nutrient intakes from the food diaries.
To validate the FFQ, 173 nurses participating in the Nurses’ Health Study completed 4 weeks of diet recording, spaced about equally over a 12-month period, and then completed an FFQ at the end of diet recording.
The Excel data file VALID.xlsx contains the data for this study and is in the Data Sets folder under Contents in A2L. It contains the following measures: saturated fat (grams) from the DR (sfat_dr) and FFQ (sfat_ffq), total fat (grams) from the DR (tfat_dr) and FFQ (tfat_ffq), alcohol consumption (ounces) from the DR (alco_DR) and FFQ (alco_ffq), and total calories from the DR (cal_dr) and FFQ (cal_ffq).
Use this data set to answer the questions below.
a) Create an appropriate graphical display to show the distribution of saturated fat data (sfat_ffq) for the FFQ.
b) Create an appropriate graphical display to relate alcohol consumption measured using the dietary record (alco_dr) with alcohol consumption measured using the FFQ (alco_ffq).
c) Based on your assessment of the graphical display created for question b, do you think that the FFQ is a reasonably accurate approximation of the DR for alcohol consumption? Why or why not?
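One way to display the shape of a distribution such as sfat_ffq without plotting software is a stem-and-leaf display. The Python sketch below builds one from a list of values. The function name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical, the sketch assumes non-negative values (true for grams of fat), and it does not read VALID.xlsx; you would pass in the column values yourself.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return the lines of a stem-and-leaf display (hypothetical helper).

    Assumes non-negative values. Each value is first rounded to a whole
    number of leaf units (Python's round() sends exact halves to the even
    number), then split into a stem (all but the last digit) and a leaf
    (the last digit).
    """
    stems = defaultdict(list)
    for x in values:
        scaled = round(x / leaf_unit)
        stem, leaf = divmod(scaled, 10)
        stems[stem].append(leaf)

    lines = []
    # Include empty stems so gaps in the distribution remain visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(stems.get(stem, [])))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


# Toy values for illustration only (not from VALID.xlsx)
for line in stem_and_leaf([12, 15, 23, 27, 27, 41]):
    print(line)
```

With `leaf_unit=1`, a stem of 2 and a leaf of 7 represent 27 g; with `leaf_unit=0.1` the same pair would represent 2.7 g, keeping one decimal place.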
2. Using the VALID.xlsx Excel data set described in question 1, answer the following questions relating to total fat consumption (tfat_dr, tfat_ffq).
a) There are 173 values in the data set for tfat_dr. You would like to summarize these data more succinctly and decide to use a frequency table to do this. Create a frequency table that consists of between five and ten intervals to summarize the tfat_dr data. Show both the frequency and relative frequency for the intervals you have chosen.
b) Create an appropriate graphical display to relate total fat measured using the dietary record (tfat_dr) with total fat measured using the FFQ (tfat_ffq). Does it look like the FFQ is a reasonable approximation of the DR for total fat?
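For part (a), a frequency table can also be checked programmatically. The Python sketch below counts values in equal-width intervals and reports each interval's frequency and relative frequency (frequency divided by the total number of values). The function name `frequency_table` is hypothetical, and the sketch assumes each interval includes its lower limit but not its upper limit; you choose the starting value, width and number of intervals (between five and ten, per the question) so that every tfat_dr value falls inside one interval.

```python
import math


def frequency_table(values, start, width, n_intervals):
    """Return (lower, upper, frequency, relative_frequency) rows (hypothetical helper).

    Interval k covers [start + k*width, start + (k+1)*width): the lower
    limit is included and the upper limit is excluded.
    """
    counts = [0] * n_intervals
    for x in values:
        k = math.floor((x - start) / width)
        if not 0 <= k < n_intervals:
            raise ValueError(f"value {x} lies outside the chosen intervals")
        counts[k] += 1

    total = len(values)
    rows = []
    for k, count in enumerate(counts):
        lower = start + k * width
        rows.append((lower, lower + width, count, count / total))
    return rows


# Toy values for illustration only (not the 173 tfat_dr values)
toy = [12, 25, 31, 38, 44, 47, 59]
for lower, upper, freq, rel in frequency_table(toy, start=10, width=10, n_intervals=5):
    print(f"[{lower}, {upper}): frequency {freq}, relative frequency {rel:.3f}")
```

The relative frequencies should sum to 1 (apart from rounding), which is a quick check on the finished table. Fractional widths such as 0.1 can misplace a value that sits exactly on an interval boundary because of floating-point rounding, so whole-number widths are the safer choice here.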
3. A study was conducted to examine the association between lead exposure and developmental features in children. The Excel file containing the data on 99 children is called LEAD_MAXFWT.xlsx and is in the Data Sets folder under Contents in A2L. Children were divided into two lead exposure groups: group 1 (GROUP=1, n=64) were children who lived less than 25 km away from a lead smelter, and group 2 (GROUP=2, n=35) were children who lived more than 25 km away from a lead smelter.
One important outcome measure that was captured was the Wechsler IQ score (IQF).
Use these data to answer the questions below.
a) Create one box and whisker plot comparing the distribution of the IQF scores for the two groups of children. Does there appear to be a significant difference between IQ scores for the two groups?
b) As discussed in Session 1, an outlier can be considered any observation that is greater than Q3 + 1.5 × IQR or less than Q1 - 1.5 × IQR. Using this definition of an outlier, how many outliers exist in the IQF data (ignore groups)? Show your calculations.
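The quantities behind question 3 can be computed with a short Python sketch: the five-number summary that a box plot draws for each group (part a), and the 1.5 × IQR fences and outliers for all children combined (part b). Quartiles are not defined the same way everywhere; this sketch assumes `statistics.quantiles(..., method="inclusive")`, which interpolates the same way as Excel's QUARTILE.INC, and the definition used in Session 1 may give slightly different fences. The function names are hypothetical, and the example scores are invented toy values, not data from LEAD_MAXFWT.xlsx. Note also that many plotting tools stop the whiskers at the most extreme non-outlying values rather than at the minimum and maximum.

```python
import statistics
from collections import defaultdict


def quartile_summary(values):
    """Five-number summary plus 1.5 x IQR outlier fences (hypothetical helper).

    Assumption: quartiles use statistics.quantiles(method="inclusive"),
    which interpolates like Excel's QUARTILE.INC.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "min": min(values),
        "Q1": q1,
        "median": statistics.median(values),
        "Q3": q3,
        "max": max(values),
        "IQR": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # Outlier: below the lower fence OR above the upper fence
        "outliers": sorted(x for x in values if x < lower_fence or x > upper_fence),
    }


def summary_by_group(groups, scores):
    """Apply quartile_summary separately to each group label."""
    by_group = defaultdict(list)
    for group, score in zip(groups, scores):
        by_group[group].append(score)
    return {group: quartile_summary(vals) for group, vals in sorted(by_group.items())}


# Invented toy scores for illustration only
groups = [1, 1, 1, 1, 1, 2, 2, 2, 2]
scores = [80, 85, 90, 95, 100, 70, 75, 80, 85]
print(summary_by_group(groups, scores))  # part (a): one summary per group
print(quartile_summary(scores))          # part (b): groups ignored
```

For the toy list 1, 2, ..., 9, 100, this sketch gives Q1 = 3.25, Q3 = 7.75 and IQR = 4.5, so the fences are -3.5 and 14.5 and only the value 100 is flagged.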
4. Determine the data types or measurement scales for the following:
a) Determine whether the following variables are discrete or continuous:
i. Length of stay for hospitalizations (in days):
ii. Any emergency department visit in last 6 months (Yes/No):
iii. HbA1c level (in %):
iv. Height (in cm):
v. Concentration of mercury in the blood (µg/L):
vi. Pulse rate (beats per minute):
vii. Temperature (degrees Celsius):
viii. Pain scale (0-9):
ix. Grade level (A+, A, A-,…C-):
x. Diabetes ‘cut-off’ (non-diabetic: fasting glucose < 110 mg/dL; diabetic: fasting glucose ≥ 110 mg/dL):
xi. Number of missing teeth:
xii. Arm length (cm):
b) Determine whether the following variables are nominal, ordinal, interval or ratio:
i. Ethnicity (e.g., German, Italian, Iranian):
ii. Temperature (degrees Kelvin):
iii. pH level (degree of acidity in a solution):
iv. Satisfaction rating (strongly like, like,…strongly dislike):
v. Income (in dollars):
vi. Cause of death (injury, chronic illness, infection etc.):
vii. BMI category (underweight, normal, overweight, obese):
viii. Fasting glucose level (mg/dL):
ix. Socio-economic status (low, middle, upper):
x. Blood group (e.g., A, B, AB):
xi. Disease symptoms (e.g., fever, pain, incontinence):
xii. Disease severity (mild, moderate, severe):
References:
The questions in this tutorial assignment were adapted from practice questions and companion data sets from Chapter 2 of the textbook by Rosner, B. (2011). Fundamentals of Biostatistics, 7th Edition. Brooks/Cole Cengage Learning, Boston, MA.