Evaluate and briefly discuss the three descriptive designs presented in Chapter 3 (case studies, archival research, and observational research). Be sure to discuss the strengths and weaknesses associated with each approach. Which of these three approaches do you think is the best descriptive design and why? Finally, recommend research questions or topics for which each of these designs would be effective in investigating.


Chapter 3

Descriptive Designs—Observing Behavior

Chapter Contents

• Qualitative Methods
• Case Studies
• Archival Research
• Observational Research
• Describing Your Data


new66480_03_c03_p089-132.indd 89 10/31/11 9:39 AM

Introduction

In the fall of 2009, Phoebe Prince and her family relocated from Ireland to South Hadley, Massachusetts. Phoebe was immediately singled out by bullies at her new high school and subjected to physical threats, insults about her Irish heritage, and harassing posts on her Facebook page. The relentless bullying continued until January 2010, when Phoebe took her own life to escape her tormentors (UPI, 2011). Tragic stories like this one are all too common, and it should come as no surprise that the Centers for Disease Control and Prevention have identified bullying as a serious problem facing our nation’s children and adolescents (CDC, 2002).

Scientific research on bullying began in Norway in the late 1970s in response to a wave of teen suicides. Work begun by psychologist Dan Olweus—and since continued by many others—has documented both the frequency and the consequences of bullying in the school system. Thus, we know that approximately one third of children are victims of bullying at some point during development, with between 5% and 10% bullied on a regular basis (Griffin & Gross, 2004; Nansel et al., 2001). Victimization by bullies has been linked with a wide range of emotional and behavioral problems, including depression, anxiety, self-reported health problems, and an increased risk of both violent behavior and suicide (for a detailed review, see Griffin & Gross, 2004). Recent research even suggests that bullying during adolescence may have a lasting impact on the body’s physiological stress response (Hamilton et al., 2008).

But most of this research has a common limitation: It has studied the phenomenon of bullying using self-report survey measures. That is, researchers typically ask students and teachers to describe the extent of bullying in the schools, and/or have students fill out a collection of survey measures, describing in their own words both bullying experiences and psychological functioning. These studies are conducted rigorously, and the measures they use certainly meet the criteria of reliability and validity that we discussed in Chapter 2. However, as Wendy Craig, Professor of Psychology at Queen’s University, and Debra Pepler, a Distinguished Professor at York University, suggested in a 1997 article, this questionnaire approach is unable to capture the full context of bullying behaviors. And, as we have already discussed, self-report measures are fully dependent on people’s ability to answer honestly and accurately. In order to address this limitation, Craig and Pepler (1997) decided to observe bullying behaviors as they occurred naturally on the playground. Among other things, the researchers found that acts of bullying occurred approximately every 7 minutes, lasted only about 38 seconds, and tended to occur within 120 feet of the school building. They also found that peers intervened to try to stop the bullying more than twice as often as adults did (11% vs. 4%, respectively). These findings add significantly to scientific understanding of when and how bullying occurs. And for our purposes, the most notable thing about them is that none of the findings could have been documented without directly observing and recording bullying behaviors on the playground. By using this technique, the researchers were able to gain a more thorough understanding of the phenomenon of bullying and thus were able to provide real-world advice to teachers and parents.


Most studies on bullying rely on self-report surveys.



One recurring theme in this book is that it is absolutely critical to pick the right research design to address your hypothesis. Over the next three chapters, we will be discussing three specific categories of research designs, proceeding in order of increasing control over elements of the design. This chapter focuses on descriptive research designs, in which the primary goal is to describe attitudes and behavior. We will begin by contrasting qualitative and quantitative approaches to description. We will then discuss three examples of descriptive designs—case studies, archival research, and observational research—covering the basic concept and the pros and cons of each. Finally, this chapter concludes with a discussion of guidelines for presenting descriptive data in graphical, numerical, and narrative form.

3.1  Qualitative Methods

We learned in Chapter 1 that researchers generally take one of two broad approaches to answering their research questions. Quantitative research is a systematic and empirical approach that attempts to generalize results to other contexts, whereas qualitative research is a more descriptive approach that attempts to gain a deep understanding of particular cases and contexts. Before we discuss specific examples of descriptive designs, it is important to understand that these can represent either quantitative or qualitative perspectives. In this section, we examine the qualitative approach in more detail.

In Chapter 1 we used the analogy of studying traffic patterns to contrast qualitative and quantitative methods—a qualitative researcher would likely study a single busy intersection in detail. This illustrates a key point about this approach: Qualitative researchers are focused on interpreting and making sense out of what they observe rather than trying to simplify and quantify these observations. In general, qualitative research involves a collection of interviews and observations made in a natural setting. Regardless of the overall approach (qualitative or quantitative), collecting data in the real world results in less control and structure than does collecting data in a laboratory setting. But whereas

• Descriptive Methods: Case Study, Archival Research, Observation
• Predictive Methods: Survey Research
• Experimental Methods: Quasi-experiments, “True” Experiments
(arranged from left to right in order of increasing control)

Figure 3.1: Descriptive Designs on the Continuum of Control



quantitative researchers might view reduced control as a threat to reliability and validity, qualitative researchers view it as a strength of the study. By conducting observations in a natural setting, it is possible to capture people’s natural and unfiltered responses.

As an example, consider two studies on the ways people respond to traumatic events. In a 1993 paper, psychologists James Pennebaker and Kent Harber took a quantitative approach to examining the community-wide impact of the 1989 Loma Prieta earthquake (near San Francisco). These researchers conducted phone surveys of 789 area residents, asking people to indicate, using a 10-point scale, how often they “thought about” and “talked about” the earthquake over the 3-month period after its occurrence. In analyzing these data, Pennebaker and Harber discovered that people tend to stop talking about a traumatic event about 2 weeks after it occurred but keep thinking about it for approximately 4 more weeks. That is, the event is still on people’s minds, but they decide to stop discussing it with other people. In a follow-up study of the 1991 Gulf War, these researchers found that this conflict led to an increased risk of illness (Pennebaker & Harber, 1991). Thus, the goal of the study was to gather data in a controlled manner and test a set of hypotheses about community responses to trauma.

Contrast this approach with the more qualitative one taken by the developmental psychologist Paul Miller and colleagues (in press), who studied the ways that parents model coping behavior for their children. These researchers conducted semistructured interviews of 24 parents whose families had been evacuated following the 2007 wildfires in San Diego County and an additional 32 parents whose families had been evacuated following a 2008 series of deadly tornadoes in Tennessee. Due to a lack of prior research on how parents teach their children to cope with trauma, Miller and colleagues approached their interviews with the goal of “documenting and describing” (p. 8) these processes. That is, rather than attempt to impose structure and test a strict hypothesis, the researchers focused on learning from these interviews and letting the interviewees’ perspectives drive the acquisition of knowledge.

In the following three sections, we examine three specific examples of descriptive designs—case studies, archival research, and observational research. Because each of these methods has the goal of describing attitudes, feelings, and behaviors, each one can be used from either a quantitative or a qualitative perspective. In other words, qualitative and quantitative researchers use many of the same general methods but do so with different goals. To illustrate this flexibility, we will end each section with a paragraph that contrasts qualitative and quantitative uses of the particular method.

3.2  Case Studies

At the 1996 meeting of the American Psychological Association, James Pennebaker—chair of the Psychology department at The University of Texas—delivered an invited address, describing his research on the benefits of therapeutic writing. Rather than follow the expected route of showing graphs and statistical tests to support his arguments, Pennebaker told a story. In the mid-1980s, when Pennebaker’s lab was starting to study the effects of structured writing on physical and psychological health, one study participant was an American soldier who had served in the Vietnam War. Like many others, this soldier had had difficulty adjusting to what had happened during the war and consequent trouble reintegrating into “normal” life. In Pennebaker’s study, he was asked to simply spend 15 minutes per day, over the course of a week, writing about a traumatic experience—in this case, his tour of duty in Vietnam. At the end of this week, as you might expect, this veteran felt awful; these were unpleasant memories that he had not relived in over a decade. But over the next few weeks, amazing things started to happen: He slept better; he made fewer visits to his doctor; he even reconnected with his wife after a long separation!

Pennebaker’s presentation was a case study, which provides a detailed, in-depth analysis of one person over a period of time. Although this case study was collected as part of a larger quantitative experiment, case studies are usually conducted in a therapeutic setting and involve a series of interviews. An interviewer will typically study the subject in detail, recording everything from direct quotes and observations to his or her own interpretations. We encountered this technique briefly in Chapter 2, in discussing Oliver Sacks’s case studies of individuals learning to live with neurological impairments.

Pros and Cons of Case Studies

Case studies in psychology are a form of qualitative research and represent the lowest point on our continuum of control. Because they involve one person at a time, without a control group, case studies are often unsystematic. That is, the participants are chosen because they tell a compelling story or because they represent an unusual set of circumstances, rather than being selected randomly. Studying these individuals allows for a great deal of exploration, which can often inspire future research. However, it is nearly impossible to generalize from one case study to the larger population. In addition, because the case study includes both direct observation and the researcher’s interpretation, there is a risk that a researcher’s biases might influence the interpretations. For example, Pennebaker’s investment in demonstrating that writing has health benefits could have led to more positive interpretations of the Vietnam vet’s outcomes. However, in this particular case study, the benefits of writing mirror those seen in hundreds of controlled experimental studies that involved thousands of people, so we can feel confident in the conclusions from the single case.

Case studies have two distinct advantages over other forms of research. First is the simple fact that anecdotes are persuasive. Despite Pennebaker’s nontraditional approach to a scientific talk, the audience came away utterly convinced of the benefits of therapeutic writing. And despite the fact that Oliver Sacks studies one neurological patient at a time, the stories in his books shed very convincing light on the ability of humans to adapt to their circumstances. Second, case studies provide a useful way to study rare populations and individuals with rare conditions. For example, from a scientific point of view, the ideal might be to gather a random sample of individuals living with severe memory impairment due to alcohol abuse and conduct some sort of controlled study in a laboratory environment. This approach could allow us to make causal statements about the results, as we will discuss in Chapter 5. But from a practical point of view, this study would be nearly impossible to conduct, making case studies such as Sacks’s interviews with William Thompson the best strategy for understanding this condition in depth.

Case studies involve a researcher conducting a series of interviews and usually take place in a therapeutic setting.

Examples of Case Studies

Throughout the history of psychology, case studies have been used to address a number of important questions and to provide a starting point for controlled quantitative studies. For example, in developing his theories of cognitive development, the Swiss psychologist Jean Piaget studied the way that his own children developed and changed their thinking styles. Piaget proposed that children would progress through a series of four stages in the way that they approached the world—sensorimotor, preoperational, concrete operational, and formal operational—with each stage involving more sophisticated cognitive skills than the previous stage. By observing his own children, Piaget noticed preliminary support for this theory and later was able to conduct more controlled research with larger populations.

Perhaps one of the most famous case studies in psychology is the story of Phineas Gage, a 19th-century railroad worker who suffered severe brain damage. In September of 1848, Gage was working with a team to blast large sections of rock to make way for new rail lines. After a large hole was drilled into a section of rock, Gage’s job was to pack the hole with gunpowder, sand, and a fuse and then tamp it down with a long cylindrical iron rod (known as a “tamping rod”). On this particular occasion, it seems Gage forgot to pack in the sand. So, when the iron rod struck the gunpowder, the powder exploded, sending the 3-foot-long iron rod through his face, behind his left eye, and out the top of his head. Against all odds, Gage survived this incident with relatively few physical side effects. However, everyone around him noticed that his personality had changed—Gage became more impulsive, violent, and argumentative. Gage’s physician, John Harlow, reported the details of this case in an 1868 article. The following passage is a great example of the rich detail that is often characteristic of case studies:

He is fitful, irreverent, indulging at times in the grossest profanity (which was not previously his custom), manifesting but little deference for his fellows, impatient of restraint or advice when it conflicts with his desires. A child in his intellectual capacity and manifestations, he has the animal passions of a strong man. Previous to his injury, although untrained in the schools, he possessed a well-balanced mind, and was looked upon by those who knew him as a shrewd, smart businessman, very energetic and persistent in executing all his plans of operation. In this regard his mind was radically changed, so decidedly that his friends and acquaintances said he was “no longer Gage.” (Harlow, 1868, pp. 339–342)

Various views show an iron rod embedded in Phineas Gage’s (1823–1860) skull.

Gage’s transformation ultimately inspired a large body of work in psychology and neuroscience that attempts to understand the connections between brain areas and personality. The area of his brain destroyed by the tamping rod is known as the frontal lobe, now understood to play a critical role in impulse control, planning, and other high-level thought processes. Gage’s story is a perfect illustration of the pros and cons of case studies: On the one hand, it is difficult to determine exactly how much the brain injury affected his behavior because he is only one person. On the other hand, Gage’s tragedy inspired researchers to think about the connections among mind, brain, and personality. As a result, we now have a vast—and still growing—understanding of the brain. This illustrates a key point about case studies: Although individual cases provide limited knowledge about people in general, these cases often lead researchers to conduct additional work that does lead to generalizable knowledge.

Qualitative versus Quantitative Approaches

Case studies tend to be qualitative more often than not: The goal of this method is to study a particular case in depth, as a way to learn more about a rare phenomenon. In both Pennebaker’s study of the Vietnam veteran and Harlow’s study of Phineas Gage, the researcher approached the interview process as a way to gather information and learn from the bottom up about the interviewee’s experience. However, it is certainly possible for a case study to represent quantitative research. This is often the case when researchers conduct a series of case studies, learning from the first one or the first few and then developing hypotheses to test on future cases. For example, a researcher could use the case of Phineas Gage as a starting point for hypotheses about frontal lobe injury, perhaps predicting that other cases would show similar changes in personality. Another way in which case studies can add a quantitative element is for researchers to conduct analyses within a single subject. For example, a researcher could study a patient with brain damage for several years following an injury, tracking the association between deterioration of brain regions and changes in personality and emotional responses. At the end of the day, though, these examples would still suffer from the primary downside of case studies: Because they study a single individual, it is difficult to generalize findings.



Research: Thinking Critically
By the Peninsula College of Medicine and Dentistry

Attending frequently with medically unexplained symptoms is distressing for both patient and doctor and effective treatment or management options are limited: one in five patients has symptoms that remain unexplained by conventional medicine. Studies have shown that the cost to the NHS [National Health Service] of managing the treatment of a patient with medically unexplained symptoms can be twice that of a patient with a diagnosis.

A research team from the Institute of Health Services Research, Peninsula Medical School, University of Exeter, has carried out a randomised control trial and a linked interview study regarding 80 such patients from GP [General Practitioner] practices across London, to investigate their experiences of having five-element acupuncture added to their usual care. This is the first trial of traditional acupuncture for people with unexplained symptoms.

The results of the research are published in the British Journal of General Practice. They reveal that acupuncture had a significant and sustained benefit for these patients and consequently acupuncture could be safely added to the therapies used by practitioners when treating frequently attending patients with medically unexplained symptoms.

The patient group was made up of 80 adults (80% female, with an average age of 50 years) from a variety of ethnic backgrounds, each of whom had consulted their GP at least eight times in the past year. Nearly 60% reported musculoskeletal health problems, of which almost two thirds had been present for a year.

In the 3 months before taking part in the study, the 80 patients had accounted for the following NHS experiences: 21 in-patient days; 106 outpatient clinic visits; 52 hospital clinic visits (for treatments such as physiotherapy, chiropody, and counselling); 44 hospital visits for investigations (including 10 magnetic resonance imaging—MRI—scans); and 75 visits to non-NHS practitioners such as opticians, dentists, and complementary therapists.

The patients were randomly divided into an acupuncture group and a control group. Eight acupuncturists administered individual five-element acupuncture to the acupuncture group immediately, up to 12 sessions over 26 weeks. The same number of treatments was made available to the control group after 26 weeks.

At 26 weeks the patients were asked to complete a number of questionnaires including the individualised health status questionnaire “Measure Yourself Medical Outcome Profile.”

The acupuncture group registered a significantly improved overall score when compared with the control group. They also recorded improved well-being but did not show any change in GP and other clinical visits or in the number of medications they were taking. Between 26 and 52 weeks the acupuncture group maintained their improvement, and the control group, now receiving their acupuncture treatments, showed a ‘catch-up’ improvement.

The associated qualitative study, which focused on the patients’ experiences, supported the quantitative work.

This element identified that the participating patients had a variety of long-standing symptoms and disability including chronic pain, fatigue, and emotional problems which affected their ability to work, socialize, and carry out everyday tasks. A lack of a convincing diagnosis to explain their symptoms led to frustration, worry, and low mood.

Participating patients reported that their acupuncture consultations became increasingly valuable. They appreciated the amount of time they had with each acupuncturist and the interactive and holistic nature of the sessions—there was a sense that the practitioners were listening to their concerns and, via therapy, doing something positive about them.

As a result, many patients were encouraged to take an active role in their treatment, resulting in cognitive and behavioural lifestyle changes, such as a new self-awareness about what caused stress in their lives, and a subsequent ability to deal with stress more effectively; and taking their own initiatives based on advice from the acupuncturists about diet, exercise, relaxation, and social activities.

Comments from participating patients included: “the energy is the main thing I have noticed. You know, yeah, it’s marvellous! Where I was going out and cutting my grass, now I’m going out and cutting my neighbour’s after because he’s elderly”; “I had to reduce my medication. That’s the big help actually, because medication was giving me more trouble . . . side effects”; and “It kind of boosts you, somehow or another.”

Dr. Charlotte Paterson, who managed the randomised control trial and the longitudinal study of patients’ experiences, commented: “Our research indicates that the addition of up to 12 five-element acupuncture consultations to the usual care experienced by the patients in the trial was feasible and acceptable and resulted in improved overall well-being that was sustained for up to a year.

“This is the first trial to investigate the effectiveness of acupuncture treatment to those with unexplained symptoms, and the next development will be to carry out a cost-effectiveness study with a longer follow-up period. While further studies are required, this particular study suggests that GPs may recommend a series of five-element acupuncture consultations to patients with unexplained symptoms as a safe and potentially effective intervention.”

She added: “Such intervention could not only result in potential resource savings for the NHS, but would also improve the quality of life for a group of patients for whom traditional biomedicine has little in the way of effective diagnosis and treatment.”

Think about it:

1. In this study, researchers interviewed acupuncture patients using open-ended questions and recorded their verbal responses, which is a common qualitative research technique. What advantages does this approach have over administering a quantitative questionnaire with multiple-choice items?

2. What are some advantages of adding a qualitative element to a controlled medical trial like this?

3. What would be some disadvantages of relying exclusively on this approach?

3.3  Archival Research

Moving slightly further along the continuum of control, we come to archival research, which involves drawing conclusions by analyzing existing sources of data, including both public and private records. Sociologist David Phillips (1997) hypothesized that media coverage of suicides would lead to “copycat” suicides. He tested this hypothesis by gathering archival data from two sources: front-page newspaper articles devoted to high-profile suicides and the number of fatalities in the 11-day period following coverage of the suicide. By examining these patterns of data, Phillips found support for his hypothesis. Specifically, fatalities appeared to peak 3 days after coverage of a suicide, and increased publicity was associated with a greater peak in fatalities.



Pros and Cons of Archival Research

It is difficult to imagine a better way to test Phillips’s hypothesis about copycat suicides. You could never randomly assign people to learn about suicides and then wait to see whether they killed themselves. Nor could you interview people right before they commit suicide to determine whether they were inspired by media coverage. Archival research provides a test of the hypothesis by examining existing data and, thereby, avoids most of the ethical and practical problems of other research designs. Related to this point, archival research also neatly sidesteps issues of participant reactivity, or the tendency of people to behave differently when they are aware of being observed. Any time you conduct research in a laboratory, participants are aware that they are in a research study and may not behave in a completely natural manner. In contrast, archival data involves making use of records of people’s natural behaviors. The subjects of Phillips’s study of copycat suicides were individuals who decided to kill themselves, who had no awareness that they would be part of a research study.

Archival research is also an excellent strategy for examining trends and changes over time. For example, much of the evidence for global warming comes from observing upward trends in recorded temperatures around the globe. To gather this evidence, researchers dig into existing archives of weather patterns and conduct statistical tests on the changes over time. Psychologists and other social scientists also make use of this approach to examine population-level changes in everything from suicide rates to voting patterns over time. These comparisons can sometimes involve a blend of archival and current data. For example, a great deal of social psychology research has been dedicated to understanding people’s stereotypes about other groups. In a classic series of studies known as the “Princeton Trilogy,” researchers documented the stereotypes held by Princeton students over a 25-year period (1933 to 1969). Social psychologist Stephanie Madon and her colleagues (2001) collected a new round of data but also conducted a new analysis of this archival data. These new analyses suggested that, over time, people have become more willing to use stereotypes about other groups, even as stereotypes themselves have become less negative.

Copycat suicides often peak 3 days after media coverage of a high-profile suicide, such as when Nirvana’s Kurt Cobain killed himself in 1994.

One final advantage of archival research is that once you manage to gain access to the relevant archives, it requires relatively few resources. The typical laboratory experiment involves one participant at a time, sometimes requiring the dedicated attention of more than one research assistant over a period of an hour or more. But once you have assembled your data from the archives, it is a relatively simple matter to conduct statistical analyses. In a 2001 article, the psychologists Shannon Stirman and James Pennebaker used a text analysis computer program to compare the language of poets who committed suicide (e.g., Sylvia Plath) with the language of similar poets who had not committed suicide (e.g., Denise Levertov). In total, these researchers examined 300 poems from 20 poets, half of whom had committed suicide. Consistent with Durkheim’s theory of suicide as a form of “social disengagement,” Stirman and Pennebaker (2001) found that suicidal poets used more self-references and fewer references to other people in their poems. But here’s the impressive part: Once they had assembled their archive of poems, it took only seconds for their computer program to analyze the language and generate a statistical profile of each poet.
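The kind of dictionary-based word counting described here can be sketched in a few lines of code. This is only a minimal illustration, not the researchers’ actual program (they used far larger, validated word dictionaries); the tiny word lists and the sample line below are invented for demonstration purposes:

```python
import re

# Illustrative (invented) word lists for self- and other-references.
SELF_WORDS = {"i", "me", "my", "mine", "myself"}
OTHER_WORDS = {"he", "she", "him", "her", "they", "them", "we", "us", "you"}

def reference_rates(text: str) -> dict:
    """Return the proportion of words that are self- vs. other-references."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # guard against empty input
    return {
        "self": sum(w in SELF_WORDS for w in words) / total,
        "other": sum(w in OTHER_WORDS for w in words) / total,
    }

# An invented sample line, standing in for a poem from the archive.
poem_line = "I wander alone; my shadow and I keep watch while they sleep."
rates = reference_rates(poem_line)
```

Once such a function exists, profiling an entire archive is just a loop over the collected texts, which is why the analysis step took the researchers only seconds.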

Overall, however, archival research is still relatively low on our continuum of control. As a researcher, you have to accept the archival data in whatever form they exist, with no control over the way they were collected. For instance, in Stephanie Madon’s (2001) reanalysis of the “Princeton Trilogy” data, she had to trust that the original researchers had collected the data in a reasonable and unbiased way. In addition, because archival data often represent natural behavior, it can be difficult to categorize and organize responses in a meaningful and quantitative way. The upshot is that archival research often requires some creativity on the researcher’s part—such as analyzing poetry using a text analysis program. In many cases, as we discuss next, the process of analyzing archives involves developing a coding strategy for extracting the most relevant information.

Content Analysis—Analyzing Archives

In most of our examples so far, the data come in a straightforward, ready-to-analyze form. That is, it is relatively simple to count the number of suicides, track the average temperature, or compare responses to questionnaires about stereotyping over time. In other cases, the data can come in a sloppy, disorganized mass of information. What do you do if you want to analyze literature, media images, or changes in race relations on television? These types of data can yield incredibly useful information, provided you can develop a strategy for extracting it.

Mark Frank and Tom Gilovich—both psychologists at Cornell University—were interested in whether cultural associations with the color black would have an effect on behavior. In virtually all cultures, black is associated with evil—the bad guys wear black hats; we have a "black day" when things turn sour; and we are excluded from social groups by being blacklisted or blackballed. Frank and Gilovich (1988) wondered whether "a cue as subtle as the color of a person's clothing" (p. 74) would influence aggressive behavior. To test this hypothesis, they examined aggressive behaviors in professional football and hockey games, comparing teams whose uniforms were black to teams who wore other colors. Imagine for a moment that this was your research study. Professional sporting events contain a wealth of behaviors and events. How would you extract information on the relationship between uniform color and aggressive behavior?

Frank and Gilovich (1988) solved this problem by examining public records of penalty yards (football) and penalty minutes (hockey) because these represent instances of punishment for excessively aggressive behavior, as recognized by the referees. And in both sports, the size of the penalty increases according to the degree of aggression. These penalty records were obtained from the central offices of both leagues, covering the period from 1970 to 1986. Consistent with their hypothesis, teams with black uniforms were "uncommonly aggressive" (p. 76). Most strikingly, two NHL hockey teams changed their uniforms to black during the period under study and showed a marked increase in penalty minutes with the new uniforms!
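At its core, the comparison is an aggregate of penalty records grouped by uniform color. A minimal sketch, with invented penalty-minute totals (the actual figures came from the league offices):

```python
# Invented season totals of penalty minutes, grouped by uniform color;
# the real study drew its figures from NFL and NHL league records.
penalty_minutes = {
    "black": [1200, 1340, 1180, 1420],
    "non_black": [980, 1010, 1100, 940],
}

def mean(values):
    return sum(values) / len(values)

black_avg = mean(penalty_minutes["black"])
other_avg = mean(penalty_minutes["non_black"])
print(f"black uniforms: {black_avg:.1f} min; other uniforms: {other_avg:.1f} min")
```

The hypothesis predicts that the first average exceeds the second, which is exactly the pattern Frank and Gilovich reported.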

But even this analysis is relatively straightforward in that it involved data that were already in quantitative form (penalty yards and minutes). In many cases, the starting point is a messy collection of human behavior. In a pair of journal articles, psychologist Russell Weigel and colleagues (1980; 1995) examined the portrayal of race relations on prime-time television. In order to do this, they had to make several critical decisions about what to analyze and how to quantify it. The process of systematically extracting and analyzing the contents of a collection of information is known as content analysis. In essence, content analysis involves developing a plan to code and record specific behaviors and events in a consistent way. We can break this down into a three-step process:

Step 1—Identify Relevant Archives

Before we develop our coding scheme, we have to start by finding the most appropriate source of data. Sometimes the choice is fairly obvious: If you want to compare temperature trends, the most relevant archives will be weather records. If you want to track changes in stereotyping over time, the most relevant archive is questionnaire data assessing people's attitudes. In other cases, this decision involves careful consideration of both your research question and practical concerns. Frank and Gilovich decided to study penalties in professional sports because these data were both readily available (from the central league offices) and highly relevant to their hypothesis about aggression and uniform color.

Because these penalty records were publicly available, the researchers were able to access them easily. But if your research question involved sensitive or personal information—such as hospital records or personal correspondence—you would need to obtain permission from a responsible party. Let's say you wanted to analyze the love letters written by soldiers serving overseas and then try to predict relationship stability. Because these letters would be personal, perhaps rather intimate, you would need permission from each person involved before proceeding with the study. Or, say you wanted to analyze the correlation between the length of a person's hospital stay and the number of visitors he or she receives. This would most likely require permission from hospital administrators, doctors, and the patients themselves. However you manage to obtain access to private records, it is absolutely essential to protect the privacy and anonymity of the people involved. This would mean, for example, using pseudonyms and/or removing names and other identifiers from published excerpts of personal letters.
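This kind of redaction can be done systematically before any excerpt is published. A sketch with hypothetical names, pseudonyms, and text (a real study would build the mapping from its consent records):

```python
import re

# Hypothetical mapping of real names to pseudonyms; every name and
# sentence here is invented for illustration.
pseudonyms = {"James": "Soldier A", "Maria": "Partner B"}

def redact(text, mapping):
    """Replace each known name with its pseudonym (whole words only)."""
    for name, alias in mapping.items():
        text = re.sub(rf"\b{re.escape(name)}\b", alias, text)
    return text

excerpt = redact("James wrote to Maria every week.", pseudonyms)
print(excerpt)  # Soldier A wrote to Partner B every week.
```

Applying the same mapping to every excerpt keeps the pseudonyms consistent across the published report.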


A personal letter is an example of a data source that a researcher would need to obtain permission to use.


Step 2—Sample from the Archives

In Weigel's research on race relations, the most obvious choice of archives was to take snippets of both television programming and commercials. But this decision was only the first step of the process. Should they examine every second of every program ever aired on television? Naturally not; instead, their approach was to take a smaller sample of television programming. We will discuss sampling in more detail in Chapter 4, but the basic process involves taking a smaller, representative collection of the broader population in order to conserve resources. Weigel and colleagues (1980) decided to sample one week's worth of prime-time programming from 1978, assembling videotapes of everything broadcast by the three major networks at the time (CBS, NBC, and ABC). They narrowed their sample by eliminating news, sports, and documentary programming because their hypotheses were centered on portrayals of fictional characters of different races.

Step 3—Code and Analyze the Archives

The third and most involved step is to develop a system for coding and analyzing the archival data. Even a sample of one week's worth of prime-time programming contains a near-infinite amount of information! In the race-relations studies, Weigel et al. elected to code four key variables: (1) the total human appearance time, or time during which people were on-screen; (2) the black appearance time, in which black characters appeared on-screen; (3) the cross-racial appearance time, in which characters of two races were on-screen at the same time; and (4) the cross-racial interaction time, in which cross-racial characters interacted. In the original (1980) paper, these authors reported that black characters were shown only 9% of the time, and cross-racial interactions only 2% of the time. Fortunately, by the time of their 1995 follow-up study, the rate of black appearances had doubled, and the rate of cross-racial interactions had more than tripled. However, there was depressingly little change in some of the qualitative dimensions that they measured, including the degree of emotional connection between characters of different races.
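A coding scheme of this sort boils down to tagging timed segments with category codes and aggregating them into percentages. A minimal sketch, with invented segment data and shortened category names (not Weigel's actual coding sheet):

```python
# Invented coded segments: (seconds on-screen, set of codes that apply).
# Code names are shortened versions of the four categories described above.
segments = [
    (300, {"human"}),
    (60, {"human", "black"}),
    (30, {"human", "black", "cross_appear"}),
    (10, {"human", "black", "cross_appear", "cross_interact"}),
]

total_human = sum(sec for sec, codes in segments if "human" in codes)

def pct(code):
    """Percentage of total human appearance time carrying a given code."""
    return 100 * sum(sec for sec, codes in segments if code in codes) / total_human

print(f"black appearance time: {pct('black'):.1f}%")
print(f"cross-racial interaction time: {pct('cross_interact'):.1f}%")
```

Once the segments are coded consistently, statistics like the 9% and 2% figures above fall out of a simple aggregation.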

This study also highlights the variety of options for coding complex behaviors. The four key ratings of "appearance time" consist of simply recording the amount of time that each person or group is on-screen. In addition, the researchers assessed several abstract qualities of interaction using judges' ratings. The degree of emotional connection, for instance, was measured by having judges rate the "extent to which cross-racial interactions were characterized by conditions promoting mutual respect and understanding" (Weigel et al., 1980, p. 888). As you'll remember from Chapter 2, any time you use judges' ratings, it is important to collect ratings from more than one rater and to make sure they agree in their assessments.
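Agreement between two judges can be quantified with raw percent agreement and with Cohen's kappa, which corrects for agreement expected by chance. A sketch with invented ratings (not data from the Weigel studies):

```python
from collections import Counter

# Invented ratings: two judges score the same ten clips on a 1-3 scale.
judge_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
judge_b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]

n = len(judge_a)
observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n

# Cohen's kappa corrects raw agreement for agreement expected by chance.
count_a, count_b = Counter(judge_a), Counter(judge_b)
expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
kappa = (observed - expected) / (1 - expected)
print(f"raw agreement: {observed:.0%}; Cohen's kappa: {kappa:.2f}")
```

Kappa is the more conservative of the two figures because two judges who rated clips at random would still agree part of the time.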

Your goal as a researcher is to find a systematic way to record the variables most relevant to your hypothesis. As with any research design, the key is to start with clear operational definitions that capture the variables of interest. This involves deciding both the most appropriate variables and the best way to measure these variables. For example, if you analyze written communication, you might decide to compare words, sentences, characters, or themes across the sample. A study of newspaper coverage might code the amount of space or number of stories dedicated to a topic. Or a study of television news might code the amount of airtime given to different positions. The best strategy in each case will be the one that best represents the variables of interest.


Qualitative versus Quantitative Approaches

Archival research can represent either qualitative or quantitative research, depending on the researcher's approach to the archives. Most of our examples in this section represent the quantitative approach: Frank and Gilovich (1988) counted penalties to test their hypothesis about aggression; and Stirman and Pennebaker (2001) counted words to test their hypothesis about suicide. But the race-relations work by Weigel and colleagues (1980; 1995) represents a nice mix of qualitative and quantitative research. In their initial 1980 study, the primary goal was to document the portrayal of race relations on prime-time television (i.e., qualitative). But in the 1995 follow-up study, the primary goal was to determine whether these portrayals had changed over a 15-year period. That is, they tested the hypothesis that race relations were portrayed in a more positive light (i.e., quantitative). Another way in which archival research can be qualitative is to study open-ended narratives, without attempting to impose structure upon them. This approach is commonly used to study free-flowing text, such as personal correspondence or letters to the editor in a newspaper. A researcher approaching these from a qualitative perspective would attempt to learn from these narratives, without attempting to impose structure via the use of content analyses.

3.4  Observational Research

Moving further along the continuum of control, we come to the descriptive design with the greatest amount of researcher control. Observational research involves studies that directly observe behavior and record these observations in an objective and systematic way. In your previous psychology courses, you may have encountered the concept of attachment theory, which argues that an infant's bond with his or her primary caregiver has implications for later social and emotional development. Mary Ainsworth, a Canadian developmental psychologist, and John Bowlby, a British psychologist and psychiatrist, articulated this theory in the early 1960s, arguing that children can form either "secure" or a variety of "insecure" attachments with their caregivers (Ainsworth & Bell, 1970; Bowlby, 1963).

In order to assess these classifications, Ainsworth and Bell (1970) developed an observational technique called the "strange situation." Mothers would arrive at their laboratory with their children for a series of structured interactions, including having the mother play with the infant, leave him alone with a stranger, and then return to the room after a brief absence. The researchers were most interested in coding the ways in which the infant responded to the various episodes (eight in total). One group of infants, for example, showed curiosity when the mother left but then returned to playing with their toys, trusting that she would return. Another group showed immediate distress when the mother left and clung to her nervously upon her return. Based on these and other behavioral observations, Ainsworth and colleagues classified these groups of infants as "securely" and "insecurely" attached to their mothers, respectively.

Observational research can be used to measure an infant's attachment to a caregiver.

Pros and Cons of Observational Research

Observational designs are well suited to a wide range of research questions, provided the questions can be addressed through directly observable behaviors and events; that is, you can observe parent–child interactions, or nonverbal cues to emotion, or even crowd behavior. However, if you are interested in studying thought processes—such as how mothers interpret their interactions—then observation will not suffice. This harkens back to our discussion of behavioral measures in Chapter 2: In exchange for giving up access to internal processes, you gain access to unfiltered behavioral responses.

Research: Making an Impact—Harry Harlow

In the 1950s, U.S. psychologist Harry Harlow conducted a landmark series of studies with rhesus monkeys on the mother–infant bond. While his research would be considered unethical by contemporary standards, the results of his work revealed the importance of affection, attachment, and love for healthy childhood development.

Prior to Harlow's findings, it was believed that infants attached to their mothers as part of a drive to fulfill exclusively biological needs—in this case, obtaining food and water and avoiding pain (Herman, 2007; van der Horst & van der Veer, 2008). In an effort to clarify the reasons that infants so clearly need maternal care, Harlow removed rhesus monkeys from their natural mothers several hours after birth, giving the young monkeys a choice between two surrogate "mothers." Both mothers were made of wire, but one was bare and one was covered in terry cloth. Although the wire mother provided food via an attached bottle, the monkeys preferred the softer, terry-cloth mother, even though the latter provided no food (Harlow & Zimmerman, 1958; Herman, 2007).

Further research with the terry-cloth mothers contributed to the understanding of healthy attachment and childhood development (van der Horst & van der Veer, 2008). When the young monkeys were given the option to explore a room with their terry-cloth mothers and had the cloth mothers in the room with them, they used the mothers as a safe base. Similarly, when exposed to novel stimuli such as a loud noise, the monkeys would seek comfort from the cloth-covered surrogate (Harlow & Zimmerman, 1958). However, when the monkeys were left in the room without their cloth mothers, they reacted poorly—freezing up, crouching, crying, and screaming.

A control group of monkeys who were never exposed to either their real mothers or one of the surrogates revealed stunted forms of attachment and affection. They were left incapable of forming lasting emotional attachments with other monkeys (Herman, 2007). Based on this research, Harlow discovered the importance of proper emotional attachment, stressing the need for physical and emotional bonding between infants and mothers (Harlow & Zimmerman, 1958; Herman, 2007).

Harlow's influential research led to improved understanding of maternal bonding and child development (Herman, 2007). His research paved the way for improvements in infant and child care and in helping children cope with separation from their mothers (Bretherton, 1992; Du Plessis, 2009). In addition, Harlow's work contributed to the improved treatment of children in orphanages, hospitals, day care centers, and schools (Herman, 2007; van der Horst & van der Veer, 2008).

To capture these unfiltered behaviors, it is vital for the researcher to be as unobtrusive as possible. As we have already discussed, people have a tendency to change their behavior when they are being observed. In the bullying study by Craig and Pepler (1997) discussed at the beginning of this chapter, the researchers used video cameras to record children's behavior unobtrusively; otherwise, the occurrence of bullying might have been artificially low. If you conduct an observational study in a laboratory setting, there is no way to hide the fact that people are being observed, but the use of one-way mirrors and video recordings can help people to become comfortable with the setting (versus having an experimenter staring at them across the table). If you conduct an observational study out in the real world, there are even more possibilities for blending into the background, including using observers who are literally hidden. For example, let's say you hypothesize that people are more likely to pick up garbage when the weather is nicer. Rather than station an observer with a clipboard by the trash can, you could place someone out of sight standing behind a tree or perhaps sitting on a park bench pretending to read a magazine. In both cases, people would be less conscious of being observed and therefore more likely to behave naturally.

One extremely clever strategy for blending in comes from a study by the social psychologist Muzafer Sherif, involving observations of cooperative and competitive behaviors among boys at a summer camp (1954). You can imagine that it was particularly important to make observations in this context without the boys realizing they were part of a research study. Sherif took on the role of camp janitor, allowing him to be a presence in nearly all of the camp activities. The boys never paid enough attention to the "janitor" to realize his omnipresence—or his discreet note taking. The brilliance of this idea is that it takes advantage of the fact that people tend to blend into the background once we become used to their presence.

Types of Observational Research

There are several variations on observational research, according to the amount of control that a researcher has over the data collection process.

Structured observation involves creating a standard situation in a controlled setting and then observing participants' responses to a predetermined set of events. The "strange situation" studies of attachment (discussed above) are a good example of structured observation—mothers and infants are subjected to a series of eight structured episodes, and researchers systematically observe and record the infants' reactions. Even though these types of studies are conducted in a laboratory, they differ from experimental studies in an important way: Rather than systematically manipulate a variable to make comparisons, researchers present the same set of conditions to all participants.

Another example of structured observation comes from the research of John Gottman, a psychologist at the University of Washington. For nearly three decades, Gottman and his colleagues have conducted research on the interaction styles of married couples. Couples who take part in this research are invited for a 3-hour session in a laboratory that closely resembles a living room. Gottman's goal is to make couples feel reasonably comfortable and natural in the setting, in order to get them talking as they might do at home. After allowing them to settle in, Gottman adds the structured element by asking the couple to discuss an "ongoing issue or problem" in their marriage. The researchers then sit back to watch the sparks fly, recording everything from verbal and nonverbal communication to measures of heart rate and blood pressure. Gottman has observed and tracked so many couples over the decades that he is able to predict, with remarkable accuracy, which couples will divorce in the 18 months following the lab visit (Gottman & Levenson, 1992).

Naturalistic observation involves observing and systematically recording behavior out in the real world. This can be done in two broad ways—with or without intervention on the part of the researcher. Naturalistic studies that involve researcher intervention consist of manipulating some aspect of the environment and then observing responses. For example, you might leave a shopping cart just a few feet away from the cart return area and measure whether people move the cart. (Given the number of carts that are abandoned just inches away from their proper destination, someone must be doing this research all the time. . . ). In another example you may remember from Chapter 1 (in our discussion of ethical dilemmas), Harari et al. (1995) used this approach to study whether people would help in emergency situations. In brief, these researchers staged what appeared to be an attempted rape in a public park and then observed whether groups or individual males were more likely to rush to the victim's aid.

The ABC network has developed a hit reality show that illustrates this type of research. The show "What Would You Do?" sets up provocative settings in public and videotapes people's reactions; full episodes are available online at http://abcnews.go.com/WhatWouldYouDo/. If you were an unwitting participant in one of these episodes, you might see a customer stealing tips from a restaurant table or a son berating his father for being gay or a man proposing to his girlfriend who minutes earlier had been kissing another man at the bar. Of course, these observation "studies" are more interested in shock value than data collection (or IRB approval; see Chapter 1), but the overall approach can be a useful strategy to assess people's reactions to various situations. In fact, some of the scenarios on the show are based on classic studies in social psychology, such as the well-documented phenomenon that people are reluctant to take responsibility for helping in emergencies.

Alternatively, naturalistic studies can involve simply recording ongoing behavior without any attempt by the researchers to intervene or influence the situation. In these cases, the goal is to observe and record behavior in a completely natural setting. For example, you might station yourself at a liquor store and observe the numbers of men and women who buy beer versus wine. Or, you might observe the numbers of people who give money to the Salvation Army bell ringers during the holiday season. You can use this approach to make comparisons of different conditions, provided the differences occur naturally. That is, you could observe whether people donate more money to the Salvation Army on sunny or snowy days or compare donation rates when the bell ringers are different genders or races. Do people give more money when the bell ringer is an attractive female? Or do they give more to someone who looks more needy? These are all research questions that could be addressed using a well-designed naturalistic observation study.

Participant observation involves having the researcher(s) conduct observations while engaging in the same activities as the participants. The goal is to interact with these participants in order to gain better access and insight into their behaviors. In one famous example, the psychologist David Rosenhan (1973) was interested in the experience of people hospitalized for mental illness. To study these experiences, he had eight perfectly sane people gain admission to different mental hospitals. These fake patients were instructed to give accurate life histories to a doctor except for lying about one diagnostic symptom; they all supposedly heard voices occasionally, a symptom of schizophrenia.

Once admitted, these "patients" behaved in a normal and cooperative manner, with instructions to convince hospital staff that they were healthy enough to be released. In the meantime, they observed life in the hospital and took notes on their experiences—a behavior that many doctors interpreted as "paranoid note taking." The main finding of this study was that hospital staff tended to see all patient behaviors through the lens of their initial diagnoses. Despite immediately acting "normally," these fake patients were hospitalized an average of 19 days (with a range from 7 to 52!) before being released. And all but one was given a diagnosis of "schizophrenia in remission" upon release. The other striking finding was that treatment was generally depersonalized, with staff spending little time with individual patients.

In another great example of participant observation, Festinger, Riecken, and Schachter (1956) decided to join a doomsday cult to test their new theory of cognitive dissonance. Briefly, this theory argues that people are motivated to maintain a sense of consistency among their various thoughts and behaviors. So, for example, if you find yourself smoking a cigarette despite being aware of the health risks, you might rationalize your smoking by convincing yourself that lung cancer risk is really just genetic. In this case, Festinger and colleagues stumbled upon the case of a woman named Mrs. Keech, who was predicting the end of the world, via alien invasion, at 11 p.m. on a specific date 6 months in the future. What would happen, they wondered, when this prophecy failed to come true?

To answer this question, the researchers pretended to be new converts and joined the cult, living among the members and observing them as they made their preparations for doomsday. Sure enough, the day came, and 11 p.m. came and went without the world ending. Mrs. Keech first declared that she had forgotten to account for the time zone difference, but as sunrise started to approach the group members became restless. Finally, after a short absence to communicate with the aliens, Mrs. Keech returned with some good news: The aliens were so impressed with the devotion of the group that they decided to postpone their invasion! The group members rejoiced, rallying around this brilliant piece of rationalizing, and quickly began a new campaign to recruit new members.

As you can see from these examples, participant observation can provide access to amazing and one-of-a-kind data, including insights into group members' thoughts and feelings. This approach also provides access to groups that might be reluctant to allow outside observers. However, it has two clear disadvantages compared with other types of observation. The first problem is ethical: Data are collected from individuals who do not have the opportunity to give informed consent. Indeed, the whole point of the technique is to observe people without their knowledge. In order for an IRB to approve this kind of study, there has to be an extremely compelling reason to ignore informed consent, as well as extremely rigorous measures to protect identities. The second problem is methodological: There is ample opportunity for the objectivity of observations to be compromised by the close contact between researcher and participant. Because the researcher is a part of the group, he or she can change the dynamics in subtle ways, possibly leading the group to confirm his or her hypothesis. In addition, the group can shape the researcher's interpretations in subtle ways, leading him or her to miss important details.

Steps in Observational Research

One of the major strengths of observational research is that it has a high degree of ecological validity; that is, the research can be conducted in situations that closely resemble the real world. Think of our examples so far—married couples observed in a living-room-like laboratory; doomsday cults observed from within; bullying behaviors on the school playground. In every case, people's behaviors are observed in the natural environment or something very close to it. But this ecological validity comes at a price; the real world is a jumble of information, some relevant, some not so much. The challenge for the researcher, then, is to decide on a system for sorting out the signal from the noise that provides the best test of her hypothesis. In this section, we discuss a three-step process for conducting observational research. The key thing you should note right away is that most of this process involves making decisions ahead of time so that the process of data collection is smooth, simple, and systematic.

Step 1—Develop a Hypothesis

For research to be systematic, it is important to impose structure by having a clear research question and hypothesis. We have covered hypotheses in detail in other chapters, but the main points bear repeating: Your hypothesis must be testable and falsifiable, meaning that it must be framed in such a way that it can be addressed through empirical data and might be disconfirmed by these data. In our example involving Salvation Army donations, we predicted that people might donate more money to an attractive bell ringer. This could easily be tested empirically and could just as easily be disconfirmed by the right set of data—say, if attractive bell ringers brought in the fewest donations.


This particular example also highlights an additional important feature of observational hypotheses; namely, they have to be observable. Because observational studies are based on observations of behaviors, our hypotheses have to be centered on behavioral measures. That is, we can safely make predictions about the amount of money people will donate because this can be directly observed. But we are unable to make predictions in this context about the reasons for donations. There would be no way to observe, say, that people donate more to attractive bell ringers because they are trying to impress them. In sum, one limitation of observing behavior in the real world is that we are unable to delve into the cognitive and motivational reasons behind the behaviors.

Step 2—Decide What and How to Sample

Once you have developed a hypothesis that is testable, falsifiable, and observable, the next step is to decide what kind of information to gather from the environment to test this hypothesis. The simple fact is that the world is too complex to sample everything. Imagine that you wanted to observe the dinner rush at a restaurant. There is a nearly infinite list of possibilities to observe: What time does the restaurant get crowded? How many times do people send their food back to the kitchen? What are the most popular dishes? How often do people get in arguments with the wait staff? To simplify the process of observing behavior, you will need to take samples, or small snippets of the environment that are relevant to your hypothesis. That is, rather than observing "dinner at the restaurant," the goal is to narrow your focus to something like "the number of people waiting in line for a table at 6 p.m. versus 9 p.m."

The choice of what and how to sample will ultimately depend on the best fit for your hypothesis. In the context of observational research, there are three strategies for sampling behaviors and events. The first strategy, time sampling, involves comparing behaviors during different time intervals. For example, to test the hypothesis that football teams make more mistakes when they start to get tired, you could count the number of penalties in the first 5 and the last 5 minutes of the game. These data would allow us to compare mistakes at one time interval with mistakes at another time interval. In the case of Festinger's study of a doomsday cult, time sampling was used to compare how the group members behaved before and after their prophecy failed to come true.
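Time sampling reduces to counting events that fall inside predefined windows. A minimal sketch using an invented penalty log for a single game:

```python
# Invented penalty log for one 60-minute football game: (minute, team).
penalties = [(2, "A"), (4, "B"), (31, "A"), (56, "B"), (57, "A"), (59, "B")]

def count_in_window(events, start, end):
    """Count events whose timestamp falls in the interval [start, end)."""
    return sum(start <= minute < end for minute, _team in events)

early = count_in_window(penalties, 0, 5)   # first 5 minutes
late = count_in_window(penalties, 55, 60)  # last 5 minutes
print(f"first 5 minutes: {early} penalties; last 5 minutes: {late}")
```

Defining the windows before data collection begins is what keeps the sampling systematic rather than ad hoc.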

The second strategy, individual sampling, involves collecting data by observing one person at a time in order to test hypotheses about individual behaviors. Many of the examples we have already discussed involve individual sampling: Ainsworth and colleagues tested their hypotheses about attachment behaviors by observing individual infants, while Gottman tests his hypotheses about romantic relationships by observing one married couple at a time. These types of data allow us to examine behavior at the individual level and test hypotheses about the kinds of things people do—from the way they argue with their spouses to whether they wear team colors to a football game.

Steve Mason/Photodisc/Thinkstock

The dinner scene at a busy restaurant offers a wide variety of behaviors to sample.

new66480_03_c03_p089-132.indd 108 10/31/11 9:39 AM

CHAPTER 3 Section 3.4 Observational Research

The third strategy, event sampling, involves observing and recording behaviors that occur throughout an event. For example, you could track the number of fights that break out during an event such as a football game or the number of times people leave the restaurant without paying the check. This strategy allows for testing hypotheses about the types of behaviors that occur in a particular environment or setting. For example, you might compare the number of fights that break out in a professional football versus a professional hockey game. Or, the next time you host a party, you could count the number of wine bottles versus beer bottles that end up in your recycling bin. The distinguishing feature of this strategy is that you focus on the occurrence of behaviors more than on the individuals performing them.

Step 3—Record and Code Behavior

Now that you have formulated a hypothesis and decided on the best sampling strategy, there is one final and critical step before you begin data collection. Namely, you have to develop good operational definitions of your variables by translating the underlying concepts into measurable variables. Gottman’s research turns the concept of marital interactions into a range of measurable variables, such as the number of dismissive comments and instances of passive-aggressive sighing—all things that can be observed and counted objectively. Rosenhan’s study involving fake schizophrenic patients turned the concept of how staff treat patients into measurable variables such as the amount of time staff members spent with each patient—again, something very straightforward to observe.

It is vital to decide up front what kinds and categories of behavior you will be observing and recording. In the last section, we narrowed down our observation of dinner at the restaurant to the number of people in line at 6 p.m. versus the number of people in line at 9 p.m. But how can we be sure we get an accurate count? What if two people are waiting by the door while the other two members of the group are sitting at the bar? Are those at the bar waiting for a table or simply having drinks? One possibility might be to count the number of individuals who walk through the door in different time periods, although our count could be inflated by those who give up on waiting or who only enter to ask for directions to another place.

In short, observing behavior in the real world can be messy. The best way to deal with this mess is to develop a clear and consistent categorization scheme and stick with it. That is, in testing your hypothesis about the most crowded time at the restaurant, you would choose one method of counting people and use it for the duration of the study. In part, this choice is a judgment call, but your judgment should be informed by three criteria. First, you should consider practical issues, such as whether your categories can be directly observed. You can observe the number of people who leave the restaurant, but you cannot observe whether they got impatient. Second, you should consider theoretical issues, such as how well your categories represent the underlying theory. Why did you decide to study the most crowded time at the restaurant? Perhaps this particular restaurant is in a new, up-and-coming neighborhood and you expect it to get crowded over the course of the evening. This reasoning would also lead you to include people sitting both at tables and at the bar—because this crowd may come to the restaurant with the sole intention of staying at the bar. Finally, you should consider previous research in choosing your categories. Have other researchers studied dining patterns in restaurants? What kinds of behaviors did they observe? If these categories make sense for your project, you should feel free to re-use them—no need to reinvent the wheel!


Last, but not least, you should take a step back and evaluate both the validity and the reliability of your coding system. (See Chapter 2 for a review of these terms.) Validity in this case means making sure the categories we observe do a good job of capturing the underlying variables in our hypothesis (i.e., construct validity; see Chapter 2). For example, in Gottman’s studies of marital interactions, some of the most important variables are the emotions expressed by both partners. One way to observe emotions would be to count the number of times a person smiles. However, we would have to think carefully about the validity of this measure because smiling could indicate either genuine happiness or condescension. As a general rule, the better our operational definitions, the more valid our measures will be (Chapter 2).

Reliability in the context of observation means making sure our data are collected in a consistent way. If your research involves more than one observer using the same system, their data should look roughly the same (i.e., interrater reliability). This is accomplished in part by making the task simple and straightforward—for example, you can have trained assistants use a checklist to record behaviors rather than depend on open-ended notes. The other key to improving reliability is through careful training of the observers, giving them detailed instructions and ample opportunities to practice the rating system.
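One simple, commonly used index of interrater reliability is percent agreement: the proportion of observations that two observers coded identically. Here is a short Python sketch; the observers, behavior categories, and codes below are invented purely for illustration.

```python
# Two hypothetical observers code the same 10 behaviors with a shared checklist.
rater_a = ["smile", "frown", "smile", "neutral", "smile",
           "frown", "neutral", "smile", "smile", "frown"]
rater_b = ["smile", "frown", "neutral", "neutral", "smile",
           "frown", "neutral", "smile", "frown", "frown"]

# Percent agreement: count the observations both raters coded identically.
matches = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = matches / len(rater_a)

print(f"Agreement on {matches} of {len(rater_a)} observations ({percent_agreement:.0%})")
```

Percent agreement is easy to compute and explain, though more sophisticated indices (such as Cohen’s kappa) additionally correct for agreement expected by chance.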

Observation Examples

To give you a sense of how all of this comes together, let’s walk through a pair of examples, from research question to data collection.

Example 1—Theater Restroom Usage

Imagine, for the sake of this example, that you are interested in whether people are more likely to use the restroom before or after watching a movie. This research question could provide valuable information for theater owners in planning employee schedules (i.e., when bathrooms are most likely to need cleaning). Thus, by studying patterns of human behavior, we could gain valuable applied knowledge.

The first step is to develop a specific, testable, and observable hypothesis. In this case, we might predict that people are more likely to use the restroom after the movie, as a result of consuming those 64-ounce sodas during the movie. And, just for fun, let’s also compare the restroom usage of men and women. Perhaps men are more likely to wait until after the movie, whereas women are as likely to go before as after? This pattern of data might look something like the percentages in Table 3.1. That is, men make 80% of their restroom visits after the movie and 20% before the movie, while women make about 50% of their restroom visits at each time.

Table 3.1: Hypothesized Data from Observation Exercise

Gender          Men     Women
Before movie    20%     50%
After movie     80%     50%
Total           100%    100%


The next step is to decide on the best sampling strategy to test this hypothesis. Of the three sampling strategies we discussed—individual, event, and time—which one seems most relevant here? The best option would probably be time sampling because our hypothesis involves comparing the number of restroom visitors in two time periods (before versus after the movie). So, in this case, we would need to define a time interval for collecting data. One option would be to limit our observations to the 10 minutes before the previews begin and the 10 minutes after the credits end. The potential problem here, of course, is that some people might use either the previews or the end credits as a chance to use the restroom. Another complication arises in trying to determine which movie people are watching; in a giant multiplex theater, movies start just as others are finishing. One possible solution, then, would be to narrow our sample to movie theaters that show only one movie at a time and to define the sampling times based on the actual movie start and end times.

Once we decide on a sampling strategy, the next step is to decide on the types of behaviors we want to record. This particular hypothesis poses a challenge because it deals with a rather private behavior. In order to faithfully record people “using the restroom,” we would need to station researchers in both men’s and women’s restrooms to verify that people actually, well, “use” the restroom while they are in there. However, this strategy comes with the potential downside that your presence (standing in the corner of the restroom) will affect people’s behavior. Another, less intrusive option would be to stand outside the restroom and simply count “the number of people who enter.” The downside here, of course, is that we don’t technically know why people are going into the restroom. But sometimes research involves making these sorts of compromises—in this case, we chose to sacrifice a bit of precision in favor of a less intrusive measurement.

So, in sum, we started with the hypothesis that men are more likely to use the restroom after a movie, while women use the restroom equally before and after. We then decided that the best sampling strategy would be to identify a movie theater showing only one movie and to sample from the 10-minute periods before and after the actual movie’s running time. Finally, we decided that the best strategy for recording behavior would be to station observers outside the restrooms and count the number of people who enter. Now, let’s say we conduct these observations every evening for one week and collect the data in Table 3.2.

Table 3.2: Findings from Observation Exercise

Gender          Men           Women
Before movie    75 (25%)      300 (60%)
After movie     225 (75%)     200 (40%)
Total           300 (100%)    500 (100%)

You can see that more women (N = 500) than men (N = 300) visited the restrooms during our week of sampling. But the real test of our hypothesis comes from examining the percentages within gender groups. That is, of the 300 restroom visits by men, what percentage occurred before the movie and what percentage occurred after? In this dataset, women used the restroom with relatively equal frequency before (60%) and after (40%) the movie. Men, in contrast, were three times as likely to use the restroom after (75%) than before (25%) the movie. In other words, our hypothesis appears to be confirmed by these percentages.
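The within-gender percentages behind Table 3.2 can be checked with a few lines of code. A sketch, using the counts from the table (the dictionary structure and variable names are our own):

```python
# Observed restroom entries from Table 3.2.
counts = {
    "Men": {"before": 75, "after": 225},
    "Women": {"before": 300, "after": 200},
}

# For each gender, convert the raw counts into percentages of that
# gender's total visits (the "within gender groups" comparison).
percentages = {}
for gender, visits in counts.items():
    total = visits["before"] + visits["after"]
    percentages[gender] = {
        "before": round(100 * visits["before"] / total),
        "after": round(100 * visits["after"] / total),
    }

print(percentages)
# {'Men': {'before': 25, 'after': 75}, 'Women': {'before': 60, 'after': 40}}
```

Note that the percentages are computed within each gender’s own total, not the overall sample, which is exactly the comparison the hypothesis calls for.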

Example 2—Cell Phone Usage While Driving

Imagine for this example that you are interested in patterns of cell phone usage among drivers. Several recent studies have reported that drivers using cell phones are as impaired as drunk drivers, making this an important public safety issue. Thus, if we could understand the contexts in which people are most likely to use cell phones, we would have valuable information for developing guidelines for safe and legal use of these devices. So, in this study, we might count the number of drivers using cell phones in two settings: stuck in rush-hour traffic and moving on the freeway.

The first step is to develop a specific, testable, and observable hypothesis. In this case, we might predict that people are more likely to use cell phones when they are bored in the car. So, we hypothesize that we will see more drivers using cell phones while stuck in rush-hour traffic than while moving on the freeway.

The next step is to decide on the best sampling strategy to test this hypothesis. Of the three sampling strategies we discussed—individual, event, and time—which one seems most relevant here? The best option would probably be individual sampling because we are interested in the cell phone usage of individual drivers. That is, for each individual car we see during the observation period, we want to know whether the driver is using a cell phone. One strategy for collecting these observations would be to station observers along a fast-moving stretch of freeway, as well as along a stretch of road that is clogged during rush hour. These observers would keep a record of each passing car, noting whether the driver was on the phone.

Once we decide on a sampling strategy, our next step is to decide on the types of behaviors we want to record. One challenge in this study is deciding how broadly to define the category of cell phone usage. Would we include both talking and text messaging? Given our interest in distraction and public safety, we probably would want to include text messaging; several states have recently banned text messaging while driving in response to tragic accidents. Because we will be observing moving vehicles, the most reliable approach might be to simply note whether each driver had a cell phone in his or her hand. As with our restroom study, we are sacrificing a little bit of precision (i.e., we don’t know what the cell phone is being used for) to capture behaviors that are easier to record.

So, in sum, we started with the hypothesis that drivers would be more likely to use cell phones when stuck in traffic. We then decided that the best sampling strategy would be to station observers along two stretches of road, and they should note whether drivers were using cell phones. Finally, we decided that the best compromise for observing cell phone usage would be to note whether each driver was holding a cell phone. Now, let’s say we conduct these observations over a 24-hour period and collect the data shown in Table 3.3.


Table 3.3: Findings from Observation Exercise #2

                 Rush Hour    Moving
Cell phone       30 (30%)     200 (67%)
No cell phone    70 (70%)     100 (33%)
Total            100          300

You can see that more cars passed by on the moving stretch of freeway (N = 300) than on the rush-hour stretch (N = 100). But the real test of our hypothesis comes from examining the percentages within each stretch. That is, of the 100 drivers observed during rush hour and the 300 observed on the moving freeway, what percentage were using cell phones? In this data set, 30% of rush-hour drivers were using cell phones, compared with 67% of drivers on the moving freeway. In other words, our hypothesis was not confirmed by the data: drivers in rush hour were less than half as likely to be using cell phones. The next step in our research program would be to speculate on the reasons why the data contradicted our hypothesis.
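With individual sampling, each passing car yields one record. A sketch of how such a log might be tallied into the percentages of Table 3.3 (the records below are hypothetical, constructed to mirror the table's counts):

```python
from collections import Counter

# Hypothetical individual-sampling log: one (stretch, phone_in_hand) record
# per car, mirroring Table 3.3 (30/100 rush-hour and 200/300 freeway drivers
# observed with a phone in hand).
observations = (
    [("rush_hour", True)] * 30 + [("rush_hour", False)] * 70
    + [("moving", True)] * 200 + [("moving", False)] * 100
)

tally = Counter(observations)
for stretch in ("rush_hour", "moving"):
    with_phone = tally[(stretch, True)]
    total = with_phone + tally[(stretch, False)]
    print(f"{stretch}: {with_phone}/{total} = {100 * with_phone / total:.0f}% on the phone")
```

In a real study the log would of course be built one observation at a time as cars pass, but the aggregation step would look the same.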

Qualitative versus Quantitative Approaches

The general method of observation lends itself equally well to qualitative and quantitative approaches, although some types of observation fit one approach better than the other. For example, structured observation tends to be focused on hypothesis testing and quantification of responses. In Mary Ainsworth’s “strange situation” research (described above), the primary goal was to expose children to a predetermined script of events and to test hypotheses about how children with secure and insecure attachments would respond to these events. In contrast, naturalistic observation—and, to a greater extent, participant observation—tends to be focused on learning from events as they occur naturally. In Leon Festinger’s “doomsday cult” study, the researchers joined the group in order to observe the ways members reacted when their prophecy failed to come true.


Research: Thinking Critically
The Irritable Heart

By K. Kris Hirst

Using open-source data from the Early Indicators of Later Work Levels, Disease, and Death Project, a federal project digitizing medical records of veterans of the American Civil War (1861–1865), researchers have identified an increased risk of post-war illness among Civil War veterans, including cardiac, gastrointestinal, and mental diseases throughout their lives. In a project partly funded by the National Institute on Aging, military service files from a total of 15,027 servicemen in 303 companies of the Union Army stored at the United States National Archives were matched to pension files and surgeons’ reports from multiple health examinations. A total of 43% of the men had mental health problems throughout their lives, some of which are today recognized as related to post-traumatic stress disorder (PTSD). Most particularly affected were men who enlisted at ages under 17. Roxane Cohen Silver and colleagues at the University of California, Irvine, published their results in the February 2006 issue of Archives of General Psychiatry.

Studies of PTSD to date have connected war experiences to the recurrence of mental health problems and physical health problems such as cardiovascular disease, hypertension, and gastrointestinal disorders. These studies have not had access to long-term health impacts, since they have been focused on veterans of recent conflicts. Researchers studying the impact of modern conflict participation report that the factors increasing risk of later health issues include age at enlistment, intimate exposure to violence, prisoner-of-war status, and having been wounded.

The Trauma of the American Civil War

The Civil War was a particularly traumatic conflict for American soldiers. Army soldiers commonly enlisted at quite young ages; between 15% and 20% of the Union army soldiers enlisted between ages of 9 and 17. Each of the Union companies was made up of 100 men assembled from regional neighborhoods, and thus often included family members and friends. Large company losses—75% of companies in this sample lost between 5% and 30% of their personnel—nearly always meant the loss of family or friends. The men readily identified with the enemy, who in some cases represented family members or acquaintances. Finally, close-quarter conflict, including hand-to-hand combat without trenches or other barriers, was a common field tactic during the Civil War.

To quantify trauma experienced by Civil War soldiers, researchers used a variable derived from the percentage of company lost to represent relative exposure to trauma. Researchers found that in military companies with a larger percentage of soldiers killed, the veterans were 51% more likely to have cardiac, gastrointestinal, and nervous disease.

The Youngest Soldiers Were Hardest Hit

The study found that the youngest soldiers (ages 9 to 17 years at enlistment) were 93% more likely than the oldest (ages 31 and older) to experience both mental and physical disease. The younger soldiers were also more likely to show signs of cardiovascular disease alone and in conjunction with gastrointestinal conditions, and they were more likely to die early. Former POWs had an increased risk of combined mental and physical problems as well as early death.

One problem the researchers grappled with was comparing diseases as they were recorded during the latter half of the 19th century with diseases as they are recognized today. Post-traumatic stress disorder was not recognized by doctors of the era, although they did recognize that veterans exhibited an extreme level of “nervous disease” that they labeled “irritable heart” syndrome.

Children and Adolescents in Combat

Harvard psychologist Roger Pitman writes in an editorial that the impact on younger soldiers should be of immediate concern, since “their immature nervous systems and diminished capacity to regulate emotion give even greater reason to shudder at the thought of children and adolescents serving in combat.” Although disease identification is not one-to-one, said senior researcher Roxane Cohen Silver, “I’ve been studying how people cope with traumatic life experiences of all kinds for 20 years and these findings are quite consistent with an increasing body of literature on the physical and mental health consequences of traumatic experiences.”

Boston University psychologist Terence M. Keane, Director of the National Center for PTSD, commented that this “remarkably creative study is timely and extremely valuable to our understanding of the long-term effects of combat experiences.” Joseph Boscarino, Senior Investigator at Geisinger Health System, added, “There are a few detractors that say that PTSD does not exist or has been exaggerated. Studies such as these are making it difficult to ignore the long-term effects of war-related psychological trauma.”

Think about it

1. What hypotheses are the researchers testing in this study?

2. How did the researchers quantify trauma experienced by Civil War soldiers? Do you think this is a valid way to operationalize trauma? Explain why or why not.

3. Would this research be best described as a case study, archival research, or naturalistic observation? Are there elements of more than one type? Explain.

3.5  Describing Your Data

Before we move on from descriptive research designs, this last section covers the process of presenting descriptive data in both graphical and numeric form. No matter how you present your data, a good description is accurate, concise, and easy to understand. In other words, you have to represent the data accurately and as efficiently as possible, so that your audience can understand it. Another, more eloquent way to think of these principles is to take the advice of Edward Tufte, a statistician and expert in the display of visual information. Tufte suggests that when people view your visual displays, they should spend time on “content-reasoning” rather than “design-decoding” (Tufte, 2001). The sole purpose of designing visual presentations is to communicate your information, so the audience should spend its time thinking about what you have to say, not trying to puzzle through the display itself. In the following sections, we cover guidelines for accomplishing this goal in both numeric and visual form.

Table 3.4 presents hypothetical data from a sample of 20 participants. In this example, we have asked people to report their gender and ethnicity, as well as answer questions about their overall life satisfaction and daily stress. Each row in this table represents one participant in the study, and each column represents one of the variables for which data were collected. In the following sections, we will explore different options for summarizing these sample data, first in numeric form and then using a series of graphs. Our focus in this chapter is on ways to describe the sample characteristics. In later chapters, we will return to these principles in discussing graphs that display the relationship between two or more variables.

Table 3.4: Raw Data from a Sample of Twenty Individuals

Subject ID   Gender   Ethnicity          Life Satisfaction   Daily Stress
1            Male     White              40                  10
2            Male     White              47                  9
3            Female   Asian              29                  8
4            Male     White              32                  9
5            Female   Hispanic           25                  3
6            Female   Hispanic           35                  3
7            Female   White              28                  8
8            Male     Hispanic           40                  9
9            Male     Asian              37                  10
10           Female   African American   30                  10
11           Male     White              43                  8
12           Male     Asian              40                  4
13           Male     White              48                  7
14           Female   African American   30                  4
15           Female   White              37                  7
16           Male     Hispanic           40                  1
17           Female   White              36                  1
18           Male     African American   45                  8
19           Female   White              42                  8
20           Female   African American   38                  7

Numeric Descriptions

Frequency Tables

Often, a good first step in approaching your data set is to get a sense of the frequencies for your demographic variables—gender and ethnicity in this example. The frequency tables shown in Table 3.5 are designed for presenting the number and percentage of the sample that falls into each of a set of categories. As you can see in this pair of tables, our sample consisted of an equal number of men and women (i.e., 50% for each gender). The largest group of participants was white (45%), with the remainder divided almost equally among African American (20%), Asian (15%), and Hispanic (20%) ethnicities.
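A frequency table like Table 3.5 can be generated directly from a raw data column. A sketch using Python's standard library (the 20 ethnicity values are taken from Table 3.4; the layout is our own):

```python
from collections import Counter

# The Ethnicity column from Table 3.4, one value per participant.
ethnicity = ["White", "White", "Asian", "White", "Hispanic", "Hispanic", "White",
             "Hispanic", "Asian", "African American", "White", "Asian", "White",
             "African American", "White", "Hispanic", "White", "African American",
             "White", "African American"]

n = len(ethnicity)
cumulative = 0.0
print(f"{'Ethnicity':<18}{'Frequency':>10}{'Percentage':>12}{'Cumulative':>12}")
for category, count in sorted(Counter(ethnicity).items()):
    pct = 100 * count / n
    cumulative += pct
    print(f"{category:<18}{count:>10}{pct:>12.1f}{cumulative:>12.1f}")
```

Counting each category and dividing by the sample size reproduces the frequency, percentage, and cumulative percentage columns of Table 3.5.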


Table 3.5: Frequency Table Summarizing Ethnicity and Sex Distribution

Gender    Frequency   Percentage   Valid percentage   Cumulative percentage
Female    10          50.0         50.0               50.0
Male      10          50.0         50.0               100.0
Total     20          100.0        100.0

Ethnicity          Frequency   Percentage   Valid percentage   Cumulative percentage
African American   4           20.0         20.0               20.0
Asian              3           15.0         15.0               35.0
Hispanic           4           20.0         20.0               55.0
White              9           45.0         45.0               100.0
Total              20          100.0        100.0

We can gain a lot of information from numerical summaries of data. In fact, numeric descriptors form the starting point for doing inferential statistics and testing our hypotheses. We will cover these statistics in later chapters, but for now it is important to understand that two numeric descriptors can provide a wealth of information about our data set: measures of central tendency and measures of dispersion.

Measures of Central Tendency

The first number we need to describe our data is a measure of central tendency, which represents the most typical case in our data set. There are three indices for representing central tendency:

The mean is the mathematical average of our data set, calculated using the following formula:

M = ΣX / N

The capital letter M is used to indicate the mean; the X refers to individual scores, and the capital letter N refers to the total number of data points in the sample. Finally, the Greek letter sigma (Σ) is a common symbol used to indicate the sum of a set of values.

So, in calculating the mean, we add up all the scores in our data set (ΣX), and then divide this total by the number of scores in the data set (N). Because we are adding and dividing our scores, the mean can only be calculated using interval or ratio data (see Chapter 2 for a review of the four scales of measurement). In our sample data set, we could calculate the mean for both life satisfaction and daily stress. To calculate the mean value for life satisfaction scores, we would first add the 20 individual scores (i.e., 40 + 47 + 29 + 32 + . . . + 38), and then divide this total by the number of people in the sample (i.e., 20).


M = ΣX / N = 742 / 20 = 37.1

In other words, the mean, or most typical, satisfaction rating in this sample is 37.1.
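The same computation can be expressed in a few lines of code. A sketch using the 20 life-satisfaction scores from Table 3.4:

```python
# Life satisfaction scores for the 20 participants in Table 3.4.
satisfaction = [40, 47, 29, 32, 25, 35, 28, 40, 37, 30,
                43, 40, 48, 30, 37, 40, 36, 45, 42, 38]

# M = ΣX / N: sum all the scores, then divide by the number of scores.
mean = sum(satisfaction) / len(satisfaction)
print(mean)  # 37.1
```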

The median is another measure of central tendency, representing the number in the middle of our dataset, with 50% of scores above it and 50% below it. The location of the median is found by placing the list of values in ascending numeric order, then using the following formula: Mdn = (N + 1)/2. For example, if you have 9 scores, the median will be the fifth one: (9 + 1)/2 = 10/2 = 5. If you have an even number of scores, say, 8, the median will fall between two scores: (8 + 1)/2 = 9/2 = 4.5, or the average of the fourth and fifth scores. This measure of central tendency can be used for ordinal, interval, or ratio data because it does not require mathematical manipulation to obtain. So, in our sample data set, we could calculate the median for either life satisfaction or daily stress scores. To find the median score for life satisfaction, we would first sort the data in order of increasing satisfaction scores (note that Table 3.4 is ordered by subject ID, so the satisfaction values must be sorted first). Next, we find the position of the median using the formula Mdn = (N + 1)/2. Because we have an N of 20 scores:

Mdn = (N + 1)/2 = (20 + 1)/2 = 21/2 = 10.5

In other words, the median will be the average of the 10th and 11th scores. In the sorted data, the 10th score is 37 and the 11th score is 38, for a median of 37.5. The median is another way to represent the most typical score on life satisfaction, so it is no accident that it is so similar to the mean (i.e., 37.1).
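The sort-then-locate procedure can be sketched in code, again using the life-satisfaction scores from Table 3.4:

```python
satisfaction = [40, 47, 29, 32, 25, 35, 28, 40, 37, 30,
                43, 40, 48, 30, 37, 40, 36, 45, 42, 38]

scores = sorted(satisfaction)        # Step 1: place values in ascending order
position = (len(scores) + 1) / 2     # Step 2: Mdn position = (N + 1) / 2
# A position of 10.5 means averaging the 10th and 11th scores
# (list indices 9 and 10, since Python counts from zero).
median = (scores[9] + scores[10]) / 2
print(position, median)  # 10.5 37.5
```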

The final measure of central tendency, the mode, represents the most frequent score in our data set, obtained either by visual inspection of the values or by consulting a frequency table like the one in Table 3.5 (discussed above). Because the mode represents a simple frequency count, it can be used with any of the four scales of measurement. In addition, it is the only measure of central tendency that is valid for use with nominal data, since the numbers assigned to these data are arbitrary.

So, in our sample data, we could calculate the mode for any of the variables in the table. To find the mode of life satisfaction scores, we would simply scan the table for the most common score, which turns out to be 40. Thus, we have one more way to represent the most typical score on life satisfaction. Note that the mode is slightly higher than our mean (37.1) or our median (37.5). We will return to this issue shortly and discuss the process of choosing the most representative measure. Since we’ve been ignoring the nominal variables so far, let’s also find the mode for ethnicity. This is accomplished by tallying up the number of people in each category—or, better yet, by letting a computer program do the tallying for you. As we saw earlier, the largest group of participants was white (45%), with the remainder divided almost equally among African American (20%), Asian (15%), and Hispanic (20%) ethnicities. So, the modal, or most typical, value of ethnicity in this sample was white.
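Letting a program do the tallying is straightforward. A sketch that finds the mode for both a numeric variable (life satisfaction) and a nominal one (ethnicity), with data taken from Table 3.4:

```python
from collections import Counter

satisfaction = [40, 47, 29, 32, 25, 35, 28, 40, 37, 30,
                43, 40, 48, 30, 37, 40, 36, 45, 42, 38]
ethnicity = ["White", "White", "Asian", "White", "Hispanic", "Hispanic", "White",
             "Hispanic", "Asian", "African American", "White", "Asian", "White",
             "African American", "White", "Hispanic", "White", "African American",
             "White", "African American"]

# Counter.most_common(1) returns the single most frequent value and its count.
mode_satisfaction = Counter(satisfaction).most_common(1)[0][0]
mode_ethnicity = Counter(ethnicity).most_common(1)[0][0]
print(mode_satisfaction, mode_ethnicity)  # 40 White
```

Because the mode is just a frequency count, exactly the same code works for the nominal ethnicity variable, where a mean or median would be meaningless.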

One important take-home point is that your scale of measurement largely dictates the choice between measures of central tendency—nominal scales can only use the mode, and the mean can only be used for interval or ratio scales. The other piece of the puzzle is to consider which measure best represents the data. Remember that the central tendency is a way to represent the “typical” case with a single number, so the goal is to settle on the most representative number. This process is illustrated by the examples in Table 3.6.

Table 3.6: Comparing the Mean, Median, and Mode

Data: 1, 2, 3, 4, 5, 11, 11 (Mean = 5.29; Median = 4; Mode = 11)

• Both the mean and the median seem to represent the data fairly well.
• The mean is a slightly better choice because it hints at the higher scores.
• The mode is not representative—two people seem to have higher scores than everyone else.

Data: 1, 1, 1, 5, 10, 10, 100 (Mean = 18.29; Median = 5; Mode = 1)

• The mean is inflated by the atypical score of 100 and therefore does not represent the data accurately.
• The mode is also not representative because it ignores the higher values.
• In this case, the median is the most representative value to describe this dataset.

Let’s look at one more example, using the “Daily Stress” variable from our sample data in Table 3.4. The Daily Stress values of our 20 participants were as follows: 1, 1, 3, 3, 4, 4, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, and 10.

• To calculate the mean of these values, we add up all of the values and divide by our sample size of 20:

M = ΣX / N = 134 / 20 = 6.70

• To calculate the median of these values, we use the formula Mdn = (N + 1)/2 to find the middle score: (20 + 1)/2 = 10.5. This tells us that our median is the average of our 10th and 11th scores, or 8.

• To obtain the mode of these values, we can inspect the data and determine that 8 is the most common number because it occurs five times.

In analyzing these three measures of central tendency, we see that they all appear to represent the data accurately. The mean is a slightly better choice than the other two because it represents the lower values as well as the higher ones.
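All three measures can be checked at once with Python's built-in statistics module, using the Daily Stress values listed above:

```python
import statistics

# Daily Stress values from Table 3.4, already in ascending order.
stress = [1, 1, 3, 3, 4, 4, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10]

print(statistics.mean(stress))    # 6.7
print(statistics.median(stress))  # 8.0  (average of the 10th and 11th scores)
print(statistics.mode(stress))    # 8    (occurs five times)
```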

Measures of Dispersion The second measure used to describe our data set is a measure of dispersion, or the spread of scores around the central tendency. Measures of dispersion tell us just how typical the typical score is. If the dispersion is low, then scores are clustered tightly around the cen- tral tendency; if dispersion is higher, then the scores stretch out farther from the central

new66480_03_c03_p089-132.indd 119 10/31/11 9:39 AM

CHAPTER 3, Section 3.5: Describing Your Data

tendency. Figure 3.2 presents a conceptual illustration of dispersion. The graph on the left has a low amount of dispersion because the scores (i.e., the blue curve) cluster tightly around the average value (i.e., the red dotted line). The graph on the right shows a high amount of dispersion because the scores (blue curve) spread out widely from the average value (red dotted line).

One of the most straightforward measures of dispersion is the range, which is the difference between the highest and lowest scores. In the case of our Daily Stress data, the range would be found by simply subtracting the lowest value (1) from the highest value (10) to get a range of 9. The range is useful for getting a general idea of the spread of scores, although it does not tell us much about how tightly these scores cluster around the mean.

The most common measures of dispersion are the variance and standard deviation, both of which represent the average difference between the mean and each individual score. The variance (abbreviated S²) is calculated by subtracting the mean from each score to get a deviation score, squaring and summing these individual deviation scores, and then dividing by the sample size. The more the scores are spread out around the mean, the higher the sum of our squared deviation scores will be, and therefore the higher our variance will be. The deviation scores are squared because otherwise their sum would always equal zero; that is, Σ(X − M) = 0. Finally, the standard deviation, abbreviated SD, is calculated by taking the square root of our variance. This four-step process is illustrated in Table 3.7, using a hypothetical data set of 10 participants.

Once you know the central tendency and the dispersion of your variables, you have a good sense of what the sample looks like. These numbers are also a valuable ingredient in calculating the inferential statistics that we ultimately use to test our hypotheses.

Figure 3.2: Two Distributions with a Low vs. High Amount of Dispersion
(left panel: a low amount of dispersion, with scores clustered tightly around the mean, shown as a red dotted line; right panel: a high amount of dispersion, with scores spread widely around the mean)


Table 3.7: Steps to Calculate the Variance and Standard Deviation

Values   Step 1: Subtract the mean from each value   Step 2: Square and sum the deviation scores
1        (1 − 5.4) = −4.4                            (−4.4)² = 19.36
2        (2 − 5.4) = −3.4                            (−3.4)² = 11.56
2        (2 − 5.4) = −3.4                            (−3.4)² = 11.56
4        (4 − 5.4) = −1.4                            (−1.4)² = 1.96
5        (5 − 5.4) = −0.4                            (−0.4)² = 0.16
7        (7 − 5.4) = 1.6                             (1.6)² = 2.56
7        (7 − 5.4) = 1.6                             (1.6)² = 2.56
8        (8 − 5.4) = 2.6                             (2.6)² = 6.76
9        (9 − 5.4) = 3.6                             (3.6)² = 12.96
9        (9 − 5.4) = 3.6                             (3.6)² = 12.96
Mean = 5.40        Σ = 0.00                          Σ = 82.40

Step 3: Calculate the variance: S² = Σ(X − M)² / N = 82.40 / 10 = 8.24

Step 4: Calculate the standard deviation: SD = √S² = √8.24 = 2.87
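The four-step process in Table 3.7 can be checked with a few lines of Python (a minimal sketch; the variable names are ours):

```python
import math

# Hypothetical data set of 10 participants from Table 3.7
scores = [1, 2, 2, 4, 5, 7, 7, 8, 9, 9]
m = sum(scores) / len(scores)             # mean = 5.40

# Steps 1 and 2: deviation scores, squared and summed
deviations = [x - m for x in scores]      # the raw deviations sum to 0.00
sum_sq = sum(d ** 2 for d in deviations)  # 82.40

# Step 3: variance = sum of squared deviations / N
variance = sum_sq / len(scores)           # 8.24

# Step 4: standard deviation = square root of the variance
sd = math.sqrt(variance)                  # about 2.87
```

The standard library's statistics.pvariance and statistics.pstdev compute steps 3 and 4 directly; the "p" prefix stands for "population," meaning they divide by N, just as the text does.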

Standard Scores

So far, we have been discussing ways to describe one particular sample in numeric terms. But what do we do when we want to compare results from different samples or from studies using different scales? Let's say you want to compare the anxiety levels of two people; unfortunately, in this example, the people were measured using different anxiety scales:

Joe scored 25 on the ABC Anxiety Scale, which has a mean of 15 and a standard deviation of 2.

Deb scored 40 on the XYZ Anxiety Scale, which has a mean of 30 and a standard deviation of 10.

At first glance, Deb’s anxiety score appears higher, but note that the scales have different properties: The ABC scale has an average score of 15, while the XYZ scale has an average score of 30. The dispersion of these scales is also different; scores on the ABC scale cluster more tightly around the mean (i.e., SD = 2 compared to SD = 10).

The solution for comparing these scores is to convert both of them to standard scores (or z scores), which represent the distance of each score from the sample mean, expressed in standard deviation units. The formula for a z score is:

Z = (X − M) / SD


This formula subtracts the mean from the individual score and then divides this difference by the standard deviation of the sample. In order to compare Joe's score with Deb's score, we simply plug in the appropriate numbers, using the mean and standard deviation from the scale that each one completed. This lets us put scores from very different distributions on the same scale. So, in this case:

Joe: Z = (X − M) / SD = (25 − 15) / 2 = 10 / 2 = 5

Deb: Z = (X − M) / SD = (40 − 30) / 10 = 10 / 10 = 1

The resulting scores represent each person's score in standard deviation terms: Joe is 5 standard deviations above the mean of the ABC scale, while Deb is only 1 standard deviation above the mean of the XYZ scale. Or, in plain English, Joe is actually considerably more anxious than Deb.
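In code, the conversion is a one-line formula. The sketch below (the function name z_score is ours) reproduces the comparison of Joe and Deb:

```python
def z_score(x, m, sd):
    """Distance of a raw score from the sample mean, in standard deviation units."""
    return (x - m) / sd

# Joe: ABC Anxiety Scale (mean 15, SD 2); Deb: XYZ Anxiety Scale (mean 30, SD 10)
joe = z_score(25, 15, 2)    # 5.0
deb = z_score(40, 30, 10)   # 1.0
```

Once both raw scores are expressed in SD units, they can be compared directly even though the underlying scales differ.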

In order to understand just how anxious Joe is, it is helpful to know a bit about why this technique works. If you have taken a statistics class, you will have encountered the concept of the normal distribution (or "bell curve"), a symmetric distribution with an equal number of scores on either side of the mean, as illustrated in Figure 3.3.

It turns out that lots of variables in the social and behavioral sciences fit this normal distribution, provided the sample sizes are large enough. The useful thing about a normal distribution is that it has a consistent set of properties, such as having the same value for mean, median, and mode. In addition, if the distribution is normal, each standard deviation cuts off a known percentage of the curve, as illustrated in Figure 3.3. That is, 68% of scores will fall within ±1 standard deviation of the mean; 95% of scores will fall within ±2 standard deviations; and 99.7% of scores will fall within ±3 standard deviations.

Figure 3.3: Standard Deviations and the Normal Distribution
(x-axis: score, from low to high, marked at −3 SD, −2 SD, −1 SD, the mean, +1 SD, +2 SD, and +3 SD; y-axis: frequency; brackets indicate that 68%, 95%, and 99.7% of scores fall within ±1, ±2, and ±3 standard deviations of the mean)


These percentages allow us to understand our individual data points in even more useful ways, because we can easily move back and forth between z scores, percentages, and standard deviations. Take our example of Joe and Deb's anxiety scores: Deb has a z score of 1, which means her anxiety is 1 standard deviation above the mean. And, as we can see by consulting the normal distribution, her anxiety level is higher than 84% of the population. Poor Joe has a z score of 5, which means his anxiety is 5 standard deviations above the mean. This also means that his anxiety is higher than 99.999% of the population. (See http://www.measuringusability.com/pcalcz.php for a handy online calculator that converts between z scores and percentages.)

This relationship between z scores and percentiles is also commonly used in discussions of intelligence test scores. Tests that purport to measure IQ are converted to a scale that has a mean of 100 and a standard deviation of 15. Because IQ is normally distributed, we are able to move easily back and forth between z scores and percentages. For example, someone who has an IQ test score of 130 is 2 standard deviations above the mean and falls in the upper 2.5% of the population. A person with an IQ test score of 70 is 2 standard deviations below the mean and thus falls in the bottom 2.5% of the population.
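The z-to-percentage conversion that the online calculator performs is just the cumulative normal distribution, which can be computed with the standard library's error function. The sketch below is ours (the function name percentile_below is an assumption, not from the text); note that the exact value for ±2 SD is about 2.3%, which the text rounds to the conventional 2.5%:

```python
import math

def percentile_below(z):
    """Percentage of a normal distribution falling below a given z score (normal CDF)."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(percentile_below(1), 1))   # 84.1 -> Deb, at z = 1
print(round(percentile_below(2), 1))   # 97.7 -> an IQ score of 130 (upper ~2.3%)
print(round(percentile_below(-2), 1))  # 2.3  -> an IQ score of 70
```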

Ultimately, the use of standard scores allows us to take data that have been collected on different scales—perhaps in different laboratories and different countries—and place them on the same metric for comparison. As we have discussed in several contexts, science is all about the accumulation of knowledge one study at a time. The best support for an idea comes when it is supported by data from different researchers, using different measures to capture the same concept. The ability to convert these different measures back to the same metric is an invaluable tool for researchers who want to compare research results.

Visual Descriptions

Displaying Frequencies

Displaying your data in visual form is often one of the most effective ways to communicate your findings—hence the cliché, a picture is worth a thousand words. But what sort of visual should you use? Our choice of graphs is guided by two criteria: the scale of measurement and the best fit for the results.

One common type of graph is the bar graph, which also summarizes the frequency of data by category. Figure 3.4a presents a bar graph, showing our four categories of ethnicity along the horizontal axis and the number of people falling into each category indicated by the height of the bars. So, for example, this sample contains 9 white participants and 4 Hispanic participants. You'll notice that these bar graphs contain exactly the same information as the frequency table in Table 3.5. When reporting your results in a paper, you would, of course, only use one of these methods; more often than not, graphical displays are the most effective way to communicate information.

Figure 3.4b shows another variation on the bar graph, the clustered bar graph, which summarizes the frequency by two categories at one time. In this case, our bar graph displays information about both gender and ethnicity. As in the previous graph, our categories of ethnicity are displayed along the horizontal axis. But this time, we have divided the total number of each ethnicity by the gender of respondents—indicated using different colored


bars. For example, you can see that our 9 white participants are divided into 5 males and 4 females. Similarly, our 4 African American participants are divided into 1 male and 3 females.

The important rule to keep in mind with bar graphs is that they are used for qualitative, or nominal, categories—that is, those that do not have a numerical value. We could just as easily have listed white participants second, third, or fourth along the axis because ethnicity is measured on a nominal scale.
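The frequencies behind a bar graph like Figure 3.4a amount to a simple tally by category, which collections.Counter handles directly. In the sketch below, the White and Hispanic counts come from the text; the Asian count of 3 is our assumption, chosen so the sample totals 20:

```python
from collections import Counter

# Hypothetical nominal data: ethnicity of the 20 participants.
# Counts of 9 White and 4 Hispanic match the text; 3 Asian is an assumption.
ethnicity = (["White"] * 9 + ["Asian"] * 3 +
             ["Hispanic"] * 4 + ["African American"] * 4)

freq = Counter(ethnicity)
print(freq["White"])     # 9
print(freq["Hispanic"])  # 4
```

Because the scale is nominal, the order of the categories in the tally (and on the graph's axis) is arbitrary.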

When we want to present quantitative data—that is, those values measured on an ordinal, interval, or ratio scale—we use a different kind of graph called a histogram. As seen

Figure 3.4: Bar Graph Displaying (a) Frequency by Ethnicity and (b) Clustered Bar Graph Displaying Frequency by Ethnicity and Gender
(both panels: x-axis lists White, Asian, Hispanic, and African American; y-axis: frequency; panel (b) shows separate male and female bars within each ethnicity)


in Figure 3.5a, histograms are drawn with the bars touching one another to indicate that the categories are quantitative and on a continuous scale. In this figure, we have broken down the "life-satisfaction" values into three categories (less than 31, 31–40, and 41–50) and displayed the frequencies for each category in numerical order. For example, you can see that 6 people had life satisfaction scores falling between 31 and 40.

Finally, all of our bar graphs and histograms so far have displayed data that have been split into categories. But, as seen in Figure 3.5b, histograms can also present data on a continuous scale. Figure 3.5b also has an additional new feature—a curved line overlaid on the graph. This curve is a representation of a normal distribution, and allows us to gauge visually how close our sample data are to being normally distributed.

Figure 3.5: Histograms Showing (a) Frequencies by Life Satisfaction (Quantitative) Categories and (b) Life Satisfaction Scores on a Continuous Scale
(panel (a): x-axis shows the categories less than 31, 31–40, and 41–50; y-axis: frequency; panel (b): x-axis shows individual life-satisfaction scores from 19 to 51; y-axis: frequency, with a normal curve overlaid)


Displaying Central Tendency

Another common use of graphs is to display numeric descriptors in an easy-to-understand visual format. That is, we can apply the same principles for displaying information about our sample frequencies to displaying the typical scores in the sample. If we refer back to our sample data in Table 3.4, we have information about ethnicity and gender but also about reports of daily stress and life satisfaction. Thus, a natural question to ask is whether there are gender or ethnic differences in these two variables. Figure 3.6 displays a clustered bar graph showing the mean level of life satisfaction in each group of participants. One thing that jumps out is that males appear to report more life satisfaction than females, as seen by the fact that the red bars are always higher than the gold bars. We can also see some variation in satisfaction levels by ethnicity: African-American males (45) seem to report slightly more satisfaction than white males (42).

These particular data are fictional, of course; but even if our graph were displaying real data, we would want to be cautious in our interpretations. One reason for caution is that this represents a descriptive study. We might be able to state which demographic groups report more life satisfaction, but we would be unable to determine the reasons for the difference. Another, more important, reason for caution is that visual presentations can be misleading, and we would need to conduct statistical analyses to discover the real patterns of differences.

The best way to appreciate this latter point is to see what happens when we tweak the graph a little bit. Our original graph in Figure 3.6 is a fair representation of the data: The scale starts at zero, and the y-axis on the left side increases by reasonable intervals. But, if we were trying to win an argument about gender differences in happiness, we could always alter the scale, as shown in Figure 3.7. These bars represent the same set of means, but we have compacted the y-axis to show only a small part of the range of the scale. That is, rather than ranging from 0 to 50, this misleading graph ranges from 28 to 45, in

Figure 3.6: Clustered Bar Graph Displaying Life Satisfaction Scores by Gender and Ethnicity
(x-axis: African-American, Asian, Hispanic, and White; y-axis: mean life satisfaction, ranging from 0 to 50; male and female bars shown for each ethnicity)


CHAPTER 3: Summary

increments of 1. To the uncritical eye, this appears to show an enormous gender difference in life satisfaction; to the trained eye, this shows an obvious attempt to make the findings seem more interesting. Anytime you encounter a bar graph that is used to support a particular argument, always pay close attention to the scale of the results: Does it represent the actual range of the data, or is it compacted to exaggerate the difference? Likewise, anytime you create a graph to display results, it is your responsibility as a researcher to ensure that the graph is an accurate representation of the data.

Summary

In this chapter, we have focused on descriptive designs, the first of three specific research designs we will be covering. As the name implies, the primary goal of descriptive designs is to describe attitudes and behavior, without any pretense of making causal claims. One common feature of all descriptive designs is that they are able to assess behaviors in their natural environment, or at least in something very close to it. We covered three types of descriptive research in this chapter: case studies, archival research, and observational research. Because each of these methods has the goal of describing attitudes, feelings, and behaviors, each one can be used from either a quantitative or a qualitative perspective.

In a case study, the researcher studies one person in great detail over a period of time. This approach is often used to study special populations and to gather detailed information about rare phenomena. On the one hand, case studies represent the lowest point on our continuum of control due to the lack of a comparison group and to the difficulty of generalizing from a single case. On the other hand, case studies are a valuable tool for beginning to study a phenomenon in depth. We discussed the example of Phineas Gage, who suffered severe brain damage and showed drastic changes in his personality and cognitive skills. Although it is difficult to generalize from the specifics of Gage's experience, this

Figure 3.7: Clustered Bar Graph Altered to Exaggerate the Differences
(same data as Figure 3.6, but with the y-axis compacted to range from 28 to 45 in increments of 1; male and female bars shown for each ethnicity)


case has helped to inspire more than a century’s worth of research into the connections among mind, brain, and behavior.

Archival research involves drawing new conclusions by analyzing existing sources of data. This approach is often used to track changes over time or to study things that would be impossible to measure in a laboratory setting. For example, we discussed Phillips's study of copycat suicides, which he conducted by matching newspaper coverage of suicides to subsequent spikes in fatality rates. There would be no practical or ethical way to study these connections other than examining the patterns as they occurred naturally. Archival studies are still relatively low on our continuum of control, primarily because the researcher does not have much control over how the data were collected. In many cases, analyzing archives involves a process known as content analysis, or developing a coding strategy to extract relevant information from a broader collection of content. Content analysis involves a three-step process: identifying the most relevant archives, sampling from these archives, and finally coding and recording behaviors. For example, Weigel and colleagues studied race relations on television by sampling a week's worth of prime-time programming and recording the screen time dedicated to portraying different races.

Observational research involves directly observing behavior and recording observations in a systematic way. This approach is well suited to a wide variety of research questions, provided your variables can be directly observed. That is, you can observe what people do but not why they do it. In exchange for giving up access to internal processes, you gain access to unfiltered behavioral responses—especially when you find ways to observe people unobtrusively. We discussed three main types of observational research. Structured observation involves creating a standardized situation, often in a laboratory setting, and tracking people's responses. Naturalistic observation involves observing behavior as it occurs naturally, often in its natural context. Participant observation involves having the researcher take part in the same activities as the participants in order to gain greater insight into their private behaviors. All three of these variations go through a similar three-step process as archival research: choose a hypothesis; choose a sampling strategy; and then code and record behaviors.

Finally, this chapter covered principles for describing data in both visual and numeric form. To move toward conducting statistical analyses, it is also useful to summarize your data in numeric form. We discussed two categories of numeric summaries, central tendency and dispersion. Measures of central tendency (i.e., mean, median, and mode) provide information about the "typical" score in your data set, while measures of dispersion (i.e., range, variance, and standard deviation) provide information about the distribution of scores around the central tendency—that is, they tell us how typical our typical score is. Finally, we covered the process of translating scores into standard scores (also known as z scores), which express individual scores in terms of standard deviations. This technique is useful for comparing results from different studies and different measures. We also discussed guidelines for visual presentation. If you remember one thing from this section, remember that the sole purpose of visual information is to communicate your findings to an audience. Thus, your descriptions should always be accurate, concise, and easy to understand. The most common visual displays for summarizing data are bar graphs (for nominal data) and histograms (for quantitative data). Regardless of the visual display you choose, it should represent your data accurately; it is especially important to make sure that the y-axis is an accurate representation of the range of your data.


Key Terms

archival research a descriptive design that involves drawing conclusions by analyzing existing sources of data, including both public and private records

bar graph a visual display that summarizes the frequency of our data by category; used to display nominal data

case study a descriptive design that provides a detailed, in-depth analysis of one person over a period of time

central tendency a numeric descriptor that represents the most typical case in a data set

clustered bar graph a visual display that summarizes the frequency by two categories at one time; used to display nominal data

content analysis the process of systematically extracting and analyzing the contents of a collection of information

deviation score the difference between an individual score and the sample mean, obtained by subtracting the mean from each score

dispersion a numeric descriptor that represents the spread of scores around the central tendency

ecological validity the extent to which the research setting resembles conditions in the real world

event sampling in observational research, a technique that involves observing and recording behaviors that occur throughout an entire event

frequency tables summary tables that present the number and percentage of the sample that fall into each of a set of categories

histogram a variation of a bar graph used to display ordinal, interval, or ratio data; histograms are drawn with the bars touching one another to indicate that the categories are quantitative

individual sampling in observational research, a technique that involves collecting data by observing one person at a time in order to test hypotheses about individual behaviors

mean a measure of central tendency that represents the mathematical average of a data set; calculated by adding all the scores together and then dividing by the number of scores

median a measure of central tendency that represents the number in the middle of a data set, with 50% of scores both above and below it

mode a measure of central tendency that represents the most frequent score in a data set, obtained either by visual inspection of the values or by consulting a frequency table

naturalistic observation a type of observational study that involves observing and systematically recording behavior in the real world; can be done with or without intervention by the researcher

normal distribution (or “bell curve”) a symmetric distribution with an equal number of scores on either side of the mean; has the same value for mean, median, and mode

observational research a descriptive design that involves directly observing behavior and recording these observations in an objective and systematic way


participant observation a type of observational study that involves having the researcher(s) conduct observations while engaging in the same activities as the participants; the goal is to interact with participants to gain access and insight into their behaviors

participant reactivity the tendency of people to behave differently when they are aware of being observed

range a measure of dispersion that represents the difference between the highest and lowest scores

samples small snippets of the environment that are relevant to the hypothesis

standard deviation another measure of dispersion that represents the average difference between the mean and each individual score; calculated by taking the square root of the variance

standard scores (or z scores) scores that represent the distance of each score from the sample mean, expressed in standard deviation units; calculated by subtracting the mean from a score, then dividing by the standard deviation

structured observation a type of observational study that involves creating a standard situation in a controlled setting and then observing participants' responses

time sampling in observational research, a technique that involves comparing behaviors during different time intervals

variance a measure of dispersion that represents the average difference between the mean and each individual score; calculated by subtracting the mean from each score to get a deviation score, squaring and summing these individual deviation scores, and dividing by the sample size

Apply Your Knowledge

1. Compare and contrast the sets of the following terms. Your answers should demonstrate that you understand each term.

a. individual sampling vs. event sampling
b. participant observation vs. naturalistic observation
c. mean vs. median vs. mode
d. variance vs. standard deviation
e. bar graph vs. histogram

2. Place each of the three research methods we have discussed in this chapter (listed below) on the continuum of control.

LOW ——————————————————————————————— HIGH

archival research case study naturalistic observation


3. For each of the following research methods, list 1 advantage and 1 disadvantage associated with this particular method.

a. archival research
   advantage:
   disadvantage:
b. case studies
   advantage:
   disadvantage:
c. observation studies
   advantage:
   disadvantage:

4. For each of the following data sets, compute the mean, median, mode, and standard deviation. Once you have all three measures of central tendency, decide which one is the best representation of the data.

a. 2, 2, 4, 5
b. 10, 13, 15, 100

5. Mike scores an 80 on a math test that has a mean of 100 and a standard deviation of 20. Convert Mike’s test score into a z score.

6. For each of the following relationships, state the best way to present it graphically (bar graph, clustered bar graph, or histogram).

a. average income by years of school completed (ratio scale)
b. average income based on category of school completed (high school, some college, college degree, master's degree, and doctoral degree)
c. average income based on gender and category of school completed

7. For each of the following questions, state how you would test them using an observational design.

a. Are people who own red cars more likely to drive like maniacs?
   (1) What would your hypothesis be?
   (2) Where would you get your sample, and how (i.e., which type)?
   (3) What categories of behavior would you record? How would you define them?
b. Are men more likely than women to "lose control" at a party?
   (1) What would your hypothesis be?
   (2) Where would you get your sample, and how (i.e., which type)?
   (3) What categories of behavior would you record? How would you define them?
c. How many fights break out in an average NHL (hockey) game?
   (1) What would your hypothesis be?
   (2) Where would you get your sample, and how (i.e., which type)?
   (3) What categories of behavior would you record? How would you define them?


Critical Thinking Questions

1. Explain the tradeoffs involved in taking a qualitative versus a quantitative approach to your research question. What are the pros and cons of each one?

2. What are the advantages and disadvantages of conducting participant observation?

