Contents:
- Unpacking a Research Study
- Summary: Assessing Research Validity
- Unpacking a Research Synthesis (or Literature Review)
- References and Resources
When researchers discuss whether findings and conclusions from research can be trusted, they are referring to validity. Researchers have proposed different frameworks for examining validity and use different terms to describe its types. The terms, however, are not as important as understanding what makes research conclusions valid and knowing what questions to ask about the research.
As Shadish, Cook and Campbell (2002) point out, validity means the approximate truth of an inference or conclusion. Thus, in the Primer, the term “research validity” means the validity of the researcher’s conclusions.
Unpacking a Research Study
Judging the validity of a research study requires some detective work. When a crime is committed, the prosecuting attorney makes arguments to support the conclusion that a person is guilty. The defense attorney presents arguments to support the conclusion that the person is not guilty. Each attorney dissects and analyzes the criminal case.
Policymakers and educators who are judging research and evaluation studies need to be like prosecuting attorneys. They need to take apart and analyze studies for possible errors – the “crimes” against research validity. The researchers are like the defense attorneys. They need to provide evidence they did not commit research crimes.
Unpacking a research study involves asking four questions:
- What is the research question?
- Does the research design match the research question?
- How was the study conducted?
- Are there rival explanations for the results?
Although education research studies and evaluation studies have different goals, procedures, and reporting formats, their conclusions should be assessed using the same criteria for validity.
STEP 1: WHAT IS THE RESEARCH QUESTION?
In the introduction to most research reports, the purpose of the study is presented as a research question or a research hypothesis. Sometimes the questions are not explicit. Regardless of how a question is phrased, it is important to determine whether the research question is descriptive or causal. For the research to be valid, it must be designed to answer the type of question asked.
Descriptive Research asks these types of questions:
- What is happening?
- How is something happening?
- Why is something happening?
The following examples illustrate how descriptive research questions might be stated in a report. Note that research questions are sometimes presented in the form of a statement:
- We hypothesized that teacher professional development has a positive association with student achievement.
- We were interested in what types of teacher professional development occur in high-performing schools.
- Do high-performing schools provide teachers with more professional development than low-performing schools?
- How do high-performing schools design professional development?
Causal Research (or Experimental Research) asks this type of question:
- Does something cause an effect?
The following examples illustrate how causal research questions might be stated in a report. Note that in many reports the word “cause” is not explicit. If the statement or question implies, however, that an effect (e.g., higher student achievement) will result from something that is varied (e.g., the effect of more versus less teacher professional development), then the research question is a causal question. Also note, again, that questions are sometimes given in the form of statements:
- We hypothesized that increasing the amount of professional development teachers received would increase student achievement.
- We were interested in whether teacher professional development in language arts increases student achievement more than teacher professional development in general teaching strategies.
- Does providing teachers with professional development in teaching reading cause their students to have higher achievement in reading?
As the two sets of examples of causal and descriptive research questions show, sometimes questions in descriptive research appear to seek a causal connection. Descriptive research, however, lacks the random assignment and manipulation of a treatment present in experimental research. In the absence of these two elements, the most that descriptive research can uncover is the correlation or association of factors; it cannot reveal an actual causal relationship.
Correlation only indicates that two or more factors occur in association with one another; it does not indicate whether one factor causes another. For example, the correlation of poverty with low student achievement does not mean that poverty causes low achievement. There are other factors possibly associated with poverty that might be causing low achievement such as the lack of a consistent caregiver.
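The point can be made concrete with a short simulation. The sketch below uses entirely hypothetical data: a confounding variable (here labeled "home support") drives both poverty and achievement, and the two end up strongly correlated even though neither causes the other.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)

# Hypothetical confounder: each family's level of home support.
support = [random.gauss(0, 1) for _ in range(1000)]

# Poverty and achievement are BOTH driven by the confounder;
# neither causes the other in this simulation.
poverty = [-0.8 * s + random.gauss(0, 0.5) for s in support]
achievement = [0.8 * s + random.gauss(0, 0.5) for s in support]

r = pearson_r(poverty, achievement)
print(f"correlation(poverty, achievement) = {r:.2f}")  # strongly negative
```

A descriptive study observing only the poverty and achievement columns would find a strong negative correlation, yet an intervention on poverty alone would not change achievement in this simulated world.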
STEP 2: DOES THE RESEARCH DESIGN MATCH THE RESEARCH QUESTION?
After determining the type of research question that the study addresses, the next step is to examine the research design. For research to be valid, the research design must match the research question. Descriptive research questions require descriptive research designs. Causal research questions require experimental research designs.
For more information about research design, see Appendix A, A RESEARCH TYPOLOGY.
STEP 3: HOW WAS THE STUDY CONDUCTED?
Step 3 concerns the research method, which refers to how the study was conducted and how the research design was implemented. A research report should provide enough details about the method so the study can be repeated. Without these details, it is difficult and sometimes impossible to judge the validity of the research.
Four key components of the research method influence research validity:
- Participants – Who were the participants in the study? How were they selected?
The research report should describe the number of participants in the study, as well as their characteristics. This includes not only the characteristics of persons, but also those of entities such as schools and districts. In addition, the report should describe how the study’s participants were chosen and how participants were assigned (if they were) to the different comparison groups in the study.
For a more thorough discussion on this component, see PARTICIPANT CONSIDERATIONS.
- Treatment – How is the treatment defined and described in the study? How was it implemented?
Most education research studies concern a particular education treatment or intervention, for example, a reading program, a type of teacher preparation or a mathematics curriculum. A good researcher will define the treatment carefully and implement it consistently.
For a more thorough discussion, see TREATMENT CONSIDERATIONS.
- Data Collection – What data were collected, and how were they collected?
Most education research studies attempt to connect a treatment to a result. This result is called the dependent variable and refers to what is being measured in a research study. Data make up the body of information produced by these measures. Student achievement and teacher classroom practices are examples of dependent variables in education research.
Data-collection procedures refer to how and when the data were collected. The procedures used to collect data can influence research validity.
Whatever data-collection instruments a study uses, it is critical that they have both validity and reliability.
For a more thorough discussion, see DATA-COLLECTION CONSIDERATIONS.
- Data Analysis – How were the data analyzed?
When determining whether or not a particular study did a good job of analyzing the data it produced, it is important to distinguish between quantitative data and qualitative data (see also Creswell, 2002).
Researchers analyze quantitative data through statistics. The computation of inferential statistics is the primary basis for research conclusions about a treatment effect.
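As a minimal illustration of how an inferential statistic supports a conclusion about a treatment effect, the sketch below computes a two-sample t statistic by hand. The scores are invented for illustration; a real analysis would use statistical software and report an exact p-value.

```python
from statistics import mean, stdev

# Hypothetical posttest scores (not from any real study).
treatment = [78, 85, 81, 90, 76, 88, 84, 79, 86, 83]
control   = [72, 80, 75, 78, 70, 82, 77, 74, 79, 73]

n1, n2 = len(treatment), len(control)
m1, m2 = mean(treatment), mean(control)
s1, s2 = stdev(treatment), stdev(control)

# Pooled standard deviation, then the classic two-sample t statistic.
sp = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
t = (m1 - m2) / (sp * (1 / n1 + 1 / n2) ** 0.5)

print(f"treatment mean = {m1:.1f}, control mean = {m2:.1f}, t = {t:.2f}")
# With n1 + n2 - 2 = 18 degrees of freedom, |t| > 2.10 is
# significant at p < .05 (two-tailed).
```

The t statistic asks whether the difference between group means is large relative to the variability within groups; a large value makes "chance variation" an implausible rival explanation.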
In qualitative research, the data consist of narrative descriptions and observations. Although statistics are not used, qualitative data analyses need to be systematic to support valid research conclusions. In most qualitative research studies, large amounts of descriptive information are organized into categories and themes through coding in order to facilitate interpretation of the findings.
For a more thorough discussion, see DATA-ANALYSIS CONSIDERATIONS.
For a deeper understanding of key statistical concepts, see the UNDERSTANDING STATISTICS TUTORIAL.
STEP 4: ARE THERE RIVAL EXPLANATIONS FOR THE RESULTS?
At the end of a research report, the researcher presents conclusions based on the results that were obtained through the study. To judge whether a conclusion can be trusted, always ask this question: Could there be an explanation for the results other than the conclusion reached? Researchers refer to these rival explanations as threats to validity because they threaten the validity of the research conclusion (Shadish et al., 2002). It is the job of the researcher to rule out rival explanations by demonstrating they do not apply to the study.
It is especially important to identify or rule out rival explanations when the researcher concludes that a treatment (e.g., an education program or intervention) has an effect – in other words, that something works. Research studies that examine the effects of a treatment usually collect quantitative data and employ a treatment group and a control group.
Several factors can account for rival explanations in quantitative studies of the effectiveness of an intervention:
- Selection bias concerns how the study participants are assigned to comparison groups in a study. Random assignment is the best way to ensure that student and teacher characteristics that might influence outcomes do not systematically favor the treatment or the control group. Random assignment of students and teachers sometimes is not feasible, however. To rule out a rival explanation due to selection bias, the researcher should describe the characteristics of both groups of teachers and their students (i.e., in the control group and the treatment group) and either show how the comparison groups are similar or conduct data analyses that statistically control for individual student characteristics (e.g., socioeconomic status) and teacher characteristics (e.g., teaching experience).
- Sample attrition (also called mortality) can be a rival explanation. If more participants (e.g., teachers and/or students) leave the treatment group than the control group (or vice versa), the results could be due to differences in characteristics between the groups at the posttest that did not exist at the pretest. To rule out a rival explanation based on sample attrition, the researcher should document who left the study and why. Sample attrition is a particular concern in longitudinal research studies, where the same participants are studied over a long time span. The participants who remain in the study could have different characteristics than those who left.
- Treatment diffusion or spillover, another rival explanation, can occur when participants in different comparison groups operate in the same environment, such as teachers in the same school. Teachers in the control group might overhear treatment teachers discussing the intervention, or control teachers might gain access to materials being used for the intervention. The researcher should ask participants in each group about their interactions and document their responses.
- History effects can be a problem in research studies that occur over a long span of time, such as a year or more. For example, there might be a change in school leadership during the study. To rule out rival explanations based on history effects, the researcher needs to monitor such occurrences and demonstrate that they do not influence the results of the treatment and control groups differently.
- Practice effects refer to a rival explanation that results from repeated measures of the same individuals. In any research study where participants are tested or measured more than once, there is a possibility that the participants' responses on the second and subsequent tests are affected by practice on the pretest. Practice effects are less likely to occur when there are longer time spans between the pretest and posttest. The researcher should determine whether participants practiced for the posttest and especially whether practice occurred more in the treatment group compared to the control group. (Test practice by students for state assessments has become commonplace.)
- Regression toward the mean is a rival explanation that can occur when participants have extremely low or extremely high scores on a pretest. Extreme scores tend to move toward the average or mean score when a test is repeated. This means that extreme scorers will score less extremely on posttests, even without a treatment. To rule out this rival explanation, the researcher should demonstrate, for example, that the students of the treatment and control teachers do not differ in the proportion of extreme scorers.
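Regression toward the mean can be demonstrated with a short simulation. In the hypothetical sketch below, no treatment is given at all, yet students selected for extremely low pretest scores show a noticeably higher mean on the posttest, simply because measurement noise does not repeat.

```python
import random

random.seed(42)

# Hypothetical students: a stable "true ability" plus independent
# measurement noise on each testing occasion.
N = 10_000
true_ability = [random.gauss(100, 10) for _ in range(N)]
pretest  = [a + random.gauss(0, 10) for a in true_ability]
posttest = [a + random.gauss(0, 10) for a in true_ability]  # no treatment at all

# Select the students with extremely low pretest scores.
low = [i for i, p in enumerate(pretest) if p < 85]

pre_mean  = sum(pretest[i]  for i in low) / len(low)
post_mean = sum(posttest[i] for i in low) / len(low)
print(f"low scorers: pretest mean = {pre_mean:.1f}, posttest mean = {post_mean:.1f}")
# The group's posttest mean moves back toward 100 even though
# nothing was done between the two tests.
```

A study that gave an intervention only to the lowest pretest scorers could easily mistake this purely statistical rebound for a treatment effect, which is why a comparison group of similarly extreme scorers matters.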
Several EXAMPLES illustrate the issue of rival explanations in studies of intervention effectiveness.
In qualitative research, it is also important to rule out rival explanations for the results. This occurs through procedures such as:
- Checking back with study participants to confirm that the researcher's interpretation of their responses (in an interview, for example) is correct.
- Using multiple sources of data. When data from several different sources, such as documents, interviews and observations, converge on the same conclusions, there can be greater confidence in the validity of those conclusions than if only one data source informs them.
- Searching for disconfirming evidence, in which the researcher examines all the data for any evidence that might indicate the conclusions are wrong.
- Generating specific rival explanations for the conclusions and demonstrating that they do not apply, based on the data and the methods used.
Summary: Assessing Research Validity
The following table lists the components of a research study and summarizes the important questions to ask regarding each component in order to be able to assess the study’s validity. For a more detailed set of considerations that can actually be employed as a guide in the assessment of a specific research study, see the ANALYZING RESEARCH FLOWCHART.
| Component | Questions to ask about research validity |
|---|---|
| Research Question and Design | Is the research question descriptive or causal? Does the research design match the research question? |
| Participants | What was the basis for selecting the participants? How were the participants assigned to groups? Do participant selection and assignment follow the research design? Are the results influenced by the characteristics of participants and contexts? |
| Treatment | Was the treatment implemented as planned? What is the operational definition of the treatment? |
| Data Collection | Are the data-collection instruments valid and reliable? Was there training for data collectors? |
| Data Analysis | What procedures were used to analyze the data? Were nonsignificant results (i.e., p > .05) discussed as if they were significant? Did variability in the scores influence the results? |
| Conclusions | Did any of the following occur that might have affected the results and were not ruled out? Conclusions about score gains from a treatment without a control group? Conclusions about score gains from a treatment without a pretest? Bias in assigning participants to different comparison groups? Loss of participants from the study sample? Spillover of the treatment into the control or comparison group? Influences from an event that occurred between a pretest and posttest? Effects from participant practice on the measuring instrument? Extreme scores that could become less extreme on the posttest regardless of treatment? |
Unpacking a Research Synthesis (or Literature Review)
A research synthesis reviews and integrates the findings from prior empirical research studies. The purpose of a research synthesis (or literature review) is to generate conclusions about a particular topic based on the body of prior research related to the topic.
Unpacking a research synthesis involves asking five questions:
- What is the research question?
- How comprehensive and systematic was the search for past research literature?
- What were the criteria for including and excluding research studies?
- How were the results of past research studies analyzed and summarized?
- What is the validity of the conclusions?
Step 1: What is the research question?
In a research synthesis, the researcher poses a question that the synthesis will address. For example: “What is the influence of tutoring on student achievement in reading?”
Operational definitions of the terms in the research question influence the scope of the prior research that will be examined. For example, tutoring could be defined as one-on-one instruction of a student by an adult, a peer or both. Students could be elementary, secondary or both. Finally, student achievement could be defined as test scores, grades or both. Broader definitions are likely to provide more information related to the research question than are narrower definitions. As a result, conclusions will be more trustworthy with broader definitions. For example, tutoring might have different influences on elementary students compared to secondary students. Failure to include studies on both types of students might lead to erroneous conclusions about the overall effect of tutoring on student achievement.
Step 2: Was there a comprehensive and systematic search for past research?
The methods used to search for past research studies are critical to a research synthesis. A comprehensive literature search requires an examination of all potential sources of research literature on a topic. A systematic literature search requires the consistent use of terms in searching for research studies in databases such as ERIC. For example, searching for both “tutoring” and “peer-tutoring” in one database and searching only for “tutoring” in another database would not be a systematic literature search and would overlook potentially informative studies.
Step 3: What were the criteria for including and excluding research studies?
Most reviewers employ criteria for selecting studies for the synthesis from among the studies produced by the literature search. These criteria and the rationale for their use need to be clearly specified. One common reason to include or exclude studies is their relevance to the research question. For example, for a research question that concerns student achievement in reading, studies that measure only mathematics achievement would be excluded. Another reason to exclude a study is the type of method used to conduct the study. Depending on the research question, some methods would not provide trustworthy results for inclusion. For example, a reviewer might decide to include studies on the effectiveness of tutoring only if they used a comparison group of students who did not receive tutoring. Another criterion concerns whether studies have been published in journals or books. Although published studies are more likely to have undergone peer review, journals tend not to publish studies that report negative or no effects of an intervention. Consequently, a reviewer who examines only published studies risks making erroneous conclusions about intervention effectiveness.
Inclusion and exclusion criteria should be established prior to the literature search and should be applied consistently to all the studies that the search produces. Otherwise, there could be reviewer bias in selecting studies that have particular results. In addition, the reviewer should describe the number and characteristics of excluded studies.
Step 4: How were the results of past research studies analyzed and summarized?
There are different methods for conducting research syntheses.
Narrative review is a qualitative method that involves summarizing the results of studies through narrative description. Sometimes narrative reviews report the number of positive and negative findings among the studies.
Meta-analysis is a quantitative method that involves summarizing the results based on their means and standard deviations. The result of a meta-analysis is an effect size, which indicates the overall impact of the intervention being studied.
Meta-analyses use standardized procedures, and synthesis results can be replicated. Narrative reviews are less systematic than meta-analyses and depend more on reviewer judgment, which makes their results difficult to replicate. Meta-analyses, however, tend to combine studies into categories (e.g., all tutoring studies of elementary students), so that differences in study details (e.g., the nature of the tutoring) are obscured. Additionally, meta-analysis is useful only with quantitative research studies.
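The effect-size logic of a meta-analysis can be sketched in a few lines. The example below computes Cohen's d (a standardized mean difference) for three invented tutoring studies and averages them, weighting by sample size. The summary statistics are hypothetical, and actual meta-analyses typically use inverse-variance weights and small-sample corrections such as Hedges' g.

```python
from math import sqrt

def cohens_d(m_treat, m_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled = sqrt(((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2)
                  / (n_treat + n_ctrl - 2))
    return (m_treat - m_ctrl) / pooled

# Hypothetical summary statistics from three tutoring studies
# (means, standard deviations, group sizes) -- illustrative only.
studies = [
    dict(m_treat=82, m_ctrl=78, sd_treat=10, sd_ctrl=10, n_treat=30, n_ctrl=30),
    dict(m_treat=75, m_ctrl=74, sd_treat=12, sd_ctrl=11, n_treat=50, n_ctrl=50),
    dict(m_treat=68, m_ctrl=61, sd_treat=14, sd_ctrl=15, n_treat=20, n_ctrl=20),
]

# Weight each study's effect size by its total sample size
# (a simplification of the inverse-variance weights used in practice).
effects = [(cohens_d(**s), s["n_treat"] + s["n_ctrl"]) for s in studies]
overall = sum(d * n for d, n in effects) / sum(n for _, n in effects)
print(f"per-study d = {[round(d, 2) for d, _ in effects]}, overall d = {overall:.2f}")
```

Expressing every study on the same standardized scale is what lets a meta-analysis combine results from studies that used different tests and different sample sizes.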
Step 5: What is the validity of the conclusions?
The validity of conclusions from a research synthesis depends on:
- A comprehensive and systematic literature search
- The consistent application of inclusion criteria that are backed by a rationale for their use
- A method of data analysis that is systematic and appropriate to the research question and the type of studies being synthesized
- Reviewer interpretation of the results
The interpretation of synthesis results depends on reviewer judgment. Reviewers should judge results based on the synthesis method and the nature of the studies reviewed. The conclusions should reflect any limitations to the synthesis. For example, the conclusions of a synthesis that examines only published qualitative studies of an intervention can be made only in reference to that body of studies. A synthesis of other types of studies might reach different conclusions. Similarly, reviewers should consider the research quality of the studies in the synthesis when drawing conclusions. If the individual research studies in the synthesis are not valid, then a conclusion based on a synthesis of these studies is unlikely to be valid.
References and Resources
Cooper, H. (1998). Synthesizing research: A guide for literature reviews (3rd ed.). Thousand Oaks, CA: Sage Publications.
Creswell, J.W. (2002). Research design: Qualitative, quantitative and mixed method approaches. Thousand Oaks, CA: Sage Publications.
Shadish, W.R., Cook, T.D., and Campbell, D.T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
Shanahan, T. (2000). "Research synthesis: Making sense of the accumulation of knowledge in reading." In M.L. Kamil, P.B. Mosenthal, P.D. Pearson, and R. Barr (Eds.), Handbook of reading research, volume III (pp. 209-226). Mahwah, NJ: Lawrence Erlbaum and Associates.