This web page is intended to provide an introduction to the process of educational research, for people whose
expertise is in scientific research and teaching. The examples given are specific to my own interest in promoting
ocean literacy through courses in introductory oceanography, but the principles apply to any educational research
in the geosciences.
Even if you think you're just giving your students another test like any other class test, if it's for the sake of research you need permission from your university's human subjects office. In this case, permission is easier to obtain than forgiveness. NCSU's human subjects research web page is here.
It's also important not to freak out your students by giving a test on the first day of class. In my case, I explained the sort of research I was doing, wrote "Don't Panic" in large comforting letters on the board, and promised that they would not be penalized for incorrect answers. I did make it worth a few percent of their grade to take the pre-class and post-class surveys, but also offered alternative assignments in case anyone really didn't want to do the survey. No one took me up on the alternatives, and my students put good effort into completing the surveys. I also collected students' names on the surveys, for the sake of comparing grades with literacy scores; never show the original surveys, with names on them, to anyone.
It might be tempting to define one's objectives in an introductory oceanography course in terms of a standard textbook. Most introductory texts are structured similarly, with 15-20 chapters covering the various topics of oceanography in a consistent order. Some professors march through the text, presenting one chapter per week, and might thus define their objectives in terms of topics covered. That is a teaching objective, not a learning objective. Another tempting mistake is to announce that students should understand a certain list of topics, thereby treating a teaching objective as a learning objective. Understanding is not a measurable quantity. Student behaviors, such as correctly answering a certain question or designing a bridge, are measurable.
Some Oceanographic Questions Related to Bloom's Taxonomy
This hierarchical approach to understanding presents an important problem for assessment. Multiple-choice questions are easily scored and provide quantitative data, but it is very difficult to write a multiple-choice question that addresses higher-order skills. How, for example, would one write a multiple-choice question that tests a student's ability to design a bridge? True assessments of these higher-order skills require more open-ended assessment tools, which tend to be perceived as less quantitative. This debate is described in more detail below, under the discussion of test validity and Julie Libarkin's work on the Geoscience Concept Inventory.
Finally, Dr Felder offers this very important tip: "Consider taking a gradual approach: formulating good objectives for a course may take some time, and there is no need to write them all in a single course offering." This advice also applies to developing educational surveys; sometimes you just have to start by asking open-ended questions and finding out what the students are thinking.
Numerous studies have been conducted to assess public awareness of and concern for environmental issues, but relatively few have been specific to the ocean (The Ocean Project, Belden et al 1999c). In the mid-1990s, Americans generally rated ocean health as poor and weakening, but did not perceive the oceans to be in immediate danger (Belden 1999a). Survey respondents blamed humans in general, and supported regulation in the abstract, but felt that individuals had no significant impact (Belden 1999c). Respondents placed great personal importance on the oceans, but considered air and water pollution to be more significant environmental issues, and crime and education more significant still (Belden 1999c). The Ocean Project survey also included five questions about the science of the ocean, of which the average American could answer only two correctly.
A decade later, people had an increased sense of urgency about ocean issues, with 80% believing that man-made stresses are endangering coastal regions and oceans (AAAS 2004). About 30% of survey respondents felt that their personal actions had a lot of influence on the health of oceans and coastal regions, and slightly more than half were willing to take certain actions, such as eating less of some kinds of fish (AAAS 2004).
I used some of AAAS' questions in a post-class survey. Of 110 students who answered the question, all but five felt that the ocean was in trouble. One annotated his survey with the concise comment "no, duh!". The inability of this question to distinguish between students renders it unreliable (as discussed below), so my survey does not currently include a question about the overall health of the ocean.
Confidence surveys can be used both to determine whether students believe they understand science well enough to make a difference in the world, and, more importantly, to help determine whether the students' confidence is actually merited. There is often a significant disconnect between people's "common sense" conceptual models of physical science and scientific fact. In one famous video, Harvard graduates confidently and incorrectly announced that the seasons were due to the distance between the earth and sun (Shapiro et al 1988). Similar open-ended questions were used by De Laughter et al (1998) to assess the preconceptions of students entering introductory earth science courses. Again, students' preconceptions were held quite confidently, in spite of logical contradictions (Halloun and Hestenes 1985a,b; De Laughter et al 1998).
The confidence with which students hold their general (un)scientific preconceptions contrasts with their initial lack of confidence on course-specific material. In surveys conducted at the end of a course, Nufer (2003) found a connection between students' confidence that they could answer course-specific questions and their actual class grades. Nufer did not specify the nature of the questions used to determine those grades, but De Laughter has noted that the lower the level of understanding required, the more confident students may be.
A more recent survey of public ocean literacy revealed that about half of Americans consider themselves "somewhat informed" about coastal and policy issues, about 15% consider themselves "informed" or "very well informed", and about a third consider themselves "not informed" (Steel et al 2005). That level of confidence seems unmerited; on a five-item quiz about ocean policy the average score was less than 50%, consistent with Belden et al (1999). Steel et al did not report the inter-correlations between confidence and quiz scores, or between the scores on different quiz items, so the reliability of this survey is unknown.
I am currently testing the hypothesis that student confidence is correlated with their ability to complete a higher-order task related to ocean stewardship. One learning objective for this semester is that students should write a letter to their congress-person about an issue affecting the ocean. (They will not be required to mail the letters, nor will their political positions affect their grades. The object is to lay out the issues coherently, demonstrating a clear understanding of the relevant science.) In a pre-test and future post-test, students are asked to rate their level of agreement with the following statements:
Student agreement with the statements is measured on a 5-point Likert scale (*), and pre- and post-test scores will be correlated with grades on the letter-writing assignment. I hope to see an increase in student agreement with the above statements, and a positive correlation between confidence and actual ability.
Determining the validity of confidence surveys is critical, because the higher the level of understanding on Bloom's taxonomy, the harder the question is to grade. For the sake of large-scale educational surveys, the questions must be easy to grade. Therefore, we need to find simple proxies for the big questions.
Nature of Science:
Student success in science has been found to be predictable from students' attitudes toward science (e.g. Adams et al, 2006). For example, students who take responsibility for their own learning tend to do better, as do students who perceive a given science as being related to the real world. The Colorado Learning Attitudes about Science Survey (CLASS) includes 42 questions about student attitudes toward learning physics. Of those, I chose nine where the word "physics" could plausibly be replaced by "oceanography". My honors students this semester have taken this attitude survey as a pre-test, but I have yet to use it for a larger sample.
The Views of the Nature Of Science Survey (VNOS) is well-validated and commonly used. As its questions are qualitative and difficult to score, I am using only one question from this survey: "What is science?".
The Views About Science Survey (VASS) (look near the bottom of that page) has questions similar to those of CLASS, but gives two answers for each question and a sliding scale to choose between them. The authors of CLASS made a big deal of the fact that their questions were categorized empirically, based on something like a cluster analysis of student responses. VASS may use pre-determined categories; I'm not sure.
The open-format questions are modeled on a published Earth Science Literacy Test that is used to measure students' preconceptions (DeLaughter et al, 1998). This qualitative data collection is a vital step toward the development of valid and reliable quantitative survey instruments, such as the Geoscience Concept Inventory (GCI) (Libarkin & Kurdziel, 2002a,b; Libarkin & Anderson, 2005).
My survey instruments have undergone several stages of validation and revision. All versions of the survey, as well as several related surveys, are available through my Ocean Literacy Webpage.
Beichner (1994) illustrates the concepts of validity and reliability using the image of a target, reproduced here for a seminar I gave at UNC. A reliable survey instrument measures student knowledge or attitudes in a consistent and reproducible manner, suggested by the tight clustering of arrow holes in the target on the left side of the figure. Validity is a measure of accuracy, suggested by the cloud of arrows surrounding the center of the target on the right side of the figure. The validity of a survey instrument depends on how the target is defined; clear learning objectives are critical.
There are a wide variety of approaches to reliability and validity, and different researchers list anywhere from three to five different types of validity (*). There is even some overlap between descriptions of validity and reliability. The most coherent reference I have found on the web is a superb tutorial posted by Colorado State University. What follows is my interpretation of that tutorial.
Types of Reliability
Equivalency Reliability: Are scores on different questions inter-correlated?
Stability Reliability: Does the same instrument, applied at a later time, give the same result?
Internal Consistency: Do different questions that measure the same thing give the same result?
Interrater Reliability: Do two independent raters assign the same score to the same answer?
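Internal consistency, for example, is commonly quantified with Cronbach's alpha. The sketch below computes it from scratch for a small set of hypothetical Likert responses; the data and the 0.7 rule of thumb are illustrative, not results from this study:

```python
from statistics import variance

# Hypothetical 1-5 Likert responses: five students (rows) by four survey
# items (columns) intended to measure the same underlying construct.
responses = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]

n_items = len(responses[0])

# Variance of each item (column) across students
item_vars = [variance(col) for col in zip(*responses)]

# Variance of each student's total score
total_var = variance([sum(row) for row in responses])

# Cronbach's alpha: a standard index of internal consistency.
# Values above roughly 0.7 are usually taken as acceptable.
alpha = (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```

The same response matrix also supports the other checks in the table above, such as pairwise inter-item correlations for equivalency reliability.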
Types of Validity
Face Validity: How does the survey look? Reasonable? Well designed?
Q: how does it look to my mom?
A: she likes it, and she's an educational researcher
Content Validity: Do my objectives represent the topic I wish to study?
Q: am I asking the right questions?
A: work with experts
Construct Validity: Does the theoretical concept match the measuring device?
Q: am I asking them the right way?
A: work with students
Criterion-Related Validity: Does my instrument give the same result as an existing valid instrument?
Q: do my results correlate with class grades?
A: yup, and with Ernie's grades, too
Validity is also divided into two larger categories, internal and external. Quoting from the glossary provided with CSU's tutorial:
"Internal Validity (1) The rigor with which the study was conducted (e.g., the study's design, the care taken to conduct measurements, and decisions concerning what was and wasn't measured) and (2) the extent to which the designers of a study have taken into account alternative explanations for any causal relationships they explore (Huitt, 1998). In studies that do not explore causal relationships, only the first of these definitions should be considered when assessing internal validity."
"External Validity The extent to which the results of a study are generalizable or transferable. "
Currently, the most reliable and thoroughly validated survey used in the earth sciences is the Geoscience Concept Inventory (GCI), developed by Julie Libarkin and colleagues. Dr Libarkin is a geophysicist by training, but has been conducting educational research for the better part of a decade (developing the GCI took seven years and the cooperation of dozens of geoscience faculty around the country). She has written several articles and a regular column for the Journal of Geoscience Education to help other geoscientists get into educational research. She advocates the approach I have followed: using qualitative data to establish the context for the study, analyzing the qualitative data using quantitative methods, and finally developing new surveys that produce quantitative data (Libarkin & Kurdziel, 2002a,b). This approach is vital for construct validation.
The open-ended questions in my preliminary survey provide qualitative data in the form of written answers. Quoting Libarkin & Kurdziel, 2002b, "Three types of analysis are common in qualitative research: thematic content analysis, where themes are extracted from the text, indexing, where specific words are viewed in context, and quantitative descriptive analysis, or word counting." I have used thematic content analysis extensively in my preliminary studies of students' attitudes and interests (Cudaback, 2006). For example, in response to a pre-class survey question about human impacts, 88% of students mentioned pollution, but only 8% mentioned coastal development, which is a very hot issue in North Carolina. We discussed development in mid-semester and pollution at the end of the semester. The pollution study was fresh in students' minds when they completed the post-class survey, and 89% mentioned pollution again. The number of students mentioning development increased to about 14%, suggesting that our discussion had some lasting impact. This small result demonstrates one use of thematic content analysis.
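A minimal, keyword-based version of such a count is easy to automate once themes have been extracted by hand. In the sketch below, the answers and the theme keywords are invented for illustration; they are not the actual survey data:

```python
from collections import Counter
import re

# Invented example answers to a question about human impacts on the ocean
# (NOT the actual survey responses).
answers = [
    "Pollution from ships and runoff hurts marine life.",
    "Overfishing and pollution are the biggest problems.",
    "Coastal development destroys habitats.",
    "I think pollution is the main issue.",
]

# Each theme is matched by a small set of keywords (also illustrative).
themes = {
    "pollution": {"pollution", "runoff"},
    "development": {"development", "coastal"},
    "overfishing": {"overfishing"},
}

# Count the fraction of answers mentioning each theme.
counts = Counter()
for text in answers:
    words = set(re.findall(r"[a-z]+", text.lower()))
    for theme, keywords in themes.items():
        if words & keywords:
            counts[theme] += 1

fractions = {theme: counts[theme] / len(answers) for theme in themes}
print(fractions)  # {'pollution': 0.75, 'development': 0.25, 'overfishing': 0.25}
```

Keyword matching is only a crude proxy for human coding; in practice the themes themselves, and ambiguous responses, still need a human reader, which is why interrater reliability matters for this kind of analysis.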
My first qualitative paper was published Oct 3, 2006, in EOS, Transactions of the American Geophysical Union. This study used thematic content analysis to extract quantitative data about students' background and interests from essay format questions. Data are currently being analyzed for a quantitative measure of how well the students understand the ocean. Watch for updates to this section.