Cynthia Cudaback, PhD
Ocean Consulting

Conducting Educational Research

Purpose of this Web Page

This web page is intended to provide an introduction to the process of educational research, for people whose expertise is in scientific research and teaching. The examples given are specific to my own interest in promoting ocean literacy through courses in introductory oceanography, but the principles apply to any educational research in the geosciences.

Step 1: Getting Permission

Even if you think you're just giving your students another test like any other class test, if it's for the sake of research you need permission from your university's human subjects office. In this case, permission is easier to obtain than forgiveness. NCSU's human subjects research web page is here.

It's also important not to freak out your students by giving a test on the first day of class. In my case, I explained the sort of research I was doing, wrote "Don't Panic" in large comforting letters on the board, and promised that they would not be penalized for incorrect answers. I did make the pre-class and post-class surveys worth a few percent of their grade, but also offered alternative assignments in case anyone really didn't want to do the survey. No one took me up on the alternatives, and my students put good effort into completing the surveys. I also collected students' names on the surveys, for the sake of comparing grades with literacy scores - don't show the original surveys with names on them to anyone.

Step 2: Defining Learning Objectives

How not to define objectives

It might be tempting to define one's objectives in an introductory oceanography course in terms of a standard textbook. Most introductory texts are structured similarly, with 15-20 chapters covering the various topics of oceanography in a consistent order. Some professors march through the text, presenting one chapter per week, and might thus define their objectives in terms of topics covered. That is a teaching objective, not a learning objective. Another tempting mistake is to announce that students should understand a certain list of topics; this sounds like a learning objective, but understanding is not a measurable quantity. Student behaviors, such as correctly answering a certain question or designing a bridge, are measurable.

Cognitive Objectives & Bloom's Taxonomy

In 1956, Benjamin Bloom found that students were generally tested on their recall of specific information, not on higher-order skills such as synthesizing and applying information. He designed the following taxonomy to help classify different sorts of cognitive processes. For each level, certain verbs like "list" and "explain" can be used to define objectives. My primary resource is Richard Felder, who is conducting some very interesting research here at NCSU. Other useful web pages are maintained by the University of Victoria and an online psychology course. In the most recent version of the taxonomy (*), the highest-order skill, "Creating", replaces "Synthesis", which was a lower-order skill than "Evaluation".
  1. Knowing (repeating from memory): list, identify, summarize, label, define
  2. Comprehending (demonstrating understanding of terms and concepts): explain, describe, interpret, select
  3. Applying (applying learned information to solve a new problem): apply, calculate, demonstrate, illustrate
  4. Analyzing (breaking things down into their elements, formulating explanations of observed phenomena): derive, explain, classify, test
  5. Evaluating (choosing among alternatives and justifying the choice): determine, optimize, select, justify, evaluate
  6. Creating (creating something, combining elements in novel ways): formulate, design, create, propose

Some Oceanographic Questions Related to Bloom's Taxonomy

This hierarchical approach to understanding presents an important problem for assessment. Multiple-choice questions are easily scored and provide quantitative data, but it is very difficult to write a multiple-choice question that addresses higher-order skills. How, for example, would one write a multiple-choice question that tests a student's ability to design a bridge? True assessments of these higher-order skills require more open-ended assessment tools, which tend to be perceived as less quantitative. This debate is described in more detail below, under the discussion of test validity and Julie Libarkin's work on the Geoscience Concept Inventory.

Finally, Dr Felder offers this very important tip: "Consider taking a gradual approach: formulating good objectives for a course may take some time, and there is no need to write them all in a single course offering." This advice also applies to developing educational surveys; sometimes you just have to start by asking open-ended questions and finding out what the students are thinking.

Objectives Related to Attitudes

Not all educational objectives will necessarily be related to students' understanding of particular scientific concepts. Ewell (1987) listed several types of student outcomes: content knowledge acquisition, skills development, changes in attitudes, and long-term behavioral outcomes. In the particular case of ocean literacy, I wish to change students' attitudes toward science and the ocean, and inspire behaviors that protect the ocean. Lacking the resources to conduct long-term studies, I focused on three categories of student attitudes.
  1. Stewardship: Are the students concerned about the well-being of the ocean, and are they willing to act on that concern?
  2. Confidence: Do the students feel empowered to help protect the ocean, and do they feel they have the knowledge they need?
  3. Nature of Science: Do the students understand the nature of science and its application to stewardship?

Numerous studies have been conducted to assess public awareness of and concern for environmental issues, of which relatively few have been specific to the ocean (The Ocean Project; Belden et al 1999c). In the mid-1990s, Americans generally rated ocean health as poor and weakening, but did not perceive the oceans to be in immediate danger (Belden 1999a). Survey respondents blamed humans in general, and supported regulation in the abstract, but felt that individuals had no significant impact (Belden 1999c). Respondents placed great personal importance on the oceans, but considered air and water pollution to be more significant environmental issues, and crime and education more significant still (Belden 1999c). The Ocean Project survey also included five questions about the science of the ocean, of which the average American could answer only two correctly.

A decade later, people had an increased sense of urgency about ocean issues, with 80% believing that man-made stresses are endangering coastal regions and oceans (AAAS 2004). About 30% of survey respondents felt that their personal actions had a lot of influence on the health of oceans and coastal regions, and slightly more than half were willing to take certain actions, such as eating less of some kinds of fish (AAAS 2004).

I used some of AAAS' questions in a post-class survey. Of 110 students who answered the question, all but five felt that the ocean was in trouble. One annotated his survey with the concise comment "no, duh!". The inability of this question to distinguish between students renders it unreliable (as discussed below), so my survey does not currently include a question about the overall health of the ocean.
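A question's ability to distinguish between students can be quantified with a standard item-discrimination index. The sketch below uses entirely invented scores (not data from my surveys) to show why a question that nearly everyone answers the same way carries no information about individual students:

```python
# Illustrative sketch with invented scores: a simple item-discrimination index.
def discrimination_index(correct, total_scores):
    """correct: 1/0 per student on one item; total_scores: overall scores.

    D = (fraction correct in top third) - (fraction correct in bottom third).
    """
    ranked = sorted(zip(total_scores, correct), reverse=True)
    k = len(ranked) // 3
    top = [c for _, c in ranked[:k]]
    bottom = [c for _, c in ranked[-k:]]
    return sum(top) / k - sum(bottom) / k

totals = [95, 88, 82, 75, 70, 64, 58, 51, 45, 30]   # overall course scores
easy_item = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]          # everyone answers "correctly"
good_item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]          # tracks overall ability

print(discrimination_index(easy_item, totals))  # 0.0: no discrimination
print(discrimination_index(good_item, totals))  # 1.0: perfect discrimination
```

An item like "is the ocean in trouble?", answered identically by almost all students, would score near zero on this index.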


Confidence surveys can be used both to determine whether students believe they understand science well enough to make a difference in the world, and more importantly, to help determine whether the students' confidence is actually merited. There is often a significant disconnect between people's "common sense" conceptual models of physical science and scientific fact. In one famous video, Harvard graduates confidently and incorrectly announced that the seasons were due to the distance between the earth and sun (Shapiro et al 1988). Similar open-ended questions were used by De Laughter et al (1998) to assess the preconceptions of students entering introductory earth science courses. Again, students' preconceptions were held quite confidently, in spite of logical contradictions (Halloun and Hestenes 1985a,b; De Laughter et al 1998).

The confidence with which students hold their general (un)scientific preconceptions contrasts with their initial lack of confidence on course-specific material. In surveys conducted at the end of a course, Nufer (2003) found a connection between students' confidence that they could answer course-specific questions and their actual class grades. Nufer did not specify the nature of the questions used to determine those grades, but De Laughter has noted that the lower the level of understanding required, the more confident students may be.

A more recent survey of public ocean literacy revealed that about half of Americans consider themselves "somewhat informed" about coastal and policy issues, about 15% consider themselves "informed" or "very well informed" and about a third consider themselves "not informed" (Steel et al, 2005). The level of confidence seems to be unmerited; in a five-item quiz about ocean policy the average score was less than 50%, consistent with Belden et al, (1999). Steel et al did not report the inter-correlations between confidence and quiz scores, or between the scores on different quiz items, so the reliability of this survey is unknown.

I am currently testing the hypothesis that student confidence is correlated with their ability to complete a high-order task related to ocean stewardship. One learning objective for this semester is that students should write a letter to their congress-person about an issue affecting the ocean. (They will not be required to mail the letters, nor will their political positions affect their grades. The object is to lay out the issues coherently, demonstrating a clear understanding of the relevant science.) In a pre-test and future post-test, students are asked to rate their level of agreement with the following statements:

Student agreement with the statements is measured on a 5-point Likert scale (*), and pre- and post-test scores will be correlated with grades on the letter-writing assignment. I hope to see an increase in student agreement with the above statements, and a positive correlation between confidence and actual ability.
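As a concrete sketch of that analysis (all numbers below are invented; the actual survey statements and grades are not reproduced here), the confidence-grade correlation could be computed like this:

```python
# Hypothetical sketch: correlating 5-point Likert confidence ratings with
# grades on the letter-writing assignment. All data are invented.
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# One Likert response (1-5) and one assignment grade (percent) per student
confidence   = [2, 4, 3, 5, 1, 4, 3, 5, 2, 4]
letter_grade = [60, 85, 70, 95, 55, 80, 75, 90, 65, 88]

print(f"confidence vs. grade: r = {pearson_r(confidence, letter_grade):.2f}")
```

A strong positive r would suggest that student confidence is merited; r near zero would echo the disconnect between confidence and ability described above.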

Determining the validity of confidence surveys is critical, because the higher the level of understanding on Bloom's taxonomy, the harder the question is to grade. For the sake of large-scale educational surveys, the questions must be easy to grade. Therefore, we need to find simple proxies for the big questions.

Nature of Science:

Student success in science has been found to be predictable from students' attitudes toward science (e.g. Adams et al, 2006). For example, students who take responsibility for their own learning tend to do better, as do students who perceive a given science as being related to the real world. The Colorado Learning Attitudes about Science Survey (CLASS) includes 42 questions about student attitudes toward learning physics. Of those, I chose nine where the word "physics" could plausibly be replaced by "oceanography". My honors students this semester have taken this attitude survey as a pre-test, but I have yet to use it for a larger sample.

The Views of the Nature of Science Survey (VNOS) is well-validated and commonly used. As the questions are qualitative and difficult to score, I am using only one question from this survey: "What is science?".

The Views about Science Survey (VASS) (look near the bottom of that page) has questions similar to those in CLASS, but gives two answers for each question and a sliding scale to choose between them. The authors of CLASS emphasized that their questions were categorized empirically, based on something like a cluster analysis of student responses. VASS may use pre-determined categories; I'm not sure.
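The idea behind empirical categorization can be sketched as grouping questions whose student responses are strongly inter-correlated. The toy example below uses invented Likert data and a crude correlation threshold; it is not the actual CLASS methodology:

```python
# Toy sketch (invented data, not the CLASS procedure): group survey questions
# whose student responses correlate strongly with an existing group member.
from math import sqrt
from statistics import mean

def corr(xs, ys):
    """Pearson correlation between two equal-length response sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

# Keys: questions; values: one Likert response (1-5) per student
q = {
    "Q1": [5, 4, 2, 5, 1],
    "Q2": [4, 5, 2, 4, 1],   # tracks Q1 closely
    "Q3": [2, 5, 3, 1, 4],   # unrelated to Q1 and Q2
}

# Single-linkage grouping: a question joins the first group containing any
# member whose responses correlate with it at |r| > 0.8
groups = []
for name, answers in q.items():
    for g in groups:
        if any(abs(corr(answers, q[m])) > 0.8 for m in g):
            g.append(name)
            break
    else:
        groups.append([name])

print(groups)  # [['Q1', 'Q2'], ['Q3']]
```

Real instruments use more sophisticated clustering and factor-analysis methods, but the underlying question is the same: which items do students actually answer as a block?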

Step 3: Preliminary Survey Instruments

If, like me, you are conducting scientific research and teaching, your initial forays into educational research may be tentative. In my case, I teach honors introductory oceanography to 12-20 students in fall, and a 100-125 student lecture section of introductory oceanography in spring. When I learned in October of 2005 that educational research was considered a viable option in my department, I had to get some sort of survey together for January of 2006. I wrote a few questions related to the essential principles of Ocean Literacy, and asked my students and colleagues to comment on them (face validation). My preliminary survey consisted of a dozen open-format questions, designed to allow students to express their (mis)understandings freely.

The open-format questions are modeled on a published Earth Science Literacy Test that is used to measure students' preconceptions (DeLaughter et al, 1998). This qualitative data collection is a vital step toward the development of valid and reliable quantitative survey instruments, such as the Geoscience Concept Inventory (GCI) (Libarkin & Kurdziel, 2002ab; Libarkin & Anderson, 2005).

My survey instruments have undergone several stages of validation and revision. All versions of the survey, as well as several related surveys, are available through my Ocean Literacy Webpage.

Step 4: Validity and Reliability


Beichner (1994) illustrates the concepts of validity and reliability using the image of a target, reproduced here for a seminar I gave at UNC. A reliable survey instrument measures student knowledge or attitudes in a consistent and reproducible manner, suggested by the tight clustering of arrow holes in the target on the left side of the figure. Validity is a measure of accuracy, suggested by the cloud of arrows surrounding the center of the target on the right side of the figure. The validity of a survey instrument depends on how the target is defined; clear learning objectives are critical.

There are a wide variety of approaches to reliability and validity, and different researchers list anywhere from three to five different types of validity (*). There is even some overlap between descriptions of validity and reliability. The most coherent reference I have found on the web is a superb tutorial posted by Colorado State University. What follows is my interpretation of that tutorial.

Types of Reliability
Equivalency Reliability: Are scores on different questions inter-correlated?
Stability Reliability: Does the same instrument, applied at a later time, give the same result?
Internal Consistency: Do different questions that measure the same thing give the same result?
Interrater Reliability: Do two independent raters assign the same score to the same answer?
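Internal consistency is commonly summarized by Cronbach's alpha, which compares the variance of individual questions to the variance of students' total scores. A minimal sketch with invented Likert scores (none of these numbers come from my surveys):

```python
# Illustrative sketch (invented data): Cronbach's alpha for internal consistency.
def cronbach_alpha(items):
    """items: list of per-question score lists, all in the same student order."""
    k = len(items)                      # number of questions
    n = len(items[0])                   # number of respondents
    def var(xs):                        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = sum(var(question) for question in items)
    totals = [sum(question[i] for question in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_vars / var(totals))

# Three questions intended to measure the same construct, six students each
q1 = [4, 5, 3, 4, 2, 5]
q2 = [4, 4, 3, 5, 2, 4]
q3 = [5, 4, 2, 4, 3, 5]
print(f"alpha = {cronbach_alpha([q1, q2, q3]):.2f}")  # alpha = 0.87
```

Values of alpha above roughly 0.7 or 0.8 are conventionally taken to indicate acceptable internal consistency.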

Types of Validity

Face Validity

How does the survey look? Reasonable? Well Designed?

Q: how does it look to my mom?
A: she likes it, and she's an educational researcher

Content Validity

Do my objectives represent the topic I wish to study?

Q: am I asking the right questions?
A: work with experts

Construct Validity

Does the theoretical concept match the measuring device?

Q: am I asking them the right way?
A: work with students

Criterion Related Validity

Does my instrument give the same result as an existing valid instrument?

Q: do my results correlate with class grades?
A: yup, and with Ernie's grades, too

Validity is also divided into two larger categories, internal and external. Quoting from the glossary provided with CSU's tutorial:

"Internal Validity (1) The rigor with which the study was conducted (e.g., the study's design, the care taken to conduct measurements, and decisions concerning what was and wasn't measured) and (2) the extent to which the designers of a study have taken into account alternative explanations for any causal relationships they explore (Huitt, 1998). In studies that do not explore causal relationships, only the first of these definitions should be considered when assessing internal validity."

"External Validity The extent to which the results of a study are generalizable or transferable. "

Step 5: Combining Qualitative and Quantitative Data

Currently, the most reliable and thoroughly validated survey used in the earth sciences is the Geoscience Concept Inventory (GCI), developed by Julie Libarkin and colleagues. Dr Libarkin is a geophysicist by training, but has been conducting educational research for the better part of a decade (developing the GCI took seven years and the cooperation of dozens of geoscience faculty around the country). She has written several articles and a regular column for the Journal of Geoscience Education to help other geoscientists get into educational research. She advocates the approach I have followed: using qualitative data to establish the context for the study, analyzing the qualitative data using quantitative methods, and finally developing new surveys that produce quantitative data (Libarkin & Kurdziel, 2002ab). This approach is vital for Construct Validation.

The open-ended questions in my preliminary survey provide qualitative data in the form of written answers. Quoting Libarkin & Kurdziel, 2002b, "Three types of analysis are common in qualitative research: thematic content analysis, where themes are extracted from the text, indexing, where specific words are viewed in context, and quantitative descriptive analysis, or word counting." I have used thematic content analysis extensively in my preliminary studies of students' attitudes and interests (Cudaback, 2006). For example, in response to a pre-class survey question about human impacts, 88% of students mentioned pollution, but only 8% mentioned coastal development, which is a very hot issue in North Carolina. We discussed development in mid-semester and pollution at the end of the semester. The pollution study was fresh in students' minds when they completed the post-class survey, and 89% mentioned pollution again. The number of students mentioning development increased to about 14%, suggesting that our discussion had some lasting impact. This small result demonstrates one use of thematic content analysis.
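The word-counting flavor of this analysis is easy to automate once themes have been identified. A minimal sketch, with invented responses and a toy two-theme codebook (a real codebook, built from reading the actual answers, would be much richer):

```python
# Minimal sketch of quantitative descriptive analysis (theme counting) on
# free-text survey answers. Responses and keywords are invented examples.
responses = [
    "Pollution from plastic and oil spills hurts marine life.",
    "Overfishing and pollution are the biggest problems.",
    "Coastal development destroys wetlands and habitat.",
    "Runoff pollution from farms reaches the ocean.",
]

# Keywords taken to indicate each theme
themes = {
    "pollution": ["pollution", "oil", "runoff"],
    "development": ["development", "construction"],
}

counts = {}
for theme, keywords in themes.items():
    # A response counts once per theme, however many keywords it contains
    counts[theme] = sum(
        any(kw in r.lower() for kw in keywords) for r in responses
    )

for theme, n in counts.items():
    print(f"{theme}: {n}/{len(responses)} responses ({100 * n / len(responses):.0f}%)")
```

Comparing such percentages between pre- and post-class surveys gives exactly the kind of before/after comparison described above for pollution and coastal development.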

Putting it all together: Ocean Literacy Surveys

Step 7: Scoring and Analyzing Survey Responses

My first qualitative paper was published Oct 3, 2006, in EOS, Transactions of the American Geophysical Union. This study used thematic content analysis to extract quantitative data about students' background and interests from essay format questions. Data are currently being analyzed for a quantitative measure of how well the students understand the ocean. Watch for updates to this section.

Step 8: Publishing Results

Journals for Publishing Research about Undergraduate Geoscience Education