Introduction. In 1995-96 a General Education Science Assessment Working Group (Rudy Koch, Biology; George Huppert, Geography/Earth Science; T.A.K. Pillai, Physics; Bill Cerbin, Assessment Coordinator) assessed learning outcomes in "Science: Understanding the Natural World." This report summarizes the assessment procedures and results.
The science learning outcomes in general education. The five learning outcomes for the science category are based upon the stated objectives of the general education science courses. As a result of coursework in general education science courses, students should be able to:
The science test. The test consisted of three (3) essay questions to assess outcomes b-e. We decided not to assess learning outcome "a" (demonstrate basic knowledge in at least one scientific discipline: biology, chemistry, physics, astronomy, or earth science) since to do so would entail developing a separate content test for each disciplinary area. The three questions are:
1. Design an experiment to examine the relationship between the number of hours of sunlight exposure per day and growth rate in spider plants. Be as specific as you can about your plans for the experiment. (outcomes b and c)
2. In Newsweek (November 6, 1995) an article entitled "Sex, Lies and Garlic" described the alleged health benefits of a variety of herbs and dietary supplements. The following passage is about garlic.
". . . garlic has been touted as an antidote to just about everything bad--hemorrhoids, heart disease, tuberculosis, even vampires--and some claims are well supported. At least 28 studies have found garlic effective for reducing cholesterol; in one German experiment, subjects taking an 800-mg tablet daily saw their blood levels fall an average of 12 percent over four months. Other studies suggest that garlic can help lower blood pressure, prevent blood clots, combat infections and ward off some malignancies. In a 1989 study, National Cancer Institute researchers found that people in China who consumed at least 50 pounds of garlic, onions, shallots and chives each year suffered 40 percent less stomach cancer than those who ate less."
What additional information would you like to have before you accept the claim that garlic is a potent substance that can improve your health? Explain why the information is important for deciding whether garlic improves health. (outcomes b, c, and e)
3. Throughout history human beings have invented all manner of devices, machines, processes and procedures that affect the way people live and the overall quality of life. For example, when the automobile was first invented it was heralded as a solution to the pollution problems of the day--dusty streets filled with animal waste and filth. Obviously, the widespread use of automobiles has led to significant environmental and social changes--good as well as bad.
Contemporary society is experiencing the widespread use of another technological invention--the computer. Identify and explain three ways that computers have contributed to significant environmental and social changes. Please be specific and cite examples to support your answer. (outcome d)
Assessment design and administration. We tested two groups of students: a Science Group consisting of students who had already completed at least one general education science course, and a No Science Group consisting of freshmen who had taken no college-level science courses.
The Science Group consisted of 99 students from four sections of Math 205. We administered the test to 114 students in these sections at the end of spring semester, 1996. Of these, 15 had no science courses at UWL and were omitted from the analysis, leaving 99 students--58 females, 39 males (2 did not indicate gender). Thirty-eight students had one UWL general education science course, 16 had two UWL science courses, and 45 had at least one UWL general education science course and at least one college science course at another institution. The mean age of the Science Group was 20.5 years, and 70% of the students were 20 years or younger. There were 42 freshmen, 33 sophomores, 13 juniors, and 8 seniors (3 did not indicate class). The distribution of GPAs was <2.5 (21), 2.5-3.0 (36), 3.01-3.5 (26), >3.5 (10), and 6 did not indicate GPA. There were 31 academic majors represented in the group, and 14 students were undeclared majors.
The No Science Group consisted of 100 freshmen in Biology 101. We administered the test to 250 students in two sections of Biology 101 in the second week of fall semester 1996. From this pool we selected a stratified random sample of 100 students who had no previous college science courses. There were 60 females and 40 males, and the mean age was 18.
Results. The "Experiment Design" question was developed by faculty at the State University of New York at Fredonia. We used their evaluation rubric, which identifies five dimensions related to the design of the experiment:
Students could receive 0, 1, or 2 points for each dimension; the difference between a score of 1 and 2 points for each dimension is the degree of description and explanation. For example, students earned one point if they indicated awareness that one needs to systematically vary the sunlight exposure and two points if they described appropriate exposure intervals (e.g., five groups of plants--one group exposed for 2 hours, one group for 4 hours, and so on).
A maximum score of 10 points indicates the student: 1) explained how to systematically vary exposure to sunlight, 2) explained how to control other relevant variables such as temperature, moisture, nutrients, etc., 3) specified an adequate sample size, 4) indicated ways to measure plant growth, and 5) specified an appropriate duration for the experiment. There is no cutoff score that distinguishes between adequate and inadequate answers. Instead, answers differ in terms of their overall quality. For example, a score of 5 indicates that the student may have been on the right track, but omitted information or gave only general information for one or more dimensions.
The mean total scores for the question were 2.99 (SD=2.28) and 4.07 (SD=2.32) for the No Science and Science Groups, respectively. A t-test indicates the difference between the means is significant (p<.001). The Science Group performed better than freshmen who have had no college science courses. The difference between the groups is significant statistically but small--about one point between the groups. Table 1 reports the percentage of students at each score for the five dimensions.
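As a rough check on the size of this difference, the reported means and standard deviations can be converted into a t statistic and a standardized effect size. The sketch below is not the working group's calculation. It assumes Welch's unequal-variance form of the t statistic, and it assumes group sizes of 100 (No Science) and 99 (Science), the sizes given in the group descriptions; the report does not state how many students answered this question. The function name is hypothetical.

```python
import math


def welch_t_and_cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Return (Welch t statistic, Cohen's d) computed from summary statistics."""
    # Standard error of the difference between means, without assuming equal variances.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean2 - mean1) / se
    # Pooled standard deviation, used only for the effect size.
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (mean2 - mean1) / pooled_sd
    return t, d


# Means and SDs are the reported values; n=100 and n=99 are assumptions.
t_stat, effect = welch_t_and_cohens_d(2.99, 2.28, 100, 4.07, 2.32, 99)
print(f"t = {t_stat:.2f}, Cohen's d = {effect:.2f}")
```

Under these assumptions the difference of about one point is a little under half a pooled standard deviation, which fits the description of a statistically significant but small difference.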
| Category | Group | 0 points (%) | 1 point (%) | 2 points (%) |
|---|---|---|---|---|
| Systematic variation of sunlight exposure | No Science | 19.00 | 42.00 | 39.00 |
| Systematic variation of sunlight exposure | Science | 4.65 | 60.46 | 34.88 |
| Control of variables | No Science | 60.00 | 20.00 | 20.00 |
| Control of variables | Science | 41.86 | 17.44 | 40.70 |
| Sample size | No Science | 73.00 | 16.00 | 11.00 |
| Sample size | Science | 56.98 | 33.72 | 9.30 |
| Measurement | No Science | 82.00 | 14.00 | 4.00 |
| Measurement | Science | 61.63 | 27.91 | 10.47 |
| Duration of experiment | No Science | 67.00 | 5.00 | 28.00 |
| Duration of experiment | Science | 37.21 | 10.47 | 52.33 |
More than half the Science Group students received at least one point for variation of exposure, control of variables, and duration of experiment--indicating awareness that these dimensions are important to the design of the experiment. About 95% of them indicated that it is necessary to vary plants' exposure to sunlight--although only 35% described how to do this adequately. About 58% indicated that it is necessary to control other variables that could affect plant growth, but only 41% cited more than two relevant variables. Sixty-three percent indicated an appropriate duration for the experiment. The pattern of scores for the No Science Group was similar, but scores tended to be lower than those of the Science Group (e.g., 40% of the No Science Group at least recognized the control of variables in contrast to 58% of the Science Group).
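The "at least one point" figures quoted in this paragraph are the sums of the 1-point and 2-point columns of Table 1. A minimal sketch of that arithmetic for the Science Group follows; the dictionary and function names are hypothetical, and the values are copied from Table 1.

```python
# Science Group percentages from Table 1: (0 points, 1 point, 2 points).
science_table1 = {
    "variation of exposure": (4.65, 60.46, 34.88),
    "control of variables": (41.86, 17.44, 40.70),
    "sample size": (56.98, 33.72, 9.30),
    "measurement": (61.63, 27.91, 10.47),
    "duration of experiment": (37.21, 10.47, 52.33),
}


def at_least_one_point(row):
    """Percentage of students scoring 1 or 2 points on a dimension."""
    _zero, one, two = row
    return one + two


for dimension, row in science_table1.items():
    print(f"{dimension}: {at_least_one_point(row):.1f}% with at least one point")
```

Only the first, second, and fifth dimensions exceed 50%, which matches the three dimensions named at the start of the paragraph.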
Students in both groups had difficulty with sample size and measurement of plant growth. Note that 57% of the Science Group and 73% of the No Science Group received a score of 0 for the sample size dimension; many students said the experiment could be done with only one plant in each sunlight exposure group (e.g., expose one plant to four hours of sun, one plant to six hours of sun, etc.). This is a particularly surprising result for the Science Group since the students were enrolled in Math 205: Statistics, a course that deals with statistical tests. It is likely that the examples of experiments students had encountered in statistics always involved comparisons among groups with sample sizes greater than one!
In general it appears that most Science Group students had partial understanding of what constitutes a scientific experiment; they at least recognized that systematic control of variables is an essential part of scientific experimentation. But their experimental designs were flawed in fundamental ways. The No Science Group performed at a somewhat lower level and many students omitted critical information.
In Question 2 students analyzed a brief passage from Newsweek which makes a number of claims about the health benefits of eating garlic. Students were asked to indicate what additional information they would need before accepting the claim that garlic is beneficial and also to explain why the information is important to their decision. Table 2 reports the distribution of answers. The categories are:
| Response category | % of students (No Science) | % of students (Science) |
|---|---|---|
| Sample and research method | 37.00 | 44.44 |
| Other variables | 30.00 | 44.44 |
| Counter-evidence | 36.00 | 24.24 |
| Researcher credibility | 5.00 | 16.16 |
| Statistical significance | 0 | 2.02 |
| Generic request | 10.00 | 4.04 |
| No response | 0 | 3.03 |
| Ambiguous response | 18.00 | 3.03 |
| Accepts claims | 0 | 2.02 |
Forty-four percent of the Science Group and 37% of the No Science Group said it was important to know what other variables could account for the results (e.g., perhaps people who eat a lot of garlic also have a low fat diet that may reduce cancer risk). Another common answer was that the findings may not apply to Americans because the studies were done in other cultures. In addition, 24% of the Science Group and 36% of the No Science Group wanted to know if garlic has negative effects on health.
Students also had to explain why the additional information was important for their decision. We sorted responses into four categories:
Table 3 reports the types of explanations. Although the large majority of students specified at least one relevant piece of additional information important for judging the credibility of the claims about garlic, only 47% (Science Group) and 36% (No Science Group) gave a reasonable explanation about the significance of the information. About 25% in both groups gave an inadequate explanation and 25% (Science Group) and 38% (No Science Group) gave no explanation.
| Type of explanation | % of students (No Science) | % of students (Science) |
|---|---|---|
| Reasonable | 36.00 | 47.25 |
| Inadequate | 26.00 | 25.27 |
| Personal | 0 | 2.20 |
| No explanation | 38.00 | 25.27 |
The Computer Question was intended to measure students' understanding of how science and technology influence environmental and social change. Table 4 reports the distribution of scores for the two groups.
| Group | 0 examples | 1 example | 2 examples | 3 examples |
|---|---|---|---|---|
| % of students (No Science) | 3 | 19 | 42 | 36 |
| % of students (Science) | 1 | 16 | 33 | 50 |
Fifty percent of the Science Group and 36% of the No Science Group gave three examples of how computers have influenced social changes. Students produced a range of ideas and Table 5 contains the percentages of students who gave a response in each of the 15 categories. The four most common ideas were that computers:
| Type of change | % of students (No Science) | % of students (Science) |
|---|---|---|
| Communication | 45.00 | 49.49 |
| Work | 28.00 | 37.37 |
| Access to information | 16.00 | 33.33 |
| Energy use/conservation | 22.00 | 27.27 |
| Management of information | 7.00 | 18.18 |
| Research tool | 10.00 | 14.14 |
| Interpersonal interaction | 7.00 | 11.11 |
| Educational tool | 7.00 | 10.10 |
| Personal convenience | 17.00 | 14.14 |
| Social or physical problems | 17.00 | 6.06 |
| Recreation/entertainment | 0 | 7.07 |
| New basic skill | 7.00 | 6.06 |
| Loss of privacy | 0 | 2.02 |
| Sedentary life | 7.00 | 2.02 |
| Have/have nots | 0 | 1.01 |
We did not evaluate the quality of the responses since they tended to be brief with very little detail. For instance, many students stated that computers make communication easier, but did not explain this in a larger context of social change (e.g., how this might change the nature and quality of social interaction). Similarly, students tended to say that computers increased access to information without describing potential social consequences.
Composite scores. We devised a composite score for each student based upon the quality of responses to each of the three questions. We used three categories--weak, marginal and adequate performance, and assigned numerical values of 0, 1, and 2 respectively. The composite score is the sum of the three individual scores. Table 6 reports the characteristics of each performance level.
| Question | Weak (0 points) | Marginal (1 point) | Adequate (2 points) |
|---|---|---|---|
| Experiment Design | score on item 0-4 | score on item 5-6 | score on item 7-10 |
| Analysis of science information (garlic question) | Did not explain why information was important. | Gave one additional source of information and a reasonable explanation of its importance. | Gave 2 additional sources of information and a reasonable explanation of its importance |
| Social effects of science and technology (computer question) | Cited one example of how computers influenced social change. | Cited two examples | Cited three examples |
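The scoring scheme in Table 6 can be sketched as a small set of functions. This is an illustration rather than the working group's actual procedure: the function names are hypothetical, and two edge cases that Table 6 does not spell out are filled by assumption (a garlic answer with no additional information counts as weak, and a computer answer with zero examples counts as weak).

```python
def experiment_level(rubric_total):
    """Map the 0-10 Experiment Design total to 0 (weak), 1 (marginal), or 2 (adequate)."""
    if not 0 <= rubric_total <= 10:
        raise ValueError("rubric total must be between 0 and 10")
    if rubric_total <= 4:
        return 0
    if rubric_total <= 6:
        return 1
    return 2


def garlic_level(num_sources, reasonable_explanation):
    """Score the garlic question from the number of information sources and the explanation."""
    # Assumption: no reasonable explanation, or no additional information, is weak.
    if not reasonable_explanation or num_sources < 1:
        return 0
    return 1 if num_sources == 1 else 2


def computer_level(num_examples):
    """Score the computer question from the number of examples cited."""
    # Assumption: zero examples is treated as weak, like one example.
    if num_examples <= 1:
        return 0
    return 1 if num_examples == 2 else 2


def composite_score(rubric_total, num_sources, reasonable_explanation, num_examples):
    """Sum the three question-level scores into a composite from 0 to 6."""
    return (
        experiment_level(rubric_total)
        + garlic_level(num_sources, reasonable_explanation)
        + computer_level(num_examples)
    )


# A student with a rubric total of 5, one explained source, and three examples.
print(composite_score(5, 1, True, 3))
```

Because each question contributes at most 2 points, a single weak answer caps the composite at 4.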
The composite score is a general indicator of how well students performed on the science test. As Table 7 illustrates, the majority of students in both groups had a composite score of 2 or 3, which indicates marginal overall performance.
| Composite score | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| % of students (No Science) | 9.00 | 29.00 | 32.00 | 24.00 | 3.00 | 3.00 | 0 |
| % of students (Science) | 4.04 | 18.18 | 29.29 | 28.28 | 13.13 | 7.07 | 0 |
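The distribution in Table 7 can be summarized with a few lines of arithmetic. The sketch below computes the share of each group with a composite score of 2 or 3 and a mean composite score. The mean is not reported by the working group; it is derived here from the rounded percentages, so it is approximate. The variable and function names are hypothetical.

```python
# Percentage of students at composite scores 0 through 6, copied from Table 7.
table7 = {
    "No Science": [9.00, 29.00, 32.00, 24.00, 3.00, 3.00, 0.0],
    "Science": [4.04, 18.18, 29.29, 28.28, 13.13, 7.07, 0.0],
}


def share_in_range(percentages, low, high):
    """Total percentage of students whose composite score is between low and high inclusive."""
    return sum(percentages[low:high + 1])


def mean_score(percentages):
    """Mean composite score implied by a percentage distribution."""
    total = sum(percentages)  # may differ slightly from 100 because of rounding
    return sum(score * pct for score, pct in enumerate(percentages)) / total


for group, percentages in table7.items():
    print(
        f"{group}: {share_in_range(percentages, 2, 3):.1f}% scored 2 or 3, "
        f"mean composite {mean_score(percentages):.2f}"
    )
```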
General Discussion, Conclusions and Summary. In the final section of the report we address three questions:
What do these results tell us about UWL students' knowledge and skills with respect to the four science learning outcomes?
The Experiment Design question, since it requires students to design an actual experiment, is a rigorous measure of students' understanding of and ability to use basic science methods to investigate physical phenomena (outcomes b and c). The results indicate that most Science Group students understand a fundamental principle of experimentation--that it involves systematic analysis of the relationship among variables. Most Science Group students knew that one must systematically vary the exposure of plants to light, and also that it is essential to control other variables affecting plant growth such as nutrients, water, and temperature. Fewer No Science Group students recognized the need to control variables. The weakest areas for both groups were in how to measure growth and how to establish adequate sample sizes of plants.
One conclusion is that in general education science courses students learn that experimentation entails careful, systematic control of variables. The overall quality of the answers suggests that while students may be aware of the critical dimensions of an experiment they are less well able to map those into an actual situation to investigate physical phenomena. Many students either omitted critical dimensions of the experiment or were vague about how they figured into the situation. These data indicate that students have trouble using the basic elements of experimentation to investigate phenomena.
The Garlic Question required students to analyze scientific information from the popular media, a measure of outcomes b, c, and e. Most students identified relevant additional information needed to evaluate the claims in the article. These tended to focus on whether the research included a representative sample of subjects and whether there were other variables that accounted for the lower risk of disease. Another popular response was that it would be important to know if increasing garlic intake had negative health effects.
Students tended to identify one additional type of information rather than several possible factors. And, fewer than half adequately explained why it was important to use that information in evaluating the claim. We might describe students as "make-sense epistemologists," referring to their tendency to answer open-ended questions by identifying one piece of satisfactory information--something that makes sense--rather than explaining alternative possibilities (Perkins, 1986).
The Computer Question was intended to measure students' understanding of how science and technology affect social life (outcome d). A majority of students explained at least one significant way in which computers have affected society, focusing mainly on how computers: 1) make communication "easier," 2) increase access to information, and 3) affect the nature of work and the job market. However, the answers were relatively brief and do not reveal much depth of thought on the question. For example, many students said that computers make it easier to communicate but did not explain how that might influence social interaction. This may illustrate a major shortcoming in students' knowledge and skills in extrapolating from concrete examples to general social phenomena. For instance, a significant concern among scholars and social critics is how access to computers may deepen the differences between social classes. Those who can afford technology and access to information will be at a great advantage over those who do not have access. This kind of thinking was almost completely absent in students' responses. Their answers tended toward superficial examples such as the way that computers make certain work easier or quicker.
Student performance is highly variable across the science outcomes. Typically a student who wrote a strong answer to one question wrote a weak answer on another. In fact, 81% of the Science Group students and 94% of the No Science Group had at least one weak answer on the test. These results suggest significant gaps in students' overall performance with respect to the science outcomes.
Based upon these results, how can we improve students' performance with respect to the four science learning outcomes?
How can we improve the process of assessing student performance in science?
There are a number of limitations of the science test and the testing procedure: