General Education Science Assessment Report
1995-96

A Report to the General Education Committee
October 21, 1996

Introduction. In 1995-96 a General Education Science Assessment Working Group (Rudy Koch, Biology; George Huppert, Geography/Earth Science; T.A.K. Pillai, Physics; Bill Cerbin, Assessment Coordinator) assessed learning outcomes in "Science: Understanding the Natural World." This report summarizes the assessment procedures and results.

The science learning outcomes in general education. The five learning outcomes for the science category are based upon the stated objectives of the general education science courses. As a result of coursework in general education science courses, students should be able to:

  a. Demonstrate basic knowledge in at least one scientific discipline: biology, chemistry, physics, astronomy, or earth science.
  b. Show an understanding of the basic methods and thought processes (such as observation, experimentation, data presentation, and inferential reasoning) used in the development of the concepts, theories and principles of science.
  c. Use scientific methods to investigate both scientific questions and social/personal problems.
  d. Show an understanding of how science and technology have contributed to environmental and social change and their consequences.
  e. Make rational decisions about the use of technology in our society based on scientific knowledge and reason.

The science test. The test consisted of three essay questions to assess outcomes b-e. We decided not to assess learning outcome "a" (demonstrate basic knowledge in at least one scientific discipline: biology, chemistry, physics, astronomy, or earth science) since to do so would entail developing a separate content test for each disciplinary area. The three questions are:

1. Design an experiment to examine the relationship between the number of hours of sunlight exposure per day and growth rate in spider plants. Be as specific as you can about your plans for the experiment. (outcomes b and c)

2. In Newsweek (November 6, 1995) an article entitled "Sex, Lies and Garlic," described the alleged health benefits of a variety of herbs and dietary supplements. The following passage is about garlic.

" . . . garlic has been touted as an antidote to just about everything bad--hemorrhoids, heart disease, tuberculosis, even vampires--and some claims are well supported. At least 28 studies have found garlic effective for reducing cholesterol; in one German experiment, subjects taking an 800-mg tablet daily saw their blood levels fall an average of 12 percent over four months. Other studies suggest that garlic can help lower blood pressure, prevent blood clots, combat infections and ward off some malignancies. In a 1989 study, National Cancer Institute researchers found that people in China who consumed at least 50 pounds of garlic, onions, shallots and chives each year suffered 40 percent less stomach cancer than those who ate less."
What additional information would you like to have before you accept the claim that garlic is a potent substance that can improve your health? Explain why the information is important for deciding whether garlic improves health. (outcomes b, c, and e)

3. Throughout history human beings have invented all manner of devices, machines, processes and procedures that affect the way people live and the overall quality of life. For example, when the automobile was first invented it was heralded as a solution to the pollution problems of the day--dusty streets filled with animal waste and filth. Obviously, the widespread use of automobiles has led to significant environmental and social changes--good as well as bad.

Contemporary society is experiencing the widespread use of another technological invention--the computer. Identify and explain three ways that computers have contributed to significant environmental and social changes. Please be specific and cite examples to support your answer. (outcome d)

Assessment design and administration. We tested two groups of students: a Science Group consisting of students who had already completed at least one general education science course, and a No Science Group consisting of freshmen who had taken no college-level science courses.

The Science Group consisted of 99 students from four sections of Math 205. We administered the test to 114 students in these sections at the end of spring semester, 1996. Of these, 15 had no science courses at UWL and were omitted from the analysis, leaving 99 students--58 females, 39 males (2 did not indicate gender). Thirty-eight students had one UWL general education science course, 16 had two UWL science courses and 45 had at least one UWL general education science course and at least one college science course at another institution. The mean age of the Science Group was 20.5 years, and 70% of the students were 20 years or younger. There were 42 freshmen, 33 sophomores, 13 juniors, and 8 seniors (3 did not indicate class). The distribution of GPAs was <2.5 (21), 2.5-3.0 (36), 3.01-3.5 (26), >3.5 (10), and 6 did not indicate GPA. There were 31 academic majors represented in the group and 14 students were undeclared majors.

The No Science Group consisted of 100 freshmen in Biology 101. We administered the test to 250 students in two sections of Biology 101 in the second week of fall semester 1996. From this pool we selected a stratified random sample of 100 students who had no previous college science courses. There were 60 females and 40 males, and the mean age was 18.
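The report does not say which variable was used to stratify the No Science sample or how the allocation was made. As a minimal sketch of one common approach, the following assumes proportional allocation by gender over a pool that has already been filtered to students with no prior college science courses. The pool, its gender split, and the function name stratified_sample are all hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def stratified_sample(pool, key, total, seed=0):
    """Draw `total` items from `pool`, allocating to each stratum
    in proportion to its share of the pool (an assumed design)."""
    rng = random.Random(seed)

    # Group the pool into strata according to `key`.
    strata = defaultdict(list)
    for item in pool:
        strata[key(item)].append(item)

    # Proportional quotas, rounded with the largest-remainder rule
    # so that the stratum sizes always sum to `total`.
    quotas = {k: total * len(v) / len(pool) for k, v in strata.items()}
    sizes = {k: int(q) for k, q in quotas.items()}
    leftover = total - sum(sizes.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - sizes[k], reverse=True)
    for k in by_remainder[:leftover]:
        sizes[k] += 1

    # Simple random sample without replacement inside each stratum.
    sample = []
    for k, members in strata.items():
        sample.extend(rng.sample(members, sizes[k]))
    return sample


# Hypothetical eligible pool (made-up split, NOT the actual Biology 101 data).
pool = [{"id": i, "gender": "F" if i < 150 else "M"} for i in range(250)]
sample = stratified_sample(pool, key=lambda s: s["gender"], total=100)
```

Fixing the seed makes the draw reproducible, which is useful if the sample has to be documented or audited later.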

Results. The "Experiment Design" question was developed by faculty at the State University of New York at Fredonia. We used their evaluation rubric, which identifies five dimensions related to the design of the experiment:

  1. Systematic variation of exposure to sunlight
  2. Control of other variables that could affect plant growth
  3. Sample size
  4. Measurement of plant growth
  5. Duration of the experiment

Students could receive 0, 1, or 2 points for each dimension; the difference between a score of 1 and 2 points for each dimension is the degree of description and explanation. For example, students earned one point if they indicated awareness that one needs to systematically vary the sunlight exposure and two points if they described appropriate exposure intervals (e.g., five groups of plants--one group exposed for 2 hours, one group for 4 hours, and so on).

A maximum score of 10 points indicates the student: 1) explained how to systematically vary exposure to sunlight, 2) explained how to control other relevant variables such as temperature, moisture, nutrients, etc., 3) specified an adequate sample size, 4) indicated ways to measure plant growth, and 5) specified an appropriate duration for the experiment. There is no cutoff score that distinguishes between adequate and inadequate answers. Instead, answers differ in terms of their overall quality. For example, a score of 5 indicates that the student may have been on the right track, but omitted information or gave only general information for one or more dimensions.

The mean total scores for the question were 2.99 (SD=2.28) and 4.07 (SD=2.32) for the No Science and Science Groups, respectively. A t-test indicates the difference between the means is significant (p<.001). The Science Group performed better than freshmen who have had no college science courses. The difference between the groups is statistically significant but small--about one point on the 10-point scale. Table 1 reports the percentage of students at each score for the five dimensions.
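The comparison of means can be roughly re-derived from the summary statistics alone. The sketch below assumes group sizes of 100 (No Science) and 99 (Science), taken from the group descriptions; the number of students who actually answered this question is not stated and may be smaller. The report does not say which form of t-test was used, so the Welch form here is an assumption, and Cohen's d is added only as one conventional way to express the size of the difference.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for the difference mean2 - mean1 (unequal-variance form)."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean2 - mean1) / standard_error


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean2 - mean1) / math.sqrt(pooled_var)


# (mean, SD, n); the n values are assumptions, see the note above.
no_science = (2.99, 2.28, 100)
science = (4.07, 2.32, 99)

t = welch_t(*no_science, *science)
d = cohens_d(*no_science, *science)
print(f"t = {t:.2f}, d = {d:.2f}")
```

Under these assumptions the t statistic comes out a little above 3 and d a little under 0.5. The sketch does not attempt to reproduce the reported p-value, which depends on the exact sample sizes and test variant.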

Table 1: Percentage of students at each score.

Category                                    Group        0 points   1 point   2 points
Systematic variation of sunlight exposure   No science      19.00     42.00      39.00
                                            Science          4.65     60.46      34.88
Control of variables                        No science      60.00     20.00      20.00
                                            Science         41.86     17.44      40.70
Sample size                                 No science      73.00     16.00      11.00
                                            Science         56.98     33.72       9.30
Measurement                                 No science      82.00     14.00       4.00
                                            Science         61.63     27.91      10.47
Duration of experiment                      No science      67.00      5.00      28.00
                                            Science         37.21     10.47      52.33

More than half the Science Group students received at least one point for variation of exposure, control of variables, and duration of experiment--indicating awareness that these dimensions are important to the design of the experiment. About 95% of them indicated that it is necessary to vary plants' exposure to sunlight--although only 35% described how to do this adequately. About 58% indicated that it is necessary to control other variables that could affect plant growth, but only 41% cited more than two relevant variables. Sixty-three percent indicated an appropriate duration for the experiment. The pattern of scores for the No Science Group was similar but tended to be lower than that of the Science Group (e.g., 40% of the No Science Group at least recognized the control of variables in contrast to 58% of the Science Group).
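The "at least one point" figures quoted above are the sum of the 1-point and 2-point percentages in Table 1. A short sketch of that arithmetic, using the Table 1 values (the dictionary layout and the function name at_least_one_point are illustrative choices, not part of the original analysis):

```python
# Table 1 percentages as (0 points, 1 point, 2 points) for each group.
TABLE_1 = {
    "Systematic variation of sunlight exposure": {
        "No science": (19.00, 42.00, 39.00),
        "Science": (4.65, 60.46, 34.88),
    },
    "Control of variables": {
        "No science": (60.00, 20.00, 20.00),
        "Science": (41.86, 17.44, 40.70),
    },
    "Sample size": {
        "No science": (73.00, 16.00, 11.00),
        "Science": (56.98, 33.72, 9.30),
    },
    "Measurement": {
        "No science": (82.00, 14.00, 4.00),
        "Science": (61.63, 27.91, 10.47),
    },
    "Duration of experiment": {
        "No science": (67.00, 5.00, 28.00),
        "Science": (37.21, 10.47, 52.33),
    },
}


def at_least_one_point(dimension, group):
    """Percentage of a group scoring 1 or 2 on a rubric dimension."""
    _zero, one, two = TABLE_1[dimension][group]
    return round(one + two, 2)


for dimension in TABLE_1:
    for group in ("No science", "Science"):
        print(f"{dimension} / {group}: {at_least_one_point(dimension, group)}%")
```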

Students in both groups had difficulty with sample size and measurement of plant growth. Note that 57% of the Science Group and 73% of the No Science Group received a score of 0 for the sample size dimension; many students said the experiment could be done with only one plant in each sunlight exposure group (e.g., expose one plant to four hours of sun, one plant to six hours of sun, etc.). This is a particularly surprising result for the Science Group since the students were enrolled in Math 205: Statistics, a course that deals with statistical tests. It is likely that the examples of experiments students had encountered in statistics always involved comparisons among groups with sample sizes greater than one!

In general it appears that most Science Group students had partial understanding of what constitutes a scientific experiment; they at least recognized that systematic control of variables is an essential part of scientific experimentation. But their experimental designs were flawed in fundamental ways. The No Science Group performed at a somewhat lower level and many students omitted critical information.

In Question 2 students analyzed a brief passage from Newsweek which makes a number of claims about the health benefits of eating garlic. Students were asked to indicate what additional information they would need before accepting the claim that garlic is beneficial and also to explain why the information is important to their decision. Table 2 reports the distribution of answers. The categories are:

  1. Sample and Research method. Information about the research subjects and the methods used in the studies.
  2. Other variables. Information about other variables that could have contributed to the research findings.
  3. Counter evidence. Information about the negative effects of garlic.
  4. Researcher bias and credibility. Information about the researchers' credibility (e.g., are they independent scientists or do they work for garlic growers?).
  5. Statistical significance. Information about the statistical significance of the studies.

Table 2: Types of Information

Response Category            % of Students (No science)   % of Students (Science)
Sample and research method                        37.00                     44.44
Other variables                                   30.00                     44.44
Counter evidence                                  36.00                     24.24
Researcher credibility                             5.00                     16.16
Statistical significance                              0                      2.02
Generic request                                   10.00                      4.04
No response                                           0                      3.03
Ambiguous response                                18.00                      3.03
Accepts claims                                        0                      2.02

Forty-four percent of the Science Group and 37% of the No Science Group said it was important to know what other variables could account for the results (e.g., perhaps people who eat a lot of garlic also have a low fat diet that may reduce cancer risk). Another common answer was that the findings may not apply to Americans because the studies were done in other cultures. And, 24% of the Science Group and 36% of the No Science Group wanted to know if garlic has negative effects on health.

Students also had to explain why the additional information was important for their decision. We sorted responses into four categories:

  1. Reasonable. The explanation gave a clear reason why the information was needed.
  2. Inadequate. The explanation was vague, ambiguous or incorrect.
  3. Personal. The explanation was based on a unique personal situation rather than on general scientific principles.
  4. No explanation. The student gave no explanation.

Table 3 reports the types of explanations. Although the large majority of students specified at least one relevant piece of additional information important for judging the credibility of the claims about garlic, only 47% (Science Group) and 36% (No Science Group) gave a reasonable explanation about the significance of the information. About 25% in both groups gave an inadequate explanation and 25% (Science Group) and 38% (No Science Group) gave no explanation.

Table 3: Types of Explanations Given by Students.

Type of Explanation   % of Students (No science)   % of Students (Science)
Reasonable                                 36.00                     47.25
Inadequate                                 26.00                     25.27
Personal                                       0                      2.20
No explanation                             38.00                     25.27

The Computer Question was intended to measure students' understanding of how science and technology influence environmental and social change. Table 4 reports the distribution of scores for the two groups.

Table 4: Number of examples students gave about computers

                             0 Examples   1 Example   2 Examples   3 Examples
% of students (No science)            3          19           42           36
% of students (Science)               1          16           33           50

Fifty percent of the Science Group and 36% of the No Science Group gave three examples of how computers have influenced social changes. Students produced a range of ideas and Table 5 contains the percentages of students who gave a response in each of the 15 categories. The four most common ideas were that computers:

  1. make communication "easier".
  2. affect the nature of work by either creating more computer-related jobs or by eliminating jobs that can be automated.
  3. increase access to information.
  4. affect the use or conservation of energy and resources (e.g., reduce the use of paper, increase the use of paper and electricity).

Table 5: Types of Social and Environmental Changes

Type of Change                % of Students (No science)   % of Students (Science)
Communication                                      45.00                     49.49
Work                                               28.00                     37.37
Access to information                              16.00                     33.33
Energy use/conservation                            22.00                     27.27
Management of information                           7.00                     18.18
Research tool                                      10.00                     14.14
Interpersonal interaction                           7.00                     11.11
Educational tool                                    7.00                     10.10
Personal convenience                               17.00                     14.14
Social or physical problems                        17.00                      6.06
Recreation/entertainment                               0                      7.07
New basic skill                                     7.00                      6.06
Loss of privacy                                        0                      2.02
Sedentary life                                      7.00                      2.02
Have/have nots                                         0                      1.01

We did not evaluate the quality of the responses since they tended to be brief with very little detail. For instance, many students stated that computers make communication easier, but did not explain this in a larger context of social change (e.g., how this might change the nature and quality of social interaction). Similarly, students tended to say that computers increased access to information without describing potential social consequences.

Composite scores. We devised a composite score for each student based upon the quality of responses to each of the three questions. We used three categories--weak, marginal and adequate performance, and assigned numerical values of 0, 1, and 2 respectively. The composite score is the sum of the three individual scores. Table 6 reports the characteristics of each performance level.

Table 6: Performance levels for composite scores

Experiment Design
  Weak (0 points): score on item 0-4
  Marginal (1 point): score on item 5-6
  Adequate (2 points): score on item 7-10

Analysis of science information (garlic question)
  Weak (0 points): Did not explain why information was important.
  Marginal (1 point): Gave one additional source of information and a reasonable explanation of its importance.
  Adequate (2 points): Gave 2 additional sources of information and a reasonable explanation of its importance.

Social effects of science and technology (computer question)
  Weak (0 points): Cited one example of how computers influenced social change.
  Marginal (1 point): Cited two examples.
  Adequate (2 points): Cited three examples.
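The composite scoring rule in Table 6 can be written as a small function. This is a sketch rather than the working group's actual procedure, and it fills three gaps that Table 6 leaves open with assumptions: a student who cited no computer examples is treated as weak, more than two garlic information sources (or more than three computer examples) counts as adequate, and a garlic answer with no additional information source is treated as weak. All function and parameter names are hypothetical.

```python
def experiment_level(score):
    """Map the 0-10 Experiment Design score to a performance level."""
    if not 0 <= score <= 10:
        raise ValueError("experiment score must be between 0 and 10")
    if score <= 4:
        return 0  # weak
    if score <= 6:
        return 1  # marginal
    return 2      # adequate


def garlic_level(num_sources, reasonable_explanation):
    """Level for the garlic question.

    Assumption: no source, or no reasonable explanation, counts as weak;
    two or more sources with a reasonable explanation counts as adequate.
    """
    if not reasonable_explanation or num_sources < 1:
        return 0
    return 1 if num_sources == 1 else 2


def computer_level(num_examples):
    """Level for the computer question.

    Assumption: zero examples is treated the same as one example (weak).
    """
    if num_examples <= 1:
        return 0
    return 1 if num_examples == 2 else 2


def composite_score(experiment_score, num_sources, reasonable_explanation, num_examples):
    """Sum of the three per-question levels, ranging from 0 to 6."""
    return (
        experiment_level(experiment_score)
        + garlic_level(num_sources, reasonable_explanation)
        + computer_level(num_examples)
    )


# Example: 5 on the experiment item, one explained source, two computer examples.
print(composite_score(5, 1, True, 2))
```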

The composite score is a general indicator of how well students performed on the science test. As Table 7 illustrates, the majority of students in both groups had a composite score of 2 or 3, which indicates marginal overall performance.

Table 7: Distribution of Composite Scores

Composite score                  0       1       2       3       4       5       6
% of students (No science)    9.00   29.00   32.00   24.00    3.00    3.00       0
% of students (Science)       4.04   18.18   29.29   28.28   13.13    7.07       0

General Discussion, Conclusions and Summary. In the final section of the report we address three questions:

  1. What do these results tell us about UWL students' knowledge and skills with respect to the four science learning outcomes?
  2. Based upon these results, how can we improve students' performance with respect to the four science learning outcomes?
  3. How can we improve the process of assessing student performance in science?

What do these results tell us about UWL students' knowledge and skills with respect to the four science learning outcomes?

The Experiment Design question, since it requires students to design an actual experiment, is a rigorous measure of students' understanding of and ability to use basic science methods to investigate physical phenomena (outcomes b and c). The results indicate that most Science Group students understand a fundamental principle of experimentation--that it involves systematic analysis of the relationship among variables. Most Science Group students knew that one must systematically vary the exposure of plants to light, and also that it is essential to control other variables affecting plant growth such as nutrients, water, and temperature. Fewer No Science Group students recognized the need to control variables. The weakest areas for both groups were in how to measure growth and how to establish adequate sample sizes of plants.

One conclusion is that in general education science courses students learn that experimentation entails careful, systematic control of variables. The overall quality of the answers suggests that while students may be aware of the critical dimensions of an experiment they are less well able to map those into an actual situation to investigate physical phenomena. Many students either omitted critical dimensions of the experiment or were vague about how they figured into the situation. These data indicate that students have trouble using the basic elements of experimentation to investigate phenomena.

The Garlic Question required students to analyze scientific information from the popular media, a measure of outcomes b and c. Most students identified relevant additional information needed to evaluate the claims in the article. Their requests tended to focus on whether the research included a representative sample of subjects and whether there were other variables that accounted for the lower risk of disease. Another popular response was that it would be important to know if increasing garlic intake had negative health effects.

Students tended to identify one additional type of information rather than several possible factors. And, fewer than half adequately explained why it was important to use that information in evaluating the claim. We might describe students as "make-sense epistemologists," referring to their tendency to answer open-ended questions by identifying one piece of satisfactory information--something that makes sense--rather than explaining alternative possibilities (Perkins, 1986).

The Computer Question was intended to measure students' understanding of how science and technology affect social life (outcome d). A majority of students explained at least one significant way in which computers have affected society, focusing mainly on how computers: 1) make communication "easier", 2) increase access to information and 3) affect the nature of work and the job market. However, the answers were relatively brief and do not reveal much depth of thought on the question. For example, many students said that computers make it easier to communicate but did not explain how that might influence social interaction. This may illustrate a major shortcoming in students' knowledge and skills in extrapolating from concrete examples to general social phenomena. For instance, a significant concern among scholars and social critics is how access to computers may deepen the differences between social classes. Those who can afford technology and access to information will be at a great advantage over those who do not have access. This kind of thinking was almost completely absent in students' responses. Their answers tended toward superficial examples such as the way that computers make certain work easier or quicker.

Student performance is highly variable across the science outcomes. Typically a student who wrote a strong answer to one question wrote a weak answer on another. In fact 81% of the Science Group students and 94% of the No Science Group had at least one weak answer on the test. These results suggest significant gaps in students' overall performance with respect to the science outcomes.

Based upon these results, how can we improve students' performance with respect to the four science learning outcomes?

  1. Get the results into the hands of instructors. It is essential to involve instructors from the science disciplines in discussions about the results. Unless science instructors examine the implications of these results for their classes, little can be done to improve student performance.
  2. Performance standards. These results raise the issue of performance standards. What level of student performance is acceptable? Should these results be used to benchmark student performance and establish goals for improvement?
  3. Learning outcomes. Faculty should examine whether the stated science learning outcomes match the content and instructional emphases in the general education courses. Faculty may want to consider whether these four outcomes are reasonable and realistic outcomes for the program.

How can we improve the process of assessing student performance in science?

There are a number of limitations of the science test and the testing procedure:

  1. The test may underestimate student competence simply because there are no consequences for performance. Students took 15-25 minutes to complete the test, hardly the kind of time that permits careful, reflective thinking. This is an inherent problem in using "volunteers" for assessment.
  2. It was difficult to assemble a representative sample of students. The pool of available students was limited, which in turn made it hard to draw a sample that represents the student population.
  3. Good assessment gives students the opportunity and incentive to demonstrate their best thinking. The present method does not do this effectively, and this is an important consideration for future assessment. Whether assessment takes place in science courses or as part of a separate assessment requirement, there should be times, places and incentives for engaging in good thinking.
  4. There is no feedback to students. The best assessment practices build in meaningful feedback to students. Assessment should be a form of learning and not just a way to gauge student performance. Unfortunately, the current procedure is an "add-on" experience and not an educationally meaningful experience for students.
  5. The means by which students demonstrate their learning should involve engaging, "authentic" measures of the science learning outcomes. The science test is a school task rather than an activity that entails forms of thinking that people use in real-life situations. "Assessment should be linked to the tasks, contexts, and 'feel' of real-world challenges" (Wiggins, 1993). When science outcomes are assessed again in three years, faculty should consider using more authentic measures that tap into the best forms of thinking we want our students to develop.