Gathering & Evaluating Evidence

Evaluating the Quality of Understanding

Understanding is not easily quantified. What would it mean, for example, to say that a student understands 70% of the novel Moby Dick? How do we look at two different performances of understanding and make credible judgments about the differences between them? How do you systematize the evaluation regardless of whether the performance is a written paper, a class discussion, or a group presentation?

You need a scorecard—a rubric. Rubrics are scoring guidelines for evaluating performance. A rubric indicates the criteria used to distinguish among qualitatively different levels of performance. Essentially, a rubric for understanding is a template or model that defines important qualities of understanding and further distinguishes different levels of understanding.

One of the best-developed rubrics for evaluating understanding comes from Grant Wiggins and Jay McTighe's book, Understanding by Design. They identify six ways in which students can demonstrate their understanding: by explaining, interpreting, applying, developing a perspective, empathizing, and showing self-knowledge. Their framework distinguishes qualitatively different “levels” of understanding for each of these facets. See the Understanding by Design rubric.

Creating Rubrics to Evaluate Understanding

The Understanding by Design model is an excellent starting point for evaluating understanding. The model has predetermined categories for different levels of understanding and you may be able to modify it to suit your specific needs and subject matter.

Some rubrics may be specific and analytical, while others can involve more holistic judgments. Here are two examples from my own work.

Thinking with vs. thinking about subject matter. An important feature of understanding is the ability to use newly learned disciplinary concepts to explain other concepts, solve problems, formulate a perspective, and so forth. In this way, understanding is the ability to “think with” new knowledge. In contrast, “thinking about” the subject matter involves describing and telling what one knows.

I use this distinction to make holistic judgments about students' understanding based on the extent to which they:

  • integrate course concepts vs. mention or describe course concepts
  • use course concepts to develop ideas vs. rely on personal opinion or intuitive beliefs
  • use course concepts to analyze and explain vs. use course concepts descriptively (without explaining).

Types of explanations. A more specific rubric is one I have used to judge the quality of students' explanations. In this case, the rubric zeroes in on whether the explanation uses course concepts to make a causal connection between ideas. The rubric distinguishes three types: causal explanations, which make specific connections among ideas; generic explanations, which hint at the causal relationship but are non-specific; and non-explanations, in which the student just describes course concepts without making a causal connection. For examples click here.
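The three explanation types above can be recorded as ordered rubric levels. The following is only a minimal sketch: the names `ExplanationLevel`, `DESCRIPTORS`, and `summarize`, and the numeric values 1 to 3, are assumptions made for illustration and are not part of the original rubric. The judgment about which level a response belongs to still has to be made by a reader; the code only stores and tallies those judgments.

```python
from collections import Counter
from enum import IntEnum


class ExplanationLevel(IntEnum):
    """Hypothetical ordered levels for the explanation rubric (higher is better)."""
    NON_EXPLANATION = 1
    GENERIC = 2
    CAUSAL = 3


# Short descriptors paraphrased from the rubric text above.
DESCRIPTORS = {
    ExplanationLevel.CAUSAL: "makes specific causal connections among ideas",
    ExplanationLevel.GENERIC: "hints at a causal relationship but is non-specific",
    ExplanationLevel.NON_EXPLANATION: "describes concepts without a causal connection",
}


def summarize(ratings):
    """Count how many responses were judged at each level, highest level first."""
    counts = Counter(ratings)
    ordered = sorted(ExplanationLevel, reverse=True)
    return {level.name: counts.get(level, 0) for level in ordered}


if __name__ == "__main__":
    # Made-up ratings for four student responses.
    ratings = [
        ExplanationLevel.CAUSAL,
        ExplanationLevel.GENERIC,
        ExplanationLevel.GENERIC,
        ExplanationLevel.NON_EXPLANATION,
    ]
    for name, count in summarize(ratings).items():
        print(f"{name}: {count} ({DESCRIPTORS[ExplanationLevel[name]]})")
```

Using ordered levels rather than free-text labels makes it easy to compare ratings later, for example between the beginning and end of a class.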

Developing your own rubric. A different approach is to start by “looking at the data.” Put the evidence of student understanding in front of you and start asking questions about how students' responses are similar to and different from one another. For example, suppose your study investigates how students understand a particularly difficult concept in your discipline. You have devised a task in which students respond to several questions at the beginning and end of class. Develop a rubric to evaluate the end-of-class responses.

1. Identify examples of good and poor understanding on the assignment. Pick a few of the best and worst pieces of work. Make a list of the characteristics shared by the good pieces and a list of characteristics shared by the poor pieces. The goal is to develop criteria that distinguish good and poor understanding, the dimensions that define quality of understanding.

2. Define gradations of quality. Define the characteristics of the best understanding. Define the characteristics of the poorest understanding. These provide anchor points. Fill in the middle levels of quality. You can make as many gradations or levels as you want. The trick is to identify clearly what distinguishes the “Best” understanding from the “Next Best,” then what distinguishes the “Next Best” from the level below it, and so on.

3. Practice using the rubric and revise as needed. You want to make sure you can distinguish levels of understanding. The only way to do this is to practice using the rubric to make sure it includes all the relevant dimensions of understanding, and to make sure you can use it reliably. You can tell it works well if you can read a set of student work and say with confidence that all the work in the top category really belongs there—that all the pieces have the same level of quality on the critical dimensions. If not, something is wrong. What if you apply the rubric and then discover that work in the same category actually looks different on the critical dimensions? You may not be clear yourself about what the actual dimensions are (e.g., you could be using some criteria that you have not yet made explicit) or you may be applying the criteria inconsistently (e.g., you may need to define the criteria more carefully so that you can use them without thinking twice about what they mean).

Making the evaluation method credible

Is your evaluation method credible? Two key technical properties of evaluation instruments are validity and reliability. If you are not familiar with these concepts, it is worth learning about them (see the references in the appendix). Most instructors are not equipped to do extensive validity and reliability studies, but they can develop rubrics and evaluation procedures that have construct validity and inter-rater reliability.

Construct validity refers to the extent to which your method measures the underlying theoretical construct it purports to measure. For example, if you define understanding as “the ability to apply knowledge to solve new problems,” then your method should measure that ability and not some other ability such as reading comprehension. You can improve construct validity by developing and refining a sound model of understanding as it applies to the students and subject matter in your class.

Inter-rater reliability refers to the extent to which evaluators agree on the ratings or evaluation of students. A measure of learning is reliable if two evaluators independently arrive at the same scores for a group of students. For example, suppose you develop a rubric for evaluating student understanding on a written assignment. The rubric is reliable to the extent that different instructors who evaluate the students' work arrive at the same scores for students (i.e., a student who gets a “low score” by one evaluator also gets a low score by the other). You can improve inter-rater agreement by developing and refining the scoring guidelines. In the event that you have multiple instructors to evaluate student performance, it is advisable to hold a norming session to discuss the criteria and how to apply them. You can also check inter-rater reliability by having instructors independently evaluate student performance and then determine the degree of consistency between them.
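One way to “determine the degree of consistency” between two evaluators is sketched below. It computes simple percent agreement and Cohen's kappa, a commonly used statistic that corrects agreement for chance. Kappa is not named in the text above, so treat it as one possible choice rather than the required method. The function names and the sample scores are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of students to whom both raters gave the same rubric level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of students.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same level independently.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical level for every student.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Made-up scores from two instructors for the same ten students.
    instructor_1 = ["low", "low", "mid", "mid", "mid", "high", "high", "high", "mid", "low"]
    instructor_2 = ["low", "mid", "mid", "mid", "high", "high", "high", "high", "mid", "low"]
    print(f"Percent agreement: {percent_agreement(instructor_1, instructor_2):.2f}")
    print(f"Cohen's kappa:     {cohens_kappa(instructor_1, instructor_2):.2f}")
```

Percent agreement is easier to explain, but it can look high simply because most students fall into one level; kappa discounts that kind of chance agreement. If agreement is low, that is a signal to revisit the scoring guidelines or hold another norming session.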

If you develop instruments and procedures that are valid and reliable, you will be better able to support your claims that you have measured the qualities of learning your study is intended to measure.

Additional Resources about Evaluating Understanding

The first four references are excellent starting points for theory, research and practice related to teaching for and learning with understanding.

Bransford, John D., Brown, Ann L., & Cocking, Rodney R. (Eds.). (1999). How people learn: Brain, mind, experience and schooling. Washington, DC: National Academy Press.

This book-length report summarizes important developments in the science of learning. Accessible to a non-specialist audience, the book examines such topics as differences between novices and experts, conditions that improve students' ability to apply knowledge to new circumstances and problems, the design of learning environments, teacher learning, and effective teaching in history, mathematics, and science. This volume provides teachers with a thorough grounding in contemporary theory and research, and highlights important implications for teaching. The entire book is online at

Stone Wiske, Martha (Ed.). (1998). Teaching for understanding: Linking research with practice. San Francisco: Jossey-Bass Publishers.

This book is the product of a six-year collaborative research project by school teachers and researchers at the Harvard Graduate School of Education. Although it focuses on pre-collegiate teaching, it is applicable to university-level teaching as well. According to the TfU model, there are four fundamental elements in teaching for understanding—generative topics that afford possibilities for deep understanding in a subject, goals that explicitly state what students are expected to understand, performances of understanding through which students develop and demonstrate understanding, and ongoing assessment. The book provides interesting examples of these elements from actual classrooms and examples of student performance. This volume should be valuable for any instructor who views better student understanding as a primary goal of the scholarship of teaching.

Wiggins, Grant. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco: Jossey-Bass Publishers.

This book, a precursor to Understanding by Design by the same author, challenges common assessment practices and offers a comprehensive approach to the design and practice of assessment intended to improve student performance. The book examines authentic assessment, the nature of feedback, how to use assessment to promote understanding, how to assess understanding, and how to design assessments and create assessment systems. It is itself an important contribution to the scholarship of teaching that provides fundamental grounding in how and why to evaluate student learning and performance. I have used the book extensively to develop a more consistent assessment philosophy and also as a handbook to guide the design of assessment materials.

Wiggins, Grant, & McTighe, Jay. (1998). Understanding by design. Alexandria, Virginia: Association for Supervision and Curriculum Development.

This book proposes that understanding is revealed to the extent that one can explain, interpret, apply, empathize, and have perspective and self-knowledge. The authors describe a process by which teachers can design experiences and materials to be consistent with these facets of understanding. A key component of the process is a way to assess understanding. Toward this end, they offer a rubric that defines different “levels” of understanding and suggest ways to evaluate different facets of understanding. This is a valuable book for those who want to translate abstract notions of understanding into concrete, observable aspects of student performance. See the Understanding by Design website

Additional References about Assessment and Evaluation

Angelo, T. A., & Cross, K. P. (1993). Classroom assessment techniques: A handbook for college teachers (second edition). San Francisco: Jossey-Bass Publishers.

If there is a “classic” about classroom assessment, this is it. The book is an excellent resource with a large number of ready-made CATs (Classroom Assessment Techniques).

The following are good resources related to evaluation and assessment. They are written for a wide faculty audience and are not overly technical. They do not focus on classroom-level assessment, but many of the principles apply to classroom inquiry.

Light, R. J., Singer, J. D., & Willett, J. B. (1990). By design: Planning research on higher education. Cambridge, MA: Harvard University Press.

This is a very accessible book about doing educational research. It is not about classroom inquiry but does have useful information about validity and reliability.

Erwin, T. D. (1991). Assessing student learning and development: A guide to the principles, goals and methods of determining college outcomes. San Francisco: Jossey-Bass Publishers.

Palomba, C. A., & Banta, T. W. (1999). Assessment essentials: Planning, implementing and improving assessment in higher education. San Francisco: Jossey-Bass Publishers.

Middle States Commission on Higher Education. (2002). Student learning assessment: Options and resources. Philadelphia: Middle States Commission on Higher Education. Available online at


© 2004 Bill Cerbin and Bryan Kopp, All Rights Reserved.
