Faculty at UW-L, and many other institutions, frequently debate the use of student evaluations of instruction.  On one side, some argue that student evaluations are more related to factors such as class size or subject and are not a valid measure of effective teaching.  On the other side, some argue that student evaluations are the best measure of teaching effectiveness.  Research (see Seldin, 2006) suggests that the actual validity of student evaluations lies somewhere in the middle.


Student evaluations of instruction generally are used for two purposes:  1) to improve instruction, and 2) to inform personnel decisions.  UW-L is no exception.  Departments consider Student Evaluation of Instruction (SEI) scores when making retention, tenure, and merit decisions, and the Joint Promotion Committee reviews SEI scores for promotion decisions.  The magnitude of these decisions necessitates responsible use of SEI scores.  In August 2005, UW-L hosted a workshop, “Evaluating Teaching Performance: New Lessons Learned,” by Peter Seldin.  Seldin is a behavioral scientist, educator, author, and specialist in the evaluation and development of faculty and administrative performance.  The workshop focused on student ratings of instruction and the teaching portfolio.  Seldin reported that in recent years, many universities (e.g. The Ohio State University, West Virginia University, Tennessee Technological University) have begun using standardized evaluations of instruction.  Some universities use published measures such as the Individual Development & Educational Assessment (IDEA), while others develop their own questions that are used by all departments.


After the workshop, several faculty approached members of the Senate Executive Committee (SEC) about the possibility of revising our current student evaluation of instruction system.  One of the most common concerns expressed by faculty was that all promotion candidates must include results from a single item in their promotion file, yet the wording of the item varies across departments.  Faculty also were concerned about the over- and under-reliance on the results of that single item.  As a result, the SEC requested that the Promotion, Tenure, and Salary (PTS) Committee 1) investigate and choose approximately 3 to 5 global Student Evaluation of Instruction items that have been found, via previous research, to be reliable and valid and 2) make recommendations about how to report and use the results.  Unfortunately, the PTS Committee did not have time to address that request.  Therefore, during the summer of 2006, the SEC investigated best practices of SEI use.  The following report summarizes the results of that investigation and proposes a set of standard items for use in UW-L courses, standard classroom administration instructions, and guidelines for the interpretation and use of SEI results.





Student evaluations of instruction generally serve two purposes:  1) to help improve instruction and 2) to inform personnel decisions (i.e. merit, tenure, and promotion).  Student evaluations used for teaching improvement should focus on the details of teaching (e.g. items asking about opportunities for active learning, group work, availability of instructor in and out of class) (Cashin, 1990).  Obviously some items will be more relevant to some courses and less to others.  As such, standardized forms are less useful for teaching improvement.  Rather, depending on course objectives and teaching methods, instructors should be allowed to choose among a group of items and write items specific to their class.  In contrast, when student evaluations of instruction provide information for personnel decisions, a few global items are ideal (Algozzine, et al., 2004; Cashin, 1990; Cashin & Downey, 1992; d’Apollonia & Abrami, 1997).  This handbook addresses using student evaluation as one measure of teaching effectiveness in making personnel decisions.  Instructors and departments are encouraged to add items to the proposed SEI form for their own teaching improvement and program evaluation.


In October 1974, the Wisconsin Board of Regents adopted policy 74-13, Student Evaluation of Instruction.  The policy recognizes the importance of student evaluation information.  It speaks to the use of student evaluations for the improvement of instruction; retention, promotion, and tenure decisions; and merit salary increase deliberations.  The policy recognizes that student evaluations used for teaching improvement are most useful when the instructor makes decisions about the items, methods, and frequency of evaluation.  The policy encourages system institutions to allow a wide variety of methods for using SEI results for teaching improvement and requires each institution to develop “systematic and firm procedure(s)” guiding the use of SEI results for personnel decisions (e.g. tenure or promotion).  The Regents recognize that student evaluations are but one measure of teaching effectiveness and must be interpreted with reference to the reliability and validity of the method employed.  The UW-L Faculty Personnel Rule 3.05, Periodic Review, requires both student and peer evaluation of teaching for merit pay, retention, tenure, and promotion decisions.  The student evaluations will be governed by the Board of Regents policy, but the Chancellor, UW-L Faculty Senate, and departments may establish additional regulations.


A common set of SEI items has several characteristics that make it particularly useful for personnel decisions. Specifically, standardized items and administration procedures allow for a shared framework to discuss evaluations of teaching.  Such items need to be applicable to a wide variety of instructional settings and focus on dimensions of teaching performance that are generally acknowledged as being important.  Despite the utility of a common set of SEI items, it is important to keep in mind several considerations:

  • No SEI is a precise instrument. Therefore, the interpretation of results should focus on extreme patterns across several classes, not on trivial differences in mean values for single teaching episodes.
  • Although student evaluation of teaching is required, departments are not limited to the common SEI items. Departments and individual instructors are encouraged to design and add items especially for teaching improvement use.
  • Student feedback is only one source of data for assessing teaching effectiveness. SEI results should be interpreted in conjunction with data from other assessment methods, in particular peer evaluation of teaching and self-assessments. 
  • Several variables, other than teaching effectiveness, are related to student evaluations, and as such, judgments about SEI results must consider those contexts.





Correlates of Student Evaluations

Much research on student evaluations of instruction focuses on factors related to SEI scores.  In general, most of the research has examined four categories of variables:  course variables, student variables, instructor variables, and SEI administration variables.  While some factors not related to teaching effectiveness (e.g. discipline, course level, course size) are related to student evaluations, most have relatively weak relationships (Greenwald, 1997). 


Course Variables.  Research consistently suggests that class size relates to student evaluations.  Teachers of smaller classes receive higher ratings than teachers of larger classes (Algozzine, et al., 2004; Cashin, 1995; Cramer & Alexitch, 2000; Nerger, Viney, & Riedel, 1997; Schlenker, McKinnon, & Cole, 1994).  Some research suggests that the relationship is curvilinear in that ratings are highest in smaller classes (fewer than 20 students), followed by larger classes (over 100 students), and lowest in middle-sized classes (40 to 60 students) (Franklin, 2001).


Course level also relates to student evaluations of instruction (Algozzine, et al., 2004; Cashin, 1990; Nerger, et al., 1997; Schlenker & McKinnon, 1994).  While relationships are weak, instructors of upper level courses, especially graduate level courses, generally receive higher ratings than those of lower level courses. 


Students rate teachers differently depending on discipline.  Teachers in the humanities tend to receive the highest ratings, followed by those in the social sciences, while teachers of the sciences tend to receive the lowest ratings (Cramer & Alexitch, 2000; Franklin, 2001; Nerger, et al., 1997).


Intensity of the time schedule relates to student evaluations in that teachers of short-term courses tend to receive higher ratings than those of courses with more traditional schedules (Nerger, et al., 1997).


Student evaluations tend to increase as instructors have more experience teaching a particular course with a particular method.  New and revised courses often receive lower ratings (Franklin, 2001). 


Much research focuses on the relationship between student evaluations and grades.  Two possible reasons could explain a relationship between grades and student evaluations.  First, students could be rewarding instructors of “easy classes.”  In effect, by employing a lenient grading system, instructors “buy” good evaluations.  Second, students could be earning higher grades because the teaching was effective.  Therefore, one would expect the student evaluations of instruction to be higher as well (Franklin, 2001; Marsh & Roche, 1997).  Results have been mixed (Nerger, et al., 1997).  Some (Marsh & Roche, 1997; Centra, 2003) failed to find a practically significant relationship between grading lenience and student evaluations.  Others (Cashin, 1995; Franklin, 2001) report positive, but weak correlations (.10 to .30) between student evaluations and grades.  Somewhat stronger correlations (.36) emerge when comparing student evaluations with expected grades (Heckert, Latier, Ringwald, & Silvey, 2006).  Recent research (Addison, Best, & Warrington, 2006) found that students’ perception of course difficulty was more important in evaluations than were grades alone.  Students who expected the class to be more difficult than it actually was gave higher evaluations, while those who expected the class to be easier than it actually was gave lower evaluations. 


Student Variables.  Student motivation for taking the course seems to be the one student variable consistently related to student evaluations.  Heckert, et al. (2006) found a correlation of .71 between student ratings of interest in the subject matter and evaluations of instruction.  Students consistently give higher ratings to instructors of elective courses than those of required courses (Algozzine, et al., 2004; Cashin, 1995; Franklin, 2001).  Research fails to support any relationship between student evaluations of instruction and student age, gender, level, or personality.  Finally, students’ overall GPA does not seem to be related to SEI scores (Cashin, 1995; Heckert, et al., 2006; Marsh & Roche, 1997).


Instructor Variables.  Considerable research has examined the relationship between instructor expressiveness and student evaluations.  In a now-controversial article, Naftulin, Ware, and Donnelly (1973) reported that instructor expressiveness influenced student evaluations more than teaching effectiveness.  Naftulin et al. hired a professional actor to teach one class day.  “Dr. Fox” gave a highly entertaining lecture filled with nonsense and contradictory statements.  Student evaluations of Dr. Fox were extremely positive, even though the lecture was devoid of content.  Marsh and Roche (1997) report that, while instructor expressiveness is related to student evaluations, it tends to be related specifically to ratings of enthusiasm and not to ratings of clarity of presentation, organization, or knowledge of the content.  Cashin (1995) argues that instructor expressiveness helps make the class more interesting and therefore improves learning, and thus should be related to evaluations of teaching effectiveness.


Teaching experience is, as one would expect, related to student evaluations of instruction in that instructors with more experience generally receive higher ratings than those with less teaching experience (Cashin, 1995; Cramer & Alexitch, 2000; Franklin, 2001).  Research fails to support any consistent relationships between student evaluations and instructor age, gender, race, personality, or research productivity (Cashin, 1995). 


Administration Variables.  The methods employed in the administration of student evaluation of instruction forms can affect evaluations.  Non-anonymous evaluations tend to be higher than anonymous ones.  Instructor presence during the administration of student evaluations increases ratings.  Some research suggests that students who are told their evaluations will be used in personnel decisions give higher ratings than those not provided that information (Algozzine, et al., 2004; Cashin, 1995; Nerger, et al., 1997).  Finally, research is mixed about the timing of student evaluations.  In general, best practices suggest administering SEIs no later than the second-to-last week of the term.  After that time, students tend to be distracted by final exams and assignments and provide less detailed and thoughtful feedback (Cashin, 1990).  These factors need not invalidate student evaluations and can easily be controlled by standardized administration methods.


Validity of Student Evaluations for Personnel Evaluation

McKeachie (1997) argues that “student ratings are the single most valid source of data on teaching effectiveness” (p. 1219).  Students observe the instructor more than any other person and are the intended beneficiaries of the teaching, and therefore, teaching effectiveness should matter to them (Seldin, 2006).  The validity of a measure often is assessed by comparing scores on that measure with other measures designed to assess the same construct.  Student evaluations of instruction are related to several other measures of teaching effectiveness including:  student learning, instructor self-ratings, colleague ratings, administrator ratings, alumni ratings, and trained observer ratings (Cashin, 1995; Marsh & Roche, 1997; McKeachie, 1997). 


Best practices, therefore, suggest that student evaluations of instruction should not be the sole measure of teaching effectiveness.  Guidelines from the Individual Development & Educational Assessment Center (an agency whose mission is “To serve college and university communities by supporting the assessment and improvement of teaching, learning, and administrative performance”) suggest that student evaluations should comprise no more than 30% to 50% of the evaluation of teaching for several reasons.  For example, student evaluations are not perfectly correlated with student learning.  Additionally, student evaluations likely represent a rather unsophisticated view of teaching effectiveness (McKeachie, 1997) because students are not able to judge several important factors associated with effective teaching.  Students do not have the background or knowledge to judge:

  • the appropriateness of the class objectives
  • how well an instructor knows the subject matter
  • if the course materials are balanced and relevant, or
  • the quality and fairness of tests or assignments and grading standards (Seldin, 2006).

In contrast, students are well equipped to assess such factors as clarity of communication, organization, course difficulty, workload, classroom atmosphere, pace of instruction, and their own learning (Franklin, 2001; Seldin, 2006).  In sum, student evaluations should never be used in isolation to evaluate teaching.  Peer and department evaluations of teaching need to be included to create a comprehensive understanding of an instructor’s teaching effectiveness and contributions to the department.




As previously stated, research and best practices suggest that students are able to validly rate some, but not all, factors involved in effective teaching.  Research also suggests that when student evaluations are to be used in personnel decisions, a few global items are ideal.  Best practices suggest response scales should contain approximately five categories and be based on student agreement with statements rather than on instructor ability (Cashin, 1995).  Based on the literature and best practices, the following items were chosen from published instruments and those used at other institutions:



Each item is rated on a five-point scale ranging from Strongly Disagree to Strongly Agree.

    1. Prior to this course, I wanted to take it regardless of who taught it.
    2. Overall, this instructor was excellent.
    3. The instructor was helpful and responsive to students.
    4. The instructor was well prepared.
    5. The instructor communicated the subject matter clearly.
    6. I learned a great deal from this instructor.

These items shall be included on all SEI forms, although instructors and departments are encouraged to include additional items to provide information for teaching improvement and program evaluation.  Results for each item will be reported individually for all personnel decisions in which SEI results are used.


Item 1 is designed to measure student motivation, not teaching effectiveness, and as such should not influence personnel decisions.  Rather, the results of item 1 will provide a context within which to interpret results on the remaining items.  It is important to interpret SEI scores relative to student motivation given the strong relationship between student interest in course material and student evaluation of instruction.  Evaluations in required courses in which students tend not to be intrinsically interested in the material are expected to be lower; those in elective courses in which students are interested in the material are expected to be higher.  Evaluations that severely deviate from the expected pattern (i.e. low evaluations in high student interest courses or high evaluations in low student interest courses) are particularly noteworthy.





  • The instructor and any teaching assistants shall not participate in the administration of student evaluations of instruction.  Neither shall distribute the SEI forms, be present in the room when the forms are completed, or collect the forms.  Departments shall create policies to ensure that other individuals are available to administer the SEI forms. 


  • Results will be reported to instructors only after final grades have been submitted.


  • All responses to the SEI will be anonymous.  Students shall not sign or in any way identify themselves on the SEI forms.  It is recommended that evaluations not be administered in courses with fewer than five students, as it is difficult to protect student anonymity with fewer students (see Seldin, 2006).


  • The following instructions will be given to students (either in writing or read aloud) (adapted from Seldin, 2006): 


“This course evaluation is an important means for you to express your view of your classroom experience.  Although we assess the quality of instruction in many ways, we place great value on student input because of the unique perspective you have on what occurs in the classroom throughout the semester.  Thus you are important partners in the process of making the course more effective, the instructor more attuned to his or her strengths and weaknesses, and the university a better place to learn.  As such we ask you to treat the process professionally, seriously, sensitively, and collegially.  Carefully consider the questions and answer truthfully.  Your responses are one important factor in decisions affecting the career of your instructor.  Instructors will not have access to course evaluations until after grades have been posted.  We will treat the evaluation forms as the confidential documents that they are.  These general guidelines also should be followed:

    1. You should be given a minimum of 10 minutes to complete the evaluation.
    2. The instructor, as well as any teaching assistants, should not be present when you are completing the evaluation.
    3. The designated representative should deliver completed evaluations promptly to the proper office.”


  • Departments and instructors can provide additional instructions for any items added for teaching improvement or program evaluation. 


  • Evaluations shall not be administered during the final exam period.  It is recommended that evaluations be administered during weeks 12 to 13.


  • The timing of evaluations should not coincide with special classroom events.


  • Departments and instructors should strive for the following response rates (from Franklin, 2001):


Class Size        Recommended Response (a)
                  At least 80%; more recommended
                  At least 75%; more recommended
                  At least 66%; 75% or more recommended
                  At least 60%; 75% or more recommended
100 or more       At least 50%; 75% or more recommended

(a) Assuming there is no systematic reason that students were absent, the impact of absence on results is larger in smaller classes.




Student evaluation of instruction should be only one of several measures of teaching effectiveness.  As previously mentioned, students are not equipped to evaluate all factors important in teaching effectiveness.  Departments are encouraged to develop additional procedures to evaluate teaching effectiveness.  Additionally, SEI scores should be interpreted within the context of variables known to be related to evaluations (e.g. student motivation, class size, discipline), and in general, it is recommended that SEI scores not be compared across instructors.


SEI results will be reported only to one decimal place (Cashin, 1990).  All measures are subject to error and therefore none perfectly measure the construct of interest.  Depending on the standard error of measurement, differences of a few tenths may not reflect real differences in teaching effectiveness, but instead reflect measurement error.  Differences of less than a tenth of a point are extremely unlikely to represent real differences in teaching effectiveness.  In general, it is recommended that absolute cut-off scores (e.g. SEI scores must be above 3.5 for a candidate to be considered for tenure) be avoided for the same reason.  Similarly, personnel decisions should be based on evaluations from a variety of courses over several terms.  SEI scores from a single course should never be used to justify a personnel decision.  Best practices suggest such decisions be based on at least 5 courses over a two-year period, more if the courses have fewer than 15 students (Cashin, 1995; see also Franklin, 2001).  Trivial differences across courses should be ignored.  Decisions should be based on overall patterns across courses and across terms.  It also is recommended that frequency distributions for relevant items be provided to department chairs and instructors.




As student evaluations of instruction typically are negatively skewed (i.e., student ratings tend to clump at higher scores, with fewer outlying lower ratings), the mean is not the best summary of scores.  When distributions are not symmetrical, the median is a better measure.  To allow for more fine-grained measures, the fractional median should be used.  Departments are encouraged to have students complete SEIs using scantron forms so that summaries of results can be computed by Information Technology to eliminate human error in calculations.  If departments choose to summarize results themselves, the procedure described in Appendix A shall be used to calculate fractional median scores.
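To illustrate why the choice of summary matters for negatively skewed ratings, the following minimal sketch compares the mean, the regular median, and a grouped median on a small set of invented responses.  The data and the coding (1 = Strongly Disagree, 5 = Strongly Agree) are assumptions made for illustration only, and Python's standard-library `statistics.median_grouped` is used here as a stand-in that, with `interval=1`, appears to follow the same interpolation idea as the fractional median described in Appendix A.

```python
from statistics import mean, median, median_grouped

# Hypothetical responses to a single SEI item in one class.
# Assumed coding (not stated in this report): 1 = Strongly Disagree, 5 = Strongly Agree.
ratings = [5] * 14 + [4] * 6 + [1] * 5

mean_score = mean(ratings)                               # pulled down by the five 1's
median_score = median(ratings)                           # can only be one of the scale values
grouped_score = median_grouped(ratings, interval=1)      # interpolates within the 4.5-5.5 cell

print(round(mean_score, 2))     # 3.96
print(median_score)             # 5
print(round(grouped_score, 2))  # 4.61
```

In this invented class a handful of very low ratings moves the mean a full point below the regular median, while the grouped median stays within the cell that contains the median but still distinguishes this class from one in which nearly every student chose 5.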


Departments will report the fractional median (reported to one decimal place) for the individual instructor of the six global SEI items for each course taught during the last 3 years.  Departments also will report the overall fractional median for all faculty.




Addison, W. E., Best, J., & Warrington, J. D. (2006).  Students’ perceptions of course difficulty and their ratings of the instructor.  College Student Journal, 40, 409-416.

Algozzine, B., Beattie, J., Bray, M., Flowers, C., Gretes, J., Howley, L., Mohanty, G., & Spooner, F. (2004).  Student evaluation of college teaching: A practice in search of principles.  College Teaching, 52(4), 134-141.

Cashin, W. E. (1990).  Student ratings of teaching: Recommendations for use (IDEA Paper No. 22).  Manhattan:  Kansas State University, Center for Faculty Evaluation and Development.

Cashin, W. E. (1995).  Student ratings of teaching: The research revisited (IDEA Paper No. 32).  Manhattan:  Kansas State University, Center for Faculty Evaluation and Development.

Cashin, W. E., & Downey, R. G. (1992).  Using global student rating items for summative evaluation.  Journal of Educational Psychology, 84, 563-572.

Centra, J. A. (2003).  Will teachers receive higher student evaluations by giving higher grades and less course work?  Research in Higher Education, 44(5), 495-518.

Cramer, K. M., & Alexitch, L. R. (2000).  Student evaluations of college professors: Identifying sources of bias.  The Canadian Journal of Higher Education, XXX, 143-164.

D’Apollonia, S., & Abrami, P. C. (1997).  Navigating student ratings of instruction.  American Psychologist, 52, 1198-1208.

Franklin, J. (2001).  Interpreting the numbers:  Using a narrative to help others read student evaluations of your teaching accurately.  In K. Lewis (Ed.), Techniques and strategies for interpreting student evaluation (pp. 85-100).  New Directions for Teaching and Learning, San Francisco: Jossey-Bass.

Greenwald, A. G. (1997).  Validity concerns and usefulness of student ratings of instruction.  American Psychologist, 52, 1182-1186.

Heckert, T. M., Latier, A., Ringwald, A., & Silvey, B. (2006).  Relation of course, instructor and student characteristics to dimensions of student ratings of teaching effectiveness.  College Student Journal, 40,

Marsh, H. W., & Roche, L. A. (1997).  Making students’ evaluation of teaching effectiveness effective:  The critical issue of validity, bias, and utility.  American Psychologist, 52, 1187-1197.

Nerger, J. L., Viney, W., & Riedel, R. G., II (1997).  Student ratings of teaching effectiveness: Use and misuse.  The Midwest Quarterly, 38, 218-233.

Schlenker, D., & McKinnon, N. (1994).  Assessing faculty performance using the student evaluation of instruction.  Atlantic Baptist College.  (ERIC Document Reproduction Service No. HE027508).

Seldin, P. (2006).  Evaluating faculty performance: A practical guide to assessing teaching, research, and service.  Boston, MA: Anker Publishing Company, Inc.


Appendix A


Explanation of Fractional Median


Using the median becomes problematic when the data set contains large numbers of repeated values. In the case of SEI scores, the student can only choose 1 of 5 values, so, by design, the set of SEI scores for most classes contains large groups of the same value. If the regular median were calculated, there would be only a few possible results for all classes/instructors. The fractional median is used to provide a wider range of possible results, while still maintaining some of the desirable properties of the regular median. In terms of the mathematics, the fractional median provides a more continuous range of outcomes instead of the discrete set possible with the regular median.


Here is the basic idea (and an example): To arrive at a continuous set of outcomes, one assumes that each data value is the center of the true set of values that could have been measured. For example, when a student selects a 3 instead of a 2 or a 4, one can assume that if the student were allowed to choose any numerical value from the real line, they would have selected something between 2.5 and 3.5, and since they were not allowed to list their exact observation, they selected the nearest choice, in this case a 3.


We call this range of values associated with each observable measurement (choice) a bin or a cell. The cell for the choice 1 is .5 to 1.5, for a 2 it is 1.5 to 2.5, etc. We now want to calculate the fractional median, which estimates what the median would have been if the student could have selected any real value (not just the 5 choices given). First we determine what cell the standard median lives in. The fractional median will be a value from the cell that contains the standard median. We then determine how far into the cell the median actually is (again assuming they could have selected any value in the cell). This gives the fractional median.
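The procedure described above can be written compactly as a single expression.  The symbols are introduced here only for illustration and do not appear elsewhere in this report: $L$ is the lower boundary of the cell that contains the standard median, $N$ is the total number of responses, $c$ is the number of responses that fall in cells below that cell, $f$ is the number of responses in that cell, and $w$ is the cell width (1 for the five-point SEI scale).

```latex
\[
  \text{fractional median} \;=\; L + w \cdot \frac{N/2 - c}{f}
\]
```

The fraction $(N/2 - c)/f$ is the "distance into the cell" referred to in the text, so the result always lies between $L$ and $L + w$.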


Example Data Set: two 2's, nine 3's, eight 4's, and eight 5's. (I picked an odd number of values because it is a little tricky; the even-number case is a bit easier.)


This is a total of 27 measurements (student scores). Half of 27 is 13.5, so if we look for the 13.5th value in the sorted list, we end up looking between the 13th and 14th values, which are both 4's. So the median is a 4, which comes from the cell ranging from 3.5 to 4.5, and the fractional median will therefore be between 3.5 and 4.5.


The fractional median in this case will be 3.5 plus the fraction of the distance into the cell that the middle value represents. If it were the case that the true median was the middle 4, then the fraction of the distance into the cell would be 50% = .5, and the fractional median would be 3.5 + .5 = 4 (so the median equals the fractional median in this situation). In our example, the 4 that represents the true median falls between the 2nd and 3rd four (the 2.5th four, let's say) of the eight 4's in the cell, which is 2.5/8 of the way into the cell. Since 2.5/8 = .3125, the fractional median for this example is 3.5 + .3125 = 3.8125. Note: if more of the 3's were 4's, then the "middle 4" would be a greater distance into the cell, resulting in a higher fractional median (but the regular median would still be 4).
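For departments that summarize results themselves, the calculation in this appendix can be sketched in a few lines of Python.  This is a minimal illustration rather than an official tool: the function name `fractional_median` and its `width` parameter are hypothetical, and the code assumes integer scale values whose cells are one unit wide and centered on each value, as described above.

```python
from collections import Counter


def fractional_median(scores, width=1.0):
    """Fractional median of discrete ratings (hypothetical helper).

    Each rating r is treated as the center of a cell running from
    r - width/2 to r + width/2; the result is interpolated within the
    cell that contains the regular median.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(scores)
    half = len(scores) / 2      # e.g. 13.5 for 27 responses
    below = 0                   # responses in cells below the current cell
    for value in sorted(counts):
        in_cell = counts[value]
        if below + in_cell >= half:
            lower_bound = value - width / 2
            # Fraction of the way into the cell at which the middle falls.
            return lower_bound + width * (half - below) / in_cell
        below += in_cell


# Example data set from this appendix: two 2's, nine 3's, eight 4's, eight 5's.
example = [2] * 2 + [3] * 9 + [4] * 8 + [5] * 8
print(fractional_median(example))  # 3.8125
```

For the example data the loop stops at the cell for 4 with 11 responses below it, giving 3.5 + (13.5 - 11)/8 = 3.8125, which matches the hand calculation.  Per the reporting guideline in this report, the value would then be rounded to one decimal place.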