Shital Bhandary
Kathmandu
Nepal

Tuesday, July 2, 2013

Standard Setting in Health Professions Education

Kathmandu, Nepal

Health Professions Education (HPE) is embracing the competency-based curriculum, teaching, and assessment cycle, which differs markedly from the conventional objective-driven approach.
 
The competencies laid down in the course and/or curriculum require a criterion-referenced standard-setting system for effective and defensible student assessment. Standard-setting sessions require subject-matter (content) experts to judge the competency required for each item, such as a Multiple Choice Question (MCQ), a Short Answer Question (SAQ), or a station in an Objective Structured Practical/Clinical Examination (OSPE/OSCE).
 
One of the most widely used criterion-referenced standard-setting methods in HPE is the Angoff method, in which judges review each item's content and difficulty and award a score between 0 and 1. This score reflects the judge's estimate of how a "Minimally Competent Borderline Candidate", or simply "Borderline" candidate, would perform: a student who "either can barely pass or barely fail" the examination. Such students are "hypothetical", so a common understanding among the judges must be reached prior to the actual Angoff scoring session ("Angoffing").
 
When the individual judges' scores for an item are averaged, the resulting value becomes the "cut-off score", or pass mark, for that particular item. So, if a test consists of 50 standard-set items, the pass mark of the test is the sum of the Angoff scores of all items included in the test. This sum represents the "competency" required to pass the test.
 
Below I present a hypothetical example of a test with 5 items, where each item is judged by a "mixed" panel of six judges. A panel of about six judges tends to produce more reliable Angoff scores, which adds further weight in favor of the validity of the resulting pass mark.
 
Example:

             Judge1  Judge2  Judge3  Judge4  Judge5  Judge6  Angoff
Item1         0.45    0.50    0.55    0.40    0.60    0.55    0.51
Item2         0.40    0.35    0.45    0.40    0.45    0.40    0.41
Item3         0.50    0.65    0.60    0.55    0.60    0.50    0.57
Item4         0.70    0.75    0.80    0.65    0.70    0.65    0.71
Item5         0.65    0.55    0.75    0.70    0.55    0.60    0.63
Pass Mark     2.70    2.80    3.15    2.70    2.90    2.70    2.83

Pass % = 56.5 (the overall pass mark expressed as a percentage of the 5 available marks, computed from the unrounded judge averages)
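The averaging and summation described above can be reproduced with a short script. This is a minimal sketch using the hypothetical judge scores from the table; the variable names are mine, not part of any standard-setting software:

```python
# Hypothetical judge scores from the example table: each row is one item,
# each column one of the six judges' Angoff ratings (0 to 1).
judge_scores = [
    [0.45, 0.50, 0.55, 0.40, 0.60, 0.55],  # Item1
    [0.40, 0.35, 0.45, 0.40, 0.45, 0.40],  # Item2
    [0.50, 0.65, 0.60, 0.55, 0.60, 0.50],  # Item3
    [0.70, 0.75, 0.80, 0.65, 0.70, 0.65],  # Item4
    [0.65, 0.55, 0.75, 0.70, 0.55, 0.60],  # Item5
]

# Item cut-off score = mean of the judges' scores for that item
item_cutoffs = [sum(row) / len(row) for row in judge_scores]

# Test pass mark = sum of the (rounded) item cut-offs, as in the Angoff column
pass_mark = sum(round(c, 2) for c in item_cutoffs)

# Pass percentage relative to the 5 available marks, from unrounded averages
pass_pct = 100 * sum(item_cutoffs) / len(judge_scores)

print([round(c, 2) for c in item_cutoffs])       # [0.51, 0.41, 0.57, 0.71, 0.63]
print(round(pass_mark, 2), round(pass_pct, 1))   # 2.83 56.5
```

Note that the pass mark column total (2.83) sums the rounded item cut-offs, while the pass percentage (56.5) comes from the unrounded judge averages, which is why 2.83/5 does not equal exactly 56.5%.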
 
One of the main problems with the Angoff method is "content expert bias": when standard setting is done by a group of experts belonging to the same discipline, they tend to give higher scores, producing an "upward bias" in the Angoff scores that shifts the pass mark higher. This has big consequences if the test is "high stakes", as adjustments to the Angoff scores are not permitted. Thus, it is recommended to use a "mixed" panel of judges to balance the scores in the standard-setting process.
 
Another problem occurs when Angoff scoring is done for the very first time and there has been little discussion of the concept and meaning of the "borderline" student. As most HPE courses in South Asia, including Nepal, use a 50% cut-score, faculty here tend to award at least 50% to maintain this norm. I call this "novice bias".
 
When "content expert bias" or "novice bias" occurs, compromise methods such as Hofstee and Beuk are recommended for formative examinations. These methods combine the students' actual scores with the Angoff scores to determine an adjusted pass mark, correcting these biases if present. It is advisable to use these methods in the initial phase of formative student assessment until a normalization effect takes place among the judges. For summative assessment, it is recommended to standard-set the items using item-analysis results and/or repeated Angoffing to reduce the biases.
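To illustrate how a compromise method blends judgment with actual scores, here is a minimal sketch of a Hofstee-style calculation. The inputs are assumptions for illustration: the judges agree on minimum/maximum acceptable cut-offs (c_min, c_max) and minimum/maximum acceptable fail rates (f_min, f_max), and the adjusted pass mark is taken where the observed fail-rate curve of the students' actual scores meets the line joining (c_min, f_max) and (c_max, f_min). The function name and example values are hypothetical:

```python
def hofstee_cutoff(scores, c_min, c_max, f_min, f_max, steps=200):
    """Hofstee-style compromise: search, within the judges' agreed bounds,
    for the cut-off where the observed fail rate is closest to the
    acceptable fail rate on the line from (c_min, f_max) to (c_max, f_min)."""
    n = len(scores)
    best_cut, best_gap = c_min, float("inf")
    for i in range(steps + 1):
        cut = c_min + (c_max - c_min) * i / steps
        # Observed fail rate: proportion of students scoring below this cut
        fail = sum(1 for s in scores if s < cut) / n
        # Acceptable fail rate on the Hofstee line at this cut-off
        line = f_max - (f_max - f_min) * (cut - c_min) / (c_max - c_min)
        gap = abs(fail - line)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut

# Hypothetical data: 101 student scores spread evenly from 0% to 100%,
# with the judges' bounds set at 40-60% cut-off and 10-50% fail rate.
scores = [i / 100 for i in range(101)]
cut = hofstee_cutoff(scores, c_min=0.40, c_max=0.60, f_min=0.10, f_max=0.50)
```

With these inputs the compromise cut-off lands near 43%, below the midpoint of the judges' range, because a higher cut-off would fail more students than the judges deemed acceptable.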
 
If standard setting is done correctly, a test with predominantly easy items will have a higher pass mark, whereas a test with mostly difficult items will have a lower pass mark. In other words, the "competency" required for a test is known to the students and faculty before the test is conducted.
 
Finally, a test should typically be compiled from an examination blueprint, which allows teachers/faculty to sample content from different sections and at varied difficulty levels. It also allows faculty to assess knowledge and skills using appropriate methods, which in turn increases the validity of the test. As knowledge can be assessed at varying degrees of difficulty, it is always better to use a "mixed bag" approach, sampling curricular content, item difficulty, and assessment methods, to produce technically "competent" human resources for health.
 
More can be found here:

2. www.act.org/research/researchers/reports/pdf/ACT_RR89-2.pdf

Rest later ...

So, happy Angoffing!