The Intercollegiate Examination in Trauma and Orthopaedic Surgery
This article will look at Section 1 of the Intercollegiate examination in Trauma and Orthopaedics. It will describe how the section is written, administered and used to determine who will be allowed to proceed to sit Section 2 (clinicals and vivas). If candidates understand more about the nature of this section of the exam they may be more relaxed when they come to face it, and may therefore perform better!
Section 1 of the Intercollegiate Examination in Trauma and Orthopaedics is the ‘written’ component, now actually a computer-based test sat at Pearson VUE centres around the country to avoid the need for candidates to travel and incur hotel costs. It is designed to test knowledge across the curriculum and, insofar as is possible, does so using questions that require higher-order thinking (rather than asking for a fact, it looks for the application of knowledge to solve problems, often clinical scenarios). An ongoing development of the examination is the progressive rewriting of questions in the bank that are currently recorded as level 1 questions (factual knowledge) into higher-order questions. Exams are beginning to contain more ‘harder’ questions but this does not affect the standard to pass, as will be described.
A characteristic of all Intercollegiate exams is the extraordinary attention to ensuring the consistency of standards from one diet to the next, and one decade to the next. It is worth looking at how the paper is compiled, marked and decisions made about who passes and fails.
Questions have come into the bank from many sources, but always through the question-writing committee. Before Section 1 changed from short essays to ‘MCQ’ type questions, a committee of examiners was put together and tasked with writing ‘Single Best Answer’ (SBA) and ‘Extended Matching Item’ (EMI) questions covering as much of the curriculum as possible. All examiners were likewise tasked, and successful candidates were also asked to submit questions. It was the job of the MCQ writing committee to take proposed questions from all sources and identify those that could be written into a format consistent with best educational practice before being placed into the question bank. Later the questions in the bank were coded to the curriculum so that the bank could more easily be interrogated and, for example, question writing could be focused on areas of relative deficiency.
If we look at the journey of a typical question now, from conception to regular use, we would see the following. A question will be proposed – let’s say it is brought to the question-writing committee by a member who had been asked at the previous meeting to write an SBA on a specific curriculum topic where a question was needed. The question is projected for the committee to review and about half (in T&O at least) will, after 15 minutes or so of debate, be rejected. Otherwise the debate will continue with numerous edits being made and, over the course of typically up to an hour, the question will be rewritten until it satisfies the committee. The question is then coded and banked as a new question. It is available for selection into an exam but when it is used, it is flagged as new and its performance is compared to established ‘superbank’ questions that have a track record of solid performance. It is no exaggeration to say that an A4 side of statistical data is produced on the performance of each and every question in every diet of the exam. Only if the new question performs adequately will it count towards the final mark of candidates in the exam and be available for use in subsequent diets. If its performance falls short it is removed from the exam, does not affect the final mark of any of the candidates, and is returned to the question-writing committee for review.
It is worth looking at the format of the questions, and this is described in templates that are freely available from the JCIE website. SBA questions are exactly what the name suggests. A question will be set and the candidate has to choose the best from 5 possible answers. It is important to note that this is not a ‘Single Correct Answer’ question but a ‘Single Best Answer’. In fact all 5 possible answers could be ‘correct’ but candidates are asked which is the ‘Best’ answer given the information presented in the stem. As questions are designed to test higher order thinking, this may mean that not all of the information needed is in the stem – some of it may need to be judged from your knowledge of the available evidence. Questions about which some candidates complain ‘There was more than one correct answer’, ‘the question was ambiguous’ etc are often the best performing questions on the paper!
Questions are also written so that they offer no cues that would help with guessing. There has been, and continues to be, a huge amount of input from Educational Psychologists at all stages in the Intercollegiate exam. Suffice it to say that there is no point in using some of the tricks that can get you through poorly constructed exams. For instance the order of possible answer choices is simply alphanumeric. The possible answer choices are adjusted to be of similar length (in lesser exams the possible answer that is longer or shorter than the rest is the correct one!) and all possible answers will be of the same nature (eg if being asked about a diagnostic test the possible answers will all be radiological investigations, rather than 4 radiological investigations and one blood test). The bottom line is that you should not try to look for patterns or clues – if you want to guess, just guess. It’s still worth it – there is no negative marking, so everyone will guess the questions they can’t work out and, with 5 options per SBA, get about 20% of them right.
The above also applies to EMI questions, which lend themselves to clinical scenarios – for example, data are given on a patient’s history and examination findings along with test results, and a diagnosis has to be chosen from a list of 8 or more possibilities – the same list is used for blocks of three EMI questions with differing clinical scenarios. Again the information provided may be incomplete and what is needed is the most likely correct response from the list when you combine the information provided with your knowledge of the evidence and clinical experience (just like the decision-making process that you will have to undertake as a consultant, and that has to be safe). A typical evolution of an EMI question is that the first time it is used in an exam it is flagged up as ‘too easy’. It is removed from the exam, comes back to the question-writing committee, and a debate takes place about what information is essential and what is provided but could differ in the real world without altering the correct response. Information is stripped out, the question is returned to the exam and its performance reviewed. Often it is then a better question, even though, had the two versions been looked at side by side, the original would superficially have looked preferable.
An exam is compiled by ‘random’ selection of questions from the bank by a computer – random in inverted commas, as rules are followed. The proportion of questions from each coded section of the curriculum is the same for all exams and each exam has blocks of established well-performing questions, new questions and rewritten questions. Candidate feedback after every exam always contains self-cancelling comments, eg ‘there were far too many upper limb questions’ and ‘there were far too many lower limb questions’ etc. Since removal of the ‘interpretation of a written paper’ section, each diet also includes a proportion of questions on the theme of interpretation of the literature, for example statistics, study design etc.
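The rule-bound ‘random’ selection described above can be pictured with a short sketch. This is only an illustration, not the software actually used: the curriculum codes, the quota figures and the function name `compile_paper` are all invented for the example, and it handles only the curriculum quota, not the separate blocks of established, new and rewritten questions.

```python
import random

def compile_paper(bank, quota, seed=None):
    """Pick questions at random within each curriculum code, honouring fixed quotas.

    bank  : list of dicts with at least a "code" key (hypothetical structure)
    quota : dict mapping curriculum code -> number of questions required
    """
    rng = random.Random(seed)
    paper = []
    for code, n in quota.items():
        pool = [q for q in bank if q["code"] == code]
        if len(pool) < n:
            raise ValueError(f"not enough questions coded {code!r}")
        # Random within the section, but the section's share is fixed.
        paper.extend(rng.sample(pool, n))
    return paper

# Hypothetical bank: 10 questions in each of four invented curriculum codes.
bank = [{"id": f"{code}-{i}", "code": code}
        for code in ("hip", "knee", "hand", "stats")
        for i in range(10)]
quota = {"hip": 3, "knee": 3, "hand": 2, "stats": 2}

paper = compile_paper(bank, quota, seed=1)
counts = {code: sum(q["code"] == code for q in paper) for code in quota}
print(counts)  # {'hip': 3, 'knee': 3, 'hand': 2, 'stats': 2}
```

Whatever the seed, the balance between sections is identical, which is why complaints of ‘too many upper limb questions’ and ‘too many lower limb questions’ cancel out.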
The first draft of a paper, which contains a few more questions than needed, is sent securely to the chairman of the Examination Quality Assessment (EQA) Group. His job is to check that there is no duplication of questions and that SBA and EMI questions aren’t covering exactly the same material. He will also check that, for example, knee questions include the same trauma component that a trauma question covers when it deals with the knee. Similar and overlapping questions are removed to bring the papers down to the correct number of questions while maintaining balance. This second draft is then considered by a convened EQA group meeting whose job it is to go through the paper with a fine toothcomb and pick up potential problems that can be ironed out before the exam. Even at this stage questions can be removed and substituted. Even with several read-throughs, spelling mistakes and typos creep through, some of which make the question impossible to answer. Don’t worry – any rogue question will not contribute to your final mark!
The exam itself is sat at Pearson VUE centres simultaneously around the country in 3 diets a year. The papers are automatically marked and at this stage there is simply a raw mark indicating how many correct responses each candidate achieved. However, as noted above, extensive data are collected on how each and every question is answered. As an example of the sort of data collected, the final scores of candidates are ranked and divided into quintiles. For each possible response to each question, data are generated on how each quintile of candidates responded. One measure of question reliability is how well it predicts the final result of a candidate – a ‘good’ question will be answered correctly by almost all of the candidates who end up in the top 20% and incorrectly by most candidates who end up in the bottom 20%. All of these data are stored in the bank with the questions and are available when questions are reviewed. Facility refers to how easy a question is – if 90% of all candidates get a question right it is too easy, if 90% get it wrong it is too hard, and either way it is actually a useless question. In fact, such questions are removed from the exam and do not count toward the final mark, but they are sent back to the question-writing committee. If the purpose of the exam were to identify the best and worst candidates in the country reliably, giving a national rank, then these questions would be vital. However, the exam has to discriminate reliably around a pass mark based on specialty standards, and by removing ‘too easy’ and ‘too hard’ questions from the final consideration the middle ground becomes ‘stretched out’ and separates candidates better around the pass mark.
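To make the quintile and facility ideas concrete, here is a minimal sketch for a single question. The function names and the ten-candidate data set are invented for illustration; the 90% cut-offs are the ones quoted above, and the real analysis is of course far more extensive.

```python
def facility(responses, keyed):
    """Proportion of all candidates choosing the keyed (best) answer."""
    return sum(r == keyed for r in responses) / len(responses)

def quintile_breakdown(total_scores, responses, keyed):
    """Proportion correct in each fifth of candidates, bottom quintile first.

    Candidates are ranked by final score; needs at least 5 candidates.
    """
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    n = len(order)
    result = []
    for k in range(5):
        group = order[k * n // 5:(k + 1) * n // 5]
        result.append(sum(responses[i] == keyed for i in group) / len(group))
    return result

def too_easy_or_hard(fac, cutoff=0.9):
    """True if 90% or more got the question right, or 90% or more got it wrong."""
    return fac >= cutoff or fac <= 1 - cutoff

# Invented data: final scores and each candidate's response to one question.
scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
answers = ["B", "C", "B", "A", "A", "C", "A", "A", "A", "A"]

fac = facility(answers, "A")
quintiles = quintile_breakdown(scores, answers, "A")
print(fac)        # 0.6
print(quintiles)  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

In this made-up example the top quintile all answer correctly and the bottom quintile all answer incorrectly, which is the pattern of a ‘good’ question as described above, and a facility of 0.6 keeps it well clear of the ‘too easy’ and ‘too hard’ cut-offs.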
The crunch therefore is how do we set the pass mark? It is not true to say that there is any sort of regulation of the flow of candidates through to the next stage by manipulation of the pass mark. The mark for eligibility to proceed (the correct term) is that which would be obtained by the candidate who just meets the standards required by the specialty and the GMC, often loosely defined as the day one consultant who has spent an appropriate period of time revising for the specialty exam. At all stages from allocation of a candidate number before the exam is sat through to signing the Standard Setting outcome at which the eligibility to proceed is defined, the candidate details are anonymous. Indeed the examiners setting the eligibility to proceed mark do not even know what marks candidates have achieved. Let us consider how a Standard Setting meeting runs.
Around 20-25 experienced examiners will convene in Edinburgh. They will first be split into 2 groups to look at some of the SBA and EMI questions that have been flagged statistically as possibly poor performers. Some questions will already have been removed automatically – for example all of the questions that proved too easy or too hard (usually new questions, as any question previously used would have passed through this hurdle already). The examiners will review each question and decide whether it is a fair question that should stay in the exam, or is flawed and should be removed and returned to the question writers. Typical reasons for the latter would be ambiguity that had not previously been recognized, new evidence that has challenged the previously decreed correct answer, or simply that the answer in the bank is wrong. It is worth noting that some very good questions end up being flagged as having possible wrong answers yet remain in the exam. If a question is hard so that only 20% of candidates answer it correctly then 80% will choose a wrong response. Let’s say 40% choose one of the incorrect options – this flags as a possible wrong answer automatically, as more candidates have chosen a specific incorrect response than the correct one.
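The automatic ‘possible wrong answer’ flag in that last example amounts to a simple count. The sketch below is illustrative only: the function name is invented and the response figures are made up to match the 20% and 40% in the text.

```python
from collections import Counter

def possible_wrong_key(responses, keyed):
    """Return any incorrect options chosen more often than the keyed answer."""
    counts = Counter(responses)
    keyed_count = counts.get(keyed, 0)
    return sorted(opt for opt, c in counts.items()
                  if opt != keyed and c > keyed_count)

# Invented cohort of 100: 20 choose the keyed answer A, 40 choose C.
responses = ["A"] * 20 + ["B"] * 15 + ["C"] * 40 + ["D"] * 15 + ["E"] * 10

flagged = possible_wrong_key(responses, "A")
print(flagged)  # ['C']
```

A flag of this kind only asks the examiners to look again; as noted above, a hard but sound question can trigger it and still stay in the exam.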
Once the poorly performing questions are weeded out the examiners sit down with the papers and consider each question individually using an Angoff procedure. Each examiner works independently and considers every question in turn. The essence of an Angoff procedure is that it determines the difficulty of each question individually. What the examiner is tasked with doing is considering what proportion of borderline candidates would answer each question correctly. The examiners are not told the answers – they do not need the answer paper to recognize how a borderline candidate will behave when faced with a particular question, each having had considerable experience of borderline candidates both in their roles as trainers and as examiners. To simplify matters, if we consider that the whole exam had only 10 questions and all of the examiners independently concluded that 6 of every 10 borderline candidates would get each question correct, then a pass mark of 6 out of 10 (60%) would mean that 50% of borderline candidates would pass and 50% would fail. The pass mark therefore divides the borderline candidates down the middle. If the exam has a lot of hard questions the pass mark will be lower. If there are a lot of easy questions it will be higher. The mark is unique to each diet. The Angoff-derived pass mark is not the mark determining eligibility to proceed, however. The GMC argue that there is some uncertainty in judgements made in this way, which can be expressed statistically as the Standard Error of Measurement (SEM). For patient safety reasons the GMC would not want incompetent candidates being allowed to proceed, even if removing them means some potentially competent candidates are prevented from doing so. The eligibility to proceed mark is therefore the Angoff-derived mark plus one SEM.
When this step was first introduced the historical performance of candidates scraping through was reviewed and it was noted that they went on to fail section 2, so this rule in fact saves some candidates a whole lot of money!
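The arithmetic of the Angoff mark plus one SEM can be sketched as follows, using the simplified 10-question exam from the text. Two things here are assumptions rather than a description of the actual process: the SEM is computed with the conventional classical formula, SD of candidate scores multiplied by the square root of (1 minus reliability), and the candidate scores and the reliability figure of 0.91 are invented for the example.

```python
import math
from statistics import mean, pstdev

def angoff_mark(estimates):
    """Angoff pass mark in raw marks.

    estimates[e][q] is examiner e's judgement of the proportion of
    borderline candidates who would answer question q correctly.
    Each examiner's estimates are summed, then averaged across examiners.
    """
    return mean(sum(row) for row in estimates)

def standard_error_of_measurement(scores, reliability):
    """Assumed conventional formula: SD * sqrt(1 - reliability)."""
    return pstdev(scores) * math.sqrt(1 - reliability)

def eligibility_mark(estimates, scores, reliability):
    """Angoff-derived mark plus one SEM, as described in the text."""
    return angoff_mark(estimates) + standard_error_of_measurement(scores, reliability)

# The text's example: every examiner judges 6 in 10 borderline candidates
# would get each of the 10 questions right (three examiners shown here).
estimates = [[0.6] * 10 for _ in range(3)]

# Invented candidate raw scores and an invented reliability coefficient.
scores = [4, 5, 6, 7, 8]
reliability = 0.91

pass_mark = angoff_mark(estimates)
sem = standard_error_of_measurement(scores, reliability)
proceed = eligibility_mark(estimates, scores, reliability)
print(round(pass_mark, 2), round(sem, 2), round(proceed, 2))
```

With these made-up numbers the Angoff mark is 6 out of 10 and the mark needed to proceed sits a little under half a mark above it; a more reliable exam gives a smaller SEM and so a smaller gap.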
Finally it should be noted that not only is every question in every exam statistically dissected, but so is each paper in each exam, and each exam is compared to all previous exams. We thus have data on the reliability of the examination including statistics such as Cronbach’s alpha, which cannot exceed 1 and in practice lies between 0 and 1 (negative values are possible but signal a problem with the paper). For high stakes professional examinations such as the FRCS(Tr&Orth) the standard to be aspired to is a Cronbach’s alpha of 0.8. Around the world few professional examinations, particularly in medical specialties, achieve this. Section 1 of the FRCS(Tr&Orth) never drops below 0.9.
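For readers who have not met it, Cronbach’s alpha compares the spread of candidates’ total scores with the spread on the individual questions: alpha = k/(k-1) × (1 − sum of question variances / variance of total scores), where k is the number of questions. The sketch below applies that standard formula to a tiny invented set of right/wrong marks; it says nothing about how the exam’s own software is written.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a candidates-by-questions table of marks.

    item_scores[c][q] is 1 if candidate c answered question q correctly, else 0.
    """
    k = len(item_scores[0])
    item_vars = [pvariance([row[q] for row in item_scores]) for q in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented marks for 5 candidates on 4 questions.
marks = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(marks)
print(round(alpha, 3))  # 0.8
```

Here the questions rank the candidates consistently, so alpha comes out at 0.8; questions that pulled in different directions would drag it down.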
So what can be said to help candidates tackle the exam? I would say that the best preparation is doing the background reading but applying it by solving problems in clinic and theatre and questioning the boss about how they make decisions throughout training and especially as you approach the exam. It doesn’t work without the background reading bit. Don’t go in thinking you will be treated in any way unfairly – you will be a number and an enormous effort will be put into making sure the decision made on your eligibility to proceed is sound – even if the questions seem ambiguous or incomplete. No-one will know whether you have attempted it before. Training programmes are designed to deliver the curriculum for T&O, which is what the Intercollegiate exam tests. For this reason alone, the pass rate is much higher in those on training programmes than in those who are not. Good performing candidates in UKITE are usually good performing candidates in the FRCS(Tr&Orth) – the huge effort related to Quality Assessment and Standard Setting makes sure that when pass/fail is decreed in the FRCS(Tr&Orth) it is done by a process that is sound by the standards of the most rigorous external review. That’s why it’s so expensive! Don’t look for patterns in the answers. Don’t start to panic if there seem to be a lot of hard questions – if it’s true, the pass mark will be lower, or if the questions are very hard they will be removed from consideration. A word of caution about practice papers – examiners are not allowed to write books about the exam, so any published practice questions are written by someone with no experience of the FRCS(Tr&Orth) writing group. Questions in the bank evolve from exam to exam – subtle changes make big differences to the correct answer. If you practise on a website and think you recognize the question in an exam be very careful indeed – there are a number of questions which, when used, generate very interesting responses.
Clearly there is a correct answer that is agreed by all examiners present but a whole cohort of otherwise sensible people plump for the same incorrect answer – now why are they doing that?
Leader of the Section 1 writing group