Assessment
- What does formative assessment with VPs look like?
- What are the different forms of feedback that can be given to learners as part of VP?
- Are teachers needed in formative assessment of VP activities?
There are several formats of formative assessment related to VPs and to activities performed in their context. The most basic measure of performance in VPs is diagnostic accuracy, i.e. the agreement of the diagnosis selected by the student with that of the VP author. Diagnostic efficiency can be defined as the number of (correctly) selected diagnoses divided by the time needed for the task [Braun 2019]. Also important, but less prominent in the literature, is therapeutic accuracy, meaning the selection of adequate management for the VP [Tausendfreund 2022]. The task of diagnosis can be divided into domains of the clinical reasoning process that can be assessed separately [Plackett 2022]. Basic knowledge relevant to the diagnoses embedded in VPs can be tested with multiple-choice questions. Furthermore, one can assess the recall (did the student notice the relevant information?) and precision (what share of the selected information was actually relevant?) of the case elements selected by the student - e.g. observed clinical findings, questions asked in history taking, or diagnostic tests ordered. Another possibility is to examine summary statements written by the students for the skills of problem representation and the use of appropriate medical language [Hege 2020]. Formative assessment is more likely to require the student to provide open-ended elaborations, which makes it easier for the teacher to provide feedback. This differs from summative assessment, which requires clear-cut decisions that can be reliably marked with a numerical score.
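To make these metrics concrete, the following sketch computes diagnostic accuracy, diagnostic efficiency, recall, and precision for a single VP session. The data structures (a simple session record with selected diagnoses, findings, and elapsed time) and the example values are illustrative assumptions, not the data format of any particular VP system.

```python
from dataclasses import dataclass

@dataclass
class VPSession:
    """Illustrative record of one student's work on a single VP."""
    selected_diagnoses: set  # diagnoses chosen by the student
    selected_findings: set   # clinical findings the student marked as relevant
    minutes_on_task: float   # time needed to complete the case

# Expert (case author) reference solution -- hypothetical example data.
expert_diagnoses = {"pulmonary embolism"}
expert_findings = {"dyspnea", "tachycardia", "pleuritic chest pain", "recent surgery"}

session = VPSession(
    selected_diagnoses={"pulmonary embolism", "pneumonia"},
    selected_findings={"dyspnea", "tachycardia", "fever"},
    minutes_on_task=18.0,
)

# Diagnostic accuracy: agreement of the student's diagnosis with the author's.
correct_diagnoses = session.selected_diagnoses & expert_diagnoses
diagnostic_accuracy = len(correct_diagnoses) / len(expert_diagnoses)

# Diagnostic efficiency: correctly selected diagnoses per unit of time [Braun 2019].
diagnostic_efficiency = len(correct_diagnoses) / session.minutes_on_task

# Recall: share of the author's relevant findings the student actually noticed.
recall = len(session.selected_findings & expert_findings) / len(expert_findings)

# Precision: share of the student's selections that were indeed relevant.
precision = len(session.selected_findings & expert_findings) / len(session.selected_findings)

print(f"accuracy={diagnostic_accuracy:.2f}, efficiency={diagnostic_efficiency:.2f}/min, "
      f"recall={recall:.2f}, precision={precision:.2f}")
```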
Students using VPs expect and appreciate opportunities for formative assessment and related feedback [Posel 2015]. This should come as no surprise, as case exposure does not by itself lead directly to learning, and students need guidance in making sense of their experiences [Edelbring 2011]. There are also studies showing that engagement in formative assessment in VPs is positively correlated with results of summative assessment [Seagrave 2022].
Feedback regarding activities in VPs can be either automatic (i.e. computer-based, pre-programmed) or teacher-led (e.g. in follow-up seminars) [Zary 2009]. Automated feedback has the advantage of being immediate, which is often not possible with clinical teachers in a busy workplace [Naumann 2016] or with simulated patients [Stevens 2006]. Even though adding feedback to VPs costs extra time, the effort pays off. As demonstrated in a study by Zary et al., students select VPs that are marked as containing feedback more frequently and engage with them on a deeper level (e.g. they provide elaborate answers instead of using the cases merely as a source of medical data) than with those without it [Zary 2009].
There are various forms of automatic feedback a VP can provide. Static [Sailer 2022] or neutral [Zary 2009] feedback is the display of expert opinion, leaving the comparison with the provided answers to the student. Constructive feedback is a checklist that matches and compares student answers to the expert recommendations [Zary 2009]. Students appreciate it at the end of the session as it clearly depicts what they have missed while completing the case [Al-Dosari 2017]. Consequential feedback is common in branched VPs and, instead of showing directly what is correct or wrong, displays the consequences of wrong decisions [Round 2009]. Cumulative feedback shows students' progress across several solved cases and may encourage repeated use of VPs [Hirumi 2016]. Adaptive feedback is automatically adjusted to the answer given by the student. This can be implemented by an artificial neural network trained on former experiences of what kind of guidance worked well in similar situations [Sailer 2022]. Another form is the alteration of subsequently suggested VPs, e.g. increasing or decreasing their difficulty level depending on the performance of the learner [Berman 2016]. Radon et al. recommend carefully selecting the difficulty level of formative questions in VPs, as questions that are too easy bore students and those that are too difficult tend to frustrate them. They recommend a level of 60% to 70% of the questions being answered correctly as a good balance [Radon 2011].
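As a simple illustration of two of these ideas, the sketch below first generates constructive feedback by matching a student's answers against an expert checklist, and then adjusts the difficulty of the next suggested VP so that the running rate of correct answers stays near the 60-70% band recommended by Radon et al. All function names, thresholds, and example answers are hypothetical assumptions, not features of a specific VP system.

```python
def constructive_feedback(student_answers: set, expert_checklist: set) -> str:
    """Compare student answers to the expert checklist and report what was missed."""
    matched = sorted(student_answers & expert_checklist)
    missed = sorted(expert_checklist - student_answers)
    lines = [f"You covered: {', '.join(matched) or 'none of the expected items'}."]
    if missed:
        lines.append(f"You missed: {', '.join(missed)}.")
    return "\n".join(lines)

def next_difficulty(current_level: int, correct: int, answered: int) -> int:
    """Keep the share of correctly answered questions near 60-70% [Radon 2011]."""
    if answered == 0:
        return current_level
    rate = correct / answered
    if rate > 0.70:        # too easy -> raise difficulty
        return current_level + 1
    if rate < 0.60:        # too hard -> lower difficulty (not below level 1)
        return max(1, current_level - 1)
    return current_level   # within the recommended band

print(constructive_feedback({"order d-dimer", "start antibiotics"},
                            {"order d-dimer", "order CT angiography"}))
print("next difficulty level:", next_difficulty(current_level=3, correct=5, answered=9))
```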
Caution is required when using automated feedback, even for formative purposes, as students are sensitive to unfair judgment and such feedback may result in disappointment. This became apparent in the pilot evaluation of the iCoViP project, when some students reported their disagreement with the way their concept maps integrated with the VPs were graded. A further risk is that, when the feedback contains errors, it will be difficult for the student to unlearn the incorrect information.
Teachers should not rely on automatic feedback alone, as students also expect to discuss the cases with them [Edelbring 2011]. The iCoViP project partners recommend that automatically generated feedback should be checked and extended by teachers. For example, follow-up seminars stimulate the students to post-case reflection, give opportunities to ask questions that could not be answered based on the computer-presented materials, and deepen the learning through a dialog with the teacher and peers [Posel 2015]. Formative assessment is also a valuable experience for the teachers, as it informs them directly about the VP integration while there is still time for remedy, and thereby forms part of ongoing curriculum evaluation and improvement [Wood 2019].
- What should be verified before VPs are used as a summative assessment tool?
- What are the opportunities and challenges in using VPs for summative assessment?
- What are the preferences of students for the use of VPs in assessment?
There have been several attempts to use VPs in summative assessment (e.g. [Waldmann 2008, Gunning 2012, Setrakian 2020]). Because VPs depict the context of clinical decision making, they are believed to be particularly suitable for testing the application of knowledge and problem solving [Cook 2009, Gesundheit 2009]. In this chapter we will briefly describe the opportunities, but also the challenges, of using VPs in summative assessment.
To use VPs for the purpose of summative assessment, you should be able to present evidence for their quality as an assessment instrument. The two core criteria for that are reliability and validity [Schuwirth 2019]. In terms of reliability, it should be remembered that clinical reasoning is content and context specific [Trowbridge 2015]. Making a judgment on clinical reasoning based on performance in VPs therefore requires many cases [Vleuten 2005]. Unfortunately, authoring VPs is expensive [Huang 2007]. A potential solution is to use, for assessment purposes, short VPs focusing on the key decisions in the diagnostic and treatment process - so-called electronic key-feature cases [Fischer 2005]. The strength of VPs in terms of assessment reliability, when compared with practice-oriented assessment forms such as simulated patients and real patients, is the perfect reproducibility of examination conditions for all students. An identical case presentation is impossible with humans, but easily achieved with VPs [Waldmann 2008].
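One common way to examine reliability across a set of cases is to estimate internal consistency, for example with Cronbach's alpha computed on a students-by-cases score matrix. The sketch below is a minimal illustration with invented scores; it is not taken from any of the cited studies.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (students x cases) score matrix."""
    n_cases = scores.shape[1]
    sum_case_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return n_cases / (n_cases - 1) * (1 - sum_case_variances / total_variance)

# Hypothetical scores of 6 students on 5 key-feature cases (0-10 points each).
scores = np.array([
    [8, 7, 9, 6, 8],
    [5, 6, 4, 5, 6],
    [9, 8, 9, 9, 8],
    [4, 5, 3, 4, 5],
    [7, 6, 7, 8, 7],
    [6, 7, 6, 6, 7],
])
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```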
Validity evidence of a VP-based assessment can be divided into content and construct validity [Schuwirth 2019]. Regarding content validity, the cases used in the examination should be selected to fit into an examination blueprint [Waldmann 2008]. This aims to ensure good coverage of case topics - e.g. in terms of leading symptom, disease group, and setting - and alignment with the course learning objectives [Hege 2007]. Blueprinting is usually done by an expert panel after several rounds of discussion [Mayer 2022].
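In practice, a blueprint can be represented as simply as a set of required topic combinations that the selected cases must cover. The sketch below checks a candidate case selection against such a blueprint; the attributes (leading symptom, disease group, setting) follow the text above, while the concrete values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseProfile:
    leading_symptom: str
    disease_group: str
    setting: str

# Hypothetical examination blueprint: combinations that must appear in the exam.
blueprint = {
    CaseProfile("chest pain", "cardiovascular", "emergency"),
    CaseProfile("abdominal pain", "gastrointestinal", "outpatient"),
    CaseProfile("dyspnea", "respiratory", "inpatient"),
}

# Profiles of the VPs currently selected for the examination.
selected_cases = {
    CaseProfile("chest pain", "cardiovascular", "emergency"),
    CaseProfile("dyspnea", "respiratory", "inpatient"),
}

missing = blueprint - selected_cases
if missing:
    print("Blueprint cells not yet covered:")
    for profile in sorted(missing, key=lambda p: p.leading_symptom):
        print(f"  - {profile.leading_symptom} / {profile.disease_group} / {profile.setting}")
else:
    print("All blueprint cells are covered.")
```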
The other important aspect, construct validity, requires demonstrating a correlation of the outcomes of this form of assessment with established examination methods (a so-called criterion measure). Unfortunately, this is rarely observed in studies. For instance, Waldmann et al. showed only a weak correlation when comparing VPs with MCQ-based examinations [Waldmann 2008]. The likely explanation is that a VP examination is qualitatively different from regular exams [Botezatu 2010]. MCQ-based exams are more focused on theoretical knowledge, corresponding to the lower levels of Miller's pyramid, while VPs are believed to test higher-level problem-solving abilities [Waldmann 2008]. The use of VPs is also beneficial in terms of reaching a better constructive alignment of the methods used in teaching and assessment. More emphasis is now being placed in medical education on the application of knowledge, and this should be reflected by the assessment methods [Gesundheit 2009]. Current curricula often do not assess clinical reasoning explicitly, and there is no gold standard to reference [Kononowicz 2020]. This would explain the difficulty in showing correlations with former methods of assessment and does not necessarily question the validity of VPs. This aspect still needs more research.
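Construct-validity evidence of this kind typically comes down to correlating scores from the VP-based examination with an established criterion measure. A minimal sketch, assuming both score lists are already available, is shown below; the data are invented and the choice of Spearman correlation is only one reasonable option.

```python
from scipy.stats import spearmanr

# Hypothetical scores of the same ten students on two examination formats.
vp_exam_scores  = [72, 85, 60, 90, 78, 55, 83, 67, 74, 88]
mcq_exam_scores = [70, 80, 65, 86, 70, 60, 75, 72, 68, 82]

rho, p_value = spearmanr(vp_exam_scores, mcq_exam_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A weak correlation, as reported by Waldmann et al., may reflect that the two
# formats measure qualitatively different abilities rather than a lack of validity.
```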
The idea of using VPs for summative assessment is attractive to students [Botezatu 2010]. They see it as an authentic form of examination which is relevant to practice and wish to be tested that way. For instance, students argue that VPs are more realistic in assessment because of the ability to present abnormal findings, which is not possible with standardized patients. Early-stage students are more in favor of using VPs in assessment than more advanced students [Gesundheit 2009].
There is also hope that using VPs for summative assessment will strengthen clinical reasoning abilities by the end of the curriculum, because students tend to focus their learning on what is formally assessed [McEvoy 2012]. For a fair examination, it is necessary that the students are familiar with the use of the system prior to the assessment [Gunning 2012, Waldmann 2008]. It is also better not to put the students under excessive time pressure in such examinations [Gunning 2012]. Finally, the VP system used for summative assessment should be checked for security, failure tolerance, and compliance with legal regulations [Haag 2010].
In summary, VPs have potential in summative assessment of clinical reasoning. However, judgment should not be made based on performance on a limited number of VPs, and it should not be the only assessment form used. The high acceptance of this assessment instrument among students is promising and encourages further development. At this stage, none of the iCoViP partners uses the project's VPs for summative assessment purposes; they see them more as a tool for formative than summative assessment.
- What is the purpose of using learning analytics in VPs?
- What data related to virtual patients is suitable for learning analytics and how to process and display it to the users?
- What are the risks of using learning analytics in VPs?
The learning analytics process should be planned; otherwise we risk getting lost in the abundance of data [Lang 2011]. A rational approach is to use learning theories to pose a priori hypotheses that are then verified with the collected data [Cirigliano 2020]. It is important to decide what data to select for analysis. Examples of low-level, fine-grained activities in VPs that are frequently recorded and can later be used for learning analytics are time spent on tasks, mouse clicks on external links, zooming in on images, or answering questions [Kononowicz 2015, Berman 2018, Cirigliano 2020]. The recorded activities can already have some inherent meaning related to the trained competency. In the case of clinical reasoning training, such activities include identifying relevant clinical findings, building and prioritizing differential diagnoses, making connections between clinical observations and hypotheses, writing summary statements, selecting diagnostic tests, committing to a final diagnosis, and declaring confidence levels in decision-making [Doleck 2016, Berman 2018, Hege 2018]. The time sequence in which student actions happened is also important [Cendan 2012, Doleck 2016]. The quality of those activities can be automatically assessed by checking against declared standards using sensitivity metrics (i.e., how much of the relevant information contained in each section the student was able to find) and precision metrics (i.e., how many of the actions performed by the student were considered correct) [Furlan 2022], or by checking the degree of similarity to the expert solution using artificial intelligence methods - e.g. clustering or natural language processing [Berman 2018, Hege 2020]. Many VP systems allow exporting the recorded activity data in popular formats to enable learning analytics in external specialized statistical, machine learning, or visualization tools.
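The sketch below shows how such fine-grained activity records might be scored against a declared standard: each logged action carries a timestamp and a section, and sensitivity and precision are computed per section, in the spirit of Furlan et al. The event format, section names, and actions are assumptions for illustration, not the log format of a specific VP system.

```python
from collections import defaultdict

# Hypothetical activity log: (timestamp in seconds, section, action identifier).
activity_log = [
    (12,  "history taking",       "ask:chest pain onset"),
    (45,  "history taking",       "ask:favorite food"),
    (80,  "physical examination", "auscultate lungs"),
    (130, "diagnostic tests",     "order d-dimer"),
    (150, "diagnostic tests",     "order abdominal x-ray"),
]

# Declared standard: actions considered relevant in each section.
standard = {
    "history taking":       {"ask:chest pain onset", "ask:risk factors"},
    "physical examination": {"auscultate lungs", "check leg swelling"},
    "diagnostic tests":     {"order d-dimer", "order CT angiography"},
}

performed = defaultdict(set)
for _, section, action in sorted(activity_log):   # keep the time sequence
    performed[section].add(action)

for section, expected in standard.items():
    done = performed.get(section, set())
    sensitivity = len(done & expected) / len(expected)            # relevant info found
    precision = len(done & expected) / len(done) if done else 0.0 # correct share of actions
    print(f"{section}: sensitivity={sensitivity:.2f}, precision={precision:.2f}")
```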
When planning the use of learning analytics in VPs, one should consider who the target group of the outcomes is [Knight 2017]. The audience could be the learners themselves. In such cases, results may illustrate to the students their progress in relation to their peers. This is intended to motivate them to become more engaged in their learning or to seek help from their teachers when struggling. For instance, Berman and Artino constructed a metric to judge student engagement while learning with VPs based on time spent on cards, accuracy in answering multiple-choice questions, the number of added key findings and differential diagnoses, as well as a machine-learning rating of summary statements [Berman 2018]. The metric was displayed for the learner in traffic-light colors indicating low, medium, or high engagement. The conclusion from the iCoViP pilot studies is to put learning analytics into context and not to compare students who are very different - e.g., at different stages of their education - as this might be frustrating or contribute to lower self-esteem of the students. Furlan et al. proposed displaying learner performance in key aspects of clinical reasoning (such as history taking, physical examination or hypothesis generation) as radar graphs in relation to the best, worst, and average performance in the student cohort [Furlan 2022]. The comparison of learner actions can also be made relative to an expert performance. For instance, in a clinical reasoning tool developed by Hege et al., the similarity of the learner's clinical reasoning concept maps to the expert maps is displayed as percentage charts in a dashboard [Hege 2017].
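A traffic-light style engagement indicator of the kind described by Berman and Artino could be approximated by combining a few normalized signals into a single score, as in the sketch below. The signals, weights, and thresholds are illustrative assumptions only and do not reproduce the published metric.

```python
def engagement_label(minutes_on_cards: float,
                     mcq_accuracy: float,
                     n_findings_and_ddx: int,
                     summary_rating: float) -> str:
    """Combine signals (each mapped to 0..1) into a traffic-light engagement label."""
    time_signal = min(minutes_on_cards / 20.0, 1.0)      # assume ~20 min = full engagement
    volume_signal = min(n_findings_and_ddx / 10.0, 1.0)  # assume ~10 items = full engagement
    score = 0.25 * (time_signal + mcq_accuracy + volume_signal + summary_rating)
    if score >= 0.7:
        return "green (high engagement)"
    if score >= 0.4:
        return "yellow (medium engagement)"
    return "red (low engagement)"

print(engagement_label(minutes_on_cards=15, mcq_accuracy=0.8,
                       n_findings_and_ddx=7, summary_rating=0.6))
```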
The other possibility is to make teachers and faculty staff the target group of learning analytics. For them, such data is useful for identifying struggling students in need of supporting actions, for judging the quality of the education they provide, and for understanding the learning process. Cendan and Lok developed a learning analytics tool that presents on a time axis the individual actions taken by the student during a patient scenario. This allows teachers to visually inspect the process that led the student to the final diagnosis, identify outliers, and seek patterns in a way that would previously have required time-consuming video analysis [Cendan 2012]. Based on the collected data, computer algorithms can look for correlations between the use of particular elements and learning performance indicators. For example, after the iCoViP pilot evaluation, a few VPs drew our attention because of an unrealistically low diagnostic accuracy of students. Based on that, we corrected design errors that had crept into the authoring process of these VPs. A study by Cirigliano et al. showed that students rushing through VPs are likely to score low when answering related multiple-choice questions. Underutilization of particular VP elements, or their poor correlation with success indicators, allows making judgments on the quality of the material and planning refinements. The same authors also used learning analytics to question the utility of some external links and expert comments that were not correlated with success on subsequent multiple-choice questions [Cirigliano 2020]. Finally, learning analytics is also a tool for virtual patient researchers. Hege et al. deepened the understanding of clinical reasoning learning by applying learning analytics methods to a large collection of learner activities in VPs, showing differences between learners who work on the virtual patient until they find the correct diagnosis and those who give up and request that the answer be displayed by the system [Hege 2018].
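Correlation analyses of this kind require only a few lines of standard tooling. The sketch below relates time spent on a VP to the score on subsequent multiple-choice questions and flags possible "rushing" of the type reported by Cirigliano et al.; the data, thresholds, and variable names are invented for illustration.

```python
from statistics import correlation, mean

# Hypothetical per-student data for one VP: minutes spent and follow-up MCQ score (0-1).
minutes_on_case = [4, 25, 18, 6, 30, 22, 5, 16]
mcq_score       = [0.3, 0.9, 0.7, 0.4, 0.8, 0.9, 0.2, 0.6]

r = correlation(minutes_on_case, mcq_score)   # Pearson correlation (Python 3.10+)
print(f"correlation between time on case and MCQ score: r = {r:.2f}")

# Flag students who rushed through the case and scored poorly afterwards.
rushed = [i for i, (t, s) in enumerate(zip(minutes_on_case, mcq_score))
          if t < 10 and s < 0.5]
print("students flagged as possibly rushing:", rushed)
print("cohort mean MCQ score:", round(mean(mcq_score), 2))
```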
The use of learning analytics is also connected with risks. The introduction of tools like engagement metrics is a form of extrinsic motivation [Berman 2018]. This is likely to steer learning towards improving the scores, but may not be connected with deeper learning, and in the worst case may result in superficial learning to game the system. The use of learning analytics may also compromise the feeling of safety in the learning environment. Students should be aware of which of their activities are tracked and retain the right to delete the recordings of their activities while learning. Learning analytics may turn VPs more into an assessment tool, in which students will be unlikely to show a candid approach of reflection and experimentation [Cirigliano 2017]. Labeling students based on learning analytics may do more harm than good when there is no firm evidence of the validity of such a metric. This may backfire in discouragement and frustration when the system prematurely judges the student as not capable of reaching the desired performance level. Learning analytics is good at showing “what” is happening, but not necessarily “why” [Chan 2018]. A long time spent on an activity may be caused by distraction of the student, but also by reflection, consulting a textbook, or offline group work. It is recommended to triangulate the conclusions suggested by learning analytics with other sources of data that reflect students' intentions, such as verbal protocols, surveys, or interviews [Cirigliano 2017].
In summary, the collection of large amounts of data in the process of learning with VPs and its analysis using statistical and machine learning methods has much potential for learner motivation, identification of struggling students, quality improvement of the VPs, and a better understanding of the learning process. However, learning analytics should be used cautiously so as not to violate the safety of learning, induce superficial learning, or make unwarranted judgments or predictions.
Recommendations:
- VPs are a form of assessment regarded by students as authentic and can be used to assess clinical reasoning.
- If you decide to use VPs for summative assessment, you should be convinced about the validity and reliability of the VPs used in the examination.
- Do not make assessment decisions based on a small sample of VPs, due to the problem of case specificity.
- Use learning analytics for student motivation, feedback, and quality control.
- Be careful that learning analytics does not impact the students' feeling of safety while learning with VPs; be open about what is recorded, how, and why.
- Take advantage of the different forms of formative assessment available in VPs that enable the students to compare their decisions with author answers and checklists, or to analyze exemplar responses.
- Artificial intelligence and natural language processing enable increasingly sophisticated forms of formative assessment that can be adaptive and include feedback on open-ended answers, but remember that incorrect feedback is harmful, and you should retain a healthy dose of skepticism before you trust innovations in automatic feedback.
- Computer-generated feedback should be reviewed and extended by human teachers to ensure its quality.