Ben Williams, a chartered occupational psychologist at Sten10, explores how candidates can use ChatGPT to cheat in assessments, and some of the mitigations being deployed.
We don’t know for sure how prevalent cheating using ChatGPT is in recruitment. A study by Cibyl found that half of students are using ChatGPT in their studies and nearly half expect to use it for job applications.
However, other research (University of Madrid and Alpha Academic Appeals) shows that cheating in recruitment settings is typically only around 15% – people don’t automatically default to cheating just because they can.
How much easier ChatGPT makes it to cheat in a recruitment process varies across assessment methods. For example, while covering letters and application forms are easily open to abuse, assessment centres are pretty robust against ChatGPT.
There is no universally accepted solution to combating the use of ChatGPT. Different providers will offer different solutions: only time (and validation studies) will tell how effective they are. Here are some of the more common ones currently being proposed and discussed.
3 mitigations to counter the assessment cheating power of ChatGPT
1. Encouraging, and checking, honesty
Within personality profiling, it has long been known that simply asking people to be honest in their responses makes them more likely to respond truthfully.
Some firms now go further and ask candidates to digitally sign an ‘honesty contract’, which draws on principles from behavioural science by asking people to take an active step in committing to an action.
A more positive approach is to explain the benefits of being honest and the insights this can provide both the employer and the candidate in achieving a great role match; this too can help promote truthfulness.
In terms of checking honesty, good assessment practice says that every skill, ability or behaviour you are assessing should be assessed more than once in the selection process. Let candidates know that this will be done so there is no point in ‘cheating’.
Some firms claim to be able to ‘detect’ the use of AI via various ‘signals’: for example, overly perfect responses, a lack of personal content, or the speed of candidate responses. As this is such a fast-evolving field, there is already scepticism about how effective these detection methods are, and will continue to be, as ChatGPT evolves.
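To make the idea of detection ‘signals’ concrete, here is a minimal sketch of the kind of heuristics described above. The function name, thresholds and signal labels are all invented for illustration; no real provider’s method is being described, and such crude rules would need proper calibration and validation before any use.

```python
# Illustrative sketch only: crude heuristics of the kind described above.
# All thresholds and signal names here are hypothetical, not from any product.

def flag_suspicious_response(text: str, seconds_taken: float) -> list[str]:
    """Return a list of heuristic 'signals' that a response may be AI-assisted."""
    flags = []
    words = text.split()

    # Signal 1: implausibly fast production of text for the answer's length
    # (120 words per minute used as a generous upper bound for typing).
    if seconds_taken > 0 and len(words) / (seconds_taken / 60) > 120:
        flags.append("response_speed")

    # Signal 2: lack of personal content - no first-person pronouns at all.
    first_person = {"i", "me", "my", "we", "our"}
    if not any(w.strip(".,!?").lower() in first_person for w in words):
        flags.append("no_personal_content")

    # Signal 3: 'overly perfect' structure - every sentence long and uniform.
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) >= 3 and max(lengths) - min(lengths) <= 3 and min(lengths) >= 15:
        flags.append("uniform_sentences")

    return flags
```

Even in this toy form, the weakness noted above is visible: each threshold is a guess, and a candidate (or a newer model) can adapt to it, which is exactly why scepticism about detection persists.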
2. Test administration

Firms have been asking people to verify their identity online for some time. However, there are other things you can consider around the administration of tests that could give some ‘quick wins’ as well as some longer-term solutions.
Regarding invigilation, some companies have adopted remote proctoring (supervision) of tests – either via a webcam with a human observer, or via artificial intelligence that can detect if someone moves to a different browser window, or if someone else comes into the room. Other firms have brought back in-person testing for certain core skills.
Time will tell how comfortable candidates feel being monitored. In the past it was the norm for this to happen face-to-face, but people may resent being remotely monitored.
Some test platforms allow you to disable the copy/paste function, which makes it more onerous to cheat using ChatGPT. Setting time limits helps, as does providing the questions live in ‘real time’ which makes preparatory cheating using ChatGPT more difficult.
Interactive elements (e.g. asking for a video response, or even incorporating a live discussion with an assessor) make it more difficult to cheat, which is why assessment centres are robust against ChatGPT.
Also consider how much information you reveal in advance about the assessment and what you are looking for. On the one hand, revealing the competency model you use gives candidates an idea about how to prepare. Yet at the same time, you are also revealing the ‘right answer’, which can be entered into ChatGPT as something for the AI to consider when generating a response.
Likewise, revealing interview questions in advance can help neurodiverse candidates prepare. However, it also provides useful information for candidates looking to ‘cheat’. It is a difficult balance to strike.
3. Assessment design and assessment suite choice
In terms of assessment design, it is currently much more laborious to use ChatGPT under timed conditions if a test includes multiple different question types: for example, rank order, multiple choice and free text. If you incorporate audio, video and images too, then it becomes more difficult again (if not impossible with current technology, although this could well change).
For earlier stages in the process, such as personal statements and application forms, one recommendation is to ask very specifically for personal examples drawn from the candidate’s employment history (coupled with a warning that these will be probed later in the process). ChatGPT would need to fabricate something, which the candidate may be hesitant to rely on.
Some assessment providers use ‘metadata’ about how someone has completed the test, as well as (or instead of) their ‘score’. For example, looking at reaction times when answering questions or mouse movement patterns. These tests are many and varied, so it is hard to generalise. Some look quite work-based, others more ‘game-based’.
As with any good assessment, you should ask for strong evidence of validity and be sure it is possible to explain why this metadata links to job performance if it were challenged by candidates or legally. For those tests that look quite different to a work task, consider applicant reactions: some might welcome it as a breath of fresh air, while others may question the test’s relevance.
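One simple example of the response-time ‘metadata’ mentioned above can be sketched as follows. The function and both thresholds are hypothetical, purely to illustrate the concept; a real provider would calibrate any such rule against norm data and, as noted above, validate it against job performance before relying on it.

```python
# A minimal sketch of one kind of 'metadata' check: per-item response times.
# Thresholds are invented for illustration and would need proper calibration.

from statistics import mean, pstdev

def timing_flags(item_times: list[float]) -> list[str]:
    """Heuristic flags from a candidate's per-question completion times (seconds)."""
    flags = []
    avg = mean(item_times)

    # Pasting pre-prepared answers tends to produce very short times per item.
    if avg < 5.0:
        flags.append("very_fast_average")

    # Humans vary: near-identical times on every item can suggest automation.
    if len(item_times) >= 5 and pstdev(item_times) < 1.0:
        flags.append("unnaturally_uniform")

    return flags
```

A flagged profile would not be proof of cheating, only a prompt for further checks, which is why the evidential link to job performance matters if the rule is ever challenged.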
It can feel a little unsettling to be a recruiter in the current climate, unsure how best to respond to the threat of ChatGPT. It is early days, and it is important to be vigilant and to implement changes where there are obvious flaws.
At the same time, the ultimate test of the size of the threat, and of the effectiveness of current assessments and any mitigations, is what the data tells us: are selection processes still predicting performance in the job? The answer will tell you what action you need to take to counter it.
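The check described above, whether assessment scores still predict job performance, is usually summarised as a correlation. A minimal sketch, with the data assumed to be hypothetical cohorts of scores and later performance ratings:

```python
# Sketch of a predictive-validity check: correlate assessment scores with
# later job performance ratings. Any data fed in here would be hypothetical.

from statistics import mean

def pearson_r(scores: list[float], performance: list[float]) -> float:
    """Pearson correlation between assessment scores and performance ratings."""
    mx, my = mean(scores), mean(performance)
    cov = sum((x - mx) * (y - my) for x, y in zip(scores, performance))
    var_x = sum((x - mx) ** 2 for x in scores)
    var_y = sum((y - my) ** 2 for y in performance)
    return cov / (var_x ** 0.5 * var_y ** 0.5)
```

Comparing this coefficient for cohorts assessed before and after ChatGPT became widely available would, in principle, show whether an assessment’s predictive power is being eroded, which is the data-led answer the paragraph above points to.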