The Evaluation Task Force has recently published a new annex to
the Magenta Book covering best practice for impact evaluation
of AI tools and technologies.
In December the Evaluation Task Force published a new annex to
the Magenta Book, focusing on best practice for evaluating the
impact of AI tools and technologies. The guidance will enhance
the safety and confidence with which government departments and
agencies can adopt AI technologies, ensuring that public sector
innovation keeps pace with the private sector. It reflects an
understanding of the unique challenges posed by AI and the need
for tailored approaches to address them.
The guidance has been co-produced with the Department for
Transport and Frontier Economics, in consultation with leading AI
specialists. It is expected to be a valuable resource for
policymakers, public sector professionals, and digital
specialists working to integrate AI solutions into government
operations. Moving forward, the guidance will be co-owned with
the Central Digital and Data Office (CDDO).
What does the guidance cover?
The guidance details best practice, including evaluation design,
methodology, and timing, for evaluating the impact of new AI
tools and technologies being introduced in the public sector. In
particular, it advocates for the use of randomised controlled
trials (RCTs) when testing a new AI product to produce
high-quality evidence on
the intended and unintended impacts of introducing these new
technologies. The guidance also includes a series of hypothetical
case studies to illustrate possible high-quality approaches to
evaluating the impact of different types of AI tools.
Please note: this guidance does not address how to evaluate the
quality, safety and accuracy of new AI tools. That process,
typically referred to as “model evaluation” or assurance
activity, is usually carried out by Digital, Data and
Technology (DDaT) professionals rather than social researchers.
Instead, the new AI guidance focuses on the impact of AI tools on
decisions and outcomes. An example of an impact evaluation of an
AI tool can be found here,
and an example of a model evaluation of an AI tool can be found
here.
Why is this guidance important?
Recent growth in the capabilities of Artificial Intelligence (AI)
technologies has led to increased interest in the use of AI in
government. Robustly evaluating AI use in government (covering
process, impact and value for money questions) is essential to
understanding the effects of new AI systems, improving current
interventions, and informing future policy development. By
providing a framework
for assessing the impact and effectiveness of AI tools, the
guidance underscores the government's commitment to maintaining
high standards of evaluation and accountability in its use of
emerging technologies.
What happens next?
The Evaluation Task Force will be working with CDDO to help embed
evaluation best-practice in digital processes across Government,
and working to support colleagues designing and delivering impact
evaluations of AI interventions. If you have a project or piece
of work related to AI that you'd like to discuss, you can get in
touch with the Evaluation Task Force at:
etf@cabinetoffice.gov.uk.
Useful links
Examples of best practice
Model testing and development