How schools and universities are introducing AI tutors at scale

At Bloom AI, we’ve worked with schools, universities and other education institutions globally to introduce AI tutors to tens of thousands of students. Along the way, we’ve seen what works and what doesn’t when bringing AI into real classrooms. Here are five lessons that can help institutions deploy AI securely, responsibly, and with measurable impact.

A shorter version of this article was originally published as a LinkedIn post by Educate Ventures Research.

1. Measure impact holistically.

The first mistake I see is not measuring anything at all. At the very least, there should be some hypotheses about potential impacts, and ways to measure them. The second mistake I see is taking a single-minded, uncritical approach to measuring impact. Many people look only at student engagement numbers (e.g. the number of questions asked). But there is a whole suite of potential impacts: student retention, course satisfaction, educator workload, learning outcomes. One dimension is often not enough. You want a mixed-methods evaluation, combining quantitative and qualitative evidence. Here are some examples of ways you could assess impact:

  • Surveys, interviews and focus groups with both students and educators
  • Analyses of student retention, assessment grades and learning management system (LMS) data
  • De-identified analyses of student and staff chat logs (e.g. thematic or intent analysis)

Any one of these by itself is inadequate to capture the full picture of impact, which is messy and complex when we’re dealing with real-world deployments. It’s very unlikely you’ll have the chance to perform a randomised controlled trial (RCT), and even RCTs have their own issues, such as equity concerns and measurement challenges.
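To make the chat-log analysis above concrete, here is a minimal sketch of a keyword-based intent analysis over already de-identified messages. The intent categories and keyword rules are hypothetical illustrations, not Bloom’s actual taxonomy; a real pipeline would need proper de-identification upstream and would likely use a trained classifier rather than keyword matching:

```python
from collections import Counter

# Illustrative intent categories and keyword rules (assumptions for this
# sketch, not Bloom's actual taxonomy).
INTENT_RULES = {
    "answer_seeking": ["give me the answer", "solve this", "what is the answer"],
    "concept_clarification": ["explain", "why does", "how does", "what does"],
    "practice_request": ["quiz me", "test me", "practice"],
}

def classify_intent(message: str) -> str:
    """Assign a message to the first matching intent, else 'other'."""
    text = message.lower()
    for intent, keywords in INTENT_RULES.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "other"

def intent_distribution(messages: list[str]) -> Counter:
    """Count intents across a batch of already de-identified messages."""
    return Counter(classify_intent(m) for m in messages)

if __name__ == "__main__":
    sample = [
        "Can you explain why the derivative of x^2 is 2x?",
        "Just give me the answer to question 3",
        "Quiz me on supply and demand",
    ]
    print(intent_distribution(sample))
```

In practice, you would pair an automated pass like this with human thematic coding of a sample of conversations, so the categories reflect what students actually do rather than what you expected them to do.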

An important point to call out: it’s not enough to simply correlate assessment grades with usage of an AI tool. At the risk of stating the obvious, correlation does not imply causation. Many headlines claim that “AI helped boost student grades by X%”, but when you dig into the methodology, they are simply comparing two self-selected groups. It could very well be that the motivated students are the ones more likely to use AI. Controlling for past performance is not enough to fix this self-selection bias. That said, such correlations are still valuable as part of the broader suite of evaluation.
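To see why, consider a small simulation (a toy example of my own, not data from any real deployment). Motivation is modelled as a hidden confounder that drives both AI usage and grades; the tool itself has zero true effect, yet a naive comparison of users against non-users still reports a “boost”:

```python
import math
import random

random.seed(42)

# Toy model: a latent "motivation" trait drives BOTH AI usage and grades.
# The AI tool itself has zero true effect on grades in this simulation.
def simulate_student():
    motivation = random.gauss(0.0, 1.0)
    p_use = 1.0 / (1.0 + math.exp(-2.0 * motivation))  # motivated students opt in more
    uses_ai = random.random() < p_use
    grade = 70.0 + 8.0 * motivation + random.gauss(0.0, 5.0)  # depends on motivation only
    return uses_ai, grade

students = [simulate_student() for _ in range(10_000)]
users = [grade for uses_ai, grade in students if uses_ai]
non_users = [grade for uses_ai, grade in students if not uses_ai]

# A naive group comparison reports a spurious "AI boost" of several points.
print(f"Mean grade, AI users:  {sum(users) / len(users):.1f}")
print(f"Mean grade, non-users: {sum(non_users) / len(non_users):.1f}")
```

Controlling for a noisy proxy of past performance would shrink this gap but not eliminate it, which is exactly why such correlations should only ever be one input among many.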

2. Separate productivity and learning.

AI for staff and AI for students are entirely different games. Deploying AI for staff is usually about enhancing productivity, while student-facing AI should be about improving and deepening learning. These are very different use cases, and they require different tools, metrics and processes.

We’ve all heard about the potential time-saving benefits of AI as an assistant. For example, this report from the Walton Family Foundation and Gallup surveyed teachers, and those who used AI weekly reported saving an average of 5.9 hours a week. But do we want these same time savings for students?

For students, time saved often means learning lost. Generic AI is built to be an assistant that gives you the complete answer. It makes doing more efficient. But learning isn’t about efficiency. Generic AI takes away the productive struggle required for deep learning. I call this the doing-learning tradeoff. The same feature that helps staff is a bug for students.

The calculator analogy is instructive. Yes, most adults now carry a calculator in their pockets. For most of us, using a calculator is much faster than computing by hand. Yet we do not give the most powerful calculator to a first grader. We only introduce the tool once a foundation is established and the focus is no longer learning the times tables, but higher-level concepts.

This leads us to our next point.

3. Care about learning.

Implementing AI in education carries an added difficulty compared to other industries: it’s not just about productivity. More than just recognising the difference between AI for productivity and AI for learning, institutions need to actually care.

AI tutors built for learning are not the same as ChatGPT plugged into a course. There is growing evidence that giving students generic AI without guardrails can harm learning; see, for example, Bastani et al. (my overview here). The use of generic AI has also been associated with lower brain activity. Institutions need to think about how to safeguard students not only from over-reliance and hallucinations, but also from the many other emerging risks. AI tools built for learning will, among other things, foster productive struggle.

It’s not enough to just have a “learning mode”. ChatGPT and Perplexity have Study Mode, Gemini has Guided Learning, and Claude has a Learning style. But when the full answers to a student’s homework are one toggle away, guess what they will choose?

4. Walk before you run, but never stand still.

The institutions that make the most progress aren’t the ones that wait for a perfect, centralised rollout. They start small, test fast, and learn faster. Pilots build capability, trust, and momentum for scale. Doing nothing is the only wrong move. Joanne Villis, Director of Technology Enrichment at St Dominic’s Priory College in Adelaide, Australia, has introduced AI tutors through a phased approach over the past two years. The pilot began with a group of Year 12 psychology and physics students and their supportive teachers, and ran over the course of a term, with clear evaluation metrics and feedback from students. It acted as a low-stakes experiment with clear benefits:

  • Developing AI implementation capability among the staff
  • Collecting institution-specific data to inform future larger-scale implementation
  • Building trust with senior leadership through authentic student and educator stories

Institutions, like learners, need small, scaffolded challenges that build confidence and conviction.

5. Bring educators on the journey.

Too often, AI initiatives are led by well-meaning administrators or central teams who don’t get to see how it plays out day-to-day in the classroom. The best results we’ve seen come when passionate, empowered educators are in the room to shape deployment and adoption.

Ideally, educators co-design the solution. Professor Florian Breuer from the University of Newcastle in Australia started introducing AI tutors in his large first-year mathematics course. Because he had a voice in the room, he surfaced two critical insights that were institution- and course-specific, and that would not have been foreseen through a top-down implementation. First, many of his students were regional and working full-time, which explained the initially lower-than-expected uptake; in response, we developed specific strategies to increase uptake of the AI tutor. Second, first-year students struggled with mathematical notation, something the AI tutor could not handle at the time. With his input, we built a mathematics editor that went on to become one of Bloom’s most popular features.


Gary Liang (LinkedIn) is the Founder & CEO of Bloom AI, an AI tutor platform for schools and universities. He has worked with schools and universities globally to deploy AI tutors to tens of thousands of students. He previously founded a high school tutoring company, worked as a management consultant and researcher at McKinsey & Company, and taught economics and mathematics at UNSW and the University of Sydney.