One of my students recently came to me with an interesting dilemma. His sister had written (without AI tools) an essay for another class, and her teacher told her that an "AI detection tool" had classified it as having been written by AI with "100% confidence". He was going to give her a zero on the assignment.
Putting aside the ludicrous confidence score, the student's question was: how could his sister convince the teacher she had actually written the essay herself? My only suggestion was for her to ask the teacher to sit down with her and have a 30-60 minute oral discussion on the essay so she could demonstrate she in fact knew the material. It's a dilemma that an increasing number of honest students will face, unfortunately.
My son recently told me his teacher used him as an example for the class as someone who wrote a good piece himself. The teacher accused all the other students of using AI.
My son told me that he had in fact used AI, but asked AI multiple times to simplify the text, and he had entered the simplified version. He liked the first version best, but was aware his teacher would consider it written by AI.
Exactly this. It really is this easy. You have the full class period to write an essay on the economic causes of the Civil War... or on gender roles in Pride and Prejudice, or on similarities and differences in morality between Stoic ideals and Christianity in the Roman Empire. Kind of like most of my 90's era college experience.
Nothing. Word is getting around about how to do this. I anticipate that in another couple of years it'll have diffused to everyone, except the constant crew of new younglings who have to find out and be told about it from their older siblings and such.
"AI detection" wasn't even a solution in the short term and it won't be going forward. Take-home essays are dead, the teachers are collectively just hoping some superhero will swoop in and somehow save them. Sometimes such a thing is possible, but it isn't going to happen this time.
I wouldn't mind seeing education return to its roots of being about learning instead of credentialization. In an age where having a degree is increasingly meaningless in part due to many places simply becoming thinly veiled diploma treadmills (which are somehow nonetheless accredited), this is probably more important than ever. This is doubly so if the AI impact extremists end up being correct.
So why is the issue you described an issue? Because it's about a grade. And the reason that's relevant is because that credential will then be used to determine where she can go to university which, in turn, is a credential that will determine her breadth of options for starting her career, and so on. But why is this all done by credentials instead of simple demonstrations of skill? What somebody scored in a high school writing class should matter far less than the output somebody is capable of producing when given a prompt and an hour in a closed setting. This is how you used to apply to colleges. Here [1], for instance, is Harvard's exam from 1869. If you pass it, you're in. Simple as that.
Obviously this creates a problem of institutions starting to 'teach to the test', but with sufficiently broad testing I don't see this as a problem. If a writing class can teach somebody to write a compelling essay based on an arbitrary prompt, then that was simply a good writing class! As an aside, this would also add a major selling point to all of the top universities that offer free educational courses online. Right now I think 'normal' people are mostly disinterested in those because of the lack of widely accepted credentials, which is just so backwards - people are actively seeking to maximize credentials over maximizing learning.
This is one of the very few places I think big tech in the US has done a great job. Coding interviews can be justifiably critiqued in many ways, but it's still a much better system than raw credentialization.
> In an age where having a degree is increasingly meaningless
I wish I could agree with you, but I think that having a degree (or rather the right degree) is more important than ever.
Basically grades exist to decide who gets a laid-back, high-paying job, and who has to work two low-paying, labor-intensive jobs just to live paycheck to paycheck.
As one teacher told me once: we could have all of you practice chess, make a big tournament and you get to choose your university based on your chess ranking. It wouldn't be any less stupid than the current system.
I still don't understand why standardized testing gets so much pushback. Having the students do their work in a controlled environment is the obvious solution to AI and many other problems related to academic integrity.
It's also the only way that students can actually be held to the same standards. When I was a freshman in college with a 3.4 high school GPA, I was absolutely gobsmacked by how many kids with perfect >= 4.0 GPAs couldn't pass the simple algebra test that the university administered to all undergraduates as a prerequisite for taking any advanced mathematics course.
Nah. Goodhart's law is literally just "if you play a matrix game don't announce your pick in advance". It is not a real law, or not different from common sense. (By matrix game I mean what wiki calls "Normal form game[0]", e.g. rock-paper-scissors or prisoner's dilemma.)
In education, regarding exams, Goodhart's law just means that you should randomize your test questions instead of telling the students the questions before the exam. Have a wide set of questions, randomize them. The only way for students to pass is to learn the material.
A randomized standardized test is not more susceptible to Goodhart's law than a randomized personal test. The latter however has many additional problems.
That's not even remotely true. A randomized standardized test will still have some domain that it chooses its questions from and that domain will be perfectly susceptible to Goodhart's Law. It is already the case that no one is literally teaching "On the SAT you're going to get this problem about triangle similarity and the answer is C." When a fresh batch of students sits down in front of some year's SATs the test is still effectively "randomized" relative to the education they received. But that randomization is relative to a rigid standardized curriculum and the teaching was absolutely Goodhart'd relative to that curriculum.
"The only way for students to pass is to learn the material."
Part of Goodhart's law in this context is precisely that it overdetermines "the material" and there is no way around this.
I wish Goodhart's law was as easy to dodge as you think it is, but it isn't.
I do not believe schooling is purely an exercise in knowledge transfer, especially grade school.
School needs to provide opportunities to practice applying important skills like empathy, tenacity, self-regulation, creativity, patience, collaboration, critical thinking, and others that cannot be assessed using a multiple choice quiz taken in silence. When funding is tied to performance on trivia, all of the above suffers.
Well, for one thing, people learn differently and comparing a "standard" test result just measures how much crap someone has been able to cram into their brain. I compare it to people memorizing trivia for Jeopardy. Instead what needs to be tested and taught is critical thinking. Yes, a general idea of history and other subjects is important, but again it's teaching people to think about those subjects, not just memorizing a bunch of dates that will be forgotten the day after the test.
you cannot possibly do any higher level of analysis of any subject if you don't even know the base facts. It's the equivalent of saying you don't need to know your times tables to do physics. Like, theoretically it's possible to look up 4x6 every time you need to do arithmetic, but why would you not just memorize it?
If you don't even know that the American Civil War ended in 1865, how could you do any meaningful analysis of its downstream implications or causes and its relationship to other events?
More important than knowing what 4x6 is, is understanding what multiplication is, why division is really the same operation, understanding commutative, associative, distributive properties of operations, etc. All of this comes as a result of repeated drilling of multiplication problem sets. Once this has been assimilated, you can move on to more abstract concepts that build on that foundation, and at that point sure you can use a calculator to work out the product of two integers as a convenience.
>What somebody scored in a high school writing class should matter far less than the output somebody is capable of producing when given a prompt and an hour in a closed setting
Right, in an ideal world we'd peer into the minds of people and compute what they know. But if we did that, our eyes would probably catch on fire like that lady in Kingdom of the Crystal Skull.
We need some way to distill the unbelievable amount of data in human brains into something that can be processed in a reasonable amount of time. We need a measurement - a degree, a GPA, something.
Imagine if in every job interview they could assume absolutely nothing. They know nothing about your education. They might start by asking you to recite your ABCs and then, finally at sunset, you might get to a coding exam. Which still won't work, because you'll just AI cheat the coding exam.
We require gatekeepers to make the system work. If we allow the gatekeepers to just rubber stamp based on whether stuff seems correct, that tells us nothing about the person themselves. We want the measurement to get close to the real understanding.
That means AI papers have to be given a 0, which means we need to know if something is AI generated. And we want to catch this at the education level, not above.
I did have interviews with a government agency many years ago that, among other things, involved a battery of tests including what I assume were foreign civil service exams. I got an offer though I didn't take it.
But assuming in-person day long batteries of tests for universities and companies is probably not very practical.
You can argue whether university is a very efficient use of time or money but it presumably does involve some learning and offers potential employers some level of a filter that roughly aligns with what they're looking for.
In a world where some but not all programs are “diploma treadmills,” you would expect that the reputation of the bad credentials would go down and the good credentials would go up. In some sense if the credentials were really being used (and not just as a perfunctory first pass elimination), you’d expect the most elite programs to have the highest signal to noise ratio. But the market doesn’t seem to respond to changes in credentialing capability (by hiring more from programs that start focusing on the “right” things to test). Instead it’s really just a background check.
> you would expect that the reputation of the bad credentials would go down and the good credentials would go up.
We should expect this if employers can efficiently and objectively evaluate a candidate's skills without relying on credentials. When they're unable to, we should worry about this information asymmetry leading to a "market for lemons" [0]. I found an article [1] about how this could play out:
> This scenario leads to a clear case of information asymmetry since only the graduate knows whether their degree reflects real proficiency, while employers have no reliable way to verify this. This mirrors the classic “Market for Lemons” concept introduced by economist George Akerlof in 1970, where the presence of low-quality goods (or in this case, under-skilled graduates) drives down the perceived value of all goods, due to a lack of trustworthy signals.
In the US, it's also because there are so many options that it's not feasible to have a clear ranking of schools outside of the extreme ends of the spectrum.
> This is one of the very few places I think big tech in the US has done a great job. Coding interviews can be justifiably critiqued in many ways, but it's still a much better system than raw credentialization.
Just so we're clear, the coding tests are in addition to credentialisation. I'll never forget when I worked at Big Tech (from Ireland) and I would constantly hear recruiters talk about the OK school list (basically the Ivy league). Additionally, I remember having to check the University a candidate had attended before she had an interview with one of our directors.
He was fine with her, because she had gone to Oxford. Honestly, I'm surprised that I was able to get hired there given all this nonsense.
My experience with big tech has been the polar opposite - nobody has ever cared and I've never tried to hide it either. Which one was it if you don't mind me asking?
I'm a drop out (didn't finish BSc) from a no name Northern European university and I've worked at or gotten offers from:
- Meta
- Amazon
- Google
- Microsoft
- Uber
- xAI
+ some unicorns that compete with FAANG+ locally.
I didn't include some others that have reached out for interviews which I declined at the time. The lack of a degree has literally never come up for me.
Once you have a relevant work history, a degree matters much less. It still does to some employers, however, for whom it's a simple filter on applicants: No degree? Resume into the bin.
Hiring is still a pretty non-uniform thing despite attempts to make it less so - I'm sure there are some teams and orgs at all these large companies that do it well, and some that do it less well. I think it is pretty well accepted that university brand is not a good signal, but it is an easy signal and if the folks in the hiring process are a bit lazy and pressed for time, a bit overwhelmed by the number of inbound candidates, or don't really know how to evaluate for the role competencies, I think it's a tool that is still reached for today.
In a way, I think the hiring process at second-tier (not FAANG) companies is actually better because you have to "moneyball" a little bit - you know that you're going to lose the most-credentialed people to other companies that can beat you dollar for dollar, so you actually have to think a little more deeply about what a role really needs to find the right person.
If anything, it will get worse. There was a deficit of tech workers; from now on, there will be an excess. Which means that differentiators will be even more important.
Always stunned by how readily teachers can accuse without proof and invert "innocent until proven guilty".
Honestly, students should have a course in "how the justice system works" (or at least should work). So should the teachers.
Student unions and similar entities should exist and be ready to intervene to help students in such situations.
This is nothing new; AI will just make this happen more often, revealing how stupid so many teachers are. But when someone has spent thousands on a tool that purports to be reliable and is so quick to use, how can an average person resist it? The teacher is as lazy as the cheaters they intend to catch.
Student unions tend to focus on all sorts of other issues, I wouldn't trust them to handle cases like this.
The only way to reliably prevent the use of AI tools without punishing innocent students is to monitor the students while they work.
Schools can do that by having essays written on premises, either by hand or on computers managed by the school.
But students that are worried that they will be targeted can also do this themselves, by setting up their phone to film them while working.
And if they do this, and the teacher tries to punish someone who can prove they wrote the essay themselves, either the teacher or the school should hopefully learn that such tools can't be trusted.
It's also the case that even pre-Web and certainly pre-LLMs, different schools and even departments within schools had different rules about working with other students on problem sets. In some cases, that was pretty much the norm, in others strictly verboten.
It’s strange watching people put so much faith in these so called “AI detection tools”. Nobody really knows how they work yet they’re treated like flawless judges. In practice they’re black boxes that quietly decide who gets flagged for “fraud”, and because the tool said so everyone pretends it must be true. The result is a neat illusion that all the “cheaters” were caught, when in reality the system is mostly just picking people at random and giving the process a fake sense of certainty.
I hope this could be a "teachable moment" for all involved: have some students complete their assignments in person, then submit their "guaranteed to be not AI written" essays to said AI detection tool. Objectively measure how many false positives it reports.
When I was in college, there was a cheating scandal for the final exam where somehow people got their hands on the hardest question of the exam.
The professor noticed it (presumably via seeing poor "show your work") and gave zero points on the question to everyone. And once you went to complain about your grade, she would ask you to explain the answer there in her office and work through the problem live.
I thought it was a clever and graceful way to deal with it.
I think this kind of approach is the root of (the US's) hustle culture. Instead of receiving a fair score, you get a zero and need to "hustle" and challenge your teacher.
The teacher effectively filtered out the shy boys/girls who are not brave enough to "hustle." Gracefully.
Nah, the professor wasn't American (as is often the case) and she had a tricky situation. She had strong reasons to believe people were cheating and had to sort out who did and who did not in a swift way.
This has nothing to do with American hustle culture and everything to do with that professor's judgment.
They had to challenge her first. So, yes, challenging her was the only way to get a better grade. And you still knew in advance what the questions were going to be.
Cheaters and non-cheaters were punished in exactly the same way. Effectively, cheating gave you an advantage and being shy gave you a disadvantage.
Except they did not learn not to be shy. There was no such lesson. This is like saying that stealing from a student is OK, because it teaches them that thieves exist.
They learned that cheating gives an advantage to the cheating individual. They also learned that reporting cheating harms them and non-cheaters.
Lol, in 3rd grade algebra, a teacher called 2 of us in for cheating. She had us take the test again, I got the same exact horribly failing score (a 38%) and the cheater got a better score, so the teacher then knew who the cheater was. He just chose the wrong classmate to cheat off of.
My son is learning algebra in 2nd grade. They don’t call it “algebra” yet nor mention “variables”, but they’re working on questions like solving “4 + ? = 9”.
He just goes to our local public elementary school.
Yeah I guess technically that's algebra but at that age it is based on memorization (you just learn that 4 + 5 = 9) and you're not actually using algebra to solve the problem e.g. "subtract 4 from both sides of the equation."
I assume that the cheating student didn't know that he was copying answers from someone who was doing poorly. It was third graders after all; one wouldn't necessarily expect them to be able to pick the best target every time.
Oh. That would have never crossed my mind! So the cheater student was copying from GP who had worse results, and when they both redid it all by themselves the cheater answered correctly, and GP did not.
> Which, in a subject like algebra, is extremely suspicious ("how could both of them get the exact same WRONG answer?").
In Germany, the traditional sharp-tongued answer of pupils to the question "How could both of you get the exact same WRONG answer (in the test)?" is: "Well, we both have the same teacher." :-)
Except the power imbalance: position, experience, social, etc. meant that the vast majority just took the zero and never complained or challenged the prof. Sounds like your typical out-of-touch academic who thought they were super clever.
It's an incredible abuse of power to intentionally mark innocent students' answers wrong when they're correct. Just to solve your own problem, that you may very well be responsible for.
Knowing the way a lot of professors act, I'm not surprised, but it's always disheartening to see how many behave like petty tyrants who are happy to throw around their power over the young.
If you cheat, you should get a zero. How is this controversial?
Since high school, the expectation is that you show your work. I remember my high school calculus teacher didn't even LOOK at the final answer - only the work.
The nice thing was that if you made a trivial mistake, like adding 2 + 2 = 5, you got 95% of the credit. It worked out to be massively beneficial for students.
The same thing continued in programming classes. We wrote our programs on paper. The teacher didn't compile anything. They didn't care much if you missed a semicolon, or called a library function by a wrong name. They cared if the overall structure and algorithms were correct. It was all analyzed statically.
I understand both that this is valuable AND how many (most?) education environments are (supposed to) work, but 2 interesting things can happen with the best & brightest:
1. they skip what are to them the obvious steps (we all do as we achieve mastery) and then get penalized for not showing their work.
2. they inherently know and understand the task but not the mechanized minutiae. Think of learning a new language. A diligent student can work through the problem and complete an a->b translation, then go the other way, and repeat. Someone with mastery doesn't do this; they think within one language and then only pass the contextual meaning back and forth when explicitly required.
"showing your work" is really the same thing as "explain how you think" and may be great for basics in learning, but also faces levels of abstraction as you ascend towards mastery.
It's not great for the teacher though. They're the ones who will truly suffer from the proliferation of AI - increased complexity of work around spotting cheating 'solved' by a huge increase in time pressure. Faced with that teachers will have three options: accept AI detection as gospel without appeals and be accused of unfairness or being bad at the job by parents, spend time on appeals to the detriment of other duties leading to more accusations of being bad at the job, or leave teaching and get an easier (and probably less stressful and higher paid) job. Given those choices I'd pick the third option.
option 4b: absolve the teacher from being the gatekeeper who has to "prove" knowledge has been imparted, accepted and consolidated? It's your idea, but with explicit candor and not a sly wink :)
4. Use AI to talk to the student to find out if they understand.
Tests were created to save money, more students per teacher, we're just going back to the older, actually useful, method of talking to people to see if they understand what they've been taught.
You weren't asked to write an essay because someone wanted to read your essay, only to intuit that you've understood something
I really believe this is the way forward, but how do you make sure the AI is speaking to the student rather than to another AI impersonating the student? You could make it in person but that's a bit sad.
Both can be true at the same time. You outlined the objective, the money is an extra constraint (and let's be honest, when isn't money an extra constraint?)
I agree. Most campuses use a product called Turnitin, which was originally designed to check for plagiarism. Now they claim it can detect AI-generated content with about 80% accuracy, but I don’t think anyone here believes that.
I had Turn It In mark my work as plagiarism some years ago and I had to fight it. It was clear the teacher wasn't doing their job and was blindly following the tool.
What happened is that I did a Q&A worksheet but in each section of my report I reiterated the question in italics before answering it.
The reiterated questions of course came up as 100% plagiarism because they were just copied from the worksheet.
This matches my experience pretty well. My high school was using it 15 years ago and it was a spotty, inconsistent morass even back then. Our papers were turned in over the course of the semester, and late into the year you’d get flagged for “plagiarizing” your own earlier paper.
80% accuracy could mean 0 false negatives and 20% false positives.
My point is that accuracy is a terrible metric here and sensitivity, specificity tell us much more relevant information to the task at hand. In that formulation, a specificity < 1 is going to have false positives and it isn't fair to those students to have to prove their innocence.
That's more like the false positive rate and false negative rate.
If we're being literal, accuracy is (number of correct guesses) / (total number of guesses). Maybe the folks at Turnitin don't actually mean 'accuracy', but if they're selling an AI/ML product they should at least know their metrics.
It depends on their test dataset. If the test set was written 80% by AI and 20% by humans, a tool that labels every essay as AI-written would have a reported accuracy of 80%. That's why other metrics such as specificity and sensitivity (among many others) are commonly reported as well.
Just speaking in general here -- I don't know what specific phrasing TurnItIn uses.
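To make the base-rate point concrete, here's a minimal sketch with made-up numbers (not Turnitin's actual figures, just an illustration of how these metrics interact):

    # Hypothetical test set: 800 AI-written essays, 200 human-written ones.
    # A degenerate "detector" that flags every essay as AI-written:
    tp, fn = 800, 0   # AI essays flagged / missed
    fp, tn = 200, 0   # human essays wrongly flagged / correctly cleared

    accuracy    = (tp + tn) / (tp + fn + fp + tn)  # 0.80 -> "80% accuracy"
    sensitivity = tp / (tp + fn)                   # 1.00 -> catches every cheater
    specificity = tn / (tn + fp)                   # 0.00 -> clears zero honest students
    print(accuracy, sensitivity, specificity)

The headline number looks respectable while every honest student in the sample gets flagged, which is why a confusion matrix (or at least the specificity) tells you far more than "accuracy" here.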
The promise (not saying that it works) is probably that 20% of people who cheated will not get caught. Not that 20% of the work marked as AI is actually written by humans.
I suppose 80% means you don't give them a 0 mark because the software says it's AI, you only do so if you have other evidence reinforcing the possibility.
you're missing out on the false positives though; catching 80% of cheaters might be acceptable but 20% false positives (not the same thing as 20% of the class) would not be acceptable. AI-generated content and plagiarism are completely different detection problems.
If they are serious they should realize that "80% accuracy" is almost meaningless for this kind of classifier. They should publish a confusion matrix if they haven't already.
Had a professor use this but it was student-led. We had to run it through ourselves and change our stuff enough to get a high enough mark to pass TurnItIn. Avoided the false allegations problems at least.
There have always been problems like this. I had a classmate who wrote poems and short stories since age 6. No teacher believed she wrote those herself. She became a poet, translator and writer and admitted herself later in life that she wouldn't have believed it herself.
> My only suggestion was for her to ask the teacher to sit down with her and have a 30-60 minute oral discussion on the essay so she could demonstrate she in fact knew the material.
This sounds like, a good solution? It’s the exception case, so shouldn’t be constant (false positives), although I suppose this fails if everyone cheats and everyone wants to claim innocence.
You hinted at it, but at what point are you basically giving individual oral exams to the entire class for every assignment? There are surveys where 80% of high school students self-report using AI on assignments.
I guess we could go back to giving exams soviet Russia style where you get a couple of questions that you have to answer orally in front of the whole class and that’s your grade. Not fun…
My current idea for this is to have AI administer the 1:1 oral exam. I’m quite confident this would work through grade school at least.
For exams you’d need a proctored environment of some sort, say a row of conference booths so students can’t just bring notes.
You’d want to have some system for ephemeral recording so the teachers can do a risk-based audit and sample some %, eg one-two questions from each student.
Honestly, for regular weekly assignments you might not even need the heavyweight proctoring and could maybe allow notes, since you can tell if someone knows what they are talking about in conversation; it's impossible to crib-sheet your way to fluent conversational understanding.
You don't need oral exams, you just need in-person. So a written test in the classroom, under exam conditions, would suffice.
In this particular resolution example, it would be quicker to ask the student some probing questions versus have them re-write (and potentially regurgitate) an essay.
2. Speaking about your work in front of 1-2-5 people is one thing, but being tested in front of an entire class (30 people?) is a totally different thing.
In high school English, someone (rotating order) had to give a 5-10 minute talk about something in front of the class every week/class. Seems like a pretty good idea in general.
My high school history teacher gave me an F on my term paper. I asked him why, and he said it was "too good" for a high school student. The next day I dumped on his desk all the cited books, which were obscure and in my dad's extensive collection. He capitulated, but disliked me ever since.
This stuff is getting more pervasive too. I'm working on my Master's degree right now and any code I submit, I make sure it has spelling mistakes and make it god awful because I don't want to get flagged by some 3rd party utility that checks if it was AI generated or copied from someone else.
I've had the same problem online for years, when I translate something people presume I am using Google Translate (even though in one case said language isn't on Google Translate — I checked!)... Or got the answer off Wikipedia.
One of the funniest things was being accused of plagiarising Wikipedia, when I'd actually written most of the Wikipedia article on said subject. The irony... Wikipedia doesn't just use unpaid labour, it ends up undermining the people who wrote it.
> when I'd actually written most of the Wikipedia article on said subject. The irony... Wikipedia doesn't just use unpaid labour, it ends up undermining the people who wrote it.
Surely it would be relatively easy to offer to show the edit history to prove that you actually contributed to the article? And, by doing so, would flip the situation in your favour by demonstrating your expertise?
The fact that you should have to is pretty annoying, but it's also a fairly rare edge case. And if a teacher or institute refuses to review that evidence, then I don't think the credential on the table is worth the paper it's printed on anyway.
That's an interesting point. It seems it makes it cheaper to provide knowledge but more expensive to do individual assessments.
I think AI has given me some brain rot: I'm so concerned with finishing stuff on time that I can't bear to spend brain energy on it (and I end up spending it anyway, because AI sucks).
It's not that hard to prove that you did the work and not an AI. Show your work. Explain to the teacher why you wrote what you did, why that particular approach to the narrative appealed to you and you chose that as the basis for your work. Show an outline on which the paper was based. Show rough drafts. Explain how you revised the work, where you found your references, and why you retained some sources in the paper and not others.
To wit, show the teacher that YOU did the work and not someone else. If the teacher is not willing to do this with every student they accuse of malfeasance, they need to find another job. They're lazy as hell and suck at teaching.
Computer, show "my" work and explain to the teacher why "I" wrote what "I" did, describe why that particular approach to the narrative appealed to "me" and "I" chose that as the basis of "my" work. Produce an outline on which the paper could have been based and possible rough drafts, then explain how I could have revised the work to produce the final result.
And if you do all of that, and memorize it well enough to have an in-person debate with the teacher over whether or not you did the work, then maybe that's close enough to actually doing the work?
Write it in something like Google docs that tracks changes and then share the link with the revision history.
If this is insufficient, then there are tools specifically for education contexts that track student writing process.
Detecting the whole essay being copied and pasted from an outside source is trivial. Detecting artificial typing patterns is a little more tricky, but also feasible. These methods dramatically increase the effort required to get away with having AI do the work for you, which diminishes the benefit of the shortcut and influences more students to do the work themselves. It also protects the honest students from false positives.
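For what it's worth, the naive version of an "artificial typing pattern" check isn't complicated. Here's a hypothetical sketch (not how any particular product actually works) that just looks at how uniform the typing rhythm is:

    import statistics

    def rhythm_features(key_times_ms):
        # Gaps between consecutive keystrokes, in milliseconds.
        gaps = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
        mean_gap = statistics.mean(gaps)
        variation = statistics.pstdev(gaps) / mean_gap  # low value = suspiciously steady rhythm
        long_pauses = sum(1 for g in gaps if g > 2000)  # "thinking" pauses longer than 2 seconds
        return variation, long_pauses

Transcribing a finished AI answer tends to produce a steadier rhythm with fewer long pauses than composing from scratch, though as the reply below notes, this is easy to game once students know it's being measured.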
I thought it was a good idea at first, but it can easily be defeated by typing out the AI content manually. One can add pauses/deletions/edits, or genuine edits from joining ideas from different AI outputs.
> Detecting artificial typing patterns is a little more tricky, but also feasible.
Keystroke dynamics can detect artificial typing patterns (copying another source by typing it out manually). If a student has to go way out of their way to make their behavior appear authentic, then it decreases the advantage of cheating and fewer students will do it.
If the student is integrating answers from multiple AI responses then maybe that's a good thing for them to be learning and the assessment should allow it.
Not 0 time, but yes, integrity preservation is an arms race.
The best solutions are in student motivations and optimal pedagogical design. Students who want to learn, and learning systems that are optimized for rate of learning.
Depends how you work. I've rarely (never?) drafted anything and almost all of the first approach ended up in the final result. It would look pretty close to "typed in the AI answer with very minor modifications after". I'm not saying that was a great way to do it, but I definitely wouldn't want to be failed for that.
There is a fractal pattern between authentic and inauthentic writing.
Crude tools (like Google docs revision history) can protect an honest student who engages in a typical editing process from false allegations, but it can also protect a dishonest student who fabricated the evidence, and fail to protect an honest student who didn't do any substantial editing.
More sophisticated tools can do a better job of untangling the fractal, but as with fractal-shaped problems, the layers of complexity keep going and there are no perfect solutions, just tools that help in some situations when used by competent users.
The higher-ed professors who really care about academic integrity are rare, but they are layering many technical and logistical solutions to fight back against the dishonest students.
Not really; besides, the timing of the saves won't reflect the expected amount of work. Unless you are taking the same amount of time to feed in the AI output as a normal student would take to actually write/edit the paper, at which point cheating is meaningless.
> language models are more likely to suggest that speakers of [African American English] be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death.
This one is just so extra insidious to me, because it can happen even when a well-meaning human has already "sanitized" overt references to race/ethnicity, because the model is just that good at learning (bad but real) signals in the source data.
Family law judges, in my small experience, are so uninterested in the basic facts of a case that I would actually trust an LLM to do a better job. Not quite what you mean, but maybe there is a silver lining.
We are already (in the US) living in a system of soft social-credit scores administered by ad tech firms and non-profits. So “the algorithms says you’re guilty” has already been happening in less dramatic ways.
The oral discussion does not scale well in large classes. The solution is to stop using essays for evaluation, relying on (supervised) examinations instead.
Of course, there will be complaints from many students. However, as a prof for decades, I can say that some will prefer an exam-based solution. This includes the students who are working their way through university and don't have much time for busy-work, along with students who write their essays themselves and get lower grades than those who do not.
The real problem here is (in this case) lazy teachers. These kinds of tools should only be used to flag potential AI generation. If the teacher had read the essay and thought it reflected standard work for this student, then all would be fine. Instead they are just running the tool first to be lazy and taking the tool as gospel.
This reminds me of when GPS routing devices first came onto the scene. Lots of people drove right into a lake or ocean because the device said keep going straight. (because of poorly classified multi-modal routing data)
The new trick being used by some professors in college classes is to mandate a specific document editor with a history view. If the document has unusual copy/paste patterns or was written in unusual haste then they may flag it. That being said, they support use of AI in the class and have confidence the student is not able to one-shot the assignment with AI.
"Please take this finished essay and write me a rough first draft version of it that looks like something someone might type in on the fly before editing"
Doesn't google docs have fairly robust edit history? If I was a student these days I'd either record my screen of me doing my homework, or at least work in google docs and share the edit history.
Yeah that was my thought. Although, I went a bit more paranoid with it.
If it looks like AI cheating software will be a problem for my children (and currently it has not been an issue), then I'm considering recording them doing all of their homework.
I suspect school admin only has so much appetite for dealing with an irate parent demanding a real time review of 10 hours of video evidence showing no AI cheating.
Not really, document editors save every few moments. Someone cheating with AI assistance will not have a similar saved version pattern as someone writing and editing themselves. And if one did have the same pattern, it would defeat the purpose of cheating because it would take a similar amount of time to pull off
honest q: what would it look like from your perspective if someone worked in entirely different tools and then only moved their finished work to google docs at the end?
In this case, the school was providing chromebooks so Google Docs were the default option. Using a different computer isn’t inherently a negative signal - but if we are already talking about plagiarism concerns, I’m going to start asking questions that are likely to reveal your understanding of the content. If your understanding falters, I’m going to ask you to prove your abilities in a different way/medium/etc.
In general, I don’t really understand educators hyperventilating about LLM use. If you can’t tell what your students are independently capable of and are merely asking them to spit back content at you, you’re not doing a good job.
Seems like this could be practically addressed by teachers adopting the TSA's randomized screening. That is, roll some dice to figure out which student on a given assignment comes in either for the oral discussion or-- perhaps in higher grades-- to write the essay in realtime.
It should be way easier than TSA's goal because you don't need to stop cheaters. You instead just need to ensure that you seed skills into a minimal number of achievers so that the rest of the kids see what the real target of education looks like. Kids try their best not to learn, but when the need kicks in they learn way better spontaneously from their peers than any other method.
Of course, this all assumes an effective pre-K reading program in the first place.
> Of course, this all assumes an effective pre-K reading program in the first place.
Pre-k is preschool aka kindergarten?
Is this really needed? It's really stressful for kids under 5 or 6 to read, and is there a big enough statistical difference in outcomes to justify robbing them of some of their early youth?
I started reading around 6 years old and I was probably ahead of the vast majority of kids within 6 months.
Kids starting around 6 years old have much better focus and also greatly enhanced mental abilities overall.
I suspect this is going in the wrong direction. Telling a sandboxed AI to have a long conversation with a student to ensure they actually know what they're talking about, while giving minimal hints away, seems like the scalable solution. Let students tackle the material however they will, knowing that they will walk into class the next day and be automatically grilled on it, unaided. There's no reason a similar student couldn't have their essay fed into the AI and then asked questions about what they meant on it.
Once this becomes routine the class can become e.g. 10 minutes conversation on yesterday's topic, 40 minutes lecturing and live exercises again. Which is really just reinventing the "daily quiz" approach, but again the thing we are trying to optimize for is compliance.
I wrote a paper about building web applications in 10th grade a long time ago. When class was out the teacher asked me to stay for a minute after everybody left. He asked in disbelief, “did you really write that paper?”
I could see why he didn’t, so I wasn’t offended or defensive and started to tell him the steps required to build web apps and explained it in a manner he could understand using analogies. Towards the end of our conversation he could see I both knew about the topic and was enthusiastic about it. I think he was still a bit shocked that I wrote that paper, but he could tell from the way I talked about it that it was authentic.
It will be interesting to see how these situations evolve as AI gets even better. I suspect assessment will be more manual and in-person.
Yeah, this, but also as an adult; When you are a non-native speaker and you use AI to make things more concise and correct. The detector will go off. People may find some wording "AI-ish" (even though I replaced em-dashes with commas and told it to "avoid American waiter-like enthusiasm"). My reaction is: Ok. you want my original? Which is much harder to read and uses 2x the amount of words? Fine.
I mean, what is the problem? It's my report! I know all the ins and outs, I take full responsibility for it. I'm the one taking this to the board of directors who will grill me on all the details. I'm up for it. So why is this so "not done"? Why do you assume I let the AI do the "thinking"? I'm appalled by your lack of trust in me.
I routinely see people accuse any writing whose style they don't like of being AI generated. There is no possible evidence for this being the case; people are just dicks.
I've intentionally changed my writing style to be less AI-like due to people thinking I'm just pasting my emails from ChatGPT.
Perhaps it's an artifact of LLMs being trained on terabytes of autistic internet commenters like me. Maybe being detected as AI by Turnitin even has some diagnostic value.
I guess you've never read the English of a Dutch person ;) During my PhD defense I was told I "should have checked with a native speaker." Pre-LLMs, I'd go to my American colleague and she'd mostly remove text and rewrite some bit to make texts much more readable.
Nowadays, often I put my text into the LLM, and say: Make more concise, include all original points, don't be enthusiastic, use business style writing.
And then it will come up with some lines of which I think: Yes! That is what I meant!
I can't imagine you'd rather read my Dunglish. Sure, I could have "studied harder", but one simply is just much more clever in their native tongue, I know more words, more subtleties etc. Over time, and I believe due to LLM use I do get better at it myself! It's a language model after all, not a facts model. I can trust it to make nice sentences.
I am telling you my own preferences, as a native speaker of English. I would rather read my coworkers' original output in their voice than read someone else's writing (including a machine edit of their own text).
I doubt that very strongly and would like to talk to you again after going through 2 versions (with and without LLM) of my 25-pager to UMC management on HPC and Bioinformatics :)
I understand the sentiment, even appreciate it, but there are books that draw you into a story when your eyes hit the paper, and there are books that don't and induce yawning instead (on the same topic). That is a skill issue.
Perhaps I should add that using the LLM does not make me faster in any way, maybe even slower. But it makes the end results so much more pleasant.
"If I Had More Time, I Would Have Written a Shorter Letter". Now I can, but in similar time.
As they said, they are telling you their preference, there is nothing to doubt.
Recently there was a non-native English speaker heavily using an LLM to review their answers on a Show HN post, and it was incredibly annoying. The author did not realize it (because of their lack of skills in the language), but the AI-edited version felt fake and mechanical in tone. In that case yes, the broken original is better because it preserves the humanity of the original answers, mistakes and all.
Ok, well it depends on the context then and the severity of the AIness (which I always try to reduce in the prompt, sometimes I’ll ask it to maintain my own style for example).
You know maybe it is annoying for native speakers to pick up subtle AI signals, but for non-natives it can be annoying to find the correct words that express what you want to say as precisely as in your mother tongue. So don’t judge too much. It’s an attempt at better communication as well.
The funny part is that Google has all the edit history data. In other words, it's a piece of cake for them to train a model that mimics the human editing process.
The only thing preventing them from doing so is the fact that Google is too big to sell a "plagiarism assistant."
So the model is going to spend hours hesitantly typing in a google doc, moving paragraphs around, cutting and pasting, reworking sentences, etc so that the timestamped history matches up with something a human could realistically have done?
I’m very tempted to write a tool that emulates human composition and types in assignments in a human-like way, just to force academia to deal with their issues sooner.
I seriously think the people selling AI detection tools to teachers should be sued into the ground by a coalition of state attorneys general, and that the tools should be banned in schools.
There is too much focus on students cheating with AI and not enough on the other side of the equation: teachers.
I've seen assignments that were clearly graded by ChatGPT. The signs are obvious: suggestions that are unrelated to the topic or corrections for points the student actually included. But of course, you can't 100% prove it. It's creating a strange feedback loop: students use an LLM to write the essay, and teachers use an LLM to grade it. It ends up being just one LLM talking to another, with no human intelligence in the middle.
However, we can't just blame the teachers. This requires a systemic rethink, not just personal responsibility. Evaluating students based on this new technology requires time, probably much more time than teachers currently have. If we want teachers to move away from shortcuts and adapt to a new paradigm of grading, that effort needs to be compensated. Otherwise, teachers will inevitably use the same tools as the students to cope with the workload.
Education seemed slow to adapt to the internet and mobile phones, usually treating them as threats rather than tools. Given the current incentive structure and the lack of understanding of how LLMs work, I'm not optimistic this will be solved anytime soon.
I guess the advantage will be for those that know how to use LLMs to learn on their own instead of just as a shortcut. And teachers who can deliver real value beyond what an LLM can provide will (or should) be highly valued.
It is probably a good time to revisit the root goals of education instead of the markers of success that we have been shooting at for a long time now (worksheets, standardized tests, etc.).
A one hour lecture where students (especially <20 year old kids) need to proactively interject if they don't understand something is a pretty terrible format.
> "Education seemed slow to adapt to the internet and mobile phones, usually treating them as threats rather than tools. Given the current incentive structure and the lack of understanding of how LLMs work"
Good point, it is less like a threat and more like... "how do we shoehorn this into our current processes without adapting them at all? Oh cool now the LLM generates and grades the worksheets for me!".
We might need to adjust to more long-term projects, group projects, and move away from lectures. A teacher has 5*60=300 minutes a week with a class of ~26. If you broke the class into groups of 4 - 5, that's roughly six groups, so you could spend on the order of 50 minutes a week with each group and really get a feel for the students beyond what grade the computer gives to their worksheet.
“It's creating a strange feedback loop: students use an LLM to write the essay, and teachers use an LLM to grade it. It ends up being just one LLM talking to another, with no human intelligence in the middle.”
As a teacher, I agree. There's a ton of covert AI grading taking place on college campuses. Some of it by actual permanent faculty, but I suspect most of it by overworked adjuncts and graduate student teaching assistants. I've seen little reporting on this, so it seems to be largely flying under the radar. For now. But it's definitely happening.
Is using AI to support grading such a bad idea? I think that there are probably ways to use it effectively to make grading more efficient and more fair. I'm sure some people are using good AI-supported grading workflows today, and their students are benefiting. But of course there are plenty of ways to get it wrong, and the fact that we're all pretending that it isn't happening is not facilitating the sharing of best practices.
Of course, contemplating the role of AI grading also requires facing the reality of human grading, which is often not pretty. Particularly the relationship between delay and utility in providing students with grading feedback. Rapid feedback enables learning and change, while once feedback is delayed too long, its utility falls to near zero. I suspect this curve actually goes to zero much more quickly than most people think. If AI can help educators get feedback returned to students more quickly, that may be a significant win, even if the feedback isn't quite as good. And reducing grading burden also opens up opportunities for students to directly respond to the critical feedback through resubmission, which is rare today on anything that is human-graded.
And of course, a lot of times university students get the worst of both worlds: feedback that is both unhelpful and delayed. I've been enrolling in English courses at my institution—which are free to me as a faculty member. I turned in a 4-page paper for the one I'm enrolled in now in mid-October. I received a few sentences of written feedback over a month later, and only two days before our next writing assignment was due. I feel lucky to have already learned how to write, somehow. And I hope that my fellow students in the course who are actual undergraduates are getting more useful feedback from the instructor. But in this case, AI would have provided better feedback, and much more quickly.
When I was in high school none of my teachers actually read any of the homework we turned in. They all skimmed it, maybe read the opening and closing paragraph if it was an essay. So I guess the question is if having an ai grade it is better than having a teacher look at it for 15 seconds, because that’s the real alternative.
Colleges will need to reduce class sizes, or close entirely, for the next decade at least. With smaller class sizes brings the opportunity for course instructors to provide more time per pupil so that things like in-person homework and project review is possible.
Hurray! We’ve just made education only viable for the wealthy! Good job everyone, I can see this technology revolution is already living up to its promise of distributing wealth and power more evenly throughout society.
I just looked up some numbers for UCLA as an example. <45k students (undergrad and grad) and >5k faculty (and another 30k on staff)— so thats a pessimistic ratio of 1 to 9.
If you imagine students take 4 classes per semester and faculty teach 4 per semester… it seems stunningly feasible.
> If the value of human labor is going to zero, which some say ai will induce
These "some" are founders of AI companies and investors who put a lot of money into such companies. Of course, the statements that these people "excrete" serve an agenda ...
Back in the day if you were studying in Cambridge under somebody like Russell, it would have been a class of 5, and all of you were poised to become professors. Now, well, now it's a different story. Say what you will about higher education, but it's not for everyone. Most people currently attending university have no business being there, frankly, but what else are they supposed to do?! The game is rigged so. I wish we had something better, but we don't.
Maybe all this social stuff that AI brings into focus may prove a catalyst for radical change?
In my CS undergrad I had Doug Lea as a professor, really fantastic professor (best teacher I have ever had, bar none). He had a really novel way to handle homework hand ins, you had to demo the project. So you got him to sit down with you, you ran the code, he would ask you to put some inputs in (that were highly likely to be edge cases to break it). Once that was sufficient, he would ask you how you did different things, and to walk him through your code. Then when you were done he told you to email the code to him, and he would grade it. I am not sure how much of this was an anti-cheating device, but it required that you knew the code you wrote and why you did it for the project.
I think that AI has the possibility of weakening some aspects of education, but I agree with Karpathy here. In-class work, in-person defenses of work, verbal tests. These were cornerstones of education for thousands of years and have been cut out over the last 50 years or so outside of a few niche cases (thesis defense), and it might be a good thing that these come back.
Yep, it's easy to shortcut AI plagiarism, but you need time. In most of the universities around the world (online universities especially), the number of students is way too big, while professors get more and more pressure on publishing and bureaucracy.
I did my masters in GaTech OMSCS (ChatGPT came out at the very end of my last semester). Tests were done with cameras on and recorded, and the recordings were then watched, I think by TAs. Homework was done with automated checking and a plagiarism checker. Do you need to have in-person proctoring via test centers or libraries? Video chats with professors? I am not sure. Projects are important, but maybe they need to become a minority of grades, with more being based on theory to circumvent AI?
It's not even about plagiarism. But, sure, 1:1 or even 1:few instruction is great but even at elite schools is not really very practical. I went to what's considered a very good engineering school and classes with hundreds of students was pretty normal.
For many of the “very good” engineering schools that I know of they got “very good” status because of their graduate programs. In graduate school a 1:few relation is almost certain. In undergraduate, not so much.
Probably generally true. There's some "trickle down" (sorry) especially for students who take direct advantage of it or from the institutional wealth generally. But, yes, students at such institutions who struggle aren't necessarily well-supported.
Ironically, the practicality of such instruction goes down as the status of the school goes up. I got a lot of 1:1 or 1:few time with my community college professors.
In some university systems it seems to be possible (I'm thinking of the khôlle system in France), so I don't see how the much better funded US system would not be able to do it.
Google tells me that is more of a system with preparatory schools in France. That said, there is more of an emphasis at some schools than others in individual interactions and seminars at the undergraduate collegiate level. I had some of that--just not mostly in engineering. In US elite schools, there's certainly time conflict for professors given research priorities.
Strongly agree. I was involved with several CS lectures in the past ~10 years that did not require a final exam, and we always did a 1:1 session between student and tutor in which the tutor asked the student detailed questions about their past exercise sheet solutions. Over the years, I estimate that I conducted about 100 of such 1:1s. It was always obvious when the students did not write the code themselves. They couldn't really explain their design process, they didn't encounter the edge cases themselves during testing, and you couldn't discuss possible improvements with them.
15+ years ago I was doing a CS undergrad (or Bachelors? not sure how it translates) at the local uni in a small EU country, and this approach was the standard across all subjects as part of 'lab work'. There were people there to do that, not the prof himself, but the approach was exactly the same. And after a few months they had a really good picture of what level everyone was at, etc.
On the other hand, a neighbour asked me 6 months ago or so whether he could do his 1-month apprenticeship with me when he finished his 3rd year of CS high school (i.e. ~18 years old, 3 of 4 years of 'CS trade school'). I was totally gobsmacked by his lack of basic understanding of how computers work; I am confident that he did not confidently know the difference between a file and a folder. But he was very confident in the AI slop he produced. I had a grand plan of giving him tasks that would show him the pitfalls of AI -> no need for that, he blindly copied whatever AI gave him (he had not figured out Claude Code exists), even when the results were very visibly bad - even from afar. I tried explaining stuff to him to no avail. I know this is a sample size of 1, but damn, I did not expect it to be that bad.
Maybe as a society we can take some of the productivity gains from AI and funnel them into moving teaching away from scantrons and formulaic essays. I want to be optimistic.
As someone who was incredibly lazy intellectually in high school, I can't imagine what would have got me motivated beyond time and growing up.
I did nothing in high school and then by 19 for fun on Saturdays I was checking out 5 non-fiction books from the library and spending all Saturday reading.
There was no inspiring teacher or anything like that for me that caused this. At 16 I only cared about girls and maneuvering within the high school social order.
The only thing I can think of that would have changed things for me is if the math club were the cool kids and the football team were the outcasts.
At 16 anything intellectual seemed too remote to bother. That is why I would suspect the real variable is ultimately how much the parents care about grades. Mine did not care at all so there was no way my 16 year old self was going to become intrinsically motivated to grow intellectually.
All AI would have done for me in high school would have been swapping a language model for copying my friend's homework.
> The only thing I can think of that would have changed things for me is if the math club were the cool kids and the football team were the outcasts.
For background, I grew up in the US and my wife grew up in China. She went to a high-tier Shanghai high school, and she says that is kind of how it was. The top of the social order was basically the rich and politically connected (no different from anywhere else, I guess), but also the really good students: the best students are looked up to. And everyone asks how you're doing in school all the time. There are students who focus more on sports and go to sports schools, but unless they end up going to the Olympics or something, it's really looked down upon compared to specializing in STEM or more difficult subjects.
In my high school, honors/AP students weren't outcasts; we were kind of just a separate set, mostly our own clique, with some being popular and some not, independently of being AP students. I happened to be football team captain and in AP classes; 3 other captains weren't in AP. Academic success was just a non-factor.
The TL;DR of every "AI vs. schools, what should teachers do?" article boils down to exactly this: talk with the students 1:1. You can fake an essay; you can't fake a conversation about the topic at hand.
Or just do some work/exam in a controlled setting.
Talking to students in order to gauge their understanding is not as easy or reliable as some people make it out to be.
There are many students who are basically just useless when required to answer on the spot, some of whom would likely score top-of-the-class given an assignment and time to work on it alone (in a proctored setting).
And then there are students who are very likable and swift to pick up on subtle signals the examiners might be giving off, and who constantly adjust course accordingly.
Grading objectively is extremely hard when doing oral exams. Especially when you're doing them back-to-back for an entire workday, which is quite likely to happen if most examination is to be done this way.
Not yet, but we are getting close to being able to fake that too: tiny microphone, tiny earpiece, zero AI lag. I give it less than 10 years before it's trivial for anyone.
In college I had a professor assign us to write a 100% plagiarized paper. You had to highlight every word in a color associated with its source, and you couldn't plagiarize more than one sequential sentence from a single source.

It ended up being harder than writing an ordinary paper, but it taught us all a ton about citation and originality. It was a really cool exercise.

I imagine something similar could be done to teach students to use AI as a research tool rather than as a plagiarization machine.
Contemporary alternative: Copy/paste an essay entirely from LLM output, but make sure none of the information contained checks out. One would want to use an older model for this. :-)
Reminds me of that thing Olmo3 has in the web demo, where it shows you a trace for a selected sentence generated by the LLM and points you at the exact document in the training data it comes from verbatim. If they all had that it would be really trivial to verify sources, and more or less does that exact exercise for you at a press of a button.
The framing here is typically optimistic. Three years in, we're seeing AI primarily used for homework completion (defeating the stated purpose of learning) and administrative busywork. The real implication isn't 'personalized learning'—it's credential devaluation. If every student can produce 'their own' essays with AI assistance, how do we distinguish actual capability? The schools adopting AI fastest are ironically the ones least equipped to enforce academic integrity. The policy question isn't 'how do we use AI in schools?' but 'what's education for if not to demonstrate work capability?'
I think legacy schooling just needs to be reworked. Kids should be doing way more projects that demonstrate the integration of knowledge and skills, rather than focusing so much energy on testing and memorization. There's probably a small core of things that really must be fully integrated and memorized, but for everything else you should just give kids harder projects which they're expected to solve by leveraging all the tools at their disposal. Focus on teaching kids how to become high-agency beings with good epistemics and a strong math core. Give them experiments and tools to play around and actually understand how things work. Bring back real chemistry labs and let kids blow stuff up.
The key issue with schools is that they crush your soul and turn you into a low-agency consumer of information within a strict hierarchy of mind-numbing rules, rather than helping you develop your curiosity hunter muscles to go out and explore. In an ideal world, we would have curated gardens of knowledge and information which the kids are encouraged to go out and explore. If they find some weird topic outside the garden that's of interest to them, figure out a way to integrate it.
I don't particularly blame the teachers for the failings of school though, since most of them have their hands tied by strict requirements from faceless bureaucrats.
As much as I hated schooling, I do want to say that there are parts of learning that are simply hard. There are parts that you can build enthusiasm for with project work and prioritizing for engagement. But there are many things that people should learn that will require drudgery to learn and won't excite all people.
Doing derivatives, learning the periodic table, basic language and alphabet skills, playing an instrument are foundational skills that will require deliberate practice to learn, something that isn't typically part of project based learning. At some point in education with most fields, you will have to move beyond concepts and do some rote memorization and repetition of principles in order to get to higher level concepts. You can't gamify your way out of education, despite our best attempts to do so.
I have never had to rote learn anything mental since at least mid childhood. I can't remember before then.
If it's something I need to do regularly, I eventually learn it through natural repetition while working towards the high level goal I was actually trying to achieve. Derivatives were like this for me. I still don't fully know the periodic table though, because it doesn't really come up in my life; if it's not something I need to do regularly, I just don't learn it.
My guess is this doesn't work for everything (or for everyone), and it probably depends on the learning curve you experience. If there are cliff edges in the curve that are not aligned with useful or enjoyable output, dedicated practice of some sort is probably needed to overcome them, which may take the form of rote learning, or, maybe better, spaced repetition or quizzing or similar. However at least for me, I've not encountered anything like that.
If I was to speculate why rote learning doesn't work well for me, I don't seem to experience a feeling of reward during it, and it seems like my ability to learn is really heavily tied somehow to that feeling. I learn far more quickly if it's a problem I've been struggling with for a while that I solve, or it's a problem I really wanted to solve, as the reward feeling is much higher.
Not something everyone learns. My kids seemed to enjoy it. My older daughter learned quite a lot of algebra etc. by doing physics.
> learning the periodic table
You do not need to rote learn all of it, and you remember enough by learning about particular elements etc.
> basic language and alphabet skills
My kids learned to read by first reading with me (or others), enjoying the story and learning words as we went, plus guessing words on flashcards. Then they moved on to reading on their own because they liked it.

Admittedly none of the above was in school, but my point is that it's not intrinsic to learning.
> At some point in education with most fields, you will have to move beyond concepts and do some rote memorization and repetition of principles in order to get to higher level concepts.
Not a great deal and it does not feel like as much of a grind if you enjoy the subject and know where you are going.
This depends on the kid. I tried for literal years to get my kid to read. Nothing worked. It didn't matter what games I picked, or what context she had to read in: books, comics, video games, real life. It was not happening.

You know what kick-started my kid's ability to read? A reading teacher sitting with her every single day and explicitly teaching her the drudgery of what reading was. And then me doing the same at home.
Rote is for kids like this, and a lot of kids have areas like this. No, my kid doesn't need as much math-facts practice as she gets. But her cousin? That kid isn't learning anything without doing lines about how to add.
> At some point in education with most fields, you will have to move beyond concepts and do some rote memorization and repetition of principles in order to get to higher level concepts. You can't gamify your way out of education, despite our best attempts to do so.
I don't know if we'll ever be successful, but the entire point of gamification is to make the rote parts more palatable. A lot of gamification techniques try to model after MMO gaming for a reason, as that's a genre where people willingly subject themselves to a lot of rote tasks.
Yeah, I agree that there's some skills that require deliberate practice. I think LLMs will be a huge boon there as well, because you can get real-time feedback as you're solving problems. And if you get stuck you can get immediate help or clarification, which is closer to having a personal tutor. In college if I got stuck on a problem, I might end up having to wait multiple days to ask someone for help.
In software engineering we often come across build environments that make code iteration really difficult and slow, and speeding up that iteration cycle usually results in being able to experiment more and ship faster changes.
Most learning curves in the education system today are very bumpy and don't adapt well to the specific student. Students get stuck on big bumps or get bored and demotivated at plateaus.
AI has potential to smooth out all curves so that students can learn faster and maximize time in flow.
I've spent literally thousands of hours thinking about this (and working on it). The future of education will be as different from today as today is to 300 years ago.
Kids used to get smacked with a stick if they spelled a word wrong.
The point is that the education system has come a long way in utilizing STEM to make education more efficient (helping students advance faster and further with less resources) and it will continue to go a long way further.
People thought the threat of physical violence was a good way to teach. We have learned better. What else is there for us to learn? What have we already learned but just don't have the resources to apply?
I've met many educators who have told me stories of ambitious learning goals for students that didn't work because there wasn't the time or there weren't the resources to facilitate them properly.
Often instructors are stuck trading off between inauthentic assessments that have scalable evaluation methods, and authentic exercises that aren't feasible to evaluate at scale, so evaluation is sparse or incomplete, or students only receive credit for completion.
I just went and had a flutter at being a high school math teacher. I went in saying 'I never used math to create until my honours year, I want different for my students'.
I soon changed my mind; I think those of us who become experts often have really rich memories of a project where we learnt so much, but we just don't remember episodically all the accumulated learning that happened in boring classrooms to enable the project-induced higher-order synthesis.
Increase spending on schools by an order of magnitude and it would be possible.
All of schooling breaks down to costs and society’s willingness and desire to invest in child nutrition, education, and training.
We simply do not even have the wherewithal to have the conversation about it, without getting blackholed by cultural minefields and assumptions of child rearing, parental responsibility, morality and religion.
But testing and paper assessments are cheap and feasible for mass education. There are only so many workshop projects you can have before you run out of budget.
Projects are less efficient for learning foundational skills. They have their place, but with infinite funds I would still give my children an education with a bedrock of boring drill and testing and memorisation.
People have been saying that we "focus too much on memorization" for as long as I have been alive. To be honest, I don't really think that is true, if anything we don't focus enough on memorization nowadays since people leave school without knowing basic things about the world whether in science or in history. Knowing things allows one to make connections and see things in a different way that you simply cannot get if you rely on the internet or LLMs or whatever to look everything up.
Having had some experience teaching, designing labs, and evaluating students, in my opinion there is basically no problem that can't be solved with more instructor work.
The problem is that the structure pushes for teaching productivity which basically directly opposes good pedagogy at this point in the optimization.
Some specifics:
1. Multiple choice sucks. It's obvious that written responses evaluate students better, and oral is even better. But multiple choice is graded instantly by a computer. Written response needs TAs. Oral is such a time sink and needs so many TAs, plus lots of space if you want to run them in parallel.
1.5 Similarly, having students do things on computers is nice because you don't have to print anything, errors in the question can be fixed live, and you can ask students to refresh the page. But if the chatbots let them cheat too easily on computers, doing handwritten assessments sucks because you have to go arrange for printing and scanning.
2. Designing labs is a clear LLM tradeoff. Autograded labs with testbenches and fill-in-the-middle-style completions or API completions are incredibly easy to grade. You just pull the commit before some specific deadline and run some scripts (rough sketch at the end of this comment).
You can do 200 students in the background while doing other work, it's that easy. But the problem is that LLMs are so good at fill-in-the-middle and at making testbenches pass.
I've actually tried some more open-ended labs before, and it's actually very impressive how creative students are. They are obviously not LLMs: there is a diversity of thought and a simplicity of code that you do not get with ChatGPT.
But it is ridiculously time consuming to pull people's code and try to run the open-ended testbenches that they have created.
3. Having students do class presentations is great for evaluating them. But you can only do like 6 or 7 presentations in a 1 hr block. You will need to spend like a week even in a relatively small class.
4. What I will say LLMs are fun for is having students do open-ended projects faster, with faster iterations. You can scope-creep the projects if you expect students to use AI coding.
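For a sense of scale, the autograding flow in point 2 is only about this much glue code. A minimal sketch, assuming each student repo is already cloned under repos/<student>/ and the lab ships a `make test` target; both of those are assumptions for illustration, not any specific course's setup:

```python
#!/usr/bin/env python3
"""Sketch of an autograding pass over student repos (assumed layout:
repos/<student>/ is a git clone; the lab provides a `make test` target)."""
import subprocess
from pathlib import Path

DEADLINE = "2025-01-31 23:59"  # hypothetical submission cutoff


def last_commit_before(repo: Path, deadline: str) -> str:
    """Hash of the last commit made before the deadline (empty if none)."""
    out = subprocess.run(
        ["git", "rev-list", "-1", f"--before={deadline}", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def grade(repo: Path) -> bool:
    """Check out the pre-deadline commit and run the testbench."""
    commit = last_commit_before(repo, DEADLINE)
    if not commit:
        return False  # no submission before the deadline
    subprocess.run(["git", "checkout", "--quiet", commit], cwd=repo, check=True)
    return subprocess.run(["make", "test"], cwd=repo).returncode == 0


if __name__ == "__main__":
    for repo in sorted(Path("repos").iterdir()):
        print(f"{repo.name}: {'PASS' if grade(repo) else 'FAIL'}")
```

The open-ended labs have no equivalent of that loop, which is exactly why they eat so much more time.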
I know a teacher who basically only does open questions, but since everything is digital nowadays, students just use tools like Cluely [0] that run in the background and provide answers.
Since the testing tool they use does notice and register 'paste' events, they've resorted to simply assigning 0 points to every answer that was pasted.
A few of us have been telling her to move to in-class testing etc. but like you also notice everything in the school organization pushes for teaching productivity so this does require convincing management / school board etc. which is a slow(er) process.
I tried that once. Specifically because I wanted to see if we could leverage some sort of productivity enhancements.
I was using a local LLM, around 4B to 14B; I tried Phi, Gemma, Qwen, and Llama. The idea was to prompt the LLM with the question, the answer key/rubric, and the student answer. Putting the student answer at the end allowed some prompt caching, which made it much faster.
It was okay but not good, there were a lot of things I tried:
* Endlessly messing with the prompt.
* A few examples of grading.
* Messing with the rubric to give more specific instructions.
* Average of K.
* Think step by step then give a grade.
It was janky, and I'll chalk it up to local LLMs at the time being somewhat too stupid for this to be reasonable. They basically didn't follow the rubric very well. Qwen in particular was very strict, giving zeros regardless of the part marks described in the answer key, as I recall.
I'm sure with the correct type of question and correct prompt and a good GPU it could work but it wasn't as trivially easy as I had thought at the time.
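For anyone wanting to try the same thing, here's roughly the structure I mean, as a minimal sketch rather than the exact code I ran: question and rubric go first (so the shared prefix can be prompt-cached), the student answer goes last, the model is told to think step by step, and the score is averaged over K samples. The endpoint URL and model name are placeholders; any local server exposing an OpenAI-compatible chat API would do.

```python
import re
import statistics

import requests

LOCAL_API = "http://localhost:8080/v1/chat/completions"  # assumed local server
MODEL = "local-7b-instruct"                               # placeholder name


def grade_answer(question: str, rubric: str, student_answer: str, k: int = 3) -> float:
    """Grade one answer with a local LLM, averaging over k samples."""
    # Question and rubric come first so the shared prefix can be prompt-cached;
    # only the student answer at the end changes between students.
    prompt = (
        f"Question:\n{question}\n\n"
        f"Rubric / answer key:\n{rubric}\n\n"
        f"Student answer:\n{student_answer}\n\n"
        "Think step by step about which rubric items are satisfied, then end "
        "with a line of the exact form 'SCORE: <number>/10'."
    )
    scores = []
    for _ in range(k):
        resp = requests.post(LOCAL_API, json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.3,
        }, timeout=120)
        text = resp.json()["choices"][0]["message"]["content"]
        match = re.search(r"SCORE:\s*([\d.]+)", text)
        if match:
            scores.append(float(match.group(1)))
    return statistics.mean(scores) if scores else 0.0
```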
I was very much the kind of student who didn't perform well under exam-taking pressure. For marked work that I did outside of school, it was straight-As for me. For written exams performed under time pressure and oral examinations (administered without advance notice), it was very much hit-and-miss.
If my son should grow up to run into the same kinds of cognitive limitations, I really don't know what I will tell him and do about it. I just wish there was a university in a Faraday cage somewhere where I could send him, so that he can have the same opportunities I had.
Fun fact on the side: Cambridge (UK) getting a railway station was a hugely controversial event at the time. The corrupting influence of London being only a short journey away was a major put-off.
As a parent of a kid like this: start early with low stakes. You can increase your child's tolerance for pressure. I sometimes hear my kid say, after a deep breath, "There's nothing to it but to do it." And then work on focusing on what they did better versus how they did in absolute terms.
I see collapsing under pressure to be either a kind of anxiety or a fixation on perfect outcomes. Teaching a tolerance for some kinds of failure is the fix for both.
To me the solution is: Ban homework. All projects, essays, tests, etc are done at school on air-gapped machines or written by hand. Work is monitored by the teacher with video surveillance in 'test halls'. Cell phones can be kept in a small locker. If the student needs to be reached, a parent or whomever can call the school. Internet research should be done in a separate environment, printed out, then taken to the 'work' location. Like how you used to photo-copy sections of a book. Printed text can be cataloged by the school and compared to completed work for plagiarism.
Take away the internet, except in a research/library scenario. Give them a limited time to complete tasks. This would promote a stronger work ethic, better memory/recall, and more realistic time-management skills. They need to learn to rely on themselves, not technology. The only effective way is to remove tech from the equation; otherwise the temptation to cheat to compete/complete is too strong.
> This relegates the use of AI to personal choice of learning style and any misuse of AI is only hurting the student.
I'm a teacher. Kids don't have the capacity to make this choice without guidance. There are so so many that don't (can't?) make the link between what we teach and how they grow as learners. And this is at a rich school with well-off parents who largely value education.
> To me the solution is: Ban homework. All projects, essays, tests, etc are done at school on air-gapped machines or written by hand.
Rather: don't grade homework. Make homework the preparation that, if you do it seriously, will prepare you for the test (and if you don't do it seriously, you won't have the skills necessary to pass the test).
The problem with this strategy is that homework is mostly a tool for learning, not checking progress. Most teachers use homework as a way to get an extra hour or two of learning in each week for the students. If we remove it there will be less learning time available. So you’re gonna have to expand the school day or school year which means more teachers which is expensive.
80-90% of teachers are not equipped to handle AI in the classroom. You can't expect teachers to know the SOTA that's rapidly changing, and at the same time punish students for using available tools. Especially in public schools, teaching quality has plummeted in the past decade. This also applies to lower-tier colleges. The whole point of education is to learn, not to weed out talent. If students want to use AI tools to take shortcuts, then it's entirely on them. It will catch up to them at some point.
At my school, long before AI, the work was of two kinds - homework type essays/problems that you could cheat on if you wanted but there was no point because the feedback was for your benefit and didn't count towards anything, and then proper exams where you were watched and couldn't cheat easily.
Not sure why they don't just do that? It worked fine and would be compatible with LLM use.
We're almost at the point that CGP Grey predicted over a decade ago with the "digital Aristotle" concept. Teachers will eventually have to transition into class-babysitting roles, but the transition period will be ugly as long as the tech stays at this level, where it renders regular teaching impossible while not yet being at a level where it can usably replace it.
As a teacher, I try to keep an open mind, but consistently I can find out in 5 minutes of talking to a student if they understand the material. I might just go all in for the oral exams.
A teacher in an environment with +100 students and a lot of assignments that are graded by a random grad student is useless anyway and might as well not exist. If AI could move us away from this cargo cult, that's great.
It's a fair question, but there's maybe a bit of US defaultism baked in? If I look back at my exams in school, they were mostly closed-book written plus oral examination; nothing would really need to change.
A much bigger question is what to teach assuming we get models much more powerful than those we have today. I'm still confident there's an irreducible hard core in most subjects that's well worth knowing/training, but it might take some soul searching.
Oxide and Friends recently had a podcast episode [0] with Michael Littman about this, for anyone who's curious about the topic.
This topic has been an interesting part of the discourse in a group of friends over the past few weeks, because one of us is a teacher who has to deal with this on an almost daily basis and is struggling to get her students not to cheat, and the options available to her are limited (yes, physical monitoring would probably work, but it requires concessions from the school management etc.; it's not something with an easy or quick fix).
The current education system is going to collapse. Teachers and students alike won't be able to resist the ultimate cheat code.
Schools need to become tech free zones. Education needs to reorient around more frequent standardized tests. Any "tech" involved needs to be exclusively applied towards solving the supply and demand issue - the number of "quality teachers" to "students per classroom."
I admire Karpathy for advocating common sense, but none of this will happen because SV is full of IQ realists who only see "education" as a business opportunity and the bureaucratic process is too dysfunctional for common sense decisions to prevail. The future is chrome books with GPT browsers for every student.
It's mind-boggling that image generators can solve physics and chem problems like this, but I will note that there are a few slight mistakes in both (an extra i term in the LHS, a few of the chemical names look wrong, etc.). Unbelievable that we're here, but it remains essential to check the work.
My partner and I have been working to invert the overall model.
She started grading the conversations that the students have with LLMs.
From the questions that the students ask, it is obvious who knows the material and who is struggling.
We have a custom setup in which she creates a homework assignment. There is a custom prompt to keep the LLM from answering the homework question directly. But that's pretty much it.
The results seem promising, with students spending 30 minutes or so going back and forth with the LLMs.
If any educator wants to try it or is interested in more information, let me know and we can see how we can collaborate.
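To be concrete about the "custom prompt" part: it's essentially a system prompt wrapped around every chat, and the saved transcript is what gets graded. A minimal sketch below; the prompt wording, endpoint, and model name are illustrative placeholders, not our actual setup.

```python
import requests

API_URL = "http://localhost:8080/v1/chat/completions"  # assumed endpoint
MODEL = "any-chat-model"                                # placeholder

# Illustrative wording only: the key idea is that the assignment lives in the
# system prompt and the model is told not to hand out solutions.
TUTOR_SYSTEM_PROMPT = """You are a tutor for the assignment below.
Never give the final answer, full solution, or complete code/essay for any
assignment question. Instead, ask guiding questions, explain the underlying
concepts, and point the student to relevant material. If asked for the
solution outright, decline and offer a hint.

Assignment:
{assignment}
"""


def tutor_reply(assignment: str, transcript: list[dict]) -> str:
    """transcript: prior {"role": "user"/"assistant", "content": ...} turns.
    The same transcript list is what the teacher later reads and grades."""
    messages = [{"role": "system",
                 "content": TUTOR_SYSTEM_PROMPT.format(assignment=assignment)}]
    messages += transcript
    resp = requests.post(API_URL,
                         json={"model": MODEL, "messages": messages},
                         timeout=120)
    return resp.json()["choices"][0]["message"]["content"]
```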
This makes some sense, but my first question would be how do you define a clear, fair grading rubric? Second, this sounds like it could work for checking who is smart, but can it motivate students to put in work to learn the material?
"You have to assume that any work done outside classroom has used AI."
That is just such a wildly cynical point of view, and it is incredibly depressing. There is a whole huge cohort of kids out there who genuinely want to learn and want to do the work, and feel like using AI is cheating. These are the kids who, ironically, AI will help the most, because they're the ones who will understand the fundamentals being taught in K-12.
I would hope that any "solution" to the growing use of AI-as-a-crutch can take this cohort of kids into consideration, so their development isn't held back just to stop the less-ethical student from, well, being less ethical.
What possible solution could prevent this? The best students are learning on their own anyways, the school can't stop students using AI for their personal learning.
There was a Reddit thread recently that asked whether all students are really doing worse, and it basically said that there are still top performers performing toply, but that the middle has been hollowed out.
So I think, I dunno, maybe depressing. Maybe cynical, but probably true. Why shy away from the truth?
And by the way, I would be both. Probably would have used AI to further my curiosity and to cheat. I hated school, would totally cheat to get ahead, and am now wildly curious and ambitious in the real world. Maybe this makes me a bad person, but I don't find cheating in school to be all that unethical. I'm paying for it, who cares how I do it.
Well, it seems the vast majority doesn't care about cheating, and is using AI for everything. And this is from primary school to university.
It's not just that AI makes it simpler, so many pupils cannot concentrate anymore. Tiktok and others have fried their mind. So AI is a quick way out for them. Back to their addiction.
As someone who had a college English assignment due literally just yesterday, I think that "the vast majority" is an overstatement. There are absolutely students in my class who cheat with AI (one of them confessed to it and got a metaphorical slap on the wrist with a 15 point deduction and the opportunity to redo the assignments, which doesn't seem fair but whatever), but the majority of my classmates were actively discussing and working on their essays in class.
Whatever solution we implement in response to AI, it must avoid hurting the students who genuinely want to learn and do honest work. Treating AI detection tools as infallible oracles is a terrible idea because of the staggering number of false reports. The solution many people have proposed in this thread, short one-on-one sessions with the instructor, seems like a great way to check if students can engage with and defend the work they turned in.
Sure, but the point is that if 5% of students are using AI then you have to assume that any work done outside classroom has used AI, because otherwise you're giving a massive advantage to the 5% of students who used AI, right?
I think part of the reason AI is having such a negative effect on schools in particular is because of how many education processes are reliant on an archaic, broken way of "learning." So much of it is focused upon memorization and regurgitation of information (which AI is unmatched at doing).
School is packed with inefficiency and busywork that is completely divorced from the way people learn on their own. In fact, it's pretty safe to say you could learn something about 10x faster by typing it into an AI chatbot and having it tailor the experience to you.
Teachers worry about AI because they do not just care about memorization. Before AI, being able to write cohesive essays about a subject was a good proxy for understanding beyond simple memorization. Now that's gone.
A lazy, irresponsible teacher who only cares about memorization will just grade students via in-class multiple-choice tests exclusively and call it a day. They don't need to worry about AI at all.
> Before AI, being able to write cohesive essays about a subject was a good proxy for understanding beyond simple memorization. Now that's gone.
Take-homes were never a good proxy for anything because any student can pay for private "lessons" and get their homework done for them.
> A lazy, irresponsible teacher who only cares about memorization will just grade students via in-class multiple-choice tests exclusively and call it a day. They don't need to worry about AI at all.
What stops a diligent responsible teacher from doing in-class essays?
Yes, the biggest problem with authentic exercises is evaluating the students' actions and giving feedback. The problem is that authentic assessments didn't previously scale (e.g., what worked in 1:1 coaching or tutoring couldn't be done for a whole classroom). But AI can scale them.
It seems like AI will destroy education, but it's only breaking the old education system; it will also enable a new and much better one, where students make more and faster progress developing more relevant and valuable skills.
Education system uses multiple choice quizzes and tests because their grading can be automated.
But when evaluation of any exercise can be automated with AI, such that students can practice any skill with iterative feedback at the pace of their own development, so much human potential will be unlocked.
Memorizing and tests at school are the archaic approach that schools don't believe in anymore (at least the school board my kids are at), but they happen to be AI proof.
It's the softer, no memorizing, no tests, just assignments that you can hand in at anytime because there's no deadlines, and grades don't matter, type of education that is particularly useless with AI.
> So much of it is focused upon memorization and regurgitation of information, which AI is unmatched at doing.
This applies both to education and to what people need to know to do their jobs. Knowing all the written stuff is less valuable. Automated tools have been able to look it up since the Google era; now they can work with what they look up.
There was a time when programmers pored over Fundamental Algorithms. No one does that today. When needed, you find existing code that does that stuff, probably better than you could write it. Who codes a hash table today?
The way you learn is totally different from the way a novice learns; they don't have a vast memorised store of knowledge, let alone the connected structure over that memorised knowledge. When you learn something new, it gets incorporated thanks to those foundations.
> So much of it is focused upon memorization and regurgitation of information (which AI is unmatched at doing).
No, lots of classes are focused on producing papers, which aren't just memorization and regurgitation. But generative AI is king at... generating text... so that class of work output is suspect now.
It's not the students. It's the teachers and school using AI first, and publicly. Why does he talk about only students using AI?
Also, just like calculators are allowed in exam halls, why not allow AI usage in exams? In a real-life job you are not going to avoid using a calculator or AI. So why test people in a different context? I think the tests should focus on the skills of using a calculator and AI.
I beg to differ. Tactical use of a scientific or graphing calculator can absolutely replace large parts of the thinking process. If you're testing for the ability to solve differential equations, a powerful enough calculator can trivialize it, so they aren't allowed in calculus exams. A 10-digit calculator cannot trivialize calculus, so they are allowed. That's the distinction. LLMs operate at the maximum level of "helpfulness" and there's no good way to dial them back.
In real life, if someone with an administrative job punched 50 * 3,000 into a calculator and didn't notice that the answer 1,500,000 is wrong (a typo; it should be 150,000), I would most definitely consider them at fault. Similarly, I know some structural engineers who will notice something went wrong with the input if an answer is not within a given range.
A calculator can be used to do things you already know how to do _faster_ imho, but in most jobs it still requires you to at least somewhat understand what is happening under the hood. The same principle applies to using LLMs at work, imho. You can use them to do stuff you know how to do, faster, but if you don't understand the material there's no way you can evaluate the LLM's answer, and you will be at fault when there's AI slop in your output.
eta: Maybe it would be possible to design labs with LLMs in such a way that you teach students how to evaluate the LLM's answer? This would require them to have knowledge of the underlying topic. That's probably possible with specialized tools / LLM prompts, but it is not going to help against them using a generic LLM like ChatGPT or a cheating tool that feeds into a generic model.
> Maybe it would be possible to design labs with LLM's in such a way that you teach them how to evaluate the LLM's answer? This would require them to have knowledge of the underlying topic. That's probably possible with specialized tools / LLM prompts but is not going to help against them using a generic LLM like ChatGPT or a cheating tool that feeds into a generic model.
What you are describing is that they should use the LLM only after they already know the topic. A dilemma.
Yeah, I kinda like the method siscia suggests downthread [0], where the teacher grades based on the questions students ask the LLMs during the test.
I think you should be able to use the LLM at home to help you better understand the topic (it has endless patience and you can usually keep asking until you actually grok the topic), but during the test I think it's fair to expect that basic understanding to be there.
>Also, just like how calculators are allowed in the exam halls, why not allow AI usage in exams?
Dig deeper into this. When are calculators allowed, and when are they not? If it is kids learning to do basic operations, do we really allow them to use calculators? I doubt it, and I suspect that places that do end up with students who struggle with more advanced math because they offloaded the thinking already.
On the other hand, giving a calculus student a four-function calculator is pretty standard, because the type of math it can do isn't what is being tested, and having a student plug 12 into x^3 - 4x^2 + 12 very quickly instead of having to work it out doesn't impact their learning. Then again, more advanced calculators are often not allowed when they trivialize the content.
LLMs are much more powerful than a calculator, so finding a place in education where they don't trivialize the learning process is pretty difficult. Maybe at the grad level or in research, but for anything in grade school it is as bad as letting a kid learning their times tables use a calculator.
Now, if we could create custom LLMs that are targeted at certain learning levels? That would be pretty nice. A lot more work. Imagine a chemistry LLM that can answer questions but knows the homework well enough to avoid solving problems for students. Instead, it can tell them what chapter of their textbook to go read, or it can help them when they are doing a deep dive beyond the level of the material and give them answers to the sorts of problems they aren't expected to solve. The difficulty is that current LLMs aren't this selective and are instead too helpful, immediately answering all problems (even the ones they can't).
> The verification ability is especially important in the case of AI, which is presently a lot more fallible in a great variety of ways compared to calculators.
Um. yea. This is the first time a non-deterministic technology has achieved mass adoption for everyday use. Despite repeated warnings (which are not even close to the tenor of warnings that should be broadcast), folks don't understand that AI will likely hallucinate some or all of their answer.
A calculator will not, and even the closest thing to buggy behavior in a calculator (exploring the fringes of floating-point numbers, for example) is light years away from the hallucinations of generative AI for general, everyday questions.
The mass exuberance over generative AI has been blinding folks to the very real effects of over-adoption of AI, and we aren't going to see the full impact of that for some time. When we do, folks are going to ask questions like "how were we so dumb?" And of course the answer will be "no one saw this coming."
My spouse is an educator with nearly 20 years in the industry, and even her school has adopted AI. It’s shocking how quickly it has taken hold, even in otherwise lagging adoption segments. Her school finally went “1-1” with devices in 2020, just prior to COVID.
> The students remain motivated to learn how to solve problems without AI because they know they will be evaluated without it in class later.
Learning how to prepare for in-class tests and writing exercises is a very particular skillset which I haven't really exercised a lot since I graduated.
Never mind teaching the humanities, for which I think this is a genuine crisis, in class programming exams are basically the same thing as leetcode job interviews, and we all know what a bad proxy those are for "real" development work.
> in class programming exams are basically the same thing as leetcode job interviews, and we all know what a bad proxy those are for "real" development work.
Confusing university learning for "real industry work" is a mistake and we've known it's a mistake for a while. We can have classes which teach what life in industry is like, but assuming that the role of university is to teach people how to fit directly into industry is mistaking the purpose of university and K-12 education as a whole.
Writing long-form prose and essays isn't something I've done in a long time, but I wouldn't say it was wasted effort. Long-form prose forces you to do things that you don't always do when writing emails and powerpoints, and I rely on those skills every day.
There's no mistake there for all the students looking at job listings that treat having a college degree as a hard prerequisite for even being employable.
Trust goes offline. The value shift is happening in real time, as in-person events and offline meetups take on more value than digital communication. You can forge things in the digital space, but the real you is in person.
Here is my proposal for AI in schools: raise the bar dramatically. Rather than trying to prevent kids from using AI, just raise the expectations of what they should accomplish with it. They should be setting really lofty goals rather than just doing the same work with less effort.
AI doesn't help you do higher quality work. It helps you do (or imitate) mediocre work faster. But thing is, it is hard to learn how to do excellent work without learning to do mediocre work first.
This is great for a “capstone project” at the end of a degree. But along the way, you have to master sub tasks and small skills in order to build on them later to accomplish lofty goals. So you need to learn the basics first. But AI is really good at helping you cheat on the basics without learning. So we still need to get them to the point of being able to use AI intelligently
Code quality is still a culture and prioritisation issue more than a tool issue. You can absolutely write great code using AI.
AI code review has unquestionably increased the quality of my code by helping me find bugs before they make it to production.
AI coding tools give me speed to try out more options to land on a better solution. For example, I wrote a proxy, figured out problems with that approach, and so wrote a service that could accomplish the same thing instead. Being able to get more contact with reality, and seeing how solutions actually work before committing to them, gives you a lot of information to make better decisions.
But then you still need good practices like code review, maintaining coding standards, and good project management to really keep code quality high. AI doesn’t really change that.
> Code quality is still a culture and prioritisation issue more than a tool issue.
AI helps people who "write" (i.e., generate) low-quality code more than it helps people who write high-quality code. This means AI will lead to a larger percentage of new code being low-quality.
That is what they do in the software industry. Before, it was "let me catch you off guard by asking how to reverse a linked list"; now it's leetcode questions so hard that you need to know and study them weekly and prep for a year. The interviewer can tell if you started prepping 3 weeks prior.
It seems like a good path forward is to somewhat try to replicate the idea of "once you can do it yourself, feel free to use it going forward" (knowing how various calculator operations work before you let it do it for you).
I'm curious if we instead gave students an AI tool, but one that would intentionally throw in wrong things that the student had to catch. Instead of the student using LLMs, they would have one paid for by the school.
This is more brainstorming than a well-thought-out idea, but I generally think "opposing AI" is doomed to fail. If we follow a Montessori approach, kids are naturally inclined to want to learn things; if students are trying to lie/cheat, we've already failed them by turning off their natural curiosity for something else.
I agree, I think schools and universities need to adapt, just like calculators, these things aren't going away. Let students leverage AI as tools and come out of Uni more capable than we did.
AI _does_ currently throw in the occasional wrong thing. Sometimes a lot. A student's job needs to be verifying and fact-checking the information the AI is telling them.
The student's job becomes asking the right questions and verifying the results.
This is the correct take. To contrast with the Terence Tao piece from earlier (https://news.ycombinator.com/item?id=46017972), AI research tools are increasingly useful if you're a competent researcher who can judge the output and detect BS. You can't, however, become a Terence Tao by asking AI to solve your homework.
So, in learning environments we might not have an option but to open the floodgates to AI use, but abandon most testing techniques that are not, more or less, pen and paper, in-person. Use AI as much as you want, but know that as a student you'll be answering tests armed only with your brain.
I do pity English teachers who have relied on essays to grade proficiency for hundreds of years. STEM fields have an easier path through this.
Andrej and Garry Trudeau are in agreement that "blue book exams" (i.e. the teacher confiscates devices and gives you a blank exam booklet, traditionally blue, to fill out in person during the test) are the only way to assess students anymore.
My 7 year old hasn't figured out how to use any LLMs yet, but I'm sure the day will come very soon. I hope his school district is prepared. They recently instituted a district-wide "no phones" policy, which is a good first step.
Blue book was the norm for exams in my social science and humanities classes way after every assignment was typed on a computer (and probably a laptop, by that time) with Internet access.
I guess high schools and junior highs will have to adopt something similar, too. Better condition those wrists and fingers, kids :-)
I'm oldish, but when I was in college in the late 90s we typed a huge volume of homework (I was a history & religious studies double major as an undergrad), but the vast majority of our exams were blue books. There were exceptions where the primary deliverable for the semester was a lengthy research paper, but lots and lots of blue books.
That was how I took most of my school and university exams. I hated it then and I'd hate it now. For humanities, at least, it felt like a test of who could write the fastest (one which I fared well at, too, so it's not case of sour grapes).
I'd be much more in favour of oral examinations. Yes, they're more resource-intensive than grading written booklets, but it's not infeasible. Separately, I also hope it might go some way to lessening the attitude of "teaching to the test".
Oh how I hated those as a student. Handwriting has always been a slow and uncomfortable process for me. Yes, I tried different techniques of printing and cursive as well as better pens. Nothing helped. Typing on a keyboard is just so much faster and more fluent.
It's a shame that some students will again be limited by how fast they can get their thoughts down on a piece of paper. This is such an artificial limitation and totally irrelevant to real world work now.
Maybe this is a niche for those low distraction writing tools that pop up from time to time. Or a school managed Chromebook that’s locked to the exam page.
> My 7 year old hasn't figured out how to use any LLMs yet, but I'm sure the day will come very soon. I hope his school district is prepared. They recently instituted a district-wide "no phones" policy, which is a good first step.
This sounds as if you expect that it will become possible to access an LLM in class without a phone or other similar device. (Of course, using a laptop would be easily noticed.)
The phone ban certainly helps make such usage noticeable in class, but I'm not sure the academic structure is prepared to go to in-person assessments only. The whole thread is about homework / out of class work being useless now.
1. Corporate interests want to sell product
2. Administrators want a product they can use
3. Compliance people want a checkbox they can check
4. Teachers want to be able to continue what they have been doing thus far within the existing ecosystem
5. Parents either don't know, don't care, or do care but are unable to provide a viable alternative, or can and do provide one
We have had this conversation (although without the AI component) before. None of it is really a secret. The question is really what the actual goal is. Right now, in the US, education is mostly in name only -- unless you are involved (which already means you are taking steps to correct it) or are in the right zip code (which is not a guarantee, but it makes your kids' odds better).
> AI research tools are increasingly useful if you're a competent researcher that can judge the output and detect BS.
This assumes we even need more Terence Taos by the time these kids are old enough. AI has gone from being completely useless to solving challenging math problems in less than 5 years. That trajectory doesn't give me much hope that education will matter at all in a few years.
One more thing: teachers should use LLMs for grading. Grading is a boring chore; the majority of teachers hate it. Many procrastinate and return the graded tests or assignments with a significant delay, well beyond the window where the kids are curious to see how they did. This completely destroys the feedback purpose of grading. With an LLM, you feed in the scanned tests, the LLM gives you back the graded tests, and the teacher can quickly do a bit of quality control to check that the LLM did not make any mistakes. If the LLM makes mistakes, then wait 6 months; there will be a new version that won't make those mistakes.
So it is feasible (in principle) to give every student a different exam!
You’d use AI to generate lots of unique exams for your material, then ensure they’re all exactly the same difficulty (or extremely extremely close) by asking an LLM to reject any that are relatively too hard or too easy. Once you have generated enough individual exams, assign them to your students in your no-AI setting.
Code that the AI writes would be used to grade them.
- AI is great at some things.
- Code is great at other things.
- AI is bad at some things code is great for.
- AI is great at coding.
Therefore, leverage AI to quickly code up deterministic and fast tools for the tasks where code is best.
And to help exams be markable by code, it makes sense to be smart about exam structure - e.g. only ask questions with binary answers or multiple choice, so you don't need subjective judgment of correctness.
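As a minimal sketch of what "code does the marking" can look like once answers are machine-checkable: one answer key per generated variant, and grading becomes a deterministic comparison. The variant IDs and CSV layout here are made up for illustration.

```python
import csv

# One answer key per generated exam variant (hypothetical content).
ANSWER_KEYS = {
    "variant_A": {"q1": "B", "q2": "True", "q3": "D"},
    "variant_B": {"q1": "C", "q2": "False", "q3": "A"},
}


def grade(key: dict[str, str], responses: dict[str, str]) -> tuple[int, int]:
    """Deterministic marking: count exact matches against the variant's key."""
    correct = sum(1 for q, ans in key.items() if responses.get(q) == ans)
    return correct, len(key)


if __name__ == "__main__":
    # responses.csv columns (assumed): student, variant, q1, q2, q3
    with open("responses.csv", newline="") as f:
        for row in csv.DictReader(f):
            key = ANSWER_KEYS[row["variant"]]
            correct, total = grade(key, {q: row[q] for q in key})
            print(f"{row['student']}: {correct}/{total}")
```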
Except (like it or not) students are in direct competition with each other. Unique assessments would be impossible to defend the first time a student claimed your "unfair" test cost them a job, scholarship, or other competitive opportunity.
The answer to AI use in schools breaking old teaching and evaluation methods isn't to cripple students, it's to create new assignments and evaluation methods.
One idea: have students generate videos with their best "ELI5" explanations for things, or demos/teaching tools. Make the conciseness and clarity of the video and the quality/originality of the teaching tools the grading criteria. Make the videos public, so classmates can compare themselves with their peers.
Students will be forced to learn the material and memorize it to make a good video. They'll be forced to understand it to create really good teaching tools. The public aspect will cause students to work harder not to feel foolish in front of their peers.
The beauty of this is that most kids these days want to be influencers, so they're likely to invest time into the assignment even if they're not interested in the subject.
Most of what schools teach is either useless or toxic. They rarely teach practical skills. I would argue for getting people to use handwriting/cursive as a workaround for this issue. It would mean that if they did use AI then they would have to process some of the content mentally rather than just present it.
> Most of what schools teach is either useless or toxic
You must have a pretty broad definition of useless / toxic if you think that reading, writing and basic math, but also geometry, calculus, linear algebra, probability theory, foreign languages, a broad overview of history, and basic competency in physics / electronics fall under these categories.
Sure, I learned a lot in school that turned out to be pretty useless for me (chemistry, basically anything I learned in PE, french), but I did not know that at the time and I am still grateful that I was being exposed to these topics. Some of my classmates developed successful careers from these early exposures.
K-12 is first and foremost a daycare so both parents can be "productive". It's not only a daycare, but this has certainly been its primary function for as long as I've been able to observe the world.
I made a tool for this! It's an essay writing platform that tracks the edits and keystrokes rather than the final output, so its AI detection accuracy is _much_ higher than other tools:
https://collie.ink/
I've been following this approach since last school year. I focus on in-class work, and home time is for reading and memorization. My colleagues still think classrooms are for lecturing, but it's coming. The paper-and-pen era is back in school!
I did a lot of my blog and book writing before these AI tools, but now I show my readers images of handwritten notes and drafts (more out of interest than demonstrating proof of work).
This couldn’t have happened at a better time. When I was young my parents found a schooling system that had minimal homework so I could play around and live my life. I’ve moved to a country with a lot less flexibility. Now when my kids will soon be going to school, compulsory homework will be obsolete.
Zero homework grades will be ideal. Looking forward to this.
If AI gets us reliably to a flipped classroom (= research at home, work through exercises during class) then I'm here for it. Homework in the traditional sense is an anti-pattern.
1. Assume printing press exists
2. Now there's no need for a teacher to stand up and deliver information by talking to a class for 60 mins
3. Therefore students can read at home (or watch prepared videos) and test their learning in class where there's experts to support them
4. Given we only need 1 copy of the book/video/interactive demo, we can spend wayyyyy more money making it the best it can possibly be
What's sad is it's 500 years later and education has barely changed
> What's sad is it's 500 years later and education has barely changed
From my extensive experience of four years of undergrad, the problem in your plan is "3. Therefore students can read at home " - half the class won't do the reading, and the half that did won't get what it means until they go to lecture[1].
[1] If the lecturer is any good at all. If he spends most of his time ranting about his ex-wife...
Most of what I learned in college was only because I did homework and struggled to figure it out myself. Classroom time was essentially just a heads up to what I'll actually be learning myself later.
Granted, this was much less the case in grade school - but if students are going to see homework for the first time in college, I can see problems coming up.
If you got rid of homework throughout all of the "standard" education path (grade school + undergrad), I would bet a lot of money that I'd be much dumber for it.
> but if students are going to see homework for the first time in college, I can see problems coming up.
If the concept is too foreign for them, I'm sure we could figure out how to replicate the grade school environment. Give them their 15 hours/week of lecture, and then lock them in a classroom for the 30 hours they should spend on homework.
Besides the poor UX for unauthenticated users, I would rather not view ads from advertisers who still pay X for access to my eyeballs (in the event I'm using a browser that doesn't block them to begin with).
This is a very American issue. In my entire student career in Italy, home assignments were never graded. Maybe you had a project or two at university, but otherwise I got all my grades from on-site tests.
> the majority of grading has to shift to in-class work (instead of at-home assignments)
My wife is a teacher. Her school did this a long time ago, long before AI. But they also gave every kid a laptop and forced the teachers to move all tests/assignments to online applications, with the curriculum picked out by the administrators (read: some salesperson talked them into it). Even with assignments done in class, it's almost impossible to catch kids using AI when they're all on laptops all the time and she can't teach and monitor them all at the same time.
Bring back pencil and paper. Bring back calculators. Internet connected devices do not belong in the classroom.
This is exactly why I'm focusing on job readiness and remediation rather than the education system. I think working all this out is simply too complex for a system with a lot of vested interests and little understanding of how AI is evolving. There's an arms race between students, teachers, and the institutions that hire the students.
It's simply too complex to fix. I think we'll see increased investment by the corporates who do keep hiring in remediating the gaps in their workforce.
Most elite institutions will probably increase the effort they spend on interviewing, including work trials. I think we're already seeing this, with many of the elite institutions talking about judgment, emotional intelligence, and critical thinking as the more important skills.
My worry is that hiring turns into a test of likeability rather than meritocracy (everyone is a personality hire when cognition is done by the machines)
Source: I'm trying to build a startup (Socratify), a bridge for early-stage professionals from a flawed education system to the workforce.
> Using the calculator as an example of a historically disruptive technology, school teaches you how to do all the basic math & arithmetic so that you can in principle do it by hand, even if calculators are pervasive and greatly speed up work in practical settings. In addition, you understand what it's doing for you, so should it give you a wrong answer (e.g. you mistyped "prompt"), you should be able to notice it, gut check it, verify it in some other way, etc.
The calculator analogy is extremely inaccurate, though it's understandable that people keep making the comparison. The premise is that the calculator didn't take bookkeepers' jobs, but instead helped them.
First of all, a calculator does one job and does it very well; you never question it because it works solely with numbers. But AI wants to be everything: calculator, translator, knowledge base, etc. And it's very confident about everything, all the time, until you start to question it, and even then it continues to lie. Because, sadly, current AI products' purpose isn't to give you an accurate answer; it's to make you believe you're getting credible information.
More importantly, calculators are not connected to the internet, and they are not capable of building a profile of an individual.
It's sad to see big players push this agenda to make people believe that they don't need to think anymore; AI will do everything for them.
A detector is easy to write: simply monitor the kid's computer and phone use. AI is ruining school, but it will be solved in the lowest-resistance way possible.
I agree that focus should just shift to in-class work so that students are left free to do whatever they want once they are done for the day at school. Homework and at-home assignments are lazy handovers.
Also, all of these AI threats to public education can be mitigated if we just step back one or two decades and go the pen-and-paper way. I have yet to see any convincing argument that digital/screen-based teaching methods are superior in any way to traditional ones; on the contrary, I have seen thousands of arguments against them.
How about just dispense with the AI nonsense in education and go to totally in-person, closed-book, manually-written, proctored exams? No homework, no assignments, no projects. Just pure mind-to-paper writing in a bare room under the eye of an examiner. Those that want to will learn and will produce intelligent work regardless of setting.
It's really something to just casually say "all exams need to be in person", reversing in a single move significant gains in educational availability.
It may not be obvious in a country with smaller student-to-teacher ratios, but in a place like India, you never have enough teachers for the students.
Being able to provide courses and homework digitally reduced the amount of work required to grade and review student work.
Then, to add insult to injury, AI is removing entry-level roles, taking away other chances for people to do work that is easy to verify, practice, and learn from.
Yes, yes, eventually tool use will result in increases in GDP. Except our incentives are not to hire more teachers, build more schools, and improve educational outcomes. Those are all public goods, not private goods. We aren’t going to tax firms further, because commerce must be protected, yet we will socialize the costs to society.
This doesn't address the point that AI can replace going to school. AI can be your perfect personal tutor, helping you learn things 1:1. Needing to have a teacher and prove to them that you know what they taught will become a legacy concept. The issue of AI cheating at school is, in my eyes, a temporary one.
ChatGPT just told me to put the turkey in my toaster oven legs facing the door, and you think it can replace school. Unless there is a massive architectural change that can be provably verified by third parties, this can never be. I’d hate for my unschooled surgeon to check an llm while I’m under.
Don't worry, someone will put another hack on top of the model to teach it to handle this specific case better. That will totally fix the problem, right? Right?
A trained professional making their best guess is far more capable and trustworthy than the slop LLMs put out. So yeah, winging it is a good alternative here.
I see, I overlooked the 'toaster' part. That's a good world model benchmark question for models and a good reading comprehension question for humans. :-P
GPT 5.1 Pro made the same mistake ("Face the legs away from the door.") Claude Sonnet 4.5 agreed but added "Note: Most toaster ovens max out around 10-12 pounds for a whole turkey."
Gemini 3 acknowledged that toaster ovens are usually very compact and that the legs shouldn't be positioned where they will touch the glass door. When challenged, it hand-waved something to the effect of "Well, some toaster ovens are large countertop convection units that can hold up to a 12-pound turkey." When asked for a brand and model number of such an oven, it backtracked and admitted that no toaster oven would be large enough.
Changing the prompt to explicitly specify a 12-pound turkey yielded good answers ("A 12-pound turkey won't fit in a toaster oven - most max out at 4-6 pounds for poultry. Attempting this would be a fire hazard and result in dangerously uneven cooking," from Sonnet.)
It is considered valuable and worthwhile for a society to educate all of its children/citizens. This means we have to develop systems and techniques to educate all kinds of people, not just the ones who can be dropped off by themselves at a library when they turn five, and picked up again in fifteen years with a PhD.
Sure. People who are self-motivated are the ones who will benefit earliest. If a society values ensuring every single citizen gets a baseline education, it can figure out how to get an AI to persuade or coax people into learning better than a human could.
For someone that wants to learn, I agree with this 100%. AI has been great at teaching me about 100s of topics.
I don't yet know how we get AI to teach unruly kids, or kids with neurodivergences. Perhaps, though, the AI can eventually be vastly superior to an adult because of the methods it can use to get through to the child, keep the child interested, and present the teaching in a much more interactive way.
Guess the teachers have already lost...
"AI detection" wasn't even a solution in the short term and it won't be going forward. Take-home essays are dead, the teachers are collectively just hoping some superhero will swoop in and somehow save them. Sometimes such a thing is possible, but it isn't going to happen this time.
So why is the issue you described an issue? Because it's about a grade. And the reason that's relevant is because that credential will then be used to determine where she can to to university which, in turn, is a credential that will determine her breadth of options for starting her career, and so on. But why is this all done by credentials instead of simple demonstrations of skill? What somebody scored in a high school writing class should matter far less than the output somebody is capable of producing when given a prompt and an hour in a closed setting. This is how you used to apply to colleges. Here [1], for instance, is Harvard's exam from 1869. If you pass it, you're in. Simple as that.
Obviously this creates a problem of institutions starting to 'teach the test', but with sufficiently broad testing I don't see this as a problem. If a writing class can teach somebody to write a compelling essay based on an arbitrary prompt, then that was simply a good writing class! As an aside this would also add a major selling point to all of the top universities that offer free educational courses online. Right now I think 'normal' people are mostly disinterested in those because of the lack of widely accepted credentials, which is just so backwards - people are actively seeking to maximize credentials over maximizing learning.
This is one of the very few places I think big tech in the US has done a great job. Coding interviews can be justifiably critiqued in many ways, but it's still a much better system than raw credentialization.
I wish I could agree with you, but I think that having a degree (or rather the right degree) is more important than ever.
Basically, grades exist to decide who gets a laid-back, high-paying job and who has to work two low-paying, labor-intensive jobs just to live paycheck to paycheck.
As one teacher told me once: we could have all of you practice chess, make a big tournament and you get to choose your university based on your chess ranking. It wouldn't be any less stupid than the current system.
It's also the only way that students can actually be held to the same standards. When I was a freshman in college with a 3.4 high school GPA, I was absolutely gobsmacked by how many kids with perfect >= 4.0 GPAs couldn't pass the simple algebra test that the university administered to all undergraduates as a prerequisite for taking any advanced mathematics course.
Goodhart's law.
In education, regarding exams, Goodhart's law just means that you should randomize your test questions instead of telling the students the questions before the exam. Have a wide set of questions, randomize them. The only way for students to pass is to learn the material.
A randomized standardized test is not more susceptible to Goodhart's law than a randomized personal test. The latter however has many additional problems.
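A minimal sketch of what that randomization could look like in practice (the question bank, sizes, and per-student seeding here are all hypothetical):

```python
import random

# Minimal sketch (hypothetical question bank and sizes): draw a different
# random subset of questions for each student, so memorizing leaked questions
# doesn't help much.
QUESTION_BANK = [f"Question {i}" for i in range(1, 201)]  # e.g. 200 vetted questions

def make_exam(student_id: str, n_questions: int = 10, term: str = "2024-fall") -> list[str]:
    # Seed per student and term so the draw is reproducible for regrades/audits.
    rng = random.Random(f"{term}-{student_id}")
    return rng.sample(QUESTION_BANK, n_questions)

print(make_exam("student-042"))
```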
"The only way for students to pass is to learn the material."
Part of Goodhart's law in this context is precisely that it overdetermines "the material" and there is no way around this.
I wish Goodhart's law was as easy to dodge as you think it is, but it isn't.
School needs to provide opportunities to practice applying important skills like empathy, tenacity, self-regulation, creativity, patience, collaboration, critical thinking, and others that cannot be assessed using a multiple choice quiz taken in silence. When funding is tied to performance on trivia, all of the above suffers.
If you don't even know that the American Civil War ended in 1865, how could you do any meaningful analysis of its downstream implications or causes and its relationship to other events?
I'd imagine millions if not billions of people have found basic math useful without ever learning what "commutative" even means.
Sure, but it takes < 1 second to read a GPA.
We need some way to distill the unbelievable amount of data in human brains into something that can be processed in a reasonable amount of time. We need a measurement - a degree, a GPA, something.
Imagine if in every job interview they could assume absolutely nothing. They know nothing about your education. They might start by asking you to recite your ABCs and then, finally at sunset, you might get to a coding exam. Which still won't work, because you'll just AI cheat the coding exam.
We require gatekeepers to make the system work. If we allow the gatekeepers to just rubber-stamp based on whether stuff seems correct, that tells us nothing about the person themselves. We want the measurement to get close to the real understanding.
That means AI papers have to be given a 0, which means we need to know if something is AI generated. And we want to catch this at the education level, not above.
But assuming in-person day long batteries of tests for universities and companies is probably not very practical.
You can argue whether university is a very efficient use of time or money but it presumably does involve some learning and offers potential employers some level of a filter that roughly aligns with what they're looking for.
We should expect this if employers can efficiently and objectively evaluate a candidate's skills without relying on credentials. When they're unable to, we should worry about this information asymmetry leading to a "market for lemons" [0]. I found an article [1] about how this could play out:
> This scenario leads to a clear case of information asymmetry since only the graduate knows whether their degree reflects real proficiency, while employers have no reliable way to verify this. This mirrors the classic “Market for Lemons” concept introduced by economist George Akerlof in 1970, where the presence of low-quality goods (or in this case, under-skilled graduates) drives down the perceived value of all goods, due to a lack of trustworthy signals.
[0] https://quickonomics.com/terms/market-for-lemons/
[1] https://competitiveness.in/how-ai-could-exacerbate-the-skill...
Just so we're clear, the coding tests are in addition to credentialisation. I'll never forget when I worked at Big Tech (from Ireland): I would constantly hear recruiters talk about the OK-school list (basically the Ivy League). Additionally, I remember having to check which university a candidate had attended before she could have an interview with one of our directors.
He was fine with her, because she had gone to Oxford. Honestly, I'm surprised that I was able to get hired there given all this nonsense.
I'm a dropout (didn't finish my BSc) from a no-name Northern European university, and I've worked at or gotten offers from:
- Meta
- Amazon
- Google
- Microsoft
- Uber
- xAI
+ some unicorns that compete with FAANG+ locally.
I didn't include some others that have reached out for interviews which I declined at the time. The lack of a degree has literally never come up for me.
It seems to be a US role thing in my experience.
In a way, I think the hiring process at second-tier (not FAANG) companies is actually better because you have to "moneyball" a little bit - you know that you're going to lose the most-credentialed people to other companies that can beat you dollar for dollar, so you actually have to think a little more deeply about what a role really needs to find the right person.
Honestly, students should have a course in "how the justice system works" (or at least should work). So should the teachers.
Student unions and similar entities should exist and be ready to intervene to help students in such situations.
This is nothing new; AI will just make it happen more often, revealing how stupid so many teachers are. But when someone has spent thousands on a tool which purports to be reliable and is so quick to use, how can an average person resist it? The teacher is as lazy as the cheaters they intend to catch.
The only way to reliably prevent the use of AI tools without punishing innocent students is to monitor the students while they work.
Schools can do that by having essays written on premises, either by hand or on computers managed by the school.
But students that are worried that they will be targeted can also do this themselves, by setting up their phone to film them while working.
And if they do this, and the teacher tries to punish someone who can prove they wrote the essay themselves, either the teacher or the school should hopefully learn that such tools can't be trusted.
Bizarre and unfair.
https://decrypt.co/286121/ai-detectors-fail-reliability-risk...
We learned how government and justice worked.
And to add to that, there should be a justice system there. The idea of due process is laughable in most educational settings.
The professor noticed it (presumably via seeing poor "show your work") and gave zero points on the question to everyone. And once you went to complain about your grade, she would ask you to explain the answer there in her office and work through the problem live.
I thought it was a clever and graceful way to deal with it.
The teacher effectively filtered out the shy boys/girls who are not brave enough to "hustle." Gracefully.
The time spent challenging exam grades is usually better spent studying for the next exam. I've never gotten a significant grade improvement from it.
This has nothing to do with American Hustle culture and just with that professor's judgment.
She didn't ask them to challenge them, she asked them additional questions. The test already asks them questions.
If you are really shy, a culture where no one cheats is far better because your actual ability and intelligence shines through
Cheaters and non cheaters were punished in exactly the same way. Effectively cheating gave you an advantage and being shy gave you disadvantage.
The world at large rarely accommodates shy people. Coping skills are essential, even if they are unpleasant.
They learned that cheating gives advantage to the cheating individual. They also learned that reporting cheating harms them and non cheaters.
He just goes to our local public elementary school.
How the heck is that even possible? :o
Because what the cheater is trying to accomplish is to avoid having to think.
It's an act motivated by either laziness, apathy or rebellion (or some combination thereof). Not motivated by trying to get a good grade.
In Germany, the traditional sharp-tongued answer of pupils to the question "How could both of you get the exact same WRONG answer (in the test)?" is: "Well, we both have the same teacher." :-)
> the final exam where somehow people got their hands on the hardest question of the exam.
They got the question but not the answer so they had to work it out before the test. They couldn't explain it later?
Knowing the way a lot of professors act, I'm not surprised, but it's always disheartening to see how many behave like petty tyrants who are happy to throw around their power over the young.
Since high school, the expectation is that you show your work. I remember my high school calculus teacher didn't even LOOK at the final answer - only the work.
The nice thing was that if you made a trivial mistake, like adding 2 + 2 = 5, you got 95% of the credit. It worked out to be massively beneficial for students.
The same thing continued in programming classes. We wrote our programs on paper. The teacher didn't compile anything. They didn't care much if you missed a semicolon, or called a library function by a wrong name. They cared if the overall structure and algorithms were correct. It was all analyzed statically.
1. they skip what are to them the obvious steps (we all do as we achieve mastery) and then get penalized for not showing their work.
2. they inherently know and understand the task but not the mechanized minutiae. Think of learning a new language. A diligent student can work through the problem and complete an a->b translation, then go the other way, and repeat. Someone with mastery doesn't do this; they think within one language and only pass the contextual meaning back and forth when explicitly required.
"showing your work" is really the same thing as "explain how you think" and may be great for basics in learning, but also faces levels of abstraction as you ascend towards mastery.
Tests were created to save money (more students per teacher); we're just going back to the older, actually useful method of talking to people to see if they understand what they've been taught.
You weren't asked to write an essay because someone wanted to read your essay, only to intuit that you've understood something
Personally I don't believe that any of the problems caused by AI are going to be solved by "more AI"
People do learn how to use the web, social media, mobile devices to ultimately work for them or against them.
How is this working out in practice? Every piece of technology is absolutely adversarial nowadays and people are getting ground to bits by it.
I'm skeptical. Tests are a way of standardizing the curriculum and objectively determining if the lessons were learned.
The lesson of how to swim sometimes only comes in applying the learning.
What happened is that I did a Q&A worksheet but in each section of my report I reiterated the question in italics before answering it.
The reiterated questions of course came up as 100% plagiarism because they were just copied from the worksheet.
Wow I'd have been screwed, so many of my high school papers were just rewrites and improvements on stuff I wrote in earlier years.
My point is that accuracy is a terrible metric here and sensitivity, specificity tell us much more relevant information to the task at hand. In that formulation, a specificity < 1 is going to have false positives and it isn't fair to those students to have to prove their innocence.
If we're being literal, accuracy is (number correct guesses) / (total number of guesses). Maybe the folks at turnitin don't actually mean 'accuracy', but if they're selling an AI/ML product they should at least know their metrics.
Just speaking in general here -- I don't know what specific phrasing TurnItIn uses.
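To make the distinction concrete, here is a minimal sketch with made-up numbers showing how an impressive-sounding accuracy can coexist with a specificity that still falsely accuses plenty of honest students:

```python
# Minimal sketch with made-up numbers: why a detector's headline "accuracy"
# says little about how many honest students get falsely accused.

def rates(tp: int, fn: int, fp: int, tn: int):
    accuracy    = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)   # caught AI essays / all AI essays
    specificity = tn / (tn + fp)   # cleared human essays / all human essays
    return accuracy, sensitivity, specificity

# Imagine 1000 essays: 50 AI-written, 950 human-written. The detector flags
# 40 of the AI essays and wrongly flags 19 of the human-written ones.
acc, sens, spec = rates(tp=40, fn=10, fp=19, tn=931)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
# -> accuracy=0.971 sensitivity=0.800 specificity=0.980
# "97% accurate", yet 19 honest students are falsely accused.
```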
False positives with technology that is non-deterministic is guaranteed.
It's more than slightly comedic to see people amazed when the LLM math works out exactly as it was designed to.
It's shit software for schools and teachers to cover their ass. Nothing more, and deserves no more attention.
All it takes is one moron with power and a poor understanding of statistics.
This sounds like... a good solution? It's the exception case, so it shouldn't come up constantly (false positives), although I suppose this fails if everyone cheats and everyone wants to claim innocence.
I guess we could go back to giving exams Soviet-Russia style, where you get a couple of questions that you have to answer orally in front of the whole class, and that's your grade. Not fun…
For exams you’d need a proctored environment of some sort, say a row of conference booths so students can’t just bring notes.
You’d want to have some system for ephemeral recording so the teachers can do a risk-based audit and sample some %, eg one-two questions from each student.
Honestly for regular weekly assignments you might not even need the heavyweight proctoring and could maybe allow notes, since you can tell if someone knows what they are talking about in conversation , it’s impossible to crib-sheet your way to fluent conversational understanding.
In this particular resolution example, it would be quicker to ask the student some probing questions versus have them re-write (and potentially regurgitate) an essay.
You can’t keep hiding behind being an introvert your whole life.
2. Speaking about your work in front of 1-2-5 people is one thing, but being tested in front of an entire class (30 people?) is a totally different thing.
One of the funniest things was being accused of plagiarising Wikipedia, when I'd actually written most of the Wikipedia article on said subject. The irony... Wikipedia doesn't just use unpaid labour, it ends up undermining the people who wrote it.
Surely it would be relatively easy to offer to show the edit history to prove that you actually contributed to the article? And, by doing so, would flip the situation in your favour by demonstrating your expertise?
The fact that you should have to is pretty annoying but also fairly edge case. And if a teacher or institute refuses to review that evidence then I don't think the credential on the table worth the paper it's printed on anyway.
I think AI has given me some brain rot: I stress about finishing stuff on time and can't bear to spend brain energy on it (and end up spending it anyway, because AI sucks).
To wit, show the teacher that YOU did the work and not someone else. If the teacher is not willing to do this with every student they accuse of malfeasance, they need to find another job. They're lazy as hell and suck at teaching.
Computer, show "my" work and explain to the teacher why "I" wrote what "I" did, describe why that particular approach to the narrative appealed to "me" and "I" chose that as the basis of "my" work. Produce an outline on which the paper could have been based and possible rough drafts, then explain how I could have revised the work to produce the final result.
If this is insufficient, then there are tools specifically for education contexts that track student writing process.
Detecting the whole essay being copied and pasted from an outside source is trivial. Detecting artificial typing patterns is a little more tricky, but also feasible. These methods dramatically increase the effort required to get away with having AI do the work for you, which diminishes the benefit of the shortcut and influences more students to do the work themselves. It also protects the honest students from false positives.
Keystroke dynamics can detect artificial typing patterns (copying another source by typing it out manually). If a student has to go way out of their way to make their behavior appear authentic then it's decreasing advantage of cheating and less students will do it.
If the student is integrating answers from multiple AI responses then maybe that's a good thing for them to be learning and the assessment should allow it.
Manually re-typing another source is something these tools were originally designed to detect. The original issue was "essay mills", not AI.
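As a rough illustration (this is not the implementation of collie.ink or any other real product), the two signals described above, large paste events and implausibly uniform typing rhythm, might be checked roughly like this:

```python
import statistics

# Rough illustration only, not any real product's implementation: flag sessions
# where large chunks arrive via paste, or where inter-key timing is too uniform
# to be plausible human typing. `events` is a hypothetical editor log of
# (kind, chars_added, timestamp_seconds) tuples.

def suspicious(events, max_paste_chars=200, min_timing_stdev=0.02):
    pasted = sum(n for kind, n, _ in events if kind == "paste")
    typed = [(n, t) for kind, n, t in events if kind == "keypress"]

    if pasted > max_paste_chars:
        return "large paste detected"

    gaps = [t2 - t1 for (_, t1), (_, t2) in zip(typed, typed[1:])]
    if len(gaps) > 50 and statistics.stdev(gaps) < min_timing_stdev:
        return "implausibly uniform typing rhythm"

    return None
```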
The best solutions are in student motivations and optimal pedagogical design. Students who want to learn, and learning systems that are optimized for rate of learning.
I guess you could use AI to guide this, at which point it's basically a research tool and grammar checker.
Crude tools (like Google docs revision history) can protect an honest student who engages in a typical editing process from false allegations, but it can also protect a dishonest student who fabricated the evidence, and fail to protect an honest student who didn't do any substantial editing.
More sophisticated tools can do a better job of untangling the fractal, but as with all fractal-shaped problems, the layers of complexity keep going and there are no perfect solutions, just tools that help in some situations when used by competent users.
The higher-ed professors who really care about academic integrity are rare, but they are layering many technical and logistical solutions to fight back against dishonest students.
https://news.ycombinator.com/item?id=14285116 ("Justice.exe: Bias in Algorithmic Sentencing (justiceexe.com)")
https://news.ycombinator.com/item?id=43649811 ("Louisiana prison board uses algorithms to determine eligibility for parole (propublica.org)")
https://news.ycombinator.com/item?id=11753805 ("Machine Bias (propublica.org)")
> language models are more likely to suggest that speakers of [African American English] be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death.
This one is just so extra insidious to me, because it can happen even when a well-meaning human has already "sanitized" overt references to race/ethnicity, because the model is just that good at learning (bad but real) signals in the source data.
We are already (in the US) living in a system of soft social-credit scores administered by ad-tech firms and non-profits. So “the algorithm says you’re guilty” has already been happening in less dramatic ways.
https://www.youtube.com/watch?v=XL2RLTmqG4w
The great thing about AI is that with a bit of imagination it can be used to amplify teachers too.
In this case, yes, you need to do a viva voce to convince the teacher (though I suspect they should be able to get fairly confident in 10-15 minutes).
But you could also have students convince an AI (probably in a proctored space?) if you need to scale this approach out.
Of course, there will be complaints from many students. However, as a prof for decades, I can say that some will prefer an exam-based solution. This includes the students who are working their way through university and don't have much time for busy-work, along with students who write their essays themselves and get lower grades than those who do not.
This reminds me of when GPS routing devices first came onto the scene. Lots of people drove right into a lake or ocean because the device said keep going straight. (because of poorly classified multi-modal routing data)
If it looks like AI cheating software will be a problem for my children (and currently it has not been an issue), then I'm considering recording them doing all of their homework.
I suspect school admin only has so much appetite for dealing with an irate parent demanding a real time review of 10 hours of video evidence showing no AI cheating.
In general, I don’t really understand educators hyperventilating about LLM use. If you can’t tell what your students are independently capable of and are merely asking them to spit back content at you, you’re not doing a good job.
It should be way easier than TSA's goal because you don't need to stop cheaters. You instead just need to ensure that you seed skills into a minimal number of achievers so that the rest of the kids see what the real target of education looks like. Kids try their best not to learn, but when the need kicks in they learn way better spontaneously from their peers than any other method.
Of course, this all assumes an effective pre-K reading program in the first place.
Often it is more work to cheat than just learn it.
Pre-k is preschool aka kindergarten?
Is this really needed? Reading is really stressful for kids under 5 or 6, and is there a big enough statistical difference in outcomes to justify robbing them of some of their early youth?
I started reading around 6 years old and I was probably ahead of the vast majority of kids within 6 months.
Kids starting around 6 years old have much better focus and also greatly enhanced mental abilities overall.
Once this becomes routine the class can become e.g. 10 minutes conversation on yesterday's topic, 40 minutes lecturing and live exercises again. Which is really just reinventing the "daily quiz" approach, but again the thing we are trying to optimize for is compliance.
I could see why he didn’t, so I wasn’t offended or defensive and started to tell him the steps required to build web apps and explained it in a manner he could understand using analogies. Towards the end of our conversation he could see I both knew about the topic and was enthusiastic about it. I think he was still a bit shocked that I wrote that paper, but he could tell from the way I talked about it that it was authentic.
It will be interesting to see how these situations evolve as AI gets even better. I suspect assessment will be more manual and in-person.
It turned out he ran it through a plagiarism detector and multiple lines of code were identical to lines in their database.
It was very silly because there's a lot of boilerplate code in win32 projects.
I mean, what is the problem? It's my report! I know all the ins and outs, I take full responsibility for it. I'm the one taking this to the board of directors who will grill me on all the details. I'm up for it. So why is this so "not done"? Why do you assume I let the AI do the "thinking"? I'm appalled by your lack of trust in me.
Perhaps it's an artifact of LLMs being trained on terabytes of autistic internet commenters like me. Maybe being detected as AI by Turnitin even has some diagnostic value.
If no, why not?
Personally I would rather read a human's output than their selection of machine outputs.
Nowadays, I often put my text into the LLM and say: make it more concise, include all the original points, don't be enthusiastic, use business-style writing. And then it comes back with some lines that make me think: Yes! That is what I meant!
I can't imagine you'd rather read my Dunglish. Sure, I could have "studied harder", but one is simply much more clever in one's native tongue: I know more words, more subtleties, etc. Over time, and I believe due to LLM use, I do get better at it myself! It's a language model after all, not a facts model. I can trust it to make nice sentences.
I understand the sentiment, even appreciate it, but there are books that draw you into a story when your eyes hit the paper, and there are books that don't and induce yawning instead (on the same topic). That is a skill issue.
Perhaps I should add that using the LLM does not make me faster in any way, maybe even slower. But it makes the end results so much more pleasant.
"If I Had More Time, I Would Have Written a Shorter Letter". Now I can, but in similar time.
Recently there was a non-native english speaker heavily using an LLM to review their answers on a Show HN post, and it was incredibly annoying. The author did not realize (because of their lack of skills in the language) but the AI-edited version felt fake and mechanical in tone. In that case yes, the broken original is better because it preserves the humanity of the original answers, mistakes and all.
You know maybe it is annoying for native speakers to pick up subtle AI signals, but for non-natives it can be annoying to find the correct words that express what you want to say as precisely as in your mother tongue. So don’t judge too much. It’s an attempt at better communication as well.
- Write it in google docs, and share the edit history in the google docs, it is date and time stamped.
- Make a video of writing it in the google docs tab.
If this is available, and sufficient, I would pursue a written apology to remind the future detectors.
Edit: clarity
She can't because she didn't write the essay herself, obviously.
The only thing preventing them from doing so is the fact that Google is too big to sell a "plagiarism assistant."
but if u talk like this boss i had, then obv ur a human, kthx
Great incentives. /s
I've seen assignments that were clearly graded by ChatGPT. The signs are obvious: suggestions that are unrelated to the topic or corrections for points the student actually included. But of course, you can't 100% prove it. It's creating a strange feedback loop: students use an LLM to write the essay, and teachers use an LLM to grade it. It ends up being just one LLM talking to another, with no human intelligence in the middle.
However, we can't just blame the teachers. This requires a systemic rethink, not just personal responsibility. Evaluating students based on this new technology requires time, probably much more time than teachers currently have. If we want teachers to move away from shortcuts and adapt to a new paradigm of grading, that effort needs to be compensated. Otherwise, teachers will inevitably use the same tools as the students to cope with the workload.
Education seemed slow to adapt to the internet and mobile phones, usually treating them as threats rather than tools. Given the current incentive structure and the lack of understanding of how LLMs work, I'm not optimistic this will be solved anytime soon.
I guess the advantage will be for those that know how to use LLMs to learn on their own instead of just as a shortcut. And teachers who can deliver real value beyond what an LLM can provide will (or should) be highly valued.
A one hour lecture where students (especially <20 year old kids) need to proactively interject if they don't understand something is a pretty terrible format.
> "Education seemed slow to adapt to the internet and mobile phones, usually treating them as threats rather than tools. Given the current incentive structure and the lack of understanding of how LLMs work"
Good point, it is less like a threat and more like... "how do we shoehorn this into our current processes without adapting them at all? Oh cool now the LLM generates and grades the worksheets for me!".
We might need to adjust to more long term projects, group projects, and move away from lectures. A teacher has 5*60=300 minutes a week with a class of ~26. If you broke the class into groups of 4 - 5 you could spend a significant amount of time with each group and really get a feel for the students beyond what grade the computer gives to their worksheet.
This was the plot to a recent South Park episode: https://m.imdb.com/title/tt27035146/
Is using AI to support grading such a bad idea? I think that there are probably ways to use it effectively to make grading more efficient and more fair. I'm sure some people are using good AI-supported grading workflows today, and their students are benefiting. But of course there are plenty of ways to get it wrong, and the fact that we're all pretending that it isn't happening is not facilitating the sharing of best practices.
Of course, contemplating the role of AI grading also requires facing the reality of human grading, which is often not pretty. Particularly the relationship between delay and utility in providing students with grading feedback. Rapid feedback enables learning and change, while once feedback is delayed too long, its utility falls to near zero. I suspect this curve actually goes to zero much more quickly than most people think. If AI can help educators get feedback returned to students more quickly, that may be a significant win, even if the feedback isn't quite as good. And reducing grading burden also opens up opportunities for students to directly respond to the critical feedback through resubmission, which is rare today on anything that is human-graded.
And of course, a lot of times university students get the worst of both worlds: feedback that is both unhelpful and delayed. I've been enrolling in English courses at my institution—which are free to me as a faculty member. I turned in a 4-page paper for the one I'm enrolled in now in mid-October. I received a few sentences of written feedback over a month later, and only two days before our next writing assignment was due. I feel lucky to have already learned how to write, somehow. And I hope that my fellow students in the course who are actual undergraduates are getting more useful feedback from the instructor. But in this case, AI would have provided better feedback, and much more quickly.
It's getting rid of cheap methods.
Scantrons and bluebooks were always a way that made it cheap for institutions to produce results. Now those methods kinda seem silly, right?
500 person freshman lectures seem kinda absurd now, right?
Teaching via adjuncts that had 3 days notice for the class and are paid nothing is kinda scammy, right?
R1 professors whose tenure evals have nothing to do with teaching is kinda wrong, right?
The Oxbridge model of 5-10 person classes with a proctor is what the education with AI is going to be about. It's small, intimate, and expensive.
https://www.npr.org/2025/01/08/nx-s1-5246200/demographic-cli...
PDF warning: https://www.cdc.gov/nchs/data/nvsr/nvsr73/nvsr73-02.pdf
Colleges will need to reduce class sizes, or close entirely, for the next decade at least. With smaller class sizes brings the opportunity for course instructors to provide more time per pupil so that things like in-person homework and project review is possible.
If you imagine students take 4 classes per semester and faculty teach 4 per semester… it seems stunningly feasible.
These "some" are founders of AI companies and investors who put a lot of money into such companies. Of course, the statements that these people "excrete" serve an agenda ...
Maybe all this social stuff that AI would bring into focus may prove a catalyst for radical change?
Unlikely, but one can dream!
I think that AI has the potential to weaken some aspects of education, but I agree with Karpathy here. In-class work, in-person defenses of work, verbal tests: these were cornerstones of education for thousands of years and have been cut out over the last 50 years or so, outside of a few niche cases (thesis defenses), and it might be a good thing if they come back.
On the other hand, about 6 months ago a neighbour asked me if he could do his one-month apprenticeship with me after finishing his 3rd year of CS high school (i.e. ~18 years old, 3 of 4 years of 'CS trade school'). I was totally gobsmacked by his lack of basic understanding of how computers work; I am confident he did not really know the difference between a file and a folder. But he was very confident in the AI slop he produced. I had a grand plan of giving him tasks that would show him the pitfalls of AI; no need for that, he blindly copied whatever AI gave him (he never figured out that Claude Code exists), even when the results were very visibly bad, even from afar. I tried explaining things to him to no avail. I know this is a sample size of 1, but damn, I did not expect it to be that bad.
I did nothing in high school and then by 19 for fun on Saturdays I was checking out 5 non-fiction books from the library and spending all Saturday reading.
There was no inspiring teacher or anything like that for me that caused this. At 16 I only cared about girls and maneuvering within the high school social order.
The only thing I can think of that would have changed things for me is if the math club were the cool kids and the football team were the outcasts.
At 16 anything intellectual seemed too remote to bother. That is why I would suspect the real variable is ultimately how much the parents care about grades. Mine did not care at all so there was no way my 16 year old self was going to become intrinsically motivated to grow intellectually.
All AI would have done for me in high school would have been swapping a language model for copying my friend's homework.
For background, I grew up in the US and my wife grew up in China. From how she grew up (in a top-tier Shanghai high school), she says that is kind of how it was. The top of the social order was basically the rich and politically connected (no different from anywhere else, I guess), but also the really good students: the best students are looked up to, and everyone asks you how you're doing in school all of the time. There are students who focus more on sports and go to sports schools, but unless they end up going to the Olympics or something, it's really looked down upon compared to specializing in STEM or more difficult subjects.
In my high school, honors/AP students weren't outcasts; we were kind of just a separate set, mostly our own clique, with some being popular and some not, independently of being AP students. I happened to be football team captain and in AP classes; 3 other captains weren't in AP. Academic success was just a non-factor.
Talking to students in order to gauge their understanding is not as easy or reliable as some people make it out to be.
There are many students who are basically useless when required to answer on the spot, some of whom are likely to score top-of-the-class given an assignment and time to work on it alone (in a proctored setting).
And then there are students who are very likable and swift to pick up on subtle signals the examiners might be giving off, constantly adjusting course accordingly.
Grading objectively is extremely hard when doing oral exams, especially when you're doing them back-to-back for an entire workday, which is quite likely to happen if most examination is to be done this way.
It ended up being harder than writing an ordinary paper but taught us all a ton about citation and originality. It was a really cool exercise.
I imagine something similar could be done to teach students to use AI as a research tool rather than as a plagiarization machine.
Contemporary alternative: Copy/paste an essay entirely from LLM output, but make sure none of the information contained checks out. One would want to use an older model for this. :-)
Following to see what they do in the future.
Well it would give you a similar final artifact but it wouldn't be doing the exercise at all.
The key issue with schools is that they crush your soul and turn you into a low-agency consumer of information within a strict hierarchy of mind-numbing rules, rather than helping you develop your curiosity hunter muscles to go out and explore. In an ideal world, we would have curated gardens of knowledge and information which the kids are encouraged to go out and explore. If they find some weird topic outside the garden that's of interest to them, figure out a way to integrate it.
I don't particularly blame the teachers for the failings of school though, since most of them have their hands tied by strict requirements from faceless bureaucrats.
Doing derivatives, learning the periodic table, basic language and alphabet skills, playing an instrument are foundational skills that will require deliberate practice to learn, something that isn't typically part of project based learning. At some point in education with most fields, you will have to move beyond concepts and do some rote memorization and repetition of principles in order to get to higher level concepts. You can't gamify your way out of education, despite our best attempts to do so.
If it's something I need to do regularly, I eventually learn it through natural repetition while working towards the high level goal I was actually trying to achieve. Derivatives were like this for me. I still don't fully know the periodic table though, because it doesn't really come up in my life; if it's not something I need to do regularly, I just don't learn it.
My guess is this doesn't work for everything (or for everyone), and it probably depends on the learning curve you experience. If there are cliff edges in the curve that are not aligned with useful or enjoyable output, dedicated practice of some sort is probably needed to overcome them, which may take the form of rote learning, or, maybe better, spaced repetition or quizzing or similar. However at least for me, I've not encountered anything like that.
If I was to speculate why rote learning doesn't work well for me, I don't seem to experience a feeling of reward during it, and it seems like my ability to learn is really heavily tied somehow to that feeling. I learn far more quickly if it's a problem I've been struggling with for a while that I solve, or it's a problem I really wanted to solve, as the reward feeling is much higher.
Not something everyone learns. My kids seemed to enjoy it. My older daughter learned quite a lot of algebra etc. by doing physics.
> learning the periodic table
You do not need to rote learn all of it, and you remember enough by learning about particular elements etc.
> basic language and alphabet skills
My kids learned to read through firstly reading with me (or others), so enjoying the story and learning words as we went, and guessing words on flashcards. Then they moved on to reading on their own because they liked it.
Admittedly none of the above was in school, but my point is that it's not intrinsic to learning.
> At some point in education with most fields, you will have to move beyond concepts and do some rote memorization and repetition of principles in order to get to higher level concepts.
Not a great deal and it does not feel like as much of a grind if you enjoy the subject and know where you are going.
You know what kick started my kid's ability to read? A reading teacher sitting with her every single day and teaching her explicitly the drudgery of what reading was. And then me doing the same at home.
Rote is for kids like this and a lot of kids have areas like this. No my kid doesn't need as much math facts practice as she gets. But her cousin? That kid isn't learning anything without doing lines about how to add.
> A reading teacher sitting with her every single day and teaching her explicitly the drudgery of what reading was
Individual attention makes a huge difference.
I don't know if we'll ever be successful, but the entire point of gamification is to make the rote parts more palatable. A lot of gamification techniques try to model after MMO gaming for a reason, as that's a genre where people willingly subject themselves to a lot of rote tasks.
In software engineering we often come across build environments that make code iteration really difficult and slow, and speeding up that iteration cycle usually results in being able to experiment more and ship faster changes.
AI has potential to smooth out all curves so that students can learn faster and maximize time in flow.
I've spent literally thousands of hours thinking about this (and working on it). The future of education will be as different from today as today is to 300 years ago.
Kids used to get smacked with a stick if they spelled a word wrong.
People thought the threat of physical violence was a good way to teach. We have learned better. What else is there for us to learn? What have we already learned but just don't have the resources to apply?
I've met many educators who have told me stories of ambitious learning goals for students that didn't work out because there wasn't the time or the resources to facilitate them properly.
Often instructors are stuck trading off between inauthentic assessments that have scalable evaluation methods or authentic exercises that aren't feasible to evaluate at scale and so evaluation is sparse, incomplete or students only receive credit for completion.
I soon changed my mind; I think those of us who become experts often have really rich memories of a project where we learnt so much, but we just don't remember episodically all the accumulated learning that happened in boring classrooms to enable the project-induced higher-order synthesis.
Doing rather than memorizing outdated facts in a textbook.
All of schooling breaks down to costs and society’s willingness and desire to invest in child nutrition, education, and training.
We simply do not even have the wherewithal to have the conversation about it, without getting blackholed by cultural minefields and assumptions of child rearing, parental responsibility, morality and religion.
The problem is that the structure pushes for teaching productivity which basically directly opposes good pedagogy at this point in the optimization.
Some specifics:
1. Multiple choice sucks. It's obvious that written response better evaluates students and oral is even better. But multiple choice is graded instantly by a computer. Written response needs TAs. Oral is such a time sink and needs so many TAs and lots of space if you want to run them in parallel.
1.5 Similarly, having students do things on computers is nice because you don't have to print things, and even errors in the question can be fixed live; you can just ask students to refresh the page. But if the chatbots let them cheat too easily on computers, doing handwritten assessments sucks because you have to go arrange for printing and scanning.
2. Designing labs is a clear LLM tradeoff. Autograded labs with testbenches and fill-in-the-middle-style completions or API completions are incredibly easy to grade. You just pull the commit before some specific deadline and run some scripts (a minimal sketch of this flow follows the list below).
You can do 200 students in the background while doing other work, it's so easy. But the problem is that LLMs are so good at fill-in-the-middle and at making testbenches pass.
I've actually tried some more open-ended labs before, and it's actually very impressive how creative students are. They are obviously not LLMs; there is a diversity in thought and simplicity of code that you do not get with ChatGPT.
But it is ridiculously time consuming to pull people's code and try to run the open-ended testbenches that they have created.
3. Having students do class presentations is great for evaluating them. But you can only do like 6 or 7 presentations in a 1 hr block. You will need to spend like a week even in a relatively small class.
4. What I will say LLMs are fun for is having students do open-ended projects faster, with faster iterations. You can scope-creep them if you expect them to use AI coding.
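For point 2, here is a minimal sketch of the "last commit before the deadline, then run the testbench" flow; the repo layout, deadline, and pytest-based test command are all hypothetical:

```python
import pathlib
import subprocess

# Minimal sketch of "pull the commit before the deadline and run the testbench".
# The submissions/ layout, deadline, and pytest-based testbench are hypothetical.
DEADLINE = "2024-04-12 23:59"

def grade_repo(repo: pathlib.Path) -> str:
    def git(*args: str) -> str:
        return subprocess.run(["git", "-C", str(repo), *args],
                              capture_output=True, text=True, check=True).stdout.strip()

    sha = git("rev-list", "-1", f"--before={DEADLINE}", "HEAD")  # last commit before the deadline
    git("checkout", sha)
    result = subprocess.run(["pytest", "-q", str(repo / "tests")],
                            capture_output=True, text=True)
    status = "PASS" if result.returncode == 0 else "FAIL"
    return f"{repo.name}: commit {sha[:8]} -> {status}"

if __name__ == "__main__":
    for repo in sorted(pathlib.Path("submissions").iterdir()):
        print(grade_repo(repo))
```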
Since the testing tool they use does notice and register 'paste' events, they've resorted to simply assigning 0 points to every answer that was pasted.
A few of us have been telling her to move to in-class testing etc. but like you also notice everything in the school organization pushes for teaching productivity so this does require convincing management / school board etc. which is a slow(er) process.
Can AI not grade written responses?
I was using local LLMs in the 4B to 14B range: I tried Phi, Gemma, Qwen, and Llama. The idea was to prompt the LLM with the question, the answer key/rubric, and the student answer. Putting the student answer at the end allowed some prompt caching, which made it much faster.
It was okay but not good, there were a lot of things I tried:
* Endlessly messing with the prompt.
* A few examples of grading.
* Messing with the rubric to give more specific instructions.
* Average of K.
* Think step by step, then give a grade.
It was janky, and I'll chalk it up to local LLMs at the time being somewhat too stupid for this to be reasonable. They basically didn't follow the rubric very well. Qwen in particular was very strict, giving zeros regardless of the part marks described in the answer key, as I recall.
I'm sure with the correct type of question and correct prompt and a good GPU it could work but it wasn't as trivially easy as I had thought at the time.
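For anyone curious, here is a minimal sketch of the grading flow described above, assuming a local server that exposes an OpenAI-compatible /v1/chat/completions endpoint (e.g. Ollama or llama.cpp); the URL, model name, and rubric handling are placeholders:

```python
import requests

# Minimal sketch of the grading flow described above, assuming a local server
# that exposes an OpenAI-compatible /v1/chat/completions endpoint (e.g. Ollama
# or llama.cpp). The URL, model name, and rubric text are placeholders.
URL = "http://localhost:11434/v1/chat/completions"

SYSTEM = ("You are a strict but fair grader. Apply the rubric exactly, "
          "including part marks. Reply with a score out of 10 and one "
          "sentence of justification.")

def grade(question: str, rubric: str, student_answer: str) -> str:
    # Shared context (question + rubric) first, student answer last, so the
    # common prefix can be cached across students.
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"Question:\n{question}\n\n"
                                    f"Rubric:\n{rubric}\n\n"
                                    f"Student answer:\n{student_answer}"},
    ]
    resp = requests.post(URL, json={"model": "qwen2.5:7b",
                                    "messages": messages,
                                    "temperature": 0},
                         timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```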
If my son should grow up to run into the same kinds of cognitive limitations, I really don't know what I will tell him and do about it. I just wish there was a university in a Faraday cage somewhere where I could send him, so that he can have the same opportunities I had.
Fun fact on the side: Cambridge (UK) getting a railway station was a hugely controversial event at the time. The corrupting influence of London being only a short journey away was a major put-off.
I see collapsing under pressure to be either a kind of anxiety or a fixation on perfect outcomes. Teaching a tolerance for some kinds of failure is the fix for both.
Take away the internet. Except in a research/library scenario. Give them a limited time to complete tasks. This would promote a stronger work ethic, memory/recall and more realistic to time management skills. They need to learn to rely on themselves, not technology. The only effective way is to remove tech from the equation, otherwise the temptation to cheat to compete/complete is too strong.
Then allow students who want their homework evaluated for feedback to turn it in, but no homework will be graded.
This relegates the use of AI to personal choice of learning style and any misuse of AI is only hurting the student.
I'm a teacher. Kids don't have the capacity to make this choice without guidance. There are so so many that don't (can't?) make the link between what we teach and how they grow as learners. And this is at a rich school with well-off parents who largely value education.
Rather: don't grade homework. Treat homework as the preparation that, if you did it seriously, will prepare you for the test (and if you didn't do it seriously, you won't have the skills necessary to pass the test).
Not sure why they don't just do that? It worked fine and would be compatible with LLM use.
A much bigger question is what to teach assuming we get models much more powerful than those we have today. I'm still confident there's an irreducible hard core in most subjects that's well worth knowing/training, but it might take some soul searching.
This topic has been an interesting part of the discourse in a group of friends over the past few weeks, because one of us is a teacher who has to deal with this on an almost daily basis. She is struggling to get her students not to cheat, and the options available to her are limited (yes, physical monitoring would probably work, but it requires concessions from school management etc.; it's not something with an easy or quick fix).
[0] https://oxide-and-friends.transistor.fm/episodes/ai-in-highe...
Schools need to become tech free zones. Education needs to reorient around more frequent standardized tests. Any "tech" involved needs to be exclusively applied towards solving the supply and demand issue - the number of "quality teachers" to "students per classroom."
I admire Karpathy for advocating common sense, but none of this will happen because SV is full of IQ realists who only see "education" as a business opportunity and the bureaucratic process is too dysfunctional for common sense decisions to prevail. The future is chrome books with GPT browsers for every student.
She started grading the conversations that the students have with LLMs.
From the question that the students ask, it is obvious who knows the material and who is struggling.
We do have a custom setup in which she creates a homework assignment. There is a custom prompt to stop the LLM from answering the homework question outright, but that's pretty much it.
The results seem promising, with students spending 30 minutes or so going back and forth with the LLMs.
If any educator wants to try it or is interested in more information, let me know and we can see how we can collaborate.
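A minimal sketch of what such a setup could look like, purely illustrative: the guard prompt, model name, and transcript layout below are assumptions, not the poster's actual system.

```python
# Illustrative sketch: a guarded homework chat whose transcript is saved for
# grading. Guard prompt, model, and file layout are all assumed.
import json
from openai import OpenAI

client = OpenAI()  # or a local OpenAI-compatible endpoint

GUARD_PROMPT = (
    "You are a tutor for this homework assignment. Never state the final "
    "answer or write the solution. Answer conceptual questions, point to "
    "relevant ideas, and ask guiding questions instead."
)

def tutor_turn(history: list[dict], student_message: str) -> str:
    """One back-and-forth turn; the full history is kept for grading."""
    history.append({"role": "user", "content": student_message})
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{"role": "system", "content": GUARD_PROMPT}] + history,
    )
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

def save_transcript(student_id: str, history: list[dict]) -> None:
    # The transcript is what actually gets graded: the questions a student
    # asks make it fairly obvious who knows the material and who is struggling.
    with open(f"transcripts/{student_id}.json", "w") as f:
        json.dump(history, f, indent=2)
```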
That is just such a wildly cynical point of view, and it is incredibly depressing. There is a whole huge cohort of kids out there who genuinely want to learn and want to do the work, and feel like using AI is cheating. These are the kids who, ironically, AI will help the most, because they're the ones who will understand the fundamentals being taught in K-12.
I would hope that any "solution" to the growing use of AI-as-a-crutch can take this cohort of kids into consideration, so their development isn't held back just to stop the less-ethical student from, well, being less ethical.
There was a Reddit thread recently that asked whether all students are really doing worse, and it basically said that there are still top performers performing toply, but that the middle has been hollowed out.
So I think, I dunno, maybe depressing. Maybe cynical, but probably true. Why shy away from the truth?
And by the way, I would be both. Probably would have used AI to further my curiosity and to cheat. I hated school, would totally cheat to get ahead, and am now wildly curious and ambitious in the real world. Maybe this makes me a bad person, but I don't find cheating in school to be all that unethical. I'm paying for it, who cares how I do it.
People aren't one thing.
Well, it seems the vast majority doesn't care about cheating, and is using AI for everything. And this is from primary school to university.
It's not just that AI makes it simpler, so many pupils cannot concentrate anymore. Tiktok and others have fried their mind. So AI is a quick way out for them. Back to their addiction.
Whatever solution we implement in response to AI, it must avoid hurting the students who genuinely want to learn and do honest work. Treating AI detection tools as infallible oracles is a terrible idea because of the staggering number of false reports. The solution many people have proposed in this thread, short one-on-one sessions with the instructor, seems like a great way to check if students can engage with and defend the work they turned in.
There’s a reason this stuff is banned in China. Their pupils suffer no such opiate.
School is packed with inefficiency and busywork that is completely divorced from the way people learn on their own. In fact, it's pretty safe to say you could learn something about 10x faster by typing it into an AI chatbot and having it tailor the experience to you.
> focused upon memorization and regurgitation
This is what is easy to test in-class.
Teachers worry about AI because they do not just care about memorization. Before AI, being able to write cohesive essays about a subject was a good proxy for understanding beyond simple memorization. Now that's gone.
A lazy, irresponsible teacher who only cares about memorization will just grade students via in-class multiple-choice tests exclusively and call it a day. They don't need to worry about AI at all.
Take-homes were never a good proxy for anything because any student can pay for private "lessons" and get their homework done for them.
> A lazy, irresponsible teacher who only cares about memorization will just grade students via in-class multiple-choice tests exclusively and call it a day. They don't need to worry about AI at all.
What stops a diligent responsible teacher from doing in-class essays?
Who do you think will "learn" archery quicker? The kid writing an essay about it or the kid shooting a bow?
> Who do you think will "learn" archery quicker? The kid writing an essay about it or the kid shooting a bow?
The kid who imitates good archer's posture and motion.
It seems like AI will destroy education, but it's only breaking the old education system; it will also enable a new and much better one, where students make faster progress developing more relevant and valuable skills.
The education system uses multiple-choice quizzes and tests because their grading can be automated.
But when evaluation of any exercise can be automated with AI, such that students can practice any skill with iterative feedback at the pace of their own development, so much human potential will be unlocked.
It's the softer kind of education, with no memorizing, no tests, just assignments you can hand in at any time because there are no deadlines and grades don't matter, that is particularly useless with AI.
This applies both to education and to what people need to know to do work. Knowing all the written stuff is less valuable. Automated tools have been able to look it up since the Google era. Now they can work with what they look up.
There was a time when programmers pored over Fundamental Algorithms. No one does that today. When needed, you find existing code that does that stuff. Probably better than you could write. Who codes a hash table today?
But the foundations start with memorisation.
No, lots of classes are focused on producing papers which aren't just memorization and regurgitation, but generative AI is king at... Generating text... So that class of work output is suspect now
Also, just like how calculators are allowed in exam halls, why not allow AI usage in exams? In a real-life job you are not going to avoid using a calculator or AI. So why test people in a different context? I think the tests should focus on the skills of using a calculator and AI.
A calculator can be used to do things you know how to do _faster_, imho, but in most jobs it still requires you to at least somewhat understand what is happening under the hood. The same principle applies to using LLMs at work. You can use them to do stuff you know how to do faster, but if you don't understand the material there's no way you can evaluate the LLM's answer, and you will be at fault when there's AI slop in your output.
eta: Maybe it would be possible to design labs with LLMs in such a way that you teach students how to evaluate the LLM's answer? This would require them to have knowledge of the underlying topic. That's probably possible with specialized tools / LLM prompts, but it is not going to help against them using a generic LLM like ChatGPT or a cheating tool that feeds into a generic model.
What you are describing is that they should use the LLM only after they know the topic. A dilemma.
I think you should be able to use the LLM at home to help you better understand the topic (they have endless patience, and you can usually keep asking until you actually grok it), but during the test I think it's fair to expect that basic understanding to be there.
[0] https://news.ycombinator.com/item?id=46043012
Dig deeper into this. When are calculators allowed, and when are they not? If it is kids learning to do basic operations, do we really allow them to use calculators? I doubt it, and I suspect that places that do end up with students who struggle with more advanced math because they offloaded the thinking already.
By contrast, giving a calculus student a 4-function calculator is pretty standard, because the type of math it can do isn't what is being tested, and having a student plug 12 into x^3 - 4x^2 + 12 very quickly instead of having to work it out doesn't impact their learning. On the other hand, more advanced calculators are often not allowed when they trivialize the content.
LLMs are much more powerful than a calculator, so finding where in education it doesn't trivialize the learning process is pretty difficult. Maybe at grad level or research, but anything grade school it is as bad as letting a kid learning their times tables use a calculator.
Now, if we could create custom LLMs that are targeted at certain learning levels? That would be pretty nice. A lot more work. Imagine a Chemistry LLM that can answer questions, but know the homework well enough to avoid solving problems for students. Instead, it can tell them what chapter of their textbook to go read, or it can help them when they are having a deep dive beyond the level of material and give them answers to the sorts of problems they aren't expected to solve. The difficulty is that current LLMs aren't this selective and are instead too helpful, immediately answering all problems (even the ones they can't).
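A rough sketch of that "knows the homework but won't solve it" idea, under heavy assumptions: the problem list, chapter mapping, and reliance on a system prompt are invented for illustration, and detecting "thin rephrasings" is left to the model here, which is exactly the unreliable part today.

```python
# Rough sketch of a homework-aware tutor that redirects instead of solving.
# Course content, problem list, and chapter mapping are invented examples.
from openai import OpenAI

client = OpenAI()  # or any OpenAI-compatible endpoint

HOMEWORK = {
    "Balance the combustion of propane": "Chapter 4: Stoichiometry",
    "Compute the pH of 0.01 M HCl": "Chapter 7: Acids and Bases",
}

SYSTEM_PROMPT = (
    "You are a tutor for a first-year chemistry course. The current homework "
    "problems are listed below. If the student asks one of them (or a close "
    "rephrasing), do NOT solve it; point them to the listed chapter and offer "
    "to discuss the underlying concept instead. Questions clearly beyond the "
    "course level may be answered in full.\n\n"
    + "\n".join(f"- {q} -> {chapter}" for q, chapter in HOMEWORK.items())
)

def ask_tutor(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```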
Um. yea. This is the first time a non-deterministic technology has achieved mass adoption for everyday use. Despite repeated warnings (which are not even close to the tenor of warnings they should broadcast), folks don't understand that AI will likely hallucinate some or all of their answer.
A calculator will not, and even the closest thing to buggy behavior for a calculator (exploring the fringes of floating-point numbers, for example) is light years away from the hallucination of generative AI on general, everyday questions.
The mass exuberance over generative AI has been clouding folks from the very real effects of over-adoption of AI, and we aren't going to see the full impact of that for some time. When we do, folks are going to ask questions like "how were we so dumb?" And of course the answer will be "no one saw this coming."
My spouse is an educator with nearly 20 years in the industry, and even her school has adopted AI. It’s shocking how quickly it has taken hold, even in otherwise lagging adoption segments. Her school finally went “1-1” with devices in 2020, just prior to COVID.
I'm not minimizing Karpathy in any way, but this is obviously the right way to do this.
Learning how to prepare for in-class tests and writing exercises is a very particular skillset which I haven't really exercised a lot since I graduated.
Never mind teaching the humanities, for which I think this is a genuine crisis, in class programming exams are basically the same thing as leetcode job interviews, and we all know what a bad proxy those are for "real" development work.
Confusing university learning for "real industry work" is a mistake and we've known it's a mistake for a while. We can have classes which teach what life in industry is like, but assuming that the role of university is to teach people how to fit directly into industry is mistaking the purpose of university and K-12 education as a whole.
Writing long-form prose and essays isn't something I've done in a long time, but I wouldn't say it was wasted effort. Long-form prose forces you to do things that you don't always do when writing emails and powerpoints, and I rely on those skills every day.
Preparing for a test requires understanding what the instructor wants. Concentrate on the wrong thing and you get marked down.
Same applies to working in a corporation. You need to understand what management wants. It’s a core requirement.
Effect of AI applied to coding is precisely the opposite though?
AI code review has unquestionably increased the quality of my code by helping me find bugs before they make it to production.
AI coding tools give me speed to try out more options to land on a better solution. For example, I wrote a proxy, figured out problems with that approach, and so wrote a service that could accomplish the same thing instead. Being able to get more contact with reality, and seeing how solutions actually work before committing to them, gives you a lot of information to make better decisions.
But then you still need good practices like code review, maintaining coding standards, and good project management to really keep code quality high. AI doesn’t really change that.
AI helps people who "write" (i.e. generate) low-quality code more than people who write high-quality code. This means AI will lead to a larger percentage of new code being low-quality.
This will _never_ happen. Output will increase and quality will decrease.
I'm curious if we instead gave students an AI tool, but one that would intentionally throw in wrong things that the student had to catch. Instead of the student using LLMs, they would have one paid for by the school.
This is more brainstorming than a well-thought-out idea, but I generally think "opposing AI" is doomed to fail. If we follow a Montessori approach, kids are naturally inclined to want to learn things; if students are trying to lie/cheat, we've already failed them by turning off their natural curiosity for something else.
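At the brainstorm level, a school-provided assistant like this could be prototyped as below. Everything here (model, prompt, the "ERRORS:" convention) is made up for illustration, not a real tool.

```python
# Hedged sketch of an assistant that plants a known number of errors and keeps
# an answer key for the teacher. Model name, prompt, and the "ERRORS:" marker
# are invented for illustration.
from openai import OpenAI

client = OpenAI()

def answer_with_planted_errors(question: str, n_errors: int = 2) -> dict:
    prompt = (
        f"Answer the following question for a student, but deliberately "
        f"include exactly {n_errors} plausible factual errors.\n"
        f"After the answer, write a line 'ERRORS:' followed by a numbered "
        f"list of the errors you planted.\n\nQuestion: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp.choices[0].message.content
    # Split the visible answer from the hidden key; only the teacher sees the
    # key and checks whether the student caught the planted errors.
    visible, _, hidden = text.partition("ERRORS:")
    return {
        "shown_to_student": visible.strip(),
        "teacher_answer_key": hidden.strip(),
    }
```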
AIs _do_ currently throw in the occasional wrong thing. Sometimes a lot. A student's job needs to be verifying and fact-checking the information the AI is telling them.
The student's job becomes asking the right questions and verifying the results.
So, in learning environments we might have no option but to open the floodgates to AI use, while abandoning most testing techniques that are not, more or less, pen and paper, in person. Use AI as much as you want, but know that as a student you'll be answering tests armed only with your brain.
I do pity English teachers, who have relied on essays to grade proficiency for hundreds of years. STEM fields have an easier way through this.
Andrej and Garry Trudeau are in agreement that "blue book exams" (i.e. the teacher gives you a blank exam booklet, traditionally blue, to fill out in person for the test, after confiscating devices) are the only way to assess students anymore.
My 7 year old hasn't figured out how to use any LLMs yet, but I'm sure the day will come very soon. I hope his school district is prepared. They recently instituted a district-wide "no phones" policy, which is a good first step.
I guess high schools and junior highs will have to adopt something similar, too. Better condition those wrists and fingers, kids :-)
I'd be much more in favour of oral examinations. Yes, they're more resource-intensive than grading written booklets, but it's not infeasible. Separately, I also hope it might go some way to lessening the attitude of "teaching to the test".
It's a shame that some students will again be limited by how fast they can get their thoughts down on a piece of paper. This is such an artificial limitation and totally irrelevant to real world work now.
All for a calculator that can lie.
This sounds as if you expect that it will become possible to access an LLM in class without a phone or other similar device. (Of course, using a laptop would be easily noticed.)
1. Corporate interests want to sell product.
2. Administrators want a product they can use.
3. Compliance people want a checkbox they can check.
4. Teachers want to be able to continue what they have been doing thus far within the existing ecosystem.
5. Parents either don't know, don't care, or do care but are unable to provide a viable alternative, or can and do provide one.
We have had this conversation ( although without AI component ) before. None of it is really secret. The question is really what is the actual goal. Right now, in US, education is mostly in name only -- unless you are involved ( which already means you are taking steps to correct it ) or are in the right zip code ( which is not a guarantee, but it makes your kids odds better ).
This assumes we even need more Terence Taos by the time these kids are old enough. AI has gone from being completely useless to solving challenging math problems in less than 5 years. That trajectory doesn't give me much hope that education will matter at all in a few years.
So it is feasible (in principle) to give every student a different exam!
You’d use AI to generate lots of unique exams for your material, then ensure they’re all exactly the same difficulty (or extremely extremely close) by asking an LLM to reject any that are relatively too hard or too easy. Once you have generated enough individual exams, assign them to your students in your no-AI setting.
Code that the AI writes would be used to grade them.
- AI is great at some things.
- Code is great at other things.
- AI is bad at some things code is great for.
- AI is great at coding.
Therefore, leverage AI to quickly code up deterministic and fast tools for the tasks where code is best.
And to help exams be markable by code, it makes sense to be smart about exam structure, e.g. only ask questions with binary answers or multiple choice, so you don't need subjective judgment of correctness.
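Putting those pieces together, the pipeline could look roughly like this. It's a sketch under stated assumptions: the model name, the JSON exam format, and the yes/no difficulty judge are illustrative choices, not a tested workflow.

```python
# Sketch of the per-student exam pipeline: an LLM drafts variants, an LLM
# judge filters out variants of noticeably different difficulty, and the
# multiple-choice grading is done by plain deterministic code.
import json
from openai import OpenAI

client = OpenAI()

def draft_variant(topic: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[{"role": "user", "content":
                   f"Write a 5-question multiple-choice exam on {topic}. "
                   "Return only a JSON object with a 'questions' list of "
                   "{question, options, answer} entries."}],
    )
    # Assumes the model returns well-formed JSON; a real pipeline would validate.
    return json.loads(resp.choices[0].message.content)

def similar_difficulty(variant: dict, reference: dict) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Are exams A and B of roughly equal difficulty? "
                   "Reply YES or NO only.\n\nA:\n" + json.dumps(reference)
                   + "\n\nB:\n" + json.dumps(variant)}],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

def grade(variant: dict, student_answers: list[str]) -> int:
    # Deterministic: once the answer key exists, no model is involved in grading.
    key = [q["answer"] for q in variant["questions"]]
    return sum(given == correct for given, correct in zip(student_answers, key))
```

The only subjective step left is the difficulty filter; everything a student's grade depends on is handled by ordinary code.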
One idea: Have students generate videos with their best "ELI5" explanations for things, or demos/teaching tools. Make the conciseness and clarity of the video and the quality/originality of the teaching tools the grading criteria. Make the videos public, so classmates can compare themselves with their peers.
Students will be forced to learn the material and memorize it to make a good video. They'll be forced to understand it to create really good teaching tools. The public aspect will cause students to work harder not to feel foolish in front of their peers.
The beauty of this is that most kids these days want to be influencers, so they're likely to invest time into the assignment even if they're not interested in the subject.
You must have a pretty broad definition of useless / toxic if you think that reading, writing and basic math, but also geometry, calculus, linear algebra, probability theory, foreign languages, a broad overview of history, and basic competency in physics / electronics fall under these categories.
Sure, I learned a lot in school that turned out to be pretty useless for me (chemistry, basically anything I learned in PE, french), but I did not know that at the time and I am still grateful that I was being exposed to these topics. Some of my classmates developed successful careers from these early exposures.
Zero homework grades will be ideal. Looking forward to this.
1. Assume the printing press exists.
2. Now there's no need for a teacher to stand up and deliver information by talking to a class for 60 minutes.
3. Therefore students can read at home (or watch prepared videos) and test their learning in class, where there are experts to support them.
4. Given we only need one copy of the book/video/interactive demo, we can spend wayyyyy more money making it the best it can possibly be.
What's sad is it's 500 years later and education has barely changed
From my extensive experience of four years of undergrad, the problem in your plan is "3. Therefore students can read at home " - half the class won't do the reading, and the half that did won't get what it means until they go to lecture[1].
[1] If the lecturer is any good at all. If he spends most of his time ranting about his ex-wife...
Granted, this was much less the case in grade school - but if students are going to see homework for the first time in college, I can see problems coming up.
If you got rid of homework throughout all of the "standard" education path (grade school + undergrad), I would bet a lot of money that I'd be much dumber for it.
If the concept is too foreign for them, I'm sure we could figure out how to replicate the grade school environment. Give them their 15 hours/week of lecture, and then lock them in a classroom for the 30 hours they should spend on homework.
IS NO ONE GOING TO POINT OUT MULTIPLE OF THOSE DOODLES ARE WRONG???
My wife is a teacher. Her school did this a long time ago, long before AI. But they also gave every kid a laptop and forced the teachers to move all tests/assignments to online applications, with the curriculum picked out by the administrators (read as: some salesperson talked them into it). Even with assignments done in class, it's almost impossible to catch kids using AI when they're all on laptops all the time and she can't teach and monitor them all at the same time.
Bring back pencil and paper. Bring back calculators. Internet connected devices do not belong in the classroom.
It's simply too complex to fix. I think we'll see increased investment from companies that do keep hiring in remediating the gaps in their workforce.
Most elite institutions will probably increase the effort they spend on interviewing, including work trials. I think we're already seeing this, with many of the elite institutions talking about judgment, emotional intelligence, and critical thinking as more important skills.
My worry is that hiring turns into a test of likeability rather than meritocracy (everyone is a personality hire when cognition is done by the machines)
Source: I'm trying to build a startup (Socratify), a bridge for upskilling early-stage professionals from a flawed education system into the workforce.
The calculator analogy is extremely inaccurate, though understandably people keep making the comparison. The premise is that the calculator didn't take bookkeepers' jobs, but instead helped them.
First of all, a calculator does one job and does it very well; you never question it because it solely works with numbers. But AI wants to be everything: calculator, translator, knowledge base, etc. And it's very confident about everything all the time, until you start to question it, and even then it continues to lie. Sadly, current AI products' purpose isn't to give you an accurate answer; it's to make you believe it's giving you credible information.
More importantly, calculators are not connected to the internet, and they are not capable of building a profile of an individual.
It's sad to see big players push this agenda to make people believe that they don't need to think anymore, AI will do everything for them.
Also, all of these AI threats to public education can be mitigated if we just step one or two decades back and go the pen-and-paper way. I have yet to see any convincing argument that digital/screen-based teaching methods are superior in any way to traditional ones; on the contrary, I have seen thousands of arguments against them.
It may not be obvious in a country with smaller student to teacher ratios, but for a place like India, you never have enough teachers for students.
Being able to provide courses and homework digitally reduced the amount of work required to grade and review.
Then to add insult to injury, AI is removing entry level roles, removing other chances for people to do work which is easy to verify, practice and learn from.
Yes, yes, eventually tool use will result in increases in GDP. Except our incentives are not to hire more teachers, build more schools, and improve educational outcomes. Those are all public goods, not private goods. We aren’t going to tax firms further, because commerce must be protected, yet we will socialize the costs to society.
GPT 5.1 Pro made the same mistake ("Face the legs away from the door.") Claude Sonnet 4.5 agreed but added "Note: Most toaster ovens max out around 10-12 pounds for a whole turkey."
Gemini 3 acknowledged that toaster ovens are usually very compact and that the legs shouldn't be positioned where they will touch the glass door. When challenged, it hand-waved something to the effect of "Well, some toaster ovens are large countertop convection units that can hold up to a 12-pound turkey." When asked for a brand and model number of such an oven, it backtracked and admitted that no toaster oven would be large enough.
Changing the prompt to explicitly specify a 12-pound turkey yielded good answers ("A 12-pound turkey won't fit in a toaster oven - most max out at 4-6 pounds for poultry. Attempting this would be a fire hazard and result in dangerously uneven cooking," from Sonnet.)
So, progress, but not enough.
I don't yet know how we get AI to teach unruly kids, or kids with neurodivergencies. Perhaps, though, the AI can eventually be vastly superior to an adult because of the methods it can use to get through to the child, keep the child interested and how it presents the teaching in a much more interactive way.