The idea was simple: a teacher uploads a syllabus or notes, and the agent generates a test paper from that material. The hard requirement was reliability: no invented questions, no drifting outside the syllabus.
Instead of trying to “fix” hallucinations with better prompts, I constrained the agent’s job very narrowly.
I defined:
a fixed knowledge base (only the uploaded syllabus)
explicit tools the agent was allowed to use
a structured output format for the test paper
a hardness distribution (for example 30% easy, 50% medium, 20% hard); a rough sketch of how these constraints get checked follows this list
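To make that concrete, here is a minimal Python sketch of the kind of output schema and validation check I mean. The names, schema, and tolerance are mine for illustration, not GTWY.ai's actual API:

```python
from dataclasses import dataclass

# Target hardness distribution the generated paper has to respect.
ALLOWED_DIFFICULTY = {"easy": 0.30, "medium": 0.50, "hard": 0.20}

@dataclass
class Question:
    text: str
    difficulty: str        # must be one of ALLOWED_DIFFICULTY
    syllabus_section: str  # must cite a section from the uploaded syllabus

def validate_paper(questions: list[Question],
                   syllabus_sections: set[str],
                   tolerance: float = 0.10) -> list[str]:
    """Return a list of violations; an empty list means the paper is acceptable."""
    errors = []
    for q in questions:
        if q.difficulty not in ALLOWED_DIFFICULTY:
            errors.append(f"unknown difficulty: {q.difficulty!r}")
        if q.syllabus_section not in syllabus_sections:
            errors.append(f"cites material outside the syllabus: {q.syllabus_section!r}")
    total = len(questions) or 1
    for level, target in ALLOWED_DIFFICULTY.items():
        actual = sum(q.difficulty == level for q in questions) / total
        if abs(actual - target) > tolerance:
            errors.append(f"{level}: {actual:.0%} of questions, target {target:.0%}")
    return errors
```

The point of the sketch is that the model only fills in a narrow schema; whether the paper is acceptable is decided outside the model, against the syllabus and the distribution.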
Once those constraints were in place, the behavior changed a lot. The agent stopped being creative in the wrong places and consistently produced usable test papers. The quality improvement came from reducing freedom, not from changing models.
I built this using GTWY.ai, mainly because it let me wire together a knowledge base, step-level tool permissions, and model choice without writing a lot of glue code. But the interesting part for me wasn't the platform; it was the pattern.
It made me wonder:
Are others seeing similar results by narrowing agent scope instead of adding verification layers?
Do constraints scale better than smarter models for production use cases?
For education or other regulated domains, is this how people are actually shipping agents?
Curious what's working for others in real deployments.