What building an AI chatbot taught me about real engineering
Photo by Igor Omilaev on Unsplash
When I was handed the task of building an AI-powered healthcare chatbot at Heartcore, I assumed the hard part would be the AI. I was wrong. The hard part was everything around it.
The brief was simple enough: build a chatbot that could answer health questions from users of a female-health app. We had a dataset of curated Q&A pairs in a CSV file, an OpenAI API key, and a deadline. I started where most engineers start — just sending every user query straight to the API and returning the result.
It worked. But it was expensive, inconsistent, and occasionally wrong in ways that mattered. For a health product, "occasionally wrong" isn't acceptable.
The pipeline I didn't know I needed
The fix wasn't more AI — it was less of it, used more deliberately. I redesigned the system as a three-tier pipeline:
First, try an exact match against the CSV dataset. If the user's question matches something we already have a vetted answer for, return that: no API call, no cost, no hallucination risk. Second, if there's no exact match, try fuzzy matching: use NLP similarity techniques to find the closest curated question and, if it's close enough, return its answer. Only if both of those fail does the query reach the OpenAI API.
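To make the three tiers concrete, here's a minimal Python sketch of the idea rather than the production code. The column names `question` and `answer`, the function names, and the 0.8 cutoff are assumptions made for the example. The standard library's `difflib` string similarity stands in for the NLP matching, and the model call is passed in as a plain function so the sketch runs without an API key.

```python
import csv
import difflib
import io
from typing import Callable, Dict, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def load_answers(csv_text: str) -> Dict[str, str]:
    """Map each normalized curated question to its vetted answer.

    Assumes (hypothetically) a header row with 'question' and 'answer' columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalize(row["question"]): row["answer"] for row in reader}


def answer_query(
    query: str,
    curated: Dict[str, str],
    llm_fallback: Callable[[str], str],
    threshold: float = 0.8,  # illustrative cutoff, not a tuned value
) -> Tuple[str, str]:
    """Return (answer, tier), where tier is 'exact', 'fuzzy' or 'llm'."""
    key = normalize(query)

    # Tier 1: exact match against the curated dataset.
    if key in curated:
        return curated[key], "exact"

    # Tier 2: closest curated question, accepted only above the cutoff.
    # difflib compares characters; it is a stand-in for semantic matching.
    close = difflib.get_close_matches(key, list(curated), n=1, cutoff=threshold)
    if close:
        return curated[close[0]], "fuzzy"

    # Tier 3: nothing curated is close enough, so call the model.
    return llm_fallback(query), "llm"
```

Returning the tier alongside the answer is a small design choice that pays off later: logging it shows how often queries actually reach the model, which is the number that drives both cost and risk.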
The result was a system that was cheaper to run, more consistent in its answers, and safer for users — because the high-confidence answers came from humans, not a language model.
The lesson I took from this: AI is a powerful tool, but it works best when you design the system around it carefully — not when you throw everything at it and hope for the best.
The part nobody talks about: duplicate responses
Once the pipeline was running, a new problem emerged. Users were asking the same question in ten different ways and getting ten slightly different answers, some of them contradictory. The NLP matching logic was treating each phrasing as a distinct query rather than recognizing the phrasings as semantically equivalent.
I spent a week rethinking the query-matching logic from scratch. The goal was to cluster semantically similar questions and always resolve them to the same canonical answer. It was unglamorous work — a lot of testing edge cases, tuning similarity thresholds, and reading through chat logs to find the failure modes. But when it worked, the chatbot felt genuinely coherent for the first time.
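The core idea, resolving every phrasing in a cluster to one canonical question, can be sketched in a few lines of Python. This is an illustration under assumptions, not the logic I shipped: it uses greedy "leader" clustering with Jaccard overlap on word sets as a simple stand-in for semantic similarity, and the `CanonicalResolver` class, the stopword list, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Set, Tuple

# Illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "is", "my", "i", "it", "be",
    "why", "what", "how", "can", "could", "do", "does",
}


def tokens(text: str) -> Set[str]:
    """Lowercased content words with punctuation and stopwords removed."""
    words = "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class CanonicalResolver:
    """Greedy clustering: the first phrasing seen becomes the cluster's canonical form."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        # Each cluster is (token set of the canonical question, canonical question).
        self.clusters: List[Tuple[Set[str], str]] = []

    def resolve(self, question: str) -> str:
        """Return the canonical question for this phrasing, creating a cluster if needed."""
        toks = tokens(question)
        best_score, best_canonical = 0.0, None
        for rep_tokens, canonical in self.clusters:
            score = jaccard(toks, rep_tokens)
            if score > best_score:
                best_score, best_canonical = score, canonical
        if best_canonical is not None and best_score >= self.threshold:
            return best_canonical
        # Nothing is close enough, so this phrasing starts a new cluster.
        self.clusters.append((toks, question))
        return question
```

Because answers are looked up by canonical question rather than by raw phrasing, two users who ask the same thing differently get the same response. The threshold is where most of the tuning effort goes: too low and unrelated questions merge, too high and paraphrases split apart again.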
What I actually learned
Building this system taught me something I've carried into every project since: the most important engineering decisions aren't about which technology to use. They're about where to put the boundaries between what the machine decides and what humans control.
For a health product, those boundaries matter enormously. Getting them wrong doesn't just cost money — it erodes user trust in something people are relying on for real information about their bodies.
I'm a full-stack engineer, and I love the variety that comes with that. But this project is the one that made me think seriously about AI not just as a feature to integrate, but as a system to design around. That shift in perspective has shaped how I approach every product I've worked on since.
If you're working on products that sit at the intersection of AI and real user needs, I'd love to connect.