
CEO, Fern AI, AI for Legal
This course is worth the time. Take it.
We've been building evals for over a year. We found this course invaluable for determining where we could improve our process, identifying tools and resources, and engaging with others in the community. Shreya and Hamel have produced a top-notch course that's worth every minute of time invested.
Hardware Engineering Leader at Cisco
This course is a game changer.
This course was a game-changer for me. My biggest takeaway was learning a structured approach to system traces, which has given me a reliable framework for making meaningful progress. The hands-on content was fantastic; I learn best by doing, so I truly appreciated the practical exercises. I now have the 'flywheel' I was missing to move forward with my own app development. I highly recommend this course and look forward to even more hands-on content in the future!
Founder, Socratify
1000x ROI
Taking a structured approach to evals is a game changer. Shreya and Hamel are teaching a skill with 1000x ROI in the age of AI. At Socratify, we're building a career coach that sharpens critical thinking skills through debates on business news and other topics. It's inherently challenging to ensure high-quality LLM interactions, and going through error analysis has been transformative for the product development process. I can't wait to release the next version! I would absolutely recommend this course to any founder working with LLMs.
Author and Principal at Feldroy, LLC / Software Artisan at Kraken Tech
Pragmatic techniques, free of jargon.
I learned optimal techniques for expediting quality improvements in AI applications. We were taught practical methodologies based on straightforward metrics that keep humans in the loop to ensure quality results. Hamel and Shreya were quite good at explaining all terms with real-world examples taken from experience. They didn't load the course with jargon. The homework exercises were challenging yet achievable. It's been fun and educational to get the work done. I recommend the course to anyone who wants to learn incredible tricks and tips for building AI applications.
Data Scientist
Tools to quantitatively improve your AI product
Hamel and Shreya do such a great job of equipping you with the tools to quantitatively improve your AI product. This is a must-take course for anyone working with LLM-powered applications.
Data Scientist
Course Instructors Went Above & Beyond
As someone with prior experience designing human evaluations and developing metrics for a specific product, I took this course to broaden my understanding of AI evaluation practices, especially for agentic systems and RAG, as well as to deepen my knowledge of evaluation infrastructure such as CI/CD and trace review interfaces. This course delivered far more than I expected. It includes a comprehensive course reader that could stand on its own as a reference book, live classes packed with hands-on examples, and over 10 guest speakers who shared practical insights into evaluation strategies and even how to build your own evaluation tools for different use cases. What really set the course apart was the level of support. Hamel and Shreya were incredibly supportive throughout the course. They hosted office hours, thoughtfully answered every question on Discord, and even brought in two experienced professionals to offer additional hands-on support and help with (optional) homework. They went above and beyond to make sure everyone was learning and participating. I also really appreciated hearing from other students about the evaluation challenges they were facing in their own work, and watching Hamel and Shreya think through solutions with them in real time was just as educational as the prepared content. Highly recommend this course if you're working on or even adjacent to LLM applications. Whether you’re focused on product quality, engineering, or research, you’ll walk away with frameworks, tools, and best practices you can use right away.
Wayde Gilliam
"If you are building with AI, you need this course!"
Founder, Wicked Data LLC
"Take this course to go from a good to a great AI Engineer!"
Owner at Kentro Tech LLC
"Practical techniques rarely taught elsewhere. Highly recommend!"
Adam Dadson
GTM @ OpenAI

Senior Technical Program Manager, Netflix
This course helps you get expected outcomes from your AI
A colleague reached out to me and recommended “AI Evals For Engineers & PMs,” offered by Hamel H. and Shreya Shankar. I consider myself an eternal learner, and knew evaluations were a critical yet often overlooked component of successful GenAI implementation. Everyone keeps asking me how to stay ahead of GenAI. Well, you take classes like this one so you can be on the cutting edge of how to ensure you get the expected outcomes from your future AI agents. It was so dense with useful information and guest speakers that I honestly couldn’t keep up, but after the course is over, you continue to have access to the recordings.
Jeroen Latour
FinTech at Booking.com
Juan Maturino
Software Engineer at Edua
Removed a malicious system prompt and reversed falling engagement—user interactions increased.
Before this course, my instinct was to jump straight into axial coding. That meant I leaned heavily on my own presuppositions about what failures I thought would show up. By doing that, I was blind to unexpected issues. It’s like hearing about someone before meeting them—you imagine who they are, but until you actually meet them, you don’t see the full picture. With data products and LLM pipelines, the same thing happens.
Take a healthcare chatbot as an example. Going in, I assumed failures would only be factual: did it answer the medical question correctly? If I jumped straight into axial coding, I’d only tag factual errors and conclude the model was nearly flawless. From that narrow view, I might even think the product was destined for massive success.
But after this course, I learned to take a step back and examine the data without presuppositions. By looking at traces more openly, I discovered a hidden failure mode: the chatbot was mean. It was calling people “fat,” “ugly,” “stupid,” and generally creating a hostile experience. No factual errors—just a terrible user experience. This was something axial coding alone, or automated LLM-as-a-judge evaluation, would have missed without prior human review.
Digging deeper, I found the root cause: a disgruntled former employee had slipped “be mean when answering” into the system prompt. Once we fixed that, user engagement improved dramatically. The key lesson I took from the course is that real error analysis starts with open coding and direct observation. Skipping that step leaves you blind to the most important problems.
Hima Tk
Lead PM - AI/ML Products at CultureAmp
Turned costly trial-and-error into a data-driven plan that avoided massive retraining and prioritized fixes.
I worked with a supermarket chain to build an AI system that could count inventory from shelf photos. At first, the system struggled with issues like blurry images, background clutter, and confusingly similar packaging. Before this course, my approach would have been driven by intuition and trial-and-error. I might have looked at a handful of errors, jumped to a conclusion like “the model is just bad at distinguishing Coke cans,” and proposed a vague fix such as retraining with thousands of new images. That would have been expensive, slow, and unfocused—and it might not have solved the real problem, like blurry photos from staff.
After this course, my approach is now structured and data-driven. Instead of guessing, I use error analysis to diagnose issues systematically. I start by gathering a representative failure set and tagging images to capture why errors occur—blurry images, poor lighting, occlusion, similar or new packaging, unusual angles, background clutter. From there, I group these into a taxonomy of failures and calculate how much each category contributes to overall errors. This creates a prioritized roadmap for improvement.
For example, when Image Quality and Similar Classes accounted for 75% of failures, I could recommend high-impact, targeted fixes: improve photo capture guidelines and augment training data with blurred images for the first, and collect more Diet Coke vs. Coke Zero examples for the second. Instead of vague trial-and-error, I now have a clear, quantitative path to better results.
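As a rough illustration of the tallying step described above (the tags and counts here are hypothetical placeholders, not data from the project), grouping open-coded failure tags and ranking them by their share of total errors takes only a few lines of Python:

```python
from collections import Counter

# Hypothetical failure tags from an open-coding pass over failed shelf photos.
tags = [
    "blurry_image", "similar_packaging", "blurry_image", "occlusion",
    "blurry_image", "similar_packaging", "poor_lighting", "blurry_image",
]

counts = Counter(tags)
total = sum(counts.values())

# Rank categories by their contribution to overall errors: the top of this
# list is the prioritized roadmap for fixes.
for tag, n in counts.most_common():
    print(f"{tag}: {n}/{total} ({n / total:.0%})")
```

In practice the tags would come from a labeled spreadsheet or trace log rather than a hard-coded list, but the prioritization logic is the same.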
Margarita Fakih
Business Operations and Development
Saved me hours of rewriting by creating a reusable framework that prevents repeated AI errors.
As a product manager, I often struggled with inconsistencies in user stories generated by AI tools. Even when my prompts were clear, the outputs would miss key requirements or include irrelevant details. Before this course, my instinct was to keep tweaking the prompt through trial and error until I got something usable. While that sometimes worked, it was inefficient and didn’t explain why the model was failing.
After this course, my approach is much more systematic. I start by defining the key dimensions of a good user story—clarity, completeness, alignment with acceptance criteria, and the right level of technical detail. Then I collect flawed outputs and apply open coding to label issues like “missing acceptance criteria,” “misinterpreted intent,” or “overly generic details.” From there, I build a taxonomy of failure types, which lets me organize and prioritize problems. Finally, I design a feedback loop: the LLM generates a user story, checks it against the taxonomy, and revises if any known issues are detected.
Instead of wasting hours on one-off fixes, I now have a reusable framework that scales across projects. What was once frustrating trial-and-error has become a structured, repeatable process for improving quality.
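A minimal sketch of the generate-check-revise loop described above, assuming a taxonomy of known issues. The taxonomy entries, detector rules, and the `revise` callback are hypothetical stand-ins for real LLM calls; only the control flow is the point here:

```python
# Hypothetical taxonomy: each known failure type maps to a simple detector.
# In a real pipeline these checks might themselves be LLM-as-judge calls.
KNOWN_ISSUES = {
    "missing acceptance criteria": lambda story: "Acceptance Criteria" not in story,
    "overly generic details": lambda story: "TBD" in story,
}

def check(story: str) -> list[str]:
    """Return the taxonomy labels that apply to this draft user story."""
    return [label for label, detect in KNOWN_ISSUES.items() if detect(story)]

def revise_until_clean(story: str, revise, max_rounds: int = 3) -> str:
    """Generate -> check -> revise loop: re-prompt with detected issues
    until the draft passes the taxonomy checks or rounds run out."""
    for _ in range(max_rounds):
        issues = check(story)
        if not issues:
            break
        story = revise(story, issues)  # e.g., an LLM call fed the issue labels
    return story
```

The `revise` argument would normally wrap an LLM call that receives the draft plus the detected issue labels and returns an improved draft.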
Júlio Paulillo
CRO @ Agendor
I turned scattered agent errors into prioritized fixes, enabling focused, measurable improvements.
Building a personal assistant for salespeople is my day-to-day work. One of the tools the agent uses fetches activities from the CRM, but I noticed the LLM sometimes hallucinated—passing unnecessary arguments when calling the tool. Before this course, I would have gone straight into prompt engineering, rewriting tool descriptions or adding more examples to try to fix the issue.
After this course, my approach is different. I start by defining key dimensions such as user persona, intent (e.g., “fetch activities”), and activity type (past due, finished, pending). From there, I can ask an LLM to generate tuples from these dimensions, giving me a structured way to build a synthetic eval dataset. If traces of user interactions are already logged, I filter by intent and begin open coding the different failure modes I see. After reviewing dozens or even hundreds of examples, I then use an LLM to help categorize the failures. This lets me prioritize the categories that matter most and focus fixes where they’ll have the biggest impact.
Instead of reactive prompt tweaking, I now have a systematic framework for diagnosing failures and improving my assistant in a repeatable way.
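The dimension-tuple step described above can be sketched like this; the dimension names come from the testimonial, while the concrete values and the code itself are an illustrative sketch, not the actual implementation:

```python
from itertools import product

# Dimensions from the error-analysis setup; the values are placeholders.
personas = ["sales_rep", "sales_manager"]
intents = ["fetch_activities"]
activity_types = ["past_due", "finished", "pending"]

# Every combination becomes the seed for one synthetic eval example,
# e.g. a prompt an LLM expands into a realistic user query plus the
# expected tool call.
eval_tuples = list(product(personas, intents, activity_types))

for persona, intent, activity in eval_tuples:
    print(persona, intent, activity)
```

With 2 x 1 x 3 dimension values this yields 6 seed tuples; real dimension grids would be larger, and each tuple is then expanded into a full synthetic trace for evaluation.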
Tatyana Kazakova
QA Engineer :) at Qazaco
Turned random fixes into a repeatable process that improved the whole system and proved changes actually worked.
Before this course, I would just fix issues as I spotted them—tweak a prompt here, change a setting there—and hope the next run looked better. Sometimes it worked, but I never had the full picture of what was really going wrong or how often certain problems appeared.
After this course, I’ve learned to slow down at the start: define what I actually want to measure (relevance, completeness, context handling), collect a solid set of examples, and trace where errors first start to show up. From there, I group similar issues into clear failure types, which makes patterns obvious and helps me prioritize what to fix.
Now the process feels less like random whack-a-mole and more like a structured, repeatable system. Instead of chasing one-off issues, I can improve the whole system and know whether the changes are actually working.
Andrew Chaffin
CEO at Argo Analytics
Structured error analysis gave me a clearer method to iterate and actually get the results I needed.
A while back, I used an AI writing assistant to draft a personal statement for a fellowship. I gave it a detailed prompt with my goals, values, and experience, but the output was generic and missed the emotional tone I wanted. At first, I just kept rephrasing the prompt, hoping it would eventually get it right. Instead, it swung between being too formal or inventing details I never mentioned. It was frustrating, and trial-and-error didn’t get me far.
After this course, I’d approach it completely differently. I’d start by defining what “good” means for the task—tone alignment, factual accuracy, and personal relevance. Then I’d collect flawed outputs and open code them: did the model invent details, ignore parts of the prompt, or lose the emotional tone? From there, I’d build a taxonomy of failures—like hallucination, tone mismatch, or misunderstanding the prompt—and use it to spot patterns. Maybe I’d realize the model struggles when the prompt is too abstract or lacks emotional cues.
Compared to my old approach of hoping a better version would show up, this gives me a clear, methodical way to iterate. It turns what used to be trial-and-error frustration into a structured process for actually getting the results I need.
Amol Shah
Head of Product at Count
I can now pinpoint errors and measure reductions in each error bucket—turning guesswork into measurable improvement.
When I first built a small chatbot to recommend books based on user mood, it often gave wildly off-base suggestions—like pairing someone “feeling nostalgic” with a cutting-edge tech thriller. Back then, I just tweaked the prompt or guessed at what the model might “understand” about mood. It was trial and error with no clear sense of what was actually going wrong.
After this course, I’d tackle the problem systematically. I’d collect failures by running the bot across a fixed set of test prompts and logging every mismatch. Then I’d open code the bad outputs—labels like “misread tone,” “genre bias,” or “keyword fixation.” From there, I’d define key dimensions of failure (emotional alignment, genre diversity, keyword vs. context) and group them into a taxonomy, like “semantic misinterpretation.” By quantifying how often each type occurs, I’d know where to focus first.
Armed with that data, I could design targeted fixes: refining prompts with explicit mood-to-genre mappings, adding checks for emotional themes, or diversifying candidate genres. Instead of hacking prompts by gut feel, I’d have a transparent, repeatable process that shows whether error rates are actually dropping.
Lada Kesseler
Lead Developer at Logic20/20
I can now predict and prevent code quality issues instead of treating them as isolated bugs.
I often ran into code quality issues when using AI assistants, but I didn’t have a structured way to make sense of them. Before this course, I would just label outputs as “messy code” without really digging into the underlying problems.
After this course, I now analyze them systematically across dimensions—things like hardcoded tests, long methods, poor formatting, bad naming, poor architecture choices, duplication, dead code, or ignoring available quality tools. By open coding these issues and building a taxonomy, I can see patterns emerge instead of treating each problem as random or isolated.
The key shift for me is realizing these aren’t one-off mistakes but systematic failure modes that appear under specific conditions. With that understanding, I can both predict and prevent quality issues, rather than just reacting to them after the fact.
Maruti Agarwal
Expert AI Research Scientist at Datasite
Gained clarity on what to fix first, transforming my entire approach to evolving the system.
I applied what I learned the very same day we covered error analysis. I was working on an industry classification system and followed a structured process: I asked annotators to provide detailed feedback on wrong predictions, reviewed their notes to improve annotation quality, then parsed all the feedback and used ChatGPT to categorize it into six major error patterns. Finally, I shared those patterns and error percentages with stakeholders.
After this course, error analysis feels much more structured. Instead of just collecting feedback in an ad hoc way, I now have a clear method that gives me visibility into what problems matter most and what to solve first. It’s changed how I think about evolving the system overall.
Sergio Soage
AI R&D Lead at Diligent
I built a structured understanding of failures, yielding actionable insights instead of whack-a-mole fixes.
Now I understand how to systematically explore the problem space, identify patterns across multiple failures, and build a structured understanding of why and when the system fails - not just that it fails. This leads to more actionable insights for improvement rather than playing whack-a-mole with individual issues.
Karen Lam
Product Design
I gained clarity and confidence to systematically narrow the gap between AI failures and human understanding.
I’m a product designer with no prior AI Evals experience. Before this course, when I encountered unexpected or confusing results from the Recipe Bot in the first homework, my instinct was to just iterate on the system prompt in Cursor and manually test through the UI.
After this course, I’ve learned there’s a more systematic way to approach error analysis. Using open and axial coding, I can narrow the gap between AI system failures and human understanding through a step-by-step process. I especially appreciate that this framework is grounded in social science research practices like coding data and building taxonomies—and that it emphasizes doing the analysis manually to ensure accuracy, rather than offloading it entirely to AI.
I also see the value in wearing both the data scientist and product manager hats: questioning the data rigorously while bringing product knowledge into the decision-making. This approach gives me a structured, repeatable way to analyze failures instead of ad hoc trial and error.
Juan Maturino
Software Engineer at Edua
I stopped endless prompting and now systematically document failures to improve outcomes and efficiency.
In automated agentic code generation, I often ran into situations where the desired output was far from what the model produced. My old approach was to keep prompting the LLM until progress stalled, then spin up a new chat with a rephrased prompt and updated context. Eventually I’d accept whatever was “good enough” and finish the task myself.
After this course, I understand why that approach was limited. Evaluating code has two axes: reference-based (objective tests like unit tests) and reference-free (qualitative measures of style, readability, and design). Code isn’t just functional—it’s also expressive, like writing prose—so both dimensions matter.
Now, instead of endless prompt tweaking, I document failures in short form through open coding, then group and categorize them using axial coding. This helps me identify common failure patterns in the LLM’s output and design more robust system prompts targeted at those issues. What used to be trial-and-error guesswork is now a structured process for improving both the reliability and quality of generated code.
Chris McDonald
AI Team Leader at Comtrac
I now have the clarity and confidence to diagnose failures instead of ‘living on a prayer’.
At work, we use prompts and prompt engineering to turn selected inputs into specific outputs. Before this course, whenever I ran into unexpected results, my approach was to jump straight into the prompt and randomly change words until something worked. After a few tries, I might even hand the prompt, input, and output to an LLM and ask it to fix things. There was no hypothesis, no structure—just living on a prayer.
After this course, I have a far more systematic approach. If I encounter a problem now, I’d begin by collecting an initial dataset of around 100 traces. From there, I’d perform open and axial coding to build a taxonomy of failures. That structure gives me clarity about what’s really going wrong instead of just chasing random fixes.
What stands out to me is that the processes in this course are simple—not in the sense of easy, but in being concise and straightforward while still requiring real effort and understanding. As Richard Feynman said, “if you can explain something in simple terms, you understand it well.” That’s exactly how Hamel and Shreya have designed this course, and I’m grateful for it.
Roey Ben Chaim
Staff Engineer at Zenity
I can now pinpoint agents' core failures, turning vague vibes into clear, actionable fixes that improve agent performance.
The axial coding just hit different. Before this course, my approach to failures was more of a “vibe investigation,” poking around without a clear structure.
After this course, I now cluster failures systematically and trace them back to their core issues. Quantifying the errors into meaningful groups makes it much easier to see the main failure points. I finally feel like I have a proper way to identify the root problems in my agent instead of just guessing.
Ben Eyal
Research Engineer at Ai2 Israel
I gained clarity to find root causes and stop repeated agent confusion.
At work, we’re building Paper Finder, which (as the name suggests) should find papers. We wanted the agent to refuse certain requests so people wouldn’t treat it like a free ChatGPT. But we kept running into a strange behavior: the agent would refuse, ask the user a clarifying question, the user would reply “yes,” and then the agent would have no idea what they were talking about.
Before this course, we would have just dug through the logs, checked for crashes, and treated it like any other bug.
After this course, I’d handle it differently. I’d look closely at the traces of these failures, identify common patterns, form a hypothesis about why it was happening, and then test it systematically. In this case, the real issue was that history wasn’t being shared between two components: one asked the question, the other just saw “yes” with no context. By approaching it through error analysis, the root cause becomes clearer and easier to solve.
Annu Augustine
Founder, Product Coach at NedRock
Open coding gave me clarity into the model's real behavior, revealing failures my framework missed.
When I built a custom GPT for product managers to help write better user stories, I initially jumped straight into axial coding. I predefined categories of failure based on the INVEST framework (Independent, Negotiable, Valuable, Estimable, Small, Testable), which I often use when coaching teams. At the time, it felt like a solid, practical approach grounded in real-world product work.
After this course, I started applying open coding before forcing outputs into predefined boxes. That shift revealed patterns the INVEST framework would have completely missed. For example, some stories were overly complex even though they technically met the “Small” criteria, and others ignored edge cases or real-world exceptions not covered by INVEST at all.
Open coding gave me a clearer picture of how the model was actually behaving, rather than bending its outputs to fit categories I had assumed upfront. It’s a far more reliable way to uncover the real failure modes.
Co-Founder at Comprendo
Open coding gave us clarity on true error patterns, preventing overconfidence and costly misclassification.
Before this course, I didn’t fully appreciate the risk of skipping open coding. It’s easy to take a small sample, jump straight into categories, and gain false confidence in themes that don’t actually reflect the full range of errors. That’s the “when you only have a hammer, every problem looks like a nail” trap—imposing categories that miss important failure modes.
After this course, I see why open coding matters. It prevents premature categorization, helps me understand saturation, and surfaces the true diversity of errors. I’ve also learned to think more carefully about how evaluation rubrics should be designed. For some products, a “benevolent dictator” works—if one person truly has holistic expertise across every stage of the workflow. But for more complex systems, multiple experts are needed, each contributing perspective from their domain.
In my past work reviewing clinical trial protocols, no single reviewer understood every dimension—ethics, study design, and biostatistics each required deep, specialized expertise. The lesson from this course is clear: open coding reveals the real error space, and evaluation rubrics are strongest when designed with the right balance of expertise.
Product Manager, Analytics & AI at Axi
Sanity checks turned unreliable scores into business-aligned predictions I could trust.
I built a churn prediction model for a subscription service and evaluated it using standard metrics like accuracy, precision, and recall on a test dataset. At first, the high evaluator scores looked promising, but they gave me a false sense of confidence. In reality, the model was overfitting, producing outputs that didn’t even add up logically—for example, reporting fewer new onboarded customers than the combined total of retained and churned customers.
Before this course, I relied too heavily on evaluator scores, only realizing something was wrong when results felt “too good to be true.” I had to manually compare predictions with business reports and historical trends to uncover the discrepancies.
After this course, I know how to approach it differently. I would run cross-validation across multiple folds to confirm stability, add domain-specific sanity checks (like validating customer balances against business logic), and bring in qualitative stakeholder input. These practices create a stronger evaluation process—less dependent on raw metrics and more aligned with real-world trustworthiness.

George Job Vetticaden
VP of Products, AI Agents
AI Engineer, Vantager
Practical techniques that generalize regardless of the tools you use.
This course provides a great take on building reliable AI applications. It teaches practical techniques while developing intuition for evaluating LLM-based systems. What sets it apart is its tool-agnostic approach. Rather than focusing on specific platforms, it emphasizes systematic and scientific principles that apply everywhere.
Principal Data/AI Scientist/Engineer, Slido/Cisco
This course teaches material you can't find anywhere else. Investing in this course is a no-brainer.
"Why would a Principal Data Scientist take a course on evals? Shouldn't they know this already?!" Fair question. Here's why I think it's still worth it: 1. Learn from the best. LLM evals are still nascent, so learning from people doing this full-time across multiple contexts is invaluable. Game recognizes game, and as you'll learn in the very first week already, Shreya and Hamel are top-tier. 2. Get the full picture. Evals are more art than science right now. Getting a coherent view of best practices and mature end-to-end pipelines designed from first principles is rare. Their course reader alone is worth multiple times the price. 3. Build common vocabulary. If you're building impactful LLM products, you'll collaborate with PMs. Having both technical folks and PMs in sessions creates a shared language that bridges the gap -- something you can't find anywhere else for this topic. In other words, whether you're a PM, a Principal or a vibe coder building with LLMs, this course is simply a no-brainer.
Director of AI
This course helps you transform guesswork into actionable insights.
Evaluating generative AI often relies on abstract benchmarks disconnected from real-world outcomes and detached from practical experience. To bridge this gap, many rely on subjective impressions or “vibes” (i.e., the eye-test). The eye-test is important, since evaluators directly interact with the model in realistic contexts. However, vibes and qualitative evaluations are not particularly helpful in evaluating application-specific performance, consistency, bias, reliability, security, or return on investment. In contrast, application-specific evals reflect an essential day-to-day operational focus. They aim to assess whether a specific pipeline performs successfully on a particular task using realistic data. This course is an important step in transforming guesswork into actionable insights. Application-specific evals are not sufficient, but they are necessary, and so often overlooked. Check out this course. It covers a lot of important terrain.
Machine Learning Engineer
Highly recommend this course.
Hamel and Shreya's course, "AI Evals for Engineers and PMs," has been a great resource for learning how to tame the scary, wild world of LLM-based applications. It was fascinating to see how high-leverage it is to become "one with the data" and how to do that explicitly for LLMs and agents. Hamel and Shreya's vast combined expertise shows clearly in the course's lectures, practical exercises, and even a textbook. On top of that, the guest lectures provide even more gems of practical wisdom. I'd highly recommend this course to anyone serious about learning how to improve their LLM-based AI products.
Senior Director of Machine Learning, SponsorUnited
An absolute must. Valuable for any AI engineer and product manager.
The AI Evals course by Shreya and Hamel is an absolute must for everyone serious about putting AI applications into production. I have been following Hamel and Shreya's work for quite some time, and it was really awesome to learn from them all the concepts of error analysis, measurement best practices, LLM-as-Judge and how to make sure it is reliable with human evaluations, collaborative analysis of errors, evaluation of multi-turn chats, creation of datasets for CI/CD, etc. The last topic, on accuracy and cost optimization, is really useful, as we are seeing in our applications when scaling. All in all, this is an amazing set of vital information that is valuable for any AI engineer and product manager. Highly recommend this course to everyone.
Alan Chang
AI, Machine Learning, and Biology @ Stanford University
Maxime Lelièvre
Visiting Researcher @ Columbia | MSc Robotics & Data Science @ EPFL | Passionate about EdTech
Šimon Podhajský
Senior LLM/Data Engineer | 🤖 Evals, ML/AI, Python

Bryce York
AI/ML/LLM and UX-centric B2B startup product management leader with a love for zero-to-one product innovation • 12+ yrs PM, 7+ yrs AI/ML, 5+ yrs adtech • Writer/Speaker • Advisor

Senior Data Scientist @Amazon
Taking the course "Move beyond 'vibe-checks' to data-driven evaluation" has been a game-changer in how I approach AI development and evaluation. Before this, my team often relied on subjective assessments—what we called “vibe-checks”—to judge model outputs. This course provided a structured, systematic framework that replaces guesswork with measurable, repeatable methods for evaluating AI performance. I learned how to build robust evaluation systems tailored to the unique challenges of AI applications—especially those involving stochastic or subjective outputs. The curriculum walked us through defining meaningful metrics, conducting systematic error analysis, generating synthetic data, and implementing automated evaluation pipelines. I also gained hands-on experience evaluating complex architectures like RAG systems and multi-step pipelines, and I now understand how to monitor models in production with continuous feedback loops. The biggest takeaway is that effective AI evaluation isn’t just about measuring accuracy—it’s about building a comprehensive lifecycle strategy that spans development, deployment, and ongoing improvement. Learning how to prioritize engineering efforts based on real data, rather than hunches, has already helped us optimize both performance and cost in our LLM applications. Absolutely. Whether you're an AI engineer, product manager, or data scientist, this course gives you practical tools and frameworks that apply directly to real-world AI challenges. It’s especially valuable for teams looking to move beyond ad-hoc evaluations and establish a collaborative, metrics-driven culture around AI development. In short, this course has transformed how I think about evaluation—from a fuzzy afterthought to a foundational part of the AI lifecycle. I can't recommend it highly enough.
Account Director, OpenAI
This course exceeded my expectations.
The course offers hands-on exercises, expert guidance, and practical frameworks, giving you a systematic approach you can apply immediately. I could easily follow along without an engineering background. The biggest takeaway for me was the level of robustness/sophistication that you can build for evals and the impact that it can have! I had very surface level knowledge of evals before this course. The course exceeded my expectations and I would recommend this to my colleagues.
Data Scientist, Global Innovation Hub
I learned how to evaluate LLM outputs in a structured manner. My biggest takeaway is that the quality of the eval pipeline is critical to ensuring good-quality outputs in production. I will definitely recommend this course to my colleagues and to anyone who is deploying LLMs.
The AI Evals for Engineers & PMs course, taught by Hamel Husain and Shreya Shankar, is 100% worth the investment. The material couldn't be more up-to-date. There is a perfect balance between theory and real-world hands-on learning. And the office hours and guest speakers are invaluable. I highly recommend it.
Piotr W.
Head of QA | Lead Automation Engineer | AI Evaluation Engineer
Sergiy Korniychuk
Staff Software Engineer - Full Stack at Sondermind Inc
Tsacho Rabchev
Software Developer / Inventor
Benjamin Pace
Data Science @ Candidly
Joel Dean
AI Founder. Tech Entrepreneur. Content Creator. Prime Minister Youth Awardee. Former WEF Global Shaper Curator
Kenneth Reeser
Senior AI/ML Architect at Vanguard
Joshua Pittman
Prompt Engineer at Outlever
Alpa Dedhia
Engineering Lead | Solutions Architect | AWS | Applied Generative AI Solutions | Search & Relevance | CSM | CSPO
Geoffrey Pidcock
AI and Strategy at ANSTO | MBA Melbourne Business School | Ex Atlassian

Rasool Shaik
GenAI | Embedded/IoT | Medical Devices -Helping organizations to build and validate products
Uday Ramesh Phalak
Machine Learning Engineer | RecSys, AI Evals, GenAI, Climate Change, UX | Co-Founder at HazAdapt

Ankur Bhatt
AI Engineering | Product and Technology Leader | CTO

Robb Winkle
Fire your systems integrator. Conversational AI-native systems expert for ERP (Techstars '24)

Harris Brown
Fractional AI product leader for early-stage founders & teams | ex-Airbnb
Peter Cardwell
Software Engineer at Snap, Inc.
Data Scientist at Tiger Analytics

The most practical AI course I've taken, with immediate value.
As a Data Scientist, I joined “AI Evals for Engineers & PMs” by Hamel and Shreya to get better at evaluating LLM systems in real-world settings, particularly in the context of LLMs being used for a variety of tasks. The course covered everything from fundamentals and error analysis to production monitoring and cost optimization. What stood out was how practical it was—teaching us to define use-case-specific metrics, collaborate with PMs, slice errors meaningfully, and avoid over-relying on automated scores. The lessons and guest talks (like on RAG evals, failure funnels, and continuous human review) felt directly applicable to my work, especially when building retrieval-based bots or monitoring model drift. It’s not a course that spoon-feeds you; it gives solid frameworks and real-world habits to build eval pipelines that actually reflect user experience. If you’re working on production ML or LLM projects and want to move beyond standard metrics, this is worth your time. I’ve already applied a few techniques at work and found them super helpful. I would strongly suggest anyone with an interest in evals to take this course!
Dev Team Lead
Highly recommend this course!
The AI Evals course has helped me learn how to truly evaluate LLMs in a meaningful way and actually understand what's going wrong. Hamel and Shreya did a wonderful job explaining how evaluations can be done in a structured manner and what to try. The course guests were an additional bonus, sharing their insights on how they carried out evaluations. I would highly recommend this resource to anyone who builds with LLMs and is wondering how to effectively understand why an LLM isn't working the way they expect and what is going on behind the scenes!
Data Scientist
Amazing instructors.
Hamel and Shreya are amazing instructors, and this course has been a great resource for me in understanding how to build robust, enterprise-grade evals and AI pipelines. What I found particularly useful were the guest lectures, which bring a variety of opinions from industry experts and practitioners on different topics that relate to AI and evals.
Head of Product, Tavus
Great insights that are shaping how we evaluate AI products.
Taking the AI Evals course with Hamel and Shreya has been really valuable. The course has given me a solid framework that's already shaping how we evaluate our AI products. The homework mirrors real work challenges, and guest speakers bring great insights.
Software Engineer
I learned how to be truly effective in creating LLM-powered applications
I have a career developing software, and I've been tinkering with LLMs since before ChatGPT. I feel like the practical eval techniques that Shreya and Hamel teach in their course are what I needed to glue these two skills together and become truly effective in creating LLM-powered applications. Developing for LLMs is not like traditional software development, and evals are the big difference.
Software Engineer, Google
Comprehensive and practical curriculum
Indispensable for robust AI development. The "AI Evals For Engineers & PMs" course provided an indispensable framework for evaluating LLM applications, fundamentally shifting my approach from guesswork to data-driven measurement. My key takeaway is the Analyze-Measure-Improve lifecycle, coupled with the "Three Gulfs" model for pinpointing failure origins. The rigorous methodology for building and validating LLM-as-Judge evaluators—including bias correction and confidence intervals—is a game-changer for trusting subjective evaluations. Hamel Husain and Shreya Shankar are true experts, delivering a comprehensive and practical curriculum that directly addresses the challenges of building reliable AI in a dynamic environment. This course is a must for anyone serious about improving their AI development process.
Palette, CPO
A Masterclass in Practical AI Evaluation.
From benchmark to moat: this course is at the cutting edge of AI research—and not just in theory. What stood out most to me is how deeply practical it is: it teaches you how to build evals that work for your own product, define product taste by sharpening what "good output" really means, and, most importantly, how to scale this method across teams and decisions. The biggest shift for me was reframing evals not as a benchmark to clear, but as a strategic moat—core to how your product learns, evolves, and differentiates. As someone from a non-technical background, I could still grasp the concepts (even if the code got heavy at times). The community around the course is a major bonus—full of helpful discussions, fresh perspectives, and constant knowledge exchange. The guest lectures were especially valuable, showing how companies apply these ideas in the wild and how they tailor their evaluation frameworks to suit specific needs and constraints. I'd highly recommend this course to anyone building with AI—especially those who want to go beyond shipping models to shaping real-world, high-trust outcomes.
Soothien HealthTech Advisory
A must for any developer or PM building AI products.
I'm a physician and have built health tech and health AI solutions, but I'm not overly technical. This course was eye-opening about the importance of AI evaluations. It's a must for any developer or PM building AI for enterprise or regulated industries. This is what will make AI products reliable. Hamel and Shreya are amazing, and so are their top-notch guest lecturers. I took this course because I wanted to learn from the industry leaders actually doing the work. You'll learn the entire process of building AI evaluations, not just by reading, but also by doing. There is a technical component: using Windsurf and Claude, I was able to complete it even though I don't code as part of my main job. It's well worth the effort. The course is dense, especially if you do not code or lack familiarity with statistics; my background in medicine and healthcare statistics helped me understand some of the core concepts. Overall, this is an amazing course and an essential skill set for building AI applications in healthcare or enterprise settings. I'm recommending it to all my colleagues.
Machine Learning Engineer | Co-Founder at HazAdapt
Good course if you want to build products people actually trust.
Coming from recommendation systems and a UX background, I knew specific evaluations. I'd run some A/B tests, check a few metrics, and call it good. But my approach to AI evals was completely naive: I used no systematic method and hoped things would work. This evals course gave me the structure I was missing. The Three Gulfs framework explained why I kept unknowingly failing: we don't understand our data (Comprehension), we write vague prompts (Specification), and models behave unpredictably on real inputs (Generalization). The analyze-measure-improve cycle felt familiar from UX research, but applied to AI. Instead of guessing what's broken, you look at failures first, build automated evaluators, and then make targeted improvements. This creates a flywheel where each cycle makes your product better. Learning from others about real LLM production failures was a huge plus of this course: for example, hearing about VLMs giving different results 18/55 times at temperature 0, and Shreya showing how model cascades cut her costs by 50%. Successful AI products need humans to regularly review outputs; there's no way around it. Good course if you want to build products people actually trust. Evaluation separates demos from deployments.
Technology director - Wells fargo
A fantastic course offering an in-depth practical approach to evals.
Highly recommend this course to anyone building Gen AI products and solutions. The biggest takeaway for me was the methodical, scientific process that the instructors outline for doing model evaluation. It helped me build a mental model that I am applying at work to build an eval pipeline for RAG solutions. The course also offers an in-depth and practical approach to understanding how generative AI models are evaluated using rubrics and metrics, which are critical skills for AI engineers. Overall, a fantastic course with a lot of learning and value.
Founder, Supago Inc.

This course completely transformed my approach to building AI applications.
This course completely transformed how I approach evaluating LLM applications. Before this course, my evaluation processes were informal at best. Now, I've gained a structured, rigorous methodology to identify errors, quantify improvements, and build automated evaluators. The hands-on assignments and deep dives into error analysis were particularly valuable, directly impacting how efficiently I debug and iterate on LLM products. Whether you're an engineer, product manager, or someone working closely with AI systems, this course is essential—highly recommend!

Prashant Mital
Applied AI @ OpenAI
Zara Khan
Account Director @ OpenAI
AI Consultant
"Absolutely recommend this course to anyone building AI applications"
Senior Product Manager at Redfin

Error analysis (and this course) is all you need
Error analysis is all you need. This is the idea that gets drilled into your head over and over again in the AI Evals course. It's so simple, but it's profound...and it's actually way more complicated than you think when you start to consider multi-turn conversations, retrieval systems, agentic systems, multimodal inputs, and more. Shreya and Hamel have distilled the state of the art in AI evals (and often in development itself!) in this amazing class. Some of my favorite highlights:
- Build a custom data annotation app! I was so intimidated by this, but I finally made the leap and vibe-coded something out in an afternoon. It has 10x'd my ability to review conversations.
- It's okay to do a little pre-thinking around failure modes, but they really should EMERGE from your testing. It's really hard to build LLM judges, so be really thoughtful about what you build them for.
- Often, the biggest impact comes from talking disagreements out and figuring out why there is a disagreement in the first place: are your goals unclear? This seemingly technical course has made me a better PM.
- And finally, folks in the course just know every AI tool out there. I learned about WhisperFlow, and my workflow for typing has changed!
Founder, Searchkernel LLC

This course is the best place to learn evals
I learned a ton from Hamel and Shreya's course on AI evals! I've worked at the intersection of information retrieval and AI for nearly two decades, so I've done my fair share of evals for search results quality throughout that time. I've even written books, like AI-Powered Search, that include significant sections on ranking metrics, judgement lists, and model training. Nevertheless, with the rise of generative AI, RAG, and agentic workflows, the need to handle complex pipelines with non-deterministic outputs has significantly increased the difficulty of performing good evals. The discussions about end-to-end traceability and leveraging Transition Failure Matrices were particularly helpful for me in tackling these more challenging multi-step workflows. This course has been a goldmine by providing:
1. Up-to-date information on best practices for evals on current state-of-the-art AI workflows
2. Deep insight from experts with decades of both real-world experience and academic research into evals
3. Lots of tips, tricks, and real-world examples (with code) for getting end-to-end evals implemented and working well
This course significantly improved my mental models and expanded my practical toolkit for doing AI evals, which is already paying dividends for my client engagements. This is a set of skills everyone working in AI should acquire, and this course is currently the best place to quickly do that!
Jodi M. Casabianca
Entrepreneurial Measurement & Research Scientist | Psychometrician | Scoring of Open-ended Tasks | AI Evaluation


Independent AI Engineer
An essential resource for engineers & PMs
For AI builders hoping LLMs will fix it all: this course provides an exceptionally clear and systematic framework for approaching LLM evaluation. The comprehensive introduction to the Analyze-Measure-Improve lifecycle, alongside the detailed exploration of the Three Gulfs Model (Comprehension, Specification, Generalization), significantly deepened my understanding of the challenges inherent in building effective LLM pipelines. Particularly impactful was the practical guidance on error analysis—learning how to systematically categorize failure modes using open and axial coding, then translating qualitative insights into robust quantitative metrics. The deep dive into automated evaluators, including both code-based and LLM-as-Judge evaluators, was especially valuable: learning how to craft strong judge prompts and rigorously validate them against training, development, and test sets to ensure alignment with human preferences was eye-opening. The course also provided practical methods for estimating true success rates and quantifying uncertainty (vital for understanding actual pipeline performance beyond raw observed scores), and for designing efficient human review interfaces that significantly enhance labeling throughput. Most importantly, this course illuminated a critical shift in mindset—from traditional software development towards an iterative, human-centric evaluation approach—making it an essential resource for engineers, product managers, and data scientists looking to confidently address real-world LLM evaluation challenges.
Scarlet AI
This course changed how I approach AI projects. Instructors provide great support.
The AI Evals course with Hamel and Shreya changed how I approach AI projects and consulting clients. I’ve picked up practical skills in systematically analyzing model errors and designing meaningful evaluations, making the whole AI dev process clearer. Having access to a private community of experienced AI engineers and direct support from Hamel and the team has been especially valuable—they’re always quick to answer questions or help with real-world problems. Highly recommend this course for anyone building AI products or consulting in the space!
Consultant, Silver Stripe Software
Learn how to put evals into practice. Practical and hands on instruction.
Prior to this class I had already read a bunch of stuff on evals (including Hamel's blog), but I struggled to convert that theory into practical steps. I had some apprehensions coming in -- will it be too theoretical? Will it assume a lot of background knowledge? And I can say now -- this course completely crushes it. It is fully hands-on and practical, starting from zero and building up from there. You will learn every step of the evals process, exactly what to do (and not to do) and, more importantly, how to actually put it into practice. If you have been struggling with evals, then don't think twice and take the course.
Computational Linguist at ATENTO

This course has been incredibly eye-opening. I’ve learned how important it is to follow a clear “Analyze - Measure - Improve” cycle when working with language models. What really stood out to me is that the biggest challenges often come not from the technology itself, but from how we approach the process — like jumping straight to complex solutions without truly understanding the problem, or using misaligned evaluation methods. My biggest takeaway is that every stage of the process has its own traps, and skipping steps or making quick fixes can easily backfire. Being intentional about collecting the right examples, measuring in a fair and meaningful way, and making thoughtful improvements can make all the difference. I’d definitely recommend this course to anyone working with AI systems. It helped me slow down, ask better questions, and be more strategic — and that’s something every team could benefit from.
Full Stack Computational Linguist, Bad Idea Factory
Now I can design meaningful evals! Highly recommend this course.
Before the AI Evals led by Hamel Husain and Shreya Shankar, I used evals sporadically, mostly relying on third-party ones. Now, I have a clearer understanding of how to design meaningful evals and communicate their value to the teams I work with.
Senior Director, AI
This course is comprehensive in a way that's hard to find elsewhere.
This course is a great place for PMs and engineers to learn practical tactics for building real-world AI applications. I've recommended it to people who want both a starting point and deeper knowledge about evals and implementation. Hamel brings in excellent speakers who share different techniques and insights from some really smart people in AI. Evals are super important, and what I appreciate about Hamel's approach is how he walks through data analysis tactics — this is especially helpful for anyone newer to this kind of evaluation work. Just having evals isn't enough — you need to think strategically about what you're evaluating and your methodology beforehand. With so much out there, even really talented engineers can benefit from having all the key considerations for applied AI building brought together in one place. This course does exactly that - it's comprehensive in a way that's hard to find elsewhere. Hamel and Shreya put a lot of thought into the materials, and I can confirm from my own building experience that this covers the real considerations we're dealing with day-to-day (and have learned over 18+ months of trial and error!) without all the noise and buzzwords.
Data Scientist
Tools to quantitatively improve your AI product
Hamel and Shreya do such a great job of equipping you with the tools to quantitatively improve your AI product. This is a must-take course for anyone working with LLM-powered applications.
Data Scientist
Course Instructors Went Above & Beyond
As someone with prior experience designing human evaluations and developing metrics for a specific product, I took this course to broaden my understanding of AI evaluation practices, especially for agentic systems and RAG, as well as to deepen my knowledge of evaluation infrastructure such as CI/CD and trace review interfaces. This course delivered far more than I expected. It includes a comprehensive course reader that could stand on its own as a reference book, live classes packed with hands-on examples, and over 10 guest speakers who shared practical insights into evaluation strategies and even how to build your own evaluation tools for different use cases. What really set the course apart was the level of support. Hamel and Shreya were incredibly supportive throughout the course. They hosted office hours, thoughtfully answered every question on Discord, and even brought in two experienced professionals to offer additional hands-on support and help with (optional) homework. They went above and beyond to make sure everyone was learning and participating. I also really appreciated hearing from other students about the evaluation challenges they were facing in their own work, and watching Hamel and Shreya think through solutions with them in real time was just as educational as the prepared content. Highly recommend this course if you're working on or even adjacent to LLM applications. Whether you're focused on product quality, engineering, or research, you'll walk away with frameworks, tools, and best practices you can use right away.
Wayde Gilliam
"If you are building with AI, you need this course!"
Founder, Wicked Data LLC
"Take this course to go from a good to a great AI Engineer!"
Owner at Kentro Tech LLC
"Practical techniques rarely taught elsewhere. Highly recommend!"
Adam Dadson
GTM @ OpenAI

Senior Technical Program Manager, Netflix
This course helps you get expected outcomes from your AI
A colleague reached out to me and recommended "AI Evals For Engineers & PMs," offered by Hamel H. and Shreya Shankar. I consider myself an eternal learner, and I knew evaluations were a critical yet often overlooked component of successful GenAI implementation. Everyone keeps asking me how to stay ahead of GenAI. Well, you take classes like this one so you can be on the cutting edge of how to ensure you get the expected outcomes from your future AI agents. It was so dense with useful information and guest speakers that I honestly couldn't keep up, but after the course is over, you continue to have access to the recordings.
Jeroen Latour
FinTech at Booking.com
Juan Maturino
Software Engineer at Edua
Removed a malicious system prompt and reversed falling engagement—user interactions increased.
Before this course, my instinct was to jump straight into axial coding. That meant I leaned heavily on my own presuppositions about what failures I thought would show up. By doing that, I was blind to unexpected issues. It’s like hearing about someone before meeting them—you imagine who they are, but until you actually meet them, you don’t see the full picture. With data products and LLM pipelines, the same thing happens.
Take a healthcare chatbot as an example. Going in, I assumed failures would only be factual: did it answer the medical question correctly? If I jumped straight into axial coding, I’d only tag factual errors and conclude the model was nearly flawless. From that narrow view, I might even think the product was destined for massive success.
But after this course, I learned to take a step back and examine the data without presuppositions. By looking at traces more openly, I discovered a hidden failure mode: the chatbot was mean. It was calling people “fat,” “ugly,” “stupid,” and generally creating a hostile experience. No factual errors—just a terrible user experience. This was something axial coding alone, or automated LLM-as-a-judge evaluation, would have missed without prior human review.
Digging deeper, I found the root cause: a disgruntled former employee had slipped “be mean when answering” into the system prompt. Once we fixed that, user engagement improved dramatically. The key lesson I took from the course is that real error analysis starts with open coding and direct observation. Skipping that step leaves you blind to the most important problems.
Hima Tk
Lead PM, AI/ML Products at CultureAmp
Turned costly trial-and-error into a data-driven plan that avoided massive retraining and prioritized fixes.
I worked with a supermarket chain to build an AI system that could count inventory from shelf photos. At first, the system struggled with issues like blurry images, background clutter, and confusingly similar packaging. Before this course, my approach would have been driven by intuition and trial-and-error. I might have looked at a handful of errors, jumped to a conclusion like “the model is just bad at distinguishing Coke cans,” and proposed a vague fix such as retraining with thousands of new images. That would have been expensive, slow, and unfocused—and it might not have solved the real problem, like blurry photos from staff.
After this course, my approach is now structured and data-driven. Instead of guessing, I use error analysis to diagnose issues systematically. I start by gathering a representative failure set and tagging images to capture why errors occur—blurry images, poor lighting, occlusion, similar or new packaging, unusual angles, background clutter. From there, I group these into a taxonomy of failures and calculate how much each category contributes to overall errors. This creates a prioritized roadmap for improvement.
For example, when Image Quality and Similar Classes accounted for 75% of failures, I could recommend high-impact, targeted fixes: improve photo capture guidelines and augment training data with blurred images for the first, and collect more Diet Coke vs. Coke Zero examples for the second. Instead of vague trial-and-error, I now have a clear, quantitative path to better results.
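The prioritization step described in this testimonial (tag failures, group them into a taxonomy, compute each category's share of overall errors) can be sketched in a few lines of Python. The tag counts below are illustrative, chosen to match the 75% figure in the example; a real failure set would come from the team's own error analysis.

```python
from collections import Counter

# Tags from a hypothetical representative failure set (illustrative counts).
failure_tags = (
    ["image quality"] * 45
    + ["similar classes"] * 30
    + ["occlusion"] * 15
    + ["background clutter"] * 10
)

counts = Counter(failure_tags)
total = sum(counts.values())

# Share of overall errors per category, sorted from most to least common,
# which gives a prioritized roadmap for improvement.
shares = {tag: n / total for tag, n in counts.most_common()}
print(shares)
```

With these counts, "image quality" and "similar classes" together account for 0.75 of all failures, which is exactly the kind of quantitative signal the testimonial uses to justify targeted fixes over broad retraining.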
Margarita Fakih
Business Operations and Development
Saved me hours of rewriting by creating a reusable framework that prevents repeated AI errors.
As a product manager, I often struggled with inconsistencies in user stories generated by AI tools. Even when my prompts were clear, the outputs would miss key requirements or include irrelevant details. Before this course, my instinct was to keep tweaking the prompt through trial and error until I got something usable. While that sometimes worked, it was inefficient and didn’t explain why the model was failing.
After this course, my approach is much more systematic. I start by defining the key dimensions of a good user story—clarity, completeness, alignment with acceptance criteria, and the right level of technical detail. Then I collect flawed outputs and apply open coding to label issues like “missing acceptance criteria,” “misinterpreted intent,” or “overly generic details.” From there, I build a taxonomy of failure types, which lets me organize and prioritize problems. Finally, I design a feedback loop: the LLM generates a user story, checks it against the taxonomy, and revises if any known issues are detected.
Instead of wasting hours on one-off fixes, I now have a reusable framework that scales across projects. What was once frustrating trial-and-error has become a structured, repeatable process for improving quality.
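The generate-check-revise loop described above can be sketched as follows. This is a hypothetical illustration, not code from the course: `generate_story` and `detect_issues` stand in for an LLM call and a checker (code-based or LLM-based), and the taxonomy entries are the failure labels named in the testimonial.

```python
# Failure taxonomy built from open coding of flawed outputs (from the
# testimonial's examples; a real taxonomy would be product-specific).
FAILURE_TAXONOMY = [
    "missing acceptance criteria",
    "misinterpreted intent",
    "overly generic details",
]

def revise_until_clean(generate_story, detect_issues, requirement, max_rounds=3):
    """Generate a user story, check it against known failure types, and
    request a revision while any known issue is detected.

    `generate_story` and `detect_issues` are caller-supplied stand-ins
    for an LLM call and an issue detector.
    """
    story = generate_story(requirement)
    for _ in range(max_rounds):
        issues = [f for f in FAILURE_TAXONOMY if f in detect_issues(story)]
        if not issues:
            break  # no known failure modes detected
        # Feed the detected issues back so the next draft addresses them.
        story = generate_story(
            f"{requirement}\nFix these issues: {issues}\nPrevious draft:\n{story}"
        )
    return story
```

The `max_rounds` cap keeps the loop from cycling forever when the model cannot clear a failure mode, which is a common safeguard in this kind of feedback loop.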
Júlio Paulillo
CRO @ Agendor
I turned scattered agent errors into prioritized fixes, enabling focused, measurable improvements.
Building a personal assistant for salespeople is my day-to-day work. One of the tools the agent uses fetches activities from the CRM, but I noticed the LLM sometimes hallucinated—passing unnecessary arguments when calling the tool. Before this course, I would have gone straight into prompt engineering, rewriting tool descriptions or adding more examples to try to fix the issue.
After this course, my approach is different. I start by defining key dimensions such as user persona, intent (e.g., “fetch activities”), and activity type (past due, finished, pending). From there, I can ask an LLM to generate tuples from these dimensions, giving me a structured way to build a synthetic eval dataset. If traces of user interactions are already logged, I filter by intent and begin open coding the different failure modes I see. After reviewing dozens or even hundreds of examples, I then use an LLM to help categorize the failures. This lets me prioritize the categories that matter most and focus fixes where they’ll have the biggest impact.
Instead of reactive prompt tweaking, I now have a systematic framework for diagnosing failures and improving my assistant in a repeatable way.
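The dimension-tuple approach described above can be sketched in a few lines of Python. The dimensions and values here are illustrative, based on the testimonial's CRM example; each resulting tuple would then be handed to an LLM to turn into a synthetic user query for the eval dataset.

```python
from itertools import product

# Key dimensions identified up front (illustrative values from the
# testimonial's sales-assistant example).
dimensions = {
    "persona": ["new salesperson", "sales manager"],
    "intent": ["fetch activities"],
    "activity_type": ["past due", "finished", "pending"],
}

# Every combination of dimension values becomes one tuple; together they
# give structured coverage for a synthetic eval dataset.
tuples = [
    dict(zip(dimensions, values))
    for values in product(*dimensions.values())
]

for t in tuples:
    print(t)
```

Enumerating the Cartesian product this way makes coverage explicit: with 2 personas, 1 intent, and 3 activity types, you know the dataset spans all 6 combinations rather than whatever an LLM happens to generate first.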
Tatyana Kazakova
QA Engineer :) at Qazaco
Turned random fixes into a repeatable process that improved the whole system and proved changes actually worked.
Before this course, I would just fix issues as I spotted them—tweak a prompt here, change a setting there—and hope the next run looked better. Sometimes it worked, but I never had the full picture of what was really going wrong or how often certain problems appeared.
After this course, I’ve learned to slow down at the start: define what I actually want to measure (relevance, completeness, context handling), collect a solid set of examples, and trace where errors first start to show up. From there, I group similar issues into clear failure types, which makes patterns obvious and helps me prioritize what to fix.
Now the process feels less like random whack-a-mole and more like a structured, repeatable system. Instead of chasing one-off issues, I can improve the whole system and know whether the changes are actually working.
Andrew Chaffin
CEO at Argo Analytics
Structured error analysis gave me a clearer method to iterate and actually get the results I needed.
A while back, I used an AI writing assistant to draft a personal statement for a fellowship. I gave it a detailed prompt with my goals, values, and experience, but the output was generic and missed the emotional tone I wanted. At first, I just kept rephrasing the prompt, hoping it would eventually get it right. Instead, it swung between being too formal or inventing details I never mentioned. It was frustrating, and trial-and-error didn’t get me far.
After this course, I’d approach it completely differently. I’d start by defining what “good” means for the task—tone alignment, factual accuracy, and personal relevance. Then I’d collect flawed outputs and open code them: did the model invent details, ignore parts of the prompt, or lose the emotional tone? From there, I’d build a taxonomy of failures—like hallucination, tone mismatch, or misunderstanding the prompt—and use it to spot patterns. Maybe I’d realize the model struggles when the prompt is too abstract or lacks emotional cues.
Compared to my old approach of hoping a better version would show up, this gives me a clear, methodical way to iterate. It turns what used to be trial-and-error frustration into a structured process for actually getting the results I need.
Amol Shah
Head of Product at Count
I can now pinpoint errors and measure reductions in each error bucket—turning guesswork into measurable improvement.
When I first built a small chatbot to recommend books based on user mood, it often gave wildly off-base suggestions—like pairing someone “feeling nostalgic” with a cutting-edge tech thriller. Back then, I just tweaked the prompt or guessed at what the model might “understand” about mood. It was trial and error with no clear sense of what was actually going wrong.
After this course, I’d tackle the problem systematically. I’d collect failures by running the bot across a fixed set of test prompts and logging every mismatch. Then I’d open code the bad outputs—labels like “misread tone,” “genre bias,” or “keyword fixation.” From there, I’d define key dimensions of failure (emotional alignment, genre diversity, keyword vs. context) and group them into a taxonomy, like “semantic misinterpretation.” By quantifying how often each type occurs, I’d know where to focus first.
Armed with that data, I could design targeted fixes: refining prompts with explicit mood-to-genre mappings, adding checks for emotional themes, or diversifying candidate genres. Instead of hacking prompts by gut feel, I’d have a transparent, repeatable process that shows whether error rates are actually dropping.
Lada Kesseler
Lead Developer at Logic20/20
I can now predict and prevent code quality issues instead of treating them as isolated bugs.
I often ran into code quality issues when using AI assistants, but I didn’t have a structured way to make sense of them. Before this course, I would just label outputs as “messy code” without really digging into the underlying problems.
After this course, I now analyze them systematically across dimensions—things like hardcoded tests, long methods, poor formatting, bad naming, poor architecture choices, duplication, dead code, or ignoring available quality tools. By open coding these issues and building a taxonomy, I can see patterns emerge instead of treating each problem as random or isolated.
The key shift for me is realizing these aren’t one-off mistakes but systematic failure modes that appear under specific conditions. With that understanding, I can both predict and prevent quality issues, rather than just reacting to them after the fact.
Maruti Agarwal
Expert AI Research Scientist at Datasite
Gained clarity on what to fix first, transforming my entire approach to evolving the system.
I applied what I learned the very same day we covered error analysis. I was working on an industry classification system and followed a structured process: I asked annotators to provide detailed feedback on wrong predictions, reviewed their notes to improve annotation quality, then parsed all the feedback and used ChatGPT to categorize it into six major error patterns. Finally, I shared those patterns and error percentages with stakeholders.
After this course, error analysis feels much more structured. Instead of just collecting feedback in an ad hoc way, I now have a clear method that gives me visibility into what problems matter most and what to solve first. It’s changed how I think about evolving the system overall.
Sergio Soage
AI R&D Lead at Diligent
I built a structured understanding of failures, yielding actionable insights instead of whack-a-mole fixes.
Now I understand how to systematically explore the problem space, identify patterns across multiple failures, and build a structured understanding of why and when the system fails, not just that it fails. This leads to more actionable insights for improvement rather than playing whack-a-mole with individual issues.
Karen Lam
Product Design
I gained clarity and confidence to systematically narrow the gap between AI failures and human understanding.
I’m a product designer with no prior AI Evals experience. Before this course, when I encountered unexpected or confusing results from the Recipe Bot in the first homework, my instinct was to just iterate on the system prompt in Cursor and manually test through the UI.
After this course, I’ve learned there’s a more systematic way to approach error analysis. Using open and axial coding, I can narrow the gap between AI system failures and human understanding through a step-by-step process. I especially appreciate that this framework is grounded in social science research practices like coding data and building taxonomies—and that it emphasizes doing the analysis manually to ensure accuracy, rather than offloading it entirely to AI.
I also see the value in wearing both the data scientist and product manager hats: questioning the data rigorously while bringing product knowledge into the decision-making. This approach gives me a structured, repeatable way to analyze failures instead of ad hoc trial and error.
Juan Maturino
Software Engineer at Edua
I stopped endless prompting and now systematically document failures to improve outcomes and efficiency.
In automated agentic code generation, I often ran into situations where the desired output was far from what the model produced. My old approach was to keep prompting the LLM until progress stalled, then spin up a new chat with a rephrased prompt and updated context. Eventually I’d accept whatever was “good enough” and finish the task myself.
After this course, I understand why that approach was limited. Evaluating code has two axes: reference-based (objective tests like unit tests) and reference-free (qualitative measures of style, readability, and design). Code isn’t just functional—it’s also expressive, like writing prose—so both dimensions matter.
Now, instead of endless prompt tweaking, I document failures in short form through open coding, then group and categorize them using axial coding. This helps me identify common failure patterns in the LLM’s output and design more robust system prompts targeted at those issues. What used to be trial-and-error guesswork is now a structured process for improving both the reliability and quality of generated code.
Chris McDonald
AI Team Leader at Comtrac
I now have the clarity and confidence to diagnose failures instead of ‘living on a prayer’.
At work, we use prompts and prompt engineering to turn selected inputs into specific outputs. Before this course, whenever I ran into unexpected results, my approach was to jump straight into the prompt and randomly change words until something worked. After a few tries, I might even hand the prompt, input, and output to an LLM and ask it to fix things. There was no hypothesis, no structure—just living on a prayer.
After this course, I have a far more systematic approach. If I encounter a problem now, I’d begin by collecting an initial dataset of around 100 traces. From there, I’d perform open and axial coding to build a taxonomy of failures. That structure gives me clarity about what’s really going wrong instead of just chasing random fixes.
What stands out to me is that the processes in this course are simple—not in the sense of easy, but in being concise and straightforward while still requiring real effort and understanding. As Richard Feynman said, “if you can explain something in simple terms, you understand it well.” That’s exactly how Hamel and Shreya have designed this course, and I’m grateful for it.
Roey Ben Chaim
Staff Engineer at Zenity
I can now pinpoint agents' core failures, turning vague vibes into clear, actionable fixes that improve agent performance.
The axial coding just hit different. Before this course, my approach to failures was more of a “vibe investigation,” poking around without a clear structure.
After this course, I now cluster failures systematically and trace them back to their core issues. Grouping the errors into meaningful categories makes it much easier to see the main failure points. I finally feel like I have a proper way to identify the root problems in my agent instead of just guessing.
Ben Eyal
Research Engineer at Ai2 Israel
I gained clarity to find root causes and stop repeated agent confusion.
At work, we’re building Paper Finder, which (as the name suggests) should find papers. We wanted the agent to refuse certain requests so people wouldn’t treat it like a free ChatGPT. But we kept running into a strange behavior: the agent would refuse, ask the user a clarifying question, the user would reply “yes,” and then the agent would have no idea what they were talking about.
Before this course, we would have just dug through the logs, checked for crashes, and treated it like any other bug.
After this course, I’d handle it differently. I’d look closely at the traces of these failures, identify common patterns, form a hypothesis about why it was happening, and then test it systematically. In this case, the real issue was that history wasn’t being shared between two components: one asked the question, the other just saw “yes” with no context. By approaching it through error analysis, the root cause becomes clearer and easier to solve.
Annu Augustine
Founder, Product Coach at NedRock
Open coding gave me clarity into the model's real behavior, revealing failures my framework missed.
When I built a custom GPT for product managers to help write better user stories, I initially jumped straight into axial coding. I predefined categories of failure based on the INVEST framework (Independent, Negotiable, Valuable, Estimable, Small, Testable), which I often use when coaching teams. At the time, it felt like a solid, practical approach grounded in real-world product work.
After this course, I started applying open coding before forcing outputs into predefined boxes. That shift revealed patterns the INVEST framework would have completely missed. For example, some stories were overly complex even though they technically met the “Small” criteria, and others ignored edge cases or real-world exceptions not covered by INVEST at all.
Open coding gave me a clearer picture of how the model was actually behaving, rather than bending its outputs to fit categories I had assumed upfront. It’s a far more reliable way to uncover the real failure modes.
Co-Founder at Comprendo
Open coding gave us clarity on true error patterns, preventing overconfidence and costly misclassification.
Before this course, I didn’t fully appreciate the risk of skipping open coding. It’s easy to take a small sample, jump straight into categories, and gain false confidence in themes that don’t actually reflect the full range of errors. That’s the “when you only have a hammer, every problem looks like a nail” trap—imposing categories that miss important failure modes.
After this course, I see why open coding matters. It prevents premature categorization, helps me understand saturation, and surfaces the true diversity of errors. I’ve also learned to think more carefully about how evaluation rubrics should be designed. For some products, a “benevolent dictator” works—if one person truly has holistic expertise across every stage of the workflow. But for more complex systems, multiple experts are needed, each contributing perspective from their domain.
In my past work reviewing clinical trial protocols, no single reviewer understood every dimension—ethics, study design, and biostatistics each required deep, specialized expertise. The lesson from this course is clear: open coding reveals the real error space, and evaluation rubrics are strongest when designed with the right balance of expertise.
Product Manager, Analytics & AI at Axi
Sanity checks turned unreliable scores into business-aligned predictions I could trust.
I built a churn prediction model for a subscription service and evaluated it using standard metrics like accuracy, precision, and recall on a test dataset. At first, the high evaluator scores looked promising, but they gave me a false sense of confidence. In reality, the model was overfitting, producing outputs that didn’t even add up logically—for example, reporting fewer new onboarded customers than the combined total of retained and churned customers.
Before this course, I relied too heavily on evaluator scores, only realizing something was wrong when results felt “too good to be true.” I had to manually compare predictions with business reports and historical trends to uncover the discrepancies.
After this course, I know how to approach it differently. I would run cross-validation across multiple folds to confirm stability, add domain-specific sanity checks (like validating customer balances against business logic), and bring in qualitative stakeholder input. These practices create a stronger evaluation process—less dependent on raw metrics and more aligned with real-world trustworthiness.

George Job Vetticaden
VP of Products, AI Agents
AI Engineer, Vantager
Practical techniques that generalize regardless of the tools you use.
This course provides a great take on building reliable AI applications. It teaches practical techniques while developing intuition for evaluating LLM based systems. What sets it apart is its tool agnostic approach. Rather than focusing on specific platforms, it emphasizes systematic and scientific principles that apply everywhere.
Principal Data/AI Scientist/Engineer, Slido/Cisco
This course teaches material you can't find anywhere else. Investing in this course is a no brainer.
"Why would a Principal Data Scientist take a course on evals? Shouldn't they know this already?!" Fair question. Here's why I think it's still worth it: 1. Learn from the best. LLM evals are still nascent, so learning from people doing this full-time across multiple contexts is invaluable. Game recognizes game, and as you'll learn in the very first week already, Shreya and Hamel are top-tier. 2. Get the full picture. Evals are more art than science right now. Getting a coherent view of best practices and mature end-to-end pipelines designed from first principles is rare. Their course reader alone is worth multiple times the price. 3. Build common vocabulary. If you're building impactful LLM products, you'll collaborate with PMs. Having both technical folks and PMs in sessions creates a shared language that bridges the gap -- something you can't find anywhere else for this topic. In other words, whether you're a PM, a Principal or a vibe coder building with LLMs, this course is simply a no-brainer.
Director of AI
This course helps you transform guesswork into actionable insights.
Evaluating generative AI often relies on abstract benchmarks disconnected from real-world outcomes and detached from practical experience. To bridge this gap, many rely on subjective impressions or “vibes” (i.e., the eye-test). The eye-test is important since evaluators directly interact with the model in realistic contexts. However, vibes and qualitative evaluations are not particularly helpful in evaluating application-specific performance, consistency, bias, reliability, security, or return on investment. In contrast, application-specific evals reflect an essential day-to-day operational focus. They aim to assess whether a specific pipeline performs successfully on a particular task using realistic data. This course is an important step toward transforming guesswork into actionable insights. Application-specific evals are not sufficient on their own, but they are necessary and too often overlooked. Check out this course. It covers a lot of important terrain.
Machine Learning Engineer
Highly recommend this course.
Hamel's and Shreya's course, "AI Evals for Engineers and PMs," has been a great resource for learning how to tame the scary, wild world of LLM-based applications. It was fascinating to see how high-leverage it is to become "one with the data," and how to do that explicitly for LLMs and agents. Hamel's and Shreya's vast combined expertise is clearly shown in the course's lectures, practical exercises, and even a textbook. On top of that, the guest lectures provide even more gems of practical wisdom. I'd highly recommend this course to anyone serious about learning how to improve their LLM-based AI products.
Senior Director of Machine Learning, SponsorUnited
An Absolute must. Valuable for any AI engineer and product manager.
The AI Evals course by Shreya and Hamel is an absolute must for everyone serious about taking AI applications into production. I have been following Hamel's and Shreya's work for quite some time, and it was really awesome to learn from them all the concepts: error analysis, measurement best practices, LLM-as-Judge and how to make it reliable with human evaluations, collaborative analysis of errors, evaluation of multi-turn chats, creation of datasets for CI/CD, and more. The last topic, on accuracy and cost optimization, is really useful, as we're seeing in our applications as we scale. All in all, this is an amazing set of vital information that is valuable for any AI engineer and product manager. Highly recommend this course to everyone.
Alan Chang
AI, Machine Learning, and Biology @ Stanford University
Maxime Lelièvre
Visiting Researcher @ Columbia | MSc Robotics & Data Science @ EPFL | Passionate about EdTech
Šimon Podhajský
Senior LLM/Data Engineer | 🤖 Evals, ML/AI, Python

Bryce York
AI/ML/LLM and UX-centric B2B startup product management leader with a love for zero-to-one product innovation • 12+ yrs PM, 7+ yrs AI/ML, 5+ yrs adtech • Writer/Speaker • Advisor

Senior Data Scientist @Amazon
Taking the course "Move beyond 'vibe-checks' to data-driven evaluation" has been a game-changer in how I approach AI development and evaluation. Before this, my team often relied on subjective assessments—what we called “vibe-checks”—to judge model outputs. This course provided a structured, systematic framework that replaces guesswork with measurable, repeatable methods for evaluating AI performance. I learned how to build robust evaluation systems tailored to the unique challenges of AI applications—especially those involving stochastic or subjective outputs. The curriculum walked us through defining meaningful metrics, conducting systematic error analysis, generating synthetic data, and implementing automated evaluation pipelines. I also gained hands-on experience evaluating complex architectures like RAG systems and multi-step pipelines, and I now understand how to monitor models in production with continuous feedback loops. The biggest takeaway is that effective AI evaluation isn’t just about measuring accuracy—it’s about building a comprehensive lifecycle strategy that spans development, deployment, and ongoing improvement. Learning how to prioritize engineering efforts based on real data, rather than hunches, has already helped us optimize both performance and cost in our LLM applications. I absolutely recommend it: whether you're an AI engineer, product manager, or data scientist, this course gives you practical tools and frameworks that apply directly to real-world AI challenges. It’s especially valuable for teams looking to move beyond ad-hoc evaluations and establish a collaborative, metrics-driven culture around AI development. In short, this course has transformed how I think about evaluation—from a fuzzy afterthought to a foundational part of the AI lifecycle. I can't recommend it highly enough.
Account Director, OpenAI
This course exceeded my expectations.
The course offers hands-on exercises, expert guidance, and practical frameworks, giving you a systematic approach you can apply immediately. I could easily follow along without an engineering background. The biggest takeaway for me was the level of robustness/sophistication that you can build for evals and the impact that it can have! I had very surface level knowledge of evals before this course. The course exceeded my expectations and I would recommend this to my colleagues.
Data Scientist , Global Innovation Hub
I learned how to evaluate LLM outputs in a structured manner. My biggest takeaway is that the quality of the eval pipeline is critical to ensuring good-quality outputs in the production environment. I will definitely recommend this course to my colleagues and to anyone who is deploying LLMs.
The AI Evals for Engineers & PMs course, taught by Hamel Husain and Shreya Shankar, is 100% worth the investment. The material couldn't be more up-to-date. There is a perfect balance between theory and real-world hands-on learning. And the office hours and guest speakers are invaluable. I highly recommend it.
Piotr W.
Head of QA | Lead Automation Engineer | AI Evaluation Engineer
Sergiy Korniychuk
Staff Software Engineer - Full Stack at Sondermind Inc
Tsacho Rabchev
Software Developer / Inventor
Benjamin Pace
Data Science @ Candidly
Joel Dean
AI Founder. Tech Entrepreneur. Content Creator. Prime Minister Youth Awardee. Former WEF Global Shaper Curator
Kenneth Reeser
Senior AI/ML Architect at Vanguard
Joshua Pittman
Prompt Engineer at Outlever
Alpa Dedhia
Engineering Lead | Solutions Architect | AWS | Applied Generative AI Solutions | Search & Relevance | CSM | CSPO
Geoffrey Pidcock
AI and Strategy at ANSTO | MBA Melbourne Business School | Ex Atlassian

Rasool Shaik
GenAI | Embedded/IoT | Medical Devices -Helping organizations to build and validate products
Uday Ramesh Phalak
Machine Learning Engineer | RecSys, AI Evals, GenAI, Climate Change, UX | Co-Founder at HazAdapt

Ankur Bhatt
AI Engineering | Product and Technology Leader | CTO

Robb Winkle
Fire your systems integrator. Conversational AI-native systems expert for ERP (Techstars '24)

Harris Brown
Fractional AI product leader for early-stage founders & teams | ex-Airbnb
Peter Cardwell
Software Engineer at Snap, Inc.
Data Scientist at Tiger Analytics

The most practical AI course I've taken, with immediate value.
As a Data Scientist, I joined “AI Evals for Engineers & PMs” by Hamel and Shreya to get better at evaluating LLM systems in real-world settings, particularly in the context of LLMs being used for a variety of tasks. The course covered everything from fundamentals and error analysis to production monitoring and cost optimization. What stood out was how practical it was—teaching us to define use-case-specific metrics, collaborate with PMs, slice errors meaningfully, and avoid over-relying on automated scores. The lessons and guest talks (like on RAG evals, failure funnels, and continuous human review) felt directly applicable to my work, especially when building retrieval-based bots or monitoring model drift. It’s not a course that spoon-feeds you; it gives solid frameworks and real-world habits to build eval pipelines that actually reflect user experience. If you’re working on production ML or LLM projects and want to move beyond standard metrics, this is worth your time. I’ve already applied a few techniques at work and found them super helpful. I would strongly suggest anyone with an interest in evals to take this course!
Dev Team Lead
Highly recommend this course!
The AI Evals course has helped me learn how we can truly evaluate LLMs in a meaningful way and actually understand what's going wrong. Hamel and Shreya did a wonderful job of explaining how evaluations can be done in a structured manner and what to try. The course guests were an additional bonus, sharing their insights on how they carried out evaluations. I would highly recommend this resource to anyone who builds with LLMs and is wondering how to effectively understand why the LLM isn't behaving as intended and what is going on behind the scenes!
Data Scientist
Amazing instructors.
Hamel and Shreya are amazing instructors, and this course has been a great resource for me in understanding how to build robust, enterprise-grade evals and AI pipelines. What I found particularly useful were the guest lectures, which bring a variety of opinions from industry experts and practitioners on different topics related to AI and evals.
Head of Product, Tavus
Great insights that are shaping how we evaluate AI products.
Taking the AI Evals course with Hamel and Shreya has been really valuable. The course has given me a solid framework that's already shaping how we evaluate our AI products. The homework mirrors real work challenges, and guest speakers bring great insights.
Software Engineer
I learned how to be truly effective in creating LLM-powered applications
I have a career developing software, and I've been tinkering with LLMs since before ChatGPT. I feel like the practical eval techniques that Shreya and Hamel teach in their course are what I needed to glue these two skills together and become truly effective in creating LLM-powered applications. Developing for LLMs is not like traditional software development, and evals are the big difference.
Software Engineer, Google
Comprehensive and practical curriculum
The "AI Evals For Engineers & PMs" course provided an indispensable framework for evaluating LLM applications, fundamentally shifting my approach from guesswork to data-driven measurements. My key takeaway is the Analyze-Measure-Improve lifecycle, coupled with the "Three Gulfs" model for pinpointing failure origins. The rigorous methodology for building and validating LLM-as-Judge evaluators—including bias correction and confidence intervals—is a game-changer for trusting subjective evaluations. Hamel Husain and Shreya Shankar are truly experts, delivering a comprehensive and practical curriculum that directly addresses the challenges of building reliable AI in a dynamic environment. This course is a must for anyone serious about improving their AI development process.
Palette, CPO
A Masterclass in Practical AI Evaluation.
This course is at the cutting edge of AI research, and not just in theory. What stood out most to me is how deeply practical it is: it teaches you how to build evals that work for your own product, define product taste by sharpening what "good output" really means, and, most importantly, how to scale this method across teams and decisions. The biggest shift for me was reframing evals not as a benchmark to clear, but as a strategic moat—core to how your product learns, evolves, and differentiates. As someone from a non-technical background, I could still grasp the concepts (even if the code got heavy at times). The community around the course is a major bonus—full of helpful discussions, fresh perspectives, and constant knowledge exchange. The guest lectures were especially valuable, showing how companies apply these ideas in the wild and how they tailor their evaluation frameworks to suit specific needs and constraints. I’d highly recommend this course to anyone building with AI—especially those who want to go beyond shipping models to shaping real-world, high-trust outcomes.
Soothien HealthTech Advisory
A must for any developer or PM building AI products.
I’m a physician and have built health tech solutions and health AI solutions, but I’m not overly technical. This course was eye-opening about the importance of AI evaluations. It’s a must for any developer or PM building AI for enterprise or regulated industries. This is what will make AI products reliable. Hamel and Shreya are amazing, and so are their top-notch guest lectures. I took this course because I wanted to learn from the industry leaders actually doing the work. You’ll learn the entire process of building AI evaluations, not just by reading, but also by doing; that's the technical component. Using Windsurf and Claude, I was able to complete it even though I don’t code as part of my main job. It’s well worth the effort. This course is dense, especially if you do not code or have familiarity with statistics. My background in medicine and healthcare statistics helped me understand some of the core concepts. Overall, this is an amazing course and an essential skill set for building AI applications in healthcare or enterprise settings. I’m recommending it to all my colleagues.
Machine Learning Engineer | Co-Founder at HazAdapt
Good course if you want to build products people actually trust.
Coming from recommendation systems and a UX background, I knew conventional evaluations: I'd run some A/B tests, check a few metrics, and call it good. But my approach to AI evals was completely naive; I used no systematic method and hoped things would work. This evals course gave me the structure I was missing. The Three Gulfs framework explained why I kept unknowingly failing: we don't understand our data (Comprehension), we write vague prompts (Specification), and models behave unpredictably on real inputs (Generalization). The analyze-measure-improve cycle felt familiar from UX research but applied to AI. Instead of guessing what's broken, you look at failures first, build automated evaluators, and then make targeted improvements. This creates a flywheel where each cycle makes your product better. Learning from others' LLM production failures was a huge plus of this course, e.g., hearing about VLMs giving different results 18 out of 55 times at temperature 0, and how Shreya cut her costs by 50% with model cascades. Successful AI products need humans to regularly review outputs. There's no way around it. Good course if you want to build products people actually trust. Evaluation separates demos from deployments.
Technology director - Wells fargo
A fantastic course offering an in-depth practical approach to evals.
Highly recommend this course to anyone building Gen AI products and solutions. The biggest takeaway for me was the methodical and scientific process that the instructors outline for doing model evaluation. It helps build a mental model that I am applying at work to build an eval pipeline for RAG solutions. The course also offers an in-depth and practical approach to understanding how generative AI models are evaluated using rubrics and metrics, which are critical skills for AI engineers. Overall, a fantastic course with a lot of learning and value.
Founder, Supago Inc.

This course completely transformed my approach to building AI applications.
This course completely transformed how I approach evaluating LLM applications. Before this course, my evaluation processes were informal at best. Now, I've gained a structured, rigorous methodology to identify errors, quantify improvements, and build automated evaluators. The hands-on assignments and deep dives into error analysis were particularly valuable, directly impacting how efficiently I debug and iterate on LLM products. Whether you're an engineer, product manager, or someone working closely with AI systems, this course is essential—highly recommend!

Prashant Mital
Applied AI @ OpenAI
Zara Khan
Account Director @ OpenAI
AI Consultant
"Absolutely recommend this course to anyone building AI applications"
Senior Product Manager at Redfin

Error analysis (and this course) is all you need
Error analysis is all you need. This is the idea that gets drilled into your head over and over again in the AI Evals course. It's so simple, but it's profound...and it's actually way more complicated than you think when you start to consider multi-turn conversations, retrieval systems, agentic systems, multimodal inputs, and more. Shreya and Hamel have distilled the state-of-the-art in AI evals (and often in development itself!) in this amazing class. Some of my favorite highlights: - Build a custom data annotation app! I was so intimidated by this, but I finally made the leap and vibe-coded something out in an afternoon. It has 10x'd my ability to review conversations. - It's okay to do a little pre-thinking around failure modes, but they really should EMERGE from your testing. It's really hard to build LLM judges, so be really thoughtful about what you build them for. - Often, the biggest impact comes from talking disagreements out and figuring out why there is a disagreement in the first place: are your goals unclear? This seemingly technical course has made me a better PM. - And finally, folks in the course just know every AI tool out there. I learned about WhisperFlow, and my workflow for typing has changed!
Founder, Searchkernel LLC

This course is the best place to learn evals
I learned a ton from Hamel and Shreya's course on AI Evals! I've worked at the intersection of information retrieval and AI for nearly two decades, so I've done my fair share of evals throughout that time for search results quality. I've even written books, like AI-Powered Search, that include significant sections on ranking metrics, judgement lists, and model training. Nevertheless, with the rise of generative AI, RAG, and agentic workflows, handling complex pipelines with non-deterministic outputs has significantly increased the difficulty of performing good evals. The discussions about end-to-end traceability and leveraging Transition Failure Matrices were particularly helpful for me in tackling these more challenging multi-step workflows. This course has been a goldmine by providing: 1. Up-to-date information on best practices for evals on the current state-of-the-art AI workflows 2. Deep insight from experts with decades of both real-world experience and academic research into evals 3. Lots of tips, tricks, and real-world examples (with code) for getting end-to-end evals implemented and working well. This course significantly improved my mental models and increased the size of my practical toolkit for doing AI evals, which is already paying dividends for my client engagements. This is a set of skills everyone working in AI should acquire, and this course is currently the best place to quickly do that!
Jodi M. Casabianca
Entrepreneurial Measurement & Research Scientist | Psychometrician | Scoring of Open-ended Tasks | AI Evaluation


Independent AI Engineer
An essential resource for engineers & PMs
For AI builders hoping LLMs will fix it all: this course provides an exceptionally clear and systematic framework for approaching LLM evaluation. The comprehensive introduction to the Analyze-Measure-Improve lifecycle, alongside the detailed exploration of the Three Gulfs Model (Comprehension, Specification, Generalization), significantly deepened my understanding of the challenges inherent in building effective LLM pipelines. Particularly impactful was the practical guidance on error analysis—learning how to systematically categorize failure modes using open and axial coding, then translating qualitative insights into robust quantitative metrics. The deep dive into automated evaluators, including both code-based and LLM-as-Judge evaluators, was especially valuable. Learning how to craft strong judge prompts and rigorously validate them using training, development, and test sets to ensure alignment with human preferences was eye-opening. The course also provided practical methods for estimating true success rates and quantifying uncertainty, which is vital for understanding actual pipeline performance beyond raw observed scores, as well as for designing efficient human review interfaces that significantly enhance labeling throughput. Most importantly, this course illuminated a critical shift in mindset—from traditional software development towards an iterative, human-centric evaluation approach—making it an essential resource for engineers, product managers, and data scientists looking to confidently address real-world LLM evaluation challenges.
Scarlet AI
This course changed how I approach AI projects. Instructors provide great support.
The AI Evals course with Hamel and Shreya changed how I approach AI projects and consulting clients. I’ve picked up practical skills in systematically analyzing model errors and designing meaningful evaluations, making the whole AI dev process clearer. Having access to a private community of experienced AI engineers and direct support from Hamel and the team has been especially valuable—they’re always quick to answer questions or help with real-world problems. Highly recommend this course for anyone building AI products or consulting in the space!
Consultant, Silver Stripe Software
Learn how to put evals into practice. Practical and hands on instruction.
Prior to this class I had already read a bunch of material on evals (including Hamel's blog), but I struggled to convert that theory into practical steps. I had some apprehensions coming in -- will it be too theoretical? Will it assume a lot of background knowledge? And I can say now -- this course completely crushes it. It is fully hands-on and practical, starting from zero and building up from there. You will learn exactly what to do (and not to do) at every step of the evals process and, more importantly, how to actually put it into practice. If you have been struggling with evals, don't think twice -- take the course.
Computational Linguist at ATENTO

This course has been incredibly eye-opening. I’ve learned how important it is to follow a clear “Analyze - Measure - Improve” cycle when working with language models. What really stood out to me is that the biggest challenges often come not from the technology itself, but from how we approach the process — like jumping straight to complex solutions without truly understanding the problem, or using misaligned evaluation methods. My biggest takeaway is that every stage of the process has its own traps, and skipping steps or making quick fixes can easily backfire. Being intentional about collecting the right examples, measuring in a fair and meaningful way, and making thoughtful improvements can make all the difference. I’d definitely recommend this course to anyone working with AI systems. It helped me slow down, ask better questions, and be more strategic — and that’s something every team could benefit from.
Full Stack Computational Linguist, Bad Idea Factory
Now I can design meaningful evals! Highly recommend this course.
Before the AI Evals course led by Hamel Husain and Shreya Shankar, I used evals sporadically, mostly relying on third-party ones. Now, I have a clearer understanding of how to design meaningful evals and communicate their value to the teams I work with.
Senior Director, AI
This course is comprehensive in a way that's hard to find elsewhere.
This course is a great place for PMs and engineers to learn practical tactics for building real-world AI applications. I've recommended it to people who want both a starting point and deeper knowledge about evals and implementation. Hamel brings in excellent speakers who share different techniques and insights from some really smart people in AI. Evals are super important, and what I appreciate about Hamel's approach is how he walks through data analysis tactics — this is especially helpful for anyone newer to this kind of evaluation work. Just having evals isn't enough — you need to think strategically about what you're evaluating and your methodology beforehand. With so much out there, even really talented engineers can benefit from having all the key considerations for applied AI building brought together in one place. This course does exactly that - it's comprehensive in a way that's hard to find elsewhere. Hamel and Shreya put a lot of thought into the materials, and I can confirm from my own building experience that this covers the real considerations we're dealing with day-to-day (and have learned over 18+ months of trial and error!) without all the noise and buzzwords.