AI Engineering by Chip Huyen

AI Engineering: Building Applications with Foundation Models is a groundbreaking book written by Chip Huyen, an accomplished AI systems expert, educator, and entrepreneur. Known for her clarity and ability to demystify technical subjects, Huyen has produced a timely guide for leaders who want to harness the power of artificial intelligence to build smarter products, streamline operations, and stay competitive in a world transformed by foundation models like ChatGPT and Claude.

The book is not a programming manual. It’s a strategic, accessible framework that empowers business leaders, product owners, and innovators to understand, deploy, and refine AI systems—without needing to write code. In essence, it’s a roadmap for turning AI from a buzzword into practical business value.

Why This Book Matters for Leaders and Entrepreneurs

Whether you’re building a startup, leading innovation inside a large company, or simply looking to future-proof your business, AI Engineering offers a unique and valuable perspective. It focuses on how to build AI products, not just models—emphasizing usability, performance, cost-efficiency, and long-term adaptability.

In a business environment where AI is rapidly reshaping communication, operations, and customer expectations, understanding how to lead AI projects is a critical leadership skill. Chip Huyen’s book gives decision-makers the language, structure, and mental models needed to lead AI transformation—without being technical experts.

Overview of the Book’s Core Premise

The book introduces “AI engineering” as a new discipline focused on building applications powered by foundation models—versatile AI models trained on massive data that can perform a wide range of tasks out-of-the-box. It walks the reader through the full lifecycle of AI product development: from use case selection and evaluation to prompt design, finetuning, data strategy, system architecture, and feedback integration.

At every step, the book is driven by a simple truth: success in AI is not about chasing hype or using the biggest model. It’s about building systems that solve real problems for real users.

Key Ideas and Concepts from the Book

Foundation Models as a Platform

The book explains how models like GPT-4 and Claude can serve as platforms for many different business tasks—from content creation and data organization to workflow automation and decision support. These models, when properly adapted, can reduce development time and make AI accessible across industries.

The AI Engineering Stack

Chip Huyen breaks down the AI stack into three layers: the model layer (foundation models), the orchestration layer (prompting, tools, agents), and the application layer (business logic and user interface). This structure helps non-technical leaders think in terms of modular systems instead of monolithic solutions.

Evaluation and Feedback as Strategic Tools

Rather than obsessing over model accuracy alone, the book encourages teams to build systems that evaluate usefulness, user satisfaction, and outcomes. Feedback isn’t an afterthought—it’s the fuel for continuous improvement.

Low-Code and No-Code AI Development

The book emphasizes that modern AI tools allow entrepreneurs to build prototypes and deploy solutions without coding expertise. With thoughtful design and the right tools, business professionals can own the AI product lifecycle from end to end.

Responsible and Scalable AI

Scalability isn’t just about performance—it’s about minimizing cost, latency, and hallucination. The book explains how to make AI systems robust, efficient, and aligned with ethical goals, even as they grow.

Practical Lessons for Leaders and Entrepreneurs

  1. Start with the Problem, Not the Model: Effective AI projects begin with clear, validated use cases. Leaders should identify business problems that AI can uniquely solve—not just apply AI for the sake of innovation.
  2. Choose the Right Adaptation Strategy: There are many ways to customize a foundation model: prompt engineering, RAG (retrieval-augmented generation), finetuning, or using agents. Leaders must assess cost, complexity, and ROI to choose the best fit.
  3. Build with Feedback Loops in Mind: Successful AI systems learn from users. Whether it’s thumbs-up/down feedback, comment fields, or follow-up questions, leaders must bake in mechanisms for gathering and using feedback from day one.
  4. Design for Modularity and Change: AI evolves quickly. Systems should be built with components that can be updated independently—prompts, datasets, models—so your team can adapt quickly without rebuilding from scratch.
  5. Invest in Data Quality, Not Just Quantity: Leaders often focus on model selection but ignore data quality. Clean, relevant, and well-labeled data—especially from your own business context—is more valuable than massive datasets from the web.
  6. Prioritize Inference Efficiency: Every AI interaction has a cost. Leaders should understand the balance between speed, accuracy, and resource consumption. Smarter prompts, smaller models, and caching can significantly cut costs.
  7. Think Like a Product Manager: Treat your AI system like a product: plan launches, track performance, gather user input, and iterate. AI is not a one-time project but an evolving asset.
  8. Empower Non-Technical Teams: With the rise of no-code tools, anyone can participate in building and refining AI systems. Leaders should encourage experimentation and cross-functional collaboration around AI projects.
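To make point 6 concrete, here is a minimal Python sketch of response caching, one of the cost-cutting tactics mentioned above. The `expensive_model_call` function is a hypothetical stand-in for whatever paid API your system uses; the point is only that identical prompts should not be billed twice:

```python
import functools

CALLS = {"n": 0}

def expensive_model_call(prompt: str) -> str:
    """Stub standing in for a paid model API call (hypothetical)."""
    CALLS["n"] += 1
    return f"answer to: {prompt}"

@functools.lru_cache(maxsize=1024)
def cached_answer(prompt: str) -> str:
    # Identical prompts are served from the cache instead of
    # triggering a new (billed, slow) model call.
    return expensive_model_call(prompt)

cached_answer("What is your return policy?")
cached_answer("What is your return policy?")  # served from cache
print(CALLS["n"])  # the underlying model was only called once
```

In a real system you would likely cache on a normalized form of the prompt and set an expiry, but the cost-saving principle is the same.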

AI Engineering by Chip Huyen is more than a technical guide—it’s a leadership manual for building intelligent, adaptive systems that drive real business value. It demystifies the AI product lifecycle and empowers entrepreneurs, executives, and change agents to lead with confidence in the AI age.

Whether you’re launching a startup, modernizing operations, or planning your company’s next big leap, this book gives you the strategic clarity to lead AI initiatives that are not only innovative but also sustainable and impactful.


1. Understanding the Foundation of AI Engineering

The first chapter of AI Engineering: Building Applications with Foundation Models by Chip Huyen introduces readers to the transformative power of foundation models and the rising discipline of AI engineering. This chapter is particularly important for entrepreneurs and business leaders who may not have technical backgrounds but are looking to harness AI to create better products, services, and workflows. By stripping away the jargon and breaking down complex concepts with relatable examples, Chapter 1 sets the stage for business innovation without requiring coding expertise.

What Are Foundation Models and Why Should You Care?

Imagine hiring a team member who can write, translate, design images, organize data, and answer customer questions—all at once and in multiple languages. That’s what foundation models offer. These AI systems are trained on vast amounts of data and can perform a wide variety of tasks without needing to be rebuilt for each one. Think of them as Swiss Army knives for digital work: versatile, powerful, and ready to be adapted to your business.

Before foundation models, building AI applications was like assembling a machine from scratch. You needed specialized engineers, large datasets, and months of development. Now, with these models readily available through APIs (essentially, plug-and-play services), businesses can integrate AI into operations with minimal technical effort.

The Rise of AI Engineering as a Business Discipline

AI engineering refers to the process of building applications on top of these pre-trained models. Unlike traditional machine learning, which required building a model from the ground up, AI engineering focuses on using what’s already built and adapting it to your needs. This makes AI development faster, cheaper, and more accessible.

For example, ChatGPT and Midjourney are based on foundation models and can generate text and images respectively. Entrepreneurs use them to draft marketing copy, design prototypes, answer support queries, and even analyze legal contracts. These aren’t just tech tools—they’re productivity enhancers.

Why Now Is the Right Time for Entrepreneurs

Three major factors make this the perfect moment for business leaders to embrace AI:

First, foundation models are general-purpose tools. One model can do many things—write emails, generate product descriptions, translate languages—making it easier to test different use cases without large investments.

Second, investment in AI is booming. According to Goldman Sachs, AI investment in the U.S. alone is expected to hit $100 billion by 2025. Businesses that embed AI now are positioning themselves ahead of the curve.

Third, the entry barrier is lower than ever. With APIs and no-code tools, you don’t need to know how to program. You just need to understand your business problem and how AI can help solve it.

Real-World Example: Automating Customer Communication

Let’s say your business receives hundreds of customer inquiries every week. Responding to each one takes time, and quality can vary. A foundation model can analyze and respond to these emails using natural language. You don’t need to build this capability yourself. Services like OpenAI or Google provide access to models that can do this through simple setup processes. You provide the prompt (for example, “Respond to this email in a friendly tone and offer a discount if the customer is unhappy”) and the model completes the task.
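For readers curious what "providing the prompt" looks like in practice, here is a minimal Python sketch that assembles an instruction like the one above. It only builds the text you would send to a model; the function name and wording are illustrative, not any specific vendor's API:

```python
def build_support_prompt(email_text: str, tone: str = "friendly",
                         offer_discount_if_unhappy: bool = True) -> str:
    """Assemble the instruction-plus-context prompt described above."""
    instruction = f"Respond to this customer email in a {tone} tone."
    if offer_discount_if_unhappy:
        instruction += " If the customer sounds unhappy, offer a discount."
    return f"{instruction}\n\nCustomer email:\n{email_text}\n\nReply:"

prompt = build_support_prompt("My order arrived late and the box was damaged.")
print(prompt)
```

The resulting string is what you paste into a tool like ChatGPT or pass to an API; no model-building is involved.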

Action Steps to Implement AI Engineering in Your Business

  1. Identify repetitive or time-consuming tasks in your operations. These could be writing email responses, summarizing customer feedback, generating content for social media, or even organizing information from spreadsheets. Make a list of what slows your team down or requires manual work.
  2. Determine if these tasks involve text, images, or data that can be processed by AI. Foundation models are particularly strong with language and visuals. If a task involves analyzing or creating text, chances are AI can help.
  3. Experiment with a publicly available tool such as ChatGPT or Claude. Try giving it a task you’ve identified, like writing a customer response or creating a product description. Review the output and compare it to what a human team member would do. This gives you a sense of the model’s strengths and limitations.
  4. Think about how to integrate this into your workflow. Could you use AI to generate first drafts that your team edits? Could it respond to customer emails after manager review? Start with a small process and gradually expand once you’re confident in the results.
  5. Plan for iteration and improvement. AI outputs may not be perfect at first. You can improve results by adjusting your prompts (instructions given to the model), reviewing outputs, and gradually automating more steps.

From Evaluation to Execution

One of the key lessons in Chapter 1 is the importance of evaluation. Before investing time or money, you need to ask: “Should I build this AI application?” and “What value will it deliver?” For business leaders, this means thinking in terms of outcomes. Will the AI save time, reduce errors, improve customer experience, or scale a service?

You don’t have to make this decision alone. Chapter 1 advises that your role is not to become a technical expert, but to ask the right questions. Just like you wouldn’t build your own legal software but would evaluate what it does for your business, the same applies to AI.

Comparing AI Engineering to Traditional Development

Traditional machine learning was like building a car from parts: slow, costly, and technical. AI engineering is like choosing a car model that fits your needs and customizing the seats and GPS. You don’t need to understand how the engine works; you need to know where you want to go.

For example, in traditional development, building a recommendation system for products would require collecting user data, training a model, and maintaining that model. With AI engineering, you can use a foundation model to generate suggestions based on product descriptions, customer reviews, or chat history—no model-building required.

Practical Example: AI-Powered Product Descriptions

Imagine running an e-commerce business. Writing unique product descriptions for hundreds of items takes time and often lacks consistency. Using AI, you can provide a few examples of well-written descriptions, and the model can generate the rest. Want them to sound playful? Formal? Eco-conscious? Just tell the model in your prompt. This is known as prompt engineering, and it’s a key technique introduced in the book.
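The "provide a few examples, let the model generate the rest" pattern is often called few-shot prompting. Here is a small sketch of how such a prompt can be assembled; the structure and field labels are illustrative assumptions, and any foundation model tool can consume the resulting text:

```python
def few_shot_description_prompt(examples, new_product, style="playful"):
    """Build a prompt from example (name, description) pairs.

    `examples` is a list of (product_name, description) tuples written
    by a human; the model is asked to continue the pattern.
    """
    parts = [f"Write product descriptions in a {style} style, "
             "matching the examples below."]
    for name, desc in examples:
        parts.append(f"Product: {name}\nDescription: {desc}")
    # Leave the final description blank for the model to fill in.
    parts.append(f"Product: {new_product}\nDescription:")
    return "\n\n".join(parts)

prompt = few_shot_description_prompt(
    [("Ceramic Mug", "A cozy mug for slow mornings.")],
    "Insulated Water Bottle",
)
print(prompt)
```

Swapping `style="playful"` for "formal" or "eco-conscious" is exactly the kind of prompt adjustment the chapter describes.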

What Entrepreneurs Should Remember

AI doesn’t replace your business know-how—it amplifies it. Foundation models are tools, and like any tool, their value depends on how you use them. Chapter 1 emphasizes that AI engineering is not about reinventing the wheel; it’s about using smarter wheels to get to your destination faster.

The promise of AI is not in complexity, but in usability. You don’t need to code. You don’t need to build a model. You just need to understand what your business needs and test how AI can meet those needs.

Getting Started with Confidence

  1. Pick a single use case—just one area in your business where AI might help. Keep it small and manageable.
  2. Try using a foundation model tool to perform this task. Use natural language to describe the job, and see what results you get.
  3. Evaluate the output. Ask: Is this useful? Can this reduce workload? Could it improve consistency?
  4. Share your findings with your team and plan how to integrate the tool. This might involve combining AI outputs with human review or using it to brainstorm ideas.
  5. Monitor, tweak, and repeat. AI gets better the more you work with it, and the more feedback you provide, the better your results will become.

Chapter 1 of AI Engineering lays out a clear message for entrepreneurs: AI is ready for you. It’s not a mystery, it’s not unreachable, and it’s certainly not only for large tech companies. The tools are powerful, accessible, and flexible. By understanding what foundation models are and how they can be used, you open up a world of possibilities to grow your business, serve your customers better, and compete in a rapidly evolving digital economy.

Start small, think strategically, and let AI engineering become a practical part of your entrepreneurial journey.


2. Understanding Foundation Models

Chapter 2 of AI Engineering: Building Applications with Foundation Models by Chip Huyen provides a foundational understanding of what makes these powerful tools work and why they are transforming business, technology, and the very idea of product development. For entrepreneurs and business leaders without a technical background, this chapter offers a clear, approachable explanation of the key mechanics behind foundation models—along with examples and strategies to help you think about applying them in your own business.

What Are Foundation Models, Really?

At their core, foundation models are large AI systems trained to handle many tasks using natural language, images, and even code. They don’t just “know” how to do one thing; instead, they can adapt to many different jobs—like a multitasking team member who learns new skills quickly.

These models work by identifying and predicting patterns in massive amounts of data. Think of it as teaching a machine to write emails, answer questions, translate languages, or generate marketing content—all based on patterns it has seen before. Once trained, these models can generalize: they can take what they’ve learned from one domain and apply it to another. This is why you can ask a model like GPT-4 to write a blog post, summarize a contract, or explain a scientific concept—all using the same underlying intelligence.

Two Key Ingredients: Self-Supervised Learning and Scale

One of the most important ideas in Chapter 2 is that foundation models are trained using a technique called self-supervised learning. Unlike traditional AI models, which rely on labeled datasets (for example, “this is a cat,” “this is a dog”), foundation models learn by predicting missing parts of data. For instance, they might learn by filling in the next word in a sentence, similar to how a person might guess the next word in a familiar phrase.

This learning method allows models to be trained on vast amounts of unlabeled data pulled from the internet, books, and other public sources. Imagine feeding the model every business newsletter, Wikipedia article, and Reddit comment ever written. The result is a model that can mimic how humans use language.

The second ingredient is scale. These models are massive—measured in billions of parameters (the parts of a model that store what it’s learned). Bigger models tend to perform better across a wider range of tasks. Think of it like comparing a trainee with one week of experience to a seasoned expert who has read everything on a subject. The more training and exposure, the more capable the model becomes.

Why Entrepreneurs Should Care

This chapter makes it clear: you don’t need to build your own foundation model to benefit from it. Companies like OpenAI, Anthropic, and Google have already done the heavy lifting. As a business leader, your role is to understand how to use these models to drive outcomes—whether that means better customer service, faster content creation, or smarter analytics.

For example, if you run a marketing agency, you can use foundation models to generate email campaigns, summarize client briefs, or produce social media content. You can even tailor the tone—professional, casual, enthusiastic—simply by telling the model what you want. And because these models are general-purpose, you can apply them to new challenges as your business grows.

Not Just Big Data—Big Transferability

Chapter 2 emphasizes transferability as a superpower of foundation models. This means they can apply what they’ve learned in one context to a different one. For example, a model trained on legal documents might also perform well in writing business contracts, even if it’s never seen your company’s specific format before.

This matters because you don’t need to feed the model tons of specific training data from your business to start seeing results. You can use a well-designed prompt—just a carefully worded instruction—to guide the model to do what you need.

Action Steps to Implement Chapter 2 Learnings in Business

  1. Understand the types of tasks foundation models are good at. These include generating written content, summarizing long documents, translating languages, answering questions, writing code (even if you don’t read code), and more. Identify areas in your business where these capabilities could reduce manual work.
  2. Start with simple experiments using general-purpose AI tools. Tools like ChatGPT or Claude are based on foundation models. Ask them to summarize a meeting transcript, draft a follow-up email, or rewrite a product description in a different tone. See what kind of output you get.
  3. Learn the basics of prompt design. The prompt is your way of instructing the model. A good prompt clearly explains what you want: “Summarize this contract in plain English” or “Write a 200-word product description in a friendly tone.” Test different wordings to see how results change.
  4. Don’t worry about data labeling or model training. That’s already been done. Your focus should be on adapting the model’s capabilities to your workflow. For example, use AI to scan and summarize customer reviews to uncover common complaints or product suggestions.
  5. Think in terms of transferability. If you use a model to write blog posts, it can probably also write newsletters or landing pages. Look for similar tasks in your business that can be supported with minimal adjustment.
  6. Be mindful of when general knowledge is enough—and when it’s not. Foundation models are trained on broad data, so they may not know your unique product or brand voice. In these cases, you can “prime” the model with examples. For instance, share two examples of your typical customer service replies before asking the model to draft new ones.

Strategy Before Technology

Chapter 2’s greatest contribution for non-technical readers is helping them understand that using AI doesn’t mean building AI. Instead of worrying about how these systems are trained, entrepreneurs should focus on how they behave—and how to guide them.

Foundation models are like incredibly capable interns. They need clear instructions, examples, and supervision. But once onboarded, they can deliver enormous value across your organization. The opportunity is not just to automate, but to amplify—your creativity, your insight, and your team’s productivity.

By understanding the fundamentals outlined in this chapter—self-supervised learning, scalability, and transferability—you gain the confidence to treat AI not as a mystery, but as a business tool. One that’s ready to use, flexible, and surprisingly intuitive when guided well.


3. Evaluation Methodology

In Chapter 3 of AI Engineering: Building Applications with Foundation Models, Chip Huyen introduces one of the most critical, yet often overlooked, aspects of working with AI: evaluation. For entrepreneurs and business leaders without coding experience, understanding how to evaluate AI models is essential—not in technical terms, but in practical, business-relevant ways. The chapter offers an accessible framework that helps you assess whether an AI tool is producing useful, reliable, and cost-effective results for your specific goals.

Why Evaluation Matters in Business

Imagine hiring a new employee without setting expectations or measuring their output. You wouldn’t know if they were performing well, improving over time, or making costly mistakes. The same is true for AI. Without a way to measure its output, you can’t tell whether the AI is helping or hurting your business.

AI outputs—especially from foundation models—can appear confident but be wrong. They might generate beautiful text that sounds convincing but contains factual errors or misses the point entirely. Chapter 3 explains that successful AI deployment requires setting clear criteria for quality and then consistently measuring performance against those benchmarks.

Three Types of Evaluation

The chapter introduces three main categories of evaluation: human evaluation, automatic metrics, and AI-as-a-judge.

Human evaluation is the gold standard. It involves people—your team or your customers—reviewing outputs and rating their quality. This approach is slow but often the most accurate.

Automatic metrics are faster but may not reflect real-world usefulness. For example, BLEU (a common metric in translation tasks) scores AI output by its word overlap with one or more reference answers, so it can miss nuance or penalize legitimate creative variation.

AI-as-a-judge is a newer technique where another AI model evaluates the output. While still experimental, it’s faster than human review and more scalable than manual checks.

These methods can be adapted to business contexts. For instance, if your AI writes marketing emails, your team can rate how persuasive or on-brand the content feels (human evaluation), compare it to previous high-performing copy (automatic metrics), or even ask another AI to assess tone and clarity (AI-as-a-judge).
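As a concrete illustration of the AI-as-a-judge idea, here is a sketch of the instruction you might send to a second model acting as evaluator. The 1-to-5 scale and the JSON reply format are assumptions, not a standard; adapt them to whatever evaluator you use:

```python
def judge_prompt(draft: str, criteria=("tone", "clarity")) -> str:
    """Build the instruction sent to a second model acting as judge."""
    crit = ", ".join(criteria)
    return (
        f"Rate the following marketing email on {crit}, "
        "each from 1 to 5. "
        'Reply as JSON, for example {"tone": 4, "clarity": 5}.\n\n'
        f"Email:\n{draft}"
    )

print(judge_prompt("Hi Sam, thanks for trying our product!"))
```

Parsing the judge's JSON reply then gives you scores you can track alongside human ratings.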

What You Should Evaluate

Chapter 3 highlights several essential dimensions of AI performance:

  • Helpfulness: Does the AI output serve its intended purpose? For example, if it’s supposed to summarize a legal document, does the summary capture the key points without distortion?
  • Correctness: Are the facts accurate? This is crucial for tasks like financial analysis, legal review, or product descriptions.
  • Toxicity and Bias: Is the content appropriate and fair? AI models can sometimes generate offensive or biased language, especially when handling sensitive topics or customer data.
  • Cost and Latency: How long does the model take to respond? How much does each call cost? These are especially important when using commercial APIs.
  • Consistency: Does the model perform reliably across different inputs? For example, does it always follow your brand tone or fluctuate wildly depending on phrasing?

These aren’t abstract concerns—they are daily business realities. If you’re using AI to communicate with customers, errors can damage trust. If you rely on it for product data, inaccuracies can lead to returns or compliance issues.

Action Steps for Evaluating AI in Your Business

  1. Define what “good” means for each use case. If you’re using AI to write blog posts, you might care most about clarity, tone, and engagement. If you’re summarizing customer reviews, you might prioritize accuracy and coverage of key themes. Make these criteria explicit so your team knows what to look for.
  2. Create a small review process. Have one or two people manually check a sample of AI outputs each week. Ask them to rate quality on a 1–5 scale and flag any problems. This doesn’t have to be formal or time-consuming. Even five reviews per week can reveal trends.
  3. Test against real examples. Compare the AI’s output to past successful work—emails that had high open rates, social posts with strong engagement, or reports that helped you make good decisions. Use these comparisons to judge whether the AI is improving or declining over time.
  4. Use structured prompts to control output. If your AI model tends to go off-topic, try giving it clearer instructions. For instance, instead of “Write a sales pitch,” say, “Write a three-sentence pitch for this product, highlighting the main benefit and including a customer quote.” Tighter prompts make evaluation easier.
  5. Track response time and cost per task. Even if the AI writes great content, it’s not viable if it’s too slow or expensive. Record how long tasks take and what you pay per use. Consider whether the AI is replacing human effort or just duplicating it.
  6. Consider a scoring system. Build a simple dashboard or spreadsheet with metrics like helpfulness, correctness, and consistency. Use these to monitor progress over time. If scores drop, you’ll know it’s time to revise your prompts or model choice.
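The scoring system in step 6 does not need to be more than a spreadsheet, but it can also be a few lines of code. Here is a minimal sketch that rolls weekly 1-to-5 human reviews up into per-dimension averages; the dimension names match the criteria discussed above and are otherwise arbitrary:

```python
from statistics import mean

def summarize_reviews(reviews):
    """Aggregate human reviews into per-dimension average scores.

    `reviews` is a list of dicts such as
    {"helpfulness": 4, "correctness": 5, "consistency": 3}.
    """
    dims = {}
    for review in reviews:
        for dim, score in review.items():
            dims.setdefault(dim, []).append(score)
    return {dim: round(mean(scores), 2) for dim, scores in dims.items()}

weekly = [
    {"helpfulness": 4, "correctness": 5, "consistency": 3},
    {"helpfulness": 5, "correctness": 4, "consistency": 4},
]
print(summarize_reviews(weekly))
```

Comparing these averages week over week tells you whether it is time to revise your prompts or model choice.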

Business Example: Evaluating an AI Customer Service Tool

Suppose you use AI to handle basic customer inquiries. Chapter 3 helps you evaluate this setup across several dimensions.

Start by defining what matters: speed, correctness, and tone. Review a sample of responses weekly. Are customers getting useful answers? Are they polite and aligned with your brand voice? Are any answers completely wrong or misleading?

If you notice repeated mistakes—like confusing return policies or missing details—adjust your prompts or add context. You might instruct the model to “Only answer using this policy document” or “Begin every message with a thank you.” These changes can improve accuracy and consistency.

Track the number of conversations handled, the time saved, and any escalation rates (when customers still need human help). This data tells you whether the AI tool is making a real impact.

Don’t Skip Evaluation—Even for Simple Tasks

Chapter 3 warns against the temptation to rely on gut feel or assume AI “just works.” AI models, especially foundation models, are powerful but imperfect. Without regular checks, you risk publishing incorrect information, alienating customers, or misrepresenting your brand.

By evaluating early and often, you catch problems before they scale. And you give your team confidence that AI isn’t replacing quality—it’s enhancing it.

Evaluation is a Competitive Advantage

In fast-moving markets, the companies that succeed with AI won’t be those with the flashiest tools—they’ll be the ones that use those tools most responsibly and effectively. Chapter 3 reframes evaluation not as a chore, but as a leadership opportunity. It’s how you turn a powerful technology into a reliable asset for your business.

By defining success clearly, monitoring performance, and adjusting your approach, you set your business up to benefit from AI safely, affordably, and at scale. And most importantly, you stay in control of how the technology serves your customers, your team, and your vision.


4. Evaluate AI Systems

Chapter 4 of AI Engineering: Building Applications with Foundation Models by Chip Huyen dives deeper into the practical challenges of evaluating AI systems once they are deployed. This chapter is particularly useful for entrepreneurs and business leaders who may not be technical but need to ensure that AI applications truly deliver business value. Evaluating an AI system—not just its underlying model—is about assessing how the entire product works in real life. It includes performance, safety, cost, reliability, and user experience. This article unpacks Chapter 4’s insights and provides actionable steps you can use to evaluate AI tools in your own business operations.

Why You Must Evaluate More Than the Model

Many AI tools today are built on top of foundation models like GPT-4 or Claude. However, the model is just one part of the equation. An AI system also includes your data, prompts, interface, infrastructure, and customer interaction layer. You may use the same model as a competitor, but your system can perform better or worse depending on how you put these pieces together.

Think of it like using the same espresso machine. One café produces a perfect latte; another serves something bitter and slow. The difference isn’t the machine—it’s the process, the ingredients, and the environment. Chapter 4 teaches you how to evaluate the entire process to ensure your AI product serves your customers reliably and responsibly.

What Should You Evaluate in an AI System?

Chapter 4 identifies several key dimensions of system-level evaluation that matter to business leaders.

  • Quality of Output: This refers to how accurate, helpful, and appropriate the AI’s responses are. For example, if you’re using AI to write product descriptions, does the content reflect the product’s true benefits and match your brand tone?
  • Consistency: Does the system deliver reliable outputs over time and across tasks? If one day it writes excellent marketing copy and the next it creates errors, your brand credibility is at risk.
  • Latency (Response Time): How fast does the system respond? For customer-facing tools, slow replies can frustrate users. Even internal tools lose value if employees must wait.
  • Cost and Resource Use: How much does each interaction cost? Even if the tool is effective, it’s not scalable if it’s too expensive to run for every user.
  • Robustness and Safety: Is the system resilient to misuse? Can it handle strange inputs or edge cases without failing? This matters for tools exposed to the public or used in high-stakes environments.
  • Security and Privacy: Is customer data protected? This is especially critical in industries like finance, healthcare, or education.
  • User Feedback Loop: Is there a mechanism to collect and act on user feedback? This helps the system improve over time and remain aligned with business needs.

Action Steps to Evaluate Your AI System

  1. Start by mapping the full AI pipeline. Identify each part of your system, from data input (such as customer questions) to model response (like a chatbot answer), and through to the user experience (what the customer sees). Understanding the full journey helps pinpoint where problems may occur.
  2. Define evaluation metrics for each stage. For output quality, this could include clarity and accuracy. For latency, measure how many seconds users wait for a response. For cost, track usage per dollar. Set clear expectations so you can measure what matters most to your business goals.
  3. Use real user scenarios for testing. Instead of generic test cases, simulate real-world interactions. For example, test your AI support tool with actual customer complaints or refund requests. This gives a more realistic view of performance.
  4. Build a feedback loop. Ask users to rate AI responses or flag mistakes. Even a simple thumbs-up/down system gives you valuable data. Over time, this helps improve prompts or adjust how the AI is used.
  5. Track failures and edge cases. Create a process to log when the AI gives wrong or irrelevant answers. This could be a shared spreadsheet where team members record examples. Use this log to refine inputs or escalate sensitive topics to human review.
  6. Monitor response time and cost regularly. Some AI models may become slower or more expensive depending on usage volume or model updates. Stay informed so you can make changes if performance declines or costs rise.
  7. Consider fallback strategies. For mission-critical tasks—like sending legal documents or processing refunds—set rules for when to defer to a human. This ensures safety and trust.
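To make steps 2 and 6 concrete, here is a minimal Python sketch of a wrapper that records latency and estimated cost for every AI call. The function names, the stand-in model, and the per-token price are illustrative, not from the book; substitute your provider's real client and rates.

```python
import time

# Hypothetical per-call pricing; substitute your provider's actual rates.
COST_PER_1K_TOKENS = 0.002

def evaluate_call(model_fn, prompt):
    """Wrap any model call to record latency and estimated cost."""
    start = time.perf_counter()
    response, tokens_used = model_fn(prompt)
    latency = time.perf_counter() - start
    cost = tokens_used / 1000 * COST_PER_1K_TOKENS
    return {"response": response, "latency_s": round(latency, 3),
            "cost_usd": round(cost, 6)}

# Stand-in for a real model API, returning (text, token count).
def fake_model(prompt):
    return f"Answer to: {prompt}", 120

record = evaluate_call(fake_model, "What is your refund policy?")
```

Logging a small dictionary like this for every interaction is enough to build the dashboards and spreadsheets the action steps describe, without any special tooling.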

Business Example: Evaluating an AI Tool for Drafting Proposals

Imagine you run a consulting firm and use AI to draft client proposals. At first, the tool seems to save hours of work. But after a few weeks, you notice inconsistencies: some proposals are brilliant, others are vague. Clients complain about delays in receiving drafts.

Following Chapter 4’s guidance, you map the pipeline. You realize that the AI performs well only when given a full client brief—but your team often inputs partial information. You define new rules: always include company name, project goals, and expected budget in prompts. You set a timer: proposals must be generated in under 30 seconds. You introduce a feedback tool where team members rate each draft from 1 to 5.

Within weeks, proposal quality improves, delays shrink, and your team begins to trust the system more. You didn’t change the AI model—you changed the system around it.

Why This Matters for Non-Technical Leaders

You don’t need to understand machine learning to evaluate an AI system. What you need is business intuition: What does success look like? Where can things go wrong? How will we know when something needs fixing?

Chapter 4 empowers business leaders to approach AI like any other operational system. Ask the right questions. Measure the right outcomes. And ensure your tools work as promised—not just once, but consistently, affordably, and securely.

Evaluate Like a CEO

The real message of Chapter 4 is this: you can’t improve what you don’t evaluate. AI isn’t magic—it’s a system. And like any system, it needs oversight, testing, and refinement. The leaders who build great AI products aren’t the ones with the most advanced models. They’re the ones with the best processes to make those models work. By applying the principles of system-level evaluation, you can build AI tools that don’t just impress—they deliver.


5. Prompt Engineering

Chapter 5 of AI Engineering: Building Applications with Foundation Models by Chip Huyen introduces one of the most essential and practical skills for working with AI: prompt engineering. This chapter is a goldmine for entrepreneurs and business leaders who want to get meaningful output from AI tools without writing a single line of code. Prompt engineering is simply the art of writing instructions that guide an AI model to do what you want—whether that’s writing a product description, summarizing customer feedback, or generating marketing content.

By learning to craft better prompts, you can significantly improve the quality, accuracy, and relevance of the AI’s output. This article explains the key ideas from Chapter 5 in plain language and shows how you can use them to drive real business value.

What Is a Prompt and Why It Matters

A prompt is the input you give to an AI model to guide its response. It’s your side of the conversation. The model’s output depends heavily on how you phrase your prompt. Just like giving a vague brief to an intern results in poor work, a vague prompt leads to subpar AI output. Well-crafted prompts, on the other hand, help the AI understand your expectations and produce responses that are more aligned with your needs.

For example, if you ask the AI, “Write about this product,” the response may be generic. But if you say, “Write a 100-word description of a leather backpack for college students, highlighting durability, comfort, and modern design,” the result is likely to be sharper and more useful.

Prompt Engineering as a Business Skill

Chapter 5 positions prompt engineering as a critical business skill—not a technical one. It’s about communication, clarity, and understanding your audience. This makes it especially relevant for marketers, content creators, product managers, and service professionals.

Just like learning how to write an effective email or ad headline, learning to write good prompts can dramatically boost productivity. You can get better copy faster, summarize documents more accurately, and make AI tools more predictable and reliable.

Key Concepts from the Chapter

Prompt Templates: These are reusable instructions with placeholders. For example, you could create a prompt template like: “Summarize the following customer complaint in two sentences and suggest one action we can take to improve: [insert complaint].” Templates help you scale AI usage across teams with consistent output.
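As a concrete illustration, a template like the one above can be a plain string with a placeholder that your team fills in per case. This short Python sketch uses illustrative names; the template wording is the example from the text.

```python
# A reusable prompt template; the placeholder name is illustrative.
COMPLAINT_TEMPLATE = (
    "Summarize the following customer complaint in two sentences "
    "and suggest one action we can take to improve:\n\n{complaint}"
)

def build_prompt(complaint: str) -> str:
    # Fill the placeholder with the actual complaint text.
    return COMPLAINT_TEMPLATE.format(complaint=complaint)

prompt = build_prompt("My order arrived two weeks late and the box was damaged.")
```

Because the instructions live in one shared constant, every team member sends the same well-tested wording and only the complaint itself changes.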

System Prompts vs. User Prompts: Foundation models often use system prompts to guide overall behavior (e.g., “You are a helpful assistant.”). You usually control the user prompt—the part that gives specific instructions for each task. Understanding this helps you shape how the AI responds.

Few-Shot Prompting: This involves providing examples of what you want before asking the model to generate a new response. For instance, if you want customer service replies to follow a certain tone, you can include two good examples and then ask the AI to handle a new case in the same style.
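In practice, few-shot prompting often means placing example question-answer pairs ahead of the new request. The sketch below assembles them in a chat-style role/content message list (the format used by several popular model APIs); the example replies are invented for illustration.

```python
# Two sample replies that show the tone we want the model to copy.
examples = [
    ("My package never arrived.",
     "I'm so sorry about the missing package! I've opened a trace "
     "with the carrier and will update you within 24 hours."),
    ("I was charged twice.",
     "Apologies for the double charge! I've refunded the duplicate; "
     "it should appear in 3 to 5 business days."),
]

def few_shot_messages(new_query: str):
    # Start with an overall instruction, then the examples, then the new case.
    messages = [{"role": "system",
                 "content": "You are a warm, upbeat support agent."}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": new_query})
    return messages

msgs = few_shot_messages("My discount code didn't work.")
```

The model sees two worked examples before the new complaint, so its reply tends to match their tone and structure.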

Prompt Chaining: Instead of writing one big prompt, break the task into smaller steps. First, ask the model to extract key facts from a document. Then, in a second prompt, use those facts to write a summary. This often improves accuracy and clarity.
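The two-step pattern just described can be sketched as two sequential calls, where the output of the first becomes the input of the second. `call_model` here is a stand-in for whatever AI API you use.

```python
# Stand-in for a real model API call; replace with your provider's client.
def call_model(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"

def summarize_document(document: str) -> str:
    # Step 1: extract only the key facts from the raw document.
    facts = call_model(
        "List the key facts in this document as bullet points:\n" + document)
    # Step 2: write the summary from the extracted facts, not the raw text.
    return call_model(
        "Write a three-sentence summary using only these facts:\n" + facts)

summary = summarize_document("Acme's revenue rose 12% in Q3; costs fell 4%.")
```

Splitting the task keeps each prompt simple, and the intermediate facts give you a natural checkpoint for spotting errors before the final summary is written.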

Instructions Matter: Adding specific phrases like “Be concise,” “Use a friendly tone,” or “Respond in bullet points” can dramatically change the output. The more explicit your instruction, the more control you have over the result.

Real-World Example: Improving Marketing Copy

Suppose you’re using AI to write product blurbs for an e-commerce site. Early results feel bland. Instead of accepting this, you apply prompt engineering principles from Chapter 5.

You create a prompt template: “Write a 3-sentence product description for [product], targeting [audience]. Highlight [benefit 1], [benefit 2], and [benefit 3]. Use an enthusiastic tone.”

You then test a few versions. One includes a system prompt: “You are an expert brand copywriter.” Another provides two sample blurbs that performed well in past campaigns.

Over time, your prompts become more refined, and the AI produces copy that sounds like your brand, resonates with customers, and saves hours of manual work.

Action Steps to Apply Prompt Engineering in Your Business

  1. Identify one recurring task where AI output feels inconsistent or low-quality. This could be content generation, email replies, report summaries, or customer service messages. Start by focusing on this use case.
  2. Break down what a great result looks like. Write two or three sample outputs you’d consider “ideal.” This gives you a target for prompt engineering.
  3. Design a prompt template using the structure: task + context + desired style or tone. For example: “Summarize the following review into a cheerful two-sentence testimonial.”
  4. Experiment with few-shot prompts. Include a couple of examples followed by a new input. Evaluate how the results change.
  5. Test small adjustments. Add instructions like “Avoid repetition” or “Focus on emotional benefits.” Watch how the model adapts. Log what works well so your team can reuse successful prompts.
  6. Share successful prompt templates with your team. Store them in a shared document or workflow tool so others can benefit from your work and contribute improvements.
  7. Chain prompts for complex tasks. Instead of asking for a full report, first extract key points, then summarize them, then write the report. This approach creates more consistent and readable output.
  8. Set up a review loop. Ask team members to rate AI outputs generated from your new prompts. Collect feedback and iterate on the prompt to improve accuracy and usefulness.

Clarity Is the New Code

Chapter 5 emphasizes that great AI outcomes come from great instructions. You don’t need to know how to code—you need to know how to be clear. Prompt engineering is the bridge between business goals and technical systems. When you master it, you unlock the full potential of AI tools in your organization.

The future of work will increasingly rely on people who know how to “speak AI.” By learning to design better prompts, you position yourself and your business to operate smarter, faster, and more creatively—without ever touching a line of code.


6. RAG and Agents

Chapter 6 of AI Engineering by Chip Huyen introduces two transformative concepts that every entrepreneur and business leader should know: Retrieval-Augmented Generation (RAG) and Agents. These concepts enable AI to not just generate text or answer questions, but to dynamically access information and autonomously complete tasks. For non-technical leaders, understanding these tools opens new possibilities for automating workflows, enhancing customer interactions, and delivering scalable AI services without needing to write a single line of code.

What Is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a method that enhances the output of AI models by allowing them to fetch relevant information from external sources instead of relying solely on pre-trained knowledge. Think of it like this: instead of expecting your assistant to know everything off the top of their head, you give them access to your company documents, product manuals, and knowledge bases so they can look up the most accurate information when needed.

For example, imagine you run a travel consultancy. A customer asks about visa requirements for five countries. Instead of training your AI assistant on thousands of scenarios, RAG lets it search through the latest embassy websites or your internal guides and respond with accurate, timely details. This is particularly useful in fast-changing industries like finance, travel, or law, where static knowledge quickly becomes outdated.
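The core retrieve-then-generate loop behind RAG can be sketched in a few lines. Real systems use embedding-based search over a vector database; the naive substring matching and the sample documents below are only stand-ins for illustration.

```python
# A tiny "knowledge base". Real RAG systems index thousands of documents.
documents = {
    "visa_japan": "Japan: visa-free entry up to 90 days for many passports.",
    "visa_brazil": "Brazil: e-visa required; processing takes about 5 days.",
    "refunds": "Refunds are issued within 14 days of cancellation.",
}

def retrieve(query: str, k: int = 2):
    # Score each document by naive keyword overlap and keep the top k.
    words = query.lower().replace("?", "").split()
    scored = [(sum(w in text.lower() for w in words), text)
              for text in documents.values()]
    scored.sort(reverse=True)
    return [text for score, text in scored[:k] if score > 0]

def build_rag_prompt(query: str) -> str:
    # Feed the retrieved passages to the model alongside the question.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_rag_prompt("What are the visa rules for Brazil?")
```

The model answers from the retrieved passages rather than its frozen training data, which is why RAG stays accurate as your documents change.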

What Are Agents?

Agents take RAG a step further. While RAG helps with information retrieval, agents are designed to perform sequences of actions toward a goal. Think of them as autonomous digital employees. They can reason about what needs to be done, retrieve information if necessary, and take the appropriate steps to complete a task.

Consider a sales manager who wants to automate lead follow-ups. A traditional AI might write an email. An agent, however, can look into the CRM, find the right contact, craft a personalized message, send the email, and schedule a follow-up—without human intervention.

Agents consist of three parts: the planner (which decides what steps to take), the executor (which carries out those steps), and memory (which stores useful information from past interactions). This structure makes agents ideal for workflows such as onboarding new employees, managing marketing campaigns, or assisting with procurement.
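A toy version of that planner-executor-memory loop, applied to the lead follow-up scenario, might look like this. The action names and the CRM/email stand-ins are invented for illustration; a real agent would use an AI model for planning and real APIs for execution.

```python
def planner(goal, memory):
    # Decide the next step based on what has already been done.
    if "contact" not in memory:
        return ("lookup_contact", goal)
    if "email_sent" not in memory:
        return ("send_email", memory["contact"])
    return None  # goal reached

def executor(action, arg, memory):
    # Carry out one step and record the result in memory.
    if action == "lookup_contact":
        memory["contact"] = f"contact-for:{arg}"   # stand-in for a CRM query
    elif action == "send_email":
        memory["email_sent"] = f"emailed {arg}"    # stand-in for an email API

def run_agent(goal):
    memory = {}
    while (step := planner(goal, memory)) is not None:
        executor(*step, memory)
    return memory

result = run_agent("follow up with Acme Corp")
```

The loop keeps asking the planner for the next step until the goal is met, which is what lets an agent chain several actions together without human intervention.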

How RAG and Agents Work Together

The real power comes when RAG and agents are combined. RAG ensures that the agent has access to up-to-date and relevant information. The agent then uses this information to decide and execute the best course of action. For example, a legal advisor agent could use RAG to access the latest regulations and then walk a user through the legal steps of starting a company in their country—generating forms, emailing the right authorities, and scheduling consultations.

Action Steps to Implement RAG and Agents in Your Business

  1. Identify Repetitive Information-Based Tasks: Start by mapping out tasks in your business that rely on information retrieval and can be made consistent. For example, onboarding customers, responding to FAQs, handling internal policy requests, or preparing reports.
  2. Organize Your Knowledge Sources: RAG depends on having access to relevant, structured information. Gather your company’s documents, product descriptions, customer service logs, and process guides. Tools like Notion, Google Docs, or SharePoint can be used to store this information. Make sure it’s up to date and easy to search.
  3. Use No-Code Platforms to Build a RAG System: Several tools now allow you to build RAG-enhanced AI assistants without coding. Examples include ChatGPT with custom GPTs, Zapier with OpenAI plugins, and platforms like LangChain or LlamaIndex (some may require technical setup, but no-code integrations are emerging rapidly). Use these tools to let your AI assistant pull live data when needed rather than relying on canned responses.
  4. Design Agent Workflows for Key Use Cases: Pick one or two high-impact areas—such as lead generation, customer onboarding, or compliance checklists—and outline the steps involved. Use AI tools with agent capabilities to automate these steps. Define clear goals and let the agent decide how to reach them based on current data.
  5. Test and Iterate with Real Users: Deploy your RAG and agent-based tools in a limited setting. Watch how they perform with real users. Are they fetching the right data? Are they completing tasks correctly? Gather feedback and refine. Business teams should focus on accuracy, usefulness, and clarity.
  6. Monitor and Maintain Your Agent Systems: Agents learn over time. But they also need oversight. Set up a review process to ensure that your agents are not making outdated or incorrect decisions. Update your knowledge sources regularly and refine agent instructions (often written as natural language goals) based on business needs.
  7. Start Small, Then Scale: Don’t try to automate everything at once. Start with one or two valuable use cases and expand as you gain confidence. For example, start with a knowledge assistant for your team, then expand to customer support, sales outreach, or inventory management.

Real-World Example from the Book

The book shares an example of an agent being used to automate customer onboarding. Instead of giving customers a list of steps, the agent used RAG to fetch relevant documents and compliance rules, then walked the user through a personalized setup. It answered questions using internal knowledge, generated documents, and even scheduled follow-up meetings. This dramatically reduced customer friction and allowed employees to focus on more strategic work.

Why This Matters for Business Leaders

RAG and agents are not just technical tools—they are business enablers. They allow entrepreneurs and executives to scale decision-making, provide better customer service, and reduce operational overhead. And with the rise of no-code platforms, these capabilities are now accessible to anyone with a clear goal and structured information.

By understanding and applying RAG and agent technologies, business leaders can empower their teams with digital workers that think, act, and improve over time—freeing up human talent for creativity and strategy.


7. Finetuning

Chapter 7 of AI Engineering by Chip Huyen breaks down the process of finetuning, a powerful method that allows AI models to better understand and serve specific business needs. Unlike using generic AI tools, finetuning enables entrepreneurs and leaders to create AI systems tailored to their domain, tone, customers, and workflows. For non-technical readers, this means you don’t have to settle for one-size-fits-all AI responses—you can personalize AI performance for your business using your own data and examples.

What Is Finetuning?

Finetuning is the process of teaching a foundation model—like ChatGPT or Claude—to perform better on tasks that are specific to your business. Think of a foundation model as a highly intelligent intern who knows a bit about everything. Finetuning is like onboarding this intern with examples from your past work, customer interactions, or unique communication style, so they can perform more like a trained employee.

For instance, if you run a legal consultancy and want your AI assistant to draft emails with a professional legal tone while referencing local regulations, finetuning allows you to train the model on past emails and regulatory content. This ensures your assistant communicates in the way your clients expect.

Why Use Finetuning?

Generic models are trained on the internet, which means they carry assumptions, biases, and generalizations. While this broad knowledge is useful, it may not align with your brand voice or industry standards. Finetuning helps you reduce hallucinations (made-up information), increase relevance, and improve response consistency. More importantly, it gives your AI a competitive edge that cannot be copied easily.

The book highlights that businesses often try prompt engineering (clever instructions) first. However, when prompts reach their limits—especially when dealing with complex domain-specific language or repeated workflows—finetuning becomes the better choice.

Example from the Book

One example given is a company that builds AI to help healthcare workers summarize patient notes. Initially, they used prompt engineering to get the model to extract useful summaries. But the results were inconsistent. By finetuning the model with hundreds of real examples—carefully cleaned and labeled—they improved accuracy and reliability. This saved doctors hours of work and ensured summaries were medically sound.

Action Steps to Implement Finetuning in Your Business

  1. Decide Whether Finetuning Is Necessary: Before diving in, evaluate if prompt engineering is sufficient. If you only need simple tasks—like rewording emails or summarizing meeting notes—prompts might work. But if you want your AI to reflect your business logic, speak in your brand voice, or perform tasks based on specialized knowledge, finetuning will be more effective.
  2. Gather and Curate Quality Data: Finetuning requires good examples. Start collecting data that reflect how you want your AI to behave. This could include customer support transcripts, email responses, reports, legal memos, or product descriptions. The key is to use examples that are consistent and high quality. Imagine you’re training a new employee—give them clear, strong examples, not messy or outdated ones.
  3. Choose the Right Model and Tool: Many providers like OpenAI, Google, and Anthropic allow you to finetune their base models. Some no-code or low-code platforms are emerging to support business users. If you prefer not to deal with code, work with a consultant or agency that specializes in finetuning, or use platforms that simplify the process (e.g., finetune GPT-3.5 with a spreadsheet of inputs and outputs).
  4. Structure the Data for Training: Your data should follow a format that the AI can learn from. One common method is to provide the model with a series of “prompt and response” pairs—like questions and ideal answers, or tasks and ideal outputs. If you’re working with customer support, each “ticket” could become a pair showing how you want your AI to respond to similar cases in the future.
  5. Run a Pilot Finetuning Session: Start with a small batch of data (e.g., 100 to 500 examples) and finetune a model. Evaluate its performance on new tasks using real-world examples. Make sure you’re measuring things that matter: clarity, tone, accuracy, and whether it saves you or your team time.
  6. Test Before Scaling: Finetuned models are not perfect. Test them thoroughly. Ask colleagues to review AI responses and note areas of improvement. You may need to iterate—add more examples, adjust the training data, or combine finetuning with prompt engineering to get the best results.
  7. Maintain and Refresh Your Model: Business evolves, and so should your AI. Set a regular cadence—perhaps quarterly—to review how your finetuned model is performing. Refresh it with new examples that reflect changes in your products, policies, or customer preferences. Treat your AI like a living system, not a one-off tool.
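To make step 4 concrete, here is what prompt-and-response training pairs can look like written out as JSONL, a layout many finetuning APIs accept. The example texts are invented, and the exact field names vary by provider; the chat-style "messages" structure shown here is one common convention.

```python
import json

# Each training example pairs an input with the ideal response.
examples = [
    {"prompt": "Customer asks: Where is my order #1234?",
     "response": "Thanks for reaching out! Order #1234 shipped Tuesday "
                 "and should arrive within 3 business days."},
    {"prompt": "Customer asks: Can I change my delivery address?",
     "response": "Of course! If the order hasn't shipped yet, reply with "
                 "the new address and we'll update it right away."},
]

# Write one JSON record per line, in a chat-style message format.
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        record = {"messages": [
            {"role": "user", "content": ex["prompt"]},
            {"role": "assistant", "content": ex["response"]},
        ]}
        f.write(json.dumps(record) + "\n")
```

A spreadsheet of past tickets and your best replies can be converted into this format with a few lines of scripting, which is usually the whole technical barrier to a first finetuning pilot.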

When Not to Use Finetuning

The book also warns against using finetuning when the task can be solved with simple prompting or retrieval-based systems (like RAG). Finetuning is best when tasks are complex, consistent, and domain-specific. It also requires some upfront investment, so use it where the return is high.

Empowering Business Leaders Through Personalization

Finetuning brings a new level of intelligence and customization to AI applications. It allows entrepreneurs to build AI tools that reflect their brand, serve niche markets, and deliver better customer experiences. You no longer need to accept generic AI tools—you can train your own, just as you’d train a top-performing team member.

By following these steps, business leaders without any coding experience can successfully direct AI personalization efforts. Finetuning transforms AI from a helpful tool into a strategic asset—one that grows smarter and more aligned with your goals over time.


8. Dataset Engineering

Chapter 8 of AI Engineering by Chip Huyen introduces a vital concept often overlooked by non-technical leaders: dataset engineering. While most entrepreneurs are drawn to flashy AI tools or models, the real secret to building powerful, reliable AI lies in the quality and structure of your data. Dataset engineering is about creating and refining the data that trains, evaluates, or supports your AI system—whether it’s a chatbot, recommendation engine, or customer service tool.

In simple terms, if your AI is like a chef, dataset engineering is about sourcing and prepping the best ingredients. Without good ingredients, even the best chef can’t make a great meal.

What Is Dataset Engineering?

Dataset engineering involves designing, collecting, cleaning, and curating the right data for your specific AI application. This includes labeled examples for training, prompts for retrieval systems, or evaluation sets to measure performance. For business leaders, it’s the process of transforming existing documents, emails, chat logs, forms, and feedback into usable formats that guide AI behavior.

A key insight from the chapter is that many failures in AI performance stem not from bad models but from bad data. High quality is not a matter of quantity; it means each data point is clear, relevant, and representative of the task you want the AI to perform.

Why It Matters for Business Leaders

Business leaders already possess data goldmines: CRM notes, customer service transcripts, internal wikis, marketing emails, and more. Dataset engineering is the process of organizing and refining this information so that AI systems can learn and respond more effectively. It enables you to reduce errors, improve performance, and make AI systems aligned with your brand and objectives.

In industries like law, finance, education, or healthcare, domain-specific data can make or break the AI’s usefulness. Dataset engineering is how you give your AI business intuition.

Example from the Book

One compelling example discussed in the book is a company using AI to help teachers evaluate student essays. Initially, the AI gave generic feedback. By designing a dataset that included annotated essays—showing what good feedback looks like, categorized by writing level and style—they were able to train a model that could generate detailed, accurate, and personalized evaluations. This reduced teacher workload while improving student outcomes.

Action Steps to Start Dataset Engineering for Your Business

  1. Define the Outcome You Want From AI: Start by clarifying the goal. Do you want the AI to answer questions like a support agent, summarize reports, personalize emails, or offer recommendations? This will guide what kind of data you need. Be specific. For example, instead of “improve customer service,” you might aim for “respond to order delay queries with empathy and clarity.”
  2. Identify the Best Data You Already Have: Look inside your organization for examples of excellent performance. This could be the way your top salesperson responds to objections, your customer success team’s best help desk responses, or how your brand speaks in newsletters. Extract these examples and organize them. This data is your training material. If your business is new, generate synthetic examples based on how you’d like the AI to behave.
  3. Clean and Format Your Data for Learning: Remove errors, irrelevant information, and inconsistencies. Organize your examples in clear input-output formats: questions and answers, problems and solutions, or prompts and ideal completions. The book stresses that AI learns from consistency. You’re not just feeding data—you’re teaching behavior.
  4. Balance for Diversity and Representativeness: Make sure your data reflects the range of scenarios your AI will face. For instance, if you run a fashion retail brand, include queries about returns, sizing, style recommendations, and product care. But avoid overwhelming the dataset with repetitive or overly similar examples. Variety helps AI become flexible, while consistency teaches it to stay on-brand.
  5. Label and Annotate Carefully: When training AI to do tasks such as summarizing or classifying, labels are important. Think of labels as signposts for the AI. In a customer service setting, you might label queries as “billing issue,” “technical help,” or “general info.” Proper labeling helps the AI recognize patterns and respond more accurately.
  6. Test Your Dataset With Real-World Prompts: Before using your dataset to train a model or feed into a retrieval system, test it. Ask your team to simulate real queries and see how the AI performs using the current data. You may discover gaps—missing scenarios, unclear examples, or inconsistent tone. Improve your dataset based on this feedback.
  7. Iterate and Update Regularly: Just like your business, your data evolves. Add new examples as your products change, your customers evolve, or your brand voice matures. The book emphasizes that good dataset engineering is not a one-time task—it’s a continuous process of reflection and refinement.

Practical Use Cases Across Industries

In retail, dataset engineering can help train AI to upsell or cross-sell based on past purchases. In finance, it ensures compliance-focused answers that match regulatory tone. In healthcare, curated datasets ensure that AI recommendations align with clinical guidelines. Across sectors, the principle is the same: your AI is only as good as the data you feed it.

The Business Value of Better Datasets

Well-engineered datasets reduce hallucinations (false information), improve response quality, and create trust in AI tools. More importantly, they allow you to build proprietary AI assets—systems that reflect your knowledge, your brand, and your way of doing business. This becomes a competitive advantage that cannot be easily copied by competitors using off-the-shelf models.

For business leaders, mastering dataset engineering doesn’t mean learning to code. It means learning to think like a teacher—curating the best examples, showing your AI what great looks like, and reinforcing it over time. By doing so, you transform AI from a tool into a trained member of your team.


9. Inference Optimization

Chapter 9 of AI Engineering by Chip Huyen focuses on a topic critical for business leaders who want to scale AI affordably and efficiently: inference optimization. Inference is the moment when an AI system takes a user request and generates a response. While the AI model’s training determines what it knows, inference determines how it performs in real-time. For entrepreneurs and executives, this chapter explains how to make AI systems not only smarter—but faster and more cost-effective.

Think of inference like customer service. Training an AI is like onboarding and educating your staff. Inference is when your staff answers customer calls. It needs to be fast, clear, and efficient. The cost of inference is what businesses pay each time the AI answers a query. Optimizing this cost is essential when deploying AI at scale.

What Is Inference Optimization?

Inference optimization involves improving the speed, efficiency, and cost of running AI models. The goal is to deliver high-quality responses while using fewer resources. This becomes especially important as companies deploy AI in customer service, analytics, sales outreach, or personalized experiences, where each interaction costs time and money.

According to the book, most AI projects fail to scale not because the model isn’t good—but because it’s too slow or expensive to run frequently. Optimization strategies help ensure that AI systems remain practical for daily use in real-world applications.

Business Examples from the Book

One example in the book highlights a company using AI to summarize earnings reports. Initially, their model was highly accurate but slow and expensive. By optimizing inference—through smaller models, batching, and prompt adjustments—they were able to cut costs by 70% while maintaining response quality. Another case involved a customer support tool. The company improved latency (speed) by deploying AI in a way that prioritized simple questions for faster processing while reserving heavier computing for complex queries.

These examples illustrate that inference optimization can have a massive impact on profitability and customer experience.

Action Steps to Apply Inference Optimization in Your Business

  1. Evaluate Current AI Usage and Costs: Start by reviewing where and how AI is being used in your business. Identify the most frequent or expensive use cases—like chatbots, content generation, or data summaries. If you’re paying per query or per token, small inefficiencies can quickly become large expenses. Get clarity on what each AI interaction costs you.
  2. Prioritize High-Value Interactions: Focus your optimization efforts where the business impact is highest. For instance, if your AI is helping close sales or handle VIP customer issues, prioritize those. For lower-value tasks—like auto-tagging documents—you can afford simpler and faster models. Don’t over-invest in high-cost AI where a simpler system would suffice.
  3. Use Prompt Optimization for Efficiency: The way you ask an AI to do something (your prompt) affects both quality and cost. Shorter, clearer prompts not only give better results but use fewer resources. Chip Huyen emphasizes that prompt design is a business skill, not just a technical one. You can reframe complex instructions into modular steps or use reusable templates to reduce duplication.
  4. Experiment with Smaller or Specialized Models: Bigger isn’t always better. The book explains how smaller models—especially those fine-tuned on your own data—can perform just as well for specific tasks. For example, a finetuned model that only writes customer follow-ups may outperform a general-purpose one at a fraction of the cost. Evaluate if smaller or open-source models can meet your needs.
  5. Batch Requests Where Possible: If your business uses AI for bulk operations—like generating hundreds of marketing descriptions—batching them together instead of sending them one by one can dramatically reduce costs. Many platforms offer bulk-processing modes or allow you to automate grouping tasks during off-peak hours.
  6. Cache Repeated Responses: Often, AI systems are asked the same questions repeatedly. Instead of generating new responses every time, store and reuse answers for common queries. For instance, if your chatbot answers “What’s your return policy?” a hundred times a day, cache the best version of the answer and reuse it.
  7. Choose the Right Deployment Option: If you’re deploying AI internally or through vendors, make sure you’re not overpaying for speed or accuracy that isn’t needed. Some tasks can afford a few seconds’ delay. Others may require near-instant responses. Understand your latency tolerance for each use case and choose options that fit.
  8. Measure and Monitor Regularly: Set up dashboards to monitor how fast your AI responds, how much it costs per request, and how often it’s used. Without this, it’s impossible to optimize. Business leaders should treat inference cost like any other operational metric—tracked, reviewed, and improved continuously.
  9. Combine AI with Rules-Based Systems: Not everything needs to go through AI. Sometimes, simple rules are faster and cheaper. For example, if a customer asks, “What are your store hours?”, a rules-based lookup might be faster than asking the AI. Use AI for judgment-heavy tasks and combine it with straightforward automation where appropriate.
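Two of the steps above, caching repeated responses and routing simple questions to a rules-based lookup, can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the book; `call_model` stands in for whatever paid AI API your business uses, and the FAQ entries are invented examples.

```python
from functools import lru_cache

# Hypothetical stand-in for a paid AI API call.
def call_model(prompt: str) -> str:
    return f"AI answer to: {prompt}"

# Rules-based lookup for simple, high-frequency questions (step 9):
# no AI cost, instant response.
FAQ = {
    "what are your store hours?": "We are open 9am-6pm, Monday to Saturday.",
    "what's your return policy?": "Returns are accepted within 30 days with a receipt.",
}

# Cache AI responses so identical queries are only paid for once (step 6).
@lru_cache(maxsize=1024)
def cached_model_call(prompt: str) -> str:
    return call_model(prompt)

def answer(question: str) -> str:
    key = question.strip().lower()
    if key in FAQ:                 # cheap rules-based path
        return FAQ[key]
    return cached_model_call(key)  # AI path, deduplicated by the cache
```

The same pattern extends naturally: `cached_model_call.cache_info()` reports how often the cache saved you a paid call, which feeds directly into the cost dashboards described in step 8.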

Scaling Smartly with Inference Optimization

Inference optimization is not about cutting corners—it’s about making sure you get the most out of your AI investments. For business leaders, the benefits are clear: lower costs, faster user experiences, and more predictable performance. As AI becomes part of daily operations, inference becomes a key lever for operational efficiency.

What’s empowering is that most optimization doesn’t require coding. It requires clarity, experimentation, and strategic thinking—traits entrepreneurs already use daily. Whether you’re automating customer replies, generating reports, or building interactive agents, inference optimization ensures you scale responsibly and sustainably.

By understanding and applying the principles from Chapter 9, you’ll not only reduce costs but also build AI systems that serve your business with speed, precision, and impact.


10. AI Engineering Architecture and User Feedback

Chapter 10 of AI Engineering by Chip Huyen outlines how to design robust AI systems and continuously improve them through feedback. For entrepreneurs and business leaders, this chapter is a blueprint for thinking beyond isolated AI features and toward scalable systems that grow smarter over time. It introduces a practical AI engineering architecture and emphasizes that user feedback is not just helpful—it’s essential.

At its core, AI engineering isn’t just about using models. It’s about building intelligent systems with clear workflows, flexible components, and constant iteration. With a smart architecture and feedback loop, even non-technical teams can deploy AI that adapts, personalizes, and delivers increasing business value.

What Is AI Engineering Architecture?

AI engineering architecture refers to how you structure the different components of your AI system—from user interface to data to decision-making—to ensure it performs reliably and evolves with use. The chapter explains that good architecture supports modularity (easy to update parts), observability (tracking performance), and feedback integration (learning from users).

A useful business analogy is your operations department. If it’s well-structured, everyone knows their role, can report on their work, and can adjust based on customer feedback. AI systems should function similarly.

The Five-Layer Architecture

Chip Huyen proposes a five-layer architecture for AI systems. Each layer plays a role and can evolve independently:

  1. User Interface (UI): How people interact with the AI (e.g., chatbot, app, email).
  2. Application Layer: Manages user sessions and business logic.
  3. Orchestration Layer: Coordinates which tools or actions to use (e.g., whether to search a database or generate a new answer).
  4. Foundation Model Layer: Where the AI logic lives—language models like GPT.
  5. Data and Feedback Layer: Stores documents, data, and feedback used to improve performance.

This structure allows business teams to make improvements without breaking the whole system. For example, if feedback reveals that answers are too long, only the prompting logic in the orchestration layer needs adjusting—not the whole model.
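One way to picture that separation is as independent components with narrow interfaces, so each layer can be swapped or tuned on its own. The sketch below is a hypothetical illustration of the idea, not code from the book; class and method names like `Orchestrator` and `handle` are invented for this example.

```python
# Hypothetical sketch of the layered separation: each class can be
# replaced or adjusted without touching the others.

class FoundationModel:            # Layer 4: the AI logic (e.g., an LLM API)
    def generate(self, prompt: str) -> str:
        return f"Generated answer for: {prompt}"

class DataStore:                  # Layer 5: documents, data, and feedback
    def __init__(self):
        self.feedback = []
    def record_feedback(self, query: str, rating: str):
        self.feedback.append((query, rating))

class Orchestrator:               # Layer 3: decides prompts, tools, routing
    def __init__(self, model: FoundationModel, store: DataStore):
        self.model, self.store = model, store
        # Tweakable without retraining: e.g., shorten answers here.
        self.prompt_template = "Answer briefly: {query}"
    def handle(self, query: str) -> str:
        return self.model.generate(self.prompt_template.format(query=query))

class Application:                # Layer 2: sessions and business logic
    def __init__(self, orchestrator: Orchestrator):
        self.orchestrator = orchestrator
    def ask(self, user_query: str) -> str:  # Layer 1 (the UI) calls this
        return self.orchestrator.handle(user_query)
```

In this sketch, fixing "answers are too long" means editing only `prompt_template` inside the orchestration layer; the model, application, and interface layers are untouched.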

Why Feedback Is Central to AI Success

Unlike traditional software, AI systems don’t behave identically every time. They learn, adapt, and sometimes make mistakes. That’s why feedback is not optional—it’s your tool for directing the AI toward better performance. This includes user thumbs-up/down, comments, edits, and follow-up actions. Over time, these signals guide retraining, prompt changes, or content updates.

The book shares a real-world example of a product team that added simple thumbs-up/down buttons to their AI assistant. This led to the discovery that users disliked overly generic answers. By tweaking the prompting and model routing, they increased user satisfaction significantly.

Action Steps to Build AI Systems With Feedback in Mind

  1. Start With a Clear Use Case and Workflow: Define what you want your AI to do and how users will interact with it. Will it answer questions, draft content, or process forms? Design a simple journey: how the user starts the task, how the AI responds, and how the user confirms or corrects the outcome. This clarity will shape your architecture.
  2. Design Modular Components: Avoid building monolithic solutions. Use tools and platforms that let you separate your interface, logic, AI prompts, and data sources. This makes it easier to update only what’s broken or outdated. For example, if your AI-generated marketing emails sound too robotic, you can revise the prompt template without rebuilding the full system.
  3. Integrate Feedback From Day One: Even the best models won’t get everything right. Design feedback channels into your product from the beginning. This could be thumbs-up/down buttons, comments, revision options, or follow-up prompts like “Was this helpful?” Make giving feedback frictionless. Don’t wait until users complain—invite feedback early and often.
  4. Track and Store Feedback Transparently: Set up systems to collect and analyze feedback in a structured way. This might involve tagging feedback by category (too long, inaccurate, irrelevant) and linking it to the original prompt or input. Over time, patterns will emerge, and your team can make targeted improvements.
  5. Use Feedback to Improve Prompting and Logic: Many issues can be fixed not by retraining the AI, but by rewriting prompts or adjusting orchestration logic. If feedback shows the AI misunderstands tone, update the prompt to emphasize the brand voice. If users ask similar questions, update routing rules to fetch pre-written responses or better source documents.
  6. Measure Outcomes, Not Just Accuracy: Traditional AI systems are judged by accuracy. But for business applications, usefulness matters more. Track how AI impacts key metrics: time saved, user satisfaction, repeat usage, or sales conversions. Feedback loops should help optimize for these outcomes, not just correctness.
  7. Create a Feedback Culture Within Your Team: Encourage everyone—from support agents to marketing managers—to share how the AI is doing. AI performance should not be a black box. Make feedback review part of regular team rituals and share wins when user suggestions lead to better outcomes.
  8. Plan for Continuous Iteration: The book makes it clear: AI is not a launch-it-and-leave-it system. The most successful AI tools are updated weekly or even daily. Plan resources and workflows to review, prioritize, and act on user feedback regularly. Over time, your system will become sharper, more aligned with business goals, and more trusted by users.
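The structured feedback tracking described in steps 3 through 5 can be sketched as a small log with category tags. This is a hypothetical illustration, not an implementation from the book; the category labels ("too long", "inaccurate") are the example tags mentioned in step 4, and the class design is invented.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical sketch of structured feedback tracking: each entry links
# a rating and category back to the original prompt and response.

@dataclass
class FeedbackLog:
    entries: list = field(default_factory=list)

    def record(self, prompt: str, response: str, rating: str, category: str = ""):
        self.entries.append(
            {"prompt": prompt, "response": response,
             "rating": rating, "category": category}
        )

    def top_issues(self, n: int = 3):
        """Surface the most common complaint categories among thumbs-down."""
        cats = Counter(e["category"] for e in self.entries
                       if e["rating"] == "down" and e["category"])
        return cats.most_common(n)
```

If `top_issues()` shows "too long" dominating, that points to a prompt rewrite in the orchestration layer rather than a costly retraining effort, exactly the kind of targeted improvement step 5 describes.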

Building Trust and Loyalty Through AI Responsiveness

When users see that their feedback changes the AI’s behavior, they trust the system more. It becomes less of a gimmick and more of a partner. Businesses that create this virtuous cycle—use, feedback, improvement—win long-term loyalty.

For non-technical business leaders, this chapter is a call to action: you don’t need to build the model, but you must build the system. Think like a systems designer. Focus on outcomes. Enable iteration. And most importantly, listen to your users—because your AI certainly can.