🤖 AI Agents: Models, Tools, & The Secret To Making Them Work (Part 1)
Why do many AI projects fail? They skip the most important step. Learn the 3 ingredients (Models, Tools, Evals) to build AI that actually works.

Introduction: Don't Just "Talk" To AI, "Give It A Job"
You probably use AI chatbots a lot. You ask a question, it gives an answer. You ask it to write an email, it writes an email. This is very useful, but it's... passive. It's like a smart person you can "talk to," but who won't actually do anything for you.
Now, imagine this:
You tell your AI, "Look at the last 100 customer feedback emails, find the top 3 biggest problems they are complaining about, and write an email to the tech team summarizing those 3 problems."
And it does it. It doesn't just write the email. It reads, thinks, analyzes, and summarizes all on its own.
That is an "AI Agent." It's not a "talker," it's a "doer."
After many months of trying and building these myself, I realized this is the next big step for AI. But many guides out there are just too complex.
In this two-part series, I'm going to break down everything you need to know. We will use simple language, just like we're talking, so anyone can understand.
In Part 1 (this article): We will learn what an AI Agent really is. We will look at the 3 basic "ingredients" you need to build one. And most importantly, we will look very closely at the "secret" to making them work well: how to check and measure them (Evals). I will give you detailed prompt examples for every idea.
In Part 2 (the next article): We will get practical. I will show you 4 proven "recipes" (design patterns) to put those ingredients together and build real Agents.
Let's start with the first big question...
I. What Is An AI Agent (And Why Is It Different)?
1. The Simple Definition

Think of an AI Agent like a smart personal assistant you hire.
A normal AI (Chatbot): You ask one question, it gives one answer. The conversation ends.
An AI Agent: You give it a project or a goal.
An AI Agent is a system that can take multiple steps on its own to finish a goal. It can:
Plan: "To do A, I first need to do B, then C, then D."
Use Tools: "I need to use Google Search to do step B."
Reflect (Think about its work): "I finished step C. Let me check... is it good? Ah, I need to fix it a little bit."
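To make the plan, use tools, reflect loop a bit more concrete, here is a rough Python sketch. It is only an illustration: `make_plan`, `run_step`, and `looks_good` are hypothetical stand-ins with hard-coded behavior, where a real Agent would ask a model at each point.

```python
# Minimal plan -> act -> reflect loop. Every function here is a hypothetical
# stand-in; a real Agent would ask a model to plan, act, and review.

def make_plan(goal):
    # Plan: break the goal into ordered steps (hard-coded for illustration).
    return ["search for sources", "write a draft", "polish the draft"]

def run_step(step, attempt):
    # Use Tools: pretend the first attempt at writing the draft comes out weak.
    if step == "write a draft" and attempt == 1:
        return "weak draft"
    return f"done: {step}"

def looks_good(result):
    # Reflect: a trivial check standing in for the Agent reviewing its work.
    return not result.startswith("weak")

def run_agent(goal, max_attempts=3):
    log = []
    for step in make_plan(goal):
        for attempt in range(1, max_attempts + 1):
            result = run_step(step, attempt)
            log.append((step, attempt, result))
            if looks_good(result):
                break  # this step is fine, move on to the next one
    return log

log = run_agent("write a short report")
for entry in log:
    print(entry)
```

The log shows the draft step running twice: the reflection check rejects the first attempt, so the loop retries before moving on.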
2. Example: "Analyzing Customer Feedback"
Let's look at the big difference when you give the same request.
Your Request: "I have 3 customer feedback emails. Tell me what the main problem is."
Way 1: Using a normal AI (Chatbot - One Step)
You have to do most of the work yourself.
Your Prompt:
"Here are 3 emails. Please summarize them:
Email 1: 'Hi team, I can't find the 'reset password' button anywhere. It's very annoying.'
Email 2: 'Your app is great, but I tried to reset my password and the link you sent me doesn't work.'
Email 3: 'I am locked out of my account and there is no way to reset my password. I need help NOW!!'
What is the main problem?"
The Chatbot's Answer:

The Problem Here: You had to read the emails yourself and copy and paste them. If you have 1000 emails, this is impossible. The chatbot only summarizes what you give it.
Way 2: Using An AI Agent (Multi-Step)
You set up a system (the Agent) one time and just give it the goal.
The System Prompt (to create the Agent):
(This is what you set up one time to "teach" the Agent how to work)
"You are a Customer Feedback Analyst Assistant. Your job is to read new emails, find problems, and report to the manager.
You have these tools:
read_new_emails(folder="Feedback"): Use this to read new emails from the "Feedback" folder.
create_report(summary): Use this to create a formal report.
Always follow these 3 steps:
Step 1 (Read): Use the read_new_emails tool to get all emails.
Step 2 (Analyze): Read all the emails and find the top 3 most common complaints.
Step 3 (Report): Write a short summary of those 3 topics and use the create_report tool to send it."
Now, your daily command is just:
"Run the feedback analysis process."
The AI Agent's Result (It does everything by itself):

See the difference? You changed from being an "employee" for the AI to being the "manager" of the AI.
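If you're curious what the three steps from that system prompt could look like in code, here is a rough sketch. The two tool functions are hypothetical stand-ins (the real ones would talk to a mailbox and a reporting system), and the keyword-to-topic map is an assumption that replaces the model's "Analyze" step so the example can run on its own.

```python
from collections import Counter

# Hypothetical stand-ins for the two tools named in the system prompt.
def read_new_emails(folder="Feedback"):
    # A real tool would read a mailbox; these are the sample emails from above.
    return [
        "Hi team, I can't find the 'reset password' button anywhere. It's very annoying.",
        "Your app is great, but I tried to reset my password and the link you sent me doesn't work.",
        "I am locked out of my account and there is no way to reset my password. I need help NOW!!",
    ]

def create_report(summary):
    # A real tool would send the report somewhere; here we just format it.
    return "FEEDBACK REPORT\n" + summary

# Assumed keyword -> topic map; a real Agent would let the model do this step.
TOPICS = {"password": "Password reset", "locked out": "Account lockout", "link": "Broken links"}

def run_feedback_analysis():
    emails = read_new_emails()                      # Step 1 (Read)
    counts = Counter()
    for email in emails:                            # Step 2 (Analyze)
        text = email.lower()
        for keyword, topic in TOPICS.items():
            if keyword in text:
                counts[topic] += 1
    top = counts.most_common(3)
    summary = "\n".join(f"{topic}: {n} email(s)" for topic, n in top)
    return create_report(summary)                   # Step 3 (Report)

report = run_feedback_analysis()
print(report)
```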
II. Two Types Of AI Agents (From Easy To Hard)
Not all Agents are the same. They are on a scale from "you control everything" to "it decides everything."
1. Less Autonomous Agent (Like An Assembly Line)
This is the type where you plan all the steps in advance. The Agent cannot change the steps. This is like the example we just saw.
You tell it: "Step 1: Do A. Step 2: Do B. Step 3: Do C."
The Agent cannot decide to do something different.
Result: Very reliable, you always know what will happen.
Best for: Repetitive jobs you want to automate, like processing invoices or answering simple support emails.
Detailed Prompt Example (Teaching a Less Autonomous Agent):
"You are an Order Processing Agent.
MANDATORY WORKFLOW:
When you get a new order (input), you MUST follow these 3 steps in order:
Extract Info: Read the order and find the [Product Name] and [Quantity].
Check Stock: Use the tool check_inventory(product_name, quantity).
Reply:
If check_inventory returns 'True', you must reply: 'Order for [Product Name] is confirmed.'
If check_inventory returns 'False', you must reply: 'Sorry, [Product Name] is out of stock.'
Do not add any other words to your reply."
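The same fixed workflow can be sketched in plain Python. This is an illustration under assumptions: the stock table, the `check_inventory` implementation, and the "quantity x product" order format are all made up here, and in a real Agent the model would do the extraction step.

```python
import re

# Hypothetical stock table and tool; a real check_inventory would query a database.
STOCK = {"blue mug": 5, "desk lamp": 0}

def check_inventory(product_name, quantity):
    return STOCK.get(product_name.lower(), 0) >= quantity

def process_order(order_text):
    # Step 1 (Extract Info): assumes orders look like "<quantity> x <product name>".
    match = re.fullmatch(r"\s*(\d+)\s*x\s*(.+?)\s*", order_text)
    if match is None:
        raise ValueError(f"Unrecognized order format: {order_text!r}")
    quantity, product_name = int(match.group(1)), match.group(2)
    # Step 2 (Check Stock)
    in_stock = check_inventory(product_name, quantity)
    # Step 3 (Reply): only the two fixed replies from the prompt are allowed.
    if in_stock:
        return f"Order for {product_name} is confirmed."
    return f"Sorry, {product_name} is out of stock."

print(process_order("2 x Blue Mug"))
print(process_order("1 x Desk Lamp"))
```

Because the steps never change, the same input always gives the same reply, which is what makes this kind of Agent reliable.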
2. Highly Autonomous Agent (Like A Consultant)
This is the type where you only give it a goal and a set of tools. The Agent decides the steps itself.
You tell it: "Goal: Book a flight for me."
The Agent thinks to itself: "OK, to do that, I will (1) Ask the user for the date, (2) Ask for the budget, (3) Use the search_flights tool, (4) Show the top 3 options, (5) Ask for confirmation, (6) Use the book_flight tool..."
Result: Less predictable, more flexible. It might have great ideas you didn't think of.
Best for: Complex, open-ended problems that need creativity or deep research.
Detailed Prompt Example (Teaching a Highly Autonomous Agent):
"You are a Personal Travel Assistant.
YOUR GOAL: To help the user plan and book a complete trip.
YOUR TOOLS:
search_flights(origin, destination, date): Finds flights.
search_hotels(city, check_in_date, nights): Finds hotels.
get_weather_forecast(city, date): Gets the weather.
ask_user(question): Use this to ask the user a question if you are missing information.
INSTRUCTIONS:
Based on the user's request, make your own plan of steps. Think step-by-step. If you are missing information (like the travel date or budget), you must use the ask_user tool first."
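Here is a rough sketch of the loop that usually sits behind a prompt like this: the model picks the next tool, the code runs it, and the result goes back to the model until it gives a final answer. Everything below is an assumption for illustration. `fake_model` is a hard-coded stand-in for a real model call, the tools return canned data, and the decision format (a dict with "tool" and "args") is invented for this example.

```python
# Hypothetical tool implementations; real ones would call external services.
def search_flights(origin, destination, date):
    return [f"{origin}->{destination} on {date}, $320"]

def ask_user(question):
    # Canned answers so the sketch runs without a human.
    return {"What date do you want to travel?": "2025-12-01"}.get(question, "unknown")

TOOLS = {"search_flights": search_flights, "ask_user": ask_user}

def fake_model(goal, history):
    # Stand-in for the model: it picks the next action from what it knows so far.
    known = {name: result for name, _, result in history}
    if "ask_user" not in known:
        return {"tool": "ask_user", "args": {"question": "What date do you want to travel?"}}
    if "search_flights" not in known:
        return {"tool": "search_flights",
                "args": {"origin": "NYC", "destination": "LAX", "date": known["ask_user"]}}
    return {"final": f"Best option: {known['search_flights'][0]}"}

def run_agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):  # cap the steps so the Agent cannot loop forever
        decision = fake_model(goal, history)
        if "final" in decision:
            return decision["final"], history
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append((decision["tool"], decision["args"], result))
    return "Stopped: step limit reached", history

answer, history = run_agent("Book a flight from NYC to LAX")
print(answer)
```

Notice that the code never lists the steps. The order (ask first, then search) comes from the "model," which is the key difference from the assembly-line Agent.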
III. The 3 Core "Ingredients" To Build An Agent
Like cooking, you need three basic "ingredients."
1. Models (The Brains)

This is the "thinking engine" of the Agent. These are the big AI models you hear about.
A tip from my experience: You don't always need the biggest, most expensive "brain."
Use a big brain (like GPT-4): For complex jobs that need planning or deep thinking (like the Highly Autonomous Agent).
Use a smaller brain (like Claude 3 Haiku): For simple, repetitive jobs (like the Less Autonomous Agent that just checks stock). It is much faster and cheaper.
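In code, this tip can be as small as a lookup table. The sketch below is only an illustration: the task labels are made up, the model names are just the two examples mentioned above, and defaulting to the cheaper model is one possible design choice.

```python
# Hypothetical routing table; the model names come from the examples above,
# and the task labels are made up for this sketch.
MODEL_FOR_TASK = {
    "plan_trip": "GPT-4",              # open-ended planning: bigger model
    "check_stock": "Claude 3 Haiku",   # fixed, repetitive steps: smaller model
}

def pick_model(task_type, default="Claude 3 Haiku"):
    # Default to the cheaper model and move up only where the task needs it.
    return MODEL_FOR_TASK.get(task_type, default)

print(pick_model("plan_trip"))
print(pick_model("check_stock"))
```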
2. Tools (The Hands And Eyes)
Without tools, the Agent is just a brain in a jar. Tools let it interact with the real world.
Common tools are:
APIs: This is like a "menu" that lets computer programs talk to each other. Examples: a Google Search tool, your calendar's API, a weather API.
Information Retrieval (RAG, short for Retrieval-Augmented Generation): This is a technical term for a simple idea: letting the Agent look up and "read" the relevant parts of your private documents before it answers. You can give it your PDFs, Word documents, or your company's database.
Code Execution: This lets the Agent run small pieces of code (usually Python) to do math, analyze data, or make charts.
Detailed Prompt Example (How the AI "sees" its tools):
(You don't write this, but this is how frameworks like LangChain help the AI understand its tools)
"You can use the following tools. To use a tool, respond with a special JSON block.
get_weather(city_name): Returns the current weather for a city.
Parameter: city_name (string) - The name of the city.
send_email(to, subject, body): Sends an email.
Parameters: to (string), subject (string), body (string).
User's Request: 'What's the weather in New York?'
Your Answer (the AI):
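On the code side, something has to read that JSON block and actually run the tool. Here is a rough sketch of that step. The exact JSON shape varies by framework, so the `{"tool": ..., "arguments": ...}` format below is an assumption, and both tool functions are hypothetical stand-ins that return canned values.

```python
import json

# Hypothetical tool implementations matching the descriptions in the prompt.
def get_weather(city_name):
    return f"Sunny in {city_name}"  # canned value; a real tool would call a weather API

def send_email(to, subject, body):
    return f"Email sent to {to}"

TOOLS = {"get_weather": get_weather, "send_email": send_email}

def handle_model_reply(reply_text):
    # Assumed reply shape: {"tool": "<name>", "arguments": {...}}
    try:
        call = json.loads(reply_text)
    except json.JSONDecodeError:
        return "ERROR: reply was not valid JSON"
    if not isinstance(call, dict):
        return "ERROR: reply was not a JSON object"
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        return f"ERROR: unknown tool {call.get('tool')!r}"
    try:
        return tool(**call.get("arguments", {}))
    except TypeError as exc:  # wrong or missing parameters
        return f"ERROR: bad arguments ({exc})"

reply = '{"tool": "get_weather", "arguments": {"city_name": "New York"}}'
print(handle_model_reply(reply))
```

The error branches matter as much as the happy path: models sometimes return broken JSON or ask for a tool that doesn't exist, and the Agent should get a clear error back instead of crashing.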

3. Evaluations / "Evals" (Quality Control)

This is the most important part, but it is also the part most people skip when they start.
Evals are how you check if your Agent is doing its job correctly.
If you build an Agent without Evals, it's like hiring a new employee, giving them a lot of work, and then going on vacation for 2 weeks without checking on them. You might come back to a big mess.
Example: You build an Agent to answer customer emails.
Customer Email:
"My order arrived broken!"
Bad Agent Reply:
"Thank you for your purchase. Visit our website to see new products!"
This is a terrible reply. A good Eval system would "catch" this mistake before the email is sent. It would check: "Does this reply solve the 'broken' problem?" The answer is "No," so it would stop the email.
IV. A Deep Look At Evals (The 4 Types You Must Know)
OK, so how do we "check" our Agent? Evals sound complex, but they are just four simple types.
Think about it with 2 questions:
How to check: Can you check it with computer code (Objective), or do you need a human/another AI to think about it (Subjective)?
The standard: Do you have one specific correct answer (like a math test), or do you have a general standard (like grading an essay)?
Here are the 4 types:
1. Type 1: Objective + Specific Correct Answer

This is the easiest type. It's like checking a multiple-choice test.
Example: You have an Agent that reads invoices and finds the "Due Date."
How you do it: You have a sample invoice. You know the correct Due Date is 11/15/2025.
How to Eval (with code):
if agent_extracted_date == "11/15/2025":
    result = "PASS"
else:
    result = "FAIL"
It's simple and 100% automatic.
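In practice you would run this check over a whole set of sample invoices, not just one. Here is a minimal sketch of that idea. The `extract_due_date` function is a hypothetical stand-in for your Agent (it just looks for a "Due Date:" line), and the sample invoices are made up.

```python
# Hypothetical extractor standing in for the Agent; it just finds the text
# after "Due Date:" so the sketch can run.
def extract_due_date(invoice_text):
    for line in invoice_text.splitlines():
        if line.lower().startswith("due date:"):
            return line.split(":", 1)[1].strip()
    return None

# Each test case pairs a sample invoice with the answer you already know.
TEST_CASES = [
    ("Invoice #1\nDue Date: 11/15/2025\nTotal: $120", "11/15/2025"),
    ("Invoice #2\nTotal: $80\nDue Date: 12/01/2025", "12/01/2025"),
    ("Invoice #3\nTotal: $45", None),  # no due date present
]

def run_exact_match_eval(agent, cases):
    results = ["PASS" if agent(text) == expected else "FAIL" for text, expected in cases]
    return results, results.count("PASS") / len(results)

results, pass_rate = run_exact_match_eval(extract_due_date, TEST_CASES)
print(results, pass_rate)
```

The pass rate gives you one number to watch every time you change the Agent's prompt or model.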
2. Type 2: Subjective + Specific Correct Answer
This type needs a little "thinking" to grade, but you still know what you are looking for.
Example: You have an Agent that summarizes product reviews. You want to make sure it includes all the features that were mentioned.
Original Review: "This phone has great battery life, but the camera is blurry at night."
The key points you want: (1) Battery, (2) Camera.
How to Eval (Use an AI as a "Judge"): You can't use simple code, because the Agent might say "camera quality" or "blurry photos." Instead, you use another AI (like GPT-4) as the "judge."
Detailed "Judge" Prompt Example (Type 2):
"You are an AI Judge. Read the 'Original Review' and the 'Agent's Summary'.
Original Review: 'This phone has great battery life, but the camera is blurry at night.'
Agent's Summary: 'The product has good battery and camera quality.'
Required Points (from the original):
Battery
Camera
Question: Does the 'Agent's Summary' mention ALL of the required points? Answer only 'Yes' or 'No'."
Judge's Answer: "Yes"
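To run this kind of check automatically, you need two small pieces of code around the judge: one that fills in the prompt, and one that turns the judge's "Yes" or "No" into PASS or FAIL. Here is a rough sketch. The `fake_judge` function is a hypothetical stand-in for a real model call, and the helper names are made up for this example.

```python
JUDGE_TEMPLATE = """You are an AI Judge. Read the 'Original Review' and the 'Agent's Summary'.
Original Review: '{review}'
Agent's Summary: '{summary}'
Required Points (from the original):
{points}
Question: Does the 'Agent's Summary' mention ALL of the required points? Answer only 'Yes' or 'No'."""

def build_judge_prompt(review, summary, required_points):
    points = "\n".join(required_points)
    return JUDGE_TEMPLATE.format(review=review, summary=summary, points=points)

def parse_verdict(judge_reply):
    # Normalize the reply; anything other than a clear Yes/No counts as an error.
    cleaned = judge_reply.strip().strip("\"'.").lower()
    if cleaned == "yes":
        return "PASS"
    if cleaned == "no":
        return "FAIL"
    return "ERROR"

def fake_judge(prompt):
    # Hypothetical stand-in for a real model call; always answers "Yes".
    return "Yes"

prompt = build_judge_prompt(
    "This phone has great battery life, but the camera is blurry at night.",
    "The product has good battery and camera quality.",
    ["Battery", "Camera"],
)
print(parse_verdict(fake_judge(prompt)))
```

Treating unexpected replies as ERROR, not as a silent PASS or FAIL, makes it easier to spot when the judge itself is misbehaving.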

3. Type 3: Objective + Universal Standard
You have one simple rule that applies to everything.
Example: You have an Agent that writes blog posts. One rule must always be true: "The post must be at least 300 words long."
How to Eval (with code):
if word_count_of_blog_post >= 300:
    result = "PASS"
else:
    result = "FAIL"
This is one simple rule that works for all 1,000 blog posts the Agent writes.
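As a small runnable version of this rule, the sketch below counts words by splitting on whitespace (one reasonable definition of "word," but still an assumption) and applies the same check to every post. The sample posts are made up.

```python
def check_min_words(post, min_words=300):
    # Objective rule: count whitespace-separated words and compare to the minimum.
    return "PASS" if len(post.split()) >= min_words else "FAIL"

# Made-up posts for illustration: one long enough, one too short.
posts = {
    "long_post": "word " * 350,
    "short_post": "Too short to publish.",
}

results = {name: check_min_words(text) for name, text in posts.items()}
print(results)
```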
4. Type 4: Subjective + Universal Standard
This is the hardest type, but also the most powerful. It's like grading an essay. You don't have a "correct answer," but you have a "grading rubric" (a list of standards).
Example: You want to make sure all customer service replies are "professional" and "empathetic" (show that you care).
Your Rubric (Standards):
Professional: (1-5 points) Does it use polite language?
Empathetic: (1-5 points) Does it acknowledge the customer's feelings (e.g., "I understand this is frustrating...")?
Detailed "Judge" Prompt Example (Type 4):
"You are a strict Quality Manager. Read the Agent's reply below and grade it using the rubric.
Agent's Reply: 'That's the shipping company's fault, not ours. We will check.'
GRADING RUBRIC:
Professional (1-5): (1 = Very rude/blaming, 5 = Very professional/takes responsibility)
Empathetic (1-5): (1 = Zero empathy, 5 = Very empathetic, e.g., 'I understand your frustration')
You MUST answer in this exact JSON format:
{
"professional_score": [number],
"empathetic_score": [number],
"short_reason": "[Explain your score briefly]"
}
"
Judge's Answer:

Now you can see immediately: This Agent is failing badly and needs to be retrained.
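Asking the judge for JSON pays off here, because code can read the scores and decide what to do next. Below is a rough sketch of that step. The passing threshold of 4 is an assumption (the rubric doesn't set one), and the sample judge output is hypothetical, written only so the example can run.

```python
import json

def grade_reply(judge_json, min_score=4):
    # Parse the judge's JSON and fail the reply if any rubric score is too low.
    # min_score=4 is an assumed threshold, not something the rubric specifies.
    try:
        data = json.loads(judge_json)
        scores = {
            "professional": int(data["professional_score"]),
            "empathetic": int(data["empathetic_score"]),
        }
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return {"verdict": "ERROR", "reason": "judge reply did not match the required format"}
    if any(not 1 <= s <= 5 for s in scores.values()):
        return {"verdict": "ERROR", "reason": "score outside the 1-5 range"}
    verdict = "PASS" if all(s >= min_score for s in scores.values()) else "FAIL"
    return {"verdict": verdict, "scores": scores, "reason": data.get("short_reason", "")}

# Hypothetical judge output for a blaming, unsympathetic reply.
sample = '{"professional_score": 1, "empathetic_score": 1, "short_reason": "Blames the shipper and shows no empathy."}'
print(grade_reply(sample))
```

A FAIL verdict can then block the reply from being sent, or route it to a human for review.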
Quick Wrap-Up For Part 1
OK, we have covered the most important ideas and looked at detailed prompt examples.
You now know what an AI Agent is (a "doer"), why it's different (it takes many steps), and the 3 "ingredients" you need to build one: Models (Brains), Tools (Hands), and Evals (Quality Checks).
Most importantly, you've seen how to use prompts to "teach" an Agent (Less vs. Highly Autonomous) and how to use Evals (especially an AI Judge) to control the quality.
Now you have all the "ingredients" in your hand.
But how do you "cook" with them? How do you combine Models, Tools, and Evals to make an Agent that actually works well?
In Part 2, we will look at the 4 most powerful "recipes" (design patterns) to do just that, with detailed prompt examples for each one.
Are you ready to continue to Part 2?