
Build Your First AI Agent This Week: A Practical Guide

Pini Shvartsman

In my previous article, I covered what makes AI agents different and which platforms are worth using. Now it’s time to actually build one.

This isn’t theory. This is the practical path to shipping your first useful agent in seven days. Real steps, real code patterns, real deployment.

Day 1: Pick a problem that won’t waste your time
#

The most common mistake is picking the wrong first problem. Too ambitious, too vague, or too risky. You want something that teaches you how agents work without creating a disaster if it fails.

The criteria that matter:

Repetitive and annoying. Something you or your team does regularly and wish you didn’t. The kind of task where you know you’ll use the agent because the manual version is painful.

Multi-step with clear logic. It needs to check multiple sources or make decisions based on what it finds. Otherwise, you don’t need an agent, you need a function.

Low stakes. Mistakes are annoying but not catastrophic. No customer-facing systems, no data deletion, no money movement.

Well-defined success. You can describe what “done” looks like in concrete terms. Vague goals produce vague agents.

Good first problems:

Weekly engineering status report. Query your project management tool for completed tickets, check Git for merged PRs, pull highlights from meeting notes, and generate a summary. Multiple data sources, clear output format, low risk.

Pull request pre-review. Check new PRs for common issues before human review: missing tests, documentation gaps, security patterns, code style. Clear checks, actionable output, saves reviewer time.

Production health check. Monitor key metrics across your services, check error rates and latency, identify anomalies, and escalate only when thresholds are crossed. Defined logic, measurable impact.

Support ticket triage. Read incoming tickets, categorize by type, check for similar past issues, route to the right team, and flag urgent cases. Clear workflow, easy to validate.

Bad first problems:

Autonomous customer support. Too high stakes. Customers see the output directly. Requires judgment and empathy that agents don’t have.

Writing production code without review. You’re trusting an agent with your system’s reliability before you understand how agents fail. That’s backwards.

Making architectural decisions. Agents can gather information, but they can’t make taste-based trade-offs or understand your business context deeply enough.

Pick your problem now. Write down the specific task, the data sources it needs, and what the output should look like. Be concrete.
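
For instance, a concrete write-up for the status report problem could be as simple as this (the names are placeholders for whatever your team actually uses):

# Problem definition for the first agent (illustrative; adjust to your stack)
problem = {
    "task": "Every Monday, summarize last week's engineering work",
    "data_sources": ["project tracker (completed tickets)",
                     "GitHub (merged PRs)",
                     "meeting notes"],
    "output": "Markdown report with Completed / Shipped / Team Updates sections",
    "success": "The team lead can post it with at most minor edits",
}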

Day 2: Set up your environment and tools
#

You have two main paths: managed platforms (fast but less control) or open-source frameworks (more work, more flexibility).

Path A: OpenAI Agents SDK (fastest start)
#

When to choose this: You want to build something working today and don’t mind vendor lock-in. (The examples below use the core openai package with plain function calling; OpenAI’s higher-level Agents SDK builds on the same primitives.)

Setup:

pip install openai

Create an API key on OpenAI’s platform and set it as an environment variable:

export OPENAI_API_KEY='your-key-here'

First test:

from openai import OpenAI

client = OpenAI()

# Simple function calling example
def get_ticket_count(status):
    # Your actual logic here
    return {"status": status, "count": 42}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How many open tickets?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_ticket_count",
            "description": "Get count of tickets by status",
            "parameters": {
                "type": "object",
                "properties": {
                    "status": {"type": "string", "enum": ["open", "closed", "pending"]}
                },
                "required": ["status"]
            }
        }
    }]
)

print(response)

If that runs without errors, you’re ready.

Path B: LangGraph (maximum control)
#

When to choose this: You want to understand how agents work at a deeper level, need to avoid vendor lock-in, or have requirements that managed platforms can’t meet.

Setup:

pip install langgraph langchain-openai langsmith

You’ll still need an OpenAI API key (or use Anthropic, Gemini, or local models). Set up LangSmith for observability (free tier is fine):

export LANGCHAIN_API_KEY='your-langsmith-key'
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_PROJECT='my-first-agent'

First test:

from langgraph.graph import StateGraph, END
from typing import TypedDict

class State(TypedDict):
    messages: list
    next_step: str

def analyze(state):
    return {"next_step": "complete"}

graph = StateGraph(State)
graph.add_node("analyze", analyze)
graph.set_entry_point("analyze")
graph.add_edge("analyze", END)

app = graph.compile()

result = app.invoke({"messages": [], "next_step": ""})
print(result)

If that runs, you’re good.

Connect to your actual data
#

Don’t build against mock data. Use real systems from day one, but safely.

Use MCP servers (covered in my MCP article) to connect to:

  • Your filesystem (code, documentation)
  • Your databases (read-only credentials on development instances)
  • Your Git repository
  • Your project management tools

Install basic MCP servers:

# Filesystem access
npm install -g @modelcontextprotocol/server-filesystem

# PostgreSQL access
npm install -g @modelcontextprotocol/server-postgres

# Git repository access (the official server is Python-based)
pip install mcp-server-git

Configure them in your Claude Desktop or connect them programmatically in your agent code.
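
For Claude Desktop, the configuration lives in claude_desktop_config.json; a minimal sketch for the filesystem server might look like this (the path is a placeholder, and each server’s README documents its exact invocation):

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/your/project"]
    }
  }
}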

Days 3-4: Build the minimal viable agent
#

Start simple. Don’t try to handle every edge case or build the perfect architecture. Build something that works for the happy path.

Define your tools clearly
#

Each tool should do one thing well. Clear inputs, clear outputs, clear purpose.

Example: Status report agent tools

def get_completed_tickets(days=7):
    """Get tickets completed in the last N days"""
    # Query your project management API
    # Return: list of {id, title, assignee, completed_date}
    pass

def get_merged_prs(days=7):
    """Get PRs merged in the last N days"""
    # Query GitHub API or use Git MCP server
    # Return: list of {pr_number, title, author, merged_date}
    pass

def get_meeting_highlights(days=7):
    """Extract highlights from meeting notes"""
    # Read meeting notes from your docs system
    # Return: list of highlight strings
    pass

Keep them focused. One tool shouldn’t try to do everything.
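
To make “your implementation” concrete, here’s a rough sketch of get_merged_prs against the GitHub REST API (the repo name and GITHUB_TOKEN environment variable are assumptions; adapt to your setup):

import os
from datetime import datetime, timedelta, timezone

import requests

def get_merged_prs(days=7, repo="your-org/your-repo"):
    """Get PRs merged in the last N days via the GitHub REST API."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/pulls",
        params={"state": "closed", "sort": "updated", "direction": "desc", "per_page": 100},
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()
    return [
        {"pr_number": pr["number"], "title": pr["title"],
         "author": pr["user"]["login"], "merged_date": pr["merged_at"]}
        for pr in resp.json()
        if pr.get("merged_at")
        and datetime.fromisoformat(pr["merged_at"].replace("Z", "+00:00")) >= cutoff
    ]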

Write explicit prompts
#

Tell the agent exactly what you want. Agents don’t read between the lines well.

Bad prompt:

"Generate a status report"

Good prompt:

You are a status report generator for the engineering team.

Your task:
1. Get all tickets completed in the last 7 days
2. Get all PRs merged in the last 7 days  
3. Get highlights from team meetings
4. Generate a summary in this format:

## Completed This Week
- [Ticket list with assignees]

## Shipped Features
- [PR list with authors]

## Team Updates
- [Meeting highlights]

Be concise. Focus on user-visible impact.

Specificity matters enormously.

Wire it together: OpenAI example
#

import json

from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_completed_tickets",
            "description": "Get tickets completed in the last N days",
            "parameters": {
                "type": "object",
                "properties": {
                    "days": {"type": "integer", "default": 7}
                }
            }
        }
    },
    # Define other tools similarly
]

messages = [
    {
        "role": "system",
        "content": "You are a status report generator..."  # Full prompt here
    },
    {
        "role": "user",
        "content": "Generate this week's status report"
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

# Handle tool calls until the model produces a final answer
while response.choices[0].finish_reason == "tool_calls":
    assistant_message = response.choices[0].message
    messages.append(assistant_message)

    for tool_call in assistant_message.tool_calls:
        args = json.loads(tool_call.function.arguments or "{}")

        # Execute the requested tool
        if tool_call.function.name == "get_completed_tickets":
            result = get_completed_tickets(**args)
        else:
            result = {"error": f"Unknown tool: {tool_call.function.name}"}

        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": str(result)
        })

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )

print(response.choices[0].message.content)

Wire it together: LangGraph example
#

from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import ToolMessage
from langchain_core.tools import tool

@tool
def get_completed_tickets(days: int = 7) -> list:
    """Get tickets completed in the last N days"""
    # Your implementation
    return []

tools = [get_completed_tickets]
tools_by_name = {t.name: t for t in tools}

class State(TypedDict):
    messages: list

def call_agent(state):
    llm = ChatOpenAI(model="gpt-4o")
    llm_with_tools = llm.bind_tools(tools)
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}

def execute_tools(state):
    last_message = state["messages"][-1]
    results = []
    for call in last_message.tool_calls:
        # Run the requested tool and return the output as a ToolMessage
        output = tools_by_name[call["name"]].invoke(call["args"])
        results.append(ToolMessage(content=str(output), tool_call_id=call["id"]))
    return {"messages": state["messages"] + results}

def should_continue(state):
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "execute_tools"
    return "end"

graph = StateGraph(State)
graph.add_node("agent", call_agent)
graph.add_node("execute_tools", execute_tools)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue, {
    "execute_tools": "execute_tools",
    "end": END
})
graph.add_edge("execute_tools", "agent")

app = graph.compile()
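
To run it, invoke the compiled graph with the same system and user messages as the OpenAI version:

from langchain_core.messages import SystemMessage, HumanMessage

result = app.invoke({"messages": [
    SystemMessage(content="You are a status report generator..."),  # full prompt here
    HumanMessage(content="Generate this week's status report"),
]})
print(result["messages"][-1].content)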

Add guardrails immediately
#

Rate limits: Don’t let the agent make unlimited API calls.

import time
from functools import wraps

def rate_limit(max_calls, period):
    calls = []
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.time()
            calls[:] = [c for c in calls if c > now - period]
            if len(calls) >= max_calls:
                raise Exception(f"Rate limit: {max_calls} calls per {period}s")
            calls.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limit(max_calls=10, period=60)
def expensive_api_call():
    pass

Read-only access: Start with read-only database credentials and API tokens. No write permissions until you’re confident.

Timeouts: Every tool should have a timeout. Agents can get stuck waiting.

import signal

# Note: SIGALRM-based timeouts only work on Unix, and only in the main thread.
def timeout(seconds):
    def decorator(func):
        def handler(signum, frame):
            raise TimeoutError()
        def wrapper(*args, **kwargs):
            signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds)
            try:
                result = func(*args, **kwargs)
            finally:
                signal.alarm(0)
            return result
        return wrapper
    return decorator

@timeout(30)
def slow_operation():
    pass

Days 5-6: Test, break, fix, iterate
#

Now use it for real work. Not a demo. Actual tasks.

Test with real scenarios
#

Run your agent on actual data from the past week. Compare its output to what you would have produced manually.

What to check:

Accuracy: Is the information correct? No hallucinated data?

Completeness: Did it find everything it should have?

Format: Is the output actually useful? Does it need reformatting?

Efficiency: How many API calls did it make? How long did it take?

Watch what it does
#

Use LangSmith (works with both OpenAI and LangGraph) to see traces of every step.
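
If you’re on the plain OpenAI path, a minimal sketch of wiring up tracing with LangSmith’s wrapper (assuming the environment variables from Day 2 are set):

from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())  # every chat.completions call is now traced

@traceable  # groups the whole run, with LLM and tool calls nested underneath
def generate_report(days=7):
    ...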

In LangSmith’s interface, you’ll see:

  • Every message sent to the LLM
  • Every tool call with parameters
  • Every tool response
  • The final output
  • Time and token costs for each step

Look for:

  • Unnecessary tool calls (calling the same thing twice)
  • Wrong tool choices (using the wrong tool for a task)
  • Poor reasoning (making bad decisions about what to do next)
  • Missing error handling (crashes instead of graceful failures)

Iterate on prompts and tools
#

Improve the prompt when the agent:

  • Makes the right tool calls but draws wrong conclusions
  • Doesn’t understand what you’re asking for
  • Produces output in the wrong format

Improve the tools when the agent:

  • Can’t find the information it needs
  • Gets errors from tool calls
  • Needs more granular control over what it can do

Add more guardrails when you see:

  • Excessive API calls
  • Attempts to access things it shouldn’t
  • Operations that take too long

Common issues and fixes
#

Issue: Agent keeps calling the same tool repeatedly

Fix: Add memory of what it’s tried. Or be more explicit in the prompt: “Call each tool exactly once, then synthesize results.”

Issue: Output format is inconsistent

Fix: Use structured output. OpenAI supports response_format with JSON schema. LangChain has structured output parsers.
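
For example, with the OpenAI SDK you can pin the final report to a JSON schema (the schema below is illustrative, not the one true format):

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "status_report",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "completed_this_week": {"type": "array", "items": {"type": "string"}},
                    "shipped_features": {"type": "array", "items": {"type": "string"}},
                    "team_updates": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["completed_this_week", "shipped_features", "team_updates"],
                "additionalProperties": False,
            },
        },
    },
)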

Issue: Agent gives up too easily on errors

Fix: Add retry logic to tools. Return helpful error messages the agent can act on.

Issue: Too slow

Fix: Reduce model calls by better prompt design. Cache results. Use cheaper models for simple decisions.

Day 7: Package it for others to use
#

Your agent works for you. Now make it work for your team.

Turn it into a CLI tool
#

Simple wrapper for command-line use:

import argparse

def main():
    parser = argparse.ArgumentParser(description='Generate status report')
    parser.add_argument('--days', type=int, default=7, help='Days to report')
    parser.add_argument('--output', type=str, help='Output file (optional)')
    args = parser.parse_args()
    
    report = generate_report(days=args.days)
    
    if args.output:
        with open(args.output, 'w') as f:
            f.write(report)
    else:
        print(report)

if __name__ == "__main__":
    main()

Now anyone can run: python agent.py --days 7 --output report.md

Or turn it into an API
#

from fastapi import FastAPI

app = FastAPI()

@app.post("/generate-report")
async def generate_report_endpoint(days: int = 7):
    report = generate_report(days=days)
    return {"report": report}

Deploy with: uvicorn agent:app --host 0.0.0.0 --port 8000

Document how to use it
#

Write a README that covers:

What it does (specific description)

When to use it (and when not to)

How to run it (exact commands)

What it needs (API keys, permissions, data access)

What to do if it fails (common errors and fixes)

How to improve it (where to file issues or make changes)

Add observability for team use
#

Connect to LangSmith or another observability platform so you can see:

  • Who’s using it
  • Success rate
  • Common errors
  • Cost per run

This tells you if it’s actually providing value or if people hit problems.

Patterns that work
#

After building several agents, certain patterns consistently work better than others.

Pattern: Small focused agents with clear hand-offs
#

Don’t build one agent that does everything. Build multiple small agents, each with a specific job, that hand off to each other explicitly.

Example: Instead of a single “incident response agent,” build:

  • Detection agent: Monitors metrics and logs, identifies anomalies
  • Triage agent: Categorizes incidents, determines severity
  • Diagnosis agent: Analyzes logs and code, identifies root cause
  • Communication agent: Updates status page, notifies team

Each agent has clear inputs and outputs. The orchestration layer coordinates hand-offs.
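
A sketch of what those hand-offs can look like in plain code (the agent functions are stand-ins for whatever each focused agent actually does):

def detection_agent(alert):
    # Stand-in: monitor metrics/logs, return a structured anomaly or None
    return {"service": alert["service"], "symptom": "error rate spike"}

def triage_agent(anomaly):
    # Stand-in: categorize and assign severity
    return {**anomaly, "severity": "high"}

def diagnosis_agent(incident):
    # Stand-in: analyze logs and recent changes, return a suspected root cause
    return "suspect: last deploy"

def communication_agent(incident, diagnosis):
    # Stand-in: update the status page and notify the team
    return f"{incident['service']} ({incident['severity']}): {diagnosis}"

def handle_incident(alert):
    """Orchestration layer: explicit hand-offs between focused agents."""
    anomaly = detection_agent(alert)
    if anomaly is None:
        return None
    incident = triage_agent(anomaly)
    diagnosis = diagnosis_agent(incident)
    return communication_agent(incident, diagnosis)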

Why this works:

  • Easier to debug (small surface area)
  • Easier to test (focused scope)
  • Easier to improve (change one without affecting others)
  • Easier to understand (clear responsibilities)

Pattern: Human-in-the-loop for consequential actions
#

Agents should recommend, not execute, anything with real consequences.

For actions that:

  • Change production systems
  • Spend money
  • Contact customers
  • Modify data

Show the plan first. Get approval. Then act.

Implementation:

def execute_with_approval(action, description):
    print(f"Agent wants to: {description}")
    print(f"Command: {action}")
    approval = input("Approve? (yes/no): ")
    
    if approval.lower() == 'yes':
        return execute(action)  # execute() is your action-specific implementation
    else:
        return {"status": "cancelled", "reason": "User rejected"}

Or for async workflows, write the proposed action to a queue and wait for approval before executing.
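
A minimal sketch of that queue variant, using a JSON file as the pending-approvals store (the file path and review process are assumptions; swap in a real queue or database as needed):

import json
import uuid
from pathlib import Path

PENDING = Path("pending_actions.json")  # assumed shared location a human reviews

def propose_action(action, description):
    """Agent side: record the proposed action instead of executing it."""
    queue = json.loads(PENDING.read_text()) if PENDING.exists() else []
    proposal = {"id": str(uuid.uuid4()), "action": action,
                "description": description, "status": "pending"}
    queue.append(proposal)
    PENDING.write_text(json.dumps(queue, indent=2))
    return proposal["id"]

def run_approved_actions():
    """Worker side: execute only proposals a human has marked approved."""
    queue = json.loads(PENDING.read_text())
    for proposal in queue:
        if proposal["status"] == "approved":
            execute(proposal["action"])  # execute() as in the example above
            proposal["status"] = "done"
    PENDING.write_text(json.dumps(queue, indent=2))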

Pattern: Explicit memory and state
#

Stateless agents repeat mistakes. Give them memory so they learn from experience.

What to remember:

  • Past conversations and context
  • What worked and what failed
  • User preferences and corrections
  • Domain-specific knowledge learned over time

Simple implementation:

import time

class AgentMemory:
    def __init__(self):
        self.conversation_history = []
        self.learned_patterns = {}
        
    def remember_interaction(self, input, output, feedback):
        self.conversation_history.append({
            "input": input,
            "output": output,
            "feedback": feedback,
            "timestamp": time.time()
        })
        
    def get_relevant_history(self, current_input):
        # Return similar past interactions
        pass

Use vector databases (Pinecone, Weaviate, Chroma) for semantic search over past interactions.
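
A rough sketch with Chroma (collection and field names are illustrative; Chroma embeds the documents with its default model):

import chromadb

chroma = chromadb.PersistentClient(path="./agent_memory")
interactions = chroma.get_or_create_collection("interactions")

def remember(interaction_id, user_input, output, feedback):
    # Store the input as the searchable document; keep the rest as metadata
    interactions.add(
        ids=[interaction_id],
        documents=[user_input],
        metadatas=[{"output": output, "feedback": feedback}],
    )

def get_relevant_history(current_input, n=3):
    # Semantic search over past inputs
    return interactions.query(query_texts=[current_input], n_results=n)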

Traps that waste time
#

Trap: Building without understanding the workflow
#

Don’t automate what you don’t understand. If the manual process is unclear, the automated version will be worse.

Before building, document:

  • What exactly happens at each step
  • What decisions get made and why
  • What exceptions occur and how they’re handled
  • What the output should look like

Then build the agent.

Trap: No guardrails until something breaks
#

Every agent needs boundaries. Define them before you need them.

Minimum guardrails:

  • Rate limits on expensive operations
  • Timeouts on all tools
  • Read-only access by default
  • Explicit approval for risky actions
  • Input validation on all tool parameters
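
Input validation can be as thin as a Pydantic (v2) model per tool; a sketch using the ticket tool from Day 2:

from pydantic import BaseModel, Field, ValidationError

class TicketQuery(BaseModel):
    days: int = Field(default=7, ge=1, le=90)  # reject absurd ranges
    status: str = Field(default="open", pattern="^(open|closed|pending)$")

def get_ticket_count_safe(**raw_args):
    try:
        args = TicketQuery(**raw_args)
    except ValidationError as e:
        # Return the error to the agent instead of crashing the run
        return {"error": str(e)}
    return get_ticket_count(args.status)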

Trap: Ignoring observability
#

You can’t improve what you can’t see. Instrument from day one.

At minimum, log:

  • Every agent invocation
  • Every tool call with parameters and results
  • Every error with context
  • Final output and user feedback

Use LangSmith, Arize Phoenix, or W&B Weave. The free tiers are sufficient for starting out.

Trap: Optimizing too early
#

Your first version should work, not be perfect. Get it running, use it for real work, then optimize based on actual bottlenecks.

Don’t spend time on:

  • Complex caching before you know what’s slow
  • Multi-agent orchestration before single-agent works
  • Advanced error handling before you know what errors occur

Do spend time on:

  • Clear problem definition
  • Simple working implementation
  • Basic guardrails
  • Real usage and feedback

The 90-day rollout plan
#

You’ve built an agent that works for you. Now scale it to your team.

Weeks 1-2: Pilot with willing participants
#

Pick 2-3 people who:

  • Have the same pain point your agent solves
  • Are willing to give feedback
  • Won’t be upset if it fails occasionally

Have them use it for real work but with oversight. Check outputs before they’re used in important contexts.

Gather feedback systematically:

  • What worked well?
  • What produced wrong results?
  • What was confusing?
  • What took too long?
  • What would make them use it more?

Weeks 3-6: Refine based on reality
#

Fix the issues that came up in the pilot:

Accuracy problems: Improve prompts, add better tools, fix data quality issues.

Usability problems: Better documentation, clearer error messages, simpler interface.

Performance problems: Reduce latency, cache results, optimize tool calls.

Coverage problems: Handle edge cases that came up, add missing functionality.

Track metrics:

  • Success rate (tasks completed correctly)
  • Usage frequency (how often people actually use it)
  • Time saved (measured, not guessed)
  • User satisfaction (ask directly)

Weeks 7-10: Expand to more users
#

Open it up to the broader team, but with good documentation and support.

What people need to start:

  • Clear explanation of what it does
  • Exact setup instructions
  • Example usage for common cases
  • Who to ask when it breaks
  • How to give feedback

Set expectations:

  • What it’s good at
  • What it’s not good at
  • When to trust the output
  • When to double-check manually

Weeks 11-12: Measure and decide
#

Look at actual data:

Usage: Are people using it voluntarily? How often?

Value: Time saved, quality of output, impact on workflow.

Cost: API expenses, maintenance time, support burden.

Sustainability: Can you maintain this? Does it keep working as things change?

Decision time:

If it’s working: Commit to maintaining it. Document it properly. Plan the next agent.

If it’s marginal: Figure out what would make it valuable. Fix those things or kill it.

If it’s failing: Kill it cleanly. Document why so you learn for next time.

Don’t let zombie agents accumulate. Half-working automation that people route around is worse than no automation.

What to measure
#

Focus on metrics that matter for real productivity.

Time to complete workflows: Full end-to-end time, not individual steps. This captures actual impact.

Quality of output: Accuracy, completeness, usefulness. Sample outputs regularly and compare to manual work.

Adoption rate: Percentage of team using it voluntarily after the pilot ends.

Trust level: Do people use the output directly or always double-check everything?

Cost per task: API calls, compute time, maintenance effort.

Failure modes: What breaks? How often? How bad are the failures?

What’s next
#

You’ve built one agent. That’s the hard part. The second one is easier. The third one is easier still.

Build a portfolio of focused agents:

Each solving a specific problem. Each well-understood and properly bounded. Each delivering clear value.

The compounding effect is real: agents that handle routine work free you for higher-leverage problems. Which lets you build better agents. Which free up more time.

Key principles to keep:

  • Start with clear, specific problems
  • Build focused agents with explicit boundaries
  • Add guardrails and observability from day one
  • Test with real work, not demos
  • Measure actual value, not vanity metrics
  • Iterate based on usage, not assumptions
  • Kill what doesn’t work

The teams pulling ahead aren’t the ones with the most sophisticated agents. They’re the ones who started building simple agents months ago and never stopped learning.

Your first agent doesn’t need to be impressive. It needs to be useful. Pick a problem that annoys you, build something that solves it, and use it until it works reliably.

Then build the next one.


The gap between reading about agents and building them is execution. Start today.
