Production AWS Pipeline: Serverless Intelligence, End to End
Designed, built, and maintained a fully serverless AWS system that
replaced a manual weekly reporting process, pulling live ad data,
generating AI-driven analysis via Claude API, and delivering structured
HTML reports to the leadership team every Monday at 7am.
No manual steps. No server to maintain. Built as the sole technical
owner.
The reporting process was entirely manual. I replaced it.
The company manages digital advertising across multiple client
properties. There was no automated system, no historical data store,
no consistent format, and no way to compare performance week-over-week
without doing it manually.
I designed and shipped the replacement entirely on my own. It has run
in production, without manual intervention, since the day I deployed
it.
The Problem, Precisely
No pipeline. No history. No consistency.
01
No historical data store
Week-over-week comparisons required manual lookups. Nothing
tracked trends automatically, no baseline, no variance
detection, no way to surface a pattern without someone doing the
math.
02
No standardized format
Reports varied by who produced them. No consistent KPI
structure. No repeatable template. Stakeholders had to interpret
different formats each week.
03
No lead attribution
Reports showed impressions and clicks but not which ad sets
generated leads. CPL calculations used total account spend
instead of campaign-level spend, making costs look artificially
high.
04
No actionable analysis
Data existed, but didn't surface decisions. Stakeholders
interpreted raw numbers with no guidance on what to act on.
Architecture Decisions
Seven services. Every choice deliberate.
I designed the architecture from scratch with one constraint that
shaped every decision:
no one would maintain this system after I left. It
had to be self-running, auditable, and extensible without me.
EventBridge over a cron job
A managed schedule (cron(0 12 ? * MON *)) means no
server to maintain, no cron daemon to monitor, automatic retry on
failure, and full visibility in the AWS console. A non-engineer can
inspect or modify the schedule without touching code.
DynamoDB for time-series metrics
Needed week-over-week comparison without a relational database.
DynamoDB with campaign_id as partition key and
week_start_date as sort key gives efficient
point-in-time lookups at effectively zero cost at this data volume.
No schema migrations. No maintenance window.
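A minimal sketch of how that key design supports point-in-time lookups (the date helper and key shape here are illustrative; the production table uses the same partition/sort pair):

```python
from datetime import date, timedelta

def week_start(d: date) -> str:
    # Monday of the week containing d, ISO-formatted (the sort key value)
    return (d - timedelta(days=d.weekday())).isoformat()

def metrics_key(campaign_id: str, d: date) -> dict:
    # DynamoDB primary key: campaign_id (partition) + week_start_date (sort)
    # in low-level attribute-value form, as a GetItem call expects
    return {
        "campaign_id": {"S": campaign_id},
        "week_start_date": {"S": week_start(d)},
    }
```

Fetching last week's row for a campaign is a single GetItem against this key, no scan and no index needed.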
Secrets Manager over environment variables
Three API tokens (Meta, Anthropic, HubSpot) need rotation without
redeployment. Secrets Manager makes access auditable via CloudWatch
and decouples credential management from the Lambda entirely. If a
token rotates, no code changes.
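A sketch of the retrieval path (the secret name is illustrative; all three tokens live in one JSON secret):

```python
import json

def parse_secret(secret_string: str) -> dict:
    # Secrets Manager returns the payload as a JSON string
    return json.loads(secret_string)

def get_secrets(secret_id: str = "ad-report/api-tokens") -> dict:
    # secret_id is illustrative; boto3 import is deferred so this module
    # can be unit-tested without AWS credentials or the SDK installed
    import boto3
    resp = boto3.client("secretsmanager").get_secret_value(SecretId=secret_id)
    return parse_secret(resp["SecretString"])
```

Because the Lambda reads the secret at runtime, rotating a token is a one-field update in the console.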
SES over third-party email
Kept everything in the AWS ecosystem, enabled full HTML email
formatting, and cost fractions of a cent per report. Verified the
[company domain] domain and all recipient addresses
before go-live, a step that would have blocked production silently
if skipped.
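A sketch of the SES delivery call, split so the message shape is testable without AWS (sender and recipient addresses are placeholders):

```python
def build_email(subject: str, html: str, sender: str, recipients: list) -> dict:
    # Argument shape expected by SES send_email; the Html body is what
    # enables full HTML report formatting
    return {
        "Source": sender,
        "Destination": {"ToAddresses": recipients},
        "Message": {
            "Subject": {"Data": subject},
            "Body": {"Html": {"Data": html}},
        },
    }

def send_via_ses(subject, html, sender, recipients):
    # Deferred import; SES rejects the call unless sender and recipients
    # are verified, which is why verification is a go-live checklist item
    import boto3
    boto3.client("ses").send_email(**build_email(subject, html, sender, recipients))
```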
S3 for raw data archival
Every API response gets written to S3 with a 90-day lifecycle expiration before any
transformation. If the Lambda logic has a bug downstream, the raw
data still exists and can be replayed. Immutable inputs, mutable
processing.
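A sketch of the archival write (bucket name and key layout are illustrative; expiry comes from a bucket lifecycle rule, not from code):

```python
import json
from datetime import date

def archive_key(source: str, run_date: date) -> str:
    # Date-partitioned key per upstream source; a bucket lifecycle rule
    # expires objects under raw/ after 90 days
    return f"raw/{source}/{run_date.isoformat()}.json"

def write_raw_to_s3(bucket: str, source: str, payload: dict, run_date: date):
    import boto3  # deferred so the module imports without the SDK
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=archive_key(source, run_date),
        Body=json.dumps(payload),
        ContentType="application/json",
    )
```

Writing before any transformation is what makes replay possible: rerun the Lambda against the archived object instead of re-calling the upstream API.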
Code: Lambda Orchestrator
What the entry point looks like.
The Lambda handler orchestrates the full pipeline in sequence: fetch
credentials, pull API data, persist to DynamoDB, generate AI analysis,
render HTML, deliver via SES. Each step is independently testable.
lambda_function.py · handler excerpt (Python 3.13)

def lambda_handler(event, context):
    # 1. Pull credentials from Secrets Manager
    secrets = get_secrets()

    # 2. Fetch this week's campaign data from Meta + HubSpot
    meta_data = fetch_meta_campaigns(secrets["META_TOKEN"])
    hs_data = fetch_hubspot_leads(secrets["HUBSPOT_TOKEN"])

    # 3. Archive raw payloads to S3 (immutable, 90-day TTL)
    write_to_s3(meta_data, hs_data)

    # 4. Persist week-over-week metrics to DynamoDB
    prev_week = get_previous_metrics(week_start)
    write_metrics(meta_data, week_start)

    # 5. Generate AI analysis (3-pass extended thinking)
    analysis = generate_claude_analysis(
        meta_data,
        hs_data,
        prev_week,
        api_key=secrets["ANTHROPIC_KEY"],
    )

    # 6. Render HTML report + deliver via SES to 6 recipients
    html = render_report(meta_data, hs_data, analysis)
    send_via_ses(html, recipients=STAKEHOLDER_LIST)

    return {"statusCode": 200, "body": "Report delivered"}
AI Design
Three-pass reasoning. Stability-first philosophy.
The Claude API integration was the part I iterated on most. Early
versions produced fluent but inconsistent analysis, sometimes dramatic
about normal variance, sometimes vague about real anomalies. The
prompt went through multiple rewrites before landing on what worked.
Stability-first philosophy
The insight: marketing stakeholders don't want to
be alarmed by normal week-to-week variance. The prompt explicitly
instructs Claude to treat swings under a threshold as noise, not
signal. Only surface an alert if the trend is directional and
sustained across multiple weeks.
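The same rule expressed in code (the 15% threshold here is an illustrative value, not the production one):

```python
def classify_change(pct_change: float, threshold: float = 0.15) -> str:
    # Swings inside the threshold are treated as noise, not signal
    if abs(pct_change) < threshold:
        return "FLAT"
    return "UP" if pct_change > 0 else "DOWN"

def sustained_alert(weekly_changes: list, threshold: float = 0.15) -> bool:
    # Alert only when every week in the window moved the same direction,
    # i.e. the trend is directional and sustained, not a one-week spike
    directions = [classify_change(c, threshold) for c in weekly_changes]
    return len(set(directions)) == 1 and directions[0] != "FLAT"
```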
Three-pass extended thinking
Pass 1: read the raw data. Pass 2: identify patterns and anomalies
against the 4-week rolling trend. Pass 3: generate
department-specific recommendations for Marketing, Digital/Dev, and
Leadership. Extended thinking gives the model room to reason before
committing to output, reducing confident-sounding errors.
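A sketch of the call shape with extended thinking enabled (the prompt here is heavily abbreviated; the model id, token budgets, and helper names are illustrative, not the production values):

```python
def build_analysis_prompt(meta_data: dict, hs_data: dict, trend: dict) -> str:
    # The three passes are spelled out in the prompt itself; the real
    # prompt is much longer and includes KPI definitions and thresholds
    return (
        "Pass 1: read the raw data.\n"
        "Pass 2: identify patterns and anomalies against the 4-week trend.\n"
        "Pass 3: write recommendations for Marketing, Digital/Dev, Leadership.\n\n"
        f"This week: {meta_data}\nLeads: {hs_data}\nTrend: {trend}"
    )

def generate_claude_analysis(meta_data, hs_data, trend, api_key):
    import anthropic  # deferred so the module imports without the SDK
    client = anthropic.Anthropic(api_key=api_key)
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model id
        max_tokens=16000,  # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": 8000},
        messages=[{"role": "user",
                   "content": build_analysis_prompt(meta_data, hs_data, trend)}],
    )
    # The final text block follows the thinking block(s) in the response
    return next(b.text for b in resp.content if b.type == "text")
```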
Structured scannable output
Replaced paragraph analysis with [UP] /
[DOWN] / [FLAT] / [ALERT] /
[OK] rendered as colored HTML indicators. A stakeholder
can scan the full report in under 30 seconds and know exactly what
needs attention.
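A sketch of how the tags map to colored HTML indicators (the hex colors are illustrative):

```python
INDICATOR_COLORS = {
    "UP": "#1a7f37",     # green
    "OK": "#1a7f37",
    "FLAT": "#6e7781",   # gray
    "DOWN": "#cf222e",   # red
    "ALERT": "#cf222e",
}

def indicator_html(tag: str) -> str:
    # Inline styles only: HTML email clients strip <style> blocks
    color = INDICATOR_COLORS[tag]
    return f'<span style="color:{color};font-weight:bold">[{tag}]</span>'
```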
Added CloudWatch structured logging for Claude's raw response and
parser output, essential for diagnosing output deviations in a
serverless system where there is no console to inspect.
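The logging itself is a thin helper; the discipline is one JSON object per line (field names here are illustrative):

```python
import json
import logging

logger = logging.getLogger("weekly_report")

def structured_log(event: str, **fields) -> str:
    # One JSON object per log line makes the output queryable in
    # CloudWatch Logs Insights instead of grep-only free text
    line = json.dumps({"event": event, **fields})
    logger.info(line)
    return line
```

For example, logging the raw Claude response length and the parser's extracted keys makes a silent parse failure visible in one query.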
What Broke, What Got Fixed, What Got Added
Real production iteration.
Fix
Meta app in Development mode, blocked all API calls
Lambda returned OAuthException code 200; every API call was
blocked silently. Root cause: the Meta app was in
Development (Unpublished) mode, restricting access to app admins
only. Fix: published app to Live, generated a new system user
token, updated the secret in Secrets Manager. Cost: two days of
debugging. Fix time: five minutes. Now a first-deploy checklist
item.
Fix
KPI cards triple-rendering in Outlook
KPI summary cards appeared three times in Outlook due to nested
HTML tables. Replaced with flat
<td width=25%> layout, Outlook-compatible, no
nesting. HTML email rendering is its own compatibility layer
entirely.
Fix
CPL calculated against wrong spend denominator
Cost per lead was calculated against total account spend instead
of campaign-level spend. Produced inflated CPL figures that
overstated advertising costs. Fixed to use only spend attributed
to lead-generating campaigns.
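The corrected calculation, sketched (input shapes are illustrative):

```python
def cost_per_lead(campaign_spend: dict, leads_by_campaign: dict) -> dict:
    # Denominator fix: divide each campaign's own spend by its own leads,
    # skipping campaigns with zero leads instead of inflating their CPL
    cpl = {}
    for campaign_id, leads in leads_by_campaign.items():
        if leads:
            cpl[campaign_id] = round(campaign_spend.get(campaign_id, 0.0) / leads, 2)
    return cpl
```

Dividing total account spend by one campaign's leads is what produced the inflated figures; scoping both numerator and denominator to the same campaign fixes it.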
Added
4-week rolling trend window
Extended get_previous_week_metrics to return a
4-week dictionary and added get_all_time_totals.
Trend data feeds into Claude so analysis reflects directional
movement across a month, not just a single point-in-time
comparison.
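The window itself is just the four preceding Mondays, which key the DynamoDB lookups (a sketch; the production helper returns the metric rows, not just the dates):

```python
from datetime import date, timedelta

def previous_week_starts(week_start: date, n: int = 4) -> list:
    # The n Mondays preceding week_start, most recent first; each date
    # is a sort-key value for a per-campaign DynamoDB lookup
    return [(week_start - timedelta(weeks=i)).isoformat() for i in range(1, n + 1)]
```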
Added
HubSpot lead attribution integration
New Lead Activity section pulls from HubSpot's API, showing
which campaign and ad set generated each lead, with calculated
CPL and landing page views. Green status when leads exist that
week, yellow when none, visible at a glance.
Outcomes
Running in production. Zero manual steps.
7
AWS services integrated in a single cohesive pipeline
0
Manual steps required to produce a report after deployment
6
Leadership team members receiving structured AI-generated analysis
weekly
This system runs in production at a real company. Every Monday at 7am,
it fetches live data, reasons over it, and delivers a formatted
report, without anyone touching it. That's the outcome that matters.
What I'd Do Differently
Three things I'd change from day one.
01
Add structured CloudWatch logging from the start
Debug logging for the Claude response parser was added after the
first production issues. In a serverless system with no
interactive console, structured logging is your only visibility
into what actually happened inside a function. It should be a
first-commit item, not an afterthought.
02
Test the Meta app publishing state before go-live
The Development mode issue cost two days. The fix was five
minutes. App publishing state, system user token scope, and API
permission tiers should be checklist items before any first
deploy involving the Meta Graph API.
03
Design the AI prompt around the reader, not the data
Early prompts asked Claude to analyze data. Better prompts told
Claude who would read the output, what decisions they needed to
make, and what useful vs. alarming looked like in this context.
The stability-first philosophy came from asking "what would make
a marketing director trust this report?"