Building a Resume-Native AI Assistant for My Portfolio
Designing an assistant that answers from structured portfolio data, uses semantic retrieval over resume content, and safely proxies GitHub Models through Cloudflare Workers.
Context
I wanted the portfolio to do more than display information. I wanted it to answer for itself.
Not with a generic chatbot and not with a prompt that tries to improvise from thin air. The goal was narrower and more useful: let a visitor ask questions about my experience, projects, technical strengths, and background, and get answers grounded in the actual content already published on the site.
That design constraint shaped the entire implementation.
The assistant is not a toy layer on top of a marketing site. It is a retrieval system built around structured resume data, deterministic guardrails, and a thin production-safe proxy to GitHub Models.
What I Built
The result is an AI assistant embedded directly into the portfolio experience that:
- loads a generated static resume dataset from the site itself
- converts that content into searchable snippets
- answers common questions locally when a deterministic path is safer and faster
- ranks the most relevant context using embeddings and keyword fallback
- calls GitHub Models through a Cloudflare Worker proxy
- returns concise answers with citations tied to the underlying portfolio content
In practical terms, visitors can ask questions like:
- What kind of backend systems has he built?
- Has he worked on payment infrastructure?
- What technologies does he use most often?
- What is his current role and background?
That sounds straightforward. The real work is in making the answers accurate, constrained, fast, and maintainable.
The Core Design Decision
The most important decision was to treat the portfolio as a data product, not just a set of pages.
Because this site is statically exported, the assistant cannot rely on traditional server-side Next.js API routes. Instead, the content pipeline generates a static resume payload at build time and publishes it as a site asset.
That gave me a clean contract:
- author content once in the portfolio
- normalize it into a structured resume payload
- let the assistant reason only over that approved dataset
This is a much better engineering tradeoff than scraping rendered HTML or letting a model guess from partial page context.
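The build-time contract above can be sketched roughly as follows. This is a minimal illustration under assumed names (`ResumePayload`, `buildResumePayload` and the field shapes are mine, not the actual implementation); the real build step writes the result to a static asset served at /api/resume.json.

```typescript
// Illustrative sketch of the build-time normalization step: authored content
// goes in, the single approved resume payload comes out. The build script
// then publishes this as a static site asset (e.g. public/api/resume.json).
interface ResumePayload {
  summary: string;
  experience: { company: string; role: string; period: string }[];
  education: { school: string; degree: string }[];
  links: { label: string; url: string }[];
}

function buildResumePayload(content: {
  about: { summary: string };
  experience: ResumePayload["experience"];
  education: ResumePayload["education"];
  links: ResumePayload["links"];
}): ResumePayload {
  // Normalize once at build time; the assistant reasons only over this dataset.
  return {
    summary: content.about.summary.trim(),
    experience: content.experience,
    education: content.education,
    links: content.links,
  };
}
```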
Abstract Architecture Diagram
Visitor Question
|
v
Portfolio Assistant UI
|
+--> Load /api/resume.json
|
+--> Build normalized snippets
|
+--> Try local deterministic answer
|
+--> Retrieve relevant context
|
+--> Embeddings available -> semantic ranking
|
+--> Otherwise -> keyword ranking fallback
|
v
Cloudflare Worker /assistant
|
+--> origin validation
+--> payload validation
+--> rate limiting
+--> token isolation
|
v
GitHub Models
- embeddings
- chat completions
|
v
Structured JSON response + citations
|
v
Grounded answer rendered in the portfolio
Request Lifecycle Diagram
The full request path is a little more interesting than the high-level diagram suggests, because there are several control points before the model is ever asked to answer.
1. Visitor opens the footer assistant drawer
|
v
2. Client loads /api/resume.json
|
+--> generated at build time from content/about, content/home,
| content/settings, content/recommendations, and content/posts
|
v
3. Client normalizes content into snippets
|
+--> summary, about, skills, links, contact
+--> experience and education entries
+--> projects, articles, case studies, recommendations
|
v
4. Client computes a content hash
|
+--> used as the cache key for snippet embeddings
|
v
5. Visitor submits a question
|
+--> local guardrails check scope and reject off-topic or adversarial prompts
|
+--> local deterministic answer path tries known patterns first
| - current role
| - education
| - common links
| - top technologies
| - payment-related experience
|
+--> if local answer exists, return immediately with citations
|
v
6. Retrieval phase selects supporting evidence
|
+--> embeddings ready
| - embed the user question
| - score all snippets by cosine similarity
|
+--> embeddings unavailable
| - fall back to keyword overlap scoring
|
v
7. Only the top-ranked snippets are sent to the model
|
+--> recent chat context is included
+--> response is constrained to a JSON schema
|
v
8. Cloudflare Worker /assistant
|
+--> validate allowed origin
+--> enforce POST + application/json
+--> parse and validate request payload
+--> apply per-IP rate limiting
+--> keep GitHub Models token on the server boundary
|
v
9. GitHub Models
|
+--> embeddings endpoint for semantic retrieval
+--> chat completions endpoint for grounded answer generation
|
v
10. Client validates model output
|
+--> parse JSON
+--> enforce answer shape
+--> discard invalid citation IDs
+--> downgrade unsupported answers to "missing"
|
v
11. UI renders final answer + snippet citations
Why the Resume API Matters
The assistant is powered by a generated resume.json representation of the site content. That payload aggregates the information that actually matters to a recruiter, hiring manager, or engineering leader:
- summary and about information
- experience history
- education
- links
- projects
- articles and case studies
- recommendations and proof points
That means the assistant is reasoning over normalized, intentional content rather than a noisy DOM.
From an architecture perspective, this matters for three reasons:
- consistency: the UI and the assistant share the same source of truth
- reliability: the payload shape is stable and easy to evolve
- observability: retrieval logic can work on well-defined content chunks instead of ambiguous page fragments
Embeddings and Retrieval Strategy
This is where the assistant becomes meaningfully useful.
I break the resume payload into small, semantically focused snippets. Each snippet represents a coherent unit such as:
- a role in my experience timeline
- an education entry
- a project or article
- a recommendation
- a skills or summary block
Those snippets are then embedded through GitHub Models and ranked against the user’s question using vector similarity.
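Snippet construction might look roughly like this. The shape is a hedged sketch: `Snippet`, `buildSnippets`, and the ID scheme are assumptions for illustration, not the exact code, but the idea of one semantically focused unit per snippet is the same.

```typescript
// Turn the resume payload into small, citable retrieval units.
interface Snippet {
  id: string;   // stable ID, later used for citations
  kind: "summary" | "experience" | "education" | "project";
  text: string; // the coherent unit of content that gets embedded
}

function buildSnippets(resume: {
  summary: string;
  experience: { company: string; role: string; period: string; highlights: string[] }[];
}): Snippet[] {
  const snippets: Snippet[] = [
    { id: "summary", kind: "summary", text: resume.summary },
  ];
  resume.experience.forEach((job, i) => {
    // One snippet per role keeps each unit semantically focused.
    snippets.push({
      id: `experience-${i}`,
      kind: "experience",
      text: `${job.role} at ${job.company} (${job.period}). ${job.highlights.join(" ")}`,
    });
  });
  return snippets;
}
```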
Retrieval flow
- the client loads the resume data
- it builds a snippet set and computes a hash of the content
- snippet embeddings are cached locally by model and content hash
- when a user asks a question, the question is embedded
- the system selects the highest-signal snippets and sends only that context to the model
This does two things well:
- it improves answer quality because the model is given the most relevant evidence instead of the whole resume
- it keeps the design efficient by avoiding unnecessary repeated embedding work for unchanged content
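The ranking step itself is small. A minimal sketch, assuming embeddings are plain number arrays keyed by snippet ID:

```typescript
// Score each snippet embedding against the question embedding by cosine
// similarity and keep only the top k as model context.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

function rankSnippets(
  questionEmbedding: number[],
  snippetEmbeddings: { id: string; vector: number[] }[],
  topK = 5,
): string[] {
  return snippetEmbeddings
    .map((s) => ({ id: s.id, score: cosineSimilarity(questionEmbedding, s.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.id);
}
```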
The hash-based cache boundary matters more than it may seem. It means embeddings are tied to the exact published content state. If the resume changes, the cache key changes too, so the assistant does not accidentally reuse vectors generated for older content.
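That cache boundary can be sketched like this. Note this is an illustration, not the real code: the browser client would hash with the Web Crypto API, while this self-contained sketch uses `node:crypto`, and the key format is assumed.

```typescript
// Derive the embedding cache key from the model name plus a hash of the
// exact snippet content, so any content change invalidates cached vectors.
import { createHash } from "node:crypto";

function contentHash(snippets: { id: string; text: string }[]): string {
  // Canonical serialization with unambiguous separators before hashing.
  const canonical = snippets.map((s) => `${s.id}\u0000${s.text}`).join("\u0001");
  return createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}

function embeddingCacheKey(model: string, snippets: { id: string; text: string }[]): string {
  return `embeddings:${model}:${contentHash(snippets)}`;
}
```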
I also added a keyword-based fallback path. If embeddings are unavailable for any reason, the assistant still works by ranking snippets through lexical overlap. That is an important operational detail: degraded mode is still useful mode.
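The fallback scorer is deliberately unsophisticated. A sketch of the kind of lexical overlap that suffices here (the tokenization rules are an assumption):

```typescript
// Degraded-mode ranking: score a snippet by what fraction of the question's
// meaningful tokens also appear in the snippet text.
function keywordScore(question: string, snippetText: string): number {
  const tokenize = (s: string) =>
    new Set(s.toLowerCase().split(/\W+/).filter((w) => w.length > 2));
  const q = tokenize(question);
  const t = tokenize(snippetText);
  let overlap = 0;
  for (const word of q) if (t.has(word)) overlap++;
  return q.size === 0 ? 0 : overlap / q.size;
}
```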
A Safer Model Boundary with Cloudflare Workers
I did not want model credentials exposed in the browser, and I did not want the frontend talking directly to a third-party inference endpoint.
So I added a Cloudflare Worker route that acts as a narrow proxy for the assistant.
The worker handles:
- origin validation so requests only come from approved sites
- schema validation for both chat and embeddings payloads
- rate limiting to prevent abuse, with a bounded per-IP request window
- secret isolation so the GitHub Models token never leaves the server boundary
- uniform response handling across embeddings and chat completions
In practical terms, the Worker is not trying to become an AI orchestration layer. It is intentionally narrow. The frontend decides what context to send, and the Worker decides whether the request is valid and safe to proxy.
This is a simple pattern, but it is the right one. It turns a frontend AI feature into a controlled service interface rather than an open client-side experiment.
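The Worker's shape is roughly the following. This is a hedged sketch, not the production code: the origin list, the upstream endpoint URL, and the env binding names are assumptions, and the payload validation and per-IP rate limiting are elided to comments.

```typescript
// Narrow Cloudflare Worker proxy: validate, then forward to GitHub Models.
const ALLOWED_ORIGINS = new Set(["https://example.dev"]); // hypothetical origin

export function isAllowedOrigin(origin: string | null): boolean {
  return origin !== null && ALLOWED_ORIGINS.has(origin);
}

export default {
  async fetch(request: Request, env: { GITHUB_MODELS_TOKEN: string }): Promise<Response> {
    if (!isAllowedOrigin(request.headers.get("Origin"))) {
      return new Response("Forbidden", { status: 403 });
    }
    if (request.method !== "POST" ||
        !request.headers.get("Content-Type")?.includes("application/json")) {
      return new Response("Bad Request", { status: 400 });
    }
    const body = await request.json();
    // Payload schema validation and per-IP rate limiting (keyed on the
    // CF-Connecting-IP header) would sit here; elided for brevity.
    const upstream = await fetch("https://models.github.ai/inference/chat/completions", {
      method: "POST",
      headers: {
        // The token lives only in the Worker environment, never in the browser.
        Authorization: `Bearer ${env.GITHUB_MODELS_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    return new Response(upstream.body, { status: upstream.status });
  },
};
```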
Why GitHub Models
GitHub Models was a strong fit for this implementation because it provided a clean way to call both embeddings and chat inference through a familiar developer workflow.
That let me keep the integration focused on product and architecture concerns:
- retrieving the right context
- structuring the prompt correctly
- validating the model output
- controlling cost and abuse boundaries
In other words, the interesting engineering problem here was never “how do I call an LLM API.” It was “how do I make the model answer the right question from the right evidence in a portfolio setting.”
Guardrails and Deterministic Shortcuts
One of the mistakes people make with assistants is sending every query directly to the model.
I took a more disciplined approach.
Before the remote model is involved, the assistant first checks for:
- off-topic or adversarial prompts
- prompt-injection style instructions
- questions that can be answered deterministically from known patterns
That means common questions like current role, core skills, education, or portfolio links can often be handled immediately without a full model round trip.
This improves latency, reduces token usage, and tightens correctness.
It also gives the assistant a more senior-engineered feel: use the model where it adds value, not where a clear rule-based answer already exists.
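The deterministic path amounts to pattern matching against a small set of known question shapes. A sketch, with illustrative patterns and citation IDs that stand in for the real ones:

```typescript
// Try to answer common questions directly from the resume payload, with
// citations, before involving the model at all.
interface LocalAnswer { answer: string; citations: string[] }

function tryLocalAnswer(
  question: string,
  resume: { currentRole: string; education: string },
): LocalAnswer | null {
  const q = question.toLowerCase();
  if (/current (role|position|job)/.test(q)) {
    return { answer: resume.currentRole, citations: ["experience-0"] };
  }
  if (/(education|degree|studied)/.test(q)) {
    return { answer: resume.education, citations: ["education-0"] };
  }
  return null; // no deterministic match: fall through to retrieval + model
}
```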
There is another important constraint in the implementation: even when the model is used, an answer is only accepted as a real answer if it comes back with valid citations that map to the provided snippet IDs. If the model returns an unsupported answer shape or cites content outside the retrieved context, the client treats that as a missing-information case instead of trusting it.
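That acceptance check can be sketched as follows, assuming a response shape with `status`, `answer`, and `citations` fields (the field names are illustrative):

```typescript
// Accept the model's output only if it parses, matches the expected shape,
// and cites snippet IDs that were actually part of the retrieved context.
interface AssistantResponse {
  status: "answered" | "missing";
  answer: string;
  citations: string[];
}

function validateModelOutput(raw: string, providedSnippetIds: Set<string>): AssistantResponse {
  const missing: AssistantResponse = { status: "missing", answer: "", citations: [] };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return missing; // unparseable output is never trusted
  }
  const r = parsed as Partial<AssistantResponse>;
  if (typeof r.answer !== "string" || !Array.isArray(r.citations)) {
    return missing; // wrong shape
  }
  // Discard citation IDs that were never in the retrieved context.
  const citations = r.citations.filter(
    (id) => typeof id === "string" && providedSnippetIds.has(id),
  );
  if (citations.length === 0) {
    return missing; // an unsupported answer is downgraded, not rendered
  }
  return { status: "answered", answer: r.answer, citations };
}
```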
Engineering Tradeoffs
There are a few deliberate tradeoffs in this design.
1. Static data over live repository introspection
The assistant answers from the published site content, not from repo activity, commit history, or unstated knowledge. That makes the experience more trustworthy even if it is narrower.
2. Retrieval-first over open-ended conversation
I optimized for factual grounding rather than personality. The assistant is there to help a visitor understand my work quickly and accurately.
3. Progressive enhancement over hard dependency
If embeddings fail, the assistant does not disappear. It falls back gracefully and still answers from keyword-ranked evidence.
4. Structured output over free-form generation
The response is expected in a constrained JSON shape with answer status and citations. That gives the UI something deterministic to render and makes failure modes easier to manage.
5. Thin worker over backend sprawl
I deliberately kept the backend boundary small. The Worker does not own retrieval, snippet construction, or portfolio-specific business logic. That keeps the moving parts understandable and makes the production surface easier to secure.
What I Like About This Build
What I like most is that the system stays honest.
It does not pretend to know more than the portfolio actually says. It does not blur the line between retrieval and invention. And it does not rely on backend sprawl to deliver a relatively small but high-leverage user experience.
This is the kind of AI integration I find compelling in real systems: bounded, observable, useful, and aligned with the actual product surface.
Closing Thoughts
There is a broader lesson here.
Good AI product engineering is rarely about adding a model to the stack and calling it innovation. The real work is in designing the data boundary, controlling failure modes, choosing where determinism should win, and making sure the system remains legible when something goes wrong.
That is what this assistant represents for me.
A portfolio is supposed to communicate capability. Building a resume-native assistant on top of structured content is a strong way to demonstrate not just that I can use AI tools, but that I can integrate them with sound engineering judgment.