Building a Resume-Native AI Assistant for My Portfolio

Designing an assistant that answers from structured portfolio data, merges keyword and Cloudflare semantic retrieval, and routes grounded responses through a unified Cloudflare Worker.

Tags: ai, workers-ai, vectorize, cloudflare-workers, retrieval, nextjs, portfolio

Context

I wanted the portfolio to do more than display information. I wanted it to answer for itself.

Not with a generic chatbot and not with a prompt that tries to improvise from thin air. The goal was narrower and more useful: let a visitor ask questions about my experience, projects, technical strengths, and background, and get answers grounded in the actual content already published on the site.

That design constraint shaped the entire implementation.

The assistant is not a toy layer on top of a marketing site. It is a retrieval system built around structured resume data, deterministic guardrails, a Cloudflare-backed semantic retrieval layer, and a routed model boundary that can fail over between providers without breaking the user experience.

What I Built

The result is an AI assistant embedded directly into the portfolio experience that:

  • loads a generated static resume dataset from the site itself
  • converts that content into searchable snippets
  • answers common questions locally when a deterministic path is safer and faster
  • merges client-side keyword retrieval with Cloudflare semantic retrieval
  • folds recent conversation turns into retrieval so follow-up questions still make sense
  • sends only the strongest supporting snippets to the routed assistant endpoint
  • routes generation through a Cloudflare Worker across Groq, Groq Backup, Hugging Face, Cloudflare AI, and a final portfolio-RAG fallback
  • returns concise answers with citations tied to the underlying portfolio content
  • links cited projects, articles, and case studies back to their actual pages when possible
  • exposes both a site debug panel and a worker homepage console for inspecting providers, retrieval, and grounded RAG behavior

In practical terms, visitors can ask questions like:

  • What kind of backend systems has he built?
  • Has he worked on payment infrastructure?
  • What technologies does he use most often?
  • What is his current role and background?

That sounds straightforward. The real work is in making the answers accurate, constrained, fast, and maintainable.

The Core Design Decision

The most important decision was to treat the portfolio as a data product, not just a set of pages.

Because this site is statically exported, the assistant cannot rely on traditional server-side Next.js API routes. Instead, the content pipeline generates a static resume payload at build time and publishes it as a site asset.

That gave me a clean contract:

  1. author content once in the portfolio
  2. normalize it into a structured resume payload
  3. let the assistant reason only over that approved dataset

This is a much better engineering tradeoff than scraping rendered HTML or letting a model guess from partial page context.
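
As a minimal sketch of that contract, the build step can be thought of as a pure function from authored content to one normalized payload. The `ContentSources` and `ResumePayload` shapes and the `buildResumePayload` name are illustrative assumptions, not the exact production schema.

```typescript
// Hypothetical shapes: the real content files and resume.json fields may differ.
interface PostEntry {
  slug: string;
  title: string;
  kind: "project" | "article" | "case-study";
  excerpt: string;
}

interface ContentSources {
  about: { summary: string };
  posts: PostEntry[];
}

interface ResumePayload {
  generatedAt: string; // ISO timestamp of the build
  summary: string;
  projects: PostEntry[];
  articles: PostEntry[];
  caseStudies: PostEntry[];
}

// Pure build-time step: authored content in, one normalized payload out.
function buildResumePayload(sources: ContentSources, now: Date = new Date()): ResumePayload {
  const ofKind = (kind: PostEntry["kind"]) => sources.posts.filter((p) => p.kind === kind);
  return {
    generatedAt: now.toISOString(),
    summary: sources.about.summary.trim(),
    projects: ofKind("project"),
    articles: ofKind("article"),
    caseStudies: ofKind("case-study"),
  };
}
```

A build script could serialize the result with `JSON.stringify` and publish it as the static /api/resume.json asset.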

Architecture Diagram

Visitor Question
      |
      v
Portfolio Assistant UI
      |
      +--> Load /api/resume.json
      |
      +--> Build normalized snippets
      |
      +--> Local guardrails + intent normalization
      |
      +--> Keyword retrieval on the client
      |
      +--> POST /assistant-retrieve
              |
              +--> Workers AI embeddings
              +--> Vectorize semantic search
              +--> KV chunk lookup
      |
      +--> Merge keyword + semantic matches
      |
      v
Cloudflare Worker /assistant-routed
      |
      +--> origin validation
      +--> payload validation
      +--> rate limiting
      +--> provider routing
              |
              +--> Groq
              +--> Groq Backup
              +--> Hugging Face
              +--> Cloudflare AI
              +--> Portfolio RAG fallback
      |
      v
Structured JSON response + citations
      |
      v
Grounded answer rendered in the portfolio

Request Lifecycle Diagram

The full request path is a little more interesting than the high-level diagram suggests, because there are several control points before the model is ever asked to answer.

1. Visitor opens the footer assistant drawer
   |
   v
2. Client loads /api/resume.json
   |
   +--> generated at build time from content/about, content/home,
   |    content/settings, content/recommendations, and content/posts
   |
   v
3. Client normalizes content into snippets
   |
   +--> summary, about, skills, links, contact
   +--> experience and education entries
   +--> projects, articles, case studies, recommendations
   |
   v
4. Visitor submits a question
   |
   +--> local guardrails check scope and reject off-topic or adversarial prompts
   |
   +--> local deterministic answer path tries known patterns first
   |    - current role
   |    - education
   |    - common links
   |    - top technologies
   |    - payment-related experience
   |
   +--> if local answer exists, return immediately with citations
   |
   v
5. Retrieval phase builds the search intent
   |
   +--> combine current question with a small window of recent conversation
   +--> preserve who said what so follow-up turns keep their meaning
   |
   v
6. Retrieval phase selects supporting evidence
   |
   +--> client keyword retrieval
   |    - broad lexical and synonym-aware matching
   |    - good for links, abbreviations, and direct listing prompts
   |
   +--> worker semantic retrieval
   |    - POST /assistant-retrieve
   |    - embed the retrieval query with Workers AI
   |    - score stored portfolio chunks in Vectorize
   |    - hydrate chunk text from KV
   |
   v
7. Only the top-ranked snippets are sent to the model
   |
   +--> recent chat context is included
   +--> response is constrained to a JSON schema
   |
   v
8. Cloudflare Worker /assistant-routed
   |
   +--> validate allowed origin
   +--> enforce POST + application/json
   +--> parse and validate request payload
   +--> apply per-IP rate limiting
   +--> route across configured providers until one returns a grounded answer
   |
   v
9. Routed model providers
   |
   +--> Groq primary
   +--> Groq backup model
   +--> Hugging Face router
   +--> Cloudflare Workers AI
   +--> portfolio-RAG last fallback
   |
   v
10. Client validates model output
    |
    +--> parse JSON
    +--> enforce answer shape
    +--> discard invalid citation IDs
    +--> downgrade unsupported answers to "missing"
    +--> if the model path fails but retrieval is strong, show a closest-match fallback
    |
    v
11. UI renders final answer + snippet citations
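
Steps 8 and 9 above amount to a failover loop. The sketch below shows one way to express it, assuming a hypothetical `Provider` interface in which each provider (Groq, Hugging Face, and so on) is wrapped behind a `generate` function. It illustrates the pattern and is not the Worker's actual code.

```typescript
interface RoutedAnswer {
  status: "answered" | "missing";
  answer: string;
  citations: string[];
}

// Hypothetical wrapper: a real provider would make an HTTP call inside generate().
interface Provider {
  name: string;
  generate: (prompt: string) => Promise<RoutedAnswer>;
}

// Try providers in order; accept the first answer whose citations all map to
// snippet IDs that were actually sent. Errors and ungrounded answers fall through.
async function routeAcrossProviders(
  providers: Provider[],
  prompt: string,
  allowedIds: string[]
): Promise<{ provider: string; result: RoutedAnswer } | null> {
  for (const provider of providers) {
    try {
      const result = await provider.generate(prompt);
      const grounded =
        result.status === "answered" &&
        result.citations.length > 0 &&
        result.citations.every((id) => allowedIds.indexOf(id) !== -1);
      if (grounded) return { provider: provider.name, result };
    } catch {
      // Rate limit, timeout, or malformed output: move on to the next provider.
    }
  }
  return null; // Every provider failed; the caller decides on a fallback.
}
```

Returning `null` rather than throwing keeps the "all providers failed" case explicit, so the caller can decide whether a closest-match fallback is appropriate.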

Why the Resume API Matters

The assistant is powered by a generated resume.json representation of the site content. That payload aggregates the information that actually matters to a recruiter, hiring manager, or engineering leader:

  • summary and about information
  • experience history
  • education
  • links
  • projects
  • articles and case studies
  • recommendations and proof points

That means the assistant is reasoning over normalized, intentional content rather than a noisy DOM.

From an architecture perspective, this matters for three reasons:

  • consistency: the UI and the assistant share the same source of truth
  • reliability: the payload shape is stable and easy to evolve
  • observability: retrieval logic can work on well-defined content chunks instead of ambiguous page fragments

Retrieval Strategy

This is where the assistant becomes meaningfully useful.

I break the resume payload into small, semantically focused snippets. Each snippet represents a coherent unit such as:

  • a role in my experience timeline
  • an education entry
  • a project or article
  • a recommendation
  • a skills or summary block
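
A minimal sketch of that snippet breakdown follows. The `ResumeData` fields, the ID format, and the `/posts/` URL prefix are assumptions made for illustration; the real payload carries more sections.

```typescript
// Hypothetical subset of the resume payload.
interface ResumeData {
  summary: string;
  experience: { company: string; title: string; description: string }[];
  projects: { slug: string; title: string; description: string }[];
}

interface Snippet {
  id: string; // stable ID the model can cite
  kind: "summary" | "experience" | "project";
  text: string;
  url?: string; // present only when the snippet maps to a real page
}

// One snippet per coherent unit: the summary, each role, each project.
function toSnippets(resume: ResumeData): Snippet[] {
  const snippets: Snippet[] = [{ id: "summary", kind: "summary", text: resume.summary }];
  resume.experience.forEach((role, index) => {
    snippets.push({
      id: `experience-${index}`,
      kind: "experience",
      text: `${role.title} at ${role.company}. ${role.description}`,
    });
  });
  resume.projects.forEach((project) => {
    snippets.push({
      id: `project-${project.slug}`,
      kind: "project",
      text: `${project.title}. ${project.description}`,
      url: `/posts/${project.slug}`, // assumed URL scheme
    });
  });
  return snippets;
}
```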

Those snippets are then matched in two ways:

  • keyword and intent ranking on the client
  • semantic ranking through Cloudflare Workers AI + Vectorize on the worker
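
The client-side keyword pass can be sketched as a simple term-overlap score with synonym expansion. The synonym table and scoring formula here are assumptions for illustration; the actual ranking also weighs intent.

```typescript
interface TextSnippet {
  id: string;
  text: string;
}

interface ScoredId {
  id: string;
  score: number;
}

// Hypothetical synonym table; a Map avoids collisions with Object.prototype keys.
const SYNONYMS = new Map<string, string[]>([
  ["payments", ["payment", "billing", "checkout"]],
]);

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Score = fraction of expanded query terms that appear in the snippet text.
function keywordRank(query: string, snippets: TextSnippet[]): ScoredId[] {
  const terms: string[] = [];
  const addTerm = (t: string) => {
    if (terms.indexOf(t) === -1) terms.push(t);
  };
  tokenize(query).forEach((t) => {
    addTerm(t);
    (SYNONYMS.get(t) ?? []).forEach(addTerm);
  });
  if (terms.length === 0) return [];
  return snippets
    .map((snippet) => {
      const words = tokenize(snippet.text);
      const hits = terms.filter((t) => words.indexOf(t) !== -1).length;
      return { id: snippet.id, score: hits / terms.length };
    })
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score);
}
```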

Retrieval flow

  • the client loads the resume data
  • it builds a snippet set from the generated portfolio payload
  • when a user asks a question, the client first computes a keyword-oriented retrieval pass
  • the worker then performs semantic retrieval over the build-time Vectorize index
  • the client merges those result sets and sends only the highest-signal snippets to the routed assistant

This does two things well:

  1. it improves answer quality because the model is given the most relevant evidence instead of the whole resume
  2. it keeps the design operationally simpler by reusing build-time Cloudflare vectors instead of recomputing a full snippet embedding corpus in the browser
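
The merge step could look like the following weighted blend. The weights and `topK` default are illustrative assumptions, not tuned production values.

```typescript
interface ScoredSnippet {
  id: string;
  score: number; // assumed to be normalized to the 0..1 range by each retriever
}

// Blend keyword and semantic scores per snippet ID, then keep the strongest few.
function mergeResults(
  keyword: ScoredSnippet[],
  semantic: ScoredSnippet[],
  topK: number = 4,
  semanticWeight: number = 0.6
): ScoredSnippet[] {
  const combined = new Map<string, number>();
  keyword.forEach((k) => combined.set(k.id, (1 - semanticWeight) * k.score));
  semantic.forEach((s) => combined.set(s.id, (combined.get(s.id) ?? 0) + semanticWeight * s.score));

  const merged: ScoredSnippet[] = [];
  combined.forEach((score, id) => merged.push({ id, score }));
  return merged.sort((a, b) => b.score - a.score).slice(0, topK);
}
```

A snippet found by both retrievers accumulates both contributions, so agreement between the lexical and semantic passes naturally pushes it up the list.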

The important operational property is that degraded mode is still a useful mode. If the semantic retrieval path is unavailable or weak, the assistant still has keyword retrieval, deterministic answers for some well-known prompts, and a routed provider chain that can move on to the next provider when a response is missing or malformed.
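
To make the semantic path and its degraded mode concrete, here is a hedged sketch of the worker-side lookup. The `RetrievalEnv` interface is a minimal structural stand-in for the Workers AI, Vectorize, and KV bindings; the binding names (`AI`, `VECTORIZE`, `CHUNKS`) and the embedding model are assumptions.

```typescript
// Minimal structural types standing in for the real Cloudflare bindings.
interface RetrievalEnv {
  AI: { run(model: string, input: { text: string[] }): Promise<{ data: number[][] }> };
  VECTORIZE: {
    query(vector: number[], options: { topK: number }): Promise<{ matches: { id: string; score: number }[] }>;
  };
  CHUNKS: { get(key: string): Promise<string | null> };
}

interface SemanticMatch {
  id: string;
  score: number;
  text: string;
}

const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5"; // assumed model choice

// Embed the query, score stored chunks, then hydrate chunk text from KV.
async function semanticRetrieve(env: RetrievalEnv, query: string, topK: number = 5): Promise<SemanticMatch[]> {
  const embedding = await env.AI.run(EMBEDDING_MODEL, { text: [query] });
  const result = await env.VECTORIZE.query(embedding.data[0], { topK });
  const hydrated = await Promise.all(
    result.matches.map(async (m) => ({ id: m.id, score: m.score, text: await env.CHUNKS.get(m.id) }))
  );
  const matches: SemanticMatch[] = [];
  hydrated.forEach((h) => {
    // Skip vectors whose chunk text is missing from KV.
    if (h.text !== null) matches.push({ id: h.id, score: h.score, text: h.text });
  });
  return matches;
}

// Degraded mode: any failure yields an empty list so keyword retrieval can carry on.
async function retrieveOrDegrade(env: RetrievalEnv, query: string): Promise<SemanticMatch[]> {
  try {
    return await semanticRetrieve(env, query);
  } catch {
    return [];
  }
}
```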

Multi-Turn Retrieval, Not Just Multi-Turn UI

One subtle weakness in many portfolio assistants is that they only use chat history for display, not for retrieval.

I wanted follow-up questions to work more naturally.

So the assistant now uses a small window of recent conversation turns when building the retrieval query. That means questions like:

  • What was his first project?
  • What about his recent work?
  • Did any of that involve payments?

can be interpreted in relation to the previous turns rather than as isolated fragments.

This does not mean the model gets free rein to improvise from conversation memory. The important detail is that retrieval itself becomes context-aware. The system still goes back to the published resume dataset, but it does so with a better understanding of what the visitor is referring to.

The conversation is also persisted locally in the browser, with message role information preserved, so both the UI and retrieval layer know which turns came from the visitor and which came from the assistant.
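
A minimal sketch of building a context-aware retrieval query, assuming a hypothetical `Turn` shape and a window of four turns:

```typescript
interface Turn {
  role: "visitor" | "assistant";
  text: string;
}

// Prefix each recent turn with its role so follow-ups keep their meaning,
// then append the current question as the final visitor turn.
function buildRetrievalQuery(history: Turn[], question: string, windowSize: number = 4): string {
  const recent = windowSize > 0 ? history.slice(-windowSize) : [];
  const lines = recent.map((turn) => `${turn.role}: ${turn.text}`);
  lines.push(`visitor: ${question}`);
  return lines.join("\n");
}
```

With a history that ends in a turn about a first project, a follow-up such as "Did any of that involve payments?" is retrieved alongside the project it refers to, rather than as an isolated fragment.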

Citations That Resolve Back to Real Content

Early citations were useful as proof, but they were still mostly internal identifiers.

I tightened that up so references to projects, posts, and case studies can carry real URLs. When the assistant cites one of those snippets, the UI can render a direct link back to the corresponding page on the portfolio.

That improves two things:

  • the answer feels more inspectable because a visitor can immediately open the source material
  • the assistant becomes a navigation surface, not just a response surface

That is an underrated product detail. In a portfolio, a good answer should often lead the visitor deeper into the work.
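
One way to resolve cited IDs into renderable links is sketched below. The `CitableSnippet` and `RenderedCitation` shapes are assumptions for illustration.

```typescript
interface CitableSnippet {
  id: string;
  title: string;
  url?: string; // only projects, posts, and case studies carry a page URL
}

interface RenderedCitation {
  id: string;
  label: string;
  href: string | null; // null means "render as plain text, not a link"
}

// Map cited IDs to display data, silently dropping IDs that match no snippet.
function resolveCitations(citationIds: string[], snippets: CitableSnippet[]): RenderedCitation[] {
  const byId = new Map<string, CitableSnippet>();
  snippets.forEach((snippet) => byId.set(snippet.id, snippet));

  const rendered: RenderedCitation[] = [];
  citationIds.forEach((id) => {
    const snippet = byId.get(id);
    if (!snippet) return;
    rendered.push({ id, label: snippet.title, href: snippet.url ?? null });
  });
  return rendered;
}
```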

Better Failure Handling When Retrieval Is Strong but Generation Fails

Another practical improvement was separating retrieval failure from generation failure.

Those are not the same problem.

If the model request fails because of a rate limit or temporary upstream issue, but retrieval already found a strong matching snippet, the assistant now has a more graceful degraded path. Instead of immediately saying the information is unavailable, it can return the closest relevant reference and make that uncertainty explicit.

The wording is intentionally careful. It does not pretend to be a full grounded answer if the full answer path did not complete. It says, in effect: here is the closest relevant source I could find.

That is a much more honest fallback than collapsing every upstream failure into "I don't know."
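
A sketch of that degraded path, assuming a hypothetical score threshold of 0.6 and an illustrative message; the real wording and cutoff may differ.

```typescript
interface RankedSnippet {
  id: string;
  title: string;
  score: number;
}

interface FallbackAnswer {
  status: "closest-match";
  answer: string;
  citations: string[];
}

// Used only after generation has failed: surface the best snippet if it is
// strong enough, and say plainly that this is not a full answer.
function closestMatchFallback(ranked: RankedSnippet[], minScore: number = 0.6): FallbackAnswer | null {
  if (ranked.length === 0) return null;
  const best = ranked.reduce((a, b) => (b.score > a.score ? b : a));
  if (best.score < minScore) return null;
  return {
    status: "closest-match",
    answer: `I could not complete a full answer, but the closest relevant source I found is "${best.title}".`,
    citations: [best.id],
  };
}
```

Returning `null` when retrieval is weak keeps the ordinary "missing information" response for cases where there is nothing trustworthy to point at.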

A Safer Model Boundary with Cloudflare Workers

I did not want model credentials exposed in the browser, and I did not want the frontend talking directly to a third-party inference endpoint.

So I added a Cloudflare Worker route that acts as a narrow proxy for the assistant.

The worker handles:

  • origin validation so requests only come from approved sites
  • schema validation for both chat and embeddings payloads
  • rate limiting to prevent abuse, with a bounded per-IP request window
  • secret isolation so the GitHub Models token never leaves the server boundary
  • uniform response handling across embeddings and chat completions

In practical terms, the Worker is not trying to become an AI orchestration layer. It is intentionally narrow. The frontend decides what context to send, and the Worker decides whether the request is valid and safe to proxy.

This is a simple pattern, but it is the right one. It turns a frontend AI feature into a controlled service interface rather than an open client-side experiment.
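
The checks above can be sketched as a small guard that runs before anything is proxied. All names here are hypothetical, and the in-memory limiter is a simplification: a Map held in Worker memory only sees the requests handled by that one isolate, so a production limiter would likely need a shared store.

```typescript
interface IncomingRequest {
  method: string;
  origin: string | null;
  contentType: string | null;
  ip: string;
}

interface GuardResult {
  ok: boolean;
  status: number;
  error?: string;
}

// Fixed-window counter: at most `limit` requests per key per window.
class FixedWindowLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Reject early: origin, then method, then content type, then rate limit.
function guardRequest(
  req: IncomingRequest,
  allowedOrigins: string[],
  limiter: FixedWindowLimiter,
  now: number = Date.now()
): GuardResult {
  if (!req.origin || allowedOrigins.indexOf(req.origin) === -1) {
    return { ok: false, status: 403, error: "origin not allowed" };
  }
  if (req.method !== "POST") return { ok: false, status: 405, error: "POST required" };
  if (!req.contentType || req.contentType.toLowerCase().indexOf("application/json") !== 0) {
    return { ok: false, status: 415, error: "application/json required" };
  }
  if (!limiter.allow(req.ip, now)) return { ok: false, status: 429, error: "rate limit exceeded" };
  return { ok: true, status: 200 };
}
```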

Why GitHub Models

GitHub Models was a strong fit for this implementation because it provided a clean way to call both embeddings and chat inference through a familiar developer workflow.

That let me keep the integration focused on product and architecture concerns:

  • retrieving the right context
  • structuring the prompt correctly
  • validating the model output
  • controlling cost and abuse boundaries

In other words, the interesting engineering problem here was never “how do I call an LLM API.” It was “how do I make the model answer the right question from the right evidence in a portfolio setting.”

Guardrails and Deterministic Shortcuts

One of the mistakes people make with assistants is sending every query directly to the model.

I took a more disciplined approach.

Before the remote model is involved, the assistant first checks for:

  • off-topic or adversarial prompts
  • prompt-injection style instructions
  • questions that can be answered deterministically from known patterns

That means common questions like current role, core skills, education, or portfolio links can often be handled immediately without a full model round trip.

This improves latency, reduces token usage, and tightens correctness.

It also reflects a mature engineering habit: use the model where it adds value, not where a clear rule-based answer already exists.
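
A minimal sketch of that pre-model decision, assuming a few hypothetical regex patterns and a single deterministic shortcut for the current role. The real guardrails cover more patterns than this.

```typescript
interface KnownFacts {
  currentRole: string;
  currentRoleSnippetId: string;
}

interface LocalDecision {
  kind: "reject" | "answer" | "retrieve";
  answer?: string;
  citations?: string[];
}

// Illustrative patterns only; none use the g flag, so test() is stateless.
const BLOCKED_PATTERNS = [
  /ignore (all |any )?(previous|prior|above) instructions/i,
  /system prompt/i,
];
const CURRENT_ROLE_PATTERN = /\b(current|present)\s+(role|job|position)\b/i;

// Decide locally before any network call: reject, answer from known facts,
// or hand off to retrieval and the routed model.
function decideLocally(question: string, facts: KnownFacts): LocalDecision {
  if (BLOCKED_PATTERNS.some((pattern) => pattern.test(question))) {
    return { kind: "reject", answer: "I can only answer questions about this portfolio." };
  }
  if (CURRENT_ROLE_PATTERN.test(question)) {
    return { kind: "answer", answer: facts.currentRole, citations: [facts.currentRoleSnippetId] };
  }
  return { kind: "retrieve" };
}
```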

There is another important constraint in the implementation: even when the model is used, an answer is only accepted as a real answer if it comes back with valid citations that map to the provided snippet IDs. If the model returns an unsupported answer shape or cites content outside the retrieved context, the client treats that as a missing-information case instead of trusting it.
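
That acceptance rule can be sketched as a strict validator on the raw model output. The `ValidatedAnswer` shape and field names are assumptions about the JSON contract, chosen for illustration.

```typescript
interface ValidatedAnswer {
  status: "answered" | "missing";
  answer: string;
  citations: string[];
}

// Accept an answer only if it parses, has the expected shape, and cites at
// least one snippet ID that was actually provided. Everything else is "missing".
function validateModelOutput(raw: string, allowedIds: string[]): ValidatedAnswer {
  const missing: ValidatedAnswer = { status: "missing", answer: "", citations: [] };

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return missing;
  }
  if (typeof parsed !== "object" || parsed === null) return missing;

  const candidate = parsed as { status?: unknown; answer?: unknown; citations?: unknown };
  if (
    candidate.status !== "answered" ||
    typeof candidate.answer !== "string" ||
    !Array.isArray(candidate.citations)
  ) {
    return missing;
  }

  // Discard citation IDs that were not part of the retrieved context.
  const citations = candidate.citations.filter(
    (id): id is string => typeof id === "string" && allowedIds.indexOf(id) !== -1
  );
  if (citations.length === 0) return missing;

  return { status: "answered", answer: candidate.answer, citations };
}
```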

For development, I also added a local debug view that exposes the retrieval mode, the built retrieval query, the top-ranked snippets, and whether the closest-match fallback was used. That makes it much easier to inspect why a question failed, whether the wrong snippets were selected, or whether the model path failed after retrieval had already succeeded.

Engineering Tradeoffs

There are a few deliberate tradeoffs in this design.

1. Static data over live repository introspection

The assistant answers from the published site content, not from repo activity, commit history, or unstated knowledge. That makes the experience more trustworthy even if it is narrower.

2. Retrieval-first over open-ended conversation

I optimized for factual grounding rather than personality. The assistant is there to help a visitor understand my work quickly and accurately.

3. Progressive enhancement over hard dependency

If embeddings fail, the assistant does not disappear. It falls back gracefully and still answers from keyword-ranked evidence. If the generation path fails after retrieval succeeds, it can still surface the closest grounded reference instead of throwing everything away.

4. Structured output over free-form generation

The response is expected in a constrained JSON shape with answer status and citations. That gives the UI something deterministic to render and makes failure modes easier to manage.

5. Thin worker over backend sprawl

I deliberately kept the backend boundary small. The Worker does not own snippet construction or portfolio-specific business logic, and its retrieval role is limited to the semantic lookup behind /assistant-retrieve. That keeps the moving parts understandable and makes the production surface easier to secure.

6. Freshness controls over forever-cached vectors

Client-side caching is great for latency, but AI features age in more ways than plain assets do. The underlying content can change, the embedding model can change, or the ranking logic can improve. That is why the assistant uses a content hash, a cache version, and a TTL together instead of assuming embeddings in local storage should live forever.
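
A sketch of how those three freshness signals can be combined into one check. The `CacheEntry` shape and field names are hypothetical.

```typescript
interface CacheEntry<T> {
  contentHash: string; // hash of the resume payload the entry was built from
  cacheVersion: number; // bumped when the embedding model or ranking logic changes
  storedAt: number; // epoch milliseconds
  value: T;
}

interface FreshnessPolicy {
  contentHash: string;
  cacheVersion: number;
  ttlMs: number;
}

// An entry is reusable only if all three signals agree: same content,
// same cache version, and still inside the TTL.
function isFresh<T>(entry: CacheEntry<T> | null, policy: FreshnessPolicy, now: number = Date.now()): boolean {
  if (!entry) return false;
  return (
    entry.contentHash === policy.contentHash &&
    entry.cacheVersion === policy.cacheVersion &&
    now - entry.storedAt < policy.ttlMs
  );
}
```

Each signal covers a different way the cache can go stale: the hash catches content edits, the version catches model or ranking changes, and the TTL bounds how long an unnoticed problem can persist.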

What I Like About This Build

What I like most is that the system stays honest.

It does not pretend to know more than the portfolio actually says. It does not blur the line between retrieval and invention. And it does not rely on backend sprawl to deliver a relatively small but high-leverage user experience.

This is the kind of AI integration I find compelling in real systems: bounded, observable, useful, and aligned with the actual product surface.

Closing Thoughts

There is a broader lesson here.

Good AI product engineering is rarely about adding a model to the stack and calling it innovation. The real work is in designing the data boundary, controlling failure modes, choosing where determinism should win, and making sure the system remains legible when something goes wrong.

That is what this assistant represents for me.

A portfolio is supposed to communicate capability. Building a resume-native assistant on top of structured content is a strong way to demonstrate not just that I can use AI tools, but that I can integrate them with sound engineering judgment.