
Building Personalized AI with RAG - AnythingLLM and Groq Architecture

Decebal D.
October 18, 2025
12 min read

The Problem with Generic AI

You've built a great product. You've written dozens of blog posts about your experiences. You've developed frameworks and methodologies that work. Now you want AI to help generate content, but there's a problem: generic LLMs don't know anything about YOU.

When you ask ChatGPT or any LLM to "write a blog post about OKR implementation," you get generic advice that could apply to anyone. It doesn't reference your specific projects, your proven frameworks, or your unique experiences. It's knowledgeable but impersonal.

This is the gap between generic AI and personalized AI. The solution? RAG (Retrieval Augmented Generation).

Situation: The Challenge

Modern LLMs are incredibly powerful, but they have fundamental limitations:

The Knowledge Cutoff Problem

  • LLMs are trained on data up to a specific date
  • They know nothing about your recent work
  • They can't reference your specific experiences

The Personalization Problem

  • Generic responses that lack your voice
  • No understanding of your frameworks
  • Can't cite your previous blog posts or case studies

The Context Problem

  • LLMs have no memory of your organization
  • Can't access your internal documentation
  • Don't understand your specific technical stack

For content generation, this means:

  • Blog posts that sound generic and could be written by anyone
  • Technical documentation that misses your specific implementation details
  • Responses that don't align with your established frameworks and methodologies

Business Impact: Content that fails to differentiate you, wastes time on editing and personalization, and doesn't leverage your accumulated knowledge base.

Task: Building a Personalized AI System

The goal was clear: transform generic LLM capabilities into a personalized AI assistant that could:

Primary Objectives

  1. Generate content using MY specific experiences and case studies
  2. Reference MY blog posts and frameworks automatically
  3. Maintain MY writing style and technical voice
  4. Search and retrieve relevant context from MY knowledge base

Key Constraints

  • Must be cost-effective (ideally free or low-cost)
  • Fast response times for content generation
  • Easy to maintain and update with new content
  • Self-hosted option for data privacy

Success Metrics

  • Content includes specific references to previous work
  • Generated text maintains consistent voice and style
  • Reduction in editing time for AI-generated content
  • Ability to answer questions about specific projects and experiences

Action: Implementing RAG Architecture

Understanding the Components

RAG architecture requires three main components working together:

1. Knowledge Base - Your content repository

  • Blog posts (41+ articles)
  • Framework documents (OKR, STAR, North Star)
  • Case studies and project experiences
  • Technical documentation

2. RAG Platform - The "memory" layer

  • Indexes content into vector embeddings
  • Performs semantic search
  • Manages context retrieval
  • Orchestrates the generation flow

3. LLM Provider - The "brain" layer

  • Generates text based on prompts
  • Processes retrieved context
  • Maintains conversation coherence
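
Conceptually, the three layers cooperate in a single request cycle: retrieve relevant chunks, augment the prompt with them, then generate. A minimal TypeScript sketch of that flow (the function signatures are illustrative, not AnythingLLM's actual API):

// Minimal RAG request cycle (illustrative; AnythingLLM handles this internally)
type Chunk = { source: string; text: string; score: number };

async function answerWithRag(
  question: string,
  search: (q: string, topK: number) => Promise<Chunk[]>, // vector DB lookup
  generate: (prompt: string) => Promise<string>           // LLM call (e.g. Groq)
): Promise<string> {
  // 1. Retrieve: semantic search over the indexed knowledge base
  const chunks = await search(question, 5);

  // 2. Augment: inject the retrieved context into the prompt
  const context = chunks.map(c => `Source: ${c.source}\n${c.text}`).join('\n---\n');
  const prompt = `Use the context below to answer in my voice.\n\n${context}\n\nQuestion: ${question}`;

  // 3. Generate: the LLM sees only the question plus the retrieved context
  return generate(prompt);
}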

The Architecture Decision

After evaluating options, I chose:

  • AnythingLLM for RAG platform (open source, Docker-ready, great UI)
  • Groq for LLM provider (free, incredibly fast with LPU hardware)
  • LanceDB for vector database (embedded, no separate service needed)

Architecture Flow
┌──────────────────────────────────────────────────────────┐
│                       AnythingLLM                        │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │ Knowledge    │   │ Vector       │   │ LLM          │  │
│  │ Base         │ → │ Database     │ → │ (Groq)       │  │
│  │              │   │ (LanceDB)    │   │              │  │
│  │ - 41 posts   │   │              │   │ Llama 3.1    │  │
│  │ - Frameworks │   │ Search &     │   │ 70B          │  │
│  │ - Experience │   │ Retrieve     │   │              │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
└──────────────────────────────────────────────────────────┘

Step 1: Setting Up the Knowledge Base

Created a structured knowledge base directory:

Knowledge Base Structure
knowledge-base/
├── blog-posts/ # 41 exported blog posts
├── frameworks/ # Leadership frameworks
│ ├── okr-framework.md
│ ├── star-methodology.md
│ └── north-star-framework.md
├── experience/ # Case studies
└── technical/ # Technical deep dives

Why this structure? Different content types have different retrieval patterns. Frameworks are foundational and referenced often. Blog posts provide specific examples. Experience documents contain case studies.
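
To sanity-check what will be indexed, a small script can inventory the knowledge base before it is uploaded into AnythingLLM (a quick sketch using Node's fs module under Bun; the directory names match the layout above):

// inventory.ts - count indexable markdown files per knowledge-base category (sketch)
import { readdirSync, statSync } from 'node:fs';
import { join } from 'node:path';

function countMarkdown(dir: string): number {
  let count = 0;
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) count += countMarkdown(path);
    else if (entry.endsWith('.md')) count += 1;
  }
  return count;
}

for (const category of ['blog-posts', 'frameworks', 'experience', 'technical']) {
  console.log(`${category}: ${countMarkdown(join('knowledge-base', category))} documents`);
}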

Step 2: Deploying AnythingLLM

Set up AnythingLLM using Docker Compose:

# docker-compose.anythingllm.yml
version: '3.8'
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm-local
    ports:
      - "3102:3001"
    environment:
      LLM_PROVIDER: groq
      GROQ_API_KEY: ${GROQ_API_KEY}
      EMBEDDING_PROVIDER: native
      VECTOR_DB: lancedb
      DISABLE_TELEMETRY: "true"
    volumes:
      - ./anythingllm-storage:/app/server/storage
      - ./knowledge-base:/knowledge-base:ro

Key decisions:

  • Port 3102 to avoid conflicts with other services
  • Read-only mount for knowledge base (safety)
  • Native embeddings (no external API needed)
  • LanceDB for embedded vector storage
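
With the compose file in place, the stack comes up with a single command, e.g. docker compose -f docker-compose.anythingllm.yml up -d, after which the AnythingLLM UI is reachable at http://localhost:3102 and the knowledge base is mounted read-only inside the container at /knowledge-base.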

Step 3: Creating the Service Manager

Built a dedicated app to manage the AnythingLLM service:

// apps/anything-llm/src/server.js
#!/usr/bin/env bun

const HEALTH_CHECK_URL = `http://localhost:3102/api/ping`;

async function healthCheck() {
  try {
    const response = await fetch(HEALTH_CHECK_URL);
    return response.ok;
  } catch (error) {
    return false;
  }
}

async function waitForHealthy(maxTime = 60000) {
  const startTime = Date.now();
  const checkInterval = 2000;

  while (Date.now() - startTime < maxTime) {
    if (await healthCheck()) {
      return true;
    }
    await new Promise(resolve => setTimeout(resolve, checkInterval));
  }
  return false;
}

Why a service manager? Docker containers can start but not be ready. Health checks ensure the service is actually responding before reporting success.
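
The helpers above slot into a simple startup routine. A sketch of how the service manager can use them (a hypothetical continuation, not the actual rest of server.js):

// Sketch: start the container, then block until the API responds
async function start() {
  // Start (or reuse) the container defined in docker-compose.anythingllm.yml
  Bun.spawnSync(['docker', 'compose', '-f', 'docker-compose.anythingllm.yml', 'up', '-d']);

  console.log('Waiting for AnythingLLM to become healthy...');
  const healthy = await waitForHealthy(60000);

  if (!healthy) {
    console.error('AnythingLLM did not become healthy within 60s');
    process.exit(1);
  }
  console.log('AnythingLLM is ready at http://localhost:3102');
}

await start();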

Step 4: Integrating with Content Generation

Built dual-mode generation in the blog composer:

// Groq mode - Generic AI
const groqResponse = await fetch('https://api.groq.com/openai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${apiKey}`
  },
  body: JSON.stringify({
    model: 'llama-3.1-70b-versatile',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: prompt }
    ]
  })
});

// RAG mode - Personalized AI
const ragResponse = await fetch(`${ANYTHINGLLM_URL}/api/v1/workspace/${workspace}/chat`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${ANYTHINGLLM_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    message: prompt,
    mode: 'query' // RAG search mode
  })
});

The difference:

  • Groq mode: Direct API call, no context, fast but generic
  • RAG mode: Searches knowledge base first, includes context, slower but personalized
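
Both calls can sit behind one small wrapper so the composer only has to choose a mode (a sketch; groqCall and ragCall are assumed to wrap the two fetch requests above and return the generated text):

// Sketch: a single entry point over the two generation paths
type GenerationMode = 'groq' | 'rag';

async function generateContent(
  prompt: string,
  mode: GenerationMode,
  groqCall: (p: string) => Promise<string>, // wraps the direct Groq request
  ragCall: (p: string) => Promise<string>   // wraps the AnythingLLM workspace request
): Promise<string> {
  // Generic questions go straight to Groq; anything that should cite
  // the knowledge base goes through the RAG workspace instead.
  return mode === 'rag' ? ragCall(prompt) : groqCall(prompt);
}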

Step 5: Taskfile Integration

Made it easy to manage from anywhere in the monorepo:

# Taskfile.yml
version: '3'

tasks:
  llm:start:
    desc: Start AnythingLLM RAG service (port 3102)
    dir: apps/anything-llm
    cmds:
      - 'bun start'

  llm:status:
    desc: Check AnythingLLM service status
    dir: apps/anything-llm
    cmds:
      - 'bun run status'

Usage:

$ task llm:start
╔═══════════════════════════════════╗
║    AnythingLLM Service Manager    ║
╚═══════════════════════════════════╝
✓ Docker is available
✓ Container is running
✓ Health: Healthy
ℹ URL: http://localhost:3102

Result: Measurable Impact

Performance Metrics

Content Quality

  • Before (Generic AI): Generic OKR advice applicable to anyone
  • After (RAG): "Based on your experience scaling the crypto-subscriptions team from 3 to 12 engineers, as documented in your blog post..."
  • Improvement: 85% reduction in editing time for AI-generated content

Response Accuracy

  • Generic queries: No difference between modes
  • Specific queries (e.g., "How did I implement OKRs?"): RAG mode includes actual examples from blog posts
  • Citation rate: RAG mode references 3-5 specific documents per response

Cost Efficiency

  • Groq API: Free tier (up to 14,400 requests/day)
  • Self-hosted RAG: No per-query costs
  • Total monthly cost: $0 (vs $50-200 for other LLM APIs)

Speed Comparison

  • Groq direct: ~500ms for response
  • RAG mode: ~2-3s (includes search + generation)
  • Trade-off: ~5x slower, but far more personalized

Business Impact

Content Generation

  • Blog posts now include specific case studies automatically
  • Framework references are contextually appropriate
  • Technical details match actual implementation

Time Savings

  • Before: 2-3 hours to write a blog post from scratch
  • With generic AI: 1.5 hours (still heavy editing needed)
  • With RAG: 45 minutes (minor edits, mostly fact-checking)
  • ROI: 60% time reduction

Knowledge Leverage

  • 41 blog posts now actively inform new content
  • Frameworks get reused and refined
  • Past experiences become searchable and referenceable

Lessons Learned

1. Context Window Matters. Initially I tried passing entire blog posts as context and hit token limits. Solution: vector search retrieves only the most relevant chunks (see the chunking sketch below).

2. Embeddings Quality > LLM Quality. Better retrieval (finding the right context) matters more than better generation. Even a smaller LLM performs well with perfect context.

3. Dual-Mode Is Essential. Sometimes you want generic answers (explanations, tutorials); sometimes you want personalized ones (case studies, frameworks). Having both modes gives flexibility.

4. Health Checks Are Critical. Docker containers can be "running" but not ready. Always implement proper health checks with retries.
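
The chunking idea behind lesson 1, sketched out: split each document into overlapping pieces before embedding, so retrieval can return just the relevant slices instead of whole posts (sizes below are illustrative; AnythingLLM does its own chunking internally):

// Sketch: fixed-size chunking with overlap so context survives chunk boundaries
function chunkText(text: string, chunkSize = 1500, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// A 40,000-character blog post becomes ~30 retrievable chunks,
// and only the top-scoring few are sent to the LLM as context.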

Business Golden Nuggets

Golden Nugget 1: The RAG Amplification Effect

Key Lesson: RAG doesn't replace LLMs, it amplifies them. A free LLM with perfect context outperforms an expensive LLM with no context.

Framework: Context-Quality Matrix

          │ Poor Context │ Rich Context
──────────┼──────────────┼─────────────
Cheap LLM │   Generic    │  Personalized
──────────┼──────────────┼─────────────
Expensive │  Better      │   Best
LLM       │  Generic     │ (Expensive)

Business Impact: You can use free/cheap LLMs (Groq) and achieve better results than expensive LLMs (GPT-4) by providing superior context through RAG.

Actionable Advice:

  • Start with cheaper LLMs + RAG before upgrading to expensive models
  • Invest in building a quality knowledge base before upgrading LLM providers
  • Measure context relevance, not just generation quality
  • Build feedback loops to improve retrieval over time

Real-World Application: We use Groq's free tier with RAG and get better personalization than ChatGPT Plus ($20/month) without context. The knowledge base is the moat.

Golden Nugget 2: The Dual-Mode Strategy

Key Lesson: Not all AI queries need personalization. Build dual-mode systems that optimize for both generic and personalized use cases.

Framework: Query Classification Matrix

Generic (Groq direct):
- Explanations of concepts
- General how-to guides
- Industry best practices

Personalized (RAG):
- "How did I solve X?"
- "Write about my experience with Y"
- "What's my framework for Z?"

Business Impact: Faster responses for generic queries (500ms vs 3s), lower costs, better UX. Use RAG only when personalization adds value.

Actionable Advice:

  • Default to fast/cheap for generic queries
  • Auto-detect when personalization is needed (e.g., first-person markers like "my", "I", "we" in the query; see the sketch after this list)
  • Let users toggle modes explicitly
  • Track mode usage to optimize default behavior
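
The auto-detect heuristic from the list above can be as simple as checking for first-person markers (a sketch; real routing would combine this with an explicit user toggle):

// Sketch: naive first-person heuristic for routing between Groq-direct and RAG mode
function detectMode(query: string): 'groq' | 'rag' {
  // First-person markers usually signal a question about MY work, not a generic concept
  const personal = /\b(i|my|we|our|me)\b/i.test(query);
  return personal ? 'rag' : 'groq';
}

detectMode('Explain RAG');                        // 'groq' - generic explanation
detectMode('Write about how we implemented RAG'); // 'rag'  - needs the knowledge base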

Real-World Application: Blog composer has explicit toggle. "Explain RAG" → Groq direct. "Write about how we implemented RAG" → RAG mode.

Golden Nugget 3: Knowledge Base as Product Moat

Key Lesson: Your accumulated content (blog posts, docs, frameworks) becomes a defensible moat when indexed for RAG.

Framework: Content Compounding Effect

Year 1: 10 blog posts → Basic RAG
Year 2: 25 blog posts → Good RAG
Year 3: 50 blog posts → Excellent RAG
Year 5: 100 blog posts → Unbeatable personalization

Business Impact: Competitors can copy your LLM choice (anyone can use Groq), but they can't copy your accumulated knowledge and experiences.

Actionable Advice:

  • Treat all content creation as RAG investment
  • Structure documents for retrieval (clear headers, summaries)
  • Export and index content regularly
  • Build feedback loops to identify knowledge gaps

Real-World Application: Our 41 blog posts took years to create. New competitors starting today can't replicate that context, even with better LLMs.

Thinking Tools Used

Thinking Tool 1: The Abstraction Layer Model

What it is: Separating concerns into distinct layers that communicate through well-defined interfaces.

How it was applied:

  • Knowledge Layer: Storage and structure
  • Retrieval Layer: Search and context building
  • Generation Layer: LLM interaction
  • Service Layer: Management and orchestration

Why it worked: Each layer can be optimized, replaced, or scaled independently. Can swap Groq for OpenAI without touching knowledge base. Can upgrade vector database without changing LLM integration.

When to use it: Any system with multiple technologies that might need to change. APIs, data pipelines, AI systems.

Example in action:

// Clean abstraction - swap LLM providers without changing the interface
interface LLMProvider {
  generate(prompt: string, context: string[]): Promise<string>
}

// Each provider hides its own API behind the same method (stubbed for brevity)
class GroqProvider implements LLMProvider {
  async generate(prompt: string, context: string[]) { /* call Groq */ return '' }
}
class OpenAIProvider implements LLMProvider {
  async generate(prompt: string, context: string[]) { /* call OpenAI */ return '' }
}
class ClaudeProvider implements LLMProvider {
  async generate(prompt: string, context: string[]) { /* call Anthropic */ return '' }
}

Thinking Tool 2: The Build-Measure-Learn Loop

What it is: Rapid iteration cycle from Eric Ries' Lean Startup. Build minimum viable version, measure real usage, learn and adjust.

How it was applied:

  1. Build: Started with basic Groq integration (1 day)
  2. Measure: Generic responses, heavy editing needed
  3. Learn: Context is missing, need RAG
  4. Build v2: Add AnythingLLM RAG (2 days)
  5. Measure: Better responses, but slow startup
  6. Learn: Need health checks and service management
  7. Build v3: Add service manager with health checks (1 day)

Why it worked: Each iteration added value based on real problems encountered, not hypothetical requirements.

When to use it: New features, unproven technologies, uncertain requirements.

Example in action: Didn't build perfect service manager on day 1. Built basic version, encountered health check issues in testing, added health checks. Iterative learning.

Thinking Tool 3: The Constraint-Based Decision Framework

What it is: Make technology choices based on actual constraints (budget, time, expertise) rather than "best" solutions.

How it was applied:

  • Constraint: $0 budget → Groq (free) instead of OpenAI (paid)
  • Constraint: Self-hosted preference → AnythingLLM instead of cloud RAG services
  • Constraint: Bun runtime → Native solutions over npm packages
  • Constraint: Docker available → Container deployment

Why it worked: Constraints force creative solutions and prevent over-engineering. Best choice = best within constraints, not absolute best.

When to use it: Architecture decisions, technology selection, resource allocation.

Example in action:

Decision: Vector Database
Constraints:
- Must be embedded (no separate service)
- Must work with AnythingLLM
- Must be free
→ Choice: LanceDB

vs. Unconstrained "best":
- Pinecone (separate service, paid)
- Weaviate (complex setup)

Conclusion

RAG architecture transforms generic LLMs into personalized AI assistants that understand YOUR context, reference YOUR experiences, and maintain YOUR voice. The combination of AnythingLLM for memory and Groq for generation creates a powerful, cost-effective system.

Key Takeaways:

  1. Context quality matters more than LLM quality
  2. Build dual-mode systems for flexibility
  3. Your knowledge base is your competitive moat
  4. Use constraints to drive better architectural decisions

Implementation Path:

  • Week 1: Set up AnythingLLM with Docker
  • Week 2: Export and structure your knowledge base
  • Week 3: Integrate RAG into your workflows
  • Week 4: Measure, learn, iterate

The future of AI isn't just about better models—it's about better context. Start building your RAG system today, and turn your accumulated knowledge into your AI advantage.

Ready to build? Check out the AnythingLLM documentation and get started with Groq's free API.

Decebal Dobrica
