AI Digital Twin Avatar

Creating an AI version of myself (Fall 2024 - Ongoing)

1/31/2025 · 4 min read

The Challenge

Problem Discovery: Explore efficient personality replication via AI, inspired by research from Stanford/DeepMind.

  1. Primary Hurdle: The prohibitive cost (>500k tokens / ~$1500+) and complexity of using 6 years of personal journal data.

  2. Alternative Approach: Test the feasibility of using a concise, structured 2-hour interview (~21k tokens) as the primary data source.

  3. Core Question: Can advanced prompt engineering with limited interview data yield a functionally accurate personality replica, effectively navigating LLM limitations and cost constraints?

Feature Prioritization: I focused on validating the interview-to-textual-replica pipeline to build an MVP:

  1. Methodology: Build a Minimum Viable Product (MVP).

  2. Technology Stack: Utilize GPT-4o (chosen for its output quality) and the Whisper API for transcription.

  3. Training Data: Limited exclusively to the 2-hour interview transcript.

Results: Successfully developed a functioning MVP chatbot capable of recalling interview details and reflecting core personality traits (e.g., Myers-Briggs INTJ). This validated the interview method's potential while exposing the limitations of non-RAG approaches and highlighting key areas for V2 improvement.

Project Summary

Core Tools Used:

AI/ML: ChatGPT-4o, OpenAI Whisper API, Claude 3.5 Sonnet, Gemini 1.5 Pro
Languages: Python, JSON
Key Concepts/Methods: AI Digital Twin, Prompt Engineering (Large Corpus Structuring), LLM Cost Analysis, LLM Evaluation (Subjective Fidelity), Natural Language Processing (NLP), Conversational AI, API Integration.

Core Skills Learned:

Advanced Prompt Engineering for Large Contexts, LLM API Integration & Cost Management, Qualitative AI Evaluation, Python Scripting for AI Workflows.

Timeline:

V1 (MVP): Fall 2024
V2: Winter 2024 - Ongoing

Architecture - V1 (MVP):

2 hr Interview → Whisper API (Transcription) → Python Script (Loads Interview JSON & Master Prompt) → Manual Interaction via ChatGPT Interface (using GPT-4o with Custom Instructions mimicking Master Prompt)
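
A minimal sketch of this V1 pipeline in Python, assuming the OpenAI Python SDK (v1+) and hypothetical file names (interview.mp3, interview.json, master_prompt.txt):

```python
# Sketch of the V1 pipeline: transcribe, load structured data, assemble context.
# File names and the token heuristic are illustrative assumptions.
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Transcribe the 2-hour interview with Whisper.
#    Note: the API caps uploads at 25 MB, so a long recording may need chunking.
with open("interview.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)
Path("transcript.txt").write_text(transcript.text)

# 2) After manual cleanup/categorization, load the structured interview JSON.
interview = json.loads(Path("interview.json").read_text())

# 3) Assemble the master prompt that the ChatGPT Custom Instructions mirror.
master_prompt = Path("master_prompt.txt").read_text()
full_context = master_prompt + "\n\n" + json.dumps(interview, indent=2)

# Rough 4-characters-per-token heuristic for sizing the context.
print(f"~{len(full_context) // 4:,} tokens of context")
```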

6 years of journal entries: 375K words ≈ 500,000 tokens → too expensive to use as direct context
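
The back-of-envelope math behind that estimate, with an assumed per-token price (actual GPT-4o rates vary by date and by input vs. output):

```python
# Rough cost estimate for stuffing the full journal corpus into context.
# The words-per-token ratio and the price are assumptions, not quoted rates.
words = 375_000
tokens = int(words / 0.75)           # ~0.75 words per token -> ~500,000 tokens
price_per_1m_input = 2.50            # assumed USD per 1M input tokens

cost_per_prompt = tokens / 1_000_000 * price_per_1m_input
print(f"{tokens:,} tokens ≈ ${cost_per_prompt:.2f} per prompt")

# Because the full corpus would be resent as context on every conversation
# turn, repeated use compounds this per-prompt cost toward four figures.
```
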
Development Process

V1: Interview-Based MVP (Focus: Prompt Engineering & Feasibility)
  1. Data Prep: Conducting the interview, transcribing it via the Whisper API, and structuring the result into categorized JSON (see the sketch after this list).

  2. Prompt Engineering: Designing and iteratively refining the "Comprehensive Document Overview" based on the JSON data and interaction principles. Focused on clarity, structure, and persona consistency.

  3. Testing & Iteration: Manually testing prompt versions within the ChatGPT interface (Custom Instructions) against target questions. Identifying and correcting issues like hallucination or inaccurate recall.

  4. Demo: Configuring the final Custom GPT as the MVP for demonstration.
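
For step 1, the categorized interview JSON might be shaped like the following; every category and field here is a hypothetical stand-in, since the real schema and contents are private:

```json
{
  "background": {
    "upbringing": "...",
    "education": "..."
  },
  "personality": {
    "mbti": "INTJ",
    "core_values": ["..."],
    "communication_style": "..."
  },
  "interests": ["..."],
  "qa": [
    { "question": "What motivates you?", "answer": "..." }
  ]
}
```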

Key Technical Challenges & Solutions

Challenge 1: LLM Cost & Context Limits:

GPT-4o, while capable, still presented cost barriers for API use and potential context limits for information beyond the interview.

Solution: V1 bypassed API costs by working in the ChatGPT interface; meticulous prompt structuring maximized information density within the available context (a template sketch follows).
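
My actual master prompt stays private (noted at the end of this post), but a hedged sketch of this kind of structuring, with illustrative section names of my own, could look like:

```python
# Illustrative master-prompt skeleton; section names and wording are stand-ins,
# not the actual private prompt.
MASTER_PROMPT_TEMPLATE = """\
You are a digital twin of [NAME]. Respond in the first person, in their voice.

## Persona
- MBTI: INTJ; analytical, direct, reflective.

## Knowledge boundaries
- Answer ONLY from the interview document below.
- If the interview does not cover a topic, say so instead of guessing.

## Interaction principles
- Mirror the tone and phrasing patterns found in the interview.

## Interview document (categorized JSON)
{interview_json}
"""
```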

Challenge 2: Model Limitations (V1 - MVP):

GPT-4o still showed accuracy issues, though fewer than the other models tested; subjective evaluation is inherently imprecise.

Solution: Focused subjective tests on core personality traits and recall, and flagged RAG as a requirement for V2.

Challenge 3: Accuracy & Grounding:

Ensuring responses were grounded in the interview data rather than in generic LLM knowledge or hallucination.

Solution: Explicit instructions in the prompt to use only the provided document content; subjective testing focused on identifying and refining prompts to reduce hallucination (a scripted version of this check is sketched below).
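
V1 testing was manual in the ChatGPT interface, but the same grounding check could be scripted against the API. A hypothetical probe, reusing the template and interview structures sketched above:

```python
# Hypothetical grounding spot-check: probe a topic the interview never covered
# and confirm the twin declines rather than inventing an answer.
import json

from openai import OpenAI

client = OpenAI()
system_prompt = MASTER_PROMPT_TEMPLATE.format(
    interview_json=json.dumps(interview, indent=2)  # from the earlier sketches
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Tell me about your trip to Antarctica."},
    ],
)

# A well-grounded twin should answer along the lines of "we never discussed
# that," rather than fabricating a plausible-sounding trip.
print(response.choices[0].message.content)
```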

Product Insights

  1. Accessibility Potential: The simplicity of the interview method (vs. complex data pipelines) can enable less technical individuals to create personal replicas or capture loved ones' stories.

  2. Proactive Privacy is Paramount: Building AI from sensitive personal data (interviews, journals, etc.) requires strict privacy boundaries and guardrails designed into the prompts from the outset, not bolted on afterward.

  3. Hallucination Requires Expectation Management: LLMs hallucinate beyond their training data; clearly defining the twin's knowledge limits (e.g., "based only on the interview") is crucial for maintaining user trust.

Technical Learnings

  1. LLM Trade-offs (Cost/Context/Accuracy): GPT-4o offered usable context and accuracy for the V1 demo. Cost remains the primary constraint for API-driven approaches without RAG.

  2. Structured Prompting Effectiveness: Structured prompts using categorization and clear instructions significantly improve large-context performance and reduce persona drift compared to unstructured data dumps.

  3. Necessity of RAG for Scalability: Attempting to fit large, static datasets (like journals) entirely into the prompt context is inefficient and costly; V1 validated that RAG is technically necessary for scaling beyond limited interview data (a minimal sketch follows).
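
A minimal sketch of that RAG direction, assuming OpenAI embeddings and an in-memory store; a real V2 build would chunk the journals properly and use a vector database:

```python
# Minimal RAG sketch: embed journal chunks once, then retrieve only the
# top-k relevant ones per query instead of sending all ~500k tokens.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

journal_chunks = ["...journal entry 1...", "...journal entry 2..."]  # placeholder corpus
chunk_vecs = embed(journal_chunks)

def retrieve(query: str, k: int = 3) -> list[str]:
    qv = embed([query])[0]
    # Rank chunks by cosine similarity to the query.
    sims = chunk_vecs @ qv / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(qv))
    return [journal_chunks[i] for i in sims.argsort()[::-1][:k]]

# Only the retrieved entries enter the prompt, keeping per-call cost flat
# as the corpus grows.
context = "\n\n".join(retrieve("How do you handle setbacks?"))
```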

Picture Source: Ammon Haggerty

My personal master prompt isn't shown for privacy reasons.

Next Steps (V2 - Ongoing)

  1. Implement RAG: Transition from full-context prompting to RAG using journal entries to overcome V1's data limitations and cost issues.

  2. Explore Advanced Models: Test latest models (newer GPT versions, Gemini) with RAG implementation.

  3. Develop Custom Interface: Create a dedicated interface using APIs for better control and interaction design (potentially incorporating multi-modal elements). Leverage NVIDIA's AI Blueprint: Digital Humans.