AI Digital Twin Avatar
Creating an AI version of myself (Fall 2024 - Ongoing)
1/31/2025 · 4 min read
The Challenge
Problem Discovery: Explore efficient personality replication via AI, inspired by research from Stanford/DeepMind.
Primary Hurdle: The prohibitive cost (>500k tokens / ~$1500+) and complexity of using 6 years of personal journal data.
Alternative Approach: Test the feasibility of using a concise, structured 2-hour interview (~21k tokens) as the primary data source.
Core Question: Can advanced prompt engineering with limited interview data yield a functionally accurate personality replica, effectively navigating LLM limitations and cost constraints?
Feature Prioritization: I focused on validating the interview-to-textual-replica pipeline to build an MVP:
Methodology: Build a Minimum Viable Product (MVP).
Technology Stack: Utilize GPT-4o (chosen for optimal output) and Whisper API for transcription.
Training Data: Limited exclusively to the 2-hour interview transcript.
Results: Successfully developed a functioning MVP chatbot capable of recalling interview details and reflecting core personality traits (e.g., Myers-Briggs INTJ). This validated the interview method's potential, clarified the limitations of non-RAG approaches, and highlighted key areas for V2 improvement.
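The cost math behind the pivot can be sketched in a few lines. This is a rough back-of-the-envelope estimator, not the author's actual accounting: the words-per-token ratio is the common ~0.75 heuristic, and the per-million-token price and turn count are assumptions chosen to show how full-context prompting reaches the ~$1500 range.

```python
def estimate_tokens(word_count, words_per_token=0.75):
    """Rough heuristic: ~0.75 English words per token."""
    return int(word_count / words_per_token)

def full_context_cost(context_tokens, turns, usd_per_million=5.00):
    """Cost of resending the full context as input tokens on every turn.

    usd_per_million is an assumed input-token price, not a quoted rate."""
    return context_tokens / 1_000_000 * usd_per_million * turns

journal_tokens = estimate_tokens(375_000)        # ≈ 500,000 tokens
print(journal_tokens)                            # 500000
print(full_context_cost(journal_tokens, turns=600))  # 1500.0
```

The key driver is that without retrieval, the entire corpus rides along as input on every conversational turn, so cost scales with turns, not just corpus size.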


Project Summary


Core Tools Used:
AI/ML: ChatGPT-4o, OpenAI Whisper API, Claude 3.5 Sonnet, Gemini 1.5 Pro
Languages/Formats: Python, JSON
Key Concepts/Methods: AI Digital Twin, Prompt Engineering (Large Corpus Structuring), LLM Cost Analysis, LLM Evaluation (Subjective Fidelity), Natural Language Processing (NLP), Conversational AI, API Integration.
Core Skills Learned:
Advanced Prompt Engineering for Large Contexts, LLM API Integration & Cost Management, Qualitative AI Evaluation, Python Scripting for AI Workflows.
Timeline:
V1 (MVP): Fall 2024
V2: Winter 2024 - Ongoing
Architecture - V1 (MVP):
2 hr Interview → Whisper API (Transcription) → Python Script (Loads Interview JSON & Master Prompt) → Manual Interaction via ChatGPT Interface (using GPT-4o with Custom Instructions mimicking Master Prompt)
6 years of journal entries: 375K words ≈ 500,000 tokens → too expensive for full-context prompting
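The V1 pipeline above can be sketched in Python. This is a hypothetical reconstruction, not the actual script: the JSON schema, prompt wording, and category names are all assumptions, and the Whisper call is shown but not executed here since it needs an API key.

```python
import json

def transcribe(audio_path):
    """Step 1: send the interview audio to the Whisper API (requires OPENAI_API_KEY)."""
    from openai import OpenAI
    with open(audio_path, "rb") as f:
        return OpenAI().audio.transcriptions.create(model="whisper-1", file=f).text

def build_master_prompt(interview):
    """Step 2: flatten categorized interview JSON into one structured master prompt."""
    sections = ["## " + category + "\n" + "\n".join("- " + a for a in answers)
                for category, answers in interview.items()]
    return ("You are a digital twin of the interviewee. "
            "Answer only from the interview below.\n\n" + "\n\n".join(sections))

# Toy categorized transcript standing in for the real interview JSON.
interview = json.loads('{"Values": ["Prefers long-term planning"], '
                       '"Background": ["Journaled daily for six years"]}')
print(build_master_prompt(interview))
```

The assembled prompt then becomes the Custom Instructions text pasted into the ChatGPT interface, which is how V1 avoided per-token API billing.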
Development Process


V1: Interview-Based MVP (Focus: Prompt Engineering & Feasibility)
Data Prep: Conducting the interview, transcribing it via Whisper, and structuring the transcript into categorized JSON.
Prompt Engineering: Designing and iteratively refining the "Comprehensive Document Overview" based on the JSON data and interaction principles. Focused on clarity, structure, and persona consistency.
Testing & Iteration: Manually testing prompt versions within the ChatGPT interface (Custom Instructions) against target questions. Identifying and correcting issues like hallucination or inaccurate recall.
Demo: Configuring the final Custom GPT as the MVP for demo.
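The manual testing step can be loosely systematized. The scaffold below is an illustrative sketch, not the author's actual procedure: the target questions and expected keywords are invented examples of how prompt versions were checked for grounded recall before promotion to the demo.

```python
# Hypothetical target questions paired with keywords a grounded reply should contain.
TARGET_QUESTIONS = [
    ("What is your personality type?", ["INTJ"]),
    ("How long have you kept a journal?", ["six", "6"]),
]

def score_reply(reply, expected_keywords):
    """Pass if any expected keyword appears in the reply (case-insensitive)."""
    reply_lower = reply.lower()
    return any(k.lower() in reply_lower for k in expected_keywords)

def evaluate(replies):
    """Score a list of replies against the target-question checklist."""
    return [score_reply(r, kw) for r, (_, kw) in zip(replies, TARGET_QUESTIONS)]

print(evaluate(["I'm an INTJ, per Myers-Briggs.", "About six years now."]))
```

Keyword spot-checks like this catch obvious recall failures cheaply; they complement, rather than replace, the subjective fidelity judgments described above.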
Key Technical Challenges & Solutions
Challenge 1: LLM Cost & Context Limits:
GPT-4o, while capable, still presented cost barriers for API use and potential context limits for additional information beyond the interview.
Solution: V1 bypassed API costs via ChatGPT interface; meticulous prompt structuring maximized information density within the available context.
Challenge 2: Model Limitations (V1 - MVP):
GPT-4o still had accuracy challenges, though fewer than other models; subjective evaluation is imprecise.
Solution: Focused subjective tests on core personality/recall; identified RAG as the path for V2.
Challenge 3: Accuracy & Grounding:
Ensuring responses were grounded in the interview data, not generic LLM knowledge or hallucination.
Solution: Explicit instructions in the prompt to use only the provided document content; subjective testing focused on identifying and refining prompts to reduce hallucination.
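A grounding clause of this kind might look like the sketch below. The wording is entirely an assumption (the author's actual master prompt is private); it only demonstrates the technique of instructing the model to refuse rather than improvise.

```python
# Illustrative grounding rules; phrasing is invented for demonstration.
GROUNDING_RULES = (
    "Answer ONLY from the interview document provided below.\n"
    "If a question is not covered there, reply: "
    '"That wasn\'t covered in my interview."\n'
    "Never invent dates, names, or events."
)

def make_system_prompt(document):
    """Prepend grounding rules so answers stay tied to the interview text."""
    return GROUNDING_RULES + "\n\n--- INTERVIEW DOCUMENT ---\n" + document

print(make_system_prompt("(interview text here)").splitlines()[0])
```

An explicit fallback phrase gives testers a clear signal: any answer outside the document that is not the fallback is a hallucination to refine away.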
Product Insights
Accessibility Potential: The simplicity of the interview method (vs. complex data pipelines) can enable less technical individuals to create personal replicas or capture loved ones' stories.
Proactive Privacy is Paramount: Building AI from sensitive personal data (interviews, journals, etc.) requires strict privacy boundaries and guardrails in the design and prompts from the outset, not as an add-on.
Hallucination Requires Expectation Management: LLMs hallucinate beyond their training data; clearly defining the twin's knowledge limits (e.g., "based only on the interview") is crucial for maintaining user trust.
Technical Learnings
LLM Trade-offs (Cost/Context/Accuracy): GPT-4o offered usable context/accuracy for V1 demo. Cost remains a primary constraint for API-driven approaches without RAG.
Structured Prompting Effectiveness: Structured prompts using categorization and clear instructions significantly improve large-context performance and reduce persona drift compared to unstructured data dumps.
Necessity of RAG for Scalability: Attempting to fit large, static datasets (like journals) entirely into prompt context is inefficient and costly; V1 validated that RAG is technically necessary for scaling beyond limited interview data.


Picture Source: Ammon Haggerty
My personal master prompt isn't shown for privacy reasons.
Next Steps (V2 - Ongoing)
Implement RAG: Transition from full-context prompting to RAG using journal entries to overcome V1's data limitations and cost issues.
Explore Advanced Models: Test the latest models (newer GPT versions, Gemini) with the RAG implementation.
Develop Custom Interface: Create a dedicated interface using APIs for better control and interaction design (potentially incorporating multi-modal elements). Leverage NVIDIA's AI Blueprint: Digital Humans.
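The core of the planned RAG step can be sketched as retrieval over embedded journal chunks. This is a minimal illustration under assumptions: the embedding model name is one plausible choice, the `embed` helper is shown but not called, and toy 2-D vectors stand in for real embeddings so the sketch runs offline.

```python
import math

def embed(texts):
    """Embed journal chunks via the OpenAI API (requires OPENAI_API_KEY)."""
    from openai import OpenAI
    resp = OpenAI().embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks most similar to the query; only these enter the prompt."""
    ranked = sorted(zip(chunk_vecs, chunks),
                    key=lambda p: cosine(query_vec, p[0]), reverse=True)
    return [c for _, c in ranked[:k]]

# Toy vectors stand in for real embeddings so the sketch runs without an API key.
chunks = ["2019: started journaling daily", "2021: moved cities", "2023: career change"]
vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], vecs, chunks, k=2))
```

Because only the top-k chunks are sent per turn instead of the full 500K-token corpus, cost scales with the retrieved context size, which is exactly the V1 constraint RAG is meant to remove.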

