
Today I Learned

A collection of quick learnings, insights, and “aha” moments. Think of this as my digital garden - a space for growing ideas that might not be fully-formed blog posts yet, but are worth sharing.

ai

Claude 3.5 Sonnet Outperforms GPT-4 on Code

In my testing, Claude 3.5 Sonnet consistently produces better code than GPT-4, especially on complex refactoring tasks. The difference comes down to stronger instruction following and a lower hallucination rate on technical details, and its 200k-token context window makes it a good fit for large codebases.

#llm #claude #coding

RAG Chunking: Semantic > Fixed-Size

Switched from fixed 500-token chunks to semantic chunking using sentence-transformers for topic boundaries. Retrieval precision jumped from 73% to 89%. The key insight: preserve logical units of meaning, not arbitrary character counts.
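A minimal sketch of the idea, with a toy bag-of-words embedding standing in for sentence-transformers (swap in something like `SentenceTransformer("all-MiniLM-L6-v2").encode` for real use); the threshold value here is illustrative, not tuned:

```python
import math
import re
from collections import Counter

def embed(sentence):
    # Stand-in embedding: bag-of-words counts. A real pipeline would use
    # a sentence-transformers model here instead.
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text, threshold=0.2):
    # Start a new chunk whenever adjacent sentences stop looking alike,
    # so each chunk holds one logical unit instead of a fixed token count.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = [sent]
        else:
            current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

The chunking logic is the part that transfers: the embedding model changes the quality of the boundaries, not the algorithm.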

#rag #embeddings #nlp

Fine-tuning vs RAG: When to Choose Each

Fine-tuning is for style/behavior changes; RAG is for knowledge injection. I tried fine-tuning GPT-3.5 on domain docs and the model hallucinated more. The same docs in a RAG pipeline hit 94% accuracy. Fine-tune for HOW the model responds; use RAG for WHAT it knows.

#fine-tuning #rag #llm

Vector DB Showdown: Pinecone vs Chroma vs FAISS

For production RAG: Pinecone wins on managed infra + hybrid search. For local dev: Chroma is perfect (embedded, Python-native). For scale on a budget: FAISS with IVF index handles 10M+ vectors on a single GPU. Match your tool to your scale.
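The IVF trick is easy to picture: partition the vectors into cells around centroids, then search only the few cells nearest the query. A dependency-free toy version of what `faiss.IndexIVFFlat` does (FAISS trains real k-means centroids and can run on GPU; this is just the concept):

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVF:
    """Toy inverted-file index, the idea behind faiss.IndexIVFFlat.

    Vectors are bucketed by nearest centroid; a query scans only the
    `nprobe` closest buckets instead of the whole collection.
    """

    def __init__(self, vectors, nlist=4, seed=0):
        random.seed(seed)
        # Crude "training": pick random vectors as centroids
        # (FAISS runs k-means here).
        self.centroids = random.sample(vectors, nlist)
        self.lists = {i: [] for i in range(nlist)}
        for v in vectors:
            cell = min(range(nlist), key=lambda i: l2(v, self.centroids[i]))
            self.lists[cell].append(v)

    def search(self, query, k=1, nprobe=2):
        # Probe only the nprobe nearest cells; raising nprobe trades
        # speed for recall, exactly as in FAISS.
        cells = sorted(range(len(self.centroids)),
                       key=lambda i: l2(query, self.centroids[i]))[:nprobe]
        candidates = [v for c in cells for v in self.lists[c]]
        return sorted(candidates, key=lambda v: l2(query, v))[:k]
```

The `nprobe` knob is why IVF scales: at 10M+ vectors you scan a tiny fraction of the data per query and tune recall to taste.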

#vector-db #rag #infrastructure

Prompt Engineering: Chain-of-Thought Still Wins

Tested various prompting strategies on reasoning tasks. Simple CoT ("Let's think step by step") still beats zero-shot by 15-20% on complex tasks. But for simple extraction/classification, zero-shot is faster and cheaper. Know when to use each.
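Since CoT is just an appended trigger phrase, A/B testing it against zero-shot on your own tasks is nearly free. A minimal sketch (the helper is mine, not from any library):

```python
def build_prompt(task, mode="zero_shot"):
    """Wrap a task in either a zero-shot or a chain-of-thought prompt.

    CoT only appends the classic trigger phrase; everything else is
    unchanged, which makes it cheap to benchmark both variants.
    """
    if mode == "cot":
        return f"{task}\n\nLet's think step by step."
    return task

# Route by task type, per the numbers above:
# complex reasoning -> CoT; simple extraction -> zero-shot.
reasoning_prompt = build_prompt(
    "Two trains leave stations 120 km apart, heading toward each other "
    "at 60 km/h and 40 km/h. When do they meet?", mode="cot")
extraction_prompt = build_prompt(
    "Extract all dates from: 'Meeting moved to 2024-03-01.'")
```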

#prompting #llm #optimization

LangChain vs LlamaIndex for RAG

LangChain: better for complex agent workflows and tool use. LlamaIndex: superior for pure RAG with better out-of-box retrieval. For my blog's AI chat, LlamaIndex with sentence-transformers + Supabase pgvector was the sweet spot.

#langchain #llamaindex #rag

Embedding Model: text-embedding-3-small is Enough

OpenAI's text-embedding-3-small (1536d) outperforms ada-002 on retrieval benchmarks at 5x lower cost. For most RAG use cases, the smaller model is the right choice. Save text-embedding-3-large (3072d) for cross-lingual or highly nuanced domains.
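Back-of-envelope numbers behind the cost claim, using per-million-token launch prices (verify against OpenAI's current pricing page before relying on them):

```python
# Embedding cost for a 10M-token corpus.
# Prices per 1M tokens at launch; check current pricing before budgeting.
PRICE_PER_M = {
    "text-embedding-ada-002": 0.10,
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def embedding_cost(tokens, model):
    return tokens / 1_000_000 * PRICE_PER_M[model]

corpus_tokens = 10_000_000
for model in PRICE_PER_M:
    print(f"{model}: ${embedding_cost(corpus_tokens, model):.2f}")
```

At these prices, re-embedding a whole corpus with 3-small costs less than a coffee, which changes how freely you can iterate on chunking.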

#embeddings #openai #cost-optimization

Anthropic's Constitutional AI Approach

Reading Anthropic's research on Constitutional AI. The key insight: instead of RLHF with human preference labels, use AI feedback guided by a "constitution" of principles. The result is more consistent, explainable alignment; Anthropic credits much of Claude's helpful-but-harmless behavior to this approach.

#alignment #anthropic #research

Local LLMs: Ollama + Mistral 7B is Surprisingly Good

Running Mistral 7B locally via Ollama for private/offline tasks. Quantized to 4-bit, it runs at ~30 tokens/sec on an M2 Mac. Quality is ~80% of GPT-3.5 for most tasks. Perfect for development, testing prompts, and privacy-sensitive applications.
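Ollama exposes a simple REST API on localhost:11434. A sketch that builds (but doesn't send) a request to its `/api/generate` endpoint using only the stdlib; actually sending it needs `ollama serve` running with the model pulled:

```python
import json
import urllib.request

def ollama_generate_request(prompt, model="mistral",
                            host="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint.

    stream=False asks for one JSON response instead of a token stream.
    This only constructs the request; urllib.request.urlopen(req) would
    send it to a local Ollama server.
    """
    body = json.dumps({"model": model,
                       "prompt": prompt,
                       "stream": False}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = ollama_generate_request(
    "Summarize: RAG beats fine-tuning for knowledge injection.")
```

No API keys, no network egress: the same reason it works for privacy-sensitive applications makes it trivial to script.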

#local-llm #ollama #mistral

Multimodal AI: Vision Models for Document Parsing

GPT-4V and Claude Vision are game-changers for document understanding. Parsing complex PDFs with tables/charts that failed traditional OCR? Just screenshot and ask the model. Accuracy jumped from 60% (PyPDF) to 95% (vision model). Cost is higher but worth it for complex docs.
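The "screenshot and ask" flow boils down to one request. A sketch of the payload shape for Anthropic's Messages API with inline base64 image content; the model name is an assumption (check the current model list), and only the body is built here, so there's no network call:

```python
import base64

def vision_request(image_bytes, question,
                   model="claude-3-5-sonnet-20241022"):
    """Build a Messages-API-style request body for 'screenshot and ask'.

    Image goes inline as base64 alongside the text prompt. Sending it is
    left to the SDK, e.g. anthropic.Anthropic().messages.create(**body).
    """
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text",
                 "text": question},
            ],
        }],
    }

# Placeholder bytes; in practice, read the screenshot of the PDF page.
body = vision_request(b"\x89PNG...",
                      "Extract this invoice's line items as JSON.")
```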

#multimodal #vision #document-ai
