A collection of quick learnings, insights, and “aha” moments. Think of this as my digital garden: a space for growing ideas that might not be fully formed blog posts yet, but are worth sharing.
ai
Claude 3.5 Sonnet Outperforms GPT-4 on Code
In my testing, Claude 3.5 Sonnet consistently produces better code than GPT-4, especially for complex refactoring tasks. The key is its stronger instruction following and lower hallucination rate on technical details. Its 200k-token context window makes it ideal for large codebases.
#llm #claude #coding
RAG Chunking: Semantic > Fixed-Size
Switched from fixed 500-token chunks to semantic chunking using sentence-transformers for topic boundaries. Retrieval precision jumped from 73% to 89%. The key insight: preserve logical units of meaning, not arbitrary character counts.
#rag #embeddings #nlp
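The idea can be sketched in a few lines. This is a toy version: word-overlap (Jaccard) similarity stands in for sentence-transformer cosine similarity, and the 0.2 threshold is illustrative, not tuned.

```python
def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity; a cheap stand-in for embedding cosine similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk whenever similarity to the previous sentence drops below threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if jaccard(set(prev.lower().split()), set(cur.lower().split())) < threshold:
            chunks.append([cur])  # topic boundary: low similarity to previous sentence
        else:
            chunks[-1].append(cur)
    return chunks
```

In the real pipeline, swap `jaccard` for cosine similarity over sentence-transformer embeddings; the boundary-detection loop stays the same.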
Fine-tuning vs RAG: When to Choose Each
Fine-tuning is for style/behavior changes. RAG is for knowledge injection. Tried fine-tuning GPT-3.5 on domain docs; the model hallucinated more. Same docs in a RAG pipeline: 94% accuracy. Fine-tune for HOW the model responds, use RAG for WHAT it knows.
#fine-tuning #rag #llm
Vector DB Showdown: Pinecone vs Chroma vs FAISS
For production RAG: Pinecone wins on managed infra + hybrid search. For local dev: Chroma is perfect (embedded, Python-native). For scale on a budget: FAISS with IVF index handles 10M+ vectors on a single GPU. Match your tool to your scale.
#vector-db #rag #infrastructure
Prompt Engineering: Chain-of-Thought Still Wins
Tested various prompting strategies on reasoning tasks. Simple CoT ("Let's think step by step") still beats zero-shot by 15-20% on complex tasks. But for simple extraction/classification, zero-shot is faster and cheaper. Know when to use each.
#prompting #llm #optimization
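The "know when to use each" rule fits in one helper. The task strings below are made-up examples; the only real technique here is appending the CoT trigger for reasoning-heavy prompts and skipping it for cheap extraction calls.

```python
def build_prompt(task: str, use_cot: bool) -> str:
    """Append the chain-of-thought trigger only when the task needs multi-step reasoning."""
    if use_cot:
        return f"{task}\n\nLet's think step by step."
    return task

# Reasoning task: worth the extra output tokens CoT generates.
reasoning = build_prompt("Alice has 3 apples and gives half to Bob. How many does each have?", use_cot=True)

# Simple extraction: zero-shot is faster and cheaper.
extraction = build_prompt("Extract the email address from the text below.", use_cot=False)
```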
LangChain vs LlamaIndex for RAG
LangChain: better for complex agent workflows and tool use. LlamaIndex: superior for pure RAG with better out-of-box retrieval. For my blog's AI chat, LlamaIndex with sentence-transformers + Supabase pgvector was the sweet spot.
#langchain #llamaindex #rag
Embedding Model: text-embedding-3-small is Enough
OpenAI's text-embedding-3-small (1536d) performs nearly as well as ada-002 for semantic search at 5x lower cost. For most RAG use cases, the smaller model is the right choice. Only use large (3072d) for cross-lingual or highly nuanced domains.
#embeddings #openai #cost-optimization
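Back-of-envelope math on the 5x claim, assuming OpenAI's published prices at the time of writing ($0.02 vs $0.10 per 1M tokens; check the current pricing page before relying on these):

```python
# Assumed prices per 1M tokens (verify against OpenAI's pricing page).
PRICE_PER_M = {"text-embedding-3-small": 0.02, "text-embedding-ada-002": 0.10}

def embedding_cost(model: str, tokens: int) -> float:
    """Dollar cost of embedding `tokens` tokens with `model`."""
    return PRICE_PER_M[model] / 1_000_000 * tokens

# Embedding a 10M-token corpus:
small = embedding_cost("text-embedding-3-small", 10_000_000)  # $0.20
ada = embedding_cost("text-embedding-ada-002", 10_000_000)    # $1.00
```

At typical RAG corpus sizes the absolute dollar amounts are small either way, but the 5x ratio compounds fast if you re-embed on every content update.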
Anthropic's Constitutional AI Approach
Reading Anthropic's research on Constitutional AI. The key insight: instead of RLHF with human feedback, use AI feedback guided by a "constitution" of principles. Results in more consistent, explainable alignment. Claude's helpfulness comes from this approach.
#alignment #anthropic #research
Local LLMs: Ollama + Mistral 7B is Surprisingly Good
Running Mistral 7B locally via Ollama for private/offline tasks. Quantized to 4-bit, it runs at 30 tokens/sec on an M2 Mac. Quality is ~80% of GPT-3.5 for most tasks. Perfect for development, testing prompts, and privacy-sensitive applications.
#local-llm #ollama #mistral
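Reproducing this setup takes two commands; the `mistral` tag pulls Ollama's default 4-bit quantized build (exact tags and defaults may shift between Ollama releases):

```shell
# Pull the default (4-bit quantized) Mistral 7B build
ollama pull mistral

# One-off prompt from the terminal
ollama run mistral "Explain what a vector index is in two sentences."

# Or hit the local REST API (default port 11434)
curl http://localhost:11434/api/generate \
  -d '{"model": "mistral", "prompt": "Hello", "stream": false}'
```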
Multimodal AI: Vision Models for Document Parsing
GPT-4V and Claude Vision are game-changers for document understanding. Parsing complex PDFs with tables/charts that failed traditional OCR? Just screenshot and ask the model. Accuracy jumped from 60% (PyPDF) to 95% (vision model). Cost is higher but worth it for complex docs.
#multimodal #vision #document-ai
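The "screenshot and ask" flow is just an inline base64 image in a chat message. A sketch of the request payload (the model name is a placeholder; swap in whichever vision-capable model you use):

```python
import base64

def vision_request(image_path: str, question: str, model: str = "gpt-4o") -> dict:
    """Build a Chat Completions payload with a page screenshot inlined as base64."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                # Inline data URL; no separate upload step needed.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

Send the dict to the chat completions endpoint with your client of choice; for multi-page PDFs, render one screenshot per page and batch the requests.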