A collection of quick learnings, insights, and “aha” moments. Think of this as my digital garden: a space for growing ideas that might not be fully formed blog posts yet, but are worth sharing.
ai
Claude 3.5 Sonnet Outperforms GPT-4 on Code
In my testing, Claude 3.5 Sonnet consistently produces better code than GPT-4, especially for complex refactoring tasks. The key is its stronger instruction following and lower hallucination rate on technical details. Its 200k-token context window makes it ideal for large codebases.
#llm #claude #coding
RAG Chunking: Semantic > Fixed-Size
Switched from fixed 500-token chunks to semantic chunking using sentence-transformers for topic boundaries. Retrieval precision jumped from 73% to 89%. The key insight: preserve logical units of meaning, not arbitrary character counts.
#rag #embeddings #nlp
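The idea can be sketched in a few lines. This is a toy version: word-overlap (Jaccard) similarity stands in for sentence-transformer cosine similarity, and the 0.2 threshold is illustrative, not tuned.

```python
def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity; a cheap stand-in for embedding cosine similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk whenever similarity to the previous sentence drops below threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if jaccard(set(prev.lower().split()), set(cur.lower().split())) < threshold:
            chunks.append([cur])  # topic boundary: low similarity to previous sentence
        else:
            chunks[-1].append(cur)
    return chunks
```

In the real pipeline, swap `jaccard` for cosine similarity over sentence-transformer embeddings; the boundary-detection loop stays the same.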
Fine-tuning vs RAG: When to Choose Each
Fine-tuning is for style/behavior changes. RAG is for knowledge injection. Tried fine-tuning GPT-3.5 on domain docs; the model hallucinated more. Same docs in a RAG pipeline: 94% accuracy. Fine-tune for HOW the model responds, use RAG for WHAT it knows.
#fine-tuning #rag #llm
Vector DB Showdown: Pinecone vs Chroma vs FAISS
For production RAG: Pinecone wins on managed infra + hybrid search. For local dev: Chroma is perfect (embedded, Python-native). For scale on a budget: FAISS with IVF index handles 10M+ vectors on a single GPU. Match your tool to your scale.
#vector-db #rag #infrastructure
Prompt Engineering: Chain-of-Thought Still Wins
Tested various prompting strategies on reasoning tasks. Simple CoT ("Let's think step by step") still beats zero-shot by 15-20% on complex tasks. But for simple extraction/classification, zero-shot is faster and cheaper. Know when to use each.
#prompting #llm #optimization
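The "know when to use each" rule fits in one helper. The task strings below are made-up examples; the only real technique here is appending the CoT trigger for reasoning-heavy prompts and skipping it for cheap extraction calls.

```python
def build_prompt(task: str, use_cot: bool) -> str:
    """Append the chain-of-thought trigger only when the task needs multi-step reasoning."""
    if use_cot:
        return f"{task}\n\nLet's think step by step."
    return task

# Reasoning task: worth the extra output tokens CoT generates.
reasoning = build_prompt("Alice has 3 apples and gives half to Bob. How many does each have?", use_cot=True)

# Simple extraction: zero-shot is faster and cheaper.
extraction = build_prompt("Extract the email address from the text below.", use_cot=False)
```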
LangChain vs LlamaIndex for RAG
LangChain: better for complex agent workflows and tool use. LlamaIndex: superior for pure RAG with better out-of-box retrieval. For my blog's AI chat, LlamaIndex with sentence-transformers + Supabase pgvector was the sweet spot.
#langchain #llamaindex #rag
Embedding Model: text-embedding-3-small is Enough
OpenAI's text-embedding-3-small (1536d) performs nearly as well as ada-002 for semantic search at 5x lower cost. For most RAG use cases, the smaller model is the right choice. Only use large (3072d) for cross-lingual or highly nuanced domains.
#embeddings #openai #cost-optimization
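Back-of-envelope math on the 5x claim, assuming OpenAI's published prices at the time of writing ($0.02 vs $0.10 per 1M tokens; check the current pricing page before relying on these):

```python
# Assumed prices per 1M tokens (verify against OpenAI's pricing page).
PRICE_PER_M = {"text-embedding-3-small": 0.02, "text-embedding-ada-002": 0.10}

def embedding_cost(model: str, tokens: int) -> float:
    """Dollar cost of embedding `tokens` tokens with `model`."""
    return PRICE_PER_M[model] / 1_000_000 * tokens

# Embedding a 10M-token corpus:
small = embedding_cost("text-embedding-3-small", 10_000_000)  # $0.20
ada = embedding_cost("text-embedding-ada-002", 10_000_000)    # $1.00
```

At typical RAG corpus sizes the absolute dollar amounts are small either way, but the 5x ratio compounds fast if you re-embed on every content update.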
Anthropic's Constitutional AI Approach
Reading Anthropic's research on Constitutional AI. The key insight: instead of RLHF with human feedback, use AI feedback guided by a "constitution" of principles. Results in more consistent, explainable alignment. Claude's helpfulness comes from this approach.
#alignment #anthropic #research
Local LLMs: Ollama + Mistral 7B is Surprisingly Good
Running Mistral 7B locally via Ollama for private/offline tasks. Quantized to 4-bit, it runs at 30 tokens/sec on an M2 Mac. Quality is ~80% of GPT-3.5 for most tasks. Perfect for development, testing prompts, and privacy-sensitive applications.
#local-llm #ollama #mistral
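Reproducing this setup takes two commands; the `mistral` tag pulls Ollama's default 4-bit quantized build (exact tags and defaults may shift between Ollama releases):

```shell
# Pull the default (4-bit quantized) Mistral 7B build
ollama pull mistral

# One-off prompt from the terminal
ollama run mistral "Explain what a vector index is in two sentences."

# Or hit the local REST API (default port 11434)
curl http://localhost:11434/api/generate \
  -d '{"model": "mistral", "prompt": "Hello", "stream": false}'
```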
Multimodal AI: Vision Models for Document Parsing
GPT-4V and Claude Vision are game-changers for document understanding. Parsing complex PDFs with tables/charts that failed traditional OCR? Just screenshot and ask the model. Accuracy jumped from 60% (PyPDF) to 95% (vision model). Cost is higher but worth it for complex docs.
#multimodal #vision #document-ai
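The "screenshot and ask" flow is just an inline base64 image in a chat message. A sketch of the request payload (the model name is a placeholder; swap in whichever vision-capable model you use):

```python
import base64

def vision_request(image_path: str, question: str, model: str = "gpt-4o") -> dict:
    """Build a Chat Completions payload with a page screenshot inlined as base64."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                # Inline data URL; no separate upload step needed.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

Send the dict to the chat completions endpoint with your client of choice; for multi-page PDFs, render one screenshot per page and batch the requests.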