The AI tooling space evolves monthly. Here's my assessment of what's working, what's not, and what's emerging in 2026:

Rising

1. ONNX Runtime
Framework-agnostic inference
Local + cloud, CPU + GPU + NPU
What I use for local inference

2. Vercel AI SDK
Simple streaming + tools
Works across providers
The "default" for web AI apps

3. Ollama
Pull and run models locally
CLI-first, minimal fuss
Best for local experimentation

4. Open Weights Models
Qwen, Phi, Llama variants
No API dependency
Good enough for most tasks

Declining

1. Custom Model Training
Outside of specific domains, fine-tuning isn't worth it: the gap between base models and fine-tuned ones keeps shrinking.
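The claim that base models close the gap rests largely on in-context learning: prepend worked examples instead of retraining. A minimal, model-agnostic sketch of few-shot prompt assembly (the function and names here are illustrative, not from any particular SDK):

```python
# Few-shot prompting: instead of fine-tuning, prepend labeled examples
# to the prompt and let the base model generalize from them.

def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: task description, worked examples, then the query."""
    parts = [task, ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    task="Classify the sentiment of each input as positive or negative.",
    examples=[("I love this tool", "positive"), ("Constant crashes", "negative")],
    query="Surprisingly pleasant to use",
)
```

Sending a prompt like this to any base model, local or hosted, often matches a small fine-tune on narrow tasks, and swapping the examples is far cheaper than retraining.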

2. Vector Databases for Small Use Cases
Postgres has vector support now
For most apps, Postgres or SQLite is enough
Dedicated vector DBs are overkill unless at scale

3. Framework-Locked Solutions
If it only works with OpenAI, it's a risk
Multi-provider support is now the baseline

What's Emerging

Agentic Frameworks
AutoGen, LangGraph, CrewAI—the agent orchestration space is consolidating.
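Whatever framework wins, they all orchestrate the same core loop: the model either requests a tool call or returns a final answer. A framework-free sketch of that loop (the model function and tool registry are stand-ins, not any real framework's API):

```python
# The core loop that AutoGen/LangGraph/CrewAI-style frameworks wrap:
# ask the model for the next step; if it names a tool, run it and feed
# the result back; stop when the model returns a final answer.

def run_agent(model, tools, question, max_steps=5):
    """model(history) -> ("tool", name, arg) or ("final", answer)."""
    history = [("user", question)]
    for _ in range(max_steps):
        action = model(history)
        if action[0] == "final":
            return action[1]
        _, name, arg = action
        result = tools[name](arg)  # execute the requested tool
        history.append(("tool", f"{name}({arg!r}) -> {result!r}"))
    raise RuntimeError("agent did not converge")

# Toy stand-in model: request one lookup, then answer from the tool result.
def toy_model(history):
    if history[-1][0] == "user":
        return ("tool", "lookup", "capital of France")
    return ("final", "Paris")

answer = run_agent(toy_model, {"lookup": lambda q: "Paris"},
                   "What is the capital of France?")
```

The consolidation is happening around exactly this loop; the frameworks differ mainly in how they represent history, route between multiple agents, and persist state.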

Edge Deployment
WASM-based inference
Browser-run models (WebGPU)
"Local" now extends to the client side

Sound and Video
Real-time voice integration
Video generation (Sora-class)
Not just text anymore

My Stack
ONNX Runtime (inference)
Qwen 0.5B (local model)
React + Modern.js (frontend)
Python backend (flexibility)

Simple, replaceable, local-first. That's the philosophy.
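As a concrete illustration of the inference layer in this stack: a minimal ONNX Runtime sketch, where the model path is a placeholder and the softmax postprocessing is plain Python:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def run_onnx(model_path, inputs):
    """One inference pass with ONNX Runtime on CPU; inputs keyed by graph input names."""
    import onnxruntime as ort  # deferred import: only needed when actually inferring
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    output_names = [o.name for o in session.get_outputs()]
    return session.run(output_names, inputs)

# Typical flow (model file name is a placeholder, not a real artifact):
#   logits = run_onnx("qwen-0.5b.onnx", {"input_ids": token_ids})[0]
probs = softmax([2.0, 1.0, 0.1])  # postprocess raw logits into probabilities
```

Because the session is created from a plain model file and the pre/postprocessing is ordinary Python, every piece is replaceable: swap the `.onnx` file for a different model, or swap the execution provider for GPU, without touching the rest of the stack.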
Article 7 of 10 - AI Industry Series