Why on-device matters
Cloud AI assistants send your prompts, and often your API context, to remote servers. For internal APIs, auth tokens, and production endpoints, that's a non-starter. Son of Anton instead runs inference on your machine through a statically linked llama.cpp, with GPU acceleration when available and a clean CPU fallback.
Postman's Postbot and similar cloud assistants bill per credit. Ripple's AI costs nothing in API fees because the model runs on your hardware.
Hardware & performance
Apple Silicon Macs offer the smoothest experience. On Windows, Linux, and Intel Macs, AI features are fully supported, but speed depends on your GPU and CPU and may feel slow on weaker hardware.
Load Test Lab AI planning and Composer autocomplete use the same local stack — expect similar hardware sensitivity outside Apple Silicon.
What Son of Anton can do
Ask in plain English. The model calls tools to interact with Ripple:
- list_collections, list_requests, inspect_request, get_request
- run_request — send saved requests and read responses
- get_variables, set_variable — read and write collection/environment state
- get_history, get_last_response — triage recent traffic
- Extend via MCP servers (stdio and remote transports, OAuth, keychain secrets)
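To illustrate how a model-issued tool call could be routed to one of the tools listed above, here is a minimal Python sketch. It is not Ripple's implementation: the JSON shape of a call (`name` plus `arguments`), the in-memory `VARIABLES` store, and the handler signatures are all assumptions made for illustration. Only the tool names come from the list above.

```python
import json

# Hypothetical in-memory stand-in for collection/environment variables.
VARIABLES = {"base_url": "https://api.example.test"}


def get_variables(_args):
    """Return a copy of the current variables."""
    return dict(VARIABLES)


def set_variable(args):
    """Write one variable and echo back what was stored."""
    VARIABLES[args["name"]] = args["value"]
    return {args["name"]: args["value"]}


# Tool names match the list above; the handlers are illustrative only.
TOOLS = {
    "get_variables": get_variables,
    "set_variable": set_variable,
}


def dispatch(raw_call):
    """Parse a JSON tool call and route it to the matching handler."""
    call = json.loads(raw_call)
    handler = TOOLS.get(call["name"])
    if handler is None:
        # Unknown tools are reported back instead of raising.
        return {"error": f"unknown tool: {call['name']}"}
    return {"result": handler(call.get("arguments", {}))}
```

With this sketch, `dispatch('{"name": "set_variable", "arguments": {"name": "token", "value": "abc"}}')` returns `{"result": {"token": "abc"}}`, and an unrecognized tool name yields an `error` entry rather than an exception, so the model can see the failure and retry.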
Once a tool call starts, a GBNF grammar guarantees well-formed JSON. Chat history persists locally. Sub-tabs cover Chat, Action Log, History, and Models.
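As a rough idea of what a grammar constraint for tool calls might look like, the Python sketch below builds a small GBNF grammar as text. The grammar Ripple actually ships is not shown in this document, so the call shape (`{"name": ..., "arguments": {...}}`), the helper names `gbnf_literal` and `build_tool_call_grammar`, and the simplified JSON rules (no string escapes, no exponents) are assumptions for illustration.

```python
import json

TOOL_NAMES = ["list_collections", "run_request", "get_variables", "set_variable"]


def gbnf_literal(text):
    """Quote text as a GBNF string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_tool_call_grammar(tool_names):
    """Build a GBNF grammar accepting {"name": <tool>, "arguments": {...}}."""
    # Each alternative is the JSON-quoted tool name, e.g. "\"run_request\"".
    names = " | ".join(gbnf_literal(json.dumps(n)) for n in tool_names)
    lines = [
        'root ::= "{" ws "\\"name\\"" ws ":" ws name ws "," ws "\\"arguments\\"" ws ":" ws object ws "}"',
        f"name ::= {names}",
        'object ::= "{" ws (pair (ws "," ws pair)*)? ws "}"',
        'pair ::= string ws ":" ws value',
        'array ::= "[" ws (value (ws "," ws value)*)? ws "]"',
        'value ::= string | number | object | array | "true" | "false" | "null"',
        # Simplified: strings may not contain quotes or backslashes.
        'string ::= "\\"" [^"\\\\]* "\\""',
        'number ::= "-"? [0-9]+ ("." [0-9]+)?',
        "ws ::= [ \\t\\n]*",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_tool_call_grammar(TOOL_NAMES))
```

Because the `name` rule enumerates only known tools, a sampler constrained by a grammar of this kind cannot emit a call to a tool that does not exist, in addition to keeping the JSON well-formed.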
Model catalog
Download curated Q4_K_M GGUF models with progress, pause, resume, and cancel:
- Hermes 3 Llama 3.1 8B
- Qwen 2.5 7B
- Qwen 2.5 Coder 7B
- Phi 3.5 Mini
The LLM stack ships in the default build but can be excluded with --no-default-features for a slimmer distributable.
AI beyond chat
- Load Test Lab — describe a scenario in English, get a load test plan
- Composer autocomplete ranked by past phrases, request names, and collection names
- MCP tool surface extends what the co-pilot can reach without cloud plugins
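The autocomplete ranking mentioned above can be pictured with a short Python sketch. The three sources come from the list, but the weights, the prefix-matching rule, and the function name `rank_suggestions` are assumptions; the actual ranking signal is not documented here.

```python
def rank_suggestions(prefix, past_phrases, request_names, collection_names, limit=5):
    """Rank prefix matches: past phrases first, then request names, then collections."""
    # Assumed source weights; a higher weight sorts earlier.
    sources = [(3, past_phrases), (2, request_names), (1, collection_names)]
    needle = prefix.casefold()
    best = {}
    for weight, candidates in sources:
        for text in candidates:
            if text.casefold().startswith(needle):
                # Keep the highest weight if a candidate appears in several sources.
                best[text] = max(best.get(text, 0), weight)
    # Sort by weight (descending), then by length and text for a stable order.
    ranked = sorted(best, key=lambda t: (-best[t], len(t), t))
    return ranked[:limit]


if __name__ == "__main__":
    print(rank_suggestions(
        "run",
        past_phrases=["run the login request"],
        request_names=["Run health check", "List users"],
        collection_names=["Runbook APIs"],
    ))
```

Matching is case-insensitive, so typing `run` surfaces both the past phrase and the saved request named "Run health check", with the past phrase ranked first under the assumed weights.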