
# Siddhesh Sonar

**On-Device AI / Edge Inference Engineer**

I make large AI models run fast on mobile hardware. I work across the full stack :: from tensor-level optimizations in GGML/C++, through JNI bindings, to production Android apps. Currently building on-device inference infrastructure at RunAnywhere (YC W26).


## What I've Built

ToolNeuron :: Production offline AI ecosystem for Android. 500+ commits. Native C++ inference via llama.cpp with custom JNI bindings. Plugin sandboxing with hardware-backed encryption (Android KeyStore). GGUF model management, runtime model switching, offline TTS (Sherpa-ONNX), and OTA updates. 2K+ Play Store installs.
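As a sketch of what GGUF model management involves: per the published GGUF layout, a file starts with the magic bytes `GGUF`, a little-endian `uint32` version, then `uint64` tensor and metadata counts. A minimal header check (the struct and function names here are illustrative, not ToolNeuron's actual API):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Minimal GGUF header check (per the public GGUF spec):
// bytes 0-3: magic "GGUF", bytes 4-7: LE uint32 version,
// bytes 8-15: LE uint64 tensor count, 16-23: LE uint64 metadata KV count.
struct GgufHeader { uint32_t version; uint64_t tensor_count; uint64_t kv_count; };

bool parse_gguf_header(const std::vector<uint8_t>& buf, GgufHeader& out) {
    if (buf.size() < 24) return false;
    if (std::memcmp(buf.data(), "GGUF", 4) != 0) return false;
    std::memcpy(&out.version,      buf.data() + 4,  4);  // assumes LE host (true on Android)
    std::memcpy(&out.tensor_count, buf.data() + 8,  8);
    std::memcpy(&out.kv_count,     buf.data() + 16, 8);
    return out.version >= 2;  // versions 2 and 3 are what ships in practice
}
```

Validating the header up front lets the app reject a corrupt download before mapping gigabytes of tensor data.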

Ai-Systems-New :: The native C/C++ inference engine powering ToolNeuron. Direct GGML integration, custom tensor operations optimized for mobile SoCs.
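To illustrate the kind of tensor-level work this implies, here is a cache-blocked f32 matmul sketch: tiling keeps slices of A and B resident in L1/L2 on mobile ARM cores instead of streaming the whole matrices per output row. The names and tile size are hypothetical, not Ai-Systems-New's actual kernels:

```cpp
#include <algorithm>
#include <cstddef>

// Cache-blocked matmul sketch: C[MxN] += A[MxK] * B[KxN].
// TILE is tuned so one tile of A and B fits in L1 on a typical ARM big core.
constexpr std::size_t TILE = 32;

void matmul_tiled(const float* A, const float* B, float* C,
                  std::size_t M, std::size_t K, std::size_t N) {
    for (std::size_t i0 = 0; i0 < M; i0 += TILE)
    for (std::size_t k0 = 0; k0 < K; k0 += TILE)
    for (std::size_t j0 = 0; j0 < N; j0 += TILE)
        for (std::size_t i = i0; i < std::min(i0 + TILE, M); ++i)
        for (std::size_t k = k0; k < std::min(k0 + TILE, K); ++k) {
            const float a = A[i * K + k];           // reused across the j loop
            for (std::size_t j = j0; j < std::min(j0 + TILE, N); ++j)
                C[i * N + j] += a * B[k * N + j];   // unit-stride inner loop
        }
}
```

The unit-stride inner loop is also what makes NEON/HVX vectorization straightforward later.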

ForgeAI :: Toolkit for SafeTensors and GGUF model operations :: inspection, conversion, and manipulation.
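For context on what SafeTensors inspection entails: the published format is an 8-byte little-endian `u64` giving the JSON header length, followed by that many bytes of JSON describing each tensor's dtype, shape, and data offsets. A minimal header reader (an illustrative helper, not ForgeAI's code):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// SafeTensors layout (per the published format): 8-byte LE u64 header
// length N, then N bytes of JSON metadata, then raw tensor data.
bool read_safetensors_header(const std::vector<uint8_t>& buf, std::string& json_out) {
    if (buf.size() < 8) return false;
    uint64_t n = 0;
    std::memcpy(&n, buf.data(), 8);              // assumes little-endian host
    if (buf.size() < 8 + n) return false;        // header must fit in the file
    json_out.assign(reinterpret_cast<const char*>(buf.data()) + 8, n);
    return true;
}
```

Because all structure lives in that one JSON blob, inspection never needs to touch the tensor payload itself.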

N1 :: Experimental self-rewriting neural architecture using local error signals. No backpropagation. Runtime weight mutation based on surprise signals.
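The line above is the only public detail on N1, so purely as an illustration of a local, backprop-free rule: a toy update where weight mutation is gated by a surprise signal (the magnitude of a local prediction error). This is not N1's actual algorithm:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Toy local learning rule (illustrative, NOT N1's algorithm): weights are
// nudged by a Hebbian-style product of input and local error, but only when
// the "surprise" |error| exceeds a threshold - unsurprised units never mutate.
void local_update(std::vector<float>& w, const std::vector<float>& x,
                  float target, float lr, float surprise_threshold) {
    float y = 0.f;
    for (std::size_t i = 0; i < w.size(); ++i) y += w[i] * x[i];
    const float err = target - y;                      // local error signal
    if (std::fabs(err) < surprise_threshold) return;   // not surprising: no change
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] += lr * err * x[i];   // purely local: no gradient through a graph
}
```

The point of the gate is that updates happen at runtime, per unit, with no global backward pass.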


## What I Know (Deeply, Not Surface Level)

Inference on constrained hardware :: GGML internals, compute graph construction for new model architectures, ML op scheduling across CPU/GPU/NPU, quantization scheme behavior on real devices (Q4_K_M, Q5_K_S, Q8_0).
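As an example of why quantization-scheme behavior matters on real devices: ggml's Q8_0 stores blocks of 32 values sharing one f32 scale, with each value as a signed 8-bit quant. A simplified round-trip (the real kernels operate on packed structs and SIMD lanes, but the arithmetic is this):

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Q8_0-style block quantization (sketch of the ggml scheme): 32 values share
// one f32 scale d = absmax/127; each value is stored as round(x/d) in int8.
constexpr int QK8_0 = 32;
struct BlockQ8_0 { float d; std::array<int8_t, QK8_0> qs; };

BlockQ8_0 quantize_q8_0(const float* x) {
    float amax = 0.f;
    for (int i = 0; i < QK8_0; ++i) amax = std::fmax(amax, std::fabs(x[i]));
    BlockQ8_0 b;
    b.d = amax / 127.f;
    const float id = (b.d != 0.f) ? 1.f / b.d : 0.f;   // avoid div by zero
    for (int i = 0; i < QK8_0; ++i)
        b.qs[i] = static_cast<int8_t>(std::lround(x[i] * id));
    return b;
}

float dequantize_at(const BlockQ8_0& b, int i) { return b.d * b.qs[i]; }
```

The per-block scale is why one outlier value degrades precision for its whole block — a behavior that only shows up when you profile real weights on real devices.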

Mobile SoC architectures :: Qualcomm Hexagon DSP (HVX vector extensions), Adreno GPU compute pipelines (Vulkan, timeline semaphores), QNN SDK for NPU graph compilation, ARM CPU architecture differences across Android devices.
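A toy illustration of op scheduling across heterogeneous backends (not the actual QNN or GGML scheduler): route each op by its arithmetic cost and the accelerators present, since dispatch and data-transfer overhead makes small ops cheapest on the CPU. Thresholds here are made up for illustration:

```cpp
#include <cstdint>

// Toy backend-selection heuristic: big matmuls go to the NPU when present,
// medium ones to the GPU, and everything else stays on the CPU where
// kernel-launch / buffer-transfer overhead is lowest. Thresholds are
// illustrative; real schedulers calibrate them per device.
enum class Backend { CPU, GPU, NPU };

Backend pick_backend(uint64_t flops, bool has_npu, bool has_gpu) {
    if (has_npu && flops > 100'000'000) return Backend::NPU;
    if (has_gpu && flops > 1'000'000)   return Backend::GPU;
    return Backend::CPU;
}
```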

Production Android :: NDK/JNI, Jetpack Compose, plugin SDK design, secure IPC, encrypted inference pipelines, AOSP-level optimizations.


## Currently

- Building cross-platform on-device inference infrastructure at RunAnywhere (YC W26)
- Deepening formal math foundations (linear algebra, quantization theory)
- Maintaining and shipping ToolNeuron updates

## Contact

siddheshsonar2377@gmail.com · LinkedIn

Open to full-time roles in edge AI, on-device inference, and mobile AI infrastructure.

## Pinned

1. ToolNeuron (Kotlin) :: On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion), tool calling, AI personas, RAG knowledge packs, TTS/STT. Fully offline, zero subscrip…

2. Ai-Systems-New (C++) :: On-device AI SDK powering ToolNeuron — LLM chat & tool calling (llama.cpp), Stable Diffusion image generation (QNN/MNN), image processing (upscale, segment, inpaint, depth, style), and TTS. Native …

3. ForgeAi (Rust) :: Your local model workshop. Load. Inspect. Merge. Ship.

4. llama.cpp-android (C++) :: Custom llama.cpp fork with a character-intelligence engine: control vectors, attention bias, head rescaling, attention temperature, fast-weight memory.