Hacker News
Show HN: Llama Running on a Microcontroller (github.com/maxbbraun)
49 points by maxbbraun on Nov 15, 2023 | 5 comments


I was wondering if it's possible to fit a non-trivial language model on a microcontroller. Turns out the answer is some version of yes!

This project is using the Coral Dev Board Micro with its FreeRTOS toolchain. The board has a number of neat hardware features not currently being used here (notably a TPU, sensors, and a second CPU core). It does, however, also have 64MB of RAM. That's tiny for LLMs, which are typically measured in the GBs, but comparatively huge for a microcontroller.

The LLM implementation itself is an adaptation of llama2.c and the tinyllamas checkpoints trained on the TinyStories dataset. The quality of the smaller model versions isn't ideal, but good enough to generate somewhat coherent (and occasionally weird) stories.


The "microcontroller" is a Coral AI accelerator.


Just to clarify: Inference is happening on the Arm Cortex-M7. The Coral TPU chip is off in this implementation.


Thank You.


epic!



