On a ten-hour flight from London to Las Vegas, a tech enthusiast decided to push the limits of local AI by running large language models (LLMs) offline on a high-end MacBook Pro. With no in-flight Wi-Fi, the experiment highlighted both the potential and the limitations of relying on local computation for serious engineering tasks.
Running local LLMs offline isn’t just a gimmick. It’s a glimpse into a future where AI can be harnessed without constant cloud dependency. Using a MacBook Pro M5 Max with 128GB of memory, the traveler ran Gemma 4 31B and Qwen 4.6 36B models via LM Studio. The goal? To build a billing analytics tool covering two years of cloud spending data, revealing insights that standard dashboards often miss.
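The post doesn't show the tooling, but here is a minimal sketch of how that offline loop can look. LM Studio exposes an OpenAI-compatible server on localhost (port 1234 by default), so the standard `openai` client works with no network connection. The file name, column names, and prompt below are illustrative assumptions, not details from the experiment:

```python
# Minimal sketch: aggregate billing rows locally, then ask a model
# served by LM Studio (OpenAI-compatible API on localhost) to analyze
# them. CSV layout and prompt wording are assumptions.
import csv
from collections import defaultdict
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Roll two years of line items into monthly totals so the prompt
# stays small instead of shipping every row to the model.
monthly = defaultdict(float)
with open("cloud_billing.csv") as f:          # assumed export format
    for row in csv.DictReader(f):
        month = row["date"][:7]               # e.g. "2024-07"
        monthly[month] += float(row["cost_usd"])

summary = "\n".join(f"{m}: ${c:,.2f}" for m, c in sorted(monthly.items()))

resp = client.chat.completions.create(
    model="local-model",                      # whatever model LM Studio has loaded
    messages=[{
        "role": "user",
        "content": f"Monthly cloud spend:\n{summary}\n"
                   "Flag anomalies and trends a standard dashboard might miss.",
    }],
)
print(resp.choices[0].message.content)
```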
The setup is more than a showcase of hardware prowess. It reflects a growing trend of engineers and developers seeking autonomy from cloud services. Cloud models offer vast capabilities, but they come with costs and dependencies that local models can mitigate. The approach has real physical constraints, though. Power consumption was the biggest hurdle: under heavy load the MacBook drained roughly 1% of battery per minute even while plugged into a 60W power source, whose output simply couldn't keep pace with the draw. Heat was another issue, making prolonged use uncomfortable.
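For anyone wanting to reproduce the drain measurement, a rough macOS-only sketch using the standard `pmset` tool; the one-minute polling interval is an arbitrary choice, not one from the experiment:

```python
# Verify the ~1%-per-minute drain figure on macOS by polling
# `pmset -g batt` (a standard macOS utility) and logging the delta.
import re
import subprocess
import time

def battery_percent() -> int:
    out = subprocess.run(["pmset", "-g", "batt"],
                         capture_output=True, text=True).stdout
    return int(re.search(r"(\d+)%", out).group(1))

prev = battery_percent()
while True:
    time.sleep(60)                            # sample once per minute
    cur = battery_percent()
    print(f"battery {cur}% (delta {cur - prev:+d}%/min)")
    prev = cur
```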
The experiment also exposed the limits of context management. Beyond roughly 100k tokens, performance degraded and the models occasionally fell into infinite loops that required manual intervention. Despite these setbacks, local inference proved practical for well-scoped tasks like refactoring and CLI scaffolding, where output quality was on par with cloud-based models.
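The post doesn't describe how the loops were handled beyond manual intervention, but two generic guards fit the failure modes it reports: trimming conversation history to stay under a token budget, and flagging output that starts repeating itself. The 100k budget mirrors where degradation appeared; the 4-characters-per-token estimate and the repeat window are rough heuristics, not tuned values from the experiment:

```python
# Two defensive tactics for long local sessions: cap the context you
# send, and cut off a generation that begins to loop.
TOKEN_BUDGET = 100_000

def trim_history(messages: list[dict], budget: int = TOKEN_BUDGET) -> list[dict]:
    """Drop the oldest turns until the estimated token count fits."""
    def est(msgs):
        # Crude estimate: ~4 characters per token.
        return sum(len(m["content"]) // 4 for m in msgs)
    msgs = list(messages)
    while len(msgs) > 1 and est(msgs) > budget:
        msgs.pop(0)                           # keep recent turns, drop stale ones
    return msgs

def looks_looped(text: str, window: int = 200) -> bool:
    """Flag output whose most recent chunk already appeared verbatim earlier."""
    tail = text[-window:]
    return len(text) > 2 * window and tail in text[:-window]
```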
In the competitive landscape, this experiment raises questions about the balance between local and cloud AI. NVIDIA dominates the AI hardware space, but Apple Silicon's performance per watt makes a compelling case for battery-constrained environments, and it could shape how future devices are designed and used in mobile or remote settings.
The implications for engineers and founders are significant. Local inference offers a way to cut cloud costs and improve privacy by keeping data processing offline. It demands disciplined resource management, but that discipline (budgeting tokens, power, and heat) can carry over into more efficient cloud usage. Understanding the physical costs of AI leads to more informed decisions about when the cloud is actually worth it.
As the traveler prepares for the return flight with the correct cable, the focus will shift to small models that run on the Neural Engine. These promise efficiency in both speed and power draw, potentially offering a middle ground between the limits of local hardware and the expansive capabilities of the cloud.
For engineers and tech founders, the experiment is a reminder to evaluate critically where AI workloads are best executed. The choice between local and cloud hinges not only on capability but on cost, autonomy, and efficiency. As AI continues to evolve, staying informed about these trade-offs will be crucial for making strategic decisions that balance innovation with practical constraints.