
No cloud, no subscriptions, just pure freedom. Some tech pros are ditching proprietary models for local AI, running increasingly capable models on everything from old Nvidia GTX 1070 gaming GPUs and mobile phones to very powerful machines. What do they manage to achieve?
It’s evident that the adoption of open-weight AI models like Llama, Mistral, DeepSeek, and Qwen is flourishing. Hardware choices are even more diverse, ranging from old GPUs and mobile devices to data centers.
Cybernews talked to tech experts who’ve adapted small AI models to perform many tasks, such as agentic software testing and design feedback, healthcare data structuring, vulnerability orchestration, and even academic research.
Motivations range from ensuring data privacy and compliance to enabling controlled experimentation, rapid prototyping, low latency, and cost-effectiveness compared to cloud solutions.
Here’s what professionals shared about their setups, use cases, and experiences.
1. MacBook plus Ollama minus guardrails
Proprietary models are sometimes overly restrictive in the responses they provide.
Josh Jacobson, director of professional services at HackerOne, a cybersecurity company, uses Ollama to efficiently run various models locally.
“I am currently running an abliterated model by BlackHillsInfosec on a MacBook Pro. For my use case, I do not need too much advanced hardware,” Jacobson said.
Abliterated large language models (LLMs) are “uncensored” – modified to remove their built-in refusal mechanisms, making the model more obedient.
“I use it to connect with a vulnerability orchestrator that I wrote, as well as to help me create enhanced and/or custom scanner templates based on bugs I have found or what code I can send to the model from the asset I am researching,” Jacobson explained.
“I also use it to ask security questions that many models these days will not answer or answer effectively.”
Running models locally offers greater variety and flexibility, freeing users from reliance on any specific platform.
“I am not beholden to any given ecosystem. I also like to run it locally as I can control and secure my hardware, as well as nothing trains on my prompts or approaches unless I want to do so myself,” the expert said.
Jacobson likened the AI tool to a “good partner” that helps in large-scale testing and identifies key areas that might otherwise be missed.
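For readers curious what "connecting" a script to a locally running model can look like, here is a minimal sketch, not Jacobson's actual orchestrator. It assumes an Ollama server is running on its default local address and that a model has already been pulled; the model name `llama3` mentioned below is a placeholder, and the helper names `build_request` and `ask_local_model` are made up for illustration.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed setup).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With streaming disabled, the reply is one JSON object
        # whose "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

Calling `ask_local_model("llama3", "Explain reflected XSS in two sentences.")` would return the model's answer as a string. Nothing in the exchange leaves the machine, which is the property Jacobson describes.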
2. For researchers, local AI models are a must
John Licato, PhD, is an associate professor at the Bellini College of Artificial Intelligence, Cybersecurity, and Computing at the University of South Florida (USF). He also leads the Advancing Machine and Human Reasoning (AMHR) Lab, and owns a startup, Actualization.AI.
“In my research lab, we regularly run Llama, Mistral, and DeepSeek models on our college's GAIVI compute cluster, mostly NVIDIA A100s,” Licato said.
“We are researching how agents can optimally interact with each other.”
This field raises many interesting questions. Should multiple AI agents working together to solve a problem have the same “personality”? Or should an AI moderator step in and prevent one AI agent from dominating the conversation?
Running AI locally is essential to answer these questions. It provides a level of transparency that researchers just don't get with closed models like those from OpenAI or Anthropic.
“By itself, it doesn't fully meet our needs, as there is a noticeable performance trade-off, but understanding and improving the capabilities of smaller LMs is an important goal of our research,” Licato said.
3. If it works locally, the cloud-based model will do even better
Another reason why experts choose local AI is to validate their ideas fast without extra spending, according to Anton Cheplyukov, AI Practice Lead at Coherent Solutions, a full-stack software development company in Minneapolis, Minnesota.
“Our team uses Ollama (Qwen3 and Gemma3) for early prototyping. The main reasons are cost and privacy,” Cheplyukov said.
“I like using local AI for proof of concept and treat it as a quick test bed: if I'm able to build a decent approach using a local (quantized) model, then a cloud-based model (a heavier one) will almost certainly do better.”
However, the developers don’t use local models for client-related work because they’re “much lower from an accuracy perspective.”
4. In healthcare, not a single bit to the cloud
Venkat Ramamurthy, founder and CEO of TrackHealthAI LLC, a health tech startup, emphasizes local AI’s crucial role in fostering customer trust and ensuring HIPAA compliance. Local AI is at the core of the startup’s strategy.
“We are innovating with lightweight, on-device AI models optimized for Apple’s CoreML framework. These models are designed to run locally on smartphones and tablets. Today, these devices already have powerful neural engines capable of handling billions of parameters efficiently,” Ramamurthy said.
The startup is building a patient-first platform and uses local AI to manage healthcare episodes, structure intake forms (from preventive care to dental, cardiology, and specialty visits), create “episode cards,” and send follow-up reminders.
“On-device AI approach ensures that sensitive data never leaves the device and allows for ‘peace of mind’ data storage,” the expert added.
The founder mentioned other benefits of running AI locally. For example, it reduces latency and delivers insights without connectivity, which is important when delivering efficient and responsive patient-centered care.
Similarly, Matt Hasan, CEO at aiRESULTS, Inc., a company providing AI solutions for marketing optimization, customer loyalty, and healthcare, believes that local AI is reshaping all three fields, and is “where privacy meets performance.”
“We run LLaMA 2 (7B and 13B), Mistral, and GGUF variants locally on a dual-RTX 4090 workstation, with lighter models on Jetson Nano and Raspberry Pi clusters,” Hasan said.
“Running AI locally gives us privacy, speed, and control – things that matter most in high-trust environments.”
The company has built a local AI engine that “reads behavioral signals across the customer journey to predict churn and optimize customer acquisition, retention, and expansion.”
“In healthcare, we use local NLP (Natural Language Processing) models to flag medication conflicts and simulate care plans without sending a single record to the cloud,” Hasan explains.
“Cloud models are still very valuable, but local AI is fast becoming the backbone of responsible, high-impact AI, where trust, data integrity, and human judgment all come together.”
5. AI brings new life to an old GTX 1070
Hyrum Hurst, an automation expert building a startup called QuarterSmart, says an Nvidia GTX 1070 is all it takes for him to run a couple of DeepSeek Distill models in LM Studio, a platform for running AI models locally.
“It's light enough to be run on most computers. My personal setup using the GTX 1070 runs so well,” Hurst said.
The GTX 1070 is an Nvidia graphics card that was launched nearly a decade ago. It features 8GB of fast GDDR5 memory and is available on the second-hand market, usually for less than $100.
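A rough back-of-the-envelope calculation shows why 8GB can be enough for small quantized models. The sketch below estimates the memory taken by the weights alone, as parameter count times bits per weight. The model sizes are illustrative assumptions, not the ones Hurst runs. The estimate ignores the context cache and runtime overhead, and real 4-bit formats use slightly more than 4 bits per weight, so borderline cases may not run in practice.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


VRAM_GB = 8  # memory on a GTX 1070

# Illustrative model sizes (assumptions), each at three common precisions.
for params in (1.5, 7, 14):
    for bits in (16, 8, 4):
        need = weight_memory_gb(params, bits)
        verdict = "fits" if need < VRAM_GB else "does not fit"
        print(f"{params}B at {bits}-bit: ~{need:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

By this estimate, a 7-billion-parameter model needs about 14 GB at 16-bit precision but only about 3.5 GB at 4-bit, which is the difference between not loading at all and fitting with room to spare.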
Hurst uses AI to prototype workflows in Make, n8n, and Google’s AI Studio for automation. Local AI helps build and debug automation logic offline, before scaling it to production environments.
The main advantages are faster iteration loops and complete data privacy, which is critical when testing automations involving user-sensitive workflows.
“It's like having a mini sandbox where I can experiment without relying on external APIs.”
Hurst sees local AI as the bridge between theory and deployment. Once the model works well on a local machine, it can be integrated into cloud workflows for client-facing systems.
“In conjunction with AI, these automation tools will enable me to construct systems saving hundreds of manual hours a month for small businesses,” the founder believes.
6. What does the client want? Ask local AI
Salome Mikadze, Co-Founder at Movadex, a software development firm, has been running Mistral 7B and Llama 3 8B locally on a Mac Studio M2 Ultra for fast experimentation. It helps sketch initial project concepts without sending client data to external AI services.
“Our use case sits somewhere between user journey mapping and design feedback. We use local inference to test how AI can interpret these messy founder briefs,” Mikadze said.
Instant responses are also useful when updating prototypes, and local AI has become an internal R&D “companion.”
“It’s like a controlled sandbox that mirrors the intelligence of cloud models while respecting privacy and agility,” Mikadze explained.
7. A team of AI engineers on one GPU
Twinkle Joshi, a Senior Quality Assurance Test Engineer at IQGeo Solutions Canada, uses local AI to explore agentic AI flows for software testing.
The expert employs multiple intelligent agents to work in various roles, such as test case generation, validation, prioritization, and self-optimization.
Local AI allows for testing configurations, observing agent coordination, and validating results without restrictions and in a secure, isolated environment.
“I've been running Llama 3 locally to explore agentic flow orchestration for AI-assisted testing workflows. I have a Windows workstation with an Nvidia RTX 4090 GPU (24GB VRAM), 64GB RAM, and a 13th Gen Intel Core i9 processor, which is enough performance to run small to mid-sized model inference locally without the use of external APIs,” Joshi said.
The expert uses Ollama as a platform to run AI models.
While cloud-based LLMs such as GPT-4 or Claude offer better scale and intelligence, Llama 3 is already a stable, transparent, and cost-effective model for day-to-day experimentation.
“Having Llama 3 running locally gives me full control of how the agents learn, reason, and cooperate in testing environments,” Joshi said.
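The idea of agents working in different roles can be sketched in a few lines. This is a hypothetical illustration, not Joshi's setup: the role names come from the description above (self-optimization is left out for brevity), while `make_agent`, `run_pipeline`, and the stand-in `echo_llm` are invented for this example. Any function that turns a prompt into text, such as a wrapper around a local Llama 3, could be passed in place of the stand-in.

```python
from typing import Callable, List

# Any function that maps a prompt to text; in practice it would wrap a local model.
LLM = Callable[[str], str]


def make_agent(role: str, llm: LLM) -> Callable[[str], str]:
    """Return an agent: a role instruction prepended to whatever task it receives."""
    def agent(task: str) -> str:
        return llm(f"You are the {role} agent.\n{task}")
    return agent


def run_pipeline(feature: str, llm: LLM) -> List[str]:
    """Chain three hypothetical roles; each agent sees the previous agent's output."""
    generate = make_agent("test case generation", llm)
    validate = make_agent("validation", llm)
    prioritize = make_agent("prioritization", llm)

    cases = generate(f"Write test cases for: {feature}")
    checked = validate(f"Remove invalid or duplicate cases:\n{cases}")
    ranked = prioritize(f"Order these by risk, highest first:\n{checked}")
    return [cases, checked, ranked]


def echo_llm(prompt: str) -> str:
    """Stand-in model for dry runs: returns the role line it was given."""
    return prompt.splitlines()[0]
```

Running `run_pipeline("login form", echo_llm)` exercises the hand-offs between roles without loading a model, which makes it easy to check the coordination logic before swapping in real inference.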
Some don’t believe local AI is fit for real-world applications
For Faizel Khan, a Lead AI Engineer at Landing Point, a NYC-based executive search and recruiting firm, the capabilities proprietary models offer are key.
“We don’t run AI models locally, and that’s a deliberate choice,” Khan said.
“Our goal is to build automation that can be trusted, not just demo-grade intelligence. Local models, by nature, trade capability for convenience: small ones are cheap but shallow; large ones are smart but costly and brittle to maintain.”
The expert explains that these trade-offs break automation in production.
“You can’t build dependable workflows on models that miss nuance or hallucinate under resource pressure. That’s why we rely on remote, high-fidelity models, which deliver the reasoning depth and consistency automation demands,” Khan said.
However, the engineer believes local AI still has a place for privacy-sensitive and “low-IQ tasks,” such as sorting and tagging.
“It’s not where real intelligence lives. As of today, local AI is a sandbox; cloud AI is the factory,” Khan concludes.