
DeepSeek v. OpenAI: Private AI Deployment Takeaways For Enterprise

Nathan Hawkins

As access to AI models has become more widely available and more people use these tools regularly, many organizations want to provide AI tools to their employees and customers. A key problem with the publicly available APIs is that their varying levels of security and privacy don't always meet the needs of an organization's specific use cases. Because of these concerns, standing up a private AI instance can be an attractive alternative.

Our goal was to evaluate two different options, DeepSeek (on EC2) and OpenAI (on Azure), and investigate the setup process, costs, and how realistic it would be for an organization to get one of these running as a private AI instance. In this blog, we will discuss our findings and key takeaways.

Why Companies Consider Private AI Instances

At Keyhole Software, we collaborate with companies across industries, giving us unique insight into why organizations seek private AI instances. The two biggest drivers we've seen are data security & compliance and performance & customization.

Running a private AI instance ensures regulatory compliance (e.g., HIPAA and GDPR) while protecting proprietary data, business logic, and trade secrets from external exposure. And, unlike public AI models, private instances allow organizations to tailor AI to their specific needs. Common customizations that are attractive to our clients include:

  • Fine-Tuning Models – Training AI on proprietary datasets to improve domain-specific performance.
  • RAG Architecture Implementation – Enhancing knowledge retrieval by integrating AI with internal data sources (see the sketch after this list). Learn more about RAG.
  • Tool & API Integration – Seamless connection with internal systems, databases, and workflows.
  • Access Control & Multi-Tenant Support – Managing user permissions and securing sensitive data across enterprise teams.
  • Performance Optimization – Adjusting model size and infrastructure to balance speed and accuracy while reducing unnecessary compute expenses.
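
To make the RAG pattern above concrete, here is a minimal, self-contained sketch of the retrieve-then-prompt flow. The `embed` and `llm_complete` callables are hypothetical stand-ins for whatever embedding model and chat endpoint an organization actually uses; this illustrates the pattern, not a production implementation.

```python
from typing import Callable, List
import math

def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def answer_with_rag(question: str,
                    docs: List[str],
                    embed: Callable[[str], List[float]],        # hypothetical embedder
                    llm_complete: Callable[[str], str],          # hypothetical chat call
                    top_k: int = 3) -> str:
    # Rank internal documents by similarity to the question.
    q_vec = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    # Ground the model's answer in the top-ranked internal context.
    context = "\n\n".join(ranked[:top_k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm_complete(prompt)
```

In practice, a vector database would replace the brute-force similarity scan, but the retrieve-then-prompt shape stays the same.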

By leveraging private AI, enterprises can enhance security, control, and efficiency while customizing AI capabilities to fit their exact business use cases.

DeepSeek Evaluation

We first evaluated DeepSeek. DeepSeek is a Chinese company that was founded in 2023 and has made its models publicly available.

Since DeepSeek's models are readily available, we wanted to investigate what it would take to run one privately, and specifically whether it was feasible on an AWS EC2 instance. We chose AWS because many organizations, including many of our clients, already use it and are familiar with it. We wanted to see if running DeepSeek this way could offer a relatively seamless path to integrating a private AI instance into an organization's existing AWS infrastructure.

DeepSeek Models Evaluated

AWS offers a number of preconfigured instance types with varying combinations of memory and GPUs. We picked three at different price and performance levels to get an idea of what was possible at each price point.

DeepSeek Model – Option 1

We ultimately wanted to see if we could run deepseek-v3, but for simplicity, we first ran one of the distilled models.

Model distillation is the process of creating a new, smaller model based on the larger original model. The smaller model is a condensed version of the larger model that requires fewer resources while still producing output similar to the original model.
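
For readers who want to see the mechanics, here is a minimal sketch of the classic distillation loss, assuming PyTorch: the student is trained to match the teacher's softened output distribution. This illustrates the general technique, not DeepSeek's specific distillation recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature so the student also learns
    # from the teacher's relative confidence in non-top answers.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes consistent across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature**2

# Toy example: logits for a batch of 2 samples over 5 vocabulary items.
teacher_logits = torch.randn(2, 5)
student_logits = torch.randn(2, 5, requires_grad=True)
distillation_loss(student_logits, teacher_logits).backward()
```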

We first chose one of the more affordable EC2 instance types. We wanted something with at least one GPU and enough memory to run the 4.7GB deepseek-r1:7b model. We decided to start with a g4dn.xlarge instance. It has 1 GPU and 16GB of GPU memory.


We launched the instance and got the server configured. We installed Ollama as the inference engine and used nginx to provide a web-accessible, OpenAI-style API.
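
Once nginx is proxying Ollama's OpenAI-compatible endpoint, internal clients can talk to the private instance with the standard `openai` Python package. The hostname below is a hypothetical internal URL, for illustration only; Ollama ignores the API key, but the client library requires one.

```python
from openai import OpenAI

# Hypothetical internal hostname fronted by nginx; adjust to your deployment.
client = OpenAI(
    base_url="https://ai.internal.example.com/v1",
    api_key="unused",  # Ollama ignores the key, but the client requires a value
)

response = client.chat.completions.create(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Recommend a restaurant near our office."}],
)
print(response.choices[0].message.content)
```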

With this configuration, we routinely saw response times of 30-40 seconds for any requests we made. These requests could include asking for code examples, proofreading a document, or even asking for a restaurant recommendation. These models are able to handle all of the same types of requests that a publicly available model would. The estimated monthly cost for this server would be around $375.

DeepSeek Model – Option 2

Next, we wanted to run one of the larger distilled models. We chose a similar instance type, but one with more memory and GPUs. We spun up a g4dn.12xlarge instance that had 4 GPUs and 64GB of GPU memory. We went through the same setup process with Ollama and nginx and then ran a larger 40GB distilled model: deepseek-r1:70b.

We were able to get better response times with this configuration. Some responses returned in as little as 10 seconds, while others took longer and eventually timed out. We tried a variety of requests, including many of the same ones we used with Option 1. One request that sometimes completed and other times timed out was asking the model to explain why the sky is blue.

The estimated cost for this configuration would be around $2,800 per month.

DeepSeek Model – Option 3

Finally, we evaluated what it would take to run the full 405GB deepseek-v3 model.

For this model, we would need a much more powerful EC2 instance. One option would be the p5.48xlarge instance type. It has 8 GPUs and 640GB of GPU memory and is one of the few EC2 instance types currently capable of running this model.

This final option would have an estimated monthly cost of around $70,000. This is obviously a substantial increase over the previous scenarios and reflects the much higher-specification hardware involved. The p5.48xlarge is one of the latest-generation instances available in AWS and provides the highest EC2 performance for the GPU-intensive computing that running AI models requires.

As these examples demonstrate, AWS pricing climbs quickly with performance, from a relatively small basic server to a large, top-of-the-line one.

Takeaways From Testing Deepseek As a Private AI Instance

Based on our testing, running a private instance of deepseek-v3 doesn’t seem realistic at this point. It would be cost-prohibitive for most organizations to run an EC2 instance powerful enough to provide an acceptable level of performance for most use cases.

We found some discussion that indicated that it might be possible to run DeepSeek for around $50,000–60,000 per month, but the general consensus confirmed our findings that it would be closer to $70,000 per month (source 1 + source 2). This approach also requires the initial setup of the server and would require ongoing server administration work to keep the instance updated and secure. Given these findings, this approach is not going to be a viable option for most organizations.

DeepSeek Models & EC2 Testing

| Model | Instance Type | GPUs | GPU Memory | Performance | Estimated Cost |
|---|---|---|---|---|---|
| DeepSeek-R1:7B (4.7GB) | g4dn.xlarge | 1 | 16GB | Slow – 30-40s response times | ~$375/month |
| DeepSeek-R1:70B (40GB) | g4dn.12xlarge | 4 | 64GB | Mixed – some responses in 10s, some timeouts | ~$2,800/month |
| DeepSeek-V3 (405GB) | p5.48xlarge | 8 | 640GB | Not feasible – requires high-end EC2 setup | ~$70,000/month |

Key Takeaways from DeepSeek Deployment

  • High Costs – Running full-scale DeepSeek models privately can be prohibitively expensive (~$70K/month).
  • Performance Challenges – Lower-tier EC2 instances yield slow response times.
  • Requires Complex Setup – Deployment involves configuring Ollama for inference and NGINX for an OpenAI-compatible API.
  • Not Practical for Most Orgs – Due to cost and complexity, DeepSeek isn’t a realistic private AI option for most enterprises.

Evaluating OpenAI on Azure

Next, we looked at running OpenAI on Azure. OpenAI doesn't offer private instances of its models directly but instead makes them available as an Azure resource. Azure provides access to a number of OpenAI models, including GPT-3.5, GPT-4, and o-series models. Specific model availability varies by region, and some models are available only by request. One advantage of going this route is that no direct server administration is required: all setup, configuration, and ongoing maintenance are handled in the Azure portal.

Keyhole Evaluation of OpenAI On Azure

To get started with this approach, we created an Azure OpenAI resource in the Azure portal with the required details and network security appropriate for the use case. Once the resource was created, we deployed the model of our choice in the Azure AI Foundry portal. Once the model was deployed, we were able to begin interacting with the model.

Note: This is just a high-level overview of the process. Microsoft has extensive documentation available on setting up and using OpenAI on Azure, but this gives an idea of how relatively easy it is to get started with this approach.
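
Once a model is deployed, interacting with it looks much like the EC2 setup above, via the `openai` package's Azure client. The endpoint, key, deployment name, and API version below are placeholders for the values shown in your Azure portal.

```python
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",  # assumption; use the version your resource supports
)

response = client.chat.completions.create(
    model="my-gpt-4o-deployment",  # your deployment name, not the base model name
    messages=[{"role": "user", "content": "Summarize our onboarding policy."}],
)
print(response.choices[0].message.content)
```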

Pricing for OpenAI on Azure is billed either on demand or as provisioned throughput units (PTUs); a quick cost estimate follows the list below.

  • On-demand usage is currently billed at $0.0050 per 1,000 input tokens and $0.0150 per 1,000 output tokens.
  • PTUs are billed at $260 per PTU with a one-month reservation.
  • Here’s a handy calculator.
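
For a rough sense of how on-demand pricing scales, here is a quick back-of-the-envelope estimate at the rates above; the traffic numbers are purely illustrative.

```python
INPUT_RATE = 0.0050 / 1000   # dollars per input token
OUTPUT_RATE = 0.0150 / 1000  # dollars per output token

def monthly_cost(requests_per_day: int, avg_in: int, avg_out: int) -> float:
    # Cost per request times daily volume, over a 30-day month.
    per_request = avg_in * INPUT_RATE + avg_out * OUTPUT_RATE
    return per_request * requests_per_day * 30

# Illustrative only: 5,000 requests/day averaging 500 input and 300 output
# tokens comes to about $1,050/month at these rates.
print(f"${monthly_cost(5_000, 500, 300):,.2f}")
```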

Client Highlight Using OpenAI on Azure

Keyhole has an enterprise healthcare client that has gone this route. As a large healthcare organization, they need the privacy and regulatory compliance that a private AI instance affords. This approach also allows them to easily scale their Azure resources to appropriately meet their needs.

As a general example, this healthcare client pays around $4,000 per month to support roughly 500 concurrent users out of 10,000 total users.

Final Thoughts on Private AI Instances

DeepSeek is a relative newcomer in the world of AI and the release of its models provided competition to some of the more established organizations. OpenAI has often been a clear winner for organizations looking to run a private AI instance based on its cost, availability, and performance. With the release of DeepSeek, we wanted to see if it would be directly competitive with OpenAI as a viable alternative for running as a private AI instance.

From what we’ve found in our investigation, it doesn’t seem realistic for most organizations to run DeepSeek as a private instance. Because it is so new, DeepSeek doesn’t yet have the same level of infrastructure and support in place that OpenAI does.

Amazon Bedrock is the AWS offering similar to Azure AI services, but at the time of writing, it does not offer DeepSeek-V3 as a model option. That may be coming in the future, but it is not available currently, which is why we chose to test with an EC2 instance that ended up being much more expensive. If DeepSeek-V3 does become available in Bedrock, this comparison would be worth revisiting: that would likely give DeepSeek an experience much more comparable to what OpenAI currently has in Azure.

At this point, OpenAI on Azure is probably the best approach for most organizations looking to run a private AI instance. Pricing is reasonable, initial setup is relatively straightforward, and ongoing maintenance should be minimal. This makes it a scalable and secure option for enterprises.

As with much of today's AI technology, this landscape is likely to keep changing and evolving. Organizations should stay informed about private AI hosting options for scalability, security, and cost efficiency. We hope this article has provided some insight into running a private AI instance as things currently stand.

For organizations considering private AI deployment, the right choice depends on budget, scalability needs, and regulatory requirements. If you’re exploring private AI hosting, our team at Keyhole Software can help you navigate the best approach for your use case. Contact us to discuss your AI strategy today.
