How to Pick an LLM for Your Enterprise

SaaS, Self-Hosted, or On-Prem: How to Choose for Your Organization

Are you trying to decide how to pick a large language model for your enterprise? Then this issue is for you.

In this week’s feature, I discuss what you might want to consider and offer some tips for creating images for your next business presentation.

Oh, and don’t forget to read all the way to the bottom to get this week’s bonus content!

TL;DR - AI News, Tips, and Apps

  • Create Images Easily with Meta.AI - I’ve spoken about how Meta (formerly Facebook) has released its powerful Llama 3 model. But most of us aren’t going to install this model and fine-tune it. You can use the same technology by logging into Meta.AI with your Facebook account. Just type /imagine and then describe what kind of image you want. The results are pretty impressive, and the interface is free and, in my tests, nearly as capable as DALL-E in ChatGPT.

  • 2024 AI Index Report - This is the most comprehensive report on AI, from Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI). The report tracks, collates, distills, and visualizes data related to artificial intelligence (AI). Their mission is to provide unbiased, rigorously vetted, broadly sourced data for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI.
    Warning: it’s over 500 pages long, so be prepared, but it’s packed with great data.

Feature: How to Choose an LLM for Your Enterprise

In the long term, every business will inevitably use some generative AI, most likely something much more advanced than what we see with ChatGPT today. The sooner you figure out a strategy, though, the sooner you can start benefiting.

As artificial intelligence (AI) and large language models (LLMs) become increasingly crucial for businesses to remain competitive, choosing the right combination of public and private LLMs is a critical decision that can significantly impact your organization's success. This article will provide an overview to help business leaders select LLMs that best meet their organization's unique needs and requirements.

Choosing between Public vs Private LLMs

Public LLMs, such as OpenAI's GPT-4, are powerful general-purpose models easily accessed through APIs. These models offer state-of-the-art performance out-of-the-box, making them an attractive option for organizations looking to integrate AI capabilities into their applications and workflows quickly.

On the other hand, private LLMs, let’s call them Enterprise GPTs, are trained on an organization's own proprietary data. This allows for customizing the model to your specific domain and use cases, ensuring that the LLM is tailored to your business needs. Private LLMs also provide greater control over data security and intellectual property, which can be crucial for companies dealing with sensitive information.

The place to look for models is Hugging Face. Although the name sounds hokey, it is the de facto hub for finding and comparing large language models, and its Open LLM Leaderboard offers excellent comparison capabilities.

Many organizations will likely benefit from combining public and private LLMs—leveraging public LLMs for general tasks while using private LLMs for proprietary or sensitive applications. This hybrid approach allows businesses to take advantage of the best of both worlds.
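As a rough illustration of the hybrid approach, routing can start as simply as a function that sends sensitive requests to an internal model and everything else to a public API. The endpoint names, keyword list, and sensitivity check below are illustrative placeholders, not real services or a production-grade filter:

```python
# Minimal sketch of hybrid LLM routing: sensitive requests go to a
# private, self-hosted model; everything else goes to a public API.
# Endpoint names and the keyword list are illustrative placeholders.

SENSITIVE_KEYWORDS = {"salary", "ssn", "patient", "contract"}

def is_sensitive(prompt: str) -> bool:
    """Naive sensitivity check; a real system would use a classifier."""
    words = prompt.lower().split()
    return any(kw in words for kw in SENSITIVE_KEYWORDS)

def route(prompt: str) -> str:
    """Return which backend should handle this prompt."""
    if is_sensitive(prompt):
        return "private-llm.internal"   # self-hosted model
    return "api.public-llm.example"     # public API

print(route("Summarize this press release"))
print(route("Review this patient record for errors"))
```

In practice, this routing decision usually lives in an API gateway rather than application code, so policies can change without redeploying every service.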

Critical Factors in Selecting Large Language Models (LLMs)

When integrating Large Language Models (LLMs) into business operations, it's essential to consider several critical factors to ensure the technology aligns with your organizational needs and strategic goals. These factors range from licensing agreements to performance benchmarks, each playing a pivotal role in leveraging LLMs effectively.

This section delves into the key considerations you must evaluate before choosing the right LLM for your enterprise. Understanding these elements will guide you through a structured selection process, helping to maximize the benefits while mitigating potential risks and costs associated with LLM deployment.


Licensing

Understanding the terms of use for commercial purposes is crucial. Open source models are usually the best choice, but read the fine print: some models touted as open source carry restrictions. Meta's Llama, for example, offers a permissive license but limits using Llama to improve other models. Public models offered as a service, like OpenAI’s GPT-4, typically charge based on API usage. Private models with open source licenses give you complete control and ownership over the LLM.


Performance Benchmarks

When evaluating LLMs, you can benchmark their performance using standard tests like MMLU (Massive Multitask Language Understanding), ARC (AI2 Reasoning Challenge), and WinoGrande. These tests help compare different models' reasoning, knowledge, and common-sense abilities. Generally, models with more parameters perform better but incur higher runtime costs. The Hugging Face Open LLM Leaderboard runs these tests regularly and ranks models by their results.
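One simple way to make benchmark numbers comparable across candidates is to average them into a single score. The model names and scores below are made-up placeholders for illustration; pull real numbers from the Open LLM Leaderboard:

```python
# Rank candidate models by their average benchmark score.
# Model names and scores are hypothetical placeholders, not
# real leaderboard results.

benchmarks = {
    "model-a-70b": {"MMLU": 79.0, "ARC": 71.0, "WinoGrande": 83.0},
    "model-b-7b":  {"MMLU": 64.0, "ARC": 61.0, "WinoGrande": 74.0},
}

def average_score(scores: dict) -> float:
    """Unweighted mean across benchmarks."""
    return sum(scores.values()) / len(scores)

ranked = sorted(benchmarks, key=lambda m: average_score(benchmarks[m]),
                reverse=True)
for model in ranked:
    print(model, round(average_score(benchmarks[model]), 1))
```

An unweighted average is a blunt instrument; if one benchmark matters more to your use case (say, reasoning for a coding assistant), weight it accordingly.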


Scalability

As your organization's LLM usage grows, it's important to architect your LLM stack for efficient scaling. Tools like Kong's open source AI Gateway enable secure routing of requests across multiple public and private LLMs from a central control plane, making it easy to swap models as needed.

Security & Compliance

Ensuring that your LLM usage complies with data privacy regulations and security best practices is paramount, especially when dealing with sensitive customer data. Private LLMs provide the most control in this regard. For public models, I like the data privacy solutions from Opaque Systems, whose Opaque Gateway encrypts data sent to an LLM.


Costs

Public LLM APIs typically charge based on usage, while private LLMs have upfront training and hosting costs. Modeling your expected usage can help you compare options and make informed decisions. Keep in mind that larger models are more expensive to run. Here’s the tl;dr on costs.

Creating or training LLMs from scratch is very expensive, potentially costing millions of dollars in research, data acquisition/cleaning, computing resources (especially GPUs), and human feedback via techniques like RLHF. Most companies using LLMs will indirectly pay for these creation costs. There are two main pricing models for using an LLM.

Pay-by-token, where you pay based on the amount of data (tokens) the LLM service processes (e.g., OpenAI's GPT-4). Tokens are derived from words, symbols, etc., in the input and output.

Hosting your own model, where you pay for the computing resources (GPUs) to run the model on your own infrastructure, plus potentially a license fee. This provides more control but is very expensive. (If you want a deep dive into how to scale GPUs when you host a model, Aishwarya Srinivasan from Microsoft has a good article on GPU scaling for AI workload optimization, which outlines how to put these workloads into production and the considerations for each.)
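A back-of-the-envelope break-even calculation can clarify the pay-by-token vs. self-hosting trade-off. Every number below (the per-token rate, the monthly GPU cost) is an illustrative assumption, not a quote from any provider:

```python
# Break-even sketch: monthly API bill vs. self-hosted GPU cost.
# All prices here are illustrative assumptions, not real rates.

API_PRICE_PER_1K_TOKENS = 0.03   # assumed blended $/1K tokens
GPU_MONTHLY_COST = 8000.0        # assumed GPU server cost per month

def api_monthly_cost(tokens_per_month: float) -> float:
    """What the pay-by-token API would bill for this volume."""
    return tokens_per_month / 1000 * API_PRICE_PER_1K_TOKENS

def breakeven_tokens() -> float:
    """Monthly token volume at which self-hosting matches the API bill."""
    return GPU_MONTHLY_COST / API_PRICE_PER_1K_TOKENS * 1000

print(f"API cost at 50M tokens/month: ${api_monthly_cost(50_000_000):,.0f}")
print(f"Break-even volume: {breakeven_tokens():,.0f} tokens/month")
```

Under these assumed numbers, self-hosting only pays off at hundreds of millions of tokens per month; plugging in your actual volumes and quotes may tell a very different story.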

Hidden costs

Beyond just input/output tokens, there can be hidden costs that add up, such as application prompt size, background API calls made by agent libraries to implement specific frameworks, data summarization for buffers, etc. These hidden costs often lead to "bill shock" when moving from prototype to production.
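One way to catch bill shock early is to count every token a request actually consumes, not just the visible user message and reply. The overhead figures below are illustrative assumptions; measure your own with your provider's tokenizer:

```python
# Estimate real per-request token usage, including hidden overhead:
# the system prompt, background agent/tool calls, and buffer summaries.
# All overhead figures are illustrative assumptions.

def tokens_per_request(user_tokens: int, reply_tokens: int,
                       system_prompt_tokens: int = 600,
                       agent_call_tokens: int = 1200,
                       summary_tokens: int = 300) -> int:
    """Total billable tokens for one request under the assumed overheads."""
    return (user_tokens + reply_tokens + system_prompt_tokens
            + agent_call_tokens + summary_tokens)

visible = 200 + 400                    # what the user "sees"
actual = tokens_per_request(200, 400)  # what actually gets billed
print(f"Visible tokens: {visible}, billed tokens: {actual}")
```

In this sketch the hidden overhead more than quadruples the visible usage, which is exactly the kind of gap that surfaces when a prototype moves to production traffic.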

Vector databases

Storing compressed vector representations of data generated by LLMs is becoming common for search and retrieval. However, creating embeddings, updating databases, and using advanced search techniques with these vector DBs add significant costs to the base LLM usage.
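These vector-database costs can also be roughed out ahead of time. The embedding rate and vector width below are illustrative assumptions (1536 dimensions is a common embedding width, but check your model):

```python
# Rough cost sketch for a vector database pipeline: one-time embedding
# cost plus raw storage for the vectors. All rates are assumptions.

EMBED_PRICE_PER_1K_TOKENS = 0.0001  # assumed embedding $/1K tokens
DIMENSIONS = 1536                   # assumed embedding width
BYTES_PER_FLOAT = 4                 # float32 storage

def embedding_cost(total_tokens: int) -> float:
    """Cost to embed a corpus of the given token count."""
    return total_tokens / 1000 * EMBED_PRICE_PER_1K_TOKENS

def vector_storage_gb(num_chunks: int) -> float:
    """Raw storage for one vector per chunk, before index overhead."""
    return num_chunks * DIMENSIONS * BYTES_PER_FLOAT / 1e9

print(f"Embedding 100M tokens: ${embedding_cost(100_000_000):,.2f}")
print(f"Storing 1M vectors: {vector_storage_gb(1_000_000):.2f} GB")
```

Note that index structures, replicas, and re-embedding after document updates typically multiply the raw storage figure, so treat this as a floor, not a budget.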

Choosing LLM size

Larger, more capable LLMs provide better accuracy but at a much higher cost. The pricing difference between models like GPT-3.5 vs GPT-4 is substantial.


Don't forget to factor in the ongoing work required to keep your LLMs updated, monitor their performance, and apply security fixes. Working with established public LLM providers can help reduce this burden. You can host your models on Amazon Bedrock, Cohere, or use one of Google Compute models available through their Vertex Studio.

So, in summary, while LLMs provide incredible capabilities, the costs to create, host, and use them at scale are quite high due to their massive size and computational requirements. Careful management is needed to optimize prompts, quantize models, leverage fine-tuning, and analyze usage to control costs while maintaining acceptable performance. Consultation can help companies avoid non-viable LLM deployments.

By carefully evaluating your LLM options across these criteria, you can assemble the right portfolio of language models to power your organization's unique AI applications. The key is striking the optimal balance between leveraging powerful public models and developing customized private LLMs for your proprietary use cases. Utilizing orchestration tools like Kong's AI Gateway will enable your organization to get the most out of LLMs in a scalable, secure, and cost-effective way, ultimately driving business success in the era of AI.

Prompt of the Week: Interactive Business Optimization Assessment with AI

The provided prompt is designed to transform ChatGPT into a virtual management consultant specializing in AI, specifically targeting business optimization. This setup is intended for businesses looking to integrate Large Language Models (LLMs) such as ChatGPT to enhance their operations, decision-making processes, and innovation capabilities. The primary goal is to create an actionable report offering tailored recommendations based on the information gathered during the interview.

How to Use This Prompt:

  1. Initiate the Session: Start by inputting the prompt into ChatGPT. It will respond by introducing itself in the defined role and stating the objective of the interaction. This sets the stage for a professional consultation.

  2. Conduct the Interview: ChatGPT will ask the business representative pre-defined questions.

  3. Allow for Responses and Adaptation: As the business representative answers each question, ChatGPT should adapt its follow-up questions based on the information provided, tailoring the consultation to the business's specific needs and circumstances.

  4. Generate Recommendations: After all questions have been answered, ChatGPT will analyze the responses and compile a summary of findings and recommendations. This final report will suggest how AI can be integrated into the business to address the identified challenges and capitalize on opportunities.


This prompt turns ChatGPT into a tool for businesses considering AI integration. By methodically collecting and analyzing information about a business's operations and challenges, ChatGPT can offer informed recommendations that align with the business’s specific needs and constraints. It is both an assessment tool and a strategic planner for integrating advanced AI technologies. However, things change fast, so this is simply a starting point; make sure to use other inputs when making your final decisions.


World-Class Management Consultant Skilled in AI


You will conduct an interview to collect detailed information about a business to identify how Large Language Models (LLMs) like ChatGPT can be effectively integrated to enhance operations, decision-making, and innovation. The output will be an actionable report based on the answers to the questions posed in the interview. 

##Instructions for ChatGPT:##

You will start by printing the following text: “Welcome to your Interactive Business Optimization Assessment. As your AI-driven management consultant, I am designed to guide you through a series of detailed questions to better understand your operational needs and opportunities for AI integration. We will proceed one question at a time, and after each response, I will ask the next question based on the information you provide. 

Let's begin:”

You will ask the following questions one by one and allow the user to respond. When you have enough information you will move to the next question. 

1. **Business Overview:**
    "Could you please describe your business, including the industry, size, and primary operational functions?"
    *(After the user responds, continue with the following questions, adapting as necessary based on previous answers.)*
2. **Current Tech Stack:**
    "What technologies and platforms are currently in use at your business? Please list them and describe how they are used in your operations."
3. **Challenges and Pain Points:**
    "What are the primary challenges or inefficiencies your business faces? Could you provide specific examples where these issues impact operations?"
4. **Data Availability:**
    "Can you describe the types and volumes of data your business generates and processes regularly? How is this data currently managed and utilized?"
5. **Customer Interaction:**
    "How does your business interact with customers? What communication channels are used, and what role does communication play in your customer service operations?"
6. **Employee Skills:**
    "What is the current level of AI readiness and technical skills among your employees? Are there particular departments or roles that could benefit from AI-related training?"
7. **Budget and Resources:**
    "What budgetary constraints or resource limitations should we consider when planning AI integration? Are there specific financial or logistical restrictions?"

Once you finish the questions you will make a recommendation based on the responses. 

##Get Started ##

Start from the beginning and follow the ChatGPT instructions.

I hope you have enjoyed this week’s edition. If you think someone you know could benefit from this, please forward it to a colleague or friend.

Best Regards,

Mark R. Hinkle

Mark R. Hinkle
The Artificially Intelligent Enterprise
Follow Me On LinkedIn | Follow Me on Twitter
Follow the AIE on LinkedIn | Follow the AIE on Twitter 

Bonus Content: Comparing LLMs with LMSYS Chatbot Arena

Wondering how different AI models stack up in a real-world scenario? You don't need extensive AI/ML expertise or a massive array of GPUs.

🔗 Check out the LMSYS Chatbot Arena

You can compare up to 41 models via the arena and then vote on which one performs better. This is a great way to see what the outputs might look like for your use cases.

In a recent comparison, I tested two models: Llama-3-70b-instruct and Claude-3-haiku-20240307. I tasked them with discussing the potential economic impacts of climate change on the global supply chain over the next decade, including mitigation strategies and key vulnerabilities.

Results for Llama 3 vs. Claude 3:

Llama 3: Delivered six comprehensive mitigation strategies in a rich text format. This level of detail enhances understanding and streamlines integration into platforms like Google Docs.

Claude 3: Although slightly less detailed with five strategies, the model provided valuable insights. Its formatting was simpler, which might require a bit more editing time.

Both models exceeded expectations by offering more than the three strategies that were requested. This comparison highlights each model's practical capabilities and output nuances, which are helpful for anyone leveraging AI in strategic planning.

Watch the full test in action below and see which model might best suit your analytical needs!
