DevOps Methodologies to Operationalize AI

As enterprises adopt AI, what are the systems management considerations?

[The image above was generated by Midjourney. The prompt I used to create it is listed at the end of this email.]

Last week I got to spend a few days with some of the leaders of the DevOps community who are now working in the artificial intelligence space. Alan Shimel of TechStrong graciously hosted us; his company runs the portal along with … It got me thinking…

Collaborating with many of the OGs of DevOps made me wonder how we can take the methodology we use to manage cloud computing infrastructure and apply it to these rapidly emerging AI technologies. I know that AI is just software, and much of it is delivered via SaaS or on-premises, but there are some additional considerations.

For those not familiar, DevOps is a philosophy, a culture, and a set of practices that bridge the gap between software development (Dev) and IT operations (Ops). It's about breaking down silos, fostering collaboration, and delivering quality software faster. Applied to AI, DevOps practices like continuous integration and continuous delivery can help keep models up to date, optimized, and delivering value. As AI emerges in the enterprise, we’ll also want to realize that value with the same rigor we apply in our other systems management practices.

DevOps in AI

Given that DevOps is a methodology that has served the industry well for many years, I believe much of its tooling and many of its ideas should translate, to some degree, to artificial intelligence. DevOps is often broken down into four key characteristics: Culture, Automation, Measurement, and Sharing. For this conversation, I will focus on how we might automate and measure AI. That is not to downplay culture or sharing; those topics are simply too broad for a single discussion.

I have some exciting news: I have been working with CreativeLive on two courses for business users who want to learn how to improve their productivity with artificial intelligence. If you’d like to take the courses, you can sign up at the URLs below.

Metrics for Measuring AI

I don’t have the answers, just thoughts. Here are some of the metrics that have been established for machine learning that I believe will become commonplace in the Artificially Intelligent Enterprise.

Accuracy

The ratio of correctly predicted instances to the total number of instances in the dataset.

Use Case: Classification problems where the distribution of classes is relatively balanced.

Data Drift

The change in data distribution over time.

Use Case: Monitoring AI systems in production to ensure they remain relevant as the nature of the input data evolves.
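
One common way to put a number on data drift is the Population Stability Index (PSI), which compares how a feature's values fall into bins in a reference sample (for example, the training data) versus recent production data. The sketch below is a minimal, dependency-free version under my own assumptions: the function name, the 10 equal-width bins, and the small floor for empty bins are illustrative choices, and the 0.25 cutoff in the usage line is a widely used rule of thumb rather than a standard.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a reference sample (`expected`, e.g. training data) and a
    production sample (`actual`) of the same feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero for a constant feature

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Equal-width bins are built from the reference range; values outside
            # that range are clamped into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at a tiny value so the log term stays finite.
        return [max(c / len(values), 1e-6) for c in counts]

    exp_p, act_p = proportions(expected), proportions(actual)
    # PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_p, act_p))


reference = [x / 100 for x in range(1000)]    # a feature as seen at training time
shifted = [x / 100 + 3 for x in range(1000)]  # the same feature after a shift in production

print(population_stability_index(reference, reference))       # 0.0: identical distributions
print(population_stability_index(reference, shifted) > 0.25)  # True: significant drift
```

In practice you would compute this per feature on a schedule and alert when it crosses whatever threshold fits your data.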

Model Explainability

The degree to which a model's predictions can be understood and interpreted.

Use Case: In industries like finance or healthcare, understanding the reasoning behind decisions is crucial for trust and compliance.

Model Inference Time

The time it takes for the model to make a prediction once it's trained.

Use Case: Real-time applications where rapid predictions are essential, such as autonomous driving.

Model Training Time

The amount of time it takes to train a model.

Use Case: Comparing the efficiency of different algorithms or assessing the feasibility of retraining models frequently.
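
Both training time and inference time can be captured with a wall-clock timer. The sketch below is an illustration, not a benchmark harness: `fit_line` is a hypothetical stand-in for a real training job (a one-variable least-squares fit), and taking the median of several runs is simply one way to damp timing noise.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical stand-in for a training job: least-squares fit of y = a*x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of fn() in seconds over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


xs = [float(i) for i in range(10_000)]
ys = [2.0 * x + 1.0 for x in xs]

training_time = time_call(lambda: fit_line(xs, ys), repeats=3)  # model training time
a, b = fit_line(xs, ys)
inference_time = time_call(lambda: a * 42.0 + b, repeats=1000)  # one prediction

print(f"training: {training_time * 1e3:.2f} ms, inference: {inference_time * 1e6:.2f} us")
```

For production inference, tail latencies such as the 95th or 99th percentile usually matter more than the median.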

Precision

The ratio of correctly predicted positive observations to the total predicted positives.

Use Case: Situations with a high cost of false positives, such as spam email detection.

Recall (Sensitivity)

The ratio of correctly predicted positive observations to all the actual positives.

Use Case: Medical diagnoses where missing a positive (false negative) case can have serious consequences.
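
Accuracy, precision, and recall all come from counting true positives, false positives, and false negatives. Here is a minimal sketch for a binary classifier, assuming labels are encoded as 1 (positive) and 0 (negative); the toy spam data is made up for illustration.

```python
from typing import Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision and recall for a binary classifier (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),                 # correct / all predictions
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # tp / predicted positives
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # tp / actual positives
    }


# Toy spam filter results: 1 = spam, 0 = not spam
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
# accuracy 0.75 (6 of 8 correct), precision 2/3, recall 2/3
```

Libraries such as scikit-learn provide tested versions of these metrics (`accuracy_score`, `precision_score`, `recall_score`); the hand-rolled version just makes the definitions concrete.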

I also believe that those deeper in AI/ML operations, such as data scientists and machine learning professionals, will need to understand some combination of the following metrics:

  • F1-Score

  • Area Under the Receiver Operating Characteristic Curve (AUC-ROC)

  • Mean Absolute Error (MAE)

  • Root Mean Square Error (RMSE)

  • Log Loss

  • Confusion Matrix

  • Model Size

  • Concept Drift
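
Several metrics in this list are short formulas. The sketch below implements F1, MAE, RMSE, log loss, and a binary confusion matrix in plain Python so the definitions are concrete; it assumes 0/1 labels for the classification metrics, and the sample values are invented. Model size and concept drift are not covered here.

```python
import math
from typing import List, Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average size of the errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: like MAE, but penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def log_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Binary cross-entropy: punishes confident but wrong probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """[[TN, FP], [FN, TP]]: rows are the actual class, columns the predicted class."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


print(f1_score(2 / 3, 2 / 3))                              # ~0.667
print(mae([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))               # 2.5 / 3, ~0.833
print(rmse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))              # sqrt(4.25 / 3), ~1.19
print(log_loss([1, 0], [0.9, 0.2]))                        # ~0.164
print(confusion_matrix([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))  # [[1, 1], [1, 2]]
```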

AI Observability

Once we understand how to measure AI operations, the next step is the tooling to do so. Observability is the ability to understand a system's internal state from its external outputs; for AI, those outputs are the measurements described above. In the context of AI-powered software, observability becomes crucial because AI models are inherently complex and unpredictable.

  • Model Monitoring: A model's accuracy can drift over time as it encounters new, unseen data in production. Observability tools can monitor model predictions in real time, flag anomalies, and help ensure the model remains accurate.

  • System Health: The entire software ecosystem must be monitored beyond just the AI model. This includes data pipelines, model-serving infrastructure, and user interactions. Observability ensures that any bottlenecks or failures in the system are promptly identified and addressed.
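
As a concrete illustration of model monitoring, here is a minimal sketch of a rolling-window accuracy check. It assumes ground-truth labels eventually arrive for production predictions (often with a delay); the class name, window size, and threshold are hypothetical, and in a real deployment this logic would typically feed an existing observability or alerting tool rather than stand alone.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Hypothetical monitor: accuracy over the most recent `window` labeled
    predictions, with an alert when it falls below `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.9) -> None:
        self.outcomes: deque = deque(maxlen=window)  # True = prediction was correct
        self.threshold = threshold

    def record(self, prediction: int, actual: int) -> None:
        self.outcomes.append(prediction == actual)

    @property
    def accuracy(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def should_alert(self) -> bool:
        # Wait for a full window so a handful of early errors doesn't trigger an alert.
        return len(self.outcomes) == self.outcomes.maxlen and self.accuracy < self.threshold


monitor = RollingAccuracyMonitor(window=10, threshold=0.8)
for prediction, actual in [(1, 1)] * 7 + [(1, 0)] * 3:  # 7 correct, 3 wrong
    monitor.record(prediction, actual)
print(monitor.accuracy, monitor.should_alert())  # 0.7 True
```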

The DevOps landscape has grown, introducing numerous tools and standards that provide comprehensive observability for conventional software. Modern ML pipelines demand an equivalent level of observability. It is inevitable that tools like Datadog, Honeycomb, and others will add AI observability features, but how well they translate to AI/ML infrastructure remains to be seen.

AI Continuous Integration and Delivery (CI/CD)

AI CI/CD refers to applying Continuous Integration (CI) and Continuous Deployment (CD) practices specifically to Artificial Intelligence (AI) and Machine Learning (ML) projects.

Continuous Integration (CI) with Large Language Models

CI is the practice of frequently integrating code changes into a shared repository. For AI-powered software, CI helps keep the codebase and the AI models in sync.

  • Automated Testing: Every code or model change can be automatically tested to ensure there are no regressions. This is especially important for AI, where small changes can have unintended consequences.

  • Version Control: AI models evolve over time. Version control systems allow developers to track changes, compare model versions, and roll back to previous states if needed.
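
One way automated testing can work for models is a regression gate: the CI job evaluates a candidate model on a fixed evaluation set and refuses to promote it if any tracked metric drops too far below the current production model. A minimal sketch, assuming the metrics have already been computed; the function name and the 0.01 tolerance are illustrative choices.

```python
from typing import Dict


def passes_regression_gate(
    baseline: Dict[str, float], candidate: Dict[str, float], max_drop: float = 0.01
) -> bool:
    """Hypothetical CI check: fail if any metric tracked for the production model
    (`baseline`) drops by more than `max_drop` in the candidate model."""
    failures = [
        name
        for name, base in baseline.items()
        # A metric missing from the candidate's report also counts as a failure.
        if candidate.get(name, float("-inf")) < base - max_drop
    ]
    for name in failures:
        print(f"regression in {name}: baseline {baseline[name]:.3f}, candidate {candidate.get(name)}")
    return not failures


baseline = {"accuracy": 0.91, "recall": 0.85}
print(passes_regression_gate(baseline, {"accuracy": 0.92, "recall": 0.86}))  # True
print(passes_regression_gate(baseline, {"accuracy": 0.92, "recall": 0.80}))  # False
```

In a CI pipeline, the job would exit non-zero (for example, `sys.exit(1)`) when the gate fails, halting the pipeline the same way a failing unit test does.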

Continuous Deployment (CD)

CD is the practice of automatically deploying code changes to production after passing CI tests. In AI, CD ensures that models are seamlessly updated in production environments.

  • Model Serving: Once a model is trained, it needs to be served to end-users. CD practices can automate the deployment of models to serving infrastructure, ensuring users always have access to the latest and most accurate predictions.

  • Rollbacks: CD allows for quick rollbacks to previous, stable versions if a new model version performs poorly in production.
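
Rollbacks are only quick if every deployed model version is kept and addressable. The sketch below is a hypothetical in-memory registry that shows the idea; production teams usually rely on a dedicated tool (MLflow's Model Registry is one example) backed by durable storage, and the version strings here are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; `history[-1]` is the version being served."""
    history: List[str] = field(default_factory=list)

    def deploy(self, version: str) -> None:
        self.history.append(version)

    @property
    def live(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def rollback(self) -> str:
        """Retire the live version and serve the previous one again."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]


registry = ModelRegistry()
registry.deploy("churn-model:v1")
registry.deploy("churn-model:v2")  # suppose v2 misbehaves in production
print(registry.rollback())         # churn-model:v1
```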

Considerations for CI/CD in LLMs

Training LLMs with corporate data introduces an additional layer of complexity. This data is often proprietary, sensitive, and subject to strict regulatory guidelines, so ensuring that the model doesn't inadvertently leak or misuse it during deployment is paramount. Even so, you will almost certainly run into a situation where data inadvertently “leaks” into your AI infrastructure. How do you put the genie back in the bottle, if that is even possible?

Data presents a unique challenge for CI/CD. In conventional software, changes come from the code; in an ML application, behavior changes when either the code or the data used to train the model changes.

It’s more than likely that you will be patching models using a technique like LoRA (Low-Rank Adaptation). LoRA fine-tunes a model by freezing its original weights and training small low-rank matrices that are added to selected layers, so only a small fraction of the parameters is updated. Because the full model is never retrained, this is efficient and reduces computational costs. Conceptually, it is similar to how we patch server software.
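
To make the LoRA idea concrete: instead of changing a weight matrix W (d × k) directly, LoRA trains two small matrices, B (d × r) and A (r × k), and the adapted layer uses W + (alpha / r)·BA, where the rank r is much smaller than d and k. B starts at zero, so the adapted model initially behaves exactly like the original. The sketch below is a toy, pure-Python illustration of that structure only, with no training loop; the class name is hypothetical, and real fine-tuning would use a library such as Hugging Face PEFT.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product, fine for toy sizes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


class LoRALinear:
    """Toy LoRA-adapted layer: frozen W (d x k) plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0) -> None:
        d, k = len(weight), len(weight[0])
        self.weight = weight                                    # frozen: never modified
        self.A = [[random.gauss(0.0, 0.01) for _ in range(k)] for _ in range(rank)]  # r x k
        self.B = [[0.0] * rank for _ in range(d)]               # d x r, zero-initialized
        self.scale = alpha / rank

    def effective_weight(self) -> Matrix:
        """W + (alpha / r) * B @ A, the weight the adapted model actually uses."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * dw for w, dw in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def trainable_params(self) -> int:
        # Only A and B would be updated during fine-tuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


base = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 3 x 2 weight matrix
layer = LoRALinear(base, rank=1)
print(layer.effective_weight() == base)  # True: with B = 0 the model is unchanged
print(layer.trainable_params())          # 5 (A is 1 x 2, B is 3 x 1)

# Why this is cheap at realistic sizes: a 4096 x 4096 layer with rank 8
d, k, r = 4096, 4096, 8
print(d * k, r * (d + k))  # 16777216 65536
```

At those sizes, full fine-tuning would touch about 16.8 million weights in that one layer, while LoRA trains 65,536 (about 0.4%).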

Quality control poses another significant hurdle. In standard software projects, unit, integration, and regression tests determine whether a change can be safely incorporated and deployed: if the tests pass, the change proceeds; if not, it's halted. This approach doesn't align well with ML models for several reasons. First, some ML models behave non-deterministically, so the same input does not always produce the same output. Second, ML model inputs, such as high-dimensional vectors or images, can be highly complex, and crafting them the way developers typically write tests would be inefficient and daunting, if not infeasible. Typical CI/CD tools like Jenkins will need to be either augmented or replaced to handle these challenges.
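
One pragmatic response to non-determinism is to test behavior statistically: call the model many times and assert that a high enough fraction of outputs pass a property check, rather than asserting one exact output. The sketch below assumes a hypothetical `toy_model` standing in for an LLM call; the 50 runs, the 0.8 threshold, and the substring check are placeholders you would tune, and real checks are often richer (schema validation, similarity scores, or a grading model).

```python
import random
from typing import Callable


def pass_rate(model: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], runs: int = 50) -> float:
    """Call a non-deterministic model `runs` times and return the fraction of
    outputs that satisfy `check`."""
    return sum(check(model(prompt)) for _ in range(runs)) / runs


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: usually right, occasionally not."""
    return "The capital of France is Paris." if random.random() < 0.95 else "Lyon?"


random.seed(7)  # fixed seed keeps this example reproducible
rate = pass_rate(toy_model, "What is the capital of France?", lambda out: "Paris" in out)
# The test asserts a minimum pass rate instead of one exact output string.
assert rate >= 0.8, f"pass rate {rate:.2f} is below the 0.8 threshold"
print(f"pass rate: {rate:.2f}")
```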

A robust CD system for LLMs should have a clear mechanism for rolling back deployments quickly. If a model shows signs of overfitting (a common phenomenon in machine learning and statistics where a model learns the training data too closely, including its noise and outliers) or drift (where the model's performance degrades over time due to changing data patterns), or any other unforeseen issue, it's crucial to revert swiftly to a stable version to maintain service integrity and data security.
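
A rollback trigger can encode both warning signs from the paragraph above. This sketch uses two simple heuristics, both assumptions rather than standard thresholds: a large gap between training and validation accuracy as a hint of overfitting, and a drop in live accuracy relative to the previous version as a hint of drift (or simply a bad release). The function name and cutoffs are hypothetical.

```python
from typing import List


def rollback_reasons(train_acc: float, val_acc: float, live_acc: float,
                     previous_live_acc: float, max_gap: float = 0.10,
                     max_live_drop: float = 0.05) -> List[str]:
    """Hypothetical rollback trigger; an empty list means keep the new version."""
    reasons = []
    if train_acc - val_acc > max_gap:
        # Far better on training data than on held-out data: a classic overfitting signal.
        reasons.append("overfitting")
    if previous_live_acc - live_acc > max_live_drop:
        # Worse in production than the version it replaced: possibly drift or a bad release.
        reasons.append("live accuracy drop")
    return reasons


print(rollback_reasons(0.99, 0.80, 0.90, 0.91))  # ['overfitting']
print(rollback_reasons(0.92, 0.90, 0.82, 0.91))  # ['live accuracy drop']
```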

AI CI/CD is about streamlining the process of integrating new data and code changes, testing the AI models, and deploying them to production, all while ensuring optimal model performance and reliability.

DevSecOps for AI

DevSecOps refers to the inclusion of security best practices in DevOps. It’s a later submovement that brings security into the DevOps culture, emphasizing security at every stage of the software lifecycle. Data is the lifeblood of AI systems, and it often includes sensitive information, which makes security paramount.

  • Data Protection: AI models are only as good as the data they are trained on. Ensuring data integrity and protecting it from breaches is essential. Encryption, access controls, and regular audits can safeguard data.

  • Model Security: Adversaries can exploit AI models through techniques like model inversion or adversarial attacks. Integrating security checks and robust testing can help in identifying and mitigating such vulnerabilities.
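
As a small example of data protection in practice, sensitive values can be masked before text is logged, sent to a hosted model, or added to a training set. The sketch below uses two illustrative regular expressions (emails and US-style phone numbers); they are assumptions for demonstration and nowhere near complete PII detection, which usually calls for a dedicated tool and human review.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
US_PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Mask obvious email addresses and US-style phone numbers."""
    text = EMAIL.sub("[EMAIL]", text)
    return US_PHONE.sub("[PHONE]", text)


print(redact("Contact jane.doe@example.com or 555-867-5309 about the renewal."))
# Contact [EMAIL] or [PHONE] about the renewal.
```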

Considerations for Securing AI

I covered this topic in a previous newsletter, Security in the Age of Generative AI. Introducing any new infrastructure can increase your attack surface. At the very least, you should consider how your data is governed, how your models are accessed, and how the outputs from those models are used.

DevOps for AI: How to Operationalize AI

As AI continues to permeate every sector, the challenges associated with its development and deployment will only grow. Observability provides transparency into AI systems and keeps them monitored. Security practices protect sensitive data and models from breaches and exploitation. Continuous Integration ensures that the software and AI components evolve cohesively, while Continuous Deployment helps users get the latest improvements without disruption.

By embracing DevOps, organizations can navigate the complexities of AI software delivery, ensuring efficiency, reliability, and excellence in the AI-driven future.

Tip of the Week: Chunking Data for ChatGPT

Applying LLMs like ChatGPT to external data is something I do every day. There are many techniques for helping an LLM handle your personal data better, and chunking is one of them.

Why you need to chunk data for LLMs

By segmenting large volumes of information into smaller, more digestible units, we cater to the model's processing capabilities, ensuring more accurate and efficient responses.

Benefits of Chunking Data for ChatGPT and Other LLMs

  • Optimal Processing: AI models, including ChatGPT, can only process a limited amount of text in a single request (their context window). Chunking keeps us within these bounds, leading to smoother interactions.

  • Enhanced Comprehension: Just as humans find it easier to understand and retain information presented in bite-sized pieces, AI models can more effectively grasp and respond to chunked data.

  • Efficiency and Clarity: Chunking allows us to prioritize and present the most relevant information first, ensuring that the core message is not lost in a sea of words.

  • Streamlined Interactions: With well-chunked data, interactions with ChatGPT become more streamlined, reducing the chances of misunderstandings or the need for repetitive queries.

In essence, chunking is the bridge that ensures AI can effectively communicate and understand our information.

The Essence of Chunking Data

Chunking data is all about breaking down vast swathes of information into bite-sized, digestible units. Why is this so crucial for ChatGPT? Here are a few reasons:

  • Manageability: Smaller chunks are easier for machines like ChatGPT to process and for humans to comprehend.

  • Efficiency: There's a token limit in ChatGPT interactions. Chunking ensures we stay within that boundary, making every token count.

  • Clarity: By focusing on specific details, we can enhance the precision of the AI's response, leading to more meaningful interactions.

Chunks are made up of sequences of words that are then tokenized. Tokens are the building blocks of text in LLMs; a token typically ranges from a single character to a whole word in length.

Here’s an example of Abraham Lincoln’s Gettysburg Address as it’s rendered through the OpenAI tokenizer.

The Art of Effective Chunking

Now that we understand why chunking matters, let's delve into some best practices for mastering this art:

  • Know Your Token Limit: Familiarize yourself with ChatGPT's token constraints. Remember, a token can be a single character or an entire word.

  • Context is King: Every chunk you create should stand on its own, making sense in isolation. Avoid cutting off sentences midway.

  • Logical Divisions Matter: When breaking down paragraphs, always divide them based on themes or subjects. It ensures a flow and coherence in the information.

  • Prioritize: In the world of limited tokens, always ensure the most vital information gets the spotlight. Don't shy away from summarizing or paraphrasing when needed.

  • Embrace Lists: Bullet points and lists are your best friends. They not only convey information succinctly but also add structure to your data.

  • Say No to Redundancy: Each chunk should bring something new to the table. Weed out repetitions and focus on unique insights.

  • Iterate, Iterate, Iterate: Once you've chunked your data, test it out with ChatGPT. The feedback can be invaluable, helping you refine and perfect your chunks.
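
The practices above (respect the token limit, never cut a sentence midway) can be combined into a simple sentence-aware chunker. This is a minimal sketch under two stated assumptions: token counts are estimated with the rough rule of thumb of about four characters per English token rather than a real tokenizer, and sentences are split on `.`, `!`, or `?` followed by whitespace. A single sentence longer than the budget still becomes its own oversized chunk.

```python
import re
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough rule of thumb for English: about 4 characters per token.
    Use the model's actual tokenizer when precise counts matter."""
    return max(1, len(text) // 4)


def chunk_text(text: str, max_tokens: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks that stay under max_tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and estimate_tokens(candidate) > max_tokens:
            chunks.append(" ".join(current))  # close the chunk at a sentence boundary
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "Four score and seven years ago our fathers brought forth on this continent, "
    "a new nation, conceived in Liberty, and dedicated to the proposition that "
    "all men are created equal. Now we are engaged in a great civil war, testing "
    "whether that nation, or any nation so conceived and so dedicated, can long "
    "endure. We are met on a great battle-field of that war."
)
chunks = chunk_text(text, max_tokens=50)
for chunk in chunks:  # 2 chunks, each ending on a full sentence
    print(estimate_tokens(chunk), "|", chunk)
```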

Chunking data might seem like a minor detail, but its impact on ChatGPT interactions is profound. By embracing the practices mentioned above, you're not just optimizing data; you're enhancing the quality of communication.

AI Tools I am Evaluating

  • H2O LLM Studio - A framework and no-code GUI designed for fine-tuning state-of-the-art large language models (LLMs)

  • LangSmith - A platform for building production-grade LLM applications.

  • AWS Bedrock - Build and scale generative AI applications with foundation models (FMs)

Midjourney Prompt for Header Image

For every issue of the Artificially Intelligent Enterprise, I include the Midjourney prompt I used to create that edition's header image.

Conceptual Artwork of AI Integration in Operations - A visually captivating digital artwork that conceptualizes the integration of AI into business operations. The artwork features interconnected elements representing different stages of operationalizing AI, from data acquisition to real-time decision-making. The background includes a blend of tech elements and business settings, conveying the fusion of AI technology and operational workflows. The artwork employs a mix of colors and shapes, inviting viewers to explore the potential of AI in streamlining processes. Post-processing enhances the visual effects and contrasts, resulting in a visually engaging and thought-provoking artwork. This digital masterpiece offers a unique perspective on how AI transforms business operations. Created by the visionary digital artist, Lucas Roberts, this artwork has been featured in business and tech expos, celebrated for its creative portrayal of AI integration in operations. --s 1000 --ar 16:9
