The New CAP Theorem for B2B GenAI Apps

Our learnings for enterprise IT, product and engineering teams looking to develop B2B GenAI applications for their business.

Updated:

May 15, 2024

Authored by:

Vijay Rayapati

CEO @ Atomicwork

Over the last two years of Atomicwork’s journey of building a GenAI platform to deliver a Modern ITSM solution and Conversational ESM product, we’ve had to make painful choices between Cost, Accuracy and Performance, often trying to optimize for all of them.

Traditionally a popular theorem in the context of distributed systems, CAP (Consistency, Availability and Partition Tolerance) takes on a whole new meaning in the world of GenAI apps. As it stands today, building GenAI apps requires thinking along the dimensions of Cost, Accuracy and Performance—the new CAP.

Through this blogpost, I want to share our learnings for Enterprise IT, product and engineering teams working to adopt, develop or deploy GenAI applications for their business—especially with the context of Cost, Accuracy and Performance.

Cost

Cost in GenAI applications refers here to the operating cost of AI models in production, not the resources required to develop, deploy, and maintain AI systems. Cost is a primary dimension because most businesses will consume AI models via APIs, and LLM calls are expensive.
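
To make the operating cost concrete, here is a rough back-of-the-envelope sketch. The prices, token counts and request volume are hypothetical placeholders, not real vendor pricing; plug in your provider's current rates and your own traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not real vendor prices)."""
    input_per_million: float
    output_per_million: float


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 pricing: Pricing, days: int = 30) -> float:
    """Estimate monthly API spend for one use case."""
    per_request = (input_tokens * pricing.input_per_million
                   + output_tokens * pricing.output_per_million) / 1_000_000
    return per_request * requests_per_day * days


# Illustrative numbers only: 10,000 requests/day, 1,500 prompt tokens
# (retrieved context included) and 300 completion tokens per request.
large_model = Pricing(input_per_million=10.0, output_per_million=30.0)
small_model = Pricing(input_per_million=0.5, output_per_million=1.5)

large = monthly_cost(10_000, 1_500, 300, large_model)
small = monthly_cost(10_000, 1_500, 300, small_model)
print(f"large model: ${large:,.0f}/month, small model: ${small:,.0f}/month")
```

With these made-up inputs the larger model comes to roughly $7,200 a month against $360 for the smaller one, which is why the choice of model per use case matters so much.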

While evaluating AI models, we have learned that a single-model approach will not work for B2B use cases. It is important to categorize the existing landscape of LLMs into three segments:

Performance models: Fast token prediction and throughput. These are mostly smaller models like Mixtral 8x7B or Llama 3 8B.
High accuracy models: Much better reasoning capabilities. Mostly SOTA models like GPT-4 or Claude 3 Opus.
Cost efficient models: Decent performance with good reasoning capabilities. This includes models like Cohere Command-R or GPT-3.5 Turbo.

For Enterprise IT, product and engineering teams, understanding the cost implications of different approaches is crucial. Balancing cost with the desired level of accuracy and performance is a delicate dance that requires careful planning based on your Gen AI use cases.
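
One way to act on the three segments is a small routing table that sends each task to a segment instead of a single model. This is a minimal sketch: the task names and the task-to-segment mapping are hypothetical, and the model names are simply the examples mentioned above, not recommendations.

```python
# Example models per segment, taken from the segments described in this post.
MODELS_BY_SEGMENT = {
    "performance": ["Mixtral 8x7B"],
    "accuracy": ["GPT-4", "Claude 3 Opus"],
    "cost_efficient": ["Cohere Command-R", "GPT-3.5 Turbo"],
}

# Hypothetical mapping from task type to the segment it should use.
SEGMENT_FOR_TASK = {
    "intent_classification": "performance",      # latency-sensitive, simple
    "policy_question_answering": "accuracy",     # errors are expensive
    "ticket_summarization": "cost_efficient",    # high volume, tolerant
}

DEFAULT_SEGMENT = "cost_efficient"


def pick_model(task: str) -> str:
    """Return the first model of the segment assigned to this task."""
    segment = SEGMENT_FOR_TASK.get(task, DEFAULT_SEGMENT)
    return MODELS_BY_SEGMENT[segment][0]


print(pick_model("policy_question_answering"))
print(pick_model("some_unmapped_task"))
```

Keeping the mapping in data rather than code makes it easy to move a task to a different segment when your evaluations or budget change.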

Accuracy of AI models

Accuracy is the measure of how well an AI system performs its intended task or use case.

Not all AI models are equally accurate, even if they are trained on a similar volume of data/tokens. For B2B GenAI applications, accuracy is paramount, especially in critical domains like HR, legal, healthcare, finance, and business systems.

Achieving high accuracy requires not just state-of-the-art models, but also robust data pipelines, quality training data, and in some cases even sophisticated fine-tuning.

Modern IT, product and engineering teams must constantly strive to improve the accuracy of their AI models while being mindful of the trade-offs between cost and performance. Improving accuracy as a primary dimension will increase the latency and cost significantly.

Performance of GenAI apps

Performance in GenAI applications encompasses speed, scalability, and efficiency.

A highly scalable AI system can process large volumes of data quickly and adapt to changing business data through a RAG architecture. However, delivering results in real time or with low latency for end users will become a challenge.

Most product and engineering teams face the challenge of optimizing model performance without compromising accuracy or incurring excessive costs. This often involves fine-tuning for domain specificity, leveraging parallel processing across models, and utilizing specialized hardware beyond GPUs, such as Groq's LPUs, to meet low-latency and high-performance requirements.

At the end of the day, compromising on one parameter over the other two comes down to deeply understanding the domain, the user persona, the use case, and the cost of making an error.

Let’s look at a few examples from support:

In financial services, such as banking or investment advisory, the accuracy of the information provided and the performance of the AI in handling complex queries in real time are critical. These queries might involve transactions, financial advice, or regulatory compliance information. The potential cost of inaccuracies (such as wrong advice or transaction errors) can be very high, leading to financial losses or legal consequences. Therefore, companies in this domain might prefer to invest more in advanced AI technologies to ensure high accuracy and quick, effective performance, even if the initial and ongoing costs are significant.
For e-commerce platforms dealing with high volumes of basic customer inquiries such as order status, shipping updates, or product availability, it might be acceptable to compromise slightly on accuracy for the sake of lower operational costs and fast response times. The majority of queries do not require perfect accuracy as the consequences of minor mistakes are typically not severe (e.g., slightly incorrect product recommendations or minor errors in stock levels). Here, the focus is on handling large volumes efficiently, maintaining low costs, and optimizing the end-user experience via low wait times.
In healthcare, particularly in non-emergency contexts like patient data management or scheduling appointments, the accuracy of information is crucial to ensure proper patient care and compliance with health regulations. However, the speed of interactions might be less critical. Businesses can prioritize cost-effective AI solutions that maintain high accuracy while perhaps working slightly slower, tolerating longer response times to ensure data is handled correctly and cost-effectively.
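
The "cost of making an error" in these examples can be folded into a simple comparison: model spend plus the expected cost of wrong answers. The sketch below uses hypothetical per-query prices, error rates and error costs purely to show how the same two models can rank differently across domains.

```python
def total_monthly_cost(queries: int, model_cost_per_query: float,
                       error_rate: float, cost_per_error: float) -> float:
    """Model spend plus the expected cost of wrong answers, in USD."""
    return queries * (model_cost_per_query + error_rate * cost_per_error)


QUERIES = 100_000  # hypothetical monthly volume

# Hypothetical model profiles: (cost per query in USD, error rate).
CHEAP = (0.001, 0.05)
ACCURATE = (0.02, 0.01)

# Hypothetical average cost of one wrong answer in each domain.
ERROR_COST = {"e-commerce": 0.20, "financial services": 50.0}

results = {}
for domain, cost_per_error in ERROR_COST.items():
    cheap = total_monthly_cost(QUERIES, *CHEAP, cost_per_error)
    accurate = total_monthly_cost(QUERIES, *ACCURATE, cost_per_error)
    results[domain] = "cheap" if cheap < accurate else "accurate"
    print(f"{domain}: cheap=${cheap:,.0f} accurate=${accurate:,.0f} "
          f"-> pick {results[domain]}")
```

Under these assumed numbers the cheaper model wins for e-commerce and the more accurate model wins for financial services, which matches the intuition in the examples above.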

3 key implications for IT and engineering teams

For Enterprise IT, product and engineering teams working on GenAI applications, it is essential to understand the interplay between cost, accuracy, and performance, because improving one of these dimensions can adversely impact the others.

Here are 3 key implications to consider based on our production experience over the last 2 years:

Pick your 2 dimensions of CAP: You cannot optimize for all three dimensions of CAP as of today. Higher accuracy requires larger/SOTA models, which hurts performance and increases overall costs, while lowering costs with a Groq-specific model could lower performance. IT and tech teams need to develop a clear strategy that balances cost constraints with the need for high accuracy and performance. This involves setting realistic goals with the business based on use cases, prioritizing tasks around your most important dimension, and making informed decisions.
Continuous optimization: Improving the cost, accuracy, and performance of AI systems is an ongoing process. Not all AI models are the same, so you need to develop a good evaluation system based on your use cases. Teams should regularly evaluate their models, data pipelines, and infrastructure to identify areas for optimization and enhancement. Smaller models hosted on Groq can deliver great performance with low latency, but that performance may be constrained once you apply enterprise AI guardrails for prompt security, data validation and AI safety checks.
Collaboration: Collaboration between Enterprise IT, product and engineering teams is crucial for addressing the challenges posed by the CAP theorem in GenAI applications. By working together, teams can leverage their respective expertise to design and implement AI solutions that meet the desired objectives. Don’t get excited by a new model's benchmark data; always verify use case performance with your own evaluation system.
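
An evaluation system of the kind described above can start very small. The sketch below scores any callable "model" on your own test cases and reports all three CAP dimensions. The stub models, the test cases, the per-call costs and the exact-match scoring are illustrative assumptions; a real setup would call actual model clients and likely use a more forgiving scoring method.

```python
import time
from typing import Callable

# (prompt, expected answer) pairs drawn from your own use case.
EvalCase = tuple[str, str]


def evaluate(model: Callable[[str], str], cases: list[EvalCase],
             cost_per_call: float) -> dict[str, float]:
    """Measure accuracy, average latency and total cost over the cases."""
    correct = 0
    latencies = []
    for prompt, expected in cases:
        start = time.perf_counter()
        answer = model(prompt)
        latencies.append(time.perf_counter() - start)
        # Exact match after normalization; a simplifying assumption.
        correct += int(answer.strip().lower() == expected.strip().lower())
    return {
        "accuracy": correct / len(cases),
        "avg_latency_s": sum(latencies) / len(latencies),
        "total_cost": cost_per_call * len(cases),
    }


# Stand-in "models" so the sketch runs without any API; swap in real clients.
def stub_small(prompt: str) -> str:
    return "reset password" if "password" in prompt else "unknown"


def stub_large(prompt: str) -> str:
    return "reset password" if "password" in prompt else "hardware issue"


CASES = [
    ("How do I change my password?", "reset password"),
    ("My laptop won't turn on", "hardware issue"),
]

report_small = evaluate(stub_small, CASES, cost_per_call=0.001)
report_large = evaluate(stub_large, CASES, cost_per_call=0.02)
print("small:", report_small)
print("large:", report_large)
```

Re-running the same cases whenever a new model or guardrail is introduced gives you a like-for-like comparison that a public benchmark cannot.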

By understanding the implications of this new CAP theorem for AI and adopting a strategic and collaborative approach, enterprise IT teams can navigate the complexities of developing advanced AI systems that deliver value while managing TCO effectively.

Co-authored by Aparna Chugh, Head of Product at Atomicwork.

Originally published on Medium.
