How to take advantage of great language models without breaking the bank

Visit our on-demand library to see sessions for VB Transform 2023. Sign up here

Generative AI continues to make headlines. When it first emerged, we were all taken by the novelty. But now we're well past fun and games: we're seeing its real impact on business, and everyone is diving in head first.

Microsoft, AWS, and Google are waging an "AI arms race" in pursuit of dominance. Companies are rushing to pivot for fear of being left behind or missing out on a huge opportunity. New LLM-powered companies are emerging by the minute, fueled by VCs chasing their next bet.

But every new technology brings challenges. Model truthfulness, bias, and the cost of training are among the topics of the day. Identity and security, though tied to misuse of the models rather than to the technology itself, are also starting to make headlines.

The cost of operating models: a major threat to innovation

Generative AI is also reviving the old debate between open source and closed source. While both have their place in the enterprise, open source models offer lower deployment and release costs, along with greater accessibility and a wider selection. However, we now see an abundance of open source models but not enough technological advances to deploy them in a viable way.



All of this aside, there is one issue that still needs much more attention: the cost of running these large models in production (inference costs) poses a major threat to innovation. Generative models are exceptionally large, complex, and computationally intensive, making them much more expensive to run than other types of machine learning models.

Imagine you're creating a home decor app that helps customers visualize their room in different design styles. With a few tweaks, the Stable Diffusion model can do this relatively easily. You opt for a service that charges $1.50 per 1,000 images, which might not seem like much, but what if the app goes viral? Say you get 1 million daily active users who create ten images each. Your inference costs now come to about $5.5 million per year.
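The back-of-envelope math behind that scenario can be sketched in a few lines. The figures (price per 1,000 images, user count, images per user) come straight from the example above:

```python
# Back-of-envelope inference cost estimate for the hypothetical home decor
# app described above. All figures come from the article's scenario.

PRICE_PER_1000_IMAGES = 1.50     # hosted image-generation pricing, in dollars
DAILY_ACTIVE_USERS = 1_000_000
IMAGES_PER_USER_PER_DAY = 10

def annual_inference_cost(dau: int, images_per_user: int,
                          price_per_1000: float, days: int = 365) -> float:
    """Yearly cost of serving image generations at a flat per-image rate."""
    images_per_day = dau * images_per_user
    daily_cost = images_per_day / 1000 * price_per_1000
    return daily_cost * days

cost = annual_inference_cost(DAILY_ACTIVE_USERS, IMAGES_PER_USER_PER_DAY,
                             PRICE_PER_1000_IMAGES)
print(f"${cost:,.0f} per year")  # $5,475,000 per year
```

Ten million images a day at $0.0015 apiece is $15,000 per day, and the bill scales linearly with usage: virality is the worst case, not the best.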

LLM costs: Inference is forever

Now, if you’re a business deploying a generative model or an LLM as the backbone of your application, your entire pricing structure, growth plan, and business model must take these costs into account. By the time your AI application is launched, training is more or less a sunk cost, but inference is forever.
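To make "pricing must account for inference" concrete, here is a minimal sketch of the unit economics. The per-request cost, usage rate, and margin target below are illustrative assumptions, not figures from the article:

```python
# Hedged sketch: what a subscription must charge per user just to cover
# inference. All inputs are illustrative assumptions for this example.

def breakeven_monthly_price(cost_per_request: float,
                            requests_per_user_per_month: int,
                            target_gross_margin: float) -> float:
    """Monthly price per user that covers inference at the target margin."""
    monthly_inference_cost = cost_per_request * requests_per_user_per_month
    return monthly_inference_cost / (1 - target_gross_margin)

# Assumed: $0.002 per model call, 500 calls per user per month,
# and a 70% gross-margin target.
price = breakeven_monthly_price(0.002, 500, 0.70)
print(f"${price:.2f} per user per month")  # $3.33 per user per month
```

Unlike training, these costs recur with every request, which is why they belong in the pricing model from day one rather than being treated as an engineering detail.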

There are many examples of companies using these models, and it will become increasingly difficult for them to sustain these costs over the long term.

But while proprietary models have made great strides in a short time, they’re not the only option. Open source models also show great promise in terms of flexibility, performance and cost savings – and could be a viable option for many emerging companies in the future.

Hybrid World: Open Source and Proprietary Models Matter

There is no doubt that we have gone from zero to 60 in a short time with proprietary models. Over the past few months, we've seen OpenAI and Microsoft release GPT-4, Bing Chat, and endless plugins. Google has also stepped in with the introduction of Bard. Progress in the space has been nothing short of impressive.

However, contrary to popular belief, I don't believe generative AI is a winner-take-all game. In fact, these models, while innovative, only scratch the surface of what is possible. The most interesting innovation is yet to come, and much of it will be open source. Just as in the software world, we've reached a point where companies take a hybrid approach, using proprietary and open source models where each makes sense.

It's already clear that open source will play a major role in the proliferation of generative AI. There's Meta's new LLaMA 2, the latest and greatest. Then there's the original LLaMA, a powerful yet small model that can be retrained for a modest sum (around $80,000) and instruction-tuned for around $600. You can run such models almost anywhere, even on a MacBook Pro, a smartphone, or a Raspberry Pi.

Meanwhile, Cerebras has introduced a family of open models, and Databricks has released Dolly, an open-source ChatGPT-style model that is also flexible and inexpensive to train.

The cost and power of open source models

The reason we’re starting to see open source models take off is their flexibility; you can basically run them on any hardware with the right tooling. You don’t get this level of flexibility and control with closed proprietary models.

All of this has happened in a very short time, and it's just the beginning.

We learned great lessons from the open source software community. If we make AI models freely available, we can better promote innovation. We can foster a global community of developers, researchers, and innovators to contribute, improve, and customize models for the greater good.

If we can achieve this, developers will have the choice to run the model that meets their specific needs – whether open source, out-of-the-box, or custom. In this world, the possibilities are truly endless.

Luis Ceze is CEO of OctoML.


Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation.

If you want to read about cutting-edge ideas, up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers.

You might even consider writing your own article!

Learn more about DataDecisionMakers
