Join senior executives in San Francisco on July 11-12 to learn how leaders are integrating and optimizing AI investments for success.
Refuel AI, a company using large language models (LLMs) to generate high-quality training data for AI models, emerged from stealth today with $5.2 million in seed funding. The company said it will use this round to grow its team and expand its platform’s capabilities ahead of its commercial launch in July.
Founded by Stanford graduates Nihit Desai and Rishabh Bhargava, Refuel has also opened up access to AutoLabel, an open source library that allows any AI team to label data in its own environment and with any LLM of its choice.
The offerings address data challenges that are slowing the development of AI, preventing companies from integrating next-generation technology into their products and business functions.
Every AI business needs AI-ready data
Today, every business is striving to become an AI business, working with in-house experts and third-party vendors to develop models that can target different business-specific use cases. The task can be very difficult, but every AI project has the same starting point: clean, labeled data. When that foundation is in place, the project has a much better chance of getting off the ground.
Yet even though companies have a lot of data, not all of it is ready for training by default. Information must be cleaned and annotated before model training, a task typically handled by human teams that takes weeks or even months. That pace does not match the demands of AI today.
“A lot of teams [we spoke to] had all these amazing ideas for the models they wanted to train and the products they wanted to build – if only they had the data ready for training. That’s when we knew that delivering clean, labeled data at the speed of thought was what we wanted to focus on,” Bhargava told VentureBeat.
So, in 2021, the duo launched Refuel and went on to build a dedicated platform that uses specialized LLMs to automate the creation and labeling of datasets (with equal or better quality than humans) for every business and every use case.
According to the company, enterprise users will be able to use the platform by simply uploading their datasets and asking LLMs to label the data. They can also provide guidelines and a few examples to ensure that only high-quality, training-ready data comes out.
“Within an hour, they (users) will have enough data to start training their AI models, which they can then seamlessly connect to their model training infrastructure. As these teams collect more data (especially from production), they can redirect it to Refuel for labeling, performance measurement, and improving their datasets for model retraining,” the CEO added.
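For readers curious what "guidelines plus a few examples" looks like in practice, here is a minimal Python sketch of few-shot labeling with an LLM. It is an illustration under stated assumptions, not Refuel's platform or the AutoLabel API: the function names `build_prompt`, `label_dataset` and `fake_llm` are hypothetical, and `fake_llm` is a keyword-based stand-in for a real model call.

```python
from typing import Callable, Optional


def build_prompt(guidelines: str, examples: list[tuple[str, str]],
                 labels: list[str], text: str) -> str:
    """Assemble a few-shot prompt: guidelines, allowed labels, examples, then the input."""
    lines = [guidelines, "Allowed labels: " + ", ".join(labels), ""]
    for example_text, example_label in examples:
        lines += [f"Text: {example_text}", f"Label: {example_label}", ""]
    lines += [f"Text: {text}", "Label:"]
    return "\n".join(lines)


def label_dataset(rows: list[str], guidelines: str,
                  examples: list[tuple[str, str]], labels: list[str],
                  llm: Callable[[str], str]) -> list[tuple[str, Optional[str]]]:
    """Label each row with the supplied LLM callable (hypothetical interface)."""
    labeled = []
    for text in rows:
        answer = llm(build_prompt(guidelines, examples, labels, text)).strip()
        # Accept only answers from the allowed set; None marks a row for human review.
        labeled.append((text, answer if answer in labels else None))
    return labeled


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: applies a keyword rule to the final input text."""
    last_text = prompt.rsplit("Text: ", 1)[1]
    return "negative" if "refund" in last_text.lower() else "positive"


if __name__ == "__main__":
    rows = ["I want a refund", "Great service"]
    result = label_dataset(
        rows,
        guidelines="Classify the sentiment of each customer message.",
        examples=[("Love it", "positive")],
        labels=["positive", "negative"],
        llm=fake_llm,
    )
    for text, label in result:
        print(f"{label}\t{text}")
```

In a real setup, `fake_llm` would be replaced by a call to whichever model a team chooses, and rows that come back as `None` would be routed to a human reviewer.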
In private beta testing by select companies, the offering was found to speed up the data creation and labeling process by up to 100%. Bhargava did not share the names of these companies, but noted that Refuel AI is attracting interest from multiple verticals, from social media and fintech to healthcare, HR and e-commerce.
The road ahead
With this round, co-led by General Catalyst and XYZ Ventures, Refuel plans to grow its engineering team from six to 12 members and further invest in the platform and its LLM infrastructure to prepare for a commercial launch by the end of July. The company will also invest the capital in its open source library and community.
“As a concrete example, we are running a competition to push the boundaries of LLM-based data labeling, with prizes up to $10,000,” Bhargava noted.
Currently, in the field of data labeling, the company competes with players such as Tasq AI, Snorkel AI and SuperAnnotate.
VentureBeat’s mission is to be a digital public square for technical decision makers to learn about transformative enterprise technology and transact.