While not a particularly novel field, natural language processing (NLP) has gained significant attention recently, in large part due to the ChatGPT generative AI hype train.
There’s a tangible sense that AI’s entry into the mainstream is just around the corner thanks to other NLP models like Hugging Face’s Transformers and Google’s LaMDA, which is slated to power its ChatGPT-rival Bard.
But for those who type a few keywords into ChatGPT to make it generate lyrics in the vein of Nick Cave, it’s easy to overlook all the effort that goes into building the underlying AI models and getting them ready for mass consumption.
To build NLP models, developers need not only algorithms but also large volumes of high-quality training data that is accurately “labeled,” meaning the raw data has been categorized so that machines can understand and learn from it.
Several companies exist largely to power this labeling process. One of them is the German startup Kern AI, which has built a platform for NLP developers and data scientists to not only control the labeling process but also automate and orchestrate tangential tasks and address low-quality data that comes their way.
With NLP currently one of the hottest trends in AI, Kern AI today announced that it has raised €2.7 million in seed funding to accelerate its recent growth. The company’s commercial clients include the insurance companies Barmenia and VHV Versicherungen, the logistics company Evolution Time Critical (a Metro Supply Chain Group subsidiary), and venture-backed startups like Crowd.dev. The company also claims that data scientists from organizations like Samsung and DocuSign have used the open-source version of its product.
Seedcamp and Faber jointly led Kern AI’s seed round, which also included Xdeck, Another.vc, and a few angel investors.
Kern AI was established in Bonn in 2020. Co-founder and CEO Johannes Hötter said he recognized that developers need more control and flexibility over the NLP development process, and that he started the company “with the expectation that NLP will evolve into a key digitizing technology.”
The company’s flagship tool, Refinery, is open source and lets developers take a data-centric approach to building NLP models by semi-automating their labeling, identifying low-quality subsets of their training data, and monitoring all of their data in a single interface.
Moreover, Bricks, which is also open source, is a set of modular, standardized “code snippets” that developers can incorporate into Refinery; according to the firm, this is the “application logic driving your NLP automation.”
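The article does not describe how Refinery or Bricks work internally, but the general idea of semi-automated labeling can be sketched with a few simple heuristics that vote on a label. The following is a minimal illustration under that assumption, not Kern AI's method or API; the heuristic functions, the `weak_label` helper, and the label names are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional

# Each heuristic returns a label, or None to abstain. All names are hypothetical.
Heuristic = Callable[[str], Optional[str]]


def mentions_shipping(text: str) -> Optional[str]:
    return "shipment" if "ship" in text.lower() else None


def mentions_pallets(text: str) -> Optional[str]:
    return "shipment" if "pallet" in text.lower() else None


def mentions_invoice(text: str) -> Optional[str]:
    return "billing" if "invoice" in text.lower() else None


HEURISTICS: List[Heuristic] = [mentions_shipping, mentions_pallets, mentions_invoice]


def weak_label(text: str, heuristics: List[Heuristic]) -> Optional[str]:
    """Majority vote over heuristics; None means a human should label the text."""
    votes: Counter = Counter()
    for heuristic in heuristics:
        label = heuristic(text)
        if label is not None:
            votes[label] += 1
    if not votes:
        return None  # every heuristic abstained
    ranked = votes.most_common(2)
    # A tie between the top two labels is treated as "needs manual review".
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


print(weak_label("please ship 20 pallets to Gothenburg", HEURISTICS))
print(weak_label("where is my invoice?", HEURISTICS))
```

Texts on which the heuristics agree get a label automatically, while ties and abstentions are left for a person to review, which is what makes the process semi-automated rather than fully automated.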
Hötter stated that internal tooling for businesses is a common real-world use case for the Kern AI platform. For instance, a logistics provider would have to respond to a client’s request to “please ship 20 pallets to our facility in Gothenburg by tomorrow 4 pm”; such urgent demands must be resolved right away.
The logistics company might use Kern AI to automatically identify the intent and requirements of each request, so that incoming requests can be synced with its transport management system (TMS).
According to Hötter, who spoke with TechCrunch, “this is accomplished by syncing the service inbox with our commercial product process, which then feeds the data to Refinery.” Here, developers can examine the request using NLP techniques before pushing the structured extracted data right to their TMS.
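To make the idea of “structured extracted data” concrete, here is a minimal sketch of what the Gothenburg request might look like once parsed. It is not Kern AI's code: the `ShipmentRequest` type, the `extract_request` function, and the field names are hypothetical, and simple regular expressions stand in for the trained NLP model a real deployment would use.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShipmentRequest:
    """Hypothetical structured record that a TMS could ingest."""
    intent: str
    quantity: Optional[int]
    unit: Optional[str]
    destination: Optional[str]
    deadline: Optional[str]


def extract_request(text: str) -> ShipmentRequest:
    """Toy extractor: regexes stand in for a trained NLP model."""
    has_verb = re.search(r"\b(ship|send|deliver)\b", text, re.I)
    qty = re.search(r"\b(\d+)\s+(pallets?|boxes|containers?)\b", text, re.I)
    # Case-sensitive on purpose: a capitalized word after "in" is taken as a place.
    dest = re.search(r"\bin\s+([A-Z][\w-]+)", text)
    due = re.search(r"\bby\s+(.+?)\s*[.!?]?$", text, re.I)
    return ShipmentRequest(
        intent="ship" if has_verb else "unknown",
        quantity=int(qty.group(1)) if qty else None,
        unit=qty.group(2).lower() if qty else None,
        destination=dest.group(1) if dest else None,
        deadline=due.group(1) if due else None,
    )


request = extract_request(
    "please ship 20 pallets to our facility in Gothenburg by tomorrow 4 pm"
)
print(request)
```

The point of the sketch is the output shape: once free text has been reduced to typed fields like these, it can be handed to a downstream system without a person retyping it.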
In some respects, then, it functions similarly to Zapier, except that it relies on more sophisticated natural language processing rather than a rules-based approach.
In reality, there are already plenty of platforms like this, spanning both the proprietary and open-source worlds. They include Heartex’s Label Studio and Argilla, which recently closed funding rounds of $25 million and $1.6 million, respectively. Then there is Snorkel AI, a proprietary product whose maker has raised about $135 million in funding to date.
What precisely is Kern AI doing differently, then? According to Hötter, it is currently the only “open-core and modular full stack” available. By this, he means that its platform can be used either to build complete data-centric NLP apps or as a developer-focused add-on plugged into other labeling platforms like Label Studio.
This means that if you’re a startup looking to develop an advanced NLP product and require a great solution to produce the data, for instance, you can utilize Refinery as the application to easily manage and build your training data.
As an alternative, Hötter continued, “You may also use the algorithms of Refinery to set up real-time APIs and to orchestrate complete operations, covering the entire value chain. Our platform is modular because we want to give modern NLP breakthroughs to data teams independent of their present tech stack.”
With the new €2.7 million (roughly $2.9 million) in the bank, Hötter said Kern AI plans to expand the platform’s feature set to cover more workflows, such as those involving audio and document-based data, and to build products for a much wider range of industry use cases. Previously, Kern AI had raised only a small €550K pre-seed round. Hötter added that the company will speed up efforts to make the free, personal tier generally available, since it is presently accessible only by invitation.
Image Credit: Kern AI
News Source: TechCrunch