BeatpulseLabs, a London-based AI data infrastructure startup, has raised £1.3 million in a pre-seed round led by Araya Ventures and Lighthouse Ventures, with participation from Alumni Ventures and Avalancha Ventures. Founded by Jason Rieff and Nikolay Vitanov, the company turns expert human judgment into high-fidelity training datasets for multimodal AI models.
Most multimodal AI models are trained on poor-quality data, limiting their ability to perform reliably in real-world environments. BeatpulseLabs addresses this through two integrated core offerings: dataset preparation and dataset provision. The dataset preparation service transforms existing multimedia content libraries into enterprise-grade training datasets by cleaning, structuring, labelling, validating, enriching, and formatting raw speech, music, and video assets. The dataset provision service supplies ready-made and custom, rights-cleared datasets for companies that need high-quality training data without starting from their own archive. The platform combines exclusive, licensed datasets with human-in-the-loop annotation and deep metadata enrichment. The result is enterprise-grade, context-rich data built for model training, fine-tuning, and reinforcement learning, which shortens training time, helps improve model accuracy, and reduces hallucinations.
The announcement comes as BeatpulseLabs has recorded 10x revenue growth over the first half of 2026, reflecting strong enterprise demand for high-fidelity, custom AI training datasets. The round is described by the company as a strategic step rather than a capital necessity, with the funding providing additional capacity to expand into new domains.
The rise of multimodal AI systems in enterprise settings has created growing demand for training data that reflects the complexity of the real world. As companies build increasingly sophisticated models, the limiting factor is no longer access to raw training data but the ability to encode human judgment in the context of specific use cases. BeatpulseLabs is positioned to address this gap as a foundational data infrastructure layer.
Enterprise AI doesn't fail in testing. It fails when it meets the real world. BeatpulseLabs closes that gap by building training data around how each business actually operates. We proved this approach in some of the most demanding multimodal domains such as music, video and speech. The same logic applies anywhere the margin for error is low, from robotics to knowledge work. Using generic training data is like letting a confident stranger make decisions for your business. We do not recommend it.
AI models are only as capable as the data they are trained on. Today, too much training data is generic, messy, and shallowly labelled, chosen because it's easy to access rather than being fit for purpose. We're building the missing data layer: transforming raw multimedia content into structured, annotated, model-ready datasets that help AI systems understand context, not just patterns. The old approach of throwing broad labels onto available content is no longer enough for the next generation of AI.
BeatpulseLabs is tackling one of the most fundamental bottlenecks in enterprise AI today: creating datasets that go beyond scale and general-purpose labelling by embedding subject matter expertise, product-specific workflows, and high-fidelity human judgment directly into the data that powers enterprise AI models.








