DMR News

Advancing Digital Conversations

Open Source LLMs Set to Strengthen Europe’s Digital Sovereignty

By Hilary Ong

Feb 20, 2025


The OpenEuroLLM project aims to reshape artificial intelligence in Europe by developing a series of foundation models built for transparency. Jan Hajič, a computational linguist at Charles University in Prague, and Peter Sarlin, CEO and co-founder of Silo AI, lead the venture. With a budget of approximately €37.4 million, the project unites 20 organizations, including academic institutions and companies. Working with EuroHPC supercomputer centers in Spain, Italy, Finland, and the Netherlands, OpenEuroLLM aims to create open-source models that companies across Europe can use to build AI applications.

The project’s goal is to strengthen Europe’s digital sovereignty by ensuring that mission-critical infrastructure and tools remain close to home. OpenEuroLLM’s scope covers the 24 official EU languages as well as the languages of countries negotiating accession to the EU. The models will draw on data from Common Crawl, an open repository of web-crawled data, along with additional datasets, and will be trained on 4.5 petabytes of web crawls and over 20 billion documents. Training will run on high-performance computing infrastructure, with the aim of keeping the models transparent and explainable.

Collaboration for Innovation and Linguistic Diversity

OpenEuroLLM builds on the foundation laid by the HPLT project, which has already developed datasets and models for European languages. The project’s budget primarily covers personnel costs, while compute is expected to be provided through partnerships with EuroHPC centers. Rather than trying to outpace Big Tech or billion-dollar AI startups, OpenEuroLLM seeks to offer a genuinely open-source alternative for Europe. The team hopes to extend that openness to the training data, especially the data sourced from Common Crawl.

“We hope that most of the data [will be open], especially the data coming from the Common Crawl,” – Jan Hajič

Jan Hajič emphasizes the importance of transparency and of complying with Europe’s AI regulations. While the project’s ultimate goal is to keep everything open, regulatory constraints may force some adjustments.

“We would like to have it all completely open, but we will see. In any case, we will have to comply with AI regulations.” – Jan Hajič

The focus remains on quality over quantity: any models released should be high-quality and well-developed. Because substantial public funds from the European Commission are involved, the stakes for delivering robust results are high.

“We want to have it as small but as high-quality as possible. We don’t want to release something which is half-baked, because from the European point-of-view this is high-stakes, with lots of money coming from the European Commission — public money.” – Jan Hajič

OpenEuroLLM also tackles the challenge of Europe’s linguistic diversity. The project aims to establish true benchmarks for languages with scarce digital resources, so that AI models better represent the cultures behind those languages.

“That is the goal, but how successful we can be with languages with scarce digital resources is the question,” – Jan Hajič

“But that’s also why we want to have true benchmarks for these languages, and not to be swayed toward benchmarks which are perhaps not representative of the languages and the culture behind them.” – Jan Hajič

Collaborative Approach and Long-Term Goals

The OpenEuroLLM project draws inspiration from recent AI successes in Europe, notably those of small, focused teams such as Mistral AI and LightOn, a point made by Anastasia Stasenko, co-founder of the French AI startup Pleias.

“Europe’s recent successes in AI shine through small focused teams like Mistral AI and LightOn — companies that truly own what they’re building,” – Stasenko

Stasenko highlights how these companies are directly accountable for their financial decisions, market positioning, and reputation.

“They carry immediate responsibility for their choices, whether in finances, market positioning, or reputation.” – Stasenko

Collaboration remains a cornerstone of the OpenEuroLLM approach. Andre Martins advocates open collaboration among diverse communities to avoid duplicated effort and make the most of shared expertise.

“I hope the different communities collaborate openly, share their expertise, and don’t decide to reinvent the wheel every time a new project gets funded,” – Andre Martins

Jan Hajič echoes this sentiment, emphasizing the advantages of collaborative efforts over those driven by single entities.

“I’ve been involved in many collaborative projects, and I believe it has its advantages versus a single company,” – Jan Hajič

Hajič hopes that combining academic expertise with corporate focus could bring something new, alongside the achievements of major players such as OpenAI and Mistral.

“Of course they’ve done great things at the likes of OpenAI to Mistral, but I hope that the combination of academic expertise and the companies’ focus could bring something new.” – Jan Hajič

Ultimately, success for OpenEuroLLM does not hinge on becoming the leading model; it lies in delivering a solid model whose components are all based in Europe.

“I hope this won’t be the case, but if, in the end, we are not the number one model, and we have a ‘good’ model, then we will still have a model with all the components based in Europe,” – Jan Hajič

What The Author Thinks

The OpenEuroLLM initiative represents an essential step toward Europe’s autonomy in AI. The challenges, particularly linguistic diversity and model quality, are considerable, but the project’s emphasis on collaboration, transparency, and regulatory compliance shows a forward-thinking approach. Success will be measured not by surpassing tech giants but by fostering sustainable, open-source AI that serves Europe’s unique needs.


Featured image credit: Pickpic


Hilary Ong

Hello, from one tech geek to another. Not your beloved TechCrunch writer, but a writer with an avid interest in the fast-paced tech scene and all the latest tech mojo. I bring a unique take on tech, shaped by an applied psychology perspective, to make tech news digestible. In other words, I deliver tech news that is easy to read.
