TildeOpen LLM: A New Era of European Language AI with Over 30 Billion Parameters
In the rapidly evolving landscape of artificial intelligence, a new contender has emerged, poised to redefine the capabilities and accessibility of Large Language Models (LLMs) for a significant portion of the global population. Tilde AI, a leading Latvian language technology firm, has officially launched TildeOpen LLM, an open-source foundational large language model with an impressive 30 billion parameters. This release is not merely an incremental update; it represents a strategic leap forward, particularly for European languages, aiming to bridge the linguistic divide often present in current AI technologies and bolster digital sovereignty within the European Union.
Addressing the European Language Gap
The current AI ecosystem, dominated by models trained predominantly on English, often struggles to deliver comparable performance for the diverse array of European languages. This linguistic bias leads to noticeable deficiencies, including awkward sentence structures, grammatical inaccuracies, and the misapplication of terms, especially when dealing with more complex or specialized tasks. TildeOpen LLM was developed with the explicit goal of overcoming these limitations. By focusing on languages frequently underrepresented in mainstream LLMs—such as those of the Baltic countries, Ukrainian, and Turkish—TildeOpen LLM promises a more accurate, nuanced, and culturally aware AI experience for speakers of these languages. Artūrs Vasiļevskis, CEO of Tilde, highlighted this crucial distinction, explaining that while popular commercial models may excel in English, TildeOpen was meticulously tailored to ensure robust performance across a wider spectrum of European tongues.
Security, Sovereignty, and EU Compliance
Beyond linguistic accuracy, TildeOpen LLM places a strong emphasis on data security and privacy, aligning with the European Union's stringent regulatory framework. A key feature is the model's capability to be hosted on a local server or within secure cloud storage. This self-hosting option provides organizations with direct control over their data, ensuring that sensitive information remains within their premises or a trusted, EU-compliant environment. This is a critical differentiator from many global commercial models that are typically hosted in data centers outside the EU, potentially posing challenges for compliance with regulations like the GDPR and the upcoming AI Act. Tilde's commitment to European data protection standards is a cornerstone of TildeOpen LLM's design, offering a trustworthy AI solution for businesses and public institutions operating within the EU.
A Foundation Built on European Supercomputing Power
The development of TildeOpen LLM was significantly enabled by access to cutting-edge European supercomputing resources. As a recipient of the European Commission's "Large AI Grand Challenge," Tilde was awarded substantial computational power, including approximately 2 million GPU hours on the LUMI supercomputer in Finland and the JUPITER supercomputer in Germany. This access to high-performance computing was instrumental in training a model of TildeOpen LLM's scale and complexity. The training process itself involved sophisticated techniques, utilizing EleutherAI-inspired GPT-NeoX scripts over roughly 450,000 update steps and consuming approximately 2 trillion training tokens. The training regimen incorporated a three-stage sampling strategy designed to ensure equitable representation across languages, balancing uniform distribution with natural language distribution to boost performance for high-data-volume languages while also rebalancing rarer language examples.
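Tilde has not published the exact weights behind this schedule, but the idea can be illustrated with a minimal sketch. The stage logic and the corpus sizes below are assumptions for illustration, not Tilde's actual recipe:

```python
def stage_weights(token_counts, stage):
    """Per-language sampling probabilities for one stage of the schedule.

    Stage 1: uniform across languages (every language seen equally).
    Stage 2: natural distribution (favors high-data-volume languages).
    Stage 3: uniform again (rebalances rarer languages at the end).
    """
    langs = list(token_counts)
    if stage in (1, 3):  # the two uniform passes
        return {lang: 1.0 / len(langs) for lang in langs}
    total = sum(token_counts.values())  # natural-distribution pass
    return {lang: n / total for lang, n in token_counts.items()}

# Hypothetical corpus sizes in tokens (not real figures).
counts = {"en": 800_000_000, "lv": 40_000_000, "is": 10_000_000}

uniform = stage_weights(counts, 1)   # every language gets 1/3
natural = stage_weights(counts, 2)   # English dominates, Icelandic is rare
```

The point of bracketing the natural-distribution pass with two uniform passes is that high-resource languages get enough exposure to reach strong absolute quality, while low-resource languages are not drowned out over the full training run.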
Open Source: Fostering Collaboration and Innovation
A defining characteristic of TildeOpen LLM is its open-source nature. Released under a permissive CC-BY-4.0 license, the model is freely accessible to a wide range of users, including national authorities, companies, scientists, students, and industry sectors such as healthcare, finance, and insurance. This open approach is intended to democratize access to advanced AI technology, enabling developers to fine-tune the base model for specific tasks. For instance, organizations can develop custom AI assistants proficient in European languages or build specialized translation models. The open-source availability is expected to foster a vibrant ecosystem of innovation, encouraging community-driven development and the creation of tailored AI solutions across Europe.
Technical Architecture and Performance
TildeOpen LLM is built as a 30-billion-parameter dense decoder-only transformer. Its architecture features 60 transformer layers, an embedding size of 6,144, and 48 attention heads, with a context window capable of handling 8,192 tokens. The model employs SwiGLU activation functions, RoPE positional encoding, and RMSNorm layer normalization, design choices that favor efficient handling of long contexts and robust multilingual inference. The performance of TildeOpen LLM has been rigorously evaluated on various benchmarks. Notably, it sets a new state-of-the-art on the Belebele reading comprehension benchmark, achieving an average accuracy of 84.7%, outperforming leading models like Gemma-27B, ALIA-40B, and EuroLLM-22B. Its superiority is particularly evident in languages often underserved by global models, such as Icelandic and Finnish, where it demonstrates significantly higher accuracy. Furthermore, TildeOpen LLM exhibits remarkable efficiency gains in morphologically rich languages like Latvian and Lithuanian compared to models such as LLaMA-3, GPT-4o, and Mistral, making it a faster, more accurate, and sustainable alternative for European languages.
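Two of the named architectural choices, RMSNorm and SwiGLU, are compact enough to sketch. The NumPy example below uses toy dimensions so it runs instantly; TildeOpen's real embedding size is 6,144, and its feed-forward width is not published, so nothing here is to scale:

```python
import numpy as np

# Toy dimensions for illustration only (the real model uses d_model = 6144).
D_MODEL, D_FF = 8, 16

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: rescale by the root-mean-square of the activations.
    # Unlike LayerNorm, there is no mean-centering, which is cheaper.
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward block: a SiLU-gated linear unit, then a
    # down-projection back to the model dimension.
    silu = lambda z: z / (1.0 + np.exp(-z))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
x = rng.standard_normal((2, D_MODEL))          # a toy batch of 2 token vectors
g = rms_norm(x, np.ones(D_MODEL))              # normalized activations
y = swiglu_ffn(g,
               rng.standard_normal((D_MODEL, D_FF)),
               rng.standard_normal((D_MODEL, D_FF)),
               rng.standard_normal((D_FF, D_MODEL)))
```

In the full model, 60 such layers (each pairing attention with a SwiGLU feed-forward block, both preceded by RMSNorm) are stacked over the 6,144-dimensional embedding.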
A Strategic Vision for European AI
The release of TildeOpen LLM is more than just a technological achievement; it represents a strategic vision for Europe's role in the global AI landscape. By developing its own world-class foundational models, Europe aims to reduce its dependence on AI solutions developed elsewhere, thereby enhancing its technological independence and fostering its own AI infrastructure. Tilde's CEO, Artūrs Vasiļevskis, emphasized this point, stating that "For Europe to be truly sovereign in AI, we must move beyond dependence on English-centric models built elsewhere." Tilde is already working on extending TildeOpen's context length and developing instruction-tuned versions for specialized European tasks, such as legal translation and e-government services, further solidifying its commitment to building a robust and inclusive AI ecosystem for Europe.
The Path Forward
TildeOpen LLM is positioned as a foundational model, serving as a base for future specialized AI solutions. Its open-source nature, coupled with its strong performance and focus on European languages, makes it a compelling option for researchers, developers, and organizations looking to leverage the power of AI while respecting linguistic diversity and data privacy. The model is available for download and use, inviting the global community to explore its capabilities and contribute to its ongoing development.
AI Summary
Tilde AI, a prominent Latvian language technology firm, has introduced TildeOpen LLM, a significant advancement in the field of artificial intelligence. This open-source foundational large language model (LLM) has 30 billion parameters and is meticulously engineered to cater to the diverse linguistic landscape of Europe, with a particular emphasis on under-represented national and regional languages. The release signifies a strategic stride towards achieving linguistic equity and bolstering digital sovereignty across the European Union.

TildeOpen LLM was publicly released on September 3, 2025, and is available free of charge via the Hugging Face platform. Architecturally, it is a 30-billion-parameter dense decoder-only transformer, released under a permissive CC-BY-4.0 license, offering broad language support that spans from Latvian and Lithuanian to Ukrainian, Turkish, and numerous other European languages.

The training of this extensive model was conducted on European supercomputers, specifically LUMI in Finland and JUPITER, utilizing approximately 2 million GPU hours awarded through the European Commission's Large AI Grand Challenge. This substantial computational power enabled Tilde to undertake an extended training schedule and experiment with sophisticated language sampling strategies aimed at ensuring balanced model performance across languages. The training process itself employed EleutherAI-inspired GPT-NeoX scripts, involving around 450,000 update steps and consuming approximately 2 trillion training tokens. The sampling methodology was a three-stage regimen, beginning with a uniform pass across all languages, followed by a natural distribution phase to enhance exposure for high-data-volume languages, and concluding with a final uniform sweep to rebalance the representation of rarer languages.
Key technical specifications of the model include 60 transformer layers, an embedding dimension of 6,144, and 48 attention heads, with a context window supporting 8,192 tokens. Architectural choices such as SwiGLU activation functions, RoPE positional encoding, and RMSNorm layer normalization were made to optimize for long-context handling and efficient multilingual inference.

A critical issue addressed by TildeOpen LLM is the prevalent bias in mainstream large language models, which are predominantly trained on English and other major languages. This English-centric approach often results in performance degradation when these models are applied to Baltic, Slavic, or other smaller European languages, manifesting as grammatical errors, awkward phrasing, and an increased propensity for hallucinations. TildeOpen LLM tackles this by incorporating an "equitable tokenizer." This specialized tokenizer is engineered to represent text in a consistent manner across different languages, thereby reducing token counts, particularly for morphologically rich languages, and significantly improving inference efficiency for under-represented languages.
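A common way to quantify the efficiency gains described above is tokenizer "fertility": the average number of tokens produced per whitespace-delimited word, where lower is more efficient. The sketch below computes this metric; the fixed-chunk tokenizer is only a stand-in, not a reproduction of TildeOpen's actual tokenizer:

```python
def fertility(tokenize, text):
    # Average tokens per whitespace-delimited word; lower means the
    # tokenizer represents this language more compactly.
    words = text.split()
    return len(tokenize(text)) / len(words)

def toy_subword_tokenizer(text, chunk=3):
    # Stand-in tokenizer: split each word into fixed-size character chunks.
    # A real subword tokenizer (e.g. BPE) learns its vocabulary from data.
    tokens = []
    for word in text.split():
        tokens.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return tokens

# A Latvian sample sentence; morphologically rich languages tend to have
# longer word forms, which drives fertility up under English-centric vocabularies.
sample = "daudzvalodu modeļi saprot latviešu valodu"
rate = fertility(toy_subword_tokenizer, sample)
```

Comparing this metric for the same tokenizer across languages shows why an equitable vocabulary matters: every extra token per word translates directly into slower, more expensive inference for that language.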