Enterprise AI: Beyond the Model

The Illusion of Fine-Tuning

While vast resources are dedicated to training Large Language Models (LLMs) annually, a significant obstacle persists: the effective integration of these models into practical and valuable applications. Fine-tuning and Retrieval Augmented Generation (RAG) are commonly regarded as well-established methods for enhancing the knowledge and capabilities of pre-trained AI models. However, Aleph Alpha CEO Jonas Andrulis emphasizes that the reality is more intricate.

“A year ago, there was a widespread belief that fine-tuning was a magic solution. If an AI system didn’t perform as desired, the answer was simply fine-tuning. It’s not that simple,” he explained.

While fine-tuning can modify a model’s style or behavior, it is not an effective way to impart new information. Fine-tuning adjusts the model’s parameters against a specific dataset, shifting its output style and biases, but it struggles to introduce entirely new facts or concepts: the model learns to generate text that matches the fine-tuning data without necessarily acquiring any deeper grasp of the underlying information. This limitation becomes apparent whenever an application needs knowledge beyond the model’s pre-training or fine-tuning data.
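To make the distinction concrete, here is a minimal sketch of what parameter-level fine-tuning typically looks like, assuming the open source transformers and peft libraries with GPT-2 as a stand-in model; the two-line “style” corpus is invented for illustration:

```python
# Minimal LoRA fine-tuning sketch: nudges a small set of adapter weights,
# which shifts style/behavior but is a poor vehicle for new facts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Attach low-rank adapters to GPT-2's attention projection layers.
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM"),
)

# Toy "style" corpus (invented): the model learns this tone, not new knowledge.
corpus = ["Kind regards, the support team.", "Thank you for contacting us."]
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

model.train()
for text in corpus:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # next-token loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Nothing in this loop gives the model a way to verify or retrieve the facts behind the text; it only pulls the output distribution toward the samples.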

The underlying problem is that LLMs are essentially pattern-recognition machines: they excel at identifying and reproducing patterns in their training data. Fine-tuning reinforces those patterns, but it does not fundamentally change the model’s architecture or its ability to reason about new information. Relying on fine-tuning alone for knowledge enhancement is therefore often insufficient, particularly in enterprises whose applications must draw on a vast and constantly evolving knowledge base.

RAG: An Alternative Approach

RAG offers an alternative by functioning like a librarian that retrieves information from an external archive. This approach allows for updates and changes to the information within the database without retraining or fine-tuning the model. Additionally, the generated results can be cited and audited for accuracy.

“Specific knowledge should always be documented and not stored within the LLM’s parameters,” Andrulis emphasized.

RAG first retrieves relevant documents or passages from a knowledge base using the user’s query, then feeds those passages to the LLM alongside the query itself, so the generated response draws on both. This lets the model use information it was never explicitly trained on, effectively expanding its knowledge base.
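A bare-bones version of that pipeline fits in a few lines. The sketch below uses the sentence-transformers library for embeddings; the two-document knowledge base and the prompt template are invented for illustration, and the assembled prompt would be handed to whatever LLM the application uses:

```python
# Minimal RAG sketch: embed the knowledge base, retrieve the closest passages
# for a query, and splice them into the prompt sent to the LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

knowledge_base = [
    "Expense reports must be filed within 30 days of purchase.",
    "Remote work requires written manager approval.",
]
doc_vectors = embedder.encode(knowledge_base, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # cosine similarity on unit vectors
    return [knowledge_base[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How long do I have to file an expense report?"))
```

Because the retrieved passages travel with the prompt, the answer can cite its sources, which is what makes RAG output auditable.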

The key advantage of RAG is that the model’s knowledge stays current without constant retraining: when new information becomes available, it is simply added to the knowledge base, where the model can reach it through the retrieval step (see the snippet below). This matters especially in enterprise settings, where information changes constantly.
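Continuing the sketch above, “updating the model’s knowledge” reduces to appending a document and re-encoding the store; the invented policy line stands in for any new fact, and no model weights change:

```python
# Knowledge update in RAG: append and re-embed. The model itself is untouched.
knowledge_base.append("As of this quarter, expense reports require itemized receipts.")
doc_vectors = embedder.encode(knowledge_base, normalize_embeddings=True)
```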

RAG has challenges of its own, however. Its success hinges on the quality and organization of the knowledge base: if that base is poorly documented or contains inaccurate information, the model will likely generate inaccurate or unhelpful responses. The retrieval step itself can also be complex, requiring careful engineering to surface the most relevant documents for a given query.

RAG also presupposes that key processes, procedures, and institutional knowledge are documented in a format the model can consume. Often they are not: many organizations struggle to maintain comprehensive, up-to-date documentation, making RAG difficult to implement effectively.

Even when documentation exists, enterprises can run into trouble if the documents or processes rely on out-of-distribution data, meaning data that differs significantly from what the base model was trained on. A model trained solely on English datasets, for instance, will struggle with German documentation, especially if it contains scientific formulas, and in some cases may be unable to interpret the data at all.

Andrulis therefore suggests that meaningful results typically require a combination of the two approaches, each covering the other’s weaknesses: fine-tuning adapts the model to a specific domain or task, while RAG supplies access to a broader, constantly evolving knowledge base. Combined, they let organizations build AI applications that are both knowledgeable and adaptable.

Bridging the Divide

Aleph Alpha aims to distinguish itself as a European DeepMind by tackling the challenges that prevent enterprises and nations from developing their own sovereign AIs.

Sovereign AI refers to models trained or fine-tuned on a nation’s internal datasets, using hardware built or deployed within its borders. The concept is gaining traction as organizations and governments grow more concerned about data privacy, security, and control: by training and deploying models on their own infrastructure, they retain authority over their data and avoid subjecting it to foreign laws or regulations.

“We strive to be the operating system, the foundation for enterprises and governments to build their own sovereign AI strategy,” Andrulis stated. “We aim to innovate where necessary, while also leveraging open source and state-of-the-art technologies where possible.” Aleph Alpha’s approach is to provide the underlying infrastructure and tooling organizations need to build and deploy their own models, rather than a managed service or cloud-hosted product, leaving customers in full control of their data and AI infrastructure.

While this occasionally involves training models, such as Aleph Alpha’s Pharia-1-LLM, Andrulis stresses that the company is not trying to replicate existing models like Llama or DeepSeek. Its strategy is to target niche areas where it can add unique value, rather than compete head-on with the large labs training general-purpose models.

“I always direct our research to focus on meaningfully different things, not just copying what everyone else is doing, because that already exists,” Andrulis said. “We don’t need to build another Llama or DeepSeek because they already exist.”

Instead, Aleph Alpha concentrates on building frameworks that simplify and streamline the adoption of these technologies. A recent example is its tokenizer-free, or “T-Free,” training architecture, which aims to make it more efficient to fine-tune models on out-of-distribution data.

Traditional tokenizer-based approaches often require large quantities of out-of-distribution data to fine-tune a model effectively, which is computationally expensive and assumes sufficient data even exists. Tokenizers break text into smaller units, such as words or subwords, that serve as the model’s input; when incoming text differs substantially from the training data, the tokenizer can fragment it into units that carry little meaning, and performance suffers.
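The effect is easy to observe with any off-the-shelf subword tokenizer; GPT-2’s English-centric tokenizer serves as a stand-in here:

```python
# An English-trained subword tokenizer fragments out-of-distribution text
# into many near-meaningless pieces, inflating sequence length.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("hello world"))            # a couple of familiar tokens
print(tok.tokenize("Donaudampfschifffahrt"))  # German compound -> many shards
```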

Aleph Alpha’s T-Free architecture sidesteps the problem by eliminating the tokenizer and letting the model process raw text directly, which makes it more robust to out-of-distribution data. Early testing on its Pharia LLM in Finnish showed a 70 percent reduction in training cost and carbon footprint compared with tokenizer-based approaches, making fine-tuning more accessible and sustainable.
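The published T-Free work represents words through hashed character trigrams rather than a learned subword vocabulary. The sketch below is a loose illustration of that idea, not Aleph Alpha’s implementation; the bucket count is arbitrary:

```python
# Loose sketch of tokenizer-free input: hash each word's character trigrams
# into embedding buckets. Any string in any language maps to indices without
# a learned vocabulary, so unseen words degrade gracefully.
import zlib

def trigram_ids(word: str, buckets: int = 32768) -> list[int]:
    padded = f" {word} "  # pad so boundary trigrams capture word edges
    grams = (padded[i : i + 3] for i in range(len(padded) - 2))
    return [zlib.crc32(g.encode("utf-8")) % buckets for g in grams]

print(trigram_ids("Donaudampfschifffahrt"))
```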

Aleph Alpha has also developed tools to address gaps in documented knowledge that can lead to inaccurate or unhelpful conclusions.

For instance, if two contracts relevant to a compliance question contradict each other, “the system can approach the human and say, ‘I found a discrepancy… can you please provide feedback on whether this is an actual conflict?’” Andrulis explained.

The information gathered through this framework, called Pharia Catch, can be fed back into the application’s knowledge base or used to fine-tune more effective models. By incorporating human expertise this way, the system learns to identify and resolve ambiguities in its data, becoming more accurate and reliable over time.
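The article does not describe Pharia Catch’s interface, but the control flow Andrulis describes can be sketched in a few lines; every name below is hypothetical:

```python
# Hypothetical human-in-the-loop discrepancy check (names invented; this is
# not Pharia Catch's actual API). Confirmed conflicts become feedback data.
from dataclasses import dataclass

@dataclass
class Discrepancy:
    source_a: str
    source_b: str
    summary: str

def ask_human(d: Discrepancy) -> bool:
    print(f"I found a discrepancy between {d.source_a} and {d.source_b}:")
    print(f"  {d.summary}")
    return input("Is this an actual conflict? [y/n] ").strip().lower() == "y"

feedback_log: list[tuple[Discrepancy, bool]] = []

d = Discrepancy("contract_2022.pdf", "contract_2024.pdf",
                "Clause 7 sets a 30-day notice period; clause 3 says 60 days.")
feedback_log.append((d, ask_human(d)))  # later: fold into KB or fine-tuning set
```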

According to Andrulis, these tools have attracted partners including PwC, Deloitte, Capgemini, and Supra, which work with end customers to implement Aleph Alpha’s technology. That major consultancies are building on the stack suggests the tools are holding up in real-world deployments.

The Hardware Factor

Software and data are not the only challenges facing sovereign AI adopters; hardware is another critical consideration.

Some enterprises and nations require workloads to run on domestically developed hardware, while others simply dictate where workloads may run. Either way, these constraints shape the choice of hardware and infrastructure, which in turn affects the performance, cost, and security of the resulting applications.

That means Andrulis and his team must support a wide range of hardware options, and Aleph Alpha has attracted an eclectic group of partners to match, including AMD, Graphcore, and Cerebras.

Last month, Aleph Alpha announced a partnership with AMD to use its MI300-series accelerators for AI training and inference.

Andrulis also highlighted collaborations with Graphcore, now owned by SoftBank, and Cerebras, whose CS-3 wafer-scale accelerators are used to train AI models for the German armed forces.

Despite these collaborations, Andrulis insists Aleph Alpha has no ambition to become a managed service or cloud provider. “We will never become a cloud provider,” he stated. “I want my customers to be free and without being locked in.” The company’s focus remains on providing the tools and infrastructure organizations need to build and run their own AI, not a platform they cannot leave.

The Road Ahead: Increasing Complexity

Looking ahead, Andrulis anticipates that building AI applications will become more complex as the industry shifts from chatbots to agentic AI systems capable of more sophisticated problem-solving.

Agentic AI has drawn significant attention over the past year, with model builders, software developers, and hardware vendors promising systems that can complete multi-step processes asynchronously. Early examples include OpenAI’s Operator and Anthropic’s computer-use API. Such systems are designed to be more autonomous and proactive than conventional chatbots, carrying out complex tasks without constant human intervention.

“Last year, we primarily focused on straightforward tasks like document summarization or writing assistance,” he said. “Now, it’s becoming more exciting with things that, at first glance, don’t even appear to be genAI problems, where the user experience is not a chatbot.” This shift toward more complex, integrated applications presents new challenges and opportunities for the industry, and will demand correspondingly more sophisticated tools and techniques to build and deploy.

Key Challenges in Building Enterprise AI Applications:

  • Bridging the gap between model training and application integration: Effectively translating the capabilities of LLMs into practical applications remains a significant hurdle.
  • Overcoming the limitations of fine-tuning: Fine-tuning alone is often insufficient for teaching AI models new information or adapting them to specific tasks.
  • Ensuring the quality and accessibility of data: RAG relies on well-documented and readily accessible data, which is often lacking in many organizations.
  • Handling out-of-distribution data: AI models must be able to handle data that differs from the data they were trained on, which requires specialized techniques.
  • Addressing hardware constraints: Different enterprises and nations have varying hardware requirements that must be taken into consideration.
  • Maintaining data privacy and security: Sovereign AI requires ensuring that data is processed and stored securely within a nation’s borders.
  • Developing agentic AI systems: Building AI applications that can perform complex multi-step processes asynchronously is a challenging but promising area of research.

Key Opportunities in Building Enterprise AI Applications:

  • Developing innovative AI solutions: The challenges in building enterprise AI applications create opportunities for developing innovative solutions that address specific needs.
  • Leveraging open source technologies: Open source technologies can help reduce costs and accelerate the development of AI applications.
  • Collaborating with hardware partners: Collaborating with hardware partners can help ensure that AI applications are optimized for specific hardware platforms.
  • Building sovereign AI capabilities: Sovereign AI can provide nations and organizations with greater control over their data and AI infrastructure.
  • Transforming industries with AI: AI has the potential to transform industries by automating tasks, improving decision-making, and creating new products and services.

The Future of Enterprise AI Applications:

The future of enterprise AI applications is likely to be characterized by:

  • Increased complexity: AI applications will become more complex and integrated, requiring specialized expertise and tools.
  • Greater focus on data quality: Data quality will become increasingly important as AI applications rely on accurate and reliable data.
  • More emphasis on security and privacy: Security and privacy will be paramount as AI applications handle sensitive data.
  • Wider adoption of agentic AI: Agentic AI systems will become more prevalent as organizations seek to automate complex tasks.
  • Continued innovation: The field of AI will continue to evolve rapidly, leading to new breakthroughs and opportunities.

By addressing the challenges and embracing the opportunities, organizations can harness the power of AI to transform their businesses and create a better future.