NeuReality, a pioneer in reimagining AI inference architecture to meet the demands of today’s AI models and workloads, announced that its NR1 inference appliance now comes pre-loaded with popular enterprise AI models, including Llama, Mistral, Qwen, and Granite¹, with support for private Generative AI clouds and on-prem clusters. This generative and agentic AI-ready appliance is up and running in less than 30 minutes, a 3X improvement in speed to value that lets customers innovate faster. Current proofs of concept (PoCs) demonstrate up to a 6.5X increase in token throughput at the same cost and power envelope compared to x86 CPU-based inference servers, making AI affordable and usable for businesses and governments of all sizes.
Inside the appliance, the NR1® chip is the first true AI-CPU, built for inference orchestration – the management of data, tasks, and integrations – with built-in software, services, and APIs. It consolidates traditional CPU and NIC architectures into a single device while packing in 6X the processing power, keeping pace with rapidly evolving GPUs and eliminating traditional CPU bottlenecks.
Paired with any GPU or AI accelerator inside the appliance, the NR1 chip delivers breakthrough cost, energy, and real-estate efficiencies, which are critical for widespread enterprise AI adoption. For example, running the same Llama 3.3-70B model on the same GPU or AI accelerator setup, NeuReality’s AI-CPU-driven appliance achieves a lower total cost per million AI tokens than x86 CPU-based servers.
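To make the economics concrete, the back-of-the-envelope calculation below shows how cost per million tokens falls as token throughput rises at a fixed cost envelope. The dollar and throughput figures are illustrative assumptions, not published benchmarks; only the 6.5X throughput ratio comes from the PoC results cited above.

```python
# Illustrative cost-per-million-tokens arithmetic. All dollar and throughput
# figures are assumptions chosen for demonstration, not vendor benchmarks;
# only the 6.5x throughput ratio comes from the PoC claim in the text.

def cost_per_million_tokens(server_cost_per_hour: float,
                            tokens_per_second: float) -> float:
    """Amortized dollar cost to generate one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return server_cost_per_hour / (tokens_per_hour / 1_000_000)

# Same accelerator and cost envelope, two host architectures (hypothetical):
x86_host = cost_per_million_tokens(server_cost_per_hour=12.0,
                                   tokens_per_second=900)
ai_cpu_host = cost_per_million_tokens(server_cost_per_hour=12.0,
                                      tokens_per_second=900 * 6.5)

print(f"x86 host:    ${x86_host:.2f} per 1M tokens")     # ~$3.70
print(f"AI-CPU host: ${ai_cpu_host:.2f} per 1M tokens")  # ~$0.57
```

At the same hourly cost, a 6.5X throughput gain translates directly into a 6.5X lower cost per token.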
“No one questions the immense potential of AI. The challenge is making AI inference deployments economically viable,” said Moshe Tanach, CEO and co-founder of NeuReality. “NeuReality’s disruptive AI-CPU technology removes the bottlenecks, enabling us to deliver the additional performance needed to unleash the full power of GPUs, while orchestrating AI queries and tokens, thus maximizing the performance and ROI of these expensive AI systems.”
Tanach continued, “Now, we are taking ease of use to the next level with an integrated silicon-to-software AI Inference appliance. It comes pre-loaded with AI models and all the tools to help AI software developers deploy AI faster, easier, and cheaper than ever before, enabling them to shift resources to applying AI in their business rather than laboring over infrastructure integration and optimization.”
A recent study found that approximately 70% of enterprises report using Generative AI in at least one business function, signaling increased demand. However, only 25% have fully AI-enabled processes that amount to broad adoption, and only a third have begun implementing even limited AI use cases.
Today, CPU performance bottlenecks on servers managing multimodal and large language model workloads are a major contributor to low GPU utilization rates, which average as low as 30-40%. The result is wasted, expensive silicon in AI deployments and an underserved market that still faces complexity and cost barriers.
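A quick calculation shows what that idle capacity costs. The 30-40% figures below are the utilization averages cited above; the arithmetic simply inverts them to get the effective price of each productive GPU-hour.

```python
# Effective cost of a *useful* GPU-hour at a given utilization rate.
# The 30-40% figures come from the utilization averages cited in the text;
# 90% is an illustrative high-utilization target, not a NeuReality claim.

def effective_cost_multiplier(utilization: float) -> float:
    """If a GPU is busy only `utilization` of the time, each productive
    GPU-hour effectively costs 1/utilization times its list price."""
    return 1.0 / utilization

for util in (0.30, 0.40, 0.90):
    print(f"{util:.0%} utilization -> {effective_cost_multiplier(util):.2f}x "
          "cost per useful GPU-hour")
# 30% utilization -> 3.33x; 40% -> 2.50x; 90% -> 1.11x
```

In other words, at 30-40% utilization an enterprise is effectively paying 2.5X to 3.3X for every productive GPU-hour; that stranded capacity is what the AI-CPU is designed to reclaim.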
“Enterprises and service providers are deploying AI applications and agents at a record pace and are focused on delivering performance in an economical fashion,” said Rashid Attar, senior vice president of Engineering, Qualcomm Technologies, Inc. “By integrating the Qualcomm Cloud AI 100 Ultra accelerator with NeuReality’s AI-CPU architecture, users can achieve new levels of cost efficiency and AI performance without compromising ease of deployment and scale.”
NeuReality’s NR1 appliance is already deployed with cloud and financial services customers and is purpose-built to accelerate AI adoption through its economics, accessibility, and space efficiency for on-prem and cloud Inference-as-a-Service options. In addition to newly pre-loaded Generative and Agentic AI models, with new releases every quarter, it is fully optimized with pre-configured software development kits and APIs for computer vision, conversational AI, and custom requests, supporting a wide array of business use cases and markets (e.g., financial services, life sciences, government, cloud service providers).
The first NR1 appliance unifies an NR1® module (PCIe card) with a Qualcomm® Cloud AI 100 Ultra accelerator.
NeuReality will be exhibiting at InnoVEX (co-located with Computex) in Taipei, Taiwan from May 20-23, 2025, at the Israeli Pavilion, Hall 2 booth S0912 (near the central stage). The Company will be hosting live demonstrations of the NR1 Inference appliance including migrating a chat application in minutes as well as performance demonstrations of the NR1 chip running Smooth Factory Models and DeepSeek-R1-Distill-Llama-8B.
Founded in 2019, NeuReality is a pioneer of dedicated AI inference architecture powered by the NR1® chip – the first AI-CPU built for inference orchestration. Based on an open, standards-based architecture, the NR1 is fully compatible with any AI accelerator. NeuReality’s mission is to make AI accessible and universally available by lowering the barriers of exorbitant cost, power consumption, and complexity, leveraging its disruptive technology to broaden AI inference adoption. The company has 80 employees across its facilities in Israel, Poland, and the U.S.
1 AI models pre-loaded and optimized for enterprise customers include: Llama 3.3 70B and Llama 3.1 8B (Llama 4 series coming soon); Mistral 7B, Mixtral 8x7B, and Mistral Small; Qwen 2.5, including Coder (Qwen 3 coming soon); DeepSeek R1-Distill-Llama 8B and R1-Distill-Llama 70B; and Granite 3 and 3.1 8B (Granite 3.3 coming soon).
NeuReality’s AI Revolution: A Convergence of Performance, Cost-Effectiveness, and Ease of Use
As Artificial Intelligence (AI) continues to permeate industries, enterprises are grappling with the challenge of deploying AI inference in a manner that is both economically viable and efficient. NeuReality is disrupting AI economics with an approach focused on out-of-the-box, instant LLM (Large Language Model) access while significantly reducing the total cost of AI inference. Its flagship product, the NR1 inference appliance, offers businesses unprecedented levels of performance, cost-effectiveness, and ease of use by optimizing the AI inference architecture and pairing it with pre-loaded popular enterprise AI models.
The NR1 Inference Appliance: A Game-Changer
At the heart of the NR1 inference appliance lies NeuReality’s purpose-built AI-CPU, which acts as a centralized control hub for data, tasks, and integrations. Unlike traditional CPU and NIC architectures, the NR1 chip consolidates these components into a single unit, reducing bottlenecks and maximizing processing power. This integrated approach enables the chip to keep pace with rapidly evolving GPUs while optimizing AI queries and tokens for enhanced performance and return on investment.
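To illustrate the orchestration role in the abstract, the toy sketch below shows the difference an orchestrator makes: it groups incoming requests into batches so the accelerator receives full batches instead of one request at a time. This is a conceptual Python illustration under our own assumptions, not NeuReality’s implementation, and every name in it is hypothetical.

```python
import asyncio

# Toy model of "inference orchestration": group incoming requests into
# batches so the accelerator stays busy, rather than dispatching each
# request serially from a general-purpose host CPU. Conceptual only.

async def accelerator_infer(batch: list[str]) -> list[str]:
    # Stand-in for dispatching one batch to a GPU or AI accelerator.
    await asyncio.sleep(0.01)
    return [text.upper() for text in batch]  # stand-in for model output

async def orchestrate(requests: list[str], batch_size: int = 4) -> list[str]:
    # The orchestrator's job: form full batches and keep the device fed.
    results: list[str] = []
    for i in range(0, len(requests), batch_size):
        results.extend(await accelerator_infer(requests[i:i + batch_size]))
    return results

if __name__ == "__main__":
    print(asyncio.run(orchestrate([f"request {n}" for n in range(10)])))
```

In a real system this scheduling, along with tokenization, networking, and pre/post-processing, is the kind of work the announcement describes the NR1 chip offloading from the host CPU.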
AI Out-of-the-Box: Streamlining Deployment
To further enhance ease of use, the NR1 inference appliance comes pre-loaded with popular enterprise AI models, including Llama, Mistral, Qwen, and Granite. This feature eliminates the complexities of configuration and optimization, allowing AI software developers to focus on applying AI within their businesses rather than spending time on infrastructure integration. The appliance can be up and running in less than 30 minutes, providing customers with rapid time-to-value.
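As a sketch of what “up and running in less than 30 minutes” could look like from a developer’s seat, the snippet below sends a chat request to a pre-loaded model over an OpenAI-compatible HTTP API. The endpoint URL, port, and model identifier are our own assumptions for illustration; NeuReality’s actual API surface is not documented here and may differ.

```python
import json
import urllib.request

# Hypothetical example: querying a model pre-loaded on an NR1 appliance.
# The hostname, port, path, and model name below are illustrative
# assumptions, not documented NeuReality values.
APPLIANCE_URL = "http://nr1-appliance.local:8000/v1/chat/completions"

payload = {
    "model": "llama-3.3-70b",  # pre-loaded model; identifier assumed
    "messages": [{"role": "user", "content": "Summarize our Q3 sales report."}],
    "max_tokens": 256,
}

request = urllib.request.Request(
    APPLIANCE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())
    print(reply["choices"][0]["message"]["content"])
```

Whatever the actual interface, the principle is the same: the models are already resident on the appliance, so deployment reduces to pointing a client at an endpoint.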
Affordable AI: Accelerating Adoption
NeuReality’s technology enables businesses to access and utilize AI in a more affordable manner by delivering a lower total cost per million AI tokens than x86 CPU-based servers. This cost-effectiveness is crucial for businesses and governments of all sizes, as it lowers the barriers to AI deployment and enables broader adoption.
Qualcomm Technologies Collaboration: Unlocking New Levels of Performance
The strategic partnership between NeuReality and Qualcomm Technologies further enhances the capabilities of the NR1 inference appliance. By integrating the Qualcomm Cloud AI 100 Ultra accelerator with NeuReality’s AI-CPU architecture, users can achieve new levels of cost efficiency and AI performance without compromising ease of deployment and scale. This collaborative approach showcases NeuReality’s commitment to leveraging cutting-edge technologies to optimize AI inference solutions.
Addressing Enterprise AI Challenges: Improving GPU Utilization
NeuReality is addressing a significant challenge faced by enterprises: CPU performance bottlenecks on servers that depress GPU utilization. Traditionally, servers managing multimodal and large language model workloads experience low GPU utilization rates, averaging around 30-40%. This low utilization wastes expensive silicon in AI deployments and limits AI adoption in underserved markets. NeuReality’s AI-CPU technology addresses this issue by eliminating performance bottlenecks, enabling businesses to fully leverage the capabilities of their GPUs in AI applications.
Meeting the Demand for Generative AI: Increased Utilization
NeuReality’s solutions are well-positioned to capitalize on the rapidly growing generative AI market. Recent studies indicate that approximately 70% of enterprises report using generative AI in at least one business function, yet only 25% have fully AI-enabled processes that amount to broad adoption. NeuReality’s NR1 inference appliance empowers enterprises to accelerate their generative AI initiatives by removing the barriers to adoption through improved ease of use, cost-effectiveness, and performance.
Ease of Use: Lowering Deployment Barriers
In addition to performance and cost-effectiveness, ease of use is a key driver of NeuReality’s AI solutions. The NR1 inference appliance simplifies the deployment process and reduces the need for infrastructure integration by coming pre-loaded with both AI models and software development kits. This ease of use empowers AI software developers to focus on building and deploying innovative AI applications rather than spending time managing complex infrastructure.
Broad Applications: Multiple Verticals
NeuReality’s NR1 inference appliance is designed to support a wide array of business use cases and markets. The appliance is fully-optimized with pre-configured software development kits and APIs for computer vision, conversational AI, and custom requests. This versatility makes the NR1 inference appliance suitable for various industries, including financial services, life science, government, and cloud service providers.
Accelerating AI Adoption: Affordability, Accessibility, and Space Efficiency
NeuReality’s NR1 appliance facilitates AI adoption by offering affordability, accessibility, and space efficiency, making it suitable for both on-prem and cloud infrastructures. Many organizations have struggled to scale their AI initiatives because of AI’s high costs and complexity; NeuReality addresses these barriers with a cost-effective, open-standard platform that simplifies the development and deployment of AI.
Demonstration Highlights
NeuReality will demonstrate its NR1 inference appliance at InnoVEX, a tech show co-located with Computex in Taipei, Taiwan, from May 20-23, 2025. At the event, the Company will show a chat application being migrated in minutes and present performance demonstrations of the NR1 chip running Smooth Factory Models and DeepSeek-R1-Distill-Llama-8B.
Continuous Innovation: Future-Proofing
NeuReality is committed to continuously enhancing the capabilities of its NR1 inference appliance through regular releases of new Generative and Agentic AI models and optimized software development kits. This continuous innovation enables businesses to stay on the leading edge of AI technology and ensures that their AI infrastructure is optimized for future workloads.
NeuReality: Empowering Enterprises to Harness the Potential of AI
NeuReality’s disruptive AI-CPU technology offers a cost-effective way to deploy AI inference that maximizes GPU performance while orchestrating AI queries and tokens for peak performance and ROI. As NeuReality continues to innovate and expand the capabilities of its NR1 inference appliance, it will serve as a key ally for enterprises seeking to thrive in the burgeoning world of AI.
By combining a focus on performance, cost-effectiveness, and ease of use with a commitment to continuous innovation, NeuReality is positioned to reshape AI economics and empower businesses of all sizes to harness the potential of AI.