
SoftBank has unveiled a new AI-for-RAN architecture that applies Transformer-based models to live 5G radio processing, reporting significant throughput and latency gains, and anchoring its broader AI-RAN product strategy (AITRAS) that converges AI and RAN on shared, GPU-accelerated infrastructure. The work extends earlier white-paper and field-trial efforts and formalizes a deployment blueprint that pairs AI signal-processing with an orchestrated, virtualized AI-and-RAN platform spanning central and edge domains.
What SoftBank announced
SoftBank developed a high-performance Transformer AI model for RAN processing and demonstrated up to a 30% uplink throughput improvement in over-the-air tests, with latency reductions versus prior CNN-based approaches and further simulated gains for downlink in mobility scenarios. The company positions this as an AI-for-RAN breakthrough within an end-to-end architecture that includes AI-enhanced channel interpolation, MU-MIMO user pairing, and SRS prediction, deployable on its AITRAS platform using NVIDIA Grace and Grace Hopper systems.
- Live/OTA result: ~30% uplink throughput increase in 5G with Transformer-based processing.
- Latency: ~26% lower average processing latency (~338 µs reported) vs. a CNN baseline in live tests.
- Simulated mobility: up to ~31% downlink throughput improvement for moving terminals.
- Additional AI-for-RAN functions: AI-driven uplink channel interpolation, MU-MIMO user pairing optimization, and SRS prediction, with lab/system-level improvements in the 9–20% range per function.
- Platform: AITRAS integrates AI and RAN workloads under a common virtualized GPU-centric architecture (gRAN), orchestrated centrally and distributable to edge for local AI demand.
Why Transformers for RAN
Transformers provide long-range dependency modeling and scalable parallelism that can outperform CNNs in sequence modeling tasks, and SoftBank shows that, with careful optimization, Transformer inference can meet stringent RAN real-time constraints while improving throughput and reliability in noisy, fast-changing radio conditions. This addresses a long-standing trade-off between model accuracy and latency in practical 5G deployments.
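The advantage the article describes, attention over the whole sequence rather than a CNN's local receptive field, can be illustrated with a minimal single-head self-attention sketch in NumPy. All shapes, names, and the "subcarrier" framing below are hypothetical illustrations, not SoftBank's model:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    x: (seq_len, d) sequence of per-position channel features."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # pairwise affinity between ALL positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # each output mixes the entire sequence

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((16, d))                  # 16 hypothetical "subcarrier" positions
w = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
y = self_attention(x, *w)
print(y.shape)  # (16, 8)
```

Unlike a convolution, every output position here is a weighted mixture of all input positions, which is the long-range dependency modeling the article credits for the gains in fast-changing radio conditions.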
Architecture overview: AI-for-RAN on AITRAS
SoftBank’s AI-RAN concept converges RAN and AI workloads on a shared, GPU-accelerated, software-defined platform (gRAN). Compute is allocated dynamically between baseband processing (L1–L3) and AI inference tasks, with an “AITRAS Orchestrator” optimizing resources across central and distributed sites.
- Virtualized layout: Management clusters in core; workload clusters distributed toward edge for real-time execution; GitOps and centralized registries for lifecycle management.
- Hardware stack: NVIDIA GH200 Grace Hopper Superchips for combined CPU/GPU acceleration; NVIDIA Grace CPU Superchip for high-capacity CU deployments; support for MIG/MPS to partition GPU for multi-tenant AI/RAN concurrency.
- Coexistence: AI and vRAN coexist on the same servers, enabling shared capacity and flexible placement of DU/CU functions alongside AI apps (e.g., LLM, RAG, robotics) via serverless APIs.
- Orchestration: AITRAS Orchestrator dynamically allocates resources between AI and RAN based on workload demands, improving flexibility and utilization across the fleet.
- Standards/ecosystem: SoftBank drives the AI-RAN Alliance and aligns work with 3GPP and O-RAN Alliance directions to accelerate adoption and interoperability.
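As a toy sketch of the kind of policy such an orchestrator might apply, the function below grants RAN workloads first and gives AI tenants only the capacity above a protected RAN headroom reserve. Every name, threshold, and the single-GPU capacity model are invented for illustration; SoftBank has not published the orchestrator's actual algorithm:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    kind: str          # "ran" or "ai"
    demand_gpu: float  # fraction of one GPU requested

def allocate(workloads, capacity=1.0, ran_reserve=0.4):
    """Grant RAN slices first, then AI from what is left, always keeping
    `ran_reserve` headroom free so RAN load spikes never starve baseband."""
    grants = {}
    # RAN slices are granted in order until capacity runs out.
    for w in [w for w in workloads if w.kind == "ran"]:
        grant = min(w.demand_gpu, capacity)
        grants[w.name] = grant
        capacity -= grant
    # AI tenants share only the capacity above the RAN reserve.
    ai_budget = max(0.0, capacity - ran_reserve)
    for w in [w for w in workloads if w.kind == "ai"]:
        grant = min(w.demand_gpu, ai_budget)
        grants[w.name] = grant
        ai_budget -= grant
    return grants

demo = [Workload("du-l1", "ran", 0.3), Workload("cu-l2l3", "ran", 0.2),
        Workload("llm-inference", "ai", 0.4)]
print(allocate(demo))
```

Here the hypothetical LLM tenant is throttled to the spare capacity above the reserve, mirroring the article's "protect RAN SLOs first" principle.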
AI-for-RAN function details
SoftBank details three representative AI signal-processing functions that combine to deliver end-to-end cell and user throughput gains, validated in lab/system-level tests and field trials.
- Uplink channel interpolation (Transformer/MLP variants): Interpolates channel under low-SNR/noisy conditions to recover performance; lab tests show ~20% UL throughput improvement vs. without AI, and OTA evaluations report ~8% vs. a conventional CNN baseline.
- MU-MIMO user pairing optimization: Uses AI to optimize pairing/scheduling in multi-user MIMO; cell throughput improved by ~9% in system-level simulations using MLP scheduling.
- SRS prediction for beamforming: Predicts and interpolates SRS to maintain beamforming performance at wider SRS intervals; user throughput improved by ~13% in system-level simulations.
These modular gains contribute to the headline uplink improvement in live trials when integrated within a Transformer-driven pipeline, with additional downlink benefits in mobility scenarios shown via simulation.
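The SRS-prediction idea above can be made concrete with a deliberately simple baseline: extrapolating the channel between sounding instants so beamforming weights stay usable at wider SRS intervals. This first-order extrapolator is a toy stand-in for the learned predictor, not SoftBank's model, and the Doppler-like test channel is invented:

```python
import numpy as np

def predict_channel(h_prev, h_curr, steps_ahead, step=1.0):
    """First-order extrapolation of a complex channel snapshot.
    A learned predictor would replace this, but the interface is the same:
    past SRS measurements in, a future channel estimate out."""
    slope = (h_curr - h_prev) / step
    return h_curr + slope * steps_ahead

# A toy channel rotating at a constant Doppler-like phase rate.
t = np.arange(4)
h = np.exp(1j * 0.2 * t)           # "measured" SRS snapshots at t = 0..3
h_hat = predict_channel(h[2], h[3], steps_ahead=1)
truth = np.exp(1j * 0.2 * 4)
err = abs(h_hat - truth)
print(err)                          # small for a slowly varying channel
```

A learned model earns its keep where this linear baseline breaks down, in fast fading and non-linear channel dynamics, which is consistent with the mobility gains the article reports.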
Performance results at a glance
- Live OTA (5G): +30% uplink throughput versus prior methods using a Transformer-based AI model; achieved while meeting RAN latency targets.
- Latency: ~338 µs average processing latency, ~26% faster than CNN baseline in the same test conditions.
- Mobility simulation: up to +31% downlink throughput for moving terminals under the new architecture.
- Lab/system simulations for sub-functions: ~20% UL gain (channel interpolation, lab), ~9% cell throughput gain (MU-MIMO pairing), ~13% user throughput gain (SRS prediction).
Together, these results argue for immediate operational value from AI-enhanced baseband processing, particularly in dense, noisy, and high-mobility scenarios where traditional algorithms face reliability limits.
Deployment models: centralized and distributed AI-RAN
SoftBank describes two complementary deployment patterns under AITRAS to match traffic and AI demand profiles:
- Centralized evolution: CU capacity roughly doubles when implemented on NVIDIA Grace CPU Superchip servers, supporting higher centralized processing loads and pooling efficiencies.
- Distributed evolution (D-RAN configured AITRAS): Co-locates DU and CU on a single GH200 server in areas with increasing local AI demand (e.g., enterprise campuses), enabling low-latency AI services alongside RAN.
This flexibility allows operators to place AI models where they deliver maximum value, while keeping within power and latency constraints.
Operations and resource management
AITRAS applies cloud-native patterns to telecom operations, enabling multi-tenant AI-and-RAN coexistence with carrier-grade controls.
- Resource isolation: GPU partitioning (MIG/MPS) to guarantee slices for L1/L2/L3 and AI inference; policy-driven scheduling to protect RAN SLOs first.
- Observability: Telemetry across token-level/graph metrics for AI and per-subframe latency/throughput for RAN, feeding the orchestrator for dynamic rebalancing.
- Lifecycle: GitOps for versioning, reproducible rollouts, and rapid model iterations; alignment with O-RAN/3GPP for interface and functional compliance.
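As a sketch of how per-subframe RAN telemetry could gate rebalancing decisions, the check below flags when the tail of the processing-latency distribution approaches the deadline budget, signalling that AI tenants should yield GPU share. The thresholds and quantile are invented; a real system would route this through the orchestrator's policy engine:

```python
def needs_rebalance(subframe_latencies_us, budget_us=500.0, quantile=0.99):
    """Flag when the high quantile of per-subframe processing latency
    exceeds 90% of the deadline budget."""
    lat = sorted(subframe_latencies_us)
    idx = min(len(lat) - 1, int(quantile * len(lat)))
    return lat[idx] > 0.9 * budget_us

healthy = [330 + (i % 7) for i in range(1000)]   # comfortably under budget
stressed = healthy[:-10] + [620] * 10            # tail spikes past the deadline margin
print(needs_rebalance(healthy), needs_rebalance(stressed))  # prints: False True
```

Watching the tail rather than the mean matters here: the article's ~338 µs average leaves headroom, but deterministic L1/L2 processing is lost exactly when the worst subframes miss their deadline.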
Roadmap and industry context
SoftBank’s December 2024 AI-RAN white paper laid the foundation for gRAN and AITRAS and previewed field pilots and commercialization timelines; 2025 updates show on-air cells, ecosystem partnerships (e.g., Nokia and Red Hat), and concrete orchestration components moving toward production readiness. The operator also co-founded the AI-RAN Alliance to coordinate R&D and adoption across vendors and operators, reflecting an industry pivot to AI-native RAN for 5G-Advanced and 6G paths.
- 2024–2025 milestones: AI-RAN white paper; on-air trials; press releases and technical briefings on new technology development; platform evolution on Grace/Grace Hopper; orchestrator maturation.
- Ecosystem: Collaborations with Nokia and Red Hat to validate coexistence, virtualization stacks, and real-world operational models.
- Commercialization aim: Progressing from pilots toward broader rollout by mid-decade, with performance and energy-efficiency targets comparable to current RAN while adding AI workloads.
What this means for operators
- Immediate value levers: Deploy AI-for-RAN functions (channel interpolation, MU-MIMO pairing, SRS prediction) to capture throughput and reliability gains, especially in congested or noisy cells.
- Platform shift: Move toward shared AI-and-RAN GPU servers at edge and central sites, using orchestrators to dynamically reallocate resources without compromising RAN SLOs.
- Risk and readiness: Validate Transformer inference latency end-to-end on target hardware; ensure GPU partitioning and preemption policies keep L1/L2 deterministic; integrate observability that spans AI and RAN performance.
- Standards alignment: Track AI-RAN Alliance activities and O-RAN/3GPP directions to ensure interoperability and a supply chain for AI-native RAN components.
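The "validate Transformer inference latency end-to-end" item above amounts to measuring percentile latency of the deployed model on target hardware. A minimal harness sketch, using only the standard library and a placeholder callable in place of a real compiled model:

```python
import time
import statistics

def measure_latency_us(infer, batch, warmup=50, iters=500):
    """Wall-clock per-call latency of an inference callable, in microseconds.
    In a real validation this would wrap the compiled model on target hardware."""
    for _ in range(warmup):                 # warm caches / JIT before measuring
        infer(batch)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        infer(batch)
        samples.append((time.perf_counter() - t0) * 1e6)
    samples.sort()
    return {"p50": samples[len(samples) // 2],
            "p99": samples[int(0.99 * len(samples))],
            "mean": statistics.fmean(samples)}

# Placeholder workload standing in for model inference.
stats = measure_latency_us(lambda x: sum(x), list(range(256)))
print(stats)
```

Reporting p99 alongside the mean is the point: a budget like the article's sub-millisecond subframe processing is a per-call deadline, so the tail, not the average, decides whether RAN timing holds.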
Key materials
- News and industry reports: SoftBank’s new AI architecture for RAN delivers ~30% uplink throughput in live tests and reduced latency using Transformer models, with simulated downlink gains for mobility scenarios.
- Technical briefings: SoftBank’s AI-RAN white paper and February 2025 briefing detail AITRAS system elements, GPU platform choices, centralized/distributed deployment patterns, and measured gains for AI signal-processing functions.
- Platform overview: AITRAS Orchestrator optimizes compute allocation in real time between AI and RAN workloads, with ecosystem demos from Nokia and Red Hat showing coexistence and carrier-grade feasibility.
SoftBank’s new AI architecture for Radio Access Networks: inside the design, results, and roadmap was originally published in Data Science in Your Pocket on Medium.