Accelerating AI with Co-Processors


Most chips today combine customized logic blocks that deliver the special sauce with off-the-shelf blocks for commonplace functions such as I/O and memory controllers. But one needed building block has been missing: an AI co-processor.

In AI, the special sauce has been the circuits that do the heavy lifting of parallel matrix operations. However, other operations common in AI workloads do not map well onto matrix and tensor hardware. These scalar and vector operations, used for computing activations, averages, and similar functions, are typically run on a CPU or on a digital signal processor (DSP) that accelerates vector math.
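To make the distinction concrete, here is a minimal NumPy sketch contrasting the two kinds of work. The matrix multiply is the kind of operation a parallel MAC array excels at; the softmax and averaging steps are the elementwise and reduction operations that traditionally fall back to a CPU or DSP. This is purely illustrative and does not represent any vendor's API.

```python
import numpy as np

# Matrix work: maps well onto parallel MAC arrays (the matrix engine's strength).
x = np.random.rand(4, 8).astype(np.float32)   # activations from a prior layer
w = np.random.rand(8, 16).astype(np.float32)  # layer weights
logits = x @ w                                # dense matrix multiply

# Scalar/vector work: elementwise and reduction ops that MAC arrays handle
# poorly -- the kind of computation traditionally offloaded to a CPU or DSP.
def softmax(v):
    e = np.exp(v - v.max(axis=-1, keepdims=True))  # exp, max: vector ops
    return e / e.sum(axis=-1, keepdims=True)       # sum, divide: reductions

probs = softmax(logits)
mean_activation = probs.mean()                     # a simple averaging step
```

Nothing in the softmax is a multiply-accumulate, which is why a chip built only around MAC arrays still needs a companion engine for this class of work.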

Designers of custom AI chips often pair a neural processing unit (NPU) with a DSP block from companies such as Cadence or Synopsys to accelerate scalar and vector calculations. However, these DSPs also include many features that are irrelevant to AI, so designers end up spending money and power on capabilities they do not need.

Enter AI Co-Processors

Large companies that design custom chips address this by building their own AI co-processors. Nvidia's Jetson Orin uses a vector engine called the PVA, Intel's Gaudi has its own vector processor within its TPCs, Qualcomm's Snapdragon has a vector engine within the Hexagon accelerator, and the Google TPU does likewise.

AI co-processors work alongside AI matrix engines in many accelerators today. But what if you are an automotive, TV, or edge infrastructure company designing your own AI ASIC for a specific application? Until now, you had to either design your own co-processor or license a DSP block and use only part of it for your AI needs.

The New AI Co-Processor Building Block

Cadence Design Systems has now introduced an AI co-processor, the Tensilica NeuroEdge, which delivers roughly the same performance as a DSP while consuming about 30% less die area (and thus cost) on an SoC. Because NeuroEdge was derived from the Cadence Vision DSP platform, it is fully supported by an existing, robust software stack and development environment.

An AI SoC can combine CPUs, AI blocks such as GPUs, vision processors, and NPUs, and now AI co-processors to accelerate the entire AI workload. The new co-processor can be used with any NPU, is scalable, and helps circuit design teams get to market faster with a fully tested, configurable block. Designers can combine CPUs from Arm or RISC-V vendors, NPUs from EDA firms such as Synopsys and Cadence, and now the "AICP" from Cadence, all as off-the-shelf designs and chiplets.
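The division of labor described above can be sketched as a simple dispatch table: a runtime classifies each layer of a model and routes it to the engine best suited to it. The unit names and layer taxonomy here are illustrative assumptions, not Cadence's software interface.

```python
# Hypothetical dispatch sketch: how a runtime might split a model's layers
# between a matrix engine (NPU) and a scalar/vector co-processor.
# The category sets and unit names are illustrative assumptions.
MAC_HEAVY = {"conv2d", "matmul", "depthwise_conv"}          # parallel MAC work
SCALAR_VECTOR = {"softmax", "layernorm", "nms", "resize"}   # pre/post-processing

def assign_unit(layer_type: str) -> str:
    """Route a layer to the engine best suited to its computation."""
    if layer_type in MAC_HEAVY:
        return "npu"
    if layer_type in SCALAR_VECTOR:
        return "coprocessor"
    return "cpu"  # fallback for anything neither engine accelerates

# A toy model: preprocessing, two compute layers, and postprocessing.
model = ["resize", "conv2d", "layernorm", "matmul", "softmax", "nms"]
schedule = [(op, assign_unit(op)) for op in model]
```

In this sketch, the matrix layers land on the NPU while the pre- and post-processing layers land on the co-processor, keeping the general-purpose CPU free for control tasks.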

The AICP was born from the Vision DSP and is configurable to meet a wide range of compute needs. NeuroEdge supports up to 512 8×8 MACs with FP16, FP32, and BF16 support. It connects to the rest of the SoC over AXI or Cadence's HBDO high-bandwidth interface. Cadence has high hopes for NeuroEdge in the automotive market, and the design is ready for ISO 26262 functional-safety (FuSa) certification.

NeuroEdge is fully supported by the NeuroWeave AI compiler toolchain, which uses a TVM-based front end for fast development.

My Takeaway

With the rapid proliferation of AI processing in physical AI applications such as autonomous vehicles, robotics, drones, industrial automation, and healthcare, NPUs are assuming a more critical role. Today, NPUs handle the bulk of the computationally intensive AI/ML workloads, but many non-MAC layers, including pre- and post-processing tasks, are better offloaded. Current CPU, GPU, and DSP solutions require tradeoffs; the industry needs a low-power, high-performance solution that is optimized for co-processing and future-proofed for rapidly evolving AI processing needs. Cadence is the first to take that step.

Conclusion

The introduction of the Tensilica NeuroEdge AI co-processor is a significant development in AI processing. It addresses the need for a low-power, high-performance solution optimized for co-processing that can keep pace with rapidly evolving AI workloads. With its configurable design and support for a wide range of compute needs, NeuroEdge is poised to play a critical role in AI applications across many industries.

FAQs

Q: What is an AI co-processor?
A: An AI co-processor is a specialized processor designed to work alongside AI matrix engines to accelerate scalar and vector calculations in AI applications.
Q: What is the Tensilica NeuroEdge AI co-processor?
A: The Tensilica NeuroEdge AI co-processor is a new AI co-processor introduced by Cadence, which delivers roughly the same performance as a DSP while consuming about 30% less die area (and thus cost) on an SoC.
Q: What are the benefits of using the NeuroEdge AI co-processor?
A: The benefits of using the NeuroEdge AI co-processor include low power consumption, high performance, and configurability to meet a wide range of compute needs.
Q: What industries can benefit from the NeuroEdge AI co-processor?
A: Various industries such as automotive, TV, edge infrastructure, autonomous vehicles, robotics, drones, industrial automation, and healthcare can benefit from the NeuroEdge AI co-processor.
Q: Is the NeuroEdge AI co-processor supported by a software stack and development environment?
A: Yes, the NeuroEdge AI co-processor is fully supported by an existing robust software stack and development environment, including the NeuroWeave AI compiler toolchain.