24th February 2015
The second generation of Xilinx’s Zynq family was finally announced yesterday (23rd February 2015) with the tagline “Industry’s First All Programmable MPSoC”. Rather than being launched as a Zynq-8000 range, the new generation is built on 16 nm UltraScale+ FPGA fabric, skipping the 20 nm UltraScale fabric altogether. I have been using the Zynq to develop for the Nanostreams project, and a move to UltraScale+ could bring us even better performance per Watt.
Figure 1: UltraScale+ Zynq
After reading the press release I had one burning question: “What is an MPSoC anyway?” It turns out that MPSoC stands for Multiprocessor System on Chip. Xilinx have attended the International Forum on MPSoC, the 15th of which is scheduled for July, and presented the previous Zynq generation there, so I’m not sure how they justify this being an industry first. I guess that Xilinx are emphasising the difference between their multi-core Zynq-7000 series and the new devices, which include a secondary Dual Core Real-Time processor and a GPU in addition to the 64-bit Quad Core application processor, all provided by ARM.
The UltraScale+ range deserves a blog post in itself, so I’ll just give an overview of the highlights for the new Zynq devices. All of the new devices use TSMC’s 16FinFET+ process, which Xilinx say gives a 2-5X improvement in performance per Watt. The Zynq UltraScale+ Product Table contains a side-by-side comparison of the features across the range.
First, there are new, much larger UltraRAM blocks with a capacity of 288 kb each, in addition to the standard 18 kb BRAM. As someone who is used to moving data to off-chip memory and taking a huge performance hit, I am looking forward to having all that memory available with single-cycle access. The largest Virtex UltraScale+ device will have a total FPGA-based BRAM/UltraRAM capacity of over 512 Mb, which is huge compared to the 132.9 Mb of the standard UltraScale devices. For the Zynq the number is less impressive, at just 70 Mb total, but this is still much larger than the previous generation’s 3 Mb.
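To put those capacities in perspective, here is a rough sketch of the arithmetic. It assumes the binary convention (1 kb = 1024 bits) and that each UltraRAM block is organised as 4096 words of 72 bits; the device totals are simply the announced figures quoted above, and the helper names are hypothetical.

```python
# Rough on-chip memory arithmetic for the figures quoted above.
# Assumptions: 1 kb = 1024 bits, and UltraRAM is organised as 4096 x 72-bit
# words. Device totals (Mb) are the announced numbers, not measurements.

KB = 1024  # bits per kb (binary convention, assumed)


def words_per_block(capacity_kb: int, word_bits: int) -> int:
    """How many word_bits-wide words fit in a block of capacity_kb."""
    return capacity_kb * KB // word_bits


def ratio(new_mb: float, old_mb: float) -> float:
    """Growth factor in total on-chip RAM between two devices."""
    return new_mb / old_mb


uram_words = words_per_block(288, 72)  # one UltraRAM block
block_ratio = 288 // 18                # UltraRAM vs 18 kb BRAM block size

print(f"UltraRAM: {uram_words} x 72-bit words")        # UltraRAM: 4096 x 72-bit words
print(f"UltraRAM/BRAM block size: {block_ratio}x")      # UltraRAM/BRAM block size: 16x
print(f"Zynq on-chip RAM growth: {ratio(70, 3):.1f}x")  # Zynq on-chip RAM growth: 23.3x
print(f"Largest Virtex growth: {ratio(512, 132.9):.1f}x")  # Largest Virtex growth: 3.9x
```

So each UltraRAM block holds as much as sixteen 18 kb BRAMs, and the jump in on-chip memory is proportionally much larger for the Zynq than for the top-end Virtex parts.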
Dedicated PCIe Gen3 ×16/Gen4 ×8 blocks and 100G Ethernet MACs on the chip should make getting data on and off the device much faster. DSP slices form the basis of the ALU in the Nanostreams processor, with each 32-bit processor using four DSPs. The largest UltraScale+ Zynq has 3528 DSP slices, enough in principle for 882 such cores (3528 / 4), although routing will determine how many actually fit.
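A quick back-of-the-envelope check of these numbers is sketched below. It assumes each 32-bit Nanostreams core uses exactly four DSP slices and that nothing else limits placement (in practice routing and other resources will), and it uses the standard PCIe per-lane transfer rates (8 GT/s for Gen3, 16 GT/s for Gen4) with 128b/130b line encoding, ignoring packet and protocol overhead. The function names are hypothetical.

```python
# Back-of-the-envelope numbers for the DSP and PCIe figures above.
# Assumptions: 4 DSP slices per 32-bit core, DSPs are the only constraint,
# and PCIe bandwidth is counted after 128b/130b encoding but before
# packet/protocol overhead.

DSPS_PER_CORE = 4

# Standard per-lane transfer rates in GT/s; Gen3 and Gen4 both use 128b/130b.
GT_PER_LANE = {3: 8.0, 4: 16.0}


def max_cores(total_dsps: int, dsps_per_core: int = DSPS_PER_CORE) -> int:
    """Upper bound on the number of cores if DSP slices were the only limit."""
    return total_dsps // dsps_per_core


def pcie_gbps(gen: int, lanes: int) -> float:
    """Per-direction link bandwidth in Gb/s after line encoding."""
    return GT_PER_LANE[gen] * lanes * 128 / 130


print(max_cores(3528))                 # 882
print(f"{pcie_gbps(3, 16):.1f} Gb/s")  # 126.0 Gb/s
print(f"{pcie_gbps(4, 8):.1f} Gb/s")   # 126.0 Gb/s
```

Gen3 ×16 and Gen4 ×8 therefore offer the same raw bandwidth, roughly 126 Gb/s in each direction, which is in the same ballpark as the 100G Ethernet MACs.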
One thing which is noticeably absent from the new UltraScale+ range is hard floating point units, such as those that will be available in Altera’s new Generation 10 devices. In fact, today (24th February) Altera started shipping early access samples of their 20 nm Arria 10 SoCs featuring these units. Hard IEEE 754 floating point units would be especially useful for Nanostreams, making it possible to build small and fast floating-point-capable processors. Altera are also planning a 14 nm Stratix 10 SoC with a 64-bit Quad Core ARM Cortex-A53 processor, which is expected late this year.
Figure 2: Processor Features
At a glance, the main features of the new Zynq architecture are as follows.
The main processor of the Zynq has been upgraded from a 666/800 MHz Dual Core Cortex-A9 to a 1.3 GHz Quad Core Cortex-A53. This still falls short of some current mobile phones, which use Quad Core Snapdragon processors clocked above 2 GHz, but it is a big improvement. Hardware virtualisation and terabyte-scale memory access are also supported, but this is not the only processor on the new Zynq…
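The “terabyte” figure comes down to address width. The sketch below assumes a 40-bit physical address for the Cortex-A53 in the new Zynq, compared with the 32-bit physical address space of the Cortex-A9; the exact width exposed by the Zynq memory map is an assumption here, not something stated in the announcement.

```python
# Address-space arithmetic behind "terabyte memory access".
# Assumption: 40-bit physical addressing on the new Zynq's Cortex-A53,
# versus 32-bit physical addressing on the Zynq-7000's Cortex-A9.

GIB = 2 ** 30  # bytes in a GiB
TIB = 2 ** 40  # bytes in a TiB


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a byte-addressed physical address of this width."""
    return 2 ** address_bits


print(addressable_bytes(32) / GIB)  # 4.0  (GiB, Cortex-A9)
print(addressable_bytes(40) / TIB)  # 1.0  (TiB, Cortex-A53, assumed 40-bit)
```

Each extra address bit doubles the reachable memory, so the eight additional bits take the limit from 4 GiB to 1 TiB.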
A 600 MHz Dual Core Cortex-R5 processor is also included on the device for real-time processing offload and energy-efficient operation. If that isn’t enough for you, there is also an ARM Mali-400 GPU, which is OpenGL ES 2.0 compliant. Unfortunately the Mali-400 does not appear to support OpenCL, so GPGPU is probably not going to take off on the Zynq just yet.
Efficient communication between the Processing System and the Programmable Logic has been difficult in the Zynq-7000, and I usually resort to flushing the cache and using the DDR as an intermediate step in the transfer. The new Zynq has eleven 32/64/128-bit AXI ports plus one 32/64-bit port, and I’m looking forward to finding out how that compares to the previous system.
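As a first-order comparison, the peak throughput of one AXI data channel is just its width multiplied by its clock. The sketch below uses a hypothetical 250 MHz clock for both generations purely for illustration; real throughput also depends on the actual port clock, burst lengths and contention for the DDR, none of which are given here.

```python
# Peak theoretical throughput of one AXI data channel: width x clock.
# Assumption: a hypothetical 250 MHz clock for both generations; this ignores
# burst overhead, handshaking stalls and DDR contention.

CLOCK_MHZ = 250.0  # hypothetical, for illustration only


def axi_peak_gbytes_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak GB/s (1 GB = 1e9 bytes) for one direction of one AXI port."""
    return width_bits / 8 * clock_mhz * 1e6 / 1e9


print(axi_peak_gbytes_per_s(64, CLOCK_MHZ))   # 2.0  (a 64-bit Zynq-7000 HP port)
print(axi_peak_gbytes_per_s(128, CLOCK_MHZ))  # 4.0  (a 128-bit port on the new Zynq)
```

At the same clock, doubling the port width doubles the peak per-port bandwidth, which is one reason the wider ports could reduce the need to stage transfers through DDR.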
One of the main areas Xilinx have been focusing on is ease of programming. According to their website they will continue to use the Vivado Design Suite with SDK, and the SDx family of environments can be used to develop in OpenCL or C/C++. This should at least be familiar to designers of the 7000 series, but it will be interesting to see how well the GPU and Real-Time processor integrate with the SDK development environment.
In summary, there is much more to play with in the new UltraScale+ Zynq devices, and they are becoming much more like an entire heterogeneous compute node on a single chip. The Processing System has received a major upgrade, not only increasing the power of the existing application processor but adding a Real-Time processor and a GPU. With the new UltraRAM blocks and 16 nm technology in the Programmable Logic, it should be possible to get bigger FPGA designs running at higher clock frequencies to make the most of the PCIe and Ethernet interfaces. This announcement leaves me eagerly awaiting the release in the fourth quarter of this year.