12th December 2014

Recently Ben wrote a blog post on floating point arithmetic and how accumulating floating point values can result in accuracy problems. As he says, this is because you lose precision when storing floating point values as 32 bit numbers, but how can you avoid this problem? Well, one easy way is to move up to double precision floating point. The reason single precision floating point numbers have issues is that they represent the mantissa, or significand, in 24 bits, including one “hidden” bit. This only gives you about 7 decimal places to represent your number, so if you add 1 to 100,000,000 you get 100,000,000 again.
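You can see this without an FPGA; here is a small sketch in Python, using `struct` to round values through single precision (Python's own floats are IEEE 754 doubles):

```python
import struct

def to_f32(x):
    # Round a Python float (an IEEE 754 double) to single precision and back
    return struct.unpack('f', struct.pack('f', x))[0]

big = to_f32(100_000_000.0)    # exactly representable in a 24-bit mantissa
lost = to_f32(big + 1.0)       # the sum is rounded back to single precision
assert lost == big             # the +1 has vanished
```

The spacing between adjacent single precision values at 100,000,000 is 8, so adding 1 simply rounds back to the same number.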

Double precision floating point numbers use 53 bits for the mantissa, almost 30 bits more than single precision, which equates to roughly 9 extra decimal places. For almost all applications double precision floating point will be sufficient, because you rarely require more than 15 to 16 significant decimal digits, and anything smaller than this can be ignored as insignificant. So problem solved, use double precision floats? Well, not quite: when we migrate a design to an FPGA there are three main issues with using double precision values:
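Since Python floats are already IEEE 754 doubles, we can check both claims directly: double precision absorbs the earlier example easily, but it too has a wall at around 16 significant digits.

```python
# Python floats are IEEE 754 doubles, so this tests double precision directly
assert 100_000_000.0 + 1.0 == 100_000_001.0   # fine: well within ~16 digits

# but at 17 significant digits the spacing between doubles exceeds 1...
assert 1e17 + 1.0 == 1e17                     # ...and the +1 is lost again
```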

- Data Transfer
- Latency
- Resource Utilisation

Data transfer is fairly straightforward: with single precision floats we needed to transfer 32 bits of data for each value, and now we need to transfer 64 bits for each value, which will take twice as long. If the design is fully pipelined this additional transfer time can be hidden by processing the first batch of data while we transfer the second batch.
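With some illustrative figures (the link speed and batch sizes here are entirely hypothetical), the effect looks like this:

```python
# Hypothetical figures for illustration only
LINK_BYTES_PER_S = 1e9                  # assume a 1 GB/s link
N_SAMPLES = 1_000_000

t_single = N_SAMPLES * 4 / LINK_BYTES_PER_S   # 32-bit values
t_double = N_SAMPLES * 8 / LINK_BYTES_PER_S   # 64-bit values
assert t_double == 2 * t_single               # doubles take twice as long

def pipelined_total(n_batches, t_transfer, t_compute):
    # If per-batch compute time exceeds transfer time, a pipelined design
    # hides every transfer except the first one behind computation
    return t_transfer + n_batches * max(t_transfer, t_compute)
```

Once `t_compute` per batch is at least `t_transfer`, the doubled transfer time disappears from the total except for the very first batch.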

First we should distinguish between throughput and latency. Throughput is the rate at which we can process data: the average number of samples we can produce a result for per second. Latency is the time between presenting the input samples and receiving the output samples: how long it takes to process one sample. For single precision floating point the latency is significantly lower than for double precision, which may not be a problem unless you need an answer really fast, as in High Frequency Trading.
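The distinction is easy to put into a formula for a fully pipelined design (the pipeline depths below are hypothetical, just to give the flavour):

```python
def total_cycles(n_samples, latency, ii=1):
    # Fully pipelined: a new sample enters every `ii` cycles (the initiation
    # interval) and the first result appears after `latency` cycles
    return latency + (n_samples - 1) * ii

# Hypothetical pipeline depths for single and double precision adders
SINGLE_LATENCY, DOUBLE_LATENCY = 8, 14

# For a single result, latency is all that matters
assert total_cycles(1, DOUBLE_LATENCY) - total_cycles(1, SINGLE_LATENCY) == 6

# Over a million samples the extra latency is negligible: throughput dominates
ratio = total_cycles(1_000_000, DOUBLE_LATENCY) / total_cycles(1_000_000, SINGLE_LATENCY)
assert ratio < 1.0001
```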

Resources are important in FPGA designs: lower utilisation makes it easier to achieve a higher clock frequency, which increases throughput and reduces latency. It also has the benefit that any spare resources can be used to implement a second processing pipeline for better parallelism. Double precision floats require more resources than single precision floats, meaning less parallelism and potentially a lower clock frequency.

So it seems like a compromise: you can use single precision floats for less accuracy but better performance, or double precision for more accurate but slower designs. However, there is a third option which may be capable of giving the same accuracy as double precision at even higher performance than single precision! Fixed point arithmetic is essentially the same as integer arithmetic, so it is very resource friendly and will result in low latency designs.
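A minimal sketch of fixed point arithmetic, using the 40 fractional bits discussed below: each value is held as a plain integer scaled by 2^40, so addition is an ordinary integer add and multiplication is an integer multiply followed by a rescaling shift.

```python
# Minimal fixed point sketch: 40 fractional bits, values held as integers
FRAC_BITS = 40
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    return int(round(x * SCALE))

def to_float(f):
    return f / SCALE

def fx_add(a, b):
    return a + b                    # a plain integer add

def fx_mul(a, b):
    return (a * b) >> FRAC_BITS     # integer multiply, then rescale

a, b = to_fixed(1.5), to_fixed(0.25)
assert to_float(fx_add(a, b)) == 1.75
assert to_float(fx_mul(a, b)) == 0.375
```

Note that Python integers are arbitrary precision; on an FPGA these would be fixed width registers, and the shift after a multiply truncates rather than rounds.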

The one downside is that it does not have the same dynamic range that floating point values do, as illustrated in the diagram below. Floating point gives you a constant number of significant figures, and the precision adjusts as the value it represents gets larger. Fixed point values have a constant number of fractional bits and don’t adjust; in this example I have chosen to use 40 fractional bits (about 12 decimal places).

As can be seen from this graph, this fixed point format matches single precision floating point at values around 0.00001, and matches double precision at around 10,000. If we use a total of 64 bits, we can represent values up to about 16 million (represented by the vertical line) with a precision of 2^-40 (roughly 10^-12), while double precision manages only about 2^-28 (roughly 4 x 10^-9) at this value.
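These crossover figures can be checked directly: the fixed point step is constant at 2^-40, while the spacing between adjacent doubles grows with the value (using `math.nextafter`, available from Python 3.9).

```python
import math

FIXED_STEP = 2.0 ** -40         # constant step of a 40-fractional-bit format
x = float(1 << 24)              # about 16.8 million, top of a 64-bit Q24.40 range

# Spacing between adjacent doubles at x (requires Python 3.9+)
double_ulp = math.nextafter(x, math.inf) - x
assert double_ulp == 2.0 ** -28
assert FIXED_STEP < double_ulp  # fixed point is finer than double here
```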

The limited range is not a problem for many applications, because large ranges are not often required; instead we require a minimum standard of precision, which is exactly what fixed point provides. In the event that your values do have a large range, it is often the case that intermediate values can be stored and/or calculated using fixed point to improve performance. So for FPGA designs you should start by looking for places where you can use fixed point to save resources and reduce latency.
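As a small sketch of that idea, accumulating into a fixed point intermediate keeps the total error bounded by the initial rounding of each input, because the integer additions themselves are exact:

```python
FRAC_BITS = 40
SCALE = 1 << FRAC_BITS

values = [0.1] * 1000               # 0.1 has no exact binary representation

# Accumulate in fixed point: one rounding per input, then exact integer adds
acc = 0
for v in values:
    acc += int(round(v * SCALE))

total = acc / SCALE
assert abs(total - 100.0) < 1e-9    # error bounded by the input roundings
```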