FPGA Mining Overview

FPGA stands for Field Programmable Gate Array. The name distinguishes FPGAs from ASICs. An ASIC is a custom chip that cannot be reconfigured after it is manufactured. Once it is shipped and ‘in the field’, an ASIC’s function is fixed. An FPGA gets its name because it can be ‘reprogrammed in the field’, after it has been shipped to a customer or seen use in the real world. The term ‘gate array’ comes from the fact that within the FPGA are millions of logic ‘gates’. In most CPUs and ASICs, these gates are hard-wired to perform specific logic functions on binary (1,0) values, such as AND, OR, NOT, XOR, and so on, each producing a 1-bit output from one or two 1-bit inputs. Operations such as addition and multiplication are created by combining hundreds or thousands of basic logic gates together.
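To make the idea of building arithmetic out of gates concrete, here is a minimal Python sketch of a ripple-carry adder. It is only an illustration, not how any particular chip is wired: the function names and the 8-bit width are assumptions chosen for the example. Each full adder uses two XOR gates, two AND gates, and one OR gate, so an 8-bit adder built this way needs 40 gates.

```python
def full_adder(a, b, carry_in):
    """Add three 1-bit values using only XOR, AND, and OR gates."""
    partial = a ^ b                                # XOR gate
    total = partial ^ carry_in                     # XOR gate
    carry_out = (a & b) | (partial & carry_in)     # two AND gates, one OR gate
    return total, carry_out


def ripple_carry_add(x, y, width=8):
    """Add two unsigned integers one bit at a time by chaining full adders."""
    result = 0
    carry = 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        sum_bit, carry = full_adder(bit_x, bit_y, carry)
        result |= sum_bit << i
    return result, carry  # the final carry is the overflow bit


print(ripple_carry_add(100, 57))   # (157, 0)
print(ripple_carry_add(200, 100))  # 300 does not fit in 8 bits: (44, 1)
```

In real hardware all of these gates exist side by side and settle at the same time; the Python loop only simulates that wiring one bit after another.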

An FPGA uses LUTs (look-up tables) to simulate logic gates. Each lookup table can be reprogrammed to perform an arbitrary ‘boolean function’. A boolean function is a function that takes several binary inputs and produces a single binary output (1 or 0). Most modern FPGAs use 6-input lookup tables (LUT6). Older FPGAs use 4-input lookup tables.
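A 6-input lookup table can be pictured as 64 stored bits, one for each possible combination of the six inputs. The sketch below models that idea in Python. The class name `Lut6`, its methods, and the choice of treating the first input as the lowest address bit are assumptions made for illustration; they are not taken from any vendor tool.

```python
class Lut6:
    """A 6-input lookup table: 64 stored bits, one per input combination."""

    def __init__(self, init):
        self.init = init & (2**64 - 1)  # the 64-bit truth table

    @classmethod
    def from_function(cls, func):
        """Fill the table by evaluating func on all 64 input combinations."""
        init = 0
        for address in range(64):
            bits = [(address >> i) & 1 for i in range(6)]
            if func(*bits):
                init |= 1 << address
        return cls(init)

    def read(self, *bits):
        """The six inputs form an address; the output is the bit stored there."""
        address = sum(bit << i for i, bit in enumerate(bits))
        return (self.init >> address) & 1


# The same kind of LUT 'programmed' as two different boolean functions.
and6 = Lut6.from_function(lambda a, b, c, d, e, f: a & b & c & d & e & f)
parity = Lut6.from_function(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f)

print(hex(and6.init))                 # 0x8000000000000000
print(and6.read(1, 1, 1, 1, 1, 1))    # 1
print(parity.read(1, 0, 1, 1, 0, 0))  # 1
```

Reprogramming the LUT means nothing more than storing a different 64-bit value, which is why one physical element can stand in for any 6-input gate.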

Each FPGA also contains millions of registers, which are 1-bit memory elements each storing the value 0 or 1. Unlike ordinary computer memory, FPGA registers are spread throughout the FPGA fabric and mixed in with the lookup tables, the clock signals, and other ‘routing’ resources that allow arbitrary ‘wires’ to connect all these things together in whatever way the programmer wants. Once properly configured, every processing element in the FPGA can work simultaneously and in parallel. This is in contrast to CPUs, which mostly perform operations sequentially. Graphics cards (GPUs) achieve more parallelism than CPUs, but still not as high a degree of parallelism as FPGAs.

Due to the vast difference between an FPGA and a classical CPU or GPU, FPGAs must be programmed in a special type of programming language. The two languages most commonly used are Verilog and VHDL. These are called ‘Hardware Description Languages’ (HDLs). Programming an FPGA in an HDL is typically referred to as ‘RTL programming’, where RTL stands for ‘register-transfer level’. This means that a programmer working at the RTL level can control the individual registers and logic inside the FPGA for maximum performance and customization. General purpose processors (CPUs, GPUs) are programmed in higher level languages, like C, C++, Java, and Python. Since ‘high level’ languages are far easier to learn and to use, many attempts have been made to create a system that allows programming of FPGAs using ‘high level’ languages. This started with a university project called Handel-C in the late 1990s. Today it has evolved into several software packages such as Vivado HLS (high-level synthesis), and FPGA support for a language called OpenCL. While these ‘high level’ FPGA languages do work for some applications like artificial intelligence, they do not work well enough for cryptocurrency mining. To mine cryptocurrency competitively, FPGAs must be programmed at the lowest possible level, which is the RTL (register-transfer level), using hardware description languages such as Verilog and VHDL.

While Verilog and VHDL are structurally very similar (essentially identical), their syntax varies dramatically. VHDL is typically used in universities, while ‘real world’ companies and programmers typically use Verilog. Verilog’s syntax resembles that of the C programming language, and a Verilog program often takes up less than half the text space that a similar program in VHDL would take. All of Zetheron’s mining software is programmed in Verilog.

Manufacturers of FPGAs

The FPGA was invented by a company called Xilinx, in 1985. Soon after, a competitor called Altera appeared with similar products. Through the 1990s and 2000s, Xilinx and Altera went head-to-head in the FPGA market. There were a few other companies such as Actel that offered low-end FPGAs, but Xilinx and Altera were on the cutting edge of the high end products. Then, in December 2015, Intel bought Altera, and the former Altera business is typically called ‘Intel-FPGA’.

While Xilinx and Intel-FPGA both produce high end products, the inner architecture of their FPGAs is very different. When programming at the lowest (RTL) level, a designer who wants to ‘port’ software from a Xilinx FPGA to an Intel FPGA faces a difficult and time consuming task. The same is true even within the product offerings of each company. Porting code from one Xilinx FPGA to another can be a huge task, because the code ‘expects’ the chip to have a certain exact number of registers and lookup tables, and a different FPGA (even in the same family) will have a different number of each, meaning the original software will not work on a different FPGA without major modifications. With this in mind, never expect to take any type of FPGA software and ‘run it’ on different or ‘random’ FPGA hardware that you might find available at a bargain price. You must purchase the exact hardware that the software was designed for.

FPGA Models

The highest end FPGAs from Intel are currently the Stratix-10 family. Xilinx FPGAs are grouped into families according to their silicon architecture:

Xilinx UltraScale+ family (Virtex UltraScale+, Kintex UltraScale+, Zynq UltraScale+: 16nm technology)

Xilinx UltraScale family (Virtex UltraScale, Kintex UltraScale: 20nm technology)

Xilinx 7 series (Virtex-7, Kintex-7, Artix-7: 28nm technology)

Performance Metrics

The silicon technology (16nm, 20nm, 28nm) affects the maximum clock speed of the FPGA and its power efficiency. FPGAs were originally designed for use in space, satellites, and communications, and those applications require a high level of reliability. As a result, FPGAs are designed with high internal safety margins. On the other hand, when mining cryptocurrency, it doesn’t matter if the hardware produces the occasional error. A single or occasional ‘bad hash’ or ‘bad share’ is not a problem. Because of this, it is possible to ‘overclock’ the FPGA well beyond what the manufacturer expects. Similarly, the FPGA design software will ‘report’ a predicted value for the maximum clock speed that the design can operate at, but that prediction is based on high-reliability applications. Because cryptocurrency mining can accept occasional errors, crypto software can push FPGAs far beyond their design specifications in terms of clock frequency, power and heat. The resulting power and heat ‘problems’ must be dealt with accordingly, and you can read more about that in the Performance and Hardware Modifications sections.

Each FPGA has a certain amount of internal RAM available, and if a program needs more memory than that, it must access slower, external memory. External memory is usually DDR3/DDR4 SDRAM, QDR SRAM, or RLDRAM. FPGAs perform best when their software can perform ALL required operations using only the internal memory. In that sense, the amount of internal memory inside the FPGA is a very important metric. The Xilinx Virtex UltraScale+ VU9P FPGA that Zetheron supports has 360Mb (360 Mega-BITS) of internal memory, which is equal to 360/8 = 45MB (Mega-BYTES). Be careful when reading FPGA datasheets, as they will almost always express memory in Mb (Megabits) rather than MB (Megabytes), and there is a factor of 8 difference between the two units.
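The datasheet conversion described above is a single division by 8. A small Python helper, with a function name chosen only for this example, makes the unit change explicit:

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes(megabits):
    """Convert a datasheet figure in Mb (megabits) to MB (megabytes)."""
    return megabits / BITS_PER_BYTE


print(megabits_to_megabytes(360))  # 45.0, the VU9P figure quoted above
```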

One of the cards Zetheron supports is the VCU1525 from Xilinx. The single VU9P FPGA on the card has 45MB of internal memory; in addition, the card has 64GB (4 x 16GB) of external DDR4-2400 ECC memory. Some algorithms such as Ethash require large (1GB+) amounts of memory. While a card like the VCU1525 can theoretically mine any algorithm, you will find that mining algorithms that require huge amounts of external memory will not bring in any better profits than mining the same coins with graphics cards. In some cases, due to the high cost of the FPGA cards, the return-on-investment (ROI) might be worse than with a graphics card on certain algorithms. Even if the FPGA is faster than the graphics card, the extra speed may not make up for the extra cost. Because of this, it is important to select coins to mine where FPGAs have the best ROI vs. other hardware on the market. A new type of FPGA that supports HBM (high bandwidth memory) is coming to market soon. These HBM-enabled FPGAs might be able to ‘beat’ graphics cards on memory intensive algorithms. However, the HBM FPGAs typically have less internal memory and logic and therefore might lose out on other algorithms.
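The ROI argument can be checked with a simple payback calculation. The sketch below is only an illustration: every price, revenue, and power figure in it is invented for the example and does not describe any real card or coin. The function name `payback_days` is likewise an assumption.

```python
def payback_days(hardware_cost, daily_revenue, power_watts, electricity_per_kwh):
    """Days until mining profit covers the hardware cost (None if it never does)."""
    daily_power_cost = power_watts / 1000 * 24 * electricity_per_kwh
    daily_profit = daily_revenue - daily_power_cost
    if daily_profit <= 0:
        return None
    return hardware_cost / daily_profit


# Hypothetical figures: the FPGA card earns four times as much per day,
# but costs eight times as much to buy.
fpga = payback_days(hardware_cost=4000, daily_revenue=6.0,
                    power_watts=200, electricity_per_kwh=0.10)
gpu = payback_days(hardware_cost=500, daily_revenue=1.5,
                   power_watts=150, electricity_per_kwh=0.10)

print(f"FPGA pays back in about {fpga:.0f} days")  # about 725 days
print(f"GPU pays back in about {gpu:.0f} days")    # about 439 days
```

With these made-up numbers the faster card is still the worse investment, which is the situation described above for memory-heavy algorithms.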

Another critical metric of performance is the number of LUTs (lookup tables) contained in the FPGA. For reference, the VU9P FPGA has 1,182,240 LUTs. The bigger VU13P FPGA has about 1,700,000 LUTs. While bigger might seem better, there are many reasons why this is not the case. High end FPGAs generate a huge amount of heat (200-400W for a VU9P), and the biggest (VU13P) FPGA is essentially ‘power limited’, meaning it may not even be possible to use all of its resources, as the power consumption of the single chip could reach 700W and it may become impossible to cool the chip even with liquid immersion cooling. Price is another factor. When mining cryptocurrency, it is the price-to-performance (ROI) that matters more than absolute speed. The hardware devices that Zetheron supports have been carefully selected as having among the best price-to-performance ratios, for maximum profit and fastest return on investment.

Bitstreams

Once a program has been written for an FPGA, the FPGA must be ‘loaded’ with that program. The program is actually just a ‘configuration’ of the various logic elements inside the FPGA. This configuration file is called a bitstream. So, to load the program into the FPGA, you must have the correct bitstream, and use a special program on a PC which loads the bitstream into the FPGA.

It is important to understand that an FPGA’s configuration is volatile. You can think of it like RAM: when you turn off the power, a computer loses the contents of its RAM, but the contents of the hard drive or flash disks remain. Because an FPGA holds its configuration in memory similar to RAM, the FPGA will lose its configuration if you cut its power supply. To get around this, most FPGA cards have a flash memory that sits right next to the FPGA. This flash memory can hold the configuration bitstream file, and the card can be set up so that upon power-up, it automatically configures itself by loading the bitstream from the neighboring flash memory.

If you start working with real FPGA cards, you will encounter the difference between a standard ‘bitstream’ (which is a volatile/RAM configuration), and a different kind of file called a ‘memory configuration file’. The memory configuration file is designed to be loaded, from the PC, through the FPGA, into the neighboring flash memory so the FPGA can auto-configure itself upon power-up. For mining, the main benefit of using memory configuration files (and loading them into the card’s flash configuration memory) is that if you operate your mining farm remotely, and there is a power outage, the FPGA will come back online as soon as power returns. On the other hand, if you are ‘only’ using a volatile configuration, and there is a power outage, you might have to use a remote access program (like TeamViewer) and manually reprogram the FPGA. So the non-volatile configuration is more robust against power outages.

Programming Time & Reconfiguration

As an example, the VU9P FPGA with about 1.2 million lookup tables takes about 40 seconds to program with a volatile bitstream file. There are ways to perform ‘rapid reconfiguration’ which allow the FPGA to completely reconfigure itself in a fraction of a second. This type of rapid reconfiguration requires loading the bitstream file into the FPGA’s external RAM (typically DDR4 memory). With many gigabytes of external memory, hundreds of different bitstream configuration files can be stored. In this fashion, during real world cryptocurrency mining, an FPGA can reconfigure itself in a fraction of a second based on circumstances that occur during the mining operations. Some algorithms like Timetravel10, X11Evo, X16R and X16S have hash function sequences that change every few minutes during mining. An FPGA can effectively mine those algorithms by using the rapid reconfiguration function. Since this type of rapid reconfiguration is not available to ASIC chips, it is quite possible that high end FPGAs could never be defeated by ASICs on this class of algorithms, although time will tell if this is true.
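The claim that hundreds of bitstreams fit in external memory is easy to sanity-check. The text does not state how large one bitstream is, so the 100MB figure below is purely an assumption for illustration, as is the function name; substitute the real file size for your device.

```python
def bitstreams_that_fit(external_memory_gb, bitstream_mb):
    """How many whole bitstream files fit in the given external memory."""
    external_memory_mb = external_memory_gb * 1024
    return external_memory_mb // bitstream_mb


# 64GB of DDR4 (as on the VCU1525) and an assumed 100MB per bitstream.
print(bitstreams_that_fit(64, 100))  # 655
```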

Mining Software Setup

There are many different cryptocurrencies, and they are mined using different algorithms. Generally speaking, an FPGA mines only one algorithm at a time (there are some exceptions where dual-mining is possible), so you load a configuration file into the FPGA that is specific to the ONE algorithm that you want to mine. For Zetheron’s software, you then run a PC application called FXMiner, which connects to the FPGA over a USB cable. The FXMiner PC software connects to the mining pool of your choice, and receives work jobs and new blocks over your PC’s internet connection. The PC then sends the work jobs to the FPGA to process, and the FPGA sends its solutions back to the PC. The PC then submits the ‘good shares’ to the mining pool. If you want to switch algorithms, you need to use a stand-alone utility to program a different bitstream into the FPGA. If you are using Xilinx FPGA cards, you would use a program called Vivado Lab Edition (which is free) to load the new bitstream into the FPGA. If you are using a Bittware FPGA card, Bittware provides its own custom utility to load a new bitstream, or you can still use Vivado Lab Edition if you want.
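The flow of work between pool, PC, and FPGA can be sketched as a small simulation. This is not FXMiner’s code or protocol: `FakePool`, `FakeFpga`, `share_value`, and `mine_once` are hypothetical names, SHA-256 stands in for whatever algorithm the bitstream implements, and the step where the PC re-checks each result before submitting is an assumption about how ‘good shares’ might be filtered.

```python
import hashlib


def share_value(header, nonce):
    """Hash the header with a nonce and return the digest as an integer."""
    digest = hashlib.sha256(header + nonce.to_bytes(4, "little")).digest()
    return int.from_bytes(digest, "big")


class FakePool:
    """Stand-in for a mining pool connection (hypothetical interface)."""

    def __init__(self):
        self.accepted = []

    def get_job(self):
        # A hash below the target counts as a good share.
        return {"header": b"example-block-header", "target": 1 << 248}

    def submit(self, nonce):
        self.accepted.append(nonce)


class FakeFpga:
    """Stand-in for the FPGA: searches a nonce range for solutions."""

    def search(self, job, start, count):
        return [nonce for nonce in range(start, start + count)
                if share_value(job["header"], nonce) < job["target"]]


def mine_once(pool, fpga, start=0, count=2000):
    job = pool.get_job()                          # PC receives work from the pool
    candidates = fpga.search(job, start, count)   # FPGA processes it and reports back
    for nonce in candidates:
        # Assumed step: the PC re-checks each result, so a bad hash is dropped.
        if share_value(job["header"], nonce) < job["target"]:
            pool.submit(nonce)                    # only good shares go to the pool
    return len(candidates)


pool = FakePool()
found = mine_once(pool, FakeFpga())
print(found, "candidates,", len(pool.accepted), "shares submitted")
```

In a real setup the search runs on the FPGA in parallel hardware and the results travel over the USB link; only the bookkeeping shown in `mine_once` would run on the PC.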