How It Works
- Log in to siwave.io
The compiler is hosted on a Tier-1 cloud service provider's network and uses the secure HTTPS protocol along with strong authentication and authorization for maximum security.
Sign up to get a no-obligation 30-day free trial account.
- Load Keras model
All builtin float and quantized TFLite operations are supported. You get instant feedback on the validity of the Keras model as soon as it is loaded.
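The kind of validity check described above can be sketched as follows. This is an illustrative simulation, not Siwave's actual implementation; the op names and the `SUPPORTED_OPS` set are assumptions for the example.

```python
# Hypothetical sketch of a model-load validity check: every operation in
# the model must be a builtin float or quantized TFLite operation.
# The supported-op list below is illustrative, not Siwave's actual list.
SUPPORTED_OPS = {
    "CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED",
    "MAX_POOL_2D", "AVERAGE_POOL_2D", "RELU", "SOFTMAX",
    "ADD", "QUANTIZE", "DEQUANTIZE",
}

def validate_model_ops(model_ops):
    """Return the list of unsupported ops (empty means the model is valid)."""
    return [op for op in model_ops if op not in SUPPORTED_OPS]

# A small CNN expressed as its TFLite op sequence:
ops = ["CONV_2D", "RELU", "MAX_POOL_2D", "FULLY_CONNECTED", "SOFTMAX"]
print(validate_model_ops(ops))  # [] -> model is valid
```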
- Set system parameters and select target platform
A super-intuitive user interface with a minimal set of options to get going. You don't need any FPGA/ASIC experience, or to read a user manual, to use the compiler: just load the Keras model, set the system parameters (frequency and latency), and select the target platform to generate an RTL inference in a couple of minutes. The inference can also be optimized for either frequency or area.
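The system parameters and optimization target described above could be captured as a small configuration, sketched here with a sanity check. All names and values are assumptions for illustration, not the compiler's actual API.

```python
# Hypothetical configuration mirroring the UI's inputs: clock frequency,
# latency budget, target platform, and optimization goal. Names and
# values are illustrative assumptions.
system_params = {
    "frequency_mhz": 500,    # target clock frequency
    "latency_ms": 10,        # maximum allowed inference latency
    "target": "FPGA",        # or "ASIC"
    "optimize_for": "area",  # or "frequency"
}

VALID_TARGETS = {"FPGA", "ASIC"}
VALID_GOALS = {"frequency", "area"}

def check_params(p):
    """Minimal sanity check over the four inputs."""
    return (p["frequency_mhz"] > 0 and p["latency_ms"] > 0
            and p["target"] in VALID_TARGETS
            and p["optimize_for"] in VALID_GOALS)

print(check_params(system_params))  # True
```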
The compiler generates platform-agnostic RTL that can target any FPGA vendor/device or an ASIC implementation.
- Download IP and integrate into top-level system
Deliverables include the encrypted SystemVerilog inference model, a SystemVerilog testbench, and a datasheet. A Quartus Platform Designer-ready IP component and a Vivado IP core are also provided to ease integration of the inference into a top-level system. We also offer an optional system design and integration service for building a complete FPGA or ASIC system.
The bus interface uses a simple ready/valid handshake protocol that can handle burst sizes of up to 4 GB.
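A ready/valid handshake transfers one word on each cycle in which the producer asserts valid and the consumer asserts ready; otherwise the producer holds the word. A minimal Python simulation of that behavior, purely illustrative and unrelated to Siwave's RTL:

```python
# Minimal model of a ready/valid handshake: a word is transferred only on
# cycles where both valid and ready are asserted; otherwise the producer
# holds the current word. Illustrative simulation, not Siwave's RTL.
def transfer(data, ready_pattern):
    """Simulate a burst. ready_pattern[i % len] is the consumer's ready
    signal on cycle i. Returns (received words, cycles elapsed)."""
    received, idx, cycle = [], 0, 0
    while idx < len(data):
        valid = True                               # producer always has data here
        ready = ready_pattern[cycle % len(ready_pattern)]
        if valid and ready:                        # handshake completes
            received.append(data[idx])
            idx += 1
        cycle += 1                                 # clock advances every cycle
    return received, cycle

burst = [0xA, 0xB, 0xC, 0xD]
# Consumer ready only every other cycle: 4 words take 7 cycles.
words, cycles = transfer(burst, [True, False])
print(words, cycles)
```

Back-pressure falls out naturally: when the consumer deasserts ready, the producer simply stalls, which is what makes the protocol safe for arbitrarily large bursts.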
What sets us apart from the competition
3X Smaller Inferences
The entire inference is implemented in platform-agnostic RTL, which means there is no runtime environment and no microcode to execute on a microprocessor. This translates to a considerably smaller inference, in addition to faster deployment and less maintenance effort.
2X Higher Performance
The entire inference is implemented in RTL, so there is no time-consuming context switching between ML hardware and a host processor to execute unsupported operations. Siwave's compiler optimizes each operation based on the system frequency and latency instead of slicing tensors to fit operations into a fixed matrix. This translates to a considerable speedup in inference execution time compared to our competitors at the same frequency or, alternatively, the option to run the inference at a higher frequency and thereby achieve a lower latency, as demonstrated in the table below.
| | LUT | LUTRAM | FF | BRAM | DSP | Max CLK | Max Ops/cycle | GOPS |
|---|---|---|---|---|---|---|---|---|
| AMD DPUCZDX8G (B512) | 8% | 2% | 6% | 8% | 4% | 325 MHz | 512 | 166.4 |
| SIWAVE CONVOLUTION RTL MODULE | 3% | 0% | 1% | 8% | 1% | 500 MHz | 512 | 256 |
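The GOPS column follows directly from the other two performance columns: peak throughput is the clock frequency multiplied by the operations executed per cycle. A quick check of the table's arithmetic:

```python
# Peak throughput in GOPS = Max CLK (MHz) * Max Ops/cycle / 1000.
def gops(clk_mhz, ops_per_cycle):
    return clk_mhz * ops_per_cycle / 1000

print(gops(325, 512))  # 166.4 (AMD DPUCZDX8G B512)
print(gops(500, 512))  # 256.0 (Siwave convolution RTL module)
```

At the same 512 ops/cycle, the higher achievable clock alone accounts for the roughly 1.5x throughput difference shown in the table.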