Homework 3
due Friday, March 6, 4:00 PM (extended from Wednesday, March 4, 4:00 PM)
(NO LATE HOMEWORK ACCEPTED)
submission using Blackboard
Problem
FAQ
Some common questions are answered here.
Part 1: Coding
In this assignment you will code a systolic FIR filter using Xilinx DSP48 blocks. Specifically, you will design the systolic FIR filter with a separate rounding stage shown in Figure 4-10 of the "XtremeDSP for Virtex-4 FPGAs User Guide" (UG073), linked here. For your convenience, the figure is also reproduced here.
The Matlab code you will use to verify your design is the same as in the previous homework. It still applies because a systolic FIR filter is essentially a pipelined version of a direct-form FIR filter: it produces the same output samples, only delayed by the pipeline latency.
fir_filter.m
print_vector.m
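To see why the direct-form Matlab model still applies, here is a small cycle-by-cycle Python sketch of a generic systolic FIR. The register arrangement (two input-delay registers, one multiplier register, and one partial-sum register per tap) is a hypothetical layout for illustration and is not necessarily identical to Figure 4-10. The point it shows is that the systolic output equals the direct-form output delayed by a fixed number of clock cycles.

```python
def direct_form_fir(h, xs):
    """Reference direct-form FIR: y[n] = sum_k h[k] * x[n-k], with x[n<0] = 0."""
    return [sum(h[k] * xs[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(xs))]


def systolic_fir(h, xs):
    """Cycle-accurate model of a generic systolic FIR (hypothetical register layout).

    Per tap k: two input-delay registers (x0, x1), a multiplier register (m),
    and a partial-sum register (p) fed by the previous tap's partial sum.
    All registers start at 0 (as after reset) and update on the same clock edge.
    """
    M = len(h)
    x0, x1, m, p = [0] * M, [0] * M, [0] * M, [0] * M
    out = []
    for x_in in xs:
        # Compute every register's next value from the current values...
        nx0 = [x_in] + x1[:-1]                   # input moves two registers per tap
        nx1 = x0[:]
        nm = [h[k] * x1[k] for k in range(M)]    # registered multiply
        np_ = [m[0]] + [m[k] + p[k - 1] for k in range(1, M)]  # cascaded adds
        # ...then clock them all at once.
        x0, x1, m, p = nx0, nx1, nm, np_
        out.append(p[-1])                        # filter output = last partial sum
    return out


h = [1, 2, 3]
xs = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]              # impulse input
print(direct_form_fir(h, xs))  # [1, 2, 3, 0, 0, 0, 0, 0, 0, 0]
print(systolic_fir(h, xs))     # [0, 0, 0, 0, 0, 1, 2, 3, 0, 0]
```

In this particular layout the latency works out to M + 2 cycles (5 for the 3-tap example). Your design's latency depends on the register stages you actually use, including the separate rounding stage, so derive it from your own architecture.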
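In this homework you will make one small change to fir_filter.m: replace floor(x + 0.5) with round(x) so that the quantizer performs Xilinx symmetric rounding. Matlab's round() rounds ties away from zero (round(-2.5) = -3), whereas floor(x + 0.5) always rounds ties toward positive infinity (floor(-2.5 + 0.5) = -2), which biases the result.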
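The difference between the two rounding rules only shows up on exact ties. The Python sketch below mimics both Matlab expressions. Note that Python's built-in round() uses round-half-to-even, which matches neither, so the symmetric version is written out explicitly.

```python
import math


def floor_half_up(x):
    """Mimics Matlab floor(x + 0.5): ties always go toward +infinity (biased)."""
    return math.floor(x + 0.5)


def round_half_away(x):
    """Mimics Matlab round(x), i.e. symmetric rounding: ties go away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


for v in [2.5, 1.5, -1.5, -2.5, -0.4]:
    print(v, floor_half_up(v), round_half_away(v))
```

Positive ties agree (2.5 gives 3 either way), but negative ties differ: -1.5 gives -1 versus -2, and -2.5 gives -2 versus -3.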
You will develop a single VHDL model which can implement the following two variants of systolic FIR filters:
homework3_short: M = 12 coefficients, N1=N2=N3=12 total bits, L1=L2=L3=11 fractional bits, quant_type = 1 (symmetric rounding), many_quant = 0 (i.e. one quantizer after the final adder). Change floor(x + 0.5) to round(x) so that the quantizer does symmetric rounding rather than biased round-to-nearest.
homework3_long: M = 24 coefficients, N1=N2=N3=16 total bits, L1=L2=L3=15 fractional bits, quant_type = 1 (symmetric rounding), many_quant = 0 (i.e. one quantizer after the final adder). Change floor(x + 0.5) to round(x) so that the quantizer does symmetric rounding rather than biased round-to-nearest.
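With a single quantizer after the final adder, the hardware rounds the full-precision sum. That sum carries 2L fractional bits, because each product of two SN.L values has 2L fractional bits. The Python sketch below shows one way to apply symmetric rounding while dropping L fractional bits using only an add and an arithmetic right shift. It adds 2^(L-1) for non-negative values and 2^(L-1) - 1 for negative values, then shifts. This sign-dependent-offset trick is a common way to build a symmetric rounder, but it is an assumption here, not necessarily the exact datapath of Figure 4-10. Saturation or overflow of the N-bit result is not modeled.

```python
def symmetric_round_shift(acc, L):
    """Round integer `acc` (2L fractional bits) to L fractional bits,
    ties away from zero, using only an add and an arithmetic shift."""
    half = 1 << (L - 1)
    # Negative values get one less than half; equivalently, carry-in = (acc >= 0).
    offset = half if acc >= 0 else half - 1
    return (acc + offset) >> L  # Python's >> on ints is an arithmetic (floor) shift


def reference_round(acc, L):
    """Same result computed on the magnitude, for checking."""
    mag = (abs(acc) + (1 << (L - 1))) >> L
    return mag if acc >= 0 else -mag


# -1.5 and +1.5 output LSBs with L = 11 (acc = -/+3 * 2**10) round to -2 and +2.
print(symmetric_round_shift(-3 * 2**10, 11), symmetric_round_shift(3 * 2**10, 11))
```

Because the only sign-dependent part is the offset, the hardware cost is a constant add plus a carry-in derived from the sign bit.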
Develop the complete VHDL code, including library declarations, entity declaration, and architecture, for a systolic FIR filter using DSP48s. The only difference between the short and long filters should be the generic values that you select.
Use the following entity declaration. N, L, and M are generics. Your code should work for ANY integer generic values of N and L (assuming N > L and L > 1). Since M determines which coefficients are used, your circuit must work for both M = 12 and M = 24. Use std_logic or std_logic_vector for all entity input and output ports, not integer, signed, or unsigned. Use "in" or "out" port modes; do not use "buffer". It is recommended that you use the std_logic_signed package. Entity declaration:
library ieee;
use ieee.std_logic_1164.all;

entity homework3 is
  generic (
    N: integer := 12;  -- N = total number of bits for N1, N2, N3 (assuming N1=N2=N3)
    L: integer := 11;  -- L = number of fractional bits for L1, L2, L3 (assuming L1=L2=L3)
    M: integer := 12); -- M = number of filter coefficients
  port (
    clk: in std_logic;
    rst: in std_logic;                       -- active-high reset to comply with DSP48 reset
    x: in std_logic_vector(N-1 downto 0);    -- SN.L number
    y: out std_logic_vector(N-1 downto 0));  -- SN.L number
end homework3;
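The entity is meant to work for any N and L, but the DSP48 imposes physical limits. On Virtex-4, each DSP48 has an 18 x 18 signed multiplier and a 48-bit P output. A one-DSP48-per-tap mapping therefore only works when N <= 18 and the full-precision sum fits in 48 bits. The Python sketch below uses a conservative width bound of 2N bits per product plus ceil(log2(M)) guard bits for the M-term sum. The one-DSP48-per-tap mapping is an assumption about your architecture.

```python
import math

DSP48_MULT_WIDTH = 18  # Virtex-4 DSP48: 18 x 18 two's-complement multiplier
DSP48_P_WIDTH = 48     # Virtex-4 DSP48: 48-bit P output (cascade/accumulator)


def full_precision_width(N, M):
    """Conservative bits needed for a sum of M products of two signed N-bit values:
    2N bits per product plus ceil(log2(M)) guard bits for the additions."""
    return 2 * N + math.ceil(math.log2(M))


def fits_one_dsp48_per_tap(N, M):
    """True if N fits the multiplier inputs and the full sum fits the P output."""
    return N <= DSP48_MULT_WIDTH and full_precision_width(N, M) <= DSP48_P_WIDTH


for name, N, M in [("homework3_short", 12, 12), ("homework3_long", 16, 24)]:
    print(name, full_precision_width(N, M), fits_one_dsp48_per_tap(N, M))
```

Both assignment variants fit comfortably (28 and 37 bits). A generic N above 18 would need more than one DSP48 per product.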
Coefficients
The coefficients can be stored internally as constants, similar to how Xilinx does this in the DSP48 systolic filter example described below. In other words, run your Matlab code to obtain the coefficients, then put them in your code as an array of std_logic_vector (do not use integer, as the Xilinx example does). You can put them in a package, declare them as a constant array at the top level, etc. You can then use the generic M to select which coefficient array to use (i.e. choose one set when M=12 and the other when M=24). You can be flexible in how you store the coefficients inside your design.
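Converting the Matlab coefficients into std_logic_vector literals by hand is error-prone, so a small script can generate them. The Python sketch below assumes you have already read the coefficient values from h.dat into a list of real numbers that are exact multiples of 2^-L. The exact format of h.dat is not specified here; if it already holds scaled integers, only the masking step applies. The array type name `coeff_array` is hypothetical; you would declare a type with N-bit elements yourself.

```python
def to_slv_bits(value, N, L):
    """N-bit two's-complement bit string for an SN.L value such as a coefficient."""
    scaled = value * 2 ** L
    q = int(round(scaled))
    if abs(scaled - q) > 1e-9:
        raise ValueError(f"{value} is not a multiple of 2**-{L}")
    if not -(1 << (N - 1)) <= q < (1 << (N - 1)):
        raise ValueError(f"{value} does not fit in S{N}.{L}")
    return format(q & ((1 << N) - 1), f"0{N}b")  # mask gives two's complement


def vhdl_constant(name, coeffs, N, L):
    """Text of a VHDL constant array of std_logic_vector literals.
    `coeff_array` is a hypothetical type you would declare in your own code."""
    lines = [f'  "{to_slv_bits(c, N, L)}"' for c in coeffs]
    return f"constant {name} : coeff_array := (\n" + ",\n".join(lines) + "\n);"


print(vhdl_constant("H_EXAMPLE", [0.5, -0.5, -2**-11], 12, 11))
```

For S12.11, 0.5 becomes "010000000000", -0.5 becomes "110000000000", and -2^-11 becomes "111111111111".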
Xilinx DSP48 Systolic Filter Example
Read pages 82-86 of the "XtremeDSP for Virtex-4 FPGAs User Guide" here. Download the example code from the link on page 86, which is also linked here. You will first need to create a Xilinx account to access this example.
Look at the example of the systolic filter given in these files. You can model your systolic filter after the Xilinx example; note that this is just a general model and is different from our homework, as we want to do different rounding mechanisms, wordlengths, etc. VERY IMPORTANT: For the systolic filter, you should be reading the files from ug073_c04\vhdl\Transpose not from Systolic. Xilinx mixed up the two VHDL directories--the "systolic" directory holds the transpose filter files, and the "transpose" directory holds the systolic filter files!
Part 2: Testbench
Create test vectors in Matlab by changing the Matlab script to the appropriate parameters and producing vectors d_in.dat, d_out.dat, and h.dat (you will use h.dat to create internal coefficient arrays as discussed previously). Based on the script, d_in.dat and d_out.dat should both be 100 samples each.
Your testbench should reset the circuit and then run through all inputs in the d_in.dat file. The inputs should change at the negative edge of the clock.
As the simulation runs, wait the appropriate number of clock cycles before comparing outputs (recall that a systolic FIR filter increases the latency of the system). Then compare the first VHDL output with the first Matlab output in d_out.dat. If they match exactly, write "match", the expected value, and the computed value, followed by a line break, to a file called verify.dat. If they do not match exactly, write "error", the expected value, and the computed value, followed by a line break, to verify.dat. Do this for all 100 output samples. If your VHDL and Matlab results match, verify.dat should be a 100-line file with the word "match" on every line. Help with VHDL file I/O can be found here or on slides 67-81 of this file.
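The key detail in the comparison is offsetting the DUT output by the filter latency. The Python sketch below models only that logic; your VHDL testbench would do the same with the std.textio package. The `latency` value, the space-separated line format, and representing samples as integers are all assumptions for illustration.

```python
def verify_lines(expected, computed, latency):
    """Build verify.dat lines: expected[i] is compared with the DUT output
    captured `latency` clock cycles later, i.e. computed[i + latency]."""
    lines = []
    for i, exp in enumerate(expected):
        got = computed[i + latency]
        status = "match" if got == exp else "error"
        lines.append(f"{status} {exp} {got}")
    return lines


expected = [5, -3, 7]         # e.g. values read from d_out.dat
computed = [0, 0, 5, -3, 8]   # DUT outputs, one per clock cycle
print("\n".join(verify_lines(expected, computed, latency=2)))
```

If the latency is off by one cycle, nearly every line turns into "error", which is a quick way to spot a miscounted pipeline stage.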
Save the waveforms from simulation (.awf format for Active HDL or .wlf format for ModelSim) and submit all of them. Perform both functional simulation and post-synthesis (or post-translate) simulation. Post-place-and-route simulation is not required.
Part 3: FPGA Synthesis
For each of the two designs, synthesize the design for the smallest Virtex-4 device it fits in. Do not run implementation.
- Perform synthesis for Virtex 4.
- Perform post-synthesis simulation if using Aldec (post-translate simulation if using Xilinx/ModelSim) at some reasonable clock frequency (e.g. 10 MHz).
What to Turn In
Submit the following files using Blackboard. DO NOT submit your entire Aldec or Xilinx project with all its files, directories, subdirectories, temp files, etc.; only submit what is requested below. Submit 3 files:
Submit a zip file called homework3_short.zip. This zip file should contain:
- All VHDL source code, including the VHDL testbench, all Matlab test files, and verify.dat
- Synthesis report
- Waveforms from simulation before synthesis (in .awf format for Active HDL or .wlf format for ModelSim)
- Waveforms from simulation post-synthesis (in .awf format for Active HDL) or post-translate (in .wlf format for ModelSim)
Submit a zip file called homework3_long.zip. This zip file should contain:
- All VHDL source code, including the VHDL testbench, all Matlab test files, and verify.dat. Even if your source code and testbench are exactly the same as before, include them in this zip file again.
- Synthesis report
- Waveforms from simulation before synthesis (in .awf format for Active HDL or .wlf format for ModelSim)
- Waveforms from simulation post-synthesis (in .awf format for Active HDL) or post-translate (in .wlf format for ModelSim)
Submit a short report called yourname_homework3_report.txt (or .doc or .pdf). The report should have the exact numbering below:
- 1) Report whether your post-synthesis/translate results matched Matlab results for homework3_short and homework3_long
- 2) What is the measured SQNR (the Matlab variable "sqnr_direct") of homework3_short and homework3_long, based on Matlab? Does this meet your expectations (i.e. should onequant or manyquant be expected to have the higher SQNR)?
- 3) Write results after FPGA synthesis:
- Results after synthesis for homework3_short: #slices, #flip-flops, #LUTs, #block RAMs, #DSP48s, maximum clock frequency, minimum period. If your synthesis tool only gives slices instead of LUTs (or vice-versa) after synthesis, just note this.
- Results after synthesis for homework3_long: #slices, #flip-flops, #LUTs, #block RAMs, #DSP48s, maximum clock frequency, minimum period. If your synthesis tool only gives slices instead of LUTs (or vice-versa) after synthesis, just note this.
- 4) Which of the two filters is faster? Does this meet your expectations and why?
- 5) Which of the two filters is larger? Does this meet your expectations and why?
- 6) Compare your results of homework3_short and homework2_onequant. Which of the two filters is faster? Does this meet your expectations and why?
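For question 2, SQNR is conventionally the power of the unquantized (reference) output divided by the power of the quantization error, expressed in dB. The Python sketch below uses that textbook definition. How fir_filter.m actually computes sqnr_direct may differ in detail (for example, which signals it compares), so report the Matlab value and use this only to reason about the result.

```python
import math


def sqnr_db(reference, quantized):
    """SQNR in dB: signal power over quantization-error power."""
    signal = sum(r * r for r in reference)
    noise = sum((r - q) ** 2 for r, q in zip(reference, quantized))
    return 10 * math.log10(signal / noise)


print(round(sqnr_db([1.0, -1.0], [0.9, -1.1]), 6))  # error 0.1 on unit signal -> 20 dB
```

Every extra independent rounding point adds error power to the denominator. This is the reason to compare a single final quantizer against many intermediate quantizers.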