Digital filters are always an interesting topic, and they are especially attractive with FPGAs. [Pabolo] has been working with them in a series of blog posts. The latest covers an 8th order FIR filter in Verilog. He covers some math, which you can find in many places, but he also shows how an implementation maps to DSP slices in a device. Then to reduce the number of slices, he illustrates folding which trades delay time for slice usage.
Folding takes a multi-stage parallel multiplication and breaks it into fewer multiplications done over a longer period of time. This reuses slices to reduce the number required for high-order filters.