Digital filters are always an interesting topic, and they are especially attractive with FPGAs. [Pabolo] has been working with them in a series of blog posts. The latest covers an 8th order FIR filter in Verilog. He covers some math, which you can find in many places, but he also shows how an implementation maps to DSP slices in a device. Then, to reduce the number of slices, he illustrates folding, which trades latency for slice usage.
Folding takes a multi-stage parallel multiplication and breaks it into fewer multiplications done over a longer period of time. This reuses slices to reduce the number required for high-order filters.
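To make the trade concrete, here is a minimal Python sketch of the idea. It is not [Pabolo]'s Verilog, just a behavioral model under some assumptions: the function names, the `num_multipliers` parameter, and the cycle counting are all made up for illustration, and a "cycle" here simply means one pass in which at most `num_multipliers` products are computed.

```python
def fir_parallel(samples, coeffs):
    """Fully parallel FIR: models one multiplier per tap, one output per sample."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]  # shift the new sample into the delay line
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


def fir_folded(samples, coeffs, num_multipliers=1):
    """Folded FIR: same math, but only num_multipliers products per cycle.

    Returns the outputs and the total number of cycles spent (hypothetical
    bookkeeping, used only to show the time cost of reusing multipliers).
    """
    delay = [0] * len(coeffs)
    out = []
    cycles = 0
    for x in samples:
        delay = [x] + delay[:-1]
        acc = 0
        # Walk the taps in groups; each group reuses the same multipliers.
        for start in range(0, len(coeffs), num_multipliers):
            stop = min(start + num_multipliers, len(coeffs))
            for k in range(start, stop):
                acc += coeffs[k] * delay[k]
            cycles += 1
        out.append(acc)
    return out, cycles


if __name__ == "__main__":
    taps = [1, 2, 3, 4, 5, 4, 3, 2, 1]  # 8th order means 9 taps (example values)
    data = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # unit impulse
    folded, cycles = fir_folded(data, taps, num_multipliers=1)
    print(fir_parallel(data, taps) == folded, cycles)
```

Both versions produce the same output. The folded one just needs more cycles per sample in exchange for fewer multipliers, which is the same bargain folding makes with DSP slices.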
By the end, you can see three different implementations of the same filter and compare how each one uses resources, power, and time. The code is all available on GitHub. The posts focus mostly on Xilinx, although there is also discussion of other DSP block styles.
Mathematically, an FIR filter has no feedback, so its only poles sit at the origin, which means it is always stable. However, compared to IIR filters, which feed previous outputs back into the calculation, FIR filters require higher orders to get similar performance.
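The difference is easy to see by feeding both kinds of filter a single impulse. The following is a rough sketch rather than anything from the posts: `difference_equation` is a hypothetical helper, and the coefficients are arbitrary example values (a 4-tap moving average and a one-pole lowpass).

```python
def difference_equation(b, a, x):
    """Direct-form filter: y[n] = sum(b[k]*x[n-k]) - sum(a[k]*y[n-k]), with a[0] == 1.

    With a == [1.0] there is no feedback term, which makes the filter FIR.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 11

    # FIR: 4-tap moving average, no feedback coefficients.
    fir = difference_equation([0.25, 0.25, 0.25, 0.25], [1.0], impulse)

    # IIR: y[n] = 0.5*x[n] + 0.5*y[n-1], a single feedback term.
    iir = difference_equation([0.5], [1.0, -0.5], impulse)

    print("FIR:", fir)
    print("IIR:", iir)
```

The FIR response is exactly zero once the impulse has passed through all four taps, so it cannot run away. The IIR response keeps decaying without ever reaching zero, and with a feedback coefficient of magnitude one or more it would not decay at all. That feedback is also why an IIR filter can get a sharp response out of very few coefficients.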