Implementing A Kalman Filter In PostgreSQL To Smooth GPS Data

Usually databases are treated primarily as fairly dumb data storage systems, but they can be capable of much more. Case in point the PostgreSQL database and its – Ada-based – PL/pgSQL programming language, which allows you to perform significantly more complex operations than would be realistically possible with raw SQL. Case in point the implementation of a Kalman Filter by the folk over at Traconiq, which thus removes the necessity for an external filtering pipeline.

Using a Kalman Filter is highly desirable when you’re doing something like vehicle tracking using both dead-reckoning and GPS coordinates, as it filters out noise that can be the result of e.g. GPS reception issues. As noted in the article, transferring state from one row to the next requires a bit of lateral thinking, but is doable with some creative SQL usage. As PL/pgSQL is very similar to Oracle’s PL/SQL, this same code should work there too without too much porting required.

The code for the different implementations and associated benchmarks can be found on GitHub, though the benchmark results make it abundantly clear that the most efficient approach is to run an offline aggregate processing routine. This coincides with the other batch processing tasks that are typically performed by a database server to e.g. optimize storage, so this isn’t entirely unsurprising.

7 thoughts on “Implementing A Kalman Filter In PostgreSQL To Smooth GPS Data

  1. It would seem to me that one wants to offload SQL servers as much as possible. Not only for performance, but because it’s usually the most expensive compute-time short of GPU farms. Hence one’s wallet would suggest to do this kind of data massaging somewhere else.

    Anyone have thoughts on this?

    1. Depends

      first – “it’s usually the most expensive compute-time short of GPU farms” is not universal truth – depends on your system architecture
      second – again depending on multiple factors, it may be better to not pull the whole dataset from the server over network first, than process and than update back, but instead do it in the database directly

      for large datasets, updates using simple “update” command row by row (when the values are calculated externally) is no go (too slow) so anyway you need to do something like COPY FROM STDIN into temp table and than updating from that – may as well be easier to do the update in place with custom function in the database itself

    2. Since the filter depends not only on the input, but also related (previous) values, there might be something to say for implementing it on the server. It avoids having to fetch the previous data to a compute node, calculating the filter and writing the new data to the server.
      But still I agree. Unless there is a Lot of data and latency is an issue, I don’t see the added value.

      Still: It’s a pretty funny hack. :) Kudos for at least that.

    3. So preoccupied with whether they could that they never stopped to ask whether they should :)

      i take this sort of thing as a kind of demo of the capabilities of postgresql. Honestly, i don’t like it, for the reasons you stated and also because i view it as a kind of intrinsically harmful “embrace and extend.” i want an SQL database to be an SQL database, not to have some novel language tacked onto it which does noting but promote vendor lock-in and (usually) bad architectural choices. But a lot of projects are neat just for showing what’s possible, in case it might trigger a thought in the reader about where this capacity might be genuinely useful.

    4. If this was easy, liberal arts majors would make the big bucks right of out school.

      Beware simple answers to complicated questions.

      There are reasons to do the work on the SQL server, mostly about not moving massive data back and forth.
      There are reasons not to, everything hits SQL, you don’t want to dim those lights.

      Even situations where you customize network topography for big jobs.
      Dedicated network links between SQL, OLAP and web servers.

      Every real SQL server has it’s own server side language.
      They’re as much fun to debug as JS in a browser.

      If the option is available, best to do the work in another language, only invoke the library method in TSQL/PLSQL/pgSQL.

      Not only because *SQL sucks as a programming language but because DBAs are GD control freaks.

      I’ve seen one try to tell me that I couldn’t modify a database structure on a local dev database without her ‘blessing’ first.
      She was of the ‘backup monkey’ variety of Oracle DBAs (read overpaid and dumb), couldn’t code a stored procedure to save her life.
      But she had excellent self esteem.

Leave a Reply to mathmanCancel reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.