Wednesday, 6 April 2011

Google Summer of Code 2011

As a reminder, Google's Summer of Code 2011 program has started and PostgreSQL is participating!  We are looking for students who are interested in hacking on pgAdmin or PostgreSQL all summer.  You only have until April 8, 2011 at midnight Pacific time to submit a proposal.

If you are a student, or know a student, or are a professor, please urge your students to participate in Summer Of Code.  Not only is it a fantastic opportunity to learn how open source code is really made, and get paid to do it, but it can also be career-building.

Read up on the PostgreSQL GSOC wiki page.

Friday, 1 April 2011

Tinkering with SQL/MED

One of the cool new features in the upcoming PostgreSQL 9.1 release is support for SQL/MED, or Management of External Data. Essentially what this allows us to do is connect external data sources to PostgreSQL and query them as if they were tables. The initial support in 9.1 will offer read-only support for foreign tables, with the ability to include qualifier pushdown - i.e, the ability to push the applicable WHERE part of the SQL query to the remote server, to minimise data transfer. This optimisation is dependent on each individual Foreign Data Wrapper (FDW) - the driver used to connect to the remote data source - which may or may not implement it.  

PostgreSQL 9.1 will ship with an FDW that allows access to files in formats that can be read by the existing COPY command, and Andrew Dunstan has been working on an FDW that will allow access to files with fixed-width data as well as one for accessing ragged CSV files.

I've been a big proponent of SQL/MED support, particularly at work, so was pleased to finally find some time to look at the work that's been done for 9.1 earlier in the week. I don't have much interest in reading external files (though clearly that is an important feature for some users) - I'm far more keen to see PostgreSQL have the ability to access the data in all those other databases you may not be able to migrate away from, or are better suited to particular workloads. So, with that in mind, I picked a NoSQL database more or less at random (though intentionally one that's a simple key/value store) and started hacking on an FDW for Redis.

The code is still pre-alpha quality, and there are a number of TODOs, including:
  • Add qual pushdown
  • Handle different Redis datatypes correctly
  • Figure out how to make Redis scans atomic (or at the least properly handle tuples that have been removed since the initial scan)
  • Handle Redis authentication
However it seems to work well (bar the known issues of course), and is looking like it will be simple to use as the basis for more complex FDWs which may well keep me amused for a while longer.

As always with Open Source, feel free to play with this if you think you might find it useful, but don't blame me if it breaks (which it likely will)!