Wednesday 21 December 2011

Updated PostgreSQL Download Infrastructure

Back at the tail end of November, the PostgreSQL sysadmin/web team successfully migrated the PostgreSQL website to a new platform, based on PostgreSQL, Django, Lighttpd and Varnish, replacing the old, somewhat complex and messy PHP platform used in the past. Functionally and visually the website is almost identical to what it was, though the behind-the-scenes management interface is vastly improved, as is the "Your Account" section, which now offers users much more control over their submissions, such as news and events.

One other change that went largely unnoticed, however, was in the downloads section of the website, specifically the FTP area. This is a web interface over the content on the PostgreSQL FTP site, ftp.postgresql.org, that gives users a convenient way to browse the files and directories on the site. On the old website, when the user clicked to download a file they were taken to a page of flags where they could select a mirror site to download the file from. The mirrors were all third-party servers over which we had no control, aside from an automated system to ensure we only listed those whose content was no more than 48 hours out of date. This arrangement made sense years ago when bandwidth was more precious; with the bandwidth available to us these days, though, it's just clunky and inconvenient for users to have to choose one of 75 flags to reach a server that may not be entirely up to date.

To resolve this, as the new website infrastructure went live we also pushed a new download infrastructure into production. The website now links directly to files on download servers we run ourselves. This gives us a number of advantages:

  • The user has a much slicker experience when downloading, both in terms of the workflow, and often the speed of downloads (because some of the old mirrors were much faster than others).
  • We can push out files to the download servers in minutes, rather than days.
  • We can collect meaningful statistics to help us understand what users are downloading.
  • We can automatically (and invisibly) disable download servers in the event of problems, within minutes.
So what does this mean for the end user? Well, for a user coming from the website, downloads take fewer clicks and avoid the "flags" page, which could be daunting and potentially confusing. For those using the FTP site directly, there are a number of changes:
  • The site can be accessed at ftp://ftp.postgresql.org/, rather than using one of the individual mirror hostnames we used in the past.
  • Because there are no third party mirrors, there are no inconsistent paths to the content (one mirror in the past may have had content under /pub/mirrors/postgresql/ whilst another may have used /u/postgres/).
  • We can now offer downloads over HTTP using the same paths as FTP - you can use http://ftp.postgresql.org/pub/... or ftp://ftp.postgresql.org/pub/...
  • We've also opened up rsync access to all users, where previously only registered mirror sites could rsync the downloads from us: rsync://ftp.postgresql.org/pgsql-ftp (or, in rsync's alternative syntax, ftp.postgresql.org::pgsql-ftp).
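As a sketch of that equivalence: the same relative path can be turned into all three download locations. This is plain illustrative Python, not anything we ship; the tarball path is hypothetical, and the assumption that the pgsql-ftp rsync module maps onto the /pub/ tree is mine.

```python
# Build the three equivalent download locations for one relative path.
# Hypothetical example path; the rsync module layout is an assumption.

def endpoints(path):
    """Return the HTTP, FTP and rsync locations for a file under /pub/."""
    path = path.lstrip("/")
    return {
        "http": "http://ftp.postgresql.org/pub/%s" % path,
        "ftp": "ftp://ftp.postgresql.org/pub/%s" % path,
        # rsync addresses the pgsql-ftp module rather than the /pub/ prefix
        "rsync": "rsync://ftp.postgresql.org/pgsql-ftp/%s" % path,
    }

for proto, url in sorted(endpoints("source/v9.1.1/postgresql-9.1.1.tar.gz").items()):
    print("%s: %s" % (proto, url))
```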
Because there are lots of links on the web to files on the old mirror network, we've left the old mirror hostnames in place for the time being (though they are no longer being monitored); they will be phased out over time.

Happy downloading!

Tuesday 11 October 2011

To upgrade or not to upgrade? That is the question.

On a fairly regular basis the Postgres community hears from users complaining of bugs in old versions of Postgres. They'll post a bug report or a request for help on one of the mailing lists saying, for example, that they've run into an issue with PostgreSQL 8.4.2, and immediately be met with suggestions to upgrade to the latest version because there have been 37 bug fixes and 5 security issues resolved since 8.4.2 was released.

Generally this happens with one of two types of users: those that just don't bother to upgrade (about whom we can do little), and those that don't upgrade because they are concerned that changes to PostgreSQL will break their application. This latter class of user is sometimes also restricted in what they can do by corporate policies in their workplace.

The Postgres developers are mindful of this issue and have practices in place to allow users to upgrade safely, without significant risk of behavioural changes breaking their applications. The practice is really quite simple: minor upgrades of PostgreSQL never include new features.

What does this mean in practice? Well, PostgreSQL version numbers are in 3 parts, X.Y.Z (some packages such as the EnterpriseDB installers also add a build suffix):

X.Y: This is the major version number of the server, for example, 8.4, 9.0 or 9.1. New major versions may include new functionality, require upgrades to the database files on disk and generally require thorough testing of applications prior to deployment.

Z: This is the minor version number. This number is increased for bug fix releases, also known as "point releases". These releases never include new features, only carefully applied bug fixes. Point releases are fully compatible with previous point releases of the same major version and should require minimal testing prior to deployment on existing installations.

Returning to our opening example, the user in this case is not being told to upgrade to 9.1.1 - the latest and greatest release at the time of writing, complete with a myriad of new features and changes from 8.4.2 - but to 8.4.9, the latest point release in the 8.4 series, which is functionally identical to 8.4.2. This numbering scheme and the processes behind it are specifically designed to allow users to safely and easily upgrade their database servers to minimise the number of known bugs in the software; in fact the PostgreSQL developers consider not upgrading to the latest point release to be riskier than upgrading.
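The rule above is easy to encode. Here's a minimal sketch, in plain Python, of the check a DBA might apply to the X.Y.Z scheme described above before deciding how much testing an upgrade needs; the function name is mine, not anything from the PostgreSQL tools.

```python
# Sketch of the X.Y.Z rule: an upgrade is a "point release" upgrade
# (no new features, minimal testing) when the major version X.Y matches.

def parse(version):
    """Split 'X.Y.Z' into ((X, Y), Z): major version and point release."""
    x, y, z = (int(p) for p in version.split("."))
    return (x, y), z

def is_point_release_upgrade(current, target):
    """True when target is a newer point release of the same major version."""
    cur_major, cur_point = parse(current)
    tgt_major, tgt_point = parse(target)
    return cur_major == tgt_major and tgt_point > cur_point

print(is_point_release_upgrade("8.4.2", "8.4.9"))  # True: same 8.4 series
print(is_point_release_upgrade("8.4.2", "9.1.1"))  # False: a major upgrade
```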

So next time you're "certifying" your application with PostgreSQL, aim to certify it against a specific major version of the server, and avoid getting into a situation where you prevent yourself from updating to the latest point release - doing so can cause more problems than it solves. And if you're a sysadmin or DBA, rest assured that point releases won't introduce functionality changes; they should be welcomed and installed as soon as possible!

Wednesday 5 October 2011

StackBuilder Package Updates


If you've ever used one of the PostgreSQL installers for v8.2 or above, either the old Windows MSI installer or the newer "one click" installers that also support Linux and Mac, you'll probably have come across StackBuilder. For those that haven't, or never found the time: StackBuilder was introduced with the PostgreSQL 8.2 installer to allow us to distribute different components of PostgreSQL independently of the server itself. Originally, in 8.0/8.1, the installer included lots of additional products, such as the ODBC, JDBC, OLEDB and .NET drivers, Slony and PostGIS. As you can imagine, this proved near impossible to maintain, as we needed to coordinate the release of products from multiple independently run projects.

StackBuilder was the solution to this. The installers were cut down to essentially include just the PostgreSQL server, pgAdmin and StackBuilder, which allowed us to provide all the other components on independent release schedules. It also gave us a vehicle to encourage adoption of PostgreSQL by other Open Source projects, by including packages for them as well, so with a few mouse clicks a user could be up and running with a "stack" like Drupal, Apache, PHP and PostgreSQL on their OS of choice. We made a conscious decision to include software other than well known Open Source products in the StackBuilder catalog as well, with the aim of giving the user as much choice of product to use with PostgreSQL as possible - as a result we have products in the catalog from multiple vendors and projects, as well as the "PostgreSQL family" projects. If you're interested in having a PostgreSQL related product included in the catalog, please contact me to discuss how we can make that happen.

So, with the refresher course out of the way, today's blog post is prompted by the latest round of updates that I've been adding to the catalog - we've got 42 updates to the Open Source packages, as well as 18 new packages, including, for the first time, pgBouncer, pgMemcache and Drupal 7.

Updates:

  • Apache/PHP 2.2.20-5.3.8: Linux32, Linux64, Mac, Win32
  • MediaWiki 1.17.0: Linux32, Linux64, Mac, Win32
  • pgJDBC 9.0-801: Linux32, Linux64, Mac, Win32
  • phpBB 9.0-801: Linux32, Linux64, Mac, Win32
  • PostGIS 1.3.6 for PG 8.3: Linux32, Linux64, Mac
  • PostGIS 1.4.2 for PG 8.4: Linux32, Linux64, Mac
  • PostGIS 1.5.3 for PG 9.0: Linux32, Linux64, Mac
  • psqlODBC 09.00.0310: Linux32, Linux64, Mac, Win32
  • Slony 1.2.22 for PG 8.3: Linux32, Linux64, Mac
  • Slony 2.0.7 for PG 8.4: Linux32, Linux64, Mac
  • Slony 2.0.7 for PG 9.0: Linux32, Linux64, Mac
  • Npgsql 2.0.11: Linux32, Linux64, Mac, Win32

New releases:

  • Drupal 7:  Linux32, Linux64, Mac, Win32
  • pgBouncer:  Linux32, Linux64
  • pgMemcache 2.0.1 for PG 9.0: Linux32, Linux64, Mac
  • pgMemcache 2.0.1 for PG 9.1: Linux32, Linux64, Mac
  • PostGIS 1.5.3 for PG 9.1: Linux32, Linux64, Mac
  • Slony 2.0.7 for PG 9.1: Linux32, Linux64, Mac

Note that the PostGIS community maintains the PostGIS installers for Windows, which will be released as soon as they're available. Windows updates for Slony are still in development due to an issue found in QA. Mac and Windows builds of pgBouncer are on their way.

To download and install any of these packages, just run StackBuilder - if you don't have it already you can get it with the PostgreSQL Installers.

Enjoy :-)

Tuesday 4 October 2011

PostgreSQL Conference Europe: Are you ready?

PostgreSQL Conference Europe 2011 starts 2 weeks from today in the beautiful city of Amsterdam in the Netherlands. This is the fourth annual conference hosted by PostgreSQL Europe, following on from extremely successful events in Prato (Italy), Paris and Stuttgart, and is aimed at developers, DBAs, technologists and decision makers either using, or considering using the world's most advanced Open Source database.

This year we have four days on the schedule, with a kick-off day of training sessions hosted by respected PostgreSQL developers such as Greg Smith, Bruce Momjian, Magnus Hagander, Guillaume Lelarge and more. Topics will cover performance tuning, application development, database administration, replication & high availability and geospatial. The training sessions are available on their own, or as part of a regular conference attendance at additional - but very reasonable - cost.

We had a record number of talk proposals submitted this year, but we've resisted the urge to host even more sessions in parallel - in fact, we've reduced the number of parallel sessions to three, as we all know how frustrating it can be when more than one session you want to see is scheduled at the same time. Instead we've extended the conference by a day to accommodate over 40 different sessions, which has the added bonus of allowing an additional night of social activities - always a great way to discuss the latest technologies, trends and ideas with other Postgres users over a beer or two.

We've got a great range of topics for this year, covering new features in PostgreSQL 9.1 and beyond, developing applications, running Postgres in the cloud, hacking PostgreSQL internals, tools and add-on products and managing large databases, presented by a wide cross-section of users and developers, including a number of this year's Google Summer of Code students who will talk about their work. You can view the complete schedule on the conference website.

Our opening keynote this year will be presented by Ram Mohan, EVP and CTO of Afilias, which manages the .info, .org and .mobi top-level domains using Postgres. Ram will be discussing the business decisions and strategy around their use of PostgreSQL. Our closing keynote will be presented by Ed Boyajian, President and CEO of EnterpriseDB, who will discuss PostgreSQL's role in the post-Oracle era.

So, if you haven't done so already, head on over to the website to register as an attendee to avoid missing out on what promises to be an outstanding conference in a fantastic location. See you in Amsterdam!

Monday 1 August 2011

PostgreSQL 9.1, meet MySQL

So having spent the last few months with my head buried deep in a project at work, I finally managed to get back to my previous hacking on SQL/MED at the weekend, after the kids went away for a week (good $DEITY it's quiet here)! Within a couple of hours I had my half-baked Foreign Data Wrapper for MySQL up and running, and I'm now able to create foreign table objects in PostgreSQL 9.1 that map either directly to tables in a MySQL database, or to queries run on the MySQL server.

Here's an example:

-- Create the required functions for the FDW.
CREATE FUNCTION mysql_fdw_handler()
RETURNS fdw_handler
AS '$libdir/mysql_fdw'
LANGUAGE C STRICT;

CREATE FUNCTION mysql_fdw_validator(text[], oid)
RETURNS void
AS '$libdir/mysql_fdw'
LANGUAGE C STRICT;

-- Create the data wrapper or "transport".
CREATE FOREIGN DATA WRAPPER mysql_fdw
HANDLER mysql_fdw_handler
VALIDATOR mysql_fdw_validator;

-- Create the foreign server, a pointer to the MySQL server.
CREATE SERVER mysql_svr
FOREIGN DATA WRAPPER mysql_fdw
OPTIONS (address '127.0.0.1', port '3306');

-- Create one or more foreign tables on the MySQL server. The first of
-- these maps to a remote table, whilst the second uses an SQL query.
CREATE FOREIGN TABLE employees (
id integer,
name text,
address text)
SERVER mysql_svr
OPTIONS (table 'hr.employees');

CREATE FOREIGN TABLE ex_staff (
id integer,
name text,
address text)
SERVER mysql_svr
OPTIONS (query 'SELECT * FROM hr.employees WHERE date_left IS NOT NULL');


-- Create a user mapping to tell the FDW the username/password to
-- use to connect to MySQL, for PUBLIC. This could be done on a per-
-- role basis.
CREATE USER MAPPING FOR PUBLIC
SERVER mysql_svr
OPTIONS (username 'dpage', password '');

So let's try it out. Here's the test table in MySQL:

mysql> SELECT * FROM employees;
+----+-------------+----------------------------+------------+
| id | name        | address                    | date_left  |
+----+-------------+----------------------------+------------+
|  1 | Dave Page   | 27 High Street, Georgetown | NULL       |
|  2 | Fred Bloggs | 46 Mill Road, Klasborough  | NULL       |
|  3 | Fred Bloggs | 24 The Wharf, Westhampton  | 2010-05-23 |
+----+-------------+----------------------------+------------+
3 rows in set (0.00 sec)

And here we are in PostgreSQL 9.1 beta 3:

postgres=# SELECT * FROM employees;
 id |    name     |          address
----+-------------+----------------------------
  1 | Dave Page   | 27 High Street, Georgetown
  2 | Fred Bloggs | 46 Mill Road, Klasborough
  3 | Fred Bloggs | 24 The Wharf, Westhampton
(3 rows)

postgres=# SELECT * FROM ex_staff;
 id |    name     |          address
----+-------------+---------------------------
  3 | Fred Bloggs | 24 The Wharf, Westhampton
(1 row)

For the curious, here's what the EXPLAIN output looks like:

postgres=# EXPLAIN SELECT * FROM employees;
                           QUERY PLAN
----------------------------------------------------------------
 Foreign Scan on employees  (cost=10.00..13.00 rows=3 width=68)
   Local server startup cost: 10
   MySQL query: SELECT * FROM hr.employees
(3 rows)

Pretty neat huh? There are a couple of limitations in the current implementation:
  • No attempt is currently made to push down quals (WHERE clauses) to the MySQL server, so every row MySQL finds is returned to PostgreSQL and filtered there. There's no defined API for this in PostgreSQL yet, and it's not immediately clear how to build something more complex than the simple example I used in my Redis FDW that would be required for a remote relational database. That said, you can build WHERE clauses into the foreign table definition of course.
  • The MySQL C API doesn't seem to offer a simple way to randomly access a result set, or at least reset the cursor to the first row, unless you copy the entire result set to the client (PostgreSQL in this case). Because we need to be able to return to the first row if PostgreSQL calls the Rescan function, we currently copy the entire result set up front, rather than reading it from the server on demand.
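The first limitation is easy to illustrate with a toy model - plain Python, nothing to do with the real FDW C API, with a made-up three-row "remote" table. Without pushdown, the remote side returns every row and the qual is applied locally, so the whole table crosses the wire even when only one row matches:

```python
# Toy model of qual (WHERE clause) handling - illustration only.
# The "remote" table is just a list of dicts, loosely based on the
# hr.employees example data used above.

REMOTE_ROWS = [
    {"id": 1, "name": "Dave Page", "date_left": None},
    {"id": 2, "name": "Fred Bloggs", "date_left": None},
    {"id": 3, "name": "Fred Bloggs", "date_left": "2010-05-23"},
]

def scan_without_pushdown(qual):
    """Remote returns everything; the local side filters after transfer."""
    transferred = list(REMOTE_ROWS)          # whole table crosses the wire
    return len(transferred), [r for r in transferred if qual(r)]

def scan_with_pushdown(qual):
    """Qual is evaluated remotely; only matching rows are transferred."""
    transferred = [r for r in REMOTE_ROWS if qual(r)]
    return len(transferred), transferred

ex_staff_qual = lambda r: r["date_left"] is not None
print(scan_without_pushdown(ex_staff_qual)[0])  # 3 rows transferred
print(scan_with_pushdown(ex_staff_qual)[0])     # 1 row transferred
```

Both scans return the same single matching row; only the amount of data transferred differs, which is exactly what embedding the WHERE clause in the foreign table definition works around.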
Aside from minor tweaks, this is probably about as far as I'll take this little project for now. I'll be talking about it at both Postgres Open 2011 in Chicago in September, and PGBR 2011 in São Paulo in November - hopefully I'll see you there.

The MySQL FDW source code is available on Github, and uses the PostgreSQL licence.

Wednesday 6 April 2011

Google Summer of Code 2011

As a reminder, Google's Summer of Code 2011 program has started and PostgreSQL is participating!  We are looking for students who are interested in hacking on pgAdmin or PostgreSQL all summer.  You only have until April 8, 2011 at midnight Pacific time to submit a proposal.

If you are a student, please apply; if you know a student, or are a professor, please urge them to participate in Summer of Code. Not only is it a fantastic opportunity to learn how open source code is really made, and get paid to do it, but it can also be career-building.

Read up on the PostgreSQL GSOC wiki page.

Friday 1 April 2011

Tinkering with SQL/MED

One of the cool new features in the upcoming PostgreSQL 9.1 release is support for SQL/MED, or Management of External Data. Essentially what this allows us to do is connect external data sources to PostgreSQL and query them as if they were tables. The initial support in 9.1 will offer read-only access to foreign tables, with the ability to include qualifier pushdown - i.e., the ability to push the applicable WHERE part of the SQL query to the remote server, to minimise data transfer. This optimisation is dependent on each individual Foreign Data Wrapper (FDW) - the driver used to connect to the remote data source - which may or may not implement it.


PostgreSQL 9.1 will ship with an FDW that allows access to files in formats that can be read by the existing COPY command, and Andrew Dunstan has been working on an FDW that will allow access to files with fixed-width data as well as one for accessing ragged CSV files.


I've been a big proponent of SQL/MED support, particularly at work, so was pleased to finally find some time to look at the work that's been done for 9.1 earlier in the week. I don't have much interest in reading external files (though clearly that is an important feature for some users) - I'm far more keen to see PostgreSQL have the ability to access the data in all those other databases you may not be able to migrate away from, or are better suited to particular workloads. So, with that in mind, I picked a NoSQL database more or less at random (though intentionally one that's a simple key/value store) and started hacking on an FDW for Redis.


The code is still pre-alpha quality, and there are a number of TODOs, including:
  • Add qual pushdown
  • Handle different Redis datatypes correctly
  • Figure out how to make Redis scans atomic (or at the least properly handle tuples that have been removed since the initial scan)
  • Handle Redis authentication
However it seems to work well (bar the known issues of course), and is looking like it will be simple to use as the basis for more complex FDWs which may well keep me amused for a while longer.


As always with Open Source, feel free to play with this if you think you might find it useful, but don't blame me if it breaks (which it likely will)!

Thursday 24 March 2011

Debunking the FUD: PostgreSQL for Microsoft Windows Payload Execution

If you follow Twitter for keywords like "postgres" or "postgresql", you may well have seen a number of tweets over the last day or so regarding a so-called "Payload Execution" exploit in PostgreSQL. This supposed attack was apparently first described in this paper, and has hit Twitter after code to demonstrate the issue was added to Metasploit.

The "attack" works like this:
  1. A user uploads a payload as a large object to the database. This can be done using the client side lo_create() and lo_import() functions.
  2. The user then uses the server-side lo_export() function to export the payload to the server's filesystem.
  3. The user then executes the payload, by creating an SQL level function to wrap the C function in the payload, eg:
    CREATE FUNCTION do_bad_stuff() RETURNS int4 AS '$libdir/bad_stuff.dll' LANGUAGE C;
  4. The user then executes the payload:
    SELECT do_bad_stuff();
Here's the problem with the "exploit":

You need to be a superuser for it to work.

That's right - steps 2 and 3 above both require that you are a superuser - in other words, you already have complete, unfettered access to the database server.

But wait - that doesn't get you very far into the OS either! One oft-criticized feature of PostgreSQL (for users that unwisely like to do all their work as root) is that it refuses to run under an account with superuser privileges - in fact on Windows, we irrevocably remove unwanted privileges from our security token at startup. The reason for this is to ensure that once our L337 H4X0r has uploaded his payload (using his existing superuser privileges), he cannot escalate those privileges to the operating system's superuser accounts and compromise other parts of the system.

As a side note - as superuser in most installations there are far easier ways of achieving the results above that avoid the need to know the architecture of the system or to write code in C - for example, you can simply create a function using one of the untrusted procedural languages, like pl/perlu. 

So the bottom line: this is not an exploit. It requires that you are already a database superuser, with all the power that entails; it doesn't gain you anything you couldn't already do, nor does it give you a way to gain OS-level privileges beyond those already held by the low-privilege user account the server runs under.