The other day I found myself designing a database schema for a collection of musicians each of which plays one or more instruments. The result looked something like this.
CREATE TABLE musicians
(
id bigserial NOT NULL,
name text NOT NULL,
PRIMARY KEY (id)
);
CREATE TYPE musical_instrument AS ENUM
('guitar', 'piano', 'bass', 'trumpet', 'drums');
CREATE TABLE musician_instruments
(
musician_id bigint NOT NULL,
instrument musical_instrument NOT NULL,
"position" integer NOT NULL,
PRIMARY KEY (musician_id, instrument)
);
As you can see the table musician_instruments
is used to associate musical instruments to a musician. It has an additional column called position
because the order of instruments is relevant for the application.
When I squinted real hard I saw that I had in front of me a simple array disguised table. So I thought, why not use the array type that Postgres provides and see where it leads me. A research project if you might.
ALTER TABLE public.musicians
ADD COLUMN instruments musical_instrument[] NOT NULL;
Using Postgres arrays with PHP
The application that accesses the database is written in PHP using the Laravel framework which uses PDO as the mechanism to talk to the database. Let’s query the table and see what happens.
This is the musicians
table:
id | name | instruments |
---|---|---|
1 | Peter | {guitar,piano} |
This is the query:
$musicians = DB::select('SELECT * FROM musicians');
var_dump($musicians);
And this is what we got:
array(1) { [0] => class stdClass#728 (3) { public $id => int(1) public $name => string(5) "Peter" public $instruments => string(14) "{guitar,piano}" } }
Meh. For a second there I had my hopes up that PDO would return the instruments as a PHP array all ready to go. Lucky for us, the string looks easy enough to parse.
function maybeParseArrayOutput(string $output) {
if ($output === '{}') {
return [];
}
return mb_split(',', mb_substr($output, 1, -1));
}
$instruments = maybeParseArrayOutput('{guitar,piano}');
// ['guitar', 'piano']
This works fine for a subset of arrays and if you know that your arrays fall into that subset then you’re good to go. However, leaving aside the issue of nested arrays and arrays of complex types, there are common cases where this simple solution breaks down.
Postgres Array Output Syntax
Let’s take a look at how Postgres formats the array output:
The array output routine will put double quotes around element values if they are empty strings, contain curly braces, delimiter characters, double quotes, backslashes, or white space, or match the word
https://www.postgresql.org/docs/current/arrays.html#ARRAYS-IONULL
. Double quotes and backslashes embedded in element values will be backslash-escaped.
The rules seem pretty straight forward but do we really want to write a parser? It is very easy to overlook some of the edge cases and have a potentially exploitable bug in our application. I leave it as an challenge for the adventurous reader to write a parser that can handle just all possible text[]
returned by Postgres.
Fortunately, there is a more pragmatic solution that makes use of the function array_to_json
to convert the Postgres array to JSON which is something that PHP can natively parse:
$musicians = DB::select('SELECT id, name, array_to_json(instruments) as instruments FROM musicians');
foreach ($musicians as $musician) {
$musician->instruments = json_decode($musician->instruments);
}
var_dump($musicians);
array(1) { [0] => class stdClass#2032 (3) { public $id => int(1) public $name => string(5) "Peter" public $instruments => array(2) { [0] => string(6) "guitar" [1] => string(5) "piano" } } }
Conversely we can convert a JSON array to text[]
when inserting new records.
DB::insert(<<<SQL
INSERT INTO musicians (name, instruments) VALUES
('Mike', ARRAY(SELECT * FROM json_array_elements_text(?))::musical_instrument[]);
SQL
, [
json_encode(['drums', 'bass']),
]);
So what’s the point?
We are already converting our data to and from JSON so why not just store it in a JSON column? Actually, if you want a practical solution that integrates nicely with Laravel then you are better off storing your data as JSON directly. Of course as a trade-off you lose the strong type checking and self-documenting nature of the native Postgres array.
But let’s not give up on the idea just yet. Let’s first find out how we can use Postgres arrays together with the Laravel query builder.
Postgres Arrays with Laravel Query Builder
As you might already expect, selecting with the query builder is no problem.
DB::table('musicians')->select([
'id',
'name',
DB::raw('array_to_json(instruments) as instruments'),
]);
Inserting new rows, on the other hand, is the first major roadblock.
DB::table('musicians')->insert([
'name' => 'Paula',
'instruments' => DB::raw('ARRAY(SELECT * FROM json_array_elements_text(?))::musical_instrument[]')
]);
Laravel has no built-in mechanism to fill that ?
placeholder. First instinct is to look at the source code of insert
and check if where we can somehow extend it. Scrolling to the end of the function we see that it is ultimately the call to cleanBindings
which maps our values to bindings.
// From Illuminate\Database\Query\Builder
public function insert(array $values)
{
// [...]
return $this->connection->insert(
$this->grammar->compileInsert($this, $values),
$this->cleanBindings(Arr::flatten($values, 1))
);
}
public function cleanBindings(array $bindings)
{
return array_values(array_filter($bindings, function ($binding) {
return ! $binding instanceof Expression;
}));
}
The values from our input array become bindings with expressions filtered out for obvious reason. For our problem we need to do the opposite. We need the expression to provide additional bindings. To do that, we have to first extend the Expression
class.
use Illuminate\Database\Query\Expression;
class ParameterizedExpression extends Expression
{
protected $bindings;
public function __construct($value, array $bindings)
{
parent::__construct($value);
$this->bindings = $bindings;
}
public function getBindings()
{
return $this->bindings;
}
}
Now we can modify cleanBindings
to extract the bindings from the parameterized expression.
public function cleanBindings(array $bindings)
{
$items = [];
foreach ($bindings as $binding) {
if ($binding instanceof ParameterizedExpression) {
foreach ($binding->getBindings() as $b) {
$items[] = $b;
}
} else if (!($binding instanceof Expression)) {
$items[] = $binding;
}
}
return $items;
}
With this we can use the insert
function with arrays like this:
DB::table('musicians')->insert([
'name' => 'Paula',
'instruments' => new ParameterizedExpression('ARRAY(SELECT * FROM json_array_elements_text(?))::musical_instrument[]', [json_encode(['trumpet'])]),
]);
It works but we can make it much cleaner if we refactor out all of the boilerplate into a helper function.
class Expr {
public static function array(array $items, ?string $elementType = null): ParameterizedExpression
{
$sql = 'ARRAY(SELECT * FROM json_array_elements_text(?))';
if ($elementType !== null) {
$sql .= "::{$elementType}[]";
}
return new ParameterizedExpression($sql, [json_encode($items)]);
}
}
DB::table('musicians')->insert([
'name' => 'Paula',
'instruments' => Expr::array(['trumpet'], 'musical_instrument'),
]);
It is still necessary to specify the array type but I can live with that, plus when working with text
arrays it can be omitted entirely. One major advantage of this solution is that we can use parameterized expressions not only for insert
but also for update
and all of the different variants. The big downside is that overriding cleanBindings
involves a lot of work since the dependency on the Builder
class is hard coded pretty deep down the chain. It’s almost bad enough to warrant some composer shenanigans.
Wrapping up
By now if you are not already convinced that using Postgres arrays with the current state of PHP libraries is a bad idea then you can look forward to Part 2 in which I am going to have a go at using Postgres arrays together with Eloquent Models.
You can find a repository with the code samples from this blog at github.
2 thoughts on “Postgres Arrays and PHP -Part 1”