или внеочередные заметки

Previous Entry Share Next Entry
Yet another performance improvement for hstore @> hstore !
trekking, Himalaya
Our GIN improvement (see Full-text search in PostgreSQL in milliseconds(PDF)) can seriously improve the performance of @> operator. Please, take a look on slide #47 from our presentation of nested hstore with array support, where we summarize performance of @> operator:

Hstore - seqscan, GiST, GIN
                              100s 400s - create index
                             64MB 815MB
                   0.98s 0.3s 0.1s
                                3x 10x

We see, that GIN index is 10x times faster than seqscan. But we can further improve GIN performance if we apply special optimization for (freq & rare) GIN index scan, which currently takes ~time(freq), but with our GIN improvement it takes ~time(rare). This improvement can greatly affect @> operator, since we index keys and values separately and @> operator can be viewed as (hstore @> key) & (hstore @> value). Usually, key is frequent and for rare value we can get serious improvement. Let's check this. I use the same dataset and the same query we used in our presentation at PGCon-2013.

=# select count(*) from hs where h::hstore @> 'tags=>{{term=>NYC}}'::hstore;
(1 row)

Time: 17.372 ms

That's what we expected !

Key 'tags' is very frequent
=# select count(*) from hs where h::hstore ? 'tags';
(1 row)

=# select count(*) from hs;
(1 row)

Finally, the slide #47 should looks like:

Hstore - seqscan, GiST,  GIN  GIN+
                              100s 400s              - create index
                             64MB 815MB
                   0.98s 0.3s 0.1s    0.017s
                                3x 10x     57x

I think this is very good argument for considering our patch for GIN improvement, which is submitted to Commitfest (https://commitfest.postgresql.org/action/patch_view?id=1137)
Tags: , ,


Log in