Oleg (obartunov) wrote,

Hstore development for 9.4 release: part 3

I have returned home from the PGConf.EU 2013 conference in Dublin, Ireland, where we gave two talks: "Next Generation of GIN" and "Binary storage for nested data structures and application to hstore data type". Both projects, as well as our trip to the conference (myself, Teodor Sigaev and Alexander Korotkov), were sponsored by EngineYard, and we hope it will support us in other projects. I have already written about hstore development (see Part 1, GIN fast-scan and speedup of @> operator, and Part 2), so today I'll highlight only the new hstore features we have added since PGCon 2013 in Ottawa, Canada.

  • We reworked the binary storage (slides 17-21) for nested hstore to add support for scalars and types.

  • Scalars are now supported in hstore (slide 10):
    postgres=# select 'a'::hstore, 't'::hstore;
     hstore | hstore
     "a"    | t

  • Finally, types are supported in hstore (slides 11-13): numeric, boolean, string and NULL. We added a bunch of new operators and functions to work with these types (slides 42-46). Types are a huge improvement, since hstore and json can now be converted to each other without any problems. Slide 40:
    =# select '{"a":3.14}'::json::hstore::json;
    {"a": 3.14}
    =# select '3.14'::json::hstore::json;

  • The GUC variable hstore.root_array_decorated is now deprecated, and arrays must always be enclosed in curly braces.

  • We added a performance comparison with MongoDB. MongoDB is very slow at loading data (slide 59): 8 minutes vs 76 s for hstore. Sequential scan speed is the same, about 1 s. MongoDB's index scan is very fast: 1 ms vs 17 ms for hstore with the GIN fast-scan patch. But we managed to create a new opclass for hstore (slides 61-62) that hashes full paths concatenated with values, and got 0.6 ms, which is faster than MongoDB! Here, GIN++ is GIN with the fast-scan patch.
                        seqscan  GiST   GIN    GIN++   GINhash  MongoDB
    index size                   64MB   815MB          349MB    100MB
    query time          0.98s    0.3s   0.1s   0.017s  0.0007s  0.001s
    speedup vs seqscan           3x     10x    60x     1400x    1000x

    It's worth noting that the MongoDB index is a very "narrow" index, while hstore's indexes can speed up more queries.

  • The new hstore is now documented, thanks to David Wheeler, who is also reviewing our patch.

  • The latest patch against 9.4dev is available at http://www.sigaev.ru/misc/nested_hstore-0.36.patch.gz, but expect a new one that includes the gin_hstore_hash_ops opclass.
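
To illustrate the idea behind the hash opclass mentioned above (hashing each full path concatenated with its value), here is a minimal Python sketch. It is not the code of gin_hstore_hash_ops: the hash function (SHA-1 truncated to 32 bits), the separators, the treatment of array elements, and all names (path_value_hashes, HashIndex, contains) are assumptions made purely for illustration.

```python
import hashlib


def path_value_hashes(doc, path=()):
    """Yield one 32-bit hash per (full path, scalar value) pair of a nested document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from path_value_hashes(value, path + (str(key),))
    elif isinstance(doc, list):
        for item in doc:
            # Assumption: array positions are ignored, so element order does not matter.
            yield from path_value_hashes(item, path + ("#",))
    else:
        # Concatenate the full path with the value, then hash the result.
        token = "\x00".join(path) + "\x01" + repr(doc)
        digest = hashlib.sha1(token.encode("utf-8")).digest()
        yield int.from_bytes(digest[:4], "big")


def contains(doc, query):
    """Exact containment check, used to recheck candidates found through hashes."""
    if isinstance(query, dict):
        return isinstance(doc, dict) and all(
            k in doc and contains(doc[k], v) for k, v in query.items()
        )
    if isinstance(query, list):
        return isinstance(doc, list) and all(
            any(contains(d, q) for d in doc) for q in query
        )
    return type(doc) is type(query) and doc == query


class HashIndex:
    """Toy inverted index: hash of (path, value) -> set of document ids."""

    def __init__(self):
        self.docs = []
        self.postings = {}

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        for h in set(path_value_hashes(doc)):
            self.postings.setdefault(h, set()).add(doc_id)
        return doc_id

    def search(self, query):
        """Return ids of documents that contain the query document."""
        hashes = set(path_value_hashes(query))
        if hashes:
            # A match must appear in the posting list of every query hash.
            candidates = set.intersection(
                *(self.postings.get(h, set()) for h in hashes)
            )
        else:
            candidates = set(range(len(self.docs)))
        # Hashes can collide, so every candidate is rechecked exactly.
        return sorted(i for i in candidates if contains(self.docs[i], query))


if __name__ == "__main__":
    index = HashIndex()
    index.add({"a": {"b": 1}, "tags": ["x", "y"]})
    index.add({"a": {"b": 2}})
    print(index.search({"a": {"b": 1}}))
```

The point of the sketch is that a containment query turns into a handful of fixed-size integer lookups, one per path and value pair, however long the paths and values are. The price is that such an index can only answer queries that supply both the full path and the value.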

Now there is no obstacle to json development for 9.4 using our binary storage. Andrew Dunstan has started thinking about this.

We had many discussions after our talk (which was a great success) about future development and improvements, and we have several interesting ideas about extending the GIN interface and bitmap filtering. I hope that we will find support for our work (several companies have already shown some interest).

Again, thanks to EngineYard for the support!

Tags: hstore, pg, pgen
