?

Log in

No account? Create an account

...

или внеочередные заметки


Previous Entry Share Next Entry
Free json datasets
trekking, Himalaya
obartunov
This is a technical post about json[b] datasets I used in presentations (http://www.sai.msu.su/~megera/postgres/talks/). We will make them available in http://sqlfiddle.postgrespro.ru/#!21/,
which one can use to play with our SQL/JSON (SQL-2016 standard) implementation in PostgreSQL.


1. Delicious bookmarks - http://www.sai.msu.su/~megera/postgres/files/jb.dump.gz, imported from http://randomwalker.info/data/delicious/

There are 1252973 bookmarks, which look like:
 {                                                                                             +
     "id": "http://delicious.com/url/b5b3cbf9a9176fe43c27d7b4af94a422#mcasas1",                +
     "link": "http://www.theatermania.com/broadway/",                                          +
     "tags": [                                                                                 +
         {                                                                                     +
             "term": "NYC",                                                                    +
             "label": null,                                                                    +
             "scheme": "http://delicious.com/mcasas1/"                                         +
         }                                                                                     +
     ],                                                                                        +
     "links": [                                                                                +
         {                                                                                     +
             "rel": "alternate",                                                               +
             "href": "http://www.theatermania.com/broadway/",                                  +
             "type": "text/html"                                                               +
         }                                                                                     +
     ],                                                                                        +
     "title": "TheaterMania",                                                                  +
     "author": "mcasas1",                                                                      +
     "source": {                                                                               +
     },                                                                                        +
     "updated": "Tue, 08 Sep 2009 23:28:55 +0000",                                             +
     "comments": "http://delicious.com/url/b5b3cbf9a9176fe43c27d7b4af94a422",                  +
     "guidislink": false,                                                                      +
     "title_detail": {                                                                         +
         "base": "http://feeds.delicious.com/v2/rss/recent?min=1&count=100",                   +
         "type": "text/plain",                                                                 +
         "value": "TheaterMania",                                                              +
         "language": null                                                                      +
     },                                                                                        +
     "wfw_commentrss": "http://feeds.delicious.com/v2/rss/url/b5b3cbf9a9176fe43c27d7b4af94a422"+
 }


40 sec to load, table size is 1.4 Gb.

2. Customer reviews data from Amazon 1998-2004 - http://www.sai.msu.su/~megera/postgres/files/review.json.dump.gz, taken from https://www.citusdata.com/blog/2013/05/30/run-sql-on-json-files-without-any-data-loads/
There are 7593244 reviews, which look like:
 {                                                   +
     "product_id": "0380979012",                     +
     "customer_id": "A3RK9GXAUB1HD7",                +
     "review_date": {                                +
         "$date": 996364800000                       +
     },                                              +
     "review_votes": 37,                             +
     "product_group": "Book",                        +
     "product_title": "The Curse of Chalion",        +
     "review_rating": 5,                             +
     "product_category": "Science Fiction & Fantasy",+
     "product_sales_rank": 115525,                   +
     "product_subcategory": "Fantasy",               +
     "similar_product_ids": [                        +
         "0380979020",                               +
         "0060574623",                               +
         "0743436121",                               +
         "0671578855",                               +
         "0671578286"                                +
     ],                                              +
     "review_helpful_votes": 36                      +
 }


About ~2 minutes to load (zcat review.json.dump.gz | psql), table size is almost 4 Gb

3. house dataset, which I used in http://www.sai.msu.su/~megera/postgres/talks/sqljson-pgconf.eu-2017.pdf

create table house as
select jsonb '{
  "address": {
    "city": "Moscow",
    "street": "Ulyanova, 7A"
  },
  "lift": false,
  "floor": [
    {
      "level": 1,
      "apt": [
        {"no": 1, "area": 40, "rooms": 1},
        {"no": 2, "area": 80, "rooms": 3},
        {"no": 3, "area": 50, "rooms": 2}
      ]
    },
    {
      "level": 2,
      "apt": [
        {"no": 4, "area": 100, "rooms": 3},
        {"no": 5, "area": 60, "rooms": 2}
      ]
    }
  ]
}' js;