Nearly all of these imports were run with the code at git hash c1d0fea.

    time xzcat /srv/fatcat/datasets/crossref-works.2018-09-05.json.xz | time parallel -j20 --round-robin --pipe ./fatcat_import.py import-crossref - /srv/fatcat/datasets/20180216.ISSN-to-ISSN-L.txt /srv/fatcat/datasets/release_ids.ia_munge_20180908.sqlite3

    Processed 4990450 lines, inserted 4005034, updated 0. (etc; each parallel worker prints its own summary line)

    133387.36user 5255.64system 24:19:01elapsed 158%CPU (0avgtext+0avgdata 448196maxresident)k
    177480808inputs+432403768outputs (204major+48533880minor)pagefaults 0swaps

    real    1459m1.518s
    user    2308m24.300s
    sys     93m17.132s

    Took longer and produced a bigger database than previous imports!

    Size:  377.49G

    select count(id) from release_ident; => 79,880,900


    zcat /srv/fatcat/datasets/ia_papers_manifest_2018-01-25.matched.json.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py import-matched --no-file-update -
    Processed 531700 lines, inserted 511751, updated 0.
    Command exited with non-zero status 1
    17087.60user 717.77system 3:07:11elapsed 158%CPU (0avgtext+0avgdata 67420maxresident)k
    60128inputs+3401960outputs (141major+403282minor)pagefaults 0swaps


Sample of "not found" DOIs:

    DOI not found: 10.1109/mic.2005.100
    DOI not found: 10.3386/w9732
    DOI not found: 10.1090/s0002-9939-97-04114-2
    DOI not found: 10.1186/1475-2867-5-29
    DOI not found: 10.2172/143964
    DOI not found: 10.2172/10170724
    DOI not found: 10.2172/383051
    DOI not found: 10.1017/s0033291700051370
    DOI not found: 10.12980/jclm.3.2015j5-154
    DOI not found: 10.2172/801341
    DOI not found: 10.2172/899508

    DOI not found: 10.1136/bmj.2.4570.302
    DOI not found: 10.1136/bmj.2.4687.1049
    DOI not found: 10.1163/221125903x00429
    DOI not found: 10.1177/004947557800800102
    DOI not found: 10.1177/107755874800500313
    DOI not found: 10.1177/107755874800500415
    DOI not found: 10.1177/107755874800500713

    DOI not found: 10.5990/jwpa.29.72
    DOI not found: 10.2307/1107183
    DOI not found: 10.1101/147165
    DOI not found: 10.17848/wp04-108
    DOI not found: 10.2172/542039
    DOI not found: 10.2172/542040
    DOI not found: 10.1002/9781444308747.ch6
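To see which DOI prefixes dominate the misses, the "DOI not found" lines can be grouped by the part of the DOI before the first slash. In the sample above, 10.2172 accounts for 7 of the 25 lines. This is a minimal sketch, not part of fatcat_import.py; the function name `doi_prefix_counts` is made up for illustration, and it assumes the log lines look exactly like the ones shown here.

```python
from collections import Counter


def doi_prefix_counts(log_lines):
    """Count 'DOI not found' lines by DOI prefix (text before the first '/')."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line.startswith("DOI not found:"):
            continue  # ignore any other importer output
        doi = line.split(":", 1)[1].strip()
        counts[doi.split("/", 1)[0]] += 1
    return counts


# A few lines copied from the sample above.
sample = [
    "DOI not found: 10.2172/143964",
    "DOI not found: 10.2172/10170724",
    "DOI not found: 10.1136/bmj.2.4570.302",
    "DOI not found: 10.1101/147165",
]

for prefix, n in doi_prefix_counts(sample).most_common():
    print(prefix, n)
```

Running the same function over the full importer log (one line per list element) would show whether a handful of prefixes explain most of the unmatched files.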


    zcat /srv/fatcat/datasets/2018-08-27-2352.17-matchcrossref.insertable.json.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py import-matched -
    Processed 485400 lines, inserted 283498, updated 197825.
    25649.33user 1152.84system 4:42:24elapsed 158%CPU (0avgtext+0avgdata 38984maxresident)k
    38584inputs+2371576outputs (136major+357478minor)pagefaults 0swaps

    Size:  395.13G

                              table_name                          | table_size | indexes_size | total_size
    --------------------------------------------------------------+------------+--------------+------------
     "public"."release_ref"                                       | 154 GB     | 54 GB        | 208 GB
     "public"."release_rev"                                       | 39 GB      | 22 GB        | 61 GB
     "public"."release_contrib"                                   | 25 GB      | 22 GB        | 47 GB
     "public"."release_edit"                                      | 7095 MB    | 6956 MB      | 14 GB
     "public"."work_edit"                                         | 7095 MB    | 6956 MB      | 14 GB
     "public"."release_ident"                                     | 5203 MB    | 6254 MB      | 11 GB
     "public"."work_ident"                                        | 5203 MB    | 6254 MB      | 11 GB
     "public"."file_rev_url"                                      | 6535 MB    | 2478 MB      | 9013 MB
     "public"."work_rev"                                          | 3376 MB    | 3127 MB      | 6503 MB
     "public"."file_rev"                                          | 1404 MB    | 2115 MB      | 3519 MB
     "public"."abstracts"                                         | 2611 MB    | 208 MB       | 2820 MB
     "public"."file_edit"                                         | 1089 MB    | 1066 MB      | 2155 MB
     "public"."file_release"                                      | 713 MB     | 1250 MB      | 1962 MB
     "public"."file_ident"                                        | 618 MB     | 740 MB       | 1358 MB
     "public"."creator_rev"                                       | 371 MB     | 457 MB       | 828 MB
     "public"."creator_edit"                                      | 347 MB     | 352 MB       | 699 MB
     "public"."release_rev_abstract"                              | 284 MB     | 369 MB       | 653 MB
     "public"."creator_ident"                                     | 255 MB     | 305 MB       | 560 MB
     "public"."changelog"                                         | 138 MB     | 142 MB       | 279 MB
     "public"."editgroup"                                         | 155 MB     | 92 MB        | 247 MB
     "public"."container_rev"                                     | 20 MB      | 9272 kB      | 29 MB
     "public"."container_edit"                                    | 8312 kB    | 7360 kB      | 15 MB
     "public"."container_ident"                                   | 7272 kB    | 6832 kB      | 14 MB
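As a sanity check, the total_size column can be summed and compared against the overall "Size" figure. The sketch below assumes the table was printed by psql with sizes formatted in the style of PostgreSQL's `pg_size_pretty` (units as powers of 1024); the helper names `parse_pg_size` and `total_bytes` are hypothetical. Because the printed sizes are rounded, the sum is only approximate.

```python
# Unit multipliers, assuming pg_size_pretty-style output (powers of 1024).
UNITS = {"bytes": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_pg_size(text):
    """Convert a string such as '7095 MB' into a byte count."""
    value, unit = text.split()
    return int(value) * UNITS[unit]


def total_bytes(table_text):
    """Sum the last (total_size) column of a psql-formatted size table."""
    total = 0
    for line in table_text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Skip the header, the dashed separator, and blank lines.
        if len(cols) != 4 or cols[0] == "table_name":
            continue
        total += parse_pg_size(cols[3])
    return total


# Two rows copied from the table above, plus header and separator.
sample_table = """
          table_name          | table_size | indexes_size | total_size
------------------------------+------------+--------------+------------
 "public"."changelog"         | 138 MB     | 142 MB       | 279 MB
 "public"."container_edit"    | 8312 kB    | 7360 kB      | 15 MB
"""

print(total_bytes(sample_table) / 1024**2, "MB")
```

Pasting the full table into `sample_table` gives a total that can be compared with the reported database size.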

Exports!

    time cat /tmp/fatcat_ident_releases.tsv | ./target/release/fatcat-export release --expand files,container -j8 | gzip > release_export_expanded.json.gz
     INFO 2018-09-27T22:54:30Z: fatcat_export: Done reading (79880900 lines), waiting for workers to exit...

     real    384m29.435s
     user    740m1.060s
     sys     229m11.632s


    time zcat /srv/fatcat/snapshots/2018-09-24/release_export_expanded.json.gz | ./transform_release.py | esbulk -verbose -size 20000 -id ident -w 8 -index fatcat -type release
     2018/09/28 02:56:36 79880900 docs in 2h56m48.425914042s at 7529.948 docs/s with 8 workers
     2018/09/28 02:56:36 applied setting: {"index": {"refresh_interval": "1s"}} with status 200 OK
     2018/09/28 02:56:36 applied setting: {"index": {"number_of_replicas": "1"}} with status 200 OK
     2018/09/28 02:56:40 index flushed: 200 OK

     real    176m53.138s
     user    318m17.004s
     sys     29m48.944s

    webcrawl@wbgrp-svc503:/srv/fatcat/src/extra/elasticsearch$ du -sh /srv/elasticsearch/data/      
    52G     /srv/elasticsearch/data/
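The docs/s figure in the esbulk log line can be re-derived from the document count and the elapsed time. This is a small sketch using only the numbers printed above; `parse_go_duration` is a hypothetical helper that handles only the `NhNmN.Ns` form seen in this log, not every duration format.

```python
import re


def parse_go_duration(text):
    """Convert a duration such as '2h56m48.425914042s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:([\d.]+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


# Values taken from the esbulk log line above.
docs = 79880900
elapsed = parse_go_duration("2h56m48.425914042s")

print(f"{docs / elapsed:.1f} docs/s over {elapsed:.0f} s")
```

The result should agree with the rate esbulk reported, which confirms the index received every line of the export.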

TODO:
x abstracts
x file_hashes
x ext idents
x upload to an item
x download and re-build elastic
- insert new mellon matches


    time zcat /srv/fatcat/datasets/2018-09-23-0405.30-dumpgrobidmetainsertable.longtail_join.filtered.tsv.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py import-grobid-metadata -
    [...]
    Processed 132994 lines, inserted 123052, updated 0.
    Processed 132984 lines, inserted 122979, updated 0.
    10930.34user 475.87system 2:40:03elapsed 118%CPU (0avgtext+0avgdata 68180maxresident)k
    8912inputs+20157832outputs (59major+1104467minor)pagefaults 0swaps

    real    160m3.573s
    user    184m54.176s
    sys     8m23.388s
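Because each parallel worker prints its own "Processed ... lines" summary, run-wide totals have to be added up by hand. The sketch below sums those lines; it assumes the summary format shown in these notes, and the function name `sum_import_summaries` is made up for illustration.

```python
import re

# Matches importer summary lines of the form shown in these notes.
SUMMARY_RE = re.compile(r"Processed (\d+) lines, inserted (\d+), updated (\d+)")


def sum_import_summaries(log_lines):
    """Add up processed/inserted/updated counts across worker summary lines."""
    totals = {"processed": 0, "inserted": 0, "updated": 0}
    for line in log_lines:
        match = SUMMARY_RE.search(line)
        if match is None:
            continue  # not a summary line
        processed, inserted, updated = map(int, match.groups())
        totals["processed"] += processed
        totals["inserted"] += inserted
        totals["updated"] += updated
    return totals


# The two worker summaries visible in the GROBID metadata import above.
lines = [
    "Processed 132994 lines, inserted 123052, updated 0.",
    "Processed 132984 lines, inserted 122979, updated 0.",
]

print(sum_import_summaries(lines))
```

Only two of the twelve workers' lines are shown here, so the printed totals cover just that excerpt, not the whole run.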