author Bryan Newbold <bnewbold@robocracy.org> 2019-01-18 17:28:59 -0800
committer Bryan Newbold <bnewbold@robocracy.org> 2019-01-18 17:28:59 -0800
commit 4eab53da4b89d0ef4d90140f9429a3bdfcc7761e
tree ac83c77e47d6a9f55dab2823563f4eeffa043979
parent 2366e0fd8d3a69ec0b01557c3588d70c62967726
update import README with times
-rw-r--r-- python/README_import.md | 5
1 file changed, 3 insertions, 2 deletions
diff --git a/python/README_import.md b/python/README_import.md
index 9dda725d..2465940b 100644
--- a/python/README_import.md
+++ b/python/README_import.md
@@ -56,13 +56,14 @@ Usually 24 hours or so on fast production machine.
## Matched
-Unknown speed!
+These each take 2-4 hours:
# No file update for the first import...
- zcat /srv/fatcat/datasets/ia_papers_manifest_2018-01-25.matched.json.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py matched --no-file-updates -
+ time zcat /srv/fatcat/datasets/ia_papers_manifest_2018-01-25.matched.json.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py matched --no-file-updates -
# ... but do on the second
zcat /srv/fatcat/datasets/2018-08-27-2352.17-matchcrossref.insertable.json.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py matched -
# GROBID extracted (release+file)
time zcat /srv/fatcat/datasets/2018-09-23-0405.30-dumpgrobidmetainsertable.longtail_join.filtered.tsv.gz | pv -l | time parallel -j12 --round-robin --pipe ./fatcat_import.py grobid-metadata -
+
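
The added leading `time` in the first command is what produces a wall-clock figure for the whole pipeline, while the `time` in front of `parallel` only covers that one stage. A minimal sketch of the leading form, assuming bash (where a leading `time` is a shell keyword that times the entire pipeline); `run_pipeline` and its three stages are hypothetical stand-ins, not the real import commands:

```shell
#!/usr/bin/env bash
# Sketch only: run_pipeline is a hypothetical stand-in for the import pipeline.
set -euo pipefail

run_pipeline() {
    # seq stands in for zcat, cat for pv -l, wc -l for the importer
    seq 1 1000 | cat | wc -l
}

# TIMEFORMAT is a bash variable; %R is elapsed wall-clock seconds.
TIMEFORMAT='elapsed: %R s'

# The bash `time` keyword reports on everything run_pipeline does,
# and writes the timing line to stderr.
time run_pipeline
```

The line count goes to stdout and the timing to stderr, so the two can be captured separately when recording import durations such as the "2-4 hours" noted in the README.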