| -rw-r--r-- | python/README.md | 2 |
| -rw-r--r-- | python/README_import.md | 20 |
| -rwxr-xr-x | python/fatcat_import.py (renamed from python/client.py) | 0 |
| -rwxr-xr-x | python/fatcat_webface.py (renamed from python/run.py) | 0 |
4 files changed, 11 insertions, 11 deletions
diff --git a/python/README.md b/python/README.md
index c7e33f0a..eebbbd9c 100644
--- a/python/README.md
+++ b/python/README.md
@@ -3,7 +3,7 @@

 Use `pipenv` (which you can install with `pip`).

-    pipenv run run.py
+    pipenv run fatcat_webface.py

 Run tests:

diff --git a/python/README_import.md b/python/README_import.md
index ae9764e6..38c8406f 100644
--- a/python/README_import.md
+++ b/python/README_import.md
@@ -24,7 +24,7 @@ the others:
 From CSV file:

     export LC_ALL=C.UTF-8
-    time ./client.py import-issn /srv/datasets/journal_extra_metadata.csv
+    time ./fatcat_import.py import-issn /srv/datasets/journal_extra_metadata.csv

     real    2m42.148s
     user    0m11.148s
@@ -36,38 +36,38 @@ Pretty quick, a few minutes.

 Directly from compressed tarball; takes about 2 hours in production:

-    tar xf /srv/datasets/public_profiles_API-2.0_2017_10_json.tar.gz -O | jq -c . | grep '"person":' | time parallel -j12 --pipe --round-robin ./client.py import-orcid -
+    tar xf /srv/datasets/public_profiles_API-2.0_2017_10_json.tar.gz -O | jq -c . | grep '"person":' | time parallel -j12 --pipe --round-robin ./fatcat_import.py import-orcid -

 After tuning database, `jq` CPU seems to be bottleneck, so, from pre-extracted
 tarball:

     tar xf /srv/datasets/public_profiles_API-2.0_2017_10_json.tar.gz -O | jq -c . | rg '"person":' > /srv/datasets/public_profiles_1_2_json.all.json
-    time parallel --bar --pipepart -j8 -a /srv/datasets/public_profiles_1_2_json.all.json ./client.py import-orcid -
+    time parallel --bar --pipepart -j8 -a /srv/datasets/public_profiles_1_2_json.all.json ./fatcat_import.py import-orcid -

 Does not work:

-    ./client.py import-orcid /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3/0000-0001-5115-8623.json
+    ./fatcat_import.py import-orcid /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3/0000-0001-5115-8623.json

 Instead:

-    cat /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3/0000-0001-5115-8623.json | jq -c . | ./client.py import-orcid -
+    cat /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3/0000-0001-5115-8623.json | jq -c . | ./fatcat_import.py import-orcid -

 Or for many files:

-    find /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3 -iname '*.json' | parallel --bar jq -c . {} | rg '"person":' | ./client.py import-orcid -
+    find /data/orcid/partial/public_profiles_API-2.0_2017_10_json/3 -iname '*.json' | parallel --bar jq -c . {} | rg '"person":' | ./fatcat_import.py import-orcid -

 ### ORCID Performance

 for ~9k files:

-    (python-B2RYrks8) bnewbold@orithena$ time parallel --pipepart -j4 -a /data/orcid/partial/public_profiles_API-2.0_2017_10_json/all.json ./client.py import-orcid -
+    (python-B2RYrks8) bnewbold@orithena$ time parallel --pipepart -j4 -a /data/orcid/partial/public_profiles_API-2.0_2017_10_json/all.json ./fatcat_import.py import-orcid -

     real    0m15.294s
     user    0m28.112s
     sys     0m2.408s
     => 636/second

-    (python-B2RYrks8) bnewbold@orithena$ time ./client.py import-orcid /data/orcid/partial/public_profiles_API-2.0_2017_10_json/all.json
+    (python-B2RYrks8) bnewbold@orithena$ time ./fatcat_import.py import-orcid /data/orcid/partial/public_profiles_API-2.0_2017_10_json/all.json
     real    0m47.268s
     user    0m2.616s
     sys     0m0.104s
@@ -94,11 +94,11 @@ After some simple database tuning:

 From compressed:

-    xzcat /srv/datasets/crossref-works.2018-01-21.json.xz | time parallel -j20 --round-robin --pipe ./client.py import-crossref - /srv/datasets/20180216.ISSN-to-ISSN-L.txt
+    xzcat /srv/datasets/crossref-works.2018-01-21.json.xz | time parallel -j20 --round-robin --pipe ./fatcat_import.py import-crossref - /srv/datasets/20180216.ISSN-to-ISSN-L.txt

 ## Manifest

-    time ./client.py import-manifest /srv/datasets/idents_files_urls.sqlite
+    time ./fatcat_import.py import-manifest /srv/datasets/idents_files_urls.sqlite
     [...]
     Finished a batch; row 284518671 of 9669646 (2942.39%).  Total inserted: 6606900

diff --git a/python/client.py b/python/fatcat_import.py
index 2804a210..2804a210 100755
--- a/python/client.py
+++ b/python/fatcat_import.py
diff --git a/python/run.py b/python/fatcat_webface.py
index cfddad48..cfddad48 100755
--- a/python/run.py
+++ b/python/fatcat_webface.py
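Note on the ORCID commands touched by this diff: the README pipes each profile through `jq -c .` and a `"person":` filter before handing it to `./fatcat_import.py import-orcid -`. The sketch below shows roughly what that preprocessing does, in Python. It assumes the importer expects one compact JSON document per line on stdin, which the README implies but does not state. The helper names `compact_profile` and `person_lines` are hypothetical and are not part of the fatcat code.

```python
import json


def compact_profile(text):
    """Re-serialize one JSON document onto a single line, roughly like `jq -c .`."""
    # separators=(",", ":") drops the spaces json.dumps adds by default,
    # so keys render as "person": with no space, matching the grep pattern.
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def person_lines(documents):
    """Yield compact lines containing a "person" key (like `grep '"person":'`)."""
    for text in documents:
        line = compact_profile(text)
        if '"person":' in line:
            yield line


if __name__ == "__main__":
    # Made-up sample inputs for illustration; real ORCID profiles are much larger.
    pretty = '{\n  "person": {\n    "name": "example"\n  }\n}'
    other = '{"error": "not found"}'
    for line in person_lines([pretty, other]):
        print(line)
```

Like the `grep` in the README, the filter is a plain substring match on the serialized line, so it also keeps documents where `"person"` appears as a nested key.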
