From ARXIV-PUBMEDCENTRAL-CRAWL-2020-04, on fatcat-prod1.

Test with a small batch (first 200 lines):

    zcat ingest_file_pmcid_20200424.json.gz | head -n200 | rg -v "\\\\" | jq . -c | kafkacat -P -b wbgrp-svc263.us.archive.org -t sandcrawler-prod.ingest-file-requests-bulk -p -1

Run the whole batch:

    zcat ingest_file_pmcid_20200424.json.gz | rg -v "\\\\" | jq . -c | kafkacat -P -b wbgrp-svc263.us.archive.org -t sandcrawler-prod.ingest-file-requests-bulk -p -1
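
Before pushing the whole batch, it can be useful to check how many request lines the backslash filter keeps. The sketch below is not part of the original run: the helper names (`drop_backslash_lines`, `count_kept_lines`, `sample_requests`) and the sample JSON lines are hypothetical, and plain `grep -v -F` stands in for `rg -v "\\\\"` so that it runs without ripgrep installed.

```shell
# Hypothetical helper: drop every line that contains a literal backslash.
# This mirrors the `rg -v "\\\\"` step above, using grep with a fixed-string pattern.
drop_backslash_lines() {
    # grep exits non-zero when nothing is printed; `|| true` keeps pipelines going.
    grep -v -F '\' || true
}

# Hypothetical helper: count how many lines survive the filter.
count_kept_lines() {
    drop_backslash_lines | wc -l | tr -d ' '
}

# Made-up sample input (NOT real ingest requests); the second line contains backslashes.
sample_requests() {
    printf '%s\n' \
        '{"base_url": "https://example.com/a", "ingest_type": "pdf"}' \
        '{"base_url": "https://example.com/b\\u0000", "ingest_type": "pdf"}' \
        '{"base_url": "https://example.com/c", "ingest_type": "pdf"}'
}

# Prints 2: only the two lines without a backslash are kept.
sample_requests | count_kept_lines
```

Against the real file, the same count could be taken with `zcat ingest_file_pmcid_20200424.json.gz | count_kept_lines` and compared with `zcat ingest_file_pmcid_20200424.json.gz | wc -l` to see how many requests the filter drops.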