From ARXIV-PUBMEDCENTRAL-CRAWL-2020-04, on fatcat-prod1.
Test a small batch (first 200 requests):
zcat ingest_file_pmcid_20200424.json.gz | head -n200 | rg -v "\\\\" | jq . -c | kafkacat -P -b wbgrp-svc263.us.archive.org -t sandcrawler-prod.ingest-file-requests-bulk -p -1
Run the whole batch:
zcat ingest_file_pmcid_20200424.json.gz | rg -v "\\\\" | jq . -c | kafkacat -P -b wbgrp-svc263.us.archive.org -t sandcrawler-prod.ingest-file-requests-bulk -p -1
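Before pushing the whole batch, it can help to check how many requests the backslash filter will drop. A minimal sketch, assuming plain `grep` as a stand-in for `rg`; the function names `filter_backslashes` and `count_dropped` are hypothetical and not part of the original notes:

```shell
# Hypothetical helper: drop every line that contains a literal backslash,
# mirroring the `rg -v "\\\\"` step above (grep -F treats the pattern as a fixed string).
filter_backslashes() {
    # grep exits non-zero when nothing passes the filter; do not treat that as an error.
    grep -v -F '\' || true
}

# Hypothetical helper: print the number of input lines the filter would drop.
count_dropped() {
    # grep -c prints the count even when it is 0, but exits non-zero in that case.
    grep -c -F '\' || true
}
```

For example, `zcat ingest_file_pmcid_20200424.json.gz | count_dropped` would print the number of requests excluded from the batch, without sending anything to Kafka.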