author | bnewbold <bnewbold@archive.org> | 2020-03-20 05:00:44 +0000 |
---|---|---|
committer | bnewbold <bnewbold@archive.org> | 2020-03-20 05:00:44 +0000 |
commit | e5ad7bddbcb55471b96ce30397ed85fe98e3b098 (patch) | |
tree | 4dac48368f0e19d1b9f2a51a74e0f5fce9d86925 /kafka/topics.md | |
parent | 2ede359095660b8b0906cd26fe8eca2a6f429010 (diff) | |
parent | 750b5d4c53d1075ddd31c3357dd5f690eb5951e0 (diff) | |
download | sandcrawler-e5ad7bddbcb55471b96ce30397ed85fe98e3b098.tar.gz sandcrawler-e5ad7bddbcb55471b96ce30397ed85fe98e3b098.zip |
Merge branch 'martin-pubmed-ftp-topic-docs' into 'master'
topics: add pubmed ftp topic
See merge request webgroup/sandcrawler!26
Diffstat (limited to 'kafka/topics.md')
-rw-r--r-- | kafka/topics.md | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/kafka/topics.md b/kafka/topics.md
index 0ce8610..9cd43bd 100644
--- a/kafka/topics.md
+++ b/kafka/topics.md
@@ -55,8 +55,15 @@ retention (on both a size and time basis).
         => ~1TB capacity; 8x crossref partitions, 4x datacite
         => key compaction possible

+    fatcat-ENV.ftp-pubmed
+        => new citations from FTP server, from: ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/
+        => raw XML, one record per message (PubmedArticle, up to 25k records/day and 650MB/day)
+        => key: PMID
+        => key compaction possible
+
     fatcat-ENV.api-crossref-state
     fatcat-ENV.api-datacite-state
+    fatcat-ENV.ftp-pubmed-state
     fatcat-ENV.oaipmh-pubmed-state
     fatcat-ENV.oaipmh-arxiv-state
     fatcat-ENV.oaipmh-doaj-journals-state (DISABLED)
@@ -135,11 +142,12 @@ exists`; this seems safe, and the settings won't be over-ridden.

     ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 8 --topic fatcat-qa.api-crossref
     ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 8 --topic fatcat-qa.api-datacite --config cleanup.policy=compact
+    ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 8 --topic fatcat-qa.ftp-pubmed --config cleanup.policy=compact
     ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.api-crossref-state
     ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.api-datacite-state
+    ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.ftp-pubmed-state
     ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 4 --topic fatcat-qa.oaipmh-pubmed
     ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 4 --topic fatcat-qa.oaipmh-arxiv
     ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.oaipmh-pubmed-state
     ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.oaipmh-arxiv-state
-
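
The two commands this change adds differ only in partition count and cleanup policy: the content topic is keyed by PMID and compacted, while the state topic has a single partition. A minimal sketch of how they could be generated per environment for review before running is below. The function name `pubmed_topic_cmds` and the `topic_env` variable are hypothetical and not part of sandcrawler; the flags are exactly those shown in the diff, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not part of sandcrawler):
# print the two topic-creation commands from this commit for one environment.
pubmed_topic_cmds() {
    topic_env="$1"   # fills the ENV slot in fatcat-ENV.*, e.g. "qa"
    base="./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2"

    # Content topic: one PubmedArticle XML record per message, keyed by PMID,
    # so compaction can keep only the latest record for each key.
    echo "$base --partitions 8 --topic fatcat-${topic_env}.ftp-pubmed --config cleanup.policy=compact"

    # State topic: a single partition, like the other -state topics.
    echo "$base --partitions 1 --topic fatcat-${topic_env}.ftp-pubmed-state"
}

pubmed_topic_cmds qa
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be inspected and then pasted into a shell on a broker host.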