author     bnewbold <bnewbold@archive.org>  2020-03-20 05:00:44 +0000
committer  bnewbold <bnewbold@archive.org>  2020-03-20 05:00:44 +0000
commit     e5ad7bddbcb55471b96ce30397ed85fe98e3b098
tree       4dac48368f0e19d1b9f2a51a74e0f5fce9d86925 /kafka
parent     2ede359095660b8b0906cd26fe8eca2a6f429010
parent     750b5d4c53d1075ddd31c3357dd5f690eb5951e0
Merge branch 'martin-pubmed-ftp-topic-docs' into 'master'

topics: add pubmed ftp topic

See merge request webgroup/sandcrawler!26

Diffstat (limited to 'kafka')
-rw-r--r-- kafka/topics.md 10
1 files changed, 9 insertions, 1 deletions
diff --git a/kafka/topics.md b/kafka/topics.md
index 0ce8610..9cd43bd 100644
--- a/kafka/topics.md
+++ b/kafka/topics.md
@@ -55,8 +55,15 @@ retention (on both a size and time basis).
=> ~1TB capacity; 8x crossref partitions, 4x datacite
=> key compaction possible
+ fatcat-ENV.ftp-pubmed
+ => new citations from FTP server, from: ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/
+ => raw XML, one record per message (PubmedArticle, up to 25k records/day and 650MB/day)
+ => key: PMID
+ => key compaction possible
+
fatcat-ENV.api-crossref-state
fatcat-ENV.api-datacite-state
+ fatcat-ENV.ftp-pubmed-state
fatcat-ENV.oaipmh-pubmed-state
fatcat-ENV.oaipmh-arxiv-state
fatcat-ENV.oaipmh-doaj-journals-state (DISABLED)
@@ -135,11 +142,12 @@ exists`; this seems safe, and the settings won't be over-ridden.
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 8 --topic fatcat-qa.api-crossref
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 8 --topic fatcat-qa.api-datacite --config cleanup.policy=compact
+ ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 8 --topic fatcat-qa.ftp-pubmed --config cleanup.policy=compact
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.api-crossref-state
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.api-datacite-state
+ ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.ftp-pubmed-state
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 4 --topic fatcat-qa.oaipmh-pubmed
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 4 --topic fatcat-qa.oaipmh-arxiv
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.oaipmh-pubmed-state
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic fatcat-qa.oaipmh-arxiv-state
-
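The new `fatcat-ENV.ftp-pubmed` topic holds one raw `PubmedArticle` XML record per message, keyed by PMID. The sketch below shows one way an update file could be split into that shape for a line-oriented producer. It is a hypothetical helper, not sandcrawler's actual harvester, and it assumes pretty-printed input where `<PubmedArticle>` and `</PubmedArticle>` each sit on their own line. Because it joins each record onto a single line, the message is not byte-identical to the source XML.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sandcrawler): emit one line per
# <PubmedArticle> record as "PMID<TAB>XML", so a line-oriented Kafka
# producer can use the PMID as the message key.
# Assumptions: <PubmedArticle> and </PubmedArticle> are on their own lines;
# newlines and leading indentation inside a record are collapsed to single
# spaces so the whole record fits on one line.
pubmed_to_keyed_lines() {
  awk '
    /<PubmedArticle>/ { inrec = 1; rec = ""; pmid = "" }
    inrec {
      # The first <PMID> in a record is the article PMID; later ones
      # (e.g. in CommentsCorrections) point at other articles.
      if (pmid == "" && match($0, /<PMID[^>]*>[0-9]+<\/PMID>/)) {
        tag = substr($0, RSTART, RLENGTH)
        gsub(/<[^>]*>/, "", tag)
        pmid = tag
      }
      sub(/^[ \t]+/, "")
      rec = (rec == "" ? $0 : rec " " $0)
    }
    /<\/PubmedArticle>/ && inrec {
      # Records without a PMID are silently skipped.
      if (pmid != "") print pmid "\t" rec
      inrec = 0
    }
  '
}
```

As an illustration only (the broker address and file name are placeholders), the output could be piped to kcat (formerly kafkacat) with a tab key delimiter: `zcat updatefile.xml.gz | pubmed_to_keyed_lines | kcat -P -b localhost:9092 -t fatcat-qa.ftp-pubmed -K$'\t'`. Keying on PMID is what makes the `cleanup.policy=compact` setting useful here: a later revision of the same article replaces the earlier one during compaction.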
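The diff only adds the `fatcat-qa` creation commands, while the topic list is written in terms of `fatcat-ENV`. A small hypothetical helper can print the equivalent commands for any environment name. It copies the ZooKeeper address, replication factor, and partition counts from the QA commands above; other clusters may need different values. It also prints a `--describe` command for checking that the compaction override was applied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the kafka-topics.sh commands for the
# PubMed FTP topics in environment ENV. Settings mirror the fatcat-qa
# commands in topics.md; adjust them for other clusters.
pubmed_ftp_topic_cmds() {
  if [ -z "${1:-}" ]; then
    echo "usage: pubmed_ftp_topic_cmds ENV" >&2
    return 1
  fi
  local env="$1"
  local base="./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2"
  # Record topic: 8 partitions, compacted on the PMID key.
  echo "$base --partitions 8 --topic fatcat-${env}.ftp-pubmed --config cleanup.policy=compact"
  # State topic: a single partition, like the other *-state topics.
  echo "$base --partitions 1 --topic fatcat-${env}.ftp-pubmed-state"
  # Check afterwards: the Configs column should list cleanup.policy=compact.
  echo "./kafka-topics.sh --describe --zookeeper localhost:2181 --topic fatcat-${env}.ftp-pubmed"
}
```

For example, `pubmed_ftp_topic_cmds qa` reproduces the two commands added in this diff. For another environment (a hypothetical `prod`), review the printed commands before piping them to `sh` from the Kafka `bin` directory. Printing rather than executing keeps the original "create, then check" workflow explicit.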