author Bryan Newbold <bnewbold@archive.org> 2019-08-08 16:43:20 -0700
committer Bryan Newbold <bnewbold@archive.org> 2019-08-08 16:43:20 -0700
commit 51e73fa019577bb3b5443274767252c748d5773a
tree 847431e66ff7250bbaca02d862823dd520c46b3b
parent 48a802d42cff309543466a9f23245aa93c6d84ea
minio README
-rw-r--r-- minio/README.md 24
1 files changed, 24 insertions, 0 deletions
diff --git a/minio/README.md b/minio/README.md
new file mode 100644
index 0000000..8e8e29f
--- /dev/null
+++ b/minio/README.md
@@ -0,0 +1,24 @@
+
+minio is used as an S3-compatible blob store. The initial use case is GROBID XML
+documents, addressed by the SHA1 of the PDF file the XML was extracted from.
+
+Note that on the backend minio is just storing objects as files on disk.
+
+## Buckets
+
+Notable buckets, and structure/naming convention:
+
+    grobid/
+        2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
+        SHA1 (lower-case hex) of PDF that XML was extracted from
+
+Create new buckets like:
+
+    mc mb sandcrawler/grobid
+
+## Users
+
+Create a new readonly user like:
+
+    mc admin user add sandcrawler unpaywall $RANDOM_SECRET_KEY readonly
+
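
The bucket naming convention in the README hunk above (two directory levels taken from the first four hex characters of the SHA1, then the full hash plus `.tei.xml`) can be sketched in Python as follows. This is a minimal illustration, not code from the sandcrawler repository: the function names `grobid_object_key` and `sha1_of_pdf` are hypothetical, and lower-casing the input rather than rejecting upper-case hex is an assumption.

```python
import hashlib
import re

# A SHA1 digest is 40 lower-case hex characters.
_SHA1_RE = re.compile(r"[0-9a-f]{40}")


def sha1_of_pdf(pdf_bytes: bytes) -> str:
    """Return the lower-case hex SHA1 of the raw PDF bytes."""
    return hashlib.sha1(pdf_bytes).hexdigest()


def grobid_object_key(sha1hex: str) -> str:
    """Build the object key inside the grobid bucket for a PDF's SHA1.

    Hypothetical helper: mirrors the layout shown in the README,
    e.g. 2c/0d/2c0daa93...6b2d.tei.xml
    """
    # Assumption: normalize to lower case instead of raising on upper case.
    sha1hex = sha1hex.lower()
    if not _SHA1_RE.fullmatch(sha1hex):
        raise ValueError(f"not a 40-character hex SHA1: {sha1hex!r}")
    # First two hex chars, next two hex chars, then the full hash.
    return f"{sha1hex[0:2]}/{sha1hex[2:4]}/{sha1hex}.tei.xml"


if __name__ == "__main__":
    print(grobid_object_key("2c0daa9307887a27054d4d1f137514b0fa6c6b2d"))
```

Run directly, the script prints the same key as the README example. Splitting on the leading hex characters keeps any single directory from growing too large, which matters because minio stores objects as plain files on disk.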