author: Bryan Newbold <bnewbold@archive.org> 2022-12-23 15:52:02 -0800
committer: Bryan Newbold <bnewbold@archive.org> 2022-12-23 15:52:02 -0800
commit: f3a721a9dce8801b78f7bc31e88dc912b0ec1dba
tree: fdae9373e78671d0031f83045e6c76de9ad616cf /blobs/minio
parent: 8c2c354a74064f2d66644af8f4e44d74bf322e1f
move a bunch of top-level files/directories to ./extra/
Diffstat (limited to 'blobs/minio')
-rw-r--r-- blobs/minio/README.md | 74
-rw-r--r-- blobs/minio/minio.conf | 14
2 files changed, 0 insertions, 88 deletions
diff --git a/blobs/minio/README.md b/blobs/minio/README.md
deleted file mode 100644
index d8f1c69..0000000
--- a/blobs/minio/README.md
+++ /dev/null
@@ -1,74 +0,0 @@
-
-minio is used as an S3-compatible blob store. Initial use case is GROBID XML
-documents, addressed by the sha1 of the PDF file the XML was extracted from.
-
-Note that on the backend minio is just storing objects as files on disk.
-
-## Deploying minio Server
-
-It seems to be important to use a version of minio from at least December 2019
-era for on-disk compression to actually work.
-
-Currently install minio (and mc, the minio client) in prod by simply
-downloading the binaries and calling from systemd.
-
-## Buckets and Directories
-
-Hosts and buckets:
-
-    localhost:sandcrawler-dev
-        create locally for development (see below)
-
-    cluster:sandcrawler
-        main sandcrawler storage bucket, for GROBID output and other derivatives.
-        Note it isn't "sandcrawler-prod", for backwards compatibility reasons.
-
-    cluster:sandcrawler-qa
-        for, eg, testing on cluster servers
-
-    cluster:unpaywall
-        subset of sandcrawler content crawled due to unpaywall URLs;
-        potentially made publicly accessible
-
-Directory structure within sandcrawler buckets:
-
-    grobid/2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
-        SHA1 (lower-case hex) of PDF that XML was extracted from
-
-Create new buckets like:
-
-    mc mb cluster/sandcrawler-qa
-
-## Development
-
-Run minio server locally, with non-persisted data:
-
-    docker run -p 9000:9000 minio/minio server /data
-
-Credentials are `minioadmin:minioadmin`. Install `mc` client utility, and
-configure:
-
-    mc config host add localhost http://localhost:9000 minioadmin minioadmin
-
-Then create dev bucket:
-
-    mc mb --ignore-existing localhost/sandcrawler-dev
-
-A common "gotcha" with `mc` command is that it will first look for a local
-folder/directory with same name as the configured remote host, so make sure
-there isn't a `./localhost` folder.
-
-
-## Users
-
-Create a new readonly user like:
-
-    mc admin user add sandcrawler unpaywall $RANDOM_SECRET_KEY readonly
-
-Make a prefix within a bucket world-readable like:
-
-    mc policy set download cluster/unpaywall/grobid
-
-## Config
-
-    mc admin config set aitio compression extensions=.txt,.log,.csv,.json,.tsv,.pdf,.xml mime_types=text/csv,text/plain,application/json,application/xml,application/octet-stream,application/tei+xml
diff --git a/blobs/minio/minio.conf b/blobs/minio/minio.conf
deleted file mode 100644
index 2e93f9a..0000000
--- a/blobs/minio/minio.conf
+++ /dev/null
@@ -1,14 +0,0 @@
-
-# Volume to be used for MinIO server.
-MINIO_VOLUMES="/sandcrawler-minio/data"
-# Use if you want to run MinIO on a custom port.
-MINIO_OPTS="--address :9000"
-# Access Key of the server.
-MINIO_ACCESS_KEY=REDACTED
-# Secret key of the server.
-MINIO_SECRET_KEY=REDACTED
-
-# may need to set these manually using `mc admin config get`, edit the JSON, then `set`
-MINIO_COMPRESS="on"
-MINIO_COMPRESS_EXTENSIONS=".txt,.log,.csv,.json,.tar,.xml,.bin,.pdf,.tsv"
-MINIO_COMPRESS_MIME_TYPES="text/*,application/json,application/xml,application/pdf,application/octet-stream"
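
The removed README addresses GROBID output by the SHA-1 of the source PDF, with a single example key: `grobid/2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml`. The following is a minimal Python sketch of how such a key could be derived. It assumes the two directory levels are simply the first and second pair of hex characters of the lower-case digest, which is what the one example suggests. The function names are hypothetical and are not taken from the sandcrawler code.

```python
import hashlib
import re

# Assumption: a valid identifier is a 40-character lower-case hex SHA-1 digest.
_SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def grobid_key_from_sha1(sha1hex: str) -> str:
    """Build the object key for a GROBID TEI-XML document (hypothetical helper).

    Layout inferred from the README example:
    grobid/<hex chars 0-1>/<hex chars 2-3>/<full sha1>.tei.xml
    """
    sha1hex = sha1hex.strip().lower()
    if not _SHA1_HEX.fullmatch(sha1hex):
        raise ValueError(f"not a 40-character hex SHA-1: {sha1hex!r}")
    return f"grobid/{sha1hex[0:2]}/{sha1hex[2:4]}/{sha1hex}.tei.xml"


def grobid_key_for_pdf(pdf_bytes: bytes) -> str:
    """Hash the raw PDF bytes and return the corresponding object key."""
    return grobid_key_from_sha1(hashlib.sha1(pdf_bytes).hexdigest())
```

Calling `grobid_key_from_sha1("2c0daa9307887a27054d4d1f137514b0fa6c6b2d")` reproduces the key shown in the README. The bucket name (for example `sandcrawler` or `sandcrawler-dev`) is passed separately to whichever S3 client is used and is not part of the key.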