minio is used as an S3-compatible blob store. The initial use case is GROBID XML
documents, addressed by the SHA1 of the PDF file the XML was extracted from.

Note that on the backend, minio just stores objects as files on disk.

## Buckets
Notable buckets, and structure/naming convention:

    grobid/
        2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
        SHA1 (lower-case hex) of PDF that XML was extracted from

    unpaywall/grobid/
        2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
        SHA1 (lower-case hex) of PDF that XML was extracted from
        (mirror of the /grobid/ objects for PDFs that we crawled for unpaywall, made publicly accessible)

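
The naming convention above can be sketched in Python as follows. This is a minimal illustration inferred from the example key, not code taken from the project: the function names `sha1_hex` and `grobid_object_key` are hypothetical, and the two-level `aa/bb/` prefix is assumed to be the first four hex characters of the SHA1.

```python
import hashlib


def sha1_hex(pdf_bytes: bytes) -> str:
    """Return the lower-case hex SHA1 of a PDF's raw bytes."""
    return hashlib.sha1(pdf_bytes).hexdigest()


def grobid_object_key(sha1hex: str) -> str:
    """Build the object key for a GROBID TEI-XML document.

    Hypothetical helper: the layout (two 2-character directory levels,
    then the full hash plus '.tei.xml') is inferred from the example key.
    """
    sha1hex = sha1hex.lower()
    if len(sha1hex) != 40 or any(c not in "0123456789abcdef" for c in sha1hex):
        raise ValueError(f"not a SHA1 hex digest: {sha1hex!r}")
    return f"{sha1hex[0:2]}/{sha1hex[2:4]}/{sha1hex}.tei.xml"


key = grobid_object_key("2c0daa9307887a27054d4d1f137514b0fa6c6b2d")
print(key)
```

The same key would apply inside the `grobid` bucket and under the `grobid/` prefix of the `unpaywall` bucket, since both listings show the same path structure.
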
Create new buckets like:

    mc mb sandcrawler/grobid

## Users
Create a new readonly user like:

    mc admin user add sandcrawler unpaywall $RANDOM_SECRET_KEY readonly

Make a prefix within a bucket world-readable like:

    mc policy set download sandcrawler/unpaywall/grobid