 minio/README.md     | 7 +++++++
 nginx/README.md     | 4 +++-
 postgrest/README.md | 6 ++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/minio/README.md b/minio/README.md
index 8e8e29f..3ce0f95 100644
--- a/minio/README.md
+++ b/minio/README.md
@@ -11,6 +11,10 @@ Notable buckets, and structure/naming convention:
     grobid/
         2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
         SHA1 (lower-case hex) of PDF that XML was extracted from
+    unpaywall/grobid/
+        2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
+        SHA1 (lower-case hex) of PDF that XML was extracted from
+        (mirror of /grobid/ for which we crawled for unpaywall and made publicly accessible)
 
 Create new buckets like:
 
@@ -22,3 +26,6 @@ Create a new readonly user like:
 
     mc admin user add sandcrawler unpaywall $RANDOM_SECRET_KEY readonly
 
+Make a prefix within a bucket world-readable like:
+
+    mc policy set download sandcrawler/unpaywall/grobid
diff --git a/nginx/README.md b/nginx/README.md
index 8a3ee8e..0369f9b 100644
--- a/nginx/README.md
+++ b/nginx/README.md
@@ -2,6 +2,9 @@
 This folder contains nginx configs for partner access to sandcrawler DB
 (postgrest) and GROBID XML blobs (minio).
 
+`fatcat-blobs` is part of the fatcat.wiki ansible config, but included here to
+show how it works.
+
 ## Let's Encrypt
 
 As... bnewbold?
@@ -13,4 +16,3 @@ As... bnewbold?
         --webroot -w /var/www/letsencrypt \
         -d sandcrawler-minio.fatcat.wiki \
         -d sandcrawler-db.fatcat.wiki
-
diff --git a/postgrest/README.md b/postgrest/README.md
index 4774adb..b171614 100644
--- a/postgrest/README.md
+++ b/postgrest/README.md
@@ -118,3 +118,9 @@ Questions we might want to answer
 - load full fatcat file dump (TSV transform)
 - load dumpfilemeta
 
+## Example Useful Lookups
+
+
+    http get :3030/cdx?url=eq.https://coleccionables.mercadolibre.com.ar/arduino-pdf_Installments_NoInterest_BestSellers_YES
+    http get :3030/file_meta?sha1hex=eq.120582c855a7cc3c70a8527c560d7f27e6027278
+
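
Note on the naming convention in the minio hunk: the example key suggests a layout of the first two hex characters of the SHA1, then the next two, then the full digest with a `.tei.xml` suffix. A minimal Python sketch of that derivation follows. It is inferred from the single example path in the README; the function names `key_for_sha1` and `grobid_object_key` and the default `prefix` argument are hypothetical and are not part of sandcrawler.

```python
import hashlib

HEX_DIGITS = set("0123456789abcdef")


def key_for_sha1(sha1hex: str, prefix: str = "unpaywall/grobid") -> str:
    """Build an object key from a SHA1 hex digest (hypothetical helper).

    Assumed layout, inferred from the README example:
    <prefix>/<chars 0-1>/<chars 2-3>/<full sha1>.tei.xml
    """
    sha1hex = sha1hex.lower()  # the README specifies lower-case hex
    if len(sha1hex) != 40 or not set(sha1hex) <= HEX_DIGITS:
        raise ValueError(f"not a SHA1 hex digest: {sha1hex!r}")
    return f"{prefix}/{sha1hex[0:2]}/{sha1hex[2:4]}/{sha1hex}.tei.xml"


def grobid_object_key(pdf_bytes: bytes, prefix: str = "unpaywall/grobid") -> str:
    """Hash the PDF bytes and return the key of the XML extracted from it."""
    # hashlib's hexdigest() already returns lower-case hex.
    return key_for_sha1(hashlib.sha1(pdf_bytes).hexdigest(), prefix)


if __name__ == "__main__":
    print(key_for_sha1("2c0daa9307887a27054d4d1f137514b0fa6c6b2d"))
    # unpaywall/grobid/2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
```

Passing `prefix="grobid"` would give the key under the non-mirrored prefix, assuming both prefixes share the same layout as the README indicates.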