-rw-r--r-- minio/README.md 7
-rw-r--r-- nginx/README.md 4
-rw-r--r-- postgrest/README.md 6
3 files changed, 16 insertions, 1 deletions
diff --git a/minio/README.md b/minio/README.md
index 8e8e29f..3ce0f95 100644
--- a/minio/README.md
+++ b/minio/README.md
@@ -11,6 +11,10 @@ Notable buckets, and structure/naming convention:
grobid/
2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
SHA1 (lower-case hex) of PDF that XML was extracted from
+ unpaywall/grobid/
+ 2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
+ SHA1 (lower-case hex) of PDF that XML was extracted from
+ (mirror of the subset of /grobid/ that we crawled for unpaywall; publicly accessible)
Create new buckets like:
@@ -22,3 +26,6 @@ Create a new readonly user like:
mc admin user add sandcrawler unpaywall $RANDOM_SECRET_KEY readonly
+Make a prefix within a bucket world-readable like:
+
+ mc policy set download sandcrawler/unpaywall/grobid
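The object naming convention in the minio README can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the function name `grobid_object_key` is hypothetical, and the rule that the two directory levels are the first and second pairs of hex characters of the SHA1 is inferred from the single example path shown above.

```python
import re


def grobid_object_key(sha1hex: str, prefix: str = "grobid") -> str:
    """Build an object key such as grobid/2c/0d/<sha1hex>.tei.xml.

    Assumption: the two directory levels are the first two and the next
    two hex characters of the SHA1, as the README example suggests.
    """
    # The README specifies lower-case hex, so normalize before validating.
    sha1hex = sha1hex.lower()
    if not re.fullmatch(r"[0-9a-f]{40}", sha1hex):
        raise ValueError(f"not a 40-character hex SHA1: {sha1hex!r}")
    return f"{prefix}/{sha1hex[0:2]}/{sha1hex[2:4]}/{sha1hex}.tei.xml"


if __name__ == "__main__":
    print(grobid_object_key("2c0daa9307887a27054d4d1f137514b0fa6c6b2d"))
    print(grobid_object_key("2c0daa9307887a27054d4d1f137514b0fa6c6b2d",
                            prefix="unpaywall/grobid"))
```

Passing `prefix="unpaywall/grobid"` yields the key for the publicly readable mirror, so the same helper would cover both locations.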
diff --git a/nginx/README.md b/nginx/README.md
index 8a3ee8e..0369f9b 100644
--- a/nginx/README.md
+++ b/nginx/README.md
@@ -2,6 +2,9 @@
This folder contains nginx configs for partner access to sandcrawler DB
(postgrest) and GROBID XML blobs (minio).
+`fatcat-blobs` is part of the fatcat.wiki ansible config, but is included here to
+show how it works.
+
## Let's Encrypt
As... bnewbold?
@@ -13,4 +16,3 @@ As... bnewbold?
--webroot -w /var/www/letsencrypt \
-d sandcrawler-minio.fatcat.wiki \
-d sandcrawler-db.fatcat.wiki
-
diff --git a/postgrest/README.md b/postgrest/README.md
index 4774adb..b171614 100644
--- a/postgrest/README.md
+++ b/postgrest/README.md
@@ -118,3 +118,9 @@ Questions we might want to answer
- load full fatcat file dump (TSV transform)
- load dumpfilemeta
+## Example Useful Lookups
+
+
+ http get :3030/cdx?url=eq.https://coleccionables.mercadolibre.com.ar/arduino-pdf_Installments_NoInterest_BestSellers_YES
+ http get :3030/file_meta?sha1hex=eq.120582c855a7cc3c70a8527c560d7f27e6027278
+
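The two lookups above use HTTPie's `:3030` shorthand, which expands to `localhost:3030`, together with PostgREST's `column=eq.value` filter syntax. The same URLs could be built from Python roughly as follows. This is a sketch under stated assumptions: the helper name `postgrest_lookup_url` is hypothetical, the base URL assumes PostgREST is listening locally on port 3030 as in the commands above, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode


def postgrest_lookup_url(table: str, column: str, value: str,
                         base: str = "http://localhost:3030") -> str:
    """Build a PostgREST equality-filter URL: <base>/<table>?<column>=eq.<value>.

    The filter value is percent-encoded so that values which are
    themselves URLs (as in the cdx lookup) stay unambiguous.
    """
    query = urlencode({column: f"eq.{value}"})
    return f"{base}/{table}?{query}"


if __name__ == "__main__":
    # Mirrors the file_meta lookup from the README addition.
    print(postgrest_lookup_url(
        "file_meta", "sha1hex", "120582c855a7cc3c70a8527c560d7f27e6027278"))
    # Mirrors the cdx lookup; the URL value is percent-encoded.
    print(postgrest_lookup_url(
        "cdx", "url",
        "https://coleccionables.mercadolibre.com.ar/"
        "arduino-pdf_Installments_NoInterest_BestSellers_YES"))
```

Percent-encoding the value is a design choice for safety; the unencoded form used in the HTTPie commands is left exactly as the README shows it.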