author | Bryan Newbold <bnewbold@archive.org> | 2019-09-20 20:04:53 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2019-09-20 20:04:53 -0700 |
commit | a139bcc56911e83ecba55ab6474d6aa867d9d02f (patch) | |
tree | 80ba87d11d7bc42e35c8fac8b5ce997d7b211acc | |
parent | f89906a442977a48f99bbb8b52ba7c60ec366c89 (diff) | |
update service docs
-rw-r--r-- | minio/README.md | 7 |
-rw-r--r-- | nginx/README.md | 4 |
-rw-r--r-- | postgrest/README.md | 6 |
3 files changed, 16 insertions, 1 deletions
diff --git a/minio/README.md b/minio/README.md
index 8e8e29f..3ce0f95 100644
--- a/minio/README.md
+++ b/minio/README.md
@@ -11,6 +11,10 @@ Notable buckets, and structure/naming convention:
     grobid/
         2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
         SHA1 (lower-case hex) of PDF that XML was extracted from
+    unpaywall/grobid/
+        2c/0d/2c0daa9307887a27054d4d1f137514b0fa6c6b2d.tei.xml
+        SHA1 (lower-case hex) of PDF that XML was extracted from
+        (mirror of /grobid/ for which we crawled for unpaywall and made publicly accessible)
 
 Create new buckets like:
 
@@ -22,3 +26,6 @@ Create a new readonly user like:
 
     mc admin user add sandcrawler unpaywall $RANDOM_SECRET_KEY readonly
 
+Make a prefix within a bucket world-readable like:
+
+    mc policy set download sandcrawler/unpaywall/grobid
diff --git a/nginx/README.md b/nginx/README.md
index 8a3ee8e..0369f9b 100644
--- a/nginx/README.md
+++ b/nginx/README.md
@@ -2,6 +2,9 @@
 This folder contains nginx configs for partner access to sandcrawler DB
 (postgrest) and GROBID XML blobs (minio).
 
+`fatcat-blobs` is part of the fatcat.wiki ansible config, but included here to
+show how it works.
+
 ## Let's Encrypt
 
 As... bnewbold?
@@ -13,4 +16,3 @@ As... bnewbold?
     --webroot -w /var/www/letsencrypt \
     -d sandcrawler-minio.fatcat.wiki \
     -d sandcrawler-db.fatcat.wiki
-
diff --git a/postgrest/README.md b/postgrest/README.md
index 4774adb..b171614 100644
--- a/postgrest/README.md
+++ b/postgrest/README.md
@@ -118,3 +118,9 @@ Questions we might want to answer
 - load full fatcat file dump (TSV transform)
 - load dumpfilemeta
 
+## Example Useful Lookups
+
+
+    http get :3030/cdx?url=eq.https://coleccionables.mercadolibre.com.ar/arduino-pdf_Installments_NoInterest_BestSellers_YES
+    http get :3030/file_meta?sha1hex=eq.120582c855a7cc3c70a8527c560d7f27e6027278
+
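
The object naming convention in the minio README and the lookups in the postgrest README can be sketched in Python. This is an illustrative sketch only, not code from the sandcrawler repository: the helper names `grobid_object_key` and `postgrest_eq_url` are hypothetical, and the base URL `http://localhost:3030` is an assumption based on httpie expanding `:3030` to localhost.

```python
import re
from urllib.parse import urlencode

# Lower-case hex SHA1, as the minio README requires.
SHA1_RE = re.compile(r"^[0-9a-f]{40}$")


def grobid_object_key(sha1hex: str, prefix: str = "grobid") -> str:
    """Build '<prefix>/<aa>/<bb>/<sha1hex>.tei.xml' (hypothetical helper).

    The two directory levels are the first and second byte (hex pairs)
    of the PDF's SHA1, matching the example path in the README.
    """
    if not SHA1_RE.match(sha1hex):
        raise ValueError("expected a 40-character lower-case hex SHA1")
    return f"{prefix}/{sha1hex[0:2]}/{sha1hex[2:4]}/{sha1hex}.tei.xml"


def postgrest_eq_url(base_url: str, table: str, column: str, value: str) -> str:
    """Build a PostgREST equality-filter URL: <base>/<table>?<column>=eq.<value>.

    The value is percent-encoded, so URLs can be passed as lookup values.
    """
    query = urlencode({column: "eq." + value})
    return f"{base_url.rstrip('/')}/{table}?{query}"


if __name__ == "__main__":
    print(grobid_object_key("2c0daa9307887a27054d4d1f137514b0fa6c6b2d"))
    print(grobid_object_key("2c0daa9307887a27054d4d1f137514b0fa6c6b2d",
                            prefix="unpaywall/grobid"))
    # Assumed base URL; adjust to wherever PostgREST is actually listening.
    print(postgrest_eq_url("http://localhost:3030", "file_meta", "sha1hex",
                           "120582c855a7cc3c70a8527c560d7f27e6027278"))
```

Rejecting anything that is not lower-case hex keeps keys consistent with the README's stated convention; whether the real services enforce this is not stated in the diff.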