From ab48873debef25fefac9a22b7a80d2aa742d0d96 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Thu, 2 May 2019 14:32:48 -0700
Subject: init with notes

---
 .gitignore     | 21 +++++++++++++++
 notes.md       | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 prototyping.md | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 181 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 notes.md
 create mode 100644 prototyping.md

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..81a4762
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,21 @@
+*.o
+*.a
+*.pyc
+#*#
+*~
+*.swp
+.*
+*.tmp
+*.old
+*.profile
+*.bkp
+*.bak
+[Tt]humbs.db
+*.DS_Store
+build/
+_build/
+src/build/
+*.log
+
+# Don't ignore this file itself
+!.gitignore
diff --git a/notes.md b/notes.md
new file mode 100644
index 0000000..0b94fbc
--- /dev/null
+++ b/notes.md
@@ -0,0 +1,76 @@
+
+Bryan's notes on setting up an "official" Tor onion service (formerly known as
+"hidden service") gateway for the Internet Archive.
+
+Goal is to have one (or more) onion service gateways running on IA hardware
+within the IA network to provide access to archive.org content and the wayback
+machine. Rough order of feature importance:
+
+- browse and download files from archive.org
+- browse wayback machine
+- read the blog
+- request wayback captures via SPN
+- create account and login to archive.org; upload files
+- access the S3 upload endpoint, eg via `ia` tool
+- browse/borrow/use openlibrary.org
+- fatcat.wiki (bryan's paper project; can use as a demo/example)
+
+## Current Situation (May 2019)
+
+An external un-official volunteer runs an onion service:
+
+-
+-
+-
+-
+-
+
+This is done via a custom PHP script:
+
+-
+
+archive.is runs an onion service.
+
+## Onion Service Resources / Docs
+
+Riseup guide:
+
+EOTK ("Enterprise Onion ToolKit"):
+
+Tor Project onion service overview:
+
+Tor Project onion service v3 announce (November 2017):
+
+## IA-specific notes
+
+Running from the "office" network instead of "cluster" network might be best:
+local routing, but doesn't become a way to bypass our IP-range cluster
+firewalls.
+
+Tasks:
+
+- get a proper SSL EV certificate... wildcard? for the onion address
+- monitoring/alerting
+
+EOTK notes:
+
+- upgrading to the new onion services ("v3", "Prop 224", longer keys, better
+  crypto, etc) seems non-trivial: EOTK depends on onionbalance which depends on
+  stem. [EOTK issue](https://github.com/alecmuffett/eotk/issues/23),
+  [onionbalance issue](https://github.com/DonnchaC/onionbalance/issues/69), [onionbalance notes](https://onionbalance.readthedocs.io/en/latest/design.html#next-generation-onion-services-prop-224-compatibility)
+- need to enumerate all "sub-domain stems" (like us.archive.org), but
+  apparently not every host name
+- multi-machine configs are nice, though realistically in 2019 if any
+  datacenter is down then probably all of our services are
+  ("anti-high-availability"), so not much help if the onion service is up.
+  the "hardmap 2" setup, where the second device might not even be live/active,
+  but ready in case there are hardware issues with the first, might be easiest
+
+### archive.org
+
+Lots of host names!
+
+### web.archive.org
+
+Multiple layers of re-write!
+
diff --git a/prototyping.md b/prototyping.md
new file mode 100644
index 0000000..e83bc57
--- /dev/null
+++ b/prototyping.md
@@ -0,0 +1,84 @@
+
+## wbgrp-svc206.us.archive.org Log
+
+A cluster VM, running Ubuntu 16.04.
+
+    sudo mkdir -p /srv/eotk
+    sudo chown bnewbold:bnewbold /srv/eotk
+    cd /srv/eotk
+    screen -S eotk
+
+    git clone https://github.com/alecmuffett/eotk.git src
+    cd src
+    ./opt.d/install-everything-on-ubuntu-16.04.sh
+
+This initially failed on:
+
+    gpgkeys: key A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89 can't be retrieved
+
+"yav" Following random internet things and trying:
+
+    # FAIL
+    gpg --keyserver pool.sks-keyservers.net --recv A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89
+
+I instead commented out the gpg key recv line from the script and did:
+
+    curl https://deb.torproject.org/torproject.org/A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89.asc | gpg --import
+
+Got a bunch of python `bdist_wheel` issues.
+
+    sudo apt install build-essential python3-dev python-dev
+
+This probably shouldn't be done on cluster machines with NFS home by regular
+(staff) user accounts; python/pip and system/user/local is a big mess.
+
+Moving on:
+
+    ./eotk make-scripts
+
+Going to ignore those for now.
+
+    cat archive_org.tconf
+    #set project archive_org
+    #hardmap %NEW_ONION% archive.org us
+    ./eotk config archive_org.tconf
+
+    cat archive_org.conf
+    #set project archive_org
+    #hardmap c6srwspz6764tcyn archive.org us
+
+    ./eotk start archive_org
+
+Browse to in tor browser, accept a bunch
+of self-signed SSL errors and... it just works ?!?!?
+
+For wayback, , or an example replay page
+.
+
+Blog:
+
+Quick checks of things that work (some need cert accept digging):
+
+- archive.org/download/.../...
+- general search and catalog view
+- audio playback
+- tv news archive
+- archive.org login (email/pass)
+- video playback
+- book reader
+- wayback replay (at least the basics)
+
+## Bugs / Issues
+
+Rewriting of "archive.org" in plain text. Eg, blog post title: "Official EU
+Agencies Falsely Report More Than 550 c6srwspz6764tcyn.onion URLs as Terrorist
+Content" and "A blog from the team at c6srwspz6764tcyn.onion".
+
+At least audio /details/ pages ask for canvas access; probably ok?
+
+Audio playback initially: "This video file cannot be played.(Error Code:
+224003)"; but this was just a cert exception thing. Worked after downloading an
+MP3 (to make exception).
+
+Streaming large files (video) is slow to start, but no surprise there.
+
-- 
cgit v1.2.3