| Field | Value | Date |
|---|---|---|
| author | Bryan Newbold <bnewbold@archive.org> | 2019-05-02 14:32:48 -0700 |
| committer | Bryan Newbold <bnewbold@archive.org> | 2019-05-02 14:32:48 -0700 |
| commit | ab48873debef25fefac9a22b7a80d2aa742d0d96 | |
| tree | a61deaa5662656288170883cc9908e09b5953e80 | |
| download | ia-onion-service-ab48873debef25fefac9a22b7a80d2aa742d0d96.tar.gz, ia-onion-service-ab48873debef25fefac9a22b7a80d2aa742d0d96.zip | |

init with notes

| Mode | File | Changed lines |
|---|---|---|
| -rw-r--r-- | .gitignore | 21 |
| -rw-r--r-- | notes.md | 76 |
| -rw-r--r-- | prototyping.md | 84 |

3 files changed, 181 insertions, 0 deletions
```diff
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..81a4762
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,21 @@
+*.o
+*.a
+*.pyc
+#*#
+*~
+*.swp
+.*
+*.tmp
+*.old
+*.profile
+*.bkp
+*.bak
+[Tt]humbs.db
+*.DS_Store
+build/
+_build/
+src/build/
+*.log
+
+# Don't ignore this file itself
+!.gitignore
diff --git a/notes.md b/notes.md
new file mode 100644
index 0000000..0b94fbc
--- /dev/null
+++ b/notes.md
@@ -0,0 +1,76 @@
+
+Bryan's notes on setting up an "official" Tor onion service (formerly known as
+"hidden service") gateway for the Internet Archive.
+
+The goal is to have one (or more) onion service gateways running on IA hardware
+within the IA network to provide access to archive.org content and the Wayback
+Machine. Rough order of feature importance:
+
+- browse and download files from archive.org
+- browse the Wayback Machine
+- read the blog
+- request Wayback captures via SPN (Save Page Now)
+- create an account and log in to archive.org; upload files
+- access the S3 upload endpoint, e.g., via the `ia` tool
+- browse/borrow/use openlibrary.org
+- fatcat.wiki (Bryan's paper project; can use as a demo/example)
+
+## Current Situation (May 2019)
+
+An external, unofficial volunteer runs an onion service:
+
+- <http://archivecrfip2lpi.onion/>
+- <http://archivebyd3rzt3ehjpm4c3bjkyxv3hjleiytnvxcn7x32psn2kxcuid.onion>
+- <https://www.hackerfactor.com/blog/index.php?/archives/750-Freedom-of-Information.html>
+- <https://www.hackerfactor.com/blog/index.php?/archives/762-Attacked-Over-Tor.html>
+- <https://www.hackerfactor.com/blog/index.php?/archives/763-The-Continuing-Tor-Attack.html>
+
+This is done via a custom PHP script:
+
+- <https://www.hackerfactor.com/src/iaproxy.php.txt>
+
+archive.is runs an onion service.
+
+## Onion Service Resources / Docs
+
+Riseup guide: <https://riseup.net/en/security/network-security/tor/onionservices-best-practices>
+
+EOTK ("Enterprise Onion ToolKit"): <https://github.com/alecmuffett/eotk>
+
+Tor Project onion service overview: <https://2019.www.torproject.org/docs/onion-services.html.en>
+
+Tor Project onion service v3 announcement (November 2017): <https://blog.torproject.org/tors-fall-harvest-next-generation-onion-services>
+
+## IA-specific notes
+
+Running from the "office" network instead of the "cluster" network might be best:
+local routing, but it doesn't become a way to bypass our IP-range cluster
+firewalls.
+
+Tasks:
+
+- get a proper SSL EV certificate (wildcard?) for the onion address
+- monitoring/alerting
+
+EOTK notes:
+
+- upgrading to the new onion services ("v3", "Prop 224", longer keys, better
+  crypto, etc.) seems non-trivial: EOTK depends on onionbalance, which depends on
+  stem. [EOTK issue](https://github.com/alecmuffett/eotk/issues/23),
+  [onionbalance issue](https://github.com/DonnchaC/onionbalance/issues/69), [onionbalance notes](https://onionbalance.readthedocs.io/en/latest/design.html#next-generation-onion-services-prop-224-compatibility)
+- need to enumerate all "sub-domain stems" (like us.archive.org), but
+  apparently not every host name
+- multi-machine configs are nice, though realistically in 2019 if any
+  datacenter is down then probably all of our services are down too
+  ("anti-high-availability"), so keeping just the onion gateway up won't help much.
+  The "hardmap 2" setup, where the second device might not even be live/active
+  but is ready in case of hardware issues with the first, might be easiest.
+
+### archive.org
+
+Lots of host names!
+
+### web.archive.org
+
+Multiple layers of re-write!
+
diff --git a/prototyping.md b/prototyping.md
new file mode 100644
index 0000000..e83bc57
--- /dev/null
+++ b/prototyping.md
@@ -0,0 +1,84 @@
+
+## wbgrp-svc206.us.archive.org Log
+
+A cluster VM, running Ubuntu 16.04.
+
+    sudo mkdir -p /srv/eotk
+    sudo chown bnewbold:bnewbold /srv/eotk
+    cd /srv/eotk
+    screen -S eotk
+
+    git clone https://github.com/alecmuffett/eotk.git src
+    cd src
+    ./opt.d/install-everything-on-ubuntu-16.04.sh
+
+This initially failed on:
+
+    gpgkeys: key A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89 can't be retrieved
+
+"yav". Following random internet advice, I tried:
+
+    # FAIL
+    gpg --keyserver pool.sks-keyservers.net --recv A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89
+
+I instead commented out the gpg key recv line from the script and did:
+
+    curl https://deb.torproject.org/torproject.org/A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89.asc | gpg --import
+
+Got a bunch of python `bdist_wheel` issues.
+
+    sudo apt install build-essential python3-dev python-dev
+
+This probably shouldn't be done on cluster machines with NFS home directories by regular
+(staff) user accounts; mixing system/user/local python/pip installs is a big mess.
+
+Moving on:
+
+    ./eotk make-scripts
+
+Going to ignore those for now.
+
+    cat archive_org.tconf
+    #set project archive_org
+    #hardmap %NEW_ONION% archive.org us
+    ./eotk config archive_org.tconf
+
+    cat archive_org.conf
+    #set project archive_org
+    #hardmap c6srwspz6764tcyn archive.org us
+
+    ./eotk start archive_org
+
+Browse to <https://www.c6srwspz6764tcyn.onion> in Tor Browser, accept a bunch
+of self-signed SSL errors and... it just works ?!?!?
+
+For wayback, <https://web.c6srwspz6764tcyn.onion>, or an example replay page:
+<https://web.c6srwspz6764tcyn.onion/web/20151231235712/http://web.mit.edu/>.
+
+Blog: <https://blog.c6srwspz6764tcyn.onion>
+
+Quick checks of things that work (some need cert-exception digging):
+
+- archive.org/download/.../...
+- general search and catalog view
+- audio playback
+- TV News Archive
+- archive.org login (email/pass)
+- video playback
+- book reader
+- wayback replay (at least the basics)
+
+## Bugs / Issues
+
+"archive.org" gets rewritten in plain text too. E.g., blog post title: "Official EU
+Agencies Falsely Report More Than 550 c6srwspz6764tcyn.onion URLs as Terrorist
+Content" and "A blog from the team at c6srwspz6764tcyn.onion".
+
+At least audio /details/ pages ask for canvas access; probably OK?
+
+Audio playback initially: "This video file cannot be played.(Error Code:
+224003)"; but this was just a cert exception issue. It worked after downloading an
+MP3 (to add the cert exception).
+
+Streaming large files (video) is slow to start, but no surprise there.
+
```
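The EOTK notes in notes.md flag the move from 16-character ("v2") to 56-character ("v3", Prop 224) onion names, and the volunteer service is listed under one of each. As a rough illustration of what changes, here is a minimal Python sketch, assuming the v3 address encoding from Tor's rend-spec-v3: base32 of the 32-byte ed25519 public key, followed by a 2-byte SHA3-256 checksum and a version byte of 3. The helper names `v3_address` and `onion_version` are hypothetical and are not part of EOTK, onionbalance, or stem.

```python
import base64
import hashlib

# Hypothetical helpers. Encoding per Tor's rend-spec-v3 (Prop 224):
#   onion_address = base32(PUBKEY | CHECKSUM | VERSION) + ".onion"
#   CHECKSUM      = SHA3-256(".onion checksum" | PUBKEY | VERSION)[:2]
V3_VERSION = b"\x03"


def v3_checksum(pubkey: bytes) -> bytes:
    """First two bytes of SHA3-256 over the checksum prefix, key, and version."""
    return hashlib.sha3_256(b".onion checksum" + pubkey + V3_VERSION).digest()[:2]


def v3_address(pubkey: bytes) -> str:
    """Encode a 32-byte ed25519 public key as a v3 .onion hostname."""
    if len(pubkey) != 32:
        raise ValueError("v3 onion keys are 32 bytes")
    raw = pubkey + v3_checksum(pubkey) + V3_VERSION  # 35 bytes -> 56 base32 chars
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def onion_version(hostname: str) -> str:
    """Return "v2", "v3", or "invalid" for a .onion hostname (subdomains allowed)."""
    host = hostname.lower()
    if not host.endswith(".onion"):
        return "invalid"
    # Keep only the label directly before ".onion", e.g. drop "www." or "web."
    label = host[: -len(".onion")].split(".")[-1]
    try:
        raw = base64.b32decode(label, casefold=True)
    except ValueError:  # binascii.Error (bad alphabet or padding) subclasses ValueError
        return "invalid"
    if len(raw) == 10:
        # v2: a truncated SHA-1 of an RSA key; nothing to verify without the key
        return "v2"
    if len(raw) == 35:
        pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
        if version == V3_VERSION and checksum == v3_checksum(pubkey):
            return "v3"
    return "invalid"
```

Only the v3 form carries a checksum that can be verified offline; for a v2 name such as `archivecrfip2lpi.onion` the sketch can only check the length and base32 alphabet.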
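The first bug listed in prototyping.md comes from the gateway rewriting response bodies so that links point back at the onion: a blanket substitution also hits prose such as the blog tagline. A minimal Python sketch of the difference, assuming a simple regex that only rewrites `archive.org` (and any subdomain stem in front of it) when it directly follows `//` in a URL. `naive_rewrite`, `url_only_rewrite`, and the regex are hypothetical illustrations, not how EOTK actually implements its rewriting.

```python
import re

# The prototype's onion name from the notes above.
ONION = "c6srwspz6764tcyn.onion"


def naive_rewrite(body: str) -> str:
    """Blanket substitution: also hits plain-text mentions of archive.org."""
    return body.replace("archive.org", ONION)


# archive.org (plus any subdomains) only when it is the host part of a URL,
# i.e. directly after "//" in http://, https://, or protocol-relative links.
URL_HOST = re.compile(r"(?<=//)((?:[a-z0-9-]+\.)*)archive\.org\b", re.IGNORECASE)


def url_only_rewrite(body: str) -> str:
    """Rewrite only URL hosts, keeping the subdomain stem (web., blog., ...)."""
    return URL_HOST.sub(lambda m: m.group(1) + ONION, body)


if __name__ == "__main__":
    page = ('<p>A blog from the team at archive.org</p>'
            '<a href="https://blog.archive.org/feed/">feed</a>')
    print(naive_rewrite(page))
    print(url_only_rewrite(page))
```

The trade-off is coverage: a bare `archive.org` string that page scripts later assemble into a URL would be missed by the URL-only version, which is a plausible reason a blanket rewrite is the simpler default.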