author Bryan Newbold <bnewbold@archive.org> 2019-05-02 14:32:48 -0700
committer Bryan Newbold <bnewbold@archive.org> 2019-05-02 14:32:48 -0700
commit ab48873debef25fefac9a22b7a80d2aa742d0d96 (patch)
init with notes
3 files changed, 181 insertions, 0 deletions
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..81a4762
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,21 @@
+# Don't ignore this file itself
diff --git a/notes.md b/notes.md
new file mode 100644
index 0000000..0b94fbc
--- /dev/null
+++ b/notes.md
@@ -0,0 +1,76 @@
+Bryan's notes on setting up an "official" Tor onion service (formerly known as
+"hidden service") gateway for the Internet Archive.
+Goal is to have one (or more) onion service gateways running on IA hardware
+within the IA network to provide access to archive.org content and the wayback
+machine. Rough order of feature importance:
+- browse and download files from archive.org
+- browse wayback machine
+- read the blog
+- request wayback captures via SPN
+- create account and login to archive.org; upload files
+- access the S3 upload endpoint, eg via `ia` tool
+- browse/borrow/use openlibrary.org
+- fatcat.wiki (Bryan's paper project; can use as a demo/example)
+## Current Situation (May 2019)
+An external un-official volunteer runs an onion service:
+- <http://archivecrfip2lpi.onion/>
+- <http://archivebyd3rzt3ehjpm4c3bjkyxv3hjleiytnvxcn7x32psn2kxcuid.onion>
+- <https://www.hackerfactor.com/blog/index.php?/archives/750-Freedom-of-Information.html>
+- <https://www.hackerfactor.com/blog/index.php?/archives/762-Attacked-Over-Tor.html>
+- <https://www.hackerfactor.com/blog/index.php?/archives/763-The-Continuing-Tor-Attack.html>
+This is done via a custom PHP script:
+- <https://www.hackerfactor.com/src/iaproxy.php.txt>
+archive.is runs an onion service.
+## Onion Service Resources / Docs
+Riseup guide: <https://riseup.net/en/security/network-security/tor/onionservices-best-practices>
+EOTK ("Enterprise Onion ToolKit"): <https://github.com/alecmuffett/eotk>
+Tor Project onion service overview: <https://2019.www.torproject.org/docs/onion-services.html.en>
+Tor Project onion service v3 announce (November 2017): <https://blog.torproject.org/tors-fall-harvest-next-generation-onion-services>
+## IA-specific notes
+Running from the "office" network instead of "cluster" network might be best:
+local routing, but doesn't become a way to bypass our IP-range cluster
+- get a proper SSL EV certificate... wildcard? for the onion address
+- monitoring/alerting
+EOTK notes:
+- upgrading to the new onion services ("v3", "Prop 224", longer keys, better
+ crypto, etc) seems non-trivial: EOTK depends on onionbalance which depends on
+ stem. [EOTK issue](https://github.com/alecmuffett/eotk/issues/23),
+ [onionbalance issue](https://github.com/DonnchaC/onionbalance/issues/69), [onionbalance notes](https://onionbalance.readthedocs.io/en/latest/design.html#next-generation-onion-services-prop-224-compatibility)
+- need to enumerate all "sub-domain stems" (like us.archive.org), but
+ apparently not every host name
+- multi-machine configs are nice, though realistically in 2019 if any
+ datacenter is down then probably all of our services are
+ ("anti-high-availability"), so it doesn't help much if the onion service is
+ still up. The "hardmap 2" setup, where the second device might not even be
+ live/active but is ready in case of hardware issues with the first, might be
+ easiest
+### archive.org
+Lots of host names!
+### web.archive.org
+Multiple layers of re-write!
diff --git a/prototyping.md b/prototyping.md
new file mode 100644
index 0000000..e83bc57
--- /dev/null
+++ b/prototyping.md
@@ -0,0 +1,84 @@
+## wbgrp-svc206.us.archive.org Log
+A cluster VM, running Ubuntu 16.04.
+ sudo mkdir -p /srv/eotk
+ sudo chown bnewbold:bnewbold /srv/eotk
+ cd /srv/eotk
+ screen -S eotk
+ git clone https://github.com/alecmuffett/eotk.git src
+ cd src
+ ./opt.d/install-everything-on-ubuntu-16.04.sh
+This initially failed on:
+ gpgkeys: key A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89 can't be retrieved
+"yav" Following random internet things and trying:
+ # FAIL
+ gpg --keyserver pool.sks-keyservers.net --recv A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89
+I instead commented out the gpg key recv line from the script and did:
+ curl https://deb.torproject.org/torproject.org/A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89.asc | gpg --import
+Got a bunch of Python `bdist_wheel` issues; installed the build dependencies:
+ sudo apt install build-essential python3-dev python-dev
+This probably shouldn't be done on cluster machines with NFS home directories
+by regular (staff) user accounts; the python/pip mix of system/user/local
+installs is a big mess.
+Moving on:
+ ./eotk make-scripts
+Going to ignore the generated scripts for now.
+ cat archive_org.tconf
+ #set project archive_org
+ #hardmap %NEW_ONION% archive.org us
+ ./eotk config archive_org.tconf
+ cat archive_org.conf
+ #set project archive_org
+ #hardmap c6srwspz6764tcyn archive.org us
+ ./eotk start archive_org
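+The template step above can be sketched as a small shell helper. This is only
+an illustration: `make_tconf` is a hypothetical function, not part of EOTK,
+and it assumes the un-commented form of the `set project` and `hardmap`
+directives (the log above shows them with a leading `#`).

```shell
#!/bin/sh
# Hypothetical helper (not part of EOTK): print a template config for one
# project. "%NEW_ONION%" is the placeholder that "./eotk config" replaced
# with a generated onion address in the log above.
make_tconf() {
    project="$1"; domain="$2"; shift 2
    printf 'set project %s\n' "$project"
    # Remaining arguments are sub-domain stems (eg "us" for us.archive.org)
    printf 'hardmap %%NEW_ONION%% %s' "$domain"
    for stem in "$@"; do
        printf ' %s' "$stem"
    done
    printf '\n'
}

make_tconf archive_org archive.org us
```

+Redirecting the output to `archive_org.tconf` would give a file to pass to
+`./eotk config`, as in the log.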
+Browse to <https://www.c6srwspz6764tcyn.onion> in Tor Browser, accept a bunch
+of self-signed SSL certificate warnings and... it just works ?!?!?
+For wayback, <https://web.c6srwspz6764tcyn.onion>, or an example replay page
+Blog: <https://blog.c6srwspz6764tcyn.onion>
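+A command-line equivalent of these browser checks might look like the sketch
+below. It is untested against this setup and rests on assumptions: a local
+`tor` daemon with its SOCKS port on 127.0.0.1:9050, and `--insecure` to skip
+verification of the self-signed certificate. `onion_url` and `check_stem`
+are hypothetical helper names.

```shell
#!/bin/sh
ONION="c6srwspz6764tcyn.onion"   # generated address from the log above

# Hypothetical helper: map a sub-domain stem to its onion URL.
onion_url() {
    printf 'https://%s.%s/\n' "$1" "$ONION"
}

# Fetch only the response headers through the (assumed) local Tor SOCKS
# proxy. --socks5-hostname makes the proxy resolve the .onion name.
check_stem() {
    curl --silent --head --insecure \
         --socks5-hostname 127.0.0.1:9050 \
         "$(onion_url "$1")"
}

# List the URLs that would be checked; no network access happens here.
for stem in www web blog; do
    onion_url "$stem"
done
```

+With tor running, `check_stem www` would attempt the actual request.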
+Quick checks of things that work (some need digging to accept cert exceptions):
+- archive.org/download/.../...
+- general search and catalog view
+- audio playback
+- tv news archive
+- archive.org login (email/pass)
+- video playback
+- book reader
+- wayback replay (at least the basics)
+## Bugs / Issues
+The proxy also rewrites "archive.org" where it appears as plain text, not just
+in URLs. Eg, blog post title: "Official EU Agencies Falsely Report More Than
+550 c6srwspz6764tcyn.onion URLs as Terrorist Content" and "A blog from the
+team at c6srwspz6764tcyn.onion".
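+To illustrate why this happens, here is a toy comparison using `sed`. It is
+not EOTK's actual rewriting mechanism, only a sketch of the difference
+between a blanket string substitution and one restricted to URLs; the
+function names and the sample HTML are made up for the example.

```shell
#!/bin/sh
ONION="c6srwspz6764tcyn.onion"

# Blanket substitution: rewrites every occurrence, including visible text.
rewrite_all() {
    sed "s/archive\.org/${ONION}/g"
}

# Narrower substitution: only rewrite the domain when it follows a URL
# scheme (optionally with sub-domain labels in front of it).
rewrite_urls_only() {
    sed -E "s#(https?://([a-z0-9-]+\.)*)archive\.org#\1${ONION}#g"
}

html='<a href="https://blog.archive.org/">A blog from the team at archive.org</a>'
printf '%s\n' "$html" | rewrite_all
printf '%s\n' "$html" | rewrite_urls_only
```

+The first call rewrites both the link and the visible text; the second
+leaves the visible "archive.org" alone. A URL-only rule would miss bare
+host names in scripts or JSON, so it is a trade-off rather than a fix.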
+At least audio /details/ pages ask for canvas access; probably ok?
+Audio playback initially failed with: "This video file cannot be played.(Error
+Code: 224003)"; but this was just a cert exception issue. It worked after
+downloading an MP3 directly (to add the exception).
+Streaming large files (video) is slow to start, but no surprise there.