Bryan's notes on setting up an "official" Tor onion service (formerly known as "hidden service") gateway for the Internet Archive.

Goal is to have one (or more) onion service gateways running on IA hardware within the IA network to provide access to archive.org content and the wayback machine.

Rough order of feature importance:

- browse and download files from archive.org
- browse wayback machine
- read the blog
- request wayback captures via SPN
- create account and login to archive.org; upload files
- access the S3 upload endpoint, eg via `ia` tool
- browse/borrow/use openlibrary.org
- fatcat.wiki (bryan's paper project; can use as a demo/example)

## Current Situation (May 2019)

An external un-official volunteer runs an onion service:

-
-
-
-
-

This is done via a custom PHP script:

-

archive.is runs an onion service.

## Onion Service Resources / Docs

Riseup guide:

EOTK ("Enterprise Onion ToolKit"):

Tor Project onion service overview:

Tor Project onion service v3 announce (November 2017):

## IA-specific notes

Running from the "office" network instead of the "cluster" network might be best: local routing, but doesn't become a way to bypass our IP-range cluster firewalls.

Tasks:

- get a proper SSL EV certificate... wildcard? for the onion address
- monitoring/alerting

EOTK notes:

- upgrading to the new onion services ("v3", "Prop 224", longer keys, better crypto, etc) seems non-trivial: EOTK depends on onionbalance which depends on stem.
  [EOTK issue](https://github.com/alecmuffett/eotk/issues/23),
  [onionbalance issue](https://github.com/DonnchaC/onionbalance/issues/69),
  [onionbalance notes](https://onionbalance.readthedocs.io/en/latest/design.html#next-generation-onion-services-prop-224-compatibility)
- need to enumerate all "sub-domain stems" (like us.archive.org), but apparently not every host name
- multi-machine configs are nice, though realistically in 2019 if any datacenter is down then probably all of our services are ("anti-high-availability"), so it doesn't help much if the onion service itself stays up. The "hardmap 2" setup, where the second device might not even be live/active, but is ready in case there are hardware issues with the first, might be easiest

### archive.org

Lots of host names!

### web.archive.org

Multiple layers of re-write!
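
One layer that looks easy to get wrong: wayback URLs embed the original captured URL in the path (`/web/<timestamp>/<original-url>`), so a gateway rewrite should probably only touch the outer host and leave the embedded URL alone. A minimal Python sketch of that idea follows. Everything in it is an assumption for illustration: `exampleonionaddress.onion` is a made-up placeholder (not a real onion address), `to_onion_url` is a hypothetical helper (not part of EOTK), and sub-domains are assumed to carry over as labels in front of the onion address.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical placeholder, NOT a real onion address.
ONION_BASE = "exampleonionaddress.onion"
CLEARNET_BASE = "archive.org"


def to_onion_url(url):
    """Rewrite only the outer host of an archive.org URL to the onion host.

    The path is passed through untouched, so an original URL embedded in a
    wayback path (/web/<timestamp>/<original-url>) is not rewritten.
    Any user:password@ part of the URL is dropped (not needed for this sketch).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased by urlsplit

    if host == CLEARNET_BASE:
        new_host = ONION_BASE
    elif host.endswith("." + CLEARNET_BASE):
        # Keep the sub-domain labels ("web.", "blog.", ...) in front.
        new_host = host[: -len(CLEARNET_BASE)] + ONION_BASE
    else:
        # Not one of our hosts: leave the URL as-is.
        return url

    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_onion_url("https://web.archive.org/web/2019/http://archive.org/about"))
```

The embedded `http://archive.org/about` stays as the clearnet URL because it identifies the capture; only the serving host changes. Links inside returned HTML/JS bodies are a separate layer and are not handled here.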