Bryan's notes on setting up an "official" Tor onion service (formerly known as
"hidden service") gateway for the Internet Archive.

Goal is to have one (or more) onion service gateways running on IA hardware
within the IA network to provide access to archive.org content and the Wayback
Machine. Rough order of feature importance:

- browse and download files from archive.org
- browse the Wayback Machine
- read the blog
- request Wayback captures via SPN (Save Page Now)
- create an account and log in to archive.org; upload files
- access the S3 upload endpoint, e.g. via the `ia` tool
- browse/borrow/use openlibrary.org
- fatcat.wiki (Bryan's paper project; can use as a demo/example)

## Current Situation (May 2019)

An external, unofficial volunteer runs an onion service:

- <http://archivecrfip2lpi.onion/>
- <http://archivebyd3rzt3ehjpm4c3bjkyxv3hjleiytnvxcn7x32psn2kxcuid.onion>
- <https://www.hackerfactor.com/blog/index.php?/archives/750-Freedom-of-Information.html>
- <https://www.hackerfactor.com/blog/index.php?/archives/762-Attacked-Over-Tor.html>
- <https://www.hackerfactor.com/blog/index.php?/archives/763-The-Continuing-Tor-Attack.html>

This is done via a custom PHP script:

- <https://www.hackerfactor.com/src/iaproxy.php.txt>

archive.is runs an onion service.

## Onion Service Resources / Docs

Riseup guide: <https://riseup.net/en/security/network-security/tor/onionservices-best-practices>

EOTK ("Enterprise Onion ToolKit"): <https://github.com/alecmuffett/eotk>

Tor Project onion service overview: <https://2019.www.torproject.org/docs/onion-services.html.en>

Tor Project onion service v3 announcement (November 2017): <https://blog.torproject.org/tors-fall-harvest-next-generation-onion-services>

## IA-specific notes

Running from the "office" network instead of the "cluster" network might be
best: traffic still routes locally, but the gateway doesn't become a way to
bypass our IP-range cluster firewalls.

Tasks:

- get a proper SSL EV certificate for the onion address (wildcard?)
- monitoring/alerting

EOTK notes:

- upgrading to the new onion services ("v3", "Prop 224", longer keys, better
  crypto, etc) seems non-trivial: EOTK depends on onionbalance which depends on
  stem. [EOTK issue](https://github.com/alecmuffett/eotk/issues/23),
  [onionbalance issue](https://github.com/DonnchaC/onionbalance/issues/69), [onionbalance notes](https://onionbalance.readthedocs.io/en/latest/design.html#next-generation-onion-services-prop-224-compatibility)
- need to enumerate all "sub-domain stems" (like us.archive.org), but
  apparently not every host name
- multi-machine configs are nice, though realistically in 2019 if any
  datacenter is down then probably all of our services are
  ("anti-high-availability"), so it is not much help if only the onion service
  stays up. The "hardmap 2" setup, where the second device might not even be
  live/active but is ready in case there are hardware issues with the first,
  might be easiest
### archive.org

Lots of host names!

### web.archive.org

Multiple layers of re-write!
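
One guess at what makes this tricky: a Wayback URL embeds the original URL in
its path (`/web/<timestamp>/<original-url>`), so a gateway has to rewrite the
outer host without touching the embedded one. A minimal sketch, assuming a
placeholder onion host name and plain `http` on the onion side (both
hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical placeholder, not a real onion address.
ONION_HOST = "exampleonionaddress.onion"


def to_onion(url, clearnet_host="web.archive.org", onion_host=ONION_HOST):
    """Swap only the outer host of a Wayback URL for the onion host.

    The embedded original URL lives in the path, so it is left untouched.
    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != clearnet_host:
        return url
    return urlunsplit(("http", onion_host, parts.path, parts.query, parts.fragment))


capture = "https://web.archive.org/web/20190501000000/http://archive.org/about/"
rewritten = to_onion(capture)
```

A naive string replace of `archive.org` would also mangle the embedded
`http://archive.org/about/`; parsing the URL first avoids that.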