This file summarizes the current fatcat authentication schema, which is based
on 3rd party OAuth2/OIDC sign-in and macaroon tokens.

## Overview

The informal high-level requirements for the auth system were:

- public read-only (HTTP GET) API and website require no login or
  authentication
- all changes to the catalog happen through the API and are associated with an
  abstract editor (the entity behind an editor could be human, a bot, an
  organization, change over time, etc). basic editor metadata (eg, identifier)
  is public for all time.
- editors can sign up (create account) and log in using the web interface
- bots and scripts access the API directly; their actions are associated with
  an editor (which could be a bot account)
- authentication can be managed via the web interface (eg, creating any tokens
  or bot accounts)
- there is a mechanism to revoke API access and lock editor accounts (eg, to
  block spam); this mechanism doesn't need to be a web interface, but shouldn't
  be raw SQL commands
- store an absolute minimum of PII (personally identifiable information) that
  can't be "mixed in" with public database dumps, or would make the database a
  security target. eg, if possible don't store emails or passwords
- the web interface should, as much as possible, not be "special". Eg, should
  work through the API and not have secret keys, if possible
- be as simple and efficient as possible (eg, minimize per-request database
  hits)

The initial design that came out of these requirements is to use bearer tokens
(in the form of macaroons) for all API authentication needs, and to have editor
account creation and authentication offloaded to third parties via OAuth2
(specifically OpenID Connect (OIDC) when available). By storing only OIDC
identifiers in a single database table (linked but separate from the editor
table), PII collection is minimized, and no code needs to be written to handle
password recovery, email verification, etc. Tokens can be embedded in web
interface session cookies and "passed through" in API calls that require
authentication, so the web interface is effectively stateless (in that it does
not hold any session or user information internally).

Macaroons, like JSON Web Tokens (JWT), contain signed (verifiable) constraints,
called caveats. Unlike JWT, these caveats can easily be "further constrained"
by any party. There is additional support for signed third party caveats, but
we don't use that feature currently. Caveats can be used to set an expiry time
for each token, which is appropriate for cookies (requiring a fresh login). We
also use creation timestamps and per-editor "authentication epochs" (publicly
stored in the editor table, non-sensitive) to revoke API tokens per-editor (or
globally, if necessary). Basically, only macaroons that were "minted" after the
current `auth_epoch` for the editor are considered valid. If a token is lost,
the `auth_epoch` is reset to the current time (after the compromised token was
minted, or any subsequent tokens possibly created by an attacker), all existing
tokens are considered invalid, and the editor must log back in (and generate
new API tokens for any bots/scripts). In the event of a serious security
compromise (like the secret signing key being compromised, or a bug in macaroon
generation being found), all `auth_epoch` timestamps are updated at once (and a
new key is used).
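
The epoch-based revocation described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the function names and the dictionary-shaped editor record are hypothetical, and treating "after" as a strict comparison is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def token_is_current(token_created: datetime, auth_epoch: datetime) -> bool:
    """Accept only tokens minted strictly after the editor's auth_epoch."""
    return token_created > auth_epoch


def revoke_tokens(editor: dict, now: Optional[datetime] = None) -> None:
    """Move auth_epoch forward so every earlier token stops validating."""
    editor["auth_epoch"] = now or datetime.now(timezone.utc)


# Hypothetical editor record; only auth_epoch matters for this check.
editor = {"auth_epoch": datetime(2019, 1, 1, tzinfo=timezone.utc)}
token_created = datetime(2019, 1, 10, tzinfo=timezone.utc)

valid_before = token_is_current(token_created, editor["auth_epoch"])
revoke_tokens(editor, now=datetime(2019, 2, 1, tzinfo=timezone.utc))
valid_after = token_is_current(token_created, editor["auth_epoch"])
```

No per-token state is needed for revocation in this model: the only stored value is the single timestamp on the editor row.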

The account login/signup flow for new editors is to visit the web interface and
select an OAuth provider (from a fixed list) where they have an account. After
they approve Fatcat as an application on the third party site, they bounce back
to the web interface. If they had signed up previously they are signed in,
otherwise a new editor account is automatically created. A username is
generated based on the OAuth remote account name, but the editor can change
this immediately. The web interface allows (or will, when implemented) creation
of bot accounts (linked to a "wrangler" editor account), generation of tokens,
etc.

In theory, the API tokens, as macaroons, can be "attenuated" by the user with
additional caveats before being used. Eg, the expiry could be throttled down to
a minute or two, or constrained to edits of a specific editgroup, or to a
specific API endpoint. A use-case for this would be pasting a token in a
single-page app or untrusted script with minimal delegated authority. Not all of
these caveat checks have been implemented in the server yet though.
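
To illustrate why a token holder can attenuate a macaroon without knowing the signing key, here is a simplified sketch of the chained-HMAC idea behind macaroons. It is not the serialization or key derivation used by real macaroon libraries, and the key, caveat strings, and function names are all hypothetical.

```python
import hashlib
import hmac
from typing import List


def _hmac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str, caveats: List[str]) -> dict:
    """Sign the identifier, then fold each caveat into the signature in order."""
    sig = _hmac(root_key, identifier.encode())
    for caveat in caveats:
        sig = _hmac(sig, caveat.encode())
    return {"identifier": identifier, "caveats": list(caveats), "signature": sig}


def attenuate(token: dict, caveat: str) -> dict:
    """Add a caveat using only the current signature; no root key required."""
    return {
        "identifier": token["identifier"],
        "caveats": token["caveats"] + [caveat],
        "signature": _hmac(token["signature"], caveat.encode()),
    }


def signature_is_valid(root_key: bytes, token: dict) -> bool:
    """Recompute the chain from the root key and compare in constant time."""
    expected = mint(root_key, token["identifier"], token["caveats"])["signature"]
    return hmac.compare_digest(expected, token["signature"])


root_key = b"hypothetical-root-key"
token = mint(root_key, "20190110-qa", ["editor_id = aaaa"])
narrowed = attenuate(token, "editgroup = bbbb")

# Dropping a caveat without recomputing the chain breaks the signature.
stripped = dict(narrowed, caveats=narrowed["caveats"][:-1])
```

Because each signature is keyed by the previous one, caveats can be appended by anyone holding the token but cannot be removed without the root key.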

As an "escape hatch", there is a rust command (`fatcat-auth`) for debugging,
creating new keys and tokens, revoking tokens (via `auth_epoch`), etc. There is
also a web interface mechanism to "login via existing token". These mechanisms
aren't intended for general use, but are helpful when developing (when login
via OAuth may not be configured or accessible) and for admins/operators.

## Current Limitations

There is no mechanism for linking (or unlinking) multiple remote OAuth accounts
into a single editor account. The database schema supports this; there just
aren't API endpoints or a web interface.

There is no obvious place to store persistent non-public user information:
things like preferences, or current editgroup being operated on via the web
interface. This info can go in session cookies, but is lost when user logs
out/in or uses another device.

## API Tokens (Macaroons)

Macaroons contain "caveats" which constrain their scope. In the context of
fatcat, macaroons should always be constrained to a single editor account (by
`editor_id`) and a valid creation timestamp; this enables revocation.

In general, want to keep caveats, identifier, and other macaroon contents as
short as possible, because they can bloat up the token size.

Use identifiers (unique names for looking up signing keys) that contain the
date and (short) domain, like `20190110-qa`.

Caveats:

- general model is that macaroon is omnipotent and passes all verification,
  unless caveats are added. eg, adding verification checks doesn't constrain
  auth, only the caveats constrain auth; verification checks *allow* additional
  auth. each caveat only needs to be allowed by one verification check.
- can (and should?) add as many caveat checkers/constraints in code as possible
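
The caveat model above can be sketched as a small function: a token passes if every caveat it carries is accepted by at least one checker, so a token with no caveats passes trivially. The caveat string formats and checker functions below are hypothetical, and signature verification is left out.

```python
from typing import Callable, Iterable, List

Checker = Callable[[str], bool]


def caveats_satisfied(caveats: Iterable[str], checkers: List[Checker]) -> bool:
    """Every caveat must be allowed by at least one checker."""
    return all(any(check(caveat) for check in checkers) for caveat in caveats)


def allow_editor_aaaa(caveat: str) -> bool:
    # Hypothetical checker for a request made on behalf of editor "aaaa".
    return caveat == "editor_id = aaaa"


def allow_endpoint_get_editor(caveat: str) -> bool:
    # Hypothetical checker for a request to the "get_editor" endpoint.
    return caveat == "endpoint = get_editor"


checkers = [allow_editor_aaaa, allow_endpoint_get_editor]

no_caveats_ok = caveats_satisfied([], checkers)
known_caveat_ok = caveats_satisfied(["editor_id = aaaa"], checkers)
unknown_caveat_ok = caveats_satisfied(["editgroup = bbbb"], checkers)
```

Registering more checkers can only widen what is accepted, which is why a caveat that no checker recognizes causes the token to be rejected.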

## Web Signup/Login

OpenID Connect (OIDC) is basically a convention for servers and clients to use
OAuth2 for the specific purpose of just logging in or linking accounts, a la
"Sign In With ...". OAuth is often used to provide interoperability between
services (eg, a client app can take actions as the user, when granted
permissions, on the authenticating platform); OIDC doesn't grant any such
permissions, just refreshing logins at most.

The web interface (webface) does all the OAuth/OIDC trickery, and extracts a
simple platform identifier and user identifier if authentication was
successful. It sends this in a fatcat API request to the `/auth/oidc` endpoint,
using admin authentication (the web interface stores an internal token "for
itself" for this one purpose). The API will return both an Editor object and a
token for that editor in the response. If the user had signed in previously
using the same provider/service/user pair as before, the Editor object is the
user's login. If the pair is new, a new account is created automatically and
returned; the HTTP status code indicates which happened. The editor username is
automatically generated from the remote username and platform (user can change
it if they want).
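
The lookup-or-create behaviour of the endpoint might look roughly like this in-memory sketch. The storage, the username format, and the specific status codes (200 for an existing editor, 201 for a newly created one) are assumptions for illustration; token minting is omitted.

```python
import itertools
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the auth_oidc table.
_editors_by_remote: Dict[Tuple[str, str, str], dict] = {}
_next_id = itertools.count(1)


def auth_oidc(platform: str, remote_host: str, remote_id: str,
              remote_username: str) -> Tuple[dict, int]:
    """Return (editor, status): the existing editor, or a newly created one."""
    key = (platform, remote_host, remote_id)
    editor = _editors_by_remote.get(key)
    if editor is not None:
        return editor, 200  # assumed code for "signed in to existing account"
    editor = {
        "editor_id": next(_next_id),
        # Assumed format; the notes only say the name derives from both parts.
        "username": f"{remote_username}-{platform}",
    }
    _editors_by_remote[key] = editor
    return editor, 201  # assumed code for "new account created"


first_editor, first_status = auth_oidc("github", "github.com", "1234", "alice")
second_editor, second_status = auth_oidc("github", "github.com", "1234", "alice")
```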

The returned token and editor metadata are stored in session cookies. The flask
framework has a secure cookie implementation that prevents users from making up
cookies, but this isn't the real security mechanism; the real mechanism is that
they can't generate valid macaroons because they are signed. Cookie *theft* is
an issue, so aggressive cookie protections should be activated in the Flask
configuration.
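
As a sketch of what "aggressive cookie protections" could mean in practice, the following uses standard Flask configuration keys. The chosen values, the `COOKIE_SETTINGS` name, and the lifetime are assumptions, not the project's actual settings.

```python
from datetime import timedelta

# Keys are standard Flask configuration names; the values are assumptions.
COOKIE_SETTINGS = {
    "SESSION_COOKIE_SECURE": True,     # only send the cookie over HTTPS
    "SESSION_COOKIE_HTTPONLY": True,   # hide the cookie from JavaScript
    "SESSION_COOKIE_SAMESITE": "Lax",  # limit cross-site sending
    # Hypothetical lifetime; only applies to sessions marked permanent.
    "PERMANENT_SESSION_LIFETIME": timedelta(days=7),
}

# In a Flask app this would be applied with: app.config.update(COOKIE_SETTINGS)
```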

The `auth_oidc` table enforces uniqueness on accounts in a few ways:

- lowercase UNIQ constraint on usernames (can't register upper- and lower-case
  variants)
- UNIQ {`editor_id`, `platform`}: can't login using multiple remote accounts
  from the same platform
- UNIQ {`platform`, `remote_host`, `remote_id`}: can't login to multiple local
  accounts using the same remote account
- all fields are NOT NULL
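
The constraints listed above can be tried out with a small in-memory SQLite database. This is only a stand-in: the column types, the `editor` table layout, the index name, and the `violates` helper are assumptions, and the real schema may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE editor (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE UNIQUE INDEX editor_username_lower ON editor (lower(username));
CREATE TABLE auth_oidc (
    editor_id   INTEGER NOT NULL REFERENCES editor (id),
    platform    TEXT NOT NULL,
    remote_host TEXT NOT NULL,
    remote_id   TEXT NOT NULL,
    UNIQUE (editor_id, platform),
    UNIQUE (platform, remote_host, remote_id)
);
"""


def violates(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# isolation_level=None means autocommit, so a rejected insert leaves no trace.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(SCHEMA)
conn.execute("INSERT INTO editor (id, username) VALUES (1, 'Alice')")
conn.execute("INSERT INTO editor (id, username) VALUES (2, 'bob')")
conn.execute("INSERT INTO auth_oidc VALUES (1, 'github', 'github.com', '1234')")

INSERT_OIDC = "INSERT INTO auth_oidc VALUES (?, ?, ?, ?)"

# Case variant of an existing username.
case_clash = violates(conn, "INSERT INTO editor (id, username) VALUES (?, ?)",
                      (3, "alice"))
# Second remote account from the same platform for editor 1.
platform_clash = violates(conn, INSERT_OIDC, (1, "github", "github.com", "5678"))
# Same remote account linked to a different editor.
remote_clash = violates(conn, INSERT_OIDC, (2, "github", "github.com", "1234"))
# A different platform for editor 2 is fine.
other_platform_clash = violates(conn, INSERT_OIDC, (2, "gitlab", "gitlab.com", "1234"))
```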

### archive.org "XAuth" Login

The Internet Archive has its own bespoke internal API for authentication
between services. Internal (non-public) documentation link:

    https://git.archive.org/ia/petabox/blob/master/www/sf/services/xauthn/README.md

Fatcat implements "passthrough" authentication to this endpoint by accepting
email/password (in plaintext! red lights and sirens!) and passing them through,
along with special staff-level authentication keys, to authenticate and
fetch user info. Fatcat then pretends this was a regular OAuth/OIDC
interaction, substituting the archive.org user "itemname" as a persistent
identifier, and the XAuth endpoint as the service key.

## Role-Based Access Control (RBAC)

Currently acknowledged roles:

- public (not authenticated)
- bot
- human
- editor (bot or human)
- admin
- superuser

Will probably rename these. Additionally, editor accounts have an `is_active`
flag (used to lock disabled/deleted/abusive/compromised accounts); no roles
beyond public are given for inactive accounts.
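
One way to read the role list together with the `is_active` rule is sketched below. Only `is_active` comes from these notes; the `is_bot`, `is_admin`, and `is_superuser` fields and the `roles_for` function are hypothetical.

```python
from typing import Optional, Set


def roles_for(editor: Optional[dict]) -> Set[str]:
    """Compute the roles held by a request, given its (optional) editor."""
    # Unauthenticated requests and inactive accounts only get "public".
    if editor is None or not editor.get("is_active", False):
        return {"public"}
    roles = {"public", "editor"}
    roles.add("bot" if editor.get("is_bot") else "human")
    if editor.get("is_admin"):
        roles.add("admin")
    if editor.get("is_superuser"):
        roles.add("superuser")
    return roles


anonymous_roles = roles_for(None)
bot_roles = roles_for({"is_active": True, "is_bot": True})
locked_admin_roles = roles_for({"is_active": False, "is_admin": True})
```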

## Developer Affordances

A few core accounts are created automatically, with fixed `username`,
`auth_epoch` and `editor_id`, to make testing and administration easier across
database resets (aka, tokens keep working as long as the signing key stays the
same).

Tokens and other secrets can be stored in environment variables, scripts, or
`.env` files.

## Future Work and Alternatives

Want to support more OAuth/OIDC endpoints:

- orcid.org: supports OIDC
- wikipedia/wikimedia: OAuth; https://github.com/valhallasw/flask-mwoauth

Additional macaroon caveats:

- `endpoint` (API method; caveat can include a list)
- `editgroup`
- (etc)

Looked at a few other options for managing user accounts:

- portier, the successor to persona, which basically uses email for magic-link
  login, unless the email provider supports OIDC or similar. There is a central
  hosted version to use for bootstrap. Appealing/minimal, but feels somewhat
  neglected.
- use something like 'dex' as a proxy to multiple OIDC (and other) providers
- deploy a huge all-in-one platform like keycloak for all auth anything ever.
  sort of wish Internet Archive, or somebody (Wikimedia?) ran one of these as
  public infrastructure.
- having webface generate macaroons itself

Will probably eventually need to support multiple logins per editor account.
Shouldn't be too hard, but will require additional API endpoints (POST with
`editor_id` included, DELETE to remove, etc).

On mobile, folks might not be signed in to as many accounts, or it might be
annoying to enter long/secure passwords (eg, to login to github). Could get
around this with "login via token via QR code" with long/unlimited expiry.
Might make more sense to support Google OIDC, as my guess is that many (most?)
people have a Google account logged in on their phone.

## Implementation Notes

To start, using the `loginpass` python library to handle logins, which is built
on `authlib`. May need to extend or just use `authlib` directly in the future.
Supports many large commercial providers, including gitlab.com, github.com, and
google.

There are many other flask/oauth/OIDC libraries out there, but this one worked
well with multiple popular providers, mostly by being flexible about actual
OIDC support. For example, GitHub doesn't support OIDC (only OAuth2), and
apparently GitLab's is incomplete/broken.

### Background Reading

Other flask OIDC integrations:

- https://flask-oidc.readthedocs.io/en/latest/
- https://github.com/zamzterz/Flask-pyoidc

Background reading on macaroons:

- https://github.com/rescrv/libmacaroons
- http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
- https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect
- https://latacora.micro.blog/2018/06/12/a-childs-garden.html
- https://github.com/go-macaroon-bakery/macaroon-bakery (for the "bakery" API pattern)