author | Bryan Newbold <bnewbold@robocracy.org> | 2019-01-08 16:28:27 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@robocracy.org> | 2019-01-08 16:28:27 -0800 |
commit | 16f2e78298dbd2231f5f337ea17c89a6a131a052 (patch) | |
tree | 6e72581e625e73c97cbab72d0f9c35665c99e5d7 /notes/auth.md | |
parent | eb40a5f274f3608db34309cfd16739a7642ef5e7 (diff) | |
parent | ffb721f90c5d97ee80885209bf45feb85ca9625c (diff) | |
Merge branch 'bnewbold-crude-auth'
Fixed a conflict in:
python/fatcat_export.py
Diffstat (limited to 'notes/auth.md')
-rw-r--r-- | notes/auth.md | 251 |
1 files changed, 251 insertions, 0 deletions
diff --git a/notes/auth.md b/notes/auth.md
new file mode 100644
index 00000000..ea249cf7
--- /dev/null
+++ b/notes/auth.md
@@ -0,0 +1,251 @@
+
+This file summarizes the current fatcat authentication schema, which is based
+on 3rd party OAuth2/OIDC sign-in and macaroon tokens.
+
+## Overview
+
+The informal high-level requirements for the auth system were:
+
+- public read-only (HTTP GET) API and website require no login or
+  authentication
+- all changes to the catalog happen through the API and are associated with an
+  abstract editor (the entity behind an editor could be a human, a bot, an
+  organization, could change over time, etc). basic editor metadata (eg,
+  identifier) is public for all time.
+- editors can sign up (create account) and log in using the web interface
+- bots and scripts access the API directly; their actions are associated with
+  an editor (which could be a bot account)
+- authentication can be managed via the web interface (eg, creating any tokens
+  or bot accounts)
+- there is a mechanism to revoke API access and lock editor accounts (eg, to
+  block spam); this mechanism doesn't need to be a web interface, but shouldn't
+  be raw SQL commands
+- store an absolute minimum of PII (personally identifiable information) that
+  can't be "mixed in" with public database dumps, or would make the database a
+  security target. eg, if possible don't store emails or passwords
+- the web interface should, as much as possible, not be "special". Eg, should
+  work through the API and not have secret keys, if possible
+- be as simple and efficient as possible (eg, minimize per-request database
+  hits)
+
+The initial design that came out of these requirements is to use bearer tokens
+(in the form of macaroons) for all API authentication needs, and to have editor
+account creation and authentication offloaded to third parties via OAuth2
+(specifically OpenID Connect (OIDC) when available).
+By storing only OIDC identifiers in a single database table (linked but
+separate from the editor table), PII collection is minimized, and no code
+needs to be written to handle password recovery, email verification, etc.
+Tokens can be embedded in web interface session cookies and "passed through"
+in API calls that require authentication, so the web interface is effectively
+stateless (in that it does not hold any session or user information
+internally).
+
+Macaroons, like JSON Web Tokens (JWT), contain signed (verifiable) constraints,
+called caveats. Unlike JWT, these caveats can easily be "further constrained"
+by any party. There is additional support for signed third party caveats, but
+we don't use that feature currently. Caveats can be used to set an expiry time
+for each token, which is appropriate for cookies (requiring a fresh login). We
+also use creation timestamps and per-editor "authentication epochs" (publicly
+stored in the editor table, non-sensitive) to revoke API tokens per-editor (or
+globally, if necessary). Basically, only macaroons that were "minted" after the
+current `auth_epoch` for the editor are considered valid. If a token is lost,
+the `auth_epoch` is reset to the current time (after the compromised token was
+minted, or any subsequent tokens possibly created by an attacker), all existing
+tokens are considered invalid, and the editor must log back in (and generate
+new API tokens for any bots/scripts). In the event of a serious security
+compromise (like the secret signing key being compromised, or a bug being
+found in macaroon generation), all `auth_epoch` timestamps are updated at once
+(and a new key is used).
+
+The account login/signup flow for new editors is to visit the web interface and
+select an OAuth provider (from a fixed list) where they have an account. After
+they approve Fatcat as an application on the third party site, they bounce back
+to the web interface.
+If they had signed up previously they are signed in; otherwise a new editor
+account is automatically created. A username is generated based on the OAuth
+remote account name, but the editor can change this immediately. The web
+interface allows (or will, when implemented) creation of bot accounts (linked
+to a "wrangler" editor account), generation of tokens, etc.
+
+In theory, the API tokens, as macaroons, can be "attenuated" by the user with
+additional caveats before being used. Eg, the expiry could be throttled down to
+a minute or two, or the token could be constrained to edits of a specific
+editgroup, or to a specific API endpoint. A use-case for this would be pasting
+a token in a single-page app or untrusted script with minimal delegated
+authority. Not all of these caveat checks have been implemented in the server
+yet though.
+
+As an "escape hatch", there is a Rust command (`fatcat-auth`) for debugging,
+creating new keys and tokens, revoking tokens (via `auth_epoch`), etc. There is
+also a web interface mechanism to "login via existing token". These mechanisms
+aren't intended for general use, but are helpful when developing (when login
+via OAuth may not be configured or accessible) and for admins/operators.
+
+## Current Limitations
+
+There is no mechanism for linking (or unlinking) multiple remote OAuth accounts
+into a single editor account. The database schema supports this; there just
+aren't API endpoints or a web interface.
+
+There is no obvious place to store persistent non-public user information:
+things like preferences, or current editgroup being operated on via the web
+interface. This info can go in session cookies, but is lost when the user logs
+out/in or uses another device.
+
+## API Tokens (Macaroons)
+
+Macaroons contain "caveats" which constrain their scope. In the context of
+fatcat, macaroons should always be constrained to a single editor account (by
+`editor_id`) and a valid creation timestamp; this enables revocation.
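
The `auth_epoch` revocation rule described above can be sketched roughly as follows. This is an illustration only, not the server's implementation: `TokenCaveats` and `token_is_current` are hypothetical names, macaroon signature verification is assumed to have already happened elsewhere, and rejecting a token minted exactly at the epoch is an assumption about the boundary case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TokenCaveats:
    """Hypothetical container for caveats already extracted from a verified macaroon."""
    editor_id: str
    created: datetime
    expires: Optional[datetime] = None


def token_is_current(caveats: TokenCaveats, editor_id: str,
                     auth_epoch: datetime, now: datetime) -> bool:
    """Apply the editor, creation-time, and expiry constraints to one token."""
    # The token must be bound to the editor it is being used as.
    if caveats.editor_id != editor_id:
        return False
    # Only tokens minted after the editor's auth_epoch are valid (assumed strict).
    if caveats.created <= auth_epoch:
        return False
    # Optional expiry caveat, eg for session cookies.
    if caveats.expires is not None and now >= caveats.expires:
        return False
    return True


# Revoking every existing token for an editor is just "auth_epoch = now".
epoch = datetime(2019, 1, 8, tzinfo=timezone.utc)
now = epoch + timedelta(days=2)
stale = TokenCaveats("editor-a", created=epoch - timedelta(days=1))
fresh = TokenCaveats("editor-a", created=epoch + timedelta(days=1))
print(token_is_current(stale, "editor-a", epoch, now))  # False: minted before the epoch
print(token_is_current(fresh, "editor-a", epoch, now))  # True
```

Because the check only needs the editor's `auth_epoch`, it fits the goal of minimizing per-request database hits: no per-token revocation list has to be consulted.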
+
+In general, want to keep caveats, identifier, and other macaroon contents as
+short as possible, because they can bloat up the token size.
+
+Use identifiers (unique names for looking up signing keys) that contain the
+date and (short) domain, like `20190110-qa`.
+
+Caveats:
+
+- general model is that a macaroon is omnipotent and passes all verification,
+  unless caveats are added. eg, adding verification checks doesn't constrain
+  auth, only the caveats constrain auth; verification checks *allow* additional
+  auth. each caveat only needs to be allowed by one verification check.
+- can (and should?) add as many caveat checkers/constraints in code as possible
+
+## Web Signup/Login
+
+OpenID Connect (OIDC) is basically a convention for servers and clients to use
+OAuth2 for the specific purpose of just logging in or linking accounts, a la
+"Sign In With ...". OAuth is often used to provide interoperability between
+services (eg, a client app can take actions as the user, when granted
+permissions, on the authenticating platform); OIDC doesn't grant any such
+permissions, just refreshing logins at most.
+
+The web interface (webface) does all the OAuth/OIDC trickery, and extracts a
+simple platform identifier and user identifier if authentication was
+successful. It sends this in a fatcat API request to the `/auth/oidc` endpoint,
+using admin authentication (the web interface stores an internal token "for
+itself" for this one purpose). The API will return both an Editor object and a
+token for that editor in the response. If the user had signed in previously
+using the same provider/service/user pair as before, the Editor object is the
+user's login. If the pair is new, a new account is created automatically and
+returned; the HTTP status code indicates which happened. The editor username is
+automatically generated from the remote username and platform (user can change
+it if they want).
+
+The returned token and editor metadata are stored in session cookies.
+The Flask framework has a secure cookie implementation that prevents users
+from making up cookies, but this isn't the real security mechanism; the real
+mechanism is that they can't generate valid macaroons because they are signed.
+Cookie *theft* is an issue, so aggressive cookie protections should be
+activated in the Flask configuration.
+
+The `auth_oidc` enforces uniqueness on accounts in a few ways:
+
+- lowercase UNIQ constraint on usernames (can't register upper- and lower-case
+  variants)
+- UNIQ {`editor_id`, `platform`}: can't log in using multiple remote accounts
+  from the same platform
+- UNIQ {`platform`, `remote_host`, `remote_id`}: can't log in to multiple local
+  accounts using the same remote account
+- all fields are NOT NULL
+
+### archive.org "XAuth" Login
+
+The Internet Archive has its own bespoke internal API for authentication
+between services. Internal (non-public) documentation link:
+
+    https://git.archive.org/ia/petabox/blob/master/www/sf/services/xauthn/README.md
+
+Fatcat implements "passthrough" authentication to this endpoint by accepting
+email/password (in plaintext! red lights and sirens!) and passing them through,
+along with special staff-level authentication keys, to authenticate and
+fetch user info. Fatcat then pretends this was a regular OAuth/OIDC
+interaction, substituting the archive.org user "itemname" as a persistent
+identifier, and the XAuth endpoint as the service key.
+
+## Role-Based Access Control (RBAC)
+
+Currently acknowledged roles:
+
+- public (not authenticated)
+- bot
+- human
+- editor (bot or human)
+- admin
+- superuser
+
+Will probably rename these. Additionally, editor accounts have an `is_active`
+flag (used to lock disabled/deleted/abusive/compromised accounts); no roles
+beyond public are given for inactive accounts.
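
A minimal sketch of how the role list and the `is_active` rule above could combine. Only `is_active` comes from these notes; the `is_bot`, `is_admin`, and `is_superuser` flags, the `Editor` class shape, and the `roles_for` helper are hypothetical and only illustrate one way roles might be derived.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Role(Enum):
    PUBLIC = "public"
    BOT = "bot"
    HUMAN = "human"
    EDITOR = "editor"
    ADMIN = "admin"
    SUPERUSER = "superuser"


@dataclass(frozen=True)
class Editor:
    editor_id: str
    is_active: bool = True
    is_bot: bool = False        # hypothetical flag
    is_admin: bool = False      # hypothetical flag
    is_superuser: bool = False  # hypothetical flag


def roles_for(editor: Optional[Editor]) -> Set[Role]:
    """Return the roles for a request; None means an unauthenticated request."""
    roles = {Role.PUBLIC}
    # Inactive (locked) accounts get nothing beyond public.
    if editor is None or not editor.is_active:
        return roles
    # Every active account is an editor, and is either a bot or a human.
    roles.add(Role.EDITOR)
    roles.add(Role.BOT if editor.is_bot else Role.HUMAN)
    if editor.is_admin:
        roles.add(Role.ADMIN)
    if editor.is_superuser:
        roles.add(Role.SUPERUSER)
    return roles


locked_admin = Editor("editor-x", is_active=False, is_admin=True)
print(roles_for(locked_admin) == {Role.PUBLIC})  # True: locking strips all other roles
```

Checking `is_active` before anything else means locking an account takes effect without having to touch its other flags or its tokens.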
+
+## Developer Affordances
+
+A few core accounts are created automatically, with fixed `username`,
+`auth_epoch` and `editor_id`, to make testing and administration easier across
+database resets (aka, tokens keep working as long as the signing key stays the
+same).
+
+Tokens and other secrets can be stored in environment variables, scripts, or
+`.env` files.
+
+## Future Work and Alternatives
+
+Want to support more OAuth/OIDC endpoints:
+
+- orcid.org: supports OIDC
+- wikipedia/wikimedia: OAuth; https://github.com/valhallasw/flask-mwoauth
+
+Additional macaroon caveats:
+
+- `endpoint` (API method; caveat can include a list)
+- `editgroup`
+- (etc)
+
+Looked at a few other options for managing user accounts:
+
+- portier, the successor to persona, which basically uses email for magic-link
+  login, unless the email provider supports OIDC or similar. There is a central
+  hosted version to use for bootstrap. Appealing/minimal, but feels somewhat
+  neglected.
+- use something like 'dex' as a proxy to multiple OIDC (and other) providers
+- deploy a huge all-in-one platform like keycloak for all auth anything ever.
+  sort of wish Internet Archive, or somebody (Wikimedia?) ran one of these as
+  public infrastructure.
+- having webface generate macaroons itself
+
+Will probably eventually need to support multiple logins per editor account.
+Shouldn't be too hard, but will require additional API endpoints (POST with
+`editor_id` included, DELETE to remove, etc).
+
+On mobile, folks might not be signed in to as many accounts, or it might be
+annoying to enter long/secure passwords (eg, to log in to GitHub). Could get
+around this with "login via token via QR code" with long/unlimited expiry.
+Might make more sense to support Google OIDC as my guess is that many (most?)
+people have a Google account logged in on their phone.
+
+## Implementation Notes
+
+To start, using the `loginpass` Python library to handle logins, which is built
+on `authlib`.
+May need to extend or just use `authlib` directly in the future. Supports
+many large commercial providers, including gitlab.com, github.com, and
+google.
+
+There are many other flask/oauth/OIDC libraries out there, but this one worked
+well with multiple popular providers, mostly by being flexible about actual
+OIDC support. For example, GitHub doesn't support OIDC (only OAuth2), and
+apparently GitLab's is incomplete/broken.
+
+### Background Reading
+
+Other flask OIDC integrations:
+
+- https://flask-oidc.readthedocs.io/en/latest/
+- https://github.com/zamzterz/Flask-pyoidc
+
+Background reading on macaroons and OAuth/OIDC:
+
+- https://github.com/rescrv/libmacaroons
+- http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
+- https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect
+- https://latacora.micro.blog/2018/06/12/a-childs-garden.html
+- https://github.com/go-macaroon-bakery/macaroon-bakery (for the "bakery" API pattern)