author: Bryan Newbold <bnewbold@robocracy.org> 2019-01-08 16:28:27 -0800
committer: Bryan Newbold <bnewbold@robocracy.org> 2019-01-08 16:28:27 -0800
commit: 16f2e78298dbd2231f5f337ea17c89a6a131a052
tree: 6e72581e625e73c97cbab72d0f9c35665c99e5d7 /notes/auth.md
parent: eb40a5f274f3608db34309cfd16739a7642ef5e7
parent: ffb721f90c5d97ee80885209bf45feb85ca9625c
Merge branch 'bnewbold-crude-auth'
Fixed a conflict in: python/fatcat_export.py
Diffstat (limited to 'notes/auth.md')
-rw-r--r--  notes/auth.md  251
1 files changed, 251 insertions, 0 deletions
diff --git a/notes/auth.md b/notes/auth.md
new file mode 100644
index 00000000..ea249cf7
--- /dev/null
+++ b/notes/auth.md
@@ -0,0 +1,251 @@
+
+This file summarizes the current fatcat authentication schema, which is based
+on 3rd party OAuth2/OIDC sign-in and macaroon tokens.
+
+## Overview
+
+The informal high-level requirements for the auth system were:
+
+- public read-only (HTTP GET) API and website require no login or
+ authentication
+- all changes to the catalog happen through the API and are associated with an
+ abstract editor (the entity behind an editor could be a human, a bot, an
+ organization, or change over time, etc). Basic editor metadata (eg, identifier)
+ is public for all time.
+- editors can sign up (create an account) and log in using the web interface
+- bots and scripts access the API directly; their actions are associated with
+ an editor (which could be a bot account)
+- authentication can be managed via the web interface (eg, creating any tokens
+ or bot accounts)
+- there is a mechanism to revoke API access and lock editor accounts (eg, to
+ block spam); this mechanism doesn't need to be a web interface, but shouldn't
+ be raw SQL commands
+- store an absolute minimum of PII (personally identifiable information), which
+ can't be "mixed in" with public database dumps and would make the database a
+ security target. Eg, if possible, don't store emails or passwords
+- the web interface should, as much as possible, not be "special". Eg, it should
+ work through the API and not hold secret keys, if possible
+- be as simple and efficient as possible (eg, minimize per-request database
+ hits)
+
+The initial design that came out of these requirements is to use bearer tokens
+(in the form of macaroons) for all API authentication needs, and to have editor
+account creation and authentication offloaded to third parties via OAuth2
+(specifically OpenID Connect (OIDC) when available). By storing only OIDC
+identifiers in a single database table (linked to, but separate from, the editor
+table), PII collection is minimized, and no code needs to be written to handle
+password recovery, email verification, etc. Tokens can be embedded in web
+interface session cookies and "passed through" in API calls that require
+authentication, so the web interface is effectively stateless (in that it does
+not hold any session or user information internally).
+
+Macaroons, like JSON Web Tokens (JWT), contain signed (verifiable) constraints,
+called caveats. Unlike JWT, these caveats can easily be "further constrained"
+by any party. There is additional support for signed third party caveats, but
+we don't use that feature currently. Caveats can be used to set an expiry time
+for each token, which is appropriate for cookies (requiring a fresh login). We
+also use creation timestamps and per-editor "authentication epochs" (publicly
+stored in the editor table, non-sensitive) to revoke API tokens per-editor (or
+globally, if necessary). Basically, only macaroons that were "minted" after the
+current `auth_epoch` for the editor are considered valid. If a token is lost,
+the `auth_epoch` is reset to the current time (after the compromised token was
+minted, or any subsequent tokens possibly created by an attacker), all existing
+tokens are considered invalid, and the editor must log back in (and generate
+new API tokens for any bots/scripts). In the event of a serious security
+compromise (eg, the secret signing key is compromised, or a bug is found in
+macaroon generation), all `auth_epoch` timestamps are updated at once (and a
+new key is used).
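
To make the revocation rule concrete, here is a minimal Python sketch. It assumes the token's creation time has already been read from a verified caveat. The `Editor` class and the function names are hypothetical, not fatcat's actual code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class Editor:
    # Hypothetical stand-in for a row of the public editor table.
    editor_id: str
    auth_epoch: datetime


def token_is_current(token_created: datetime, editor: Editor) -> bool:
    """Only tokens minted strictly after the editor's auth_epoch are valid."""
    return token_created > editor.auth_epoch


def revoke_editor_tokens(editor: Editor, now: datetime) -> None:
    """Lost token: bump auth_epoch so every token minted up to now is invalid."""
    editor.auth_epoch = now


def revoke_all_tokens(editors: Iterable[Editor], now: datetime) -> None:
    """Serious compromise: reset every editor at once.

    The signing key would also be replaced, which is outside this sketch.
    """
    for editor in editors:
        editor.auth_epoch = now
```

Revocation is a single timestamp per editor, so the server never has to track or blacklist individual tokens.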
+
+The account login/signup flow for new editors is to visit the web interface and
+select an OAuth provider (from a fixed list) where they have an account. After
+they approve Fatcat as an application on the third party site, they bounce back
+to the web interface. If they had signed up previously, they are signed in;
+otherwise a new editor account is automatically created. A username is
+generated based on the OAuth remote account name, but the editor can change
+this immediately. The web interface allows (or will, when implemented) creation
+of bot accounts (linked to a "wrangler" editor account), generation of tokens,
+etc.
+
+In theory, the API tokens, as macaroons, can be "attenuated" by the user with
+additional caveats before being used. Eg, the expiry could be shortened to a
+minute or two, or the token constrained to edits of a specific editgroup, or to
+a specific API endpoint. A use case for this would be pasting a token into a
+single-page app or untrusted script with minimal delegated authority. Not all of
+these caveat checks have been implemented in the server yet though.
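
Attenuation works without the signing key because a macaroon's signature is an HMAC chain. Each caveat is folded into the previous signature, so anyone can append a caveat, but removing one would require the key.

The Python sketch below shows only that chaining, using HMAC-SHA256 from the standard library. It is a simplified illustration, not the libmacaroons wire format. The key, identifier, and caveat strings are made up.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import List


def _chain(key: bytes, data: str) -> bytes:
    """One link of the signature chain: HMAC-SHA256(key, data)."""
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).digest()


@dataclass
class Macaroon:
    identifier: str
    caveats: List[str] = field(default_factory=list)
    signature: bytes = b""

    def add_caveat(self, caveat: str) -> "Macaroon":
        # Attenuation: fold the new caveat into the current signature.
        # No root key is needed, so any token holder can do this.
        return Macaroon(
            self.identifier,
            self.caveats + [caveat],
            _chain(self.signature, caveat),
        )


def mint(root_key: bytes, identifier: str) -> Macaroon:
    """Only the server, which holds root_key, can create a fresh macaroon."""
    return Macaroon(identifier, [], _chain(root_key, identifier))


def signature_valid(root_key: bytes, m: Macaroon) -> bool:
    """Recompute the chain from the root key and compare in constant time."""
    sig = _chain(root_key, m.identifier)
    for caveat in m.caveats:
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig, m.signature)
```

A valid signature is only half of verification. Each caveat must still be checked against the current request.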
+
+As an "escape hatch", there is a Rust command (`fatcat-auth`) for debugging,
+creating new keys and tokens, revoking tokens (via `auth_epoch`), etc. There is
+also a web interface mechanism to "login via existing token". These mechanisms
+aren't intended for general use, but are helpful when developing (when login
+via OAuth may not be configured or accessible) and for admins/operators.
+
+## Current Limitations
+
+There is no mechanism for linking (or unlinking) multiple remote OAuth accounts
+into a single editor account. The database schema supports this; there just
+aren't any API endpoints or web interface for it.
+
+There is no obvious place to store persistent non-public user information:
+things like preferences, or current editgroup being operated on via the web
+interface. This info can go in session cookies, but is lost when the user logs
+out and back in, or uses another device.
+
+## API Tokens (Macaroons)
+
+Macaroons contain "caveats" which constrain their scope. In the context of
+fatcat, macaroons should always be constrained to a single editor account (by
+`editor_id`) and a valid creation timestamp; this enables revocation.
+
+In general, we want to keep the caveats, identifier, and other macaroon contents
+as short as possible, because they bloat the token size.
+
+Use identifiers (unique names for looking up signing keys) that contain the
+date and (short) domain, like `20190110-qa`.
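
This small sketch shows how the identifier and the two mandatory caveats might be formatted. The `editor_id = <id>` and `time > <timestamp>` predicate syntax, and the function names, are assumptions for illustration. Second-precision UTC timestamps keep the caveats short.

```python
from datetime import date, datetime, timezone
from typing import List


def key_identifier(day: date, domain: str) -> str:
    """Short signing-key identifier: date plus a short domain tag."""
    return f"{day.strftime('%Y%m%d')}-{domain}"


def required_caveats(editor_id: str, created: datetime) -> List[str]:
    """The two caveats every token should carry (hypothetical syntax)."""
    # Normalize to UTC with second precision to keep the caveat compact.
    ts = created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [f"editor_id = {editor_id}", f"time > {ts}"]
```

For example, `key_identifier(date(2019, 1, 10), "qa")` gives `20190110-qa`.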
+
+Caveats:
+
+- the general model is that a macaroon is omnipotent and passes all
+ verification unless caveats are added. Adding verification checks doesn't
+ constrain auth; only caveats do. Verification checks *allow* additional auth,
+ and each caveat only needs to be satisfied by one verification check.
+- can (and should?) add as many caveat checkers/constraints in code as possible
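
The verification model above can be sketched as a verifier holding exact-match predicates plus general checker functions. In this hypothetical Python sketch, signature checking is omitted and the caveat syntax is assumed. A caveat that no check accepts causes rejection, so registering more checks can only widen what passes.

```python
from datetime import datetime
from typing import Callable, Iterable, List, Set


class CaveatVerifier:
    """Each caveat must be accepted by at least one registered check."""

    def __init__(self) -> None:
        self.exact: Set[str] = set()
        self.general: List[Callable[[str], bool]] = []

    def satisfy_exact(self, predicate: str) -> None:
        self.exact.add(predicate)

    def satisfy_general(self, checker: Callable[[str], bool]) -> None:
        self.general.append(checker)

    def verify(self, caveats: Iterable[str]) -> bool:
        # A token with no caveats is omnipotent: all() of nothing is True.
        return all(
            c in self.exact or any(check(c) for check in self.general)
            for c in caveats
        )


def time_after_checker(now: datetime) -> Callable[[str], bool]:
    """General check for creation caveats like 'time > 2019-01-10T08:30:00Z'.

    Timestamps are parsed as naive UTC, so `now` must be naive UTC too.
    """
    prefix = "time > "

    def check(caveat: str) -> bool:
        if not caveat.startswith(prefix):
            return False  # not ours; another check may accept it
        try:
            created = datetime.strptime(caveat[len(prefix):], "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            return False  # malformed timestamp: refuse
        return now > created

    return check
```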
+
+## Web Signup/Login
+
+OpenID Connect (OIDC) is basically a convention for servers and clients to use
+OAuth2 for the specific purpose of just logging in or linking accounts, a la
+"Sign In With ...". OAuth is often used to provide interoperability between
+services (eg, a client app can take actions as the user, when granted
+permissions, on the authenticating platform); OIDC doesn't grant any such
+permissions, just refreshing logins at most.
+
+The web interface (webface) does all the OAuth/OIDC trickery, and extracts a
+simple platform identifier and user identifier if authentication was
+successful. It sends this in a fatcat API request to the `/auth/oidc` endpoint,
+using admin authentication (the web interface stores an internal token "for
+itself" for this one purpose). The API will return both an Editor object and a
+token for that editor in the response. If the user had signed in previously
+using the same provider/service/user combination as before, the returned
+Editor is their existing account. If the combination is new, a new account is
+created automatically and
+returned; the HTTP status code indicates which happened. The editor username is
+automatically generated from the remote username and platform (user can change
+it if they want).
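
The lookup-or-create logic behind `/auth/oidc` might look roughly like the in-memory Python model below. It is hypothetical, not the actual fatcat API. The 200 vs 201 status codes, the `<name>-<platform>` username scheme, and the placeholder token are all illustrative assumptions.

```python
import itertools
import re
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Editor:
    editor_id: str
    username: str


class OidcRegistry:
    """Hypothetical in-memory stand-in for the editor and auth_oidc tables."""

    def __init__(self) -> None:
        self.links: Dict[Tuple[str, str, str], Editor] = {}
        self.usernames: Set[str] = set()
        self._ids = itertools.count(1)

    def _new_username(self, remote_name: str, platform: str) -> str:
        # eg "Alice" on github -> "alice-github". Lowercasing makes uniqueness
        # case-insensitive; a numeric suffix resolves collisions.
        base = re.sub(r"[^a-z0-9_-]", "", f"{remote_name}-{platform}".lower())
        candidate, n = base, 1
        while candidate in self.usernames:
            n += 1
            candidate = f"{base}{n}"
        self.usernames.add(candidate)
        return candidate

    def auth_oidc(self, platform: str, remote_host: str, remote_id: str,
                  remote_name: str) -> Tuple[int, Editor, str]:
        """Look up or create the editor for a remote account.

        Returns (status, editor, token); status 201 means a new account.
        """
        key = (platform, remote_host, remote_id)
        created = key not in self.links
        if created:
            username = self._new_username(remote_name, platform)
            self.links[key] = Editor(f"ed{next(self._ids)}", username)
        editor = self.links[key]
        # Placeholder: the real API would mint a macaroon for this editor.
        token = f"token-for-{editor.editor_id}"
        return (201 if created else 200), editor, token
```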
+
+The returned token and editor metadata are stored in session cookies. Flask
+has a secure (signed) cookie implementation that prevents users from forging
+cookies, but this isn't the real security mechanism; the real mechanism is that
+they can't generate valid macaroons because they are signed. Cookie *theft* is
+an issue, so aggressive cookie protections should be activated in the Flask
+configuration.
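
One example of "aggressive cookie protections" is the settings below, which use real Flask configuration keys. `SESSION_COOKIE_SAMESITE` requires Flask 1.0 or later. The values and the class name are suggestions rather than fatcat's actual configuration. The class would be loaded with `app.config.from_object(CookieSecurityConfig)`.

```python
from datetime import timedelta


class CookieSecurityConfig:
    """Suggested Flask session-cookie hardening (values are assumptions)."""

    # Only send the session cookie over HTTPS.
    SESSION_COOKIE_SECURE = True
    # Hide the cookie, and thus the embedded macaroon, from JavaScript.
    SESSION_COOKIE_HTTPONLY = True
    # Don't attach the cookie to cross-site subrequests (basic CSRF mitigation).
    SESSION_COOKIE_SAMESITE = "Lax"
    # Lifetime of sessions marked session.permanent = True; ideally no longer
    # than the expiry caveat on the token stored in the session.
    PERMANENT_SESSION_LIFETIME = timedelta(days=7)
```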
+
+The `auth_oidc` setup enforces account uniqueness in a few ways:
+
+- lowercase UNIQ constraint on usernames (can't register upper- and lower-case
+ variants)
+- UNIQ {`editor_id`, `platform`}: can't log in using multiple remote accounts
+ from the same platform
+- UNIQ {`platform`, `remote_host`, `remote_id`}: can't log in to multiple local
+ accounts using the same remote account
+- all fields are NOT NULL
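
These constraints can be written as plain SQL. The sketch below applies them to an in-memory SQLite database from Python so the behaviour can be checked. Two details are assumptions: the username index sits on a separate `editor` table, and the column types are chosen for illustration. The production schema may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE editor (
    id       TEXT PRIMARY KEY NOT NULL,
    username TEXT NOT NULL
);
-- Case-insensitive uniqueness: Alice and alice cannot both be registered.
CREATE UNIQUE INDEX editor_username_lower ON editor (lower(username));

CREATE TABLE auth_oidc (
    editor_id   TEXT NOT NULL REFERENCES editor (id),
    platform    TEXT NOT NULL,
    remote_host TEXT NOT NULL,
    remote_id   TEXT NOT NULL,
    -- At most one remote account per platform for each editor.
    UNIQUE (editor_id, platform),
    -- Each remote account maps to at most one local editor.
    UNIQUE (platform, remote_host, remote_id)
);
"""


def connect() -> sqlite3.Connection:
    """Open an in-memory database with the sketch schema applied."""
    db = sqlite3.connect(":memory:")
    # SQLite only enforces REFERENCES when foreign keys are switched on.
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    return db
```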
+
+### archive.org "XAuth" Login
+
+The Internet Archive has its own bespoke internal API for authentication
+between services. Internal (non-public) documentation link:
+
+ https://git.archive.org/ia/petabox/blob/master/www/sf/services/xauthn/README.md
+
+Fatcat implements "passthrough" authentication to this endpoint by accepting an
+email/password (in plaintext! red lights and sirens!) and passing them through,
+along with special staff-level authentication keys, to authenticate and
+fetch user info. Fatcat then pretends this was a regular OAuth/OIDC
+interaction, substituting the archive.org user "itemname" as a persistent
+identifier, and the XAuth endpoint as the service key.
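
The "pretend it was OIDC" step amounts to mapping the XAuth result onto the same `platform`/`remote_host`/`remote_id` triple used by the `auth_oidc` constraints. In this hypothetical sketch, the platform name, the endpoint URL, and every user-info field except `itemname` are assumptions.

```python
from typing import Dict, NamedTuple


class RemoteIdentity(NamedTuple):
    platform: str
    remote_host: str
    remote_id: str
    remote_name: str


# Placeholder value; the real XAuth endpoint is documented only internally.
XAUTH_ENDPOINT = "https://xauthn.example/"


def identity_from_xauth(user_info: Dict[str, str]) -> RemoteIdentity:
    """Map XAuth user info onto the same triple an OIDC login would produce.

    The archive.org "itemname" is the persistent remote ID, and the XAuth
    endpoint stands in for the OIDC issuer ("service key").
    """
    itemname = user_info["itemname"]
    return RemoteIdentity(
        platform="archive",  # hypothetical platform label
        remote_host=XAUTH_ENDPOINT,
        remote_id=itemname,
        # "screenname" is a hypothetical field; fall back to the itemname.
        remote_name=user_info.get("screenname", itemname),
    )
```

Only the returned user info is used here, so the plaintext password never needs to be stored.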
+
+## Role-Based Access Control (RBAC)
+
+Currently acknowledged roles:
+
+- public (not authenticated)
+- bot
+- human
+- editor (bot or human)
+- admin
+- superuser
+
+These will probably be renamed. Additionally, editor accounts have an `is_active`
+flag (used to lock disabled/deleted/abusive/compromised accounts); no roles
+beyond public are given for inactive accounts.
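
This sketch shows how effective roles might be computed, including the `is_active` lockout. The `is_bot`/`is_admin`/`is_superuser` flags and the rule that superuser implies admin are hypothetical.

```python
from enum import Enum
from typing import Optional, Set


class Role(Enum):
    PUBLIC = "public"
    BOT = "bot"
    HUMAN = "human"
    EDITOR = "editor"
    ADMIN = "admin"
    SUPERUSER = "superuser"


def roles_for(editor: Optional[dict]) -> Set[Role]:
    """Roles for an authenticated editor record, or None if unauthenticated."""
    roles = {Role.PUBLIC}
    if editor is None or not editor["is_active"]:
        # Unauthenticated requests and locked accounts get only public.
        return roles
    roles.add(Role.EDITOR)
    roles.add(Role.BOT if editor["is_bot"] else Role.HUMAN)
    if editor.get("is_superuser"):
        roles.update({Role.ADMIN, Role.SUPERUSER})
    elif editor.get("is_admin"):
        roles.add(Role.ADMIN)
    return roles
```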
+
+## Developer Affordances
+
+A few core accounts are created automatically, with fixed `username`,
+`auth_epoch` and `editor_id`, to make testing and administration easier across
+database resets (ie, tokens keep working as long as the signing key stays the
+same).
+
+Tokens and other secrets can be stored in environment variables, scripts, or
+`.env` files.
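
Here is a minimal way for a script to pick up its token. The environment variable name `FATCAT_API_AUTH_TOKEN` and the tiny `.env` parser are illustrative assumptions. A library such as python-dotenv handles more edge cases.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("'\"")
    return env


def load_dotenv_file(path: Path) -> None:
    """Copy a .env file into os.environ; variables already set take priority."""
    if path.exists():
        for key, value in parse_dotenv(path.read_text()).items():
            os.environ.setdefault(key, value)


def api_token(var: str = "FATCAT_API_AUTH_TOKEN") -> Optional[str]:
    """Return the token from the environment, or None if unset or empty."""
    return os.environ.get(var) or None
```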
+
+## Future Work and Alternatives
+
+Want to support more OAuth/OIDC endpoints:
+
+- orcid.org: supports OIDC
+- wikipedia/wikimedia: OAuth; https://github.com/valhallasw/flask-mwoauth
+
+Additional macaroon caveats:
+
+- `endpoint` (API method; caveat can include a list)
+- `editgroup`
+- (etc)
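
An `endpoint` caveat carrying a list, and an `editgroup` caveat, could be checked with general checkers along these lines. The `endpoint in a,b` and `editgroup = <id>` syntax and the endpoint names are hypothetical. None of this is implemented in the server.

```python
from typing import Callable


def endpoint_checker(current_endpoint: str) -> Callable[[str], bool]:
    """Accept caveats like 'endpoint in create_release,update_release'."""
    prefix = "endpoint in "

    def check(caveat: str) -> bool:
        if not caveat.startswith(prefix):
            return False
        allowed = {e.strip() for e in caveat[len(prefix):].split(",")}
        return current_endpoint in allowed

    return check


def editgroup_checker(current_editgroup_id: str) -> Callable[[str], bool]:
    """Accept an editgroup caveat only for the editgroup being edited."""
    return lambda caveat: caveat == f"editgroup = {current_editgroup_id}"
```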
+
+Looked at a few other options for managing user accounts:
+
+- portier, the successor to persona, which basically uses email for magic-link
+ login, unless the email provider supports OIDC or similar. There is a central
+ hosted version to use for bootstrap. Appealing/minimal, but feels somewhat
+ neglected.
+- use something like 'dex' as a proxy to multiple OIDC (and other) providers
+- deploy a huge all-in-one platform like keycloak for all auth anything ever.
+ sort of wish Internet Archive, or somebody (Wikimedia?) ran one of these as
+ public infrastructure.
+- having webface generate macaroons itself
+
+Will probably eventually need to support multiple logins per editor account.
+Shouldn't be too hard, but will require additional API endpoints (POST with
+`editor_id` included, DELETE to remove, etc).
+
+On mobile, folks might not be signed in to as many accounts, or it might be
+annoying to enter long/secure passwords (eg, to log in to GitHub). Could get
+around this with "login via token via QR code" with long/unlimited expiry.
+Might make more sense to support google OIDC as my guess is that many (most?)
+people have a google account logged in on their phone.
+
+## Implementation Notes
+
+To start, using the `loginpass` python library to handle logins, which is built
+on `authlib`. May need to extend or just use `authlib` directly in the future.
+Supports many large commercial providers, including gitlab.com, github.com, and
+google.
+
+There are many other flask/oauth/OIDC libraries out there, but this one worked
+well with multiple popular providers, mostly by being flexible about actual
+OIDC support. For example, GitHub doesn't support OIDC (only OAuth2), and
+GitLab's support is apparently incomplete/broken.
+
+### Background Reading
+
+Other flask OIDC integrations:
+
+- https://flask-oidc.readthedocs.io/en/latest/
+- https://github.com/zamzterz/Flask-pyoidc
+
+Background reading on macaroons:
+
+- https://github.com/rescrv/libmacaroons
+- http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
+- https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect
+- https://latacora.micro.blog/2018/06/12/a-childs-garden.html
+- https://github.com/go-macaroon-bakery/macaroon-bakery (for the "bakery" API pattern)
+