Diffstat (limited to 'notes/auth.md')
-rw-r--r-- | notes/auth.md | 355 |
1 files changed, 201 insertions, 154 deletions
diff --git a/notes/auth.md b/notes/auth.md index d5e4dbd4..1918dc82 100644 --- a/notes/auth.md +++ b/notes/auth.md @@ -1,183 +1,230 @@ -For users: use openid connect (oauth2) to sign up and login to web app. From -web app, can create (and disable?) API tokens - -For impl: fatcat-web has private key to create tokens. tokens used both in -cookies and as API keys. tokens are macaroons (?). fatcatd only verifies -tokens. optionally, some redis or other fast shared store to verify that tokens -haven't been revoked. - -Could use portier with openid connect as an email-based option. Otherwise, -orcid, github, google. - ---------- - -Use macaroons! - -editor/user table has a "auth_epoch" timestamp; only macaroons generated -after this timestamp are valid. revocation is done by incrementing this -timestamp ("touch"). - -Rust CLI tool for managing users: -- create editor - -Special users/editor that can create editor accounts via API; eg, one for -fatcat-web. - -Associate one oauth2 id per domain per editor/user. - -Users come to fatcat-web and do oauth2 to login or create an account. All -oauth2 internal to fatcat-web. If successful, fatcat-web does an -(authenticated) lookup to API for that identifier. If found, requests a -new macaroon to use as a cookie for auth. All future requests pass this -cookie through as bearer auth. fatcat-web remains stateless! macaroon -contains username (for display); no lookup-per page. Need to logout/login for -this to update? - -Later, can do a "add additional account" feature. - -Backend: -- oauth2 account table, foreign key to editor table - => this is the only private table -- auth_epoch timestamp column on editor table -- lock editor by setting auth_epoch to deep future - -Deploy process: -- auto-create root (admin), import-bootstrap (admin,bot), and demo-user - editors, with fixed editor_id and "early" auth_epoch, as part of SQL. save - tokens in env files, on laptop and QA instance. 
-- on live QA instance, revoke all keys when live (?) - -TODO: privacy policy - -fatcat API doesn't *require* auth, but if auth is provided, it will check -macaroon, and validate against editor table's timestamp. - -support oauth2 against: -- orcid -- git.archive.org -- github -? google - -Macaroon details: -- worth looking at "bakery" projects (python and golang) for example of how to - actually implement macaroon authentication/authorization -- location is fatcat.wiki (and/or qa.fatcat.wiki, or test or localhost or test.fatcat.wiki?) -- identifier is a UUID in upper-case string format -- will need some on-disk key storage thing? - => how to generate new keys? which one should be used, most recent? - conception of revoking keys? simple JSON/TOML, or LMDB? -- call them "authentication tokens"? -- params/constraints - - editor_id: always, fcid format - - created: always, some date format (seconds/iso) - - expires: optional, same date format - -It's a huge simplification to have webface generate macaroons as well, using a -root key. webface doesn't need multiple keys because it only creates, doesn't -verify. - -Code structure: -- auth service/struct is generated at startup; reads environment and on-disk keys -- verify helper does the thing -- some sort of auth/edit context - -Roles? -- public: unauthenticated -- editor: any authenticated, active account -- bot -- admin +This file summarizes the current fatcat authentication schema, which is based +on 3rd party OAuth2/OIDC sign-in and macaroon tokens. + +## Overview + +The informal high-level requirements for the auth system were: + +- public read-only (HTTP GET) API and website require no login or + authentication +- all changes to the catalog happen through the API and are associated with an + abstract editor (the entity behind an editor could be human, a bots, an + organization, change over time, etc). basic editor metadata (eg, identifier) + is public for all time. 
+- editors can signup (create account) and login using the web interface +- bots and scripts access the API directly; their actions are associated with + an editor (which could be a bot account) +- authentication can be managed via the web interface (eg, creating any tokens + or bot accounts) +- there is a mechanism to revoke API access and lock editor accounts (eg, to + block spam); this mechanism doesn't need to be a web interface, but shouldn't + be raw SQL commands +- store an absolute minimum of PII (personally identifiable intformation) that + can't be "mixed in" with public database dumps, or would make the database a + security target. eg, if possible don't store emails or passwords +- the web interface should, as much as possible, not be "special". Eg, should + work through the API and not have secret keys, if possible +- be as simple an efficient as possible (eg, minimize per-request database + hits) + +The initial design that came out of these requirements is to use bearer tokens +(in the form of macaroons) for all API authentication needs, and to have editor +account creation and authentication offloaded to third parties via OAuth2 +(specifically OpenID Connect (OIDC) when available). By storing only OIDC +identifiers in a single database table (linked but separate from the editor +table), PII collection is minimized, and no code needs to be written to handle +password recovery, email verification, etc. Tokens can be embedded in web +interface session cookies and "passed through" in API calls that require +authentication, so the web interface is effectively stateless (in that it does +not hold any session or user information internally). + +Macaroons, like JSON Web Tokens (JWT) contain signed (verifiable) constraints, +called caveats. Unlike JWT, these caveats can easily be "further constrained" +by any party. There is additional support for signed third party caveats, but +we don't use that feature currently. 
Caveats can be used to set an expiry time +for each token, which is appropriate for cookies (requiring a fresh login). We +also use creation timestamps and per-editor "authentication epoches" (publicly +stored in the editor table, non-sensitive) to revoke API tokens per-editor (or +globally, if necessary). Basically, only macaroons that were "minted" after the +current `auth_epoch` for the editor are considered valid. If a token is lost, +the `auth_epoch` is reset to the current time (after the compromised token was +minted, or any subsequent tokens possibly created by an attacker), all existing +tokens are considered invalid, and the editor must log back in (and generate +new API tokens for any bots/scripts). In the event of a serious security +compromise (like the secret signing key being compromised, or a bug in macaroon +generation is found), all `auth_epoch` timestamps are updated at once (and a +new key is used). + +The account login/signup flow for new editors is to visit the web interface and +select an OAuth provider (from a fixed list) where they have an account. After +they approve Fatcat as an application on the third party site, they bounce back +to the web interface. If they had signed up previously they are signed in, +otherwise a new editor account is automatically created. A username is +generated based on the OAuth remote account name, but the editor can change +this immediately. The web interface allows (or will, when implemented) creation +of bot accounts (linked to a "wrangler" editor account), generation of tokens, +etc. + +In theory, the API tokens, as macaroons, can be "attenuated" by the user with +additional caveats before being used. Eg, the expiry could be throttled down to +a minute or two, or constrained to edits of a specific editgroup, or to a +specific API endpoint. A use-case for this would be pasting a token in a +single-page app or untrusted script with minimal delgated authority. 
Not all of +these caveat checks have been implemented in the server yet though. + +As an "escape hatch", there is a rust command (`fatcat-auth`) for debugging, +creating new keys and tokens, revoking tokens (via `auth_epoch`), etc. There is +also a web interface mechanism to "login via existing token". These mechanisms +aren't intended for general use, but are helpful when developing (when login +via OAuth may not be configured or accessible) and for admins/operators. + +## Current Limitations + +No mechanism for linking (or unlinking) multiple remote OAuth accounts into a +single editor account. The database schema supports this, there just aren't API +endpoints or a web interface. + +There is no obvious place to store persistent non-public user information: +things like preferences, or current editgroup being operated on via the web +interface. This info can go in session cookies, but is lost when user logs +out/in or uses another device. + +## API Tokens (Macaroons) + +Macaroons contain "caveats" which constrain their scope. In the context of +fatcat, macaroons should always be constrained to a single editor account (by +`editor_id`) and a valid creation timestamp; this enables revocation. + +In general, want to keep caveats, identifier, and other macaroon contents as +short as possible, because they can bloat up the token size. + +Use identifiers (unique names for looking up signing keys) that contain the +date and (short) domain, like `20190110-qa`. Caveats: + - general model is that macaroon is omnipotent and passes all verification, unless caveats are added. eg, adding verification checks doesn't constrain auth, only the caveats constrain auth; verification check *allow* additional auth. each caveat only needs to be allowed by one verifiation. - can (and should?) 
add as many caveat checkers/constrants in code as possible -http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html - -------- - -## Schema/API Notes +## Web Signup/Login + +OpenID Connect (OIDC) is basically a convention for servers and clients to use +OAuth2 for the specific purpose of just logging in or linking accounts, a la +"Sign In With ...". OAuth is often used to provider interoperability between +service (eg, a client app can take actions as the user, when granted +permissions, on the authenticating platform); OIDC doesn't grant any such +permissions, just refreshing logins at most. + +The web interface (webface) does all the OAuth/OIDC trickery, and extracts a +simple platform identifier and user identifier if authentication was +successful. It sends this in a fatcat API request to the `/auth/oidc` endpoint, +using admin authentication (the web interface stores an internal token "for +itself" for this one purpose). The API will return both an Editor object and a +token for that editor in the response. If the user had signed in previously +using the same provider/service/user pair as before, the Editor object is the +user's login. If the pair is new, a new account is created automatically and +returned; the HTTP status code indicates which happened. The editor username is +automatically generated from the remote username and platform (user can change +it if they want). + +The returned token and editor metadata are stored in session cookies. The flask +framework has a secure cookie implementation that prevents users from making up +cookies, but this isn't the real security mechanism; the real mechanism is that +they can't generate valid macaroons because they are signed. Cookie *theft* is +an issue, so aggressive cookie protections should be activated in the Flask +configuration. 
+ +The `auth_oidc` enforces uniqueness on accounts in a few ways: + +- lowercase UNIQ constaint on usernames (can't register upper- and lower-case + variants) +- UNIQ {`editor_id`, `platform`}: can't login using multiple remote accounts + from the same platform +- UNIQ {`platform`, `remote_host`, `remote_id`}: can't login to multiple local + accounts using the same remote account +- all fields are NOT NULL + +## Role-Based Authentication (RBAC) + +Current acknowledge roles: + +- public (not authenticated) +- bot +- human +- editor (bot or human) +- admin +- superuser -GET /auth/oidc -=> params: provider, sub, iss -=> returns {editor, token} or not found -=> admin auth required +Will probably rename these. Additionally, editor accounts have an `is_active` +flag (used to lock disabled/deleted/abusive/compromised accounts); no roles +beyond public are given for inactive accounts. -POST /auth/oidc -=> params: editor_id, provider, sub, iss -=> returns {editor, token} -=> admin auth required +## Developer Affordances -POST /editor -=> admin auth required +A few core accounts are created automatically, with fixed `username`, +`auth_epoch` and `editor_id`, to make testing and administration easier across +database resets (aka, tokens keep working as long as the signing key stays the +same). -flow is to have single login/signup OIDC flow. If need to create an account, -bounce to special page for that and store ISS/SUB in (signed/secure) session -temporarily. +Tokens and other secrets can be store in environment variables, scripts, or +`.env` files. -This doesn't feel great. Could instead randomly generate a username, and -provide mechanism to update. That's better! 
+## Future Work and Alternatives -PUT /editor/{editor_id} -=> only allow username updates, and only by admin or logged-in user +Want to support more OAuth/OIDC endpoints: -schema: -`auth_oidc` - => id (BIGINT), editor_id, provider, oidc_iss, oidc_sub - => created (auto-timestamp) - => UNIQ index on (editor_id, provider) - => UNIQ index on (provider, remote_sub, remote_iss) - => all are NOT NULL +- archive.org: bespoke "XAuth" thing; would be reasonable to hack in support. + use user itemname as persistent 'sub' field +- orcid.org: supports OIDC +- wikipedia/wikimedia: OAuth; https://github.com/valhallasw/flask-mwoauth +- additional -## Webface Notes +Additional macaroon caveats: -Want to use "OpenID Connect" (OIDC), which is basically a subset/convention of -OAuth 2.0 for authenticaiton ("log in as"), without granting API priviliges. +- `endpoint` (API method; caveat can include a list) +- `editgroup` +- (etc) -Want to support multiple identity providers, eg: -- orcid.org - => Basic OpenID Provider; implicit token -- git.archive.org -- gitlab.org - => https://docs.gitlab.com/ee/integration/openid_connect_provider.html -- google.com +Looked at a few other options for managing use accounts: -Currently, looks like github.com doesn't support OIDC; they are the only -provider i'm interested in that does not. +- portier, the successor to persona, which basically uses email for magic-link + login, unless the email provider supports OIDC or similar. There is a central + hosted version to use for bootstrap. Appealing/minimal, but feels somewhat + neglected. +- use something like 'dex' as a proxy to multiple OIDC (and other) providers +- deploy a huge all-in-one platform like keycloak for all auth anything ever. + sort of wish Internet Archive, or somebody (Wikimedia?) ran one of these as + public infrastructure. +- having webface generate macaroons itself -authlib/loginpass are tempting to use as they support a bunch of providers -out-of-the-box... but not orcid. 
+## Implementation Notes -Alternatively, could use any number of "proxies"/thingies to aggregate auth: -- https://www.keycloak.org/about.html -- https://portier.github.io/ -- https://github.com/dexidp/dex +To start, using the `loginpass` python library to handle logins, which is built +on `authlib`. May need to extend or just use `authlib` directly in the future. +Supports many large commercial providers, including gitlab.com, github.com, and +google. -Possible flask integrations: -=> https://flask-oidc.readthedocs.io/en/latest/ -=> https://github.com/zamzterz/Flask-pyoidc +There are many other flask/oauth/OIDC libraries out there, but this one worked +well with multiple popular providers, mostly by being flexible about actual +OIDC support. For example, Github doesn't support OIDC (only OAuth2), and +apparently Gitlab's is incomplete/broken. -Background: -=> https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect -=> https://latacora.micro.blog/2018/06/12/a-childs-garden.html +### Background Reading -Future work: -=> multiple logins, and/or merging accounts +Other flask OIDC integrations: +- https://flask-oidc.readthedocs.io/en/latest/ +- https://github.com/zamzterz/Flask-pyoidc -"Fatcat is an open, editable database of bibliographic metadata. You can -sign-up and login using orcid.org; this option is used for identity and -authentication only. Fatcat does not currently make changes to any data on -orcid.org, which you can verify from the permissions requested." 
+Background reading on macaroons: - https://fatcat.wiki/auth/oidc_redirect - https://qa.fatcat.wiki/auth/oidc_redirect +- https://github.com/rescrv/libmacaroons +- http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html +- https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect +- https://latacora.micro.blog/2018/06/12/a-childs-garden.html +- https://github.com/go-macaroon-bakery/macaroon-bakery (for the "bakery" API pattern) -PLAN: -- have a mode/mechanism for login-by-token; mostly for testing -- for now, use loginpass OAuth/OIDC for login/signup. upstream ORCID support or - hack that in somehow when desired -- auto-create a username based on oauth, then allow changes |
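
Reviewer sketch (not part of the diff): the added notes describe revocation as "only macaroons minted after the current `auth_epoch` are valid", plus an optional expiry caveat and the `is_active` flag. A minimal Python sketch of that rule follows. It assumes the macaroon signature has already been verified elsewhere. The names `TokenCaveats`, `EditorRow`, `token_is_valid`, and `revoke_tokens` are hypothetical and not taken from the fatcat codebase, and treating a token created exactly at `auth_epoch` as invalid is also an assumption.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TokenCaveats:
    """Caveats carried by a token (hypothetical structure, for illustration)."""
    editor_id: str
    created: datetime
    expires: Optional[datetime] = None


@dataclass(frozen=True)
class EditorRow:
    """Subset of an editor table row (hypothetical structure, for illustration)."""
    editor_id: str
    auth_epoch: datetime
    is_active: bool = True


def token_is_valid(caveats: TokenCaveats, editor: EditorRow, now: datetime) -> bool:
    """Apply the editor_id, is_active, auth_epoch, and expiry checks."""
    if caveats.editor_id != editor.editor_id:
        return False  # token is bound to a different editor
    if not editor.is_active:
        return False  # locked accounts get no roles beyond public
    if caveats.created <= editor.auth_epoch:
        return False  # minted at or before the epoch: revoked (boundary is an assumption)
    if caveats.expires is not None and now >= caveats.expires:
        return False  # expiry caveat has passed
    return True


def revoke_tokens(editor: EditorRow, now: datetime) -> EditorRow:
    """Revoke all existing tokens by resetting auth_epoch to the current time."""
    return replace(editor, auth_epoch=now)


if __name__ == "__main__":
    now = datetime(2019, 1, 10, tzinfo=timezone.utc)
    editor = EditorRow("example-editor", auth_epoch=now - timedelta(days=30))
    token = TokenCaveats("example-editor", created=now - timedelta(days=1))
    print(token_is_valid(token, editor, now))
    print(token_is_valid(token, revoke_tokens(editor, now), now))
```

The check needs only the editor row and the token's own caveats, which fits the stated goal of minimizing per-request database hits: no per-token revocation list is required.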