summaryrefslogtreecommitdiffstats
path: root/notes
diff options
context:
space:
mode:
Diffstat (limited to 'notes')
-rw-r--r--notes/auth.md355
-rw-r--r--notes/oauth_statements.md14
2 files changed, 215 insertions, 154 deletions
diff --git a/notes/auth.md b/notes/auth.md
index d5e4dbd4..1918dc82 100644
--- a/notes/auth.md
+++ b/notes/auth.md
@@ -1,183 +1,230 @@
-For users: use openid connect (oauth2) to sign up and login to web app. From
-web app, can create (and disable?) API tokens
-
-For impl: fatcat-web has private key to create tokens. tokens used both in
-cookies and as API keys. tokens are macaroons (?). fatcatd only verifies
-tokens. optionally, some redis or other fast shared store to verify that tokens
-haven't been revoked.
-
-Could use portier with openid connect as an email-based option. Otherwise,
-orcid, github, google.
-
----------
-
-Use macaroons!
-
-editor/user table has a "auth_epoch" timestamp; only macaroons generated
-after this timestamp are valid. revocation is done by incrementing this
-timestamp ("touch").
-
-Rust CLI tool for managing users:
-- create editor
-
-Special users/editor that can create editor accounts via API; eg, one for
-fatcat-web.
-
-Associate one oauth2 id per domain per editor/user.
-
-Users come to fatcat-web and do oauth2 to login or create an account. All
-oauth2 internal to fatcat-web. If successful, fatcat-web does an
-(authenticated) lookup to API for that identifier. If found, requests a
-new macaroon to use as a cookie for auth. All future requests pass this
-cookie through as bearer auth. fatcat-web remains stateless! macaroon
-contains username (for display); no lookup-per page. Need to logout/login for
-this to update?
-
-Later, can do a "add additional account" feature.
-
-Backend:
-- oauth2 account table, foreign key to editor table
- => this is the only private table
-- auth_epoch timestamp column on editor table
-- lock editor by setting auth_epoch to deep future
-
-Deploy process:
-- auto-create root (admin), import-bootstrap (admin,bot), and demo-user
- editors, with fixed editor_id and "early" auth_epoch, as part of SQL. save
- tokens in env files, on laptop and QA instance.
-- on live QA instance, revoke all keys when live (?)
-
-TODO: privacy policy
-
-fatcat API doesn't *require* auth, but if auth is provided, it will check
-macaroon, and validate against editor table's timestamp.
-
-support oauth2 against:
-- orcid
-- git.archive.org
-- github
-? google
-
-Macaroon details:
-- worth looking at "bakery" projects (python and golang) for example of how to
- actually implement macaroon authentication/authorization
-- location is fatcat.wiki (and/or qa.fatcat.wiki, or test or localhost or test.fatcat.wiki?)
-- identifier is a UUID in upper-case string format
-- will need some on-disk key storage thing?
- => how to generate new keys? which one should be used, most recent?
- conception of revoking keys? simple JSON/TOML, or LMDB?
-- call them "authentication tokens"?
-- params/constraints
- - editor_id: always, fcid format
- - created: always, some date format (seconds/iso)
- - expires: optional, same date format
-
-It's a huge simplification to have webface generate macaroons as well, using a
-root key. webface doesn't need multiple keys because it only creates, doesn't
-verify.
-
-Code structure:
-- auth service/struct is generated at startup; reads environment and on-disk keys
-- verify helper does the thing
-- some sort of auth/edit context
-
-Roles?
-- public: unauthenticated
-- editor: any authenticated, active account
-- bot
-- admin
+This file summarizes the current fatcat authentication schema, which is based
+on 3rd party OAuth2/OIDC sign-in and macaroon tokens.
+
+## Overview
+
+The informal high-level requirements for the auth system were:
+
+- public read-only (HTTP GET) API and website require no login or
+ authentication
+- all changes to the catalog happen through the API and are associated with an
+ abstract editor (the entity behind an editor could be human, a bots, an
+ organization, change over time, etc). basic editor metadata (eg, identifier)
+ is public for all time.
+- editors can signup (create account) and login using the web interface
+- bots and scripts access the API directly; their actions are associated with
+ an editor (which could be a bot account)
+- authentication can be managed via the web interface (eg, creating any tokens
+ or bot accounts)
+- there is a mechanism to revoke API access and lock editor accounts (eg, to
+ block spam); this mechanism doesn't need to be a web interface, but shouldn't
+ be raw SQL commands
+- store an absolute minimum of PII (personally identifiable intformation) that
+ can't be "mixed in" with public database dumps, or would make the database a
+ security target. eg, if possible don't store emails or passwords
+- the web interface should, as much as possible, not be "special". Eg, should
+ work through the API and not have secret keys, if possible
+- be as simple an efficient as possible (eg, minimize per-request database
+ hits)
+
+The initial design that came out of these requirements is to use bearer tokens
+(in the form of macaroons) for all API authentication needs, and to have editor
+account creation and authentication offloaded to third parties via OAuth2
+(specifically OpenID Connect (OIDC) when available). By storing only OIDC
+identifiers in a single database table (linked but separate from the editor
+table), PII collection is minimized, and no code needs to be written to handle
+password recovery, email verification, etc. Tokens can be embedded in web
+interface session cookies and "passed through" in API calls that require
+authentication, so the web interface is effectively stateless (in that it does
+not hold any session or user information internally).
+
+Macaroons, like JSON Web Tokens (JWT) contain signed (verifiable) constraints,
+called caveats. Unlike JWT, these caveats can easily be "further constrained"
+by any party. There is additional support for signed third party caveats, but
+we don't use that feature currently. Caveats can be used to set an expiry time
+for each token, which is appropriate for cookies (requiring a fresh login). We
+also use creation timestamps and per-editor "authentication epoches" (publicly
+stored in the editor table, non-sensitive) to revoke API tokens per-editor (or
+globally, if necessary). Basically, only macaroons that were "minted" after the
+current `auth_epoch` for the editor are considered valid. If a token is lost,
+the `auth_epoch` is reset to the current time (after the compromised token was
+minted, or any subsequent tokens possibly created by an attacker), all existing
+tokens are considered invalid, and the editor must log back in (and generate
+new API tokens for any bots/scripts). In the event of a serious security
+compromise (like the secret signing key being compromised, or a bug in macaroon
+generation is found), all `auth_epoch` timestamps are updated at once (and a
+new key is used).
+
+The account login/signup flow for new editors is to visit the web interface and
+select an OAuth provider (from a fixed list) where they have an account. After
+they approve Fatcat as an application on the third party site, they bounce back
+to the web interface. If they had signed up previously they are signed in,
+otherwise a new editor account is automatically created. A username is
+generated based on the OAuth remote account name, but the editor can change
+this immediately. The web interface allows (or will, when implemented) creation
+of bot accounts (linked to a "wrangler" editor account), generation of tokens,
+etc.
+
+In theory, the API tokens, as macaroons, can be "attenuated" by the user with
+additional caveats before being used. Eg, the expiry could be throttled down to
+a minute or two, or constrained to edits of a specific editgroup, or to a
+specific API endpoint. A use-case for this would be pasting a token in a
+single-page app or untrusted script with minimal delgated authority. Not all of
+these caveat checks have been implemented in the server yet though.
+
+As an "escape hatch", there is a rust command (`fatcat-auth`) for debugging,
+creating new keys and tokens, revoking tokens (via `auth_epoch`), etc. There is
+also a web interface mechanism to "login via existing token". These mechanisms
+aren't intended for general use, but are helpful when developing (when login
+via OAuth may not be configured or accessible) and for admins/operators.
+
+## Current Limitations
+
+No mechanism for linking (or unlinking) multiple remote OAuth accounts into a
+single editor account. The database schema supports this, there just aren't API
+endpoints or a web interface.
+
+There is no obvious place to store persistent non-public user information:
+things like preferences, or current editgroup being operated on via the web
+interface. This info can go in session cookies, but is lost when user logs
+out/in or uses another device.
+
+## API Tokens (Macaroons)
+
+Macaroons contain "caveats" which constrain their scope. In the context of
+fatcat, macaroons should always be constrained to a single editor account (by
+`editor_id`) and a valid creation timestamp; this enables revocation.
+
+In general, want to keep caveats, identifier, and other macaroon contents as
+short as possible, because they can bloat up the token size.
+
+Use identifiers (unique names for looking up signing keys) that contain the
+date and (short) domain, like `20190110-qa`.
Caveats:
+
- general model is that macaroon is omnipotent and passes all verification,
unless caveats are added. eg, adding verification checks doesn't constrain
auth, only the caveats constrain auth; verification check *allow* additional
auth. each caveat only needs to be allowed by one verifiation.
- can (and should?) add as many caveat checkers/constrants in code as possible
-http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
-
--------
-
-## Schema/API Notes
+## Web Signup/Login
+
+OpenID Connect (OIDC) is basically a convention for servers and clients to use
+OAuth2 for the specific purpose of just logging in or linking accounts, a la
+"Sign In With ...". OAuth is often used to provider interoperability between
+service (eg, a client app can take actions as the user, when granted
+permissions, on the authenticating platform); OIDC doesn't grant any such
+permissions, just refreshing logins at most.
+
+The web interface (webface) does all the OAuth/OIDC trickery, and extracts a
+simple platform identifier and user identifier if authentication was
+successful. It sends this in a fatcat API request to the `/auth/oidc` endpoint,
+using admin authentication (the web interface stores an internal token "for
+itself" for this one purpose). The API will return both an Editor object and a
+token for that editor in the response. If the user had signed in previously
+using the same provider/service/user pair as before, the Editor object is the
+user's login. If the pair is new, a new account is created automatically and
+returned; the HTTP status code indicates which happened. The editor username is
+automatically generated from the remote username and platform (user can change
+it if they want).
+
+The returned token and editor metadata are stored in session cookies. The flask
+framework has a secure cookie implementation that prevents users from making up
+cookies, but this isn't the real security mechanism; the real mechanism is that
+they can't generate valid macaroons because they are signed. Cookie *theft* is
+an issue, so aggressive cookie protections should be activated in the Flask
+configuration.
+
+The `auth_oidc` enforces uniqueness on accounts in a few ways:
+
+- lowercase UNIQ constaint on usernames (can't register upper- and lower-case
+ variants)
+- UNIQ {`editor_id`, `platform`}: can't login using multiple remote accounts
+ from the same platform
+- UNIQ {`platform`, `remote_host`, `remote_id`}: can't login to multiple local
+ accounts using the same remote account
+- all fields are NOT NULL
+
+## Role-Based Authentication (RBAC)
+
+Current acknowledge roles:
+
+- public (not authenticated)
+- bot
+- human
+- editor (bot or human)
+- admin
+- superuser
-GET /auth/oidc
-=> params: provider, sub, iss
-=> returns {editor, token} or not found
-=> admin auth required
+Will probably rename these. Additionally, editor accounts have an `is_active`
+flag (used to lock disabled/deleted/abusive/compromised accounts); no roles
+beyond public are given for inactive accounts.
-POST /auth/oidc
-=> params: editor_id, provider, sub, iss
-=> returns {editor, token}
-=> admin auth required
+## Developer Affordances
-POST /editor
-=> admin auth required
+A few core accounts are created automatically, with fixed `username`,
+`auth_epoch` and `editor_id`, to make testing and administration easier across
+database resets (aka, tokens keep working as long as the signing key stays the
+same).
-flow is to have single login/signup OIDC flow. If need to create an account,
-bounce to special page for that and store ISS/SUB in (signed/secure) session
-temporarily.
+Tokens and other secrets can be store in environment variables, scripts, or
+`.env` files.
-This doesn't feel great. Could instead randomly generate a username, and
-provide mechanism to update. That's better!
+## Future Work and Alternatives
-PUT /editor/{editor_id}
-=> only allow username updates, and only by admin or logged-in user
+Want to support more OAuth/OIDC endpoints:
-schema:
-`auth_oidc`
- => id (BIGINT), editor_id, provider, oidc_iss, oidc_sub
- => created (auto-timestamp)
- => UNIQ index on (editor_id, provider)
- => UNIQ index on (provider, remote_sub, remote_iss)
- => all are NOT NULL
+- archive.org: bespoke "XAuth" thing; would be reasonable to hack in support.
+ use user itemname as persistent 'sub' field
+- orcid.org: supports OIDC
+- wikipedia/wikimedia: OAuth; https://github.com/valhallasw/flask-mwoauth
+- additional
-## Webface Notes
+Additional macaroon caveats:
-Want to use "OpenID Connect" (OIDC), which is basically a subset/convention of
-OAuth 2.0 for authenticaiton ("log in as"), without granting API priviliges.
+- `endpoint` (API method; caveat can include a list)
+- `editgroup`
+- (etc)
-Want to support multiple identity providers, eg:
-- orcid.org
- => Basic OpenID Provider; implicit token
-- git.archive.org
-- gitlab.org
- => https://docs.gitlab.com/ee/integration/openid_connect_provider.html
-- google.com
+Looked at a few other options for managing use accounts:
-Currently, looks like github.com doesn't support OIDC; they are the only
-provider i'm interested in that does not.
+- portier, the successor to persona, which basically uses email for magic-link
+ login, unless the email provider supports OIDC or similar. There is a central
+ hosted version to use for bootstrap. Appealing/minimal, but feels somewhat
+ neglected.
+- use something like 'dex' as a proxy to multiple OIDC (and other) providers
+- deploy a huge all-in-one platform like keycloak for all auth anything ever.
+ sort of wish Internet Archive, or somebody (Wikimedia?) ran one of these as
+ public infrastructure.
+- having webface generate macaroons itself
-authlib/loginpass are tempting to use as they support a bunch of providers
-out-of-the-box... but not orcid.
+## Implementation Notes
-Alternatively, could use any number of "proxies"/thingies to aggregate auth:
-- https://www.keycloak.org/about.html
-- https://portier.github.io/
-- https://github.com/dexidp/dex
+To start, using the `loginpass` python library to handle logins, which is built
+on `authlib`. May need to extend or just use `authlib` directly in the future.
+Supports many large commercial providers, including gitlab.com, github.com, and
+google.
-Possible flask integrations:
-=> https://flask-oidc.readthedocs.io/en/latest/
-=> https://github.com/zamzterz/Flask-pyoidc
+There are many other flask/oauth/OIDC libraries out there, but this one worked
+well with multiple popular providers, mostly by being flexible about actual
+OIDC support. For example, Github doesn't support OIDC (only OAuth2), and
+apparently Gitlab's is incomplete/broken.
-Background:
-=> https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect
-=> https://latacora.micro.blog/2018/06/12/a-childs-garden.html
+### Background Reading
-Future work:
-=> multiple logins, and/or merging accounts
+Other flask OIDC integrations:
+- https://flask-oidc.readthedocs.io/en/latest/
+- https://github.com/zamzterz/Flask-pyoidc
-"Fatcat is an open, editable database of bibliographic metadata. You can
-sign-up and login using orcid.org; this option is used for identity and
-authentication only. Fatcat does not currently make changes to any data on
-orcid.org, which you can verify from the permissions requested."
+Background reading on macaroons:
- https://fatcat.wiki/auth/oidc_redirect
- https://qa.fatcat.wiki/auth/oidc_redirect
+- https://github.com/rescrv/libmacaroons
+- http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
+- https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect
+- https://latacora.micro.blog/2018/06/12/a-childs-garden.html
+- https://github.com/go-macaroon-bakery/macaroon-bakery (for the "bakery" API pattern)
-PLAN:
-- have a mode/mechanism for login-by-token; mostly for testing
-- for now, use loginpass OAuth/OIDC for login/signup. upstream ORCID support or
- hack that in somehow when desired
-- auto-create a username based on oauth, then allow changes
diff --git a/notes/oauth_statements.md b/notes/oauth_statements.md
new file mode 100644
index 00000000..5f46c9ed
--- /dev/null
+++ b/notes/oauth_statements.md
@@ -0,0 +1,14 @@
+
+Copy text used when signing up for OAuth applications on various platforms.
+
+## Wikimedia
+
+Fatcat (https://fatcat.wiki) is a publicly-editable bibliographic catalog, containing metadata about tens of millions of research articles, conference proceedings, and books. The particular emphasis is on linking different "releases" of the same "work" (eg, preprint and final copies of a journal paper), and matching specific (archived) files to releases.
+Fatcat is a project of the Internet Archive (https://archive.org).
+
+## ORCID
+
+Fatcat is an open, editable database of bibliographic metadata. You can sign-up
+and login using orcid.org; this option is used for identity and authentication
+only. Fatcat does not currently make changes to any data on orcid.org, which
+you can verify from the permissions requested.