From 8a6ab2ed76d725e6e8d47e51572f009407ed5ca2 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Sat, 29 Dec 2018 00:09:36 -0800
Subject: notes and TODO (WIP)

---
 notes/auth_thoughts.txt | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

(limited to 'notes')

diff --git a/notes/auth_thoughts.txt b/notes/auth_thoughts.txt
index ba19f4c2..c82b204e 100644
--- a/notes/auth_thoughts.txt
+++ b/notes/auth_thoughts.txt
@@ -59,3 +59,40 @@ support oauth2 against:
 - github
 ? google
 
+Macaroon details:
+- worth looking at "bakery" projects (python and golang) for example of how to
+  actually implement macaroon authentication/authorization
+- location is fatcat.wiki (and/or qa.fatcat.wiki, or test or localhost or test.fatcat.wiki?)
+- identifier is a UUID in upper-case string format
+- will need some on-disk key storage thing?
+  => how to generate new keys? which one should be used, most recent?
+  conception of revoking keys? simple JSON/TOML, or LMDB?
+- call them "authentication tokens"?
+- params/constraints
+  - editor_id: always, fcid format
+  - created: always, some date format (seconds/iso)
+  - expires: optional, same date format
+
+It's a huge simplification to have webface generate macaroons as well, using a
+root key. webface doesn't need multiple keys because it only creates, doesn't
+verify.
+
+Code structure:
+- auth service/struct is generated at startup; reads environment and on-disk keys
+- verify helper does the thing
+- some sort of auth/edit context
+
+Roles?
+- public: unauthenticated
+- editor: any authenticated, active account
+- bot
+- admin
+
+Caveats:
+- general model is that macaroon is omnipotent and passes all verification,
+  unless caveats are added. eg, adding verification checks doesn't constrain
+  auth, only the caveats constrain auth; verification check *allow* additional
+  auth. each caveat only needs to be allowed by one verifiation.
+- can (and should?) add as many caveat checkers/constrants in code as possible
+
+http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
-- 
cgit v1.2.3


From b7da0669b1568bd17192bfbe86f8a248279a870a Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Mon, 31 Dec 2018 17:15:30 -0800
Subject: bit of auth docs

---
 notes/auth.md           | 98 +++++++++++++++++++++++++++++++++++++++++
 notes/auth_thoughts.txt | 98 -------------------------------------------------
 rust/HACKING.md         |  4 ++
 rust/README.md          |  3 ++
 4 files changed, 105 insertions(+), 98 deletions(-)
 create mode 100644 notes/auth.md
 delete mode 100644 notes/auth_thoughts.txt

(limited to 'notes')

diff --git a/notes/auth.md b/notes/auth.md
new file mode 100644
index 00000000..c82b204e
--- /dev/null
+++ b/notes/auth.md
@@ -0,0 +1,98 @@
+
+For users: use openid connect (oauth2) to sign up and login to web app. From
+web app, can create (and disable?) API tokens
+
+For impl: fatcat-web has private key to create tokens. tokens used both in
+cookies and as API keys. tokens are macaroons (?). fatcatd only verifies
+tokens. optionally, some redis or other fast shared store to verify that tokens
+haven't been revoked.
+
+Could use portier with openid connect as an email-based option. Otherwise,
+orcid, github, google.
+
+---------
+
+Use macaroons!
+
+editor/user table has a "auth_epoch" timestamp; only macaroons generated
+after this timestamp are valid. revocation is done by incrementing this
+timestamp ("touch").
+
+Rust CLI tool for managing users:
+- create editor
+
+Special users/editor that can create editor accounts via API; eg, one for
+fatcat-web.
+
+Associate one oauth2 id per domain per editor/user.
+
+Users come to fatcat-web and do oauth2 to login or create an account. All
+oauth2 internal to fatcat-web. If successful, fatcat-web does an
+(authenticated) lookup to API for that identifier. If found, requests a
+new macaroon to use as a cookie for auth. All future requests pass this
+cookie through as bearer auth. fatcat-web remains stateless! macaroon
+contains username (for display); no lookup-per page. Need to logout/login for
+this to update?
+
+Later, can do a "add additional account" feature.
+
+Backend:
+- oauth2 account table, foreign key to editor table
+  => this is the only private table
+- auth_epoch timestamp column on editor table
+- lock editor by setting auth_epoch to deep future
+
+Deploy process:
+- auto-create root (admin), import-bootstrap (admin,bot), and demo-user
+  editors, with fixed editor_id and "early" auth_epoch, as part of SQL. save
+  tokens in env files, on laptop and QA instance.
+- on live QA instance, revoke all keys when live (?)
+
+TODO: privacy policy
+
+fatcat API doesn't *require* auth, but if auth is provided, it will check
+macaroon, and validate against editor table's timestamp.
+
+support oauth2 against:
+- orcid
+- git.archive.org
+- github
+? google
+
+Macaroon details:
+- worth looking at "bakery" projects (python and golang) for example of how to
+  actually implement macaroon authentication/authorization
+- location is fatcat.wiki (and/or qa.fatcat.wiki, or test or localhost or test.fatcat.wiki?)
+- identifier is a UUID in upper-case string format
+- will need some on-disk key storage thing?
+  => how to generate new keys? which one should be used, most recent?
+  conception of revoking keys? simple JSON/TOML, or LMDB?
+- call them "authentication tokens"?
+- params/constraints
+  - editor_id: always, fcid format
+  - created: always, some date format (seconds/iso)
+  - expires: optional, same date format
+
+It's a huge simplification to have webface generate macaroons as well, using a
+root key. webface doesn't need multiple keys because it only creates, doesn't
+verify.
+
+Code structure:
+- auth service/struct is generated at startup; reads environment and on-disk keys
+- verify helper does the thing
+- some sort of auth/edit context
+
+Roles?
+- public: unauthenticated
+- editor: any authenticated, active account
+- bot
+- admin
+
+Caveats:
+- general model is that macaroon is omnipotent and passes all verification,
+  unless caveats are added. eg, adding verification checks doesn't constrain
+  auth, only the caveats constrain auth; verification check *allow* additional
+  auth. each caveat only needs to be allowed by one verifiation.
+- can (and should?) add as many caveat checkers/constrants in code as possible
+
+http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
diff --git a/notes/auth_thoughts.txt b/notes/auth_thoughts.txt
deleted file mode 100644
index c82b204e..00000000
--- a/notes/auth_thoughts.txt
+++ /dev/null
@@ -1,98 +0,0 @@
-
-For users: use openid connect (oauth2) to sign up and login to web app. From
-web app, can create (and disable?) API tokens
-
-For impl: fatcat-web has private key to create tokens. tokens used both in
-cookies and as API keys. tokens are macaroons (?). fatcatd only verifies
-tokens. optionally, some redis or other fast shared store to verify that tokens
-haven't been revoked.
-
-Could use portier with openid connect as an email-based option. Otherwise,
-orcid, github, google.
-
----------
-
-Use macaroons!
-
-editor/user table has a "auth_epoch" timestamp; only macaroons generated
-after this timestamp are valid. revocation is done by incrementing this
-timestamp ("touch").
-
-Rust CLI tool for managing users:
-- create editor
-
-Special users/editor that can create editor accounts via API; eg, one for
-fatcat-web.
-
-Associate one oauth2 id per domain per editor/user.
-
-Users come to fatcat-web and do oauth2 to login or create an account. All
-oauth2 internal to fatcat-web. If successful, fatcat-web does an
-(authenticated) lookup to API for that identifier. If found, requests a
-new macaroon to use as a cookie for auth. All future requests pass this
-cookie through as bearer auth. fatcat-web remains stateless! macaroon
-contains username (for display); no lookup-per page. Need to logout/login for
-this to update?
-
-Later, can do a "add additional account" feature.
-
-Backend:
-- oauth2 account table, foreign key to editor table
-  => this is the only private table
-- auth_epoch timestamp column on editor table
-- lock editor by setting auth_epoch to deep future
-
-Deploy process:
-- auto-create root (admin), import-bootstrap (admin,bot), and demo-user
-  editors, with fixed editor_id and "early" auth_epoch, as part of SQL. save
-  tokens in env files, on laptop and QA instance.
-- on live QA instance, revoke all keys when live (?)
-
-TODO: privacy policy
-
-fatcat API doesn't *require* auth, but if auth is provided, it will check
-macaroon, and validate against editor table's timestamp.
-
-support oauth2 against:
-- orcid
-- git.archive.org
-- github
-? google
-
-Macaroon details:
-- worth looking at "bakery" projects (python and golang) for example of how to
-  actually implement macaroon authentication/authorization
-- location is fatcat.wiki (and/or qa.fatcat.wiki, or test or localhost or test.fatcat.wiki?)
-- identifier is a UUID in upper-case string format
-- will need some on-disk key storage thing?
-  => how to generate new keys? which one should be used, most recent?
-  conception of revoking keys? simple JSON/TOML, or LMDB?
-- call them "authentication tokens"?
-- params/constraints
-  - editor_id: always, fcid format
-  - created: always, some date format (seconds/iso)
-  - expires: optional, same date format
-
-It's a huge simplification to have webface generate macaroons as well, using a
-root key. webface doesn't need multiple keys because it only creates, doesn't
-verify.
-
-Code structure:
-- auth service/struct is generated at startup; reads environment and on-disk keys
-- verify helper does the thing
-- some sort of auth/edit context
-
-Roles?
-- public: unauthenticated
-- editor: any authenticated, active account
-- bot
-- admin
-
-Caveats:
-- general model is that macaroon is omnipotent and passes all verification,
-  unless caveats are added. eg, adding verification checks doesn't constrain
-  auth, only the caveats constrain auth; verification check *allow* additional
-  auth. each caveat only needs to be allowed by one verifiation.
-- can (and should?) add as many caveat checkers/constrants in code as possible
-
-http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
diff --git a/rust/HACKING.md b/rust/HACKING.md
index 57642b2d..9d161b87 100644
--- a/rust/HACKING.md
+++ b/rust/HACKING.md
@@ -57,3 +57,7 @@ Debug SQL schema errors (if diesel commands fail):
 Creating entities via API:
 
     http --json post localhost:9411/v0/container name=asdf issn=1234-5678
+
+## Authentication
+
+Uses macaroons. See `notes/auth.md` and maybe look in the guide.
diff --git a/rust/README.md b/rust/README.md
index ddde9b80..ecbfba2d 100644
--- a/rust/README.md
+++ b/rust/README.md
@@ -17,6 +17,9 @@ Create a `.env` file with configuration:
 
     DATABASE_URL=postgres://fatcat:tactaf@localhost/fatcat_rs
     TEST_DATABASE_URL=postgres://fatcat:tactaf@localhost/fatcat_rs_test
+    AUTH_LOCATION=dev.fatcat.wiki
+    AUTH_KEY_IDENT=2018-12-31-dev
+    AUTH_SECRET_KEY=VQe8kdn8laZ3MArKAzOeWWNUQgM6IjduG2jwKnSWehQ=
 
 Re-create database from scratch:
 
-- 
cgit v1.2.3


From c1cdda4cf4c714f92b5fd7676e44ab5a92e637a9 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Fri, 4 Jan 2019 13:31:37 -0800
Subject: update main README

---
 README.md              | 46 ++++++++++++++--------------------------------
 notes/fatcat_idents.md | 25 +++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 32 deletions(-)
 create mode 100644 notes/fatcat_idents.md

(limited to 'notes')

diff --git a/README.md b/README.md
index 1433e62b..4c75dffc 100644
--- a/README.md
+++ b/README.md
@@ -13,25 +13,30 @@ published written works (mostly journal articles), with a focus on tracking
 the location and status of full-text copies to ensure "perpetual access".
 
 The [RFC](./fatcat-rfc.md) is the original design document, and the best place
-to start for background. There is a work-in-progress "guide" at
+to start for technical background. There is a work-in-progress "guide" at
 ; the canonical public location of this repository is
 .
 
+The public production web interface is .
+
+See the `LICENSE` file for detailed permissions and licensing of both python
+and rust code. In short, the auto-generated client libraries are permissively
+released, while the API server and web interface are strong copyleft (AGPLv3).
+
+## Building and Tests
+
 There are three main components:
 - backend API server and database (in Rust)
 - API client libraries and bots (in Python)
 - front-end web interface (in Python; built on API and library)
 
-See the LICENSE file for details permissions and licensing of both python and
-rust code. In short, the auto-generated client libraries are permissively
-released, while the API server and web interface are strong copyleft (AGPLv3).
-
-## Building and Tests
-
 Automated integration tests run on Gitlab CI (see `.gitlab-ci.yml`) on the
 Internet Archive's internal (not public) infrastructure.
 
+See `./python/README.md` and `./rust/README.md` for details on building,
+running, and testing these components.
+
 ## Status
 
 - SQL and HTTP API schemas
@@ -44,8 +49,8 @@ Internet Archive's internal (not public) infrastructure.
 - HTTP API Server
     - [x] base32 encoding of UUID identifiers
     - [x] inverse many-to-many helpers (files-by-release, release-by-creator)
-    - [ ] Authentication (eg, accounts, OAuth2, JWT)
-    - [ ] Authorization (aka, roles)
+    - [x] Authentication (eg, accounts, OAuth2, JWT)
+    - [x] Authorization (aka, roles)
 - Web Interface
     - [x] Migrate Python codebase
     - [ ] Creation and editing of all entities
@@ -57,26 +62,3 @@ Internet Archive's internal (not public) infrastructure.
     - [ ] Sentry (error reporting)
     - [ ] Metrics
 
-## Identifiers
-
-Fatcat entity identifiers are 128-bit UUIDs encoded in base32 format. Revision
-ids are also UUIDs, and encoded in normal UUID fashion, to disambiguate from
-edity identifiers.
-
-Python helpers for conversion:
-
-    import base64
-    import uuid
-
-    def fcid2uuid(s):
-        s = s.split('_')[-1].upper().encode('utf-8')
-        assert len(s) == 26
-        raw = base64.b32decode(s + b"======")
-        return str(uuid.UUID(bytes=raw)).lower()
-
-    def uuid2fcid(s):
-        raw = uuid.UUID(s).bytes
-        return base64.b32encode(raw)[:26].lower().decode('utf-8')
-
-    test_uuid = '00000000-0000-0000-3333-000000000001'
-    assert test_uuid == fcid2uuid(uuid2fcid(test_uuid))
diff --git a/notes/fatcat_idents.md b/notes/fatcat_idents.md
new file mode 100644
index 00000000..84322604
--- /dev/null
+++ b/notes/fatcat_idents.md
@@ -0,0 +1,25 @@
+
+## Identifiers
+
+Fatcat entity identifiers are 128-bit UUIDs encoded in base32 format. Revision
+ids are also UUIDs, and encoded in normal UUID fashion, to disambiguate from
+edity identifiers.
+
+Python helpers for conversion:
+
+    import base64
+    import uuid
+
+    def fcid2uuid(s):
+        s = s.split('_')[-1].upper().encode('utf-8')
+        assert len(s) == 26
+        raw = base64.b32decode(s + b"======")
+        return str(uuid.UUID(bytes=raw)).lower()
+
+    def uuid2fcid(s):
+        raw = uuid.UUID(s).bytes
+        return base64.b32encode(raw)[:26].lower().decode('utf-8')
+
+    test_uuid = '00000000-0000-0000-3333-000000000001'
+    assert test_uuid == fcid2uuid(uuid2fcid(test_uuid))
+
-- 
cgit v1.2.3


From 315bd097ffa5270fd4082141665b063b72aa56e7 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Fri, 4 Jan 2019 13:33:32 -0800
Subject: backup auth notes

---
 notes/auth.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

(limited to 'notes')

diff --git a/notes/auth.md b/notes/auth.md
index c82b204e..d5e4dbd4 100644
--- a/notes/auth.md
+++ b/notes/auth.md
@@ -96,3 +96,88 @@ Caveats:
 - can (and should?) add as many caveat checkers/constrants in code as possible
 
 http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
+
+-------
+
+## Schema/API Notes
+
+GET /auth/oidc
+=> params: provider, sub, iss
+=> returns {editor, token} or not found
+=> admin auth required
+
+POST /auth/oidc
+=> params: editor_id, provider, sub, iss
+=> returns {editor, token}
+=> admin auth required
+
+POST /editor
+=> admin auth required
+
+flow is to have single login/signup OIDC flow. If need to create an account,
+bounce to special page for that and store ISS/SUB in (signed/secure) session
+temporarily.
+
+This doesn't feel great. Could instead randomly generate a username, and
+provide mechanism to update. That's better!
+
+PUT /editor/{editor_id}
+=> only allow username updates, and only by admin or logged-in user
+
+schema:
+`auth_oidc`
+  => id (BIGINT), editor_id, provider, oidc_iss, oidc_sub
+  => created (auto-timestamp)
+  => UNIQ index on (editor_id, provider)
+  => UNIQ index on (provider, remote_sub, remote_iss)
+  => all are NOT NULL
+
+## Webface Notes
+
+Want to use "OpenID Connect" (OIDC), which is basically a subset/convention of
+OAuth 2.0 for authenticaiton ("log in as"), without granting API priviliges.
+
+Want to support multiple identity providers, eg:
+- orcid.org
+  => Basic OpenID Provider; implicit token
+- git.archive.org
+- gitlab.org
+  => https://docs.gitlab.com/ee/integration/openid_connect_provider.html
+- google.com
+
+Currently, looks like github.com doesn't support OIDC; they are the only
+provider i'm interested in that does not.
+
+authlib/loginpass are tempting to use as they support a bunch of providers
+out-of-the-box... but not orcid.
+
+Alternatively, could use any number of "proxies"/thingies to aggregate auth:
+- https://www.keycloak.org/about.html
+- https://portier.github.io/
+- https://github.com/dexidp/dex
+
+Possible flask integrations:
+=> https://flask-oidc.readthedocs.io/en/latest/
+=> https://github.com/zamzterz/Flask-pyoidc
+
+Background:
+=> https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect
+=> https://latacora.micro.blog/2018/06/12/a-childs-garden.html
+
+Future work:
+=> multiple logins, and/or merging accounts
+
+
+"Fatcat is an open, editable database of bibliographic metadata. You can
+sign-up and login using orcid.org; this option is used for identity and
+authentication only. Fatcat does not currently make changes to any data on
+orcid.org, which you can verify from the permissions requested."
+
+    https://fatcat.wiki/auth/oidc_redirect
+    https://qa.fatcat.wiki/auth/oidc_redirect
+
+PLAN:
+- have a mode/mechanism for login-by-token; mostly for testing
+- for now, use loginpass OAuth/OIDC for login/signup. upstream ORCID support or
+  hack that in somehow when desired
+- auto-create a username based on oauth, then allow changes
-- 
cgit v1.2.3


From d7b0a156d2a3a21e2bf5afc3e4b97e7cf1044248 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Fri, 4 Jan 2019 17:38:09 -0800
Subject: gigantic auth docs

---
 notes/auth.md             | 355 ++++++++++++++++++++++++++--------------------
 notes/oauth_statements.md |  14 ++
 2 files changed, 215 insertions(+), 154 deletions(-)
 create mode 100644 notes/oauth_statements.md

(limited to 'notes')

diff --git a/notes/auth.md b/notes/auth.md
index d5e4dbd4..1918dc82 100644
--- a/notes/auth.md
+++ b/notes/auth.md
@@ -1,183 +1,230 @@
 
-For users: use openid connect (oauth2) to sign up and login to web app. From
-web app, can create (and disable?) API tokens
-
-For impl: fatcat-web has private key to create tokens. tokens used both in
-cookies and as API keys. tokens are macaroons (?). fatcatd only verifies
-tokens. optionally, some redis or other fast shared store to verify that tokens
-haven't been revoked.
-
-Could use portier with openid connect as an email-based option. Otherwise,
-orcid, github, google.
-
----------
-
-Use macaroons!
-
-editor/user table has a "auth_epoch" timestamp; only macaroons generated
-after this timestamp are valid. revocation is done by incrementing this
-timestamp ("touch").
-
-Rust CLI tool for managing users:
-- create editor
-
-Special users/editor that can create editor accounts via API; eg, one for
-fatcat-web.
-
-Associate one oauth2 id per domain per editor/user.
-
-Users come to fatcat-web and do oauth2 to login or create an account. All
-oauth2 internal to fatcat-web. If successful, fatcat-web does an
-(authenticated) lookup to API for that identifier. If found, requests a
-new macaroon to use as a cookie for auth. All future requests pass this
-cookie through as bearer auth. fatcat-web remains stateless! macaroon
-contains username (for display); no lookup-per page. Need to logout/login for
-this to update?
-
-Later, can do a "add additional account" feature.
-
-Backend:
-- oauth2 account table, foreign key to editor table
-  => this is the only private table
-- auth_epoch timestamp column on editor table
-- lock editor by setting auth_epoch to deep future
-
-Deploy process:
-- auto-create root (admin), import-bootstrap (admin,bot), and demo-user
-  editors, with fixed editor_id and "early" auth_epoch, as part of SQL. save
-  tokens in env files, on laptop and QA instance.
-- on live QA instance, revoke all keys when live (?)
-
-TODO: privacy policy
-
-fatcat API doesn't *require* auth, but if auth is provided, it will check
-macaroon, and validate against editor table's timestamp.
-
-support oauth2 against:
-- orcid
-- git.archive.org
-- github
-? google
-
-Macaroon details:
-- worth looking at "bakery" projects (python and golang) for example of how to
-  actually implement macaroon authentication/authorization
-- location is fatcat.wiki (and/or qa.fatcat.wiki, or test or localhost or test.fatcat.wiki?)
-- identifier is a UUID in upper-case string format
-- will need some on-disk key storage thing?
-  => how to generate new keys? which one should be used, most recent?
-  conception of revoking keys? simple JSON/TOML, or LMDB?
-- call them "authentication tokens"?
-- params/constraints
-  - editor_id: always, fcid format
-  - created: always, some date format (seconds/iso)
-  - expires: optional, same date format
-
-It's a huge simplification to have webface generate macaroons as well, using a
-root key. webface doesn't need multiple keys because it only creates, doesn't
-verify.
-
-Code structure:
-- auth service/struct is generated at startup; reads environment and on-disk keys
-- verify helper does the thing
-- some sort of auth/edit context
-
-Roles?
-- public: unauthenticated
-- editor: any authenticated, active account
-- bot
-- admin
+This file summarizes the current fatcat authentication schema, which is based
+on 3rd party OAuth2/OIDC sign-in and macaroon tokens.
+
+## Overview
+
+The informal high-level requirements for the auth system were:
+
+- public read-only (HTTP GET) API and website require no login or
+  authentication
+- all changes to the catalog happen through the API and are associated with an
+  abstract editor (the entity behind an editor could be human, a bots, an
+  organization, change over time, etc). basic editor metadata (eg, identifier)
+  is public for all time.
+- editors can signup (create account) and login using the web interface
+- bots and scripts access the API directly; their actions are associated with
+  an editor (which could be a bot account)
+- authentication can be managed via the web interface (eg, creating any tokens
+  or bot accounts)
+- there is a mechanism to revoke API access and lock editor accounts (eg, to
+  block spam); this mechanism doesn't need to be a web interface, but shouldn't
+  be raw SQL commands
+- store an absolute minimum of PII (personally identifiable intformation) that
+  can't be "mixed in" with public database dumps, or would make the database a
+  security target. eg, if possible don't store emails or passwords
+- the web interface should, as much as possible, not be "special". Eg, should
+  work through the API and not have secret keys, if possible
+- be as simple an efficient as possible (eg, minimize per-request database
+  hits)
+
+The initial design that came out of these requirements is to use bearer tokens
+(in the form of macaroons) for all API authentication needs, and to have editor
+account creation and authentication offloaded to third parties via OAuth2
+(specifically OpenID Connect (OIDC) when available). By storing only OIDC
+identifiers in a single database table (linked but separate from the editor
+table), PII collection is minimized, and no code needs to be written to handle
+password recovery, email verification, etc. Tokens can be embedded in web
+interface session cookies and "passed through" in API calls that require
+authentication, so the web interface is effectively stateless (in that it does
+not hold any session or user information internally).
+
+Macaroons, like JSON Web Tokens (JWT) contain signed (verifiable) constraints,
+called caveats. Unlike JWT, these caveats can easily be "further constrained"
+by any party. There is additional support for signed third party caveats, but
+we don't use that feature currently. Caveats can be used to set an expiry time
+for each token, which is appropriate for cookies (requiring a fresh login). We
+also use creation timestamps and per-editor "authentication epoches" (publicly
+stored in the editor table, non-sensitive) to revoke API tokens per-editor (or
+globally, if necessary). Basically, only macaroons that were "minted" after the
+current `auth_epoch` for the editor are considered valid. If a token is lost,
+the `auth_epoch` is reset to the current time (after the compromised token was
+minted, or any subsequent tokens possibly created by an attacker), all existing
+tokens are considered invalid, and the editor must log back in (and generate
+new API tokens for any bots/scripts). In the event of a serious security
+compromise (like the secret signing key being compromised, or a bug in macaroon
+generation is found), all `auth_epoch` timestamps are updated at once (and a
+new key is used).
+
+The account login/signup flow for new editors is to visit the web interface and
+select an OAuth provider (from a fixed list) where they have an account. After
+they approve Fatcat as an application on the third party site, they bounce back
+to the web interface. If they had signed up previously they are signed in,
+otherwise a new editor account is automatically created. A username is
+generated based on the OAuth remote account name, but the editor can change
+this immediately. The web interface allows (or will, when implemented) creation
+of bot accounts (linked to a "wrangler" editor account), generation of tokens,
+etc.
+
+In theory, the API tokens, as macaroons, can be "attenuated" by the user with
+additional caveats before being used. Eg, the expiry could be throttled down to
+a minute or two, or constrained to edits of a specific editgroup, or to a
+specific API endpoint. A use-case for this would be pasting a token in a
+single-page app or untrusted script with minimal delgated authority. Not all of
+these caveat checks have been implemented in the server yet though.
+
+As an "escape hatch", there is a rust command (`fatcat-auth`) for debugging,
+creating new keys and tokens, revoking tokens (via `auth_epoch`), etc. There is
+also a web interface mechanism to "login via existing token". These mechanisms
+aren't intended for general use, but are helpful when developing (when login
+via OAuth may not be configured or accessible) and for admins/operators.
+
+## Current Limitations
+
+No mechanism for linking (or unlinking) multiple remote OAuth accounts into a
+single editor account. The database schema supports this, there just aren't API
+endpoints or a web interface.
+
+There is no obvious place to store persistent non-public user information:
+things like preferences, or current editgroup being operated on via the web
+interface. This info can go in session cookies, but is lost when user logs
+out/in or uses another device.
+
+## API Tokens (Macaroons)
+
+Macaroons contain "caveats" which constrain their scope. In the context of
+fatcat, macaroons should always be constrained to a single editor account (by
+`editor_id`) and a valid creation timestamp; this enables revocation.
+
+In general, want to keep caveats, identifier, and other macaroon contents as
+short as possible, because they can bloat up the token size.
+
+Use identifiers (unique names for looking up signing keys) that contain the
+date and (short) domain, like `20190110-qa`.
 
 Caveats:
 - general model is that macaroon is omnipotent and passes all verification,
   unless caveats are added. eg, adding verification checks doesn't constrain
   auth, only the caveats constrain auth; verification check *allow* additional
   auth. each caveat only needs to be allowed by one verifiation.
 - can (and should?) add as many caveat checkers/constrants in code as possible
 
-http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html
-
--------
-
-## Schema/API Notes
+## Web Signup/Login
+
+OpenID Connect (OIDC) is basically a convention for servers and clients to use
+OAuth2 for the specific purpose of just logging in or linking accounts, a la
+"Sign In With ...". OAuth is often used to provider interoperability between
+service (eg, a client app can take actions as the user, when granted
+permissions, on the authenticating platform); OIDC doesn't grant any such
+permissions, just refreshing logins at most.
+
+The web interface (webface) does all the OAuth/OIDC trickery, and extracts a
+simple platform identifier and user identifier if authentication was
+successful. It sends this in a fatcat API request to the `/auth/oidc` endpoint,
+using admin authentication (the web interface stores an internal token "for
+itself" for this one purpose). The API will return both an Editor object and a
+token for that editor in the response. If the user had signed in previously
+using the same provider/service/user pair as before, the Editor object is the
+user's login. If the pair is new, a new account is created automatically and
+returned; the HTTP status code indicates which happened. The editor username is
+automatically generated from the remote username and platform (user can change
+it if they want).
+
+The returned token and editor metadata are stored in session cookies. The flask
+framework has a secure cookie implementation that prevents users from making up
+cookies, but this isn't the real security mechanism; the real mechanism is that
+they can't generate valid macaroons because they are signed. Cookie *theft* is
+an issue, so aggressive cookie protections should be activated in the Flask
+configuration.
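The caveat model in these notes (a bare macaroon is omnipotent, every added caveat narrows it, each caveat only needs one verifier to allow it, and `auth_epoch` revokes anything minted earlier) can be sketched with nothing but HMAC-SHA256 from the Python standard library. This is a minimal illustration under stated assumptions, not fatcat's Rust implementation and not the libmacaroons/`pymacaroons` wire format: it skips serialization and root-key derivation, and the caveat spellings (`editor_id = ...`, `created = ...`, `expires = ...`) and the `verify_token` helper are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional, Tuple


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


@dataclass(frozen=True)
class Macaroon:
    location: str          # eg "fatcat.wiki"
    identifier: str        # names the signing key, eg "20190110-qa"
    caveats: Tuple[str, ...]
    signature: bytes


def mint(root_key: bytes, location: str, identifier: str) -> Macaroon:
    """A fresh macaroon has no caveats, so it is "omnipotent"."""
    return Macaroon(location, identifier, (), _hmac(root_key, identifier))


def add_caveat(m: Macaroon, caveat: str) -> Macaroon:
    """Attenuate: chain one more HMAC over the caveat (no root key needed)."""
    return Macaroon(m.location, m.identifier, m.caveats + (caveat,),
                    _hmac(m.signature, caveat))


def verify(m: Macaroon, root_key: bytes,
           checkers: List[Callable[[str], bool]]) -> bool:
    """Recompute the HMAC chain; each caveat must be allowed by >= 1 checker."""
    sig = _hmac(root_key, m.identifier)
    for caveat in m.caveats:
        if not any(check(caveat) for check in checkers):
            return False
        sig = _hmac(sig, caveat)
    return hmac.compare_digest(sig, m.signature)


def _values(m: Macaroon, name: str) -> List[str]:
    prefix = name + " = "
    return [c[len(prefix):] for c in m.caveats if c.startswith(prefix)]


def verify_token(m: Macaroon, root_key: bytes,
                 auth_epochs: Dict[str, datetime],
                 now: datetime) -> Optional[str]:
    """Return the editor_id for a valid token, else None (hypothetical policy)."""
    editor_ids = _values(m, "editor_id")
    # tokens must always carry exactly one editor_id and a created time
    if len(editor_ids) != 1 or not _values(m, "created"):
        return None
    editor_id = editor_ids[0]
    epoch = auth_epochs.get(editor_id)
    if epoch is None:
        return None

    def check_editor(c: str) -> bool:
        return c == "editor_id = " + editor_id

    def check_created(c: str) -> bool:
        # only tokens minted after the editor's auth_epoch are valid
        return (c.startswith("created = ")
                and datetime.fromisoformat(c[len("created = "):]) > epoch)

    def check_expires(c: str) -> bool:
        return (c.startswith("expires = ")
                and now < datetime.fromisoformat(c[len("expires = "):]))

    ok = verify(m, root_key, [check_editor, check_created, check_expires])
    return editor_id if ok else None


if __name__ == "__main__":
    UTC = timezone.utc
    KEY = b"demo-root-key-not-a-real-secret"
    EDITOR = "aaaaaaaaaaaaaeiraaaaaaaaai"   # hypothetical editor id
    epochs = {EDITOR: datetime(2019, 1, 1, tzinfo=UTC)}
    now = datetime(2019, 1, 6, tzinfo=UTC)

    tok = mint(KEY, "fatcat.wiki", "20190110-qa")
    tok = add_caveat(tok, "editor_id = " + EDITOR)
    tok = add_caveat(tok, "created = 2019-01-05T00:00:00+00:00")
    print(verify_token(tok, KEY, epochs, now))    # aaaaaaaaaaaaaeiraaaaaaaaai

    # any holder can narrow the token further, eg a one-hour expiry
    short = add_caveat(tok, "expires = 2019-01-05T01:00:00+00:00")
    print(verify_token(short, KEY, epochs, now))  # None (expired)

    # "touch" auth_epoch: every token minted before now is revoked
    epochs[EDITOR] = now
    print(verify_token(tok, KEY, epochs, now))    # None
```

Because each caveat is folded into the HMAC chain, a holder can append caveats without the root key but cannot remove or edit one without invalidating the signature; revocation needs no token list, only the per-editor timestamp.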
+
+The `auth_oidc` enforces uniqueness on accounts in a few ways:
+
+- lowercase UNIQ constaint on usernames (can't register upper- and lower-case
+  variants)
+- UNIQ {`editor_id`, `platform`}: can't login using multiple remote accounts
+  from the same platform
+- UNIQ {`platform`, `remote_host`, `remote_id`}: can't login to multiple local
+  accounts using the same remote account
+- all fields are NOT NULL
+
+## Role-Based Authentication (RBAC)
+
+Current acknowledge roles:
+
+- public (not authenticated)
+- bot
+- human
+- editor (bot or human)
+- admin
+- superuser
 
-GET /auth/oidc
-=> params: provider, sub, iss
-=> returns {editor, token} or not found
-=> admin auth required
+Will probably rename these. Additionally, editor accounts have an `is_active`
+flag (used to lock disabled/deleted/abusive/compromised accounts); no roles
+beyond public are given for inactive accounts.
 
-POST /auth/oidc
-=> params: editor_id, provider, sub, iss
-=> returns {editor, token}
-=> admin auth required
+## Developer Affordances
 
-POST /editor
-=> admin auth required
+A few core accounts are created automatically, with fixed `username`,
+`auth_epoch` and `editor_id`, to make testing and administration easier across
+database resets (aka, tokens keep working as long as the signing key stays the
+same).
 
-flow is to have single login/signup OIDC flow. If need to create an account,
-bounce to special page for that and store ISS/SUB in (signed/secure) session
-temporarily.
+Tokens and other secrets can be store in environment variables, scripts, or
+`.env` files.
 
-This doesn't feel great. Could instead randomly generate a username, and
-provide mechanism to update. That's better!
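The uniqueness rules listed above for `auth_oidc` can be exercised against a throwaway in-memory database. This sketch uses SQLite through Python's `sqlite3` module as a stand-in for the real Postgres schema; the `auth_oidc` column names follow the bullet list (`editor_id`, `platform`, `remote_host`, `remote_id`), while the `editor` table layout and the `try_insert` helper are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE editor (
    editor_id TEXT PRIMARY KEY,
    username  TEXT NOT NULL
);
-- "lowercase UNIQ": 'Alice' and 'alice' collide
CREATE UNIQUE INDEX editor_username_lower ON editor (lower(username));

CREATE TABLE auth_oidc (
    id          INTEGER PRIMARY KEY,
    -- note: SQLite only enforces REFERENCES with PRAGMA foreign_keys = ON
    editor_id   TEXT NOT NULL REFERENCES editor (editor_id),
    platform    TEXT NOT NULL,
    remote_host TEXT NOT NULL,
    remote_id   TEXT NOT NULL,
    UNIQUE (editor_id, platform),               -- one login per platform
    UNIQUE (platform, remote_host, remote_id)   -- one editor per remote account
);
"""


def connect() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def try_insert(db: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = connect()
    add_editor = "INSERT INTO editor (editor_id, username) VALUES (?, ?)"
    add_login = ("INSERT INTO auth_oidc (editor_id, platform, remote_host, remote_id) "
                 "VALUES (?, ?, ?, ?)")
    print(try_insert(db, add_editor, ("ed1", "Alice")))   # True
    print(try_insert(db, add_editor, ("ed2", "alice")))   # False: case variant
    print(try_insert(db, add_editor, ("ed2", "bob")))     # True
    print(try_insert(db, add_login, ("ed1", "github", "https://github.com", "123")))  # True
    print(try_insert(db, add_login, ("ed1", "github", "https://github.com", "456")))  # False
    print(try_insert(db, add_login, ("ed2", "github", "https://github.com", "123")))  # False
    print(try_insert(db, add_login, ("ed2", "orcid", "https://orcid.org", None)))     # False
```

The last three rejections correspond, in order, to the `{editor_id, platform}` rule, the `{platform, remote_host, remote_id}` rule, and the NOT NULL rule.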
+## Future Work and Alternatives -PUT /editor/{editor_id} -=> only allow username updates, and only by admin or logged-in user +Want to support more OAuth/OIDC endpoints: -schema: -`auth_oidc` - => id (BIGINT), editor_id, provider, oidc_iss, oidc_sub - => created (auto-timestamp) - => UNIQ index on (editor_id, provider) - => UNIQ index on (provider, remote_sub, remote_iss) - => all are NOT NULL +- archive.org: bespoke "XAuth" thing; would be reasonable to hack in support. + use user itemname as persistent 'sub' field +- orcid.org: supports OIDC +- wikipedia/wikimedia: OAuth; https://github.com/valhallasw/flask-mwoauth +- additional -## Webface Notes +Additional macaroon caveats: -Want to use "OpenID Connect" (OIDC), which is basically a subset/convention of -OAuth 2.0 for authenticaiton ("log in as"), without granting API priviliges. +- `endpoint` (API method; caveat can include a list) +- `editgroup` +- (etc) -Want to support multiple identity providers, eg: -- orcid.org - => Basic OpenID Provider; implicit token -- git.archive.org -- gitlab.org - => https://docs.gitlab.com/ee/integration/openid_connect_provider.html -- google.com +Looked at a few other options for managing use accounts: -Currently, looks like github.com doesn't support OIDC; they are the only -provider i'm interested in that does not. +- portier, the successor to persona, which basically uses email for magic-link + login, unless the email provider supports OIDC or similar. There is a central + hosted version to use for bootstrap. Appealing/minimal, but feels somewhat + neglected. +- use something like 'dex' as a proxy to multiple OIDC (and other) providers +- deploy a huge all-in-one platform like keycloak for all auth anything ever. + sort of wish Internet Archive, or somebody (Wikimedia?) ran one of these as + public infrastructure. +- having webface generate macaroons itself -authlib/loginpass are tempting to use as they support a bunch of providers -out-of-the-box... but not orcid. 
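The sketch below makes the signing argument concrete. It implements the macaroon HMAC chain from scratch, with first-party caveats in the style these notes describe: `editor_id`, `created`, `expires`, and the proposed `endpoint` list. It also shows `auth_epoch` revocation via the `created` caveat. It is an illustration under assumptions, not wire-compatible with libmacaroons or pymacaroons; for one thing, those libraries first derive the signing key from the root key. The `key = value` caveat syntax and all function names are hypothetical.

```python
import hashlib
import hmac
import uuid


def _chain(key: bytes, msg: str) -> bytes:
    """One link of the signature chain: HMAC-SHA256 keyed by the previous link."""
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, caveats: list) -> dict:
    """Create a token whose signature chains an HMAC over each caveat in order."""
    identifier = str(uuid.uuid4()).upper()
    sig = _chain(root_key, identifier)
    for caveat in caveats:
        sig = _chain(sig, caveat)
    return {"location": "fatcat.wiki", "identifier": identifier,
            "caveats": list(caveats), "signature": sig.hex()}


def add_caveat(token: dict, caveat: str) -> dict:
    """Any holder can narrow a token further; no root key is needed."""
    sig = _chain(bytes.fromhex(token["signature"]), caveat)
    return {**token, "caveats": token["caveats"] + [caveat], "signature": sig.hex()}


def make_checker(editor_id: str, auth_epoch: int, endpoint: str, now: int):
    """Checker for one request; `auth_epoch` is looked up for the claimed editor."""
    def check(caveat: str) -> bool:
        key, _, value = caveat.partition(" = ")
        if key == "editor_id":
            return value == editor_id
        if key == "created":
            return int(value) > auth_epoch  # only tokens minted after the epoch
        if key == "expires":
            return now < int(value)
        if key == "endpoint":
            return endpoint in value.split(",")
        return False  # unknown caveats fail closed
    return check


def verify(root_key: bytes, token: dict, checkers) -> bool:
    """Each caveat must satisfy some checker, and the signature chain must match."""
    sig = _chain(root_key, token["identifier"])
    for caveat in token["caveats"]:
        if not any(check(caveat) for check in checkers):
            return False
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig.hex(), token["signature"])


root = b"example-root-key"  # stand-in for the on-disk signing key
token = mint(root, ["editor_id = aaaa", "created = 1000", "expires = 5000"])
check = make_checker("aaaa", auth_epoch=500, endpoint="create_release", now=2000)
print(verify(root, token, [check]))  # True
```

Each signature is an HMAC keyed by the previous one. So anyone can append a caveat to narrow a token, but nobody can remove a caveat or widen a token without the root key. Bumping `auth_epoch` past a token's `created` value revokes it without any token list.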
+## Implementation Notes -Alternatively, could use any number of "proxies"/thingies to aggregate auth: -- https://www.keycloak.org/about.html -- https://portier.github.io/ -- https://github.com/dexidp/dex +To start, using the `loginpass` python library to handle logins, which is built +on `authlib`. May need to extend or just use `authlib` directly in the future. +Supports many large commercial providers, including gitlab.com, github.com, and +google. -Possible flask integrations: -=> https://flask-oidc.readthedocs.io/en/latest/ -=> https://github.com/zamzterz/Flask-pyoidc +There are many other flask/oauth/OIDC libraries out there, but this one worked +well with multiple popular providers, mostly by being flexible about actual +OIDC support. For example, Github doesn't support OIDC (only OAuth2), and +apparently Gitlab's is incomplete/broken. -Background: -=> https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect -=> https://latacora.micro.blog/2018/06/12/a-childs-garden.html +### Background Reading -Future work: -=> multiple logins, and/or merging accounts +Other flask OIDC integrations: +- https://flask-oidc.readthedocs.io/en/latest/ +- https://github.com/zamzterz/Flask-pyoidc -"Fatcat is an open, editable database of bibliographic metadata. You can -sign-up and login using orcid.org; this option is used for identity and -authentication only. Fatcat does not currently make changes to any data on -orcid.org, which you can verify from the permissions requested." 
+Background reading on macaroons: - https://fatcat.wiki/auth/oidc_redirect - https://qa.fatcat.wiki/auth/oidc_redirect +- https://github.com/rescrv/libmacaroons +- http://evancordell.com/2015/09/27/macaroons-101-contextual-confinement.html +- https://blog.runscope.com/posts/understanding-oauth-2-and-openid-connect +- https://latacora.micro.blog/2018/06/12/a-childs-garden.html +- https://github.com/go-macaroon-bakery/macaroon-bakery (for the "bakery" API pattern) -PLAN: -- have a mode/mechanism for login-by-token; mostly for testing -- for now, use loginpass OAuth/OIDC for login/signup. upstream ORCID support or - hack that in somehow when desired -- auto-create a username based on oauth, then allow changes diff --git a/notes/oauth_statements.md b/notes/oauth_statements.md new file mode 100644 index 00000000..5f46c9ed --- /dev/null +++ b/notes/oauth_statements.md @@ -0,0 +1,14 @@ + +Copy text used when signing up for OAuth applications on various platforms. + +## Wikimedia + +Fatcat (https://fatcat.wiki) is a publicly-editable bibliographic catalog, containing metadata about tens of millions of research articles, conference proceedings, and books. The particular emphasis is on linking different "releases" of the same "work" (eg, preprint and final copies of a journal paper), and matching specific (archived) files to releases. +Fatcat is a project of the Internet Archive (https://archive.org). + +## ORCID + +Fatcat is an open, editable database of bibliographic metadata. You can sign-up +and login using orcid.org; this option is used for identity and authentication +only. Fatcat does not currently make changes to any data on orcid.org, which +you can verify from the permissions requested. 
-- cgit v1.2.3 From 084e476957ce80b456dcf0575de4efc7331d34f9 Mon Sep 17 00:00:00 2001 From: Bryan Newbold Date: Fri, 4 Jan 2019 17:41:27 -0800 Subject: clean up notes a tiny bit --- notes/UNSORTED.txt | 40 ++++++++++++++++++++++++++++++++++ notes/bot_tools.txt | 17 --------------- notes/domains.txt | 5 ----- notes/golang.txt | 45 --------------------------------------- notes/ideas/bot_tools.txt | 17 +++++++++++++++ notes/ideas/domains.txt | 5 +++++ notes/ideas/more_api_patterns.txt | 15 +++++++++++++ notes/ideas/thoughts.txt | 32 ++++++++++++++++++++++++++++ notes/more_api_patterns.txt | 15 ------------- notes/thoughts.txt | 32 ---------------------------- 10 files changed, 109 insertions(+), 114 deletions(-) create mode 100644 notes/UNSORTED.txt delete mode 100644 notes/bot_tools.txt delete mode 100644 notes/domains.txt delete mode 100644 notes/golang.txt create mode 100644 notes/ideas/bot_tools.txt create mode 100644 notes/ideas/domains.txt create mode 100644 notes/ideas/more_api_patterns.txt create mode 100644 notes/ideas/thoughts.txt delete mode 100644 notes/more_api_patterns.txt delete mode 100644 notes/thoughts.txt (limited to 'notes') diff --git a/notes/UNSORTED.txt b/notes/UNSORTED.txt new file mode 100644 index 00000000..3960f5eb --- /dev/null +++ b/notes/UNSORTED.txt @@ -0,0 +1,40 @@ + +Not allowed to PUT edits to the same entity in the same editgroup. If you want +to update an edit, need to delete the old one first. + +The state depends only on the current entity state, not any redirect. This +means that if the target of a redirect is deleted, the redirecting entity is +still "redirect", not "deleted". + +Redirects-to-redirects are not allowed; this is enforced when the editgroup is +accepted, to prevent race conditions. + +Redirects to "work-in-progress" (WIP) rows are disallowed at update time (and +not re-checked at accept time). + +"ident table" parameters are ignored for entity updates. This is so clients can +simply re-use object instantiations.
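The redirect rules above can be sketched as a state function plus two checks that run at different times. The `Ident` row shape and the helper names are hypothetical. The state names `wip`, `active`, `redirect` and `deleted` are assumed, and the logic only illustrates these notes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Ident:
    # Hypothetical row shape for illustration only.
    is_live: bool                      # False while the row is work-in-progress
    rev_id: Optional[str] = None       # None (and no redirect) means deleted
    redirect_id: Optional[str] = None  # ident this one redirects to, if any


def state(ident: Ident) -> str:
    """State comes only from this row, never from a redirect target."""
    if not ident.is_live:
        return "wip"
    if ident.redirect_id is not None:
        return "redirect"
    return "deleted" if ident.rev_id is None else "active"


def check_redirect_update(idents: Dict[str, Ident], target: str) -> None:
    """Update time: refuse to redirect to a WIP row (not re-checked at accept)."""
    if not idents[target].is_live:
        raise ValueError(f"redirect target {target} is work-in-progress")


def check_accept(idents: Dict[str, Ident], edited: Iterable[str]) -> None:
    """Accept time: no redirect in the editgroup may point at another redirect."""
    for ident_id in edited:
        target = idents[ident_id].redirect_id
        if target is not None and idents[target].redirect_id is not None:
            raise ValueError(f"{ident_id} -> {target} is a redirect-to-redirect")


rows = {"a": Ident(is_live=True), "b": Ident(is_live=True, redirect_id="a")}
print(state(rows["a"]), state(rows["b"]))  # deleted redirect
```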
+ +The "state" parameter of an entity body is used as a flag when deciding whether +to do non-normal updates (eg, redirect or undelete, as opposed to inserting a +new revision). + +In the API, if you, eg, expand=files on a redirected release, you will get +files that point to the *target* release entity. If you use the /files endpoint +(instead of expand), you will get the files pointing to the redirected entity +(which probably need updating!). Also, if you expand=files on the target +entity, you *won't* get the files pointing to the redirected release. A +high-level merge process might make these changes at the same time? Or at least +tag at edit review time. A sweeper task can look for and auto-correct such +redirects after some delay period. + +=> it would not be too hard to update get_release_files to check for such +  redirects; could be handled by request flag? + +`prev_rev` is naively set to the most-recent previous state. If the current +state was deleted or a redirect, it is set to null. + +This parameter is not checked/enforced at edit accept time (but could be, and +maybe introduce `prev_redirect`, for race detection). Or, could have ident +point to most-recent edit, and have edits point to prev, for firmer control. + diff --git a/notes/bot_tools.txt b/notes/bot_tools.txt deleted file mode 100644 index cf465bde..00000000 --- a/notes/bot_tools.txt +++ /dev/null @@ -1,17 +0,0 @@ - -Could be helpful for writing bots for import: - -metafacture: large/popular java framework for pipelines and munging library -metadata. - - https://github.com/metafacture/metafacture-core/wiki - -catmandu: large/popular set of perl libraries for munging bibliographic -metadata, including a DSL ("Fix"). Can also push/pull to backends. - -miku/siskin: luigi and higher-level tool for running regular tasks. - - https://github.com/miku/span - -miku/span: golang lower-level tools for parsing and normalizing specific -formats (including KBART, DOAJ).
diff --git a/notes/domains.txt b/notes/domains.txt deleted file mode 100644 index 8556494e..00000000 --- a/notes/domains.txt +++ /dev/null @@ -1,5 +0,0 @@ - -Many obvious domains and hacks are taken. Would love to get fatcat.org; for now -registered fatcat.wiki. - -fatca.tt is available. diff --git a/notes/golang.txt b/notes/golang.txt deleted file mode 100644 index 404741e8..00000000 --- a/notes/golang.txt +++ /dev/null @@ -1,45 +0,0 @@ - -## Database Schema / ORM / Generation - -start simple, with pg (or sqlx if we wanted to be DB-agnostic): -- pq: basic postgres driver and ORM (similar to sqlalchemy?) -- sqlx: small extensions to builtin sql; row to struct mapping - -debug postgres with gocmdpev - -later, if code is too duplicated, look in to sqlboiler (first) or xo (second): -- https://github.com/xo/xo -- https://github.com/volatiletech/sqlboiler - -later, to do migrations, use goose, or consider alembic (python) for -auto-generation -- https://github.com/steinbacher/goose -- possibly auto-generate with python alembic - -for identifiers, consider either built-in postgres UUID, or: -- https://github.com/rs/xid -- https://github.com/oklog/ulid - like a UUID, but base32 and "sortable" (timestamp + random) - -## API In General - -Hope to use Kong for authentication. - -start with oauth2... orcid? - -## OpenAPI/Swagger - -go-swagger (OpenAPI 2.0): -- generate initial API server skeleton from a yaml definition -- export updated yaml from code after changes -- web UI for documentation -- templating/references -- auto-generate client (in golang) - -also look at ReDoc as a UI; all in-brower generated from JSON (react) - -## Non-API stuff - -- logrus structured logging (or zap?) -- testify tests (and assert?) 
-- viper config diff --git a/notes/ideas/bot_tools.txt b/notes/ideas/bot_tools.txt new file mode 100644 index 00000000..cf465bde --- /dev/null +++ b/notes/ideas/bot_tools.txt @@ -0,0 +1,17 @@ + +Could be helpful for writing bots for import: + +metafacture: large/popular java framework for pipelines and munging library +metadata. + + https://github.com/metafacture/metafacture-core/wiki + +catmandu: large/popular set of perl libraries for munging bibliographic +metadata, including a DSL ("Fix"). Can also push/pull to backends. + +miku/siskin: luigi and higher-level tool for running regular tasks. + + https://github.com/miku/span + +miku/span: golang lower-level tools for parsing and normalizing specific +formats (including KBART, DOAJ). diff --git a/notes/ideas/domains.txt b/notes/ideas/domains.txt new file mode 100644 index 00000000..8556494e --- /dev/null +++ b/notes/ideas/domains.txt @@ -0,0 +1,5 @@ + +Many obvious domains and hacks are taken. Would love to get fatcat.org; for now +registered fatcat.wiki. + +fatca.tt is available. diff --git a/notes/ideas/more_api_patterns.txt b/notes/ideas/more_api_patterns.txt new file mode 100644 index 00000000..ca61ac81 --- /dev/null +++ b/notes/ideas/more_api_patterns.txt @@ -0,0 +1,15 @@ + +If returning a long list (eg, all releases for a container): + + "releases": { + "data": [ + , + , + ... + ], + "has_mode": true, + "total_count": 100, + "url": "/v0/container/asdf/releases" + } + +This pattern from the Stripe API. diff --git a/notes/ideas/thoughts.txt b/notes/ideas/thoughts.txt new file mode 100644 index 00000000..c01c0d37 --- /dev/null +++ b/notes/ideas/thoughts.txt @@ -0,0 +1,32 @@ + +Instead of having a separate id pointer table, could have an extra "mutable" +public ID column (unique, indexed) on entity rows. Backend would ensure the +right thing happens. Changelog tables (or special redirect/deletion tables) +would record changes and be "fallen through" to. 
+ +Instead of having merge redirects, could just point all identifiers to the same +revision (and update them all in the future). Don't need to recurse! Need to +keep this forever though, could scale badly if "aggregations" get merged. + +Redirections of redirections should probably simply be disallowed. + +"Deletion" is really just pointing to a special or null entity. + +Trade-off: easy querying for common case (wanting "active" rows) vs. robust +handling of redirects (likely to be pretty common). Also, having UUID handling +across more than one table. + +## Scaling database + +Two scaling issues: size of database due to edits (likely billions of rows) and +desire to do complex queries/reports ("analytics"). The later is probably not a +concern, and could be handled by dumping and working on a cluster (or secondary +views, etc). So just a distraction? Simpler to have all rolled up. + +Cockroach is postgres-like; might be able to use that for HA and scaling? +Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk +import performance (one-time?). + +Using elastic for most (eg, non-logged-in) views could keep things fast. + +Cockroach seems more resourced/polished than TiDB? diff --git a/notes/more_api_patterns.txt b/notes/more_api_patterns.txt deleted file mode 100644 index ca61ac81..00000000 --- a/notes/more_api_patterns.txt +++ /dev/null @@ -1,15 +0,0 @@ - -If returning a long list (eg, all releases for a container): - - "releases": { - "data": [ - , - , - ... - ], - "has_mode": true, - "total_count": 100, - "url": "/v0/container/asdf/releases" - } - -This pattern from the Stripe API. diff --git a/notes/thoughts.txt b/notes/thoughts.txt deleted file mode 100644 index c01c0d37..00000000 --- a/notes/thoughts.txt +++ /dev/null @@ -1,32 +0,0 @@ - -Instead of having a separate id pointer table, could have an extra "mutable" -public ID column (unique, indexed) on entity rows. Backend would ensure the -right thing happens. 
Changelog tables (or special redirect/deletion tables) -would record changes and be "fallen through" to. - -Instead of having merge redirects, could just point all identifiers to the same -revision (and update them all in the future). Don't need to recurse! Need to -keep this forever though, could scale badly if "aggregations" get merged. - -Redirections of redirections should probably simply be disallowed. - -"Deletion" is really just pointing to a special or null entity. - -Trade-off: easy querying for common case (wanting "active" rows) vs. robust -handling of redirects (likely to be pretty common). Also, having UUID handling -across more than one table. - -## Scaling database - -Two scaling issues: size of database due to edits (likely billions of rows) and -desire to do complex queries/reports ("analytics"). The later is probably not a -concern, and could be handled by dumping and working on a cluster (or secondary -views, etc). So just a distraction? Simpler to have all rolled up. - -Cockroach is postgres-like; might be able to use that for HA and scaling? -Bottlenecks are probably complex joins (mitigated by "interleave"?) and bulk -import performance (one-time?). - -Using elastic for most (eg, non-logged-in) views could keep things fast. - -Cockroach seems more resourced/polished than TiDB? -- cgit v1.2.3 From 5e138c0cf74c68cbf0892437d9081f4132236ef4 Mon Sep 17 00:00:00 2001 From: Bryan Newbold Date: Mon, 7 Jan 2019 17:44:36 -0800 Subject: more auth notes --- notes/auth.md | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'notes') diff --git a/notes/auth.md b/notes/auth.md index 1918dc82..b73ce343 100644 --- a/notes/auth.md +++ b/notes/auth.md @@ -201,6 +201,16 @@ Looked at a few other options for managing use accounts: public infrastructure. - having webface generate macaroons itself +Will probably eventually need to support multiple logins per editor account. 
+Shouldn't be too hard, but will require additional API endpoints (POST with +`editor_id` included, DELETE to remove, etc). + +On mobile, folks might not be signed in to as many accounts, or it might be +annoying to enter long/secure passwords (eg, to log in to GitHub). Could get +around this with "login via token via QR code" with long/unlimited expiry. +Might make more sense to support Google OIDC as my guess is that many (most?) +people have a Google account logged in on their phone. + ## Implementation Notes To start, using the `loginpass` python library to handle logins, which is built -- cgit v1.2.3 From 01facf0167b4d1033c6af20ba98874757dbc46e5 Mon Sep 17 00:00:00 2001 From: Bryan Newbold Date: Mon, 7 Jan 2019 17:49:02 -0800 Subject: basic IA XAuth notes --- notes/auth.md | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) (limited to 'notes') diff --git a/notes/auth.md b/notes/auth.md index b73ce343..ea249cf7 100644 --- a/notes/auth.md +++ b/notes/auth.md @@ -148,6 +148,20 @@ The `auth_oidc` enforces uniqueness on accounts in a few ways: accounts using the same remote account - all fields are NOT NULL +### archive.org "XAuth" Login + +The Internet Archive has its own bespoke internal API for authentication +between services. Internal (non-public) documentation link: + +  https://git.archive.org/ia/petabox/blob/master/www/sf/services/xauthn/README.md + +Fatcat implements "passthrough" authentication to this endpoint by accepting +email/password (in plaintext! red lights and sirens!) and passing them through, +along with special staff-level authentication keys, to authenticate and +fetch user info. Fatcat then pretends this was a regular OAuth/OIDC +interaction, substituting the archive.org user "itemname" as a persistent +identifier, and the XAuth endpoint as the service key.
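The "pretend it was OIDC" step amounts to mapping the XAuth user info onto the same (platform, host, id) triple used for `auth_oidc` logins. This sketch assumes the response carries `itemname`, which these notes name, and a `screenname` field, which is hypothetical. The endpoint URL and the `archive` platform label are placeholders too. The email/password passthrough itself is deliberately not shown.

```python
# Hypothetical values: the real endpoint and response field names live in the
# internal XAuth README and are not reproduced in these notes.
XAUTH_ENDPOINT = "https://archive.org/services/xauthn/"


def xauth_identity(user_info: dict) -> dict:
    """Map an XAuth user-info response onto OIDC-style login fields.

    The archive.org "itemname" is the persistent identifier, and the XAuth
    endpoint plays the role of the issuer/service key.
    """
    itemname = user_info.get("itemname")
    if not itemname:
        raise ValueError("XAuth response has no itemname")
    return {
        "platform": "archive",          # hypothetical platform label
        "remote_host": XAUTH_ENDPOINT,
        "remote_id": itemname,
        # hypothetical field; fall back to the itemname for username generation
        "remote_username": user_info.get("screenname") or itemname,
    }


print(xauth_identity({"itemname": "@example_user"})["remote_id"])  # @example_user
```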
+ ## Role-Based Authentication (RBAC) Current acknowledge roles: @@ -177,11 +191,8 @@ Tokens and other secrets can be store in environment variables, scripts, or Want to support more OAuth/OIDC endpoints: -- archive.org: bespoke "XAuth" thing; would be reasonable to hack in support. - use user itemname as persistent 'sub' field - orcid.org: supports OIDC - wikipedia/wikimedia: OAuth; https://github.com/valhallasw/flask-mwoauth -- additional Additional macaroon caveats: -- cgit v1.2.3