## In Progress

- check that any needed/new indices are in place
  => seems to at least superficially work
- benchmark citation efficiency (in QA)
- all query params need to be strings, and parse in rust :(
  since=(datetime.datetime.utcnow() + datetime.timedelta(seconds=1)).isoformat()+"Z"
  => see the client sketch at the end of this section
- doc: python client API needs to have booleans set as, eg, 'true'/'false' (str) (!?!?)
- "note that non-required or collection query parameters will ignore garbage values, rather than causing a 400 response"
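
A minimal sketch of the string-typed query parameter gotcha above, assuming the generated `fatcat_client` python bindings (the `Configuration`/`ApiClient`/`DefaultApi` pattern is the usual swagger-codegen output; the host URL and the commented-out method call are illustrative, not verified signatures):

```python
import datetime

import fatcat_client  # generated OpenAPI client; module name assumed

conf = fatcat_client.Configuration()
conf.host = "https://api.qa.fatcat.wiki/v0"  # QA endpoint; adjust as needed
api = fatcat_client.DefaultApi(fatcat_client.ApiClient(conf))

# Every query parameter goes over the wire as a string and gets parsed by the
# rust API server. Datetimes want an ISO 8601 / RFC 3339 string ending in "Z":
since = (datetime.datetime.utcnow() + datetime.timedelta(seconds=1)).isoformat() + "Z"

# Booleans likewise must be passed as the strings 'true'/'false', not python
# True/False. Method name and parameters below are illustrative only:
#   api.get_editgroups_reviewable(since=since, expand='false')
```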

## Next Up

- "don't clobber" mode/flag for crossref import (and others?)
- elastic inserter should handle deletions and redirects; if state isn't active, delete the document
  => don't delete, just store state. but need to "blank" redirects and WIP so they don't show up in results
  => refactor inserter to be a class (eg, for command line use)
  => end-to-end test of this behavior?
- webcapture timestamp schema cleanup (both CDX and base)
  => dt.to_rfc3339_opts(SecondsFormat::Secs, true)
  => but this is mostly buried in serialization code?
- fake DOI (use in examples): 10.5555/12345678
- URL location duplication (especially IA/wayback)
  => eg, https://fatcat.wiki/file/2g4sz57j3bgcfpwkgz5bome3re
  => UNIQ index on {release_rev, url}?
- shadow library manifest importer
- import from arabesque output (eg, specific crawls)
- elastic iteration
  => any_abstract broken?
  => blank author names? maybe in crossref import; fatcat-api and schema should both prevent
- handle very large author/reference lists (instead of dropping)
  => https://api.crossref.org/v1/works/http://dx.doi.org/10.1007/978-3-319-46095-6_7
  => 7000+ authors (!)
- guide updates for auth
- refactor webface views to use shared entity_view.html template
- handle 'wip' status entities in web UI

## Bugs (or at least need tests)

- autoaccept seems to have silently not actually merged editgroup

## Ideas

- more logins: orcid, wikimedia
- `fatcat-auth` tool should support more caveats, both when generating new or mutating existing tokens
- fast path to skip recursive redirect checks for bulk inserts
- when getting "wip" entities, require a parameter ("allow_wip"), else get a 404
- consider dropping CORE identifier
- maybe better 'success' return message? eg, "success: true" flag
- idea: allow users to generate their own editgroup UUIDs, to reduce round trips and "hanging" editgroups (created but never edited)
- API: allow deletion of empty, un-accepted editgroups
- refactor API schema for some entity-generic methods (eg, history, edit operations) to take entity type as a URL path param; this would greatly reduce macro foolery and method count/complexity, and ease creation of new entities
  => /{entity}/edit/{edit_id}
  => /{entity}/{ident}/redirects
  => /{entity}/{ident}/history
- investigate data quality by looking at, eg, most popular author strings, most popular titles, duplicated containers, etc

## Production blockers

- privacy policy, and link from: create account, create edit
- update /about page
- refactors and correctness in rust/TODO
- importers: don't insert wayback links with short timestamps

## Production Sanity

- fatcat-web is not Type=simple (systemd)
- postgresql replication
- pg_dump/load test
- haproxy somewhere/how
- logging iteration: larger journald buffers? point somewhere?

## Metadata Import

- web.archive.org response not SHA1 match?
  => need /id_/ thing
- XML etc in metadata
  => (python) tests for these!
  https://qa.fatcat.wiki/release/b3a2jvhvbvc6rlbdkpw4ukuzyi
  https://qa.fatcat.wiki/release/search?q=xmlns
  https://qa.fatcat.wiki/release/search?q=%26amp%3B
  https://qa.fatcat.wiki/release/search?q=%26gt%3B
- better/complete reltypes probably good (eg, list of IRs, academic domain)
- 'expand' in lookups (derp! for single hit lookups)
- include crossref-capitalized DOI in extra
- some "Elsevier " stuff as publisher
  => also title https://fatcat.wiki/release/uyjzaq3xjnd6tcrqy3vcucczsi
- crossref import: don't store citation unstructured if len() == 0: {"crossref": {"unstructured": ""}}
- cleaning/matching: https://ftfy.readthedocs.io/en/latest/
  => and try out beautifulsoup (https://stackoverflow.com/a/34532382/4682349)
- manifest: multiple URLs per SHA1
- crossref: relations ("is-preprint-of")
- crossref: two phase: no citations, then matched citations (via DOI table)
- container import (extra?): lang, region, subject
- crossref: filter works (sketch after this section)
  => content-type whitelist
  => title length and title/slug blacklist
  => at least one author (?)
  => make this a method on Release object
  => or just set release_type as "stub"?
- special "alias" DOIs... in crossref metadata?

new importers:
- pubmed (medline) (filtered)
  => and/or, use pubmed ID lookups on crossref import
- arxiv.org
- DOAJ
- CORE (filtered)
- semantic scholar (up to 39 million; includes author de-dupe)
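
A rough sketch of the crossref "filter works" idea above (content-type whitelist, title sanity check, at least one author, otherwise a "stub" release). The whitelist, blacklist, thresholds, and function name here are placeholders, not the importer's actual behavior:

```python
# Placeholder values; a real whitelist/blacklist would need curation.
CONTENT_TYPE_WHITELIST = ("journal-article", "book", "book-chapter",
                          "proceedings-article", "dissertation", "dataset")
TITLE_SLUG_BLACKLIST = ("abstracts", "front matter", "table of contents")

def classify_crossref_work(work: dict) -> str:
    """Return 'skip', 'stub', or 'ok' for a raw crossref work record."""
    if work.get("type") not in CONTENT_TYPE_WHITELIST:
        return "skip"
    # crossref 'title' is a list of strings (usually length one)
    title = (work.get("title") or [""])[0].strip()
    if len(title) < 2 or title.lower() in TITLE_SLUG_BLACKLIST:
        return "stub"  # or make this a method on the Release object
    if not work.get("author"):
        return "stub"  # open question whether author-less works are kept at all
    return "ok"
```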

## Entity/Edit Lifecycle

- commenting and accepting editgroups
- editgroup state machine?

## Guide / Book / Style

- release_type, release_status, url.rel schemas (enforced in API)
- more+better terms+policies: https://tosdr.org/index.html

## Fun Features

- "save paper now"
  => is it in GWB? if not, SPN
  => get hash + url from GWB, verify mimetype acceptable
  => is file in fatcat?
  => what about HBase? GROBID?
  => create edit, redirect user to editgroup submit page
- python client tool and library in pypi
  => or maybe rust?
- bibtex (etc) export

## Metadata Harvesting

- datacite ingest seems to have failed... got a non-HTTP-200 status code, but also "got 50 (161950 of 21084)"

## Schema / Entity Fields

- elastic transform should only include authors, not editors (?)
- arxiv_id field (keep flip-flopping)
- original_title field (internationalization, "original language")
- `doi` field for containers (at least for "journal" type; maybe for "series" as well?)
- `retracted`, `translation`, and perhaps `corrected` as flags on releases, instead of release_status?
- 'part-of' relation for releases (release to release) and possibly containers
- `container_type` field for containers (journal, conference, book series, etc)

## Other / Backburner

- document: elastic query date syntax is like: date:[2018-10-01 TO 2018-12-31]
- fileset/webcapture webface anything
- display abstracts better. no hashes or metadata; prefer plain or HTML, convert JATS if necessary
- switch from slog to simple pretty_env_logger
- format returned datetimes with only second precision, not millisecond (RFC mode)
  => buried in model serialization internals
- refactor openapi schema to use shared response types
- consider using "HTTP 202: Accepted" for entity-mutating calls
- basic python hbase/elastic matcher (rough sketch at the end of this file)
  => takes sha1 keys
  => checks fatcat API + hbase
  => if not matched yet, tries elastic search
  => simple ~exact match heuristic
  => proof-of-concept, no tests
- add_header Strict-Transport-Security "max-age=3600";
  => 12 hours? 24? (3600 seconds is only 1 hour)
- haproxy for rate-limiting

better API docs
- readme.io has a free open source plan (or at least used to)
- https://github.com/readmeio/api-explorer
- https://github.com/lord/slate
- https://sourcey.com/spectacle/
- https://github.com/DapperDox/dapperdox

CSL:
- https://citationstyles.org/
- https://github.com/citation-style-language/documentation/blob/master/primer.txt
- https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
- https://github.com/citation-style-language/schema/blob/master/csl-types.rnc
- perhaps a "create from CSL" endpoint?
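
A proof-of-concept outline of the hbase/elastic matcher item (see the forward reference in Other / Backburner). The endpoints, elasticsearch index, and field names are assumptions, the HBase lookup is stubbed out, and, as the item says, there are no tests:

```python
import sys

import requests

# Both endpoints and the index name are assumptions for this sketch.
FATCAT_API = "https://api.qa.fatcat.wiki/v0"
RELEASE_SEARCH = "https://search.qa.fatcat.wiki/fatcat_release/_search"

def match_sha1(sha1: str, title: str) -> None:
    """Crude matcher: skip files already in fatcat, otherwise try an
    ~exact title match against the release search index."""
    # 1. is the file already in fatcat? (file lookup by SHA-1)
    resp = requests.get(FATCAT_API + "/file/lookup", params={"sha1": sha1})
    if resp.status_code == 200:
        print("{}\talready-in-fatcat\t{}".format(sha1, resp.json().get("ident")))
        return
    # 2. not matched yet; in the real tool the title (and the fact that the
    #    file exists at all) would come from HBase/GROBID, not the caller
    query = {"query": {"match_phrase": {"title": title}}, "size": 1}
    hits = requests.get(RELEASE_SEARCH, json=query).json()["hits"]["hits"]
    if hits:
        print("{}\tcandidate-match\t{}".format(sha1, hits[0]["_source"].get("ident")))
    else:
        print("{}\tno-match".format(sha1))

if __name__ == "__main__":
    # usage: matcher.py <sha1> <title>
    match_sha1(sys.argv[1], sys.argv[2])
```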