## Next Up
PLAN:
x update openapi schema with all the below
x python tests (that fail)
x rust stubs (to compile)
- rust tests
    => get and delete edits
    => redirect an entity (via PUT)
    => un-delete an entity (via PUT)
- implement until rust tests pass
- implement until python tests pass

- nomination form/thing for long-tail fatcat targets
- check on LOCKSS
- ask adam for nagios/ansible setup
- arxiv.org mirror


DOCUMENT/TEST EDGE CASES:
- PUT updating entities in an editgroup: overwrite edit, or require previous
  edit to be deleted first?
- prev_revision flag: must always be set as a sanity check for edits? what
  about when previous state was deleted?
- when a redirect has the redirect target deleted, what happens? decision:
  "state" shouldn't change without an update to the entity, though revision_id
  and redirect_id *can* change
- "redirect to redirect" condition. decision: leaf should have redirect_id set
  to end of chain... but need to worry about the delete/undelete states in that
  case
- edit to redirect A to B is started. B is updated and/or redirected and/or
  deleted. then A edit is merged. ensure that editgroup accept process handles
  this correctly
- auto-accept behavior when there were other edits in same group already (or
  should we just disallow that?)
- "current editgroup" behavior, which we should probably just disallow
- use of "state" in entities as a flag for redirects and direct revision
  updates
- reverting to current version isn't allowed
- get of "wip" entities; make sure to check status? hrmpf.
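
A minimal sketch of the "redirect to redirect" decision above (leaf gets
redirect_id set to end of chain). This is not the actual implementation: the
`idents` mapping (ident -> redirect_id or None) and the function name are
hypothetical, and delete/undelete states are ignored here.

```python
from typing import Dict, Optional


def resolve_redirect(idents: Dict[str, Optional[str]], start: str) -> str:
    """Follow redirect_id pointers from `start` to the end of the chain.

    `idents` is a hypothetical in-memory stand-in for the ident table:
    it maps each ident to its redirect_id, or None if it is not a redirect.
    """
    seen = set()
    current = start
    while idents[current] is not None:
        if current in seen:
            # A loop would otherwise spin forever; treat it as an error.
            raise ValueError(f"redirect loop at {current}")
        seen.add(current)
        current = idents[current]
    return current


# A -> B -> C: the leaf A should end up pointing directly at C.
table = {"A": "B", "B": "C", "C": None}
table["A"] = resolve_redirect(table, "A")
```

Raising on a loop (rather than picking an arbitrary member) keeps the edge
case visible, which fits the "document/test" goal of this list.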

- consider dropping CORE identifier

NOTE: maybe in the future we can make it easier on ourselves by just saying
    that if an entity has redirects to it, it can't be deleted or redirected

x TODO: don't allow redirect to "wip" rows
    => needs test (python)
- TODO: fix returned error messages; should return type (shortname), and then
    actual message/description
- TODO: maybe better success return message?
- allow 'expand' in lookups (particularly for releases/files)
    => needs test (python or rust)
- idea: allow users to generate their own editgroup UUIDs, to reduce round
  trips and "hanging" editgroups (created but never edited)
- API: deletion of empty, un-accepted editgroups
- TODO: elastic inserter should handle deletions and redirects; if state isn't
  active, delete the document
    => and an end-to-end test of this behavior. hoo-boy.
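
Rough sketch of the elastic inserter rule above (if state isn't active, delete
the document). The action dict shape and the `elastic_action` name are
made up for illustration and are not an Elasticsearch client API; the entity
is assumed to be a dict with "ident" and "state" keys.

```python
def elastic_action(entity: dict) -> dict:
    """Decide what the inserter should do with one entity.

    Returns a hypothetical action dict: index the document when the entity
    is active, otherwise (deleted, redirect, wip, ...) delete it.
    """
    if entity.get("state") == "active":
        return {"op": "index", "id": entity["ident"], "doc": entity}
    return {"op": "delete", "id": entity["ident"]}
```

Keeping the decision in a pure function would make the unit test cheap, and
leave only the actual index/delete calls for the end-to-end test.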

- test/read: fetching deleted and redirected entities via API and web interface
- large refactor: make many endpoints entity-agnostic (passing entity-type as a param)

- redirecting and reverting endpoints
    => in PUT, path for handling redirect_ident or revision
    => this means can't have any required fields for any entities in API schema
- insertion of entity edit rows should be postgres upserts based on ident and editgroup
    => need UNIQ constraint?
- python test: re-deleting a deleted entity should be 4xx, not 5xx
- python test: can't delete an accepted edit
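
Sketch of the upsert idea above, using sqlite3 from the Python stdlib as a
stand-in for postgres (the `ON CONFLICT ... DO UPDATE` clause is also valid in
PostgreSQL 9.5+; needs SQLite 3.24+ to run here). The table and column names
are assumptions, not the real schema. This suggests the answer to "need UNIQ
constraint?" is yes: the conflict target must be backed by a unique index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified edit table; one edit per (ident, editgroup).
conn.execute(
    """
    CREATE TABLE entity_edit (
        id INTEGER PRIMARY KEY,
        ident_id TEXT NOT NULL,
        editgroup_id TEXT NOT NULL,
        rev_id TEXT,
        UNIQUE (ident_id, editgroup_id)
    )
    """
)


def upsert_edit(conn, ident_id, editgroup_id, rev_id):
    """Insert an edit, or overwrite the existing edit for this ident/editgroup."""
    conn.execute(
        """
        INSERT INTO entity_edit (ident_id, editgroup_id, rev_id)
        VALUES (?, ?, ?)
        ON CONFLICT (ident_id, editgroup_id)
        DO UPDATE SET rev_id = excluded.rev_id
        """,
        (ident_id, editgroup_id, rev_id),
    )


upsert_edit(conn, "ident1", "eg1", "rev1")
upsert_edit(conn, "ident1", "eg1", "rev2")  # clobbers the first edit
```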

x API endpoints (GET, DELETE) for entity edits
    => to allow removing individual edit from editgroup
x API endpoints (GET) for entity revisions
x API endpoints to find entities that redirect to an ident
- what to do with redirect-to-redirect, or deletion of redirect?
    => for redirect-to-redirect, point to new redirect
    => for deletion of redirect, keep redirect, but remove revision
x API endpoints additional lookup params

- enforce "no editing if editgroup accepted" behavior
- require and enforce "previous_rev" required in updates
- redirect rev_id needs to be updated when primary changes
- redirect/delete/update/lifecycle tests and completeness
- basic webface creation, editing, merging, editgroup approval

- refactor API schema for some entity-generic methods (eg, history, edit
  operations) to take entity type as a URL path param. greatly reduce macro
  foolery and method count/complexity, and ease creation of new entities
    => /{entity}/edit/{edit_id}
    => /{entity}/{ident}/redirects
    => /{entity}/{ident}/history

## Production blockers

- refactors and correctness in rust/TODO
- importers have editor accounts and include editgroup metadata
- enforce single-ident-edit-per-editgroup
    => entity_edit: entity_ident/entity_editgroup should be UNIQ index
    => UPDATE/REPLACE edits?
- crossref importer sets release_type as "stub" when appropriate
- re-implement old python tests
- real authentication and authorization
- metrics, jwt, config, sentry

## Metadata Import

- manifest: multiple URLs per SHA1
- crossref: relations ("is-preprint-of")
- crossref: two phase: no citations, then matched citations (via DOI table)
- container import (extra?): lang, region, subject
- crossref: filter works
    => content-type whitelist
    => title length and title/slug blacklist
    => at least one author (?)
    => make this a method on Release object
    => or just set release_stub as "stub"?
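
Possible shape for the crossref filter above, going with the "just mark it as
stub" option. The whitelist, blacklist, and minimum length values are
placeholders, not decided policy, and the function name is hypothetical. It
assumes the Crossref work is a dict with "type" (string), "title" (list of
strings), and "author" (list) keys.

```python
# Placeholder values for illustration only.
ALLOWED_TYPES = {"journal-article", "proceedings-article", "book-chapter"}
TITLE_BLACKLIST = {"front matter", "table of contents", "editorial board"}
MIN_TITLE_LEN = 5


def release_type_for(work: dict) -> str:
    """Return the work's type if it passes all filters, else "stub"."""
    titles = work.get("title") or []
    title = titles[0].strip() if titles else ""
    if work.get("type") not in ALLOWED_TYPES:
        return "stub"  # content-type whitelist
    if len(title) < MIN_TITLE_LEN or title.lower() in TITLE_BLACKLIST:
        return "stub"  # title length and title blacklist
    if not work.get("author"):
        return "stub"  # at least one author (?)
    return work["type"]
```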

new importers:
- pubmed (medline) (filtered)
    => and/or, use pubmed ID lookups on crossref import
- arxiv.org
- DOAJ
- CORE (filtered)
- semantic scholar (up to 39 million; includes author de-dupe)

## Entity/Edit Lifecycle

- redirects and merges (API, webface, etc)
- test: release pointing to a collection that has been deleted/redirected
  => UI crash?
- commenting and accepting editgroups
- editgroup state machine?
- enforce "single ident edit per editgroup"
    => how to "edit an edit"? clobber existing?

## Guide / Book / Style

- release_type, release_status, url.rel schemas (enforced in API)
- more+better terms+policies: https://tosdr.org/index.html

## Fun Features

- "save paper now"
    => is it in GWB? if not, SPN
    => get hash + url from GWB, verify mimetype acceptable
    => is file in fatcat?
    => what about HBase? GROBID?
    => create edit, redirect user to editgroup submit page
- python client tool and library in pypi
    => or maybe rust?
- bibtex (etc) export

## Schema / Entity Fields

- arxiv_id field (keep flip-flopping)
- original_title field (?)
- FileSet and WebSnapshot entities
- `doi` field for containers (at least for "journal" type; maybe for "series"
  as well?)
- `retracted`, `translation`, and perhaps `corrected` as flags on releases,
  instead of release_status?
- 'part-of' relation for releases (release to release) and possibly containers
- `container-type` field for containers (journal, conference, book series, etc)

## Other / Backburner

- look at: https://ftfy.readthedocs.io/en/latest/
- refactor openapi schema to use shared response types
- consider using "HTTP 202: Accepted" for entity-mutating calls
- basic python hbase/elastic matcher
  => takes sha1 keys
  => checks fatcat API + hbase
  => if not matched yet, tries elastic search
  => simple ~exact match heuristic
  => proof-of-concept, no tests
- add_header Strict-Transport-Security "max-age=3600";
    => 12 hours? 24?
- haproxy for rate-limiting
- feature flags: consul?
- secrets: vault?
- "authn" microservice: https://keratin.tech/

better API docs
- readme.io has a free open source plan (or at least used to)
- https://github.com/readmeio/api-explorer
- https://github.com/lord/slate
- https://sourcey.com/spectacle/
- https://github.com/DapperDox/dapperdox

CSL:
- https://citationstyles.org/
- https://github.com/citation-style-language/documentation/blob/master/primer.txt
- https://citeproc-js.readthedocs.io/en/latest/csl-json/markup.html
- https://github.com/citation-style-language/schema/blob/master/csl-types.rnc
- perhaps a "create from CSL" endpoint?