author Bryan Newbold <bnewbold@robocracy.org> 2019-05-15 10:16:40 -0700
committer Bryan Newbold <bnewbold@robocracy.org> 2019-05-15 10:16:40 -0700
commit ffee70b116f2683ca24e8046144fa078f2964774 (patch)
tree 628539c703de198b556bf368b21d7f7b8f3fb574
parent 4f0be9ae6447073ffe252376d88228083f01f837 (diff)
TODO progress (v0.3)
-rw-r--r-- TODO.md 49
1 files changed, 34 insertions, 15 deletions
diff --git a/TODO.md b/TODO.md
index 9b5a432f..f8a0fa31 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1,10 +1,26 @@
## In Progress
-- update existing 1.5 mil longtail OA PDFs with container/ISSN-L
+x webcapture `size_bytes`/`size` (consistency with file and fileset)
+x final decision on `version` field
+ => useful for repositories with multiple versions as incrementing integers
+ => also useful for "unstructuring" some identifiers (arxiv, zenodo DOIs)
+ => but CSL wants to use it (only?) for software versions
+ => what about book editions, or draft revisions?
+ => let's keep, but carefully document scope
+x verifiers for all extid types (including new ark, mag)
+x creation of editgroup via auto_batch needs extra checks
+- test: edit_extra set for each entity type
+- merge new importers branch
+ => fix schema changes
+ => use new schema fields
+ => tests
+- update guide with new schema
+- elasticsearch schema changes (and transforms)
## Next Up
+- update existing 1.5 mil longtail OA PDFs with container/ISSN-L
## Bugs
@@ -17,24 +33,29 @@
Changes to SQL (and swagger):
-- structured names in contribs (given/sur)
-- `release_status` => `release_stage`
-- `withdrawn_date`, `withdrawn_state`, and retraction as a release stage
-- subtitle as a string field
+X missing SQL indices: `ENTITY_edit.editgroup_id, ENTITY_edit.ident_id`
+X structured names in contribs (given/sur)
+X `release_status` => `release_stage`
+X size on webcapture CDX lines (we fetch for sha256 anyways, so easy to calculate)
+X `ark_id` release identifier
+X `mag_id` (microsoft academic graph) release identifier
+
+X `withdrawn_date`, `withdrawn_state`, and retraction as a release stage
+ => and `withdrawn_year`?
+X subtitle as a string field
=> but what about translation? `original_subtitle`? just combine them?
=> combine in elasticsearch 'title' field
-- size on webcapture CDX lines (we fetch for sha256 anyways, so easy to calculate)
-- `ark_id` release identifier
-- `mag_id` (microsoft academic graph) release identifier
-- releases: 'number' (eg, report numbers) and 'version' (for numbered variants) fields
-- missing SQL indices: `ENTITY_edit.editgroup_id, ENTITY_edit.ident_id`
+X releases: 'number' (eg, report numbers) and 'version' (for numbered variants) fields
Changes to swagger only:
-- edit URLs: `editgroup_id` in URL, not a query param
+- refactor entity mutation (CUD) endpoints to be like `/editgroup/{editgroup_id}/release/{ident}`
+ => changes editgroup_id from query param to URL param
- changelog API endpoint needs expand=editors option
+ => editors in a bunch of other return types also?
- include 'created' in editgroup object (already in SQL)
-- FileEntityUrls => FileEntityUrl (and similar)
+x FileEntityUrls => FileEntityUrl (and similar)
+? refactor bulk POST to include editgroup plus array of entity objects (instead of just a couple fields as query params)
## Next Full Release "Touch"
@@ -195,9 +216,7 @@ new importers:
## API Schema / Design
-- refactor entity mutation (CUD) endpoints to be like `/editgroup/{editgroup_id}/release/{ident}`
- => changes editgroup_id from query param to URL param
-- refactor bulk POST to include editgroup plus array of entity objects (instead of just a couple fields as query params)
+- `release_month` field. for journals, having the year and month but not day is relatively common (citation needed)
## Web Interface