update TODO

author: Bryan Newbold <bnewbold@robocracy.org> 2019-05-20 20:36:04 -0700
committer: Bryan Newbold <bnewbold@robocracy.org> 2019-05-20 20:36:04 -0700
commit: 27f5bc0396ecc05ad539e62763cb3116e3d4cfa8 (patch)
tree: f47cf1e7a8e09490848667cec156021b0c04168c /TODO.md
parent: 038395d5975b8dd0c790bf35cb020d1cd4e6cb46 (diff)
download: fatcat-27f5bc0396ecc05ad539e62763cb3116e3d4cfa8.tar.gz
fatcat-27f5bc0396ecc05ad539e62763cb3116e3d4cfa8.zip
1 files changed, 7 insertions, 57 deletions
diff --git a/TODO.md b/TODO.md
index f8a0fa31..5e79e9d0 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1,23 +1,6 @@
 
 ## In Progress
 
-x webcapture `size_bytes`/`size (consistency with file and fileset)
-x final decision on `version` field
-    => useful for repositories with multiple versions as incrementing integers
-    => also useful for "unstructuring" some identifiers (arxiv, zenodo DOIs)
-    => but CSL wants to use it (only?) for software versions
-    => what about book editions, or draft revisions?
-    => let's keep, but carefully document scope
-x verifiers for all extid types (including new ark, mag)
-x creation of editgroup via auto_batch needs extra checks
-- test: edit_extra set for each entity type
-- merge new importers branch
-    => fix schema changes
-    => use new schema fields
-    => tests
-- update guide with new schema
-- elasticsearch schema changes (and transforms)
-
 ## Next Up
 
 - update existing 1.5 mil longtail OA PDFs with container/ISSN-L
@@ -29,34 +12,6 @@ x creation of editgroup via auto_batch needs extra checks
 - author (contrib) names not getting included in search (unless explicit)
 - fatcat flask lookup ValueError should return 4xx (and message?)
 
-## Next Schema Iteration (0.3.0)
-
-Changes to SQL (and swagger):
-
-X missing SQL indices: `ENTITY_edit.editgroup_id, ENTITY_edit.ident_id`
-X structured names in contribs (given/sur)
-X `release_status` => `release_stage`
-X size on webcapture CDX lines (we fetch for sha256 anyways, so easy to calculate)
-X `ark_id` release identifier
-X `mag_id` (microsoft academic graph) release identifier
-
-X `withdrawn_date`, `withdrawn_state`, and retraction as a release stage
-    => and `withdrawn_year`?
-X subtitle as a string field
-    => but what about translation? `original_subtitle`? just combine them?
-    => combine in elasticsearch 'title' field
-X releases: 'number' (eg, report numbers) and 'version' (for numbered variants) fields
-
-Changes to swagger only:
-
-- refactor entity mutation (CUD) endpoints to be like `/editgroup/{editgroup_id}/release/{ident}`
-    => changes editgroup_id from query param to URL param
-- changelog API endpoint should needs expand=editors option
-    => editors in a bunch of other return types also?
-- include 'created' in editgroup object (already in SQL)
-x FileEntityUrls => FileEntityUrl (and similar)
-? refactor bulk POST to include editgroup plus array of entity objects (instead of just a couple fields as query params)
-
 ## Next Full Release "Touch"
 
 Will update all release entities (or at least all Crossref-derived entities).
@@ -69,6 +24,7 @@ Want to minimize edit counts, so will bundle a bunch of changes
 
 ## Production Public Launch Blockers
 
+- update edit flow
 - view edit revisions in webface
 - audit fatcat metadata for CC-0
 - guide updates for auth
@@ -111,7 +67,7 @@ Want to minimize edit counts, so will bundle a bunch of changes
 - page-one.live.cf.public.springer.com seems to serve up bogus one-pagers; should exclude
 - QA sentry has very little host info; also not URL of request
 - elastic schemas:
-    release: drop revision?; container_id; creator_id
+    release: creator_id
         should `release_year` be of date type, instead of int?
     files: domain list; mimetype; release count; url count; web/publisher/etc;
         size; has_md5/sha256/sha1; in_ia, in_shadow
@@ -130,7 +86,7 @@ Want to minimize edit counts, so will bundle a bunch of changes
 
 - `poster` as a `release_type`
 - "revert editgroup" mechanism (creates new editgroup)
-- can guess some `release_status` of files by looking at wayback date vs.
+- can guess some `release_stage` of files by looking at wayback date vs.
   published date
 - ORCID apparently has 37 mil "work activities" (patents, etc), and only 14 mil
   unique DOIs; could import those other "work activities"? do they have
@@ -138,7 +94,7 @@ Want to minimize edit counts, so will bundle a bunch of changes
 - use https://github.com/codelucas/newspaper to extract fulltext+metadata from HTML crawls
 - `fatcat-auth` tool should support more caveats, both when generating new or mutating existing tokens
 - fast path to skip recursive redirect checks for bulk inserts
-- when getting "wip" entities, require a parameter ("allow_wip"), else get a 404
+- when API GET-ing "wip" entities, require a parameter ("allow_wip"), else get a 404
 - maybe better 'success' return message? eg, "success: true" flag
 - idea: allow users to generate their own editgroup UUIDs, to reduce a round
   trips and "hanging" editgroups (created but never edited)
@@ -169,7 +125,6 @@ Want to minimize edit counts, so will bundle a bunch of changes
     "Full title page with Editorial board (with Elsevier tree)"
     "Advisory Board Editorial Board"
 - better/complete reltypes probably good (eg, list of IRs, academic domain)
-- 'expand' in lookups (derp! for single hit lookups)
 - include crossref-capitalized DOI in extra
 - manifest: multiple URLs per SHA1
 - crossref: relations ("is-preprint-of")
@@ -177,16 +132,12 @@ Want to minimize edit counts, so will bundle a bunch of changes
 - special "alias" DOIs... in crossref metadata?
 
 new importers:
-- pubmed (medline) (filtered)
-    => and/or, use pubmed ID lookups on crossref import
-- arxiv.org
 - DOAJ
 - CORE (filtered)
 - semantic scholar (up to 39 million; includes author de-dupe)
 
 ## Guide / Book / Style
 
-- release_type, release_status, url.rel schemas (enforced in API)
 - more+better terms+policies: https://tosdr.org/index.html
 
 ## Fun Features
@@ -199,7 +150,6 @@ new importers:
     => create edit, redirect user to editgroup submit page
 - python client tool and library in pypi
     => or maybe rust?
-- bibtext (etc) export
 
 ## Metadata Harvesting
 
@@ -207,9 +157,10 @@ new importers:
 
 ## Schema / Entity Fields
 
+- file type/scope/coverage: "fulltext", "abstract", etc
 - elastic transform should only include authors, not editors (?)
-- `retracted`, `translation`, and perhaps `corrected` as flags on releases, instead of release_status?
-    => see notes file on retractions, etc
+- `translation_of` field on releases (or similar/general). `retraction_of` to a
+  specific release? `alias_of`/`duplicate_of`
 - 'part-of' relation for releases (release to release, eg for book chapters) and possibly containers
 - `container_type` for containers (journal, conference, book series, etc)
     => in schema, needs vocabulary and implementation
@@ -222,7 +173,6 @@ new importers:
 
 - include that ISO library to do lang/country name decodes
 - container-name when no `container_id`. eg: 10.1016/b978-0-08-037302-7.50022-7
-- fileset/webcapture webface anything
 
 ## Other / Backburner