author Bryan Newbold <bnewbold@archive.org> 2019-07-31 21:57:47 -0700
committer Bryan Newbold <bnewbold@archive.org> 2019-07-31 21:57:47 -0700
commit 5b6a0283aa030a34cef4b9b83d281690937cdfae
tree 97539f9d10de11e57a2740abfcd3b7d496c0d245
parent d5295bb1c06c092fe2b0b024207adad1351e3f44
download chocula-5b6a0283aa030a34cef4b9b83d281690937cdfae.tar.gz, chocula-5b6a0283aa030a34cef4b9b83d281690937cdfae.zip
commit TODO list
-rw-r--r-- TODO.md 37
1 files changed, 37 insertions, 0 deletions
diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..cfbc4d5
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,37 @@
+
+## Chocula
+
+priorities:
+x what fraction (and which) URLs are pointing to wayback
+- coverage stats, particularly for longtail
+x wikidata linkage (prep for wikimania)
+- "still in print" flag
+- clean out invalid ISSN-L from fatcat
+- don't list dead URLs in fatcat
+- summary report of some of above
+- update all fatcat (wikidata QID, urls, fixed ISSN-L, etc)
+
+
+- public scopus list (?)
+- scrape/munge public clarivate dumps
+- import JURN into fatcat (one way or another)
+ => try to title match and get ISSN-L
+ => manual lookups for remainders?
+- dump json
+- "GOLD" importer (for scopus/WoS)
+- check that all fields actually getting imported reasonably
+- homepage crawl/status script
+
+- KBART imports (with JSON, so only a single row per slug)
+- imprint/publisher distinction (publisher is big group)
+- summary table should be superset of fatcat table
+- add timestamp columns to enable updates?
+- fatcat export (filters for changes to make, writes out as JSON)
+- update_url_status (needs re-write)
+- index -> directory
+- log out index issues (duplicate ISSN-L, etc) to a file
+- validate against GOLD OA list
+- decide what to do with JURN... match? fuzzy match? create missing fatcat?
+- lots of bogus ISSN-L, like 9999-9999 or 0000-0000; should both validate the
+ check digit and require that the ISSN-L actually exists.
+
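The last TODO item calls for two separate checks, and a checksum alone is not enough: `0000-0000` actually has a valid check digit, so only an existence lookup rejects it. Below is a minimal sketch in Python, assuming ISSNs arrive in canonical uppercase `NNNN-NNNC` form. The helper names and the `known_issnl` set are hypothetical, not part of chocula. The set could, for example, be loaded from the ISSN-to-ISSN-L table published by the ISSN International Centre. The check digit follows the standard ISSN rule: weights 8 down to 2 over the first seven digits, check value `(11 - sum mod 11) mod 11`, and `X` for 10.

```python
import re

# Hypothetical helpers for illustration; not taken from chocula's codebase.
# Expects canonical form: four digits, hyphen, three digits, digit or 'X'.
ISSN_RE = re.compile(r"\d{4}-\d{3}[\dX]")


def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits.

    Weights run from 8 down to 2; the check value is (11 - total mod 11) mod 11,
    written as 'X' when it equals 10.
    """
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(issn: str) -> bool:
    """Syntax plus check-digit validation; does NOT prove the ISSN exists."""
    if not ISSN_RE.fullmatch(issn):
        return False
    digits = issn.replace("-", "")
    return issn_check_digit(digits[:7]) == digits[7]


def is_acceptable_issnl(issnl: str, known_issnl: set) -> bool:
    """Require a valid check digit AND membership in a set of known ISSN-Ls.

    `known_issnl` is an assumed input, e.g. built from an ISSN-L mapping file.
    """
    return is_valid_issn(issnl) and issnl in known_issnl


if __name__ == "__main__":
    known = {"0317-8471", "2049-3630"}
    print(is_valid_issn("9999-9999"))               # False: check digit should be 4
    print(is_valid_issn("0000-0000"))               # True: checksum alone can't catch it
    print(is_acceptable_issnl("0000-0000", known))  # False: not a known ISSN-L
    print(is_acceptable_issnl("0317-8471", known))  # True
```

Checking the cheap checksum first means that obviously malformed values never reach the lookup. The existence test is the part that actually filters placeholder values like `0000-0000`.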