diff options
author | Bryan Newbold <bnewbold@archive.org> | 2019-07-31 21:57:47 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2019-07-31 21:57:47 -0700 |
commit | 5b6a0283aa030a34cef4b9b83d281690937cdfae (patch) | |
tree | 97539f9d10de11e57a2740abfcd3b7d496c0d245 | |
parent | d5295bb1c06c092fe2b0b024207adad1351e3f44 (diff) | |
download | chocula-5b6a0283aa030a34cef4b9b83d281690937cdfae.tar.gz chocula-5b6a0283aa030a34cef4b9b83d281690937cdfae.zip |
commit TODO list
-rw-r--r-- | TODO.md | 37 |
1 files changed, 37 insertions, 0 deletions
@@ -0,0 +1,37 @@ + +## Chocula + +priorities: +x fraction/which are pointing to wayback +- coverage stats, particularly for longtail +x wikidata linkage (prep for wikimania) +- "still in print" flag +- clean out invalid ISSN-L from fatcat +- don't list dead URLs in fatcat +- summary report of some of above +- update all fatcat (wikidata QID, urls, fixed ISSN-L, etc) + + +- public scopus list (?) +- scrape/munge public clarivate dumps +- import JURN into fatcat (one way or another) + => try to title match and get ISSN-L + => manual lookups for remainders? +- dump json +- "GOLD" importer (for scopus/WoS) +- check that all fields actually getting imported reasonably +- homepage crawl/status script + +- KBART imports (with JSON, so only a single row per slug) +- imprint/publisher distinction (publisher is big group) +- summary table should be superset of fatcat table +- add timestamp columns to enable updates? +- fatcat export (filters for changes to make, writes out as JSON) +- update_url_status (needs re-write) +- index -> directory +- log out index issues (duplicate ISSN-L, etc) to a file +- validate against GOLD OA list +- decide what to do with JURN... match? fuzzy match? create missing fatcat? +- lots of bogus ISSN-L, like 9999-9999 or 0000-0000. should both validate + check digit and require an ISSN-L to actually exist. + |