author | Bryan Newbold <bnewbold@archive.org> | 2019-09-03 13:49:26 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2019-09-03 13:49:26 -0700 |
commit | d9c065edd127e5719f2694f7d8f68c8079b4e38e (patch) | |
tree | 7de66626527931e27c61c04153cbfd83c35a5972 | |
parent | 1cc6dc4749750bc5e51c9877018e474367a64384 (diff) | |
download | chocula-d9c065edd127e5719f2694f7d8f68c8079b4e38e.tar.gz chocula-d9c065edd127e5719f2694f7d8f68c8079b4e38e.zip |
update TODO
-rw-r--r-- | TODO.md | 11 |
1 files changed, 10 insertions, 1 deletions
@@ -10,7 +10,14 @@ x wikidata linkage (prep for wikimania)
 - don't list dead URLs in fatcat
 - summary report of some of above
 - update all fatcat (wikidata QID, urls, fixed ISSN-L, etc)
+- when updating fatcat:
+    if title is "blah, Proceedings of the", set type to proceedings and re-write title
+    if title like "Workshop on", set type
+source improvements:
+- entrez: "NLM Unique Id"
+- JUFO: finish
+- crossref: empty string identifiers?
 - public scopus list (?)
 - scrape/munge public clarivate dumps
@@ -22,13 +29,15 @@ x wikidata linkage (prep for wikimania)
 - check that all fields actually getting imported reasonably
 - homepage crawl/status script
+- could poll portal.issn.org like:
+    https://portal.issn.org/resource/ISSN/1561-7645?format=json
+    would require a good deal of munging (eg, MARC region -> ISO)
 - KBART imports (with JSON, so only a single row per slug)
 - imprint/publisher distinction (publisher is big group)
 - summary table should be superset of fatcat table
 - add timestamp columns to enable updates?
 - fatcat export (filters for changes to make, writes out as JSON)
 - update_url_status (needs re-write)
-- index -> directory
 - log out index issues (duplicate ISSN-L, etc) to a file
 - validate against GOLD OA list
 - decide what to do with JURN... match? fuzzy match? create missing fatcat?
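
The new "when updating fatcat" item describes two title rules. A minimal sketch of how they might look in Python follows. It is not code from this repository: the function name `normalize_title`, the returned tuple shape, and the `workshop_type` parameter are all hypothetical. The TODO does not say which type a "Workshop on" title should get, so the default used here is only a placeholder guess.

```python
import re

# Hypothetical helper, not part of chocula. Patterns are one possible
# reading of the two TODO rules.
_PROCEEDINGS_SUFFIX = re.compile(
    r"^(?P<rest>.+?),\s*Proceedings of the\s*$", re.IGNORECASE
)
_WORKSHOP = re.compile(r"\bWorkshop on\b", re.IGNORECASE)


def normalize_title(title, workshop_type="proceedings"):
    """Return (title, container_type); container_type is None if no rule matches.

    workshop_type is an assumption: the TODO only says "set type" for
    workshop titles without naming the type.
    """
    title = title.strip()
    match = _PROCEEDINGS_SUFFIX.match(title)
    if match:
        # Move the inverted suffix to the front:
        # "Foo Conference, Proceedings of the" -> "Proceedings of the Foo Conference"
        rest = match.group("rest").strip()
        return f"Proceedings of the {rest}", "proceedings"
    if _WORKSHOP.search(title):
        # Title is left unchanged; only the type is set.
        return title, workshop_type
    return title, None
```

Returning the type alongside the title keeps the rule a pure function, so it could be run over the whole table and the proposed changes reviewed before anything is written back to fatcat.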