author | Bryan Newbold <bnewbold@archive.org> | 2020-06-23 23:15:47 -0700 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2020-06-23 23:15:47 -0700 |
commit | 3a6469f89261b9bbcd28632094a349f66ef27ebb | |
tree | ce01bcd96244e4f57504de675f68fae54d6e1285 | |
parent | 1aea4911bf336570a1b6b32d75eced523c329ed6 | |
update TODO
-rw-r--r-- | TODO.md | 36 |
1 files changed, 15 insertions, 21 deletions
```diff
@@ -1,41 +1,34 @@
 priorities:
 - coverage stats, particularly for longtail
-- "still in print" flag
+- `is_active` coverage
 - clean out invalid ISSN-L from fatcat
 - don't list dead URLs in fatcat
 
+- SIM missing/bad ISSNs
+    Counter({'total': 14860, 'inserted': 11421, 'missing-issn': 2863, 'no-match': 555, 'duplicate': 21})
+
 ## Sources
 
-- PKP OJS index
-    => mostly redundant with DOAJ?
-- dblp conferences/series
-    => no container-only metadata dump available?
-- MAG
-- vanished journals
-    => https://github.com/njahn82/vanished_journals
-    => https://isaw.nyu.edu/publications/awol-index/
-- sherpa/romeo refactor (no moreo updates)
-- entrez refactor (no moreo updates)
 - unpaywall journal-level classification
     => ask for journal-level dump or do munging
-- SERP homepage munging
-- repositories (?)
 - jurn matches
+- public scopus list (?)
+- scrape/munge public clarivate dumps
+- repositories (?)
 - datacite metadata (?)
     => via munging
+- dblp conferences/series
+    => no container-only metadata dump available?
+- SERP homepage munging
 - currated quality lists (eg, national libraries)
-    => https://www.arc.gov.au/excellence-research-australia
-- public scopus list (?)
-- scrape/munge public clarivate dumps
 - "GOLD" importer (for scopus/WoS)
-- ISSN metadata from portal.issn.org
-    - scraping is done
-    - only for ISSN-Ls from existing table
-    - https://portal.issn.org/resource/ISSN/1561-7645?format=json
-    - would require a good deal of munging (eg, MARC region -> ISO) (?)
+- PKP OJS index
+    => mostly redundant with DOAJ?
 
 improvements:
 
+- sherpa/romeo refactor (no moreo updates)
+- entrez refactor (no moreo updates)
 - entrez: "NLM Unique Id"
 - JURN: finish
 - crossref: empty string identifiers?
@@ -54,6 +47,7 @@ improvements:
 
 ## Schema
 
+- `original_name`
 - `platform` column in database
 - `container_type` column in database
     => munge this in various ways
```