author Bryan Newbold <bnewbold@archive.org> 2020-06-23 23:15:47 -0700
committer Bryan Newbold <bnewbold@archive.org> 2020-06-23 23:15:47 -0700
commit 3a6469f89261b9bbcd28632094a349f66ef27ebb
tree ce01bcd96244e4f57504de675f68fae54d6e1285
parent 1aea4911bf336570a1b6b32d75eced523c329ed6
update TODO
-rw-r--r-- TODO.md 36
1 files changed, 15 insertions, 21 deletions
diff --git a/TODO.md b/TODO.md
index 8b4cdb9..29b1fe0 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1,41 +1,34 @@
priorities:
- coverage stats, particularly for longtail
-- "still in print" flag
+- `is_active` coverage
- clean out invalid ISSN-L from fatcat
- don't list dead URLs in fatcat
+- SIM missing/bad ISSNs
+ Counter({'total': 14860, 'inserted': 11421, 'missing-issn': 2863, 'no-match': 555, 'duplicate': 21})
+
## Sources
-- PKP OJS index
- => mostly redundant with DOAJ?
-- dblp conferences/series
- => no container-only metadata dump available?
-- MAG
-- vanished journals
- => https://github.com/njahn82/vanished_journals
- => https://isaw.nyu.edu/publications/awol-index/
-- sherpa/romeo refactor (no moreo updates)
-- entrez refactor (no moreo updates)
- unpaywall journal-level classification
=> ask for journal-level dump or do munging
-- SERP homepage munging
-- repositories (?)
- jurn matches
+- public scopus list (?)
+- scrape/munge public clarivate dumps
+- repositories (?)
- datacite metadata (?)
=> via munging
+- dblp conferences/series
+ => no container-only metadata dump available?
+- SERP homepage munging
- currated quality lists (eg, national libraries)
- => https://www.arc.gov.au/excellence-research-australia
-- public scopus list (?)
-- scrape/munge public clarivate dumps
- "GOLD" importer (for scopus/WoS)
-- ISSN metadata from portal.issn.org
- scraping is done
- only for ISSN-Ls from existing table
- https://portal.issn.org/resource/ISSN/1561-7645?format=json
- would require a good deal of munging (eg, MARC region -> ISO) (?)
+- PKP OJS index
+ => mostly redundant with DOAJ?
improvements:
+- sherpa/romeo refactor (no moreo updates)
+- entrez refactor (no moreo updates)
- entrez: "NLM Unique Id"
- JURN: finish
- crossref: empty string identifiers?
@@ -54,6 +47,7 @@ improvements:
## Schema
+- `original_name`
- `platform` column in database
- `container_type` column in database
=> munge this in various ways
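
The SIM import counter added under "priorities" can be sanity-checked with a few lines of Python. This is only a sketch: it assumes the non-`total` keys are mutually exclusive outcomes that together account for every row, which holds for the numbers recorded in this commit. The helper name `outcome_shares` is hypothetical and not part of chocula.

```python
from collections import Counter

# Counts copied verbatim from the TODO.md line added in this commit.
counts = Counter({'total': 14860, 'inserted': 11421, 'missing-issn': 2863,
                  'no-match': 555, 'duplicate': 21})


def outcome_shares(counts):
    """Return each outcome's share of 'total' (hypothetical helper).

    Assumes every key other than 'total' is an exclusive outcome, so the
    outcomes must sum to the total; raises ValueError otherwise.
    """
    total = counts['total']
    outcomes = {k: v for k, v in counts.items() if k != 'total'}
    if sum(outcomes.values()) != total:
        raise ValueError("outcome counts do not sum to 'total'")
    return {k: v / total for k, v in outcomes.items()}


shares = outcome_shares(counts)
# Print outcomes from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:>12}: {share:6.1%}")
```

Under that assumption, the `missing-issn` share is the figure most relevant to the "SIM missing/bad ISSNs" priority.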