| field | value | date |
|---|---|---|
| author | Bryan Newbold <bnewbold@archive.org> | 2020-09-03 18:27:44 -0700 |
| committer | Bryan Newbold <bnewbold@archive.org> | 2020-09-03 18:27:44 -0700 |
| commit | 043b35040e4385c674267aa88c4056bdfdd9cb6c | |
| tree | 39fc494e496e86017692c19b8b15acbc785c8bd4 | |
| parent | 3ad7a3c48de77c00ad0e777d24021f8db340912c | |
| download | chocula-043b35040e4385c674267aa88c4056bdfdd9cb6c.tar.gz, chocula-043b35040e4385c674267aa88c4056bdfdd9cb6c.zip | |

update notes and explore

| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | TODO.md | 16 |
| -rw-r--r-- | notes/explore.md | 11 |

2 files changed, 26 insertions, 1 deletion

```diff
@@ -1,4 +1,5 @@
+
 priorities:
 - coverage stats, particularly for longtail
 - `is_active` coverage
 
@@ -10,9 +11,22 @@ priorities:
 
 ## Sources
 
-- unpaywall journal-level classification
+- preservation coverage
+    x hathitrust (huge!)
+      https://www.hathitrust.org/hathifiles_description
+    x PKP PLN (ONIX)
+      https://pkp.sfu.ca/pkp-pn/
+      http://pkp.sfu.ca/files/pkppn/onix.csv
+    => Scholars Portal (canada)
+       received ONIX XML, hoping for KBART format
+    => Cariniana
+    => National Digital Preservation Program, China
+    => Library of Congress
+- additional hathitrust (many more ISSNs/journals)
+- unpaywall journal-level classification (OA color)
     => ask for journal-level dump or do munging
 - jurn matches
+    => somebody on github did an openrefine match
 - public scopus list (?)
 - scrape/munge public clarivate dumps
 - repositories (?)
diff --git a/notes/explore.md b/notes/explore.md
index 5f23d35..c25404d 100644
--- a/notes/explore.md
+++ b/notes/explore.md
@@ -12,6 +12,17 @@ PKP PLN numbers result in?
 
 So about 60k releases.
 
+How about Hathitrust?
+
+    select count(*), sum(journal.release_count), sum(journal.preserved_count) from journal join directory on journal.issnl = directory.issnl where directory.slug = 'hathitrust';
+
+    count(*)    sum(journal.release_count)  sum(journal.preserved_count)
+    ----------  --------------------------  ----------------------------
+    26628       48160184                    36905342
+
+Much larger potential impact, of 11+ million releases, though unclear how many
+are acutally in the hathitrust archives.
+
 ## 2020-06-23
 
 Where do back ISSN-Ls come from? Answer: exiting fatcat metadata.
```
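
The Hathitrust query added to notes/explore.md joins `journal` to `directory` on `issnl` and sums release and preserved counts. The "11+ million" figure is the gap between those two sums: 48160184 - 36905342 = 11254842 releases in Hathitrust-listed journals that are not yet counted as preserved. Below is a minimal Python sketch of that calculation. It assumes a stripped-down schema holding only the columns the query references (the real chocula tables have more), and the sample rows and the `doaj` slug are made up for illustration.

```python
import sqlite3

# Hypothetical minimal schema: only the columns referenced by the query in
# notes/explore.md. The real chocula tables have many more columns.
SCHEMA = """
CREATE TABLE journal (issnl TEXT PRIMARY KEY, release_count INTEGER, preserved_count INTEGER);
CREATE TABLE directory (issnl TEXT, slug TEXT);
"""

# Same query as in notes/explore.md, reflowed across lines.
HATHITRUST_QUERY = """
SELECT count(*), sum(journal.release_count), sum(journal.preserved_count)
FROM journal JOIN directory ON journal.issnl = directory.issnl
WHERE directory.slug = 'hathitrust';
"""


def hathitrust_summary(conn):
    """Return (journal_count, release_sum, preserved_sum, unpreserved_sum)."""
    count, releases, preserved = conn.execute(HATHITRUST_QUERY).fetchone()
    # SQL sum() over zero rows yields NULL (None in Python); treat it as 0.
    releases = releases or 0
    preserved = preserved or 0
    return count, releases, preserved, releases - preserved


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows, not real chocula data.
    conn.executemany("INSERT INTO journal VALUES (?, ?, ?)", [
        ("1234-5678", 1000, 900),
        ("2345-6789", 500, 100),
        ("3456-7890", 200, 200),  # not in the hathitrust directory below
    ])
    conn.executemany("INSERT INTO directory VALUES (?, ?)", [
        ("1234-5678", "hathitrust"),
        ("2345-6789", "hathitrust"),
        ("3456-7890", "doaj"),
    ])
    print(hathitrust_summary(conn))  # (2, 1500, 1000, 500)

    # Gap implied by the totals recorded in notes/explore.md.
    print(48160184 - 36905342)  # 11254842
```

Because this is an inner join, a journal with more than one `hathitrust` row in `directory` would be counted and summed once per row. If that can happen, joining against `SELECT DISTINCT issnl FROM directory WHERE slug = 'hathitrust'` instead would keep the totals from being inflated.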