From d312ddfa0340b702ac858b08f8f91f785048af0b Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Fri, 11 Dec 2020 11:32:07 -0800
Subject: commit DBLP proposal progress

---
 proposals/20200807_dblp.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

(limited to 'proposals')

diff --git a/proposals/20200807_dblp.md b/proposals/20200807_dblp.md
index b955268f..8569712e 100644
--- a/proposals/20200807_dblp.md
+++ b/proposals/20200807_dblp.md
@@ -35,17 +35,20 @@ Fulltext ingest:
 
 ## Plan
 
-- get martin review of this plan
+x get martin review of this plan
 x read full XML DTD
-- scrape container metadata (for ~6k containers): ISSN, Wikidata QID, name
+x scrape container metadata (for ~6k containers): ISSN, Wikidata QID, name
     => selectolax?
-    => title, issn, wikidata, "is OA"
-- implement basic release import, with tests (no container/creator linking)
+    => title, issn, wikidata
+x implement basic release import, with tests (no container/creator linking)
     => surface any unexpected issues
-- estimate number of entities with/without external identifier (DOI)
+x estimate number of entities with/without external identifier (DOI)
+    Counter({'total': 7953365, 'has-doi': 4277307, 'skip': 2953841, 'skip-key-type': 2640968, 'skip-arxiv-corr': 312872, 'skip-title': 1, 'insert': 0, 'update': 0, 'exists': 0})
+/ update container and creator schemas to have lookup-able dblp identifiers (creator:`dblp_pid`, container:`dblp_prefix`)
+. run orcid import/update of creators
+- container creator/update for `dblp_prefix`
+    => chocula import first?
 - investigate journal+conference ISSN mapping
-- run orcid import/update of creators
-- update container and creator schemas to have lookup-able dblp identifiers (creator:`dblp_pid`, container:`dblp_prefix`)
 
 ## Creator Metadata
 
-- 
cgit v1.2.3
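
The `Counter` tallies recorded in this patch can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the commit: it copies the numbers verbatim and assumes that `has-doi` is only incremented for records that were not skipped, which the patch itself does not state. The names `kept`, `without_doi`, and `doi_share` are introduced here purely for illustration.

```python
from collections import Counter

# Tallies copied verbatim from the Counter line in the patch.
counts = Counter({
    'total': 7953365, 'has-doi': 4277307, 'skip': 2953841,
    'skip-key-type': 2640968, 'skip-arxiv-corr': 312872, 'skip-title': 1,
    'insert': 0, 'update': 0, 'exists': 0,
})

# The individual skip-* reasons should add up to the overall skip count.
skip_reasons = sum(v for k, v in counts.items() if k.startswith('skip-'))
assert skip_reasons == counts['skip']

# ASSUMPTION (not confirmed by the patch): 'has-doi' is counted only
# for records that survived the skip checks.
kept = counts['total'] - counts['skip']
without_doi = kept - counts['has-doi']
doi_share = counts['has-doi'] / kept

print(f"kept={kept} without_doi={without_doi} doi_share={doi_share:.1%}")
```

Under that assumption, the records without a DOI are the ones that would need another lookup key, which is what the `dblp_pid` and `dblp_prefix` schema items in the plan appear to address.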