-rw-r--r-- | proposals/20200807_dblp.md | 17 |
1 files changed, 10 insertions, 7 deletions
diff --git a/proposals/20200807_dblp.md b/proposals/20200807_dblp.md
index b955268f..8569712e 100644
--- a/proposals/20200807_dblp.md
+++ b/proposals/20200807_dblp.md
@@ -35,17 +35,20 @@ Fulltext ingest:
 
 ## Plan
 
-- get martin review of this plan
+x get martin review of this plan
 x read full XML DTD
-- scrape container metadata (for ~6k containers): ISSN, Wikidata QID, name
+x scrape container metadata (for ~6k containers): ISSN, Wikidata QID, name
     => selectolax?
-    => title, issn, wikidata, "is OA"
-- implement basic release import, with tests (no container/creator linking)
+    => title, issn, wikidata
+x implement basic release import, with tests (no container/creator linking)
     => surface any unexpected issues
-- estimate number of entities with/without external identifier (DOI)
+x estimate number of entities with/without external identifier (DOI)
+    Counter({'total': 7953365, 'has-doi': 4277307, 'skip': 2953841, 'skip-key-type': 2640968, 'skip-arxiv-corr': 312872, 'skip-title': 1, 'insert': 0, 'update': 0, 'exists': 0})
+/ update container and creator schemas to have lookup-able dblp identifiers (creator:`dblp_pid`, container:`dblp_prefix`)
+. run orcid import/update of creators
+- container creator/update for `dblp_prefix`
+    => chocula import first?
 - investigate journal+conference ISSN mapping
-- run orcid import/update of creators
-- update container and creator schemas to have lookup-able dblp identifiers (creator:`dblp_pid`, container:`dblp_prefix`)
 
 
 ## Creator Metadata
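The `Counter` recorded under the "estimate number of entities" item can be sanity-checked with a few lines of Python. This is only a sketch: the counts are copied from the diff, while the helper names `skip_breakdown_total` and `share` are hypothetical and not part of the importer. It also assumes that every `skip-*` key is a sub-reason of `skip`, which the diff does not state explicitly.

```python
from collections import Counter

# Counts copied verbatim from the Counter recorded in the plan.
counts = Counter({
    'total': 7953365, 'has-doi': 4277307, 'skip': 2953841,
    'skip-key-type': 2640968, 'skip-arxiv-corr': 312872, 'skip-title': 1,
    'insert': 0, 'update': 0, 'exists': 0,
})


def skip_breakdown_total(c):
    """Sum the per-reason 'skip-*' counters (hypothetical helper)."""
    return sum(v for k, v in c.items() if k.startswith('skip-'))


def share(c, key, denominator='total'):
    """Fraction of `denominator` accounted for by `key` (hypothetical helper)."""
    return c[key] / c[denominator]


skip_sum = skip_breakdown_total(counts)
doi_share = share(counts, 'has-doi')

print(f"skip reasons sum to {skip_sum} (reported skip: {counts['skip']})")
print(f"records with a DOI: {doi_share:.1%} of total")
```

Whether `has-doi` was tallied before or after records were skipped is not clear from the counts alone, so the share is computed against `total` rather than against the non-skipped records.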