Diffstat (limited to 'proposals')
-rw-r--r-- proposals/20200807_dblp.md 17
1 files changed, 10 insertions, 7 deletions
diff --git a/proposals/20200807_dblp.md b/proposals/20200807_dblp.md
index b955268f..8569712e 100644
--- a/proposals/20200807_dblp.md
+++ b/proposals/20200807_dblp.md
@@ -35,17 +35,20 @@ Fulltext ingest:
## Plan
-- get martin review of this plan
+x get martin review of this plan
x read full XML DTD
-- scrape container metadata (for ~6k containers): ISSN, Wikidata QID, name
+x scrape container metadata (for ~6k containers): ISSN, Wikidata QID, name
=> selectolax?
- => title, issn, wikidata, "is OA"
-- implement basic release import, with tests (no container/creator linking)
+ => title, issn, wikidata
+x implement basic release import, with tests (no container/creator linking)
=> surface any unexpected issues
-- estimate number of entities with/without external identifier (DOI)
+x estimate number of entities with/without external identifier (DOI)
+ Counter({'total': 7953365, 'has-doi': 4277307, 'skip': 2953841, 'skip-key-type': 2640968, 'skip-arxiv-corr': 312872, 'skip-title': 1, 'insert': 0, 'update': 0, 'exists': 0})
+/ update container and creator schemas to have lookup-able dblp identifiers (creator:`dblp_pid`, container:`dblp_prefix`)
+. run orcid import/update of creators
+- container creator/update for `dblp_prefix`
+ => chocula import first?
- investigate journal+conference ISSN mapping
-- run orcid import/update of creators
-- update container and creator schemas to have lookup-able dblp identifiers (creator:`dblp_pid`, container:`dblp_prefix`)
## Creator Metadata
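
Review note on the `Counter` output added under the DOI estimate item in the plan: a minimal sketch for sanity-checking those numbers. The counts are copied verbatim from the diff. How the importer defines each key is not stated there, so treating `has-doi` as counted over all records (rather than only non-skipped ones) is an assumption, as are the variable names.

```python
from collections import Counter

# Counts copied verbatim from the dry-run output recorded in the plan.
counts = Counter({
    'total': 7953365, 'has-doi': 4277307, 'skip': 2953841,
    'skip-key-type': 2640968, 'skip-arxiv-corr': 312872, 'skip-title': 1,
    'insert': 0, 'update': 0, 'exists': 0,
})

# The per-reason skip counters should add up to the overall 'skip' count.
skip_reasons = {k: v for k, v in counts.items() if k.startswith('skip-')}
skip_sum = sum(skip_reasons.values())

# Records that were not skipped (would be inserted/updated in a real run).
processed = counts['total'] - counts['skip']

# ASSUMPTION: 'has-doi' is counted across all records, skipped or not.
doi_share_total = counts['has-doi'] / counts['total']
no_doi = counts['total'] - counts['has-doi']

print(f"skip reasons consistent: {skip_sum == counts['skip']}")
print(f"not skipped: {processed}")
print(f"share of all records with a DOI: {doi_share_total:.1%}")
print(f"records without a DOI (under the assumption above): {no_doi}")
```

If `has-doi` turns out to be counted only after skips, the denominator for the share should be `processed` instead of `counts['total']`.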