| author | Bryan Newbold <bnewbold@robocracy.org> | 2020-12-11 11:32:07 -0800 | 
|---|---|---|
| committer | Bryan Newbold <bnewbold@robocracy.org> | 2020-12-17 23:03:08 -0800 | 
| commit | d312ddfa0340b702ac858b08f8f91f785048af0b (patch) | |
| tree | 9c0a359adcbd95f257021ff2ae4c3115ac95a04b | |
| parent | 543dab55ade2cf2d4744c478691f085297b2545a (diff) | |
commit DBLP proposal progress
| -rw-r--r-- | proposals/20200807_dblp.md | 17 | 
1 files changed, 10 insertions, 7 deletions
diff --git a/proposals/20200807_dblp.md b/proposals/20200807_dblp.md
index b955268f..8569712e 100644
--- a/proposals/20200807_dblp.md
+++ b/proposals/20200807_dblp.md
@@ -35,17 +35,20 @@ Fulltext ingest:
 
 ## Plan
 
-- get martin review of this plan
+x get martin review of this plan
 x read full XML DTD
-- scrape container metadata (for ~6k containers): ISSN, Wikidata QID, name
+x scrape container metadata (for ~6k containers): ISSN, Wikidata QID, name
     => selectolax?
-    => title, issn, wikidata, "is OA"
-- implement basic release import, with tests (no container/creator linking)
+    => title, issn, wikidata
+x implement basic release import, with tests (no container/creator linking)
     => surface any unexpected issues
-- estimate number of entities with/without external identifier (DOI)
+x estimate number of entities with/without external identifier (DOI)
+    Counter({'total': 7953365, 'has-doi': 4277307, 'skip': 2953841, 'skip-key-type': 2640968, 'skip-arxiv-corr': 312872, 'skip-title': 1, 'insert': 0, 'update': 0, 'exists': 0})
+/ update container and creator schemas to have lookup-able dblp identifiers (creator:`dblp_pid`, container:`dblp_prefix`)
+. run orcid import/update of creators
+- container creator/update for `dblp_prefix`
+    => chocula import first?
 - investigate journal+conference ISSN mapping
-- run orcid import/update of creators
-- update container and creator schemas to have lookup-able dblp identifiers (creator:`dblp_pid`, container:`dblp_prefix`)
 
 ## Creator Metadata
 
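The counts in the `Counter` line added by this commit can be sanity-checked with a few lines of Python. This is only a sketch: the numbers are copied verbatim from the diff, but treating every `skip-*` key as a sub-reason of `skip`, and reading `has-doi` as a share of `total`, are assumptions about how the importer tallies records, not something the proposal states.

```python
from collections import Counter

# Counts copied verbatim from the Counter line added in this commit.
counts = Counter({
    'total': 7953365, 'has-doi': 4277307, 'skip': 2953841,
    'skip-key-type': 2640968, 'skip-arxiv-corr': 312872, 'skip-title': 1,
    'insert': 0, 'update': 0, 'exists': 0,
})

# Assumption: every key starting with "skip-" is a sub-reason of "skip".
skip_reasons = sum(v for k, v in counts.items() if k.startswith('skip-'))
skip_matches = skip_reasons == counts['skip']

# Assumption: "has-doi" is counted against all records seen ("total").
doi_share = counts['has-doi'] / counts['total']

print(f"skip reasons sum to skip: {skip_matches}")
print(f"records with a DOI: {doi_share:.1%}")
```

Under those assumptions the three skip reasons account for the whole `skip` count, and a little over half of the records seen carry a DOI.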
