summaryrefslogtreecommitdiffstats
path: root/notes/cleanup_tasks.txt
blob: 43b52836615d1822ececa77941dfb6ae2f18997e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

Cambridge Chemical Database (NCI)

    doi_prefix:10.3406 release_type:article
    doi_prefix:10.14469 release_type:article

    193,346+ entities

    should be 'dataset' not 'article'

    datacite importer

Frontiers

    Frontiers non-PDF abstracts, which have DOIs like `10.3389/conf.*`. Should
    crawl these, but `release_type` should be... `abstract`? There are at least
    18,743 of these. Should be fixed in both crossref-bot, then a retro-active
    cleanup.

Applied Physics Letters

    doi_prefix:10.2172 title:10.2172

    For 700+ entities, the title is the DOI number. They all seem to be
    "deleted" DOIs, and should be marked as stubs.