
## Queries

    pandemic influenza
    epidemic influenza
    pandemic ventilator
    SARS
    sars-cov-2
    covid-19

## Should not include?

Duplicate releases:

- zenodo versions
- figshare versions
    e.g. "Coronavirus Research on Figshare" (12 versions)
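
A minimal sketch of how duplicate Zenodo and figshare versions might be collapsed. The record shape (dicts with `title` and `doi` keys) and the function name are hypothetical. Grouping by normalized title is an assumed heuristic, not something these notes specify. It relies on `10.5281` (Zenodo) and `10.6084` (figshare) being the DOI prefixes involved; deposits under other prefixes would pass through untouched.

```python
# DOI prefixes of repositories that mint a new DOI per version.
# Assumption: only the main Zenodo and figshare prefixes are handled here.
VERSIONED_REPO_PREFIXES = {"10.5281", "10.6084"}


def drop_duplicate_versions(releases):
    """Keep the first release per (DOI prefix, normalized title) pair.

    Releases outside the versioned repositories are always kept.
    """
    seen = set()
    kept = []
    for release in releases:
        doi = (release.get("doi") or "").lower()
        prefix = doi.split("/", 1)[0]
        if prefix in VERSIONED_REPO_PREFIXES:
            # Lowercase and collapse whitespace so trivial title
            # differences between versions do not defeat the grouping.
            title = " ".join((release.get("title") or "").lower().split())
            key = (prefix, title)
            if key in seen:
                continue
            seen.add(key)
        kept.append(release)
    return kept
```

Keeping the first record seen is arbitrary; preferring the latest version would need a version or date field that this sketch does not assume.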

Remove anything from ResearchGate? Quality is low. DOI prefix: 

"TOF-SARS" => time-of-flight physics term, a false positive for the "SARS" query

These should not end up in the corpus:

    "Description of a new Norwegian star-fish"
    by M. Sars
    https://fatcat.wiki/release/ngp3qkqf4fccbdlxz2u4h4taoe
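
One way the false positives above could be screened out, as a rough sketch. The release ident comes from the fatcat URL above. The record shape (dicts with `ident` and `title` keys) and the names `EXCLUDED_RELEASE_IDENTS`, `EXCLUDED_TITLE_PATTERNS`, and `is_false_positive` are assumptions for illustration.

```python
import re

# Releases known to match a query for the wrong reason,
# e.g. the star-fish paper by author "M. Sars".
EXCLUDED_RELEASE_IDENTS = {"ngp3qkqf4fccbdlxz2u4h4taoe"}

# Title patterns that match the "SARS" query but are off-topic.
EXCLUDED_TITLE_PATTERNS = [re.compile(r"\bTOF-SARS\b", re.IGNORECASE)]


def is_false_positive(release):
    """Return True when a release should be kept out of the corpus."""
    if release.get("ident") in EXCLUDED_RELEASE_IDENTS:
        return True
    title = release.get("title") or ""
    return any(pattern.search(title) for pattern in EXCLUDED_TITLE_PATTERNS)
```

An explicit ident list only covers cases already found by hand; author-name collisions in general would need a broader rule.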

## Specific Articles

Expect these to end up in the corpus (they are not already):

    "100 Years of Medical Countermeasures and Pandemic Influenza Preparedness"
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6187768/


## Hacks

    10.2210/pdb4njl/pdb
    no release_type
    => dataset
    => published

    no release_type
    title starts "figure"
    => graphic/figure, skip it

    journal: "Emerald Expert Briefings"
    container_id:fnllqvywjbec5eumrbavqipfym
    => skip it
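
The three hacks above could be sketched as a single function. This is an interpretation, not the actual pipeline code: it assumes releases are dicts with `doi`, `title`, `release_type`, `release_stage`, and `container_id` keys, reads "published" as a `release_stage` value, and applies the first hack only to the one DOI listed (it may have been meant for the whole `10.2210` prefix). The function name `apply_hacks` is hypothetical.

```python
SKIPPED_CONTAINER_IDS = {"fnllqvywjbec5eumrbavqipfym"}  # "Emerald Expert Briefings"
DATASET_DOIS = {"10.2210/pdb4njl/pdb"}


def apply_hacks(release):
    """Return a patched copy of the release, or None if it should be skipped."""
    if release.get("container_id") in SKIPPED_CONTAINER_IDS:
        return None
    patched = dict(release)
    # The remaining hacks only apply when release_type is missing.
    if not patched.get("release_type"):
        title = (patched.get("title") or "").strip().lower()
        if title.startswith("figure"):
            # Treated as a graphic/figure and dropped.
            return None
        if (patched.get("doi") or "").lower() in DATASET_DOIS:
            patched["release_type"] = "dataset"
            patched["release_stage"] = "published"
    return patched
```

A release that already has a `release_type` is returned unchanged even if its title starts with "figure", which matches the "no release_type" precondition in the notes.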