aboutsummaryrefslogtreecommitdiffstats
path: root/notes/awol-index.md
blob: a999893a417c9207d091288977d353e401dcfc09 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38

Original source: <https://isaw.nyu.edu/publications/awol-index/>

Copyright statement:

    The production and publication of The AWOL Index contributes significant
    additional value both to the content itself and to its presentation and
    utility. This new intellectual property is covered by copyright (2015, New
    York University). The full content of The AWOL Index, both in HTML and JSON
    formats, is published under the terms of a Creative Commons
    Attribution-ShareAlike 4.0 International License .

Extracting ISSN-L, Title, URL from this corpus.

Commands:

    unzip awol-index-json.zip
    fd -I .json json/ | parallel cat {} | jq . -c | pv -l > awol-index-combined.json
    cat awol-index-combined.json | rg '"is_part_of":null' > awol-index-top.json
    cat awol-index-top.json | rg '"issn":' > awol-index-top-issn.json

    wc -l awol-index-combined.json awol-index-top.json awol-index-top-issn.json
    52006 awol-index-combined.json
     1302 awol-index-top.json
      503 awol-index-top-issn.json

    rg '"issn":' awol-index-top.json | wc -l
    503

    cat awol-index-combined.json | jq .identifiers.issn.generic -c | rg -v ^null | sort -u | wc -l
    753

    cat awol-index-top.json | jq .identifiers.issn.generic -c | rg -v ^null | sort -u | wc -l
    486

    cat awol-index-top-issn.json | jq .identifiers.issn.generic -c | rg -v ^null | sort -u | wc -l
    486