path: root/notes/awol-index.md
author: Bryan Newbold <bnewbold@archive.org> 2020-06-23 20:10:46 -0700
committer: Bryan Newbold <bnewbold@archive.org> 2020-06-23 20:10:46 -0700
commit: 4be1ec0cdb382d7b545eeb4c451cc123d9199d95
tree: e1e8131e2caef749496dbda8b83f355ac296d6ce /notes/awol-index.md
parent: 05f6a1b143a521688ccb3a59f300b44f5c04c6be
commit notes and issnl_prefix.py helper script
Diffstat (limited to 'notes/awol-index.md')
-rw-r--r-- notes/awol-index.md | 38
1 files changed, 38 insertions, 0 deletions
diff --git a/notes/awol-index.md b/notes/awol-index.md
new file mode 100644
index 0000000..a999893
--- /dev/null
+++ b/notes/awol-index.md
@@ -0,0 +1,38 @@
+
+Original source: <https://isaw.nyu.edu/publications/awol-index/>
+
+Copyright statement:
+
+    The production and publication of The AWOL Index contributes significant
+    additional value both to the content itself and to its presentation and
+    utility. This new intellectual property is covered by copyright (2015, New
+    York University). The full content of The AWOL Index, both in HTML and JSON
+    formats, is published under the terms of a Creative Commons
+    Attribution-ShareAlike 4.0 International License.
+
+Extracting ISSN-L, Title, URL from this corpus.
+
+Commands:
+
+    unzip awol-index-json.zip
+    fd -I .json json/ | parallel cat {} | jq . -c | pv -l > awol-index-combined.json
+    cat awol-index-combined.json | rg '"is_part_of":null' > awol-index-top.json
+    cat awol-index-top.json | rg '"issn":' > awol-index-top-issn.json
+
+    wc -l awol-index-combined.json awol-index-top.json awol-index-top-issn.json
+    52006 awol-index-combined.json
+    1302 awol-index-top.json
+    503 awol-index-top-issn.json
+
+    rg '"issn":' awol-index-top.json | wc -l
+    503
+
+    cat awol-index-combined.json | jq .identifiers.issn.generic -c | rg -v ^null | sort -u | wc -l
+    753
+
+    cat awol-index-top.json | jq .identifiers.issn.generic -c | rg -v ^null | sort -u | wc -l
+    486
+
+    cat awol-index-top-issn.json | jq .identifiers.issn.generic -c | rg -v ^null | sort -u | wc -l
+    486
+
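The filtering and counting steps in the notes above could also be done in one pass with Python rather than `rg` and `jq`. The sketch below is not part of the commit; the function names are hypothetical. It assumes each line of `awol-index-combined.json` is one JSON object, that `is_part_of` is a top-level key, and that the ISSN lives at `identifiers.issn.generic`, as the `jq` path in the notes suggests. Unlike `rg`, which matches the text anywhere in the line, this checks the parsed fields, so counts may differ slightly from those recorded above.

```python
import json
from typing import Any, Iterable, Iterator


def generic_issn(record: dict) -> Any:
    """Return record["identifiers"]["issn"]["generic"], or None if any level is missing."""
    identifiers = record.get("identifiers")
    if not isinstance(identifiers, dict):
        return None
    issn = identifiers.get("issn")
    if not isinstance(issn, dict):
        return None
    return issn.get("generic")


def top_level_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield records whose top-level is_part_of is null (assumed to mean 'no parent')."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("is_part_of") is None:
            yield record


def count_unique_issns(lines: Iterable[str]) -> int:
    """Count distinct non-null ISSN values among top-level records.

    Values are compared by their compact JSON form, roughly mirroring
    `jq -c ... | sort -u | wc -l` in the notes.
    """
    seen = set()
    for record in top_level_records(lines):
        value = generic_issn(record)
        if value is not None:
            seen.add(json.dumps(value, sort_keys=True))
    return len(seen)
```

To run it on the combined file, open `awol-index-combined.json` and pass the file object to `count_unique_issns`; a file object iterates line by line, so the whole corpus is never loaded at once.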