author Martin Czygan <martin.czygan@gmail.com> 2021-03-30 02:16:21 +0200
committer Martin Czygan <martin.czygan@gmail.com> 2021-03-30 02:16:21 +0200
commit d01ace6b26c0becc3d1eed70a903de0aaa89ac7a
tree 2cd1fc0777a64d717a6f62a55300ead5a2a4be13 /python
parent dfc258b352d6d3a7f5b56ffdadebfb8a13260966
wp: id list parser
Diffstat (limited to 'python')
-rw-r--r-- python/notes/wikipedia_citations_2020-07-14.md | 39
1 files changed, 39 insertions, 0 deletions
diff --git a/python/notes/wikipedia_citations_2020-07-14.md b/python/notes/wikipedia_citations_2020-07-14.md
index d079312..7eca025 100644
--- a/python/notes/wikipedia_citations_2020-07-14.md
+++ b/python/notes/wikipedia_citations_2020-07-14.md
@@ -49,3 +49,42 @@ Most referenced on WP:
* https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md
* `source_wikipedia_article: Optional[str]`
+
+About 29M citations. The `ID_list` field uses various identifier types:
+
+```
+$ cat minimal_dataset.json |
+ jq -rc 'select(.ID_list != null) | .ID_list' |
+ tr ',' '\n' |
+ tr -d '{}' |
+ sed -e 's@^ *@@' |
+ cut -d '=' -f 1 |
+ sort |
+ uniq -c |
+ sort -nr
+```
+
+Counts per id type, excluding artifacts:
+
+```
+2160818 ISBN
+1442176 DOI
+ 825970 PMID
+ 353425 ISSN
+ 279369 PMC
+ 185742 OCLC
+ 181375 BIBCODE
+ 110920 JSTOR
+  47601 ARXIV
+  15202 LCCN
+  12878 MR
+   8270 ASIN
+   6293 OL
+   3790 SSRN
+   3013 ZBL
+    413 OSTI
+    357 JFM
+    277 USENETID
+     85 RFC
+     78 ISMN
+```
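
The shell pipeline in the note above can be sketched in Python, which is roughly what an "id list parser" might look like. This is a minimal sketch, not the parser from this commit. It assumes each line of `minimal_dataset.json` is a JSON object whose `ID_list` is either null or a string shaped like `{DOI=10.1000/x, PMID=123}`, a format inferred only from the `tr`/`cut` steps. The names `parse_id_list` and `count_id_types` are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def parse_id_list(value: str) -> Dict[str, str]:
    """Parse a string such as "{DOI=10.1000/x, PMID=123}" into a dict.

    Assumption: keys are uppercase alphanumeric tokens (ISBN, DOI, ...).
    """
    ids: Dict[str, str] = {}
    last_key = None
    for fragment in value.strip().strip("{}").split(","):
        key, sep, val = fragment.partition("=")
        key = key.strip()
        if sep and key.isalnum() and key.isupper():
            ids[key] = val.strip()
            last_key = key
        elif last_key is not None and fragment.strip():
            # No "KEY=" prefix: treat the fragment as the continuation of a
            # value that itself contained a comma, rather than as a new key.
            ids[last_key] = (ids[last_key] + "," + fragment).strip()
    return ids


def count_id_types(lines: Iterable[str]) -> Counter:
    """Count id types over JSON lines, skipping records without an ID_list."""
    counter: Counter = Counter()
    for line in lines:
        doc = json.loads(line)
        id_list = doc.get("ID_list")
        if id_list is not None:
            counter.update(parse_id_list(id_list).keys())
    return counter
```

Calling `count_id_types(open("minimal_dataset.json"))` and then `most_common()` on the result would give a table comparable to the one above. Fragments without a `KEY=` prefix are folded into the previous value, so they do not show up as separate keys the way they do after `cut -d '=' -f 1`.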