author | Martin Czygan <martin.czygan@gmail.com> | 2021-03-30 02:16:21 +0200 |
---|---|---|
committer | Martin Czygan <martin.czygan@gmail.com> | 2021-03-30 02:16:21 +0200 |
commit | d01ace6b26c0becc3d1eed70a903de0aaa89ac7a | |
tree | 2cd1fc0777a64d717a6f62a55300ead5a2a4be13 | |
parent | dfc258b352d6d3a7f5b56ffdadebfb8a13260966 | |
wp: id list parser
-rw-r--r-- | python/notes/wikipedia_citations_2020-07-14.md | 39 |
1 files changed, 39 insertions, 0 deletions
```diff
diff --git a/python/notes/wikipedia_citations_2020-07-14.md b/python/notes/wikipedia_citations_2020-07-14.md
index d079312..7eca025 100644
--- a/python/notes/wikipedia_citations_2020-07-14.md
+++ b/python/notes/wikipedia_citations_2020-07-14.md
@@ -49,3 +49,42 @@ Most referenced on WP:
 
 * https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md
 * `source_wikipedia_article: Optional[str]`
+
+About 29M citations. IDlist uses various id types:
+
+```
+$ cat minimal_dataset.json |
+    jq -rc 'select(.ID_list != null) | .ID_list' |
+    tr ',' '\n' |
+    tr -d '{}' |
+    sed -e 's@^ *@@' |
+    cut -d '=' -f 1 |
+    sort |
+    uniq -c |
+    sort -nr
+```
+
+Except artifacts:
+
+```
+2160818 ISBN
+1442176 DOI
+ 825970 PMID
+ 353425 ISSN
+ 279369 PMC
+ 185742 OCLC
+ 181375 BIBCODE
+ 110920 JSTOR
+  47601 ARXIV
+  15202 LCCN
+  12878 MR
+   8270 ASIN
+   6293 OL
+   3790 SSRN
+   3013 ZBL
+    413 OSTI
+    357 JFM
+    277 USENETID
+     85 RFC
+     78 ISMN
+```
```
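The commit title mentions an id list parser, while the shell pipeline in the notes splits `ID_list` naively on every comma. What follows is a minimal Python sketch of the same count. It assumes each line of `minimal_dataset.json` is a JSON object whose `ID_list` is a string shaped like `{KEY=value, KEY=value}`. That shape is inferred from the `tr`/`sed`/`cut` steps and is not taken from a spec. `parse_id_list` and `count_id_types` are hypothetical names, not refcat functions. Splitting only where a comma is followed by an upper-case `KEY=` token is one way to stop commas inside values from being counted as keys. Whether this removes all of the artifacts the notes mention is untested.

```python
import json
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed shape of ID_list (inferred from the tr/sed/cut pipeline, not a spec):
#   "{ISBN=978-..., DOI=10..., PMID=...}"
# Split only on commas followed by an upper-case KEY= token, so that commas
# inside a value do not start a new (bogus) key.
_SEP = re.compile(r",\s*(?=[A-Z][A-Z0-9_]*=)")


def parse_id_list(raw: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: turn an ID_list string into (key, value) pairs.

    Pairs keep their order and duplicates are retained, so counting keys
    follows the pipeline's per-occurrence counts for well-formed input.
    """
    body = raw.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    pairs: List[Tuple[str, str]] = []
    for part in _SEP.split(body):
        key, sep, value = part.strip().partition("=")
        if sep and key:  # skip fragments without a KEY= prefix
            pairs.append((key, value.strip()))
    return pairs


def count_id_types(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count id types over JSON lines (one object per line)."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        raw = json.loads(line).get("ID_list")
        if isinstance(raw, str) and raw:  # mirrors select(.ID_list != null)
            counts.update(key for key, _ in parse_id_list(raw))
    return counts
```

For example, `count_id_types(open("minimal_dataset.json")).most_common()` would list the keys in descending order of frequency, like `sort -nr`. Nobody has checked whether its totals match the table in the notes.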