From d01ace6b26c0becc3d1eed70a903de0aaa89ac7a Mon Sep 17 00:00:00 2001
From: Martin Czygan
Date: Tue, 30 Mar 2021 02:16:21 +0200
Subject: wp: id list parser

---
 python/notes/wikipedia_citations_2020-07-14.md | 39 ++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

(limited to 'python/notes')

diff --git a/python/notes/wikipedia_citations_2020-07-14.md b/python/notes/wikipedia_citations_2020-07-14.md
index d079312..7eca025 100644
--- a/python/notes/wikipedia_citations_2020-07-14.md
+++ b/python/notes/wikipedia_citations_2020-07-14.md
@@ -49,3 +49,42 @@ Most referenced on WP:
 
 * https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md
 * `source_wikipedia_article: Optional[str]`
+
+About 29M citations. IDlist uses various id types:
+
+```
+$ cat minimal_dataset.json |
+    jq -rc 'select(.ID_list != null) | .ID_list' |
+    tr ',' '\n' |
+    tr -d '{}' |
+    sed -e 's@^ *@@' |
+    cut -d '=' -f 1 |
+    sort |
+    uniq -c |
+    sort -nr
+```
+
+Except artifacts:
+
+```
+2160818 ISBN
+1442176 DOI
+ 825970 PMID
+ 353425 ISSN
+ 279369 PMC
+ 185742 OCLC
+ 181375 BIBCODE
+ 110920 JSTOR
+  47601 ARXIV
+  15202 LCCN
+  12878 MR
+   8270 ASIN
+   6293 OL
+   3790 SSRN
+   3013 ZBL
+    413 OSTI
+    357 JFM
+    277 USENETID
+     85 RFC
+     78 ISMN
+```
-- 
cgit v1.2.3
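
The shell pipeline in the patch could also be sketched in Python, which may be closer to what an "id list parser" ends up looking like. This is only a rough sketch under assumptions: it presumes `minimal_dataset.json` holds one JSON document per line and that `ID_list` is a string of the form `{KEY=value, KEY=value}`, as the `tr`/`cut` steps suggest. The function names `id_types` and `count_id_types` and the sample records are hypothetical and not taken from the commit.

```python
import json
from collections import Counter


def id_types(id_list):
    """Yield the key of each 'KEY=value' item in a string like '{DOI=10.1/x, PMID=1}'.

    Assumption: ID_list is a brace-wrapped, comma-separated string.
    """
    # Same steps as the pipeline: drop braces, split on commas,
    # strip leading spaces, keep the text before the first '='.
    cleaned = id_list.replace("{", "").replace("}", "")
    for item in cleaned.split(","):
        key = item.lstrip(" ").split("=", 1)[0]
        if key:  # unlike the shell version, empty items are skipped
            yield key


def count_id_types(lines):
    """Count identifier types over an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        id_list = doc.get("ID_list")
        if id_list is None:  # mirrors `select(.ID_list != null)`
            continue
        counts.update(id_types(id_list))
    return counts


# Invented sample records, for illustration only.
sample = [
    '{"ID_list": "{DOI=10.1000/xyz, PMID=12345}"}',
    '{"ID_list": "{ISBN=9780000000002}"}',
    '{"ID_list": null}',
    '{"ID_list": "{DOI=10.1000/abc}"}',
]
for key, n in count_id_types(sample).most_common():
    print(f"{n:7d} {key}")
```

Like the shell version, this splits on every comma, so a value that itself contains a comma would produce a spurious key. That is presumably one source of the artifacts mentioned in the notes. On the real file, `count_id_types(open("minimal_dataset.json"))` would stream it line by line.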