Diffstat (limited to 'python')
-rw-r--r-- | python/notes/wikipedia_citations_2020-07-14.md | 39 |
1 files changed, 39 insertions, 0 deletions
diff --git a/python/notes/wikipedia_citations_2020-07-14.md b/python/notes/wikipedia_citations_2020-07-14.md
index d079312..7eca025 100644
--- a/python/notes/wikipedia_citations_2020-07-14.md
+++ b/python/notes/wikipedia_citations_2020-07-14.md
@@ -49,3 +49,42 @@ Most referenced on WP:
 * https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md
 * `source_wikipedia_article: Optional[str]`
+
+About 29M citations. IDlist uses various id types:
+
+```
+$ cat minimal_dataset.json |
+    jq -rc 'select(.ID_list != null) | .ID_list' |
+    tr ',' '\n' |
+    tr -d '{}' |
+    sed -e 's@^ *@@' |
+    cut -d '=' -f 1 |
+    sort |
+    uniq -c |
+    sort -nr
+```
+
+Except artifacts:
+
+```
+2160818 ISBN
+1442176 DOI
+ 825970 PMID
+ 353425 ISSN
+ 279369 PMC
+ 185742 OCLC
+ 181375 BIBCODE
+ 110920 JSTOR
+  47601 ARXIV
+  15202 LCCN
+  12878 MR
+   8270 ASIN
+   6293 OL
+   3790 SSRN
+   3013 ZBL
+    413 OSTI
+    357 JFM
+    277 USENETID
+     85 RFC
+     78 ISMN
+```
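
For readers who would rather not chain `jq`, `tr`, `sed`, and `cut`, the counting step in the added notes can be approximated in Python. This is a minimal sketch, not code from the commit. It assumes `minimal_dataset.json` holds one JSON object per line and that `ID_list`, when present, is a string shaped like `{DOI=10.1/x, PMID=123}`; that shape is inferred from the shell pipeline, and the function name `count_id_types` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_id_types(lines: Iterable[str]) -> Counter:
    """Count identifier type prefixes (DOI, ISBN, ...) across JSON-lines records.

    Assumption: each record is a JSON object whose optional "ID_list" field
    is a string such as "{DOI=10.1/x, PMID=123}".
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        id_list = record.get("ID_list")
        if id_list is None:
            # Same filter as jq's `select(.ID_list != null)`.
            continue
        # Mirrors: tr ',' '\n' | tr -d '{}' | sed 's@^ *@@' | cut -d '=' -f 1
        for part in id_list.split(","):
            cleaned = part.replace("{", "").replace("}", "").lstrip(" ")
            key = cleaned.split("=", 1)[0]
            counts[key] += 1
    return counts


# Usage (path taken from the notes; adjust as needed):
#
#     with open("minimal_dataset.json", encoding="utf-8") as handle:
#         for id_type, n in count_id_types(handle).most_common():
#             print(f"{n:>8} {id_type}")
```

Like the shell version, this splits naively on commas, so an identifier value that itself contains a comma produces a spurious key in the counts.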