Diffstat (limited to 'python/notes/wikipedia_citations_2020-07-14.md')
-rw-r--r-- | python/notes/wikipedia_citations_2020-07-14.md | 51 |
1 files changed, 51 insertions, 0 deletions
diff --git a/python/notes/wikipedia_citations_2020-07-14.md b/python/notes/wikipedia_citations_2020-07-14.md
new file mode 100644
index 0000000..d079312
--- /dev/null
+++ b/python/notes/wikipedia_citations_2020-07-14.md
@@ -0,0 +1,51 @@
+# Notes on wikipedia_citations_2020-07-14
+
+* https://archive.org/details/wikipedia_citations_2020-07-14
+* https://zenodo.org/record/3940692
+* https://github.com/Harshdeep1996/cite-classifications-wiki
+
+```
+.
+├── [6.6G] citations_from_wikipedia.zip
+├── [819M] lookup_data.zip
+├── [1.4G] minimal_dataset.zip
+├── [ 91K] wikipedia_citations_2020-07-14_archive.torrent
+├── [2.0K] wikipedia_citations_2020-07-14_files.xml
+├── [ 20K] wikipedia_citations_2020-07-14_meta.sqlite
+└── [1.3K] wikipedia_citations_2020-07-14_meta.xml
+```
+
+Using `parquet-tools cat --json`
+(https://stackoverflow.com/questions/36140264/inspect-parquet-from-command-line)
+to convert to json.
+
+About 1442176 DOI, 1027006 unique.
+
+Most referenced on WP:
+
+```
+ 4393 10.1073/pnas.242603899
+ 3182 10.1101/gr.2596504
+ 2307 10.24436/2
+ 2079 10.1038/ng1285
+ 1447 10.1007/BF00171763
+ 1357 10.1051/0004-6361:20078357
+ 1346 10.1038/nature04209
+ 1293 10.1016/0378-1119(94)90802-8
+ 1246 10.1016/S0378-1119(97)00411-3
+  927 10.1111/j.1096-3642.2005.00153.x
+  738 10.1016/j.cell.2006.09.026
+  657 10.1101/gr.4039406
+  631 10.1101/gr.6.9.791
+  607 10.1038/msb4100134
+  602 10.1101/gr.143000
+  591 10.5194/hess-11-1633-2007
+  531 10.1101/gr.GR1547R
+  492 10.1080/002229300299282
+  480 10.1101/gr.2576704
+  460 10.1093/nar/gkj139
+```
+
+* https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md
+* `source_wikipedia_article: Optional[str]`
+
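The notes give a total DOI count, a unique DOI count, and a most-referenced ranking, but not the commands that produced them. One way to get numbers of that kind from the JSON output is sketched below. It assumes the output is one JSON object per line and that each record carries its DOI under a single key. The key name `doi` and the function `count_dois` are hypothetical and do not come from the dataset or the commit, so the key would need adjusting to the real column name. DOIs are only stripped of surrounding whitespace, not case-normalized.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def count_dois(lines: Iterable[str], doi_key: str = "doi") -> Tuple[int, int, Counter]:
    """Count DOI occurrences in newline-delimited JSON records.

    `doi_key` is a hypothetical field name, not the dataset's documented schema.
    Returns (total DOI references, number of unique DOIs, per-DOI counts).
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        doi = record.get(doi_key)
        if isinstance(doi, str) and doi.strip():
            counts[doi.strip()] += 1
    total = sum(counts.values())
    return total, len(counts), counts


def top_dois(counts: Counter, n: int = 20) -> list:
    """Return the n most frequently cited DOIs as (doi, count) pairs."""
    return counts.most_common(n)
```

Passing an open file handle (or `sys.stdin`) as `lines` streams the records without loading the whole dump into memory, and `top_dois(counts, 20)` gives a ranking in the same shape as the list in the notes.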