# Notes on wikipedia_citations_2020-07-14

* https://archive.org/details/wikipedia_citations_2020-07-14
* https://zenodo.org/record/3940692
* https://github.com/Harshdeep1996/cite-classifications-wiki

```
.
├── [6.6G]  citations_from_wikipedia.zip
├── [819M]  lookup_data.zip
├── [1.4G]  minimal_dataset.zip
├── [ 91K]  wikipedia_citations_2020-07-14_archive.torrent
├── [2.0K]  wikipedia_citations_2020-07-14_files.xml
├── [ 20K]  wikipedia_citations_2020-07-14_meta.sqlite
└── [1.3K]  wikipedia_citations_2020-07-14_meta.xml
```

Using `parquet-tools cat --json`
(https://stackoverflow.com/questions/36140264/inspect-parquet-from-command-line)
to convert to json.

About 1442176 DOI, 1027006 unique. Most referenced on WP:

```
   4393 10.1073/pnas.242603899
   3182 10.1101/gr.2596504
   2307 10.24436/2
   2079 10.1038/ng1285
   1447 10.1007/BF00171763
   1357 10.1051/0004-6361:20078357
   1346 10.1038/nature04209
   1293 10.1016/0378-1119(94)90802-8
   1246 10.1016/S0378-1119(97)00411-3
    927 10.1111/j.1096-3642.2005.00153.x
    738 10.1016/j.cell.2006.09.026
    657 10.1101/gr.4039406
    631 10.1101/gr.6.9.791
    607 10.1038/msb4100134
    602 10.1101/gr.143000
    591 10.5194/hess-11-1633-2007
    531 10.1101/gr.GR1547R
    492 10.1080/002229300299282
    480 10.1101/gr.2576704
    460 10.1093/nar/gkj139
```

* https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md
    * `source_wikipedia_article: Optional[str]`
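
The DOI tally above (total, unique, most referenced) could be reproduced from the JSON-lines output of `parquet-tools cat --json` roughly as sketched below. This is not the command actually used for these notes. The field name `doi` is a hypothetical placeholder, since the real column layout of the dataset is not recorded here, and `count_dois` and `summarize` are illustrative helpers. DOIs are counted exactly as they appear, with no case normalization.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def count_dois(lines: Iterable[str], field: str = "doi") -> Counter:
    """Count DOI occurrences in JSON-lines input.

    `field` is a hypothetical key name; adjust it to the actual column
    that holds the DOI in the converted dataset.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        doi = record.get(field)
        if doi:
            # Count the DOI string as-is (no lowercasing or trimming).
            counts[doi] += 1
    return counts


def summarize(counts: Counter, n: int = 20) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Return (total DOI references, unique DOIs, top-n most referenced)."""
    return sum(counts.values()), len(counts), counts.most_common(n)


if __name__ == "__main__":
    # Toy input for illustration only; the counts are not real data.
    sample = [
        '{"doi": "10.1073/pnas.242603899"}',
        '{"doi": "10.1101/gr.2596504"}',
        '{"doi": "10.1073/pnas.242603899"}',
        '{"title": "record without a DOI"}',
    ]
    total, unique, top = summarize(count_dois(sample))
    print(f"About {total} DOI, {unique} unique.")
    for doi, n in top:
        print(f"{n:7d} {doi}")
```

For the full dataset, an open file handle can be passed instead of a list (for example `count_dois(open("citations.json"))`, where the filename is again an assumption), since the function only iterates over lines.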