Diffstat (limited to 'python/notes/wikipedia_citations_2020-07-14.md')
-rw-r--r-- python/notes/wikipedia_citations_2020-07-14.md | 51
1 files changed, 51 insertions, 0 deletions
diff --git a/python/notes/wikipedia_citations_2020-07-14.md b/python/notes/wikipedia_citations_2020-07-14.md
new file mode 100644
index 0000000..d079312
--- /dev/null
+++ b/python/notes/wikipedia_citations_2020-07-14.md
@@ -0,0 +1,51 @@
+# Notes on wikipedia_citations_2020-07-14
+
+* https://archive.org/details/wikipedia_citations_2020-07-14
+* https://zenodo.org/record/3940692
+* https://github.com/Harshdeep1996/cite-classifications-wiki
+
+```
+.
+├── [6.6G] citations_from_wikipedia.zip
+├── [819M] lookup_data.zip
+├── [1.4G] minimal_dataset.zip
+├── [ 91K] wikipedia_citations_2020-07-14_archive.torrent
+├── [2.0K] wikipedia_citations_2020-07-14_files.xml
+├── [ 20K] wikipedia_citations_2020-07-14_meta.sqlite
+└── [1.3K] wikipedia_citations_2020-07-14_meta.xml
+```
+
+Used `parquet-tools cat --json`
+(https://stackoverflow.com/questions/36140264/inspect-parquet-from-command-line)
+to convert the parquet files to JSON.
+
+About 1442176 DOI references in total, 1027006 of them unique.
+
+Most-referenced DOIs on Wikipedia (reference count, DOI):
+
+```
+ 4393 10.1073/pnas.242603899
+ 3182 10.1101/gr.2596504
+ 2307 10.24436/2
+ 2079 10.1038/ng1285
+ 1447 10.1007/BF00171763
+ 1357 10.1051/0004-6361:20078357
+ 1346 10.1038/nature04209
+ 1293 10.1016/0378-1119(94)90802-8
+ 1246 10.1016/S0378-1119(97)00411-3
+ 927 10.1111/j.1096-3642.2005.00153.x
+ 738 10.1016/j.cell.2006.09.026
+ 657 10.1101/gr.4039406
+ 631 10.1101/gr.6.9.791
+ 607 10.1038/msb4100134
+ 602 10.1101/gr.143000
+ 591 10.5194/hess-11-1633-2007
+ 531 10.1101/gr.GR1547R
+ 492 10.1080/002229300299282
+ 480 10.1101/gr.2576704
+ 460 10.1093/nar/gkj139
+```
+
+* https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md
+* relevant field in that proposal: `source_wikipedia_article: Optional[str]`
+
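The DOI totals and the "most referenced" listing in the notes above could be reproduced roughly as follows. This is a minimal sketch, not the method actually used for the notes. It assumes the JSON conversion yields one JSON object per line, and that each record carries its DOI under a key called `doi`. That key name is hypothetical; the real column name in the dataset may differ and may need extra parsing. DOIs are counted as written, with no case normalization.

```python
import json
from collections import Counter
from typing import Iterable


def count_dois(lines: Iterable[str], key: str = "doi") -> Counter:
    """Count how often each DOI appears in JSON-lines input.

    `key` is an assumed field name, not taken from the dataset schema.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        doi = record.get(key)
        if isinstance(doi, str) and doi.strip():
            counts[doi.strip()] += 1
    return counts


def format_top(counts: Counter, n: int = 20) -> str:
    """Render the n most common DOIs as 'count DOI' rows."""
    return "\n".join(f"{c:7d} {doi}" for doi, c in counts.most_common(n))


# Made-up records for illustration only (not real dataset rows).
demo_lines = [
    '{"doi": "10.1000/example-a"}',
    '{"doi": "10.1000/example-b"}',
    '{"doi": "10.1000/example-a"}',
    '{"title": "record without a DOI"}',
]
demo_counts = count_dois(demo_lines)
print(f"{sum(demo_counts.values())} DOI, {len(demo_counts)} unique")
print(format_top(demo_counts, n=2))
```

On the real data, `count_dois` could be fed an open file handle or `sys.stdin` in place of `demo_lines`, since it only iterates over lines.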