# Notes on wikipedia_citations_2020-07-14
* https://archive.org/details/wikipedia_citations_2020-07-14
* https://zenodo.org/record/3940692
* https://github.com/Harshdeep1996/cite-classifications-wiki
```
.
├── [6.6G] citations_from_wikipedia.zip
├── [819M] lookup_data.zip
├── [1.4G] minimal_dataset.zip
├── [ 91K] wikipedia_citations_2020-07-14_archive.torrent
├── [2.0K] wikipedia_citations_2020-07-14_files.xml
├── [ 20K] wikipedia_citations_2020-07-14_meta.sqlite
└── [1.3K] wikipedia_citations_2020-07-14_meta.xml
```
Converted the Parquet files to JSON with `parquet-tools cat --json`
(https://stackoverflow.com/questions/36140264/inspect-parquet-from-command-line).
About 1442176 DOIs in total, 1027006 unique.
Most referenced on Wikipedia (count, DOI):
```
4393 10.1073/pnas.242603899
3182 10.1101/gr.2596504
2307 10.24436/2
2079 10.1038/ng1285
1447 10.1007/BF00171763
1357 10.1051/0004-6361:20078357
1346 10.1038/nature04209
1293 10.1016/0378-1119(94)90802-8
1246 10.1016/S0378-1119(97)00411-3
927 10.1111/j.1096-3642.2005.00153.x
738 10.1016/j.cell.2006.09.026
657 10.1101/gr.4039406
631 10.1101/gr.6.9.791
607 10.1038/msb4100134
602 10.1101/gr.143000
591 10.5194/hess-11-1633-2007
531 10.1101/gr.GR1547R
492 10.1080/002229300299282
480 10.1101/gr.2576704
460 10.1093/nar/gkj139
```
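A minimal sketch of how counts like the ones above could be derived from the JSON-lines output. The key name `doi` is a hypothetical placeholder (the actual field layout of the dataset is not recorded in these notes), and the helper names `iter_dois` and `doi_stats` are made up for illustration. DOIs are only stripped of surrounding whitespace, with no case normalization.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def iter_dois(lines: Iterable[str], field: str = "doi") -> Iterator[str]:
    """Yield DOI strings from JSON lines.

    `field` is an assumed key name; adjust it to the real schema.
    Blank lines and records without a value for the key are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        doi = record.get(field)
        if doi:
            yield doi.strip()


def doi_stats(dois: Iterable[str], top: int = 20) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Return (total references, unique DOIs, the `top` most common DOIs)."""
    counts = Counter(dois)
    total = sum(counts.values())
    return total, len(counts), counts.most_common(top)
```

Feeding `sys.stdin` to `iter_dois` would allow piping the `parquet-tools cat --json` output straight into such a script, assuming one JSON object per line.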
* https://git.archive.org/webgroup/fatcat/-/blob/10eb30251f89806cb7a0f147f427c5ea7e5f9941/proposals/2021-01-29_citation_api.md
* `source_wikipedia_article: Optional[str]`