path: root/notes/2022_01_10_refcat_update.md
# Refcat update

* new refs export, about 10% more refs than the previous export (2.7B total)
* new fatcat export

New Wikipedia citation extraction (enwiki 2021-12-01 dump), compared with the 2020-07-14 dataset:

```
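# 2020-07-14 dataset: count lines that have an ID_list and mention DOI
martin@ia601101:/magna/data/wikipedia_citations_2020-07-14 $ LC_ALL=C grep ID_list minimal_dataset.json | grep -c DOI
1442189

# 2021-12-01 dump: keep only refs with a non-null ID_list, projected to URL, title and ID_list (one JSON object per line)
$ jq -rc '.refs[] | select(.ID_list != null) | {"URL": .URL, "Title": .title, "ID_list": .ID_list}' enwiki-20211201-pages-articles.citations.json | pv -l  > minimal.json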
$ grep -c DOI minimal.json
1932578
```
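The two counts are comparable because both are restricted to citations with an ID_list: the old file through `grep ID_list`, the new one through the jq `select`. Both also assume one citation per line. Note that `grep -c DOI` counts every line containing the string "DOI" anywhere, including a DOI mentioned in a title or URL, so it may slightly overcount citations that actually carry a DOI identifier. Below is a minimal POSIX sh sketch of the comparison. The function names are hypothetical helpers, not part of refcat.

```shell
#!/bin/sh
# Sketch only: count_doi_lines, count_idlist_doi_lines and doi_growth_pct are
# hypothetical helper names. Assumes one JSON citation per line.

# Count lines containing the substring "DOI" anywhere, like `grep -c DOI`
# on minimal.json (a DOI in a title or URL is counted as well).
count_doi_lines() {
    LC_ALL=C grep -c DOI "$1"
}

# Same count, restricted to lines that also mention ID_list, mirroring the
# command run on the 2020-07-14 minimal_dataset.json.
count_idlist_doi_lines() {
    LC_ALL=C grep ID_list "$1" | LC_ALL=C grep -c DOI
}

# Relative change in percent from an old count ($1) to a new count ($2),
# printed with one decimal place.
doi_growth_pct() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new / old - 1) * 100 }'
}
```

With the numbers above, `doi_growth_pct 1442189 1932578` prints `34.0`. In other words, the 2021-12-01 extraction has roughly a third more DOI-mentioning citation lines than the 2020-07-14 dataset.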