# Refcat update
* new refs export, about 10% more (2.7B)
* new fatcat export
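
New Wikipedia extraction (lines mentioning a DOI: 1,442,189 in the 2020-07-14 dataset, 1,932,578 from the enwiki-20211201 dump):
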
```
martin@ia601101:/magna/data/wikipedia_citations_2020-07-14 $ LC_ALL=C grep ID_list minimal_dataset.json | grep -c DOI
1442189
$ jq -rc '.refs[] | select(.ID_list != null) | {"URL": .URL, "Title": .title, "ID_list": .ID_list}' enwiki-20211201-pages-articles.citations.json | pv -l > minimal.json
$ grep -c DOI minimal.json
1932578
```
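
For reference, a rough Python sketch of what the `jq` filter and the `grep -c DOI` count do. It assumes the citations file is line-delimited JSON with a `refs` array per line, as the `jq` expression implies; the function names `minimal_records` and `count_doi_lines` are made up for illustration and are not part of the refcat tooling.

```python
import json


def minimal_records(lines):
    """Yield one compact JSON string per reference with a non-null ID_list.

    Mirrors: jq -rc '.refs[] | select(.ID_list != null) | {URL, Title, ID_list}'
    Assumption: each input line is a JSON object with a "refs" list.
    Unlike jq, a missing or null "refs" is skipped instead of raising an error.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        for ref in doc.get("refs") or []:
            if ref.get("ID_list") is None:
                continue
            yield json.dumps(
                {
                    "URL": ref.get("URL"),
                    "Title": ref.get("title"),
                    "ID_list": ref.get("ID_list"),
                },
                separators=(",", ":"),
            )


def count_doi_lines(records):
    """Count records containing the substring "DOI", like `grep -c DOI`."""
    return sum(1 for record in records if "DOI" in record)


# Usage (path taken from the transcript above):
# with open("enwiki-20211201-pages-articles.citations.json") as f:
#     print(count_doi_lines(minimal_records(f)))
```

Like `grep -c DOI`, this counts any line containing the string "DOI", including matches in a URL or title, so the figure is an upper bound on references that actually carry a DOI identifier.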