# Refcat update

* new refs export, about 10% more references than the previous export (2.7B)
* new fatcat export

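New Wikipedia extraction. Lines mentioning a DOI went from 1,442,189 (2020-07-14 dataset) to 1,932,578 (enwiki-20211201 dump), about 34% more: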

```
martin@ia601101:/magna/data/wikipedia_citations_2020-07-14 $ LC_ALL=C grep ID_list minimal_dataset.json | grep -c DOI
1442189

$ jq -rc '.refs[] | select(.ID_list != null) | {"URL": .URL, "Title": .title, "ID_list": .ID_list}' enwiki-20211201-pages-articles.citations.json | pv -l  > minimal.json
$ grep -c DOI minimal.json
1932578
```
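
Note that `grep -c DOI` counts every line containing the string "DOI", including ones where it only appears in a URL or title. A minimal Python sketch of the same `jq` projection, with a stricter count that looks at `ID_list` only, could look like the following. The function names are made up for illustration. The shape of `ID_list` (a dict, or a string such as `{DOI=10.1000/xyz}`) is an assumption, not something checked against the dump.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def minimal_refs(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one minimal record per reference that carries an ID_list."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        # Assumption: every input line is a JSON object with a "refs" array.
        # Unlike the jq filter, a missing or null "refs" is skipped silently.
        for ref in doc.get("refs") or []:
            if ref.get("ID_list") is None:
                continue
            yield {
                "URL": ref.get("URL"),
                "Title": ref.get("title"),
                "ID_list": ref["ID_list"],
            }


def has_doi(id_list: Any) -> bool:
    """Check ID_list only, so "DOI" inside a URL or title is ignored."""
    if isinstance(id_list, dict):
        return any(str(key).upper() == "DOI" for key in id_list)
    # Assumption: otherwise ID_list is a string such as "{DOI=10.1000/xyz}".
    return "DOI" in str(id_list)


def count_doi(records: Iterable[Dict[str, Any]]) -> int:
    """Count records whose ID_list mentions a DOI."""
    return sum(1 for rec in records if has_doi(rec["ID_list"]))
```

Calling `count_doi(minimal_refs(f))` on an open handle to `enwiki-20211201-pages-articles.citations.json` would give a number to compare against the `grep -c` result above. A lower number would point at "DOI" matches outside `ID_list`.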

Next step: convert the new `minimal.json` to the existing minimal format (as in `minimal_dataset.json`), so it can be used by the "BrefZipWikiDOI" task.