# MAG 2020 Notes

* /magna/data/mag-2020-06-25
* 1637615789 lines in PaperReferences.txt.gz

```
$ unpigz -c PaperReferences.txt.gz| pv -l | wc -l
1637615789
```

* 238M rows in the papers table (238938563)
* only 3516356 non-empty DOI values (column 3 of Papers.txt)?

```
$ zstdcat -T0 Papers.txt.zst | pv -l | LC_ALL=C cut -f3 | LC_ALL=C grep -v ^$ > mag_doi_list.txt
 238M 0:06:12 [ 641k/s]

$ wc -l mag_doi_list.txt
3516356 mag_doi_list.txt
```
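
To double-check the low DOI count, a rough sketch that tallies total rows and non-empty values of one tab-separated column in a single pass, without writing an intermediate file. The function name `count_nonempty` is made up for this note; it assumes the DOI sits in column 3, as in the `cut -f3` call above.

```shell
# Hypothetical helper: print "<total rows><TAB><rows with non-empty column>".
# Usage: count_nonempty COLUMN < file.tsv
count_nonempty() {
    LC_ALL=C awk -F '\t' -v col="$1" '
        { total++ }                 # every input line counts as a row
        $col != "" { filled++ }     # column present and not the empty string
        END { printf "%d\t%d\n", total, filled }
    '
}
```

Possible usage: `zstdcat -T0 Papers.txt.zst | count_nonempty 3`. If the second number agrees with `wc -l mag_doi_list.txt`, the extraction step is consistent and the low count comes from the data itself.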