# Notes on MAG

Using: https://archive.org/details/mag-2021-06-07

* 260M entities
* 96.7M with DOI

To generate a DOI-to-DOI version of the reference data, we need to:

* create a mapping from MAG paper ID to DOI
* apply the mapping to the PaperReferences file

```sh
$ time unpigz -c /magna/data/mag-2021-06-07/Papers.txt.gz | cut -f1,3 | awk '$2 != ""' | mkocidb -o /sandcrawler-db/tmp-refcat/mag_id_doi.db
```
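
The second step, applying the mapping to the PaperReferences file, is not shown above. As a rough sketch of the idea only, the snippet below joins a two-column ID-to-DOI file against a two-column reference file with `awk`, keeping only edges where both sides have a DOI. The function name `map_refs_to_doi`, the sample file names, and the sample values are hypothetical, and the assumption that the reference file has the citing ID in column 1 and the cited ID in column 2 should be checked against the actual dump. It holds the whole mapping in memory, so it is meant for small samples; at the full 96.7M-DOI scale a lookup database such as the one built above is likely the better fit.

```sh
#!/bin/sh
# Sketch: rewrite (citing_id, cited_id) pairs as (citing_doi, cited_doi).
# $1: TSV with columns id, doi         (hypothetical plain-text mapping)
# $2: TSV with columns citing, cited   (assumed PaperReferences layout)
map_refs_to_doi() {
    awk -F'\t' -v OFS='\t' '
        # First file: load the id-to-doi mapping into memory.
        NR == FNR { doi[$1] = $2; next }
        # Second file: emit an edge only if both ends have a DOI.
        ($1 in doi) && ($2 in doi) { print doi[$1], doi[$2] }
    ' "$1" "$2"
}

# Tiny made-up sample; paper 3 has no DOI, so edge 1 -> 3 is dropped.
tmp="$(mktemp -d)"
printf '1\t10.1000/a\n2\t10.1000/b\n' > "$tmp/mapping.tsv"
printf '1\t2\n1\t3\n2\t1\n' > "$tmp/refs.tsv"
map_refs_to_doi "$tmp/mapping.tsv" "$tmp/refs.tsv"
rm -rf "$tmp"
```

On the sample data, this prints two tab-separated DOI pairs, one per surviving edge.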