# Notes on MAG

Using the Microsoft Academic Graph (MAG) dump of 2021-06-07: https://archive.org/details/mag-2021-06-07

* 260M entities
* 96.7M with DOI

To generate a DOI-to-DOI version of the references, we need to:

* create an id-to-DOI mapping (MAG paper id to DOI)
* apply the mapping to both columns of the PaperReferences file

```sh
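# Step 1: take paper id (column 1) and DOI (column 3) from Papers.txt.gz,
# drop rows without a DOI, and write the id-to-doi database.
$ time unpigz -c /magna/data/mag-2021-06-07/Papers.txt.gz | cut -f1,3 | awk '$2 != ""' | mkocidb -o /sandcrawler-db/tmp-refcat/mag_id_doi.db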
```
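
The second step, applying the mapping to the PaperReferences file, is not shown above. A minimal sketch of the idea on toy data follows. It assumes a two-column, tab-separated references file (citing id, cited id) and uses a plain in-memory `awk` join. The sample ids, DOIs and file names are hypothetical, and this is not the tool used for the real run, where a mapping of 96.7M entries may be too large to hold in memory comfortably.

```sh
# Hypothetical sample data standing in for the id-to-doi mapping
# and the PaperReferences file (citing id <TAB> cited id).
tmp=$(mktemp -d)
printf '1\t10.1000/a\n2\t10.1000/b\n4\t10.1000/d\n' > "$tmp/id_doi.tsv"
printf '1\t2\n1\t3\n4\t1\n' > "$tmp/refs.tsv"

# First file: load id -> DOI into an array.
# Second file: emit the DOI pair only if both ids have a DOI.
result=$(awk -F'\t' -v OFS='\t' '
    NR == FNR { doi[$1] = $2; next }
    ($1 in doi) && ($2 in doi) { print doi[$1], doi[$2] }
' "$tmp/id_doi.tsv" "$tmp/refs.tsv")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The pair `1 -> 3` is dropped because id `3` has no DOI in the sample mapping, so only references where both sides resolve end up in the DOI-to-DOI output.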