Diffstat (limited to 'extra/mag/README.md')
-rw-r--r-- | extra/mag/README.md | 15 |
1 files changed, 15 insertions, 0 deletions
diff --git a/extra/mag/README.md b/extra/mag/README.md
new file mode 100644
index 0000000..8f8a7fb
--- /dev/null
+++ b/extra/mag/README.md
@@ -0,0 +1,15 @@
+# Notes on MAG
+
+Using: https://archive.org/details/mag-2021-06-07
+
+* 260M entities
+* 96.7M with DOI
+
+In order to generate a doi-to-doi version, we need to:
+
+* create a mapping from id-to-doi
+* apply the mapping to the PaperReferences file
+
+```sh
+$ time unpigz -c /magna/data/mag-2021-06-07/Papers.txt.gz | cut -f1,3 | awk '$2 != ""' | mkocidb -o /sandcrawler-db/tmp-refcat/mag_id_doi.db
+```
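The README shows a command only for the first step. The second step, applying the mapping to the PaperReferences file, could look roughly like the sketch below. It is not the author's method: it assumes the mapping is available as a plain two-column TSV (paper id, DOI) rather than the `mkocidb` database, and that PaperReferences is a two-column TSV of citing id and cited id. The function name `map_refs` and the sample data are hypothetical.

```sh
#!/bin/sh
# Sketch only: uses a plain TSV mapping instead of the mkocidb database.
set -eu

# map_refs MAPPING REFS
#   MAPPING: TSV with paper id and DOI (assumed layout).
#   REFS:    TSV with citing id and cited id (assumed layout).
# Prints "citing DOI <TAB> cited DOI" for pairs where both ids have a DOI.
map_refs() {
    awk -F '\t' -v OFS='\t' '
        NR == FNR { doi[$1] = $2; next }    # first file: load id -> doi
        ($1 in doi) && ($2 in doi) { print doi[$1], doi[$2] }
    ' "$1" "$2"
}

# Hypothetical sample data: paper 3 has no DOI, so the pair (1, 3) is dropped.
tmp=$(mktemp -d)
printf '1\t10.1/a\n2\t10.1/b\n' > "$tmp/id_doi.tsv"
printf '1\t2\n1\t3\n2\t1\n' > "$tmp/refs.tsv"
map_refs "$tmp/id_doi.tsv" "$tmp/refs.tsv"
rm -rf "$tmp"
```

This holds the whole mapping in an awk array, so it is only practical for small inputs. With 96.7M DOIs, an on-disk lookup such as the database built above, or a sort-based join, is likely the better fit.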