author: Bryan Newbold <bnewbold@robocracy.org> 2019-01-25 16:43:18 -0800
committer: Bryan Newbold <bnewbold@robocracy.org> 2019-01-25 16:43:18 -0800
commit: 16256f8ed119c072c09b13b0b1a6d4a56bed5113
tree: 036191dea9e72cc9ee657026290116bef5449355 /extra/journal_metadata/README.md
parent: 1079aea4a5ea8f22353caf7c74070b6e830498bf
improved journal metadata munger
Diffstat (limited to 'extra/journal_metadata/README.md')
-rw-r--r-- extra/journal_metadata/README.md | 18
1 files changed, 17 insertions, 1 deletions
diff --git a/extra/journal_metadata/README.md b/extra/journal_metadata/README.md
index 61dbc6b0..dec32624 100644
--- a/extra/journal_metadata/README.md
+++ b/extra/journal_metadata/README.md
@@ -17,7 +17,7 @@ A few sources of normalization/mappings:
 - Original:
 - Snapshot: <https://archive.org/download/issn_issnl_mappings/20180216.ISSN-to-ISSN-L.txt>
 - ISO 639-1 language codes: https://datahub.io/core/language-codes
-- ISO 3166-1 alpha-2 nation codes
+- ISO 3166-1 alpha-2 country codes
In order of precedence (first higher than later):
@@ -60,6 +60,22 @@ ISSN-L, then write out to disk as JSON. Then the journal-metadata importer
takes a subset of fields and inserts to fatcat. Lastly, the elasticsearch
transformer takes a subset/combination of
+## ISSN-L Munging
+
+Unfortunately, there seem to be plenty of legitimate ISSNs that don't end up in
+the ISSN-L table. On the portal.issn.org public site, these are listed as:
+
+ "This provisional record has been produced before publication of the
+ resource. The published resource has not yet been checked by the ISSN
+ Network.It is only available to subscribing users."
+
+For example:
+
+- 2199-3246/2199-3254: Digital Experiences in Mathematics Education
+
+Instead of just dropping these entirely, we're currently munging these by
+putting the electronic or print ISSN in the ISSN-L position.
+
## Python Helpers/Libraries
- ftfy
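
The fallback described in the "ISSN-L Munging" section added by this commit can be sketched roughly as follows. This is a minimal illustration, not the munger's actual code: the function name `resolve_issnl`, the `issnl_map` dict, and the choice to try the electronic ISSN before the print ISSN are all assumptions, since the README only says the electronic or print ISSN is placed in the ISSN-L position.

```python
from typing import Dict, Optional, Tuple


def resolve_issnl(
    issnl_map: Dict[str, str],
    issne: Optional[str],
    issnp: Optional[str],
) -> Tuple[Optional[str], bool]:
    """Return (issnl, munged) for a journal record.

    issnl_map is a hypothetical in-memory ISSN -> ISSN-L lookup table.
    munged is True when no mapping was found and an ISSN was used as a
    stand-in for the ISSN-L.
    """
    # Normal case: either ISSN is present in the ISSN-L table.
    for issn in (issne, issnp):
        if issn and issn in issnl_map:
            return issnl_map[issn], False

    # Fallback: use whichever ISSN exists as the ISSN-L.
    # Assumption: electronic is preferred over print; the README does
    # not state an order.
    for issn in (issne, issnp):
        if issn:
            return issn, True

    # No ISSN at all: nothing to munge.
    return None, False


# The README's example pair; which one is print and which is electronic
# is assumed here purely for illustration.
print(resolve_issnl({}, issne="2199-3254", issnp="2199-3246"))
```

Returning the `munged` flag alongside the value is one way to keep such stand-in ISSN-Ls distinguishable from real ones later on, for instance if the ISSN-L table is refreshed and the record can be corrected.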