author: Bryan Newbold <bnewbold@robocracy.org> 2019-01-25 16:43:18 -0800
committer: Bryan Newbold <bnewbold@robocracy.org> 2019-01-25 16:43:18 -0800
commit: 2a2eadc89afee7526d6cc92bbbe22dea40ba993e
tree: 492d391da8a8963bfd56d33fe06f2cc4b23f435c /README.md
parent: c46d078d8e2a8ff6a6ced0530ab5a6293214f2d5
improved journal metadata munger
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 18
1 files changed, 17 insertions, 1 deletions
diff --git a/README.md b/README.md
index 61dbc6b..dec3262 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ A few sources of normalization/mappings:
- Original:
- Snapshot: <https://archive.org/download/issn_issnl_mappings/20180216.ISSN-to-ISSN-L.txt>
- ISO 639-1 language codes: https://datahub.io/core/language-codes
-- ISO 3166-1 alpha-2 nation codes
+- ISO 3166-1 alpha-2 country codes
In order of precedence (first higher than later):
@@ -60,6 +60,22 @@ ISSN-L, then write out to disk as JSON. Then the journal-metadata importer
takes a subset of fields and inserts to fatcat. Lastly, the elasticsearch
transformer takes a subset/combination of
+## ISSN-L Munging
+
+Unfortunately, there seem to be plenty of legitimate ISSNs that don't end up in
+the ISSN-L table. On the portal.issn.org public site, these are listed as:
+
+ "This provisional record has been produced before publication of the
+ resource. The published resource has not yet been checked by the ISSN
+ Network. It is only available to subscribing users."
+
+For example:
+
+- 2199-3246/2199-3254: Digital Experiences in Mathematics Education
+
+Instead of just dropping these entirely, we're currently munging these by
+putting the electronic or print ISSN in the ISSN-L position.
+
## Python Helpers/Libraries
- ftfy
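
Note on the "ISSN-L Munging" section added in this diff: the fallback it describes could be sketched in Python roughly as below. This is an illustrative sketch, not the code from this commit. The function name `munge_issnl`, the dict-based lookup table, and the choice to try the electronic ISSN before the print ISSN are all assumptions; the README only says that "the electronic or print ISSN" is placed in the ISSN-L position.

```python
from typing import Dict, Optional, Tuple


def munge_issnl(
    issn_to_issnl: Dict[str, str],
    issne: Optional[str] = None,
    issnp: Optional[str] = None,
) -> Tuple[Optional[str], bool]:
    """Return (issnl, was_munged) for a journal's electronic/print ISSNs.

    Hypothetical helper: `issn_to_issnl` stands in for the ISSN-to-ISSN-L
    mapping table, loaded elsewhere into a plain dict.
    """
    # Prefer a real mapping from the ISSN-L table when one exists.
    for issn in (issne, issnp):
        if issn and issn in issn_to_issnl:
            return issn_to_issnl[issn], False

    # Fallback (the "munge"): reuse whichever ISSN is available as the ISSN-L.
    # Trying electronic before print is an assumption of this sketch.
    for issn in (issne, issnp):
        if issn:
            return issn, True

    # No ISSN at all: nothing to key the record on.
    return None, False


if __name__ == "__main__":
    # The table entry below is made up for illustration.
    table = {"0000-0019": "0000-0019"}
    # Which of the two ISSNs from the README example is electronic and which
    # is print is assumed here; neither appears in the toy table.
    print(munge_issnl(table, issne="2199-3254", issnp="2199-3246"))
```

Returning a `was_munged` flag alongside the value is a design choice of the sketch: it would let downstream steps distinguish a provisional, self-assigned ISSN-L from one confirmed by the mapping table.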