diff options
author | bnewbold <bnewbold@archive.org> | 2021-11-11 01:12:18 +0000 |
---|---|---|
committer | bnewbold <bnewbold@archive.org> | 2021-11-11 01:12:18 +0000 |
commit | 6ad9d24e4d7d901d6fc394e6e91575f6acba7ff4 (patch) | |
tree | 1b80344125152b46ae727dc8bbff73cc12abfd3e /python/fatcat_tools/importers/__init__.py | |
parent | 7e3f91f1a49ea85707cae31125021ba761f5373d (diff) | |
parent | 6eaf4f57c1f92b6f4f46adc38e5b39fd30b65d81 (diff) | |
download | fatcat-6ad9d24e4d7d901d6fc394e6e91575f6acba7ff4.tar.gz fatcat-6ad9d24e4d7d901d6fc394e6e91575f6acba7ff4.zip |
Merge branch 'bnewbold-import-refactors' into 'master'
import refactors and deprecations
Some of these are from old stale branches (the datacite subject metadata patch), but most are from yesterday and today. Sort of a hodge-podge, but the general theme is getting around to deferred cleanups and refactors specific to importer code before making some behavioral changes.
The Datacite-specific stuff could use review here.
Remove unused/deprecated/dead code:
- cdl_dash_dat and wayback_static importers, which were for specific early example entities and have been superseded by other importers
- "extid map" sqlite3 feature from several importers, was only used for initial bulk imports (and maybe should not have been used)
Refactors:
- moved a number of large datastructures out of importer code and into a dedicated static file (`biblio_lookup_tables.py`). Didn't move all, just the ones that were either generic or very large (making it hard to read code)
- shuffled around relative imports and some function names ("clean_str" vs. "clean")
Some actual behavioral changes:
- remove some Datacite-specific license slugs
- stop trying to fix double-slashes in DOIs, that was causing more harm than help (some DOIs do actually have double-slashes!)
- remove some excess metadata from datacite 'extra' fields
Diffstat (limited to 'python/fatcat_tools/importers/__init__.py')
-rw-r--r-- | python/fatcat_tools/importers/__init__.py | 8 |
1 files changed, 1 insertions, 7 deletions
diff --git a/python/fatcat_tools/importers/__init__.py b/python/fatcat_tools/importers/__init__.py index 06ecfd58..654be2e9 100644 --- a/python/fatcat_tools/importers/__init__.py +++ b/python/fatcat_tools/importers/__init__.py @@ -13,10 +13,8 @@ To run an import you combine two classes; one each of: from .arabesque import ARABESQUE_MATCH_WHERE_CLAUSE, ArabesqueMatchImporter from .arxiv import ArxivRawImporter -from .cdl_dash_dat import auto_cdl_dash_dat from .chocula import ChoculaImporter from .common import ( - LANG_MAP_MARC, Bs4XmlFileListPusher, Bs4XmlFilePusher, Bs4XmlLargeFilePusher, @@ -28,11 +26,8 @@ from .common import ( KafkaJsonPusher, LinePusher, SqlitePusher, - clean, - is_cjk, - make_kafka_consumer, ) -from .crossref import CROSSREF_TYPE_MAP, CrossrefImporter, lookup_license_slug +from .crossref import CrossrefImporter from .datacite import DataciteImporter from .dblp_container import DblpContainerImporter from .dblp_release import DblpReleaseImporter @@ -55,4 +50,3 @@ from .matched import MatchedImporter from .orcid import OrcidImporter from .pubmed import PubmedImporter from .shadow import ShadowLibraryImporter -from .wayback_static import auto_wayback_static |