path: root/python/fatcat_tools/importers/__init__.py
Commit message [Author, Age, Files, Lines]
* refactor importer metadata tables into separate file; move some helpers around [Bryan Newbold, 2021-11-10, 1 file, -2/+1]
|
|   - MAX_ABSTRACT_LENGTH set in a single place (importer common)
|   - merge datacite license slug table in to common table, removing some TDM-specific licenses (which do not apply in the context of preserving the full work)
* importers: refactor imports of clean() and other normalization helpers [Bryan Newbold, 2021-11-10, 1 file, -3/+0]
|
* remove cdl_dash_dat and wayback_static importers [Bryan Newbold, 2021-11-10, 1 file, -2/+0]
|
|   Cleaning out dead code. These importers were used to create demonstration fileset and webcapture entities early in development. They have been replaced by the fileset and webcapture ingest importers.
* re-fmt all the fatcat_tools __init__ files for readability [Bryan Newbold, 2021-11-02, 1 file, -17/+39]
|
* initial implementation of fileset ingest importers [Bryan Newbold, 2021-10-14, 1 file, -1/+1]
|
* generic fileset importer class, with test coverage [Bryan Newbold, 2021-10-14, 1 file, -0/+1]
|
* new SPN web (html) importer [Bryan Newbold, 2021-10-01, 1 file, -1/+1]
|
* very simple dblp container importer [Bryan Newbold, 2020-12-17, 1 file, -0/+1]
|
* initial implementation of dblp release importer (in progress) [Bryan Newbold, 2020-12-17, 1 file, -0/+1]
|
* initial implementation of DOAJ importer [Bryan Newbold, 2020-11-19, 1 file, -0/+1]
|
|   Several things to finish implementing and polish.
* ingest: initial 'web' worker implementation [Bryan Newbold, 2020-11-05, 1 file, -1/+1]
|
* initial implementation of file_meta importer [Bryan Newbold, 2020-08-21, 1 file, -0/+1]
|
* Merge branch 'martin-kafka-bs4-import' into 'master' [Martin Czygan, 2020-03-10, 1 file, -1/+1]
|\
| |   pubmed and arxiv harvest preparations
| |
| |   See merge request webgroup/fatcat!28
| * pubmed ftp harvest and KafkaBs4XmlPusher [Martin Czygan, 2020-02-19, 1 file, -1/+1]
| |
| |   * add PubmedFTPWorker
| |   * utils are currently stored alongside pubmed (e.g. ftpretr, xmlstream) but may live elsewhere, as they are more generic
| |   * add KafkaBs4XmlPusher
* | basic shadow importer [Bryan Newbold, 2020-02-13, 1 file, -0/+1]
|/
* datacite: importer skeleton [Martin Czygan, 2019-12-28, 1 file, -0/+1]
|
|   * contributors, title, date, publisher, container, license
|
|   Field and value analysis via https://github.com/miku/indigo.
* savepapernow result importer [Bryan Newbold, 2019-12-12, 1 file, -1/+1]
|
|   Based on ingest-file-results importer
* ingest file result importer [Bryan Newbold, 2019-11-15, 1 file, -2/+1]
|
* implement ChoculaImporter [Bryan Newbold, 2019-09-03, 1 file, -0/+1]
|
* faster LargeFile XML importer for PubMed [Bryan Newbold, 2019-05-29, 1 file, -1/+1]
|
* creative importer for bulk JSTOR imports [Bryan Newbold, 2019-05-22, 1 file, -1/+1]
|
* JALC bulk file importer [Bryan Newbold, 2019-05-21, 1 file, -1/+1]
|
* initial pubmed importer [Bryan Newbold, 2019-05-21, 1 file, -2/+3]
|
* initial arxivraw importer (from parser) [Bryan Newbold, 2019-05-21, 1 file, -0/+1]
|
* initial JSTOR importer [Bryan Newbold, 2019-05-21, 1 file, -0/+1]
|
* initial flesh out of JALC parser [Bryan Newbold, 2019-05-21, 1 file, -1/+2]
|
* early version of arabesque importer [Bryan Newbold, 2019-04-12, 1 file, -0/+1]
|
* add SqlitePusher importer option [Bryan Newbold, 2019-04-12, 1 file, -1/+1]
|
* importer for CDL/DASH dat pilot dweb datasets [Bryan Newbold, 2019-03-19, 1 file, -0/+1]
|
* new importer: wayback_static [Bryan Newbold, 2019-03-19, 1 file, -0/+1]
|
* ftfy all over (needs Pipfile.lock) [Bryan Newbold, 2019-01-23, 1 file, -1/+1]
|
* refactor remaining importers [Bryan Newbold, 2019-01-22, 1 file, -1/+1]
|
* refactored crossref importer to new style [Bryan Newbold, 2019-01-22, 1 file, -3/+3]
|
* new importer API interfaces [Bryan Newbold, 2019-01-22, 1 file, -0/+15]
|
* issn => journal_metadata in several places [Bryan Newbold, 2019-01-17, 1 file, -1/+1]
|
* start supporting kafka importers [Bryan Newbold, 2018-11-19, 1 file, -1/+1]
|
|   A nice feature would be some/any log output as to progress.
* large refactor of python names/paths [Bryan Newbold, 2018-11-15, 1 file, -0/+7]

    - Add __init__.py files for fatcat_tools submodules, and use them in imports
    - Add a bunch of comments to files.
    - rename a number of classes and functions to be less verbose