Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | refactor importer metadata tables into separate file; move some helpers around | Bryan Newbold | 2021-11-10 | 1 | -2/+1 |
| | - MAX_ABSTRACT_LENGTH set in a single place (importer common) - merge datacite license slug table in to common table, removing some TDM-specific licenses (which do not apply in the context of preserving the full work) | ||||
* | importers: refactor imports of clean() and other normalization helpers | Bryan Newbold | 2021-11-10 | 1 | -3/+0 |
| | |||||
* | remove cdl_dash_dat and wayback_static importers | Bryan Newbold | 2021-11-10 | 1 | -2/+0 |
| | Cleaning out dead code. These importers were used to create demonstration fileset and webcapture entities early in development. They have been replaced by the fileset and webcapture ingest importers. | ||||
* | re-fmt all the fatcat_tools __init__ files for readability | Bryan Newbold | 2021-11-02 | 1 | -17/+39 |
| | |||||
* | initial implementation of fileset ingest importers | Bryan Newbold | 2021-10-14 | 1 | -1/+1 |
| | |||||
* | generic fileset importer class, with test coverage | Bryan Newbold | 2021-10-14 | 1 | -0/+1 |
| | |||||
* | new SPN web (html) importer | Bryan Newbold | 2021-10-01 | 1 | -1/+1 |
| | |||||
* | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 1 | -0/+1 |
| | |||||
* | initial implementation of dblp release importer (in progress) | Bryan Newbold | 2020-12-17 | 1 | -0/+1 |
| | |||||
* | initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 1 | -0/+1 |
| | Several things to finish implementing and polish. | ||||
* | ingest: initial 'web' worker implementation | Bryan Newbold | 2020-11-05 | 1 | -1/+1 |
| | |||||
* | initial implementation of file_meta importer | Bryan Newbold | 2020-08-21 | 1 | -0/+1 |
| | |||||
* | Merge branch 'martin-kafka-bs4-import' into 'master' | Martin Czygan | 2020-03-10 | 1 | -1/+1 |
|\ | pubmed and arxiv harvest preparations. See merge request webgroup/fatcat!28 | ||||
| * | pubmed ftp harvest and KafkaBs4XmlPusher | Martin Czygan | 2020-02-19 | 1 | -1/+1 |
| | | * add PubmedFTPWorker * utils are currently stored alongside pubmed (e.g. ftpretr, xmlstream) but may live elsewhere, as they are more generic * add KafkaBs4XmlPusher | ||||
* | | basic shadow importer | Bryan Newbold | 2020-02-13 | 1 | -0/+1 |
|/ | |||||
* | datacite: importer skeleton | Martin Czygan | 2019-12-28 | 1 | -0/+1 |
| | * contributors, title, date, publisher, container, license. Field and value analysis via https://github.com/miku/indigo. | ||||
* | savepapernow result importer | Bryan Newbold | 2019-12-12 | 1 | -1/+1 |
| | Based on ingest-file-results importer | ||||
* | ingest file result importer | Bryan Newbold | 2019-11-15 | 1 | -2/+1 |
| | |||||
* | implement ChoculaImporter | Bryan Newbold | 2019-09-03 | 1 | -0/+1 |
| | |||||
* | faster LargeFile XML importer for PubMed | Bryan Newbold | 2019-05-29 | 1 | -1/+1 |
| | |||||
* | creative importer for bulk JSTOR imports | Bryan Newbold | 2019-05-22 | 1 | -1/+1 |
| | |||||
* | JALC bulk file importer | Bryan Newbold | 2019-05-21 | 1 | -1/+1 |
| | |||||
* | initial pubmed importer | Bryan Newbold | 2019-05-21 | 1 | -2/+3 |
| | |||||
* | initial arxivraw importer (from parser) | Bryan Newbold | 2019-05-21 | 1 | -0/+1 |
| | |||||
* | initial JSTOR importer | Bryan Newbold | 2019-05-21 | 1 | -0/+1 |
| | |||||
* | initial flesh out of JALC parser | Bryan Newbold | 2019-05-21 | 1 | -1/+2 |
| | |||||
* | early version of arabesque importer | Bryan Newbold | 2019-04-12 | 1 | -0/+1 |
| | |||||
* | add SqlitePusher importer option | Bryan Newbold | 2019-04-12 | 1 | -1/+1 |
| | |||||
* | importer for CDL/DASH dat pilot dweb datasets | Bryan Newbold | 2019-03-19 | 1 | -0/+1 |
| | |||||
* | new importer: wayback_static | Bryan Newbold | 2019-03-19 | 1 | -0/+1 |
| | |||||
* | ftfy all over (needs Pipfile.lock) | Bryan Newbold | 2019-01-23 | 1 | -1/+1 |
| | |||||
* | refactor remaining importers | Bryan Newbold | 2019-01-22 | 1 | -1/+1 |
| | |||||
* | refactored crossref importer to new style | Bryan Newbold | 2019-01-22 | 1 | -3/+3 |
| | |||||
* | new importer API interfaces | Bryan Newbold | 2019-01-22 | 1 | -0/+15 |
| | |||||
* | issn => journal_metadata in several places | Bryan Newbold | 2019-01-17 | 1 | -1/+1 |
| | |||||
* | start supporting kafka importers | Bryan Newbold | 2018-11-19 | 1 | -1/+1 |
| | A nice feature would be some/any log output as to progress. | ||||
* | large refactor of python names/paths | Bryan Newbold | 2018-11-15 | 1 | -0/+7 |
- Add __init__.py files for fatcat_tools submodules, and use them in imports - Add a bunch of comments to files. - rename a number of classes and functions to be less verbose |