| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | importer code updates | Bryan Newbold | 2019-05-13 | 4 | -3/+18 |
* | partial python impl of ext_id and release_stage refactors | Bryan Newbold | 2019-05-13 | 3 | -15/+20 |
* | add limits to match importers | Bryan Newbold | 2019-04-23 | 3 | -2/+27 |
* | archive.org isn't really a repository | Bryan Newbold | 2019-04-22 | 1 | -1/+3 |
* | editgroup description override | Bryan Newbold | 2019-04-22 | 1 | -2/+2 |
* | arabesque importer does require timestamp/wayback | Bryan Newbold | 2019-04-22 | 1 | -0/+3 |
* | matched importer shouldn't require wayback | Bryan Newbold | 2019-04-22 | 1 | -5/+7 |
* | handle API 400 in arabesque import (invalid extid) | Bryan Newbold | 2019-04-19 | 1 | -7/+14 |
* | fix arabesque importer crawl_id None bug | Bryan Newbold | 2019-04-18 | 1 | -1/+1 |
* | mechanism to not double-update entities | Bryan Newbold | 2019-04-18 | 2 | -1/+9 |
* | minor arabesque tweaks | Bryan Newbold | 2019-04-18 | 1 | -0/+2 |
* | update URL rel list | Bryan Newbold | 2019-04-18 | 1 | -1/+10 |
* | arabesque importer does fewer updates | Bryan Newbold | 2019-04-18 | 1 | -1/+8 |
* | arabesque importer | Bryan Newbold | 2019-04-18 | 1 | -0/+165 |
* | early version of arabesque importer | Bryan Newbold | 2019-04-12 | 1 | -0/+1 |
* | add SqlitePusher importer option | Bryan Newbold | 2019-04-12 | 2 | -1/+21 |
* | fix cdl_dash_dat license_slug | Bryan Newbold | 2019-03-19 | 1 | -7/+3 |
* | importer for CDL/DASH dat pilot dweb datasets | Bryan Newbold | 2019-03-19 | 2 | -0/+200 |
* | new importer: wayback_static | Bryan Newbold | 2019-03-19 | 2 | -0/+237 |
* | bunch of lint/whitespace cleanups | Bryan Newbold | 2019-02-22 | 3 | -5/+3 |
* | better/additional crossref license lookups | Bryan Newbold | 2019-02-14 | 1 | -20/+58 |
* | crossref: import subtitle as str, not list[str] | Bryan Newbold | 2019-02-14 | 1 | -0/+2 |
* | don't print missing DOIs, just count | Bryan Newbold | 2019-02-05 | 1 | -1/+3 |
* | add some missing LICENSE_SLUG_MAP | Bryan Newbold | 2019-02-05 | 1 | -1/+4 |
* | yet another required field bug | Bryan Newbold | 2019-01-29 | 1 | -4/+5 |
* | fix null name for container (required) | Bryan Newbold | 2019-01-29 | 1 | -1/+5 |
* | tweaks to GROBID metadata import | Bryan Newbold | 2019-01-29 | 1 | -3/+2 |
* | crossref import tweaks/fixes | Bryan Newbold | 2019-01-29 | 1 | -7/+9 |
| | refs: article-title not title; save unstructured; authors not author. Save 'language' field (already an ISO code). | | | | |
* | fix bug in clean() resulting in many consistency check fails | Bryan Newbold | 2019-01-29 | 2 | -12/+12 |
* | fix refs extra ordering bug | Bryan Newbold | 2019-01-29 | 1 | -6/+6 |
* | pass through kwargs (fixes bezerk imports) | Bryan Newbold | 2019-01-29 | 5 | -5/+10 |
* | ensure raw_name is not stub | Bryan Newbold | 2019-01-29 | 1 | -1/+4 |
* | ensure abstracts aren't stubs | Bryan Newbold | 2019-01-29 | 1 | -2/+3 |
* | add stub parse_record() to make pylint happy | Bryan Newbold | 2019-01-28 | 1 | -0/+4 |
* | fix title length checks in crossref | Bryan Newbold | 2019-01-28 | 1 | -2/+2 |
* | fix rel/url order swap | Bryan Newbold | 2019-01-28 | 1 | -1/+1 |
* | don't allow empty or single-character clean strings | Bryan Newbold | 2019-01-28 | 1 | -1/+1 |
* | filter short/stub original_title | Bryan Newbold | 2019-01-28 | 1 | -3/+7 |
* | many fixes in GROBID importer | Bryan Newbold | 2019-01-28 | 1 | -14/+10 |
* | fix GROBID null/short abstract additions | Bryan Newbold | 2019-01-28 | 1 | -1/+2 |
* | enforce title len>1 for release imports | Bryan Newbold | 2019-01-28 | 2 | -1/+8 |
* | drop creators with no display name at all | Bryan Newbold | 2019-01-28 | 1 | -3/+3 |
* | make ORCID importer skip no-names, not assert | Bryan Newbold | 2019-01-28 | 1 | -1/+2 |
* | transform and import fixes/tweaks | Bryan Newbold | 2019-01-25 | 2 | -4/+10 |
* | update journal meta import/transform | Bryan Newbold | 2019-01-25 | 1 | -104/+39 |
* | grobid import extra metadata tweaks | Bryan Newbold | 2019-01-24 | 1 | -6/+7 |
* | refactor _get_editgroup => get_editgroup_id | Bryan Newbold | 2019-01-24 | 2 | -5/+6 |
* | refactor make_rel_url | Bryan Newbold | 2019-01-24 | 3 | -29/+66 |
* | tweak crossref import, and update tests | Bryan Newbold | 2019-01-24 | 1 | -11/+27 |
* | allow importing contrib/refs lists | Bryan Newbold | 2019-01-24 | 1 | -5/+13 |
| | The motivation here isn't really to support these gigantic lists on principle, but to be able to ingest large corpuses without having to decide whether to filter out or crop such lists. | | | | |