Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | fix arabesque sqlite3 examples to have 14-digit timestamps | Bryan Newbold | 2021-05-21 | 1 | -0/+0 |
| | |||||
* | transform tool: container transform stats lookup support | Bryan Newbold | 2021-04-06 | 1 | -0/+1 |
| | |||||
* | datacite: a missing surname should be None, not the empty string | Martin Czygan | 2021-04-02 | 2 | -2/+0 |
| | | | | refs sentry #77700 | ||||
* | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 3 | -0/+21 |
| | |||||
* | basic test coverage of dblp release importer | Bryan Newbold | 2020-12-17 | 3 | -0/+431 |
| | |||||
* | improve release elasticsearch transform test coverage | Bryan Newbold | 2020-12-16 | 2 | -0/+2 |
| | |||||
* | doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 1 | -1/+1 |
| | | | | Also add missing code coverage for update path (disabled by default). | ||||
* | initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 1 | -0/+5 |
| | | | | Several things to finish implementing and polish. | ||||
* | ingest: fix XML ingest test file | Bryan Newbold | 2020-11-05 | 1 | -1/+1 |
| | |||||
* | ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 1 | -0/+1 |
| | |||||
* | ingest: tests for basic XML ingest | Bryan Newbold | 2020-11-05 | 1 | -0/+1 |
| | |||||
* | ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 1 | -1/+1 |
| | |||||
* | datacite: handle case of empty-string version | Bryan Newbold | 2020-09-10 | 1 | -1/+1 |
| | | | | | Includes a tiny tweak to the datacite import sample file to test this code path. | ||||
* | fixes and test coverage for file_meta importer | Bryan Newbold | 2020-08-21 | 1 | -0/+7 |
| | |||||
* | datacite importer: update test cases for 'Additional file' as component, not stub | Bryan Newbold | 2020-08-11 | 5 | -5/+5 |
| | |||||
* | datacite import: figshare-specific hacks | Bryan Newbold | 2020-08-11 | 1 | -0/+1 |
| | |||||
* | datacite: adjust tests | Martin Czygan | 2020-07-10 | 4 | -10/+6 |
| | |||||
* | wip: contrib, GH59 | Martin Czygan | 2020-07-10 | 5 | -3/+105 |
| | |||||
* | datacite: address duplicated contributor issue | Martin Czygan | 2020-07-07 | 4 | -10/+93 |
| | | | | | | | Use string comparison. * https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs * https://api.datacite.org/dois/10.25940/roper-31098406 | ||||
* | regression test for release_stage mismatch with ingest request | Bryan Newbold | 2020-05-26 | 1 | -1/+2 |
| | |||||
* | datacite: fix type error | Martin Czygan | 2020-04-22 | 2 | -0/+76 |
| | | | | | | | Up to now, we expected the description to be a string or list. Add handling for int as well. First appeared: Apr 22 19:58:39. | ||||
* | datacite: fix a raw name constraint violation | Martin Czygan | 2020-04-20 | 2 | -0/+77 |
| | | | | | | | It was possible that contribs got added which had no raw name. One example would be a name consisting of whitespace only. This fix adds a final check for this case. | ||||
* | pubmed: handle multiple ReferenceList | Bryan Newbold | 2020-03-20 | 1 | -0/+206 |
| | | | | | | | This resolves a situation noticed in prod where we were only importing/updating a single reference per article. Includes a regression test. | ||||
* | Merge branch 'martin-kafka-bs4-import' into 'master' | Martin Czygan | 2020-03-10 | 2 | -0/+0 |
|\ | | | | | | | | | pubmed and arxiv harvest preparations. See merge request webgroup/fatcat!28 | ||||
| * | more pubmed adjustments | Martin Czygan | 2020-02-22 | 2 | -0/+0 |
| | | | | | | | | | | * regenerate map in continuous mode * add tests | ||||
* | | Merge branch 'bnewbold-elastic-v03b' | Bryan Newbold | 2020-02-26 | 3 | -0/+3 |
|\ \ | |||||
| * | | fix some transform bugs, add some tests | Bryan Newbold | 2020-01-29 | 3 | -0/+3 |
| | | | |||||
* | | | shadow import: more filtering of file_meta fields | Bryan Newbold | 2020-02-13 | 1 | -12/+10 |
| | | | |||||
* | | | basic shadow importer | Bryan Newbold | 2020-02-13 | 1 | -0/+12 |
| |/ |/| | |||||
* | | datacite: add exception for https://www.micropublication.org/ | Martin Czygan | 2020-01-31 | 1 | -1/+2 |
| | | |||||
* | | datacite: improve date handling and minor tweak | Martin Czygan | 2020-01-30 | 2 | -0/+110 |
|/ | | | | | | | | | | | | | Records from https://www.micropublication.org/ did not have a date in FC, although raw data contained date strings - they were not using the finer-grained "attributes.date" but "attributes.published" and/or "attributes.publicationYear". Support for those fields has been added, including a test case. During this test (#30) a processing gap for names became clear (author may have "given_name" and "surname", but no "name"). This bug has been fixed, too. | ||||
* | do not normalize "en dash" in DOI | Martin Czygan | 2020-01-17 | 1 | -1/+1 |
| | | | | | | | | | Technically, [...] DOI names may incorporate any printable characters from the Universal Character Set (UCS-2), of ISO/IEC 10646, which is the character set defined by Unicode (https://www.doi.org/doi_handbook/2_Numbering.html#2.5.1). For mostly QA reasons, we currently treat a DOI with an "en dash" as invalid. | ||||
* | ingest: improve tests, support old ingest results | Bryan Newbold | 2020-01-15 | 2 | -1/+2 |
| | |||||
* | datacite: ignore known unknown values in resourceType* | Martin Czygan | 2020-01-09 | 2 | -0/+94 |
| | |||||
* | datacite: abstracts may be strings or list of strings | Martin Czygan | 2020-01-09 | 4 | -0/+186 |
| | |||||
* | datacite: improve license_slug handling | Martin Czygan | 2020-01-09 | 2 | -1/+3 |
| | |||||
* | datacite: add 'Unknown' to blacklist | Martin Czygan | 2020-01-09 | 1 | -7/+1 |
| | |||||
* | datacite: get rid of schemaVersion | Martin Czygan | 2020-01-09 | 17 | -32/+14 |
| | |||||
* | datacite: reformat test cases and use jq . --sort-keys | Martin Czygan | 2020-01-08 | 54 | -2299/+2301 |
| | |||||
* | datacite: factor out contributor handling | Martin Czygan | 2020-01-08 | 4 | -0/+105 |
| | | | | | | | Use values from: * attributes.creators[] * attributes.contributors[] | ||||
* | datacite: adjust tests for release_month | Martin Czygan | 2020-01-08 | 12 | -12/+12 |
| | |||||
* | datacite: mark additional files as stub | Martin Czygan | 2020-01-08 | 2 | -0/+72 |
| | |||||
* | datacite: CCDC are entries, mostly | Martin Czygan | 2020-01-08 | 1 | -1/+1 |
| | |||||
* | datacite: adding datacite-specific extra metadata | Martin Czygan | 2020-01-07 | 30 | -1468/+1570 |
| | | | | | | | | | | | | | * attributes.metadataVersion * attributes.schemaVersion * attributes.version (source dependent values, follows suggestions in https://schema.datacite.org/meta/kernel-4.3/doc/DataCite-MetadataKernel_v4.3.pdf#page=26, but values vary) Furthermore: * attributes.types.resourceTypeGeneral * attributes.types.resourceType | ||||
* | datacite: month field should be top-level | Martin Czygan | 2020-01-06 | 11 | -14/+14 |
| | |||||
* | datacite: include month in extra | Martin Czygan | 2020-01-06 | 11 | -11/+13 |
| | | | | | > include release_month as a top-level extra field [...] to auto-populate the schema field from that | ||||
* | datacite: clean abstracts, use unknown value tokens | Martin Czygan | 2020-01-06 | 3 | -3/+3 |
| | | | | | | | | Datacite defines placeholders for unknown values: * https://support.datacite.org/docs/schema-values-unknown-information-v43 Clean abstracts. | ||||
* | datacite: always include "datacite" key in extra | Martin Czygan | 2020-01-04 | 14 | -26/+26 |
| | | | | | | > always include extra values for the respective DOI registrars (datacite, crossref, jalc), even if they are empty ({}), to be used as a flag so we know which DOI registrar supplied the metadata. | ||||
* | datacite: remove --lang-detect flag | Martin Czygan | 2020-01-03 | 5 | -10/+15 |
| | | | | Estimated time for a single call is on the order of 50 ms. | ||||
* | datacite: add another test case | Martin Czygan | 2020-01-02 | 2 | -0/+70 |
| |
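
Several of the datacite commits above describe rules for contributor names: a missing surname should be None rather than the empty string, a contributor must not be added without a raw name (for example when the name is whitespace only), and an author may carry "given_name" and "surname" but no "name". The following is a minimal sketch of how such rules could fit together. The function names and the returned dict layout are hypothetical illustrations, not the importer's actual code.

```python
from typing import Optional


def blank_to_none(value: Optional[str]) -> Optional[str]:
    """Strip whitespace; map empty or whitespace-only strings to None."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def clean_contrib(
    name: Optional[str] = None,
    given_name: Optional[str] = None,
    surname: Optional[str] = None,
) -> Optional[dict]:
    """Hypothetical helper: return a cleaned contributor dict, or None to skip it."""
    given_name = blank_to_none(given_name)
    surname = blank_to_none(surname)  # missing surname becomes None, not ""
    raw_name = blank_to_none(name)

    # Assumed fallback: build the raw name from given_name and surname
    # when no full name was supplied.
    if raw_name is None:
        parts = [p for p in (given_name, surname) if p]
        raw_name = " ".join(parts) or None

    # Final check: never emit a contributor without a raw name.
    if raw_name is None:
        return None

    return {"raw_name": raw_name, "given_name": given_name, "surname": surname}
```

Under these assumptions, a whitespace-only name with no other fields is skipped entirely, while a record with only "given_name" and "surname" still yields a usable raw name.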