| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | datacite: more careful title string access; fixes sentry #88350 | Martin Czygan | 2021-06-11 | 2 | -0/+95 | 
| | Caused by a partial "title entry without title" coming *first* (e.g. an entry holding only a language, like {'lang': 'da'}). | | | | |
| * | fix arabesque sqlite3 examples to have 14-digit timestamps | Bryan Newbold | 2021-05-21 | 1 | -0/+0 | 
| | | |||||
| * | transform tool: container transform stats lookup support | Bryan Newbold | 2021-04-06 | 1 | -0/+1 | 
| | | |||||
| * | datacite: a missing surname should be None, not the empty string | Martin Czygan | 2021-04-02 | 2 | -2/+0 | 
| | refs sentry #77700 | | | | |
| * | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 3 | -0/+21 | 
| | | |||||
| * | basic test coverage of dblp release importer | Bryan Newbold | 2020-12-17 | 3 | -0/+431 | 
| | | |||||
| * | improve release elasticsearch transform test coverage | Bryan Newbold | 2020-12-16 | 2 | -0/+2 | 
| | | |||||
| * | doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 1 | -1/+1 | 
| | Also add missing code coverage for update path (disabled by default). | | | | |
| * | initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 1 | -0/+5 | 
| | Several things remain to be finished and polished. | | | | |
| * | ingest: fix XML ingest test file | Bryan Newbold | 2020-11-05 | 1 | -1/+1 | 
| | | |||||
| * | ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 1 | -0/+1 | 
| | | |||||
| * | ingest: tests for basic XML ingest | Bryan Newbold | 2020-11-05 | 1 | -0/+1 | 
| | | |||||
| * | ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 1 | -1/+1 | 
| | | |||||
| * | datacite: handle case of empty-string version | Bryan Newbold | 2020-09-10 | 1 | -1/+1 | 
| | Includes a tiny tweak to the datacite import sample file to test this code path. | | | | |
| * | fixes and test coverage for file_meta importer | Bryan Newbold | 2020-08-21 | 1 | -0/+7 | 
| | | |||||
| * | datacite importer: update test cases for 'Additional file' as component, not stub | Bryan Newbold | 2020-08-11 | 5 | -5/+5 |
| * | datacite import: figshare-specific hacks | Bryan Newbold | 2020-08-11 | 1 | -0/+1 | 
| | | |||||
| * | datacite: adjust tests | Martin Czygan | 2020-07-10 | 4 | -10/+6 | 
| | | |||||
| * | wip: contrib, GH59 | Martin Czygan | 2020-07-10 | 5 | -3/+105 | 
| | | |||||
| * | datacite: address duplicated contributor issue | Martin Czygan | 2020-07-07 | 4 | -10/+93 | 
| | Use string comparison. See https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs and https://api.datacite.org/dois/10.25940/roper-31098406 | | | | |
| * | regression test for release_stage mismatch with ingest request | Bryan Newbold | 2020-05-26 | 1 | -1/+2 | 
| | | |||||
| * | datacite: fix type error | Martin Czygan | 2020-04-22 | 2 | -0/+76 | 
| | Up to now, we expected the description to be a string or list. Add handling for int as well. First appeared: Apr 22 19:58:39. | | | | |
| * | datacite: fix a raw name constraint violation | Martin Czygan | 2020-04-20 | 2 | -0/+77 | 
| | Contribs could be added without a raw name, for example when the name consisted of whitespace only. This fix adds a final check for this case. | | | | |
| * | pubmed: handle multiple ReferenceList | Bryan Newbold | 2020-03-20 | 1 | -0/+206 | 
| | This resolves a situation noticed in prod where we were only importing/updating a single reference per article. Includes a regression test. | | | | |
| * | Merge branch 'martin-kafka-bs4-import' into 'master' | Martin Czygan | 2020-03-10 | 2 | -0/+0 | 
| | pubmed and arxiv harvest preparations. See merge request webgroup/fatcat!28 | | | | |
| \| * | more pubmed adjustments | Martin Czygan | 2020-02-22 | 2 | -0/+0 |
| | regenerate map in continuous mode; add tests | | | | |
| * \| | Merge branch 'bnewbold-elastic-v03b' | Bryan Newbold | 2020-02-26 | 3 | -0/+3 |
| \| * \| | fix some transform bugs, add some tests | Bryan Newbold | 2020-01-29 | 3 | -0/+3 |
| | | | | |||||
| * \| \| | shadow import: more filtering of file_meta fields | Bryan Newbold | 2020-02-13 | 1 | -12/+10 |
| | | | | |||||
| * \| \| | basic shadow importer | Bryan Newbold | 2020-02-13 | 1 | -0/+12 |
| * \| | datacite: add exception for https://www.micropublication.org/ | Martin Czygan | 2020-01-31 | 1 | -1/+2 |
| | | | |||||
| * \| | datacite: improve date handling and minor tweak | Martin Czygan | 2020-01-30 | 2 | -0/+110 |
| | Records from https://www.micropublication.org/ had no date in FC, although the raw data contained date strings: they did not use the finer-grained "attributes.date" but "attributes.published" and/or "attributes.publicationYear". Support for those fields has been added, including a test case. This test (#30) also exposed a processing gap for names (an author may have "given_name" and "surname", but no "name"); this bug has been fixed, too. | | | | |
| * | do not normalize "en dash" in DOI | Martin Czygan | 2020-01-17 | 1 | -1/+1 | 
| | Technically, [...] DOI names may incorporate any printable characters from the Universal Character Set (UCS-2), of ISO/IEC 10646, which is the character set defined by Unicode (https://www.doi.org/doi_handbook/2_Numbering.html#2.5.1). Mostly for QA reasons, we currently treat a DOI containing an "en dash" as invalid. | | | | |
| * | ingest: improve tests, support old ingest results | Bryan Newbold | 2020-01-15 | 2 | -1/+2 | 
| | | |||||
| * | datacite: ignore known unknown values in resourceType* | Martin Czygan | 2020-01-09 | 2 | -0/+94 | 
| | | |||||
| * | datacite: abstracts may be strings or list of strings | Martin Czygan | 2020-01-09 | 4 | -0/+186 | 
| | | |||||
| * | datacite: improve license_slug handling | Martin Czygan | 2020-01-09 | 2 | -1/+3 | 
| | | |||||
| * | datacite: add 'Unknown' to blacklist | Martin Czygan | 2020-01-09 | 1 | -7/+1 | 
| | | |||||
| * | datacite: get rid of schemaVersion | Martin Czygan | 2020-01-09 | 17 | -32/+14 | 
| | | |||||
| * | datacite: reformat test cases and use jq . --sort-keys | Martin Czygan | 2020-01-08 | 54 | -2299/+2301 | 
| | | |||||
| * | datacite: factor out contributor handling | Martin Czygan | 2020-01-08 | 4 | -0/+105 | 
| | Use values from attributes.creators[] and attributes.contributors[]. | | | | |
| * | datacite: adjust tests for release_month | Martin Czygan | 2020-01-08 | 12 | -12/+12 | 
| | | |||||
| * | datacite: mark additional files as stub | Martin Czygan | 2020-01-08 | 2 | -0/+72 | 
| | | |||||
| * | datacite: CCDC are entries, mostly | Martin Czygan | 2020-01-08 | 1 | -1/+1 | 
| | | |||||
| * | datacite: adding datacite-specific extra metadata | Martin Czygan | 2020-01-07 | 30 | -1468/+1570 | 
| | Fields: attributes.metadataVersion, attributes.schemaVersion, attributes.version (source-dependent values; follows the suggestions in https://schema.datacite.org/meta/kernel-4.3/doc/DataCite-MetadataKernel_v4.3.pdf#page=26, but values vary). Furthermore: attributes.types.resourceTypeGeneral, attributes.types.resourceType. | | | | |
| * | datacite: month field should be top-level | Martin Czygan | 2020-01-06 | 11 | -14/+14 | 
| | | |||||
| * | datacite: include month in extra | Martin Czygan | 2020-01-06 | 11 | -11/+13 | 
| | "include release_month as a top-level extra field [...] to auto-populate the schema field from that" | | | | |
| * | datacite: clean abstracts, use unknown value tokens | Martin Czygan | 2020-01-06 | 3 | -3/+3 | 
| | Datacite defines placeholders for unknown values (see https://support.datacite.org/docs/schema-values-unknown-information-v43). Clean abstracts. | | | | |
| * | datacite: always include "datacite" key in extra | Martin Czygan | 2020-01-04 | 14 | -26/+26 | 
| | "always include extra values for the respective DOI registrars (datacite, crossref, jalc), even if they are empty ({}), to be used as a flag so we know which DOI registrar supplied the metadata." | | | | |
| * | datacite: remove --lang-detect flag | Martin Czygan | 2020-01-03 | 5 | -10/+15 | 
| | Estimated time for a single language-detection call is on the order of 50 ms. | | | | |
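
The 2021-06-11 datacite fix guards against a partial title entry, such as `{'lang': 'da'}`, appearing first in the list. A minimal Python sketch of that kind of defensive access follows; `first_title` is a hypothetical helper, and the input shape (a list of dicts with an optional `title` key) is an assumption based on the example in the log, not fatcat's actual code.

```python
from typing import Any, Optional


def first_title(titles: Any) -> Optional[str]:
    """Return the first usable title string from a DataCite-style titles list.

    Hypothetical helper: entries may be partial (e.g. {'lang': 'da'}) and carry
    no 'title' key at all, so the first entry cannot be trusted blindly.
    """
    if not isinstance(titles, list):
        return None
    for entry in titles:
        if not isinstance(entry, dict):
            continue
        title = entry.get("title")
        # Skip entries with a missing, non-string, or blank title.
        if isinstance(title, str) and title.strip():
            return title.strip()
    return None
```

For example, `first_title([{'lang': 'da'}, {'title': 'Et eksempel'}])` skips the language-only entry and returns the second entry's title.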
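
The arabesque sqlite3 examples were fixed to use 14-digit timestamps. Assuming these are Wayback-style `YYYYMMDDhhmmss` strings (an assumption; the log does not spell out the format), a sketch for producing and checking them; both function names are hypothetical:

```python
import datetime
import re


def to_timestamp14(dt: datetime.datetime) -> str:
    """Format a datetime as a 14-digit YYYYMMDDhhmmss string (assumed format)."""
    return dt.strftime("%Y%m%d%H%M%S")


def is_timestamp14(value: str) -> bool:
    """True only for exactly 14 ASCII digits."""
    return re.fullmatch(r"[0-9]{14}", value) is not None
```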
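
Several datacite entries deal with contributor names: a missing surname should be None rather than the empty string (2021-04-02), a whitespace-only name must not produce a contrib without a raw name (2020-04-20), and an author may have "given_name" and "surname" but no "name" (2020-01-30). A hedged sketch combining these rules; `clean_name_part` and `raw_name` are hypothetical helpers, not the importer's real functions.

```python
from typing import Optional


def clean_name_part(value: Optional[str]) -> Optional[str]:
    """Return a stripped name part, mapping missing or blank values to None (never '')."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def raw_name(name: Optional[str] = None,
             given_name: Optional[str] = None,
             surname: Optional[str] = None) -> Optional[str]:
    """Build a raw name, falling back to "given_name surname" when "name" is absent.

    Returns None when nothing usable remains, so the caller can skip the contrib.
    """
    full = clean_name_part(name)
    if full:
        return full
    parts = [p for p in (clean_name_part(given_name), clean_name_part(surname)) if p]
    return " ".join(parts) or None
```

A caller would then drop any contributor for which `raw_name(...)` returns None, which corresponds to the "final check" described in the 2020-04-20 entry.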
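
The doaj update fix replaces `__dict__` with `getattr`. The log does not say which attributes were affected; one common way `__dict__` goes wrong, sketched here as an assumption, is a model class that stores values in underscore-prefixed attributes behind properties (as code-generated API clients often do), so the public field names never appear in `__dict__`. `ReleaseStub` and `changed_fields` are hypothetical.

```python
from typing import Any, List, Optional


class ReleaseStub:
    """Hypothetical model: the value lives in `_title` and is exposed via a property."""

    def __init__(self, title: Optional[str] = None) -> None:
        self._title = title

    @property
    def title(self) -> Optional[str]:
        return self._title


def changed_fields(existing: Any, updated: Any, fields: List[str]) -> List[str]:
    """List fields whose values differ, reading them through getattr()."""
    # existing.__dict__.get("title") would be None on both objects (only
    # "_title" is stored there), so a change would never be detected.
    return [f for f in fields if getattr(existing, f, None) != getattr(updated, f, None)]
```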
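
For the duplicated contributor issue (2020-07-07), the log only says "Use string comparison". A minimal sketch of one such approach, assuming duplicates are detected by comparing stripped raw-name and role strings rather than whole contributor objects; `dedupe_contribs` and the dict keys are hypothetical:

```python
from typing import Dict, List, Optional

Contrib = Dict[str, Optional[str]]


def dedupe_contribs(contribs: List[Contrib]) -> List[Contrib]:
    """Keep the first contributor for each (raw_name, role) string pair, preserving order."""
    seen = set()
    result = []
    for contrib in contribs:
        key = (
            (contrib.get("raw_name") or "").strip(),
            (contrib.get("role") or "").strip(),
        )
        if key in seen:
            continue
        seen.add(key)
        result.append(contrib)
    return result
```

Keying on strings makes near-identical records (here, differing only in trailing whitespace) collapse, while the same person in a different role is kept.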
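
Three datacite entries touch the same field shapes: descriptions may be a string, a list, or an int (2020-04-22), abstracts may be strings or lists of strings (2020-01-09), and DataCite defines placeholder tokens for unknown values (2020-01-06). A sketch of a normalizer that accepts all of these shapes and drops blank and placeholder values. The token set is an assumption based on DataCite's "unknown information" guidance and should be checked against the linked page; `description_strings` is hypothetical.

```python
from typing import Any, List

# Assumed set of DataCite "unknown information" tokens; verify against the linked page.
UNKNOWN_VALUES = {
    "(:unac)", "(:unal)", "(:unap)", "(:unas)", "(:unav)",
    "(:unkn)", "(:none)", "(:null)", "(:tba)", "(:etal)",
}


def description_strings(value: Any) -> List[str]:
    """Normalize a description/abstract value into a list of non-empty strings.

    Accepts a str, an int, or a list of those; other shapes yield [].
    Blank strings and unknown-value placeholders are dropped.
    """
    candidates = value if isinstance(value, list) else [value]
    result = []
    for item in candidates:
        # bool is a subclass of int; exclude it explicitly.
        if isinstance(item, bool) or not isinstance(item, (str, int)):
            continue
        text = str(item).strip()
        if text and text.lower() not in UNKNOWN_VALUES:
            result.append(text)
    return result
```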
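
The micropublication.org fix adds "attributes.published" and "attributes.publicationYear" as fallbacks when no finer-grained date is present. A hedged sketch of that fallback order; it assumes the finer-grained dates (written "attributes.date" in the log) arrive as a `dates` list of `{"date": ...}` objects, and `parse_date` and `release_date` are hypothetical names.

```python
import datetime
import re
from typing import Any, Dict, Optional, Tuple

DateYear = Tuple[Optional[datetime.date], Optional[int]]


def parse_date(value: Any) -> DateYear:
    """Parse "YYYY-MM-DD..." or "YYYY..." (str or int) into (date, year)."""
    if isinstance(value, int) and not isinstance(value, bool):
        value = str(value)
    if not isinstance(value, str):
        return None, None
    value = value.strip()
    try:
        day = datetime.date.fromisoformat(value[:10])
        return day, day.year
    except ValueError:
        pass
    # Fall back to a bare four-digit year prefix.
    if re.match(r"[0-9]{4}", value):
        return None, int(value[:4])
    return None, None


def release_date(attributes: Dict[str, Any]) -> DateYear:
    """Prefer the finer-grained dates list, then "published", then "publicationYear"."""
    for entry in attributes.get("dates") or []:
        if isinstance(entry, dict):
            day, year = parse_date(entry.get("date"))
            if year is not None:
                return day, year
    for key in ("published", "publicationYear"):
        day, year = parse_date(attributes.get(key))
        if year is not None:
            return day, year
    return None, None
```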
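
The 2020-01-17 entry leaves en dashes in DOIs un-normalized and treats such DOIs as invalid. A minimal sketch of that QA rule; the prefix stripping, the lowercasing, and the `10.NNNN/suffix` pattern are assumptions made for illustration, and `clean_doi` is hypothetical.

```python
import re
from typing import Optional

# Assumed, deliberately simple DOI shape: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_RE = re.compile(r"10\.[0-9]{4,9}/\S+")
EN_DASH = "\u2013"


def clean_doi(raw: str) -> Optional[str]:
    """Return a lowercased DOI, or None if it is malformed or contains an en dash."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Do not rewrite the en dash to "-": reject the whole DOI instead.
    if EN_DASH in doi:
        return None
    return doi if DOI_RE.fullmatch(doi) else None
```

Rejecting rather than normalizing avoids silently mapping one DOI onto a different, possibly existing, DOI that uses an ASCII hyphen.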
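
The 2020-01-04 entry always writes a registrar key into extra, even when it is empty, so that the key itself flags which DOI registrar supplied the metadata. A sketch of that convention; `build_extra` and the choice of which values count as empty are assumptions.

```python
from typing import Any, Dict

# Assumed definition of "empty" values to omit from the registrar sub-dict.
EMPTY_VALUES = (None, "", [], {})


def build_extra(registrar: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Return an extra dict that always contains the registrar key, possibly as {}."""
    cleaned = {k: v for k, v in fields.items() if v not in EMPTY_VALUES}
    return {registrar: cleaned}
```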
