Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | datacite: adding datacite-specific extra metadata | Martin Czygan | 2020-01-07 | 30 | -1468/+1570 |
| | | | | | | | | | | | | | attributes.metadataVersion, attributes.schemaVersion, attributes.version (source-dependent values; follows suggestions in https://schema.datacite.org/meta/kernel-4.3/doc/DataCite-MetadataKernel_v4.3.pdf#page=26, but values vary). Furthermore: attributes.types.resourceTypeGeneral, attributes.types.resourceType | ||||
* | datacite: month field should be top-level | Martin Czygan | 2020-01-06 | 11 | -14/+14 |
| | |||||
* | datacite: include month in extra | Martin Czygan | 2020-01-06 | 11 | -11/+13 |
| | | | | | > include release_month as a top-level extra field [...] to auto-populate the schema field from that | ||||
* | datacite: indicate mismatched file in test | Martin Czygan | 2020-01-06 | 1 | -1/+1 |
| | |||||
* | datacite: clean abstracts, use unknown value tokens | Martin Czygan | 2020-01-06 | 3 | -3/+3 |
| | | | | | | | | Datacite defines placeholders for unknown values: https://support.datacite.org/docs/schema-values-unknown-information-v43. Clean abstracts. | ||||
* | datacite: always include "datacite" key in extra | Martin Czygan | 2020-01-04 | 14 | -26/+26 |
| | | | | | | > always include extra values for the respective DOI registrars (datacite, crossref, jalc), even if they are empty ({}), to be used as a flag so we know which DOI registrar supplied the metadata. | ||||
* | datacite: use normal.clean_doi | Martin Czygan | 2020-01-03 | 1 | -4/+0 |
| | |||||
* | datacite: parse_datacite_dates returns month | Martin Czygan | 2020-01-03 | 1 | -7/+16 |
| | | | | As [...] we will soon add support for release_month field in the release schema. | ||||
* | datacite: prepare release_month (stub) | Martin Czygan | 2020-01-03 | 1 | -14/+14 |
| | |||||
* | datacite: remove --lang-detect flag | Martin Czygan | 2020-01-03 | 5 | -10/+15 |
| | | | | Estimated time for a single call is on the order of 50 ms. | ||||
* | datacite: add another test case | Martin Czygan | 2020-01-02 | 3 | -1/+71 |
| | |||||
* | datacite: open case for editing after creation | Martin Czygan | 2020-01-02 | 1 | -0/+2 |
| | |||||
* | datacite: add helper script to create new test case | Martin Czygan | 2020-01-02 | 1 | -0/+14 |
| | |||||
* | datacite: address raw_name index form comment | Martin Czygan | 2020-01-02 | 20 | -112/+128 |
| | | | | | | | | | > The convention for display_name and raw_name is to be how the name would normally be printed, not in index form (surname comma given_name). So we might need to un-encode names like "Tricart, Pierre". Use an additional `index_form_to_display_name` function to convert index form to display form, heuristically. | ||||
* | datacite: add conversion fixtures | Martin Czygan | 2020-01-02 | 50 | -1/+3949 |
| | | | | | | | | | | | | | The `test_datacite_conversions` function will compare an input (datacite) document to an expected output (release entity as JSON). This way, it should not be too hard to add more cases by adding: input, output - and by increasing the counter in the range loop within the test. To view input and result side by side with vim, change into the test directory and run: tests/files/datacite $ ./caseview.sh 18 | ||||
* | datacite: adjust tests | Martin Czygan | 2019-12-28 | 1 | -2/+1 |
| | |||||
* | address first round of MR14 comments | Martin Czygan | 2019-12-28 | 1 | -2/+176 |
| | | | | | | | | | | | | | * add missing langdetect * use entity_to_dict for json debug output * factor out code for fields in function and add table driven tests * update citeproc types * add author as default role * add raw_affiliation * include relations from datacite * remove url (covered by doi already) Using yapf for python formatting. | ||||
* | improve datacite field mapping and import | Martin Czygan | 2019-12-28 | 3 | -17/+92 |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current version successfully imported a random sample of 100000 records (0.5%) from datacite. The --debug (write JSON to stdout) and --insert-log-file (log batch before committing to db) flags are temporarily added to help debugging. Add a few unit tests. Some edge cases: a) Existing keys without a value require a slightly awkward: ``` titles = attributes.get('titles', []) or [] ``` b) There can be 0, 1, or more (first one wins) titles. c) Date handling is probably not ideal. Datacite has a potentially fine-grained list of dates. The test case (tests/files/datacite_sample.jsonl) refers to https://ssl.fao.org/glis/doi/10.18730/8DYM9, which has date (main descriptor) 1986. The datacite record contains: 2017 (publicationYear, probably the year of record creation with reference system), 1978-06-03 (collected, e.g. experimental sample), 1986 ("Accepted"). The online version of the resource lists one more date (2019-06-05 10:14:43 by WIEWS update). | ||||
* | datacite: importer skeleton | Martin Czygan | 2019-12-28 | 1 | -0/+25 |
| | | | | | | * contributors, title, date, publisher, container, license. Field and value analysis via https://github.com/miku/indigo. | ||||
* | datacite: fix harvest test | Martin Czygan | 2019-12-27 | 1 | -1/+1 |
| | | | | | | Produced messages should match: jq '.data|length' tests/files/datacite_api.json | ||||
* | datacite: add simple test and fixture for datacite api interaction | Martin Czygan | 2019-12-27 | 2 | -0/+46 |
| | |||||
* | add regression test for medlinedate -> year parsing | Bryan Newbold | 2019-12-23 | 2 | -0/+102 |
| | |||||
* | regression test for deleted entity history view | Bryan Newbold | 2019-12-09 | 1 | -0/+25 |
| | |||||
* | add basic test for crossref harvest API call | Bryan Newbold | 2019-12-06 | 2 | -0/+46 |
| | |||||
* | add regression test for upper-case SHA-1 form submit | Bryan Newbold | 2019-12-02 | 1 | -0/+10 |
| | |||||
* | ingest file result importer | Bryan Newbold | 2019-11-15 | 2 | -0/+59 |
| | |||||
* | test for ingest transform | Bryan Newbold | 2019-11-15 | 1 | -0/+57 |
| | |||||
* | add ingest request transform (and test) | Bryan Newbold | 2019-11-15 | 1 | -1/+1 |
| | |||||
* | Merge branch 'martin-search-results-pagination' into 'master' | Martin Czygan | 2019-11-15 | 1 | -2/+3 |
|\ | | | | | | | | | Add basic pagination to search results. See merge request webgroup/fatcat!4 | ||||
| * | address test issue | Martin Czygan | 2019-11-15 | 1 | -2/+3 |
| | | |||||
| * | adjust search test case for new wording | Martin Czygan | 2019-11-14 | 1 | -2/+2 |
| | | | | | | | | > "Showing top " -> "Showing first " | ||||
* | | fix crossref component test | Bryan Newbold | 2019-11-04 | 1 | -1/+1 |
|/ | |||||
* | commit file cleaner tests | Bryan Newbold | 2019-10-08 | 1 | -0/+58 |
| | |||||
* | redirect direct entity underscore links | Bryan Newbold | 2019-10-03 | 1 | -0/+2 |
| | |||||
* | python webface impl token generation | Bryan Newbold | 2019-09-18 | 1 | -0/+8 |
| | |||||
* | skip test_crossref_importer_huge() by default | Bryan Newbold | 2019-09-13 | 1 | -0/+1 |
| | |||||
* | refactor all python source for client lib name | Bryan Newbold | 2019-09-05 | 26 | -74/+74 |
| | |||||
* | add kbart counts to container stats | Bryan Newbold | 2019-07-31 | 1 | -0/+1 |
| | |||||
* | complete generic entity rev views | Bryan Newbold | 2019-06-28 | 1 | -8/+46 |
| | | | | | | Was getting 500s in production from crawlers. Also expand test coverage. | ||||
* | release elasticsearch results: stage not status | Bryan Newbold | 2019-06-13 | 1 | -1/+1 |
| | |||||
* | start adding some new web route tests | Bryan Newbold | 2019-06-13 | 1 | -0/+6 |
| | |||||
* | update tests for lookup views | Bryan Newbold | 2019-06-05 | 1 | -3/+3 |
| | |||||
* | release lookup view | Bryan Newbold | 2019-06-05 | 1 | -1/+1 |
| | |||||
* | tweak JALC tests for english swaperoo | Bryan Newbold | 2019-05-29 | 1 | -2/+2 |
| | |||||
* | faster LargeFile XML importer for PubMed | Bryan Newbold | 2019-05-29 | 1 | -3/+3 |
| | |||||
* | set superceded flag on 'old' arxiv releases | Bryan Newbold | 2019-05-23 | 1 | -0/+3 |
| | |||||
* | count linked refs (not just raw refs) in elasticsearch | Bryan Newbold | 2019-05-22 | 1 | -0/+6 |
| | |||||
* | arxiv license slug shorter; fix test | Bryan Newbold | 2019-05-22 | 1 | -2/+2 |
| | |||||
* | more JALC importer polish | Bryan Newbold | 2019-05-21 | 1 | -2/+29 |
| | |||||
* | JALC bulk file importer | Bryan Newbold | 2019-05-21 | 1 | -0/+100 |
| |
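
Two of the commit notes above can be sketched in a few lines: the `attributes.get('titles', []) or []` idiom combined with "first one wins" for titles, and the conversion of index-form names such as "Tricart, Pierre" to display form. This is a minimal illustration under stated assumptions, not the code from the repository: `first_title` is a hypothetical helper name, and the body of `index_form_to_display_name` is an assumed heuristic. Only that function's name and purpose come from the commit message.

```python
def first_title(attributes):
    """Return the first non-empty title, or None (hypothetical helper).

    A key may be present with a null value, in which case
    `.get('titles', [])` returns None rather than the default;
    the trailing `or []` normalizes that case to an empty list.
    """
    titles = attributes.get('titles', []) or []
    for entry in titles:
        # Entries themselves may be null or lack a 'title' key.
        title = ((entry or {}).get('title') or '').strip()
        if title:
            return title  # first one wins
    return None


def index_form_to_display_name(name):
    """Heuristically convert "Surname, Given" to "Given Surname".

    Assumed heuristic: only names with exactly one comma and two
    non-empty parts are rewritten; everything else is returned as-is.
    """
    if name.count(',') != 1:
        return name
    surname, given = (part.strip() for part in name.split(','))
    if not surname or not given:
        return name
    return '{} {}'.format(given, surname)
```

Leaving ambiguous inputs untouched (no comma, or more than one) keeps the heuristic conservative, at the cost of not handling suffixes such as "Jr.".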