Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | pubmed: ignore empty map during baseline update | Martin Czygan | 2022-12-12 | 1 | -3/+13 |
| | | | | | | | | | | | > NLM produces a baseline set of PubMed citation records in XML format for download on an annual basis. The annual baseline is released in December of each year. -- https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/README.txt Last occurrence: Dec 8, 2022. We do not know the exact release date, but the PubMed docs explicitly state "December", so we ignore the empty map error during that month. | ||||
* | Merge branch 'bnewbold-dblp-iteration' into 'master' | bnewbold | 2022-07-25 | 2 | -2/+9 |
|\ | | | | | | | | | dblp import iteration See merge request webgroup/fatcat!141 | ||||
| * | ingest: generate URLs for hdl (handle.net) | Bryan Newbold | 2022-07-19 | 1 | -0/+4 |
| | | |||||
| * | dblp: more skip patterns, and rename variable | Bryan Newbold | 2022-07-19 | 1 | -2/+5 |
| | | |||||
* | | chocula importer: do update if publisher_type was null | Bryan Newbold | 2022-07-21 | 1 | -0/+3 |
| | | |||||
* | | doaj: fix tests now that container_id is required | Bryan Newbold | 2022-07-19 | 1 | -1/+1 |
| | | |||||
* | | doaj: require container linkage for release import | Bryan Newbold | 2022-07-19 | 1 | -0/+4 |
|/ | |||||
* | ingest: DOAJ article URLs | Bryan Newbold | 2022-07-12 | 1 | -0/+4 |
| | |||||
* | arxiv: work-around hack for strange title | Bryan Newbold | 2022-07-07 | 1 | -0/+8 |
| | |||||
* | fileset ingest: handle missing/partial file-level metadata | Bryan Newbold | 2022-04-05 | 1 | -3/+3 |
| | |||||
* | ingest importer: improved extra/edit_extra code flow | Bryan Newbold | 2022-04-05 | 1 | -20/+13 |
| | |||||
* | fileset ingest: remove a TODO | Bryan Newbold | 2022-04-04 | 1 | -1/+0 |
| | |||||
* | filesets: typo bugfix, and test 'mimetype' on entity, not extra | Bryan Newbold | 2022-04-04 | 1 | -1/+1 |
| | |||||
* | fileset ingest: fix mimetype handling | Bryan Newbold | 2022-03-31 | 1 | -4/+5 |
| | |||||
* | bugfix: logic flow in fileset release checking | Bryan Newbold | 2022-03-23 | 1 | -3/+6 |
| | |||||
* | single-file variant of fileset importer for dataset attempts | Bryan Newbold | 2022-03-23 | 2 | -0/+202 |
| | |||||
* | fix typo in fileset comparison helper | Bryan Newbold | 2022-03-23 | 1 | -1/+1 |
| | |||||
* | ingest fileset fixes, and some test coverage | Bryan Newbold | 2022-03-23 | 2 | -13/+30 |
| | |||||
* | dataset ingest: JSON object fixes | Bryan Newbold | 2022-03-22 | 1 | -5/+5 |
| | |||||
* | Merge branch 'bnewbold-container-web' into 'master' | bnewbold | 2022-03-10 | 3 | -2/+185 |
|\ | | | | | | | | | container web interface improvements See merge request webgroup/fatcat!140 | ||||
| * | move container_status ES query code from fatcat_web to fatcat_tools | Bryan Newbold | 2022-02-09 | 3 | -2/+185 |
| | | | | | | | | | | | | The main motivation is to never have fatcat_tools import from fatcat_web, only vice versa. Some code in fatcat_tools needs container stats, so starting with that code path (plus some generic helpers). | ||||
* | | entity updates: don't try to ingest arxiv DOIs (for now) | Bryan Newbold | 2022-02-28 | 1 | -0/+2 |
| | | |||||
* | | datacite importer: skip container_id for some repository sources | Bryan Newbold | 2022-02-09 | 1 | -0/+34 |
|/ | |||||
* | doaj importer: TODO note to skip some larger publishers | Bryan Newbold | 2022-02-09 | 1 | -0/+4 |
| | |||||
* | container ES transform: include old extra.issne/p fields | Bryan Newbold | 2022-02-03 | 1 | -1/+4 |
| | | | | | These were removed prematurely. Not all containers have been updated to use these fields yet. | ||||
* | Merge branch 'bnewbold-file-es' into 'master' | bnewbold | 2022-01-21 | 3 | -4/+38 |
|\ | | | | | | | | | File entity elasticsearch index worker See merge request webgroup/fatcat!136 | ||||
| * | entity worker: expand creators in release entities | Bryan Newbold | 2021-12-15 | 1 | -1/+1 |
| | | |||||
| * | small default config typo fixes for elasticsearch workers | Bryan Newbold | 2021-12-15 | 1 | -2/+2 |
| | | |||||
| * | file elasticsearch index worker | Bryan Newbold | 2021-12-15 | 2 | -1/+35 |
| | | |||||
* | | crossref importer: skip affiliations lacking 'name' | Bryan Newbold | 2021-12-15 | 1 | -0/+3 |
|/ | | | | Relatedly, we should start handling ROR affiliations in contribs soon. | ||||
* | mergers: fix typo in env var name | Bryan Newbold | 2021-12-07 | 3 | -3/+3 |
| | |||||
* | ES container schema: add 'sim_pubid' and `ia_sim_collection` fields | Bryan Newbold | 2021-12-03 | 1 | -0/+2 |
| | |||||
* | ES transform: remove prototype microfilm links | Bryan Newbold | 2021-12-03 | 1 | -20/+0 |
| | | | | This ended up being a feature in scholar.archive.org, not fatcat. | ||||
* | chocula importer: handle not-upper-case ISSNs | Bryan Newbold | 2021-11-30 | 1 | -2/+6 |
| | |||||
* | chocula importer: handle broken ISSNs in extra metadata | Bryan Newbold | 2021-11-30 | 1 | -2/+7 |
| | |||||
* | chocula importer: tweak counting, conditions for doing updates | Bryan Newbold | 2021-11-30 | 1 | -15/+7 |
| | |||||
* | chocula importer: move issne/issnp 'extra' to top-level fields if doing updates | Bryan Newbold | 2021-11-30 | 1 | -0/+6 |
| | |||||
* | chocula: don't do name cleanups in importer | Bryan Newbold | 2021-11-30 | 1 | -8/+2 |
| | | | | This kind of cleanup should be done in 'chocula' instead. | ||||
* | container merger: fix bug with filtering by release count | Bryan Newbold | 2021-11-30 | 1 | -13/+15 |
| | | | | | Also apply the "human edit" and "release count" checks only to the dupe (to-be-redirected) idents. | ||||
* | release merger: same editgroup_id fixes as for file and container mergers | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
| | |||||
* | container merger: fixes from QA testing | Bryan Newbold | 2021-11-24 | 1 | -8/+13 |
| | |||||
* | mergers: don't try to accept empty editgroups in dry-run-mode | Bryan Newbold | 2021-11-24 | 1 | -2/+4 |
| | |||||
* | ES release transform: handle redirected containers better | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
| | | | | | Despite the inline comment, we were not actually grabbing the "redirected" ident correctly, meaning some counts would not be accurate. | ||||
* | container merger: defer allocation of editgroup_id; and dummy code path | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
| | |||||
* | initial implementation of container merger | Bryan Newbold | 2021-11-24 | 1 | -0/+237 |
| | |||||
* | file merger: allocate editgroup id later in 'merge' process | Bryan Newbold | 2021-11-24 | 1 | -1/+5 |
| | | | | | The motivation is to avoid creating empty editgroups in dry-run mode, and when all entities are "skipped". | ||||
* | Merge branch 'bnewbold-mergers' into 'master' | bnewbold | 2021-11-25 | 4 | -0/+640 |
|\ | | | | | | | | | entity mergers framework See merge request webgroup/fatcat!133 | ||||
| * | mergers common: remove inaccurate comment | Bryan Newbold | 2021-11-24 | 1 | -2/+0 |
| | | | | | | | | Caught in review, thanks miku | ||||
| * | file merger: add content_scope to list of merged fields | Bryan Newbold | 2021-11-24 | 1 | -1/+1 |
| | | |||||
| * | release merger: some progress, but also disable (not complete) | Bryan Newbold | 2021-11-23 | 1 | -12/+72 |
| | |