From 4953fec93338084adfc37ec40cfed2bc242744f3 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Fri, 17 Apr 2020 10:47:57 -0700
Subject: retro-active v0.3.2 changelog updates

---
 CHANGELOG.md                  | 34 ++++++++++++++++++++++++++++++++--
 notes/bulk_edits/CHANGELOG.md |  9 +++++++++
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1a669e5f..287f651d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,15 +16,45 @@ See also:
 
 ## [Unreleased]
 
-## Changed
+## [0.3.2] - 2020-04-08
 
+This release was tagged retro-actively; it was the last commit before upgrading
+to Python 3.7.
+
+Many small changes and tweaks to importers, web interface, etc were made in
+this release.
+
+### Fixed
+
+- pubmed importer `text` vs. `get_text()` for HTML tags
+
+### Changed
+
+- minimum rust version now 1.36
 - Switch from swagger-codegen to openapi-generator for python client generation
+- switch python Kafka code from pykafka to confluent-kafka
+- update release and container elasticsearch schemas to v03b. Release search is
+  now over "biblio" field, allowing matches on multiple fields at the same time
+- Crossref harvester using 'update-date' not 'index-date' to detect updated documents
+
+### Removed
+
+- OpenSSL support removed from fatcatd (Rust)
 
-## Added
+### Added
 
 - webface endpoints for entity view URLs with an underscore instead of slash,
   as a redirect. Eg, `https://fatcat.wiki/release_asdf` =>
   `https://fatcat.wiki/release/asdf`. A hack to make copy/paste easier.
+- pagination of search results in web interface
+- sandcrawler daily crawling pipeline, including ingest-file importer and
+  publishing requests to sandcrawler kafka topic
+- "Save Paper Now" feature (using sandcrawler pipeline)
+- Datacite DOI registrar daily harvesting and importing
+- Arxiv daily harvesting, using OAI-PMH worker
+- Pubmed daily harvesting, using FTP worker
+- "file" entity elasticsearch schema (though pipeline not yet running
+  continuously)
 
 ## [0.3.1] - 2019-09-18
 
diff --git a/notes/bulk_edits/CHANGELOG.md b/notes/bulk_edits/CHANGELOG.md
index 172528da..be53d10c 100644
--- a/notes/bulk_edits/CHANGELOG.md
+++ b/notes/bulk_edits/CHANGELOG.md
@@ -9,6 +9,13 @@ this file should probably get merged into the guide at some point.
 
 This file should not turn in to a TODO list!
 
+## 2020-03
+
+Started harvesting both Arxiv and Pubmed metadata daily and importing to
+fatcat. Did backfill imports for both sources.
+
+JALC DOI register update from 2019 dump.
+
 ## 2020-01
 
 Imported around 2,500 new containers (journals, by ISSN-L) from chocula
@@ -21,6 +28,8 @@ Imported new release entities from 2020 Pubmed/MEDLINE baseline. This import
 included only new Pubmed works cataloged in 2019 (up until December or so).
 Only a few hundred thousand new release entities.
 
+Daily "ingest" (crawling) pipeline running.
+
 ## 2019-12
 
 Started continuous harvesting Datacite DOI metadata; first date harvested was
-- 
cgit v1.2.3