-rw-r--r-- CHANGELOG.md 34
-rw-r--r-- notes/bulk_edits/CHANGELOG.md 9
2 files changed, 41 insertions, 2 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1a669e5f..287f651d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,15 +16,45 @@ See also:
## [Unreleased]
-## Changed
+## [0.3.2] - 2020-04-08
+This release was tagged retro-actively; it was the last commit before upgrading
+to Python 3.7.
+
+Many small changes and tweaks to importers, web interface, etc were made in
+this release.
+
+### Fixed
+
+- pubmed importer `text` vs. `get_text()` for HTML tags
+
+### Changed
+
+- minimum rust version now 1.36
- Switch from swagger-codegen to openapi-generator for python client generation
+- switch python Kafka code from pykafka to confluent-kafka
+- update release and container elasticsearch schemas to v03b. Release search is
+ now over "biblio" field, allowing matches on multiple fields at the same time
+- Crossref harvester using 'update-date' not 'index-date' to detect updated documents
+
+### Removed
+
+- OpenSSL support removed from fatcatd (Rust)
-## Added
+### Added
- webface endpoints for entity view URLs with an underscore instead of slash,
as a redirect. Eg, `https://fatcat.wiki/release_asdf` =>
`https://fatcat.wiki/release/asdf`. A hack to make copy/paste easier.
+- pagination of search results in web interface
+- sandcrawler daily crawling pipeline, including ingest-file importer and
+ publishing requests to sandcrawler kafka topic
+- "Save Paper Now" feature (using sandcrawler pipeline)
+- Datacite DOI registrar daily harvesting and importing
+- Arxiv daily harvesting, using OAI-PMH worker
+- Pubmed daily harvesting, using FTP worker
+- "file" entity elasticsearch schema (though pipeline not yet running
+ continuously)
## [0.3.1] - 2019-09-18
diff --git a/notes/bulk_edits/CHANGELOG.md b/notes/bulk_edits/CHANGELOG.md
index 172528da..be53d10c 100644
--- a/notes/bulk_edits/CHANGELOG.md
+++ b/notes/bulk_edits/CHANGELOG.md
@@ -9,6 +9,13 @@ this file should probably get merged into the guide at some point.
This file should not turn in to a TODO list!
+## 2020-03
+
+Started harvesting both Arxiv and Pubmed metadata daily and importing to
+fatcat. Did backfill imports for both sources.
+
+JALC DOI register update from 2019 dump.
+
## 2020-01
Imported around 2,500 new containers (journals, by ISSN-L) from chocula
@@ -21,6 +28,8 @@ Imported new release entities from 2020 Pubmed/MEDLINE baseline. This import
included only new Pubmed works cataloged in 2019 (up until December or so).
Only a few hundred thousand new release entities.
+Daily "ingest" (crawling) pipeline running.
+
## 2019-12
Started continuous harvesting Datacite DOI metadata; first date harvested was