<feed xmlns='http://www.w3.org/2005/Atom'>
<title>fatcat/python/tests/files, branch v0.3.2</title>
<id>https://git.bnewbold.net/fatcat/atom?h=v0.3.2</id>
<link rel='self' href='https://git.bnewbold.net/fatcat/atom?h=v0.3.2'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/'/>
<updated>2020-03-20T20:00:52+00:00</updated>
<entry>
<title>pubmed: handle multiple ReferenceList</title>
<updated>2020-03-20T20:00:52+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-03-20T20:00:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=a6f74183dd1cf1eaa44f7edeb98dbc5dc737dabb'/>
<id>urn:sha1:a6f74183dd1cf1eaa44f7edeb98dbc5dc737dabb</id>
<content type='text'>
This fixes a problem noticed in production: only a single reference per
article was being imported/updated.

Includes a regression test.
</content>
</entry>
<entry>
<title>Merge branch 'martin-kafka-bs4-import' into 'master'</title>
<updated>2020-03-10T15:33:17+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin@archive.org</email>
</author>
<published>2020-03-10T15:33:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=336630e1d445fb9d233447f9af4bac94473a12bf'/>
<id>urn:sha1:336630e1d445fb9d233447f9af4bac94473a12bf</id>
<content type='text'>
pubmed and arxiv harvest preparations

See merge request webgroup/fatcat!28</content>
</entry>
<entry>
<title>Merge branch 'bnewbold-elastic-v03b'</title>
<updated>2020-02-27T06:05:43+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-02-27T06:05:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=ae50ee2274031ddc178fa4a10b59280e8440a24c'/>
<id>urn:sha1:ae50ee2274031ddc178fa4a10b59280e8440a24c</id>
<content type='text'>
</content>
</entry>
<entry>
<title>more pubmed adjustments</title>
<updated>2020-02-22T16:44:38+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-02-19T01:28:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=376053a479a8d683fc5e099d0b0b3cb76c026d16'/>
<id>urn:sha1:376053a479a8d683fc5e099d0b0b3cb76c026d16</id>
<content type='text'>
* regenerate map in continuous mode
* add tests
</content>
</entry>
<entry>
<title>shadow import: more filtering of file_meta fields</title>
<updated>2020-02-14T06:24:20+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-01-30T20:15:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=87029cb13d244381f915fe66e40760477edb5675'/>
<id>urn:sha1:87029cb13d244381f915fe66e40760477edb5675</id>
<content type='text'>
</content>
</entry>
<entry>
<title>basic shadow importer</title>
<updated>2020-02-14T06:24:20+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2019-12-24T01:59:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=e59d1b617d4abd5f002d9e59b6bbaebc9ff30993'/>
<id>urn:sha1:e59d1b617d4abd5f002d9e59b6bbaebc9ff30993</id>
<content type='text'>
</content>
</entry>
<entry>
<title>datacite: add exception for https://www.micropublication.org/</title>
<updated>2020-01-31T00:44:46+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-01-31T00:44:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=a42206d2603e28f1311ac3873dc168c78eabffee'/>
<id>urn:sha1:a42206d2603e28f1311ac3873dc168c78eabffee</id>
<content type='text'>
</content>
</entry>
<entry>
<title>datacite: improve date handling and minor tweak</title>
<updated>2020-01-30T12:36:01+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-01-30T12:36:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=7dec2d1560ebf5ca6d0d337eb246fe345f6ec0bb'/>
<id>urn:sha1:7dec2d1560ebf5ca6d0d337eb246fe345f6ec0bb</id>
<content type='text'>
Records from https://www.micropublication.org/ did not have a date in
fatcat, although the raw data contained date strings: they used
"attributes.published" and/or "attributes.publicationYear" rather than
the finer-grained "attributes.date".

Support for those fields has been added, including a test case.

Testing this (#30) exposed a gap in name processing: an author may have
"given_name" and "surname" but no "name". This bug has been fixed as
well.
</content>
</entry>
<entry>
<title>fix some transform bugs, add some tests</title>
<updated>2020-01-30T05:59:05+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-01-30T05:52:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=5d458a3df7e58e6551d8ec72979e376c62fdd2f7'/>
<id>urn:sha1:5d458a3df7e58e6551d8ec72979e376c62fdd2f7</id>
<content type='text'>
</content>
</entry>
<entry>
<title>do not normalize "en dash" in DOI</title>
<updated>2020-01-17T13:03:00+00:00</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-01-17T13:03:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=53756811572bab0679cb8cee1b9de95e7b29b96a'/>
<id>urn:sha1:53756811572bab0679cb8cee1b9de95e7b29b96a</id>
<content type='text'>
Technically, [...] DOI names may incorporate any printable characters
from the Universal Character Set (UCS-2), of ISO/IEC 10646, which is the
character set defined by Unicode (https://www.doi.org/doi_handbook/2_Numbering.html#2.5.1).

Mostly for QA reasons, we currently treat a DOI containing an "en dash"
as invalid.
</content>
</entry>
</feed>
