<feed xmlns='http://www.w3.org/2005/Atom'>
<title>fatcat/python/fatcat_tools/importers, branch bnewbold-rust-gen-v5</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/fatcat/atom?h=bnewbold-rust-gen-v5</id>
<link rel='self' href='https://git.bnewbold.net/fatcat/atom?h=bnewbold-rust-gen-v5'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/'/>
<updated>2020-04-22T20:25:36Z</updated>
<entry>
<title>datacite: fix type error</title>
<updated>2020-04-22T20:25:36Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-04-22T20:25:36Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=e0baeade7924019c5bbd27d9a7c116a1e26854fc'/>
<id>urn:sha1:e0baeade7924019c5bbd27d9a7c116a1e26854fc</id>
<content type='text'>
Up to now, we expected the description to be a string or list. Add
handling for int as well.

First appeared: Apr 22 19:58:39.
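
A minimal sketch of the kind of normalization described above. The helper
name description_to_text is hypothetical and is not taken from the importer;
it only illustrates accepting a string, a list, or an int:

```python
def description_to_text(value):
    """Coerce a description value to a stripped string, or None if empty.

    Hypothetical helper for illustration; not the actual importer code.
    """
    if value is None:
        return None
    if isinstance(value, list):
        # Join the non-empty parts of a list-valued description.
        parts = [str(v).strip() for v in value if v is not None]
        text = " ".join(p for p in parts if p)
    else:
        # Covers str as well as int and other scalar values.
        text = str(value).strip()
    return text or None
```

Converting through str() means an int no longer hits a string-only code
path; whether the real fix converts or skips such values is not stated here.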
</content>
</entry>
<entry>
<title>datacite: fix a raw name constraint violation</title>
<updated>2020-04-20T18:52:10Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2020-04-20T18:52:10Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=7c6febf20c84dd4f5778e1fb02369456f7dad344'/>
<id>urn:sha1:7c6febf20c84dd4f5778e1fb02369456f7dad344</id>
<content type='text'>
It was possible that contribs got added which had no raw name. One
example would be a name consisting of whitespace only.

This fix adds a final check for this case.
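
A rough sketch of such a final check, assuming contribs are plain dicts with
a raw_name key. Both the dict shape and the function name are assumptions
made for illustration, not the importer's actual data model:

```python
def drop_contribs_without_raw_name(contribs):
    """Keep only contribs whose raw_name has non-whitespace content.

    Hypothetical helper; assumes each contrib is a dict.
    """
    kept = []
    for contrib in contribs:
        raw_name = contrib.get("raw_name")
        # Reject missing, None, empty, and whitespace-only names.
        if isinstance(raw_name, str) and raw_name.strip():
            kept.append(contrib)
    return kept
```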
</content>
</entry>
<entry>
<title>consistently use raw string prefix for regex</title>
<updated>2020-04-17T17:56:17Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-04-16T04:35:48Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=116a26f072e8628cc4cfabb2e55c6661b6b94605'/>
<id>urn:sha1:116a26f072e8628cc4cfabb2e55c6661b6b94605</id>
<content type='text'>
</content>
</entry>
<entry>
<title>pubmed: use untranslated title if translated not available</title>
<updated>2020-04-01T19:02:45Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-04-01T19:02:43Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=938d2c5366d80618b839c83baadc9b5c62d10dce'/>
<id>urn:sha1:938d2c5366d80618b839c83baadc9b5c62d10dce</id>
<content type='text'>
The primary motivation for this change is that fatcat *requires* a
non-empty title for each release entity. Pubmed/Medline occasionally
indexes just a VernacularTitle with no ArticleTitle for foreign
publications, and currently those records don't end up in fatcat at all.
</content>
</entry>
<entry>
<title>importers: replace newlines in get_text() strings</title>
<updated>2020-04-01T19:02:20Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-04-01T19:02:20Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=f77a553350238c8ccc9c3bc0edcf47fb9dd067b3'/>
<id>urn:sha1:f77a553350238c8ccc9c3bc0edcf47fb9dd067b3</id>
<content type='text'>
</content>
</entry>
<entry>
<title>importers: more string/get_text swaps</title>
<updated>2020-03-29T03:12:58Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-03-29T03:12:54Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=6681500eeffe39b7d029a0e0d6b2ed83729f555f'/>
<id>urn:sha1:6681500eeffe39b7d029a0e0d6b2ed83729f555f</id>
<content type='text'>
See previous pubmed commit for details.
</content>
</entry>
<entry>
<title>pubmed: bunch of .get_text() instead of .string</title>
<updated>2020-03-29T03:01:48Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-03-29T03:01:46Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=d6af7b7544ddb3b5e7b1f4a0fd76bd9cd5ed9125'/>
<id>urn:sha1:d6af7b7544ddb3b5e7b1f4a0fd76bd9cd5ed9125</id>
<content type='text'>
Yikes! Apparently when a tag contains more than one child (for example,
text mixed with child tags), .string returns None instead of all the
strings. .get_text() returns all of it:

  https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
  https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string

I've left things like identifiers as .string, where we expect only a single
string inside.
</content>
</entry>
<entry>
<title>Merge pull request #53 from EdwardBetts/spelling</title>
<updated>2020-03-27T23:50:08Z</updated>
<author>
<name>bnewbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-03-27T23:50:08Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=98abe2e751187aa7c2e751b355ffb56d9b1f8c6a'/>
<id>urn:sha1:98abe2e751187aa7c2e751b355ffb56d9b1f8c6a</id>
<content type='text'>
Correct spelling mistakes</content>
</entry>
<entry>
<title>Correct spelling mistakes</title>
<updated>2020-03-27T21:25:54Z</updated>
<author>
<name>Edward Betts</name>
<email>edward@4angle.com</email>
</author>
<published>2020-03-27T21:25:54Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=94710b2803780ab16fb30b79010f8e27cf115512'/>
<id>urn:sha1:94710b2803780ab16fb30b79010f8e27cf115512</id>
<content type='text'>
</content>
</entry>
<entry>
<title>datacite: nameIdentifier corner case</title>
<updated>2020-03-26T21:09:15Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2020-03-26T20:58:32Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=ec82404f0d0ad6b92491a1cb90a823d421857348'/>
<id>urn:sha1:ec82404f0d0ad6b92491a1cb90a823d421857348</id>
<content type='text'>
Works around a bug in production:

  AttributeError: 'NoneType' object has no attribute 'replace'
  (datacite.py:724)

NOTE: there are no tests for this code path
</content>
</entry>
</feed>
