<feed xmlns='http://www.w3.org/2005/Atom'>
<title>fatcat/python/fatcat_tools/importers, branch v0.4.0</title>
<id>https://git.bnewbold.net/fatcat/atom?h=v0.4.0</id>
<link rel='self' href='https://git.bnewbold.net/fatcat/atom?h=v0.4.0'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/'/>
<updated>2021-10-13T23:21:31Z</updated>
<entry>
<title>dblp import: basic support for handles as identifiers</title>
<updated>2021-10-13T23:21:31Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-10-13T22:53:32Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=2d755c83895271ad214dcefc234bf7da36e572e3'/>
<id>urn:sha1:2d755c83895271ad214dcefc234bf7da36e572e3</id>
<content type='text'>
</content>
</entry>
<entry>
<title>dblp import: fix typos in identifier parsing</title>
<updated>2021-10-13T23:21:31Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-10-13T22:39:05Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=e3f892877222309db1c98009d766c658bcb913bb'/>
<id>urn:sha1:e3f892877222309db1c98009d766c658bcb913bb</id>
<content type='text'>
</content>
</entry>
<entry>
<title>python: partial importer utilization of new schema changes</title>
<updated>2021-10-13T23:21:31Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-10-13T03:05:57Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=3052b094f2b3c1183abc17c9ca158eb6a8808a42'/>
<id>urn:sha1:3052b094f2b3c1183abc17c9ca158eb6a8808a42</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Merge branch 'bnewbold-ingest-tweaks' into 'master'</title>
<updated>2021-10-02T01:22:40Z</updated>
<author>
<name>bnewbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-10-02T01:22:40Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=571502e21ccfb9f3cae5b0a8f8706f9ce99a08fe'/>
<id>urn:sha1:571502e21ccfb9f3cae5b0a8f8706f9ce99a08fe</id>
<content type='text'>
ingest importer behavior tweaks

See merge request webgroup/fatcat!120</content>
</entry>
<entry>
<title>kafka import: optional 'force-flush' mode for some importers</title>
<updated>2021-10-02T00:39:43Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-10-02T00:39:40Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=b72c18e3518e827bd09044deaadcbf0b0ca50335'/>
<id>urn:sha1:b72c18e3518e827bd09044deaadcbf0b0ca50335</id>
<content type='text'>
Behavior and motivation are described in the kafka json import comment.
</content>
</entry>
<entry>
<title>new SPN web (html) importer</title>
<updated>2021-10-02T00:33:42Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-10-02T00:33:42Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=9618d5146eea046342b69895e68b937a056d2816'/>
<id>urn:sha1:9618d5146eea046342b69895e68b937a056d2816</id>
<content type='text'>
</content>
</entry>
<entry>
<title>ingest importer behavior tweaks</title>
<updated>2021-10-01T22:11:40Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-10-01T22:11:38Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=6e0736cebcb2b1e5ddbae03127572ad9d1ffca49'/>
<id>urn:sha1:6e0736cebcb2b1e5ddbae03127572ad9d1ffca49</id>
<content type='text'>
- change order of 'want()' checks, so that result counts are clearer
- don't require GROBID success for file imports with SPN
</content>
</entry>
<entry>
<title>importer common: more verbose logging (with counts)</title>
<updated>2021-10-01T22:07:20Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-10-01T22:07:20Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=dd219464cfc90b9b469fd851b48b08668ff17ba8'/>
<id>urn:sha1:dd219464cfc90b9b469fd851b48b08668ff17ba8</id>
<content type='text'>
</content>
</entry>
<entry>
<title>datacite: skip empty abstracts</title>
<updated>2021-10-01T14:56:59Z</updated>
<author>
<name>Martin Czygan</name>
<email>martin.czygan@gmail.com</email>
</author>
<published>2021-10-01T14:56:59Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=bdc4347acbbdb9f58b7c3abc2578a488de3d0a85'/>
<id>urn:sha1:bdc4347acbbdb9f58b7c3abc2578a488de3d0a85</id>
<content type='text'>
Do not add abstracts where `clean` results in the empty string, since an
empty abstract violates a constraint: `either abstract_sha1 or content is required`
</content>
</entry>
<entry>
<title>more consistent and defensive lower-casing of DOIs</title>
<updated>2021-06-24T00:51:15Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-06-17T23:26:50Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=fa11747574f086e99459914f93d24bad7a8eacce'/>
<id>urn:sha1:fa11747574f086e99459914f93d24bad7a8eacce</id>
<content type='text'>
Done after noticing more upper/lower-case ambiguity in production. In
particular, some old ingest requests in the sandcrawler DB, which get
re-submitted/re-tried, have capitalized DOIs in the link source id
field.
</content>
</entry>
</feed>
