<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sandcrawler/python/tests/files, branch bnewbold-persist-grobid-errors</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/sandcrawler/atom?h=bnewbold-persist-grobid-errors</id>
<link rel='self' href='https://git.bnewbold.net/sandcrawler/atom?h=bnewbold-persist-grobid-errors'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/'/>
<updated>2020-01-10T00:30:02+00:00</updated>
<entry>
<title>basic elife+plos extraction tests</title>
<updated>2020-01-10T00:30:02+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2020-01-09T02:24:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=7eb8f74bc15d1acb5771320ec4e2342d85077555'/>
<id>urn:sha1:7eb8f74bc15d1acb5771320ec4e2342d85077555</id>
<content type='text'>
Ripped out some HTML, but these could have been minimized even further
to keep the repository from growing large.
</content>
</entry>
<entry>
<title>teixml2json test update for skipping null JSON keys</title>
<updated>2020-01-03T02:12:58+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-12-27T05:33:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=9756f536b24ba13c34d0dcc49153020477cce012'/>
<id>urn:sha1:9756f536b24ba13c34d0dcc49153020477cce012</id>
<content type='text'>
</content>
</entry>
<entry>
<title>grobid2json: language_code</title>
<updated>2019-10-04T21:09:59+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-10-04T21:09:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=ee797ddc0a1377423cfe1939634e6d019eecea9e'/>
<id>urn:sha1:ee797ddc0a1377423cfe1939634e6d019eecea9e</id>
<content type='text'>
</content>
</entry>
<entry>
<title>python tests for pusher classes</title>
<updated>2019-10-03T01:02:40+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-10-03T01:02:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=910e59e9011935fecaae62520eb3fc30cbd65800'/>
<id>urn:sha1:910e59e9011935fecaae62520eb3fc30cbd65800</id>
<content type='text'>
</content>
</entry>
<entry>
<title>add tests for affiliation extraction</title>
<updated>2019-10-03T01:00:12+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-10-03T01:00:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=9092f027004095f5cacb5dc870737751397872cc'/>
<id>urn:sha1:9092f027004095f5cacb5dc870737751397872cc</id>
<content type='text'>
</content>
</entry>
<entry>
<title>refactor old python hadoop code into new directory</title>
<updated>2019-09-26T00:51:07+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-09-26T00:51:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=d7830b4a5aad0a59a588e98798711f0e694d50d6'/>
<id>urn:sha1:d7830b4a5aad0a59a588e98798711f0e694d50d6</id>
<content type='text'>
</content>
</entry>
<entry>
<title>fix test grobid2json test</title>
<updated>2019-09-26T00:35:08+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-09-26T00:35:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=a3383f8794bcd8aa9195de37c63f040086d57f77'/>
<id>urn:sha1:a3383f8794bcd8aa9195de37c63f040086d57f77</id>
<content type='text'>
For new extra fields
</content>
</entry>
<entry>
<title>start refactoring sandcrawler python common code</title>
<updated>2019-09-24T05:58:55+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-09-24T05:58:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=b438f52dbb7578c9a5c2153bc4ba50e33fdae7c3'/>
<id>urn:sha1:b438f52dbb7578c9a5c2153bc4ba50e33fdae7c3</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update grobid2json to include given_name/surname</title>
<updated>2019-05-13T23:41:45+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2019-05-13T23:41:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=594678f6c2705b8b88c6e23d68981a851df0aa5e'/>
<id>urn:sha1:594678f6c2705b8b88c6e23d68981a851df0aa5e</id>
<content type='text'>
</content>
</entry>
<entry>
<title>longtail grobid metadata parse/filter WIP</title>
<updated>2018-09-23T05:53:50+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-09-23T05:53:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=7159fdf1ec55a4c9c096afb5eb1ce57b9a51f1e8'/>
<id>urn:sha1:7159fdf1ec55a4c9c096afb5eb1ce57b9a51f1e8</id>
<content type='text'>
</content>
</entry>
</feed>
