<feed xmlns='http://www.w3.org/2005/Atom'>
<title>sandcrawler/mapreduce, branch bnewbold-args</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/sandcrawler/atom?h=bnewbold-args</id>
<link rel='self' href='https://git.bnewbold.net/sandcrawler/atom?h=bnewbold-args'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/'/>
<updated>2018-06-04T19:42:29+00:00</updated>
<entry>
<title>bnewbold-dev &gt; wbgrp-svc263</title>
<updated>2018-06-04T19:42:29+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-05-31T02:40:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=d2dd016aa8da93ad14654237dbb7cfac214f9da8'/>
<id>urn:sha1:d2dd016aa8da93ad14654237dbb7cfac214f9da8</id>
<content type='text'>
This is a new production VM running an HBase-Thrift gateway.
</content>
</entry>
<entry>
<title>actually fix oversize inserts</title>
<updated>2018-05-08T03:24:01+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-05-08T03:24:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=0c398392aa298d28694bf5bd37d3e4912de8a2f5'/>
<id>urn:sha1:0c398392aa298d28694bf5bd37d3e4912de8a2f5</id>
<content type='text'>
</content>
</entry>
<entry>
<title>XML size limit</title>
<updated>2018-04-26T18:24:42+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-04-26T18:24:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=ee6ce29e7987f936536a0ef128d3a96cc1df3d86'/>
<id>urn:sha1:ee6ce29e7987f936536a0ef128d3a96cc1df3d86</id>
<content type='text'>
</content>
</entry>
<entry>
<title>force_existing flag for extraction</title>
<updated>2018-04-19T05:15:02+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-04-19T05:14:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=df23b6f45922875f0bf657aea3b8c3fb4451469d'/>
<id>urn:sha1:df23b6f45922875f0bf657aea3b8c3fb4451469d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>NLineInputFormat requires RawProtocol</title>
<updated>2018-04-19T05:15:02+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-04-15T05:44:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=e0d1e381bf536d1c077546526c21eab909444193'/>
<id>urn:sha1:e0d1e381bf536d1c077546526c21eab909444193</id>
<content type='text'>
This should become a command-line argument or similar: one protocol is
wanted under Hadoop, the other for local, test, and inline runs.
</content>
</entry>
<entry>
<title>local mrjob config</title>
<updated>2018-04-19T05:15:02+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-04-15T05:44:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=a8a568f03d7f537a8683adf23f6643c7704e8d3d'/>
<id>urn:sha1:a8a568f03d7f537a8683adf23f6643c7704e8d3d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>switch to new (local) sentry instance</title>
<updated>2018-04-19T05:13:59+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-04-19T05:13:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=c3e925cdd00c28531ee37ac7ada5cdee229762db'/>
<id>urn:sha1:c3e925cdd00c28531ee37ac7ada5cdee229762db</id>
<content type='text'>
</content>
</entry>
<entry>
<title>update Pipfile.lock (new pluggy)</title>
<updated>2018-04-16T19:47:13+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-04-16T19:46:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=e70ac19416d7f2d8f91da3e700e0c3b88f3965bc'/>
<id>urn:sha1:e70ac19416d7f2d8f91da3e700e0c3b88f3965bc</id>
<content type='text'>
</content>
</entry>
<entry>
<title>use NLineInputFormat so we can control split size</title>
<updated>2018-04-11T05:33:53+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-04-11T05:33:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=43f4a6ec3895f4ac5a7db0dfa237aed44f52358b'/>
<id>urn:sha1:43f4a6ec3895f4ac5a7db0dfa237aed44f52358b</id>
<content type='text'>
</content>
</entry>
<entry>
<title>revert PYTHONPATH in cmdenv</title>
<updated>2018-04-11T05:19:34+00:00</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2018-04-11T05:19:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/sandcrawler/commit/?id=be1704a419a1e916bb0055e2b40d2db026976001'/>
<id>urn:sha1:be1704a419a1e916bb0055e2b40d2db026976001</id>
<content type='text'>
This seemed to break Hadoop jobs for some reason.
</content>
</entry>
</feed>
