<feed xmlns='http://www.w3.org/2005/Atom'>
<title>fatcat/notes/cleanups, branch master</title>
<subtitle>[no description]</subtitle>
<id>https://git.bnewbold.net/fatcat/atom?h=master</id>
<link rel='self' href='https://git.bnewbold.net/fatcat/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/'/>
<updated>2021-11-29T22:33:14Z</updated>
<entry>
<title>move 'cleanups' directory from notes to extra/</title>
<updated>2021-11-29T22:33:14Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-11-29T22:33:14Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=c5ea2dba358624f4c14da0a1a988ae14d0edfd59'/>
<id>urn:sha1:c5ea2dba358624f4c14da0a1a988ae14d0edfd59</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Merge branch 'bnewbold-container-merger'</title>
<updated>2021-11-29T22:31:26Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-11-29T22:31:26Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=ec2809ef2ac51c992463839c1e3451927f5e1661'/>
<id>urn:sha1:ec2809ef2ac51c992463839c1e3451927f5e1661</id>
<content type='text'>
</content>
</entry>
<entry>
<title>notes on file_meta partial cleanup</title>
<updated>2021-11-25T03:58:20Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-11-25T03:58:20Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=eb60449cdc9614ec7eda79b8481d1d8487b9a5f6'/>
<id>urn:sha1:eb60449cdc9614ec7eda79b8481d1d8487b9a5f6</id>
<content type='text'>
</content>
</entry>
<entry>
<title>notes on container ISSN-L merging, tested in QA</title>
<updated>2021-11-25T02:22:06Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-11-25T02:22:06Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=487923dc81d877207556f8a90a3ce048fe6bafb5'/>
<id>urn:sha1:487923dc81d877207556f8a90a3ce048fe6bafb5</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Merge branch 'bnewbold-mergers' into 'master'</title>
<updated>2021-11-25T00:36:34Z</updated>
<author>
<name>bnewbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-25T00:36:34Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=5bc5eeed5e3ba54c2129c4233b881291c5fa7449'/>
<id>urn:sha1:5bc5eeed5e3ba54c2129c4233b881291c5fa7449</id>
<content type='text'>
entity mergers framework

See merge request webgroup/fatcat!133</content>
</entry>
<entry>
<title>codespell fixes to notes</title>
<updated>2021-11-24T23:48:42Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-11-24T23:48:42Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=fafc32e0ea1adc95eea817af7273d4c47422b364'/>
<id>urn:sha1:fafc32e0ea1adc95eea817af7273d4c47422b364</id>
<content type='text'>
</content>
</entry>
<entry>
<title>file de-dupe: notes on prep and QA testing</title>
<updated>2021-11-24T02:58:37Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-11-24T02:58:37Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=8080eef139b5dcf6201e4f27076a879d0df20096'/>
<id>urn:sha1:8080eef139b5dcf6201e4f27076a879d0df20096</id>
<content type='text'>
</content>
</entry>
<entry>
<title>document cleanups run this week</title>
<updated>2021-11-12T19:45:48Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-11-12T19:45:48Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=f157cc7a50e0fd9a1c79efb3c29be7d8508ffa66'/>
<id>urn:sha1:f157cc7a50e0fd9a1c79efb3c29be7d8508ffa66</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Merge branch 'bnewbold-import-refactors' into 'master'</title>
<updated>2021-11-11T01:12:18Z</updated>
<author>
<name>bnewbold</name>
<email>bnewbold@archive.org</email>
</author>
<published>2021-11-11T01:12:18Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=6ad9d24e4d7d901d6fc394e6e91575f6acba7ff4'/>
<id>urn:sha1:6ad9d24e4d7d901d6fc394e6e91575f6acba7ff4</id>
<content type='text'>
import refactors and deprecations

Some of these changes come from old, stale branches (the Datacite subject metadata patch), but most are from yesterday and today. It is something of a hodge-podge, but the general theme is getting around to deferred cleanups and refactors specific to importer code before making some behavioral changes.

The Datacite-specific stuff could use review here.

Remove unused/deprecated/dead code:

- cdl_dash_dat and wayback_static importers, which were for specific early example entities and have been superseded by other importers
- "extid map" sqlite3 feature from several importers, which was only used for initial bulk imports (and maybe should not have been used)

Refactors:

- moved a number of large data structures out of importer code and into a dedicated static file (`biblio_lookup_tables.py`). Not all of them were moved, only those that were either generic or very large (which made the code hard to read)
- shuffled around relative imports and some function names ("clean_str" vs. "clean")

Some actual behavioral changes:

- remove some Datacite-specific license slugs
- stop trying to fix double-slashes in DOIs; that was causing more harm than good (some DOIs do actually have double-slashes!)
- remove some excess metadata from Datacite 'extra' fields</content>
</entry>
<entry>
<title>wayback ts cleanup: one more filter tweak</title>
<updated>2021-11-10T06:55:58Z</updated>
<author>
<name>Bryan Newbold</name>
<email>bnewbold@robocracy.org</email>
</author>
<published>2021-11-10T06:55:58Z</published>
<link rel='alternate' type='text/html' href='https://git.bnewbold.net/fatcat/commit/?id=cd09c6d6bd4deef0627de4f8a8a301725db01e14'/>
<id>urn:sha1:cd09c6d6bd4deef0627de4f8a8a301725db01e14</id>
<content type='text'>
</content>
</entry>
</feed>
