sandcrawler
Age         Author         Files  Lines      Commit message
2018-09-12  Bryan Newbold      1  -1/+9      * insertable flag for match-crossref
2018-09-12  Bryan Newbold      2  -0/+348    * hack scorejob variant with extra context joined in
2018-09-05  Bryan Newbold      5  -12/+12    * blacklist -> denylist
2018-09-05  bnewbold          11  -154/+179  * Merge branch 'ellen-none-refactor' into 'master'
                                             |\
2018-09-04  Ellen Spertus      0  -0/+0      | * Merge branch 'ellen-none-refactor' of git.archive.org:webgroup/sandcrawler in...
                                             | |\
2018-09-04  Ellen Spertus      1  -14/+10    | | * changed style of ScoreJobTest.bundle
2018-09-04  Ellen Spertus      1  -2/+2      | | * minor style improvement
2018-08-28  Ellen Spertus      1  -1/+4      | | * restored code I inadvertantly removed when merging
2018-08-28  Ellen Spertus      2  -48/+62    | | * fixed scalastyle issues, including cyclomatic complexity
2018-08-28  Ellen Spertus      4  -77/+85    | | * fixed tests after replacing NoSlug with None
2018-08-28  Ellen Spertus      7  -37/+41    | | * replaced NoSlug with proper use of Option
2018-09-04  Ellen Spertus      1  -14/+10    | * | changed style of ScoreJobTest.bundle
2018-09-04  Ellen Spertus      1  -2/+2      | * | minor style improvement
2018-09-04  Ellen Spertus      1  -1/+4      | * | restored code I inadvertantly removed when merging
2018-09-04  Ellen Spertus      2  -48/+62    | * | fixed scalastyle issues, including cyclomatic complexity
2018-09-04  Ellen Spertus      4  -77/+85    | * | fixed tests after replacing NoSlug with None
2018-09-04  Ellen Spertus      7  -37/+41    | * | replaced NoSlug with proper use of Option
                                             |/ /
2018-09-04  Bryan Newbold      1  -0/+22     * | warning in python/README
2018-08-31  Bryan Newbold      1  -1/+1      * | match crossref reducers=200
                                             |/
2018-08-27  Bryan Newbold      2  -1/+9      * make similarity score case-insensitive
2018-08-27  Bryan Newbold      2  -1/+40     * basic crossref subtitle concatination support
2018-08-27  Bryan Newbold      2  -2/+2      * more special characters to strip
2018-08-27  Bryan Newbold      1  -0/+96     * add even more entries to slug blacklist
2018-08-27  Bryan Newbold      1  -4/+1      * update python TODO
2018-08-27  Bryan Newbold      8  -0/+703    * Merge branch 'bnewbold-ungrobided'
                                             |\
2018-08-26  Bryan Newbold      1  -4/+7      | * finally got extraction_ungrobided to run in prod
2018-08-25  Bryan Newbold      2  -7/+136    | * WIP: ungrobided doesn't inherit (copypasta)
2018-08-26  Bryan Newbold      1  -0/+6      | * please: save extraction output
2018-08-25  Bryan Newbold      1  -0/+20     | * ungrobided: example real output
2018-08-25  Bryan Newbold      1  -1/+1      | * ungrobided: fix python call typo
2018-08-25  Bryan Newbold      1  -3/+3      | * disambiguration parse_line method
2018-08-25  Bryan Newbold      1  -1/+51     | * ungrobided: add real results to tests
2018-08-25  Bryan Newbold      1  -0/+30     | * add extraction_ungrobided support to please
2018-08-24  Bryan Newbold      3  -0/+288    | * python extraction_ungrobided job
2018-08-24  Bryan Newbold      1  -0/+14     | * HOWTO some common hbase shell queries
2018-08-24  Bryan Newbold      2  -9/+9      | * rename DumpUnGrobidedJob
2018-08-24  Bryan Newbold      1  -0/+24     | * please support for DumpUnGrobidedJob
2018-08-24  Bryan Newbold      2  -0/+139    | * scalding: UnGrobidedDumpJob
2018-08-27  Bryan Newbold      1  -0/+95     * | crude job stats/metrics in a text file
2018-08-25  Bryan Newbold      1  -0/+4      * | set CI LANG env variable
2018-08-24  Bryan Newbold      1  -1/+1      * | switch Dockerfile to xenial
                                             |/
2018-08-24  Bryan Newbold      4  -23/+2     * clean up commented out code in scalding/
2018-08-24  Bryan Newbold      2  -0/+96     * Merge branch 'bnewbold-missing-column'
                                             |\
2018-08-21  Bryan Newbold      1  -2/+2      | * fixes to please keys-missing-col
2018-08-21  Bryan Newbold      1  -0/+29     | * add please for keysmissingcolumn
2018-08-21  Bryan Newbold      1  -29/+37    | * rewrite MissingColumnDumpJob as a join (sigh)
2018-08-21  Bryan Newbold      1  -0/+59     | * WIP: MissingColumnDumpJob
2018-08-24  Bryan Newbold      1  -0/+5      * | more TODO
2018-08-24  Bryan Newbold      2  -13/+16    * | update TODO
2018-08-24  Bryan Newbold      3  -0/+331    * | move HBase schema and notes from journal-infra repo