sandcrawler
path: root/scalding/src/main
Graph    Age         Author         Files  Lines      Commit message
*        2021-10-04  bnewbold       1      -0/+187    Merge branch 'bnewbold-backfill' into 'master'
|\
| *      2018-07-24  Bryan Newbold  1      -5/+5      small CdxBackfillJob refactor (code quality)
| *      2018-07-24  Bryan Newbold  1      -3/+7      do sha1 pattern match correctly
| *      2018-07-24  Bryan Newbold  1      -2/+5      more PDF mimetypes; fix return refactor
| *      2018-07-24  Bryan Newbold  1      -6/+0      CdxBackfillJob: comment cleanup
| *      2018-07-24  Bryan Newbold  1      -22/+14    CdxBackfillJob: scalastyle
| *      2018-07-24  Bryan Newbold  1      -20/+21    address some (but not all) review comments
| *      2018-07-24  Bryan Newbold  1      -4/+11     fix CdxBackfillJob tests
| *      2018-07-24  Bryan Newbold  1      -7/+8      some scalastyle on CdxBackfillJob
| *      2018-07-24  Bryan Newbold  1      -10/+24    CdxBackfillJob: implement other fields
| *      2018-07-24  Bryan Newbold  1      -7/+5      CdxBackfillJob back to HBase; tests work
| *      2018-07-24  Bryan Newbold  1      -0/+173    variant of CdxBackfillJob that writes to TSV
* |      2019-08-26  Bryan Newbold  2      -0/+67     GroupFatcatWorksSubsetJob
* |      2019-08-10  Bryan Newbold  1      -1/+1      please command for groupworksfatcat
* |      2019-08-10  Bryan Newbold  2      -0/+174    FatcatScorable and ScoreSelfFatcat job
* |      2019-08-10  Bryan Newbold  2      -3/+24     add fatcat ident fields in prep for self-scoring job
* |      2019-04-12  Bryan Newbold  1      -0/+34     scalding dump-grobid-status-code job
* |      2018-10-30  Bryan Newbold  1      -2/+2      fix typos in DumpGrobidXmlJob
* |      2018-10-30  Bryan Newbold  1      -0/+41     quick and dirty GROBID XML dumper
* |      2018-09-22  Bryan Newbold  1      -0/+38     new DumpGrobidMetaInsertableJob
* |      2018-09-13  Bryan Newbold  1      -0/+36     new simple file metadata dump script
* |      2018-09-12  Bryan Newbold  1      -0/+86     hack scorejob variant with extra context joined in
* |      2018-09-05  Bryan Newbold  2      -3/+3      blacklist -> denylist
* |      2018-09-04  Ellen Spertus  1      -1/+4      restored code I inadvertantly removed when merging
* |      2018-09-04  Ellen Spertus  2      -48/+62    fixed scalastyle issues, including cyclomatic complexity
* |      2018-09-04  Ellen Spertus  7      -37/+41    replaced NoSlug with proper use of Option
* |      2018-08-27  Bryan Newbold  1      -1/+1      make similarity score case-insensitive
* |      2018-08-27  Bryan Newbold  1      -1/+22     basic crossref subtitle concatination support
* |      2018-08-27  Bryan Newbold  1      -1/+1      more special characters to strip
* |      2018-08-27  Bryan Newbold  1      -0/+96     add even more entries to slug blacklist
* |      2018-08-24  Bryan Newbold  1      -5/+5      rename DumpUnGrobidedJob
* |      2018-08-24  Bryan Newbold  1      -0/+67     scalding: UnGrobidedDumpJob
* |      2018-08-24  Bryan Newbold  4      -23/+2     clean up commented out code in scalding/
* |      2018-08-24  Bryan Newbold  1      -0/+67     Merge branch 'bnewbold-missing-column'
|\ \
| * |    2018-08-21  Bryan Newbold  1      -29/+37    rewrite MissingColumnDumpJob as a join (sigh)
| * |    2018-08-21  Bryan Newbold  1      -0/+59     WIP: MissingColumnDumpJob
* | |    2018-08-24  Bryan Newbold  2      -0/+79     Merge branch 'bnewbold-match-quality'
|\ \ \
| * | |  2018-08-21  Bryan Newbold  1      -1/+1      BibjsonScorable: fix ScorableFeatures (after rebase)
| * | |  2018-08-21  Bryan Newbold  1      -0/+29     local bibjson-to-bibjson matching job (no tests)
| * | |  2018-08-21  Bryan Newbold  1      -0/+50     bibjson scorable class (no tests)
* | | |  2018-08-24  Bryan Newbold  1      -13/+47    add counters to ScoreJob
* | | |  2018-08-24  Bryan Newbold  1      -2/+0      remove duplicate imports
* | | |  2018-08-23  Bryan Newbold  1      -1/+17     add a content-type filter for crossref works
* | | |  2018-08-23  Bryan Newbold  1      -1/+1      require crossref works to have at least one author (for matching)
* | | |  2018-08-23  Bryan Newbold  3      -6/+51     author parsing (and year, for crossref)
* | | |  2018-08-23  Bryan Newbold  1      -1/+5      set a minimum slug size (8 chars)
* | | |  2018-08-23  Bryan Newbold  1      -5/+5      clean up indendation in ScoreJob.scala
* | | |  2018-08-23  Bryan Newbold  1      -30/+0     clean up commented-out code in ScoreJob.scala
* | | |  2018-08-23  Bryan Newbold  1      -1/+1      increase MaxTitleLength from 255 to 1023
* | | |  2018-08-23  Bryan Newbold  1      -383/+414  additions to slug blacklist