sandcrawler: commit log for scalding/src/main
Commit message | Author | Date | Files | Lines (-/+)
make similarity score case-insensitive | Bryan Newbold | 2018-08-27 | 1 | -1 / +1
basic crossref subtitle concatination support | Bryan Newbold | 2018-08-27 | 1 | -1 / +22
more special characters to strip | Bryan Newbold | 2018-08-27 | 1 | -1 / +1
add even more entries to slug blacklist | Bryan Newbold | 2018-08-27 | 1 | -0 / +96
rename DumpUnGrobidedJob | Bryan Newbold | 2018-08-24 | 1 | -5 / +5
scalding: UnGrobidedDumpJob | Bryan Newbold | 2018-08-24 | 1 | -0 / +67
clean up commented out code in scalding/ | Bryan Newbold | 2018-08-24 | 4 | -23 / +2
Merge branch 'bnewbold-missing-column' | Bryan Newbold | 2018-08-24 | 1 | -0 / +67
rewrite MissingColumnDumpJob as a join (sigh) | Bryan Newbold | 2018-08-21 | 1 | -29 / +37
WIP: MissingColumnDumpJob | Bryan Newbold | 2018-08-21 | 1 | -0 / +59
Merge branch 'bnewbold-match-quality' | Bryan Newbold | 2018-08-24 | 2 | -0 / +79
BibjsonScorable: fix ScorableFeatures (after rebase) | Bryan Newbold | 2018-08-21 | 1 | -1 / +1
local bibjson-to-bibjson matching job (no tests) | Bryan Newbold | 2018-08-21 | 1 | -0 / +29
bibjson scorable class (no tests) | Bryan Newbold | 2018-08-21 | 1 | -0 / +50
add counters to ScoreJob | Bryan Newbold | 2018-08-24 | 1 | -13 / +47
remove duplicate imports | Bryan Newbold | 2018-08-24 | 1 | -2 / +0
add a content-type filter for crossref works | Bryan Newbold | 2018-08-23 | 1 | -1 / +17
require crossref works to have at least one author (for matching) | Bryan Newbold | 2018-08-23 | 1 | -1 / +1
author parsing (and year, for crossref) | Bryan Newbold | 2018-08-23 | 3 | -6 / +51
set a minimum slug size (8 chars) | Bryan Newbold | 2018-08-23 | 1 | -1 / +5
clean up indendation in ScoreJob.scala | Bryan Newbold | 2018-08-23 | 1 | -5 / +5
clean up commented-out code in ScoreJob.scala | Bryan Newbold | 2018-08-23 | 1 | -30 / +0
increase MaxTitleLength from 255 to 1023 | Bryan Newbold | 2018-08-23 | 1 | -1 / +1
additions to slug blacklist | Bryan Newbold | 2018-08-23 | 1 | -383 / +414
Fixed style violations. | Ellen Spertus | 2018-08-22 | 1 | -2 / +4
Merge branch 'master' into ellen-length-filtering | Ellen Spertus | 2018-08-22 | 1 | -1 / +1
add more punctuation characters to slug filter | Bryan Newbold | 2018-08-22 | 1 | -1 / +1
Added title-length filtering to CrossrefScorable. | Ellen Spertus | 2018-08-22 | 1 | -13 / +37
Added title length filtering to GrobidScorable | Ellen Spertus | 2018-08-22 | 2 | -0 / +17
expand slug-blacklist with results from prod GROBID/crossref match | Bryan Newbold | 2018-08-21 | 1 | -0 / +393
Merge branch 'bnewbold-match-scale' | Bryan Newbold | 2018-08-21 | 1 | -5 / +8
add a trap to ScoreJob | Bryan Newbold | 2018-08-20 | 1 | -5 / +8
scalastyle | Bryan Newbold | 2018-08-21 | 1 | -9 / +8
fix bugs/typos in HBaseColCountJob and HBaseStatusCountJob | Bryan Newbold | 2018-08-21 | 2 | -4 / +4
make col counter generic | Bryan Newbold | 2018-08-21 | 1 | -6 / +7
add dedicated job for counting GrobidMetadata column | Bryan Newbold | 2018-08-21 | 1 | -0 / +36
use grobid0:metadata, not tei_json | Bryan Newbold | 2018-08-21 | 2 | -10 / +10
distinction between status_code and status counting | Bryan Newbold | 2018-08-21 | 2 | -4 / +36
fold all scorable code into sanity check; counters | Bryan Newbold | 2018-08-21 | 1 | -7 / +52
add GrobidScorableDumpJob and basic test | Bryan Newbold | 2018-08-21 | 1 | -0 / +18
Merge branch 'strings' | Bryan Newbold | 2018-08-21 | 2 | -9 / +43
Reads blacklist from file. | Ellen Spertus | 2018-08-20 | 2 | -18 / +55
Removed debugging code, fixed style warnings. | Ellen Spertus | 2018-08-20 | 1 | -11 / +6
Created static factory method for ScorableCreations to deal with null. | Ellen Spertus | 2018-08-20 | 3 | -15 / +23
Merge branch 'bnewbold-scoring-patches' into 'master' | bnewbold | 2018-08-16 | 6 | -0 / +359
change slugification behavior to not split on colon | Bryan Newbold | 2018-08-15 | 1 | -2 / +2
do strip periods ('.') | Bryan Newbold | 2018-08-15 | 1 | -1 / +1
add a stub title blacklist | Bryan Newbold | 2018-08-15 | 1 | -1 / +12
handle null status_code lines | Bryan Newbold | 2018-08-15 | 1 | -0 / +1
comment about possible slugification process | Bryan Newbold | 2018-08-15 | 1 | -0 / +9