index : sandcrawler
| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| increase MaxTitleLength from 255 to 1023 | Bryan Newbold | 2018-08-23 | 1 | -1/+1 |
| additions to slug blacklist | Bryan Newbold | 2018-08-23 | 1 | -383/+414 |
| Merge branch 'bnewbold-extraction-tweaks' | Bryan Newbold | 2018-08-23 | 1 | -4/+7 |
| extraction: do want content, not text | Bryan Newbold | 2018-08-21 | 1 | -1/+1 |
| extraction: status reporting tweaks | Bryan Newbold | 2018-08-21 | 1 | -5/+8 |
| Merge branch 'ellen-length-filtering' into 'master' | bnewbold | 2018-08-23 | 6 | -24/+138 |
| Fixed style violations. | Ellen Spertus | 2018-08-22 | 2 | -4/+5 |
| Added ScoreJob test for title-length filtering. | Ellen Spertus | 2018-08-22 | 1 | -5/+13 |
| Merge branch 'master' into ellen-length-filtering | Ellen Spertus | 2018-08-22 | 3 | -2/+6 |
| Merge branch 'copyright' into 'master' | bnewbold | 2018-08-22 | 1 | -0/+4 |
| Created CONTRIBUTORS. | Ellen Spertus | 2018-08-22 | 1 | -0/+4 |
| add more punctuation characters to slug filter | Bryan Newbold | 2018-08-22 | 2 | -2/+2 |
| Added title-length filtering to CrossrefScorable. | Ellen Spertus | 2018-08-22 | 2 | -15/+71 |
| Added more tests of GrobidScorable.keepRecord | Ellen Spertus | 2018-08-22 | 1 | -0/+5 |
| Added title length filtering to GrobidScorable | Ellen Spertus | 2018-08-22 | 3 | -2/+46 |
| expand slug-blacklist with results from prod GROBID/crossref match | Bryan Newbold | 2018-08-21 | 1 | -0/+393 |
| remove slug-blacklist conservative test | Bryan Newbold | 2018-08-21 | 1 | -16/+0 |
| Merge branch 'bnewbold-match-scale' | Bryan Newbold | 2018-08-21 | 3 | -5/+15 |
| explicit spill and compression settings for ScoreJob | Bryan Newbold | 2018-08-20 | 1 | -0/+5 |
| add a trap to ScoreJob | Bryan Newbold | 2018-08-20 | 2 | -5/+10 |
| update spyglass patch/version (as an experiment) | Bryan Newbold | 2018-08-21 | 1 | -1/+1 |
| scalastyle | Bryan Newbold | 2018-08-21 | 1 | -9/+8 |
| HDFS doesn't like colons | Bryan Newbold | 2018-08-21 | 1 | -1/+1 |
| fix bugs/typos in HBaseColCountJob and HBaseStatusCountJob | Bryan Newbold | 2018-08-21 | 4 | -18/+11 |
| please support for status-code-count | Bryan Newbold | 2018-08-21 | 1 | -0/+24 |
| make col counter generic | Bryan Newbold | 2018-08-21 | 2 | -6/+35 |
| add dedicated job for counting GrobidMetadata column | Bryan Newbold | 2018-08-21 | 1 | -0/+36 |
| use grobid0:metadata, not tei_json | Bryan Newbold | 2018-08-21 | 2 | -10/+10 |
| distinction between status_code and status counting | Bryan Newbold | 2018-08-21 | 4 | -10/+111 |
| fold all scorable code into sanity check; counters | Bryan Newbold | 2018-08-21 | 1 | -7/+52 |
| please support for grobid-scorable-dump | Bryan Newbold | 2018-08-21 | 1 | -0/+24 |
| add GrobidScorableDumpJob and basic test | Bryan Newbold | 2018-08-21 | 2 | -0/+142 |
| Merge branch 'strings' | Bryan Newbold | 2018-08-21 | 4 | -9/+65 |
| Reads blacklist from file. | Ellen Spertus | 2018-08-20 | 4 | -18/+77 |
| Merge branch 'little-things' into 'master' | bnewbold | 2018-08-20 | 6 | -32/+37 |
| Removed debugging code, fixed style warnings. | Ellen Spertus | 2018-08-20 | 1 | -11/+6 |
| Created static factory method for ScorableCreations to deal with null. | Ellen Spertus | 2018-08-20 | 4 | -18/+26 |
| Disabled scalastyle null checking where we want to test null values. | Ellen Spertus | 2018-08-20 | 1 | -0/+2 |
| Reduced boilerplate code. | Ellen Spertus | 2018-08-20 | 1 | -11/+11 |
| Merge branch 'bnewbold-scoring-patches' into 'master' | bnewbold | 2018-08-16 | 19 | -13/+1056 |
| change slugification behavior to not split on colon | Bryan Newbold | 2018-08-15 | 3 | -25/+25 |
| do strip periods ('.') | Bryan Newbold | 2018-08-15 | 1 | -1/+1 |
| add a stub title blacklist | Bryan Newbold | 2018-08-15 | 2 | -1/+18 |
| handle null status_code lines | Bryan Newbold | 2018-08-15 | 2 | -3/+8 |
| unrelated TODO about testing with null HBase values | Bryan Newbold | 2018-08-15 | 1 | -0/+1 |
| comment about possible slugification process | Bryan Newbold | 2018-08-15 | 1 | -0/+9 |
| scorable: test for more punctuation removal | Bryan Newbold | 2018-08-15 | 1 | -0/+8 |
| crossref: test for empty-string title | Bryan Newbold | 2018-08-15 | 1 | -0/+6 |
| scorable: test for null strings | Bryan Newbold | 2018-08-15 | 3 | -1/+10 |
| grobid scoring: status_code as signed int, not string | Bryan Newbold | 2018-08-15 | 2 | -4/+10 |