sandcrawler
Branches: bnewbold-args, bnewbold-backfill, bnewbold-persist-grobid-errors, bnewbold-refactor-loggging, master, trawler
path: root/scalding/src/main
Commit message (Author, Age, Files changed, Lines removed/added)

* WIP: MissingColumnDumpJob (Bryan Newbold, 2018-08-21, 1 file, -0/+59)
* Merge branch 'bnewbold-match-scale' (Bryan Newbold, 2018-08-21, 1 file, -5/+8)
|\
| * add a trap to ScoreJob (Bryan Newbold, 2018-08-20, 1 file, -5/+8)
* | scalastyle (Bryan Newbold, 2018-08-21, 1 file, -9/+8)
* | fix bugs/typos in HBaseColCountJob and HBaseStatusCountJob (Bryan Newbold, 2018-08-21, 2 files, -4/+4)
* | make col counter generic (Bryan Newbold, 2018-08-21, 1 file, -6/+7)
* | add dedicated job for counting GrobidMetadata column (Bryan Newbold, 2018-08-21, 1 file, -0/+36)
* | use grobid0:metadata, not tei_json (Bryan Newbold, 2018-08-21, 2 files, -10/+10)
* | distinction between status_code and status counting (Bryan Newbold, 2018-08-21, 2 files, -4/+36)
* | fold all scorable code into sanity check; counters (Bryan Newbold, 2018-08-21, 1 file, -7/+52)
* | add GrobidScorableDumpJob and basic test (Bryan Newbold, 2018-08-21, 1 file, -0/+18)
* | Merge branch 'strings' (Bryan Newbold, 2018-08-21, 2 files, -9/+43)
|\ \
| * | Reads blacklist from file. (Ellen Spertus, 2018-08-20, 2 files, -18/+55)
| |/
* | Removed debugging code, fixed style warnings. (Ellen Spertus, 2018-08-20, 1 file, -11/+6)
* | Created static factory method for ScorableCreations to deal with null. (Ellen Spertus, 2018-08-20, 3 files, -15/+23)
|/
* Merge branch 'bnewbold-scoring-patches' into 'master' (bnewbold, 2018-08-16, 6 files, -0/+359)
|\
| * change slugification behavior to not split on colon (Bryan Newbold, 2018-08-15, 1 file, -2/+2)
| * do strip periods ('.') (Bryan Newbold, 2018-08-15, 1 file, -1/+1)
| * add a stub title blacklist (Bryan Newbold, 2018-08-15, 1 file, -1/+12)
| * handle null status_code lines (Bryan Newbold, 2018-08-15, 1 file, -0/+1)
| * comment about possible slugification process (Bryan Newbold, 2018-08-15, 1 file, -0/+9)
| * scorable: test for null strings (Bryan Newbold, 2018-08-15, 2 files, -1/+5)
| * grobid scoring: status_code as signed int, not string (Bryan Newbold, 2018-08-15, 1 file, -2/+7)
| * Now ignores grobid entries with status other than 200. (Ellen Spertus, 2018-08-14, 1 file, -3/+7)
| * Factored out ScorableFeatures. (Ellen Spertus, 2018-08-13, 4 files, -40/+33)
| * Pipeline works, all tests pass, no scalastyle errors. (Ellen Spertus, 2018-08-13, 5 files, -290/+12)
| * Snapshot before changing Scorable to find bug. (Ellen Spertus, 2018-08-12, 2 files, -19/+23)
| * Added back file I shouldn't have deleted. (Ellen Spertus, 2018-08-12, 1 file, -22/+0)
| * Tests pass. (Ellen Spertus, 2018-08-12, 1 file, -5/+6)
| * It compiles. (Ellen Spertus, 2018-08-11, 3 files, -42/+73)
| * Tests pass. Still have changes to do but made huge progress. (Ellen Spertus, 2018-08-10, 2 files, -55/+27)
| * It compiles (Ellen Spertus, 2018-08-10, 4 files, -31/+43)
| * Broken code to share with Bryan. (Ellen Spertus, 2018-08-09, 5 files, -8/+90)
| * WIP (Ellen Spertus, 2018-08-09, 4 files, -5/+226)
| * WIP (Ellen Spertus, 2018-08-09, 4 files, -13/+21)
| * Removed implicit parameters. Does not compile. (Ellen Spertus, 2018-08-09, 4 files, -10/+9)
| * WIP (Ellen Spertus, 2018-08-09, 4 files, -18/+46)
| * Fixed scalastyle violations. (Ellen Spertus, 2018-08-09, 4 files, -18/+14)
| * Added test for null argument to titleToSlug() (Ellen Spertus, 2018-08-09, 1 file, -4/+9)
| * Removed HBaseCrossrefScore{Job,Test} and references thereto. (Ellen Spertus, 2018-08-07, 2 files, -219/+5)
| * Added punctuation removal to slug creation and similarity comparisons (Ellen Spertus, 2018-08-07, 2 files, -2/+9)
| * Added GrobidScorableTest, minor improvements. (Ellen Spertus, 2018-08-07, 2 files, -10/+33)
| * Minor refactoring. Added test. (Ellen Spertus, 2018-08-07, 2 files, -10/+9)
| * Removed commented-out code. (Ellen Spertus, 2018-08-07, 1 file, -29/+0)
| * Minor cleanup. Passes scalastyle. (Ellen Spertus, 2018-08-07, 1 file, -3/+0)
| * Added CrossrefScorable.scala. All code compiles. (Ellen Spertus, 2018-08-07, 3 files, -10/+34)
| * New code compiles. Old tests pass. New tests not yet written. (Ellen Spertus, 2018-08-06, 5 files, -9/+65)
| * Partly refactored HBaseCrossrefScoreJob. Everything compiles. (Ellen Spertus, 2018-08-06, 3 files, -0/+194)
| * Changed scoring, including adding code to compute string differences. Turned ... (Ellen Spertus, 2018-07-30, 1 file, -21/+36)
| * Added accent removal to titleToSlug(). (Ellen Spertus, 2018-07-28, 1 file, -1/+27)