| author | Bryan Newbold <bnewbold@archive.org> | 2018-08-27 16:40:15 -0700 |
|---|---|---|
| committer | Bryan Newbold <bnewbold@archive.org> | 2018-08-27 16:40:15 -0700 |
| commit | b4eac17049e19d33b1a55664a7258c0f62f8a8c7 | |
| tree | e71de0eb969f53765e4f7b8f5053b5ece3b28781 /scalding/src/main/scala | |
| parent | 309f40b66d474f12c0cfe60c449d43ae4bacb912 | |
make similarity score case-insensitive
Diffstat (limited to 'scalding/src/main/scala')
| -rw-r--r-- | scalding/src/main/scala/sandcrawler/Scorable.scala | 2 |
1 file changed, 1 insertion, 1 deletion
```diff
diff --git a/scalding/src/main/scala/sandcrawler/Scorable.scala b/scalding/src/main/scala/sandcrawler/Scorable.scala
index 5aac032..f7eb95d 100644
--- a/scalding/src/main/scala/sandcrawler/Scorable.scala
+++ b/scalding/src/main/scala/sandcrawler/Scorable.scala
@@ -72,7 +72,7 @@ object Scorable {
     getStringOption(json2, "title") match {
       case None => 0
       case Some(title2) =>
-        (StringUtilities.similarity(title1, title2) * MaxScore).toInt
+        (StringUtilities.similarity(title1.toLowerCase, title2.toLowerCase) * MaxScore).toInt
     }
   }
 }
```
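The patched line lower-cases both titles before scoring, so titles that differ only in capitalization now get the maximum score. The sketch below illustrates that effect in isolation. It is hypothetical: the implementation of `StringUtilities.similarity` and the value of `MaxScore` are not shown in this diff, so a normalized Levenshtein similarity and a `MaxScore` of 1000 stand in for them, and the object name `CaseInsensitiveScoreSketch` is invented for illustration.

```scala
// Hypothetical sketch: a normalized Levenshtein similarity stands in for
// StringUtilities.similarity, whose real implementation is not in this diff.
object CaseInsensitiveScoreSketch {
  // Assumed integer score scale; the real MaxScore value is not shown.
  val MaxScore = 1000

  // Dynamic-programming edit distance (insert, delete, substitute each cost 1),
  // keeping only two rows of the table in memory.
  def levenshtein(a: String, b: String): Int = {
    val prev = (0 to b.length).toArray
    val curr = new Array[Int](b.length + 1)
    for (i <- 1 to a.length) {
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      Array.copy(curr, 0, prev, 0, curr.length)
    }
    prev(b.length)
  }

  // Similarity in [0.0, 1.0]: 1.0 for identical strings, lower as edits grow.
  def similarity(a: String, b: String): Double = {
    val longest = math.max(a.length, b.length)
    if (longest == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / longest
  }

  // Mirrors the patched expression: lower-case both titles, then scale to an Int.
  def titleScore(title1: String, title2: String): Int =
    (similarity(title1.toLowerCase, title2.toLowerCase) * MaxScore).toInt

  def main(args: Array[String]): Unit = {
    println(similarity("Deep Learning", "deep learning")) // case-sensitive: below 1.0
    println(titleScore("Deep Learning", "deep learning")) // case-insensitive: MaxScore
  }
}
```

One design note: the no-argument `String.toLowerCase` on the JVM uses the default locale, so results can vary on hosts configured with, for example, a Turkish locale. Passing `java.util.Locale.ROOT` is a common way to make the normalization locale-independent, although the original commit does not do this.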
