Description
The sanitizer seems to have issues when its input is a string in ASCII-8BIT
encoding:
irb(main):006:0* Rails::Html::WhiteListSanitizer.new.sanitize("tooth".encode('ASCII-8BIT'))
output error : unknown encoding ASCII-8BIT
=> ""
irb(main):007:0>
While ASCII-8BIT isn't the default encoding these days, it seems that strings coming from the mysql adapter (but not the mysql2 adapter) are always in ASCII-8BIT encoding, even when the table is using charset utf8:
irb(main):004:0> Day.connection.charset
=> "utf8"
irb(main):005:0> Day.last.notes.encoding
=> #<Encoding:ASCII-8BIT>
This means that using the sanitizer on any string from the database when using the mysql adapter will result in errors. I chased the error down to Nokogiri's NodeSet#to_s method, but wasn't sure what the right approach was for addressing the issue.
Switching to the mysql2 adapter makes the issue go away, since it produces all strings in UTF-8. However, folks who've been using the mysql gem (for legacy reasons or whatever) could run into headaches trying to upgrade to Rails 4.2 because of this (it hit me by way of the highlight method in ActionView::Helpers::TextHelper).
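
For anyone stuck on the mysql gem, a possible stopgap is to relabel the string as UTF-8 before it reaches the sanitizer. This is only a minimal sketch, not a fix for the sanitizer itself. It assumes the bytes returned by the adapter really are UTF-8 (plausible when the connection charset is utf8, but not guaranteed). The helper name utf8_from_binary is hypothetical and is not part of Rails, Nokogiri, or the mysql gem.

```ruby
# Hypothetical helper: relabel an ASCII-8BIT (binary) string as UTF-8.
# Assumption: the underlying bytes are already UTF-8 encoded text.
def utf8_from_binary(str)
  # Strings in any other encoding are returned untouched.
  return str unless str.encoding == Encoding::ASCII_8BIT

  # force_encoding changes only the encoding label, not the bytes;
  # dup avoids mutating the caller's string.
  utf8 = str.dup.force_encoding(Encoding::UTF_8)

  # If the assumption was wrong and some bytes are not valid UTF-8,
  # scrub replaces them instead of letting a later call raise.
  utf8.valid_encoding? ? utf8 : utf8.scrub
end

# Stands in for a value read through the mysql adapter.
notes = "tooth".encode("ASCII-8BIT")
clean = utf8_from_binary(notes)
puts clean.encoding

# With the rails-html-sanitizer gem loaded, the relabelled string could
# then be passed along, e.g.:
#   Rails::Html::WhiteListSanitizer.new.sanitize(clean)
```

Relabelling with force_encoding rather than transcoding with encode is deliberate here: encode from ASCII-8BIT to UTF-8 raises on any byte above 0x7F, while force_encoding leaves the bytes alone and only changes how Ruby interprets them.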