[solved] Filtering model in a locale-aware way
-
Hi guys,
I'm using a QSortFilterProxyModel to, you guessed right, sort and filter a custom model in a QTableView. The items in the model have special characters, such as á, à, ä, ç, ñ, etc., so sorting and filtering must be locale-aware to work properly. For sorting, that is accomplished by setting the isSortLocaleAware property to true. My problem is that I don't know how to solve this for filtering.
I'll show an example just to make my needs clear: if the user types "to" as the filter string, I would like to get the rows that contain words like "Toledo", "Töölö", "tónica" and "Fitò". With my current implementation, typing "to" returns only "Toledo", since the other words have accents that break the match. On the other hand, sorting works like a charm for those words once I set isSortLocaleAware to true.
Does anyone have any tips on how to do this? Help would be greatly appreciated, because I've been stuck on this for hours!
Regards, Arturo
-
After unsuccessfully looking for a helpful function in QSortFilterProxyModel, QString, QRegExp, etc., I've decided to do this myself, which is what I should have done in the first place!
My approach is to modify the filter text entered by the user, so that any vowel, including those with accents, is replaced by a character class matching all the forms that vowel may take. Then I create a regular expression from the resulting string and call setFilterRegExp() on the QSortFilterProxyModel. It'll be better if you look at the code instead of reading my "buggy" English:
@
#define _A_REGEXP "[aA\x00C0\x00C1\x00C4\x00E0\x00E1\x00E4]" // aAÀÁÄàáä
#define _E_REGEXP "[eE\x00C8\x00C9\x00CB\x00E8\x00E9\x00EB]" // eEÈÉËèéë
#define _I_REGEXP "[iI\x00CC\x00CD\x00CF\x00EC\x00ED\x00EF]" // iIÌÍÏìíï
#define _O_REGEXP "[oO\x00D2\x00D3\x00D6\x00F2\x00F3\x00F6]" // oOÒÓÖòóö
#define _U_REGEXP "[uU\x00D9\x00DA\x00DC\x00F9\x00FA\x00FC]" // uUÙÚÜùúü
//[...]
void MyClass::_makeFilteringLocaleAware(QString &filter)
{
    /* Maybe not the best solution from a performance point of view, but works interactively fast for me */
    /* Each call replaces every plain or accented form of one vowel with the class matching all of its forms */
    filter.replace(QRegExp(_A_REGEXP), _A_REGEXP);
    filter.replace(QRegExp(_E_REGEXP), _E_REGEXP);
    filter.replace(QRegExp(_I_REGEXP), _I_REGEXP);
    filter.replace(QRegExp(_O_REGEXP), _O_REGEXP);
    filter.replace(QRegExp(_U_REGEXP), _U_REGEXP);
}
void MyClass::on_filterTextEdit_textChanged()
{
    QString filter;
    //[...]
    /* filterTextEdit is a QPlainTextEdit that the user edits to search the TableView rows he's interested in */
    filter = filterTextEdit->toPlainText();
    //[...]
    _makeFilteringLocaleAware(filter);
    proxyModel->setFilterRegExp(QRegExp(filter, Qt::CaseInsensitive));
}
@
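One caveat with the snippet above: the user's text goes into the regular expression unescaped, so typing something like "(" or "[" produces an invalid or unintended pattern. In Qt, QRegExp::escape() can take care of that. The following is only a sketch of the same vowel-expansion idea done in a single pass with escaping, written in plain standard C++ (no Qt) so it can be compiled on its own. The names vowelClass and buildPattern are made up for illustration, and the table only covers the Latin-1 vowels from the macros above.

```cpp
#include <string>

// Hypothetical helper: returns the bracketed character class for a vowel
// (plain or accented, Latin-1 range only), or an empty string otherwise.
static std::u32string vowelClass(char32_t c)
{
    static const std::u32string classes[] = {
        U"aA\u00C0\u00C1\u00C4\u00E0\u00E1\u00E4",
        U"eE\u00C8\u00C9\u00CB\u00E8\u00E9\u00EB",
        U"iI\u00CC\u00CD\u00CF\u00EC\u00ED\u00EF",
        U"oO\u00D2\u00D3\u00D6\u00F2\u00F3\u00F6",
        U"uU\u00D9\u00DA\u00DC\u00F9\u00FA\u00FC",
    };
    for (const std::u32string &members : classes) {
        if (members.find(c) != std::u32string::npos)
            return U"[" + members + U"]";
    }
    return std::u32string();
}

// Hypothetical helper: turns the user's filter text into a pattern string.
// Vowels become character classes; regex metacharacters are backslash-escaped.
std::u32string buildPattern(const std::u32string &filter)
{
    static const std::u32string meta = U"\\^$.|?*+()[]{}";
    std::u32string pattern;
    for (char32_t c : filter) {
        const std::u32string cls = vowelClass(c);
        if (!cls.empty()) {
            pattern += cls;
        } else {
            if (meta.find(c) != std::u32string::npos)
                pattern += U'\\';
            pattern += c;
        }
    }
    return pattern;
}
```

With this, "to" and "tó" produce the same pattern, and "a.b" keeps its dot as a literal character.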
Any comments are appreciated, and obviously a smarter solution is welcome too!
-
Oh, once again it is time for "fun with Unicode"!
This will not work for strings that have 'LATIN CAPITAL LETTER O WITH DIAERESIS' U+00D6 encoded as 'LATIN CAPITAL LETTER O' U+004F followed by 'COMBINING DIAERESIS' U+0308. Of course you can normalize those away (see QString::normalized(...)).
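To make the two encodings concrete, here is a tiny sketch in plain standard C++ (no Qt; the variable names are only for illustration). Both strings display as "Ö", but they hold different code points, so a character class that lists U+00D6 never sees the decomposed form as a single character.

```cpp
#include <string>

// U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS as one code point.
const std::u32string composed = U"\u00D6";

// The same visible letter as U+004F LATIN CAPITAL LETTER O
// followed by U+0308 COMBINING DIAERESIS: two code points.
const std::u32string decomposed = U"O\u0308";

// A plain code-point comparison treats the two forms as different strings.
bool sameCodePoints(const std::u32string &a, const std::u32string &b)
{
    return a == b;
}
```

Normalizing both sides to the same form before comparing removes that difference.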
Maybe QString::localeAwareCompare(...) is what you want?
-
I would think not. It needs something like that, but I don't think it is easy to build something like a localeAwareContains() on top of localeAwareCompare(). I dug into it a bit today by going through the docs of some seemingly relevant classes, and I could not come up with a reasonable approach either.
-
If it is just about getting "basic" characters, you could do the following:
Normalize the filter string to QString::NormalizationForm_KD, which should make sure you turn all the LATIN CAPITAL LETTER O WITH DIAERESIS into the LATIN CAPITAL LETTER O, COMBINING DIAERESIS form (if I am not mixing up my normalization forms again, check the standard yourself;-). Then you can iterate over the string and remove everything not a letter or number (maybe also adding spaces). That should rather reliably get rid of all the possible "special decorations" added to letters.
This might or might not work for non-European languages; my knowledge about those is too limited to judge.
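A rough sketch of that idea, applied to the "to" example from the first post. In Qt the decomposition step would be QString::normalized(QString::NormalizationForm_KD), followed by a loop that keeps only characters for which QChar::isLetterOrNumber() or QChar::isSpace() is true. The code below does not call Qt: it is plain standard C++, and a small hand-written table stands in for real Unicode decomposition, covering only a few Latin-1 letters. The names fold and foldedContains are made up for illustration.

```cpp
#include <string>

// Hypothetical helper: reduces text to lower-case "basic" letters.
// The table is a stand-in for Unicode decomposition and is far from complete.
std::u32string fold(const std::u32string &text)
{
    // Each entry: accented forms, then the base letter they fold to.
    static const struct { std::u32string accented; char32_t base; } table[] = {
        { U"\u00C0\u00C1\u00C4\u00E0\u00E1\u00E4", U'a' },
        { U"\u00C8\u00C9\u00CB\u00E8\u00E9\u00EB", U'e' },
        { U"\u00CC\u00CD\u00CF\u00EC\u00ED\u00EF", U'i' },
        { U"\u00D2\u00D3\u00D6\u00F2\u00F3\u00F6", U'o' },
        { U"\u00D9\u00DA\u00DC\u00F9\u00FA\u00FC", U'u' },
        { U"\u00C7\u00E7", U'c' },
        { U"\u00D1\u00F1", U'n' },
    };
    std::u32string out;
    for (char32_t c : text) {
        // Drop combining diacritical marks (U+0300..U+036F), so already
        // decomposed input ends up the same as precomposed input.
        if (c >= 0x0300 && c <= 0x036F)
            continue;
        for (const auto &entry : table) {
            if (entry.accented.find(c) != std::u32string::npos) {
                c = entry.base;
                break;
            }
        }
        if (c >= U'A' && c <= U'Z')
            c += U'a' - U'A'; // ASCII-only lower-casing
        out += c;
    }
    return out;
}

// Hypothetical helper: true if the folded cell text contains the folded filter.
bool foldedContains(const std::u32string &cell, const std::u32string &filter)
{
    return fold(cell).find(fold(filter)) != std::u32string::npos;
}
```

Note that folding the filter string alone is not enough: the text in the model has to be folded the same way before comparing. In Qt, one place to do that would be a reimplemented filterAcceptsRow() in a QSortFilterProxyModel subclass.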
-
[quote author="Tobias Hunger" date="1324394294"]
Maybe QString::localeAwareCompare(...) is what you want?[/quote]As Andre points out, it's not simple to use QString::localeAwareCompare(…) to mimic a regular-expression-based filter. Even if it were doable, the performance would be really bad.
[quote author="Tobias Hunger" date="1324461364"]
Normalize the filter string to QString::NormalizationForm_KD, which should make sure you turn all the LATIN CAPITAL LETTER O WITH DIAERESIS into the LATIN CAPITAL LETTER O, COMBINING DIAERESIS form (if I am not mixing up my normalization forms again, check the standard yourself;-). Then you can iterate over the string and remove everything not a letter or number (maybe also adding spaces). That should rather reliably get rid of all the possible "special decorations" added to letters.
[/quote]That would definitely work nicely for me. As you may have noticed, I'm not too familiar with Unicode, and didn't even know that you could represent a decorated letter either as a single character or as two different ones.
I guess I can mark this as solved, even if there might be some smarter solutions.
Thank you for your help, guys!
-
This would be an awesome feature if provided by Qt itself. It seems it is coming up again and again and again.