Hi! Our national alphabet (Latvian) consists of a subset of the Latin alphabet plus several Latin letters with diacritics, which are treated as separate letters in their own right (for sorting, etc.). I hope they display correctly in this post: ā ē ī ū ŗ ļ ķ ņ ģ š ž.
While searching our historical EX archive through the SourceOne Search page, I found that the results are mostly inconsistent for Latvian words that contain these diacritics: almost always some messages are missing. Strangely, the results improve when the Latvian word is ORed with a variant in which the diacritics are replaced by the corresponding plain Latin letters. For example, a phrase search could look like: "meklēšana vēstuļu arhīvā" "meklesana vestulu arhiva". It's still not perfect, because EX/SO indexes the letter 'Š' as a significant character, so words containing it are sometimes returned by the search above and sometimes not. I do not have the Native archive in place yet, so I can't test whether it behaves any better.
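For reference, the plain-Latin variant can be generated mechanically instead of being typed by hand. This is only a minimal Python sketch: it assumes every letter involved has a canonical Unicode decomposition (true for the eleven Latvian letters above), and the function names are my own, not anything provided by SourceOne. It only prepares the phrases; how they are combined in the search box is left as in the example above.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Replace letters with diacritics by their base Latin letters."""
    # NFD splits e.g. 'ē' into 'e' followed by a combining macron.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def phrase_variants(phrase: str) -> list[str]:
    """Return the original phrase plus its plain-Latin variant, if different."""
    folded = strip_diacritics(phrase)
    variants = [phrase]
    if folded != phrase:
        variants.append(folded)
    return variants


if __name__ == "__main__":
    # Print each variant as a quoted phrase, ready to paste into a search.
    for variant in phrase_variants("meklēšana vēstuļu arhīvā"):
        print(f'"{variant}"')
```

A phrase that contains no diacritics yields a single variant, so nothing redundant is added to the search.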
Is there anything I can try to improve the searches?
And what is the appropriate (system) locale to set in the SourceOne server's regional settings in our case?
The archive and indexes start from 2006.
Indeed, Š falls under the significant characters category, whereas ā ē ī ū ŗ ļ ķ ņ ģ ž do not. Significant characters are indexed as standard alphanumeric characters, so you should be able to search for those. The following is the list of such characters:
$ % & - 0 1 2 3 4 5 6 7 8 9 @ _ Š OE š oe Ÿ À Á Â Ã Ä Å
Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à
á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û
ü ý þ ÿ
(Some of the characters may not display correctly in this post; please refer to the latest version of the Search User Guide.)
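If you want to experiment in the meantime, one more variant worth trying is a word in which only the non-significant letters are replaced and Š/š is kept. The Python sketch below is just an illustration under assumptions: the character set is copied from the list in this post (leaving out the two entries that did not display properly), the function name is made up, and I have not verified that the index actually matches words folded this way.

```python
import unicodedata

# Accented letters copied from the significant-character list in this post.
# The two entries that were garbled in the post are deliberately left out.
SIGNIFICANT = set(
    "ŠšŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞß"
    "àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
)


def fold_insignificant(text: str) -> str:
    """Strip diacritics only from letters that are not significant characters."""
    result = []
    # Normalize to NFC first so each accented letter is a single code point.
    for ch in unicodedata.normalize("NFC", text):
        if ch in SIGNIFICANT:
            # Significant characters are kept exactly as typed.
            result.append(ch)
        else:
            # Everything else loses its combining marks.
            decomposed = unicodedata.normalize("NFD", ch)
            result.append(
                "".join(c for c in decomposed if not unicodedata.combining(c))
            )
    return "".join(result)


if __name__ == "__main__":
    print(fold_insignificant("meklēšana vēstuļu arhīvā"))
```

For the phrase in the question this keeps the š in the first word and replaces the other Latvian letters with their base Latin letters, which gives a third variant to try alongside the original and the fully plain one.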
I think it is worthwhile to engage EMC support to see if there is anything that could be done.