Introduce FR_CA_CI_AI collation and change FR_FR and FR_FR_CI_AI to be identical to FR_CA and FR_CA_CI_AI respectively. [CORE3638] #3989

Closed
firebird-automations opened this issue Oct 19, 2011 · 13 comments

Comments

@firebird-automations
Collaborator

Submitted by: @pmakowski

The FR_FR collation appears to be completely weird.

0) The reason FR_FR exists seems to be to remove accents when uppercasing; otherwise it would be identical to FR_CA.

1) But for some letters (mostly those with a tilde or diaeresis), the accents are not removed when uppercasing.

2) Letters that are already upper-case don't lose their accents when uppercased. I suppose this is inconsistent with 1.

3) Lowercasing is identical to FR_CA. This also seems inconsistent with 1.

This concerns all FR_FR collations: DB_FRA437, DB_FRA850, FR_FR, NXT_FRA

IMHO FR_FR should be the same as FR_CA

Perhaps we could have an FR_AI instead?

Target is 3.0
No backport, so as not to break existing code and indexes that rely on FR_FR

Commits: e44d7fd

@firebird-automations
Collaborator Author

Modified by: @pmakowski

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations
Collaborator Author

Commented by: Sean Leyne (seanleyne)

My reading of the handling of European French (my early education was French-Canadian) characters suggests that there is no historical consensus on the items you listed.

It appears that the FR_FR collation meets the current requirements of l'Académie française (the gods of the French language): http://www.academie-francaise.fr/langue/questions.html#accentuation

FR_FR and FR_CA should be maintained as separate and distinct.

@firebird-automations
Collaborator Author

Commented by: @pmakowski

Sean, your reading of the Académie française is wrong: the Académie française says clearly that FR_CA is the correct one.

Of course, we could have an FR_AI (even if its name is FR_FR), but at least it has to be consistent, and today that is not the case.

Quoted from the Académie française (sorry for the bad translation):
"en français, l'accent a pleine valeur orthographique"
In French, the accent has full orthographic value

"On veille donc, en bonne typographie, à utiliser systématiquement les capitales accentuées"
So, in good typography, one takes care to use accented capitals systematically

@firebird-automations
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Philippe,

I guess after 40 years my French reading skills have severely atrophied.

What would the "AI" in FR_AI refer to?

Separately, in doing searches on this subject, I did find a mention on a Microsoft site, in a discussion of uppercasing and how different rules can apply to the same base language based on regional requirements, which said "...in the case of European French accents are removed when strings are uppercased, whereas in the case of French-Canadian they are kept..." (I can't find the link now). So, it seems that when the original FR_FR collation was developed it followed that 'mindset'.

@firebird-automations
Collaborator Author

Commented by: @pmakowski

Sorry, AI was for Accent Insensitive.

About the "mindset": yes, but that was at a time when Microsoft did not provide good keyboard mappings, I guess.
Today on my French keyboard I have no problem typing ÉÀ etc. (but maybe it is still hard under Windows, I don't know).

To come back to the problem:
the current FR_FR (but also NXT_FRA, db_fra850 and db_frc850) has some kind of logic: when uppercasing, it removes accents, but only for letters that exist in French.
1/ This is wrong according to French rules.
2/ It leads to problems:
upper(sales)=upper(salés), but what about lower(SALES)?
It should be undefined...
And in French "sales" and "salés" mean "dirty" and "salted"; see the problem?
So really all of them should behave the correct way, like FR_CA. This silly FR_FR makes no sense; it was there only to help (badly) with case-insensitive, accent-insensitive searches, but today we have other ways to do that right with CI_AI collations.
upper(lower('À')) should be equal to 'À'
and lower(upper('à')) should be equal to 'à'
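To illustrate the point above outside the database, here is a minimal Python sketch (not Firebird code). The `STRIPPED_UPPER` table and both function names are hypothetical; they only approximate the behaviour described in this thread, where lower-case French accented letters lose their accent when uppercased and everything else goes through normal Unicode uppercasing.

```python
# Hypothetical, simplified model of the two uppercasing behaviours discussed
# in this thread. The mapping below is illustrative only; it is NOT the
# actual Firebird FR_FR collation table.
STRIPPED_UPPER = {
    "à": "A", "â": "A", "ç": "C", "é": "E", "è": "E",
    "ê": "E", "î": "I", "ô": "O", "ù": "U", "û": "U",
}


def upper_accent_stripping(text: str) -> str:
    """Uppercase, dropping accents from the lower-case letters in the table."""
    return "".join(STRIPPED_UPPER.get(ch, ch.upper()) for ch in text)


def upper_accent_preserving(text: str) -> str:
    """Uppercase while keeping accents (Python's Unicode-aware str.upper)."""
    return text.upper()


# Two different words collide under the accent-stripping rule...
collide = upper_accent_stripping("sales") == upper_accent_stripping("salés")
# ...but stay distinct when accents are preserved.
distinct = upper_accent_preserving("sales") != upper_accent_preserving("salés")

# Round trip: lower(upper(x)) only gets back to x if the accent survives.
round_trip_stripping = upper_accent_stripping("à").lower() == "à"
round_trip_preserving = upper_accent_preserving("à").lower() == "à"

print(collide, distinct, round_trip_stripping, round_trip_preserving)
```

In this model the stripping variant cannot satisfy lower(upper('à')) = 'à', because the accent is already gone by the time the string is lowercased; the preserving variant can.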

@firebird-automations
Collaborator Author

Commented by: @asfernandes

But then it seems the problem with these collations is that they exist. :)

Then I wonder whether removing them or making them an alias of FR_CA would be better or worse than just leaving them as is.

As you said, they "have some kind of logic", so it's not a bug, and since there are already better alternatives...

@firebird-automations
Collaborator Author

Commented by: @pmakowski

I'm pretty much in favor of removing them or making them an alias of FR_CA

@firebird-automations
Collaborator Author

Commented by: @asfernandes

Philippe, should FR_FR_CI_AI also be changed to be based on FR_CA?

Should there also be an FR_CA_CI_AI equal to FR_FR_CI_AI, or, because it never existed, do we not need this alias?

@firebird-automations
Collaborator Author

Commented by: @pmakowski

Yes, FR_FR_CI_AI should be based on FR_CA.
As for FR_CA_CI_AI, it can exist.
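
As a rough illustration of what a case-insensitive, accent-insensitive (CI_AI) comparison means, here is a small Python sketch. It is not how Firebird implements its CI_AI collations; `ci_ai_key` and `ci_ai_equal` are hypothetical helper names, and the approach (Unicode decomposition, dropping combining marks, case folding) is an assumption chosen for simplicity.

```python
import unicodedata


def ci_ai_key(text: str) -> str:
    """Build a comparison key that ignores both case and accents."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is Python's aggressive lower-casing for caseless matching.
    return base_only.casefold()


def ci_ai_equal(left: str, right: str) -> bool:
    """Compare two strings the way a CI_AI collation conceptually would."""
    return ci_ai_key(left) == ci_ai_key(right)


print(ci_ai_equal("salés", "SALES"))
print(ci_ai_equal("sale", "salés"))
```

The point of keeping this in the comparison rather than in upper() is that the stored text and the upper/lower functions keep their accents; only the matching ignores them.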

@firebird-automations
Collaborator Author

Modified by: @asfernandes

summary: The FR_FR collation appear to be completely weird. => Introduce FR_CA_CI_AI collation and change FR_FR and FR_FR_CI_AI to be identical to FR_CA and FR_CA_CI_AI respectively.

@firebird-automations
Collaborator Author

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 Alpha 1 [ 10331 ]

@firebird-automations
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Collaborator Author

Modified by: @pavel-zotov

QA Status: No test
