AsmBB

EncodingTable koi8-r.tbl

#16072 (ツ) ganuonglachanh
Created 22.03.2020, read: 2149 times

Hi johnfound

The default Utf8ToAnsi function uses the EncodingTable koi8-r.tbl. How can I make another EncodingTable that replaces other characters, like ế => e (and many more)?

Thank you!

#16076 (ツ) johnfound
Created 22.03.2020, read: 2148 times

Well, for now the only implemented code tables are WIN1251, CP866, KOI8R and KOI8U.

But if you are asking about slug/tag generation, you actually don't need this. I use the Utf8ToAnsi procedure there because in the Russian KOI8 table the Cyrillic letters have the same codes as the similar-sounding Latin letters, apart from the high (8th) bit.

After the conversion the string remains valid UTF-8, but all Cyrillic letters are replaced with the corresponding Latin letters, which can still be read properly as Russian, Bulgarian, Serbian, etc.
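
To illustrate the KOI8-R property described above, here is a minimal JavaScript sketch. It is not AsmBB's Utf8ToAnsi (which is written in assembly); the helper names koi8rCode and koi8rStrip are hypothetical, and the sketch assumes the transliteration works by clearing the high bit of each KOI8-R code. Only the 32 basic Russian letters are covered (ё is left out).

```javascript
// Sketch of the KOI8-R property behind the Utf8ToAnsi "hack" (not AsmBB's code).
// KOI8-R places lowercase Cyrillic at 0xC0..0xDF and uppercase at 0xE0..0xFF,
// ordered so that clearing bit 7 mostly yields a similar-sounding Latin letter.
const KOI8R_LOWER = "юабцдефгхийклмнопярстужвьызшэщчъ"; // codes 0xC0..0xDF

// Hypothetical helper: KOI8-R code of a Russian letter, or -1 if not covered.
function koi8rCode(ch) {
  const lower = ch.toLowerCase();
  const i = KOI8R_LOWER.indexOf(lower);
  if (i < 0) return -1;
  // Uppercase letters sit 0x20 above their lowercase counterparts.
  return (lower === ch ? 0xC0 : 0xE0) + i;
}

// Hypothetical helper: strips the high bit of every covered letter;
// all other characters pass through unchanged.
function koi8rStrip(text) {
  let out = "";
  for (const ch of text) {
    const code = koi8rCode(ch);
    out += code < 0 ? ch : String.fromCharCode(code & 0x7F);
  }
  return out;
}

console.log(koi8rStrip("привет")); // PRIWET
```

Note that the case comes out inverted (lowercase Cyrillic becomes uppercase Latin), and a few letters map to punctuation rather than letters (ю becomes "@", ш becomes "["), so a real slug still needs some cleanup afterwards.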

In other words, using Utf8ToAnsi is simply a hack. To handle the accented Latin characters you will need entirely different code.

#16080 (ツ) ganuonglachanh
Created 22.03.2020, read: 2139 times

Yes, I am asking about slug/tag generation, because I used to use this JS function to slugify URLs in Vietnamese:

    function slugifyVi(slug) {
        slug = slug.toLowerCase();  // keep the whole slug lowercase
        // Map each accented Vietnamese letter to its base Latin letter.
        slug = slug.replace(/á|à|ả|ạ|ã|ă|ắ|ằ|ẳ|ẵ|ặ|â|ấ|ầ|ẩ|ẫ|ậ/gi, 'a');
        slug = slug.replace(/é|è|ẻ|ẽ|ẹ|ê|ế|ề|ể|ễ|ệ/gi, 'e');
        slug = slug.replace(/í|ì|ỉ|ĩ|ị/gi, 'i');
        slug = slug.replace(/ó|ò|ỏ|õ|ọ|ô|ố|ồ|ổ|ỗ|ộ|ơ|ớ|ờ|ở|ỡ|ợ/gi, 'o');
        slug = slug.replace(/ú|ù|ủ|ũ|ụ|ư|ứ|ừ|ử|ữ|ự/gi, 'u');
        slug = slug.replace(/ý|ỳ|ỷ|ỹ|ỵ/gi, 'y');
        slug = slug.replace(/đ/gi, 'd');
        return slug;
    }

My knowledge of UTF-8 encoding is limited, and I still can't find a solution :-(

#16081 (ツ) johnfound
Created 22.03.2020, read: 2136 times

I will see what I can do about it. In my opinion, we need a general solution that processes such characters the same way in all languages...
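
One common general approach, sketched here in JavaScript and assuming Unicode normalization is available (String.prototype.normalize), is to decompose the text into canonical form NFD, drop the combining diacritical marks (U+0300 to U+036F), and then map the few letters that have no decomposition. The slugify name and its hyphenation rules are hypothetical choices, not how AsmBB does it.

```javascript
// Sketch of a language-neutral slug helper (hypothetical, not part of AsmBB).
// NFD splits a precomposed letter such as "ế" into the base letter "e" plus
// combining marks, which are then removed.
function slugify(text) {
  return text
    .normalize("NFD")                 // split letters from their diacritics
    .replace(/[\u0300-\u036f]/g, "")  // drop combining diacritical marks
    .toLowerCase()
    .replace(/đ/g, "d")               // "đ" (U+0111) has no NFD decomposition
    .replace(/[^a-z0-9]+/g, "-")      // collapse everything else into "-"
    .replace(/^-+|-+$/g, "");         // trim leading/trailing hyphens
}

console.log(slugify("Tiếng Việt có dấu")); // tieng-viet-co-dau
```

For Vietnamese, đ/Đ is the only letter that needs an explicit mapping; other languages need their own entries (for example ø or ł). Non-Latin scripts such as Cyrillic are removed entirely by the final filter, so they still need a separate transliteration step.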

AsmBB v3.0 (check-in: a316dab8b98d07d9); SQLite v3.42.0 (check-in: 831d0fb2836b71c9);
©2016..2023 John Found; Licensed under EUPL. Powered by Assembly language Created with Fresh IDE