HTMLTag: View Ticket

2019-05-01
06:45		• Deferred ticket [6e6359698e]: Using plugin to convert HTML to UTF-8 (exceptions) plus 6 other changes artifact: 3a0b278714 user: tinus
01:19		• New ticket [6e6359698e]. artifact: b448e7dd3a user: anonymous

Ticket Hash:	6e6359698eb3931b62bede4b14bec6a54e9d2d01
Title:	Using plugin to convert HTML to UTF-8 (exceptions)
Status:	Deferred	Type:	Feature_Request
Severity:	Important	Priority:	Medium
Subsystem:	Entity replacement	Resolution:	Works_As_Designed
Last Modified:	2019-05-01 06:45:31	Created:	2019-05-01 01:19:58
Version Found In:	v1.0.0

User Comments:

anonymous added on 2019-05-01 01:19:58:

I used your plugin repeatedly over some five hours to convert a website of more than one hundred pages from ISO-8859-1 to UTF-8. My technique was to copy the editable part of each page, paste it into Notepad++, select all, press Ctrl+Shift+E to decode it, copy and paste the code back into Dreamweaver, and check the page. Sadly, I could not automate the process and had to do it all by hand.

One very important thing I had to do beforehand was to modify the file "HTMLTag-entities.ini" to exclude the following codes by adding a semicolon at the start of their lines: &nbsp; &quot; &amp; &lt; and &gt;. This is essential because otherwise all my non-breaking spaces, to mention just this case, would no longer be visible in the code, and I would lose a lot of time constantly checking whether they are actually present. I am glad that I began by running some tests before processing the entire site.

I am wondering if you could implement an option to disable these specific codes for those situations when one cannot convert all the way without consequences. I thank you in advance and congratulate you for this remarkable utility.
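
For illustration only, the selective decoding requested above can be sketched outside the plugin in a few lines of Python. This is not how HTMLTag works internally; the names `KEEP` and `decode_entities` are hypothetical, and the sketch handles only HTML 4 named entities (numeric references such as &#233; are left alone).

```python
import re
from html.entities import name2codepoint

# Entities to leave untouched: the five the reporter commented out in the .ini file.
KEEP = {"nbsp", "quot", "amp", "lt", "gt"}

# Matches named entities such as &eacute; (numeric references do not match).
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def decode_entities(text, keep=KEEP):
    """Replace named HTML 4 entities with their characters, except those in `keep`."""
    def repl(match):
        name = match.group(1)
        # Leave protected and unknown entities exactly as written.
        if name in keep or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])
    return _ENTITY.sub(repl, text)

print(decode_entities("d&eacute;j&agrave;&nbsp;vu &amp; co"))
# prints: déjà&nbsp;vu &amp; co
```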

tinus added on 2019-05-01 06:45:31:

Hi,

If your main concern is converting pages from ISO-8859-1 to UTF-8, I think you’d be better off opening the page in Notepad++ and actually converting its encoding to UTF-8 (using the "Convert / Convert to UTF-8" menu item), instead of converting all the entities. Nowadays, the only characters that need to be escaped as entities in text are <, >, and & (and in attributes: " and ').
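
As a rough sketch of what such an encoding conversion amounts to, the following Python function re-encodes a file's bytes from ISO-8859-1 to UTF-8. The function name `convert_to_utf8` and its parameters are hypothetical, and the sketch assumes the source file really is ISO-8859-1. It works on raw bytes so that line endings are preserved; any charset declaration inside the page (for example a meta tag) would still need to be updated separately.

```python
from pathlib import Path

def convert_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Re-encode the file at `src` into UTF-8 and write the result to `dst`."""
    data = Path(src).read_bytes()
    # Decode with the assumed source encoding, then encode as UTF-8.
    Path(dst).write_bytes(data.decode(source_encoding).encode("utf-8"))
```

To process a whole site, the function could be called in a loop over the page files, writing to a separate output folder so the originals stay untouched.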

I also have a command-line tool, ‘ConvertCharset’, which can be used to convert from any encoding to any other:
[https://fossil.2of4.net/mc_tools/doc/trunk/README.md#ConvertCharset]