This page lists a few pieces of software I've written or am writing. Most of them were written
to scratch an itch of my own. I’m releasing them here as open source in the hope that
they will be useful to someone.
Just for clarity: I meant that you shouldn't need to use ANY entities: neither named entities nor numeric entities. Just use the proper UTF-8 character encoding.
The only exceptions are &, < and >, plus ' and " in attribute text.
First of all, you shouldn't need to use named entities nowadays, apart from those for &, < and > (and for ' and " inside attributes). It's preferable to save your HTML document in UTF-8 and to specify that as the charset, either in the Content-Type header or by putting <meta charset="utf-8" /> at the top of the <head> section of your page. UTF-8 supports all Unicode characters, so there is no need to escape them.
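As a small illustration of that rule, here is a minimal Python sketch using the standard library's `html.escape`. The helper names `escape_text` and `escape_attribute` are made up for this example; they are not part of the plugin.

```python
import html


def escape_text(text: str) -> str:
    # In ordinary text, only &, < and > are escaped.
    # Every other character stays as a literal UTF-8 character.
    return html.escape(text, quote=False)


def escape_attribute(value: str) -> str:
    # Inside attribute values, " and ' are escaped as well.
    return html.escape(value, quote=True)


# escape_text("© 2018 <b>Tom & Jerry</b>")
#   returns "© 2018 &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"
```

Note that the copyright sign passes through untouched; only the markup-significant characters are replaced.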
However, if you really want named entities, then read on.
I can think of two causes for the plugin not converting named entities:
1. Your document could be set to be XML. XML does not support named entities (apart from those mentioned above), so the plugin doesn't use them. To get named entities, use Notepad++'s Language menu to choose HTML.
2. The plugin can't find the file containing the entity definitions, HTMLTag-entities.ini.
The plugin expects that file in its "Config" folder. You can check where that is located using the plugin's About window, via the Plugins menu, HTML Tag, About...
Copy the file HTMLTag-entities.ini, and paste it into the Config folder mentioned there.
The file can be found at the following location:
Press the [Download] button on that page to download the file.
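To illustrate why a missing or reduced set of entity definitions leads to numeric output, here is a minimal Python sketch. It is not the plugin's code: `encode_entities`, its `names` parameter and `XML_NAMES` are hypothetical, and Python's standard `html.entities.codepoint2name` table merely stands in for the contents of the definitions file.

```python
from html.entities import codepoint2name


def encode_entities(text, names):
    """Replace &, <, > and every non-ASCII character with an entity.

    `names` maps code points to entity names. A character without a
    known name falls back to a numeric entity.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128 and ch not in "&<>":
            out.append(ch)                # plain ASCII stays as-is
        elif cp in names:
            out.append(f"&{names[cp]};")  # named entity
        else:
            out.append(f"&#{cp};")        # numeric fallback
    return "".join(out)


# Assumption for illustration: a table that only knows the XML basics.
XML_NAMES = {38: "amp", 60: "lt", 62: "gt"}

print(encode_entities("© Tom & Jerry", codepoint2name))  # &copy; Tom &amp; Jerry
print(encode_entities("© Tom & Jerry", XML_NAMES))       # &#169; Tom &amp; Jerry
```

With the full table the copyright sign gets its name; with the reduced table only & keeps a name and everything else turns numeric, which resembles the symptom described in the question.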
I'm pretty much an amateur here, so I beg your patience if this is an easy fix (hopefully it is).
Is there a way to set HTML Tag so it converts all special characters to named entities only instead of the numeric code? (ex. For the copyright symbol, converting to “&copy;” instead of “&#169;”). I am using Notepad++ and HTML Tag to format Word documents for eBook formats (epub, etc), if that is of any relevance.
As late as 2016, this plugin was converting all the characters to named entities without a hitch (and through no settings adjustments by me, as far as I’m aware…I wouldn’t even know the first thing to adjust). But after a two-year absence and installing Notepad and HTML Tag on a cleaned hard drive, the entity conversions are now showing up as mostly numbered entities (it seems like “&” shows up as “&amp;”, which is the only named entity I’ve noticed thus far.)
Upon noticing this change, I’ve fiddled around with a few settings to see if I could correct this on my own, but with no success as of yet, so I’m more than happy to uninstall everything and then reinstall Notepad and HTML along with any adjustments/suggestions you may have if need be.
Thanks for your time (and for creating this in the first place, if you are the one who did!) and I hope to hear from you soon. Though I of course would like a remedy for my problem so I can continue my projects, I’m also now genuinely curious to know how this puzzle is solved. Any additional info you need from me, I’ll be happy to supply.
If your main concern is converting pages from ISO-8859-1 to UTF-8, I think you’d better open the page in Notepad++ and actually convert the page’s encoding to UTF-8 (using the "Convert / Convert to UTF-8" menu item), instead of escaping all the entities. Nowadays, the only characters that need to be escaped in text are <, >, and & (and in attributes: " and ').
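For a site with many pages, the same re-encoding can be scripted. Below is a minimal Python sketch, assuming the source files really are ISO-8859-1; the function name `convert_to_utf8` is made up for this example.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Re-encode one file to UTF-8.

    Works on raw bytes so that line endings are left untouched.
    """
    data = Path(src).read_bytes()
    text = data.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling `convert_to_utf8("page.html", "page.utf8.html")` on each file (the file names here are placeholders) would do the conversion. Any charset declaration inside the page still has to be updated separately.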
I also have a command-line tool, ‘ConvertCharset’, which can be used for conversion from any encoding to any other encoding:
I have used your plugin repeatedly over some five hours to convert a website consisting of more than one hundred pages from ISO-8859-1 to UTF-8. My technique was to copy the editable part of each page, paste it into Notepad++, select all, press Ctrl+Shift+E to decode it, copy and paste the code back into Dreamweaver, and check my page. Sadly I could not automate the entire process and had to work by brute force.
One very important thing I had to do prior to doing this was to modify the file "HTMLTag-entities.ini" to exclude the following codes by adding a semicolon at the start of the lines: " & < and >. This is essential because all my non-breaking spaces, to mention just this case, would no longer be visible in the code, which would lead to a lot of time lost constantly checking whether they are actually present. I am glad that I began by making some tests before processing the entire site.
I am wondering if you could implement an option to disable these specific codes for those situations when one cannot convert all the way without consequences. I thank you in advance and congratulate you for this remarkable utility.
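Outside the plugin, the kind of selective decoding described here can be sketched in a few lines of Python. This is only an illustration, not how HTML Tag works: `decode_entities` and its `keep` parameter are hypothetical names, and the entity table comes from Python's standard `html.entities` module rather than from HTMLTag-entities.ini.

```python
import re
from html.entities import name2codepoint

# Matches &name;, &#123; and &#x7B; style references.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text, keep=frozenset({"quot", "amp", "lt", "gt"})):
    """Decode entities to characters, except those listed in `keep`."""
    keep_codepoints = {name2codepoint[n] for n in keep if n in name2codepoint}

    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            # Numeric reference, hexadecimal or decimal.
            cp = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
        elif body in name2codepoint:
            cp = name2codepoint[body]
        else:
            return match.group(0)  # unknown name: leave untouched
        if cp in keep_codepoints or cp > 0x10FFFF:
            return match.group(0)  # protected or invalid: leave untouched
        return chr(cp)

    return ENTITY.sub(replace, text)


# decode_entities("caf&eacute; &amp; &lt;b&gt;") returns "café &amp; &lt;b&gt;"
```

Adding "nbsp" to `keep` would leave non-breaking spaces visible as `&nbsp;` in the decoded source.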