View Ticket
Ticket UUID: 623eb98079e18fa52e7134efce2f81d6c24d4430
Title: Converting to Named Entities (Newbie Question)
Status: Closed
Type: Incident
Severity: Important
Priority: Medium
Subsystem: Entity replacement
Resolution: Workaround
Last Modified: 2019-05-23 19:21:46
Version Found In: v1.0.0
User Comments:
anonymous added on 2019-05-23 06:20:31:
Hi,

I'm pretty much an amateur here, so I beg your patience if this is an easy fix (hopefully it is).

Is there a way to set HTML Tag so it converts all special characters to named entities only instead of the numeric code? (ex. For the copyright symbol, converting to “&copy;” instead of “&#169;”). I am using Notepad++ and HTML Tag to format Word documents for eBook formats (epub, etc), if that is of any relevance.

As late as 2016, this plugin was converting all the characters to named entities without a hitch (and through no settings adjustments by me, as far as I’m aware…I wouldn’t even know the first thing to adjust). But after a two-year absence and installing Notepad++ and HTML Tag on a cleaned hard drive, the entity conversions are now showing up as mostly numbered entities (it seems like “&” shows up as “&amp;”, which is the only named entity I’ve noticed thus far.)

Upon noticing this change, I’ve fiddled around with a few settings to see if I could correct this on my own, but with no success as of yet, so I’m more than happy to uninstall everything and then reinstall Notepad++ and HTML Tag along with any adjustments/suggestions you may have if need be.

Thanks for your time (and for creating this in the first place, if you are the one who did!) and I hope to hear from you soon. Though I of course would like a remedy for my problem so I can continue my projects, I’m also now genuinely curious to know how this puzzle is solved. Any additional info you need from me, I’ll be happy to supply.

tinus added on 2019-05-23 12:44:44:
Hi,

First of all, you shouldn't need to use named entities nowadays, apart from &amp;, &lt;, &gt; (and &apos; and &quot; inside attributes). It's preferable to save your HTML document in UTF-8, and specify that as charset; either in the Content-Type header, or by specifying <meta charset="utf-8" /> at the top of the <head> section of your page. UTF-8 supports all Unicode characters, so there is no need to escape them.

However, if you really want named entities, then read on.
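
To make the named/numeric distinction concrete, here is a small Python sketch. It is not the plugin's code; the function name to_entities and its names parameter are made up for illustration, and the lookup table comes from Python's standard html.entities module rather than from the plugin's own definitions file. It prefers a named entity when a definition is available and falls back to the numeric form otherwise.

```python
from html.entities import codepoint2name  # standard table: code point -> entity name


def to_entities(text, names=codepoint2name):
    """Replace non-ASCII characters with entities, preferring named ones.

    `names` is a hypothetical parameter: a mapping from code point to
    entity name. ASCII characters (including &, < and >) are left alone.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                 # plain ASCII stays as-is
        elif cp in names:
            out.append(f"&{names[cp]};")   # named entity, e.g. &copy;
        else:
            out.append(f"&#{cp};")         # numeric fallback, e.g. &#169;
    return "".join(out)


print(to_entities("© 2019"))              # -> &copy; 2019
print(to_entities("© 2019", names={}))    # -> &#169; 2019
```

The second call passes an empty table, which gives only numeric entities: the same kind of output you are seeing now.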

I can think of two causes for the plugin not converting to named entities:

1. Your document could be set to be XML. XML does not support named entities (apart from those mentioned above), so the plugin doesn't use them. To get named entities, use Notepad++'s Language menu to choose HTML. 

2. The plugin can't find the file containing the entity definitions, HTMLTag-entities.ini.

The plugin expects that file in its "Config" folder. You can check where that is located using the plugin's About window, via the Plugins menu, HTML Tag, About...

Copy the file HTMLTag-entities.ini, and paste it into the Config folder mentioned there.

The file can be found at the following location:
https://fossil.2of4.net/npp_htmltag/artifact?filename=dat/HTMLTag-entities.ini&ci=trunk
Press the [Download] button on that page to download the file.
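
Regarding cause 1 above: if you want to see for yourself why XML is treated differently, the following Python sketch feeds three snippets to a standard XML parser. It only uses Python's built-in xml.etree.ElementTree module; the helper name parses_as_xml is made up for this example, and none of this is how the plugin itself works.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(snippet):
    """Return True if the snippet is accepted by a plain XML parser."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


# Named HTML entity: not defined in plain XML, so parsing fails.
print(parses_as_xml("<p>&copy; 2019</p>"))      # -> False
# Numeric character reference: always allowed in XML.
print(parses_as_xml("<p>&#169; 2019</p>"))      # -> True
# The predefined XML entities are fine as well.
print(parses_as_xml("<p>&amp; &lt; &gt;</p>"))  # -> True
```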

tinus added on 2019-05-23 19:21:46:
Just for clarity: I meant that you shouldn't need to use ANY entities: neither named entities nor numeric entities. Just use the proper UTF-8 character encoding.

The only exceptions are &, < and >, plus ' and " inside attribute text.
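
As a rough sketch of that approach in Python (the sample text and variable names are made up for illustration): escape only the markup characters with the standard html.escape function, leave everything else as real characters, and encode the result as UTF-8. Note that Python writes the apostrophe as &#x27; rather than &apos;, which has the same effect.

```python
from html import escape

text = "Tom & Jerry's <\"best\"> © 2019"

# Element content: only &, < and > need escaping.
body = escape(text, quote=False)
# Attribute values: additionally escape " and '.
attr = escape(text, quote=True)

page = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    "<head>\n"
    '<meta charset="utf-8" />\n'
    "</head>\n"
    "<body>\n"
    f'<p title="{attr}">{body}</p>\n'
    "</body>\n"
    "</html>\n"
)

# The copyright sign stays a literal character; UTF-8 stores it as two bytes.
data = page.encode("utf-8")

print(body)
print(attr)
```

When saving to disk, opening the file with encoding="utf-8" gives the same bytes as data above.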