Unicode does not distinguish silluq, metheg, and ga'ya, as noted in a discussion here. And in an exercise I did yesterday to create SimHebrew from Hebrew, I saw that it also fails to distinguish between dagesh, mappiq, and shuruq: all three are encoded as the same point (U+05BC). That makes identifying u (one of two vowels used in SimHebrew) ambiguous. Is this important enough to ask whoever is in charge of Unicode to fix these errors?
I agree, it is frustrating that Unicode Hebrew conflates ("unifies") several semantically distinct notions just because in most (but not all!) publications these notions are represented by the same grapheme. As you point out, this makes it tricky to do semantics-aware manipulations of Unicode Hebrew text. Many of these distinctions can be made automatically; for example, I believe vav-dagesh vs. shuruq can be told apart that way. But it is burdensome. And even for strictly visual purposes, publishers have to go to great lengths if they do want to make such visual distinctions. What's SimHebrew, by the way?
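For what it's worth, here is a rough sketch of how the vav-dagesh vs. shuruq distinction might be made automatically. It is only a heuristic under stated assumptions, not an established algorithm: a vav carrying U+05BC is treated as shuruq when neither it nor the preceding letter has a vowel point of its own, and any he carrying U+05BC is treated as mappiq (assuming he never takes a dagesh). The names `classify_points` and `split_clusters` are made up for this illustration, and rare cases will be misclassified.

```python
import unicodedata

VAV = "\u05D5"
HE = "\u05D4"
DAGESH_POINT = "\u05BC"  # single code point shared by dagesh, mappiq, and the shuruq dot
# Vowel points: sheva through qubuts (U+05B0..U+05BB) plus qamats qatan (U+05C7).
VOWELS = {chr(c) for c in range(0x05B0, 0x05BC)} | {"\u05C7"}


def is_letter(ch):
    """True for the Hebrew letters alef..tav."""
    return "\u05D0" <= ch <= "\u05EA"


def split_clusters(word):
    """Split a pointed word into (letter, set_of_marks) pairs."""
    clusters = []
    # NFD also decomposes presentation forms such as U+FB35 (vav with dagesh).
    for ch in unicodedata.normalize("NFD", word):
        if is_letter(ch):
            clusters.append((ch, set()))
        elif clusters and unicodedata.combining(ch):
            clusters[-1][1].add(ch)
    return clusters


def classify_points(word):
    """Label each U+05BC in `word` as 'shuruq', 'mappiq', or 'dagesh'.

    Returns a list of (letter_index, label) pairs. Heuristic only.
    """
    clusters = split_clusters(word)
    labels = []
    for i, (letter, marks) in enumerate(clusters):
        if DAGESH_POINT not in marks:
            continue
        has_vowel = bool(marks & VOWELS)
        prev_has_vowel = i > 0 and bool(clusters[i - 1][1] & VOWELS)
        if letter == VAV and not has_vowel and not prev_has_vowel:
            # Assumption: the dot is the vowel of the preceding consonant
            # (or of a word-initial conjunction).
            labels.append((i, "shuruq"))
        elif letter == HE:
            # Assumption: he never takes a dagesh, so the dot is a mappiq.
            labels.append((i, "mappiq"))
        else:
            labels.append((i, "dagesh"))
    return labels
```

With this sketch, the vav in shin-vav-bet ("\u05E9\u05C1\u05D5\u05BC\u05D1") comes out as shuruq, the vav in "\u05E6\u05B4\u05D5\u05B8\u05BC\u05D4" (which carries its own qamats) comes out as dagesh, and the final he in "\u05DC\u05B8\u05D4\u05BC" comes out as mappiq.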