What is this character: '*'?
Andrew Henderson
A friend pasted a command into a Slack chat room which contained the character *. This looks like a normal * but isn't:
$ uniprops '*'
uniprops: no character named ‹*›
Whereas if I run uniprops on the asterisk I get by typing on my own machine, I get:
$ uniprops '*'
U+002A ‹*› \N{ASTERISK} \pP \p{Po} All Any ASCII Assigned Basic_Latin Punct Is_Punctuation Common Zyyy Po P Gr_Base Grapheme_Base Graph X_POSIX_Graph GrBase Other_Punctuation Pat_Syn Pattern_Syntax PatSyn POSIX_Graph POSIX_Print POSIX_Punct Print X_POSIX_Print Punctuation Unicode X_POSIX_Punct
I can also see that it isn't an actual asterisk by passing it through od:
$ printf '*' | od -c
0000000 * 342 200 213
0000004
The normal one gives:
$ printf '*' | od -c
0000000 *
0000001
Here's the mystery character a bit larger:
*
And the normal asterisk (yes, they do look identical):
*
So, uniprops doesn't know what this is, and I can't find it on either. I do know that the friend who pasted it is on OS X (I am on Linux) and that it works on their system as a regular asterisk. I am assuming that Slack somehow changed it. So, does anyone have any idea what that character is?
Note that you can't copy the weird character directly from the question. Apparently, the Stack Exchange engine strips the trailing non-printing characters. Click on the "edit" link and copy from there instead.
uniprops is a neat little script included in the Unicode::Tussle Perl module which identifies and prints information about the character you give it.
2 Answers
The paste failed not because of the asterisk, which is a perfectly regular asterisk, but because of the Unicode character U+200B. As the character is a ZERO WIDTH SPACE, it does not display when it is copied.
Using the Python code:
stro=u"'*'?"
def uniconv(text): return " ".join(hex(ord(char)) for char in text)
uniconv(stro)
The function uniconv converts each character of the input string (in this case, u"'*'?") into its Unicode code point in hexadecimal. The u prefix marks the string as a Unicode string (required in Python 2, optional in Python 3).
I was able to obtain the output:
0x27 0x2a 0x200b 0x27 0x3f
We can clearly see that 0x27, 0x2a and 0x3f are the ASCII/Unicode hexadecimal values for the characters ', * and ? respectively. That leaves 0x200b, which identifies the extra character as U+200B.
Note that the Python code, when pasted into the body, had the U+200B character removed by SE's Markdown software. In order to obtain the expected result, you need to copy it directly from the title using the Edit view.
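As a complement to the hex dump above, here is a minimal sketch that also reports the name and general category of each character, using Python 3's standard unicodedata module. The helper name describe is made up for this illustration, and the zero width space is written as the escape \u200b so that it survives copy and paste.

```python
import unicodedata

def describe(text):
    """Return one (code point, category, name) tuple per character."""
    rows = []
    for char in text:
        rows.append((
            "U+{:04X}".format(ord(char)),
            unicodedata.category(char),
            # Fall back to a placeholder for characters without a name.
            unicodedata.name(char, "<unnamed>"),
        ))
    return rows

# Assumed sample: the question title with the hidden character made explicit.
sample = "'*\u200b'?"
for code_point, category, name in describe(sample):
    print(code_point, category, name)
```

Any row whose category is Cf (format) marks a character that takes up no visible space, which is what makes the pasted asterisk look normal.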
With the help of @Rinzwind in the Ask Ubuntu chat room, I figured out that the problem isn't the character at all. Note the output of od:
$ printf '*' | od -c
0000000 * 342 200 213
0000004
The 342 200 213 is the octal representation of the UTF-8 bytes of another character, and we can use this site to look it up:
Character
Character name ZERO WIDTH SPACE
Hex code point 200B
Decimal code point 8203
Hex UTF-8 bytes E2 80 8B
Octal UTF-8 bytes 342 200 213
UTF-8 bytes as Latin-1 characters bytes â <80> <8B>
So, what I actually had was two Unicode characters: the normal * and a zero width space.
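The same lookup can be done locally. This is a minimal sketch, assuming Python 3; the function name decode_octal_utf8 is made up for this illustration. It parses the octal values printed by od, decodes them as UTF-8, and asks the standard unicodedata module for the character's name.

```python
import unicodedata

def decode_octal_utf8(octal_bytes):
    """Turn od-style octal byte values into the text they encode as UTF-8."""
    raw = bytes(int(value, 8) for value in octal_bytes.split())
    return raw.decode("utf-8")

# The three bytes od printed after the asterisk.
char = decode_octal_utf8("342 200 213")
print("U+{:04X}".format(ord(char)), unicodedata.name(char))
```

Octal 342 200 213 is hex E2 80 8B, so this prints U+200B ZERO WIDTH SPACE, matching the table above.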