Is there a simple Excel formula that can identify whether a string contains non-ASCII characters?

Writer Matthew Harrington

I know that the CLEAN function removes the non-printable ASCII control characters #0 through #31. (The additional non-printing characters #127, #129, #141, #143, #144, and #157 are not removed by CLEAN on its own.)

I also know that I can use SUBSTITUTE(D1,CHAR(127),"") to remove the non-printable character #127.

However, I cannot replace or identify non-ASCII characters inside a cell in Excel.

The following is a simple example:

Burrell's Model
Burrell’s Model

Notice that the first line is a normal ASCII string while the second line contains a non-ASCII character (the apostrophe).

May I know how to use Excel to find non-ASCII characters?

Thank you.

Update 1

According to Bandrami's comment:

In the broadest sense this is impossible; there are valid ASCII strings that are also valid UTF characters, so there's no way to ever know "for certain" (see the "Bush hid the facts" bug)

How do I identify characters that are not in the following ASCII range?

  • Less Than 128
  • Not equal to 255

5 Answers

MS Office help seems to treat the apostrophe as a character that CLEAN should catch, but CLEAN doesn't remove it on my machine.

MS Office help, Remove spaces and nonprinting characters from text, second to last paragraph

The 'simplest' I could find was a nested run of SUBSTITUTEs that clean anything that the CLEAN function doesn't catch.

From an old Google Excel group:

=SUBSTITUTE(SUBSTITUTE(F17,CHAR(141),""),CHAR(143),"")

and so on, adding one more SUBSTITUTE for each remaining character code.

Well, here's an approach to the "true ASCII" portion; a slight adjustment switches the cutoff from 127 to 255. As written it checks up to ten characters of the cell text, but you can set any length you choose, though checking 20,000 characters in every cell takes longer to run than a smaller figure chosen from what your data plausibly contains. It checks cell A1 on the assumption that you'll fill a column to match: if the text to check is in A1:A20000, enter the formula in, say, B1:B20000. (I have not looked at whether it could be made to spill so that it fills the column for you rather than needing to be copied down.)

=TEXTJOIN("",TRUE,IFERROR(CHAR(IF(UNICODE(MID(A1,ROW(1:10),1))>127,"",UNICODE(MID(A1,ROW(1:10),1)))),""))

Basically, it uses the UNICODE() function (not CODE(), since some characters may fall outside the 0-255 range) to read each character in the cell being checked. MID() isolates one character at a time (length 1 for the last parameter), and ROW(1:whatever number you choose) supplies every starting position rather than a single one, so EACH character gets checked.

Then we test each result against 127, generating "" for any character that is not in the true ASCII set. CHAR() then recreates the characters that passed (it is safe to use here because every number it receives is at most 127) and returns an error for each "". IFERROR() turns those errors back into "". Finally, TEXTJOIN() joins the pieces into the single result we want rather than spilling.
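For readers who want to check the logic outside Excel, the same character-by-character filter can be sketched in Python. This is only an illustration under stated assumptions: the function names `keep_ascii` and `has_non_ascii` are hypothetical, and `ord()` stands in for UNICODE().

```python
def keep_ascii(text: str) -> str:
    """Return text with every character above code point 127 removed."""
    # ord() plays the role of UNICODE(); the comparison mirrors the > 127 test,
    # and "".join() plays the role of TEXTJOIN().
    return "".join(ch for ch in text if ord(ch) <= 127)


def has_non_ascii(text: str) -> bool:
    """True when filtering changes the text, i.e. a non-ASCII character was present."""
    return keep_ascii(text) != text
```

Comparing the filtered text with the original is also one way to turn the formula into a yes/no test, which is what the question asks for.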

The formula's result WILL be text, but if some cells are meant to be numeric you can process it further: wrap it in VALUE() to get a real number or an error, then in IFERROR() to fall back to the original text result if VALUE() fails. Things that could be numbers then come out as numbers and the rest as text.

VBA could follow the same approach to remove anything non-ASCII (true ASCII or <=255, whichever you want), since if nothing else VBA can always run any formula you can write in a cell. That would make it transparent (no helper column), and VBA can also write the results right over the original. That is a nice feature for imported data, which is still available in its original form if something goes wrong. It is not as nice for hand-entered data, which cannot be reloaded if something goes wrong.

The formula can also be readily adapted to test for a list of chosen characters rather than everything >127 or >255, or to leave particular characters in.

Dump it to a CSV and run it through the Unix command cat -v. Non-ASCII characters will be displayed in M- notation, so just grep for M-.
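Where cat -v is not available, the same CSV check can be sketched in Python. This is a minimal illustration, not part of the original answer: the function name `find_non_ascii_cells` is hypothetical, and it assumes the exported text has already been decoded correctly (for example, a file saved with Excel's UTF-8 CSV option and opened with `encoding="utf-8"`).

```python
import csv
import io


def find_non_ascii_cells(csv_text: str):
    """Yield (row_number, column_number, value) for cells holding non-ASCII text."""
    reader = csv.reader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=1):
        for col_no, value in enumerate(row, start=1):
            # str.isascii() is True only when every character is below 128.
            if not value.isascii():
                yield row_no, col_no, value


sample = "name\nBurrell's Model\nBurrell’s Model\n"
for hit in find_non_ascii_cells(sample):
    print(hit)
```

Unlike grepping for M-, this reports the row and column of each offending cell, which makes it easier to find the cell again in the workbook.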

I came here searching for the same answer. From the answers, I realize that there is no simple solution. This is a subroutine that I call in my procedures. You can adapt it as you see fit; the only thing you need to do first (in your own code) is define the range where you are going to apply the substitutions. I hope this is useful for someone.

Public Sub M2_reemplaza_acentos()
    ' Replace accented characters
    Dim Orig As Variant, Sust As Variant, i As Integer
    Orig = Array("á", "é", "í", "ó", "ú", "ñ", "Á", "É", "Í", "Ó", "Ú", "Ñ", "km/h")
    Sust = Array("a", "e", "i", "o", "u", "n", "A", "E", "I", "O", "U", "N", "kph")
    Application.ScreenUpdating = False
    For i = LBound(Orig) To UBound(Orig)
        ' MatchCase:=True so that "á" does not also match "Á" and lowercase it
        ActiveSheet.Cells.Replace _
            What:=Orig(i), _
            Replacement:=Sust(i), _
            LookAt:=xlPart, _
            SearchOrder:=xlByRows, _
            MatchCase:=True
    Next i
    Application.ScreenUpdating = True
End Sub
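The lookup table above has to list every accented letter by hand. For comparison, here is a short Python sketch of a different technique, Unicode decomposition, which splits each accented letter into a base letter plus a combining mark and then drops the mark. The function name `strip_accents` is hypothetical, and this is not what the subroutine does: it will not turn "km/h" into "kph" and it leaves characters such as the curly apostrophe untouched.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```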

I'm not totally clear what you want, and although you clearly ask for a worksheet function, I offer this VBA code as it's possibly going to be more customizable.

The ’ in your example has a decimal value greater than 127 (the actual value doesn't matter), which puts it outside ASCII. So this code checks each character's value and flags it if it's above 127. This does mean you'll need to review the ASCII table to see which characters are 'OK' characters to use.

Sub Sheet2_Button1_Click()
    Dim rCell As Range, rRng As Range
    Dim s As String, c2 As String
    Dim i As Long, codePoint As Long

    Set rRng = Range("A1:D8")
    For Each rCell In rRng.Cells
        Debug.Print rCell.Address & " --- " & rCell.Value
        s = rCell.Value
        For i = 1 To Len(s)
            c2 = Mid(s, i, 1)
            ' AscW reads the Unicode value; Asc would turn characters outside
            ' the ANSI code page into "?" (63) and miss them. The mask maps
            ' AscW's negative results back into the range 0 to 65535.
            codePoint = AscW(c2) And &HFFFF&
            If codePoint > 127 Then
                MsgBox "Cell " & rCell.Address & " has a " & c2
            End If
        Next i
    Next rCell
End Sub

So, as you can see, I've used the 2 examples in a small table. When I run the macro, each occurrence of the character shows up in a message box.
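The per-character check in the macro can also be sketched in Python, returning the hits as a list in place of message boxes. The function name `flag_non_ascii` is hypothetical and the sketch is only an illustration of the same test, using 1-based positions to match VBA's Mid().

```python
def flag_non_ascii(text: str):
    """Return (position, character, code point) for each character above 127."""
    return [
        (pos, ch, ord(ch))
        for pos, ch in enumerate(text, start=1)  # 1-based, like VBA's Mid()
        if ord(ch) > 127
    ]


print(flag_non_ascii("Burrell’s Model"))  # [(8, '’', 8217)]
print(flag_non_ascii("Burrell's Model"))  # []
```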
