Velvet Star Monitor

How can I strip punctuation from a string?

Writer Matthew Harrington

For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.

But in the general case, what's the best way to strip punctuation in any language?

I should add: Ideally, the solutions won't require you to enumerate all the possible punctuation marks.

Related: Strip Punctuation in Python


16 Answers

new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray());

Why not simply:

string s = "sxrdct?fvzguh,bij.";
var sb = new StringBuilder();
foreach (char c in s)
{
    if (!char.IsPunctuation(c))
        sb.Append(c);
}
s = sb.ToString();

Regex is normally slower than simple char operations, and those LINQ operations look like overkill to me. LINQ is also not available in .NET 2.0.
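
For readers outside .NET, the same character-by-character filter can be sketched in Python. This is only an illustration: `strip_punctuation` is a made-up helper name, and it assumes "punctuation" means the Unicode general categories starting with `P`, which as far as I know is the set that `char.IsPunctuation` tests.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Drop every character whose Unicode general category starts with 'P'."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


print(strip_punctuation("sxrdct?fvzguh,bij."))  # sxrdctfvzguhbij
```

No list of punctuation marks is needed, which is what the question asked for.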


Describes intent, easiest to read (IMHO) and best performing:

 s = s.StripPunctuation();

to implement:

public static class StringExtension
{
    public static string StripPunctuation(this string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (!char.IsPunctuation(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}

This uses Hades32's algorithm, which was the best performing of the bunch posted.


Assuming "best" means "simplest" I suggest using something like this:

String stripped = input.replaceAll("\\p{Punct}+", "");

This example is for Java, but all sufficiently modern Regex engines should support this (or something similar).

Edit: the Unicode-Aware version would be this:

String stripped = input.replaceAll("\\p{P}+", "");

The first version only looks at punctuation characters contained in ASCII.
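
One caveat worth illustrating: the two classes are not nested. As far as I know, Java's `\p{Punct}` covers the 32 ASCII marks, and Unicode classifies some of those (such as `$` and `+`) as symbols rather than punctuation, so `\p{P}` leaves them alone. The sketch below shows the difference in Python, used purely for illustration. The standard `re` module does not support `\p{...}`, so it assumes `string.punctuation` as a stand-in for the ASCII class and `unicodedata` categories starting with `P` as a stand-in for `\p{P}`. The helper name and the sample text are made up.

```python
import string
import unicodedata


def is_unicode_punct(ch: str) -> bool:
    """True for the Unicode general categories Pc, Pd, Ps, Pe, Pi, Pf, Po."""
    return unicodedata.category(ch).startswith("P")


# ASCII marks that Unicode files under symbols (S*) instead of punctuation (P*)
ascii_only = "".join(ch for ch in string.punctuation if not is_unicode_punct(ch))
print(ascii_only)  # $+<=>^`|~

text = "“Hello”, world… $5 + tax!"

# ASCII-only stripping keeps the curly quotes and the ellipsis
ascii_stripped = "".join(ch for ch in text if ch not in string.punctuation)

# Unicode-aware stripping removes them but keeps $ and +
unicode_stripped = "".join(ch for ch in text if not is_unicode_punct(ch))
print(unicode_stripped)  # Hello world $5 + tax
```

Which behaviour is "right" depends on whether currency and math signs count as punctuation for your data.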


You can use the Regex.Replace method:

 Regex.Replace(yourString, regularExpressionWithPunctuationMarks, emptyString)

Since this returns a string, your method will look something like this:

 string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");

You can replace "[?!]" with something more sophisticated if you want:

(\p{P})

This should find any punctuation.


This thread is so old, but I'd be remiss not to post a more elegant (IMO) solution.

string inputSansPunc = input.Where(c => !char.IsPunctuation(c)).Aggregate("", (current, c) => current + c);

It's LINQ sans WTF.

Based off GWLlosa's idea, I was able to come up with the supremely ugly, but working:

string s = "cat!";
// The accumulator parameter is named acc; naming it s would clash with the local variable s
s = s.ToCharArray().ToList<char>()
    .Where<char>(x => !char.IsPunctuation(x))
    .Aggregate<char, string>(string.Empty, new Func<string, char, string>(
        delegate(string acc, char c) { return acc + c; }));

The most brain-dead simple way of doing it would be to call string.Replace once for each punctuation mark.

The other way I can imagine is Regex.Replace with a regular expression that contains all the appropriate punctuation marks.
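
Both ideas require listing the marks up front. Here is a rough Python sketch of each, assuming the ASCII set in `string.punctuation` is the list you care about. Note that `str.translate` is swapped in for the chain of replace calls (it deletes all listed characters in one pass), and the function names are invented for this example.

```python
import re
import string

# One deletion table for every listed mark, instead of one replace call per mark
DELETE_PUNCT = str.maketrans("", "", string.punctuation)

# A character class built from the same list; re.escape guards ], ^, - and \
PUNCT_CLASS = re.compile("[" + re.escape(string.punctuation) + "]")


def strip_with_translate(text: str) -> str:
    return text.translate(DELETE_PUNCT)


def strip_with_regex(text: str) -> str:
    return PUNCT_CLASS.sub("", text)


print(strip_with_translate("Hello, World! (test)"))  # Hello World test
print(strip_with_regex("Hello, World! (test)"))      # Hello World test
```

Neither version touches non-ASCII marks such as curly quotes, which is the price of enumerating the characters yourself.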

Here's a slightly different approach using LINQ. I like AviewAnew's, but this avoids the Aggregate:

 string myStr = "Hello there..';,]';';., Get rid of Punction";
 var s = from ch in myStr
         where !Char.IsPunctuation(ch)
         select ch;
 // Build the string directly; a round trip through ASCII bytes would turn non-ASCII characters into '?'
 var stringResult = new string(s.ToArray());

If you want to use this for tokenizing text you can use:

new string(myText.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray())
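
The reason for substituting a space instead of deleting is that words joined only by punctuation stay separate. A small Python sketch of the same idea, assuming Unicode category `P` as the definition of punctuation and whitespace splitting as the tokenizer (`tokenize` is a hypothetical name):

```python
import unicodedata


def tokenize(text: str) -> list[str]:
    """Replace each punctuation character with a space, then split on whitespace."""
    spaced = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return spaced.split()


print(tokenize("Hello,world...this-is (a) test!"))
# ['Hello', 'world', 'this', 'is', 'a', 'test']
```

Deleting the comma instead would have produced the single token "Helloworld".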

For anyone who would like to do this via RegEx:

This code shows the full RegEx replace process and gives a sample Regex that only keeps letters, numbers, and spaces in a string - replacing ALL other characters with an empty string:

// Regex that removes every character other than letters, digits, and spaces
System.Text.RegularExpressions.Regex TitleRegex =
    new System.Text.RegularExpressions.Regex(
        "[^a-z0-9 ]+",
        System.Text.RegularExpressions.RegexOptions.IgnoreCase);
string ParsedString = TitleRegex.Replace(stringToParse, String.Empty);
return ParsedString;
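
The same whitelist pattern can be tried quickly in Python. This is a sketch for illustration, with a made-up function name and sample string. It also shows a side effect of the whitelist: anything outside a-z, 0-9, and space is dropped, and that includes accented letters.

```python
import re

# Same pattern as above: anything that is not a-z, 0-9, or a space
NON_ALNUM_OR_SPACE = re.compile(r"[^a-z0-9 ]+", re.IGNORECASE)


def keep_letters_digits_spaces(text: str) -> str:
    return NON_ALNUM_OR_SPACE.sub("", text)


print(keep_letters_digits_spaces("Hello, World! 100% café"))  # Hello World 100 caf
```

If the input may contain non-English text, a punctuation-based filter is probably the safer choice.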

I faced the same issue and was concerned about the performance impact of calling the IsPunctuation for every single check.

I found this post: .

In short: char.IsPunctuation handles Unicode on top of ASCII, so it matches a large set of characters (the Unicode punctuation categories; control characters are not among them). That makes it heavier than a check against a fixed set of ASCII marks.

The bottom line is that I finally didn't go for it because of its performance impact on my ETL process.

I went for the custom implementation of dotnetperls.

And just FYI, here is some code derived from the previous answers to get the list of all punctuation characters (excluding the control ones):

var punctuationCharacters = new List<char>();
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
    var character = Convert.ToChar(i);
    if (char.IsPunctuation(character) && !char.IsControl(character))
    {
        punctuationCharacters.Add(character);
    }
}
// Joined with an empty separator, so the output is one continuous run of characters
var allPunctuationCharacters = string.Join("", punctuationCharacters);
Console.WriteLine(allPunctuationCharacters);
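
For comparison, a Python sketch of the same enumeration. It assumes Unicode category `P` as the definition of punctuation and scans only U+0000 to U+FFFF to mirror the `char` range of the C# loop. The total count depends on the Unicode version bundled with the interpreter, so only the ASCII subset is printed.

```python
import unicodedata

# Scan the same range as a C# char (one UTF-16 code unit)
punctuation_chars = [
    chr(cp)
    for cp in range(0x10000)
    if unicodedata.category(chr(cp)).startswith("P")
]

# The ASCII subset is stable across Unicode versions
ascii_punctuation = "".join(ch for ch in punctuation_chars if ord(ch) < 128)
print(ascii_punctuation)
print(len(punctuation_chars), "punctuation characters found in total")
```

A precomputed set like this can back a plain membership test if calling the category lookup per character turns out to be too slow.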

Cheers, Andrew

$newstr = preg_replace('/[[:punct:]]/', '', $oldstr);

For long strings I use this:

var normalized = input
    .Where(c => !char.IsPunctuation(c))
    .Aggregate(new StringBuilder(),
               (current, next) => current.Append(next),
               sb => sb.ToString());

This performs much better than string concatenation (though I agree it's less intuitive).
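
The same trade-off exists in other languages with immutable strings. Below is a Python sketch of both styles, with hypothetical function names and Unicode category `P` assumed as the punctuation test. I have not benchmarked it, and CPython can sometimes extend a string in place, so the gap may be smaller there than in .NET. Collecting the pieces and joining once is still the usual idiom.

```python
import unicodedata


def is_punct(ch: str) -> bool:
    return unicodedata.category(ch).startswith("P")


def strip_by_concatenation(text: str) -> str:
    """Builds a new string step by step; may copy the result repeatedly."""
    result = ""
    for ch in text:
        if not is_punct(ch):
            result += ch
    return result


def strip_by_join(text: str) -> str:
    """Collects the kept characters and joins them once at the end."""
    kept = []
    for ch in text:
        if not is_punct(ch):
            kept.append(ch)
    return "".join(kept)


sample = "a,b;c." * 1000
print(strip_by_concatenation(sample) == strip_by_join(sample))  # True
```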

This is simple code for removing punctuation from a string entered by the user.

Import the required library:

 import string

Read the input from the user and remove each punctuation character:

 strs = input('Enter your string: ')
 for c in string.punctuation:
     strs = strs.replace(c, "")
 print(f"\nYour string without punctuation: {strs}")

#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string strOne = "H,e.l/l!o W#o@r^l&d!!!";
    int punct_count = 0;
    cout << "before : " << strOne << endl;
    for (string::size_type ix = 0; ix < strOne.size();)
    {
        // Cast to unsigned char: passing a negative char to ispunct is undefined behavior
        if (ispunct(static_cast<unsigned char>(strOne[ix])))
        {
            ++punct_count;
            strOne.erase(ix, 1); // do not advance: the next character has shifted into ix
        }
        else
        {
            ++ix;
        }
    }
    cout << "after : " << strOne << endl;
    return 0;
}
