Velvet Star Monitor

Regex to parse URLs from text

Writer Mia Lopez

I have this regex:

[a-z]+[:.].*?\s

I run it on the following text:

regexbuddy.com cvc cvcv g f f
You can download RegexBu ddy at f
" URL with spaces" v

I need to match the following – the bolded text only:

  • regexbuddy.com
  • cvc
  • cvcv
  • g
  • f
  • f
  • You can download RegexBu ddy at . f
  • " URL with spaces"
  • v

How can I do that?

UPDATE

@slhck, your revised regex matches almost everything except when the URL starts with www., e.g. " URL with spaces"

I made some changes to the regex to match the leading www. It looks like

(https?)://.(?=\s)|(www.).?(?=\s)

Could you please review it and suggest whether there is a better way of matching this?

1 Answer

If you don't want to include the trailing whitespace in the match, use a positive lookahead:

[a-z]+[:.].*?(?=\s)

In your example, this would match:

regexbuddy.com
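
As a minimal sketch of the difference, the Python snippet below runs both patterns on the first line of the sample text. The variable names and the `(?=\s|$)` variant are assumptions added for illustration, not part of the original answer; the variant is there because a lookahead for `\s` alone finds nothing when the URL is the last thing in the text.

```python
import re

text = "regexbuddy.com cvc cvcv g f f"

# Original pattern: the trailing \s is consumed, so the match ends with a space.
with_space = re.search(r"[a-z]+[:.].*?\s", text).group()

# Lookahead version: (?=\s) only asserts that whitespace follows,
# so the whitespace is not part of the match.
without_space = re.search(r"[a-z]+[:.].*?(?=\s)", text).group()

# Assumed variant: also accept end of input, so a URL at the very end
# of the text (with no whitespace after it) still matches.
at_end = re.search(r"[a-z]+[:.].*?(?=\s|$)", "see regexbuddy.com").group()

print(repr(with_space))     # 'regexbuddy.com '
print(repr(without_space))  # 'regexbuddy.com'
print(repr(at_end))         # 'regexbuddy.com'
```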

To restrict the match to http or https URLs, with an optional www, use something like:

(https?):\/\/(www\.)?[a-z0-9\.:].*?(?=\s)
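
Here is a rough sketch of how that pattern behaves in Python. The sample sentence and the names `URL_RE`, `sample` and `urls` are made up for illustration. `finditer` with `group(0)` is used because the pattern contains capture groups, and `findall` would return tuples of those groups rather than the full match.

```python
import re

# The pattern from the answer, unchanged. The escaped slashes are harmless in Python.
URL_RE = re.compile(r"(https?):\/\/(www\.)?[a-z0-9\.:].*?(?=\s)")

# Hypothetical sample text, not taken from the question.
sample = ("Download it at http://www.regexbuddy.com/download.html today, "
          "or see https://example.org for more.")

# group(0) is the whole match; groups 1 and 2 hold the scheme and the optional "www.".
urls = [m.group(0) for m in URL_RE.finditer(sample)]
print(urls)

# The scheme is mandatory in this pattern, so a bare "www." address is not matched.
bare = URL_RE.search("visit www.regexbuddy.com now")
print(bare)
```

Note that the pattern requires the scheme, so it does not cover addresses that start directly with www.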

Here's John Gruber's regex for finding what looks like a URL, which appears to work quite well in your case:

(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

But honestly, all those approaches will only get you false matches sooner or later. If you need a regular expression to parse URLs, see this Stack Overflow question: What is the best regular expression to check if a string is a valid URL?
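
One way to cut down on false matches, sketched here as an alternative technique rather than anything from the linked question, is to split the text on whitespace and let Python's standard `urllib.parse.urlsplit` decide whether each token has an http(s) scheme and a host. The helper names `looks_like_url` and `extract_urls` are hypothetical, and this approach cannot handle URLs that contain spaces.

```python
from urllib.parse import urlsplit

def looks_like_url(token: str) -> bool:
    """Return True if the token parses as an http(s) URL with a host part."""
    try:
        parts = urlsplit(token)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs (e.g. bad IPv6 brackets).
        return False
    return parts.scheme in {"http", "https"} and bool(parts.netloc)

def extract_urls(text: str) -> list[str]:
    """Split on whitespace and keep only the tokens that look like URLs."""
    return [token for token in text.split() if looks_like_url(token)]

# Hypothetical sample text for illustration.
found = extract_urls("You can download it at http://www.regexbuddy.com/download.html now cvc")
print(found)
```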
