Regex to parse URLs from text
Mia Lopez
I have this regex:
[a-z]+[:.].*?\s
I run it on the following text:
regexbuddy.com cvc cvcv g f f
You can download RegexBu ddy at f
" URL with spaces" v
I need to match the following – the bolded text only:
- regexbuddy.com
- cvc
- cvcv
- g
- f
- f
- You can download RegexBu ddy at . f
- " URL with spaces"
- v
How can I do that?
UPDATE
@slhck your revised regex matches almost everything, except when the URL starts with www., e.g. " URL with spaces"
I made some changes to the regex to match the leading www. It now looks like this:
(https?)://.(?=\s)|(www.).?(?=\s)
Could you please review it and suggest whether there is a better way of matching this?
1 Answer
If you don't want to include the trailing whitespace in a match, use a positive lookahead:
[a-z]+[:.].*?(?=\s)
In your example, this would match:
regexbuddy.com
To further match only http or https, and an optional www, use something like:
(https?):\/\/(www\.)?[a-z0-9\.:].*?(?=\s)
Here's John Gruber's regex to check for what looks like a URL, which appears to work quite well in your case:
(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
But honestly, all those approaches will only get you false matches sooner or later. If you need a regular expression to parse URLs, see this Stack Overflow question: What is the best regular expression to check if a string is a valid URL?
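For anyone who wants to try the two lookahead patterns above, here is a minimal sketch in Python. The sample text and the example.com / example.org addresses are made up for illustration and are not the asker's input. The capturing groups of the second pattern are written as non-capturing ones here, since re.findall would otherwise return the groups instead of the whole match.

```python
import re

# Hypothetical sample text, assumed for illustration only.
text = "see http://www.example.com/a and example.org/b for more. "

# Generic pattern: (?=\s) requires whitespace after the match
# without making that whitespace part of the match.
generic = re.compile(r"[a-z]+[:.].*?(?=\s)")

# Stricter pattern: http or https, then an optional "www.".
# Groups are non-capturing so findall returns whole matches.
strict = re.compile(r"https?://(?:www\.)?[a-z0-9.:].*?(?=\s)")

generic_matches = generic.findall(text)
strict_matches = strict.findall(text)

print(generic_matches)
# ['http://www.example.com/a', 'example.org/b', 'more.']
print(strict_matches)
# ['http://www.example.com/a']
```

Note that the generic pattern also picks up "more.", because a word followed by a period and a space satisfies it. That is the kind of false match mentioned above. Both patterns also need whitespace after the URL, so a URL at the very end of the text would be missed unless the lookahead is relaxed to something like (?=\s|$).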