Regulated network control and life hacks: monitoring web addresses in DLP and SIEM systems.
You want to check whether a given piece of text is a URL that is valid for your purposes.
- RegEx
^(https?|ftp|sftp|file)://.+$
^ – Anchor
(https?|ftp|sftp):// – Scheme
[a-z0-9-]+(\.[a-z0-9-]+)+ – Domain
([/?].*)? – Path and/or parameters
$ – Anchor
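The breakdown above can be written as one commented pattern. The following is a minimal sketch in Python, not a definitive implementation: the names `URL_PARTS` and `is_valid_url` are hypothetical, and the case-insensitive flag is an assumption, since the pattern itself lists only lowercase letters.

```python
import re

# Verbose form of the component breakdown. re.IGNORECASE is an assumption:
# the character classes in the pattern list only lowercase letters.
URL_PARTS = re.compile(
    r"""
    ^                           # Anchor: start of the string
    (https?|ftp|sftp)://        # Scheme
    [a-z0-9-]+(\.[a-z0-9-]+)+   # Domain: two or more dot-separated labels
    ([/?].*)?                   # Optional path and/or parameters
    $                           # Anchor: end of the string
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_valid_url(text: str) -> bool:
    """Return True if the text matches the scheme/domain/path layout."""
    return URL_PARTS.match(text) is not None
```

Because the domain part accepts only letters, digits, hyphens and dots, a URL that carries a username, password, or port number is rejected by this sketch.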
- RegEx
^(https?|ftp|sftp)://[a-z0-9-]+(\.[a-z0-9-]+)+([/?].+)?$
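To see how the permissive pattern and the stricter domain-only pattern differ, both can be run against the same input. This is a sketch under assumptions: the names `PERMISSIVE`, `STRICT`, and `classify` are hypothetical, and case-insensitive matching is assumed.

```python
import re

# Permissive pattern: any non-empty text after a known scheme.
PERMISSIVE = re.compile(r"^(https?|ftp|sftp|file)://.+$", re.IGNORECASE)

# Strict pattern: requires a domain name, no username or password.
STRICT = re.compile(
    r"^(https?|ftp|sftp)://[a-z0-9-]+(\.[a-z0-9-]+)+([/?].+)?$",
    re.IGNORECASE,
)


def classify(url: str) -> dict:
    """Report which of the two patterns accept the given text."""
    return {
        "permissive": PERMISSIVE.match(url) is not None,
        "strict": STRICT.match(url) is not None,
    }
```

For example, `classify("http://user:secret@example.com/")` is accepted only by the permissive pattern. Note that the strict pattern ends in `([/?].+)?`, so a URL consisting of a domain followed by a bare trailing slash, such as `https://example.com/`, is rejected by it.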
Require a domain name, and don’t allow a username or password. Allow the scheme
(http or ftp) to be omitted if it can be inferred from the subdomain (www or ftp):
^((https?|ftp|sftp)://|(www|ftp|sftp)\.)[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$
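When the scheme is omitted, a monitoring rule usually still needs a full URL to log or compare. The sketch below shows one way to restore it from the subdomain; the function name `normalize` and the choice of `http` for a `www` prefix are assumptions, not part of the pattern.

```python
import re
from typing import Optional

# Group 2 holds an explicit scheme, group 3 holds the subdomain
# (www, ftp or sftp) that the scheme is inferred from.
INFERRED = re.compile(
    r"^((https?|ftp|sftp)://|(www|ftp|sftp)\.)"
    r"[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$",
    re.IGNORECASE,
)


def normalize(text: str) -> Optional[str]:
    """Return the URL with an explicit scheme, or None if it does not match."""
    match = INFERRED.match(text)
    if match is None:
        return None
    if match.group(2):
        return text  # the scheme was already present
    prefix = match.group(3).lower()
    # Assumption: a "www" prefix implies plain http.
    scheme = "http" if prefix == "www" else prefix
    return f"{scheme}://{text}"
```

With this sketch, `normalize("ftp.example.org/pub")` yields `ftp://ftp.example.org/pub`, while a bare `example.com` is not accepted because nothing indicates the scheme.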
Require a domain name and a path that points to an image file. Don’t allow a username,
password, or parameters:
^(https?|ftp|sftp)://[a-z0-9-]+(\.[a-z0-9-]+)+(/[\w-]+)*/[\w-]+\.(gif|png|jpg)$
^ – Anchor
(https?|ftp|sftp):// – Scheme
[a-z0-9-]+(\.[a-z0-9-]+)+ – Domain
(/[\w-]+)* – Path
/[\w-]+\.(gif|png|jpg) – File
$ – Anchor
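The image-file pattern can be applied as follows. This is a minimal sketch: `IMAGE_URL` and `image_extension` are hypothetical names, and case-insensitive matching is an assumption so that extensions such as `.JPG` are also accepted.

```python
import re
from typing import Optional

# Group 4 captures the file extension (gif, png or jpg).
IMAGE_URL = re.compile(
    r"^(https?|ftp|sftp)://[a-z0-9-]+(\.[a-z0-9-]+)+"
    r"(/[\w-]+)*/[\w-]+\.(gif|png|jpg)$",
    re.IGNORECASE,
)


def image_extension(url: str) -> Optional[str]:
    """Return the lowercase image extension, or None if the URL is rejected."""
    match = IMAGE_URL.match(url)
    return match.group(4).lower() if match else None
```

Because the pattern ends directly after the extension, a URL with parameters, such as `http://example.com/logo.png?size=2`, is rejected.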
- RegEx
You want to find URLs in a larger body of text. URLs may or may not be enclosed in
punctuation that is part of the larger body of text rather than part of the URL. You want
to give users the option to place URLs between quotation marks, so they can explicitly
indicate whether punctuation, or even spaces, should be part of the URL.
(?:(?:https?|ftp|sftp|file)://|(www|ftp)\.)[-A-Z0-9+&@#/%?=~_|$!:,.;]*
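A sketch of how this pattern might be used to scan a log line or message body. The name `find_urls` and its `trim` parameter are hypothetical. Case-insensitive matching is assumed because the character class lists only uppercase letters, and the trailing-punctuation trimming is an added post-processing step, not part of the pattern.

```python
import re
from typing import List

URL_IN_TEXT = re.compile(
    r"(?:(?:https?|ftp|sftp|file)://|(www|ftp)\.)"
    r"[-A-Z0-9+&@#/%?=~_|$!:,.;]*",
    re.IGNORECASE,
)


def find_urls(text: str, trim: str = ".,;:!?") -> List[str]:
    """Return every URL-like match, minus trailing sentence punctuation."""
    # finditer with group(0) is used because findall would return only
    # the (www|ftp) capture group instead of the whole match.
    return [m.group(0).rstrip(trim) for m in URL_IN_TEXT.finditer(text)]
```

The pattern itself treats punctuation as part of the URL, so without the trimming step a sentence-ending period would be included in the match. Passing `trim=""` keeps the raw matches.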