Regular expressions let you describe a search for strings, characters or words in just a few characters. Written well, they give good results; written badly, they return nothing useful. Worse, it is often hard to tell whether a regexp is syntactically correct and covers every case you care about.
But first, let’s see what a regular expression is.
From Wikipedia:
“In computing, a regular expression, also referred to as regex or regexp, provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.”
Syntax: is it the same for all programs and languages?
Usually, yes. In JavaScript and Perl the syntax is similar, and preg_replace in PHP uses the same syntax as Perl. As for ereg … well, ereg is now deprecated (and was removed in PHP 7). Even IDEs and text editors such as vim, Notepad++, Komodo Edit, Dreamweaver, etc. support search and replace with regular expressions.
In general the syntax may change, but not by much. If you learn the Perl syntax, you can get by with any other variant.
Metacharacters
In regular expressions there are several “special characters” with different functions:
. (dot) Matches any character except a newline (by default \n)
Example
$text = "espressioni regolari!";
preg_match_all('/./', $text, $ris);
// Will match every character, one at a time
^ (caret) Matches the beginning of the string
Example
$text = "espressioni regolari!";
preg_match_all('/^./', $text, $ris);
// Will match only the character "e"
$ (dollar) Matches the end of the string
Example
$text = "espressioni regolari!";
preg_match_all('/.$/', $text, $ris);
// Will match only "!"
| (pipe) Means “or”: matches the expression on its left or the one on its right
Example
$text = "espressioni regolari!";
preg_match_all('/a|i|u|o|e/', $text, $ris);
// Will match all the vowels
[] Square brackets define character classes and ranges
\ (backslash) Cancels the special meaning of the metacharacter that follows it
Example
$text = "espressioni.regolari!";
preg_match_all('/\./', $text, $ris);
// Will match only the . (dot)
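To see why escaping matters, here is a minimal sketch that compares an escaped and an unescaped dot. The host names are invented for illustration, and the two patterns are only one possible way to combine the metacharacters described above.

```php
<?php
// Hypothetical host names, used only for illustration.
$hosts = ['example.com', 'example.org', 'telecom', 'example.net'];

$escaped   = '/\.com$|\.org$/'; // a literal dot, then "com" or "org" at the end
$unescaped = '/.com$|.org$/';   // the dot matches ANY character

$strict = [];
$loose  = [];
foreach ($hosts as $host) {
    if (preg_match($escaped, $host) === 1) {
        $strict[] = $host;
    }
    if (preg_match($unescaped, $host) === 1) {
        $loose[] = $host;
    }
}

print_r($strict); // example.com, example.org
print_r($loose);  // example.com, example.org and also telecom
```

Without the backslash, "telecom" slips through because the dot happily matches the "e" before "com".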
Quantifiers
Quantifiers, as the name suggests, indicate how many times a given character or sequence must occur.
* (star) indicates 0 or more occurrences
Example
$testo = "Espressioni, pesi, piume!";
preg_match_all('/s*i/', $testo, $ris);
// Will match "ssi" and the final "i" in Espressioni,
// "si" in pesi
// and "i" in piume
+ (plus) indicates 1 or more occurrences
Example
$testo = "Espressioni, pesi, piume!";
preg_match_all('/s+i/', $testo, $ris);
// Will match "ssi" in Espressioni
// and "si" in pesi
? (question mark) indicates 0 or 1 occurrences
Example
$testo = "Espressioni, pesi, piume!";
preg_match_all('/s?i/', $testo, $ris);
// Will match "si" and "i" in Espressioni,
// "si" in pesi
// and "i" in piume
{n} indicates exactly n occurrences
Example with a replace
$testo = "ese, esse, essse, esssse!";
$testo = preg_replace('/es{2}e/', '*', $testo);
// Now $testo will be "ese, *, essse, esssse!"
{n,} indicates at least n occurrences
Example with a replace
$testo = "ese, esse, essse, esssse!";
$testo = preg_replace('/es{3,}e/', '*', $testo);
// Now $testo will be "ese, esse, *, *!"
{n,m} indicates between n and m occurrences
Example with a replace
$testo = "ese, esse, essse, esssse!";
$testo = preg_replace('/es{2,3}e/', '*', $testo);
// Now $testo will be "ese, *, *, esssse!"
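As a quick recap, the sketch below runs each quantifier against the same test string and counts the matches. The array layout and variable names are just one way to organize the comparison, not part of the original examples.

```php
<?php
$testo = "ese, esse, essse, esssse!";

// Pattern => short description of what the quantifier asks for.
$patterns = [
    '/es*e/'     => 'zero or more "s"',
    '/es+e/'     => 'one or more "s"',
    '/es?e/'     => 'zero or one "s"',
    '/es{2}e/'   => 'exactly two "s"',
    '/es{3,}e/'  => 'three or more "s"',
    '/es{2,3}e/' => 'two or three "s"',
];

$counts = [];
foreach ($patterns as $pattern => $meaning) {
    // preg_match_all returns the number of full matches found.
    $counts[$pattern] = preg_match_all($pattern, $testo);
    echo $pattern, ' (', $meaning, '): ', $counts[$pattern], " match(es)\n";
}
```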
Ungreedy quantifiers
Almost everyone stumbles into this problem sooner or later: if I use an expression such as /".*"/, will I find all the words enclosed in double quotes? Unfortunately, no! This is because the standard quantifiers are “greedy”: they look for the longest possible match.
Let’s look at an example:
Example
$testo = 'class="pluto" id="pippo"';
preg_match_all('/".*"/', $testo, $ris);
// Will find a single occurrence:
// "pluto" id="pippo"
To fix this, just add a question mark after the quantifier to make it ungreedy (it then looks for the shortest possible match).
Example
$testo = 'class="pluto" id="pippo"';
preg_match_all('/".*?"/', $testo, $ris);
// Now it will find "pluto" and "pippo"!
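A common alternative to the ungreedy quantifier is a negated class (classes are covered in the next section): instead of "anything, as little as possible", ask for "anything that is not a double quote". The sketch below compares the three variants; the variable names are only illustrative.

```php
<?php
$testo = 'class="pluto" id="pippo"';

preg_match_all('/".*"/', $testo, $greedy);     // greedy: longest possible match
preg_match_all('/".*?"/', $testo, $lazy);      // ungreedy: shortest possible match
preg_match_all('/"[^"]*"/', $testo, $negated); // any run of characters except "

print_r($greedy[0]);  // one element: "pluto" id="pippo"
print_r($lazy[0]);    // two elements: "pluto" and "pippo"
print_r($negated[0]); // two elements: "pluto" and "pippo"
```

Note that in all three cases the quotes themselves are part of the match.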
Classes and ranges
A class defines a list of characters, ranges, or shorthand and POSIX classes (see the next section) to search for. Classes are enclosed in square brackets and can be followed by quantifiers. A ^ placed right after the opening bracket negates the class.
Example
$testo = 'Questa è una stringa lunga lunga di esempio';
preg_match_all('/[aiuoe]{2}/', $testo, $ris);
// The expression will search for two consecutive vowels,
// so it will find "ue" and "io"
Example
$testo = 'caratteri 16sdf456 e un colore esadecimale 94fa3c ';
preg_match_all('/[0-9a-f]{6}/', $testo, $ris);
// The expression will search for 6 consecutive characters
// that are digits or letters from a to f,
// so it will find "94fa3c"
Example
$testo = 'Questa è una stringa lunga lunga di esempio';
preg_match_all('/[^aiuoe ]{3}/', $testo, $ris);
// The expression will search for 3 consecutive characters
// that are NOT vowels or spaces,
// so it will find only "str"
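Classes, ranges and quantifiers are often combined to validate input. As a minimal sketch, the hypothetical helper below (isHexColor is a name invented here) checks whether a string is a six-digit hexadecimal color with a leading #.

```php
<?php
// Hypothetical helper: true if $value looks like "#94fa3c".
function isHexColor(string $value): bool
{
    // ^ and $ anchor the pattern to the whole string;
    // [0-9a-fA-F]{6} asks for exactly six hexadecimal digits.
    return preg_match('/^#[0-9a-fA-F]{6}$/', $value) === 1;
}

var_dump(isHexColor('#94fa3c')); // bool(true)
var_dump(isHexColor('#94FA3C')); // bool(true)
var_dump(isHexColor('94fa3c'));  // bool(false): missing #
var_dump(isHexColor('#94fa3'));  // bool(false): only five digits
var_dump(isHexColor('#94fa3g')); // bool(false): g is not a hex digit
```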
Character classes and POSIX
Shorthand and POSIX character classes specify a whole set of characters at once, without having to list them in a class.
Class: \w
Matches: [a-zA-Z0-9_]
Description: Matches a “word” character (w stands for word), i.e. letters, digits and “_”
Example:
$testo = "[[Le_Regex sono_belle!!!]]";
preg_match_all('/\w+/', $testo, $ris);
// Will match "Le_Regex" and "sono_belle"
Class: \d
Matches: [0-9]
Description: Matches a digit (d stands for digit)
Example:
$testo = "123 stella! 456 cometa!";
preg_match_all('/\d+/', $testo, $ris);
// Will match "123" and "456"
Class: \s
Matches: [ \t\r\n\v\f]
Description: Matches whitespace, including tabs and newlines (s stands for space)
Example:
$testo = "manuale sulle espressioni regolari!";
$testo = preg_replace('/\s+/', '', $testo);
// Now $testo will be "manualesulleespressioniregolari!"
Writing any of these classes in uppercase (\W, \D, \S) negates it. Thus, for example, \D matches anything that is not a digit.
Class: [:alnum:]
Matches: [a-zA-Z0-9]
Description: Matches alphanumeric characters, without “_”
Example
$testo = "[[Le_Regex 123 sono_belle!!!]]";
preg_match_all('/[[:alnum:]]+/', $testo, $ris);
// Will match "Le", "Regex", "123",
// "sono" and "belle"
Class: [:alpha:]
Matches: [a-zA-Z]
Description: Matches alphabetic characters
Example
$testo = "[[Le_Regex 123 sono_belle!!!]]";
preg_match_all('/[[:alpha:]]+/', $testo, $ris);
// Will match "Le", "Regex", "sono" and "belle"
Class: [:blank:]
Matches: [ \t]
Description: Matches only spaces and tabs (not newlines)
Example
$testo = "questa è una prova
con spazi\te tabulazioni";
$testo = preg_replace('/[[:blank:]]+/', '', $testo);
/* Now $testo will be:
questaèunaprova
conspazietabulazioni
*/
Class: [:upper:]
Matches: [A-Z]
Description: Matches uppercase letters
Example
$testo = "ESPRESSIONI regolari";
preg_match_all('/[[:upper:]]+/', $testo, $ris);
// Will match "ESPRESSIONI"
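POSIX classes are only valid inside square brackets, which also means they can be mixed with ordinary characters. The sketch below, using illustrative variable names, rebuilds \w from [:alnum:] plus "_" and contrasts [:upper:] with [:lower:] (the latter is not described above but follows the same naming scheme).

```php
<?php
$testo = "[[Le_Regex 123 sono_belle!!!]]";

preg_match_all('/\w+/', $testo, $shorthand);
preg_match_all('/[[:alnum:]_]+/', $testo, $posix); // [:alnum:] plus "_"

// For this ASCII input both patterns find the same words.
var_dump($shorthand[0] === $posix[0]);

$testo2 = "ESPRESSIONI regolari";
preg_match_all('/[[:upper:]]+/', $testo2, $upper); // uppercase run
preg_match_all('/[[:lower:]]+/', $testo2, $lower); // lowercase run

print_r($upper[0]);
print_r($lower[0]);
```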
Modifiers
Each search can use one or more modifiers which, as the name implies, change the default search behavior. Modifiers are placed at the end of the pattern, immediately after the closing delimiter.
You can combine several modifiers by writing them one after another without spaces (for example, /imsu applies all four of the corresponding effects described below).
i The search becomes case-insensitive, i.e. upper and lower case are considered equal
$testo = "Le Espressioni Regolari sono regolari?";
preg_match_all('/regolari/i', $testo, $ris);
// Will match both "regolari" and "Regolari"
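m The search becomes multiline: ^ and $ match the beginning and end of each line, not only of the whole string
$testo = 'Espressioni Regolari
Espressioni in perl
Espressioni php';
preg_match_all('/^Espressioni/m', $testo, $ris);
// Will match all 3 "Espressioni"
// and not just the first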
s The dot (.) also matches newline characters
$testo = 'Espressioni Regolari
Espressioni in perl
Espressioni php';
preg_match('/perl.Espressioni/s', $testo);
// The search will succeed
u The pattern and the subject string are treated as UTF-8
$testo = '紫の触手、緑の触手';
preg_match('/\x{89e6}\x{624b}/u', $testo, $ris);
// The search will succeed
U Inverts the greediness of the quantifiers, which become ungreedy by default
$testo = 'class="pluto" id="pippo"';
preg_match_all('/".*"/U', $testo, $ris);
// Same as /".*?"/: will match both "pluto" and "pippo"
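To illustrate how modifiers combine, the following sketch runs the same pattern with no modifier, with i, with m, and with both. The test string is a variation on the one above, chosen so that each modifier alone is not enough.

```php
<?php
// Three lines, each starting with a differently-cased "espressioni".
$testo = "Espressioni Regolari\nespressioni in perl\nESPRESSIONI php";

$plain = preg_match_all('/^espressioni/', $testo);   // case-sensitive, string start only
$onlyI = preg_match_all('/^espressioni/i', $testo);  // any case, string start only
$onlyM = preg_match_all('/^espressioni/m', $testo);  // case-sensitive, every line start
$both  = preg_match_all('/^espressioni/im', $testo); // any case, every line start

echo "none: $plain, i: $onlyI, m: $onlyM, im: $both\n";
```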
Anchors
Anchors identify the position in the text where the match must occur.
^ Identifies the beginning of the string; with the /m modifier it identifies the beginning of each line
$testo = 'Questo è un esempio
sulle Espressioni Regolari
nella sintassi di perl';
preg_match_all('/^[\w]+/m', $testo, $ris);
// Will match "Questo", "sulle" and "nella"
$ Identifies the end of the string; with the /m modifier it identifies the end of each line
$testo = 'Questo è un esempio
sulle Espressioni Regolari
nella sintassi di perl';
preg_match_all('/[\w]+$/m', $testo, $ris);
// Will match
// "esempio", "Regolari", "perl"
\A Always identifies the beginning of the string, regardless of the /m modifier
$testo = 'Questo è un esempio
sulle Espressioni Regolari
nella sintassi di perl';
preg_match_all('/\A[\w]+/m', $testo, $ris);
// Will match only "Questo"
\Z Always identifies the end of the string, regardless of the /m modifier
$testo = 'Questo è un esempio
sulle Espressioni Regolari
nella sintassi di perl';
preg_match_all('/[\w]+\Z/m', $testo, $ris);
// Will match only "perl"
\b Identifies a word boundary (the beginning or the end of a word)
$testo = 'condor daino dingo elefante';
preg_match_all('/\bd\w+/', $testo, $ris);
// The search will find only the words that begin
// with the letter d, so "daino" and "dingo"
\B Identifies any position that is NOT a word boundary
$testo = 'condor daino dingo elefante';
preg_match_all('/\Bd\w+/', $testo, $ris);
// The search will find only sequences of characters
// beginning with a d that is not at the beginning of
// a word, in this case "dor"
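A typical use of \b is searching for a whole word rather than a substring. The sketch below counts matches of "perl" with and without word boundaries; the test string is made up for illustration.

```php
<?php
$testo = 'perl, perla, superperl, perl!';

$anywhere  = preg_match_all('/perl/', $testo);     // every occurrence of the letters
$wholeWord = preg_match_all('/\bperl\b/', $testo); // "perl" as a complete word
$wordStart = preg_match_all('/\bperl/', $testo);   // words that start with "perl"
$wordEnd   = preg_match_all('/perl\b/', $testo);   // words that end with "perl"

echo "anywhere: $anywhere, whole word: $wholeWord, ",
     "word start: $wordStart, word end: $wordEnd\n";
```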
Special characters
If you have some knowledge of PHP or Perl, you will already have come across the escape sequences that identify special characters such as newlines or tabs. All of these sequences begin with the character \. Here is a list, with extra detail on \Q to emphasize its importance!
\t tab (HT, TAB)
\n line endings (LF, NL)
\r carriage return (CR)
\Q Disables every metacharacter up to the next \E. It is very useful when inserting variables into the pattern
$variabile = '[\w\s]+';
$testo = "Questo testo [\w\s]+[\w\s]+ ha regex al suo interno!";
$testo1 = preg_replace('/'.$variabile.'/', '', $testo);
// $testo1 will be "[\\]+[\\]+!"
$testo2 = preg_replace('/\Q'.$variabile.'\E/', '', $testo);
// $testo2 will be "Questo testo  ha regex al suo interno!"
\E Ends the literal section started by \Q (see above)
\nnn character in octal form where n is a number from 0 to 7
\xnn character as a hexadecimal number where n is a hexadecimal …
\f form feed (FF)
\a alarm / bell (BEL)
\e escape (ESC)
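As a side note to \Q and \E, PHP also provides the preg_quote() function, which escapes every metacharacter in a string before it is placed in a pattern. The sketch below repeats the \Q example with preg_quote(); which of the two to prefer is a matter of taste, although preg_quote() has the advantage of still working if the variable itself happens to contain \E.

```php
<?php
$variabile = '[\w\s]+';
$testo = 'Questo testo [\w\s]+[\w\s]+ ha regex al suo interno!';

// The second argument is the delimiter, so that it gets escaped too.
$pattern = '/' . preg_quote($variabile, '/') . '/';
$conQuote = preg_replace($pattern, '', $testo);

// Same replacement using \Q ... \E, for comparison.
$conQE = preg_replace('/\Q' . $variabile . '\E/', '', $testo);

var_dump($conQuote === $conQE);
echo $conQuote, "\n";
```

Both calls remove the two literal occurrences of [\w\s]+, leaving the two spaces that surrounded them.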
Groups
Groups are enclosed in parentheses and become essential when replacing, because you can refer back to them with $1, $2 and so on. An example to make everything clear:
$testo = "This is a date in mysql format: 2010-01-28";
$testo = preg_replace('/mysql format: (\d{4})-(\d{2})-(\d{2})/', 'European format: $3/$2/$1', $testo);
// Now $testo will be "This is a date in European format: 28/01/2010"
Inside a group you can also use a logical “OR”, i.e. match one sequence of characters or another
$testo = "Si dice ha piovuto o è piovuto in italiano?";
$testo = preg_replace('/\s+((ha|è)\s+piovuto)\s+/', ' è nevicato ', $testo);
// This will search for "ha piovuto" OR "è piovuto" preceded and followed by spaces;
// $testo will now be "Si dice è nevicato o è nevicato in italiano?"
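Groups are useful outside of replacements as well: preg_match() stores each group in the array passed as its third argument. A minimal sketch, reusing the date from the earlier example (the variable names are illustrative):

```php
<?php
$testo = "This is a date in mysql format: 2010-01-28";
$european = null;

if (preg_match('/(\d{4})-(\d{2})-(\d{2})/', $testo, $m) === 1) {
    // $m[0] is the whole match, $m[1] to $m[3] are the three groups.
    [$all, $year, $month, $day] = $m;
    $european = "$day/$month/$year";
}

echo $european, "\n";
```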