Regular Expression Fundamentals : Practical Examples
Before getting into regular expressions, it helps to understand the kinds of data they are applied to. Below are a few important categories:
Characters | a-z & A-Z |
Digits | 0-9 |
Words | A sequence of letters, digits or underscores |
White-space | Any character or series of characters that represents horizontal space (Spacebar, Tab) or vertical space (Enter) in typography. |
Boundaries | A word boundary is the position where a word (a sequence of letters, digits or underscores) begins or ends. |
Anchors | ^ and $ are called anchors; ^ matches at the start of the input and $ matches at the end. |
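As a rough illustration of the categories above, here is a minimal Python sketch. The sample string and the variable names are made up for this example, and the `re.ASCII` flag is an assumption added so that `\w` matches only letters, digits and underscore, as described in the table.

```python
import re

# Hypothetical sample string, chosen only for illustration.
sample = "Order_42 shipped\ton 2018-01-25"

digits = re.findall(r"\d+", sample)              # runs of digits
words = re.findall(r"\w+", sample, re.ASCII)     # letters, digits, underscore
spaces = re.findall(r"\s", sample)               # space and tab characters
bounded = re.findall(r"\bon\b", sample)          # "on" as a whole word only
starts_with_order = re.search(r"^Order", sample) is not None  # ^ anchor
ends_with_25 = re.search(r"25$", sample) is not None          # $ anchor

print(digits)   # ['42', '2018', '01', '25']
print(words)    # ['Order_42', 'shipped', 'on', '2018', '01', '25']
print(spaces)   # [' ', '\t', ' ']
print(bounded)  # ['on']
```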
Basic Meta-characters :
*A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:
- A newline (line feed) character ('\n')
- A carriage-return character followed immediately by a newline character ("\r\n")
- A standalone carriage-return character ('\r')
- A next-line character ('\u0085')
- A line-separator character ('\u2028')
- A paragraph-separator character ('\u2029').
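The list above can be written as a single pattern. The sketch below is one possible way to do this in Python; the pattern and the sample string are assumptions made for illustration. The two-character sequence "\r\n" is listed first so that it is treated as one terminator rather than two.

```python
import re

# One alternative for "\r\n", then a class with the single-character terminators.
LINE_TERMINATOR = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")

# Hypothetical input that uses every terminator from the list once.
raw = "one\ntwo\r\nthree\rfour\u0085five\u2028six\u2029seven"

lines = LINE_TERMINATOR.split(raw)
print(lines)  # ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
```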
Quantifiers :
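In grammar, a quantifier is a word or number (such as many, few, some, two, or 2) that is used with a noun to show the amount of something. In a regular expression, a quantifier specifies how many times the preceding element may repeat. Quantifiers are divided into greedy, lazy, possessive and fixed quantifiers.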
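To make the difference between greedy, lazy and fixed quantifiers concrete, here is a small Python sketch. The sample strings are invented for this example. Possessive quantifiers are left out because Python's `re` module only accepts them from version 3.11 onwards.

```python
import re

# Hypothetical sample text with two pairs of tags.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as much as it can, so the match runs to the last ">".
greedy = re.findall(r"<.+>", text)

# Lazy: .+? takes as little as it can, so each tag is matched on its own.
lazy = re.findall(r"<.+?>", text)

# Fixed: {3} repeats the preceding element exactly three times.
fixed = re.findall(r"\d{3}", "1234567")

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
print(fixed)   # ['123', '456']
```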
Class Groupings:
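A character class, written inside square brackets, matches any one of the characters listed in it. A ^ placed directly after the opening bracket negates the class, so it matches any character that is not listed.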
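A short Python sketch of matching and negated character classes follows. The input strings are arbitrary examples.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")       # any one listed character
non_vowels = re.findall(r"[^aeiou]", "regex")  # ^ inside [] negates the class
in_range = re.findall(r"[a-c]", "abcdef")      # a range inside a class

print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
print(in_range)    # ['a', 'b', 'c']
```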
General Understanding :
- The difference between 'brackets', 'parentheses' and 'curly braces' can be a bit confusing. Generally, 'parentheses' refers to round brackets ( ), 'brackets' to square brackets [ ] and 'braces' to curly braces { }.
- A backslash (\) escapes a special character to suppress its special meaning. The usual special characters are [ \ ^ $ . | ? * + ( ) { }.
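The effect of escaping can be seen in a small Python sketch. The sample text is invented. `re.escape` is a standard-library helper that adds the backslashes automatically.

```python
import re

# Hypothetical sample text.
text = "1+1=2, 11"

# Unescaped: "+" is a quantifier, so the pattern means "one or more 1, then 1".
unescaped = re.findall(r"1+1", text)

# Escaped: "\+" matches a literal plus sign.
escaped = re.findall(r"1\+1", text)

# re.escape builds the escaped pattern from a literal string.
auto_escaped = re.findall(re.escape("1+1"), text)

print(unescaped)     # ['11']
print(escaped)       # ['1+1']
print(auto_escaped)  # ['1+1']
```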
Lookahead and Lookbehind :
Lookahead and lookbehind are often confusing, so they are explained here through a simple example. All four patterns use the (? ... ) syntax.
The sample text below, taken from online content, is used to illustrate lookahead and lookbehind:
"
WASHINGTON/BEIJING (Reuters) - U.S. President Donald Trump accused Russia and China on Monday of devaluing their currencies while the United States raises interest rates, prompting China to accuse the United States of sending confusing messages.
Donald Duck is a cartoon character created in 1934 at Walt Disney Productions. Donald is an anthropomorphic white duck with a yellow-orange bill, legs, and feet. He typically wears a sailor shirt and cap with a bow tie.
We all know that Disney's animators like to hide a few easter eggs in their work — but did we all just somehow miss this shot of Donald Duck with a boner??
(CNN)Donald Trump's tabloid lifestyle made him rich, famous and ultimately built the persona that made him President. Yet his back-to-the-future encounter with his sensational and melodramatic past might become his Achilles' heel.
The name Donald is a Scottish baby name. In Scottish the meaning of the name Donald is: Great cheif, world mighty. From the Gaelic Domhnall. The name Donald has been borne by a number of early Scottish kings. Famous Bearers: Billionaire Donald John Trump; actor Donald Sutherland.
"
Positive Lookahead :
Donald (?=Trump) finds every "Donald " that has "Trump" after it.
Negative Lookahead :
Donald (?!Trump) finds every "Donald " that does not have "Trump" after it.
Positive Lookbehind :
(?<=Donald )Trump finds every "Trump" that has "Donald " before it.
Negative Lookbehind :
(?<!Donald )Trump finds every "Trump" that does not have "Donald " before it.
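The four lookaround patterns can be tried out with a short Python sketch. To keep the results easy to check, it uses a short made-up sentence instead of the full sample text, and it reports where each match starts. The `starts` helper is a hypothetical name introduced for this example.

```python
import re

# Short hypothetical sentence standing in for the longer sample text.
text = "Donald Trump met Donald Duck; Ivanka Trump waved."

def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

positive_lookahead = starts(r"Donald (?=Trump)")    # "Donald " followed by "Trump"
negative_lookahead = starts(r"Donald (?!Trump)")    # "Donald " not followed by "Trump"
positive_lookbehind = starts(r"(?<=Donald )Trump")  # "Trump" preceded by "Donald "
negative_lookbehind = starts(r"(?<!Donald )Trump")  # "Trump" not preceded by "Donald "

print(positive_lookahead)   # [0]
print(negative_lookahead)   # [17]
print(positive_lookbehind)  # [7]
print(negative_lookbehind)  # [37]
```

The lookaround part is only a condition: it is checked but not included in the matched text, which is why the lookbehind matches contain just "Trump".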
Capturing and non-capturing groups :
The only difference between capture groups and non-capture groups is that the former captures the matched character sequences for possible later re-use with a numbered back reference while a non-capture group does not. Both are used to group subexpressions, which is the main reason most people will utilize parentheses ( ) within regular expressions.
The parentheses surround the parts of the expression whose corresponding string we want to "remember". These parenthesized expressions are called groups, and are numbered from 1 upwards from left to right.
By placing part of a regular expression inside parentheses, we can apply a quantifier to the entire group or restrict alternation to part of the regex.
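The following Python sketch illustrates these uses of parentheses: a quantifier applied to a whole group, alternation limited to part of the pattern, and a numbered back reference. The input strings are arbitrary examples.

```python
import re

# Quantifier on a group: (ab)+ repeats "ab", while ab+ repeats only "b".
group_repeat = re.fullmatch(r"(ab)+", "ababab") is not None
char_repeat = re.fullmatch(r"ab+", "ababab") is not None

# Alternation restricted to the middle letter by a non-capturing group.
spellings = re.findall(r"gr(?:a|e)y", "gray grey groy")

# Back reference: \1 must repeat whatever group 1 captured.
doubled = re.search(r"(\w)\1", "hello").group()

print(group_repeat)  # True
print(char_repeat)   # False
print(spellings)     # ['gray', 'grey']
print(doubled)       # ll
```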
Example :
We take a date string 25/01/2018 that follows the pattern DD/MM/YYYY. To capture the day, month and year separately, capturing groups are the best option. The pattern below does this.
Capturing groups:
([0-9]{2})/([0-9]{2})/([0-9]{4})
The result is in three groups: 25, 01 and 2018.
Non-capturing groups:
([0-9]{2})/([0-9]{2})/(?:[0-9]{4})
The result is in two groups, 25 and 01; the year is still matched but not captured.
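The date example can be checked with a minimal Python sketch. The variable names are chosen for illustration only.

```python
import re

date = "25/01/2018"

# Three capturing groups: day, month and year.
capturing = re.match(r"([0-9]{2})/([0-9]{2})/([0-9]{4})", date)

# The year is wrapped in (?: ... ), so it is matched but not captured.
non_capturing = re.match(r"([0-9]{2})/([0-9]{2})/(?:[0-9]{4})", date)

print(capturing.groups())      # ('25', '01', '2018')
print(non_capturing.groups())  # ('25', '01')
print(non_capturing.group(0))  # 25/01/2018
```

Group 0 is always the whole match, so the year is still part of the overall match in both cases.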
Resources:
https://www.javamex.com/tutorials/regular_expressions/capturing_groups.shtml#.Wtah-Jdx3IU
https://www.javamex.com/tutorials/regular_expressions/capturing_groups_alternatives.shtml#.WtbJrZdx3IU
https://www.javamex.com/tutorials/regular_expressions/non_capturing_groups.shtml#.Wtahh5dx3IV
https://www.regular-expressions.info/refcapture.html
https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-do
http://www.manifold.net/doc/radian/why_do_non-capture_groups_exist_.htm
https://regexr.com/
https://stackoverflow.com/questions/2973436/regex-lookahead-lookbehind-and-atomic-groups
https://www.javamex.com/tutorials/regular_expressions/non_capturing_groups.shtml#.WtYDxYhubIU
http://www.rexegg.com/regex-disambiguation.html#codecapsule (Advanced Learning)