
Computer Notes


Regular Expression in Python

By Dinesh Thakur

In this tutorial we are going to talk about regular expressions and their implementation or usage in the Python programming language.

We’ll be covering the following topics in this tutorial:

  • What is RegEx?
  • What do we mean by a pattern?
  • Regular Expression in Python and Their Uses
  • Regular Expressions Methods in Python

What is RegEx?

Regular expressions, often abbreviated to RegEx or regex, are sequences of characters that define a search pattern. A sequence of characters is simply a string, so a regular expression is a string that acts as a pattern for finding something in a given piece of text.

What do we mean by a pattern?

It’s just a strategy that we use to identify text. So a pattern can mean something like three digits in a row or two alphabetic letters in a row or the letters BCA in sequence, or any number of whitespace characters in a row. It’s just a search pattern. It’s a strategy to identify text, and the applications in the real world are vast.

For example, we may need to parse a big chunk of text and find an email address embedded in it. An email address has a particular pattern.

We have the @ sign in the middle, and then we have something before it and something afterward. Or we can, for example, be looking for a phone number. A phone number has a specific pattern as well. It’s a sequence of numbers. And usually, those numbers are separated by spaces or dashes or slashes or something like that. Or if we’re looking for something like a zip code within the United States, we can write a pattern to search for five digits in a row.

So regular expressions are a small pattern language, available in Python through the standard library, that lets us write out those strategies and use them to pick snippets of text out of larger chunks of text.
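
As a rough sketch of the three strategies described above, the snippet below searches one sentence for an email address, a phone number, and a five-digit zip code. The sample text and all three patterns are assumptions made up for illustration; they are deliberately simplified and would not cover every real-world address or phone format.

```python
import re

# Made-up sample text for illustration
text = "Write to jane@example.com or call 555-123-4567. ZIP code: 90210."

# Deliberately simplified patterns (assumptions, not production-grade)
email_pattern = r"\S+@\S+\.\w+"               # something, @, something, a dot, word characters
phone_pattern = r"\d{3}[-/ ]\d{3}[-/ ]\d{4}"  # digit groups separated by a dash, slash, or space
zip_pattern = r"\b\d{5}\b"                    # exactly five digits standing alone

email = re.search(email_pattern, text).group()
phone = re.search(phone_pattern, text).group()
zip_code = re.search(zip_pattern, text).group()

print(email)     # jane@example.com
print(phone)     # 555-123-4567
print(zip_code)  # 90210
```

The pieces used here (search, \d, \S, \b, square brackets, and curly braces) are each covered later in this tutorial.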

To work with regular expressions, we begin by importing a module from the standard library called re, which is short for regular expressions. Calling re.compile with a pattern string gives us a pattern object, and that object’s search method looks for the pattern inside whatever string we pass in.

If the pattern does not occur in the string, search returns None, Python’s object for representing nothingness. If the pattern does occur, we get a different type of object called a match object.

Let’s take a look at both of those scenarios. First up, let’s pass in a string like candy.

import re
pattern = re.compile("flower")
print(type(pattern))
print(pattern.search("candy"))

Here Python looks for the sequence of characters flower within the string candy.

The sequence is not there, so search cannot find a match and returns None, which is what the last line prints.

Next, we invoke the search method on the same pattern object again, this time with a string that does contain the pattern: flower power.

import re
pattern = re.compile("flower")
print(type(pattern))
print(pattern.search("candy"))
match = pattern.search("flower power")
print(type(match))

This time the six characters we specified do appear in the string, so search returns a match object, and the last line prints its type.

That match object has some helpful methods for finding out what matched and where.

For example, calling the group method on the match object returns the actual string that was matched.

import re
pattern = re.compile("flower")
print(type(pattern))
print(pattern.search("candy"))
match = pattern.search("flower power")
print(type(match))
print(match.group())

So when we search flower power with the pattern flower, the text that group returns is flower.
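
The two outcomes of search can be handled together in one place. The sketch below wraps the check in a helper called describe, which is a hypothetical name introduced here for illustration; it also uses the match object’s start and end methods to report where the match occurred.

```python
import re

pattern = re.compile("flower")

def describe(text):
    """Hypothetical helper: report where the pattern was found, if anywhere."""
    match = pattern.search(text)
    if match is None:
        # search() returns None when the pattern does not occur
        return "no match"
    # group() is the matched text; start() and end() are its indexes in text
    return f"{match.group()!r} found at {match.start()}..{match.end()}"

print(describe("candy"))            # no match
print(describe("flower power"))     # 'flower' found at 0..6
print(describe("sunflower power"))  # 'flower' found at 3..9
```

Checking for None before calling group matters, because calling a method on None raises an AttributeError.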

Regular Expression in Python and Their Uses

Metacharacters

Metacharacters are characters that the regular expression engine interprets in a special way instead of matching them literally.

Metacharacter  Description                                                       Example
[]             Matches any one character from the set inside the brackets.       [a-z]
\              Treats the metacharacter that follows as an ordinary character.   \.
.              Matches any single character except a newline.                    Ja.v.
^              Matches at the start of the string.                               ^Java
$              Matches at the end of the string.                                 point$
*              Matches zero or more occurrences of the item to its left.         hello*
+              Matches one or more occurrences of the item to its left.          hello+
{}             Matches a specific number of occurrences of the item to its left. java{2}
|              Matches either the pattern on its left or the one on its right.   java|point
()             Groups part of a pattern.                                         (ab)+
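
The following sketch tries most of the metacharacters from the table on short strings. The test strings are made up for illustration; the comments describe what each pattern is expected to match.

```python
import re

in_set = re.findall("[a-c]", "abcdef")                    # any one character from the set
at_start = re.search("^Java", "Java point") is not None   # anchored at the start
at_end = re.search("point$", "Java point") is not None    # anchored at the end
star = re.findall("hello*", "hell hello hellooo")         # "hell" plus zero or more "o"
plus = re.findall("hello+", "hell hello hellooo")         # "hell" plus one or more "o"
exact = re.findall("java{2}", "java javaa")               # "jav" plus exactly two "a"
either = re.findall("java|point", "java point")           # either alternative
grouped = re.search("(ab)+", "xababx").group()            # the group "ab" repeated

print(in_set)    # ['a', 'b', 'c']
print(at_start)  # True
print(at_end)    # True
print(star)      # ['hell', 'hello', 'hellooo']
print(plus)      # ['hello', 'hellooo']
print(exact)     # ['javaa']
print(either)    # ['java', 'point']
print(grouped)   # abab
```

Note that * and {2} apply only to the single character before them unless parentheses group a longer piece of the pattern.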

Special Sequences

A special sequence is a backslash (\) followed by one of the characters below.

Sequence  Description
\A        Matches only at the start of the string.
\b        Matches the empty string at the beginning or end of a word.
\B        Matches the empty string anywhere except the beginning or end of a word.
\d        Matches any decimal digit.
\D        Matches any character that is not a decimal digit.
\s        Matches any whitespace character.
\S        Matches any character that is not whitespace.
\w        Matches any word character (a letter, a digit, or the underscore).
\W        Matches any character that is not a word character.
\Z        Matches only at the end of the string.
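
To see how these sequences behave, here is a small sketch that applies several of them to one made-up string. Raw strings (the r prefix) are used so that Python passes the backslashes through to the regex engine unchanged.

```python
import re

text = "cat concat 42 cats"

digits = re.findall(r"\d", text)               # every single digit
words = re.findall(r"\w+", text)               # runs of word characters
pieces = re.split(r"\s+", text)                # split on runs of whitespace
whole_word = re.findall(r"\bcat\b", text)      # "cat" as a complete word only
inside_word = re.findall(r"\Bcat", text)       # "cat" that does not begin a word
starts = re.search(r"\Acat", text) is not None   # string starts with "cat"
ends = re.search(r"cats\Z", text) is not None    # string ends with "cats"

print(digits)       # ['4', '2']
print(words)        # ['cat', 'concat', '42', 'cats']
print(pieces)       # ['cat', 'concat', '42', 'cats']
print(whole_word)   # ['cat']
print(inside_word)  # ['cat']  (the one inside "concat")
print(starts, ends) # True True
```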

Sets

A set is a group of characters written inside a pair of square brackets. It matches any single character from that group, and some characters take on a special meaning inside it.

SN  Set         Description
1   [arn]       Matches any one of the characters a, r, or n.
2   [a-n]       Matches any lower-case letter from a to n.
3   [^arn]      Matches any character except a, r, and n.
4   [0123]      Matches any one of the digits 0, 1, 2, or 3.
5   [0-9]       Matches any digit from 0 to 9.
6   [0-5][0-9]  Matches any two-digit number from 00 to 59.
7   [a-zA-Z]    Matches any letter, lower-case or upper-case.
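
A quick way to check the set descriptions above is to run each one through findall on a short string. The inputs below are made up for illustration.

```python
import re

listed = re.findall("[arn]", "rain")                 # only a, r, or n
negated = re.findall("[^arn]", "rain")               # anything except a, r, n
ranged = re.findall("[a-n]", "python")               # letters from a to n
minutes = re.findall("[0-5][0-9]", "07 59 60 99")    # two digits, 00 to 59
letters = re.findall("[a-zA-Z]", "a1B2")             # any letter, either case

print(listed)   # ['r', 'a', 'n']
print(negated)  # ['i']
print(ranged)   # ['h', 'n']
print(minutes)  # ['07', '59']
print(letters)  # ['a', 'B']
```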

Regular Expressions Methods in Python

1. Suppose we want to check whether a string contains a particular match. Here we search for ape in the string.

import re
# Search for ape in the string
if re.search("ape", "The ape was at the apex"):
    print("There is an ape")
Output:
There is an ape

re.search returns a match object when it finds ape and None when it does not. A match object counts as true in an if statement, so the message is printed only when the search succeeds.

2. Next is findall, a function that returns a list of all the matches.

import re
# findall() returns a list of matches
# . matches any single character, including a space
allApes = re.findall("ape.", "The ape was at the apex")
for i in allApes:
    print(i)
Output:
ape
apex

The dot is a wildcard that matches any one character except a newline. A space counts as a character, which is why the first match is ape followed by a space and the second is apex.

3. Next is finditer, which returns an iterator of match objects. We can call span on each one to get the location of the match.

import re
theStr = "The ape was at the apex"
for i in re.finditer("ape.", theStr):
    # span() returns a tuple of (start, end) indexes
    locTuple = i.span()
    print(locTuple)
    # Slice the match out using the tuple values
    print(theStr[locTuple[0]:locTuple[1]])
Output:
(4, 8)
ape
(19, 23)
apex
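
As a variation on the loop above, the positions can be collected into a list instead of printed. The helper name find_all_positions is hypothetical, introduced only for this sketch; it uses the match object’s start, end, and group methods rather than slicing with span.

```python
import re

def find_all_positions(pattern, text):
    """Hypothetical helper: list (start, end, matched_text) for every match."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

positions = find_all_positions("ape", "The ape was at the apex")
print(positions)  # [(4, 7, 'ape'), (19, 22, 'ape')]
```

Because the pattern here is ape without the trailing dot, each span is three characters long rather than four.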

4. Square brackets match any one of the characters between the brackets. Matching is case-sensitive, so the upper-case and lower-case versions of a letter match only if each is listed.

import re
animalStr = "Cat rat mat fat pat"
allAnimals = re.findall("[crmfp]at", animalStr)
for i in allAnimals:
    print(i)
print()
Output:
rat
mat
fat
pat
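
Cat is missing from that output because the set lists only a lower-case c. Two possible ways to pick it up are sketched below: listing both cases inside the brackets, or passing the re.IGNORECASE flag, which is part of the standard re module but is not otherwise covered in this tutorial.

```python
import re

animalStr = "Cat rat mat fat pat"

case_sensitive = re.findall("[crmfp]at", animalStr)
listing_both = re.findall("[cCrmfp]at", animalStr)                 # list c and C explicitly
ignoring_case = re.findall("[crmfp]at", animalStr, re.IGNORECASE)  # flag makes matching case-insensitive

print(case_sensitive)  # ['rat', 'mat', 'fat', 'pat']
print(listing_both)    # ['Cat', 'rat', 'mat', 'fat', 'pat']
print(ignoring_case)   # ['Cat', 'rat', 'mat', 'fat', 'pat']
```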

5. We can also allow for characters in a range by putting a hyphen between the first and last character.

import re
animalStr = "Cat rat mat fat pat"
someAnimals = re.findall("[c-mC-M]at", animalStr)
for i in someAnimals:
    print(i)
print()
Output:
Cat
mat
fat

6. Use ^ as the first character inside the brackets to match any character except the ones listed between the brackets.

import re
animalStr = "Cat rat mat fat pat"
someAnimals = re.findall("[^Cr]at", animalStr)
for i in someAnimals:
    print(i)
print()
Output:
mat
fat
pat

7. Replace matching items in a string.

import re
owlFood = "rat cat mat pat"
# You can compile a regex into a pattern object, which provides additional methods.
regex = re.compile("[cr]at")
# sub() replaces every match of the regex in the string passed as its
# 2nd argument with the replacement string passed as its 1st argument
owlFood = regex.sub("owl", owlFood)
print(owlFood)
Output:
owl owl mat pat
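
Two small variations on sub may be useful here, sketched under the same example data: the optional count argument limits how many matches are replaced, and subn returns the new string together with the number of replacements made.

```python
import re

regex = re.compile("[cr]at")
owlFood = "rat cat mat pat"

# Replace only the first match
first_only = regex.sub("owl", owlFood, count=1)

# subn() returns a tuple: (new string, number of replacements)
new_text, how_many = regex.subn("owl", owlFood)

print(first_only)  # owl cat mat pat
print(new_text)    # owl owl mat pat
print(how_many)    # 2
```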

8. Regular expressions use the backslash to designate special characters, and Python does the same inside ordinary strings, which causes issues. Let’s try to get \stuff out of a string.

import re
randStr = "Here is \\stuff"
# This won't find it: the regex engine receives \stuff and reads \s as whitespace
print("Find \\stuff:", re.search("\\stuff", randStr))
# This does, but we have to put in 4 backslashes, which is messy
print("Find \\stuff:", re.search("\\\\stuff", randStr))
# You can get around this by using a raw string, which doesn't treat backslashes as special
print("Find \\stuff:", re.search(r"\\stuff", randStr))
Output:
Find \stuff: None
Find \stuff: <re.Match object; span=(8, 14), match='\\stuff'>
Find \stuff: <re.Match object; span=(8, 14), match='\\stuff'>

Match objects print in this form on recent Python 3 releases; older versions show <_sre.SRE_Match object ...> instead.
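
To make the backslash counting less mysterious, the sketch below compares the strings themselves. It also shows re.escape, a standard library function not used elsewhere in this tutorial, as one possible way to build a pattern that matches some text literally.

```python
import re

randStr = "Here is \\stuff"

# In an ordinary string, two backslashes in the source are one character in memory
literal_length = len("\\stuff")             # \stuff is 6 characters

# Four backslashes in an ordinary string equal two in a raw string
same_pattern = "\\\\stuff" == r"\\stuff"

# re.escape() adds the backslashes needed to match the text literally
escaped = re.escape("\\stuff")
found = re.search(escaped, randStr)

print(literal_length)        # 6
print(same_pattern)          # True
print(escaped == r"\\stuff") # True
print(found.span())          # (8, 14)
```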

9. We saw that . matches any character, but what if we want to match a literal period? Put a backslash in front of it. You do the same with [, ] and other metacharacters.

import re
randStr = " F.B.I. I.R.S. CIA"
# Raw strings keep the backslashes intact for the regex engine
print("Matches :", len(re.findall(r".\..\..", randStr)))
print("Matches :", re.findall(r".\..\..", randStr))
Output:
Matches : 2
Matches : ['F.B.I', 'I.R.S']
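
The pattern above still uses an unescaped dot for the letters, so it would accept any character in those positions. A slightly stricter sketch, assuming the abbreviations are always upper-case letters each followed by a period, is shown below, along with what happens when the periods are left unescaped.

```python
import re

randStr = " F.B.I. I.R.S. CIA"

# Escape each period with a backslash
with_backslash = re.findall(r"[A-Z]\.[A-Z]\.[A-Z]\.", randStr)

# A period inside square brackets is also treated literally
with_brackets = re.findall(r"[A-Z][.][A-Z][.][A-Z][.]", randStr)

# Unescaped, the dot matches any character, so "FxBxIx" slips through
unescaped = re.findall(r"[A-Z].[A-Z].[A-Z].", "FxBxIx F.B.I.")

print(with_backslash)  # ['F.B.I.', 'I.R.S.']
print(with_brackets)   # ['F.B.I.', 'I.R.S.']
print(unescaped)       # ['FxBxIx', 'F.B.I.']
```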

10. We can match whitespace characters such as newlines.

import re
randStr = """This is a long
string that goes on for many lines"""
print(randStr)
# Remove newlines
regex = re.compile("\n")
randStr = regex.sub(" ", randStr)
print(randStr)
# You can also match
# [\b] : Backspace (outside square brackets, \b is a word boundary)
# \f : Form Feed
# \r : Carriage Return
# \t : Tab
# \v : Vertical Tab
# You may need to remove \r\n on Windows
Output:
This is a long
string that goes on for many lines
This is a long string that goes on for many lines
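
When the text may contain a mix of newlines, tabs, and repeated spaces, one possible approach is to replace every run of whitespace in a single pass with \s+. The sample strings below are made up for illustration.

```python
import re

messy = "This is a long\r\nstring\tthat   goes on"

# \s+ matches one or more whitespace characters of any kind
tidy = re.sub(r"\s+", " ", messy)

# \r?\n matches both Windows (\r\n) and Unix (\n) line endings
lines = re.split(r"\r?\n", "one\r\ntwo\nthree")

print(tidy)   # This is a long string that goes on
print(lines)  # ['one', 'two', 'three']
```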

11. \d matches any single digit, and \D matches any character that is not a digit.

import re
# \d can be used instead of [0-9]
# \D is the same as [^0-9]
randStr = "12345"
print("Matches :", len(re.findall(r"\d", randStr)))
Output:
Matches : 5

12. You can match multiple digits by following \d with {numOfValues}.

import re
# Match 5 digits in a row
if re.search(r"\d{5}", "12345"):
    print("It is a zip code")
# You can also match within a range. Match runs of between 5 and 7 digits.
numStr = "123 12345 123456 1234567"
print("Matches :", len(re.findall(r"\d{5,7}", numStr)))
Output:
It is a zip code
Matches : 3
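
One caveat with the zip code check: search succeeds as soon as it finds five digits anywhere, even inside a longer number. The sketch below uses re.fullmatch, which requires the whole string to match, and \b word boundaries to count only complete numbers. The helper name is_zip_code is hypothetical and introduced only for this example.

```python
import re

def is_zip_code(text):
    """Hypothetical helper: True only if text is exactly five digits."""
    return re.fullmatch(r"\d{5}", text) is not None

print(is_zip_code("12345"))                        # True
print(is_zip_code("1234567"))                      # False
print(re.search(r"\d{5}", "1234567") is not None)  # True, search accepts a partial match

numStr = "123 12345 123456 1234567 12345678"
loose = re.findall(r"\d{5,7}", numStr)       # also grabs 7 digits out of the 8-digit number
strict = re.findall(r"\b\d{5,7}\b", numStr)  # complete numbers only

print(loose)   # ['12345', '123456', '1234567', '1234567']
print(strict)  # ['12345', '123456', '1234567']
```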

About Dinesh Thakur
Dinesh Thakur holds B.C.A, MCDBA, and MCSD certifications. He authors the hugely popular Computer Notes blog, where he writes how-to guides about computer fundamentals, computer software, computer programming, and web apps.

Dinesh Thakur is a freelance writer who helps clients from all over the globe. He has written over 500 blogs, 30 eBooks, and 10,000 posts for all types of clients.


Dinesh Thakur is a Technology Columnist and founder of Computer Notes.

Copyright © 2023. All Rights Reserved.