Using regex in contains() to select rows from a pandas data frame having some string value (Capital or small)

I want to extract rows from a pandas data frame based on the values of a column, using a regex in the contains() method.

I am using the following line to keep only the rows whose 'COMPTYPE' column contains any of the strings listed in contains():

Code:
df = df[df['COMPTYPE'].astype(str).str.contains('MCCB|ACB|VCB|CONTACTOR', regex=True)]

It works, but it does not select rows whose 'COMPTYPE' value is MccB, Vcb, Contactor, acb and so on. How can I use this command so that it selects rows irrespective of the case of the string values?
Input:
BOARDIBNO   SUBCOMP_IBNO  COMPTYPE
1044444001  9044444001    ACB
1044444001  9044444002    Relay
1044444001  9044444003    Meters
1044444001  9044444004    MCCB/MPCB
1044444001  9044444005    vcb
1044444001  9044444006    MCCB/MPCB
1044444001  9044444007    acb
1044444001  9044444008    mccb
1044444001  9044444009    MCCB/MPCB
1044444001  9044444010    Power Contactor
1044444001  9044444011    Power Contactor
1044444001  9044444012    Control Contactor
1044444001  9044444013    VCB
Expected output:
BOARDIBNO   SUBCOMP_IBNO  COMPTYPE
1044444001  9044444001    ACB
1044444001  9044444004    MCCB/MPCB
1044444001  9044444005    vcb
1044444001  9044444006    MCCB/MPCB
1044444001  9044444007    acb
1044444001  9044444008    mccb
1044444001  9044444009    MCCB/MPCB
1044444001  9044444010    Power Contactor
1044444001  9044444011    Power Contactor
1044444001  9044444012    Control Contactor
1044444001  9044444013    VCB
However, I'm getting the following output:
BOARDIBNO   SUBCOMP_IBNO  COMPTYPE
1044444001  9044444001    ACB
1044444001  9044444004    MCCB/MPCB
1044444001  9044444005    MCCB/MPCB
1044444001  9044444006    MCCB/MPCB
1044444001  9044444010    VCB
How to do it? Please help!
 
Just pass flags=re.IGNORECASE as a parameter to str.contains, or use case=False as suggested by @JoanLara:

Code:
import re

# Case-insensitive match against any of the listed component types
out = df[df['COMPTYPE'].astype(str)
         .str.contains('MCCB|ACB|VCB|CONTACTOR', regex=True, flags=re.IGNORECASE)]

print(out)

# Output
     BOARDIBNO  SUBCOMP_IBNO           COMPTYPE
0   1044444001    9044444001                ACB
3   1044444001    9044444004          MCCB/MPCB
4   1044444001    9044444005                vcb
5   1044444001    9044444006          MCCB/MPCB
6   1044444001    9044444007                acb
7   1044444001    9044444008               mccb
8   1044444001    9044444009          MCCB/MPCB
9   1044444001    9044444010    Power Contactor
10  1044444001    9044444011    Power Contactor
11  1044444001    9044444012  Control Contactor
12  1044444001    9044444013                VCB
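
The case=False alternative mentioned above is not shown in code, so here is a minimal sketch of it. The small frame is rebuilt from a few rows of the question's input so that the snippet runs on its own; the names sample, mask and out_case are made up for illustration and are not part of the original code.

```python
import pandas as pd

# A few rows taken from the question's input, rebuilt here so the snippet is self-contained
sample = pd.DataFrame({
    'BOARDIBNO': [1044444001] * 5,
    'SUBCOMP_IBNO': [9044444001, 9044444002, 9044444005, 9044444008, 9044444010],
    'COMPTYPE': ['ACB', 'Relay', 'vcb', 'mccb', 'Power Contactor'],
})

# case=False makes the regex match regardless of letter case
mask = sample['COMPTYPE'].astype(str).str.contains(
    'MCCB|ACB|VCB|CONTACTOR', regex=True, case=False)
out_case = sample[mask]

print(out_case['COMPTYPE'].tolist())
# ['ACB', 'vcb', 'mccb', 'Power Contactor']
```

With case=False there is no need to import re, which is probably the main reason to prefer it when no other regex flags are needed.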

Or convert the column to upper case before matching:

Code:
# Upper-case a temporary copy of the column, then match the upper-case pattern
out = df[df['COMPTYPE'].astype(str).str.upper()
         .str.contains('MCCB|ACB|VCB|CONTACTOR', regex=True)]

print(out)

# Output
     BOARDIBNO  SUBCOMP_IBNO           COMPTYPE
0   1044444001    9044444001                ACB
3   1044444001    9044444004          MCCB/MPCB
4   1044444001    9044444005                vcb
5   1044444001    9044444006          MCCB/MPCB
6   1044444001    9044444007                acb
7   1044444001    9044444008               mccb
8   1044444001    9044444009          MCCB/MPCB
9   1044444001    9044444010    Power Contactor
10  1044444001    9044444011    Power Contactor
11  1044444001    9044444012  Control Contactor
12  1044444001    9044444013                VCB
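
Note that in the output above the COMPTYPE values keep their original case (vcb, acb, mccb). str.upper() returns a new Series that is used only to build the boolean mask, so the data frame itself is not changed. A small sketch of that behaviour, using a hypothetical three-row frame named demo (not from the original post):

```python
import pandas as pd

# Hypothetical three-row frame, for illustration only
demo = pd.DataFrame({'COMPTYPE': ['vcb', 'Relay', 'Control Contactor']})

# The upper-cased Series is temporary; it only drives the boolean mask
mask = demo['COMPTYPE'].astype(str).str.upper().str.contains(
    'MCCB|ACB|VCB|CONTACTOR', regex=True)

filtered = demo[mask]
print(filtered['COMPTYPE'].tolist())
# ['vcb', 'Control Contactor']
```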