I have a table with over 900 rows, like this:
reference | software |
---|---|
ABD0H5D | software1, software2, software3 |
DPJ6R8G | software2, software3 |
GHI5P6M | NaN |
AGH7U8N | software1, software3 |
I would like my code to read the ‘software’ column and, if ‘software2’ (for example) is present in the cell, set the text of a tag to ‘Yes’, otherwise to ‘No’. The tag goes in the XML file whose name contains the same characters as the ‘reference’ column.
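To make the expected result concrete, the rule can be sketched in plain Python, without pandas or XML. This is only an illustration: `software_flag` is a hypothetical helper name, and treating an empty (NaN) cell as ‘No’ is an assumption.

```python
def software_flag(cell, name):
    """Return 'Yes' if `name` is one of the comma-separated entries in `cell`."""
    # An empty Excel cell arrives as NaN, which is a float, not a string.
    if not isinstance(cell, str):
        return "No"
    entries = [entry.strip() for entry in cell.split(",")]
    return "Yes" if name in entries else "No"


# The four sample rows from the table above.
sample = {
    "ABD0H5D": "software1, software2, software3",
    "DPJ6R8G": "software2, software3",
    "GHI5P6M": float("nan"),
    "AGH7U8N": "software1, software3",
}
expected = {ref: software_flag(cell, "software2") for ref, cell in sample.items()}
```

With the sample rows, `expected` holds ‘Yes’ for ABD0H5D and DPJ6R8G and ‘No’ for GHI5P6M and AGH7U8N.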
I tried something like this:
import pandas as pd
import xml.etree.ElementTree as ET
import os

table1 = pd.read_excel('C:/Users/Documents/software.xlsx', na_values=['NA'], dtype=str)
# Replace non-breaking spaces with regular spaces
table2 = table1.replace('\xa0', ' ', regex=True)

for root, dirs, files in os.walk("."):
    for file in files:
        if file[-4:] == '.xml' and file[:3] == 'LLL':
            filePath2 = os.path.join(root, file)
            xml = ET.parse(filePath2)
            root = xml.getroot()
            # Candidate reference codes of different lengths, taken from the file name
            nomenc1 = file[4:10]
            nomenc2 = file[4:11]
            nomenc3 = file[4:12]
            nomenc4 = file[4:14]
            software_tag = ET.SubElement(root, "tagname")
            softw_excel = table2['software'][table2['reference'].isin([nomenc1, nomenc2, nomenc3, nomenc4])]
            for softw in softw_excel:
                if softw.str.contains('software2'):
                    software_tag.text = 'Yes'
                else:
                    software_tag.text = 'No'
            ET.indent(root)
            xml.write("infodump.xml", encoding='utf-8', xml_declaration=True, method='xml')
But it returns:
Traceback (most recent call last):
Cell In[30], line 23
if softw.str.contains('software2'):
AttributeError: 'float' object has no attribute 'str'
I also tried this:
syst_excel = table2['software'][table2['reference'].isin([nomenc1, nomenc2, nomenc3, nomenc4])]
S2 = syst_excel.str.contains('software2')
for values in S2:
    if S2 == True:
        software_tag.text = 'Yes'
    else:
        software_tag.text = 'No'
ET.indent(root)
xml.write("infodump.xml", encoding='utf-8', xml_declaration=True, method='xml')
I get:
Traceback (most recent call last):
Cell In[31], line 26
if S2 == True :
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
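Both errors can be reproduced without the Excel and XML files. The sketch below assumes a small hand-built DataFrame standing in for the real sheet (the name `table` and the captured-error variables are illustrative only):

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet, mirroring the sample rows.
table = pd.DataFrame({
    "reference": ["ABD0H5D", "DPJ6R8G", "GHI5P6M", "AGH7U8N"],
    "software": [
        "software1, software2, software3",
        "software2, software3",
        float("nan"),
        "software1, software3",
    ],
})

cells = table["software"]  # one column of a DataFrame is a Series

# Iterating a Series yields plain Python values: str for text, float for NaN.
cell_types = [type(cell).__name__ for cell in cells]

# First error: .str exists on the Series, not on the individual values.
try:
    cells.iloc[2].str.contains("software2")
    first_error = None
except AttributeError as exc:
    first_error = str(exc)

# Second error: comparing the whole Series gives another Series,
# which `if` cannot reduce to a single True/False.
mask = cells.str.contains("software2")
try:
    if mask == True:
        pass
    second_error = None
except ValueError as exc:
    second_error = type(exc).__name__
```

Here `cell_types` shows that the empty cell is a float, which matches the first traceback, and the `if mask == True` line raises the same ValueError as the second attempt.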
I know how to match my ‘software’ column with my ‘reference’ column. My question is: how can I retrieve a specific character string from the text in a cell, in order to either:
- use it as a variable in a for/if/else statement, without running into the string or boolean format problems above, or
- if that isn’t possible, create a new column in my table that holds the character string whenever the ‘software’ cell contains it, with NaN cells remaining NaN in the new column?

For the second option, I tried combining pandas.DataFrame.where and pandas.Series.str.contains, but I’m still having problems with the string format. Furthermore, I’m not sure I understand what a ‘Series’ is in relation to a ‘DataFrame’.
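As an illustration of the two options, here is a minimal sketch on a hand-built table. The column names `match` and `flag`, the variable `wanted`, and the choice to treat empty cells as ‘No’ are all assumptions, not part of the original code.

```python
import re
import pandas as pd

# Hypothetical stand-in for the Excel sheet, mirroring the sample rows.
table = pd.DataFrame({
    "reference": ["ABD0H5D", "DPJ6R8G", "GHI5P6M", "AGH7U8N"],
    "software": [
        "software1, software2, software3",
        "software2, software3",
        float("nan"),
        "software1, software3",
    ],
})

wanted = "software2"  # illustrative search string

# A DataFrame is the whole table; selecting one column gives a Series.
column = table["software"]

# New column: the search string where it occurs, NaN everywhere else
# (both for empty cells and for cells that list other software).
table["match"] = column.str.extract(f"({re.escape(wanted)})", expand=False)

# Yes/No column: na=False makes empty cells count as "not present".
present = column.str.contains(wanted, regex=False, na=False)
table["flag"] = present.map({True: "Yes", False: "No"})

# For a single reference, pull out one plain value so `if` works on it.
row = table.loc[table["reference"] == "DPJ6R8G", "flag"]
tag_text = row.iloc[0] if not row.empty else "No"
```

Note that both `str.extract` and `str.contains` match substrings here, so a cell containing ‘software20’ would also count as a match; splitting on commas first would be needed for exact names.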
Could you help me, please?