I have several xml files with names like :
LLL_ABC0D012_title.xml
LLL_ABC0D013_title.xml
LLL_ABT0G012_title.xml
LLL_ABR0N012_title.xml
There are always 8 characters after the ‘LLL_’.
I have an Excel file with over 900 lines that looks like this :
reference | dtp1 | dtp2 | dtp3 |
---|---|---|---|
ABC0D012 | 1_blabla | 1_1_blablabla | 1_1_1_blablablabla |
ABC0D013 | 1_blabla | 1_1_blablabla | 1_1_1_blablablabla |
ABC0D014 | 1_blabla | 1_1_blablabla | 1_1_1_blablablabla |
ABT0G012 | 1_blabla | 1_1_blablabla | 1_1_1_blablablabla |
In my xml files, I have text, text, text tags. I would like to modify the text of these tags with the text in the cells of the corresponding columns in the table above.
To do this, I’d like my script to do a loop that reads ABC0D012 characters (and then the other) in my xml file title, find the match in the ‘reference’ column of my Excel file and then look for the corresponding value in ‘dtp1’, ‘dtp2’ and ‘dtp3’. I would then like to save these values in a variable to replace the text in my tags.
I’m a beginner in Python. I tried something like :
import numpy as np
import openpyxl
import pandas as pd
import xml.etree.ElementTree as ET
import os
table1 = pd.read_excel('C:/Users/Documents/datatypes.xlsx', na_values=['NA'])
table2 = table1.replace('\xa0', ' ',regex=True)
for root, dirs, files in os.walk("."):
for file in files :
if file[-4:] == '.xml':
#print(file)
xml = ET.parse('LLL_ABC0D012_title.xml')
root = xml.getroot()
dtp1_xml = root[8]
dtp2_xml = root[9]
dtp3_xml = root[10]
num = file[4:12]
#print(num)
dtp1_excel = table2['dtp1'].where(table2['reference'] == num)
dtp1_xml.text = dtp1_excel
#print(dtp1_xml.text)
ET.indent(root)
xml.write("LLL_ABC0D012_title.xml", encoding='utf-8', xml_declaration=True, method='xml')
But it doesn’t work, ‘dtp1_excel’ does not return the dtp1 value from the table in print function but something like :
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
931 NaN
932 NaN
933 NaN
934 NaN
935 NaN
Name: dtp1, Length: 936, dtype: object
I have no NaN value in the reference and dtp1 columns and it should only return a single cell value.
What’s wrong with my approach ? Could you help me, please ?