I have a file tree like this:
```
2_Product
    2-1_CategoryName1_Product
        2-1-1_Name1_Product
            LLL_nomenclature1_product.zip
                LLL_nomenclature1_product (folder)
                    notice_nomenclature1.pdf
            LLL_nomenclature1_product_metadata.xml
            LLL_nomenclature2_product.zip
                LLL_nomenclature2_product (folder)
                    notice_nomenclature2.pdf
            LLL_nomenclature2_product_metadata.xml
            LLL_nomenclature3_product.zip
                LLL_nomenclature3_subproduct1 (folder)
                    notice_nomenclature3.pdf
                LLL_nomenclature3_subproduct2 (folder)
                    notice_nomenclature3.pdf
                LLL_nomenclature3_subproduct3 (folder)
                    notice_nomenclature3.pdf
            LLL_nomenclature3_product_metadata.xml
            ... etc
        2-1-2_Name2_Product
        2-1-3_ ...etc
    2-2_CategoryName2_Product
        2-2-1_ ...
        2-2-2_ ...
    ... etc
```
I have a script that searches my zip files for the ‘notice_nomenclatureX.pdf’ files and then adds a tag to the XML of the associated product containing the name of the matching notice (for example ‘notice_nomenclature1.pdf’):
```python
import os
import xml.etree.ElementTree as ET
import zipfile

for root, dirs, files in os.walk("."):
    for folder_ext in files:
        if folder_ext[-4:] == '.zip' and folder_ext[:3] == 'LLL':
            filePath3 = os.path.join(root, folder_ext)
            zip_folder = zipfile.ZipFile(filePath3)
            zipfile_paths = zip_folder.namelist()
            for paths in zipfile_paths:
                zipfiles = os.path.basename(paths)
                if zipfiles[-4:] == '.pdf' and zipfiles[:3] == 'not':
                    notice_name = zipfiles
                    for prdt in files:
                        if prdt[-4:] == '.xml' and prdt[:-13] == folder_ext[:-4]:
                            filePath4 = os.path.join(root, prdt)
                            xml_produit = ET.parse(filePath4)
                            root_produit = xml_produit.getroot()
                            notice_tag = ET.SubElement(root_produit, "notice_pdf")
                            notice_tag.text = notice_name
                            ET.indent(root_produit)
                            xml_produit.write(filePath4, encoding='utf-8', xml_declaration=True, method='xml', short_empty_elements=False)
```
My script works for ‘nomenclature1’ and ‘nomenclature2’ and produces this in the XML (which is what I want):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<gmd:MD_Metadata xmlns:gmd="http:...">
    ...
    <notice_pdf>notice_nomenclature1.pdf</notice_pdf>
</gmd:MD_Metadata>
```
But for ‘nomenclature3’, I get this (which is not what I want):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<gmd:MD_Metadata xmlns:gmd="http:...">
    ...
    <notice_pdf>notice_nomenclature3.pdf</notice_pdf>
    <notice_pdf>notice_nomenclature3.pdf</notice_pdf>
    <notice_pdf>notice_nomenclature3.pdf</notice_pdf>
</gmd:MD_Metadata>
```
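The repetition appears to come from the zip layout rather than from the XML code: the nomenclature3 archive holds three copies of the same notice, one per subproduct folder, and the script writes one tag per matching entry. A minimal sketch to check this, assuming a hypothetical in-memory archive built to mirror the tree above (the PDF contents are placeholder bytes) and applying the same filter as the script:

```python
import io
import os
import zipfile

# Hypothetical archive mirroring LLL_nomenclature3_product.zip from the tree:
# three subproduct folders, each containing a copy of the same notice PDF.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for i in (1, 2, 3):
        zf.writestr(f"LLL_nomenclature3_subproduct{i}/notice_nomenclature3.pdf", b"%PDF-")

# Same filter as the script: basename starts with 'not' and ends with '.pdf'.
with zipfile.ZipFile(buffer) as zf:
    names = [os.path.basename(path) for path in zf.namelist()]
matches = [n for n in names if n.endswith(".pdf") and n.startswith("not")]

print(matches)
# ['notice_nomenclature3.pdf', 'notice_nomenclature3.pdf', 'notice_nomenclature3.pdf']
```

Each of those three matches triggers one pass through the XML-writing code, hence three `notice_pdf` tags.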
How can I change my script so that, when the same notice name comes up several times in ‘zipfiles’, it is written only once in the XML tag?
I’ve tried using .sort() and sorted(), to no avail.
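Sorting only changes the order of a list, so it cannot remove repeated names. A small sketch of the difference, assuming the names have already been collected into a Python list (`names` and `unique_names` are illustrative variables, not part of the script): `dict.fromkeys` keeps the first occurrence of each name in its original order, whereas a `set` would also deduplicate but without a guaranteed order.

```python
# Illustrative list: the same notice found once per subproduct folder.
names = ["notice_nomenclature3.pdf"] * 3

# sorted() reorders the items but keeps every duplicate.
sorted_names = sorted(names)
print(sorted_names)  # still three identical entries

# dict.fromkeys() keeps one key per distinct name, in first-seen order.
unique_names = list(dict.fromkeys(names))
print(unique_names)  # ['notice_nomenclature3.pdf']
```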
I also tried this:
```python
...
new_list = []
for paths in zipfile_paths:
    zipfiles = os.path.basename(paths)
    if zipfiles[-4:] == '.pdf' and zipfiles[:3] == 'not':
        if zipfiles not in new_list:
            new_list.append(zipfiles)
            notice_name = new_list
... etc
```
“notice_nomenclature3.pdf” now appears only once in “new_list”, but when I run the script it fails because notice_name is now a list instead of a string, and I get the following error:
```
TypeError: write() argument must be str, not list
```
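The error indicates that `notice_tag.text` ended up holding the list itself, while ElementTree expects element text to be a string when it serializes the tree. A minimal sketch, assuming a simplified stand-in for the metadata XML (no `gmd` namespace) and a `new_list` that has already been deduplicated, that writes each list item as its own string:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an LLL_..._metadata.xml root element.
root_produit = ET.fromstring("<MD_Metadata></MD_Metadata>")

# Already deduplicated, as produced by the attempt above.
new_list = ["notice_nomenclature3.pdf"]

# Element.text must be a str, so add one tag per name in the list
# instead of assigning the whole list to a single tag.
for notice_name in new_list:
    notice_tag = ET.SubElement(root_produit, "notice_pdf")
    notice_tag.text = notice_name

xml_text = ET.tostring(root_produit, encoding="unicode")
print(xml_text)
# <MD_Metadata><notice_pdf>notice_nomenclature3.pdf</notice_pdf></MD_Metadata>
```

In the real script this loop would presumably sit after the `for paths in zipfile_paths:` loop has finished (so it runs once per zip rather than once per PDF entry), followed by `xml_produit.write(...)` as before.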
How can I achieve the desired result?
Thank you.