Using wildcard for "if .... in .." statement

Solution 1

Try the following:

if any(f.startswith(existingXML) and f.endswith('.xml') for f in check_meta):
    print "exists"

The any() built-in function takes an iterable as an argument and returns True if any of its elements is true (and False otherwise, including for an empty iterable). The argument we pass is a generator expression that yields the value of f.startswith(existingXML) and f.endswith('.xml') for each file name f in your list check_meta; any() stops at the first match.
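To see the check in action, here is a small sketch run against a few of the file names from the question. The helper name has_xml_for and the shortened list are illustrative assumptions, not part of the original code; print() is written in function form so the snippet runs on both Python 2 and Python 3.

```python
def has_xml_for(prefix, names):
    """Return True if any name starts with prefix and ends with '.xml'."""
    return any(n.startswith(prefix) and n.endswith('.xml') for n in names)

# A shortened sample of the directory listing from the question.
check_meta = ['hex.shp', 'hex.shp_BaseMetadata.xml',
              'trc_boundary_Metadata.xml.auto', 'Urbanlevy_bdy_trc.shp.xml']

print(has_xml_for('Urbanlevy_bdy_trc', check_meta))  # True
# 'trc_boundary_Metadata.xml.auto' ends with '.auto', so it does not count.
print(has_xml_for('trc_boundary', check_meta))       # False
```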

A regex solution might look something like this:

import re

regex = re.compile(re.escape(existingXML) + r'.*\.xml$')
if any(regex.match(f) for f in check_meta):
    print "exists"
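Since the question tried to use a literal "*", it may be worth noting that the standard library's fnmatch module understands shell-style wildcards directly. The following is a minimal sketch, not part of the original answer: the sample values are assumptions taken from the question, and fnmatchcase is used so that matching stays case-sensitive on every platform. Note that characters such as "[" or "?" in the base name would also be treated as wildcards.

```python
import fnmatch

existingXML = 'Urbanlevy_bdy_trc'
check_meta = ['Urbanlevy_bdy_trc.shp', 'Urbanlevy_bdy_trc.shp.xml',
              'trc_boundary_Metadata.xml.auto']

# '*' matches any run of characters, as in a shell glob.
pattern = existingXML + '*.xml'
exists = any(fnmatch.fnmatchcase(f, pattern) for f in check_meta)
print(exists)
```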

If you need to know which entry actually matches, use a for loop instead:

for f in check_meta:
    if f.startswith(existingXML) and f.endswith('.xml'):
        print "exists, file name:", f
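If only the first match is needed, next() with a default value avoids writing the loop by hand. This is an alternative sketch rather than part of the original answer; first_match is a hypothetical helper name and the sample list is shortened from the question.

```python
def first_match(prefix, names, suffix='.xml'):
    """Return the first name with the given prefix and suffix, or None."""
    return next((n for n in names
                 if n.startswith(prefix) and n.endswith(suffix)), None)

check_meta = ['trc_boundary.shp', 'trc_boundary.shp.xml',
              'trc_boundary_polygon.xml']

match = first_match('trc_boundary', check_meta)
if match is not None:
    print("exists, file name: " + match)
```

In the question's script, the returned name could be passed to shutil.copy2 in place of FileNm+'.xml', which may not exist when the metadata file carries a different name.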

Solution 2

Why not just use:

import os

searchtext = "sometext"
# endswith() rather than "in", so that names like 'x.xml.auto' are not matched
matching = [f for f in os.listdir(currentPath) if f.startswith(searchtext) and f.endswith(".xml")]

If you want to check for different extensions, you can list them out:

exts = (".xml", ".tab", ".shp")
matching = [ f for f in os.listdir(currentPath) if f.startswith(searchtext) and os.path.splitext(f)[-1] in exts]

Of course you could figure out the regex to do the same thing as well.
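One possible regex equivalent of the extension list is sketched below. It reuses the names searchtext and exts from the snippet above with an assumed search value; the list names is a hypothetical stand-in for os.listdir(currentPath), filled with a few file names from the question.

```python
import re

searchtext = "trc_boundary"
exts = (".xml", ".tab", ".shp")

# Escape the literal parts, then allow anything between prefix and extension.
pattern = re.compile(
    re.escape(searchtext)
    + r'.*(?:' + '|'.join(re.escape(e) for e in exts) + r')$'
)

names = ['trc_boundary.shp', 'trc_boundary.shp.xml',
         'trc_boundary_Metadata.xml.auto', 'hex.shp']
matching = [n for n in names if pattern.match(n)]
print(matching)
```

Because match() anchors at the start of the string and the pattern ends with "$", both the prefix and the extension are checked in a single pass.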

Author by

GeorgeC

I am a conservation biologist (with 15 years international experience) who is now working in the Spatial industry. One of my projects was awarded the GITA Australia and New Zealand Spatial Excellence award for 2013 and Highly Commended at QSEA. I have presented at several GIS and Disaster Management conferences in Australia since I moved in 2010. I am focused on automating large processes using python along with ESRI/Mapinfo and open source GIS tools that complement the strengths of each other. I also work in Disaster Management and risk mitigation.

Updated on June 04, 2022

Comments

  • GeorgeC (almost 2 years ago)

    I am trying to find files in directories where the file name used is sometimes only a part of the full file name.

    So

    check_meta=os.listdir(currentPath)
    

    gives

    ['ANZMeta.xsl', 'Benefited_Areas', 'divisons', 'emergency', 'Error_LOG.txt', 'hex.dbf', 'hex.shp', 'hex.shp_BaseMetadata.xml', 'hex.shx', 'Maintenance_Areas', 'Rates.mxd', 'Regulated_Parking', 'schema.ini', 'Service_Areas', 'Shortcut to Local_Govt.lnk', 'TAB', 'TRC.rar', 'trc_boundary.dbf', 'trc_boundary.kml', 'trc_boundary.prj', 'trc_boundary.sbn', 'trc_boundary.sbx', 'trc_boundary.shp', 'trc_boundary.shp.ATGIS29.1772.3444.sr.lock', 'trc_boundary.shp.ATGIS30.2668.2356.sr.lock', 'trc_boundary.shp.xml', 'trc_boundary.shx', 'trc_boundary_Metadata.xml.auto', 'trc_boundary_Polygon.dbf', 'trc_boundary_Polygon.prj', 'trc_boundary_Polygon.sbn', 'trc_boundary_Polygon.sbx', 'trc_boundary_Polygon.shp', 'trc_boundary_Polygon.shp.ATGIS29.1772.3444.sr.lock', 'trc_boundary_Polygon.shx', 'trc_boundary_polygon.xml', 'Urbanlevy_bdy_region.dbf', 'Urbanlevy_bdy_region.prj', 'Urbanlevy_bdy_region.shp', 'Urbanlevy_bdy_region.shp.xml', 'Urbanlevy_bdy_region.shx', 'Urbanlevy_bdy_trc.dbf', 'Urbanlevy_bdy_trc.prj', 'Urbanlevy_bdy_trc.sbn', 'Urbanlevy_bdy_trc.sbx', 'Urbanlevy_bdy_trc.shp', 'Urbanlevy_bdy_trc.shp.xml', 'Urbanlevy_bdy_trc.shx']

    I want to do something like this:

    existingXML=FileNm[:FileNm.find('.')]
    if  existingXML+"*"+'.xml' in check_meta: # this is where the issue is
       print "exists"
    

    So sometimes the XML file to use is Urbanlevy_bdy_trc.shp.xml and at other times it is Urbanlevy_bdy_trc.xml, whichever exists. Note that I cannot simply use an OR test for ".shp.xml", because the datasets have multiple file extensions such as tab, ecw, etc. Sometimes the related XML file may also be called Urbanlevy_bdy_trc_Metadata.shp.xml, so the key is just to search for the core file name "Urbanlevy_bdy_trc" with the extension .xml.

    How can I specify this? The purpose of this is described in "Search and replace multiple lines in xml/text files using python".

    FULL CODE

    import os, xml, arcpy, shutil, datetime
    from xml.etree import ElementTree as et 
    
    path=os.getcwd()
    RootDirectory=path
    arcpy.env.workspace = path
    Count=0
    
    Generated_XMLs=os.path.join(RootDirectory, 'GeneratedXML_LOG.txt')
    f = open(Generated_XMLs, 'a')
    f.write("Log of Metadata Creation Process - Update: "+str(datetime.datetime.now())+"\n")
    f.close()
    
    for root, dirs, files in os.walk(RootDirectory, topdown=False):
        #print root, dirs
        for directory in dirs:
            currentPath=os.path.join(root,directory)
            os.chdir(currentPath)
            arcpy.env.workspace = currentPath
            print currentPath
    #def Create_xml(currentPath):
    
            FileList = arcpy.ListFeatureClasses()
            zone="_Zone"
    
            for File in FileList:
                Count+=1
                FileDesc_obj = arcpy.Describe(File)
                FileNm=FileDesc_obj.file
                print FileNm
    
                check_meta=os.listdir(currentPath)
                existingXML=FileNm[:FileNm.find('.')]
                print "XML: "+existingXML
                print check_meta
                #if  existingXML+'.xml' in check_meta:
                if any(f.startswith(existingXML) and f.endswith('.xml') for f in check_meta):
                    print "exists"
                    newMetaFile=FileNm+"_2012Metadata.xml"
                    shutil.copy2(FileNm+'.xml', newMetaFile)
                else:
                    print "Does not exist"
                    newMetaFile=FileNm+"_BaseMetadata.xml"
                    shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
                tree=et.parse(newMetaFile)
    
                print "Processing: "+str(File)
    
                for node in tree.findall('.//title'):
                    node.text = str(FileNm)
                for node in tree.findall('.//northbc'):
                    node.text = str(FileDesc_obj.extent.YMax)
                for node in tree.findall('.//southbc'):
                    node.text = str(FileDesc_obj.extent.YMin)
                for node in tree.findall('.//westbc'):
                    node.text = str(FileDesc_obj.extent.XMin)
                for node in tree.findall('.//eastbc'):
                    node.text = str(FileDesc_obj.extent.XMax)        
                for node in tree.findall('.//native/nondig/formname'):
                    node.text = str(os.getcwd()+"\\"+File)
                for node in tree.findall('.//native/digform/formname'):
                    node.text = str(FileDesc_obj.featureType)
                for node in tree.findall('.//avlform/nondig/formname'):
                    node.text = str(FileDesc_obj.extension)
                for node in tree.findall('.//avlform/digform/formname'):
                    node.text = str(float(os.path.getsize(File))/int(1024))+" KB"
                for node in tree.findall('.//theme'):
                    node.text = str(FileDesc_obj.spatialReference.name +" ; EPSG: "+str(FileDesc_obj.spatialReference.factoryCode))
                print node.text
                projection_info=[]
                Zone=FileDesc_obj.spatialReference.name
    
                if "GCS" in str(FileDesc_obj.spatialReference.name):
                    projection_info=[FileDesc_obj.spatialReference.GCSName, FileDesc_obj.spatialReference.angularUnitName, FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName]
                    print "Geographic Coordinate system"
                else:
                    projection_info=[FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName, FileDesc_obj.spatialReference.angularUnitName, Zone[Zone.rfind(zone)-3:]]
                    print "Projected Coordinate system"
                x=0
                for node in tree.findall('.//spdom'):
                    for node2 in node.findall('.//keyword'):
                        print node2.text
                        node2.text = str(projection_info[x])
                        print node2.text
                        x=x+1
    
    
                tree.write(newMetaFile)
    
                f = open(Generated_XMLs, 'a')
                f.write(str(Count)+": "+File+"; "+newMetaFile+"; "+currentPath+"\n")
                f.close()
    
    
    
        #        Create_xml(currentPath)
    

    RESULT