How to move files with the same extension in the Databricks File System (DBFS)?


Solution 1

Wildcards are currently not supported with dbutils. You can move the whole directory:

dbutils.fs.mv("dbfs:/tmp/test", "dbfs:/tmp/test2", recurse=True)

or just a single file:

dbutils.fs.mv("dbfs:/tmp/test/test.csv", "dbfs:/tmp/test2/test2.csv")

You can implement the wildcard logic yourself in Python: list the directory with dbutils.fs.ls, filter the file names, and move each match individually.
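A minimal sketch of that wildcard logic, assuming glob-style matching via Python's standard fnmatch module is acceptable. The helper name move_matching and its parameters are hypothetical; fs is meant to be dbutils.fs, passed in as an argument so the function can also be exercised outside a notebook. It relies on dbutils.fs.ls reporting directory paths with a trailing "/".

```python
import fnmatch
import posixpath


def move_matching(fs, src_dir, pattern, dst_dir):
    """Move each file directly inside src_dir whose name matches pattern into dst_dir.

    fs is assumed to behave like dbutils.fs (it needs ls and mv).
    Returns the list of destination paths that were written.
    """
    moved = []
    for info in fs.ls(src_dir):
        # Assumption: directory entries end with "/", so skip them.
        if info.path.endswith("/"):
            continue
        name = posixpath.basename(info.path)
        # fnmatchcase keeps the match case-sensitive on every platform.
        if fnmatch.fnmatchcase(name, pattern):
            target = dst_dir.rstrip("/") + "/" + name
            fs.mv(info.path, target)
            moved.append(target)
    return moved
```

In a notebook, a call such as move_matching(dbutils.fs, "dbfs:/tmp/test", "*.csv", "dbfs:/tmp/test2") would move every CSV file in the first directory into the second while keeping the file names. The matching is not recursive; subdirectories are skipped.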

Solution 2

Since wildcards are not supported, list the files first and then move or copy each match individually. The example below filters by file-name prefix; to filter by extension, use endswith('.csv') in place of startswith.

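import os

def db_list_files(file_path, file_prefix):
  # dbutils.fs.ls returns FileInfo objects; keep the paths whose file name starts with the prefix
  file_list = [file.path for file in dbutils.fs.ls(file_path) if os.path.basename(file.path).startswith(file_prefix)]
  return file_list

files = db_list_files('dbfs:/your/src_dir', 'foobar')

# Copy each match into the target directory under the same file name (use dbutils.fs.mv to move instead)
for file in files:
  dbutils.fs.cp(file, os.path.join('dbfs:/your/tgt_dir', os.path.basename(file)))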
Author: Krishna Reddy

Updated on June 04, 2022

Comments

  • Krishna Reddy, almost 2 years

    I get a file-not-found exception when I try to move a file using * in DBFS. Both the source and destination directories are in DBFS. The source file, named "test_sample.csv", is in a DBFS directory, and I run the following command from a notebook cell:

    dbutils.fs.mv("dbfs:/usr/krishna/sample/test*.csv", "dbfs:/user/abc/Test/Test.csv")
    

    Error:

    java.io.FileNotFoundException: dbfs:/usr/krishna/sample/test*.csv
    

    I appreciate any help. Thanks.