Question

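I'm using Python and need to get the list of file names in a folder stored on HDFS directly from Python, with each file name (.wav files) separated from its path (I only need the name). I think I could use pyspark or subprocess, but they only return the whole 'path + filename' as bytes rather than as separate parts, and it is hard to split them. I would be grateful if anyone could help.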

import subprocess

p = subprocess.Popen("hdfs dfs -ls <directory>",
                     shell=True,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)

for line in p.stdout.readlines():
    print(line)

Answer 1

This gives the output as a list data type.
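As a minimal sketch of getting a plain Python list of names, assuming your Hadoop version supports the `-C` flag of `hdfs dfs -ls` (which prints paths only): the helper names `list_hdfs_paths` and `wav_file_names` and the sample paths are hypothetical, chosen only for illustration. The parsing step is kept separate from the CLI call so it can be tried without a cluster.

```python
import posixpath
import subprocess


def list_hdfs_paths(directory):
    """Run `hdfs dfs -ls -C <directory>` and return its text output (one path per line)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-C", directory],
        capture_output=True, text=True, check=True,  # text=True decodes bytes to str
    )
    return result.stdout


def wav_file_names(ls_output):
    """Turn path-only listing text into a list of bare .wav file names."""
    names = []
    for line in ls_output.splitlines():
        path = line.strip()
        if path.lower().endswith(".wav"):
            # HDFS paths always use '/', so posixpath is used instead of os.path.
            names.append(posixpath.basename(path))
    return names


# Made-up sample in the shape that `-ls -C` prints.
sample = "/data/audio/a.wav\n/data/audio/b.wav\n/data/audio/notes.txt\n"
print(wav_file_names(sample))  # ['a.wav', 'b.wav']
```

On a machine with the `hdfs` client installed, `wav_file_names(list_hdfs_paths('<directory>'))` would combine the two steps.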


Answer 2

Use the HDFS CLI to traverse HDFS recursively.
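One way this could look from Python is sketched below, assuming the recursive listing is done with `hdfs dfs -ls -R` and that its long-format rows have eight whitespace-separated fields (permissions, replication, owner, group, size, date, time, path). The function names and the sample listing are hypothetical. The `-h` flag is deliberately not used here, since human-readable sizes such as `1.2 K` would add an extra field.

```python
import posixpath
import subprocess


def list_hdfs_recursive(directory):
    """Return the text printed by `hdfs dfs -ls -R <directory>`."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", directory],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def file_names_from_listing(ls_output, suffix=".wav"):
    """Extract bare file names ending in `suffix` from long-format listing text."""
    names = []
    for line in ls_output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0].startswith("d"):
            continue  # header such as "Found N items", or a directory row
        name = posixpath.basename(fields[7])
        if name.endswith(suffix):
            names.append(name)
    return names


# Made-up listing used only to exercise the parser.
sample_listing = (
    "drwxr-xr-x   - user group          0 2020-01-01 10:00 /data/audio/set1\n"
    "-rw-r--r--   3 user group      12345 2020-01-01 10:01 /data/audio/set1/clip one.wav\n"
    "-rw-r--r--   3 user group        678 2020-01-01 10:02 /data/audio/readme.txt\n"
)
print(file_names_from_listing(sample_listing))  # ['clip one.wav']
```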


Answer 3

Use this:

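from subprocess import Popen, PIPE

hdfs_path = '/path/to/the/designated/folder'
process = Popen(f'hdfs dfs -ls -h {hdfs_path}', shell=True, stdout=PIPE, stderr=PIPE)
std_out, std_err = process.communicate()
# decode() returns a str, which has splitlines() but no readlines().
# The first line is the "Found N items" header, so skip it; also drop blank lines.
lines = [ln for ln in std_out.decode().splitlines()[1:] if ln.strip()]
# The path is the last space-separated field (assumes the path itself contains no spaces).
list_of_file_names_with_full_address = [ln.split(' ')[-1] for ln in lines]
list_of_file_names = [p.split('/')[-1] for p in list_of_file_names_with_full_address]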
