
Downloading files from a remote server's directory listing and importing them into HDFS

Source: Internet

I have been given access to a server that exposes a directory listing of files, which I need to download and import into HDFS. Currently I send an HTTP GET to the server, download the HTML directory listing, and use jsoup to parse out the links to the files I need. Once I have the complete list, I download the files one by one and import each into HDFS. I don't believe Flume can read and parse HTML to download files. Is there an easier, cleaner way to do what I am describing?
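For reference, the workflow described above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: it uses Python's standard library in place of jsoup, the helper names (`extract_file_urls`, `stream_to_hdfs`, `import_listing`) are made up for this example, and it assumes the `hdfs` command-line client is on the PATH and that the listing is an ordinary HTML page of `<a href>` links.

```python
import posixpath
import shutil
import subprocess
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_file_urls(listing_html, base_url):
    """Return absolute URLs of files linked directly under base_url.

    base_url is assumed to end with "/" (the directory being listed).
    """
    collector = LinkCollector()
    collector.feed(listing_html)
    urls = []
    for href in collector.hrefs:
        # Skip column-sort links ("?C=N;O=D") and in-page anchors.
        if href.startswith(("?", "#")):
            continue
        absolute = urljoin(base_url, href)
        # Skip subdirectories, the parent-directory link, and anything
        # that points outside the listed directory.
        if absolute.endswith("/") or not absolute.startswith(base_url):
            continue
        urls.append(absolute)
    return urls


def stream_to_hdfs(file_url, hdfs_dir):
    """Pipe one remote file into HDFS without writing it to local disk.

    Relies on `hdfs dfs -put - <dest>`, which reads the data from stdin.
    """
    name = unquote(posixpath.basename(urlparse(file_url).path))
    dest = posixpath.join(hdfs_dir, name)
    with urlopen(file_url) as response:
        proc = subprocess.Popen(
            ["hdfs", "dfs", "-put", "-", dest], stdin=subprocess.PIPE
        )
        shutil.copyfileobj(response, proc.stdin)
        proc.stdin.close()
        if proc.wait() != 0:
            raise RuntimeError("hdfs put failed for " + file_url)
    return dest


def import_listing(listing_url, hdfs_dir):
    """Fetch the listing page, then stream every linked file into hdfs_dir."""
    with urlopen(listing_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        listing_html = response.read().decode(charset, errors="replace")
    return [
        stream_to_hdfs(url, hdfs_dir)
        for url in extract_file_urls(listing_html, listing_url)
    ]
```

A call such as `import_listing("http://files.example.com/data/", "/user/me/incoming")` (both values are placeholders) would walk the listing once and push each file into HDFS in turn. Streaming through stdin is one possible simplification of the download-then-import step, since it avoids a local temporary copy; whether it suits a given setup depends on file sizes and on how failures should be retried.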



