I have been given access to a server that exposes an HTML directory listing of files, which I need to download and import into HDFS. Currently I issue an HTTP GET against the server to fetch the directory listing, then parse it with jsoup to extract the links to the files I need. Once I have the complete list, I download the files one by one and import each into HDFS. As far as I know, Flume cannot read and parse HTML in order to download files. Is there an easier, cleaner way to do what I am describing?
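
For reference, here is a rough sketch of the link-extraction step described above. It is not my actual code: to keep it JDK-only it uses a simple regex as a stand-in for jsoup, and the class name `ListingLinks`, the method `fileLinks`, and the filtering rules (skip sort links starting with `?` and anything ending in `/`) are assumptions about what a typical Apache-style listing looks like.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects absolute file URLs from a directory-listing page.
final class ListingLinks {
    // Regex stand-in for jsoup; only matches double-quoted href attributes.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    static List<URI> fileLinks(String html, URI base) {
        List<URI> files = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            // Assumption: column-sort links start with '?', and
            // directories (including the parent "../") end with '/'.
            if (href.startsWith("?") || href.endsWith("/")) {
                continue;
            }
            // Resolve relative links against the listing URL.
            // Note: URI.resolve throws on hrefs that are not valid URIs (e.g. raw spaces).
            files.add(base.resolve(href));
        }
        return files;
    }
}
```

Each returned URI would then be opened as a stream and written into HDFS, which is the part I am doing file by file today.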