I have a Python scraper that I run periodically on my free-tier AWS EC2 instance using cron; it outputs a CSV file every day containing around 4,000-5,000 rows with 8 columns. I have been SSH-ing into the instance from my home Ubuntu machine and adding the new data to a SQLite database, which I can then query to extract the data I want.
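For reference, my current import step looks roughly like the sketch below (the paths, table name, and column names are simplified placeholders, and I assume the CSV has no header row):

    import csv
    import sqlite3

    CSV_PATH = "/home/ubuntu/output/daily.csv"  # placeholder path
    DB_PATH = "scraper.db"                      # placeholder path

    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scraped (
               col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT,
               col5 TEXT, col6 TEXT, col7 TEXT, col8 TEXT
           )"""
    )

    # Append the day's rows (8 columns per row).
    with open(CSV_PATH, newline="") as f:
        conn.executemany(
            "INSERT INTO scraped VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            csv.reader(f),
        )
    conn.commit()
    conn.close()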
Now I would like to try the free-tier AWS MySQL database (Amazon RDS) so I can have the database in the cloud and pull data from it from the terminal on my home PC. I have searched around and found no direct tutorial on how this could be done. It would be great if anyone who has done this could give me a conceptual idea of the steps I would need to take. Ideally I would like to automate the updating of the database as soon as my EC2 instance produces a new CSV file. I can do all the de-duplication once the table is in the AWS MySQL database.
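Pulling data from my home terminal would then be something like this minimal sketch (the endpoint, credentials, and table name are placeholders, and I assume a MySQL driver such as pymysql):

    import os
    import pymysql

    # Placeholder connection details; the endpoint comes from the RDS
    # console and credentials are kept in environment variables.
    conn = pymysql.connect(
        host=os.environ["RDS_HOST"],
        user=os.environ["RDS_USER"],
        password=os.environ["RDS_PASSWORD"],
        database="scraperdb",
    )

    with conn.cursor() as cur:
        cur.execute("SELECT * FROM scraped LIMIT 10")
        for row in cur.fetchall():
            print(row)

    conn.close()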
Any advice or links to tutorials on this would be most welcome. As I stated, I have searched quite a bit for guides but haven't found anything on this. Perhaps the concept is completely wrong and there is an entirely different way of doing it that I am not seeing?
2 Answers
#1
The problem is that you don't have access to the RDS filesystem, so you cannot upload the CSV file there (nor import it from there).
Modify your Python scraper to connect to the database directly and insert the data there.
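A minimal sketch of that change, assuming pymysql and placeholder connection details (with a UNIQUE key on the table, INSERT IGNORE also takes care of the de-duplication mentioned in the question):

    import csv
    import os
    import pymysql

    # Placeholder endpoint/credentials from the RDS console; keep them
    # out of the source, e.g. in environment variables.
    conn = pymysql.connect(
        host=os.environ["RDS_HOST"],
        user=os.environ["RDS_USER"],
        password=os.environ["RDS_PASSWORD"],
        database="scraperdb",
    )

    with open("daily.csv", newline="") as f:  # placeholder filename
        rows = list(csv.reader(f))

    with conn.cursor() as cur:
        # INSERT IGNORE skips rows that collide with a UNIQUE key,
        # so duplicates never enter the table.
        cur.executemany(
            "INSERT IGNORE INTO scraped "
            "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)",
            rows,
        )
    conn.commit()
    conn.close()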
#2
Have you considered using AWS Lambda to run your scraper?
Take a look at this AWS tutorial, which will help you configure a Lambda function to access an Amazon RDS database.
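Conceptually, the handler could look like the sketch below (pymysql has to be packaged with the deployment, the connection details are placeholders, and run_scraper stands in for your actual scraping code):

    import os
    import pymysql

    # Created outside the handler so warm invocations reuse the connection.
    conn = pymysql.connect(
        host=os.environ["RDS_HOST"],
        user=os.environ["RDS_USER"],
        password=os.environ["RDS_PASSWORD"],
        database="scraperdb",
    )

    def lambda_handler(event, context):
        rows = run_scraper()  # hypothetical: returns lists of 8 values
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT IGNORE INTO scraped "
                "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)",
                rows,
            )
        conn.commit()
        return {"inserted": len(rows)}

This would also remove the intermediate CSV file (and the EC2 instance) from the pipeline entirely, since the scraped rows go straight into the database.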