I am working on a directory application that uses the Google Apps API. It stores all the data (users, groups, org units) from the API in the datastore and then queries the datastore so that users can search and view it.
I load the users and groups using tasks. My approach is to read a page of data from the API, create an entity for every user in a loop, and then check nextPageToken; if it is not null, I enqueue another task to load the next page of users. I follow the same approach for groups and OUs.
The problem: on a Google domain with 2K users it works fine. On an environment with 90K users, it works until it reaches 12-13K users; then the tasks stop responding and the memory usage on my machine climbs. This is on my local devserver, as I haven't deployed it to App Engine yet.
There is a lot of backend code following the approach described above, and I am not sure which parts to post here, so please ask about whatever you think could be causing the problem and I will paste the relevant snippet.
The actual production server will have about twice as many users, close to 200K, which concerns me a lot. Please help!
1 solution
#1
Don't try to create all the users/entities in one loop and then save them all in one step. That is unnecessary and, as you found, requires a lot more memory.
Instead, break the work into smaller groups: for example, limit each loop to 100 entities, save those 100, and continue. That way the instance handling the request doesn't have to keep several thousand entities in memory.
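To illustrate the batching idea, here is a minimal sketch in plain Python. The names `chunked`, `save_in_batches` and the `save_batch` callback are hypothetical; `save_batch` stands in for whatever batched datastore write your code uses, and 100 is only an example batch size.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def save_in_batches(entities, save_batch, batch_size=100):
    """Save entities batch by batch; only one batch is held at a time.

    `save_batch` is a hypothetical callback standing in for a batched
    datastore write. Returns the number of entities saved.
    """
    saved = 0
    for batch in chunked(entities, batch_size):
        save_batch(batch)  # write this batch, then let it be garbage collected
        saved += len(batch)
    return saved
```

If `entities` is a generator that builds each entity lazily from the API response, the full list never exists in memory; only the current batch does.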
You mentioned you are doing this with tasks. Your bottleneck is memory. By decreasing the number of entities each task creates and saves, you increase the number of requests needed to do the job, and since each task handles fewer entities, the request rate goes up as well. Each task then uses less memory, and the higher request rate prompts the GAE platform to spin up new instances more often, which means you will have more memory overall.
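A rough sketch of the "one small page per task" pattern is below. It only simulates the idea in plain Python: `fetch_page` is a hypothetical stand-in for the directory API list call, the `deque` stands in for the task queue, and `store.extend` stands in for one batched datastore write. None of these are real App Engine calls, and `PAGE_SIZE` is an assumed value.

```python
from collections import deque

PAGE_SIZE = 100  # assumed number of entities handled by a single task


def fetch_page(all_users, page_token, page_size):
    """Hypothetical stand-in for the API call: returns (page, next_token)."""
    start = page_token or 0
    end = start + page_size
    next_token = end if end < len(all_users) else None
    return all_users[start:end], next_token


def load_users_task(all_users, page_token, store, enqueue):
    """One task: fetch one page, save it, enqueue a follow-up if needed."""
    page, next_token = fetch_page(all_users, page_token, PAGE_SIZE)
    store.extend(page)  # stand-in for a single batched datastore write
    if next_token is not None:
        enqueue(next_token)  # the next task starts from this token


def run_all(all_users):
    """Drive the simulated queue until no follow-up tasks remain."""
    store = []
    queue = deque([None])  # the first task has no page token
    tasks_run = 0
    while queue:
        load_users_task(all_users, queue.popleft(), store, queue.append)
        tasks_run += 1
    return store, tasks_run
```

Each task only ever holds one page of entities, so the memory needed per request depends on `PAGE_SIZE` rather than on the total number of users.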
So handling fewer entities per task is a good and easy way to reduce your memory requirement.