Why does Cloud Dataflow run its workers in a different region from where my data is located?

In an evaluation of GCP as a potential analytics platform for our business, I have set up a Cloud Storage bucket to be located in the EU. I have configured my BigQuery dataset to also be located in the EU. But when I run an ETL job in the Cloud Dataflow service that moves data from the former to the latter, I see the following message in the logs:

Worker configuration: n1-standard-1 in us-central1-f

Apart from the technical questions this raises about performance and latency, I am also concerned about the legal aspect of data that needs to stay within the EU round-tripping to US datacenters for processing.

I cannot specify the worker location in the DataflowPipelineRunner options, and I can't tell from the Data Processing and Security Terms whether or not I can assume that my data doesn't move.

Is it expected that Cloud Dataflow may process my data anywhere it finds geographically convenient, regardless of where the data is stored or where it is destined?

1 Answer

#1


According to the documentation:

The Dataflow service deploys Compute Engine resources in the zone us-central1-f by default. You can override this setting by specifying the --zone option when you create your pipeline.

This option is declared in DataflowPipelineWorkerPoolOptions.
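
As a rough configuration sketch of what that override might look like with the Dataflow SDK for Java 1.x: the zone name `europe-west1-b` and the class name `EuZonePipeline` are illustrative assumptions, not values from the question, and this assumes `DataflowPipelineOptions` inherits `setZone` from `DataflowPipelineWorkerPoolOptions`, as it does in the 1.x SDK. Passing `--zone=europe-west1-b` on the command line should have the same effect as the explicit setter.

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

// Hypothetical class name, used only for this sketch.
public class EuZonePipeline {
  public static void main(String[] args) {
    // Parse --project, --stagingLocation, etc. from the command line.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(DataflowPipelineOptions.class);
    options.setRunner(DataflowPipelineRunner.class);

    // Override the default worker zone (us-central1-f).
    // "europe-west1-b" is an assumed example of an EU zone.
    options.setZone("europe-west1-b");

    Pipeline p = Pipeline.create(options);
    // Apply the ETL transforms here, then call p.run().
  }
}
```

Once the job starts, the "Worker configuration" log line should report the chosen zone instead of us-central1-f.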

