spark-etl-framework/docs/tutorials/minio-read-write.md at main · qwshen/spark-etl-framework

To read from/write to MinIO buckets:

Set up the configuration for connecting to MinIO cluster with the SparkConfActor

actor:
  type: spark-conf-actor
  properties:
    hadoopConfigs:
      fs.s3a.path.style.access: "true"
      fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
      fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
      fs.s3a.connection.ssl.enabled: "true"
      fs.s3a.connection.timeout: "30000",
      fs.s3a.endpoint: "${minio.host}:${minio.port}"
      fs.s3a.access.key: "${minio.access_key}"
      fs.s3a.secret.key: "${minio.secret_key}"

Note:

${minio.host} - the full-name of a host of the MinIO cluster.
${minio.port} - the port number at the host for connecting to the MinIO cluster
${minio.access_key} & ${minio.secret_key} - the access key & secret key for connecting to the MinIO cluster and accessing objects in buckets.

Use FileReader/FileWriter, IcebergReader/IcebergWriter and/or DeltaReader/DeltaWriter for the read/write operations.

The following is one example of reading csv objects from the users bucket and an underneath input path:
```
actor:
  type: file-reader:
  properties:
    format: csv
    options:
      header: true
      delimiter: "|"
    fileUri: "s3a://users/input"
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

minio-read-write.md

Latest commit

History

minio-read-write.md

File metadata and controls