How to read all parquet files from S3 using awswrangler in python

I need to read all parquet files with the .parquet extension:

import awswrangler as wr

s3_path = "s3://bucket/table/files.parquet"
df = wr.s3.read_parquet(path=[s3_path])

but I still get an error:

Error occurred (404) when calling the HeadObject

2 Answers

The trick is to pass a single string (the folder prefix) as the S3 path, together with path_suffix:

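import awswrangler as wr

# One prefix string, not a list; the trailing slash keeps sibling prefixes such as "table_backup/" out
s3_path = "s3://bucket/table/"
df = wr.s3.read_parquet(path=s3_path, path_suffix=".snappy.parquet", use_threads=True)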
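To see why the single-prefix form works, and why the trailing slash matters, here is a pure-Python sketch of a prefix listing followed by a suffix filter. It only models S3's plain string-prefix semantics, assuming awswrangler selects objects by prefix this way; select_keys and the sample keys are hypothetical and are not awswrangler's code.

```python
def select_keys(keys, prefix, suffix=None):
    """Hypothetical helper: mimic a prefix + suffix filter over S3 keys.

    This is NOT awswrangler's implementation; it only illustrates which
    keys a prefix listing followed by a suffix check would keep.
    """
    # S3 prefix matching is plain string-prefix matching, with no notion of folders
    matched = [k for k in keys if k.startswith(prefix)]
    if suffix is not None:
        # Keep only keys ending with the requested extension
        matched = [k for k in matched if k.endswith(suffix)]
    return matched


keys = [
    "table/part-0000.snappy.parquet",
    "table/part-0001.snappy.parquet",
    "table/_SUCCESS",
    "table_backup/part-0000.snappy.parquet",
]

# Without a trailing slash, "table_backup/..." also matches the prefix
print(select_keys(keys, "table", ".parquet"))
# ['table/part-0000.snappy.parquet', 'table/part-0001.snappy.parquet', 'table_backup/part-0000.snappy.parquet']

# With a trailing slash, only objects under "table/" are selected
print(select_keys(keys, "table/", ".parquet"))
# ['table/part-0000.snappy.parquet', 'table/part-0001.snappy.parquet']
```

The suffix filter is also what drops non-data objects such as _SUCCESS markers that some writers leave next to the parquet files.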

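You are getting this error because the object you asked for does not exist. When path is a list, wr.s3.read_parquet treats every entry as the exact path of one object, so the path in the question must be a real key; it is not a pattern that matches all parquet files in the folder.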

You can either specify the exact (and correct) location of each file you want to read, or, if you want to read all the parquet files in a folder, specify just the folder and filter by extension with the path_suffix parameter. The same parameter exists on wr.s3.read_csv and wr.s3.read_json if you need to filter ".csv" or ".json" files instead.
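If you do want to pass a list, as in the question, each entry has to be the full path of an existing object. A minimal sketch, assuming awswrangler 2.x or later, lists the objects first with wr.s3.list_objects (using its suffix argument) and then passes that list to wr.s3.read_parquet. The helper name read_parquet_list is hypothetical; the listing and reading functions are injectable so the logic can be tried without AWS credentials.

```python
from typing import Callable, List, Optional


def read_parquet_list(
    prefix: str,
    list_objects: Optional[Callable[..., List[str]]] = None,
    read_parquet: Optional[Callable[..., object]] = None,
):
    """Hypothetical helper: list .parquet objects under a prefix, then read them.

    By default it uses awswrangler (imported lazily); fakes can be injected
    to exercise the logic without AWS access.
    """
    if list_objects is None or read_parquet is None:
        import awswrangler as wr  # only needed when no fakes are injected

        list_objects = list_objects or wr.s3.list_objects
        read_parquet = read_parquet or wr.s3.read_parquet

    # Each returned entry is a full object path, e.g. "s3://bucket/table/part-0.parquet"
    paths = list_objects(prefix, suffix=".parquet")
    if not paths:
        # Fail early with a clear message when nothing matches the prefix
        raise FileNotFoundError(f"No .parquet objects found under {prefix}")
    # A list of exact object paths is a valid `path` argument for read_parquet
    return read_parquet(path=paths)
```

Usage with real S3 would be df = read_parquet_list("s3://bucket/table/"). In practice the single-prefix call with path_suffix does the same listing for you, so the explicit list is mainly useful when you want to inspect or further filter the paths first.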

The following code reads all parquet files within the folder 'table':

import awswrangler as wr

df = wr.s3.read_parquet(path="s3://bucket/table/", path_suffix=".parquet")

To read all the parquet files in the bucket, including those in subfolders, use the bucket root as the prefix:

df = wr.s3.read_parquet(path="s3://bucket/", path_suffix=".parquet")
