How to read all Parquet files from S3 using awswrangler in Python

Writer Sophia Terry

I need to read all Parquet files with the .parquet extension. I tried:

import awswrangler as wr

s3_path = "s3://bucket/table/files.parquet"
df = wr.s3.read_parquet(path=[s3_path])

but I still get an error:

Error occurred (404) when calling the HeadObject

2 Answers

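The trick is to pass the S3 path as a single string rather than a list, and to filter the files with path_suffix. A string is treated as a prefix to list, whereas a list is treated as exact object paths, so every entry in it must exist.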

s3_path = "s3://bucket/table"
df = wr.s3.read_parquet(path=s3_path, path_suffix=".snappy.parquet", use_threads=True)
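To picture why a single string works where a one-element list fails, here is a minimal pure-Python sketch of that behaviour. It does not call awswrangler or AWS at all: `EXISTING_KEYS`, `resolve_paths`, and `missing_objects` are hypothetical names made up for illustration, and the object names are invented.

```python
from typing import List, Optional, Union

# Hypothetical in-memory stand-in for the objects stored in the bucket.
EXISTING_KEYS = [
    "s3://bucket/table/part-0000.snappy.parquet",
    "s3://bucket/table/part-0001.snappy.parquet",
    "s3://bucket/table/_SUCCESS",
]


def resolve_paths(
    path: Union[str, List[str]], path_suffix: Optional[str] = None
) -> List[str]:
    """Illustrative helper (not part of awswrangler): expand `path` to object paths."""
    if isinstance(path, str):
        # A single string acts as a prefix: list everything stored under it.
        candidates = [key for key in EXISTING_KEYS if key.startswith(path)]
    else:
        # A list is taken literally: each entry is assumed to be a real object.
        candidates = list(path)
    if path_suffix is not None:
        candidates = [key for key in candidates if key.endswith(path_suffix)]
    return candidates


def missing_objects(paths: List[str]) -> List[str]:
    """Return the paths that would fail an existence (HeadObject-style) check."""
    return [p for p in paths if p not in EXISTING_KEYS]


# The question's call: a list holding a path that is not a real object.
as_list = resolve_paths(["s3://bucket/table/files.parquet"])
print(missing_objects(as_list))

# This answer's call: one string prefix plus a suffix filter.
as_prefix = resolve_paths("s3://bucket/table", path_suffix=".snappy.parquet")
print(as_prefix)
```

In this sketch the list form hands back a path that does not exist, which is the situation that surfaces as the 404, while the prefix form only ever returns objects that were actually listed.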

You are getting this error because the object you are trying to read was not found, or the location you are trying to read from doesn't exist.

You can either specify the exact (and correct) location of the file you want to access or, if you want to read all the Parquet files from a folder, specify just the folder and select the files by extension (".parquet", ".snappy.parquet", etc.) through the path_suffix parameter.

The following code reads all Parquet files within the folder 'table':

df = wr.s3.read_parquet(path="s3://bucket/table/", path_suffix=".parquet")

If you want to read all the Parquet files within your bucket, use the following code:

df = wr.s3.read_parquet(path="s3://bucket/", path_suffix=".parquet")
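The scope of the read depends on the prefix you pass, and S3 prefixes are plain string prefixes rather than real folders. The pure-Python sketch below illustrates this with invented keys; `KEYS` and `select_keys` are hypothetical names used only for illustration, and nothing here calls awswrangler or AWS.

```python
from typing import List

# Hypothetical object keys inside one bucket (bucket name omitted for brevity).
KEYS = [
    "table/part-0000.parquet",
    "table/part-0001.parquet",
    "table/notes.csv",
    "table_backup/part-0000.parquet",
    "other/data.parquet",
]


def select_keys(keys: List[str], prefix: str, suffix: str) -> List[str]:
    """Keep the keys that start with `prefix` and end with `suffix`."""
    return [k for k in keys if k.startswith(prefix) and k.endswith(suffix)]


# Trailing slash: only objects inside the 'table' folder.
folder_only = select_keys(KEYS, prefix="table/", suffix=".parquet")

# No trailing slash: 'table_backup/...' also starts with 'table'.
no_slash = select_keys(KEYS, prefix="table", suffix=".parquet")

# Empty prefix: every Parquet object in the bucket.
whole_bucket = select_keys(KEYS, prefix="", suffix=".parquet")

print(folder_only)
print(no_slash)
print(whole_bucket)
```

Keeping the trailing slash on the folder path avoids picking up sibling prefixes by accident. Reading a whole bucket is best reserved for cases where every Parquet file shares a compatible schema, since unrelated files may not combine cleanly into one DataFrame.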
