



GCS credentials are automatically detected. The priorities are as follows:

1. The environment variable GOOGLE_APPLICATION_CREDENTIALS is set and points to a valid .json file.
2. A valid Cloud SDK is installed. In that case, a warning may be displayed: UserWarning: The application is authenticated using the Google Cloud SDK end-user credentials without a quota project.
3. The machine that executes the code is itself a GCP machine.

Note that the associated account must have permission to list the buckets in the project: TransparentPath checks that the bucket required by set_global_fs exists in the project.
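As an illustration of the first check in the priority order above, here is a minimal sketch that uses only the standard library. The function name credentials_file_from_env is hypothetical and this is not TransparentPath's own code. It only tests that the variable points to an existing .json file and does not validate the key itself.

```python
import os
from pathlib import Path


def credentials_file_from_env(environ=os.environ):
    """Return the key file named by GOOGLE_APPLICATION_CREDENTIALS, or None.

    Hypothetical helper for illustration; not part of TransparentPath.
    """
    value = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        return None
    candidate = Path(value)
    # Only the extension and existence are checked, not the file content.
    if candidate.suffix == ".json" and candidate.is_file():
        return candidate
    return None


if __name__ == "__main__":
    found = credentials_file_from_env()
    print(found if found else "No key file set; other credential sources apply")
```

Passing the environment as an argument keeps the check easy to try out with a plain dictionary instead of the real environment.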

File system, Pathlib and str

All the methods available on the file system object are in TransparentPath. The difference is that you don’t have to specify the path to apply them.

fs.glob("/".join([first_path, "*"]))  # GCSFileSystem
path.glob("*")  # TransparentPath

All the methods available in Pathlib are in TransparentPath.

path.parent
path.name
path.suffix
…

All the methods available in str are in TransparentPath:

>>> "foo" in (TransparentPath("gs://bucket_name") / "foo")
True

TransparentPath does not reimplement these methods itself. When a method that the class does not define is called on it, the class looks for that method in the following places, in this order:

1. The file system (GCS or local, depending on the path)
2. Pathlib
3. str

The first match is used, with the appropriate arguments. For example, glob is not a method of TransparentPath itself, so the class first checks whether it exists on the file system object. It does, so that one is used. glob also exists in Pathlib, but because the file system's method is found first, the Pathlib one is never used.
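The lookup order described above can be sketched with Python's __getattr__ hook, which runs only when normal attribute lookup fails. This is a toy illustration, not TransparentPath's actual implementation: FakeFileSystem and DelegatingPath are hypothetical names, and the real class handles arguments in more detail.

```python
from pathlib import Path


class FakeFileSystem:
    """Hypothetical stand-in for a GCS or local file system object."""

    def glob(self, path):
        return f"fs.glob({path!r})"


class DelegatingPath:
    """Toy path class that resolves unknown attributes in a fixed order."""

    def __init__(self, path, fs):
        self._path = str(path)
        self._fs = fs

    def __getattr__(self, name):
        # Called only when the attribute is not found on the class itself.
        if name.startswith("_"):
            raise AttributeError(name)
        # 1. File system: pass the stored path as the first argument.
        if hasattr(self._fs, name):
            method = getattr(self._fs, name)
            return lambda *args, **kwargs: method(self._path, *args, **kwargs)
        # 2. pathlib: only reached if the file system has no such attribute.
        as_pathlib = Path(self._path)
        if hasattr(as_pathlib, name):
            return getattr(as_pathlib, name)
        # 3. str: last resort.
        if hasattr(self._path, name):
            return getattr(self._path, name)
        raise AttributeError(name)


if __name__ == "__main__":
    p = DelegatingPath("bucket/dir/file.csv", FakeFileSystem())
    print(p.glob())    # found on the file system, so pathlib's glob is never used
    print(p.suffix)    # not on the file system, found in pathlib
    print(p.upper())   # found neither on the file system nor in pathlib, so str is used
```

Because the file system is checked first, a name that exists in several places, such as glob, always resolves to the file system version.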

Read and write

Reading and writing Pandas DataFrames and Series is very easy with TransparentPath.

# Path is assumed here to be an alias for TransparentPath
p = Path("gs://bucket_name/file.csv")
df = p.read(index_col=0, parse_dates=True)
p = Path("gs://bucket_name/file.parquet")
p.write(df)

Depending on the file's suffix, the class calls the appropriate pandas read method. The formats supported for reading with pandas are csv, parquet, hdf5, xlsx, xls and xlsm. You can also load a json file into a dictionary. If the suffix is none of these, the class treats the file as plain text and reads it with the file system's open method.
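To make the idea of suffix-based dispatch concrete, here is a minimal sketch for local files. The pandas function names are real, but the mapping and the function read_by_suffix are assumptions made for illustration, not TransparentPath's code. pandas is imported lazily, so the json and plain-text branches work without it.

```python
import json
from pathlib import Path

# Suffix -> name of a pandas reader function (assumed mapping, for illustration).
PANDAS_READERS = {
    ".csv": "read_csv",
    ".parquet": "read_parquet",
    ".hdf5": "read_hdf",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".xlsm": "read_excel",
}


def read_by_suffix(path, **kwargs):
    """Pick a reader from the file suffix, falling back to plain text."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in PANDAS_READERS:
        import pandas as pd  # only the tabular formats need pandas

        reader = getattr(pd, PANDAS_READERS[suffix])
        return reader(path, **kwargs)
    if suffix == ".json":
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)  # a JSON object becomes a dictionary
    # Unknown suffix: treat the file as plain text.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

Keyword arguments such as index_col or parse_dates are forwarded unchanged to the pandas reader, which mirrors how the read call above is used.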

The built-in open is overloaded by the class and can be used with a TransparentPath object or a string starting with gs://.

# These two commands are equivalent
with open("gs://bucket_name/file.txt", "r") as f:
    ...
with open(Path("gs://bucket_name/file.txt"), "r") as f:
    ...

TransparentPath can also handle Dask data frames.

import pandas as pd
import dask.dataframe as dd
df_dask = dd.from_pandas(
    pd.DataFrame(columns=["foo", "bar"], index=["a", "b"], data=[[1, 2], [3, 4]]),
    npartitions=1,
)
pfile = Path("gs://bucket_name/file.parquet")
pfile.write(df_dask)  # Detects that the object is a Dask dataframe
df_dask = pfile.read(use_dask=True)  # You need to tell it to use Dask

Copying and moving

The cp and mv methods are available and work for both files and directories.

p1 = Path("foo/", bucket="bucket_name", fs="gcs")
p2 = Path("foo/", fs="local")
p1.cp(p2)  # Copies the content from GCS to local

Conclusion

The package is still new and will be updated regularly. The stable version is available for Python 3.8 and above via pip install transparentpath, and the beta version via pip install transparentpath-nightly. You can submit issues on the project's GitHub page, which also has a README with more information and examples on how to use the class.

About us

Advestis is an investment technology development company that has a deep understanding and practice of interpretable AI and machine learning techniques. LinkedIn: https://www.linkedin.com/company/advestis/

