- HDFView is a Java-based tool for browsing and editing NCSA HDF4 and HDF5 files. It lets users browse any HDF4 or HDF5 file, starting with a tree view of all top-level objects in the file's hierarchy, and then descend through the hierarchy and navigate among the file's data objects.
- HDF5 version 1.8.19, released on 2017-06-15. Introduction: This document describes the differences between HDF5-1.8.18 and HDF5-1.8.19, and contains information on the platforms tested and known problems in HDF5-1.8.19.
Introduced in release: 1.18.
Using CMake to build HDF5 on Mac OS X: I had a successful CMake build/test/install on a Mac by adding '-DCMAKEBUILDWITHINSTALLPATH:BOOL=OFF' to the build options.
Hierarchical Data Format (HDF) is a set of file formats designed to store and organize large amounts of data [1]. Originally developed at the National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF [2].
This plugin enables Apache Drill to query HDF5 files.
Configuring the HDF5 Format Plugin
The plugin has three configuration options, listed in the table below.

| Option | Default | Description |
|---|---|---|
| type | (none) | Set to "hdf5" to make use of this plugin. |
| extensions | ".h5" | A list of the file extensions used to identify HDF5 files. Typically HDF5 uses .h5 or .hdf5 as file extensions. |
| defaultPath | null | The default path defines which path Drill will query for data. Typically this should be left as null in the configuration file. Its usage is explained below. |
Example Configuration
For most uses, the configuration below will suffice to enable Drill to query HDF5 files.
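The configuration itself is not reproduced on this page. As a minimal sketch, assuming the plugin is declared as an entry in the `formats` section of a file-based storage plugin such as `dfs`, it might look like the fragment below. The entry name `hdf5` and the extension written without a leading dot are assumptions; the three keys are the options from the table above.

```json
"formats": {
  "hdf5": {
    "type": "hdf5",
    "extensions": ["h5"],
    "defaultPath": null
  }
}
```

Leaving `defaultPath` as `null` here keeps the metadata view available and lets individual queries choose a path.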
Usage
Since HDF5 can be viewed as a file system within a file, a single file can contain many datasets. For instance, if you have a simple HDF5 file, a star query will produce the following result:
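The result set is not reproduced here. For reference, a star query of this kind might look like the sketch below, which assumes the `dfs.test` workspace and the `dset.h5` file used in the FLATTEN example later in this section.

```sql
-- Assumption: the file dset.h5 lives in the dfs.test workspace.
-- With no defaultPath set, Drill returns the file's metadata view.
SELECT *
FROM dfs.test.`dset.h5`;
```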
The actual data in this file is mapped to a column called `int_data`. To access the data effectively, use Drill's `FLATTEN()` function on the `int_data` column, which produces the following result.
apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
Once the data is in this form, you can access it similarly to how you might access nested data in JSON or other files.
However, a better way to query the actual data in an HDF5 file is to use the `defaultPath` field in your query. If the `defaultPath` field is defined in the query, or via the plugin configuration, Drill will only return the data, rather than the file metadata.
Note
Once you have determined which data set you are querying, it is advisable to use this method to query HDF5 data.
Note
Datasets larger than 16MB will be truncated in the metadata view.
You can set the `defaultPath` variable either in the plugin configuration or at query time, using the `table()` function as shown in the example below:
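A minimal sketch of such a query, assuming the `dfs.test` workspace and `dset.h5` file from the earlier example. The dataset path `/dset` is a hypothetical placeholder for whichever path you identified in the metadata view.

```sql
-- Assumption: '/dset' is a hypothetical dataset path inside dset.h5.
-- The table() function overrides the format configuration for this query only.
SELECT *
FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'));
```

Setting the path at query time leaves the plugin configuration at `null`, so other queries against the same workspace still see the metadata view.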
This query will return the result below:
If the data in `defaultPath` is a column, the column name will be the last part of the path. If the data is multidimensional, the columns will be named `<data_type>_col_n`. A column of integers, for example, will be called `int_col_1`.
Attributes
Occasionally, HDF5 paths will contain attributes. Drill will map these to a map data structure called `attributes`, as shown in the query below.
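A minimal sketch of a query that returns the map, again assuming the `dfs.test` workspace and `dset.h5` file from the earlier example. Whether that particular file carries any attributes is not stated in this document.

```sql
-- Assumption: dset.h5 in dfs.test contains at least one path with attributes.
-- Each row's attributes column is a map of attribute names to values.
SELECT attributes
FROM dfs.test.`dset.h5`;
```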
You can access the individual fields within the `attributes` map by using the structure `table.map.key`. Note that you will have to give the table an alias for this to work properly.
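A minimal sketch of the `table.map.key` form, assuming the same workspace and file as above. The key `some_key` is a hypothetical attribute name and `t` is an arbitrary alias.

```sql
-- Assumption: 'some_key' is a hypothetical attribute name; substitute a real one.
-- The alias t is required so Drill resolves t.attributes.some_key as a map lookup.
SELECT t.attributes.some_key AS some_key
FROM dfs.test.`dset.h5` AS t;
```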
Known Limitations
There are several limitations of the HDF5 format plugin in Drill.
- Drill cannot read unsigned 64-bit integers. When the plugin encounters this data type, it will write an INFO message to the log.
- While Drill can read compressed HDF5 files, Drill cannot read individual compressed fields within an HDF5 file.
- HDF5 files can contain nested datasets of up to `n` dimensions. Since Drill works best with two-dimensional data, datasets with more than two dimensions are reduced to two dimensions.
- HDF5 has a `COMPOUND` data type. At present, Drill supports reading `COMPOUND` data types that contain multiple datasets. Drill does not yet support `COMPOUND` fields with multidimensional columns and will ignore multidimensional columns within `COMPOUND` fields.
[1] https://en.wikipedia.org/wiki/Hierarchical_Data_Format
[2] https://www.hdfgroup.org