Designed for long-term storage of data and for indefinitely scaled datastore sizes, with zero data loss and high configurability.
Hierarchical checksumming of all data and metadata, ensuring that the entire storage system can be verified on use, and confirmed to be correctly stored, or remedied if corrupt. Checksums are stored with a block’s parent block, rather than with the block itself. This contrasts with many file systems where checksums (if held) are stored with the data so that if the data is lost or corrupt, the checksum is also likely to be lost or incorrect.
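The parent-held checksum arrangement can be sketched as a small Merkle-style tree. The following is an illustration in Python, not ZFS's actual on-disk format, with SHA-256 standing in for the configurable checksum:

```python
import hashlib

def checksum(data: bytes) -> bytes:
    # SHA-256 stands in for ZFS's configurable checksum algorithm.
    return hashlib.sha256(data).digest()

class Block:
    """Toy block: a parent stores the checksums of its children, so a
    child's integrity is verified from above, not from the child itself."""
    def __init__(self, data: bytes, children=None):
        self.data = data
        self.children = children or []
        # The checksum of each child is kept with the parent.
        self.child_sums = [checksum(c.data) for c in self.children]

def verify(block: Block) -> bool:
    # Walk the tree: every child must match the checksum its parent holds.
    for child, expected in zip(block.children, block.child_sums):
        if checksum(child.data) != expected or not verify(child):
            return False
    return True

leaf = Block(b"file contents")
root = Block(b"metadata", children=[leaf])
assert verify(root)
leaf.data = b"bit rot!"     # simulate on-disk corruption of the child
assert not verify(root)     # detected via the checksum stored in the parent
```

Because the checksum lives in the (verified) parent, corruption of a data block cannot silently corrupt its own checksum along with it.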
Can store a user-specified number of copies of data or metadata, or selected types of data, to improve the ability to recover from data corruption of important files and structures.
Automatic rollback of recent changes to the file system and data, in some circumstances, in the event of an error or inconsistency.
Automated and (usually) silent self-healing of data inconsistencies and write failures when detected, for all errors where the data is capable of reconstruction. Data can be reconstructed using all of the following: error detection and correction checksums stored in each block’s parent block; multiple copies of data (including checksums) held on the disk; write intentions logged on the SLOG (ZIL) for writes that should have occurred but did not occur (after a power failure); parity data from RAID/RAIDZ disks and volumes; copies of data from mirrored disks and volumes.
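In outline, the self-healing read path amounts to: verify each available copy against the checksum held by the parent block, return the first good one, and overwrite bad copies with it. A minimal Python sketch, illustrative only, with SHA-256 standing in for the configurable checksum:

```python
import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def self_healing_read(copies: list, expected_sum: bytes) -> bytes:
    """Return data from the first copy matching the parent-stored checksum,
    silently repairing (overwriting) any copies that fail verification."""
    good = None
    for data in copies:
        if sha(data) == expected_sum:
            good = data
            break
    if good is None:
        raise IOError("unrecoverable: no copy matches its checksum")
    for i, data in enumerate(copies):
        if data != good:
            copies[i] = good    # silent repair of the bad copy
    return good

copies = [b"payload", b"payl0ad"]              # second copy is corrupted
data = self_healing_read(copies, sha(b"payload"))
assert data == b"payload"
assert copies[1] == b"payload"                 # bad copy healed in place
```

Real ZFS additionally draws on parity and mirror reconstruction, as listed above; this sketch covers only the multiple-copies case.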
Native handling of standard RAID levels and additional ZFS RAID layouts (“RAIDZ”). The RAIDZ levels stripe data across only the disks required, for efficiency (many RAID systems stripe indiscriminately across all devices), and checksumming allows rebuilding of inconsistent or corrupted data to be minimised to those blocks with defects;
Native handling of tiered storage and caching devices, which is usually a volume related task. Because ZFS also understands the file system, it can use file-related knowledge to inform, integrate and optimize its tiered storage handling which a separate device cannot;
Native handling of snapshots and backup/replication which can be made efficient by integrating the volume and file handling. Relevant tools are provided at a low level and require external scripts and software for utilization.
Native data compression and deduplication, although the latter is largely handled in RAM and is memory hungry.
Efficient rebuilding of RAID arrays—a RAID controller often has to rebuild an entire disk, but ZFS can combine disk and file knowledge to limit any rebuilding to data which is actually missing or corrupt, greatly speeding up rebuilding;
Unaffected by RAID hardware changes which affect many other systems. On many systems, if self-contained RAID hardware such as a RAID card fails, or the data is moved to another RAID system, the file system will lack information that was on the original RAID hardware, which is needed to manage data on the RAID array. This can lead to a total loss of data unless near-identical hardware can be acquired and used as a “stepping stone”. Since ZFS manages RAID itself, a ZFS pool can be migrated to other hardware, or the operating system can be reinstalled, and the RAIDZ structures and data will be recognized and immediately accessible by ZFS again.
Ability to identify data that would have been found in a cache but has been discarded recently instead; this allows ZFS to reassess its caching decisions in light of later use and facilitates very high cache-hit levels (ZFS cache hit rates are typically over 80%);
Alternative caching strategies can be used for data that would otherwise cause delays in data handling. For example, synchronous writes which are capable of slowing down the storage system can be converted to asynchronous writes by being written to a fast separate caching device, known as the SLOG (sometimes called the ZIL – ZFS Intent Log).
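The SLOG mechanism can be sketched as a tiny intent log in Python. The names here are hypothetical, and the real ZIL records and commits transaction groups rather than key/value pairs; this only illustrates why acknowledging on the fast log device makes synchronous writes cheap:

```python
from collections import deque

class IntentLog:
    """Toy SLOG/ZIL: a synchronous write is acknowledged once its intent
    record hits the fast log device; the slow main pool is updated later."""
    def __init__(self):
        self.log = deque()   # stands in for the fast dedicated SLOG device
        self.pool = {}       # stands in for the slow main pool

    def sync_write(self, key, value):
        # Fast append to the log device; the caller unblocks here.
        self.log.append((key, value))
        return "acked"

    def flush(self):
        # Background transaction-group commit to the main pool.
        while self.log:
            key, value = self.log.popleft()
            self.pool[key] = value

zil = IntentLog()
assert zil.sync_write("block-7", b"data") == "acked"
assert "block-7" not in zil.pool   # not yet committed to the main pool
zil.flush()
assert zil.pool["block-7"] == b"data"
```

After a power failure, replaying the surviving log records (the `flush` step here) is what lets ZFS complete writes that were acknowledged but never reached the main pool.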
Highly tunable—many internal parameters can be configured for optimal functionality.
Can be used for high availability clusters and computing, although not fully designed for this use.
fsck must be run on an offline filesystem, meaning the filesystem must be unmounted and is not usable while being repaired. scrub, by contrast, is designed for use on a mounted, live filesystem and does not require the ZFS filesystem to be taken offline.
fsck usually only checks metadata (such as the journal log) but never checks the data itself. This means, after an fsck, the data might still not match the original data as stored.
fsck cannot always validate and repair data when checksums are stored with data (often the case in many file systems), because the checksums may also be corrupted or unreadable. ZFS always stores checksums separately from the data they verify, improving reliability and the ability of scrub to repair the volume. ZFS also stores multiple copies of data—metadata, in particular, may have upwards of 4 or 6 copies (multiple copies per disk and multiple disk mirrors per volume), greatly improving the ability of scrub to detect and repair extensive damage to the volume, compared to fsck.
scrub checks everything, including metadata and the data. The effect can be observed by comparing fsck and scrub times: sometimes an fsck on a large RAID completes in a few minutes, which means only the metadata was checked. Traversing all metadata and data on a large RAID takes many hours, which is exactly what scrub does.
2⁴⁸: number of entries in any individual directory
16 exbibytes (2⁶⁴ bytes): maximum size of a single file
16 exbibytes: maximum size of any attribute
256 quadrillion zebibytes (2¹²⁸ bytes): maximum size of any zpool
2⁵⁶: number of attributes of a file (actually constrained to 2⁴⁸ for the number of files in a directory)
2⁶⁴: number of devices in any zpool
2⁶⁴: number of zpools in a system
2⁶⁴: number of file systems in a zpool
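These limits follow from ZFS's 128-bit design. As a quick arithmetic sanity check of the headline figures (illustrative Python, not ZFS code):

```python
# Sanity-check the headline ZFS limits as plain powers of two.
EiB = 2 ** 60   # one exbibyte
ZiB = 2 ** 70   # one zebibyte

max_file = 2 ** 64           # maximum single-file size in bytes
assert max_file == 16 * EiB  # matches the quoted "16 exbibytes"

max_pool = 2 ** 128                  # maximum zpool size in bytes
assert max_pool == (2 ** 58) * ZiB   # 2**58 ZiB, i.e. "256 quadrillion"
                                     # when a quadrillion is read as 2**50
```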
Explicit I/O priority with deadline scheduling.
Claimed globally optimal I/O sorting and aggregation.
Multiple independent prefetch streams with automatic length and stride detection.
Parallel, constant-time directory operations.
End-to-end checksumming, using a kind of “Data Integrity Field”, allowing data corruption detection (and recovery if you have redundancy in the pool). A choice of three checksum algorithms can be used, optimized for speed (fletcher), standardization and security (SHA-256), and salted hashing (Skein).
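The speed/strength trade-off can be illustrated with a minimal sketch: Fletcher-4 as ZFS uses it keeps four 64-bit running sums over little-endian 32-bit words, while SHA-256 provides the cryptographically strong option. This is an illustration, not ZFS's exact implementation (the zero-padding of short inputs here is a simplification):

```python
import hashlib
import struct

def fletcher4(data: bytes):
    """Simplified Fletcher-4: four 64-bit running sums over
    little-endian 32-bit words (input zero-padded for illustration)."""
    data += b"\x00" * ((-len(data)) % 4)   # pad to a 4-byte multiple
    a = b = c = d = 0
    mask = 0xFFFFFFFFFFFFFFFF              # wrap at 64 bits, as in C
    for (w,) in struct.iter_unpack("<I", data):
        a = (a + w) & mask
        b = (b + a) & mask
        c = (c + b) & mask
        d = (d + c) & mask
    return (a, b, c, d)

block = b"example block"
fast = fletcher4(block)                      # speed-optimized checksum
strong = hashlib.sha256(block).hexdigest()   # standardized, strong option
assert fletcher4(block) == fast              # deterministic
assert fletcher4(b"Example block") != fast   # detects a single-bit change
```

Fletcher-style sums are a few additions per word, which is why ZFS defaults to them for ordinary data while offering SHA-256 where collision resistance matters (e.g. deduplication).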
Transparent filesystem compression. Supports LZJB, gzip and LZ4.
Intelligent scrubbing and resilvering (resyncing).
Load and space usage sharing among disks in the pool.
Ditto blocks: Configurable data replication per filesystem, with zero, one or two extra copies requested per write for user data, and with that same base number of copies plus one or two for metadata (according to metadata importance). If the pool has several devices, ZFS tries to replicate over different devices. Ditto blocks are primarily an additional protection against corrupted sectors, not against total disk failure.
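A toy model of that device-spreading behavior (purely illustrative; ZFS's real allocator is far more involved and also considers free space and load):

```python
import itertools

def place_copies(block_id: int, n_copies: int, devices: list) -> list:
    """Toy ditto-block placement: spread the requested copies across
    distinct devices when the pool has enough of them, wrapping around
    (and thus doubling up on a device) when it does not."""
    start = block_id % len(devices)
    ring = itertools.islice(itertools.cycle(devices),
                            start, start + n_copies)
    return list(ring)

devices = ["disk0", "disk1", "disk2"]
placement = place_copies(42, 2, devices)
assert len(set(placement)) == 2       # two copies land on two distinct disks
assert set(placement) <= set(devices)
```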
ZFS design (copy-on-write + superblocks) is safe when using disks with write cache enabled, if they honor the write barriers. This feature provides safety and a performance boost compared with some other filesystems.
On Solaris, when entire disks are added to a ZFS pool, ZFS automatically enables their write cache. This is not done when ZFS only manages discrete slices of the disk, since it does not know if other slices are managed by non-write-cache safe filesystems, like UFS. The FreeBSD implementation can handle disk flushes for partitions thanks to its GEOM framework, and therefore does not suffer from this limitation.
Per-user, per-group, per-project, and per-dataset quota limits.
Filesystem encryption since Solaris 11 Express (on some other systems ZFS can utilize encrypted disks for a similar effect; GELI on FreeBSD can be used this way to create fully encrypted ZFS storage).
Pools can be imported in read-only mode.
It is possible to recover data by rolling back entire transactions at the time of importing the zpool.
ZFS is not a clustered filesystem; however, clustered ZFS is available from third parties.
Snapshots can be taken manually or automatically. The older versions of the stored data that they contain can be exposed as full read-only file systems. They can also be exposed as historic versions of files and folders when used with CIFS (also known as SMB, Samba or file shares); this is known as “Previous versions”, “VSS shadow copies”, or “File history” on Windows, and as “Apple Time Machine” via AFP on Apple devices.
Disks can be marked as ‘spare’. A data pool can be set to automatically and transparently handle disk faults by activating a spare disk and beginning to resilver the data that was on the suspect disk onto it, when needed.
According to the authors, guarding against data corruption in main memory relies on using ECC RAM; however, the authors considered that adding error detection related to the page cache and heap would allow ZFS to handle certain classes of error more robustly.
One of the main architects of ZFS, Matt Ahrens, explains there is an option to enable checksumming of data in memory by using the ZFS_DEBUG_MODIFY flag (zfs_flags=0x10) which addresses these concerns.
Capacity expansion is normally achieved by adding groups of disks as a top-level vdev: simple device, RAID-Z, RAID Z2, RAID Z3, or mirrored. Newly written data will dynamically start to use all available vdevs. It is also possible to expand the array by iteratively swapping each drive in the array with a bigger drive and waiting for ZFS to self-heal; the heal time will depend on the amount of stored information, not the disk size.
As of Solaris 10 Update 11 and Solaris 11.2, it was neither possible to reduce the number of top-level vdevs in a pool, nor to otherwise reduce pool capacity. This functionality was said to be in development in 2007. Enhancements to allow reduction of vdevs are under development in OpenZFS.
As of 2008 it was not possible to add a disk as a column to a RAID Z, RAID Z2 or RAID Z3 vdev. However, a new RAID Z vdev can be created instead and added to the zpool.
Some traditional nested RAID configurations, such as RAID 51 (a mirror of RAID 5 groups), are not configurable in ZFS. Vdevs can only be composed of raw disks or files, not other vdevs. However, a ZFS pool effectively creates a stripe (RAID 0) across its vdevs, so the equivalent of a RAID 50 or RAID 60 is common.
Reconfiguring the number of devices in a top-level vdev requires copying data offline, destroying the pool, and recreating the pool with the new top-level vdev configuration. There are two exceptions: adding extra redundancy to an existing mirror can be done at any time, and if all top-level vdevs are mirrors with sufficient redundancy, the zpool split command can be used to remove a vdev from each top-level vdev in the pool, creating a second pool with identical data.
IOPS performance of a ZFS storage pool can suffer if the ZFS RAID is not appropriately configured. This applies to all types of RAID, in one way or another. If the zpool consists of only one group of disks configured as, say, eight disks in RAID Z2, then random IOPS performance will be that of a single disk (write speed will be equivalent to six disks, but random read speed will be similar to a single disk). However, there are ways to mitigate this IOPS performance problem, for instance adding SSDs as L2ARC cache, which can boost IOPS into the 100,000s. In short, a zpool should consist of several vdevs, each consisting of 8–12 disks, if using RAID Z. It is not recommended to create a zpool with a single large vdev of, say, 20 disks, because IOPS performance will be that of a single disk, which also means that resilver time will be very long (possibly weeks with future large drives).
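The rule of thumb above can be put into numbers with a rough, illustrative model; the per-disk figures are assumptions, not measurements:

```python
def pool_estimates(vdevs: int, disks_per_vdev: int, parity: int,
                   disk_iops: int = 200, disk_mbps: int = 150):
    """Rule-of-thumb model from the text: each RAID-Z vdev delivers
    roughly one disk's worth of random IOPS, while streaming write
    bandwidth scales with its data (non-parity) disks.
    disk_iops/disk_mbps are illustrative spinning-disk defaults."""
    iops = vdevs * disk_iops
    write_mbps = vdevs * (disks_per_vdev - parity) * disk_mbps
    return iops, write_mbps

# One 8-disk RAID-Z2 vdev: random IOPS of a single disk,
# streaming writes of roughly 6 disks.
assert pool_estimates(1, 8, 2) == (200, 900)
# Four such vdevs quadruple both figures.
assert pool_estimates(4, 8, 2) == (800, 3600)
```

This is why splitting the same 32 disks into four RAID-Z2 vdevs rather than one wide vdev multiplies random IOPS by four at the cost of more parity overhead.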
Online shrink (zpool remove) was not supported until Solaris 11.4, released in August 2018.
Resilvering (repairing) a crashed disk in a ZFS RAID can take a long time; this is not unique to ZFS and applies to all types of RAID, in one way or another. This means that very large volumes can take several days to repair or to return to full redundancy after severe data corruption or failure, and during this time a second disk failure may occur, especially as the repair puts additional stress on the system as a whole. In turn, this means that configurations that only allow recovery from a single disk failure, such as RAID Z1 (similar to RAID 5), should be avoided: with large disks, one should use RAID Z2 (tolerating two failed disks) or RAID Z3 (tolerating three). ZFS RAID differs from conventional RAID in that it only reconstructs live data and metadata when replacing a disk, not the entirety of the disk including blank and garbage blocks; replacing a member disk in a ZFS pool that is only partially full therefore takes proportionally less time than with conventional RAID.
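The claim that resilver time scales with stored data rather than raw disk capacity can be illustrated with a back-of-the-envelope estimate; the throughput figure is an assumption for illustration:

```python
def resilver_hours(used_tib: float, rebuild_mib_per_s: float = 100.0) -> float:
    """Estimate resilver time in hours. ZFS resilvers only live data,
    so the time scales with the amount stored, not the raw disk size.
    The default rebuild throughput is an illustrative assumption."""
    seconds = used_tib * 1024 * 1024 / rebuild_mib_per_s  # TiB -> MiB
    return seconds / 3600

# A half-full disk resilvers in roughly half the time of a full one,
# regardless of the disk's raw capacity.
full = resilver_hours(10.0)   # 10 TiB of live data: ~29 hours at 100 MiB/s
half = resilver_hours(5.0)
assert abs(full - 2 * half) < 1e-9
```

A conventional RAID rebuild, by contrast, would copy all raw blocks, so both cases above would take the full-disk time.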
Removal or abrupt failure of caching devices no longer causes pool loss. (At worst, loss of the ZIL may lose very recent transactions, but the ZIL does not usually store more than a few seconds’ worth of recent transactions. Loss of the L2ARC cache does not affect data.)
If the pool is unmountable, modern versions of ZFS will attempt to identify the most recent consistent point at which the pool can be recovered, at the cost of losing some of the most recent changes to its contents. Copy-on-write means that older versions of data, including top-level records and metadata, may still exist even though they have been superseded, and if so, the pool can be wound back to a consistent state based on them. The older the data, the more likely it is that at least some blocks have been overwritten and that some data will be irrecoverable, so there is eventually a limit on how far the pool can be wound back.
Informally, tools exist to probe the reason why ZFS is unable to mount a pool, and guide the user or a developer as to manual changes required to force the pool to mount. These include using zdb (ZFS debug) to find a valid importable point in the pool, using dtrace or similar to identify the issue causing mount failure, or manually bypassing health checks that cause the mount process to abort, and allow mounting of the damaged pool.
As of March 2018, a range of significantly enhanced methods are gradually being rolled out within OpenZFS. These include:
Code refactoring, and more detailed diagnostic and debug information on mount failures, to simplify diagnosis and fixing of corrupt pool issues;
The ability to trust or distrust the stored pool configuration. This is particularly powerful, as it allows a pool to be mounted even when top-level vdevs are missing or faulty, when top level data is suspect, and also to rewind beyond a pool configuration change if that change was connected to the problem. Once the corrupt pool is mounted, readable files can be copied for safety, and it may turn out that data can be rebuilt even for missing vdevs, by using copies stored elsewhere in the pool.
The ability to fix the situation where a disk needed in one pool was accidentally removed and added to a different pool, causing it to lose metadata related to the first pool and become unreadable.
2008: Sun shipped a line of ZFS-based 7000-series storage appliances.
2013: Oracle shipped ZS3 series of ZFS-based filers and seized first place in the SPC-2 benchmark with one of them.
2013: iXsystems ships ZFS-based NAS devices called FreeNAS for SOHO and TrueNAS for the enterprise.
2014: Netgear ships a line of ZFS-based NAS devices called ReadyDATA, designed to be used in the enterprise.
2015: rsync.net announces a cloud storage platform that allows customers to provision their own zpool and import and export data using zfs send and zfs receive.
Comparison of file systems
List of file systems
Versioning file system – List of versioning file systems