#33 "Filesystems and Beyond"

December 21, 2016 01:36:50
The Workflow Show

Show Notes

How you store and retrieve your data on hardware and storage devices is controlled by a two- or three-layer filesystem. Whether you have a desktop hard drive or a one-petabyte storage system, it has a filesystem. Sounds simple enough, right? In reality, there are multiple filesystems available, each using different data structures and methods for how data is stored, accessed and modified. Because of these differences, filesystems vary in properties such as speed, security and storage capacity. Some filesystems are better for production storage, some are better for backup, and some are appropriate for nearline storage needs. Why does this matter? As capacity needs increase, storage systems and how the data on them is organized become increasingly important. In the latest episode of the Workflow Show, the familiar voices of Chesapeake Systems’ Nick Gold, Jason Whetstone and Ben Kilburg join Senior Systems Engineer Brian Summa, a veteran and master in the industry, to discuss the basics of filesystems. They talk about the systems Chesapeake Systems deploys and discuss some of the questions they wrestle with on this side of data storage. They also begin the discussion of how filesystems compare to object storage (stay tuned for more on this topic in a future show). And while many people focus on the IT and hardware aspects of data systems, they explain why the layer of software, the management data, is just as important. Tune in to hear what we’re talking about.

Episode Transcript

Speaker 0 00:00 Welcome to the Workflow Show. This is Nick Gold, and I am here with regular cohost Jason Whetstone as well as producer Ben Kilburg. Howdy. And today we have a special guest on the Workflow Show: the senior systems engineer of Chesapeake Systems, Brian Summa. Hello. Brian is really more or less the guy behind the curtain at Chesapeake, more than anyone else. There are a lot of key people at Chesapeake, but Brian is the one we try to keep out of the public eye. He's just so busy orchestrating things behind the scenes and making sure that our technical staff, our support staff and our project staff are doing it right. And what I can say from the non-sales side of the organization is that he is the person we all go to when we have questions.
Speaker 0 01:02 So he is like our number one resource. I think the phrase I have used is kind of spiritual leader. He's the great and powerful Oz. Yeah. Yeah. So Brian, as our senior systems engineer, I have had the pleasure of working with for the last 12 and a half years I've been at Chesapeake. Of course, Brian was at Chesapeake before me, and not many people in the Chesapeake organization can say that. Brian was one of the two original members of the Chesapeake Systems pro video team, which included himself and, on the BizDev side, Christian Malone. Shout-out to Christian if you're listening. And Brian was the one who was saddled with having to make all of this technology work for video editors and producers and broadcasters and all of that. So Brian has many projects under his belt, and he was really the main person to get us into more sophisticated storage systems at Chesapeake Systems, which of course is a big part of what we do today.
Speaker 0 02:02 SANs, file servers.
Most of our activities in these areas are the brainchildren of, you know, dreams come up with by Brian as far as areas that we could wade into. And so Brian is really our main storage guy, among other things. So for today's episode, I thought we would talk, Jason and the rest of you guys, about file systems and beyond. Brilliant. And that's where you should, like, in post, put the echo on "beyond, beyond." Right. Maybe that's just enough of that right there. So, file systems: what is a file system? We throw that phrase around so much at Chesapeake Systems. We talk file systems and volumes and drives and all these fun things. So Jason, let me ask you before we get to the expert in the room. Well, you're also an expert in the room; I was just kind of looking in your direction. So that's Jason, like, "I'm an expert?" Jason, what do you think of when you think of a file system?
Speaker 1 03:11 Well, um, it is a layer of abstraction, I guess, or a layer between the operating system and the storage media. So the storage media would be your spinning disk or your SSD, and the operating system would be your macOS or your, you know, CentOS or whatever you've got. It's what is communicating between the operating system and the disk.
Speaker 0 03:38 Okay. And Brian, we're going to kind of stage this for you to just explode all over. Okay. Um, so this is a little leading here, of course, but let us continue to lead this a little bit. So, Jason...
Speaker 1 03:55 By the way, that was a definition I kind of came up with in my own head when I started hearing Brian talk about file systems many years ago, because I had never heard the term file system before. I was like, hmm, I wonder what that would be. But you were using them, right? Absolutely.
Speaker 0 04:10 I think I can make this claim: if you have been using a computer at pretty much any point, if you've owned a computer, including a mobile device, in the last seven to 10 years or whatever it's been, you know, a laptop, a desktop, your smartphone, you're using file systems all day long. You might've even created one, right? Sure. So when would some of our customers, including the non-IT support staff, just any Joe or Jane computer user, have had reason to interact more directly with file systems? Mr. Whetstone?
Speaker 1 04:52 Well, you mentioned creating one. When you go to Target or Walmart or Best Buy and you pick up a one-terabyte external USB drive, and it's, you know, $59 or something like that, you're like, wow, that's a great deal. It might be a hard drive, it could be an SSD, and you bring that home and plug it into your macOS Sierra machine. Sometimes something happens. What happens? Well, a dialog usually comes up and says, hey, you know, this is not authorized for use on this... Or
Speaker 0 05:27 It says you have inserted a... what's the exact phrase, Brian? You probably remember it. You have inserted a drive <inaudible> machine. Do you want to initialize? The drive is not readable. Okay. So in the world of IT, which, yes, we do inhabit in addition to the world of media, we think of our IT solutions as sort of bifurcated, right? They have two halves to them. There is a physical part, which is the hard drive, the SSD, the little pieces of metal that they're held within.
It's the magnetic recording media, with the magnetically adjustable bits on that little platter, or it's the little flash-memory-based thing, but it's obviously the thing you can, like, throw at the wall. But there's something you can't throw at the wall, which would be the software portion. And let's just all take a minute to remember that most of what we all do every day is just rearrange magnetic impulses on a piece of metal.
Speaker 0 06:31 Yep. It's true. It's funny, we sell a lot of storage. We've sold petabytes and petabytes and petabytes of storage in the 13 years, or the 15, 16 years, Brian and I have been with Chesapeake. And I think most of our clients, and frankly maybe less so our own staff, think of a big storage system as those RAIDs in the rack. They think of that server everyone in the office is connecting to. They think of flashing lights and something that costs a lot of money, and if they go into their server room or to that little equipment rack we set it up in, they can, like, knock their hand on it if they want. There are cords plugged into it.
Speaker 0 07:13 The files are in the computer. But we know that there's a layer to all of this that is software. It's bits; it's how the information itself is organized, not the physical medium the data is stored on. That layer is entirely non-physical, other than the energy that defines which position the bits are in and how the bits process other bits. It's invisible to most people, and yet it's as big a part of the storage system someone is buying from us as the hardware. Absolutely. Right. What would the files be without the file system?
Well, and that's what we're going to get into today. You could make a career of file systems, as Brian and some others have: there are people who write file systems for a living, and companies we work with that sell just the file system.
Speaker 0 08:06 So in today's episode we want to quickly offer a primer to our clientele, our listeners, on what a file system is. What are some basic ones? What do they do? Why is looking into the file system side of a storage system you're considering investing in as worthwhile as looking at the hardware or any other spec of the storage system? How many people are connecting? What's the bit rate? What kind of switching are we using? Is it a Fibre Channel or an Ethernet network? How does the traffic all work? Those are all important factors, and we talk about them all the time with our clients. We don't tend to get as deep into the file system conversation, and I think our clients should be more aware of it as another key aspect of what they're investing in when they buy a storage system.
Speaker 2 08:57 Absolutely. And this is where I'm going to chime in real quick. Understanding the file system layer matters because it also helps drive home the point we make on a daily, if not hourly, basis to many people: RAID is not a substitute for backup. At no point in time should RAID ever be considered a reason not to back up your data, as in "I'm RAID-protected." What people are usually missing is the software wrapper that goes around the physical medium we were talking about.
So Jason already said that there are SSDs and single disks, but then there's also the RAID, which presents itself as a single disk to the operating system but is made up of many, many disks.
Speaker 2 09:48 One of the issues is that in order for those to be usable by your operating system, you have to put a wrapper around them, so that the bits and bytes you want to save long-term have a structured means by which they get written to the disk. Then the operating system knows, number one, how to retrieve them, and also how to append to them, modify them or secure them with things like permissions. But moreover, I think a lot of people think, all right, I don't need to worry about a backup because I have RAID protection. And while RAID protection is a layer of safety, at the end of the day RAID is really there to minimize downtime.
Speaker 2 10:42 The likelihood that you're going to have a hard drive failure is pretty significant, and it grows: the more hard drives you put into a RAID, the more likely, statistically, that you'll have a failure. But surviving the loss of a drive or two, depending on your RAID level, is not about data protection. That is about uptime and minimizing the impact to your business, because on top of that RAID set, that physical medium, is the file system layer, and you can always have file system corruption. Or, moreover, you can have a user who deletes something they didn't mean to delete.
Speaker 0 11:21 Yep. It's still only one copy. That's still only one. Okay. So let's just pause for a moment.
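To put a rough number on "more drives, more failures": if you assume, purely for illustration, that every drive independently has the same probability p of failing in a given period, the chance that at least one drive in an n-drive set fails is 1 - (1 - p)^n. The sketch below uses a hypothetical 2% annual failure rate; real drives are neither perfectly independent nor identical, so treat this as a back-of-the-envelope model, not a reliability prediction.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n drives fails, assuming each fails
    independently with the same probability p (a simplifying assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n

# Hypothetical 2% annual failure rate per drive, for illustration only.
for n in (1, 4, 8, 16, 24):
    print(f"{n:>2} drives: {prob_any_failure(0.02, n):.1%} chance of at least one failure")
```

A single drive failure is exactly what RAID is built to absorb. A deleted or corrupted file, by contrast, is faithfully reproduced across every drive in the set, which is why RAID protects uptime rather than data.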
We're going to get into this, but I like the points you guys are making: when we look at storage, when we quote a storage system, when we support or deploy a storage system, we inherently look at it as this multilayered beast, right? We see servers, we see physical drives in servers, we see RAID controllers and interfaces to networking connections. We see the network itself; we see servers that are maybe managing that storage network. There's all of this stuff we see when we sell these things. And yes, the file system is just one layer of that, and it's one that can cause problems, much like having a drive fail can cause problems.
Speaker 0 12:16 Having a RAID controller whose cache memory got corrupted can cause problems. Accidentally deleting a file, or a malicious user formatting your entire SAN, can cause problems. So I really want people to understand that the file system offers its own features, but it also carries its own sort of risks, and there are file systems that are more or less appropriate for different tasks. So let's start with that basic scenario, the one our users are familiar with. You pop in the drive, a little dialog box comes up, and it says, well, this is kind of a dumb piece of hardware, right? This is a dumb piece of hardware that I don't know how to understand, because it's lacking something. You have to press OK to format it. That's the big thing most people are familiar with: I have to format a hard drive or an SSD or a USB stick that I plug in.
Speaker 0 13:04 So when I look at this act of formatting a physical recording medium, like a USB stick or a hard drive or an SSD or an entire RAID or whatever it is, I think of... and Brian, I want you to tell me, am I right? Am I wrong? Is there more nuance to it? But I think of these file systems as being two things, first and foremost.
Number one: whether I'm in front of a computer formatting a local hard drive, or I'm an admin like you setting up a shared storage system for someone, when it's time to create the file system there is a whole layer of software involved. It might be baked in at the operating system level of the computer, or it might not be baked into the OS per se.
Speaker 0 13:56 It may be more of an application layer on an overall storage platform like StorNext. You have your StorNext management servers running the StorNext software. So the first thing about these file systems is that they are a collection of software, essentially applications, running either on the storage platform and its systems or just on your desktop OS. It's that software that is capable of organizing data in a specific way: a specific file system is supported if the system has that specific software. When I pop that drive into a Mac and get that dialog, it gives me a list of a few different formats, a few different file system formats I can turn it into: HFS, HFS Plus, HFS Plus journaled, case-sensitive. These are, you know, there are all these flavors.
Speaker 2 14:52 Well, there's something to keep in mind for the average Mac user: a lot of what actually happens when you format the drive is obscured from the end user. When you click on a drive and basically say Erase, it's doing two things. One, it's putting a partition map on the drive itself, and then it's putting the file system down on top of that. On a more sophisticated operating system, like, say, Linux.
Um, even though there are a lot of desktop Linux solutions out there with a very similar disk formatting tool, when you do it at the command line you can really see that the very first thing you need to do to prepare the disk, before you even put a file system on it, is to put down some sort of partition map.
Speaker 2 15:45 So the partition map, I just want to relate this metaphorically, right? It's almost like before you build a building, before you even get to the foundation, you need to grade the land. The partition map is a little bit like grading it; it's getting it ready for the foundation. I don't know, I wouldn't go that far. If I was trying to make a similar metaphor, I would say it's like new construction: you've got drywall, and before you lay down the paint you need to put on primer. All right. You need something the paint can adhere to pretty well. So you put down the primer, and the primer is real basic. It's usually gray, it's white, it's whatever; it's nothing.
Speaker 2 16:36 There's no pigment to it, but its main function is to make sure the paint that goes on top of it adheres in the best, most even manner possible. Right? So think about that when you wrap a hard drive with the partition table you're putting on top of it. It does a lot of things. One, it says, am I going to have one file system or many file systems? You can have a single physical disk and carve it up into multiple segments, because maybe you have a reason to segment your data. Besides that, it also tells the file system how big the individual drive can be.
So if you get an MS-DOS, you know, master boot record partition map, it usually can't handle drives larger than two terabytes.
Speaker 2 17:33 With something like GPT, a GUID partition table, it can handle much larger drives, a single drive. And again, we're not actually talking about a single drive; we're talking about a single piece of medium. So you might have four one-terabyte drives that present as a four-terabyte volume because they're in a RAID 0. But the operating system, because that's happening at a much deeper level, doesn't know that; it just sees this big giant thing. So the partition table says, all right, you've got four terabytes. If you use the wrong partition map, you're not going to get access to all of that. That's basically what the different labels do. And then in Linux you get into physical volumes, volume groups and logical volumes, which basically allow you to shift and resize things.
Speaker 2 18:33 OS X, in the most recent years, has Core Storage, I think that's what it's called now, which is basically just a rip-off of Linux LVM, in my opinion. And again, everything you're talking about now is just happening at the software layer; it has nothing to do with it. No. See, that's tricky. I wouldn't even call it the software layer at this point, because when we talk about a file system, we're talking about kernel extensions, we're talking about modules. So it's almost like a driver layer. You have a driver that says, I can connect to this piece of hardware using a SCSI protocol or some variant of that. But then the next step up, you have a module that basically says, okay, now how do you interact with this?
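The two-terabyte ceiling comes from the classic MBR ("MS-DOS") partition map: each partition entry records its starting sector and its length in sectors as 32-bit numbers. With traditional 512-byte sectors, that caps a partition at 2^32 sectors. GPT records sector addresses as 64-bit numbers instead. The arithmetic below assumes 512-byte logical sectors. With 4 KiB logical sectors, the same MBR math gives 16 TiB.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte logical sectors

def max_addressable_bytes(lba_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest size a partition map can describe with lba_bits-wide sector numbers."""
    return (2 ** lba_bits) * sector_bytes

mbr_limit = max_addressable_bytes(32)  # MBR: 32-bit sector fields
gpt_limit = max_addressable_bytes(64)  # GPT: 64-bit sector fields

print(f"MBR: {mbr_limit:,} bytes = {mbr_limit / 10**12:.1f} TB = {mbr_limit / 2**40:.0f} TiB")
print(f"GPT: {gpt_limit / 2**70:.0f} ZiB")

# The RAID 0 example: four 1 TB drives striped into one 4 TB volume.
volume_bytes = 4 * 10**12
print("fits in one MBR partition:", volume_bytes <= mbr_limit)
print("fits in one GPT partition:", volume_bytes <= gpt_limit)
```

This is why a four-terabyte RAID 0 volume needs GPT (or some other scheme) to be used as a single partition. The operating system only sees the total size the RAID presents, not the four drives behind it.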
Speaker 0 19:16 So if that module is sort of the layer of the OS stack, or the storage platform software stack, that mediates the file system part, the second thing I think of when I think of what these file systems are is a usually hidden, subtle layer. It's not a file you can find as a user sitting on your drive, but essentially an invisible layer of data that is stored on these drives, right? When you format your drive, one of the things people notice is, oh, it's not quite as big as the box said it was going to be. There are a few reasons that can happen. It can have to do with whether you're looking at things in base 2 versus base 10, and some manufacturers use one versus the other. But set aside the base-2-to-base-10 conversion issue, whether you're actually talking about megabytes or mebibytes, which is a whole other Workflow Show waiting to happen, and say it is the formatting act: I formatted it, and now it's a little less. Well, that's because when you formatted that drive with a file system, it created a data layer on the drive that you can't really see directly as a user just going through the file system hierarchy. But it's on there, right?
Speaker 2 20:38 Yeah. It tells you things like... because that's one thing: without that partition label, the operating system theoretically would not know. So when we see a drive that's called, you know, Macintosh HD, or a drive that's called, like, Capture Scratch or whatever, or DeLorean, because some people are clever and name their Time Machine backup that, the problem is you can have another drive with the same name. That's a human-readable label that's meant for us.
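The base-2 versus base-10 gap is easy to quantify. Drive makers typically label capacity in decimal units (1 TB = 10^12 bytes), while some operating systems report sizes in binary units (1 TiB = 2^40 bytes) even when they print "TB" or "GB". macOS has reported sizes in decimal units since OS X 10.6, so how big the shortfall looks depends on what is doing the reporting. The sketch below just converts one "1 TB" drive both ways.

```python
def express_in_units(size_bytes: int) -> dict:
    """Express a byte count in both decimal (SI) and binary (IEC) units."""
    return {
        "GB": size_bytes / 10**9,
        "GiB": size_bytes / 2**30,
        "TB": size_bytes / 10**12,
        "TiB": size_bytes / 2**40,
    }

one_tb_drive = 10**12  # what "1 TB" on the box means
units = express_in_units(one_tb_drive)
print(f"1 TB drive = {units['GiB']:.1f} GiB = {units['TiB']:.3f} TiB")

shortfall = 1 - units["TiB"]
print(f"Apparent shortfall if reported in binary units: {shortfall:.1%}")
```

Any space the file system reserves for its own structures comes on top of this. That overhead is the second reason a freshly formatted volume looks smaller than the number on the box.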
So the problem is, without that machine label, and I'm sort of just making up a term here, without labeling it in a way the machine understands as a unique thing, you could have data collisions, because you could have two devices named the same thing and the operating system wouldn't really know the difference. You could have two identical three-terabyte drives, format both of them as HFS Plus, and without that subtle hidden metadata layer that basically says, well, yes, I know you think this is called DeLorean,
Speaker 2 21:40 and I know you think this one is called DeLorean too, but really the operating system sees it as a GUID, a universally unique identifier, a UUID, right? It's basically got some sort of, you know, 32-character hex string on it, and you can see this a lot of times at the command line. You can ask, okay, how do I know the difference? Because a lot of times drives just randomly show up and get different device labels. So in <inaudible>, a lot of times you'll see it show up as <inaudible>; it always tends to be disk1, disk2, disk3. But that has nothing to do with the actual disk; it just has to do with the order it happened to pop up in. So at any given point, you might have a disk that says it's disk1 and another that's disk2, and then you reboot and it happens to be the other way around, because the machine saw them in a different order.
Speaker 0 22:36 So let me pause you for a second. You used a word that we use a lot at Chesapeake Systems, but you used it in a way that's a little different than how we use it.
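The difference between a human-readable volume name, a device node handed out in discovery order, and a UUID can be modeled in a few lines. This is a hypothetical toy, not how any particular operating system implements it. Two volumes share the name "DeLorean", and device names like disk1 and disk2 are assigned in whatever order the drives happen to appear. Only the UUID stays tied to the same volume across simulated reboots. A UUID is 128 bits, usually written as 32 hexadecimal digits.

```python
import random
import uuid
from dataclasses import dataclass, field

@dataclass
class Volume:
    name: str  # human-readable label, not guaranteed to be unique
    volume_uuid: uuid.UUID = field(default_factory=uuid.uuid4)  # machine identity

def assign_device_nodes(volumes, rng):
    """Toy 'boot': hand out disk1, disk2, ... in whatever order drives are discovered."""
    order = list(volumes)
    rng.shuffle(order)
    return {f"disk{i}": vol for i, vol in enumerate(order, start=1)}

backup = Volume("DeLorean")
scratch = Volume("DeLorean")  # same label, different volume
volumes = [backup, scratch]

rng = random.Random(0)
for boot in range(3):
    nodes = assign_device_nodes(volumes, rng)
    # Looking up by name is ambiguous; looking up by UUID is not.
    by_name = [dev for dev, v in nodes.items() if v.name == "DeLorean"]
    by_uuid = next(dev for dev, v in nodes.items() if v.volume_uuid == backup.volume_uuid)
    print(f"boot {boot}: 'DeLorean' matches {by_name}; backup volume is {by_uuid}")
```

The device node for the backup volume can change from boot to boot, and the name matches both drives, but a lookup by UUID always finds the right one.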
A lot of the time when we use our favorite word, metadata, which we say left and right just to sound like we have tweed jackets with leather elbow patches on and smoke pipes, fair enough, which we may or may not be doing while recording this episode... when we talk about metadata with our customers, we're talking about databases, we're talking about MAMs. Oh, here's CatDV or Reach Engine or Cantemo, or this or that or the other, and we say, well, you can record who shot this, and was it B-roll, and was it a good shot, and what time of day was it recorded, and what's the timecode? Those are all forms of metadata. But you just used the word metadata to refer specifically to this kind of quasi-invisible data layer that gets put onto these volumes, right? Right. I format it, and it's now a volume. That's true whether it's a giant petabyte thing or an individual drive on a USB cable plugged into my computer.
Speaker 2 23:46 So when you use metadata here to talk about a file system, what type of metadata are we talking about? I mean, it's machine level. By having that partition label, it's going to tell the machine, number one, like, when you plug it in... because here's the thing. In Jason's scenario earlier, he said, I went to the store, bought a one-terabyte drive, plugged it in, and it prompted me to initialize it. That's not always true anymore, because a lot of times drives come preformatted. But in the case of Mac versus Windows, 90% of the drives, and that's just a made-up percentage, are probably formatted for Windows, for NTFS, right? And Macs only have a read-only driver for NTFS unless you're using third-party software like Paragon.
Speaker 2 24:38 I'm not promoting them in any way, because there's no need to.
But the reality is that when you put that disk in, the Mac can read the partition label, and the partition label then informs the operating system: yeah, there's a label on here, but guess what? You don't have the file system module or driver or kernel extension to actually read the file system that's on here. And that's what prompts the disk arbitration tool and others to pop up and say, hey, you need to format me. Honestly, it's not that it's unformatted; it's just not formatted in any way that particular machine can read. But that's determined at the partition label. A lot of times it can read that label, but it doesn't have the software to understand the next level up, which is the file system. But yeah, I mean, basically, metadata in the way that we normally talk about it,
Speaker 2 25:30 if we break it down to the simplest form, is always data about the data. So that metadata label on the disk is describing to the operating system what to expect. It's literally down to the level of, you know, hard drives: the granularity with which they record data is referred to as blocks, typically. And so it's literally like, this portion of this file is this string of blocks, and the file continues on these ones, because I couldn't write it contiguously. By the way, this is also the name of the file. This is, by the way, the whole directory structure and the name of every directory, and this is the file creation date, modified date, last-opened date. In fact, different file systems generate different types of metadata. And on other operating systems, the partition table is also where you store things
Speaker 2 26:23 like whether or not that device is bootable. So for instance, if you have a Linux machine and you ever run through the setup assistant, it'll actually ask you, where do you want to locate the boot sector?
And typically it's in the partition label itself, because that gets read by the machine code, and the machine code is what's loaded before the operating system. So are these file systems a layer of metadata that gets stored on the partition tables? The partition table is this very subtle layer of metadata, and we typically make the distinction by calling it file system metadata, just so we don't confuse it with, say, metadata in CatDV or Reach Engine.
Speaker 2 27:18 So let's just say this: what that partition label really does is identify to the machine, at the machine code level, what to expect on the disk before the operating system loads. So really the partition map is a layer of metadata, and then the file system is kind of a layer of metadata on top of that. The file system contains metadata; I don't think it necessarily is metadata. The file system is the structure in which the data is stored. Now, we have tools to interrogate file systems, but these are not always tools end users are familiar with. Correct. Like, there are tools that I'm guessing are command line? That's just it. So not all file systems preserve metadata.
Speaker 2 28:15 That's why you can't say that a file system is metadata. Some file systems are extended, and I'm not using that word to confuse it with Mac OS Extended, but some file systems are extended in that they can contain things like extended attributes. They preserve certain bits, like POSIX permission bits: user, owner, group. Excuse me, not user, owner, group: user, group, other, or owner.
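Much of the file system metadata described here can be read from a script through the standard `stat` call: the POSIX user/group/other permission bits, the file type (regular file, directory, symbolic link, or a special file such as a device), and timestamps. The minimal Python sketch below assumes a POSIX-like system. Which timestamps exist varies by platform and file system. `st_birthtime` (the "btime," when the file was born) is exposed on macOS and the BSDs, but `os.stat` does not report it on Linux. On POSIX systems `st_ctime` is the inode change time, not the creation time.

```python
import os
import stat
import tempfile
import time

def describe(path: str) -> dict:
    """Collect a few pieces of file system metadata for path via lstat()."""
    st = os.lstat(path)  # lstat so a symlink is reported as a link, not its target
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    elif stat.S_ISLNK(st.st_mode):
        kind = "symbolic link"
    else:
        kind = "special file"  # device node, FIFO, socket, ...
    return {
        "kind": kind,
        "mode": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--' (user/group/other)
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "mtime": time.ctime(st.st_mtime),  # last content modification
        "ctime": time.ctime(st.st_ctime),  # POSIX: last metadata (inode) change
        # Birth time only where the platform exposes it (e.g. macOS/BSD).
        "btime": time.ctime(st.st_birthtime) if hasattr(st, "st_birthtime") else None,
    }

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "clip.mov")
    with open(path, "wb") as f:
        f.write(b"\x00" * 1024)
    os.chmod(path, 0o640)  # owner read/write, group read, others nothing
    print(describe(path))
    print(describe(d)["kind"])
```

None of this is stored inside the file's contents. It all lives in the file system's own structures, which is why copying a file to a different file system can silently drop some of it.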
I guess when I say the file system has some other metadata, I even mean the directory structure and the names of the files, right? Those are the file system. But again, calling that metadata isn't really quite the same, because when we think about harvesting metadata, yes, the path where the file is stored might be considered metadata to us. But when we talk about a deep dive, we're talking about things like the times on the file: the ctime, the btime, which is basically when the file was born. If it's a physical disk, where is the file located? Yeah, like what blocks. So not what folder it's in; you can even use some commands to know which blocks a file is on. You can even know if it's a special file
Speaker 0 29:42 versus a regular file versus a directory entry. So there are all of these additional pieces of info about the data we're storing on whatever storage we end up choosing. This is a good segue. We've covered a number of areas relating to file systems. They have to do with this layer that gets put on top of the partition map. It has some very specific metadata about where the files are, which blocks they're on, and all these characteristics of, you know, who gave rise to the file and when it arose. And that's something a lot of people don't know: when you delete a file, you actually have to really write over the data for it to truly be deleted, because otherwise it's just a little flag that gets set at the file system layer that says you can write over these sectors of the disk now.
Speaker 0 30:37 So again, that's all happening at the file system layer. So, you know, we talked about a few of the local file systems.
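Two of the points above can be shown with a toy block device. First, a file may be stored as several non-contiguous runs of blocks. Second, deleting a file usually just updates metadata and marks its blocks as reusable. The model below is deliberately simplified and hypothetical: real file systems use far more elaborate structures, and features like SSD TRIM change the picture. Still, it shows why "deleted" data can linger until something overwrites it.

```python
class ToyFileSystem:
    """Hypothetical toy: a block array, a free-block map, and a directory
    mapping file names to the list of block numbers that hold their data."""

    def __init__(self, num_blocks: int, block_size: int = 4):
        self.block_size = block_size
        self.blocks = [b"\x00" * block_size for _ in range(num_blocks)]
        self.free = [True] * num_blocks
        self.directory: dict[str, list[int]] = {}

    def write(self, name: str, data: bytes) -> list[int]:
        chunks = [data[i:i + self.block_size] for i in range(0, len(data), self.block_size)]
        # Take the first free blocks found, which need not be contiguous.
        chosen = [i for i, is_free in enumerate(self.free) if is_free][:len(chunks)]
        if len(chosen) < len(chunks):
            raise OSError("no space left on toy device")
        for blk, chunk in zip(chosen, chunks):
            self.blocks[blk] = chunk.ljust(self.block_size, b"\x00")
            self.free[blk] = False
        self.directory[name] = chosen  # the file's metadata: which blocks it lives in
        return chosen

    def delete(self, name: str) -> None:
        # Only metadata changes: drop the directory entry and mark blocks free.
        for blk in self.directory.pop(name):
            self.free[blk] = True

    def raw_block(self, blk: int) -> bytes:
        return self.blocks[blk]

fs = ToyFileSystem(num_blocks=8)
fs.write("a.txt", b"AAAA")
fs.write("b.txt", b"BBBB")
fs.delete("a.txt")                       # frees block 0
blocks = fs.write("c.txt", b"CCCCDDDD")  # reuses block 0, then skips to block 2
print("c.txt blocks:", blocks)
fs.delete("b.txt")
print("block 1 after deleting b.txt:", fs.raw_block(1))  # old bytes still there
```

Here c.txt ends up split across blocks 0 and 2, and b.txt's bytes remain readable in block 1 after it is "deleted." Only an overwrite actually removes them.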
We said, okay, if you're a Mac user, chances are every fricking hard drive you've formatted in the last almost decade has been Mac OS Extended, otherwise known as HFS Plus. And then there's journaling, a capability they added maybe about eight years ago. If memory serves, it was in Server for a while, and then I think they brought it to everybody with one of the 10.x releases. It's an interesting thing that I don't want to go too far into, but when Apple modified HFS to add journaling, it was a feature introduced into the file system itself. What journaling did was create a little index on the drive, and if I remember correctly, it was largely there to help prevent data loss and data corruption when volumes or file systems were unmounted, essentially ejected from your client or host, without being properly shut down.
Speaker 0 31:43 It's very easy to accidentally pull out a desktop drive or trip over a USB cable. With HFS Plus, which is a very old file system at this point, decades old, or at least its original version is over 20 years old, that was constantly leading to people losing data on their Mac drives and having to constantly run things like DiskWarrior or Apple's own Disk Utility to fix corrupted volumes. One of the leading causes of corruption, of both the file system data and the actual data data, was pulling drives without properly ejecting them, or a power outage if it's your internal drive.
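The journaling idea can be sketched as a tiny write-ahead log. First, record the change you are about to make. Then mark that record committed. Only then apply it to the real structure. At mount time after an unclean unmount, recovery replays committed records and discards incomplete ones, so the structures come back consistent. This is a hypothetical toy, not HFS Plus's actual on-disk journal format. Like many journals, HFS Plus's protects file system metadata rather than every byte of file data.

```python
from dataclasses import dataclass, field

@dataclass
class JournaledStore:
    """Hypothetical toy: 'metadata' is a dict; the journal is a list of records."""
    metadata: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)

    def update(self, key, value, crash_before_commit=False, crash_before_apply=False):
        record = {"key": key, "value": value, "committed": False}
        self.journal.append(record)      # 1. log the intent
        if crash_before_commit:
            return                       # simulated power loss mid-transaction
        record["committed"] = True       # 2. commit the record
        if crash_before_apply:
            return                       # simulated yank of the USB cable
        self.metadata[key] = value       # 3. apply to the real structure
        self.journal.pop()               # 4. checkpoint: the record just appended is done

    def recover(self):
        """Run at mount time: replay committed records, drop incomplete ones."""
        for record in self.journal:
            if record["committed"]:
                self.metadata[record["key"]] = record["value"]
        self.journal.clear()

store = JournaledStore()
store.update("clip.mov", "blocks 10-19")
store.update("notes.txt", "blocks 20-21", crash_before_apply=True)
store.update("edit.prproj", "blocks 30-35", crash_before_commit=True)
store.recover()
print(store.metadata)
```

After recovery, the committed change to notes.txt survives. The half-finished change to edit.prproj is simply absent instead of leaving corrupted structures behind. That matches the hope that an interrupted write costs you one file rather than the whole disk.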
So journaling was a feature that Apple added to the file system that made each drive, which had its own file system, a little more resilient. It was, I guess you could say, doing a little housekeeping behind itself on a more ongoing basis, so that if something was disconnected suddenly, there was less of a chance that the disconnection would lead to corrupted file system metadata and/or data data on that drive. Speaker 0 33:01 Well, I think the hope or the intention was that if you're in the middle of a write, when you start back up, instead of the whole disk being corrupted, it's just the file that was being written. Yes, that's right. And so one thing I'll just note of interest: for many, many years, a lot of us in the Mac community have been waiting with bated breath to hear from Apple, are you ever going to do something beyond HFS Plus with journaling? And what's interesting is they have finally announced, after many years, that there is indeed an active project to transition us into the next-generation Mac file system. But it is also nothing to hold your breath about. Why do you say that? It is not a modern file system in the sense that it's not like a ZFS; it's not a Btrfs, or "butter FS." Speaker 0 33:49 We'll get into some of these a little more in a moment. What it essentially is, is their effort to have a single file system that crosses all their different operating systems. That is the main thing: something that's as useful to Apple on an iPhone as it is on a laptop, a MacBook, an iMac, or a Mac Pro. But it doesn't have any of the advanced functionality a lot of people were hoping it would, like copy-on-write. Well, we'll talk about advanced file systems. Okay. But what I did think is interesting is that it at least represents some additional features beyond the current file system. I think. Yes, but it's very self-serving.
That's the big thing: Apple making a move. I know. It's just like, wow, not for the greater computing community, whatever. Speaker 0 34:42 What do you mean, whatever? So, yeah, it is. I'm not saying they won't be able to write additional features into it, because it's more modern code. And if you look at what they did with HFS, it's pretty amazing. They added layer upon layer. They really drove that into the ground. I know, right? Honestly, it's kind of amazing. I forget the guy's name off the top of my head, but even the original developer, 20 years after the fact, when he looked at all the little patches they did on top of his original kernel code for HFS, was just like, I'm surprised they were able to get it to go as far as they did. And so maybe, and let's all cross our fingers here, this at least represents a good new foundation on which Apple can build the next couple of decades of file system development. Speaker 0 35:31 And you know, something Apple has always done is snipe file system developers who are very well known amongst this community of people, which is maybe not the mass population. I don't know if he still works there, but I remember when Be... and very few people remember the BeOS operating system at this point, but at one time it looked like it could emerge to be a very mainstream system. I think Apple grabbed the guy who developed BeOS's interesting file system at the time, but again, there was very little evidence that it was really coming together into something we were using on a day-to-day basis. So maybe this is a sign that Apple is really reinvigorating the file system team, but maybe taking a piecemeal approach. Okay. So we've been talking about file systems, what they are, and that they have features associated with them.
Speaker 0 36:22 One of the features that's probably worth us taking a moment to chat about, guys, is that some file systems are inherently designed to have just a single user interacting with them at a given moment, right? And some file systems are inherently designed to have multiple users. And remember, when we talk about users here, guys (and we'll go through some use cases in a moment), it could be a computer that is a user of a system, like a server with a piece of software that automates certain tasks, or it could be an actual human sitting in front of their editing workstation. So I just want to reiterate that a user could be a machine or it could be a person, but some of these file systems are designed for one user to be modifying them and some are for many. Right. Speaker 3 37:08 Well, I just want to sort of clarify. Most file systems can handle... in fact, I don't know of a single file system, besides maybe ISO, like what you'd find on a CD-ROM or something like that... most file systems are inherently multi-user read, but single-user read/write. Okay. Because it's not saying that you can hook up five people to a single drive. Unless you do some really cool finagling with the kernel, each machine is going to try to mount that drive read/write. But most file systems can handle multiple reads. It's multiple writes where it all falls over. Speaker 0 37:52 So what's the difference, as far as a file system is concerned? Not that they care, because they're non-sentient, unless you believe in panpsychism like I do, but that's a totally different subject. Panpsychic file systems? Panpsychism: it's the belief that consciousness is sort of baked into the physical reality of the universe. But again, subject for another day. So, the file system in the sky. For the time being, wink, wink, nudge, nudge.
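The "many readers, one writer" rule has a familiar everyday analogue in advisory file locks. The sketch below uses Python's Unix-only `fcntl.flock` with shared (`LOCK_SH`) and exclusive (`LOCK_EX`) locks to show that any number of readers can hold a shared lock at the same time, while a writer needs the file to itself. It is an analogy only, and `try_lock` is a hypothetical helper: a file system does not police its own block allocation with these locks, and they are advisory, so they only coordinate programs that opt in to using them.

```python
import fcntl


def try_lock(path: str, exclusive: bool):
    """Open `path` and try to take an advisory lock without waiting.

    Returns the open file object (which holds the lock until it is closed)
    if the lock was granted, or None if another holder prevents it.
    """
    f = open(path, "rb")
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(f.fileno(), mode | fcntl.LOCK_NB)  # LOCK_NB: fail instead of blocking
        return f
    except BlockingIOError:
        f.close()
        return None
```

Two shared locks coexist, an exclusive request is refused while they are held, and once the readers close their files the writer gets in.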
Let's assume that file systems don't think about things. But if they did, I guess, why does a file system care? Why is a file system okay with multiple people reading data off of it at once, but not okay with multiple people writing data to it? Or, as we sometimes would say, doing anything that modifies the data, whether it's deleting data, changing data, or adding data to it; those are all writes, versus reads. Speaker 0 38:54 So why does a file system care whether people are reading or writing? Why would that affect it differently? What happens when two cars try to be in the same space at once? They crash. Giant. So what does that mean? What's going on at the same time? Well, the reality is that the reason it can handle multiple reads is that when you are on a multi-user operating system like <inaudible> or Unix, even if you only have one user account, you have to realize that there are so many daemons operating as processes, reading different things in the background. You say "demons." Aren't those like the cute animals in the Golden Compass series? People find fault with that, because supposedly they're spelled D-A-E-M-O-N. I say "daymons" because... Speaker 1 39:47 I say "daymons" too. So how can we take a look at all these daemons and what they're doing? Why are we talking about demons, guys? Demons from your file system? A daemon is a process that runs in the background on your computer. It's not something you started as a user. Exactly. It's not something that you intentionally started; it's the mere act of turning on your machine and starting up Mac OS X <inaudible>. Guys, so I'm turning on my computer Speaker 0 40:18 and I log in, and maybe I care a little bit about security. So I always set it up.
So the login screen comes up, and I type Nick Gold, and then I type Jabberwocky123, which isn't my password, by the way. I said, don't even try it. But I type whatever my password is, I come in, and I think, okay, I am a user logged into this computer. You're telling me there's all these demons, Speaker 1 40:42 processes, that are also logged in? Did they choose to log in? Call them metadata-D and Spotlight-D. Call them ghosts. So what you can do, and I encourage anybody who's interested to try it, is open up Activity Monitor in your Utilities folder. This is on a Mac? On a Mac. And switch the view, in, I believe it's the Window menu, to view all processes, and then you will see an owner column, correct? Yes. Or just open up a Terminal window and type top. Yeah, there's another way to do it. But you will see a ton of things running, and these are all things that are essential to the daily operation of your Mac. So what's a super common one that's always running in the background? launchd. So the launch daemon, and this is, if I remember correctly, Speaker 0 41:34 it handles what processes start up when you turn on your computer. Speaker 3 41:39 It runs about 45 and tells me I'm hungry. It runs... it's clever, I like that. It's what replaced init scripts back in 10.3. So I think 10.3 was when they first implemented launch daemons. But it's what starts... it spawns all the services. If you ever look, it's process ID one; it starts spawning services that might spawn other services. Okay. Speaker 0 42:09 So let me throw a curveball at you, gentlemen, if you're so smart with your demons and users. If file systems typically only allow one user to read at a time, I'm sorry, one user to write data at a time, then one user can be modifying the file system, but multiple can be reading.
And I'm a user, and I don't seem to remember, guys, over the last 30 years of being a computer user, my computer ever saying, oh wait, you can't save your file right now because launchd is running in the background and modifying some data. Speaker 3 42:42 Let me qualify, then. When we talk about a user, like multi-user read, multi-user write, or single-user write, in that particular case we're talking about an individual machine. So you're talking about direct-attached storage, which is like your USB drive or things like that, where you're not... Speaker 0 43:03 So one machine can have multiple users writing to the media, but some of those users might be humans, some of them might be pieces of software, and some of them might be pieces of software that are invisible to most of our users, that they don't even realize are running at the moment. Speaker 3 43:18 So those are all operating. But what I was sort of trying, maybe poorly, to get to is that the file system can be reading constantly, and it understands that. Where it all falls apart is when you have multiple machines trying to write to the same file system, because the file system itself is stupid. The machine is going to ask, hey, I want to write this file, where can I write it to? It's all happening way down deep, below the surface. And what's going to happen is it can say, okay, you can write into these blocks. Well, the problem is that another machine is going to make the same request, and it will probably overlap and say, well, you can write to these blocks too, because the file system isn't going to be able to know that it's a different request. Whereas with reads, the blocks are already written: when the machine asks, I want to read from this file, those blocks never changed. They're there.
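The collision described here can be modeled with a toy free-block map. In the hypothetical sketch below, two machines mount the same volume read/write and each keeps its own cached copy of which blocks are free; because neither sees the other's allocations, both are handed the same blocks. Real file systems cache and allocate in far more elaborate ways; this only shows why uncoordinated writers overlap while readers do not.

```python
def allocate(free_map: list, count: int) -> list:
    """Claim the first `count` blocks marked free in this machine's copy of the map."""
    chosen = [i for i, is_free in enumerate(free_map) if is_free][:count]
    for i in chosen:
        free_map[i] = False  # updates only this machine's cached copy
    return chosen


def simulate_uncoordinated_writes(total_blocks: int = 16) -> set:
    """Two hosts mount one volume read/write, each caching the free map at mount time."""
    on_disk_free_map = [True] * total_blocks
    host_a_view = list(on_disk_free_map)
    host_b_view = list(on_disk_free_map)
    a_blocks = allocate(host_a_view, 4)  # host A: "where can I write this file?"
    b_blocks = allocate(host_b_view, 4)  # host B asks the same question independently
    return set(a_blocks) & set(b_blocks)  # blocks both hosts now believe they own
```

With one shared map, the second request simply gets the next free blocks; with two private copies, both hosts claim blocks 0 through 3.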
They're always going to be the same. That file is always going to have been written in the same place. So let me ask you Speaker 0 44:32 a question, then. Let's take this into a real-world example. We sometimes sell people our own file servers; we'll use this as one quick example of a file system that we use in the real world. We sell people a server. It's just going to be a file server. They're going to hit it over a file sharing protocol on a TCP/IP network, an Ethernet network. This might be one of our boxes that runs the Open-E operating system. It might be one that runs our Chesapeake Linux variant that we use for our own dark purposes. CS Linux. Not an official Linux distro by any means. Nefariously backing up your data so it doesn't get lost. Yeah. Yeah. So we... Speaker 3 45:20 So why does it work when you have multiple people writing over a file share? Speaker 0 45:24 We sell someone a file server. That file server is running an operating system, usually a Linux variant; not always, but usually. We put a whole bunch of drives in that server, it has a RAID controller card, and we set it up with a file system. And last I remember, we have multiple users hitting those file servers, both reading and writing to that file system at the same time. So given everything you're telling me, does that mean the file system we use on those NAS, network attached storage, devices, those file servers that we sell on TCP/IP networks, is a multi-user file system, because there are multiple machines writing to them at once? It does not. Speaker 3 46:11 In that particular scenario, what's happening is that whenever you write through an AFP, NFS, or CIFS/Samba share, that's what I would refer to as an application-level write, right?
So those are file systems only in the loosest sense, because they're really more VFS, or virtualized file systems. Those are file sharing protocols, right? You don't format to those. If you had a block-level device and you put it into your machine and it said, I don't know what this is, please initialize it, you would never see an option to format it as SMB or CIFS. Those are sharing protocols. Basically, they read from the block-level device, from the abstraction: they're looking at the file system and then reordering it and presenting it as another file system. But it's not really a block-level write; it's an application-level write. Right. Speaker 0 47:14 Just 'cause I'm in the Finder on a Mac, or Windows Explorer on Windows, and I connect to a file share, and yeah, it has a little hard drive icon, much like my local drive, maybe with a little couple of guys holding hands if I'm on the Mac. Sure. And I open it in the Finder, I navigate Speaker 3 47:30 through it just like it's any other drive, any other directory, my startup drive. But it's not the same, even though the behavior and the experience of interacting with it are the same. Right. The reason why that all works with multiple users reading and writing is that at the end of the day, it's all going through a single machine that is writing to its local storage. So that's your file server, your NAS head: the server that has its own onboard storage. The one machine reading? Well, both reading and writing. The one that's committing the data is the file server itself. You're writing through that machine to the disk. You're basically telling the server, please write this file, and the server is saying, okay, can I write this file? Yes. What file system is it actually running?
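In the NAS case, by contrast, every write funnels through the one server that owns the disks. The hypothetical `ToyNasHead` below models that: many client threads send write requests, but a single lock on the server serializes allocation, so no block is ever handed out twice. It illustrates the idea only; it is not how Samba, NFS, or AFP servers are actually implemented.

```python
import threading


class ToyNasHead:
    """Hypothetical model: one server owns the disk; clients only send it requests."""

    def __init__(self, total_blocks: int):
        self._free = list(range(total_blocks))  # the only copy of the free list
        self._owner = {}                        # block number -> file name
        self._lock = threading.Lock()           # serializes writes on the server

    def write_file(self, name: str, nblocks: int) -> list:
        # Every client request passes through this one method on this one machine,
        # so allocation decisions can never overlap.
        with self._lock:
            if nblocks > len(self._free):
                raise OSError("volume full")
            blocks, self._free = self._free[:nblocks], self._free[nblocks:]
            for b in blocks:
                self._owner[b] = name
            return blocks


def run_clients(server: ToyNasHead, clients: int, blocks_each: int) -> list:
    """Simulate several workstations writing to the share at the same time."""
    results = [None] * clients

    def client(i: int) -> None:
        results[i] = server.write_file(f"client{i}.mov", blocks_each)

    threads = [threading.Thread(target=client, args=(i,)) for i in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The price of this design is that the server does all of the disk I/O for every client, which is the overhead the speakers come back to when comparing NAS with SAN.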
Speaker 3 48:28 In the case of those Open-E file servers, 99, 99% of the time we use XFS. So what's XFS? XFS was, I think, developed by SGI originally. Actually, you know what, I might be wrong, but I think XFS might have come out of, remember the RealMedia player people? I think they might have... That sounds vaguely familiar. I might be remembering that wrong. But basically, it's a low-latency file system that has the ability to store hundreds of thousands of files. The reason why we choose that file system 99% of the time is that if you look at traditional Linux file systems like <inaudible>, if you ever wanted to see one break real quick, just do for i in, you know, left curly bracket, one dot Speaker 3 49:32 dot, you know, 75,000, right curly bracket, semicolon, do, and then touch, and then dollar sign i, semicolon, done. Right? So you write that out, for i in {1..75000}; do touch $i; done, and it's a for loop that's iterating through one through 75,000. A terminal command? Okay. And what that's going to do is, if you are on an <inaudible> file system, or if you're on an ext3 file system, at around 32,768 files written, ext3 is going to choke. It's going to say, I can't write anymore. Just to be vague about that number; sometimes it's a little bit higher, right? I don't remember the exact number. But then you do the same thing with the <inaudible>, and that's going to choke at about... When you say choke, you're saying it's not able to write anymore? It's a feature of the file system: it Speaker 0 50:28 can only support a certain bit-depth value for the number of files. It can support more files than that if those files live in directories.
But a single directory can only support that many files: roughly 32,000 files on ext3, and 64,000 files. So that's an issue when you're talking about, like, an entire movie in DNG, or, you know, Speaker 3 50:52 <inaudible>. So XFS does not have that limitation. I have tested it up to a hundred thousand files, and I know that we've done more than that. So it's been a really good file system, which is why we typically use it. Also, like I said, it's low latency. I can't speak highly enough about it. However, the downside to XFS is that, depending on the individual size of the volume, we kind of have to limit the overall capacity at about 240 terabytes. With previous, earlier versions, we would say let's not exceed a hundred terabytes on a single file system, because in the event that you ever had to run a file system check because of an improperly shut down machine, it could require up to, like, two terabytes of RAM to complete that file system check if you grew it. Speaker 0 51:49 So this is an interesting issue you're talking about, which is that we still find ourselves... this has come up several times now, and we'll move on in a moment, but I want to make this point, and tell me if I'm right or wrong here, guys. It's like we're using file systems that were designed years ago, despite the fact that we're in a tech industry where the tech changes like every two minutes. Just in the last, what, three, four years, we've gone from two-terabyte hard drives up to 10-terabyte hard drives. So you're saying that file systems have these functionalities, many of them, and some of them include how many files can be in a directory, or on the entirety of the file system, or whatever.
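The `for i in {1..75000}; do touch $i; done` experiment can be sketched in Python as a function that keeps creating numbered entries in one directory until the file system refuses one. Treat it as a sketch under assumptions: the `count_until_failure` helper is hypothetical, and the widely documented ext3 ceiling of roughly 32,000 applies to subdirectories within a single directory (each subdirectory adds a hard link to its parent), so the `make_dirs=True` mode is the one that probes that limit. For reference, 32,768 is 2^15, the first count that no longer fits in a signed 16-bit integer.

```python
import os


def count_until_failure(directory: str, limit: int, make_dirs: bool = False) -> int:
    """Create entries named 1..limit in `directory`, like `touch $i` in a loop.

    Returns how many were created before the file system refused one,
    or `limit` if none was refused. With make_dirs=True it creates
    subdirectories instead of empty files.
    """
    for i in range(1, limit + 1):
        path = os.path.join(directory, str(i))
        try:
            if make_dirs:
                os.mkdir(path)
            else:
                open(path, "a").close()  # creates an empty file, roughly what touch does
        except OSError:
            return i - 1  # entries 1..i-1 were created successfully
    return limit
```

Run it only against a scratch volume, for example `count_until_failure("/mnt/scratch/probe", 75_000, make_dirs=True)`, where the path is a placeholder.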
And yet we have this hardware that keeps going bananas as far as its growth, how much denser it is, how much space you get out of a hard drive or an SSD. So it's almost as if the hardware is often outpacing the development of the software and data layers that manage the data storage hardware. And so we're constantly, and I know this to be a fact, right, we're constantly looking at what's around the corner and how we can extend into new or different types of file systems, or technologies related to file systems, that will essentially allow us to just take advantage of the hardware that's now on the market. Right. What we definitely are doing is trying to make sure that, Speaker 3 53:26 depending on the use case, we're matching the right file system with what the client is trying to do. Speaker 0 53:36 So I think we will have to revisit some of the more nuanced details of our file system futures in a future episode. I'd be happy if we got together and did a part two sometime in the not-too-distant future, because I want to get more into what our R&D work is around these new file systems. But with time limitations, we should probably save it for the next episode. So with that said, let's talk about another file system that we have a ton of experience with that, from everything that's been described so far, works differently. Which is... Well, just to tie it up: XFS is actually SGI. I'm actually looking at it. So scratch the RealMedia story. Next. Speaking of commercial file systems, the RealMedia people, that's Isilon, then. I bet that's probably where I got confused. Anyway. So some file systems are open source and some are proprietary, right? Speaker 3 54:30 Right. So speaking of XFS, just to bring things around: XFS inherently is single-user. And when I say user in this case, I mean machine, client, system; it's a single machine.
Read/write. Okay. Although I guess you could probably have it read, you know, if you set up fstab on sister servers to read it; it could probably read this. But for all intents and purposes, it's not. So there's a version that's available on just about any Linux distro. I think it's standard now on all the Enterprise Linux 7 derivatives: RHEL 7, CentOS 7, Scientific Linux, Oracle. It's the default now for the operating system, Speaker 3 55:26 and I'm assuming for the boot drive. However, one of the things they had done is realize the need to basically be able to cluster it together. So there is a variant that's not available open source, that you have to pay to play, which is called CXFS, which is a cluster file system. And its main competitor in the market space is actually StorNext, which was originally CentraVision. If anybody's ever curious why it's called cvadmin, or CVFS: CentraVision File System. But now it's StorNext. So, very briefly, since we don't have time to get super deep: StorNext is a system which allows for what? What's that? So what that is, is you have a set of metadata nodes, and those metadata nodes are not responsible for writing the data. Nodes or servers, right? Yeah, yeah. Metadata Speaker 0 56:24 servers. Or are they servers, nodes? These are the Xsan controllers in an Xsan, or if you have <inaudible>, file system managers, as they're sometimes referred to. Right. So what they are responsible for is not writing the... So in a NAS environment, the server itself is responsible for virtualizing the block-level file system and resharing it to clients.
And then it's also responsible for reading and writing the data back as quickly as possible, because it's having to do it for all connected users. Yeah, that sounds like an awful lot of overhead, right? So StorNext is different, in that you have a set of metadata controllers that are not responsible for writing the actual blocks of data. They are responsible for writing, basically, where that data is being stored and where the individual machines are allowed to write. Speaker 0 57:25 But what's actually writing the data to the disks is multiple machines. So that's what it does. That's what makes it a true SAN: StorNext is a true SAN, a true cluster file system. Storage area network. Okay. So let me just reiterate here. That means StorNext has a few unique functions. Number one, there are these servers acting as metadata controllers. And one thing we didn't mention, but I will for the sake of completeness: they actually communicate on a network that is an Ethernet network, an IP network, that isn't your general house network, and it's not the Fibre Channel network that the data actually gets sent over. It's a quote-unquote file-system-metadata-only IP network that those metadata controllers use to tell all of the various clients, moment to moment: you, editor A, you're ingesting video, so you can write to these sections of this clustered file system. Speaker 0 58:23 I'm going to pre-allocate them for you, and no one else can write over them. It's like it gets write tokens, so it's basically acting as a traffic cop. Absolutely. That's the number one thing, because StorNext has features that are features of the software but that then get baked in via the metadata controller servers, which sit on that specific IP network that all the SAN clients are also connected to.
That network passes all of these tokens and data about who can write where at what moment, but the clients are all actually writing to the underlying storage individually. Those are features of the StorNext file system that not every file system has. And an interesting thing that we've all known for many years, and I think it's why we've done a lot of StorNext business (although we've certainly done a lot of NAS business as well, file servers), is that in these file server scenarios, for many years, I think, Brian, you and I, going back as many years as we have now with Chesapeake, Speaker 0 59:20 at first we were very reticent about looking at NASes and file servers and TCP/IP networks as real-time postproduction edit and graphics storage environments. It took gigabit, and 10 gig, and then 40 and 100 gig Ethernet, better Ethernet switches, better servers that just had more power, and the right selection of file systems to allow us to sort of cajole them into being, sometimes, perfectly acceptable real-time postproduction storage systems. But because that's a little more recent, we had for many years focused on Xsan, which really is StorNext under the hood, and then a much more direct relationship via Quantum and StorNext, because it was really written to do these things as a very baked-in first principle, as an operating system, as a file system. And that's really the big difference: StorNext is a file system. These file servers are not; they're virtualizing a file system. Well, Speaker 2 00:20 well then, let's just... a lot of people will confuse NAS for SAN and SAN for NAS. Right, right. And it tweaks me to no end. Besides the very different... the major defining factor, if you don't know whether you're on a SAN or on a NAS, is who's actually writing the data to the disk.
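The SAN side of that distinction, metadata controllers handing out space while the clients write the data themselves, can be sketched as below. `ToyMetadataController`, `SharedDisk`, and `san_ingest` are hypothetical names; StorNext's real protocol, token handling, and allocation policies are far more involved. The point is only the division of labor: a small request goes to the controller over the metadata network, and the bulk data goes straight from each client to the shared storage.

```python
class ToyMetadataController:
    """Hypothetical SAN metadata controller: hands out block ranges, never file contents."""

    def __init__(self, total_blocks: int):
        self.next_free = 0
        self.total_blocks = total_blocks
        self.extents = {}  # client name -> list of (start, length) it may write

    def request_extent(self, client: str, length: int) -> tuple:
        if self.next_free + length > self.total_blocks:
            raise OSError("no space left on volume")
        extent = (self.next_free, length)
        self.next_free += length  # pre-allocated: no other client will be given these
        self.extents.setdefault(client, []).append(extent)
        return extent


class SharedDisk:
    """Storage every SAN client can address directly (over Fibre Channel, say)."""

    def __init__(self, total_blocks: int):
        self.blocks = [None] * total_blocks

    def write(self, extent: tuple, data: str) -> None:
        start, length = extent
        for b in range(start, start + length):
            self.blocks[b] = data


def san_ingest(mdc: ToyMetadataController, disk: SharedDisk,
               client: str, length: int, data: str) -> tuple:
    extent = mdc.request_extent(client, length)  # small request on the metadata network
    disk.write(extent, data)                     # bulk data goes straight to the storage
    return extent
```

Compared with the NAS model, the controller never touches the data, so adding clients adds writers rather than load on one server.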
So if you are writing through AFP, NFS, or SMB, it is the server that is directly attached to those disks that is writing the data to the disks. Speaker 0 00:57 The client is performing a high-speed transfer, but then it's transferring it to the server, and the server is writing the data to the underlying file system. It's almost more like a transfer that's just happening quickly and tricking you into thinking you're working off of the storage. But it's really only the file server. Right? Speaker 2 01:22 So there's benefits to that solution. But if you want to know if you're on a SAN... now, here's where it gets tricky: sometimes there are these hybrid things called IP SANs, and that has nothing to do with iSCSI, just to put that out there. A lot of people think, oh, I'm running a StorNext SAN, but instead of using Fibre Channel, I'm using iSCSI. Still not the same kind of thing. There are things called IP SANs, and the one that's most interesting to me lately would be GlusterFS, which ditches the whole metadata controller notion. Instead, it has these storage nodes. Now you can, if you have an all-Linux environment, have Gluster servers and Gluster clients. Speaker 2 02:09 And the interesting thing is, you have these storage nodes, and when you're writing, you're not writing through an abstraction layer; you're writing through the file system itself. So if you have a file that's been distributed across two storage nodes, even though the information is traveling back and forth over TCP/IP, not at the block level, you are actually on an IP SAN. Okay. Right? Because you are talking to the storage node that's then delivering you the file, not a single abstraction that's talking to direct-attached storage. Okay.
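GlusterFS avoids a metadata controller by letting every client work out on its own where a file lives. As I understand it, Gluster's distribute layer hashes file names into ranges assigned to each storage brick; the sketch below swaps in a much simpler hash-modulo rule (the `pick_node` helper and the node names are hypothetical) just to show the idea that placement is computed, not looked up.

```python
import hashlib


def pick_node(filename: str, nodes: list) -> str:
    """Choose a storage node from a hash of the file name (simplified illustration).

    Every client computes the same answer independently, so there is no
    central metadata controller to ask where a file lives.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]
```

One consequence of plain modulo placement is that adding a node reassigns most files, which is one reason real systems use range- or ring-based schemes instead.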
So with that being said, that's really the difference. So yes, there are other things, and there's also Panasas. Here's a great thing, if anybody's familiar with Panasas, or Lustre: if you have a Lustre or a Panasas storage system and you mount it as NFS, your Panasas is functioning as a NAS. Speaker 2 03:19 However, with the most recent version of their DirectFlow client for Mac, if you start mounting it as PanFS DirectFlow, that same system, you now are on an IP SAN. So sometimes they can do double duty, and it's because the underlying file system is capable of serving both types of clientele, right? Just like you can actually put a NAS head in front of a StorNext file system. Got it. So, okay, let's save some of our explorations and deep dives into the next-generation file systems we've begun to explore, what makes them unique, and how they take us the next 10 years down the road. Let's save that for File Systems Part Two. I want to take just the last couple of minutes we have now... I want to say one thing, because I have a feeling, given the state of this particular show, that somebody's going to eventually say, well, why are you using XFS? Speaker 2 04:17 'Cause you can look up XFS; it's not a new file system. Why would you use that? Why aren't you using something more modern, like ZFS? And the big thing to keep in mind is that it takes about 10 years for a file system to mature. And the reason it takes that long is adoption, right? You can't figure out what's wrong with something if very few people are using it. And so the problem is, you sometimes have to go with the lesser of two evils.
So yes, is XFS perfect? Is HFS perfect? Is NTFS? No, none of these file systems are. Are they modern? Not really. And sometimes there are licensing issues, there are costs to choosing, or there could even be murky licensing issues where they're sort of open source, but the way the licenses are written, the companies that own them could decide to change their policy: it's okay for Speaker 0 05:18 you to install it, but we as the reseller can't. And because we're making decisions, hopefully listeners to this episode, all five of them, will be able to tell us afterwards, when quizzed, that they understand there are a lot of nuanced factors Chesapeake is looking at when we say this is the one we have chosen: it's proven, it's been out there for a while, it has a certain feature set that's compatible with the use case, and we aren't afraid of any weird licensing issues emerging. And the reason, and I'm kind of going into my sales guy mode here, but it's something we're all obviously really passionate about: we are literally making decisions as to how your most essential data, as a customer of ours, is stored on the physical pieces of IT equipment that we are potentially selling you. And like all other aspects of quote-unquote workflow, Speaker 0 06:11 we take those decisions really seriously. It's not just technical factors, it's not just pricing factors, and it's not just licensing and intellectual property factors; there are probably four or five others I could think of off the top of my head. We do, guys, please assure me, we look at these things, right? I mean, we're thinking about these things, and Brian, Mr. Solutions Architect... Well, I mean, I look at things in a very... Ben's like, oh yeah, I look at it all right: I just ask Brian, and Brian says yes or no.
That's the thing, and this is why we say Brian is sort of the man behind the curtain: the reality is we have Brian here doing this type of R&D, researching, playing with things, experimenting with file systems, and understanding what the world at large, including the community of people who both build these file systems and have to support them as admins, is doing. Speaker 0 07:09 We have Brian in the background doing this as a major part of his job, and that knowledge then spreads out across our organization. Seriously, the number of man-hours in any given year that we have Brian hacking around with file systems is, I dare say, probably more than most other systems integrators, and probably only rivaled by very large IT firms doing this with massive enterprise deployments, because they have to, and by the storage manufacturers themselves, who are often writing or at least using these things. And I also, I mean, I guess Speaker 3 07:48 the big thing... No, I think the thing I want most people to know is that we think about these things because we have seen both sides of the coin, so to speak. We went with what everybody else went with, because Speaker 2 08:15 we thought it was safe, and then we hit limitations. We went with what the manufacturer had recommended, and it turned out the only reason the manufacturer made that recommendation was that they had done no... it was just the default. We've had a lot of recommendations on file systems where, quite honestly, it actually made some products... because of the file system they chose to use, it did not scale. So you're saying that...
So I guess the whole thing is that there's an active thought process that goes into this. It's really safe to say this is literally the thing that keeps us up at night. I don't know about you; it keeps me up. I have insomnia, because data has grown so exponentially. Speaker 2 09:05 When I first started, we had 14-drive Xserve RAIDs with 180 gigabyte drives. PATA drives, parallel ATA; those were called ATA/133. Right, state of the art. Thank you, Alex. Gross. Back when I was in my twenties. So when they bumped that up to 250s, and then eventually 500s, it seemed like, wow, who could ever use this much data? But we were also living in a standard-definition world, and now we're in 4K video, 360 video. It's growing so much. I've been with the company well over a decade, we'll just say that; it's probably verging on 13 or 14 years. Put it this way: in the time my daughter has been alive, which will be nine years in March, we have gone from the largest physical drive being a terabyte to an order of magnitude larger. We're at 10 terabytes; we've got 10 terabyte drives. And at this point in time, it's a whole other discussion to even say, is... Speaker 0 10:36 Oh, you just set me up for the best possible segue. We've got like five minutes to chat about this, but you just set up the best question: is RAID enough? Okay, now, we like to think of ourselves as a forward-looking, progressive, tech-savvy kind of org.
And as I just said, we're always trying to keep on top of the factors that are changing in the tech industry and make sure our own skill set stays in tandem with those changes. You just mentioned a big one: hard drives are getting really big. We don't have enough time to delve into that side of things, and we'll do a whole episode on just this subject. I promise you don't have to be on it. Speaker 0 11:22 That's fine. Let's just walk away with this: our guest Brian made it very clear that RAID is not backup, and that if anybody uses RAID in the same sentence as backup, it is true to say they are wrong, fully wrong. RAID is there to minimize your downtime. Do we want to say it actually makes you bad? No, no, we don't want this. It's just one of those things. I've been an end user. I thought RAID was this great thing that would keep all my data safe forever. And then the first time your partition map gets corrupted, your RAID's fine, but your file system came undone, and you're like, all right, how do I get this back? Speaker 0 12:18 Then you buy data rescue software for way too much money, you restore all your data, and you find out none of your data has names anymore. The restored files are just named after their inode numbers, so you don't know what was a JPEG and what was a MOV. You have to sit there and open them one by one, methodically, because all your resource forks are gone. And imagine that on a volume with hundreds, if not thousands, if not tens of thousands of elements.
I've experienced it with a 500 gigabyte drive, you know, 10 years ago, and now we've got 10 terabytes. Yeah, I couldn't imagine. I remember it was painstaking; I gave up. Okay, so let me make my final point, which we will only spend a few minutes on. Speaker 0 13:06 We're calling this episode "Filesystems and Beyond," and again, we'll continue with a Part Two and get more into this. We've talked about disks, like hard drives and flash memory. We're not counting things like tape drives here, because the way they record data onto tape is so different. We're talking about random-access physical data storage media like hard drives and flash memory; SSDs fall into this category. So when looking at those types of storage technologies, we at Chesapeake are saying... oh my God, I've been with Chesapeake for 12 and a half years, Brian's been here for roughly 14, Ben's been here for eight or something. Jason, you've been... yeah, but you've been on the buddies list for a lot longer than that, that's for darn sure. On the private guest list. Jason Whetstone, by the way, was the very first person, or at least the organization he was with was the very first outfit, we sold Final Cut Server into. Speaker 0 14:04 And that was a while ago, so we've known you for a while. Reading our tea leaves: we have 10 terabyte hard drives now, and we had one terabyte hard drives 10 years ago, or even eight years ago. Western Digital is threatening 20 terabyte drives by... So this is what I keep hearing: 20s, 30s, 40s, 50s, we'll see those easily in the next five to 10 years. There is a good chance, not a certainty, that the same order-of-magnitude jump we saw in the last decade will happen again, roughly another order of magnitude in the next decade.
And then you're talking about a single hard drive that could be, think about it, a hundred terabytes. So let's pretend for a moment, as we often do, that that might be the type of reality we're dealing with, or even a 40 terabyte hard drive five years from now. Speaker 0 15:04 Then there's the data transfer question, right? We'll only touch on it now, but look at the physical developments happening with hard drives, because all of those same density improvements are happening with flash memory too. If you talk SSDs, from what I've noticed lately, SSD density is actually growing much more quickly than hard drive density right now. So, from the software and data layer perspective, are these file systems the right way for us to imagine storing data? Will we be using these same file systems, or even, dare I say, file systems at all, in a few more years? In five years, maybe even now? The elephant in the room I'm implying here is this whole other methodology of storing data on hard drives, SSDs, or flash memory that isn't file system technology. Speaker 0 16:09 It's this other thing called object storage. So Brian, you're the guru, you tell me (and you already did, years ago), tell me again, and our listeners: what is this object storage thing that everyone's bandying about, saying this is the future of our storage industry? This is supposedly how you will be writing data to large collections that could be petabytes in size, that could be spread over multiple locations, that maybe are not as susceptible as RAID to the types of data loss scenarios you were describing earlier, and that are even technically compatible with a 40 terabyte hard drive, theoretically, several years from now. What are these object storage things, and in a nutshell, what differentiates them from file systems?
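To put rough numbers on "another order of magnitude" and "the data transfer thing," here is a small back-of-the-envelope calculation. It assumes, purely for illustration, that capacity keeps growing about 10x per decade (the trend described above) and that a drive can be streamed end to end at a sustained 250 MB/s; neither figure comes from a specific product, and a real RAID rebuild under production load would typically take longer than a single clean pass.

```python
# Illustrative assumptions only (not vendor figures):
#   - capacity grows ~10x per decade, as discussed in the episode
#   - a drive can be read or written end to end at a sustained 250 MB/s
ASSUMED_MB_PER_SEC = 250


def projected_capacity_tb(today_tb: float, years: float, growth_per_decade: float = 10.0) -> float:
    """Compound-growth projection: capacity * growth^(years / 10)."""
    return today_tb * growth_per_decade ** (years / 10.0)


def full_pass_hours(capacity_tb: float, mb_per_sec: float = ASSUMED_MB_PER_SEC) -> float:
    """Hours needed to touch every byte of a drive once (decimal units: 1 TB = 1,000,000 MB)."""
    return capacity_tb * 1_000_000 / mb_per_sec / 3600


for years in (0, 5, 10):
    cap = projected_capacity_tb(10, years)  # start from today's 10 TB drive
    print(f"+{years:>2} yrs: {cap:6.1f} TB, full pass ~ {full_pass_hours(cap):5.1f} h")
```

Under these assumptions a single pass over today's 10 TB drive already takes roughly 11 hours, and the time scales linearly with capacity, which is why rebuilding a whole drive's worth of parity gets more painful as drives grow.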
This is the cliffhanger for episode two. Keep that at the top of your head. All right, so... Speaker 3 17:10 So basically, object storage is less concerned about the blocks and more concerned about what the actual data is. When you talk about something like parity, what you're really concerned with is how many times this file repeats across multiple nodes, a node being not a disk per se but a whole separate server. It's complicated, and part of the reason I'm hedging a little is that there are some object-based storage companies out there who are not really object-based; they're maybe somewhere in between that and a file system. Right. It's one of those things. And then there are also some people who claim to be object-based RAID, which is not quite the same as object-based storage. Speaker 3 18:15 So it's really hard to nail down in 30 seconds or less what object-based storage is, because you have to weed through what we're talking about versus what other companies' marketing is throwing at you. But it's about the integrity of the individual file. Your folder trees, your folder structures, do not exist in terms of directories. There's an entry, and there are no folders. Folders are virtual, in the sense that if you go to Amazon S3 storage, when you upload a file, you are uploading an object. And the big thing is that true object-based storage is limited to, I believe, three commands: PUT, GET, and DELETE.
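Brian's description of a store with no real folders, reachable only through put, get, and delete, can be sketched as a flat mapping from keys to bytes. This is a toy, in-memory illustration under that assumption; the class `ToyObjectStore` and its method names are hypothetical and are not the API of Amazon S3 or any product mentioned in the show.

```python
from typing import Dict, List


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store.

    There are no directories: just a flat mapping from key to bytes,
    reachable only through put, get and delete.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # A put always stores (or replaces) the whole object; there is no partial update.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        # Raises KeyError if no object exists under this key.
        return self._objects[key]

    def delete(self, key: str) -> None:
        # Deleting a missing key is treated as a no-op in this sketch.
        self._objects.pop(key, None)

    def keys(self) -> List[str]:
        return sorted(self._objects)
```

For example, `store.put("projects/show33/edit.prproj", data)` creates no "projects" folder at all; the slashes are simply characters inside one flat key.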
So when you want to rename something, what you're actually doing is putting a new copy someplace and then deleting the old one. Sometimes there's POST involved, PUT and POST. Speaker 0 19:24 Yes, PUT and POST can be used... okay, interchangeably. So let me... remember, we're just trying to get some big-picture ideas out there. Another thing I've noticed, and there's no way around using this word in this context: one thing we do with file systems is we like to mount them, right? Right. I like mounting my file systems. So what does that mean? With all of these file systems we've been talking about, you as a user, on whatever your client system is, a Mac or a server doing something, we've talked about how you sort of have to have a driver for the file system to make sense Speaker 3 20:08 of it, right, to read the data. Right, you have a file system client. And when you connect, which is really just another way of saying mount, though we do use the term mount when talking about a drive: when I plug that external USB drive in, it pops up on the desktop and I get the cute little hard drive icon, or if I look at it in Terminal, something with that drive name now exists at /Volumes/, or in Windows it mounts with a drive name at <inaudible> or D: or whatever. Those things tell you, okay, I've connected to it. There's a hierarchy of folders, and they're named certain things. There's the root, the original layer, and the files; you walk through them and navigate them. Speaker 3 21:00 I can organize them with subdirectories and sub-subdirectories and sub-sub-subdirectories. In object-based storage, you don't do any of that? Usually.
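In the simple put/get/delete model Brian describes, a "rename" really is a copy followed by a delete. A minimal sketch, assuming a plain Python dict stands in for the object store; `rename_object` is a hypothetical helper, and real services may offer server-side copy operations that this ignores.

```python
from typing import Dict


def rename_object(store: Dict[str, bytes], old_key: str, new_key: str) -> None:
    """'Rename' in a flat object store: GET the bytes, PUT them under a new key, DELETE the old key.

    `store` is a plain dict standing in for the object store (an assumption for illustration).
    """
    if old_key not in store:
        raise KeyError(old_key)
    data = store[old_key]   # GET the existing object
    store[new_key] = data   # PUT a new copy under the new name
    del store[old_key]      # DELETE the original
```

The design consequence is the interesting part: on a file system, renaming a 250 GB clip only rewrites a directory entry, whereas in this model the object's data has to be written again under the new key.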
Well, again, it's one of those things; it depends. One of the truest object-based storage platforms out there, I believe, is Amazon S3, which most people would be familiar with. There are a couple of very pioneering FUSE projects (FUSE, Filesystem in Userspace, is a whole other topic) that can actually virtualize an object store like S3 and make it appear as a network share. But it's kind of the same lie that a file server is telling you. Yeah. But I guess the biggest thing is, if you've never experienced object-based storage but you've used FTP (and again, I know it gets a little shady, because there are some virtual FTP clients for the Mac and Windows), Speaker 3 22:05 if you've ever used anything like Cyberduck, which is really popular freeware on the Mac, you can get the idea of what object-based storage is like. It's a bunch of files that are all just at the top level. There are no directory entries. There are virtual directories, in the sense that you can tag an object with metadata that says, yes, this file's path is technically /X/Y/Z and the file's name is foo, and then something can present a directory tree to you. But if you were to look at the abstract storage itself, there are no directories. It just happens that something is saying: every time you see a forward slash in a file name, interpret that as a virtual folder. Speaker 3 22:53 So it's basically just a bunch of data that is replicated across many nodes, nodes being servers with their own storage, and your redundancy, your protection, is that you have more than one copy.
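The "interpret every forward slash as a virtual folder" idea can be shown with a short function that turns flat keys into one level of a folder listing. The S3 list API works in a similar spirit through its `Prefix` and `Delimiter` parameters, but this is a standalone sketch of the concept, not that call; `list_virtual_dir` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def list_virtual_dir(keys: Iterable[str], prefix: str = "", delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Present flat object keys as one level of a folder tree.

    Returns (files, subfolders) directly under `prefix`. Any key whose remainder
    after the prefix still contains the delimiter is collapsed into a virtual
    subfolder; nothing about folders is actually stored.
    """
    files, folders = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            folders.add(prefix + head + delimiter)  # e.g. "x/y/z/" is only a naming convention
        else:
            files.add(key)
    return sorted(files), sorted(folders)
```

Given keys such as `x/y/z/foo`, `x/y/bar.mov` and `readme.txt`, listing the root shows one file and one "folder" (`x/`), even though the store only ever held three flat names.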
So if you get something like a bit flip or bit rot, it can read from the alternate copy or copies and then repair that, so that at all times... And really, the power of object-based storage, unlike RAID, is that when you are rebuilding a RAID with 10 terabyte drives, you have to rebuild 10 terabytes' worth of parity. But here's the thing. Okay, without getting into the data Speaker 0 23:38 technician side of it too much. Sure. The other thing about an object store is that it's forcing a change on people. And again, we'll save this for the real deep-dive object store episode Speaker 3 23:51 that Brian Summa may or may not be featured on. I guess I didn't make the cut. Speaker 0 23:56 Well, no. What we'll do is use that new feature in Premiere that lets us say anything in your voice, and we'll basically say what we think you would have said. And I swear we won't say anything you wouldn't say. Oh, I swear on our necks. I swear. And as our listeners can tell, there isn't a whole lot that's outside the territory of what Brian might say at a given moment, so it's really anyone's guess. But anyway: software, as in the applications we use, including our operating systems and apps like Premiere, CatDV, <inaudible>, our web browser, and Cyberduck, which you just mentioned. In order to use them with object storage, the apps usually have to be written to work with object storage, right? An app like Premiere typically can't just ingest files from it. Again, some object stores have features on top of them, but forget that. Speaker 3 24:57 Right. In order for something like Premiere or CatDV or whatever to work with it, what it basically has to become aware of is URLs. Okay.
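The self-healing behavior Brian describes (read from another copy when one has rotted, then fix the bad copy) can be sketched with a checksum per object. This assumes, for illustration only, that a SHA-256 digest is recorded when the object is written and that each replica node is represented by a plain dict; `read_with_repair` is a hypothetical helper, not any vendor's implementation.

```python
import hashlib
from typing import Dict, List


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def read_with_repair(replicas: List[Dict[str, bytes]], key: str, expected_digest: str) -> bytes:
    """Return an intact copy of `key` and repair any replica that is missing or corrupted.

    Each dict in `replicas` stands in for one storage node (an illustrative assumption);
    `expected_digest` is the checksum assumed to have been recorded at write time.
    """
    good = None
    for node in replicas:
        data = node.get(key)
        if data is not None and sha256(data) == expected_digest:
            good = data
            break
    if good is None:
        raise IOError(f"no intact replica of {key!r}")
    for node in replicas:
        if node.get(key) is None or sha256(node[key]) != expected_digest:
            node[key] = good  # only this one object is rewritten, not a whole drive
    return good
```

This is the contrast with RAID drawn above: repairing a flipped bit here means rewriting one object from a good copy, rather than recomputing parity across an entire 10 terabyte drive.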
Now, you might think, all right, most things are aware of URLs, because anybody who's ever done a deep dive into the XML of Final Cut 7, which is actually what Premiere's XML is based on, knows they were using <inaudible>. Yeah, they're URIs, but they're all local. Yup, that's file://localhost/ and then whatever. So what needs to happen, and this is going to be the weirdest thing, is that something like Premiere, in order to work with an object store, is going to have to do two things. One, it's going to have to understand what that URL is pointing to. And two, it's going to have to understand that it needs to fetch the object and cache it somewhere locally, because for the time being, you know, I... Speaker 0 26:04 I think it's going to happen, but object storage is not production storage. We talk about the features of these things and the features that are in file systems. The modern file systems we use, which we put on either your local storage or your shared storage, have features that make them conducive to an application like a nonlinear editor or a graphics application shuffling data in and out of them very quickly, in a way that is very specifically engineered for an app that's doing all sorts of crazy data caching operations in the background, which even you as a user typically aren't aware of. Those file systems have features that make them very good for pieces of software that need to do all that kind of stuff on a moment-to-moment basis.
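The two things Brian says an application like Premiere would need (understand where a URL points, then fetch and cache the object locally) can be sketched as a small resolver. This is a hypothetical helper under stated assumptions: `file://` URLs are treated as already local, any other scheme is treated as a remote object, and `fetch` is a caller-supplied function standing in for an object-store GET. The `s3://` string in the usage note is just an example URL.

```python
import hashlib
import os
from typing import Callable
from urllib.parse import unquote, urlparse


def resolve_media(url: str, cache_dir: str, fetch: Callable[[str], bytes]) -> str:
    """Return a local path an application can open directly.

    file:// URLs (like the file://localhost/... entries in project XML) already point
    at local or mounted storage. Any other URL is treated as a remote object: it is
    fetched once via `fetch` (a hypothetical stand-in for an object-store GET) and
    cached on disk, keyed by a hash of the URL.
    """
    parsed = urlparse(url)
    if parsed.scheme == "file":
        return unquote(parsed.path)
    os.makedirs(cache_dir, exist_ok=True)
    name = hashlib.sha256(url.encode()).hexdigest() + os.path.splitext(parsed.path)[1]
    local_path = os.path.join(cache_dir, name)
    if not os.path.exists(local_path):  # cache miss: pull the whole object down first
        with open(local_path, "wb") as f:
            f.write(fetch(url))
    return local_path
```

Calling `resolve_media("s3://bucket/show/clip.mov", cache_dir, fetch)` twice downloads the object only once; this is the same download-then-preview pattern described for Cyberduck, and it also shows why a 250 GB clip would not start playing the instant you hit the space bar.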
Speaker 0 26:56 Whereas with some of these, especially the more raw object storage platforms out there, you need to put data into them, as Brian said, with a PUT command, and that's an API call. So you need software that speaks a compatible set of APIs to even put a file in, get a file out, delete a file that's already in it, POST, and maybe get some information about it. The apps need to be written in a way that's conducive to that. And with the object stores themselves, there are a lot of challenges with moving that much data in and out, moment to moment, with a large number of client systems, when you're expecting very real-time performance. Like when you hit play on your space bar, Speaker 1 27:43 I have a 250 gig video that I need to play back. Speaker 0 27:46 I want to hit the space bar and I expect this video to play. When it's reading data off a file system, Premiere is like, okay. Doing that out of an object store in its more raw form is going to present a challenge, and it might not even be an appropriate use of object storage technology. So I guess the main... Speaker 1 28:05 It's kind of like... Brian mentioned Cyberduck earlier. If you've gone into Cyberduck and said, oh, here's this PDF or text file on this FTP server that I want to look at, you can space-bar that file, just like you can in the Finder, and preview it. But what's happening in the background is that it's actually being downloaded. It's a very small text file or PDF, so it gets downloaded and put into your cache folder, and that's where you're actually looking at it. Speaker 0 28:30 And so we would need to rewrite all of these apps from top to bottom, or at least their data handling capabilities, which probably no one is in the middle of doing right now. So I guess the point I'm making, and Brian, you're the guru, you tell me.
I think that if we were to have this chat five years from now, which we likely will at the rate we're going, I would wager that out of all the disk storage platforms we're selling people, we will continue to sell systems that have file systems. That potentially includes some of the ones we're selling today, like StorNext, or ones derived from them, or what you might call next-generation versions of the ones we're using today: similar in top-level principle, but, okay, now we get the chance to rewrite it 30 years later. Speaker 0 29:26 What would we do differently? So I have a feeling we'll still be using some that are familiar to us today, some that we can talk about in a deeper dive in our next-generation file system Workflow Show episode in maybe a few more months. But I also think we'll probably be selling more disk-based storage that people are using as object stores, and that neither of those two technologies is going to completely push the other out anytime soon. If anything, I would imagine that five years from now our clients will have much more heterogeneous environments, where for some disk-based data storage applications they're using object storage, typically mediated by certain software layers in between. The very same department might continue to use a file-system-based shared storage system as their production SAN, while nearline becomes an object store that you get to through your MAM, not just another drive that you mount as a file Speaker 1 30:26 system. Or maybe it's managed automatically somehow: you know, this is a project that I'm currently working on now.
This is all going to be pulled out of the object storage onto my production storage, where I'll edit from it. And it could be managed by time: how often do I open this project, and so on. And the little secret is that we could kind of do this today. <inaudible> It could be done like that today. Many people... But if we're talking about Premiere, no, not necessarily a MAM. We're talking about Premiere. Sure. I would see, maybe in the next five years, the capability being that we still work off some sort of local file system, whether it be a local drive or a SAN, but maybe Premiere intelligently manages the shuffling of data from the object storage to the production storage based on some criteria: how often you open that project, when you last opened it, something like that. So I would argue... You tell us. Speaker 3 31:26 Okay, so my argument would be, and I could be wrong, that I don't think the Adobes, the Apples, and the Avids are going to integrate directly with object-based storage. Speaker 1 31:43 And I agree with you there. I totally agree with you. Speaker 3 31:48 More realistically, what will happen is that a file system, possibly StorNext, which I know already has this kind of on the roadmap, will act as a front end Speaker 2 32:00 to object-based storage, which I know is exactly what StorNext... Speaker 0 32:03 Yeah, that's sort of StorNext with Storage Manager mediating the flow of data between it and the Lattus object store. Yeah. Speaker 2 32:11 And so I think what will happen is you'll basically have a file system made up of smaller, faster SSDs, maybe one or two terabyte SSDs, that's acting as a disk cache for your object store.
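The criteria floated here (how often a project is opened, and when it was last opened) could drive a staging decision like the one below. Everything in this sketch is hypothetical: the `ProjectStats` fields, the 14-day idle window, and the three-opens threshold are illustrative choices, not features of Premiere, a MAM, or StorNext.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProjectStats:
    name: str
    last_opened: float    # Unix timestamp of the most recent open
    opens_last_30d: int   # how many times it was opened in the last 30 days


def projects_to_stage(projects: List[ProjectStats], now: Optional[float] = None,
                      max_idle_days: float = 14, min_opens: int = 3) -> List[str]:
    """Hypothetical policy: stage a project from the object store onto production storage
    if it was opened recently OR is opened often; everything else stays in the object store."""
    now = time.time() if now is None else now
    picked = []
    for p in projects:
        idle_days = (now - p.last_opened) / 86400  # seconds per day
        if idle_days <= max_idle_days or p.opens_last_30d >= min_opens:
            picked.append(p.name)
    return picked
```

Combining recency with frequency is just one reasonable choice; a recency-only rule would evict a project that is opened heavily but in bursts, while a frequency-only rule would be slow to pick up a brand-new project.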
Speaker 0 32:31 Do you think that will be a new or modified file system that hits the market, or one that becomes a new community project if it starts as an open source thing? So what you're saying is that the file system layer will start to emerge as something we all inherently see as a caching tier, because we'll increasingly take for granted that our bigger data sets, the big data, primarily live in an object storage world. But for almost any industry, the need that then arises is that local file systems are really just the thing you need to host those files so certain types of applications can work with them moment by moment. It's almost like a file system might mediate what's coming in and out of the object store moment to moment, just as a basic feature, and that almost starts to become the core job of a file system: acting as a more nuanced cache tier. And, oh, by the way, I think you're kind of right. I'm just going to throw that out. Speaker 2 33:44 Yeah, that's what I think, because if you've ever met or worked with any developers, they're often very quick to say that's an application issue or that's a file system issue, and it's not our problem to resolve. Adobe is going to do what they're going to do, Avid is going to do what they're going to do, and Apple's going to do what they're going to do. And more likely than not, there's this thing that has worked out very well, which is interacting directly with a file system, traditionally. Maybe in 20 years they'll be working off object stores, but I have a feeling somebody's going to come along and say, okay, it's the job of the file system, not the application, to work with the object store.
And so there's going to be, like I said, this caching layer in front of it that, whatever criteria it uses, will basically let everything live in the object store. So as you start to ingest to your disk cache, those writes are going to start getting dumped right Speaker 0 34:58 into the object store, and then as you start to request those files, it'll recall them, and then gradually they'll expire. And I will say, and this will maybe be the final thought before we say our thanks and goodbyes, that is, I think, the vision of what Quantum StorNext really offers. The problem is that Apple has an issue in the Finder and in Terminal that I believe prevents us from fully taking advantage of it. However, in a non-Mac environment, that's a big part of what the whole StorNext Storage Manager package does, once you start using the actual policy-driven systems for data flow between the file system layer and the object store layer, whether you're using something like Quantum's Lattus object store system or Amazon Web Services as your object store, or public cloud, which is all object store. Speaker 0 35:51 So listen, clearly we have some follow-up episodes to do. This has been great. It's always a pleasure to pick all of your brains, all three of you, including Ben here, of course, and Jason Whetstone, regular cohost, as well as the esteemed man behind the curtain, the great and powerful Oz, Brian Summa. Holy cow, man. We get blown away by the level of knowledge. That's right, which is why we're cloning him into an army that will take over the world. It makes me sad that you guys are so easily impressed. We have miserable lives, horrible, miserable lives. Well, my life isn't romantic. The truth is, we're not easily impressed.
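The caching tier Brian describes (ingest lands on fast disk, writes get dumped into the object store, requested files are recalled, and cached copies gradually expire) can be sketched as a write-through cache with least-recently-used eviction. This is an illustrative model only: the class name is hypothetical, a dict stands in for the object store, capacity is counted in objects rather than bytes, and it is not a description of how StorNext Storage Manager is actually implemented.

```python
from collections import OrderedDict
from typing import Dict


class WriteThroughCache:
    """Hypothetical caching tier in front of an object store.

    Writes land in the fast cache and are immediately copied to the object store;
    reads are served from the cache, recalled from the object store on a miss,
    and the least recently used entries expire when the cache is full.
    """

    def __init__(self, object_store: Dict[str, bytes], capacity: int) -> None:
        self.object_store = object_store  # dict standing in for the object store
        self.capacity = capacity          # max number of objects held in the fast tier
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()

    def _admit(self, key: str, data: bytes) -> None:
        self.cache[key] = data
        self.cache.move_to_end(key)       # mark as most recently used
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # expire the LRU entry; it still lives in the object store

    def write(self, key: str, data: bytes) -> None:
        self.object_store[key] = data     # write-through: the object store always has a copy
        self._admit(key, data)

    def read(self, key: str) -> bytes:
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        data = self.object_store[key]     # recall from the object store on a miss
        self._admit(key, data)
        return data
```

Because every write goes straight through to the object store, expiring an entry from the fast tier never loses data; the trade-off is that a recall after expiry pays the full object-store fetch cost.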
No, Brian is just one of the more modest people you'll tend to meet, which is part of why we love him. All right, Brian, thank you so much. Thank you, gentlemen. And we totally blew through the hour, and so be it, man. I guess this is just how long it takes us to do Workflow Shows, and if you've been with us this long, thank you. Stay tuned for the next ones. Take care. Bye. Bye. Bye.
