#56 Engineering Empathy: Building Innovative Access Systems and Preserving Video Testimony with Sam Gustman, CTO and Associate Dean at USC Shoah Foundation and USC Libraries

Speaker 0 00:00:00 Workflow therapy, discussions on media asset management solutions, and stories about media production technology. You get it here on The Workflow Show. I'm Jason Whetstone, senior workflow engineer and developer at Chesapeake Systems. And I'm Ben Kilburg, senior solutions architect at Chesapeake. Today we are pleased to chat with Sam Gustman, chief technology officer of the USC Shoah Foundation. The foundation's mission statement, displayed prominently on their website, boldly states: "Our mission is to develop empathy, understanding and respect through testimony." The foundation's Visual History Archive, IWitness, and Dimensions in Testimony museum pieces have cataloged, preserved, and made available the video testimonies of hundreds of thousands of survivors and witnesses of the Holocaust and other genocides. The foundation seeks to educate by documenting and preserving the stories of as many survivors as possible and ensuring that their great sacrifices can never be lost to corruption, accident, or the ravages of time. Today on The Workflow Show, we will discuss Sam's story and the story of the USC Shoah Foundation. Speaker 0 00:01:05 First, here are a few quick reminders for our listeners. Do you have questions or thoughts? Reach out to workflow [email protected]. Want more workflow therapy? Hit that subscribe button. You won't be disappointed. We've ramped up production and we want your ears and your brains to enjoy the fruits of our labor. Now on to our discussion. Sam Gustman, thank you so much for joining us today. We're really pleased to have you. Hi. And I just want to say to our listeners, Sam is joining us on the Friday before Labor Day, when so many people are just ready to hit the road and work a half day. So extra thanks to you, Sam, for joining us today. So Sam is the chief technology officer of the USC Shoah Foundation.
So Sam, tell us a little bit about the Shoah Foundation, starting with: what is the meaning of the word Shoah? Speaker 1 00:01:54 Sure. So Shoah is the Hebrew word used for the Holocaust. It's how the Holocaust was referred to in Hebrew. And the Shoah Foundation was started by Steven Spielberg after the movie Schindler's List, which came out in 1993. He started being approached by so many survivors that he said, "You know what, I want to try and get all their stories, and I want to try and teach teachers in the world using these stories from these survivors." So in '94 he set up an organization that ended up going to 56 different countries, setting up offices and beginning to interview as many survivors in as many languages as possible about their experiences during the Holocaust, and witnesses as well. So people in the military, et cetera, and not just Jewish survivors. There were so many victims: Jehovah's Witnesses, homosexual survivors, et cetera. Speaker 1 00:02:50 So we were trying to get as much of a picture as we could from witnesses and survivors of what happened during that time, and let them all tell their stories. So we started interviewing in '94, and we stopped interviewing for the initial collection and ended up with 52,000 interviews by 2000. We would go into the home of the survivor with a videographer and begin to videotape them, bring that back, digitize the material, and catalog it. In 1994 the web was just coming out, so we were all looking at it going, "Oh, great, we want to make all this video searchable, like each minute was its own web page." So by keywords, people's names, latitude and longitude, images, all that. So the catalogers all processed that. In fact, we had an average of about 60 catalogers sitting at our offices, processing the material in all the different native languages so that it would all be searchable.
Speaker 0 00:03:48 Most people today, I think, would think, wow, that sounds like a great job for AI. That's how it happened. Speaker 1 00:03:55 Yeah. It is a great job for AI. We can talk about some of the things that we've done with AI, but there are also things that people are really very good at that machines are not good at yet. So for instance, speech recognition, what's being said during the interviews: computers are really good if you're speaking your native language, but if you're speaking a second language, they really start to fall apart. And then there are other things that make them fall apart. Like when people get emotional, or as people get elderly, they start to fall apart. And in our collections, we had all those things. So there's only a certain amount you can do automatically when you start to interview different people, depending on what the situation is and what lens Speaker 0 00:04:34 I'm sure that the subject's dialect, you know, if they have any sort of a dialect, that figures into the speech recognition as well. So, uh, Speaker 1 00:04:43 Yeah, I mean, just think about English, you know, from the UK, Australia, the US. Well, it's the same for Russian. You've got Ukrainian, Georgian, Russian from Russia. All of these different languages have their own dialects and their own ways of people speaking, and they all pose their challenges for the automated world. But human beings are fantastic at listening to all of these, regardless of dialect or any of those things. And identifying, you know, if someone says "I'm hungry" and then someone says "I'm starving," identifying that those are really the same kind of. Speaker 0 00:05:17 So how did you come onto this path? I know that you started into computer programming very early on in life. So talk about that a little bit. How did you get into this field?
Speaker 1 00:05:30 I grew up... my dad's a professor at Dartmouth College, and Dartmouth had one of the very first timesharing systems, which was shared with the public high schools and public schools as well. And from the age of seven I started my first programming classes, which in the eighties was very nice, and it gave me a lot of opportunity, and I liked it a lot. And so I grew up on a computer, programming and working in a distributed fashion. And then as I got older, I ended up going to the University of Michigan, where I got my degree in computer engineering. Then after that, I went back to Hanover, New Hampshire, where Dartmouth is, and there's a facility there called the Cold Regions Research and Engineering Laboratory. And I worked for the military for a few years before I got a call from the Shoah Foundation, from Ed Hunter. It sounded very exciting to a 24-year-old, which is what I was when I first got that call. And it didn't take much for me to pack my bags and head. Speaker 2 00:06:31 That's fantastic. So, um, how did you get started on something like this? Speaker 1 00:06:36 So, actually, it was interesting. When I got interviewed, they asked me to write a paper on how I would solve some of these problems. In a nutshell, we were collecting a lot of data. We were having to store and preserve that data, then catalog it, then provide access to it. And my background was in geographic information systems, databases for maps. We take for granted Google Maps and all those things now, and the ability to interact with maps, but there was a lot of software behind that. And that's what I spent a lot of time on when I was at the Army Corps of Engineers. I also spent a lot of time in school studying digital libraries, which are the computer systems behind storing multimedia content.
But all these systems have a pattern: once you've collected it, you need to create basically a set algebra over it, so that you can start putting sets of things together and ask questions of it, like it's a database, and gather together things that are alike so that you can present them to people in the way that they want to be able to see the information. And so basically I took everything I learned about geographic information systems and applied it to the video database that we were creating for the Shoah Foundation. So actually, anyone who looked at the first versions of the Shoah Foundation's database would say, "Why does this look like a mapping database?" That's great. And the reason is because, Speaker 2 00:07:57 And that's probably part of some of the successes that you found right away in terms of being able to find things in that system. Speaker 1 00:08:05 Absolutely. The idea of how you group together things that are not text-based. You know, in a map: streams, where's the rainfall data, where are things going to be moving, how do you move through them? The features and types and functions that you find in maps are very similar to how you might want to move through video, where someone's talking about a specific concentration camp and what it was like to be hungry in that camp. And what are the kinds of things that happen in those areas? And how would you bring together all the moments of video if you're not interested in just a testimony, but you're interested in what everyone's talking about on a certain topic and the things around that topic? And that was what, Speaker 2 00:08:44 Right. So somebody might have been talking about a particular subject, like hiding food, Speaker 1 00:08:51 Food, and hiding in the altercation, or you'd want to be able to find very quickly all of the stories that had subjects talking about those topics, and then be able to get to that content very quickly.
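To make the "set algebra" idea above concrete, here is a minimal sketch of how per-minute video segments might be indexed by keyword and then combined with set operations. The catalog entries, interview IDs, keywords, and function names are all hypothetical illustrations; the transcript does not describe the foundation's actual schema or code.

```python
from collections import defaultdict

# Hypothetical catalog: each (interview_id, minute) segment carries the
# keywords a human cataloger assigned to it. All values are illustrative.
catalog = {
    ("interview-001", 12): {"hunger", "camp"},
    ("interview-001", 13): {"hiding food", "camp"},
    ("interview-002", 40): {"hunger", "hiding food"},
    ("interview-003", 5): {"liberation"},
}


def build_index(catalog):
    """Invert the catalog: keyword -> set of (interview_id, minute) segments."""
    index = defaultdict(set)
    for segment, keywords in catalog.items():
        for keyword in keywords:
            index[keyword].add(segment)
    return index


index = build_index(catalog)

# Set algebra over segments, independent of which testimony they come from.
both = index["hunger"] & index["hiding food"]     # intersection
either = index["hunger"] | index["hiding food"]   # union
camp_not_hunger = index["camp"] - index["hunger"] # difference
```

Because the unit of retrieval is the segment rather than the whole interview, a query like `both` can pull matching minutes from many testimonies at once, which is the kind of cross-testimony grouping described in the conversation.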
It sounds like, as you mentioned earlier, almost every minute of every video is like its own web page. So you're probably able to index that stuff very painstakingly, index all of that data, and then be able to get to it very quickly. I'm sure people could see the value in that very quickly. Absolutely. And you can do that publicly now. You can see the results. We have a website, vhaonline.usc.edu, that you can go to, and you can move through the metadata of the archive. And we actually put 4,000 of the 120,000 interviews up there for people to really move through and get a good idea of how one would interact in a research sense with the archive. Speaker 1 00:09:48 So, coming back to the Shoah Foundation. The Shoah Foundation started, and the collection went through 2000. We began cataloging the content in earnest around 1998, and then that went through 2005. And then in 2006 we said, okay, we've done the collection. We've been cataloging it. We want to teach with it. You know, we were sort of transforming ourselves into an educational group, but we were located near Steven Spielberg's offices on the lot at Universal, and that wasn't the best place to be able to build educational programming. So we were moved to a university, and Steven Spielberg gifted the whole nonprofit to the University of Southern California. And we were set up as a department at USC in the Dornsife College. And this allows us to interact with all kinds of researchers throughout the university, as well as other universities. Because now we are both a part of an academic school and able to build academic services and programming, sort of like an archive or a library within that school, and serve educators, students, and researchers with the content from the archive, for what it was meant to do.
Speaker 1 00:11:10 What was it like moving to USC? The move to USC was amazing. See, our focus and the systems that we built were all around collection before we went to USC, and we were building the cataloging and all these other pieces, but our reach was a little bit different. Universities are more focused on how you teach it versus production, which is what, you know, Hollywood's focused on, although they are focused on distribution, but it's a little bit different. So the distribution for education, and becoming a part of an environment that is focused on that, was amazing. We had access to high-performance supercomputers that we didn't have access to before. We had access to all kinds of researchers who were looking at the archive in ways that were different than we were used to. And it just helped us grow tremendously. High performance computer. Speaker 1 00:12:07 You spoke to something that we've actually discussed recently on The Workflow Show, which is a difference in focus with an educational institution on something like preservation and education. Those are two big pillars of what would be driving a university to invest in something: to educate and preserve, as opposed to, you know, like you said, produce. We talked to Emily Halevy from Preserve South a few episodes back, and she mentioned that as well, that universities have a different sort of motive in terms of what they're looking to preserve. Absolutely. So at universities, there's sort of this concept that's bandied about. The word at USC is the supertext. The idea is, like with Shakespeare or with some religious texts, like the Torah or the Bible or the Quran, they've been around for hundreds, if not thousands, of years, and that content has made it through time and tap.
Speaker 1 00:13:04 And so if you have content that you believe deserves a chance to be a supertext, as we do with the archive and the lessons that come from it, you want to be able to give it a chance to make it through time. But the issue around moving images is that the oldest one's 140 years old, versus reading and writing, which goes back 5,000 years. So the context for how something makes it through thousands of years, the way some of the other texts that I mentioned have, is very different when you've only had 140 years of experience with moving images. Now, we're not storing these moving images in stone, on stone walls or anything like that. These are media that deteriorate over time. And the issue is everything rots, right? So the conservative numbers we use: film, you get 50 years. You can get more, but the conservative numbers we use are 50 years for film, 20 years for videotape, five years for hard drives, three years for barium ferrite tape like LTO, a couple of years for optical. It's almost like the newer the tech, the faster it rots, which is wonderful for the tech companies that want to sell storage. Speaker 1 00:14:14 But if you want to keep your stuff around for decades, if not hundreds of years, you have to put a whole bunch of infrastructure behind making sure your bits don't go away. Speaker 1 00:14:27 So Sam, it's interesting, you mentioned three years for LTO. Certainly in the industry they talk about it being much longer lived. And your experience? Sure. It's mass manufacturing. We have tens of thousands of tapes that we take care of. And every year we find, depending on how it's going, and we swap tapes out every three years, but even with swapping them out every three years, we'll find somewhere between a dozen and 60 tapes a year that are just bad, right? Or they go bad after we write the data. Because what we do is, as soon as we put data in, every six months at least, we check the files and make sure that they're okay.
And that none of the bits have been lost. We use various hash functions like SHA-1 to be able to do that. And we find errors on tapes. And the manufacturers say, if you find a bad tape, we're happy to replace the tape, but that doesn't really help you very much if your data is gone. So we keep multiple copies, we track the media, and we're constantly refreshing media. In fact, what we do on preservation is every six months we check every piece of media, and we aim for every three years to replace every piece of media. Speaker 1 00:15:41 Nope. Speaker 0 00:15:42 You had mentioned SHA-1 there. Something that we do on the show typically is, when we bandy about things that the listeners might not have heard before, we try and define them. So SHA is the Secure Hash Algorithm, right? Speaker 1 00:15:58 Where you use that set of ones and zeros, and computer files are made out of ones and zeros, you pass it through this algorithm and it gives you a number, a key, so to speak. And depending on which one, it's shorter or longer. But what you know is that if you ever pass that file through that algorithm and the key changes, something has changed: a one has changed to a zero somewhere, or a zero has changed to a one somewhere, that's not supposed to. And it's most likely because the media is starting to have trouble. That is bit rot, and checking for bit rot and fixing it is called fixity. Right? Speaker 0 00:16:36 Yeah. Fixity is something that I think maybe some of our more technical listeners are familiar with, but that is a term that I only in the last few years had become familiar with. But yes, it is the process of checking what you have recorded on a medium to make sure it is what you put on the medium, right? Speaker 1 00:16:54 Yeah. So what we do is, every piece of content that we get, we digitize, we put it into the highest form that we can, and we get all the original bits off of whatever the media is.
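The fixity check described above can be sketched in a few lines. This is a minimal illustration using SHA-1 from Python's standard `hashlib` module, since SHA-1 is the algorithm named in the conversation; the function names, the chunk size, and the idea of comparing against a stored hex digest are assumptions for the example, not a description of the foundation's actual tooling.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-1 and return its hex digest (the 'key')."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large video files need not fit in memory.
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def fixity_ok(path, recorded_digest):
    """Return True if the file still hashes to the digest recorded at ingest."""
    return sha1_of_file(path) == recorded_digest
```

In use, the digest would be computed once when the file is written, stored alongside the file's catalog record, and recomputed on a schedule (every six months, in the workflow described here). A single flipped bit changes the digest, so a mismatch signals that the copy on that piece of media can no longer be trusted.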
We store that in there. And then what we do is we add this hash function, and there are multiple ones you could put on there, and we check it. And again, we check every six months that nothing has happened to the file on the piece of media that's there. And if we find it, then we do something about it. And what we do, actually, is we have multiple copies in different places around the world. So what you want for preservation is geographical diversity and organizational diversity. So if you can, keep one at a nonprofit, one at a commercial group, one at a governmental group, for copies you want. And what you want to do is keep those in sync. And as you're checking each, make sure that it's updated if any one of them is having a problem. Speaker 0 00:17:53 Yeah. And that that media is replaced. So it sounds like a lot of this preservation effort, when you talk about the geographies and then the different organization types, is all sort of planning around worst-case scenarios. I mean, right? Or at least, as best we can. Speaker 1 00:18:11 Well, I mean, we live in a time when we get really unexpected disasters, right? And so what you want to do is diversify, sort of like you do with your financial portfolio. You want to give content a chance to make it through time. And there's actually a continuum of different types of preservation that you can look at. So for instance, we're spending time looking at blockchain as a technology where, instead of storing in data centers with tape robots, we start to use the internet, leverage the internet, and highly distribute the content as it's stored, using this wonderful technology that people started off with, you know, Bitcoins, and being able to financially manipulate and use those. But the fact is that those have been so secure over time, and it's been able to stay there as a currency.
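As a rough sketch of the multi-copy approach described above (several copies held by different kinds of organizations, each checked against the recorded hash and refreshed from a good copy when one fails), the following uses an in-memory dictionary to stand in for the storage sites. The site names, the function names, and the repair policy are hypothetical; a real system would work with files, tapes, and transfer jobs rather than byte strings.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-1 hex digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def audit_and_repair(copies: dict, recorded: str) -> list:
    """Check every copy against the recorded digest and overwrite bad copies
    from an intact one. Returns the list of sites that were repaired."""
    good_sites = [site for site, data in copies.items() if digest(data) == recorded]
    if not good_sites:
        # With no intact copy anywhere, nothing can be repaired automatically.
        raise RuntimeError("no intact copy remains")
    source = copies[good_sites[0]]
    repaired = []
    for site, data in list(copies.items()):
        if digest(data) != recorded:
            copies[site] = source
            repaired.append(site)
    return repaired
```

The point of organizational and geographic diversity is that the failure modes of the copies are unlikely to be correlated, so at least one entry in `good_sites` should survive any single disaster.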
Now we're starting to replace the Bitcoin with testimonies and use that same infrastructure to make sure that the content stays there over time. Speaker 1 00:19:09 And there it's wonderful, because it's not just one organization, it's the internet. As long as nobody controls more than 51% of the internet, your content is safe as you distribute things over these blockchain networks. And so that's really been an amazing area. It's not quite ready, but it's something we're doing research on. I'm holding up my left hand, saying that's on the very left: the internet. The preservation stuff in the middle is what we're doing today, where we're using data centers and putting them all over the place and having them sync up together. And then on the far right side, what you have is you try and find material that will actually last through time, that you trust. And there are a number of different technologies out there. And the things you look for in those technologies are not just that you can write something, but how simple is it to get the data off? Speaker 1 00:19:57 You want to be able to use light. Like, film is wonderful, right? You write the data on there, you shine a flashlight behind the film, and you can get the data off the film very easily. How easy can you make that for the many petabytes of data? Our database for the Shoah Foundation is 25 petabytes. How easy is it to be able to retrieve that information? How fast can you retrieve that information? So sort of the holy grail of the stuff on the right is something that's completely clear but very strong, like some super strong glass that you could shine a light through. And it would teach someone in the future how to read the content that they're getting off, but you wouldn't need anything more than a microscope to be able to read the bits of information.
Because now, if you use something like M-DISC, which is basically stone in the format of a Blu-ray, where they have a hot laser that writes onto it, that can last a long time. It's minerals. It's a piece of rock. But you're always going to have to have that laser to be able to read it and use it. How long will those lasers last, and the technology to be able to build them? You want to worry about something that's going to sit in a closet or an archive or wherever for hundreds, if not thousands, of years, and that it's going to be easy to figure out how to read it later. Speaker 0 00:21:20 Right? It reminds me of the golden records they sent out with the Voyager spacecraft, right? And there were JPEG images encoded on those records, and then the pictograms essentially telling any aliens that might find our intrepid spacecraft how to decode those images. Something you mentioned a few minutes ago, to rewind, is blockchain. It's not something that we have covered a lot here on the show. Again, I think because it's kind of nascent in the data storage space that we're talking about here, and it had been used, you know, for transmitting currency. But can you just give us a high-level overview of what blockchain is? Because I know a lot of people have heard of it. Speaker 1 00:22:04 Oh, sure. So I need to give a shout-out to Stanford Engineering, who we've been working with on these projects. But basically what it is, is a ledger that is highly distributed across the internet, that points to the different sets of content. And it's not a full file that's stored on anyone's device, but pieces of the files are stored all over the place. And the blockchain tells you how to pull those pieces of information back together again, and keeps many copies of it, and tells you whether you're getting a good version back as you're starting to read it.
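The description above (pieces of a file stored in many places, with a ledger that says how to reassemble them and whether each piece is still good) can be illustrated with a small content-addressed manifest. To be clear, this sketch is not a blockchain and is not the system built with Stanford: there is no consensus, no network, and no chain of blocks. It only shows the chunk-and-verify idea, and every name in it (`split_into_chunks`, `make_manifest`, `reassemble`, the dictionary used as a store) is a hypothetical stand-in. SHA-256 is used here as an arbitrary choice of hash.

```python
import hashlib


def split_into_chunks(data: bytes, size: int) -> list:
    """Cut a byte string into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_manifest(chunks: list) -> list:
    """The 'ledger' entry for one file: the ordered list of chunk hashes."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def reassemble(manifest: list, store: dict) -> bytes:
    """Fetch each chunk by its hash, verify it, and join the pieces in order."""
    parts = []
    for chunk_hash in manifest:
        chunk = store[chunk_hash]
        if hashlib.sha256(chunk).hexdigest() != chunk_hash:
            raise ValueError("chunk failed verification: " + chunk_hash)
        parts.append(chunk)
    return b"".join(parts)
```

Because each chunk is addressed by its own hash, a reader can tell immediately whether a piece fetched from some untrusted location is the piece the manifest refers to, which is the "tells you whether you're getting a good version back" property mentioned in the conversation.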
And the highly distributed and secure nature of it, it turns out, is that the content itself isn't necessarily secure. Everyone can read it, but it's really hard to change if you don't have the right permissions to change it, if you don't have the keys to be interested. Speaker 1 00:22:54 And so it's actually been very interesting for us, because people are worried about deepfakes, all kinds of stuff. And what we're looking at using blockchain for is not only the ability to store content on the internet in a distributed way, but also, as we take it... we talked about the SHA-1s. It's almost like a digital fingerprint, but you can put all kinds of information in that fingerprint. For instance, on your phone, you know the latitude and longitude, you can know barometric pressure, you can know temperature, all kinds of things. And what we can do is use it as a stamp that says, we have taken this image, this video, this picture at this time, and it has entered the blockchain with all this information to prove that it's here and it hasn't been changed since it was taken at this place, at this time, under these conditions. Understood. Okay, Speaker 0 00:23:39 Understood. So it's also about making sure that it has not changed in any way, whether it was intentional or, you know, just something that happened as a result of bit rot. Speaker 1 00:23:50 Yeah. We have an umbrella name for the technology we're working on as a group. It's called Starling. I don't know if you know those birds in Europe that are all sort of separate, but then they form these large, wonderful shapes. That's sort of the idea. Let's shift the discussion a little Speaker 0 00:24:05 To some of the different access systems that you have at the Shoah Foundation. Why don't we start with the Visual History Archive? Talk a little bit about that. Speaker 1 00:24:14 Absolutely.
If I could just back up on access systems for a second, we have sort of four different kinds of users that we separate our access systems into. So for instance, we have teachers and educators and their students, right? That's one group of user interfaces that we build, for people who are trying to learn about empathy through testimony. We have a whole program called IWitness around that. Then we have scholars, who may use the teaching systems of IWitness but would also use what's called the Visual History Archive, which is meant to allow people to do deep research and use our catalog and really bring together all kinds of different information around the testimony. And that's what someone who's a scholar at a university would most likely use, the Visual History Archive. Then we have systems that are meant for organizations and museums, like the United Nations or the US Holocaust Museum or Yad Vashem, or one of those organizations. Speaker 1 00:25:08 There's something called Dimensions in Testimony, where we put AI behind some of the testimony. So you can actually have someone ask questions of a survivor and get responses and have a voice-to-voice communication, a conversational AI communication, with one of the survivors, where you ask what you're interested in and the testimony actually communicates back with you the way that you want to receive it. And that's really meant for organizations, so to speak. And then we have communities of folks. So for instance, we have probably close to a million people who are related to people in the archive, and there we have all kinds of efforts that we're starting to move towards, allowing people in the home who have an affinity to the archive to begin to interact with it. Our first toe dip into this is we just started a partnership with Ancestry, but we want to expand how we begin to work with people in the home who have a real affinity to the survivors and the testimonies within the archive.
Speaker 1 00:26:04 So you'd mentioned the Visual History Archive, which is focused on scholars and researchers. And so what that is, is a system where you're able to search using latitude and longitude, or images, or there are 65,000 keyword topics through all 14 of the genocides that we've collected testimony on. We haven't just focused on the Holocaust. We expanded when we came to USC, past the Holocaust to a whole number of other genocides. There's an unfortunate plethora of this kind of hate that's happened and is happening in the world. And so we do our best to collect content that people have collected in the past and help them preserve it and provide access to it. So for instance, we brought in a bunch of interviews on the Armenian genocide that are in our collection, which happened much earlier than the Holocaust. But we now have well over a thousand hours of material on that. We also are collecting on current conflicts. So for instance, we have testimony on the Yazidis, which is a group of Kurds that have been attacked and slaughtered by ISIS and others in the Middle East. And so we're growing the archive to bring context to what happens when tribalism and hate go completely unfettered. Speaker 0 00:27:25 So yeah, you mentioned that the focus of the Visual History Archive is on researchers and scholars. So I would imagine that affects how that system is structured and how that data is searchable. And I would imagine there's a lot of focus on search, not only by a general keyword search like you would see in Google, but some very, you know, advanced methods of searching in that system. Speaker 1 00:27:48 Yeah, it's sort of the opposite of the education systems. We don't start with the story per se in the research systems. We start with the topics and how one may want to move through 120,000 hours of material and search through all of that content.
And so the starting point is research search engines of different kinds that allow you to bring pieces of testimony from different testimonies together. You can watch an entire testimony, but it's almost the last thing you do. The first thing you do is get presented the testimony in all kinds of different ways where you can create your own sets of it. Okay. Speaker 0 00:28:24 I watched a video on YouTube, which we'll put a link to in our episode notes here. And I know this was taken some time ago. It was almost eight years ago. But you gave in that video a few demos of these systems as they were back at that time. And even as they were back at that time, I was very impressed to see even just some of the content that's there. It's very touching. I believe it was in the IWitness system. You download a video teaching about the basics of editing. And of course me, I'm sitting here and I'm like, okay, the basics of editing, we're probably going to cover, like, what a three-point edit is. And it was really more about the concepts of editing, not necessarily how to do it, but overall concepts. Speaker 0 00:29:04 And then I got a very strong feeling that ethics was also a big part of it. So one of the senses that I got just from the content was teaching younger people how to consider ethics as they use technology, to consider what is a good ethical use of the technology. And then another term that I had heard a lot was digital citizenship, being a good digital citizen. So that whole idea of, when you post something online, making sure that you understand that you would post it in the way that you would if you were doing it in person. So that seems to be a very strong theme in the overall mission of the foundation and of the systems there. Speaker 1 00:29:45 Sure, absolutely.
So structurally, we've just moved from what the researchers were looking at with the Visual History Archive to what educators use to teach their students with IWitness. And a part of that is there are a number of different activities or classes or lessons. We have a learning management system that we've built that teachers can go into. They can either use prepackaged lessons or they can build their own lesson for the students within there. And the students can put out all kinds of different output, from just Word documents that they type up as a result of their work, to answering questions, to actually creating and authoring their own video using the system, to even keyword clouds as they do searching. And so there, what we're doing is we're trying to build the scaffolding that we expect researchers to be able to pick up very quickly, you know, how you search on the internet. Speaker 1 00:30:38 What does it mean to be able to search? How do you edit? Well, for researchers, we don't care about the editing, but for students, we then put all the scaffolding in place for how you take pieces of content and present your ideas and your concepts. And so we have things like the IWitness Video Challenge, where we have students from around the country competing to take content and upload it and mix it with testimony from their local community about something that's meaningful to them. And so these teachers in these classes are producing this kind of content, and then we're rewarding the best of it later on. But being able to put programs in place that really give the digital literacy, as well as the education that we're trying to do about tolerance and empathy and what it takes to be a good person in this world, and mixing those together, is what all of our educational learning management system platform is about. And that's called Speaker 0 00:31:36 That's fantastic.
It seems like we could all use a little bit more of that in today's world. Speaker 1 00:31:44 It's terrible, but hate's a growth industry. Speaker 0 00:31:46 Yes. Yeah, for sure. That's a good way to put it. Sam, let's move on next to some of the Dimensions in Testimony pieces and some of the museums. I watched a 60 Minutes special where Lesley Stahl interviewed a survivor who was already deceased. And she asked him all kinds of questions, including some questions about the weather and things like that, to which he said that this is something he can't answer. That was just amazing to watch. How does all that work? Again, we'll post a link to this special in our episode notes, but Sam, talk a little bit about how that works. Speaker 1 00:32:25 Right. So, going back to the beginning ideas of it, the thing that makes people hate is a lack of humanization, when they can dehumanize another group of people in whatever way. And it happens from all sides. It just seems to be a mechanism of human beings. When you can consider them less than a human, or an animal or something, not someone that has a name and a life and a family and can love, then it's easy to kill them or destroy them or hurt them. And so one of the things that we do with testimony is, by putting a human face on the content, not just a transcript, you're trying to pass on more humanization of what's happening with that story. But we know that when survivors go into classrooms, or they go to museums and they speak, it has so much more impact on students and others that are trying to learn than any other material that we can do. Speaker 1 00:33:20 Just sitting there with someone and having a conversation, it's the ultimate humanizing event. But unfortunately people get older, just like our bits; people rot, or whatever. 
I hear a lot of jokes about that. But the survivors, you know, the ones that were 10 years old during that are now 85, and the ones that were 20 years old are now 95. And from the Holocaust and all these different things, people start to pass. And the question is, how close to the most humanizing version of what they do in the classroom can we get? Right? We didn't know. So we started a research project where we started to take technologies around conversational AI. And there's all kinds of these around now that you could find from Google and IBM Watson and these other things. What if we mix that with testimony, and what do we have to do to change the technology? Speaker 1 00:34:13 What do we have to do to present the technology, to be able to tell these stories with an AI behind it in a way that makes someone feel as close as possible to the experience that they're getting when they're talking to a survivor in the classroom? And so we started recording this, and we started down two technology paths. One is the interactive aspect, the AI aspect. The other is we started doing things in 3D as well, holographic capture: not just the front-facing 2D capture, but full 3D volumetric capture. And our sort of man-on-the-moon version of this, which we haven't achieved yet, would be that someone would be able to sit in the classroom. They would see a holographic version of a survivor there, sort of the Star Wars Princess Leia thing. And they would have the conversation back and forth with the AI version of that person. And it would bring as much of a humanistic effect to their story as possible, to help kids, or whoever is listening, learn how to humanize people as much as possible, because that's really what the survivor testimonies are in the end: they're teaching mechanisms for people to learn how to humanize. Speaker 2 00:35:23 An amazingly noble cause. This is fantastic. Yep. So, digging into that tech a little bit. 
So you guys are using speech to text so that somebody can talk to what I imagine is a computer with a large database and thousands of video clips that have been here? Speaker 1 00:35:47 Correct, exactly. Specific responses, captured in these volumetric capture rigs, or sometimes in 2D if we can't get them in front of a volumetric rig. This rig is really interesting. It's like a big bubble. It looks like something out of sci-fi, right? Well, the very first ones were, because we started this about eight years ago, but now we have a remote version that we can send around. In fact, our newest tech is really about how, in these COVID times, we send out the simplest rig we can. So for someone who's in the home, we could send a masked engineer to set it up in the home. And then we could do a Zoom interview with the survivors as we start to do either the interactive or the regular, you know, classic 2D oral testimony with them. And so that's been a lot of what we're doing, and the blockchain stuff coming in at this time right now has even helped with that. Speaker 1 00:36:43 But we're pushing very hard on how we start sending remote rigs around to be able to collect testimony from people in a pandemic. And it becomes even more important because the folks that we want to interview are at an age where this pandemic is killing them faster than others. And there's obviously a lot of stress around that, too, someone coming into your home, even masked. I know this is something that a lot of our listeners right now can relate to. A lot of our listeners are in the media production industry and having to think completely differently about how they even capture their footage and do their craft. Yeah. It's not a new field, but there's a sense of urgency around this kind of remote ability to capture high quality and also store it. Yeah. 
You mentioned the blockchain could not be coming at a better time. I mean, any kind of storage that we're talking about these days is cloud-first technology; everybody's trying to do their work from wherever they are. Yes. Speaker 2 00:37:52 The one thing: so you guys are bringing this Dimensions project into museums, where people can go have experiences where they're actually interacting with these people, right? And that's version one, of course. It's not Star Wars yet, but it will be. How are those data sets distributed? Are they on a CDN? Is it going back, or are there small data sets that are essentially just saying Speaker 1 00:38:19 <inaudible> in AI services, which is just a text interaction, if you can store the video locally in all of these places and build a really smooth experience. Yeah. That's great. Speaker 2 00:38:39 Wow. Yeah. I mean, it just looks astounding. I look forward to being able to go and see one of these for myself once everything settles down again, for sure. The one thing about the mission that the Shoah Foundation has undertaken, the one thing that really just lights my heart on fire about the whole thing, is that these are people whose lives someone tried to end, and you guys are preserving their lives in some way. Speaker 1 00:39:05 Lovely. And for all time, the ultimate, which is just terrible human beings, Speaker 2 00:39:12 You know, that's a fantastic way of putting it, and I'm glad you said it. Yes. So, this being the workflow show, we should definitely talk about your ingest process. How are you digitizing things? How are you cataloging things? And especially about the mass migration process from platform to platform. When I heard that Dartmouth speech, that really caught my attention, because it's something that I don't think a lot of people. 
Speaker 1 00:39:44 Yeah. So there's two ways content is birthed for us: either we shoot it new, or we reach out to someone who's collected in the past and help them preserve it. We have a program called Preserving the Legacy, where there's been thousands and thousands of interviews from various genocides that others have taken, and we want to help them preserve, catalog and provide access to that material as well. So if we shoot it and it comes in today, it's already digital. In the past it was on tape; today it's already digital and goes straight into the systems. We make many different copies: there's the top-level archival format, but then there's a mezzanine format for editing and things you'd want to do on TV. Then there's a number of different internet formats, depending on what we're trying to do on the internet with the content. Speaker 1 00:40:33 And then it's copied to multiple places. All of the digital fingerprinting is done so that we're able to begin tracking the health of it. And the metadata starts getting entered for all of this content as it comes in. We collect a lot of the production metadata about the health of the content. And we have a very significant QA system, because with this Preserving the Legacy program, not only do we get new content, but we get everybody's stuff where they may have had tapes sitting in archives for decades, and they'll have mold and other kinds of things in them. So my head of production has to bake tapes all the time. I call him Chef Boyardee. So we're constantly saving and preserving content to be able to put it in there. At the same time, we're collecting new content, getting it digitally fingerprinted, putting it into the preservation systems, then getting it cataloged so that it is all searchable. 
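Editor's note: the digital fingerprinting described above, where a fingerprint is recorded at ingest and later used to track the health of each copy, can be illustrated with a minimal sketch. This is not the foundation's actual tooling. It assumes SHA-256 checksums as the fingerprint, and the function names `fingerprint` and `verify_copies` are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Assumption: SHA-256 stands in for whatever fingerprint an archive uses.
    Chunked reads keep memory use flat even for very large video files.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(expected: str, copies: list[Path]) -> dict[str, bool]:
    """Compare each stored copy against the fingerprint recorded at ingest.

    Returns a mapping of copy path to True (still matches) or False (changed).
    """
    return {str(copy): fingerprint(copy) == expected for copy in copies}
```

In a sketch like this, the fingerprint would be computed once when the file arrives and stored with its metadata; rerunning `verify_copies` on a schedule against every storage location is one simple way to notice a copy that has silently changed.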
Speaker 1 00:41:30 Then putting it into the top layer for the four different types of folks that we service: the communities of people, the organizations, the educators, or the scholars. Getting all of that metadata and all of that content migrated to the appropriate content distribution networks, which is very similar to what you see in the movie world, where they're trying to get it out to iTunes and Netflix; we're trying to get it to Kaltura and other kinds of infrastructures so that people can watch the content from an educational perspective. But then putting it into learning management systems, so that people are going to be able to build programmatic content, again, to help people humanize other people, is our top-level goal. But there are people who are studying it from a history perspective, a business perspective, a law perspective, all kinds of different perspectives, even just language. We have 44 languages in the archive. And so this stack that goes from collection to digitization, to restoration, to preservation, to cataloging, to access, to programmatic learning, that's what we bring the whole information through in terms of our workflow. Speaker 2 00:42:43 Yeah. Yeah. That's great. Wow. So can we dig in a little bit about some of the video Speaker 1 00:42:56 Preservation is standardized on JPEG 2000, just to get nerdy, lossless. For mezzanine, everything can be lossy, but for the preservation versions we want totally lossless. What can I say about JPEG 2000? It's a lovely standard that allows us to capture all the bits from all the different types of media and make a whole entire file. If somebody comes in on P2 or some other kind of camera, we'll keep the original files. That is one version, but then we will make the JPEG 2000 version of it. And that becomes the piece that we consider our preservation copy, the one that we really dig into. 
And we'll make sure that the bits make it through hundreds, if not thousands, of years. MXF. So we started with the SAMMA robot. I spent some time at the Library of Congress back in 2005, 2006, as they were putting together the Culpeper facility, helping with that. Speaker 1 00:43:55 But I said, you know, as I help you with this, I want to be able to take it back to apply to what we're doing at the Shoah Foundation. And so we were one of the first commercial customers of the SAMMA robots. We still run them. Unfortunately, those robots got sold to a couple of different companies: they got sold to Front Porch, and then Front Porch sold to Oracle, and Oracle gave up on it and gave it to GrayMeta, and GrayMeta is using that product and making a new version of it. But we still have a lot of the original SAMMA robots making the <inaudible> and stuff. And there's a gentleman, Jim Lindner, I should give a shout-out to, who invented that. He did a wonderful job, because it allows us to process, you know, tens of thousands of tapes a month. And that's pretty amazing. Speaker 1 00:44:39 Wow. Yeah. You had mentioned the, I was just going to ask you, how does that play in? There are different things that we use high-performance computing for. The first is, let's say that JPEG 3000 comes out: instead of going back to the digitization room and re-digitizing, anything that I could parallelize on the supercomputer, I'd be able to re-transcode the content from one format to the other. And basically what the supercomputer lets me do is treat petabytes of data like they're megabytes or gigabytes. But then it allows us to do things like image rec. 
Anytime we want to do a full pass over the archive with image recognition or any other kind of tech that someone may come up with, or even for digital restoration, we're now able to go very quickly over all the content, whereas it might've taken a while otherwise just because of the sheer amount of data. But basically for libraries in general, of which, um, so one of the things we haven't talked about is, when we came to USC, we started merging with the libraries, and I'm in charge of the IT of the libraries at USC as well. Speaker 1 00:45:49 But one of the things that's happening is special collections, which is becoming an important part of libraries. To be able to deal with all this digital information, you need high-performance computing for people to be able to process, move through and access tons of content, which is what libraries are for. So pairing libraries with HPC, or ARC, advanced research computing, as it's called now, is an important area. And it's counterintuitive too, right? I mean, it seems like a library should just be full of data tapes. It shouldn't be full of computers. It's just that there's so much content there that you need help to get through all of it. My dean in the library says, and I love this line, she goes: look, Google will be the place that people go to search for bits, but if they need the curated and researched version of those bits of information, they should be at the universities, saved, preserved, cataloged, Speaker 1 00:46:46 and made accessible. The data, exactly, can disappear on the internet otherwise, if people aren't threatened. Right, there should be some sort of endowment set up to make sure that data lives on, rather than what's happening, which is based on the research interests of academic institutions and how they want to spend their research dollars on their archives and their digital archives. Gotcha. 
Let's talk a little bit about innovation in the industry of media production and preservation. What are some things that you have seen that you have considered to be really innovative, in terms of some of the projects you've worked on, Sam? I mean, it's part of the thing; I've now been at the Shoah Foundation for 26 years, but one of the things that keeps me young is the constant surprise as someone comes in and shows me something and I go, oh my gosh, I can't believe you can do that. Speaker 1 00:47:42 It used to take us forever to do whatever it is. And it happens in every one of these different areas that I talked about in the workflow that we're doing, from collection, new things that people are starting to do with cameras, to the blockchain work that we were talking about and the ability to treat the internet like a giant storage facility, and then access, and the things that you're able to do just on websites now in terms of interactivity with kids. But the thing that kills me is watching the digital divide right now. I saw a picture of these kids doing their homework at Taco Bell because they couldn't get Wi-Fi in Northern California, and it killed me looking at that. And so there's all of these really edgy technologies, but the thing that I would really love to see us do is work hard to make sure that the plumbing is there, so that the utilities are there for everyone to be able to get access to these materials, because there's all kinds of interesting bleeding-edge technologies that are being applied all over the place. Speaker 1 00:48:42 But if we can't get access to all of them to these kids, then we're not going to be raising the next generation that will be the things we want them to be, which in our case is citizens. So I'm sorry, I went off on a different soapbox. Speaker 2 00:49:00 Okay. That was a great answer. That's okay. 
I was just thinking about the innovation part of it, and some of the work you guys have been doing with volumetric capture, and that idea of being able to talk to a hologram. I mean, that's extremely innovative, and I know that's some really futuristic stuff that probably a lot of people are working on. Is there anything, um, Speaker 1 00:49:29 Rush slow? It is an area that a lot of people are working on, but it's sort of like speech recognition, where it incrementally gets better, but I haven't seen anything that just sort of blows you away. The virtual reality stuff's amazing when you can put on goggles, but being able to see something in front of you without goggles is Speaker 2 00:49:53 Yeah, it's that whole thing of having the two eyeballs and the way our brains combine them and all that. Right. Speaker 1 00:50:06 Okay. When I came to the Shoah Foundation from the Army Corps of Engineers, the first thing I did at the Shoah Foundation was build the biggest system that I've built of any of these, and it doesn't exist anymore, which was the actual system to put a videographer, an interviewer and a survivor in the same room around the world. Right? And so, in terms of innovation, these things that we're doing with these remote cameras, everything was so much work to schedule and make 12,000 interviews a year occur. It's just amazing seeing all the different ways that people are now able to cooperate together with all the various packages and software systems that are online, and the way that we can actually begin to remotely capture history. And that's where the blockchain stuff comes in. 
As we start to put things in there: because at the time, you know, we were digitizing and putting things in various large data centers, but now we can go straight from the interview into these blockchain environments, as well as into our data centers, and make everything instantly preserved, and sort of allow people to store history over time in a way that we haven't been able to do in the past. Speaker 1 00:51:18 And then with cataloging, there's so much work to be done in terms of different languages and accessibility, and it's getting so much better over time, but there's still things that people are much better at than the computers. But now we're here on Zoom, and our ability to interact and produce and do things that would have taken a lot more effort before: we can now start to put people together in a way that allows them to produce, despite geographical differences, all kinds of amazing products. And for us, I think that would have sped up even faster the 12,000 interviews a year we were doing at the beginning, if we could have had the kind of interactivity we do today, which is helping us get through COVID. And then from a teaching perspective, the lessons of the Holocaust and the lessons of these genocides are so pertinent today. Speaker 1 00:52:08 I just remember sort of a dip in 2000, with people saying, oh, genocide, the Holocaust may not be as relevant anymore. But there's so much hate, and the tribalism is so bad and the splits are so bad, that with all of these lessons, we need people to listen, and we need them to listen when they're young, and if we can, we need them to listen when they're older, so that the hate that we are seeing and the separations and the lack of empathy for each other start to go away a bit. And for me as an engineer, I want to build things that matter. So having the ability to work on systems that do that, it's been an amazing career for me and continues to be. 
Speaker 2 00:52:57 Is there any media that's been generated from the archive as you've worked on it over the years that has really stuck out? Yeah. Cause it seems like it's such a, Speaker 1 00:53:10 One of the reasons I'm talking about how I work at the library and do some other things is that I had heard early on, and I continue to hear from professionals, that exposure to too much genocidal content is not good for you, especially over time. While it's very good for you in terms of learning to humanize, 26 years of it is not great. I think there are stories that I've heard that I don't think I'll ever forget, and the imagery that comes with those stories. There's one in particular, of a woman who was trying to take care of some kids who'd been orphaned, out in the woods. She went into town for some food, came back, and all the kids were slaughtered, and hearing her talk about that was unbelievable. You know, I have a 13-year-old daughter here. Speaker 1 00:54:03 So when I hear about these kinds of things that happened, what people are willing to do to kids, and that they're able to do them, not just to adults, is just mind-boggling. But, you know, there's uplifting stuff too. I mean, these are some of the toughest people. They spent their high school years in concentration camps, and it is amazing how strong human beings can be. They went from slavery to being successful people in the world, and they did it either through religion or belief or whatever it was that they could, but hearing how they survived is. Yeah. Speaker 2 00:54:40 Yeah, definitely. And in the worst of circumstances. I mean, not just survive but thrive. Yeah, exactly. Speaker 1 00:54:48 Later on in life, besides the nanny-nanny-boo-boo aspects of it. It's really very quick. Speaker 2 00:54:55 Thank you so much for joining us today. 
Sam Gustman, chief technology officer at the USC Shoah Foundation. Do you want to hear from a particular expert in the industry about a particular subject? Send us an email to workflow [email protected] or @ChesaPro on Twitter, and please subscribe to the workflow show so you know when you can expect to get more workflow therapy. And don't forget to share us with your friends. I'm Jason Whetstone, senior workflow engineer. And I'm Ben Kilburg, senior solutions architect. And Ben also records and edits the show. The workflow show is a production of Chesapeake Systems and More Banana Productions. Thanks for listening, and make it a great day.
