Software Sessions

Scott Hanselman on AI-assisted programming

Wed, 11 Mar 2026 14:40:54 -0000

Scott Hanselman is the the VP of Developer Community at Microsoft.

We discuss the ambiguity and non-determinism of agentic loops, expressing clear intent to steer models, the importance of knowing fundamentals, and why saying you vibed something means you don't care.

This is an extended version of an interview posted on Software Engineering Radio.

Transcript available here

Bryan Cantrill on Oxide Computer

Fri, 27 Feb 2026 05:28:12 -0000

Bryan Cantrill is the co-founder and CTO of Oxide Computer Company.

We discuss why the biggest cloud providers don't use off the shelf hardware, how scaling data centers at samsung's scale exposed problems with hard drive firmware, how the values of NodeJS are in conflict with robust systems, choosing Rust, and the benefits of Oxide Computer's rack scale approach.

This is an extended version of an interview posted on Software Engineering Radio.

Transcript available here

Elizabeth Figura on Wine and Proton

Wed, 24 Sep 2025 15:21:41 -0000

Elizabeth Figura is a Wine developer at Code Weavers.

We discuss how Wine and Proton make it possible to run Windows applications on other operating systems.

Transcript

You can help correct transcripts on GitHub.

Intro

[00:00:00] Jeremy: Today I am talking to Elizabeth Figuera. She's a wine developer at Code Weavers. And today we're gonna talk about what that is and, uh, all the work that goes into it.

[00:00:09] Elizabeth: Thank you Jeremy. I'm glad to be here.

What's Wine

[00:00:13] Jeremy: I think the first thing we should talk about is maybe saying what Wine is because I think a lot of people aren't familiar with the project.

[00:00:20] Elizabeth: So wine is a translation layer. in fact, I would say wine is a Windows emulator. That is what the name originally stood for. it re implements the entire windows. Or you say win 32 API. so that programs that make calls into the API, will then transfer that code to wine and and we allow that Windows programs to run on, things that are not windows. So Linux, Mac, os, other operating systems such as Solaris and BSD. it works not by emulating the CPU, but by re-implementing every API, basically from scratch and translating them to their equivalent or writing new code in case there is no, you know, equivalent.

System Calls

[00:01:06] Jeremy: I believe what you're doing is you're emulating system calls. Could you explain what those are and, and how that relates to the project?

[00:01:15] Elizabeth: Yeah. so system call in general can be used, referred to a call into the operating system, to execute some functionality that's built into the operating system. often it's used in the context of talking to the kernel windows applications actually tend to talk at a much higher level, because there's so much, so much high level functionality built into Windows. When you think about, as opposed to other operating systems that we basically, we end up end implementing much higher level behavior than you would on Linux.

[00:01:49] Jeremy: And can you give some examples of what some of those system calls would be and, I suppose how they may be higher level than some of the Linux ones.

[00:01:57] Elizabeth: Sure. So of course you have like low level calls like interacting with a file system, you know, created file and read and write and such. you also have, uh, high level APIs who interact with a sound driver.

[00:02:12] Elizabeth: There's, uh, one I was working on earlier today, called XAudio where you, actually, you know, build this bank of of sounds. It's meant to be, played in a game and then you can position them in various 3D space. And the, and the operating system in a sense will, take care of all of the math that goes into making that work.

[00:02:36] Elizabeth: That's all running on your computer and. And then it'll send that audio data to the sound card once it's transformed it. So it sounds like it's coming from a certain space. a lot of other things like, you know, parsing XML is another big one. That there's a lot of things. The, there, the, the, the space is honestly huge

[00:02:59] Jeremy: And yeah, I can sort of see how those might be things you might not expect to be done by the operating system. Like you gave the example of 3D audio and XML parsing and I think XML parsing in, in particular, you would've thought that that would be something that would be handled by the, the standard library of whatever language the person was writing their application as.

[00:03:22] Jeremy: So that's interesting that it's built into the os.

[00:03:25] Elizabeth: Yeah. Well, and languages like, see it's not, it isn't even part of the standard library. It's higher level than that. It's, you have specific libraries that are widespread but not. Codified in a standard, but in Windows you, in Windows, they are part of the operating system. And in fact, there's several different, XML parsers in the operating system. Microsoft likes to deprecate old APIs and make new ones that do the same thing very often.

[00:03:53] Jeremy: And something I've heard about Windows is that they're typically very reluctant to break backwards compatibility. So you say they're deprecated, but do they typically keep all of them still in there?

[00:04:04] Elizabeth: It all still It all still works.

[00:04:07] Jeremy: And that's all things that wine has to implement as well to make sure that the software works as well.

[00:04:14] Jeremy: Yeah.

[00:04:14] Elizabeth: Yeah. And, and we also, you know, need to make it work. we also need to implement those things to make old, programs work because there is, uh, a lot of demand, at least from, at least from people using wine for making, for getting some really old programs, working from the. Early nineties even.

What people run with Wine (Productivity, build systems, servers)

[00:04:36] Jeremy: And that's probably a good, thing to talk about in terms of what, what are the types of software that, that people are trying to run with wine, and what operating system are they typically using?

[00:04:46] Elizabeth: Oh, in terms of software, literally all kinds, any software you can imagine that runs on Windows, people will try to run it on wine. So we're talking games, office software productivity, software accounting. people will run, build systems on wine, build their, just run, uh, build their programs using, on visual studio, running on wine. people will run wine on servers, for example, like software as a service kind of things where you don't even know that it's running on wine. really super domain specific stuff. Like I've run astronomy, software, and wine. Design, computer assisted design, even hardware drivers can sometimes work unwind. There's a bit of a gray area.

How games are different

[00:05:29] Jeremy: Yeah, it's um, I think from. Maybe the general public, or at least from what I've seen, I think a lot of people's exposure to it is for playing games. is there something different about games versus all those other types of, productivity software and office software that, that makes supporting those different.

[00:05:53] Elizabeth: Um, there's some things about it that are different. Games of course have gotten a lot of publicity lately because there's been a huge push, largely from valve, but also some other companies to get. A lot of huge, wide range of games working well under wine. And that's really panned out in the, in a way, I think, I think we've largely succeeded.

[00:06:13] Elizabeth: We've made huge strides in the past several years. 5, 5, 10 years, I think. so when you talk about what makes games different, I think, one thing games tend to do is they have a very limited set of things they're working with and they often want to make things run fast, and so they're working very close to the me They're not, they're not gonna use an XML parser, for example.

[00:06:44] Elizabeth: They're just gonna talk directly as, directly to the graphics driver as they can. Right. And, and probably going to do all their own sound design. You know, I did talk about that XAudio library, but a lot of games will just talk directly as, directly to the sound driver as Windows Let some, so this is a often a blessing, honestly, because it means there's less we have to implement to make them work. when you look at a lot of productivity applications, and especially, the other thing that makes some productivity applications harder is, Microsoft makes 'em, and They like to, make a library, for use in this one program like Microsoft Office and then say, well, you know, other programs might use this as well. Let's. Put it in the operating system and expose it and write an API for it and everything. And maybe some other programs use it. mostly it's just office, but it means that office relies on a lot of things from the operating system that we all have to reimplement.

[00:07:44] Jeremy: Yeah, that's somewhat counterintuitive because when you think of games, you think of these really high performance things that that seem really complicated. But it sounds like from what you're saying, because they use the lower level primitives, they're actually easier in some ways to support.

[00:08:01] Elizabeth: Yeah, certainly in some ways, they, yeah, they'll do things like re-implement the heap allocator because the built-in heap allocator isn't fast enough for them. That's another good example.

What makes some applications hard to support (Some are hard, can't debug other people's apps)

[00:08:16] Jeremy: You mentioned Microsoft's more modern, uh, office suites. I, I've noticed there's certain applications that, that aren't supported. Like, for example, I think the modern Adobe Creative Suite. What's the difference with software like that and does that also apply to the modern office suite, or is, or is that actually supported?

[00:08:39] Elizabeth: Well, in one case you have, things like Microsoft using their own APIs that I mentioned with Adobe. That applies less, I suppose, but I think to some degree, I think to some degree the answer is that some applications are just hard and there's, and, and there's no way around it. And, and we can only spend so much time on a hard application. I. Debugging things. Debugging things can get very hard with wine. Let's, let me like explain that for a minute because, Because normally when you think about debugging an application, you say, oh, I'm gonna open up my debugger, pop it in, uh, break at this point, see what like all the variables are, or they're not what I expect. Or maybe wait for it to crash and then get a back trace and see where it crashed. And why you can't do that with wine, because you don't have the application, you don't have the symbols, you don't have your debugging symbols. You don't know anything about the code you're running unless you take the time to disassemble and decompile and read through it. And that's difficult every time. It's not only difficult, every time I've, I've looked at a program and been like, I really need to just. I'm gonna just try and figure out what the program is doing.

[00:10:00] Elizabeth: It takes so much time and it is never worth it. And sometimes you have to, sometimes you have no other choice, but usually you end up, you ask to rely on seeing what calls it makes into the operating system and trying to guess which one of those is going wrong. Now, sometimes you'll get lucky and it'll crash in wine code, or sometimes it'll make a call into, a function that we don't implement yet, and we know, oh, we need to implement that function. But sometimes it does something, more obscure and we have to figure out, well, like all of these millions of calls it made, which one of them is, which one of them are we implementing incorrectly? So it's returning the wrong result or not doing something that it should. And, then you add onto that the. You know, all these sort of harder to debug things like memory errors that we could make. And it's, it can be very difficult and so sometimes some applications just suffer from those hard bugs. and sometimes it's also just a matter of not enough demand for something for us to spend a lot of time on it.

[00:11:11] Elizabeth: Right.

[00:11:14] Jeremy: Yeah, I can see how that would be really challenging because you're, like you were saying, you don't have the symbols, so you don't have the source code, so you don't know what any of this software you're supporting, how it was actually written. And you were saying that I. A lot of times, you know, there may be some behavior that's wrong or a crash, but it's not because wine crashed or there was an error in wine.

[00:11:42] Jeremy: so you just know the system calls it made, but you don't know which of the system calls didn't behave the way that the application expected.

[00:11:50] Elizabeth: Exactly.

Test suite (Half the code is tests)

[00:11:52] Jeremy: I can see how that would be really challenging. and wine runs so many different applications. I'm, I'm kind of curious how do you even track what's working and what's not as you, you change wine because if you support thousands or tens thousands of applications, you know, how do you know when you've got a, a regression or not?

[00:12:15] Elizabeth: So, it's a great question. Um, probably over half of wine by like source code volume. I actually actually check what it is, but I think it's, i, I, I think it's probably over half is what we call is tests. And these tests serve two purposes. The one purpose is a regression test. And the other purpose is they're conformance tests that test, that test how, uh, an API behaves on windows and validates that we are behaving the same way. So we write all these tests, we run them on windows and you know, write the tests to check what the windows returns, and then we run 'em on wine and make sure that that matches. and we have just such a huge body of tests to make sure that, you know, we're not breaking anything. And that every, every, all the code that we, that we get into wine that looks like, wow, it's doing that really well. Nope, that's what Windows does. The test says so. So pretty much any code that we, any new code that we get, it has to have tests to validate, to, to demonstrate that it's doing the right thing.

[00:13:31] Jeremy: And so rather than testing against a specific application, seeing if it works, you're making a call to a Windows system call, seeing how it responds, and then making the same call within wine and just making sure they match.

[00:13:48] Elizabeth: Yes, exactly. And that is obviously, or that is a lot more, automatable, right? Because otherwise you have to manually, you know, there's all, these are all graphical applications.

[00:14:02] Elizabeth: You'd have to manually do the things and make sure they work. Um, but if you write automateable tests, you can just run them all and the machine will complain at you if it fails it continuous integration.

How compatibility problems appear to users

[00:14:13] Jeremy: And because there's all these potential compatibility issues where maybe a certain call doesn't behave the way an application expects. What, what are the types of what that shows when someone's using software? I mean, I, I think you mentioned crashes, but I imagine there could be all sorts of other types of behavior.

[00:14:37] Elizabeth: Yes, very much so. basically anything, anything you can imagine again is, is what will happen. You can have, crashes are the easy ones because you know when and where it crashed and you can work backwards from there. but you can also get, it can, it could hang, it could not render, right? Like maybe render a black screen. for, you know, for games you could very frequently have, graphical glitches where maybe some objects won't render right? Or the entire screen will be read. Who knows? in a very bad case, you could even bring down your system and we usually say that's not wine's fault. That's the graphics library's fault. 'cause they're not supposed to do that, uh, no matter what we do. But, you know, sometimes we have to work around that anyway. but yeah, there's, there's been some very strange and idiosyncratic bugs out there too.

[00:15:33] Jeremy: Yeah. And like you mentioned that uh, there's so many different things that could have gone wrong that imagine's very difficult to find. Yeah. And when software runs through wine, I think,

Performance is comparable to native

[00:15:49] Jeremy: A lot of our listeners will probably be familiar with running things in a virtual machine, and they know that there's a big performance impact from doing that.

[00:15:57] Jeremy: How does the performance of applications compare to running natively on the original Windows OS versus virtual machines?

[00:16:08] Elizabeth: So. In theory. and I, I haven't actually done this recently, so I can't speak too much to that, but in theory, the idea is it's a lot faster. so there, there, is a bit of a joke acronym to wine. wine is not an emulator, even though I started out by saying wine is an emulator, and it was originally called a Windows emulator. but what this basically means is wine is not a CPU emulator. It doesn't, when you think about emulators in a general sense, they're often, they're often emulators for specific CPUs, often older ones like, you know, the Commodore emulator or an Amiga emulator. but in this case, you have software that's written for an x86 CPU. And it's running on an x86 CPU by giving it the same instructions that it's giving on windows. It's just that when it says, now call this Windows function, it calls us instead. So that all should perform exactly the same. The only performance difference at that point is that all should perform exactly the same as opposed to a, virtual machine where you have to interpret the instructions and maybe translate them to a different instruction set. The only performance difference is going to be, in the functions that we are implementing themselves and we try to, we try to implement them to perform. As well, or almost as well as windows. There's always going to be a bit of a theoretical gap because we have to translate from say, one API to another, but we try to make that as little as possible. And in some cases, the operating system we're running on is, is just better than Windows and the libraries we're using are better than Windows.

[00:18:01] Elizabeth: And so our games will run faster, for example. sometimes we can, sometimes we can, do a better job than Windows at implementing something that's, that's under our purview. there there are some games that do actually run a little bit faster in wine than they do on Windows.

[00:18:22] Jeremy: Yeah, that, that reminds me of how there's these uh, gaming handhelds out now, and some of the same ones, they have a, they either let you install Linux or install windows, or they just come with a pre-installed, and I believe what I've read is that oftentimes running the same game on both operating systems, running the same game on Linux, the battery life is better and sometimes even the performance is better with these handhelds.

[00:18:53] Jeremy: So it's, it's really interesting that that can even be the case.

[00:18:57] Elizabeth: Yeah, it's really a testament to the huge amount of work that's gone into that, both on the wine side and on the, side of the graphics team and the colonel team. And, and of course, you know, the years of, the years of, work that's gone into Linux, even before these gaming handhelds were, were even under consideration.

Proton and Valve Software's role

[00:19:21] Jeremy: And something. So for people who are familiar with the handhelds, like the steam deck, they may have heard of proton. Uh, I wonder if you can explain what proton is and how it relates to wine.

[00:19:37] Elizabeth: Yeah. So, proton is basically, how do I describe this? So, proton is a sort of a fork, uh, although we try to avoid the term fork. It's a, we say it's a downstream distribution because we contribute back up to wine. so it is a, it is, it is a alternate distribution fork of wine. And it's also some code that basically glues wine into, an embedding application originally intended for steam, and developed for valve. it has also been used in, others, but it has also been used in other software. it, so where proton differs from wine besides the glue part is it has some, it has some extra hacks in it for bugs that are hard to fix and easy to hack around as some quick hacks for, making games work now that are like in the process of going upstream to wine and getting their code quality improved and going through review.

[00:20:54] Elizabeth: But we want the game to work now, when we distribute it. So that'll, that'll go into proton immediately. And then once we have, once the patch makes it upstream, we replace it with the version of the patch from upstream. there's other things to make it interact nicely with steam and so on. And yeah, I think, yeah, I think that's, I got it.

[00:21:19] Jeremy: Yeah. And I think for people who aren't familiar, steam is like this, um, I, I don't even know what you call it, like a gaming store and a

[00:21:29] Elizabeth: store game distribution service. it's got a huge variety of games on it, and you just publish. And, and it's a great way for publishers to interact with their, you know, with a wider gaming community, uh, after it, just after paying a cut to valve of their profits, they can reach a lot of people that way. And because all these games are on team and, valve wants them to work well on, on their handheld, they contracted us to basically take their entire catalog, which is huge, enormous. And trying and just step by step. Fix every game and make them all work.

[00:22:10] Jeremy: So, um, and I guess for people who aren't familiar Valve, uh, softwares the company that runs steam, and so it sounds like they've asked, uh, your company to, to help improve the compatibility of their catalog.

[00:22:24] Elizabeth: Yes. valve contracted us and, and again, when you're talking about wine using lower level libraries, they've also contracted a lot of other people outside of wine. Basically, the entire stack has had a tremendous, tremendous investment by valve software to make gaming on Linux work. Well.

The entire stack receives changes to improve Wine compatibility

[00:22:48] Jeremy: And when you refer to the entire stack, like what are some, some of those pieces, at least at a high level.

[00:22:54] Elizabeth: I, I would, let's see, let me think. There is the wine project, the. Mesa Graphics Libraries. that's a, that's another, you know, uh, open source, software project that existed, has existed for a long time. But Valve has put a lot of, uh, funding and effort into it, the Linux kernel in various different ways.

[00:23:17] Elizabeth: the, the desktop, uh, environment and Window Manager for, um, are also things they've invested in.

[00:23:26] Jeremy: yeah. Everything that the game needs, on any level and, and that the, and that the operating system of the handheld device needs.

Wine's history

[00:23:37] Jeremy: And wine's been going on for quite a while. I think it's over a decade, right?

[00:23:44] Elizabeth: I believe. Oh, more than, oh, far more than a decade. I believe it started in 1990, I wanna say about 1995, mid nineties. I'm, I probably have that date wrong. I believe Wine started about the mid nineties.

[00:24:00] Jeremy: Mm.

[00:24:00] Elizabeth: it's going on for three decades at this rate.

[00:24:03] Jeremy: Wow. Okay.

[00:24:06] Jeremy: And so all this time, how has the, the project sort of sustained itself? Like who's been involved and how has it been able to keep going this long?

[00:24:18] Elizabeth: Uh, I think as is the case with a lot of free software, it just, it just keeps trudging along. There's been. There's been times where there's a lot of interest in wine. There's been times where there's less, and we are fortunate to be in a time where there's a lot of interest in it. we've had the same maintainer for almost this entire, almost this entire existence. Uh, Alexander Julliard, there was one person starting who started, maintained it before him and, uh, left it maintainer ship to him after a year or two. Uh, Bob Amstat. And there has been a few, there's been a few developers who have been around for a very long time. a lot of developers who have been around for a decent amount of time, but not for the entire duration. And then a very, very large number of people who come and submit a one-off fix for their individual application that they want to make work.

[00:25:19] Jeremy: How does crossover relate to the wine project? Like, it sounds like you had mentioned Valve software hired you for subcontract work, but crossover itself has been around for quite a while. So how, how has that been connected to the wine project?

[00:25:37] Elizabeth: So I work for, so the, so the company I work for is Code Weavers and, crossover is our flagship software. so Code Weavers is a couple different things. We have a sort of a porting service where companies will come to us and say, can we port my application usually to Mac? And then we also have a retail service where Where we basically have our own, similar to Proton, but you know, older, but the same idea where we will add some hacks into it for very difficult to solve bugs and we have a, a nice graphical interface. And then, the other thing that we're selling with crossover is support. So if you, you know, try to run a certain application and you buy crossover, you can submit a ticket saying this doesn't work and we now have a financial incentive to fix it. You know, we'll try to, we'll try to fix your, we'll spend company resources to fix your bug, right? So that's been so, so code we v has been around since 1996 and crossover, I don't know the date, but it's crossover has been around for probably about two decades, if I'm not mistaken.

[00:27:01] Jeremy: And when you mention helping companies port their software to, for example, MacOS.

[00:27:07] Jeremy: Is the approach that you would port it natively to MacOS APIs or is it that you would help them get it running using wine on MacOS?

[00:27:21] Elizabeth: Right. That's, so that's basically what makes us so unique among porting companies is that instead of rewriting their software, we just, we just basically stick it inside of crossover and, uh, and, and make it run.

[00:27:36] Elizabeth: And the idea has always been, you know, the more we implement, the more we get correct, the, the more applications will, you know, work. And sometimes it works out that way. Sometimes not really so much. And there's always work we have to do to get any given application to work, but. Yeah, so it's, it's very unusual because we don't ask companies for any of their code. We don't need it. We just fix the windows API

[00:28:07] Jeremy: And, and so in that case, the ports would be let's say someone sells a MacOS version of their software. They would bundle crossover, uh, with their software.

[00:28:18] Elizabeth: Right? And usually when you do this, it doesn't look like there's crossover there. Like it just looks like this software is native, but there is soft, there is crossover under the hood.

Loading executables and linked libraries

[00:28:32] Jeremy: And so earlier we were talking about how you're basically intercepting the system calls that these binaries are making, whether that's the executable or the, the DLLs from Windows. Um, but I think probably a lot of our listeners are not really sure how that's done. Like they, they may have built software, but they don't know, how do I basically hijack, the system calls that this application is making.

[00:29:01] Jeremy: So maybe you could talk a little bit about how that works.

[00:29:04] Elizabeth: So there, so there's a couple steps to go into it. when you think about a program that's say, that's a big, a big file that's got all the machine code in it, and then it's got stuff at the beginning saying, here's how the program works and here's where in the file the processor should start running. that's, that's your EXE file. And then in your DLL files are libraries that contain shared code and you have like a similar sort of file. It says, here's the entry point. That runs this function, this, you know, this pars XML function or whatever have you.

[00:29:42] Elizabeth: And here's this entry point that has the generate XML function and so on and so forth. And, and, then the operating system will basically take the EXE file and see all the bits in it. Say I want to call the pars XML function. It'll load that DLL and hook it up. So it, so the processor ends up just seeing jump directly to this pars XML function and then run that and then return and so on.

[00:30:14] Elizabeth: And so what wine does, is it part of wine? That's part of wine is a library, is that, you know, the implementing that parse XML and read XML function, but part of it is the loader, which is the part of the operating system that hooks everything together. And when we load, we. Redirect to our libraries. We don't have Windows libraries.

[00:30:38] Elizabeth: We like, we redirect to ours and then we run our code. And then when you jump back to the program and yeah.

[00:30:48] Jeremy: So it's the, the loader that's a part of wine. That's actually, I'm not sure if running the executable is the right term.

[00:30:58] Elizabeth: no, I think that's, I think that's a good term. It's, it's, it's, it starts in a loader and then we say, okay, now run the, run the machine code and it's executable and then it runs and it jumps between our libraries and back and so on.

[00:31:14] Jeremy: And like you were saying before, often times when it's trying to make a system call, it ends up being handled by a function that you've written in wine. And then that in turn will call the, the Linux system calls or the MacOS system calls to try and accomplish the, the same result.

[00:31:36] Elizabeth: Right, exactly.

[00:31:40] Jeremy: And something that I think maybe not everyone is familiar with is there's this concept of user space versus kernel space. you explain what the difference is?

[00:31:51] Elizabeth: So the way I would explain, the way I would describe a kernel is it's the part of the operating system that can do anything, right? So any program, any code that runs on your computer is talking to the processor, and the processor has to be able to do anything the computer can do.

[00:32:10] Elizabeth: It has to be able to talk to the hardware, it has to set up the memory space. That, so actually a very complicated task has to be able to switch to another task. and, and, and, and basically talk to another program and. You have to have something there that can do everything, but you don't want any program to be able to do everything. Um, not since the, not since the nineties. It's about when we realized that we can't do that. so the kernel is a part that can do everything. And when you need to do something that requires those, those permissions that you can't give everyone, you have to talk to the colonel and ask it, Hey, can you do this for me please? And in a very restricted way where it's only the safe things you can do. And a degree, it's also like a library, right? It's the kernel. The kernels have always existed, and since they've always just been the core standard library of the computer that does the, that does the things like read and write files, which are very, very complicated tasks under the hood, but look very simple because all you say is write this file. And talk to the hardware and abstract away all the difference between different drivers. So the kernel is doing all of these things. So because the kernel is a part that can do everything and because when you think about the kernel, it is basically one program that is always running on your computer, but it's only one program. So when a user calls the kernel, you are switching from one program to another and you're doing a lot of complicated things as part of this. You're switching to the higher privilege level where you can do anything and you're switching the state from one program to another. And so it's a it. So this is what we mean when we talk about user space, where you're running like a normal program and kernel space where you've suddenly switched into the kernel.

[00:34:19] Elizabeth: Now you're executing with increased privileges in a different. idea of the process space and increased responsibility and so on.

[00:34:30] Jeremy: And, and so do most applications. When you were talking about the system calls for handling 3D audio or parsing XML. Are those considered, are those system calls considered part of user space and then those things call the kernel space on your behalf, or how, how would you describe that?

[00:34:50] Elizabeth: So most, so when you look at Windows, most of most of the Windows library, the vast, vast majority of it is all user space. most of these libraries that we implement never leave user space. They never need to call into the kernel. there's the, there only the core low level stuff. Things like, we need to read a file, that's a kernel call. when you need to sleep and wait for some seconds, that's a kernel. Need to talk to a different process. Things that interact with different processes in general. not just allocate memory, but allocate a page of memory, like a, from the memory manager and then that gets sub allocated by the heap allocator. so things like that.

[00:35:31] Jeremy: Yeah, so if I was writing an application and I needed to open a file, for example, does, does that mean that I would have to communicate with the kernel to, to read that file?

[00:35:43] Elizabeth: Right, exactly.

[00:35:46] Jeremy: And so most applications, it sounds like it's gonna be a mixture. You're gonna have a lot of things that call user space calls. And then a few, you mentioned more low level ones that are gonna require you to communicate with the kernel.

[00:36:00] Elizabeth: Yeah, basically. And it's worth noting that in, in all operating systems, you're, you're almost always gonna be calling a user space library. That might just be a thin wrapper over the kernel call. It might, it's gonna do like just a little bit of work in end call the kernel.

[00:36:19] Jeremy:

[00:36:19] Elizabeth: In fact, in Windows, that's the only way to do it. Uh, in many other operating systems, you can actually say, you can actually tell the processor to make the kernel call. There is a special instruction that does this and just, and it'll go directly to the kernel, and there's a defined interface for this. But in Windows, that interface is not defined. It's not stable. Or backwards compatible like the rest of Windows is. So even if you wanted to use it, you couldn't. and you basically have to call into the high level libraries or low level libraries, as it were, that, that tell you that create a file. And those don't do a lot.

[00:37:00] Elizabeth: They just kind of tweak their parameters a little and then pass them right down to the kernel.

[00:37:07] Jeremy: And so wine, it sounds like it needs to implement both the user space calls of windows, but then also the, the kernel, calls as well. But, but wine itself does that, is that only in Linux user space or MacOS user space?

[00:37:27] Elizabeth: Yes. This is a very tricky thing. but all of wine, basically all of what is wine runs in, in user space and we use. Kernel calls that are already there to talk to the colonel, to talk to the host Colonel. You have to, and you, you get, you get, you get the sort of second nature of thinking about the Windows, user space and kernel.

[00:37:50] Elizabeth: And then there's a host user space and Kernel and wine is running all in user, in the user, in the host user space, but it's emulating the Windows kernel. In fact, one of the weirdest, trickiest parts is I mentioned that you can run some drivers in wine. And those drivers actually, they actually are, they think they're running in the Windows kernel. which in a sense works the same way. It has libraries that it can load, and those drivers are basically libraries and they're making, kernel calls and they're, they're making calls into the kernel library that does some very, very low level tasks that. You're normally only supposed to be able to do in a kernel. And, you know, because the kernel requires some privileges, we kind of pretend we have them. And in many cases, you're even the drivers are using abstractions. We can just implement those abstractions kind of over the slightly higher level abstractions that exist in user space.

[00:39:00] Jeremy: Yeah, I hadn't even considered the being able to use hardware devices, but I, I suppose if in, in the end, if you're reproducing the kernel, then whether you're running software or you're talking to a hardware device, as long as you implement the calls correctly, then I, I suppose it works.

[00:39:18] Elizabeth: Cause you're, you're talking about device, like maybe it's some kind of USB device that has drivers for Windows, but it doesn't for, for Linux.

[00:39:28] Elizabeth: no, that's exactly, that's a, that's kind of the, the example I've used. Uh, I think there is, I think I. My, one of my best success stories was, uh, drivers for a graphing calculator.

[00:39:41] Jeremy: Oh, wow.

[00:39:42] Elizabeth: That connected via USB and I basically just plugged the windows drivers into wine and, and ran it. And I had to implement a lot of things, but it worked. But for example, something like a graphics driver is not something you could implement in wine because you need the graphics driver on the host. We can't talk to the graphics driver while the host is already doing so.

[00:40:05] Jeremy: I see. Yeah. And in that case it probably doesn't make sense to do so

[00:40:11] Elizabeth: Right?

[00:40:12] Elizabeth: Right. It doesn't because, the transition from user into kernel is complicated. You need the graphics driver to be in the kernel and the real kernel. Having it in wine would be a bad idea. Yeah.

[00:40:25] Jeremy: I, I think there's, there's enough APIs you have to try and reproduce that. I, I think, uh, doing, doing something where,

[00:40:32] Elizabeth: very difficult

[00:40:33] Jeremy: right.

Poor system call documentation and private APIs

[00:40:35] Jeremy: There's so many different, calls both in user space and in kernel space. I imagine the, the user space ones Microsoft must document to some extent, but, oh. Is that, is that a

[00:40:51] Elizabeth: well, sometimes,

[00:40:54] Jeremy: Sometimes. Okay.

[00:40:55] Elizabeth: I think it's actually better now than it used to be. But some, here's where things get fun, because sometimes there will be, you know, regular documented calls. Sometimes those calls are documented, but the documentation isn't very good. Sometimes programs will just sort of look inside Microsoft's DLLs and use calls that they aren't supposed to be using. Sometimes they use calls that they are supposed to be using, but the documentation has disappeared. just because it's that old of an API and Microsoft hasn't kept it around. sometimes some, sometimes Microsoft, Microsoft own software uses, APIs that were never documented because they never wanted anyone else using them, but they still ship them with the operating system. there was actually a kind of a lawsuit about this because it is an antitrust lawsuit, because by shipping things that only they could use, they were kind of creating a trust. and that got some things documented. At least in theory, they kind of haven't stopped doing it, though.

[00:42:08] Jeremy: Oh, so even today they're, they're, I guess they would call those private, private APIs, I suppose.

[00:42:14] Elizabeth: I suppose. Uh, yeah, you could say private APIs. but if we want to get, you know, newer versions of Microsoft Office running, we still have to figure out what they're doing and implement them.

[00:42:25] Jeremy: And given that they're either, like you were saying, the documentation is kind of all over the place. If you don't know how it's supposed to behave, how do you even approach implementing them?

[00:42:38] Elizabeth: and that's what the conformance tests are for. And I, yeah, I mentioned earlier we have this huge body of conformance tests that double is regression tests. if we see an API, we don't know what to do with or an API, we do know, we, we think we know what to do with because the documentation can just be wrong and often has been. Then we write tests to figure out what it's supposed to behave. We kind of guess until we, and, and we write tests and we pass some things in and see what comes out and see what. The see what the operating system does until we figure out, oh, so this is what it's supposed to do and these are the exact parameters in, and, and then we, and, and then we implement it according to those tests.

[00:43:24] Jeremy: Is there any distinction in approach for when you're trying to implement something that's at the user level versus the kernel level?

[00:43:33] Elizabeth: No, not really. And like I, and like I mentioned earlier, like, well, I mean, a kernel call is just like a library call. It's just done in a slightly different way, but it's still got, you know, parameters in, it's still got a set of parameters. They're just encoded differently. And, and again, like the, the way kernel calls are done is on a level just above the kernel where you have a library, that just passes things through. Almost verbatim to the kernel and we implement that library instead.

[00:44:10] Jeremy: And, and you've been working on i, I think, wine for over, over six years now.

[00:44:18] Elizabeth: That sounds about right.

Debugging and having broad knowledge of Wine

[00:44:20] Jeremy: What does, uh, your, your day to day look like? What parts of the project do you, do you work on?

[00:44:27] Elizabeth: It really varies from day to day. and I, I, a lot of people, a lot of, some people will work on the same parts of wine for years. Uh, some people will switch around and work on all sorts of different things.

[00:44:42] Elizabeth: And I'm, I definitely belong to that second group. Like if you name an area of wine, I have almost certainly contributed a patch or two to it. there's some areas I work on more than others, like, 3D graphics, multimedia, a, I had, I worked on a compiler that exists, uh, socket. So networking communication is another thing I work a lot on. day to day, I kind of just get, I, I I kind of just get a bug for some program or another. and I take it and I debug it and figure out why the program's broken and then I fix it. And there's so much variety in that. because a bug can take so many different forms like I described, and, and, and the, and then the fix can be simple or complicated or, and it can be in really anywhere to a degree.

[00:45:40] Elizabeth: being able to work on any part of wine is sometimes almost a necessity because if a program is just broken, you don't know why. It could be anything. It could be any sort of API. And sometimes you can hand the API to somebody who's got a lot of experience in that, but sometimes you just do whatever. You just fix whatever's broken and you get an experience that way.

[00:46:06] Jeremy: Yeah, I mean, I was gonna ask about the specialized skills to, to work on wine, but it sounds like maybe in your case it's all of them.

[00:46:15] Elizabeth: It's, there's a bit of that. it's a wine. We, the skills to work on wine are very, it's a very unique set of skills because, and it largely comes down to debugging because you can't use the tools you normally use debug.

[00:46:30] Elizabeth: You have to, you have to be creative and think about it different ways. Sometimes you have to be very creative. and programs will try their hardest to avoid being debugged because they don't want anyone breaking their copy protection, for example, or or hacking, or, you know, hacking in sheets. They want to be, they want, they don't want anyone hacking them like that.

[00:46:54] Elizabeth: And we have to do it anyway for good and legitimate purposes. We would argue to make them work better on more operating systems. And so we have to fight that every step of the way.

[00:47:07] Jeremy: Yeah, it seems like it's a combination of. F being able, like you, you were saying, being able to, to debug. and you're debugging not necessarily your own code, but you're debugging this like behavior of,

[00:47:25] Jeremy: And then based on that behavior, you have to figure out, okay, where in all these different systems within wine could this part be not working?

[00:47:35] Jeremy: And I, I suppose you probably build up some kind of, mental map in your head of when you get a, a type of bug or a type of crash, you oh, maybe it's this, maybe it's here, or something

[00:47:47] Elizabeth: Yeah. That, yeah, there is a lot of that. there's, you notice some patterns, you know, after experience helps, but because any bug could be new, sometimes experience doesn't help and you just, you just kind of have to start from scratch.

Finding a bug related to XAudio

[00:48:08] Jeremy: At sort of a high level, can you give an example of where you got a specific bug report and then where you had to look to eventually find which parts of the the system were the issue?

[00:48:21] Elizabeth: one, one I think good example, that I've done recently. so I mentioned this, this XAudio library that does 3D audio. And if you say you come across a bug, I'm gonna be a little bit generics here and say you come across a bug where some audio isn't playing right, maybe there's, silence where there should be the audio. So you kind of, you look in and see, well, where's that getting lost? So you can basically look in the input calls and say, here's the buffer it's submitting that's got all the audio data in it. And you look at the output, you look at where you think the output should be, like, that library will internally call a different library, which programs can interact with directly.

[00:49:03] Elizabeth: And this our high level library interacts with that is the, give this sound to the audio driver, right? So you've got XAudio on top of, um. mdev, API, which is the other library that gives audio to the driver. And you see, well, the ba the buffer is that XAudio is passing into MM Dev, dev API. They're empty, there's nothing in them. So you have to kind of work through the XAudio library to see where is, where's that sound getting lost? Or maybe, or maybe that's not getting lost. Maybe it's coming through all garbled. And I've had to look at the buffer and see why is it garbled. I'll open up it up in Audacity and look at the weight shape of the wave and say, huh, that shape of the wave looks like it's, it looks like we're putting silence every 10 nanoseconds or something, or, or reversing something or interpreting it wrong. things like that. Um, there's a lot of, you'll do a lot of, putting in print fs basically all throughout wine to see where does the state change. Where was, where is it? Where is it? Right? And then where do things start going wrong?

[00:50:14] Jeremy: Yeah. And in the audio example, because they're making a call to your XAudio implementation, you can see that Okay, the, the buffer, the audio that's coming in. That part is good. It, it's just that later on when it sends it to what's gonna actually have it be played by the, the hardware, that's when missing. So,

[00:50:37] Elizabeth: We did something wrong in a library that destroyed the buffer. And I think on a very, high level a lot of debugging, wine is about finding where things are good and finding where things are bad, and then narrowing that down until we find the one spot where things go wrong. There's a lot of processes that go like that.

[00:50:57] Jeremy: like you were saying, the more you see these problems, hopefully the, the easier it gets to, to narrow down where,

[00:51:04] Elizabeth: Often. Yeah. Especially if you keep debugging things in the same area.

How much code is OS specific?c

[00:51:09] Jeremy: And wine supports more than one operating system. I, I saw there was Linux, MacOS I think free BSD. How much of the code is operating system specific versus how much can just be shared across all of them?

[00:51:27] Elizabeth: Not that much is operating system specific actually. so when you think about the volume of wine, the, the, the, vast majority of it is the high level code that doesn't need to interact with the operating system on a low level. Right? Because Windows keeps putting, because Microsoft keeps putting lots and lots of different libraries in their operating system. And a lot of these are high level libraries. and even when we do interact with the operating system, we're, we're using cross-platform libraries or we're using, we're using ics. The, uh, so all these operating systems that we are implementing are con, basically conformed to the posix standard. which is basically like Unix, they're all Unix based. Psic is a Unix based standard. Microsoft is, you know, the big exception that never did implement that. And, and so we have to translate its APIs to Unix, APIs. now that said, there is a lot of very operating system, specific code. Apple makes things difficult by try, by diverging almost wherever they can. And so we have a lot of Apple specific code in there.

[00:52:46] Jeremy: another example I can think of is, I believe MacOS doesn't support, Vulkan

[00:52:53] Elizabeth: yes. Yeah.Yeah, That's a, yeah, that's a great example of Mac not wanting to use, uh, generic libraries that work on every other operating system. and in some cases we, we look at it and are like, alright, we'll implement a wrapper for that too, on top of Yuri, on top of your, uh, operating system. We've done it for Windows, we can do it for Vulkan. and that's, and then you get the Molten VK project. Uh, and to be clear, we didn't invent molten vk. It was around before us. We have contributed a lot to it.

Direct3d, Vulkan, and MoltenVK

[00:53:28] Jeremy: Yeah, I think maybe just at a high level might be good to explain the relationship between Direct 3D or Direct X and Vulcan and um, yeah. Yeah. Maybe if you could go into that.

[00:53:42] Elizabeth: so Direct 3D is Microsoft's 3D API. the 3D APIs, you know, are, are basically a way to, they're way to firstly abstract out the differences between different graphics, graphics cards, which, you know, look very different on a hardware level.

[00:54:03] Elizabeth: Especially. They, they used to look very different and they still do look very different. and secondly, a way to deal with them at a high level because actually talking to the graphics card on a low level is very, very complicated. Even talking to it on a high level is complicated, but it gets, it can get a lot worse if you've ever been a, if you've ever done any graphics, driver development. so you have a, a number of different APIs that achieve these two goals of, of, abstraction and, and of, of, of building a common abstraction and of building a, a high level abstraction. so OpenGL is the broadly the free, the free operating system world, the non Microsoft's world's choice, back in the day.

[00:54:53] Elizabeth: And then direct 3D was Microsoft's API and they've and Direct 3D. And both of these have evolved over time and come up with new versions and such. And when any, API exists for too long. It gains a lot of croft and needs to be replaced. And eventually, eventually the people who developed OpenGL decided we need to start over, get rid of the Croft to make it cleaner and make it lower level.

[00:55:28] Elizabeth: Because to get in a maximum performance games really want low level access. And so they made Vulcan, Microsoft kind of did the same thing, but they still call it Direct 3D. they just, it's, it's their, the newest version of Direct 3D is lower level. It's called Direct 3D 12. and, and, Mac looked at this and they decided we're gonna do the same thing too, but we're not gonna use Vulcan.

[00:55:52] Elizabeth: We're gonna define our own. And they call it metal. And so when we want to translate D 3D 12 into something that another operating system understands. That's probably Vulcan. And, and on Mac, we need to translate it to metal somehow. And we decided instead of having a separate layer from D three 12 to metal, we're just gonna translate it to Vulcan and then translate the Vulcan to metal. And it also lets things written for Vulcan on Windows, which is also a thing that exists that lets them work on metal.

[00:56:30] Jeremy: And having to do that translation, does that have a performance impact or is that not really felt?

[00:56:38] Elizabeth: yes. It's kind of like, it's kind of like anything, when you talk about performance, like I mentioned this earlier, there's always gonna be overhead from translating from one API to another. But we try to, what we, we put in heroic efforts to. And try, try to make sure that doesn't matter, to, to make sure that stuff that needs to be fast is really as fast as it can possibly be.

[00:57:06] Elizabeth: And some very clever things have been done along those lines. and, sometimes the, you know, the graphics drivers underneath are so good that it actually does run better, even despite the translation overhead. And then sometimes to make it run fast, we need to say, well, we're gonna implement a new API that behaves more like windows, so we can do less work translating it. And that's, and sometimes that goes into the graphics library and sometimes that goes into other places.

Targeting Wine instead of porting applications

[00:57:43] Jeremy: Yeah. Something I've found a little bit interesting about the last few years is

[00:57:49] Jeremy: Developers in the past, they would generally target Windows and you might be lucky to get a Mac port or a Linux port. And I wonder, like, in your opinion now, now that a lot of developers are just targeting Windows and relying on wine or, or proton to, to run their software, is there any, I suppose, downside to doing that?

[00:58:17] Jeremy: Or is it all just upside, like everyone should target Windows as this common platform?

[00:58:23] Elizabeth: Yeah. It's an interesting question. I, there's some people who seem to think it's a bad thing that, that we're not getting native ports in the same sense, and then there's some people who. Who See, no, that's a perfectly valid way to do ports just right for this defacto common API it was never intended as a cross platform common API, but we've made it one.

[00:58:47] Elizabeth: Right? And so why is that any worse than if it runs on a different API on on Linux or Mac and I? Yeah, I, I, I guess I tend to, I, that that argument tends to make sense to me. I don't, I don't really see, I don't personally see a lot of reason for, to, to, to say that one library is more pure than another.

[00:59:12] Elizabeth: Right now, I do think Windows APIs are generally pretty bad. I, I'm, this might be, you know, just some sort of, this might just be an effect of having to work with them for a very long time and see all their flaws and have to deal with the nonsense that they do. But I think that a lot of the. Native Linux APIs are better. But if you like your Windows API better. And if you want to target Windows and that's the only way to do it, then sure why not? What's wrong with that?

[00:59:51] Jeremy: Yeah, and I think the, doing it this way, targeting Windows, I mean if you look in the past, even though you had some software that would be ported to other operating systems without this compatibility layer, without people just targeting Windows, all this software that people can now run on these portable gaming handhelds or on Linux, Most of that software was never gonna be ported. So yeah, absolutely. And

[01:00:21] Elizabeth: that's

[01:00:22] Jeremy: having that as an option. Yeah.

[01:00:24] Elizabeth: That's kind of why wine existed, because people wanted to run their software. You know, that was never gonna be ported. They just wanted, and then the community just spent a lot of effort in, you know, making all these individual programs run. Yeah.

[01:00:39] Jeremy: I think it's pretty, pretty amazing too that, that now that's become this official way, I suppose, of distributing your software where you say like, Hey, I made a Windows version, but you're on your Linux machine. it's officially supported because, we have this much belief in this compatibility layer.

[01:01:02] Elizabeth: it's kind of incredible to see wine having got this far. I mean, I started working on a, you know, six, seven years ago, and even then, I could never have imagined it would be like this.

[01:01:16] Elizabeth: So as we, we wrap up, for the developers that are listening or, or people who are just users of wine, um, is there anything you think they should know about the project that we haven't talked about?

[01:01:31] Elizabeth: I don't think there's anything I can think of.

[01:01:34] Jeremy: And if people wanna learn, uh, more about the wine project or, or see what you're up to, where, where should they, where should they head?

Getting support and contributing

[01:01:45] Elizabeth: We don't really have any things like news, unfortunately. Um, read the release notes, uh, follow some, there's some, there's some people who, from Code Weavers who do blogs. So if you, so if you go to codeweavers.com/blog, there's some, there's, there's some codeweavers stuff, uh, some marketing stuff. But there's also some developers who will talk about bugs that they are solving and. And how it's easy and, and the experience of working on wine.

[01:02:18] Jeremy: And I suppose if, if someone's. Interested in like, like let's say they have a piece of software, it's not working through wine. what's the best place for them to, to either get help or maybe even get involved with, with trying to fix it?

[01:02:37] Elizabeth: yeah. Uh, so you can file a bug on, winehq.org,or, or, you know, find, there's a lot of developer resources there and you can get involved with contributing to the software. And, uh, there, there's links to our mailing list and IRC channels and, uh, and, and the GitLab, where all places you can find developers.

[01:03:02] Elizabeth: We love to help you. Debug things. We love to help you fix things. We try our very best to be a welcoming community and we have got a long, we've got a lot of experience working with people who want to get their application working. So, we would love to, we'd love to have another.

[01:03:24] Jeremy: Very cool. Yeah, I think wine is a really interesting project because I think for, I guess it would've been for decades, it seemed like very niche, like not many people

[01:03:37] Jeremy: were aware of it. And now I think maybe in particular because of the, the Linux gaming handhelds, like the steam deck,wine is now something that a bunch of people who would've never heard about it before, and now they're aware of it.

[01:03:53] Elizabeth: Absolutely. I've watched that transformation happen in real time and it's been surreal.

[01:04:00] Jeremy: Very cool. Well, Elizabeth, thank you so much for, for joining me today.

[01:04:05] Elizabeth: Thank you, Jeremy. I've been glad to be here.

François Daoust on the W3C

Tue, 16 Sep 2025 12:04:20 -0000

François Daoust is a W3C staff member and co-chair of the Web Developer Experience Community Group.

We discuss the W3C's role and what it's like to go through the browser standardization process.

Transcript

You can help correct transcripts on GitHub.

Intro

[00:00:00] Jeremy: Today I'm talking to François Daoust. He's a staff member at the W3C. And we're gonna talk about the W3C and the recommendation process and discuss, François's experience with, with how these features end up in our browsers.

[00:00:16] Jeremy: So, François, welcome.

[00:00:18] François: Thank you Jeremy and uh, many thanks for the invitation. I'm really thrilled to be part of this podcast.

What's the W3C?

[00:00:26] Jeremy: I think many of our listeners will have heard about the W3C, but they may not actually know what it is. So could you start by explaining what it is?

[00:00:37] François: Sure. So W3C stands for the Worldwide Web Consortium. It's a standardization organization. I guess that's how people should think about W3C. it was created in 1994. I, by, uh, Tim Berners Lee, who was the inventor of the web. Tim Berners Lee was the, director of W3C for a long, long time.

[00:01:00] François: He retired not long ago, a few years back. and W3C is, has, uh, a number of, uh. Properties, let's say first the goal is to produce royalty free standards, and that's very important. Uh, we want to make sure that, uh, the standard that get produced can be used and implemented without having to pay, fees to anyone.

[00:01:23] François: We do web standards. I didn't mention it, but it's from the name. Standards that you find in your web browsers. But not only that, there are a number of other, uh, standards that got developed at W3C including, for example, XML. Data related standards. W3C as an organization is a consortium.

[00:01:43] François: The, the C stands for consortium. Legally speaking, it's a, it's a 501c3 meaning in, so it's a US based, uh, legal entity not for profit. And the, the little three is important because it means it's public interest. That means we are a consortium, that means we have members, but at the same time, the goal, the mission is to the public.

[00:02:05] François: So we're not only just, you know, doing what our members want. We are also making sure that what our members want is aligned with what end users in the end, need. and the W3C has a small team. And so I'm part of this, uh, of this team worldwide. Uh, 45 to 55 people, depending on how you count, mostly technical people and some, uh, admin, uh, as well, overseeing the, uh, the work, that we do, uh, at the W3C.

Funding through membership fees

[00:02:39] Jeremy: So you mentioned there's 45 to 55 people. How is this funded? Is this from governments or commercial companies?

[00:02:47] François: The main source comes from membership fees. So the W3C has a, so members, uh, roughly 350 members, uh, at the W3C. And, in order to become a member, an organization needs to pay, uh, an annual membership fee. That's pretty common among, uh, standardization, uh, organizations.

[00:03:07] François: And, we only have, uh, I guess three levels of membership, fees. Uh, well, you may find, uh, additional small levels, but three main ones. the goal is to make sure that, A big player will, not a big player or large company, will not have more rights than, uh, anything, anyone else. So we try to make sure that a member has the, you know, all members have equal, right?

[00:03:30] François: if it's not perfect, but, uh, uh, that's how things are, are are set. So that's the main source of income for the W3C. And then we try to diversify just a little bit to get, uh, for example, we go to governments. We may go to governments in the u EU. We may, uh, take some, uh, grant for EU research projects that allow us, you know, to, study, explore topics.

[00:03:54] François: Uh, in the US there, there used to be some, uh, some funding from coming from the government as well. So that, that's, uh, also, uh, a source. But the main one is, uh, membership fees.

Relations to TC39, IETF, and WHATWG

[00:04:04] Jeremy: And you mentioned that a lot of the W3C'S work is related to web standards. There's other groups like TC 39, which works on the JavaScript spec and the IETF, which I believe worked, with your group on WebRTC, I wonder if you could explain W3C'S connection to other groups like that.

[00:04:28] François: sure. we try to collaborate with a, a number of, uh, standard other standardization organizations. So in general, everything goes well because you, you have, a clear separation of concerns. So you mentioned TC 39. Indeed. they are the ones who standardize, JavaScript. Proper name of JavaScript is the EcmaScript.

[00:04:47] François: So that's tc. TC 39 is the technical committee at ecma. and so we have indeed interactions with them because their work directly impact the JavaScript that you're going to find in your, uh, run in your, in your web browser. And we develop a number of JavaScript APIs, uh, actually in W3C.

[00:05:05] François: So we need to make sure that, the way we develop, uh, you know, these APIs align with the, the language itself. with IETF, the, the, the boundary is, uh, uh, is clear as well. It's a protocol and protocol for our network protocols for our, the IETF and application level. For W3C, that's usually how the distinction is made.

[00:05:28] François: The boundaries are always a bit fuzzy, but that's how things work. And usually, uh, things work pretty well. Uh, there's also the WHATWG, uh, and the WHATWG is more the, the, the history was more complicated because, uh, t of a fork of the, uh, HTML specification, uh, at the time when it was developed by W3C, a long time ago.

[00:05:49] François: And there was been some, uh, Well disagreement on the way things should have been done, and the WHATWG took over got created, took, took this the HTML spec and did it a different way. Went in another, another direction, and that other, other direction actually ended up being the direction.

[00:06:06] François: So, that's a success, uh, from there. And so, W3C no longer works, no longer owns the, uh, HTML spec and the WHATWG has, uh, taken, uh, taken up a number of, uh, of different, core specifications for the web. Uh, doing a lot of work on the, uh, on interopoerability and making sure that, uh, the algorithm specified by the spec, were correct, which, which was something that historically we haven't been very good at at W3C.

[00:06:35] François: And the way they've been working as a, has a lot of influence on the way we develop now, uh, the APIs, uh, from a W3C perspective.

[00:06:44] Jeremy: So, just to make sure I understand correctly, you have TC 39, which is focused on the JavaScript or ECMAScript language itself, and you have APIs that are going to use JavaScript and interact with JavaScript. So you need to coordinate there. The, the have the specification for HTML. then the IATF, they are, I'm not sure if the right term would be, they, they would be one level lower perhaps, than the W3C.

[00:07:17] François: That's how you, you can formulate it. Yes. The, the one layer, one layer layer in the ISO network in the ISO stack at the network level.

How WebRTC spans the IETF and W3C

[00:07:30] Jeremy: And so in that case, one place I've heard it mentioned is that webRTC, to, to use it, there is an IETF specification, and then perhaps there's a W3C recommendation and

[00:07:43] François: Yes. so when we created the webRTC working group, that was in 2011, I think, it was created with a dual head. There was one RTC web, group that got created at IETF and a webRTC group that got created at W3C. And that was done on purpose. Of course, the goal was not to compete on the, on the solution, but actually to, have the two sides of the, uh, solution, be developed in parallel, the API, uh, the application front and the network front.

[00:08:15] François: And there was a, and there's still a lot of overlap in, uh, participation between both groups, and that's what keep things successful. In the end. It's not, uh, you know, process or organization to organization, uh, relationships, coordination at the organization level. It's really the fact that you have participants that are essentially the same, on both sides of the equation.

[00:08:36] François: That helps, uh, move things forward. Now, webRTC is, uh, is more complex than just one group at IETF. I mean, web, webRTC is a very complex set of, uh, of technologies, stack of technologies. So when you, when you. Pull a little, uh, protocol from IETFs. Suddenly you have the whole IETF that comes with you with it.

[00:08:56] François: So you, it's the, you have the feeling that webRTC needs all of the, uh, internet protocols that got, uh, created to work

Recommendations

[00:09:04] Jeremy: And I think probably a lot of web developers, they may hear words like specification or standard, but I believe the, the official term, at least at the W3C, is this recommendation. And so I wonder if you can explain what that means.

[00:09:24] François: Well. It means it means standard in the end. and that came from industry. That comes from a time where. As many standardization organizations. W3C was created not to be a standardization organization. It was felt that standard was not the right term because we were not a standardization organization.

[00:09:45] François: So recommend IETF has the same thing. They call it RFC, request for comment, which, you know, stands for nothing in, and yet it's a standard. So W3C was created with the same kind of, uh thing. We needed some other terminology and we call that recommendation. But in the end, that's standard. It's really, uh, how you should see it.

[00:10:08] François: And one thing I didn't mention when I, uh, introduced the W3C is there are two types of standards in the end, two main categories. There are, the de jure standards and defacto standards, two families. The de jure standards are the ones that are imposed by some kind of regulation. so it's really usually a standard you see imposed by governments, for example.

[00:10:29] François: So when you look at your electric plug at home, there's some regulation there that says, this plug needs to have these properties. And that's a standard that gets imposed. It's a de jure standard. and then there are defacto standards which are really, uh, specifications that are out there and people agree to use it to implement it.

[00:10:49] François: And by virtue of being used and implemented and used by everyone, they become standards. the, W3C really is in the, uh, second part. It's a defacto standard. IETF is the same thing. some of our standards are used in, uh, are referenced in regulations now, but, just a, a minority of them, most of them are defacto standards.

[00:11:10] François: and that's important because that's in the end, it doesn't matter what the specific specification says, even though it's a bit confusing. What matters is that the, what the specifications says matches what implementations actually implement, and that these implementations are used, and are used interoperably across, you know, across browsers, for example, or across, uh, implementations, across users, across usages.

[00:11:36] François: So, uh, standardization is a, is a lengthy process. The recommendation is the final stage in that, lengthy process. More and more we don't really reach recommendation anymore. If you look at, uh, at groups, uh, because we have another path, let's say we kind of, uh, we can stop at candidate recommendation, which is in theoretically a step before that.

[00:12:02] François: But then you, you can stay there and, uh, stay there forever and publish new candidate recommendations. Um, uh, later on. What matters again is that, you know, you get this, virtuous feedback loop, uh, with implementers, and usage.

[00:12:18] Jeremy: So if the candidate recommendation ends up being implemented by all the browsers, what's ends up being the distinction between a candidate and one that's a normal recommendation.

[00:12:31] François: So, today it's mostly a process thing. Some groups actually decide to go to rec Some groups decide to stay at candidate rec and there's no formal difference between the, the two. we've made sure we've adopted, adjusted the process so that the important bits that, applied at the recommendation level now apply at the candidate rec level.

Royalty free patent access

[00:13:00] François: And by important things, I mean the patent commitments typically, uh, the patent policy fully applies at the candidate recommendation level so that you get your, protection, the royalty free patent protection that we, we were aiming at.

[00:13:14] François: Some people do not care, you know, but most of the world still works with, uh, with patents, uh, for good, uh, or bad reasons. But, uh, uh, that's how things work. So we need to make, we're trying to make sure that we, we secure the right set of, um, of patent commitments from the right set of stakeholders.

[00:13:35] Jeremy: Oh, so when someone implements a W3C recommendation or a candidate recommendation, the patent holders related to that recommendation, they basically agree to allow royalty-free use of that patent.

[00:13:54] François: They do the one that were involved in the working group, of course, I mean, we can't say anything about the companies out there that may have patents and uh, are not part of this standardization process. So there's always, It's a remaining risk. but part of the goal when we create a working group is to make sure that, people understand the scope.

[00:14:17] François: Lawyers look into it, and the, the legal teams that exist at the all the large companies, basically gave a green light saying, yeah, we, we we're pretty confident that we, we know where the patterns are on this particular, this particular area. And we are fine also, uh, letting go of the, the patterns we own ourselves.

Implementations are built in parallel with standardization

[00:14:39] Jeremy: And I think you had mentioned. What ends up being the most important is that the browser creators implement these recommendations. So it sounds like maybe the distinction between candidate recommendation and recommendation almost doesn't matter as long as you get the end result you want.

[00:15:03] François: So, I mean, people will have different opinions, uh, in the, in standardization circles. And I mentioned also W3C is working on other kind of, uh, standards. So, uh, in some other areas, the nuance may be more important when we, but when, when you look at specification, that's target, web browsers. we've switched from a model where, specs were developed first and then implemented to a model where specs and implementing implementations are being, worked in parallel.

[00:15:35] François: This actually relates to the evolution I was mentioning with the WHATWG taking over the HTML and, uh, focusing on the interoperability issues because the starting point was, yeah, we have an HTML 4.01 spec, uh, but it's not interoperable because it, it's not specified, are number of areas that are gray areas, you can implement them differently.

[00:15:59] François: And so there are interoperable issues. Back to candidate rec actually, the, the, the, the stage was created, if I remember correctly. uh, if I'm, if I'm not wrong, the stage was created following the, uh, IE problem. In the CSS working group, IE6, uh, shipped with some, version of a CSS that was in the, as specified, you know, the spec was saying, you know, do that for the CSS box model.

[00:16:27] François: And the IE6 was following that. And then the group decided to change, the box model and suddenly IE6 was no longer compliant. And that created a, a huge mess on the, in the history of, uh, of the web in a way. And so the, we, the, the, the, the candidate recommendation sta uh, stage was introduced following that to try to catch this kind of problems.

[00:16:52] François: But nowadays, again, we, we switch to another model where it's more live. and so we, you, you'll find a number of specs that are not even at candidate rec level. They are at the, what we call a working draft, and they, they are being implemented, and if all goes well, the standardization process follows the implementation, and then you end up in a situation where you have your candidate rec when the, uh, spec ships.

[00:17:18] François: a recent example would be a web GPU, for example. It, uh, it has shipped in, uh, in, in Chrome shortly before it transition to a candidate rec. But the, the, the spec was already stable. and now it's shipping uh, in, uh, in different browsers, uh, uh, safari, uh, and uh, and uh, and uh, Firefox. And so that's, uh, and that's a good example of something that follows, uh, things, uh, along pretty well. But then you have other specs such as, uh, in the media space, uh, request video frame back, uh, frame, call back, uh, requestVideoFrameCallback() is a short API that allows you to get, you know, a call back whenever the, the browser renders a video frame, essentially.

[00:18:01] François: And that spec is implemented across browsers. But from a W3C specific, perspective, it does not even exist. It's not on the standardization track. It's still being incubated in what we call a community group, which is, you know, some something that, uh, usually exists before. we move to the, the standardization process.

[00:18:21] François: So there, there are examples of things where some things fell through the cracks. All the standardization process, uh, is either too early or too late and things that are in spec are not exactly what what got implemented or implementations are too early in the process. We we're doing a better job, at, Not falling into a trap where someone ships, uh, you know, an implementation and then suddenly everything is frozen. You can no longer, change it because it's too late, it shipped. we've tried, different, path there. Um, mentioned CSS, the, there was this kind of vendor prefixed, uh, properties that used to be, uh, the way, uh, browsers were deploying new features without, you know, taking the final name.

[00:19:06] François: We are trying also to move away from it because same thing. Then in the end, you end up with, uh, applications that have, uh, to duplicate all the properties, the CSS properties in the style sheets with, uh, the vendor prefixes and nuances in the, in what it does in, in the end.

[00:19:23] Jeremy: Yeah, I, I think, is that in CSS where you'll see --mozilla or things like that?

Why requestVideoFrameCallback doesn't have a formal specification

[00:19:30] Jeremy: The example of the request video frame callback. I, I wonder if you have an opinion or, or, or know why that ended up the way it did, where the browsers all implemented it, even though it was still in the incubation stage.

[00:19:49] François: On this one, I don't have a particular, uh, insights on whether there was a, you know, a strong reason to implement it,without doing the standardization work.

[00:19:58] François: I mean, there are, it's not, uh, an IPR (Intellectual Property Rights) issue. It's not, uh, something that, uh, I don't think the, the, the spec triggers, uh, you know, problems that, uh, would be controversial or whatever.

[00:20:10] François: Uh, so it's just a matter of, uh, there was no one's priority, and in the end, you end up with a, everyone's happy. it's, it has shipped. And so now doing the spec work is a bit,why spend time on something that's already shipped and so on, but the, it may still come back at some point with try to, you know, improve the situation.

[00:20:26] Jeremy: Yeah, that's, that's interesting. It's a little counterintuitive because it sounds like you have the, the working group and it, it sounds like perhaps the companies or organizations involved, they maybe agreed on how it should work, and maybe that agreement almost made it so that they felt like they didn't need to move forward with the specification because they came to consensus even before going through that.

[00:20:53] François: In this particular case, it's probably because it's really, again, it's a small, spec. It's just one function call, you know? I mean, they will definitely want a working group, uh, for larger specifications. by the way, actually now I know re request video frame call back. It's because the, the, the final goal now that it's, uh, shipped, is to merge it into, uh, HTML, uh, the HTML spec.

[00:21:17] François: So there's a, there's an ongoing issue on the, the WHATWG side to integrate request video frame callback. And it's taking some time but see, it's, it's being, it, it caught up and, uh, someone is doing the, the work to, to do it. I had forgotten about this one. Um,

[00:21:33] Jeremy:

Tension from specification review (horizontal review)

[00:21:33] François: so with larger specifications, organizations will want this kind of IPR regime they will want commit commitments from, uh, others, on the scope, on the process, on everything. So they will want, uh, a larger, a, a more formal setting, because that's part of how you ensure that things, uh, will get done properly.

[00:21:53] François: I didn't mention it, but, uh, something we're really, uh, Pushy on, uh, W3C I mentioned we have principles, we have priorities, and we have, uh, specific several, uh, properties at W3C. And one of them is that we we're very strong on horizontal reviews of our specs. We really want them to be reviewed from an accessibility perspective, from an internationalization perspective, from a privacy and security, uh, perspective, and, and, and a technical architecture perspective as well.

[00:22:23] François: And that's, these reviews are part of the formal process. So you, all specs need to undergo these reviews. And from time to time, that creates tension. Uh, from time to time. It just works, you know. Goes without problem. a recurring issue is that, privacy and security are hard. I mean, it's not an easy problem, something that can be, uh, solved, uh, easily.

[00:22:48] François: Uh, so there's a, an ongoing tension and no easy way to resolve it, but there's an ongoing tension between, specifying powerful APIs and preserving privacy without meaning, not exposing too much information to applications in the media space. You can think of the media capabilities, API. So the media space is a complicated space.

[00:23:13] François: Space because of codecs. codecs are typically not relative free. and so browsers decide which codecs they're going to support, which audio and video codecs they, they're going to support and doing that, that creates additional fragmentation, not in the sense that they're not interoperable, but in the sense that applications need to choose which connect they're going to ship to stream to the end user.

[00:23:39] François: And, uh, it's all the more complicated that some codecs are going to be hardware supported. So you will have a hardware decoder in your, in your, in your laptop or smartphone. And so that's going to be efficient to decode some, uh, some stream, whereas some code are not, are going to be software, based, supported.

[00:23:56] François: Uh, and that may consume a lot of CPU and a lot of power and a lot of energy in the end. So you, you want to avoid that if you can, uh, select another thing. Even more complex than, codecs have different profiles, uh, lower end profiles higher end profiles with different capabilities, different features, uh, depending on whether you're going to use this or that color space, for example, this or that resolution, whatever.

[00:24:22] François: And so you want to surface that to web applications because otherwise, they can't. Select, they can't choose, the right codec and the right, stream that they're going to send to the, uh, client devices. And so they're not going to provide an efficient user experience first, and even a sustainable one in terms of energy because they, they're going to waste energy if they don't send the right stream.

[00:24:45] François: So you want to surface that to application. That's what the media, media capabilities, APIs, provides.

Privacy concerns

[00:24:51] François: Uh, but at the same time, if you expose that information, you end up with ways to fingerprint the end user's device. And that in turn is often used to track users across, across sites, which is exactly what we don't want to have, uh, for privacy reasons, for obvious privacy reasons.

[00:25:09] François: So you have to balance that and find ways to, uh, you know, to expose. Capabilities without, without necessarily exposing them too much. Uh,

[00:25:21] Jeremy: Can you give an example of how some of those discussions went? Like within the working group? Who are the companies or who are the organizations that are arguing for We shouldn't have this capability because of the privacy concerns, or

[00:25:40] François: In a way all of the companies, have a vision of, uh, of privacy. I mean, the, you will have a hard time finding, you know, members saying, I don't care about privacy. I just want the feature. Uh, they all have privacy in mind, but they may have a different approach to privacy.

[00:25:57] François: so if you take, uh, let's say, uh, apple and Google would be the, the, I guess the perfect examples in that, uh, in that space, uh, Google will have a, an approach that is more open-ended thing. The, the user agents has this, uh, should check what the, the, uh, given site is doing. And then if it goes beyond, you know, some kind of threshold, they're going to say, well, okay, well, we'll stop exposing data to that, to that, uh, to that site.

[00:26:25] François: So that application. So monitor and react in a way. apple has a more, uh, you know, has a stricter view on, uh, on privacy, let's say. And they will say, no, we, the, the, the feature must not exist in the first place. Or, but that's, I mean, I guess, um, it's not always that extreme. And, uh, from time to time it's the opposite.

[00:26:45] François: You will have, uh, you know, apple arguing in one way, uh, which is more open-ended than the, uh, than, uh, than Google, for example. And they are not the only ones. So in working groups, uh, you will find the, usually the implementers. Uh, so when we talk about APIs that get implemented in browsers, you want the core browsers to be involved.

[00:27:04] François: Uh, otherwise it's usually not a good sign for, uh, the success of the, uh, of the technology. So in practice, that means Apple, uh, Microsoft, Mozilla which one did I forget?

[00:27:15] Jeremy: Google.

[00:27:16] François: I forgot Google. Of course. Thank you. that's, uh, that the, the core, uh, list of participants you want to have in any, uh, group that develops web standards targeted at web browsers.

Who participates in working groups and how much power do they have?

[00:27:28] François: And then on top of that, you want, organizations and people who are directly going to use it, either because they, well the content providers. So in media, for example, if you look at the media working group, you'll see, uh, so browser vendors, the ones I mentioned, uh, content providers such as the BBC or Netflix.

[00:27:46] François: Chip set vendors would, uh, would be there as well. Intel, uh, Nvidia again, because you know, there's a hardware decoding in there and encoding. So media is, touches on, on, uh, on hardware, uh, device manufacturer in general. You may, uh, I think, uh, I think Sony is involved in the, in the media working group, for example.

[00:28:04] François: and these companies are usually less active in the spec development. It depends on the groups, but they're usually less active because the ones developing the specs are usually the browser again, because as I mentioned, we develop the specs in parallel to browsers implementing it. So they have the.

[00:28:21] François: The feedback on how to formulate the, the algorithms. and so that's this collection of people who are going to discuss first within themselves. W3C pushes for consensual dis decisions. So we hardly take any votes in the working groups, but from time to time, that's not enough.

[00:28:41] François: And there may be disagreements, but let's say there's agreement in the group, uh, when the spec matches. horizontal review groups will look at the specs. So these are groups I mentioned, accessibility one, uh, privacy, internationalization. And these groups, usually the participants are, it depends.

[00:29:00] François: It can be anything. It can be, uh, the same companies. It can be, but usually different people from the same companies. But it the, maybe organizations with a that come from very, a very different angle. And that's a good thing because that means the, you know, you enlarge the, the perspectives on your, uh, on the, on the technology.

[00:29:19] François: and you, that's when you have a discussion between groups, that takes place. And from time to time it goes well from time to time. Again, it can trigger issues that are hard to solve. and the W3C has a, an escalation process in case, uh, you know, in case things degenerate. Uh, starting with, uh, the notion of formal objection.

[00:29:42] Jeremy: It makes sense that you would have the, the browser. Vendors and you have all the different companies that would use that browser. All the different horizontal groups like you mentioned, the internationalization, accessibility. I would imagine that you were talking about consensus and there are certain groups or certain companies that maybe have more say or more sway.

[00:30:09] Jeremy: For example, if you're a browser, manufacturer, your Google. I'm kind of curious how that works out within the working group.

[00:30:15] François: Yes, it's, I guess I would be lying if I were saying that, uh, you know, all companies are strictly equal in a, in a, in a group. they are from a process perspective, I mentioned, you know, different membership fees with were design, special specific ethos so that no one could say, I'm, I'm putting in a lot of money, so you, you need to re you need to respect me, uh, and you need to follow what I, what I want to, what I want to do.

[00:30:41] François: at the same time, if you take a company like, uh, like Google for example, they send, hundreds of engineers to do standardization work. That's absolutely fantastic because that means work progresses and it's, uh, extremely smart people. So that's, uh, that's really a pleasure to work with, uh, with these, uh, people.

[00:30:58] François: But you need to take a step back and say, well, the problem is. Defacto that gives them more power just by virtue of, uh, injecting more resources into it. So having always someone who can respond to an issue, having always someone, uh, editing a spec defacto that give them more, uh, um, more say on the, on the directions that, get forward.

[00:31:22] François: And on top of that, of course, they have the, uh, I guess not surprisingly, the, the browser that is, uh, used the most, currently, on the market so there's a little bit of a, the, the, we, we, we, we try very hard to make sure that, uh, things are balanced. it's not a perfect world.

[00:31:38] François: the the role of the team. I mean, I didn't talk about the role of the team, but part of it is to make sure that. Again, all perspectives are represented and that there's not, such a, such big imbalance that, uh, that something is wrong and that we really need to look into it. so making sure that anyone, if they have something to say, make making sure that they are heard by the rest of the group and not dismissed.

[00:32:05] François: That usually goes well. There's no problem with that. And again, the escalation process I mentioned here doesn't make any, uh, it doesn't make any difference between, uh, a small player, a large player, a big player, and we have small companies raising formal objections against some of our aspects that happens, uh, all large ones.

[00:32:24] François: But, uh, that happens too. There's no magical solution, I guess you can tell it by the way. I, uh, I don't know how to formulate the, the process more. It's a human process, and that's very important that it remains a human process as well.

[00:32:41] Jeremy: I suppose the role of, of staff and someone in your position, for example, is to try and ensure that these different groups are, are heard and it isn't just one group taking control of it.

[00:32:55] François: That's part of the role, again, is to make sure that, uh, the, the process is followed. So the, I, I mean, I don't want to give the impression that the process controls everything in the groups. I mean, the, the, the groups are bound by the process, but the process is there to catch problems when they arise.

[00:33:14] François: most of the time there are no problems. It's just, you know, again, participants talking to each other, talking with the rest of the community. Most of the work happens in public nowadays, in any case. So the groups work in public essentially through asynchronous, uh, discussions on GitHub repositories.

[00:33:32] François: There are contributions from, you know, non group participants and everything goes well. And so the process doesn't kick in. You just never say, eh, no, you didn't respect the process there. You, you closed the issue. You shouldn't have a, it's pretty rare that you have to do that. Uh, things just proceed naturally because they all, everyone understands where they are, why, what they're doing, and why they're doing it.

[00:33:55] François: we still have a role, I guess in the, in the sense that from time to time that doesn't work and you have to intervene and you have to make sure that,the, uh, exception is caught and, uh, and processed, uh, in the right way.

Discussions are public on github

[00:34:10] Jeremy: And you said this process is asynchronous in public, so it sounds like someone, I, I mean, is this in GitHub issues or how, how would somebody go and, and see what the results of

[00:34:22] François: Yes, there, there are basically a gazillion of, uh, GitHub repositories under the, uh, W3C, uh, organization on GitHub. Most groups are using GitHub. I mean, there's no, it's not mandatory. We don't manage any, uh, any tooling. But the factors that most, we, we've been transitioning to GitHub, uh, for a number of years already.

[00:34:45] François: Uh, so that's where the work most of the work happens, through issues, through pool requests. Uh, that's where. people can go and raise issues against specifications. Uh, we usually, uh, also some from time to time get feedback from developers and countering, uh, a bug in a particular implementations, which we try to gently redirect to, uh, the actual bug trackers because we're not responsible for the respons implementations of the specs unless the spec is not clear.

[00:35:14] François: We are responsible for the spec itself, making sure that the spec is clear and that implementers well, understand how they should implement something.

Why the W3C doesn't specify a video or audio codec

[00:35:25] Jeremy: I can see how people would make that mistake because they, they see it's the feature, but that's not the responsibility of the, the W3C to implement any of the specifications. Something you had mentioned there's the issue of intellectual property rights and how when you have a recommendation, you require the different organizations involved to make their patents available to use freely.

[00:35:54] Jeremy: I wonder why there was never any kind of, recommendation for audio or video codecs in browsers since you have certain ones that are considered royalty free. But, I believe that's never been specified.

[00:36:11] François: At W3C you mean? Yes. we, we've tried, I mean, it's not for lack of trying. Um, uh, we've had a number of discussions with, uh, various stakeholders saying, Hey, we, we really need, an audio or video code for our, for the web. the, uh, png PNG is an example of a, um, an image format which got standardized at W3C and it got standardized at W3C similar reasons. There had to be a royalty free image format for the web, and there was none at the time. of course, nowadays, uh, jpeg, uh, and gif or gif, whatever you call it, are well, you know, no problem with them. But, uh, um, that at the time P PNG was really, uh, meant to address this issue and it worked for PNG for audio and video.

[00:37:01] François: We haven't managed to secure, commitments by stakeholders. So willingness to do it, so it's not, it's not lack of willingness. We would've loved to, uh, get, uh, a royalty free, uh, audio codec, a royalty free video codec again, audio and video code are extremely complicated because of this.

[00:37:20] François: not only because of patterns, but also because of the entire business ecosystem that exists around them for good reasons. You, in order for a, a codec to be supported, deployed, effective, it really needs, uh, it needs to mature a lot. It needs to, be, uh, added to at a hardware level, to a number of devices, capturing devices, but also, um, uh, uh, of course players.

[00:37:46] François: And that takes a hell of a lot of time and that's why you also enter a number of business considerations with business contracts between entities. so I'm personally, on a personal level, I'm, I'm pleased to see, for example, the Alliance for Open Media working on, uh, uh, AV1, uh, which is. At least they, uh, they wanted to be royalty free and they've been adopting actually the W3C patent policy to do this work.

[00:38:11] François: So, uh, we're pleased to see that, you know, they've been adopting the same process and same thing. AV1 is not yet at the same, support stage, as other, codecs, in the world Yeah, I mean in devices. There's an open question as what, what are we going to do, uh, in the future uh, with that, it's, it's, it's doubtful that, uh, the W3C will be able to work on a, on a royalty free audio, codec or royalty free video codec itself because, uh, probably it's too late now in any case.

[00:38:43] François: but It's one of these angles in the, in the web platform where we wish we had the, uh, the technology available for, for free. And, uh, it's not exactly, uh, how things work in practice.I mean, the way codecs are developed remains really patent oriented.

[00:38:57] François: and you will find more codecs being developed. and that's where geopolitics can even enter the, the, uh, the play. Because, uh, if you go to China, you will find new codecs emerging, uh, that get developed within China also, because, the other codecs come mostly from the US so it's a bit of a problem and so on.

[00:39:17] François: I'm not going to enter details and uh, I would probably say stupid things in any case. Uh, but that, uh, so we continue to see, uh, emerging codecs that are not royalty free, and it's probably going to remain the case for a number of years. unfortunately, unfortunately, from a W3C perspective and my perspective of course.

[00:39:38] Jeremy: There's always these new, formats coming out and the, rate at which they get supported in the browser, even on a per browser basis is, is very, there can be a long time between, for example, WebP being released and a browser supporting it. So, seems like maybe we're gonna be in that situation for a while where the codecs will come out and maybe the browsers will support them. Maybe they won't, but the, the timeline is very uncertain.

Digital Rights Management (DRM) and Media Source Extensions

[00:40:08] Jeremy: Something you had, mentioned, maybe this was in your, email to me earlier, but you had mentioned that some of these specifications, there's, there's business considerations like with, digital rights management and, media source extensions. I wonder if you could talk a little bit about maybe what media source extensions is and encrypted media extensions and, and what the, the considerations or challenges are there.

[00:40:33] François: I'm going to go very, very quickly over the history of a, video and audio support on the web. Initially it was supported through plugins. you are maybe too young to, remember that. But, uh, we had extensions, added to, uh, a realplayer.

[00:40:46] François: This kind of things flash as well, uh, supporting, uh, uh, videos, in web pages, but it was not provided by the web browsers themselves. Uh, then HTML5 changed the, the situation. Adding these new tags, audio and video, but that these tags on this, by default, support, uh, you give them a resources, a resource, like an image as it's an audio or a video file.

[00:41:10] François: They're going to download this, uh, uh, video file or audio file, and they're going to play it. That works well. But as soon as you want to do any kind of real streaming, files are too large and to stream, to, to get, you know, to get just a single fetch on, uh, on them. So you really want to stream them chunk by chunk, and you want to adapt the resolution at which you send the stream based on real time conditions of the user's network.

[00:41:37] François: If there's plenty of bandwidth you want to send the user, the highest possible resolution. If there's a, some kind of hiccup temporary in the, in the network, you really want to lower the resolution, and that's called adaptive streaming. And to get adaptive streaming on the web, well, there are a number of protocols that exist.

[00:41:54] François: Same thing. Some many of them are proprietary and actually they remain proprietary, uh, to some extent. and, uh, some of them are over http and they are the ones that are primarily used in, uh, in web contexts. So DASH comes to mind, DASH for Dynamic Adaptive streaming over http. HLS is another one. Uh, initially developed by Apple, I believe, and it's, uh, HTTP live streaming probably. Exactly. And, so there are different protocols that you can, uh, you can use. Uh, so the goal was not to standardize these protocols because again, there were some proprietary aspects to them. And, uh, same thing as with codecs.

[00:42:32] François: There was no, well, at least people wanted to have the, uh, flexibility to tweak parameters, adaptive streaming parameters the way they wanted for different scenarios. You may want to tweak the parameters differently. So they, they needed to be more flexibility on top of protocols not being truly available for use directly and for implementation directly in browsers.

[00:42:53] François: It was also about providing applications with, uh, the flexibility they would need to tweak parameters. So media source extensions comes into play for exactly that. Media source extensions is really about you. The application fetches chunks of its audio and video stream the way it wants, and with the parameters it wants, and it adjusts whatever it wants.

[00:43:15] François: And then it feeds that into the, uh, video or audio tag. and the browser takes care of the rest. So it's really about, doing, you know, the adaptive streaming. let applications do it, and then, uh, let the user agent, uh, the browser takes, take care of the rendering itself. That's media source extensions.

[00:43:32] François: Initially it was pushed by, uh, Netflix. They were not the only ones of course, but there, there was a, a ma, a major, uh, proponent of this, uh, technical solution, because they wanted, uh, they, uh, they were, expanding all over the world, uh, with, uh, plenty of native, applications on all sorts of, uh, of, uh, devices.

[00:43:52] François: And they wanted to have a way to stream content on the web as well. both for both, I guess, to expand to, um, a new, um, ecosystem, the web, uh, providing new opportunities, let's say. But at the same time also to have a fallback, in case they, because for native support on different platforms, they sometimes had to enter business agreements with, uh, you know, the hardware manufacturers, the whatever, the, uh, service provider or whatever.

[00:44:19] François: and so that was a way to have a full back. That kind of work is more open, in case, uh, things take some time and so on. So, and they probably had other reasons. I mean, I'm not, I can't speak on behalf of Netflix, uh, on others, but they were not the only ones of course, uh, supporting this, uh, me, uh, media source extension, uh, uh, specification.

[00:44:42] François: and that went kind of, well, I think it was creating 2011. I mean, the, the work started in 2011 and the recommendation was published in 2016, which is not too bad from a standardization perspective. It means only five years, you know, it's a very short amount of time.

Encrypted Media Extensions

[00:44:59] François: At the same time, and in parallel and complement to the media source extension specifications, uh, there was work on the encrypted media extensions, and here it was pushed by the same proponent in a way because they wanted to get premium content on the web.

[00:45:14] François: And by premium content, you think of movies and, uh. These kind of beasts. And the problem with the, I guess the basic issue with, uh, digital asset such as movies, is that they cost hundreds of millions to produce. I mean, some cost less of course. And yet it's super easy to copy them if you have a access to the digital, uh, file.

[00:45:35] François: You just copy and, uh, and that's it. Piracy uh, is super easy, uh, to achieve. It's illegal of course, but it's super easy to do. And so that's where the different legislations come into play with digital right management. Then the fact is most countries allow system that, can encrypt content and, uh, through what we call DRM systems.

[00:45:59] François: so content providers, uh, the, the ones that have movies, so the studios here more, more and more, and Netflix is one, uh, one of the studios nowadays. Um, but not only, not only them all major studios will, uh, would, uh, push for, wanted to have something that would allow them to stream encrypted content, encrypted audio and video, uh, mostly video, to, uh, to web applications so that, uh, you.

[00:46:25] François: Provide the movies, otherwise, they, they are just basically saying, and sorry, but, uh, this premium content will never make it to the web because there's no way we're gonna, uh, send it in clear, to, uh, to the end user. So Encrypting media extensions is, uh, is an API that allows to interface with, uh, what's called the content decryption module, CDM, uh, which itself interacts with, uh, the DR DRM systems that, uh, the browser may, may or may not support.

[00:46:52] François: And so it provides a way for an application to receive encrypted content, pass it over get the, the, the right keys, the right license keys from a whatever system actually. Pass that logic over to the, and to the user agent, which passes, passes it over to, uh, the CDM system, which is kind of black box in, uh, that does its magic to get the right, uh, decryption key and then the, and to decrypt the content that can be rendered.

[00:47:21] François: The encrypted media extensions triggered a, a hell of a lot of, uh, controversy. because it's DRM and DRM systems, uh, many people, uh, uh, things should be banned, uh, especially on the web because the, the premise of the web is that the, the user has trusts, a user agent. The, the web browser is called the user agent in all our, all our specifications.

[00:47:44] François: And that's, uh, that's the trust relationship. And then they interact with a, a content provider. And so whatever they do with the content is their, I guess, actually their problem. And DRM introduces a third party, which is, uh, there's, uh, the, the end user no longer has the control on the content.

[00:48:03] François: It has to rely on something else that, Restricts what it can achieve with the content. So it's, uh, it's not only a trust relationship with its, uh, user agents, it's also with, uh, with something else, which is the content provider, uh, in the end, the one that has the, uh, the license where provides the license.

[00:48:22] François: And so that's, that triggers, uh, a hell of a lot of, uh, of discussions in the W3C degenerated, uh, uh, into, uh, formal objections being raised against the specification. and that escalated to, to the, I mean, at all leverage it. It's, it's the, the story in, uh, W3C that, um, really, uh, divided the membership into, opposed camps in a way, if you, that's was not only year, it was not really 50 50 in the sense that not just a huge fights, but the, that's, that triggered a hell of a lot of discussions and a lot of, a lot of, uh, of formal objections at the time.

[00:49:00] François: Uh, we were still, From a governance perspective, interestingly, um, the W3C used to be a dictatorship. It's not how you should formulate it, of course, and I hope it's not going to be public, this podcast. Uh, but the, uh, it was a benevolent dictatorship. You could see it this way in the sense that, uh, the whole process escalated to one single person was, Tim Burners Lee, who had the final say, on when, when none of the other layers, had managed to catch and to resolve, a conflict.

[00:49:32] François: Uh, that has hardly ever happened in, uh, the history of the W3C, but that happened to the two for EME, for encrypted media extensions. It had to go to the, uh, director level who, uh, after due consideration, uh, decided to, allow the EME to proceed. and that's why we have a, an EME, uh, uh, standard right now, but still re it remains something on the side.

[00:49:56] François: EME we're still, uh, it's still in the scope of the media working group, for example. but the scope, if you look at the charter of the working group, we try to scope the, the, the, the, the updates we can make to the specification, uh, to make sure that we don't reopen, reopen, uh, a can of worms, because, well, it's really a, a topic that triggers friction for good and bad reasons again.

[00:50:20] Jeremy: And when you talk about the media source extensions, that is the ability to write custom code to stream video in whatever way you want. You mentioned, the MPEG-DASH and http live streaming. So in that case, would that be the developer gets to write that code in JavaScript that's executed by the browser?

[00:50:43] François: Yep, that's, uh, that would be it. and then typically, I guess the approach nowadays is more and more to develop low level APIs into W3C or web in, in general, I guess. And to let, uh. Libraries emerge that are going to make lives of a, a developer, uh, easier. So for MPEG DASH, we have the DASH.js, which does a fantastic job at, uh, at implementing the complexity of, uh, of adaptive streaming.

[00:51:13] François: And you just, you just hook it into your, your workflow. And that's, uh, and that's it.

Encrypted Media Extensions are closed source

[00:51:20] Jeremy: And with the encrypted media extensions I'm trying to picture how those work and how they work differently.

[00:51:28] François: Well, it's because the, the, the, the key architecture is that the, the stream that you, the stream that you may assemble with a media source extensions, for example. 'cause typically they, they're used in collaboration. When you hook the, hook it into the video tag, you also. Call EME and actually the stream goes to EME.

[00:51:49] François: And when it goes to EME, actually the user agent hands the encrypted stream. You're still encrypted at this time. Uh, encrypted, uh, stream goes to the CDM content decryption module, and that's a black box well, it has some black, black, uh, black box logic. So it's not, uh, even if you look at the chromium source code, for example, you won't see the implementation of the CDM because it's a, it's a black box, so it's not part of the browser se it's a sand, it's sandboxed, it's execution sandbox.

[00:52:17] François: That's, uh, the, the EME is kind of unique in, in this way where the, the CDM is not allowed to make network requests, for example, again, for privacy reasons. so anyway, the, the CDM box has the logic to decrypt the content and it hands it over, and then it depends, it depends on the level of protection you.

[00:52:37] François: You need or that the system supports. It can be against software based protection, in which case actually, a highly motivated, uh, uh, uh, attacker could, uh, actually get access to the decoded stream, or it can be more hardware protected, in which case actually the, it goes to the, uh, to your final screen.

[00:52:58] François: But it goes, it, it goes through the hardware in a, in a mode that the US supports in a mode that even the user agent doesn't have access to it. So it doesn't, it can't even see the pixels that, uh, gets rendered on the screen. There are, uh, several other, uh, APIs that you could use, for example, to take a screenshot of your, of your application and so on.

[00:53:16] François: And you cannot apply them to, uh, such content because they're just gonna return a black box. again, because the user agent itself does not see the, uh, the pixels, which is exactly what you want with encrypted content.

[00:53:29] Jeremy: And the, the content decryption module, it's, if I understand correctly, it's something that's shipped with the browsers, but you were saying is if you were to look at the public source code of Chromium or of Firefox, you would not see that implementation.

Content Decryption Module (Widevine, PlayReady)

[00:53:47] François: True. I mean, the, the, um, the typical examples are, uh, uh, widevine, so wide Vine. So interestingly, uh, speaking in theory, these, uh, systems could have been provided by anyone in practice. They've been provided by the browser vendors themselves. So Google has Wide Vine. Uh, Microsoft has something called PlayReady. Apple uh, the name, uh, escapes my, uh, sorry. They don't have it on top of my mind. So they, that's basically what they support. So they, they also own that code, but in a way they don't have to. And Firefox actually, uh, they, uh, don't, don't remember which one, they support among these three. but, uh, they, they don't own that code typically.

[00:54:29] François: They provide a wrapper around, around it. Yeah, that's, that's exactly the, the crux of the, uh, issue that, people have with, uh, with DRMs, right? It's, uh, the fact that, uh, suddenly you have a bit of code running there that is, uh, that, okay, you can send box, but, uh, you cannot inspect and you don't have, uh, access to its, uh, source code.

[00:54:52] Jeremy: That's interesting. So the, almost the entire browser is open source, but if you wanna watch a Netflix movie for example, then you, you need to, run this, this CDM, in addition to just the browser code. I, I think, you know, we've kind of covered a lot.

Documenting what's available in browsers for developers

[00:55:13] Jeremy: I wonder if there's any other examples or anything else you thought would be important to mention in, in the context of the W3C.

[00:55:23] François: There, there's one thing which, uh, relates to, uh, activities I'm doing also at W3C. Um. Here, we've been talking a lot about, uh, standards and, implementations in browsers, but there's also, uh, adoption of these browser, of these technology standards by developers in general and making sure that developers are aware of what exists, making sure that they understand what exists and one of the, key pain points that people, uh.

[00:55:54] François: Uh, keep raising on, uh, the web platform is first. Well, the, the, the web platform is unique in the sense that there are different implementations. I mean, if you,

[00:56:03] François: Uh, anyway, there are different, uh, context, different run times where there, there's just one provided by the company that owns the, uh, the, the, the system. The web platform is implemented by different, uh, organizations. and so you end up the system where no one, there's what's in the specs is not necessarily supported.

[00:56:22] François: And of course, MDN tries, uh, to document what's what's supported, uh, thoroughly. But for MDN to work, there's a hell of a lot of needs for data that, tracks browser support. And this, uh, this data is typically in a project called the Browser Compat Data, BCD owned by, uh, MDN as well. But, the Open Web Docs collective is a, uh, is, uh, the one, maintaining that, uh, that data under the hoods.

[00:56:50] François: anyway, all of that to say that, uh, to make sure that, we track things beyond work on technical specifications, because if you look at it from W3C perspective, life ends when the spec reaches standards, uh, you know, candidate rec or rec, you could just say, oh, done with my work. but that's not how things work.

[00:57:10] François: There's always, you need the feedback loop and, in order to make sure that developers get the information and can provide the, the feedback that standardization can benefit from and browser vendors can benefit from. We've been working on a project called web Features with browser vendors mainly, and, uh, a few of the folks and MDN and can I use and different, uh, different people, to catalog, the web in terms of features that speak to developers and from that catalog.

[00:57:40] François: So it's a set of, uh, it's a set of, uh, feature IDs with a feature name and feature description that say, you know, this is how developers would, uh, understand, uh, instead of going too fine grained in terms of, uh, there's this one function call that does this because that's where you, the, the kind of support data you may get from browser data and MDN initially, and having some kind of a coarser grained, uh, structure that says these are the, features that make sense.

[00:58:09] François: They talk to developers. That's what developers talk about, and that's the info. So the, we need to have data on these particular features because that's how developers are going approach the specs. Uh. and from that we've derived the notion of baseline badges that you have, uh, are now, uh, shown on MDN on can I use and integrated in, uh, IDE tool, IDE Tools such as visual, visual studio, and, uh, uh, libraries, uh, linked, some linters have started to, um, to integrate that data.

[00:58:41] François: Uh, so, the way it works is, uh, we've been mapping these coarser grained features to BCDs finer grained support data, and from there we've been deriving a kind of a, a batch that says, yeah, this, this feature is implemented well, has limited availability because it's only implemented in one or two browsers, for example.

[00:59:07] François: It's, newly available because. It was implemented. It's been, it's implemented across the main browser vendor, um, across the main browsers that people use. But it's recent, and widely available, which we try to, uh, well, there's been lots of discussion in the, in the group to, uh, come up with a definition which essentially ends up being 30 months after, a feature become, became newly available.

[00:59:34] François: And that's when, that's the time it takes for the, for the versions of the, the different versions of the browser to propagate. Uh, because you, it's not because there's a new version of a, of a browser that, uh, people just, Ima immediately, uh, get it. So it takes a while, to propagate, uh, across the, uh, the, the user, uh, user base.

[00:59:56] François: And so the, the goal is to have a, a, a signal that. Developers can rely on saying, okay, well it's widely available so I can really use that feature. And of course, if that doesn't work, then we need to know about it. And so we are also working with, uh, people doing so developer surveys such as state of, uh, CSS, state of HTML, state of JavaScript.

[01:00:15] François: That's I guess, the main ones. But also we are also running, uh, MDN short surveys with the MDN people to gather feedback on. On the, on these same features, and to feed the loop and to, uh, to complete the loop. and these data is also used by, internally, by browser vendors to inform, prioritization process, their prioritization process, and typically as part of the interop project that they're also running, uh, on the site

[01:00:43] François: So a, a number of different, I've mentioned, uh, I guess a number of different projects, uh, coming along together. But that's the goal is to create links, across all of these, um, uh, ongoing projects with a view to integrating developers, more, and gathering feedback as early as possible and inform decision.

[01:01:04] François: We take at the standardization level that can affect the, the lives of the developers and making sure that it's, uh, it affects them in a, in a positive way.

[01:01:14] Jeremy: just trying to understand, 'cause you had mentioned that there's the web features and the baseline, and I was, I was trying to picture where developers would actually, um, see these things. And it sounds like from what you're saying is W3C comes up with what stage some of these features are at, and then developers would end up seeing it on MDN or, or some other site.

[01:01:37] François: So, uh, I'm working on it, but that doesn't mean it's a W3C thing. It's a, it's a, again, it's a, we have different types of group. It's a community group, so it's the Web DX Community group at W3C, which means it's a community owned thing. so that's why I'm mentioning a working with a representative from, and people from MDN people, from open Web docs.

[01:02:05] François: so that's the first point. The second point is, so it's, indeed this data is now being integrated. If you, and you look, uh, you'll, you'll see it in on top of the MDN pages on most of them. If you look at, uh, any kind of feature, you'll see a, a few logos, uh, a baseline banner. and then can I use, it's the same thing.

[01:02:24] François: You're going to get a baseline, banner. It's more on, can I use, and it's meant to capture the fact that the feature is widely available or if you may need to pay attention to it. Of course, it's a simplification, and the goal is not to the way it's, the way the messaging is done to developers is meant to capture the fact that, they may want to look, uh, into more than just this, baseline status, because.

[01:02:54] François: If you take a look at web platform tests, for example, and if you were to base your assessment of whether a feature is supported based on test results, you'll end up saying the web platform has no supported technology because there are absolutely no API that, uh, where browsers pass 100% of the, of the, of the test suite.

[01:03:18] François: There may be a few of them, I don't know. But, there's a simplification in the, in the process when a feature is, uh, set to be baseline, there may be more things to look at nevertheless, but it's meant to provide a signal that, uh, still developers can rely on their day-to-day, uh, lives.

[01:03:36] François: if they use the, the feature, let's say, as a reasonably intended and not, uh, using to advance the logic.

[01:03:48] Jeremy: I see. Yeah. I'm looking at one of the pages on MDN right now, and I can see at the top there's the, the baseline and it, it mentions that this feature works across many browsers and devices, and then they say how long it's been available. And so that's a way that people at a glance can, can tell, which APIs they can use.

[01:04:08] François: it also started, uh, out of a desire to summarize this, uh, browser compatibility table that you see at the end of the page of the, the bottom of the page in on MDN. but there are where developers were saying, well, it's, it's fine, but it's, it goes too much into detail. So we don't know in the end, can we, can we use that feature or can we, can we not use that feature?

[01:04:28] François: So it's meant as a informed summary of, uh, of, of that it relies on the same data again. and more importantly, we're beyond MDN, we're working with tools providers to integrate that as well. So I mentioned the, uh, visual Studio is one of them. So recently they shipped a new version where when you use a feature, you can, you can have some contextual, uh.

[01:04:53] François: A menu that tells you, yeah, uh, that's fine. You, this CSS property, you can, you can use it, it's widely available or be aware this one is limited Availability only, availability only available in Firefox or, or Chrome or Safari work kit, whatever.

[01:05:08] Jeremy: I think that's a good place to wrap it up, if people want to learn more about the work you're doing or learn more about sort of this whole recommendations process, where, where should they head?

[01:05:23] François: Generally speaking, we're extremely open to, uh, people contributing to the W3C. and where should they go if they, it depends on what they want. So I guess the, the in usually where, how things start for someone getting involved in the W3C is that they have some kind of a technology in mind.

[01:05:41] François: They have some kind of pet project, something that they care about. And so they find the right GitHub repository in this case probably, and they start a discussion with people who are actually developing the spec. And they may be feedback, it may be negative feedback, but it may be a, Hey, I had this idea So there are different places where you can raise ideas.

[01:06:01] François: Um, so I don't know where to point at a single place. I guess I mean, there's probably a contact email, but it doesn't make a lot of sense. Um, and, uh, and that's usually how things start. And then, uh, and then when people get involved, some of them will like it and, uh, it's always a pleasure with someone, uh, enjoys interacting, uh, on, on standards and will find ways to, encourage, uh, their, their contributions.

[01:06:27] François: Uh, of course if they represent, uh. Companies, there will come a time where, uh, the, the process is going to kick in for patent reasons for different, uh, uh, different things and for membership fees. because again, that's how the W3C, the budget, uh, from the BFC, come from membership fees. and, and so, uh, things will kick in and, uh, we'll get to the next level.

[01:06:52] François: But, uh, initially it's really about just come find a feature you that doesn't work for you, uh, where you find something that looks clunky in one of the spec. If you read the spec, it's, uh, not always a fan fancy read, but, uh, and report, uh, report a bug, not in the implementation, but in the spec.

[01:07:10] François: Um, and start or propose a new idea to the, to the group that develops the specification you like. Um, and we'll take it from there, and we're very happy to have that feedback.

[01:07:22] Jeremy: So maybe find the, if there is an existing specification, go find the, the GitHub repository where that's located. if you have something that you want implemented newly, maybe find the working group that would be related and start from there.

[01:07:41] François: Absolutely.

[01:07:42] Jeremy: François, thank you so much for, for chatting with me today. I think we, we covered a lot about the, the W3C and, and how it relates to all the things that we use in our browsers every day. So thank you.

[01:07:53] François: Thanks a lot for the invitation again. It's been a pleasure.

Brandon Liu on Protomaps

Sun, 06 Apr 2025 01:18:04 -0000

Brandon Liu is an open source developer and creator of the Protomaps basemap project.

We talk about how static maps help developers build sites that last, the PMTiles file format, the role of OpenStreetMap, and his experience funding and running an open source project full time.

Protomaps

Protomaps
PMTiles (File format used by Protomaps)
Self-hosted slippy maps, for novices (like me)
Why Deploy Protomaps on a CDN

User examples

Related projects

OpenStreetMap (Dataset protomaps is based on)
Mapzen (Former company that released details on what to display based on zoom levels)
Mapbox GL JS (Mapbox developed source available map rendering library)
MapLibre GL JS (Open source fork of Mapbox GL JS)

Transcript

You can help correct transcripts on GitHub.

Intro

[00:00:00] Jeremy: I'm talking to Brandon Liu. He's the creator of Protomaps, which is a way to easily create and host your own maps. Let's get into it.

[00:00:09] Brandon: Hey, so thanks for having me on the podcast. So I'm Brandon. I work on an open source project called Protomaps. What it really is, is if you're a front end developer and you ever wanted to put maps on a website or on a mobile app, then Protomaps is sort of an open source solution for doing that that I hope is something that's way easier to use than, um, a lot of other open source projects.

Why not just use Google Maps?

[00:00:36] Jeremy: A lot of people are gonna be familiar with Google Maps. Why should they worry about whether something's open source? Why shouldn't they just go and use the Google maps API?

[00:00:47] Brandon: So Google Maps is like an awesome thing it's an awesome product. Probably one of the best tech products ever right? And just to have a map that tells you what restaurants are open and something that I use like all the time especially like when you're traveling it has all that data.

And the most amazing part is that it's free for consumers but it's not necessarily free for developers. Like if you wanted to embed that map onto your website or app, that usually has an API cost which still has a free tier and is affordable. But one motivation, one basic reason to use open source is if you have some project that doesn't really fit into that pricing model.

You know like where you have to pay the cost of Google Maps, you have a side project, a nonprofit, that's one reason. But there's lots of other reasons related to flexibility or customization where you might want to use open source instead.

Protomaps examples

[00:01:49] Jeremy: Can you give some examples where people have used Protomaps and where that made sense for them?

[00:01:56] Brandon: I follow a lot of the use cases and I also don't know about a lot of them because I don't have an API where I can track a hundred percent of the users. Some of them use the hosted version, but I would say most of them probably use it on their own infrastructure. One of the cool projects I've been seeing is called Toilet Map.

And what toilet map is if you're in the UK and you want find a public restroom then it maps out, sort of crowdsourced all of the public restrooms. And that's important for like a lot of people if they have health issues, they need to find that information. And just a lot of different projects in the same vein.

There's another one called Pinball Map which is sort of a hobby project to find all the pinball machines in the world. And they wanted to have a customized map that fit in with their theme of pinball. So these sorts of really cool indie projects are the ones I'm most excited about.

Basemaps vs Overlays

[00:02:57] Jeremy: And if we talk about, like the pinball map as an example, there's this concept of a basemap and then there's the things that you lay on top of it. What is a basemap and then is the pinball locations is that part of it or is that something separate?

[00:03:12] Brandon: It's usually something separate. The example I usually use is if you go to a real estate site, like Zillow, you'll open up the map of Seattle and it has a bunch of pins showing all the houses, and then it has some information beneath it. That information beneath it is like labels telling, this neighborhood is Capitol Hill, or there is a park here. But all that information is common to a lot of use cases and it's not specific to real estate. So I think usually that's the distinction people use in the industry between like a base map versus your overlay.

The overlay is like the data for your product or your company while the base map is something you could get from Google or from Protomaps or from Apple or from Mapbox that kind of thing.

PMTiles for hosting the basemap and overlays

[00:03:58] Jeremy: And so Protomaps in particular is responsible for the base map, and that information includes things like the streets and the locations of landmarks and things like that. Where is all that information coming from?

[00:04:12] Brandon: So the base map information comes from a project called OpenStreetMap. And I would also, point out that for Protomaps as sort of an ecosystem. You can also put your overlay data into a format called PMTiles, which is sort of the core of what Protomaps is. So it can really do both. It can transform your data into the PMTiles format which you can host and you can also host the base map.

So you kind of have both of those sides of the product in one solution.

[00:04:43] Jeremy: And so when you say you have both are you saying that the PMTiles file can have, the base map in one file and then you would have the data you're laying on top in another file? Or what are you describing there?

[00:04:57] Brandon: That's usually how I recommend to do it. Oftentimes there'll be sort of like, a really big basemap 'cause it has all of that data about like where the rivers are. Or while, if you want to put your map of toilets or park benches or pickleball courts on top, that's another file. But those are all just like assets you can move around like JSON or CSV files.

Statically Hosted

[00:05:19] Jeremy: And I think one of the things you mentioned was that your goal was to make Protomaps or the, the use of these PMTiles files easy to use. What does that look like for, for a developer? I wanna host a map. What do I actually need to, to put on my servers?

[00:05:38] Brandon: So my usual pitch is that basically if you know how to use S3 or cloud storage, that you know how to deploy a map. And that, I think is the main sort of differentiation from most open source projects. Like a lot of them, they call themselves like, like some sort of self-hosted solution. But I've actually avoided using the term self-hosted because I think in most cases that implies a lot of complexity.

Like you have to log into a Linux server or you have to use Kubernetes or some sort of Docker thing. What I really want to emphasize is the idea that, for Protomaps, it's self-hosted in the same way like CSS is self-hosted. So you don't really need a service from Amazon to host the JSON files or CSV files. It's really just a static file.

[00:06:32] Jeremy: When you say static file that means you could use any static web host to host your HTML file, your JavaScript that actually renders the map. And then you have your PMTiles files, and you're not running a process or anything, you're just putting your files on a static file host.

[00:06:50] Brandon: Right. So I think if you're a developer, you can also argue like a static file server is a server. It's you know, it's the cloud, it's just someone else's computer. It's really just nginx under the hood. But I think static storage is sort of special. If you look at things like static site generators, like Jekyll or Hugo, they're really popular because they're a commodity or like the storage is a commodity.

And you can take your blog, make it a Jekyll blog, hosted on S3. One day, Amazon's like, we're charging three times as much so you can move it to a different cloud provider. And that's all vendor neutral. So I think that's really the special thing about static storage as a primitive on the web.

Why running servers is a problem for resilience

[00:07:36] Jeremy: Was there a prior experience you had? Like you've worked with maps for a very long time. Were there particular difficulties you had where you said I just gotta have something that can be statically hosted?

[00:07:50] Brandon: That's sort of exactly why I got into this. I've been working sort of in and around the map space for over a decade, and Protomaps is really like me trying to solve the same problem I've had over and over again in the past, just like once and forever right? Because like once this problem is solved, like I don't need to deal with it again in the future.

So I've worked at a couple of different companies before, mostly as a contractor, for like a humanitarian nonprofit for a design company doing things like, web applications to visualize climate change. Or for even like museums, like digital signage for museums. And oftentimes they had some sort of data visualization component, but always sort of the challenge of how to like, store and also distribute like that data was something that there wasn't really great open source solutions. So just for map data, that's really what motivated that design for Protomaps.

[00:08:55] Jeremy: And in those, those projects in the past, were those things where you had to run your own server, run your own database, things like that?

[00:09:04] Brandon: Yeah. And oftentimes we did, we would spin up an EC2 instance, for maybe one client and then we would have to host this server serving map data forever. Maybe the client goes away, or I guess it's good for business if you can sign some sort of like long-term support for that client saying, Hey, you know, like we're done with a project, but you can pay us to maintain the EC2 server for the next 10 years.

And that's attractive. but it's also sort of a pain, because usually what happens is if people are given the choice, like a developer between like either I can manage the server on EC2 or on Rackspace or Hetzner or whatever, or I can go pay a SaaS to do it. In most cases, businesses will choose to pay the SaaS.

So that's really like what creates a sort of lock-in is this preference for like, so I have this choice between like running the server or paying the SaaS. Like businesses will almost always go and pay the SaaS.

[00:10:05] Jeremy: Yeah. And in this case, you either find some kind of free hosting or low-cost hosting just to host your files and you upload the files and then you're good from there. You don't need to maintain anything.

[00:10:18] Brandon: Exactly, and that's really the ideal use case. so I have some users these, climate science consulting agencies, and then they might have like a one-off project where they have to generate the data once, but instead of having to maintain this server for the lifetime of that project, they just have a file on S3 and like, who cares?

If that costs a couple dollars a month to run, that's fine, but it's not like S3 is gonna be deprecated, like it's gonna be on an insecure version of Ubuntu or something. So that's really the ideal, set of constraints for using Protomaps.

[00:10:58] Jeremy: Yeah. Something this also makes me think about is, is like the resilience of sites like remaining online, because I, interviewed, Kyle Drake, he runs Neocities, which is like a modern version of GeoCities. And if I remember correctly, he was mentioning how a lot of old websites from that time, if they were running a server backend, like they were running PHP or something like that, if you were to try to go to those sites, now they're like pretty much all dead because there needed to be someone dedicated to running a Linux server, making sure things were patched and so on and so forth. But for static sites, like the ones that used to be hosted on GeoCities, you can go to the internet archive or other websites and they were just files, right? You can bring 'em right back up, and if anybody just puts 'em on a web server, then you're good. They're still alive.

Case study of news room preferring static hosting

[00:11:53] Brandon: Yeah, exactly. One place that's kind of surprising but makes sense where this comes up, is for newspapers actually. Some of the users using Protomaps are the Washington Post. And the reason they use it, is not necessarily because they don't want to pay for a SaaS like Google, but because if they make an interactive story, they have to guarantee that it still works in a couple of years.

And that's like a policy decision from like the editorial board, which is like, so you can't write an article if people can't view it in five years. But if your like interactive data story is reliant on a third party, API and that third party API becomes deprecated, or it changes the pricing or it, you know, it gets acquired, then your journalism story is not gonna work anymore.

So I have seen really good uptake among local news rooms and even big ones to use things like Protomaps just because it makes sense for the requirements.

Working on Protomaps as an open source project for five years

[00:12:49] Jeremy: How long have you been working on Protomaps and the parts that it's made up of such as PMTiles?

[00:12:58] Brandon: I've been working on it for about five years, maybe a little more than that. It's sort of my pandemic era project. But the PMTiles part, which is really the heart of it only came in about halfway.

Why not make a SaaS?

[00:13:13] Brandon: So honestly, like when I first started it, I thought it was gonna be another SaaS and then I looked at it and looked at what the environment was around it.

And I'm like, uh, so I don't really think I wanna do that.

[00:13:24] Jeremy: When, when you say you looked at the environment around it what do you mean? Why did you decide not to make it a SaaS?

[00:13:31] Brandon: Because there already is a lot of SaaS out there. And I think the opportunity of making something that is unique in terms of those use cases, like I mentioned like newsrooms, was clear. Like it was clear that there was some other solution, that could be built that would fit these needs better while if it was a SaaS, there are plenty of those out there.

And I don't necessarily think that they're well differentiated. A lot of them all use OpenStreetMap data. And it seems like they mainly compete on price. It's like who can build the best three column pricing model. And then once you do that, you need to build like billing and metrics and authentication and like those problems don't really interest me.

So I think, although I acknowledge sort of the indie hacker ethos now is to build a SaaS product with a monthly subscription, that's something I very much chose not to do, even though it is for sure like the best way to build a business.

[00:14:29] Jeremy: Yeah, I mean, I think a lot of people can appreciate that perspective because it's, it's almost like we have SaaS overload, right? Where you have so many little bills for your project where you're like, another $5 a month, another $10 a month, or if you're a business, right? Those, you add a bunch of zeros and at some point it's just how many of these are we gonna stack on here?

[00:14:53] Brandon: Yeah. And honestly. So I really think like as programmers, we're not really like great at choosing how to spend money like a $10 SaaS. That's like nothing. You know? So I can go to Starbucks and I can buy a pumpkin spice latte, and that's like $10 basically now, right? And it's like I'm able to make that consumer choice in like an instant just to spend money on that.

But then if you're like, oh, like spend $10 on a SaaS that somebody put a lot of work into, then you're like, oh, that's too expensive. I could just do it myself. So I'm someone that also subscribes to a lot of SaaS products. and I think for a lot of things it's a great fit.

Many open source SaaS projects are not easy to self host

[00:15:37] Brandon: But there's always this tension between an open source project that you might be able to run yourself and a SaaS. And I think a lot of projects are at different parts of the spectrum. But for Protomaps, it's very much like I'm trying to move maps to being it is something that is so easy to run yourself that anyone can do it.

[00:16:00] Jeremy: Yeah, and I think you can really see it with, there's a few SaaS projects that are successful and they're open source, but then you go to look at the self-hosting instructions and it's either really difficult to find and you find it, and then the instructions maybe don't work, or it's really complicated.

So I think doing the opposite with Protomaps. As a user, I'm sure we're all appreciative, but I wonder in terms of trying to make money, if that's difficult.

[00:16:30] Brandon: No, for sure. It is not like a good way to make money because I think like the ideal situation for an open source project that is open that wants to make money is the product itself is fundamentally complicated to where people are scared to run it themselves. Like a good example I can think of is like Supabase.

Supabase is sort of like a platform as a service based on Postgres. And if you wanted to run it yourself, well you need to run Postgres and you need to handle backups and authentication and logging, and that stuff all needs to work and be production ready. So I think a lot of people, like they don't trust themselves to run database backups correctly.

'cause if you get it wrong once, then you're kind of screwed. So I think that fundamental aspect of the product, like a database is something that is very, very ripe for being a SaaS while still being open source because it's fundamentally hard to run. Another one I can think of is like tailscale, which is, like a VPN that works end to end.

That's something where, you know, it has this networking complexity where a lot of developers don't wanna deal with that. So they'd happily pay, for tailscale as a service. There is a lot of products or open source projects that eventually end up just changing to becoming like a hosted service.

Businesses going from open source to closed or restricted licenses

[00:17:58] Brandon: But then in that situation why would they keep it open source, right? Like, if it's easy to run yourself well, doesn't that sort of cannibalize their business model? And I think that's really the tension overall in these open source companies. So you saw it happen to things like Elasticsearch to things like Terraform where they eventually change the license to one that makes it difficult for other companies to compete with them.

[00:18:23] Jeremy: Yeah, I mean there's been a number of cases like that. I mean, specifically within the mapping community, one I can think of was Mapbox's. They have Mapbox gl. Which was a JavaScript client to visualize maps and they moved from, I forget which license they picked, but they moved to a much more restrictive license.

I wonder what your thoughts are on something that releases as open source, but then becomes something maybe a little more muddy.

[00:18:55] Brandon: Yeah, I think it totally makes sense because if you look at their business and their funding, it seems like for Mapbox, I haven't used it in a while, but my understanding is like a lot of their business now is car companies and doing in dash navigation. And that is probably way better of a business than trying to serve like people making maps of toilets.

And I think sort of the beauty of it is that, so Mapbox, the story is they had a JavaScript renderer called Mapbox GL JS. And they changed that to a source available license a couple years ago. And there's a fork of it that I'm sort of involved in called MapLibre GL. But I think the cool part is Mapbox paid employees for years, probably millions of dollars in total to work on this thing and just gave it away for free.

Right? So everyone can benefit from that work they did. It's not like that code went away, like once they changed the license. Well, the old version has been forked. It's going its own way now. It's quite different than the new version of Mapbox, but I think it's extremely generous that they're able to pay people for years, you know, like a competitive salary and just give that away.

[00:20:10] Jeremy: Yeah, so we should maybe look at it as, it was a gift while it was open source, and they've given it to the community and they're on continuing on their own path, but at least the community running Map Libre, they can run with it, right? It's not like it just disappeared.

[00:20:29] Brandon: Yeah, exactly. And that is something that I use for Protomaps quite extensively. Like it's the primary way of showing maps on the web and I've been trying to like work on some enhancements to it to have like better internationalization for if you are in like South Asia like not show languages correctly.

So I think it is being taken in a new direction. And I think like sort of the combination of Protomaps and MapLibre, it addresses a lot of use cases, like I mentioned earlier with like these like hobby projects, indie projects that are almost certainly not interesting to someone like Mapbox or Google as a business. But I'm happy to support as a small business myself.

Financially supporting open source work (GitHub sponsors, closed source, contracts)

[00:21:12] Jeremy: In my previous interview with Tom, one of the main things he mentioned was that creating a mapping business is incredibly difficult, and he said he probably wouldn't do it again. So in your case, you're building Protomaps, which you've admitted is easy to self-host. So there's not a whole lot of incentive for people to pay you. How is that working out for you? How are you supporting yourself?

[00:21:40] Brandon: There's a couple of strategies that I've tried and oftentimes failed at. Just to go down the list, so I do have GitHub sponsors so I do have a hosted version of Protomaps you can use if you don't want to bother copying a big file around. But the way I do the billing for that is through GitHub sponsors.

If you wanted to use this thing I provide, then just be a sponsor. And that definitely pays for itself, like the cost of running it. And that's great. GitHub sponsors is so easy to set up. It just removes you having to deal with Stripe or something. 'cause a lot of people, their credit card information is already in GitHub.

GitHub sponsors I think is awesome if you want to like cover costs for a project. But I think very few people are able to make that work. A thing that's like a salary job level. It's sort of like Twitch streaming, you know, there's a handful of people that are full-time streamers and then you look down the list on Twitch and it's like a lot of people that have like 10 viewers.

But some of the other things I've tried, I actually started out, publishing the base map as a closed source thing, where I would sell sort of like a data package instead of being a SaaS, I'd be like, here's a one-time download, of the premium data and you can buy it. And quite a few people bought it I just priced it at like $500 for this thing.

And I thought that was an interesting experiment. The main reason it's interesting is because the people that it attracts to you in terms of like, they're curious about your products, are all people willing to pay money. While if you start out everything being open source, then the people that are gonna be try to do it are only the people that want to get something for free.

So what I discovered is actually like once you transition that thing from closed source to open source, a lot of the people that used to pay you money will still keep paying you money because like, it wasn't necessarily that that closed source thing was why they wanted to pay. They just valued that thought you've put into it your expertise, for example.

So I think that is one thing, that I tried at the beginning was just start out, closed source proprietary, then make it open source. That's interesting to people. Like if you release something as open source, if you go the other way, like people are really mad if you start out with something open source and then later on you're like, oh, it's some other license.

Then people are like that's so rotten. But I think doing it the other way, I think is quite valuable in terms of being able to find an audience.

[00:24:29] Jeremy: And when you said it was closed source and paid to open source, do you still sell those map exports?

[00:24:39] Brandon: I don't right now. It's something that I might do in the future, you know, like have small customizations of the data that are available, uh, for a fee. still like the core OpenStreetMap based map that's like a hundred gigs you can just download. And that'll always just be like a free download just because that's already out there.

All the source code to build it is open source. So even if I said, oh, you have to pay for it, then someone else can just do it right? So there's no real reason like to make that like some sort of like paywall thing. But I think like overall if the project is gonna survive in the long term it's important that I'd ideally like to be able to like grow like a team like have a small group of people that can dedicate the time to growing the project in the long term. But I'm still like trying to figure that out right now.

[00:25:34] Jeremy: And when you mentioned that when you went from closed to open and people were still paying you, you don't sell a product anymore. What were they paying for?

[00:25:45] Brandon: So I have some contracts with companies basically, like if they need a feature or they need a customization in this way then I am very open to those. And I sort of set it up to make it clear from the beginning that this is not just a free thing on GitHub, this is something that you could pay for if you need help with it, if you need support, if you wanted it.

I'm also a little cagey about the word support because I think like it sounds a little bit too wishy-washy. Pretty much like if you need access to the developers of an open source project, I think that's something that businesses are willing to pay for. And I think like making that clear to potential users is a challenge.

But I think that is one way that you might be able to make like a living out of open source.

[00:26:35] Jeremy: And I think you said you'd been working on it for about five years. Has that mostly been full time?

[00:26:42] Brandon: It's been on and off. it's sort of my pandemic era project. But I've spent a lot of time, most of my time working on the open source project at this point. So I have done some things that were more just like I'm doing a customization or like a private deployment for some client. But that's been a minority of the time. Yeah.

[00:27:03] Jeremy: It's still impressive to have an open source project that is easy to self-host and yet is still able to support you working on it full time. I think a lot of people might make the assumption that there's nothing to sell if something is, is easy to use.

But this sort of sounds like a counterpoint to that.

[00:27:25] Brandon: I think I'd like it to be. So when you come back to the point of like, it being easy to self-host. Well, so again, like I think about it as like a primitive of the web. Like for example, if you wanted to start a business today as like hosted CSS files, you know, like where you upload your CSS and then you get developers to pay you a monthly subscription for how many times they fetched a CSS file.

Well, I think most developers would be like, that's stupid because it's just an open specification, you just upload a static file. And really my goal is to make Protomaps the same way where it's obvious that there's not really some sort of lock-in or some sort of secret sauce in the server that does this thing.

How PMTiles works and building a primitive of the web

[00:28:16] Brandon: If you look at video for example, like a lot of the tech for how Protomaps and PMTiles works is based on parts of the HTTP spec that were made for video. And 20 years ago, if you wanted to host a video on the web, you had to have like a real player license or flash. So you had to go license some server software from real media or from macromedia so you could stream video to a browser plugin.

But now in HTML you can just embed a video file. And no one's like, oh well I need to go pay for my video serving license. I mean, there is such a thing, like YouTube doesn't really use that for DRM reasons, but people just have the assumption that video is like a primitive on the web. So if we're able to make maps sort of that same way like a primitive on the web then there isn't really some obvious business or licensing model behind how that works. Just because it's a thing and it helps a lot of people do their jobs and people are happy using it. So why bother?

[00:29:26] Jeremy: You mentioned that it a tech that was used for streaming video. What tech specifically is it?

[00:29:34] Brandon: So it is byte range serving. So when you open a video file on the web, So let's say it's like a 100 megabyte video. You don't have to download the entire video before it starts playing. It streams parts out of the file based on like what frames... I mean, it's based on the frames in the video. So it can start streaming immediately because it's organized in a way to where the first few frames are at the beginning.

And what PMTiles really is, is it's just like a video but in space instead of time. So it's organized in a way where these zoomed out views are at the beginning and the most zoomed in views are at the end. So when you're like panning or zooming in the map all you're really doing is fetching byte ranges out of that file the same way as a video.

But it's organized in, this tiled way on a space filling curve. IIt's a little bit complicated how it works internally and I think it's kind of cool but that's sort of an like an implementation detail.

[00:30:35] Jeremy: And to the person deploying it, it just looks like a single file.

[00:30:40] Brandon: Exactly in the same way like an mp3 audio file is or like a JSON file is.

[00:30:47] Jeremy: So with a video, I can sort of see how as someone seeks through the video, they start at the beginning and then they go to the middle if they wanna see the middle. For a map, as somebody scrolls around the map, are you seeking all over the file or is the way it's structured have a little less chaos?

[00:31:09] Brandon: It's structured. And that's kind of the main technical challenge behind building PMTiles is you have to be sort of clever so you're not spraying the reads everywhere. So it uses something called a hilbert curve, which is a mathematical concept of a space filling curve. Where it's one continuous curve that essentially lets you break 2D space into 1D space.

So if you've seen some maps of IP space, it uses this crazy looking curve that hits all the points in one continuous line. And that's the same concept behind PMTiles is if you're looking at one part of the world, you're sort of guaranteed that all of those parts you're looking at are quite close to each other and the data you have to transfer is quite minimal, compared to if you just had it at random.

[00:32:02] Jeremy: How big do the files get? If I have a PMTiles of the entire world, what kind of size am I looking at?

[00:32:10] Brandon: Right now, the default one I distribute is 128 gigabytes, so it's quite sizable, although you can slice parts out of it remotely. So if you just wanted. if you just wanted California or just wanted LA or just wanted only a couple of zoom levels, like from zero to 10 instead of zero to 15, there is a command line tool that's also called PMTiles that lets you do that.

Issues with CDNs and range queries

[00:32:35] Jeremy: And when you're working with files of this size, I mean, let's say I am working with a CDN in front of my application. I'm not typically accustomed to hosting something that's that large and something that's where you're seeking all over the file. is that, ever an issue or is that something that's just taken care of by the browser and, and taken care of by, by the hosts?

[00:32:58] Brandon: That is an issue actually, so a lot of CDNs don't deal with it correctly. And my recommendation is there is a kind of proxy server or like a serverless proxy thing that I wrote. That runs on like cloudflare workers or on Docker that lets you proxy those range requests into a normal URL and then that is like a hundred percent CDN compatible.

So I would say like a lot of the big commercial installations of this thing, they use that because it makes more practical sense. It's also faster. But the idea is that this solution sort of scales up and scales down. If you wanted to host just your city in like a 10 megabyte file, well you can just put that into GitHub pages and you don't have to worry about it.

If you want to have a global map for your website that serves a ton of traffic then you probably want a little bit more sophisticated of a solution. It still does not require you to run a Linux server, but it might require (you) to use like Lambda or Lambda in conjunction with like a CDN.

[00:34:09] Jeremy: Yeah. And that sort of ties into what you were saying at the beginning where if you can host on something like CloudFlare Workers or Lambda, there's less time you have to spend keeping these things running.

[00:34:26] Brandon: Yeah, exactly. and I think also the Lambda or CloudFlare workers solution is not perfect. It's not as perfect as S3 or as just static files, but in my experience, it still is better at building something that lasts on the time span of years than being like I have a server that is on this Ubuntu version and in four years there's all these like security patches that are not being applied.

So it's still sort of serverless, although not totally vendor neutral like S3.

Customizing the map

[00:35:03] Jeremy: We've mostly been talking about how you host the map itself, but for someone who's not familiar with these kind of tools, how would they be customizing the map?

[00:35:15] Brandon: For customizing the map there is front end style customization and there's also data customization. So for the front end if you wanted to change the water from the shade of blue to another shade of blue there is a TypeScript API where you can customize it almost like a text editor color scheme.

So if you're able to name a bunch of colors, well you can customize the map in that way you can change the fonts. And that's all done using MapLibre GL using a TypeScript API on top of that for customizing the data.

So all the pipeline to generate this data from OpenStreetMap is open source. There is a Java program using a library called PlanetTiler which is awesome, which is this super fast multi-core way of building map tiles. And right now there isn't really great hooks to customize what data goes into that. But that's something that I do wanna work on. And finally, because the data comes from OpenStreetMap if you notice data that's missing or you wanted to correct data in OSM then you can go into osm.org.

You can get involved in contributing the data to OSM and the Protomaps build is daily. So if you make a change, then within 24 hours you should see the new base map. Have that change. And of course for OSM your improvements would go into every OSM based project that is ingesting that data. So it's not a protomap specific thing. It's like this big shared data source, almost like Wikipedia.

OpenStreetMap is a dataset and not a map

[00:37:01] Jeremy: I think you were involved with OpenStreetMap to some extent. Can you speak a little bit to that for people who aren't familiar, what OpenStreetMap is?

[00:37:11] Brandon: Right. So I've been using OSM as sort of like a tools developer for over a decade now. And one of the number one questions I get from developers about what is Protomaps is why wouldn't I just use OpenStreetMap? What's the distinction between Protomaps and OpenStreetMap? And it's sort of like this funny thing because even though OSM has map in the name it's not really a map in that you can't... In that it's mostly a data set and not a map.

It does have a map that you can see that you can pan around to when you go to the website but the way that thing they show you on the website is built is not really that easily reproducible. It involves a lot of c++ software you have to run.

But OpenStreetMap itself, the heart of it is almost like a big XML file that has all the data in the map and global. And it has tagged features for example. So you can go in and edit that. It has a web front end to change the data. It does not directly translate into making a map actually.

Protomaps decides what shows at each zoom level

[00:38:24] Brandon: So a lot of the pipeline, that Java program I mentioned for building this basemap for protomaps is doing things like you have to choose what data you show when you zoom out. You can't show all the data. For example when you're zoomed out and you're looking at all of a state like Colorado you don't see all the Chipotle when you're zoomed all the way out. That'd be weird, right?

So you have to make some sort of decision in logic that says this data only shows up at this zoom level. And that's really what is the challenge in optimizing the size of that for the Protomaps map project.

[00:39:03] Jeremy: Oh, so those decisions of what to show at different Zoom levels those are decisions made by you when you're creating the PMTiles file with Protomaps.

[00:39:14] Brandon: Exactly. It's part of the base maps build pipeline. and those are honestly very subjective decisions. Who really decides when you're zoomed out should this hospital show up or should this museum show up nowadays in Google, I think it shows you ads. Like if someone pays for their car repair shop to show up when you're zoomed out like that that gets surfaced.

But because there is no advertising auction in Protomaps that doesn't happen obviously. So we have to sort of make some reasonable choice. A lot of that right now in Protomaps actually comes from another open source project called Mapzen. So Mapzen was a company that went outta business a couple years ago.

They did a lot of this work in designing which data shows up at which Zoom level and open sourced it. And then when they shut down, they transferred that code into the Linux Foundation. So it's this totally open source project, that like, again, sort of like Mapbox gl has this awesome legacy in that this company funded it for years for smart people to work on it and now it's just like a free thing you can use. So the logic in Protomaps is really based on mapzen.

[00:40:33] Jeremy: And so the visualization of all this... I think I understand what you mean when people say oh, why not use OpenStreetMaps because it's not really clear it's hard to tell is this the tool that's visualizing the data? Is it the data itself? So in the case of using Protomaps, it sounds like Protomaps itself has all of the data from OpenStreetMap and then it has made all the decisions for you in terms of what to show at different Zoom levels and what things to have on the map at all. And then finally, you have to have a separate, UI layer and in this case, it sounds like the one that you recommend is the Map Libre library.

[00:41:18] Brandon: Yeah, that's exactly right. For Protomaps, it has a portion or a subset of OSM data. It doesn't have all of it just because there's too much, like there's data in there. people have mapped out different bushes and I don't include that in Protomaps if you wanted to go in and edit like the Java code to add that you can.

But really what Protomaps is positioned at is sort of a solution for developers that want to use OSM data to make a map on their app or their website. because OpenStreetMap itself is mostly a data set, it does not really go all the way to having an end-to-end solution.

Financials and the idea of a project being complete

[00:41:59] Jeremy: So I think it's great that somebody who wants to make a map, they have these tools available, whether it's from what was originally built by Mapbox, what's built by Open StreetMap now, the work you're doing with Protomaps. But I wonder one of the things that I talked about with Tom was he was saying he was trying to build this mapping business and based on the financials of what was coming in he was stressed, right?

He was struggling a bit. And I wonder for you, you've been working on this open source project for five years. Do you have similar stressors or do you feel like I could keep going how things are now and I feel comfortable?

[00:42:46] Brandon: So I wouldn't say I'm a hundred percent in one bucket or the other. I'm still seeing it play out. One thing, that I really respect in a lot of open source projects, which I'm not saying I'm gonna do for Protomaps is the idea that a project is like finished. I think that is amazing.

If a software project can just be done it's sort of like a painting or a novel once you write, finish the last page, have it seen by the editor. I send it off to the press is you're done with a book. And I think one of the pains of software is so few of us can actually do that.

And I don't know obviously people will say oh the map is never finished. That's more true of OSM, but I think like for Protomaps. One thing I'm thinking about is how to limit the scope to something that's quite narrow to where we could be feature complete on the core things in the near term timeframe.

That means that it does not address a lot of things that people want. Like search, like if you go to Google Maps and you search for a restaurant, you will get some hits. that's like a geocoding issue. And I've already decided that's totally outta scope for Protomaps. So, in terms of trying to think about the future of this, I'm mostly looking for ways to cut scope if possible.

There are some things like better tooling around being able to work with PMTiles that are on the roadmap. but for me, I am still enjoying working on the project. It's definitely growing. So I can see on NPM downloads I can see the growth curve of people using it and that's really cool.

So I like hearing about when people are using it for cool projects. So it seems to still be going okay for now.

[00:44:44] Jeremy: Yeah, that's an interesting perspective about how you were talking about projects being done. Because I think when people look at GitHub projects and they go like, oh, the last commit was X months ago. They go oh well this is dead right? But maybe that's the wrong framing. Maybe you can get a project to a point where it's like, oh, it's because it doesn't need to be updated.

[00:45:07] Brandon: Exactly, yeah. Like I used to do a lot of c++ programming and the best part is when you see some LAPACK matrix math library from like 1995 that still works perfectly in c++ and you're like, this is awesome. This is the one I have to use. But if you're like trying to use some like React component library and it hasn't been updated in like a year, you're like, oh, that's a problem.

So again, I think there's some middle ground between those that I'm trying to find. I do like for Protomaps, it's quite dependency light in terms of the number of hard dependencies I have in software. but I do still feel like there is a lot of work to be done in terms of project scope that needs to have stuff added.

You mostly only hear about problems instead of people's wins

[00:45:54] Jeremy: Having run it for this long. Do you have any thoughts on running an open source project in general? On dealing with issues or managing what to work on things like that?

[00:46:07] Brandon: Yeah. So I have a lot. I think one thing people point out a lot is that especially because I don't have a direct relationship with a lot of the people using it a lot of times I don't even know that they're using it. Someone sent me a message saying hey, have you seen flickr.com, like the photo site?

And I'm like, no. And I went to flickr.com/map and it has Protomaps for it. And I'm like, I had no idea. But that's cool, if they're able to use Protomaps for this giant photo sharing site that's awesome. But that also means I don't really hear about when people use it successfully because you just don't know, I guess they, NPM installed it and it works perfectly and you never hear about it.

You only hear about people's negative experiences. You only hear about people that come and open GitHub issues saying this is totally broken, and why doesn't this thing exist? And I'm like, well, it's because there's an infinite amount of things that I want to do, but I have a finite amount of time and I just haven't gone into that yet.

And that's honestly a lot of the things and people are like when is this thing gonna be done? So that's, that's honestly part of why I don't have a public roadmap because I want to avoid that sort of bickering about it. I would say that's one of my biggest frustrations with running an open source project is how it's self-selected to only hear the negative experiences with it.

Be careful what PRs you accept

[00:47:32] Brandon: 'cause you don't hear about those times where it works. I'd say another thing is it's changed my perspective on contributing to open source because I think when I was younger or before I had become a maintainer I would open a pull request on a project unprompted that has a hundred lines and I'd be like, Hey, just merge this thing.

But I didn't realize when I was younger well if I just merge it and I disappear, then the maintainer is stuck with what I did forever. You know if I add some feature then that person that maintains the project has to do that indefinitely. And I think that's very asymmetrical and it's changed my perspective a lot on accepting open source contributions.

I wanna have it be open to anyone to contribute. But there is some amount of back and forth where it's almost like the default answer for should I accept a PR is no by default because you're the one maintaining it. And do you understand the shape of that solution completely to where you're going to support it for years because the person that's contributing it is not bound to those same obligations that you are.

And I think that's also one of the things where I have a lot of trepidation around open source is I used to think of it as a lot more bazaar-like in terms of anyone can just throw their thing in. But then that creates a lot of problems for the people who are expected out of social obligation to continue this thing indefinitely.

[00:49:23] Jeremy: Yeah, I can totally see why that causes burnout with a lot of open source maintainers, because you probably to some extent maybe even feel some guilt right? You're like, well, somebody took the time to make this. But then like you said you have to spend a lot of time trying to figure out is this something I wanna maintain long term? And one wrong move and it's like, well, it's in here now.

[00:49:53] Brandon: Exactly. To me, I think that is a very common failure mode for open source projects is they're too liberal in the things they accept. And that's a lot of why I was talking about how that choice of what features show up on the map was inherited from the MapZen projects. If I didn't have that then somebody could come in and say hey, you know, I want to show power lines on the map.

And they open a PR for power lines and now everybody who's using Protomaps when they're like zoomed out they see power lines are like I didn't want that. So I think that's part of why a lot of open source projects eventually evolve into a plugin system is because there is this demand as the project grows for more and more features. But there is a limit in the maintainers. It's like the demand for features is exponential while the maintainer amount of time and effort is linear.

Plugin systems might reduce need for PRs

[00:50:56] Brandon: So maybe the solution to smash that exponential down to quadratic maybe is to add a plugin system.

But I think that is one of the biggest tensions that only became obvious to me after working on this for a couple of years.

[00:51:14] Jeremy: Is that something you're considering doing now?

[00:51:18] Brandon: Is the plugin system? Yeah. I think for the data customization, I eventually wanted to have some sort of programmatic API to where you could declare a config file that says I want ski routes. It totally makes sense. The power lines example is maybe a little bit obscure but for example like a skiing app and you want to be able to show ski slopes when you're zoomed out well you're not gonna be able to get that from Mapbox or from Google because they have a one size fits all map that's not specialized to skiing or to golfing or to outdoors.

But if you like, in theory, you could do this with Protomaps if you changed the Java code to show data at different zoom levels. And that is to me what makes the most sense for a plugin system and also makes the most product sense because it enables a lot of things you cannot do with the one size fits all map.

[00:52:20] Jeremy: It might also increase the complexity of the implementation though, right?

[00:52:25] Brandon: Yeah, exactly. So that's like. That's really where a lot of the terrifying thoughts come in, which is like once you create this like config file surface area, well what does that look like? Is that JSON? Is that TOML, is that some weird like everything eventually evolves into some scripting language right?

Where you have logic inside of your templates and I honestly do not really know what that looks like right now. That feels like something in the medium term roadmap.

[00:52:58] Jeremy: Yeah and then in terms of bug reports or issues, now it's not just your code it's this exponential combination of whatever people put into these config files.

[00:53:09] Brandon: Exactly. Yeah. so again, like I really respect the projects that have done this well or that have done plugins well. I'm trying to think of some, I think obsidian has plugins, for example. And that seems to be one of the few solutions to try and satisfy the infinite desire for features with the limited amount of maintainer time.

Time split between code vs triage vs talking to users

[00:53:36] Jeremy: How would you say your time is split between working on the code versus issue and PR triage?

[00:53:43] Brandon: Oh, it varies really. I think working on the code is like a minority of it. I think something that I actually enjoy is talking to people, talking to users, getting feedback on it. I go to quite a few conferences to talk to developers or people that are interested and figure out how to refine the message, how to make it clearer to people, like what this is for.

And I would say maybe a plurality of my time is spent dealing with non-technical things that are neither code or GitHub issues. One thing I've been trying to do recently is talk to people that are not really in the mapping space. For example, people that work for newspapers like a lot of them are front end developers and if you ask them to run a Linux server they're like I have no idea.

But that really is like one of the best target audiences for Protomaps. So I'd say a lot of the reality of running an open source project is a lot like a business is it has all the same challenges as a business in terms of you have to figure out what is the thing you're offering.

You have to deal with people using it. You have to deal with feedback, you have to deal with managing emails and stuff. I don't think the payoff is anywhere near running a business or a startup that's backed by VC money is but it's definitely not the case that if you just want to code, you should start an open source project because I think a lot of the work for an opensource project has nothing to do with just writing the code.

It is in my opinion as someone having done a VC backed business before, it is a lot more similar to running, a tech company than just putting some code on GitHub.

Running a startup vs open source project

[00:55:43] Jeremy: Well, since you've done both at a high level what did you like about running the company versus maintaining the open source project?

[00:55:52] Brandon: So I have done some venture capital accelerator programs before and I think there is an element of hype and energy that you get from that that is self perpetuating.

Your co-founder is gungho on like, yeah, we're gonna do this thing. And your investors are like, you guys are geniuses. You guys are gonna make a killing doing this thing. And the way it's framed is sort of obvious to everyone that it's like there's a much more traditional set of motivations behind that, that people understand while it's definitely not the case for running an open source project.

Sometimes you just wake up and you're like what the hell is this thing for, it is this thing you spend a lot of time on. You don't even know who's using it. The people that use it and make a bunch of money off of it they know nothing about it. And you know, it's just like cool.

And then you only hear from people that are complaining about it. And I think like that's honestly discouraging compared to the more clear energy and clearer motivation and vision behind how most people think about a company. But what I like about the open source project is just the lack of those constraints you know?

Where you have a mandate that you need to have this many customers that are paying by this amount of time. There's that sort of pressure on delivering a business result instead of just making something that you're proud of that's simple to use and has like an elegant design. I think that's really a difference in motivation as well.

Having control

[00:57:50] Jeremy: Do you feel like you have more control? Like you mentioned how you've decided I'm not gonna make a public roadmap. I'm the sole developer. I get to decide what goes in. What doesn't. Do you feel like you have more control in your current position than you did running the startup?

[00:58:10] Brandon: Definitely for sure. Like that agency is what I value the most. It is possible to go too far. Like, so I'm very wary of the BDFL title, which I think is how a lot of open source projects succeed. But I think there is some element of for a project to succeed there has to be somebody that makes those decisions.

Sometimes those decisions will be wrong and then hopefully they can be rectified. But I think going back to what I was talking about with scope, I think the overall vision and the scope of the project is something that I am very opinionated about in that it should do these things. It shouldn't do these things. It should be easy to use for this audience. Is it gonna be appealing to this other audience? I don't know.

And I think that is really one of the most important parts of that leadership role, is having the power to decide we're doing this, we're not doing this. I would hope other developers would be able to get on board if they're able to make good use of the project, if they use it for their company, if they use it for their business, if they just think the project is cool.

So there are other contributors at this point and I want to get more involved. But I think being able to make those decisions to what I believe is going to be the best project is something that is very special about open source, that isn't necessarily true about running like a SaaS business.

[00:59:50] Jeremy: I think that's a good spot to end it on, so if people want to learn more about Protomaps or they wanna see what you're up to, where should they head?

[01:00:00] Brandon: So you can go to Protomaps.com, GitHub, or you can find me or Protomaps on bluesky or Mastodon.

[01:00:09] Jeremy: All right, Brandon, thank you so much for chatting today.

[01:00:12] Brandon: Great. Thank you very much.

Hong Minhee on ActivityPub

Fri, 28 Feb 2025 07:25:00 -0000

Hong Minhee is an open source developer and the creator of the Fedify ActivityPub server framework.

We talk about how applications like Mastodon and Misskey communicate with one another using ActivityPub. This includes discussions on built-in activites, extending the specification in a backwards compatible way, difficulties implementing JSON-LD, the inbox model, and his experience implementing the specification.

Hong Minhee:

Specifications:

ActivityPub implementations:

ActivityPub tools:

Transcript

You can help correct transcripts on GitHub.

What's ActivityPub?

[00:00:00] Jeremy: Today, I'm talking to Hong Minhee. He is the developer of Fedify. A TypeScript library for building ActivityPub server applications.

The first thing I think we should start with is defining ActivityPub. what is ActivityPub?

[00:00:16] Hong: ActivityPub is the protocol that lets social networks talk to each other and it's officially recommended by W3C. It's what powers this thing we call the Fediverse which is basically a way for different social media platforms to work together.

Users of ActivityPub

[00:00:39] Jeremy: Can you give some examples that people might have heard of -- either users of ActivityPub or things that are a part of this fediverse?

[00:00:50] Hong: Mastodon is probably the biggest one out there. And you know what's interesting?

Meta threads has actually started implementing ActivityPub this summer. So this still pretty much a one way street right now. In East Asia, especially Japan, there's this really popular microblogging platform called misskey.

It's got so many forks that people actually joke around and called them forkeys. but it's not just about Twitter style microblogging, there's Pixelfed which is kind of like Instagram, but for the fediverse. And those same folks recently launched loops. Which is basically doing what TikTok does, but in the Fediverse.

Then you've got stuff like Lemmy and which are doing the reddit thing up in the Fediverse.

[00:02:00] Jeremy: Oh like Reddit.

[00:02:01] Hong: Yeah. There are so much more out there that I haven't even mentioned. Um, most of it is open source, which is pretty cool.

[00:02:13] Jeremy: So the first few examples you gave, Mastodon and Meta's threads, they're very similar to, to Twitter, right? So that's what you were calling the, the Microblogging applications.

And I think what you had said, which is a little bit interesting is you had said Metas threads is only one way. So could you kind of describe like what you mean by that?

[00:02:37] Hong: Currently meta threads only can be followed by other ActivityPub applications but you cannot follow other people in the fediverse.

[00:02:55] Jeremy: People who are using another Microblogging platform like Mastodon can follow someone on Meta's Threads platform. But the other way is not true. If you're on threads, you can't follow someone on Mastodon.

[00:03:07] Hong: Yes, that's right.

[00:03:09] Jeremy: And that's not a limitation of the protocol itself. That's a design decision or a decision made by Meta.

[00:03:17] Hong: Yeah. They are slowly implementing ActivityPub and I hope they will implement complete ActivityPub in the future.

Interoperability through Activities

[00:03:27] Jeremy: And then the other examples you gave, one is I believe it was Pixel Fed is very similar to Instagram. And then the last examples you gave was I think it was Lemmy, you said it's similar to Reddit. Because you mentioned the term Fediverse before and you mentioned that these all use ActivityPub and since these seem like different kinds of applications, what does it mean for them to interact?

Because with Mastodon and Threads I can kind of understand because they're both similar to Twitter. So you're posting messages and replying, but, but what does it mean, for example, for someone on Mastodon to interact with someone on Lemmy which is like Reddit because they seem very different.

[00:04:16] Hong: People in Lemmy and Mastodon are called actors and can follow each other.

They have interactions between them called activities. And there are several types of activities like, create and follow and undo, like, and so on.

So, ActivityPub applications tend to, use these vocabulary to implement their features.

So, for example, Lemmy uses like activities for upvoting and like activities for down voting and it's translated to likes in Mastodon. So if you submit a post on Lemmy and it shows up on your Mastodon timeline. If you like that post (it) is upvoting in Lemmy.

[00:05:36] Jeremy: And probably similarly with Pixelfed, which you said is like Instagram, if you follow someone's Pixelfed account in Mastodon and they post a photo in Pixel Fed, they would see it as a post in Mastodon natively and they could give it a like there.

Adding activities or properties

[00:05:56] Jeremy: And these activities that you mentioned -- So the like and the dislike are those part of ActivityPub itself?

[00:06:05] Hong: Yes, and this vocabulary can be extended.

[00:06:10] Jeremy: So you can add, additional actions (activities) or are you adding information (properties) to the existing actions?

[00:06:37] Hong:

It is called activity vocabulary, and there are, things like accept, add, arrive, block, create lead, dislike, flag, follow, ignore invite, join, and so on.

So, basically, almost everything you need to build social media is already there in the vocabulary, but if you want to extend some more, you can define your, own vocabulary.

[00:06:56] Jeremy: Most of the things that an Instagram or a Twitter, or a Reddit would need is already there. But you're saying that you can have your own vocabulary. So if there's an action or an activity that is not covered by the specification, you can create one yourself.

[00:07:13] Hong: Yes. For example, Misskey and Pleroma defined emoji reactor to represent emoji reactions.

[00:07:25] Jeremy: Because the systems can extend the vocabulary. What are some other examples of cases where mastodon or any other of these systems has found that the existing vocabulary is not enough. What are some other examples of applications extending it?

[00:07:45] Hong: For example, uh, mastodon defined suspended -- suspended property. They are not activities, but they are properties in the activity.

ActivityPub consists of several types of objects and there are activities and normal objects like, article. they can have properties and there are several existing properties, but they can be also extended. So Mastodon extended some properties they need. So for example, they define suspended or discoverable.

[00:08:44] Hong: Suspended for to tell if an actor is suspended by moderators. Discoverable tells if an actor itself wants to be, searched and indexed, and there are much more properties. Mastodon extended.

Actors

[00:09:12] Jeremy: And these are, these are properties of the actor. These are properties of the user?

[00:09:19] Hong: Yes. Actors.

[00:09:21] Jeremy: Cause I think earlier you mentioned that. The concept of a user is an actor, and it sounds like what you're saying is an actor can have all these properties. There's probably a, a username and things like that, but Mastodon has extended the properties so that, you can have a property on whether you wanna be searched or indexed you can have a property that says you're suspended.

So I guess your account, is still there, but can't be used anymore.

Something we should probably talk about then is, so you have these actors, you have these activities that I'm assuming the actors are performing on one another. What does that data look like and what does the communication look like?

[00:10:09] Hong: Actors have their own dereferencable URI and when you look up that URI you get all the info about the actor in JSON-LD format

[00:10:22] Jeremy: JSON-LD?

[00:10:23] Hong: Yeah. JSON-LD. linked data.

(The) Actor has all the stuff you expect to find on a social account name, bio URL to the profile page, profile picture, head image and more. And there are five main types of actors: application, group, organization, person and service.

And you know how sometimes on Mastodon you will see an account marked as a bot?

[00:10:58] Jeremy: A bot?

[00:10:59] Hong: Yeah. Bot and that's what an actor of type service looks like. And the ActivityPub spec actually let you create other types beyond these five. But I haven't seen anyone actually do that yet.

JSON-LD

[00:11:15] Jeremy: And you mentioned that these are all JSON objects. but the LD part, the linked data part, I'm not familiar with. So what different about the linked data part of the JSON?

[00:11:31] Hong: So JSON-LD is the special way of writing RDF. Which was originally used in the semantic web. Usually RDF uses (a) format (that) is called triples.

[00:11:48] Jeremy: Triples?

[00:11:49] Hong: Yeah, subject and predicate and object.

[00:11:55] Jeremy: Subject, predicate, object. Can you give an example of what those three would be?

[00:12:00] Hong: For example, is a person, it's a triple. John is a subject and is a predicate

[00:12:11] Jeremy: is, is the predicate.

[00:12:12] Hong: Okay. And person is a object. That's great for showing how things are connected, but it is pretty different from how we usually handle data in REST for APIs and stuff. Like normally we say a personal object has property like name, DOB, bio, and so on. And a bunch of subject predicated object triples that's where JSON-LD comes in -- is designed to look more like the JSON we are used to working with, while still being able to represent RDF Graphs.

RDF graph are ontology. It's a way to represent factual data, but is, quite different from, how we represent data in relational database. And it's a bunch of triples each subject and objects are nodes and predicates connect these nodes.

Semantic Web

[00:13:30] Jeremy: You mentioned the Semantic web, what does that mean? What is the semantic web?

[00:13:35] Hong: It's a way to represent web in the structural way, is machine readable so that you can, scan the data in the web, using scrapers or crawlers.

[00:13:52] Jeremy: Scrapers -- or what was the second one? Crawling.

[00:13:59] Hong: Yeah.

Then you can have graph data of web and you can, query information about things from the data.

[00:14:14] Jeremy: So is the web as it exists now, is that the Semantic web or is it something different?

[00:14:24] Hong: I think it is partially semantic web, you have several metadata in Your HTML. For example, there are several specification for semantic web, like, OpenGraph metadata.

[00:14:32] Jeremy: Cause when I think about OpenGraph, I think about the metadata on a webpage that, that tells other applications or websites that if you link to this page: show this image or show this title and description. You're saying that specifically you consider part of the semantic web?

[00:15:05] Hong: That's, semantic web.

To make your website semantic web. Your website should be able to, provide structural data. And other people can make Scrapers to scan, structural data from your website.

There are a bunch of attributes and text for HTML to represent metadata. For example you have relation attribute rel so if you have a link with rel=me to your another social profile. Then other people can tell two web pages represent the same person.

[00:16:10] Jeremy: Oh, I see. So you could have more than one website. Maybe one is your blog and maybe one is your favorite birds or something like that. But you could put a rel tag with information about you as a person so that someone who scrapes both websites could look at that tag and see that both of these websites are by, Hong, by this person.

JSON-LD is difficult to implement and not used as intended

[00:16:43] Hong: Yeah. I think JSON-LD is, designed for semantic web, but in reality, ActivityPub implementations, most of them are, not aware of semantic web.

[00:17:01] Jeremy: The choice of JSON Linked Data, the JSON-LD, by the people who made the specification -- They had this idea that things that implemented ActivityPub would be a part of this semantic web, but the actual implementation of a Mastodon or a Pixelfed, they use JSON-LD because it's part of the specification, but the way they use it, it ends up not really being a part of this semantic web.

[00:17:34] Hong: Yeah, that's exactly..

[00:17:37] Jeremy: You've mentioned that implementing it is difficult. What makes implementing JSON LD particularly hard?

[00:17:48] Hong: The JSON-LD is quite complex. Which is why a lot of programming language don't even have JSON-LD implementations and it's pretty slow compared to just working with the regular JSON.

So, what happens is a lot of ActivityPub implementations just treat JSON-LD like (it) is regular JSON without using a proper JSON-LD processor. You can do that, but it creates a source of headache. In JSON-LD there are weird equivalences like if a property is missing or if it's an empty array, that means the same thing.

Or if a property has one value versus an array with just that one value in it, same thing.

So when you are writing code to parse JSON-LD, you've got to keep checking if something's an array how long it is and all that is super easy to mess up. It's not just reading JSON-LD that's tricky. Creating it is just as bad. Like you might forget to include the right context metadata for a vocabulary and end up with a JSON-LD document that's either invalid or means something totally different from what you wanted.

Even the big ActivityPub implementations mess this up pretty often. With Fedify we've got a JSON-LD processor built in and we keep running into issues where major ActivityPub implementations create invalidate JSON-LD. We've had to create workaround for all of them, but it's not pretty and causes kind of a mess.

[00:19:52] Jeremy: Even though there is a specification for JSON-LD, it sounds like the implementers don't necessarily follow it. So you are kind of parsing JSON-LD, but not really. You're parsing something that. Looks like JSON-LD, but isn't quite it.

[00:20:12] Hong: Yes, that's right.

[00:20:14] Jeremy: And is that true in the, the biggest implementations, Mastodon, for example, are there things that it sends in its activities that aren't valid JSON-LD?

[00:20:26] Hong: Those implementations that had bad JSON-LD tends to fix them soon as a possible. But regressions are so often made. Yeah.

[00:20:45] Jeremy: Even within Mastodon, which is probably one of the largest implementers of ActivityPub, there are cases where it's not valid, JSON-LD and somebody fixes it. But then later on there are other messages or other activities that were valid, but aren't valid anymore. And so it's this, it's this back and forth of fixing them and causing new issues it sounds ...

[00:21:15] Hong: Yeah. Yeah. Right.

[00:21:17] Jeremy: Yeah. That sounds very difficult to deal with.

How instances communicate (Inbox)

[00:21:20] Jeremy: We've been talking about the messages themselves are this special format of JSON that's very particular. but how do these instances communicate with one another?

[00:21:32] Hong: Most of time, it all starts with a follow. Like when John follows Alice, then Alice adds both John and John's inbox URI to her followers list, and after John follows Alice, Whenever Alice posts something new that activities get sent to John's inbox behind the scenes.

This is just one HTTP post request. Even though ActivityPub is built on HTTP. It doesn't really care about the HTTP response beyond did it work or not. If you want to reply to an activity, you need to figure out the standard inbox, URI and send or reply activity there.

[00:22:27] Jeremy: If we define all the terms, there's the actor, which is the person, each actor can send different activities. those activities are in the form of a JSON linked data.

[00:22:40] Hong: Yeah.

[00:22:42] Jeremy: And everybody has an inbox. And an inbox is an HTTP URL that people post to.

[00:22:50] Hong: Right.

[00:22:52] Jeremy: And so when you think about that, you had mentioned that if you have a list of followers, let's say you have a hundred followers, would that mean that you have the URLs to all hundred of those follower's inboxes and that you would send one HTTP post to each inbox every time you had a new message?

[00:23:16] Hong: Pretty much all ActivityPub implementations have, a thing called shared inbox, it's exactly what it sounds like. One inbox that all actors on a server share. Private stuff like DMs don't go there (it) is just for public posts and thoughts.

[00:23:36] Jeremy: I think we haven't really talked about the fact that, when you have multiple users, usually they're on a server, right? That somebody chooses. So you could have tens of thousands, I don't know how many people can fit on the same server. But, rather than, you having to post to each user individually, you can post to the shared inbox on this server.

So let's say, of your 100 followers, 50 them are on the same server, and you have a new post, you only need to post to the shared inbox once.

[00:24:16] Hong: Yes, that's right.

[00:24:18] Jeremy: And in that message you would I assume have links to each of the profiles or actors that you wanted to send that message to.

[00:24:30] Hong: Yeah.

Scaling challenges

[00:24:31] Jeremy: Something that I've seen in the past is there are people who have challenges with scaling. Their Mastodon instance or their implementations of ActivityPub. As the, the number of followers grow, I've seen a post about, ghost one of the companies you work with mentioning that they've had challenges there.

What are the challenges there and, and how do you think those can be resolved?

[00:25:04] Hong: To put this in context, when Ghost mentioned the scaling, they were not using Message Queue yet. I'm pretty sure using Message Queue would help a lot of their scaling problems.

That said it is definitely true that a lot of activity post software has trouble with scaling right now.

I think part of the problem is that everyone's using this purely event driven approach to sending activities around.

One of the big issues is that when their delivery fails it's the sender who has to retry and not the receiver. Plus there's all this overhead because the sender has to authenticate itself with HTTP signatures every time. Actually the ActivityPub spec suggests using polling too so I'd love to see more ActivityPub software try using both approaches together.

[00:26:16] Jeremy: You mean the followers would poll who they're following instead of the person posting the messages having to send their posts to everyone's inboxes.

[00:26:29] Hong: Yeah.

[00:26:29] Jeremy: I see. So that's a part of the ActivityPubs specification, but not implemented in a lot of ActivityPub implementations,

And so it sounds like maybe that puts a lot of burden on the servers that have people with a lot of followers because they have to post to every single, follower server and maybe the server is slow or they can't reach it. And like you said, they have to just keep trying and trying. There could be a lot of challenges there.

[00:27:09] Hong: Right.

Account migration

[00:27:10] Jeremy: We've talked a little bit about the fact that each person each actor is hosted by a server and those servers can host multiple actors. But if you want to move to another server either because your server is shutting down or you just would like to change servers, what are some of the challenges there?

[00:27:38] Hong: ActivityPub and Fediverse already have the specification for an account move. It's called FEP-7628 Move Actor.

First thing you need to do when moving an account is prove that both the old and new accounts belong to the same person. You do this by adding the all accounts, add the URI to the new account's AlsoKnownAs property. And then the old account contacts all the other instances it's moving by sending out a move activity.

When a server gets this move activity, it checks that both accounts really do belong to the same parts, and then it makes all the accounts that, uh, were following the, all the accounts start to, following the new one instead.

that's how the new account gets to keep all the, all the accounts follow us. pretty much all, all the major activity post software has this feature built in, for example, Mastodon Misskey you name it.

[00:29:04] Jeremy: This is very similar to the post where when you execute a move, the server that originally hosted that actor, they need to somehow tell every single other server that was following that account that you've moved. And so if there's any issues with communicating with one of those servers, or you miss one, then it just won't recognize that you've moved. You have to make sure that you talk to every single server.

[00:29:36] Hong: That's right.

[00:29:38] Jeremy: I could see how that could be a difficult problem sometimes if you have a lot of followers.

[00:29:45] Hong: Yeah.

Fedify

[00:29:46] Jeremy: You've created a TypeScript library Fedify for building ActivityPub powered applications. What was the reason you decided to create Fedify?

[00:29:58] Hong: Fedify is (a) ActivityPub servers framework I built for TypeScript. It basically takes away a lot of headaches you'd get trying to implement (an) ActivityPub server from scratch. The whole thing started because I wanted to build hollo -- A single user microblogging platform I built. But when I tried, to implement ActivityPub from (the) ground up it was kind of a nightmare.

Imagine trying to write a CGI program in Perl or C back in the late nineties, where you are manually printing, HTTP headers and HTML as bias. there just wasn't any good abstraction layer to go with. There were already some libraries and frameworks for ActivityPub out there but none of them really hit the sweet spot I was looking for. They were either too high level and rigid. Like you could only build a mastodon clone or they barely did anything at all. Or they were written in languages I didn't really know.

Ghost and Fedify

[00:31:24] Jeremy: I saw that you are doing some work with, ghost. How is Ghost using fedify?

[00:31:30] Hong: Ghost is an open source publishing platform. They have put some money into fedify which is why I get to work on it full time now. Their ActivityPub feature is still in private beta but it should be available to everyone pretty soon.

We work together to improve fedify. Basically they are a user of fedify. They report bugs request new features to fedify then I fix them or implement them, first.

[00:32:16] Jeremy: Ghost to my understanding is a blogging platform and a a newsletter platform. So what does it mean for them to implement ActivityPub? What would somebody using Mastodon, for example, get when they follow somebody using Ghost?

[00:32:38] Hong: Ghost will have a fediverse handle for each blog. If you follow them in your mastodon or something (similar) then a new post is published. These post will show up (in) your timeline in Mastodon and you can like them or share them. Andin the dashboard of Ghost you can see who liked their posts or shared their posts and so on.

It is like how mastodon works but in Ghost.

[00:33:26] Jeremy: I see. So if you are writing a ghost blog and somebody follows your blog from Mastodon, sort of like we were talking about earlier, they can like your post, and on the blog itself you could show, oh, I have 200 likes. And those aren't necessarily people who were on your ghost website, they could be people that were liking your post from Mastodon.

[00:33:58] Hong: Yes.

Misskey / Forkey development in Asia

[00:34:00] Jeremy: Something you mentioned at the beginning was there is a community of developers in Asia making forks of I believe of Mastodon, right?

[00:34:13] Hong: Yeah.

[00:34:14] Jeremy: Do you have experience working in that development community? What's different about it compared to the more Western centric community?

[00:34:24] Hong: They are very similar in most ways. The key difference is language of course. They communicate in Japanese primarily.

They also accept pull requests with English. But there are tons of comments in Japanese in their code. So you need to translate them into English or your first language to understand what code does.

So I think that makes a barrier for Western developers. In fact, many Western developers that contribute to misskey or forkey are able to speak a little Japanese. And many of the developers of misskey and forkey are kind of otaku.

[00:35:31] Jeremy: Oh otaku okay.

[00:35:33] Hong: It's not a big deal, but you can see (the) difference in a glance.

[00:35:41] Jeremy: Yeah. You mentioned one of the things that I believe misskey implemented was the emoji reactions and maybe one of the reasons they wanted that was so that they could react to each other's posts with you know anime pictures or things like that.

[00:35:58] Hong: Yeah, that's right.

[00:36:01] Jeremy: You've mentioned misskey and forkey. So is misskey a fork of Mastodon and then is forkey a fork of misskey?

[00:36:10] Hong: No, misskey is not a fork of mastodon. (It) is built from scratch. It's its own implementation. And forkeys are forks of Mastodon.

[00:36:22] Jeremy: Oh, I see. But both of those are primarily built by Japanese developers.

[00:36:30] Hong: Yes. Whereas Mastodon (is) written in Ruby. Ruby on Rails. But misskey is built in TypeScript.

[00:36:40] Jeremy: And because of ActivityPub -- they all implement it. So you can communicate with people between mastodon and misskey because they all understand the same activities.

[00:36:56] Hong: Yes.

Backwards compatible activity implementations

[00:36:57] Jeremy: You did mention since there are extensions like misskey has the emoji reactions. When there is an activity that an implementation doesn't support what happens between the two servers? Do you send it to a server's inbox and then the server just doesn't do anything with it?

[00:37:16] Hong: Some implementers consider backwards compatibility. So they design (it) to work with other implementations that don't support that activity.

For example misskey uses like activity for emoji reaction. So if you put an emoji to a Mastodon post then in Mastodon you get one like. So it's intended behavior by misskey developers that they fall back to normal likes.

But sometimes ActivityPub implementers introduce entirely new activity types. For example Pleroma introduced the emoji react. And if you put emoji reaction to Mastodon post from Pleroma in Mastodon you have nothing to see because Mastodon just ignores them.

[00:38:37] Jeremy: If I understand correctly, both misskey and Pleroma are independent implementations of ActivityPub, but with misskey, they can tell when or their message is backwards compatible where it's if you don't understand the emoji reaction, it'll be embedded inside of a like message.

Whereas with Pleroma they send an activity that Mastodon can't understand at all. So it just doesn't do anything.

[00:39:11] Hong: Yes, right. But, Misskey also understands (the) emoji react activity. So between pleroma and misskey they have exchanged emoji reactions with no problem.

[00:39:27] Jeremy: Oh, I see. So they, they both understand that activity. They both implement it the same way, but then when misskey communicates with Mastodon or with an instance that it knows doesn't understand it, it sends something different.

[00:39:45] Hong: Yeah, that's right.

[00:39:47] Jeremy: The servers -- can they query one another to know which activities they support?

[00:39:53] Hong: Usually ActivityPub implementations also implement NodeInfo specification.

It's like a user agent-like thing in Fediverse. Implementations tell the other instance (if it) is Mastodon or something else. You can query the type of server.

[00:40:20] Jeremy: Okay, so within ActivityPub are each of the servers -- is the term node is that the word they use for each server?

[00:40:31] Hong: Yes. Right.

[00:40:32] Jeremy: You have the nodes, which can have any number of actors and the servers send activities to one another, to each other's inboxes. And so those are the way they all communicate.

[00:40:49] Hong: Yeah.

Building an ActivityPub implementation

[00:40:50] Jeremy: You've implemented ActivityPub with Fedify because you found like there weren't good enough implementations or resources already. Did you implement it based off of the specification or did you look at existing implementations while you were building your implementation?

[00:41:12] Hong: To be honest, instead of just, diving into the spec. I usually start by looking at actually ActivityPub software code first. The ActivityPub spec is so vague that you can't really build something just from reading it. So when we talk about ActivityPub, we are actually talking about a whole bunch of other technical standards too, WebFinger, HTTP signatures and more. So you need to understand all of these as well.

[00:41:47] Jeremy: With the specification alone, you were saying it's too vague and so what ends up being -- I'm not sure if it's right to call it a spec, but looking at the implementations that people have already made that collectively becomes the spec because trying to follow the spec just by itself is maybe too difficult.

[00:42:12] Hong: Yes.

[00:42:14] Jeremy: Maybe that brings up the issues you were talking about before where you have specifications like JSON-LD where they're so complicated that even the biggest implementations aren't quite following it exactly.

[00:42:28] Hong: Yeah.

[00:42:29] Jeremy: If somebody wanted to, to get started with understanding a little bit more about ActivityPub or building something with it where would you recommend they start?

[00:42:44] Hong: I recommend to dig into a lot of code from actual implementations.

First, Mastodon, Misskey, Akkoma and so on. There are are some really cool tools that have been so helpful. For example, ActivityPub Academy is this awesome mastodon server for debugging ActivityPub. It makes it super easy to create a temporary account and see what activities are going back and forth.

There is also BrowserPub. BrowserPub is this neat tool for looking up and browsing ActivityPub objects. It's really handy when you want to see how different ActivityPub software handles various features. I also recommend to use Fedify. I've got to mention the Fedify CLI, which comes with some really useful tools.

[00:43:46] Jeremy: So if someone uses Fedify they're writing an application in TypeScript, then it sounds like they have to know the high level concepts. They have to know what are the different activities, what is inside of an actor. But the actual implementation of how do I create and parse JSON linked data, those kinds of things are taken care of by the library.

[00:44:13] Hong: Yes, right.

[00:44:16] Jeremy: So in some ways it seems like it might be good to, like you were saying, use the tools you mentioned to create a test Mastodon account, look at the messages being sent back and forth, and then when you're trying to implement it, starting with something like Fedify might be good because then you can really just focus on the concepts and not worry so much about the, the implementation details.

[00:44:43] Hong: Yes, that's right.

[00:44:45] Jeremy: Is there anything else you. Wanted to mention or thought we should have talked about?

[00:44:52] Hong: Mm. I want to, talk about, a lot of stuff about ActivityPub but it's difficult to speak in English for me, so, it's a shame to talk about it very little.

[00:45:15] Jeremy: We need everybody to learn Korean right?

[00:45:23] Hong: Yes, please. (laughs)

[00:45:23] Jeremy: Yeah. Well, I wanna thank you for taking the time. I know it must have been really challenging to give an interview in, you know, a language that's not your native one. So thank you for spending the time to talk with me.

[00:45:38] Hong: Thank you for having me.

Prefetcher on Building PinkSea on the AT Protocol

Tue, 25 Feb 2025 07:10:00 -0000

Kacper "prefetcher" Staroń created the PinkSea oekaki BBS on top of the AT Protocol. He also made the online multiplayer game MicroWorks with Noam "noam 2000" Rubin. He's currently studying Computer Science at the Lublin University of Technology.

We discuss the appeal of oekaki BBSs, why and how PinkSea was created, web design of the early 2000s, flash animations, and building an application on top of the AT Protocol.

Prefetcher

Bluesky
Github
Personal site
Microworks (Free multiplayer game)

PinkSea and Harbor

Early web design

Web Design Museum
Pixel Art in Web Design
Kaliber10000
Eboy
Assembler
2advanced
epuls.pl (Polish social networking site)
Wipeout 3 aesthetic
Restorativland (Geocities archive)

Flash sites and animations

Geocities style web hosts

AT Protocol / Bluesky

PDS
Relay
AppViews
PLC directory
Decentralized Identifier
lexicon
Jetstream
XRPC
ATProto scraping (List of custom PDS and did:web)

Tools to view PDS data

Posters mentioned

vertigris (Artist that promoted PinkSea)
Mary (AT Protocol enthusiast)
Brian Newbold (Bluesky employee)

Oekaki drawing applets

Group drawing canvas

Transcript

You can help correct transcripts on GitHub.

Intro

[00:00:00] Jeremy: Today I am talking to Kacper Staroń. He created an oekaki BBS called PinkSea built on top of the AT protocol, and he's currently studying computer science at the Lublin University of Technology.

We are gonna discuss the appeal of oekaki BBS, the web design of the early 2000s, flash animations, and building an application on top of the AT protocol.

Kacper, thanks for talking with me today.

[00:00:16] Prefetcher: Hello. Thank you for having me on. I'm Kacper Staroń also probably you know me as Prefetcher online. And as Jeremy's mentioned, PinkSea is an oekaki drawing bulletin board. You log in with your Bluesky account and you can draw and post images. It's styled like a mid to late 2000s website to keep it in the spirit.

What's an oekaki BBS?

[00:00:43] Jeremy: For someone who isn't familiar with oekaki BBSs what is different about them as opposed to say, a photo sharing website?

[00:00:53] Prefetcher: The difference is that a photo sharing website you have the image already premade be it a photo or a drawing made in a separate application. And you basically log in and you upload that image.

For example on Instagram or pixiv for artists even Flickr. But in the case of an oekaki BBS the thing that sets it apart is that oekaki BBSes already have the drawing tools built in. You cannot upload an already pre-made image with there being some caveats. Some different oekaki boards allow you to upload your already pre-made work.

But Pinksea restricts you to a tool called Tegaki. Tegaki being a drawing applet that was built for one of the other BBSes and all of the drawing tools are inside of it. So you draw from within PinkSea and you upload it to the atmosphere. Every image that's on PinkSea is basically drawn right on it by the artists.

No one can technically upload any images from elsewhere.

How PinkSea got started and grew

[00:01:56] Jeremy: You released this to the world. How did people find it and how many people are using it?

[00:02:02] Prefetcher: I'll actually begin with how I've made it 'cause it kind of ties into how PinkSea got semi-popular. One day I was just browsing through Bluesky somewhere in the late 2024s. I was really interested in the AT Protocol and while browsing, one of the artists that I follow vertigris posted a post basically saying they'd really want to see something a drawing canvas like Drawpile or Aggie on AT Protocol or something like an oekaki board. And considering that I was really looking forward to make something on the AT Protocol. I'm like, that sounds fun.

I used to be a member of some oekaki boards. I don't draw well but it's an activity that I was thinking this sounds like a fun thing to do. I'm absolutely down for it. From like, the initial idea to what I'd say was the first time I was proud to let someone else use it. I think it was like two weeks.

I was posting progress on Bluesky and people seemed eager to use it. That kept me motivated. And yeah. Right as I approached the finish I posted about it as a response to vertigris' posts and people seemed to like it.

I sent the early version to a bunch of artists. I basically just made a post calling for them. Got really positive feedback, things to fix, and I released it. And thanks to vertigris the post went semi-viral.

The launch I got a lot of people which I would also tie to the fact that it was right after one of the user waves that came to Bluesky from other platforms. The website also seemed really popular in Japan. I remember going to sleep, waking up the next day, and I saw like a Japanese post about PinkSea and it had 2000 reposts and 3000 likes and I was just unable to believe it.

Within I think the first week we got like 1000 posts overall which to me is just insane. For a week straight I just kept looking at my phone and clicking, refresh, refresh, refresh, just seeing the new posts flow in. There was a bunch of like really insane talented artists just posting their works.

And I just could not believe it. PinkSea got I'd say fairly popular as an alternative AppView. People seem to really want oekaki boards back and I saw people going, oh look, it's like one of those 2000s oekaki boards! Oh, that's so cool! I haven't seen them in forever!

The art stands out because it's human made

[00:04:58] Prefetcher: And it made me so happy every single time seeing it. It's been since November, like four months, give or take. And today alone we got five posts. That doesn't sound seem like a lot but given that every single post is hand drawn it's still insane.

People go on there and spend their time to produce their own original artworks.

[00:05:26] Jeremy: This is especially relevant now when you have so much image generation stuff and they're making images that look polished but you're kind of like well... did you draw it?

[00:05:39] Prefetcher: Yeah.

[00:05:40] Jeremy: And when you see people draw with these oekaki boards using the tools that are there I think there's something very human and very nostalgic about oh... This came from you.

[00:05:53] Prefetcher: Honestly, yeah. To me seeing even beginner artists 'cause PinkSea has a lot of really, really talented and popular people (and) also beginner artists that do it as a hobby. Ones that haven't been drawing for a long time.

And no matter what you look at you just get like that homely feeling that, oh, that's someone that just spent time. That's someone that just wanted to draw for fun. And at least to me, with generative AI like images it really lacks that human aspect to it.

You generate an image, you go, oh, that's cool. And it just fades away. But in this case you see people that spent their time drawing it spent their own personal time. And no matter if it's a masterpiece or not it's still incredibly nice to see people just do it for fun.

[00:06:54] Jeremy: Yeah. I think whether it's drawing or writing or anything now more than ever people wanna see something that you made yourself right? They wanna know that a human did this.

[00:07:09] Prefetcher: Yeah. absolutely.

[00:07:11] Jeremy: So it sounds like, in terms of getting the initial users and the ones that are there now, it really all came out of a single Bluesky posts that an existing artist (vertigris) noticed and boosted. And like you said, you were lucky enough to go viral and that carried you all the way to now and then it just keeps going from there,

[00:07:36] Prefetcher: Basically if not for vertigris PinkSea (would) just not exist because I honestly did not think about it. My initial idea on making something on ATProto and maybe in the future I'll do something like that would be a platform like StumbleUpon -- Something that would just allow you to go on a website, press a button, and it gets uploaded to your repo and your friends would be able to see oh -- you visited that website and there would be an AppView that would just recommend you sites based on those categories.

I really liked that idea and I was dead set on making it but then like I noticed that post (from vertigris) and I'm like, no, that's better. I really wanna make that. And yeah. So right here I want to give a massive shout out to vertigris 'cause they've been incredibly nice to me. They've even contributed the German translation of PinkSea which was just insane to me.

And yeah, massive shout out to every single other artist that, Reposted it, liked it, used it because, it's all just snowballed from there and even recently I've had another wave of new users from the PinkSea account. So there are periods where it goes up and it like goes chill -- and then popular again.

Old internet and flash

[00:08:59] Jeremy: Yeah. And so something that you mentioned is that some people who came across it they mentioned how it was nostalgic or it looked like the old oekaki BBSs from the early internet. And I noticed that that was something that you posted on your own website that you have an interest in that specifically.

I wonder what about that part of the internet interests you?

[00:09:26] Prefetcher: That is a really good question. Like, to me, even before PinkSea my interests lie in the early internet. I run on Twitter and also on Bluesky now an account called My Flash Archive, which was an archive of very random, like flash animations. And I still do that just not as much anymore 'cause I have a lot of other things to do.

I used to on Google just type in Flash and look through the oldest archived random folders just having flash videos. And I would just go over them save all of that or go on like the dagobah or Z0r or swfchan. 'cause the early internet to me, it was really like more explorative.

'cause like now you have, people just concentrated in those big platforms like Twitter, Instagram, Facebook, whatever.

And back then at least to me you had more websites that you would just go on, you would find cool stuff. And the designs were like sometimes very minimal, aesthetically pleasing.

I'd named here one of my favorite sites, Kaliber10000 which had just fantastic web design. Like, I, I also spend a lot of time on like the web design museum just like looking at old web design and just in awe.

My flash archive on Twitter at least got very popular. I kind of abandoned that account, but I think it was sitting at 12,000 followers if not more? And showed that people also yearn for that early internet vibe. And to me it feels really warm. Really different from the internet nowadays.

Even with the death of flash you don't really have interactive experiences like it anymore. 'cause flash was supposed to be replaced by HTML5 and JavaScript and whatever but you don't really make interactive experiences that just come packaged in a single file like flash.

You need a website and everything. In flash, it just had a single file. It could be shared on multiple sites and just experienced. That kind of propelled my interest. Plus I, I dunno, I just really like the old internet design aesthetics it really warms me (and really close..?)

Flash loops

[00:12:01] Jeremy: The flash one specifically. Were they animations or games or was there a specific type of a flash project that spoke to you?

[00:12:15] Prefetcher: Something we call loops. Basically, it's sometimes animations. 'cause, surprisingly while I like flash games they weren't my main collection. What spoke to me more were loops. Basically someone would take a song, find a gif they liked, and they would just pair it together. Something like YTMND did.

At least from loops I found some of my favorite musical artists, some of my favorite songs, a lot of interesting series, be it anime or TV or whatever. And you basically saw people make stuff about their favorite series and they would just share it online. I would go over those.

For example, a good website as an example is z0r.de, which is surprisingly still active and updated to this day. And you would see people making loops about members of that community or whatever they like. And you would for example see like 10 posts about the same thing.

So you would know someone decided to make 10 loops and just upload them at once. And yeah, to me, loops basically were like, I mean, they weren't always the highest quality or the most unique thing, but you would see someone liked something enough that they decided to make something about it.

And I always found that really cool. I would late at night just browse for loops and I'm like, oh, oh, this series, I remember it. I liked it (laughs)!

But of course flash games as well. I mean, I used to play a lot when I was younger, but specifically loops, even animations and especially like when someone took like their time to animate something like really in depth.

My favorite example is, the music video to a song by the band Juicy Panic called Otari. Someone liked that song enough that they made an entire flash animated music video, which was basically vectorized art of various series like Azumanga Daioh or Neon Genesis Evangelion as well, and other things. And it was so cool, at least to me, like a lot of these loops just basically have an intense, like immense feeling on me (laughs). I just really liked collecting them.

[00:14:38] Jeremy: And in that last example, it sounded more like it was a complete music video, not just a brief loop?

[00:14:45] Prefetcher: No, it was like a five minute long music video that someone else made.

[00:14:48] Jeremy: Five. Oh my gosh.

[00:14:49] Prefetcher: Yeah. You would really see people's creativity shine through on just making those weird things that not a lot of people have seen, but you look at it and it's like, wow.

It's different than YouTube (Sharable single file, vectorized)

[00:15:01] Jeremy: It's interesting because you can technically do and see a lot of these things on, say, YouTube today, but I think it does feel a little different for some reason.

[00:15:16] Prefetcher: It really is. Of course I'm not denying on YouTube you see a lot of creative things and whatever. But first and foremost, the fact that Flash is scalable. You don't lose the quality. So be able to open, I don't know, any of the IOSYS flash music videos for like their Touhou songs and the thing would just scale and you would see like in 4K and it's like, wow.

And yeah, the fact that on YouTube you have like a central place where you just like put something and it just stays there. Of course not counting reuploads, but with Flash you just had like this one animation file that you would just be able to share everywhere and I don't know, like the aspect of sharing, just like having those massive collections, you would see this flash right here on this website and on that website and also on this website.

And also seeing people's personal collections of flash videos and jrandomly online and you would also see this file and this file that you haven't seen it -- it really gives it, it's like explorative to me and that's what I like. You put in the effort to like go over all those websites and you just like find new and new cool stuff.

[00:16:32] Jeremy: Yeah, that's a good point too that I hadn't thought about. You can open these files and you have basically the primitives of how it was made and since, like you said, it's vector based, there's no, oh, can you please upload it in 1080 p or 4K? You can make it as big as you want.

[00:16:53] Prefetcher: Yeah.

Web design differences, pixel art, non-responsive

[00:16:55] Jeremy: I think web design as well it was very distinct. Maybe because the tools just weren't there, so a lot of people were building things more from scratch rather than pulling a template or using a framework. A lot of people were just making the design theirs I think rather than putting words on a page and filling into some template.

[00:17:21] Prefetcher: Honestly, you raise a good point here that I did not think much about. 'cause like nowadays we have all of this tooling to make web design easier and you have design languages and whatnot. And you see people make really, in my opinion, still pretty websites, very usable websites on top of that.

But all of them have like the same vibes to them. All of them have like a unified design language and all of them look very similar. And you kind of lose that creativity that some people had. Of course, you still find pretty websites that were made from scratch. But you don't really get the same vibes that you did get like back then.

Like my favorite, for example, trend that used to be back on like the old internet is pixel art in web design. For example, Kaliber10000, or going off the top of my head, you had the Eboy or all the sites and then Poland, for example, ... (polish website) those websites use minimal graphics, like pixel graphics and everything to build really interesting looking websites.

They had their own very massive charm to them that, I don't know, I don't see a lot in more modern internet. And it's also because back then you were limited by screen size, so you didn't have to worry about someone being on a Mac with high DPI or on a 32x9 monitor like I am right now.

And just having to scale it up. So you would see people go more for images, like UI elements, images instead of just like building everything from scratch and CSS and whatnot. So, yeah, internet design had to accommodate the change. So we couldn't stay how it was forever 'cause technology changed. Design language has changed, but to me it's really lost its charm. Every single website was different, specific, the web design had like this weird form, at least on websites where it was like. I like to call it futuristic minimalism. They looked very modern and also very minimal and sort of dated.

And I dunno, I just really like it. I absolutely recommend checking, on the web design museum fantastic website. I love them and the pixel art in web design sub page. Like those websites to me they just look fantastic.

[00:19:52] Jeremy: Yeah, and that's a good point you brought up about the screen sizes where now you have to make sure your website looks good on a phone, on a tablet, on any number of monitor sizes. Back then in the late 90s, early 2000s, I think most people were looking at these websites on their 4x3 small CRT monitors.

[00:20:20] Prefetcher: My favorite this website is best viewed with an 800 by 600 monitor. It's like ... what?

[00:20:28] Jeremy: Exactly. Even if you open your personal site now the design is very reminiscent of those times and it looks really cool but at the same time on a lot of monitors it's a small box in the middle of the monitor, so it's like --

[00:20:49] Prefetcher: I saw that issue, 'cause I was making it on a 1080p monitor and now I have a 32x9 monitor and it does not scale. I've been working on reworking that website, but, also on the topic of my website, I, I wanna shout out a website from the 2000s that still exists today.

'cause, my website was really inspired by a website called Assembler. And Assembler, from what I could gather, was like a net art or like internet design collective. And the website still works to this day. You still had like, all of their projects, including the website that my website was based off of.

[00:21:28] Jeremy: Yeah, I mean there, there definitely was an aesthetic to that time. And it's probably, like you said, it's probably people seeing someone else's site in this case, what, what did you call it? Assem? Assembler?

[00:21:42] Prefetcher: Assembler.

[00:21:42] Jeremy: Yeah. You see someone else's website and then maybe you try to copy some of the design language or you look at the HTML and the CSS and I mean, really at the time, these websites weren't being made with a ton of JavaScript. There weren't the minifiers, so you really could view source and just pull whatever you wanted from there.

[00:22:06] Prefetcher: We also had those design studios, design agencies, notably 2advanced which check in now, their website still works, and their website is still in the same aesthetic as it was those 20 so years ago just dictating this futuristic design style that people really like.

'cause a lot of people nowadays also really like this old futurism minimalism for example a lot of people still love the Wipeout 3 aesthetic that was designed by one of my favorite studios overall the designers republic. And yeah, it's just hard for me to explain, but it feels so soulful in a way.

[00:22:53] Jeremy: I think there are some trade offs. There's what we were talking about earlier with the flexibility of screen size. But there used to be with a lot of websites that used Flash, there used to be these very elaborate intros where the site is loading and there's these really neat animations.

But at the same time, it's sort of like, well, to actually get to the content, it's a bit much, but, everything is a trade off.

[00:23:25] Prefetcher: People had flash at their disposal and they just wanted to make, I have the tooling, I'm going to use all of the tooling and all of it.

[00:23:33] Jeremy: Yeah. Yeah. but yeah, I definitely get what you're saying where when I went to make my own website I made it very utilitarian and in some ways boring, right? I think we do kind of miss some of what we used to have.

[00:23:54] Prefetcher: I mean, in my opinion, utilitarian websites are just as fine. Like in some cases you don't really need a lot of flashy things and a lot of very modern very CPU intensive or whatever animations. Sometimes it is better to go on a website and just like, see, oh, there's the play button and that's it.

[00:24:17] Jeremy: Yeah. Well definitely the animations and the intro and all that stuff. I guess more in terms of the aesthetics or the designs. It's tricky because there's definitely people making very cool things now things that weren't even possible back then.

But it does feel like maybe the default is I'll pick this existing style sheet or this existing framework and just go with that.

[00:24:47] Prefetcher: A lot of modern websites just go for similar aesthetics, similar designs, which they aren't bad, but they are also very just bland. They, they are futuristic, they are very well designed. But when you see the same website. The same -- five websites have the same feel.

And this is especially, at least in my opinion, visible with websites built on top of NextJS or other frameworks. And it just feels corporate kind of dead. Like someone just makes a website that they want to sell something to you and not for fun.

[00:25:26] Jeremy: With landing pages especially it's like, wow, this looks the same as every other site, but I guess it must work.

[00:25:38] Prefetcher: It works. And it really cuts down on development time. You don't need to think much about it. You just already have a lot of well-established design rules that you just follow and you get a cohesive and responsive design system.

Designing the PinkSea look and feel

[00:25:56] Jeremy: Let's talk about that in connection with PinkSea. What was your thinking when you designed how PinkSea would look and feel?

[00:26:06] Prefetcher: Honestly, at first I have to admit I looked at other websites. I looked at Bluesky first and foremost. I looked at, front page. I looked at Smoke Signal, and I thought that I might also build something that's modern and sleek and I sketched it out in an application and I showed it to some friends.

One of them suggested I go for more like a 2000 aesthetic. I'm like, yeah, okay. I like that. As the website was built, I just saw more and more of how much I feel this could sit with others. Especially with the fact that it's an oekaki page an oekaki BBS and as you scroll through oekaki has a very distinct style to it.

And as you scroll and you see all of those, pixel shaded, all those dithered images, non anti-aliased pens and whatnot. It feels really really cohesive somehow with the design aesthetic. But of course, PinkSea in itself is a modern website. Like if you were to go to my PinkSea repository. It's a modern website built up on top of Vue3, which talks via like XRPC API calls in real time and it's a single page app and whatever.

That's kind of the thing I merged the modern way of making sites with a very oldish design language. And I feel, in my opinion, it somehow just really works. And especially it sets PinkSea apart from the other websites. It gives it that really weird aesthetic.

You would go on it and you would not be like, oh, this is a modern site that connects with a modern protocol on top of a big decentralized network. This is just someone's weird BBS stuck in the 2000s that they forgot to shut down. (laughs)

[00:28:00] Jeremy: Yeah. And I think that's a good reminder too, that when people are intentional about design, the tools we have now are so much better than what we used to have. There's nothing stopping us from making websites that when people go to them they really feel like something's different.

I know I did not just land on Instagram.

[00:28:27] Prefetcher: Yeah. And making PinkSea taught me that it's really easy to fall into that full string of thought that every site has to look modern. Because I was like, oh yeah, this is a modern protocol, a modern everything, and it has to look the part.

It has to look interesting to people and everything. And after talking with a bunch of friends and other people and just going, huh, that's maybe like the 2000s isn't as bad as I thought. And yeah, the website especially it's design people seem to just really like it.

Me too. I, I just absolutely love how PinkSea turned out it is really a reminder that you don't need modernness in web design always. And people really appreciate quirky looking pages, so to say, quirky like interesting.

[00:29:23] Jeremy: I interviewed the, the creator of Neocities which is like kind of a modern version of GeoCities and yeah, that's really what one of the aspects that I think makes things so interesting to people from that era is, is that it really felt like you're creating your own thing, and not just everything looks the same. The term I think he used is homesteading. You're taking care of your place and it can match your sensibilities, your style, your likes, rather than having to, like you said, try to force everything to be this, this sort of base modern, look.

The old spirit of the internet is coming back

[00:30:08] Prefetcher: I mean Neocities and by extension also Nekoweb are websites that I often when I don't have much to do -- I like just going through them because you see a bunch of people just make their own places. And you see that even in 2025 when we have those big social media sites.

You have platforms where you can get a ton of followers. You can get a ton of attention and everything. People to some extent still want that aspect of self-expression. They want to be able to make something that's uniquely theirs and you see people just make just really amazing websites build insane things on those old Geocities-like platforms using nothing but a code editor.

You see them basically just wanting thing to express, oh, that's mine and no one else has it. So to say that's why. Yeah. I feel like to some extent the old school train of thought when it comes to the internet is slowly coming back. Especially with the advent of protocols like ATProto. And you'll experience more websites that just allow people to make their own homes on the internet.

Cause in my opinion, one of the biggest problems is that people do not really want to register on a lot of platforms. 'cause you already have this place where you get all of your followers, you have all of your connections, and then you want to move and then you'll lose all of your connections and everything.

But with something like ATProto, you can use the social graph of, for example, Bluesky. I want to add followers on PinkSea. So for example, you have an artist that has like 30,000 followers for example, I can just click import my following from Bluesky. And just like that they would already get all of the artists that they follow on Bluesky already added as followers on PinkSea.

And for example, someone else joins and they followed that big artist and they instantly followed them on PinkSea as well. I think that we are slowly coming back to the advent of people owning their place online.

PinkSea and ATProto (PDS)

[00:32:24] Jeremy: Yeah. So let's talk a little bit more about how PinkSea fits into ATProto. For people who aren't super familiar with ATProto, maybe you could talk about how it's split up. You've got the PDS, the relays, the AppView. What are those and how do those fit into what PinkSea is?

[00:32:48] Prefetcher: My favorite analogy, ATProto is a massive network, and at least me, when I saw the initial graph I was just very confused. I absolutely did not know what I'm looking at. But let's start with the base building block, something that ATProto wouldn't exist with.

And it's the PDS. Think of the PDS as like a filing cabinet. You have a bunch of folders in which you have files, so to say. So you have a filing cabinet with your ID, this is the DID part that sometimes shows up and scares people. It's what we call a decentralized identifier. Basically that identifier is not really tied to the PDS, it just exists somewhere. And the end goal is that every user controls their DID.

So for example, if your PDS shuts down, you can always move to somewhere else. Still keep like, for example, that you are prefetcher.miku.place. But in that filing cabinet the PDS going back to it you have your own little zone, your own cabinets, and that has your identifier, it's uniquely yours.

Every single application on the AT protocol creates data. They create data and they store the data in a structured format called a record. A record is basically just a bunch of data that explains what that thing is, be it a like, a post on Bluesky an oekaki on PinkSea and an upvote on front page, or even a pixel on place.blue.

And all of those records are organized into folders in your cabinet. And that folder is named with something we call a collection id. So for example, a like is, if I remember correctly, it's app.bsky.feed.like, so you see that it belongs to Bluesky. The app.bsky part. it's a feed thing, and the same way, PinkSea, for example, the oekaki and PinkSea uses com.shinolabs.pinksea.oekaki with com.shinolabs being the the collective that I use as a, pen name, so to say.

PinkSea being, well, PinkSea and oekaki just being the name. It's an oekaki. If you want to see that there are a lot of tools, for example, PDSls or atp.tools or ATProto browser, if you had to go into one of those and you would type in for example, prefetcher.miku.place, you would see all of your records, the things that, you've created on the AT protocol network.

Relay

[00:35:19] Prefetcher: So you have a PDS, you have your data, but for example, imagine you have a PDS that you made yourself, you hosted yourself. How will, for example, Bluesky know that you exist? 'cause it won't, it's just a server in the middle of nowhere.

That's where we have a relay. A relay is an application that listens to every single server. So every time you create something or you delete something, or for example, you edit a post, you delete an oekaki. You create a new, like -- Your PDS, your filing cabinet generates a record of that. It generates an event, something we call a commit. So, anytime you do something, your PDS goes, Hey, I did that thing. And relays function as big servers that a PDS can connect to. And it's a massive shout box. The PDS goes, Hey, I made this. Then the relay aggregates all of those PDSs into one and creates a massive stream of every single event that's going on the network at once. That's also where the name firehose comes from. 'cause the, the end result, the stream is like a firehose. It just shoots a lot of data directly at anyone who can connect to it. And the thing that makes AT Protocol open and able to be built on is that anyone can just go, I want to connect to jetstream1.west.bluesky.network.

They just make a connection to it and boom they just get everything that's happening. You can, for example, see that via firesky.tv. If you go to it, you would open it in your browser. Every single Bluesky post being made in real time right directly in your computer. So you have the PDSs that store data, you have the relay that aggregates every, like, builds a stream of every single event on the network.

AppViews

[00:37:26] Prefetcher: You just get records. You can't interact with it. You can see that someone made a new record with that name, but to a human, you won't really understand what a cid is or what property something else is. That's why you have what we call AppViews. An AppView, or in full an application view is an application that runs on the AT protocol network. It connects to the relay and it transforms the network into a state that it can be used by people.

That's why it's called an application view. 'cause it's a, a specialized view into the whole network. So, for example, PinkSea connects, and then it goes, hey, I want to listen on every single thing that's happening to com.shinolabs.pinksea.oekaki, and it sees all of those, new records coming in and PinkSea understands, oh, I can turn it into this, and then I can take this thing, store it in the database, and then someone can connect with a PinkSea front end.

And then it can like, transform those things, those records into something that the front end understands. And then the front end can just display, for example, the timeline, the same way Bluesky, for example -- Bluesky gets every single event, every single new file, new record coming in from the network. And it goes. Okay, so this will translate into one more like on this post. And this post is a reply to that post. So I should chain it together. Oh. And this is a new feed, so I should probably display it to the user if they ask for feeds. And it basically just gets a lot of those disjoint records and it makes sense of them all.

The end user has a different API to the Bluesky AppView. And then they can get a more specialized view into Bluesky.

PinkSea does not store the original images, the PDS does

[00:39:26] Jeremy: And so in that example, the PDSs, they can be hosted by Bluesky the company, or they could be hosted by any person. And so PinkSea itself, when somebody posts a new oekaki, a new image, they're actually telling PinkSea to go create the image in the user's PDS, right? PinkSea is itself not the the source of truth I guess you could say.

[00:40:00] Prefetcher: PinkSea in itself. I don't remember which Bluesky team member said it, but I like the analogy that AppViews are like Google. So in Google, when you search something, Google doesn't have those websites. Google just knows that this thing is on that website. In the same vein, PinkSea, when you create a new oekaki, you tell PinkSea, Hey, go to my PDS and create that record for me. And then the person owns the PDS. So for example, let's say that in a year, of course I won't do it, but hypothetically here, I just go rogue and I shut down PinkSea, I delete the database. You still own the things. So for example, if someone else would clone the PinkSea repository and go here, there's PinkSea 2.

They can still use all of those images that were already on the network. So, AppViews in a way basically just work as a search engine for the network. PinkSea doesn't store anything. PinkSea just indexes that a user made a thing on that server. And here I can show you how to get to it somehow.

Those images aren't stored by PinkSea, but instead, I know that the image itself is stored, for example, on pds.example.com, and of course to reduce the load, we have a proxy. PinkSea asks the proxy to go to pds.example.com and fetch the image, and then it just returns it to the user.

[00:41:37] Jeremy: And so what it sounds like then is if someone were to create oekaki on their own PDS completely independently of Pink Sea the fact that they had created that image would be sent to one of the relays, and then PinkSea would receive an event that says oh, this person created a new image then at that point your index could see, oh, somebody created a new image and they didn't even have to go through the PinkSea website or call the PinkSea APIs. Is that right?

Sharing PDS records with other applications

[00:42:14] Prefetcher: Yep. That is exactly right. For example, someone could now go, Hey, I'm making my own PinkSea-like application. And then they would go, I want to be compatible with PinkSea. So I'm using the same record. Or what we call a lexicon, basically describe how records look like. I forgot to mention that, but every single record has an attached lexicon.

And lexicons serve as a blueprint. So a lexicon specifies, oh, this has an image, this has a for example, the tags attached to it, a description of the image. Validate that the record is correct, that you don't get someone just making up random stuff.

But yeah, someone could just go, Hey, I'm making another website. Let's call it GreenForest for example. And GreenForest is also an oekaki website, but it uses, for example, chickenpaint instead of tegaki but I want to be able to interoperate with PinkSea. so I'm also gonna use com.shinolabs.pinksea.oekaki the collection, the same record, the same lexicon.

And for example, they have their own servers and the servers just create regular oekaki records. So for example, GreenForest gets a new user, they log in, create, draw their beautiful image, and then they click upload it.

So GreenForest goes to that person's PDS and tells the PDS, Hey, I want to make a new. com.shinolabs.pinksea.oekaki record. The PDS goes okay, I've done it for you. Let me just inform the relay that I did so, relay gets the notification that someone made that new PinkSea oekaki record. And so the main PinkSea instance, pinksea.art, which is listening in on the relay, gets a notification from the relay going, Hey, there is this new oekaki record. And PinkSea goes, sure, I'll index it. And so PinkSea just gets that GreenForest image directly in itself. And in the same vein, someone at PinkSea could draw something in tegaki -- their own beautiful character. And the same thing would happen with GreenForest. GreenForest would get that PinkSea image, that PinkSea record, and index it locally.

So the two platforms, despite being completely different, doing completely different things, they would still be able to share images with each other.

Bluesky PDS stores other AppView's data but they could stop at anytime

[00:44:38] Jeremy: And these images, since they're stored in the PDS, what that would mean is that anybody building an application on ATProto, they can basically use Bluesky's PDS or the user's PDS as their storage. They could put any number of images in there and they could get into gigabytes of images.

And that's the responsibility of the PDS and not yourself to keep track of.

[00:45:12] Prefetcher: Yes, that can be the case. Of course, there is a hard limit on how big a single upload can be, which is, if I remember correctly, I don't wanna lie, I think it's 50 megabytes, I don't recall there being a hard cap on how big a single repository can be. I know of some people whose repositories are in the single gigabyte digits but this kind of is a thing scares app developers.

'cause you never know when Bluesky the company -- 'cause most people registering, are registering on Bluesky. We don't really know whether Bluesky, the company will want to keep it for free. Forever allow us to do something like that. You already have projects like, for example, ATFile, which just allow you to upload any arbitrary data just to store it, on their servers and they are paying for you.

So we'll never know whether Bluesky will decide, okay, our services are only for Bluesky if you want to use PinkSea you have to deal with it. Or whether they go, okay, if you want to use alternative AppViews you have to pay us in order to host them. So, that also leads me to the fact that decentralization is an important part of AT protocol as Bluesky themselves say that they are a potential adversary.

You cannot trust them in the long term. Right now they are benign right now, they're very nice, but, we never know how Bluesky will end up in a year or two. So if you want to be in the full control of your data, you need to sadly host it by yourself. And it's honestly really easy in order to do so.

There is a ton of really useful online content blogs and whatever. I think I've set up my PDS in 10 minutes on a break between classes and university. But to a person that's non-technical that doesn't know much I'd say around an hour to two hours

The liability and potential abuse from running a PDS

[00:47:14] Jeremy: Yeah, I think the scary thing for a lot of people is technical or not, is even if it's easy to set up, you gotta make sure it keeps running. You gotta have backups. And so it could be a lot.

[00:47:30] Prefetcher: Yeah.

This is to be expected by the fact that you're in control of your data. Keeping it secure the same way, for your personal photos or your documents, for example, your master's diploma or whatever. And it's on you to keep your Bluesky interaction secure. On one hand, it's easier to get someone to do it, and I expect in the future we'll get people that are hosting public PDSes I sometimes thought of doing that for PinkSea, just like allowing people to register by PinkSea.

But, doing so as a person, you also have to be constantly on call for abuse. So if someone decides to register via PinkSea and do some illicit activities, you are solely responsible for it.

PDS and AppView moderation liability

[00:48:17] Jeremy: So if they were to upload content that's illegal, for example, it's hosted on your servers so then it's your problem.

[00:48:27] Prefetcher: Yeah, it is my problem.

[00:48:29] Jeremy: At least the way that it works now, the majority of the people, their PDS is gonna be hosted by Bluesky. So if they upload content that's breaks the law, then that's the Bluesky company's problem at least currently.

[00:48:44] Prefetcher: Yeah. That is something that Bluesky has to deal with. But I do believe that in the future we are going to have, more like independent entities just building infrastructure for ATProto, not even the relay it's just like PDSs for people to be able to join the atmosphere, but not directly via Bluesky.

[00:49:06] Jeremy: I'm kind of curious also with the current PDSs, if it's hosted by Bluesky, are they, are they moderating what people upload to their PDSs?

[00:49:16] Prefetcher: Good question. Honestly, I don't think they're moderating everything 'cause, it's infeasible for them to, for example, other than moderate Bluesky to also moderate PinkSea and moderate front page and whatnot. So it's the obvious responsibility to moderate itself and to report abuse. I'd say that if someone started uploading illicit material, I do not think, and this is not legal advice, I do not think that they would catch on until some point let's say.

[00:49:52] Jeremy: I mean, from what you were describing too, it seems like the AppViews would also, have issues with this because if, let's say someone created a PinkSea record in their PDS directly and the image they put in was not an oekaki image, it's instead something pretty illegal in the country that your AppView is hosted then, Wouldn't that go straight to the PinkSea users viewing the website?

[00:50:20] Prefetcher: Yes, sadly, this is something that you have to sign up as you're making an AppView and especially one with images. Sooner or later you are going to get material that you have to moderate and it's entirely on you. That's why, you have to think of moderation while you're working on an AppView.

Bluesky has an insanely complicated, at least in my opinion, moderation system, which is composable and everything, which I like. But for smaller AppViews, I think it's too much to build the same level of tooling. So you have to rely more on manual work.

Thankfully so far the user base on PinkSea has been nothing but stellar. I didn't have to deal with any law breaking stuff, but I am absolutely ready for one day where I'll have to sadly make some drastic moderation issues.

[00:51:18] Jeremy: Yeah. I think to me that's the most terrifying thing about making any application that's open to user content.

[00:51:29] Prefetcher: I get it, sadly. I'm no stranger to having issues with people, abusing my websites. Because since 2016, my, first major project was a text board based off of, a text board in a video game called DANGER/U/. It was semi-popular, during the biggest spike in activity in like 2017 and 2016, it had in the tens of thousands of monthly visitors.

And sadly, yeah, even though it was only text, I've had to deal with a lot of annoying issues. So to say the worst I think was I remember waking up and people are telling me that DANGER/U/ is down. So I log in the activity logs and someone hit me with two terabytes of traffic in a day.

There was a really dedicated person that just hated my website and just either spam me with posts or just with traffic. So, yeah, sadly I have experience with that. I know what to expect that's something that you sadly have to sign up for making a website that allows user content.

Pinksea is a single server

[00:52:42] Jeremy: To my understanding so far, PinkSea is just a single server. Is that right?

[00:52:47] Prefetcher: It is a single server. Yeah.

[00:52:48] Jeremy: That's kind of interesting in that, I think a lot of people when they make a project, they worry about scaling and things like that. But, was it a case where you just had a existing VPS and you're like, well hopefully this is, this is good enough?

[00:53:03] Prefetcher: I actually ordered a new one even though it's not really powerful, but my train of thought was that I didn't expect it to blow up. I didn't expect it to require more than a single VPS with 8 gigabytes of RAM and whatnot. And so far it's handling it pretty well. I do not expect ever to reach the amounts of traffic that Bluesky does, so I do not really have to worry about insane scalability and whatever.

But yeah. I thought of it always as a toy project until the day I released it and realized that it's a bit more than a toy project at this point. To this day, I just kind of think that that website even if it were popular, I would never expect it to have -- And in the best, most amazing case scenario, like a hundred posts a day. I do not have to deal with the amount of traffic that Bluesky does. So one VPS it is.

[00:53:59] Jeremy: Yeah, that makes a lot of sense. I mean the application is also mostly reads, right? Most people are coming to see the posts and like you said, you get a few submissions a day, but all the read stuff can probably be cached.

Harbor image proxy

[00:54:15] Prefetcher: Yeah. The heaviest, thing that PinkSea requires is the image proxy harbor, and that's something that right now only runs on that server. It's in Luxembourg. I think that's where my coprovider hosts it but yeah, that gets the most reads.

'cause in most cases, PinkSea, all it does, all you get is reads from a database, which is just, it's a solved problem. It's really lightweight. But with something like image proxying, you have this whole new problem. 'cause it's a lot of data, and you somehow have to send it -- it's enough for me to just host it locally on that PinkSea server and just direct people to it. But sooner or later, I can always just put it behind something like Bunny CDN or whatnot to have it be worldwide.

[00:55:09] Jeremy: So Harbor is something I think you added recently. How did the images work before and what is Harbor doing in its place?

[00:55:18] Prefetcher: Before I did what a lot of us currently do and I just freeload atop of Bluesky CDN 'cause Bluesky CDN is just open so far. But it's something that personally irked me. 'cause, I want PinkSea to be completely independent of Bluesky Corporation. I, I wanted to persevere even if Bluesky just decides to randomly, for example, close, the CDN to others or the relay to others or the PLC directory in the worst case scenario.

So I wanted to make my own CDN more like proxy. You can't really call it a CDN because it's not worldwide. It's just a single server but let's just say image proxy. So Harbor whenever a person goes to PinkSea, they start loading in all of the images and every single image instead of going to, for example, the PDS or to cdn.bluesky.app. They go to harbor.pinksea.art, you get attached the identifier of the user and what we call a content identifier.

Every single, thing uploaded to a PDS has an attached content identifier, which identifies it in a secure way so to say. So Harbor does in reality a really simple set of things. First and foremost, if the user has not seen it, like, not loaded it before first Harbor asks the local cache, do I have this file? If they do, if Harbor does, it just sends the file and it tells the browser, Hey, by the way, please don't ask me about this file for the next day. And in most cases, after one refresh, the user, all of the images load instantly because the web browser just goes, of those files were already sent. And Harbor asked me not to like, ask it more about the same file.

So in the case of the image isn't in harbor's local cache, Harbor, first does a lot of those steps to resolve, the users identifier through their PDS, basically resolving that identifier, the DID to a DID document, which is a document basically explaining how that user, what is their, alias, what is their handle and where can we find them, which PDS.

So we find the PDS and we then ask the PDS, Hey, send us this file for this user. The PDS sends it or doesn't, in which case we just throw an error and, Harbor just saves it locally and it sends it to the client.

It basically just that. But to my knowledge, it's the first non Bluesky image proxy that's deployed for any AppView. Which also caught the attention of Brian Newbold one of the Bluesky employees and made me really happy.

DID PLC Lookup

[00:58:14] Jeremy: The lookup when you have the user's, DID and you wanna find out where their PDS is that's talking to something called, I think it's the PLC directory?

[00:58:25] Prefetcher: Actually there are two different ways.

First is PLC directory, PLC originally standed for a placeholder, and then Bluesky realized that it's not a placeholder anymore, and they stealthily changed it to public ledger of credentials. So we have PLC and we have web, the most common version is PLC.

The document, the DID document is stored on Bluesky controlled servers under the moniker of PLC directory. They expose a web API that basically just allows you to say, Hey, give me the document for did:plc, whatever. And, the directory goes, have it. And this is the less decentralized version.

You can host your own PLC directory and you can basically ask (their) PLC directory to just send you every single document and just you can have your local copy, which some people already do, you kind of sacrifice the fact that you are not in control of the document.

It's still on a centralized server, even if you control the keys. 'cause every single DID document also has a key. And that key is used to sign changes to the document. So technically, if you define your own set of keys, you can prevent anyone else from modifying your document, even Bluesky.

'cause every single document is verifiable back and forth. You can see the previous document and its key is used to sign the next document and the chain of trust is visible and no one can just make random changes to your identity, but yeah, it's still on Bluesky to control service and it's a point of contention.

Bluesky eventually wants to move it to a nonprofit standards organization, but we have yet to see anything come out of it, sadly.

DID WEB lookup

The next method is web. And web instead of -- 'cause in did:plc, you have did:plc, and a random string of characters.

[01:00:30] Prefetcher: Web relies on domains. So for example, the domain would already like be the sole authority of where the file is.

So for example, if I had did:web:example.com, I would parse the DID and I would see it's hosted at example.com. So I go to example.com, I go to /.wellknown/did.json which is the well-known location for the file. And I would have the same DID document as I would have if I used, for example, a PLC DID resolved via the PLC directory. the web method, you are in control of the document entirely. It's on your server under your domain. While it's the more decentralized version, it's just kind of hard for non-technical people to make them. 'cause it relies on a bunch of things.

And also the problem is that if you lose your domain, you also lose your identity.

[01:01:23] Jeremy: Yeah. So unlike the PLC where it's not really tied to a specific domain, you can change domains. With the web way, you have to always keep the same domain 'cause it's a part of the DID and yeah, like you said, you can't let your renewal lapse or your credit card not work. 'cause then you just lose everything.

[01:01:49] Prefetcher: Yeah. You would still be able to change handles, but you would be tied for that domain to forever send your DID otherwise you would just lose it forever.

[01:01:57] Jeremy: Yeah, I had mostly only seen the PLC and I wasn't too familiar with the web, form of identification, but yeah that makes sense.

[01:02:06] Prefetcher: I think the web if I remember correctly, there is slightly over 300 accounts total on the entire network that use it. Mary who is a person on Bluesky that does a lot of like, ATProto related things, has a GitHub repository that basically gives insight into the network.

And on her GitHub repository, you can find the list of every single custom PDS and also how many DID webs there are in existence. And I think it was slightly over 300.

[01:02:38] Jeremy: So are you on that list?

[01:02:40] Prefetcher: My PDS Yeah. If you were to scroll down. I don't use a web DID 'cause I registered my account before when I was brand new to ATProto, so I didn't know anything. But if you had to scroll down, you would see pds.ata.moe, which is my custom PDS just running.

[01:02:55] Jeremy: Cool.

[01:02:57] Prefetcher: Yeah.

Harbor image proxy can cache any image blob

[01:02:58] Jeremy: So something I noticed about harbor, you take the, I believe you take the DID and then you take the CID, the content identifier. I noticed if you take any of those pairs from the ATProto network, like I go find a image somebody posted on Bluesky, I pass that post DID and CID for the image into harbor.

Harbor downloads it and caches it. So it's like, does that mean anybody could technically use you as a ATProto CDN?

[01:03:38] Prefetcher: Yes, the same way anyone could use like the Bluesky CDN to for example, run PinkSea like I did. cause I do not know if there is a good way to check if a CID of an image or a blob basically. 'cause files on ATProto are called blobs. I do not think there is a nice way to check if that blob is directly tied to a specific record. But that also allows you to make cool, interesting things.

Crossposting to Bluesky talks directly to the PDS

[01:04:06] Prefetcher: 'cause for example, PinkSea has that, cross post to Bluesky thing. So when you create an image, You already have an option to cross post it to Bluesky, which a lot of people liked.

And it was a suggestion from one of the early users of PinkSea. And the way it works is that when we create a PinkSea record, we upload that image, right? And then PinkSea goes, okay, I'm gonna use that same image, the same content identifier, and just create a Bluesky post. So Bluesky and PinkSea all share the same image. I don't upload it twice, I just upload it once. use it in PinkSea and I also use it in Bluesky. And the same way Bluesky its CDN, can just fetch the image. I can also fetch the image from mine, 'cause blobs aren't tied to specific records.

They just exist outside of that realm. And you could just query anything. Not even images. You could probably query a video or even a text file.

[01:05:04] Jeremy: So when you cross post to Bluesky, you're creating a record directly in the person's PDS, not going through bluesky's API.

[01:05:14] Prefetcher: No, I sidestep Bluesky's API completely. And, I basically directly talk to the PDS at all times. I just tell them, Hey, please, for me, create a app.bsky.feed.post record. And you have the image, the text, which also required me to manually parse text into rich text.

'cause like, Bluesky doesn't automatically detect for example, links or tags And you basically get -- like PinkSea creates a record directly with the link to the image. And all of those tags, like the PinkSea tag and whatever, And I completely sidestep. Bluesky's API. If Bluesky, the AppView would cease to exist, PinkSea would still happily create Bluesky crossposts for you.

Bluesky CDN compression

[01:07:37] Jeremy: And I think, this might have been a post from you. I think I saw somebody saying that when you view an image from the CDN that the Bluesky CDN specifically, there's some kind of compression going on that that messes with certain types of art.

[01:07:55] Prefetcher: It's especially noticeable artists are complaining about it all the time, left and right. Bluesky is very happy with jpeg compression, by default, their CDN, -- like to every single image it applies a really not good amount of jpeg compression which is especially not small.

If you compare an image that's uploaded via PinkSea, view an image on PinkSea, and view the same image, which is, it's the same content id. It's the same blob. And you view it on Bluesky, it loses so much fidelity, it loses so much of that aliasing on the pen.

You just see everything become really blurry. And on top of that, when you upload an image via Bluesky itself, if I remember correctly, I don't wanna lie here, but they also downscale the image to 1024 pixels by default.

So every single image, not only big ones, and artists usually work with really big canvases, they get, downscaled and also additionally they get jpegified.

So for example, PinkSea directly uploads PNG files to the PDS. And for example, Harbor gives back the original file. It does no transformations on it, but Bluesky transforms all of them into JPEG compressed images and for photos, it's fine sometimes. 'cause I've also seen people just compare directly, downloaded images of the PDS versus images viewed on Bluesky.

But for art it's especially noticible. And people really (do) not like that.

[01:09:31] Jeremy: Yeah, that's kind of odd. 'cause if, if I understand correctly, then if you post directly to your PDS and Bluesky pulls it in you'll avoid that, that 1024 resizing. So your images will be higher quality?

[01:09:47] Prefetcher: I actually do not know. That's an interesting question.

Cause I know that the maybe their CDN also does that 'cause that's what I've heard from others, that on upload the image gets processed and squashed down. So I don't know if doing it via an alternative AppView would change it or would Bluesky just directly reject this post? Because for example, PinkSea, I have built-in which I think I might change in the future -- PinkSea will reject your post if it's bigger than 800x800. 'cause then it'll notice that something is off. This could not have been made with PinkSea.

[01:10:26] Jeremy: Yeah, that's a good point I suppose we know at the very least, they have some third party and internal moderation tools that they feed the images through to, so they, they can do some automatic content tagging. But yeah, I, I don't know, like you said, whether, the resizing and all that stuff is at the CDN level

[01:10:50] Prefetcher: The jpegification is definitely at the CDN level. 'cause,

Bluesky is actually running an open source image proxy. It's called imgproxy.

Brian Newbold talked about it a bit on that harbor post. And, yeah, so a lot of the compression, the end user things are done via image proxy, but that, downscaling, I don't know, you'd have to ask someone who's a bit more intimate with Bluesky's internals.

[01:11:19] Jeremy: Cool. yeah, I think we've, we've covered a lot. Is there, is there anything else, you wanted to mention or thought we should have talked about?

[01:11:26] Prefetcher: Regarding PinkSea I think I've mentioned a ton both the behind the scenes things and, the user things, the design principles. What I'd want to absolutely say, and it will sound cheesy, and, is that I'm eternally grateful to anyone who's actually visited PinkSea. It's definitely grown outta all of my like dreams for the platform, to the point where I'm sitting here just talking about it. I definitely hope that the future will bring us more applications (in) ATProto. I definitely have ideas on how to expand PinkSea, a lot of ideas, a lot of things I want to do, and I'm also a very busy person, so I never get around them. But yeah, think that's it, at least regarding PinkSea.

[01:12:15] Jeremy: Cool. Well, if people want to check out PinkSea or see what you're up to, where can they find you?

[01:12:22] Prefetcher: So PinkSea is at pinksea.art. That's the website and Bluesky Handle is at pinksea.art and me, well, search prefetcher on Bluesky, you'll probably find me. My tag is at prefetcher.miku.place. all of my socials are probably there. I'm Prefetcher pretty much every single platform except for the platforms that already had someone called Prefetcher.

GitHub, github.com/purifetchi because Prefetcher was taken. And, yeah, hit me up. I'm always eager to talk. I don't bite.

[01:13:00] Jeremy: Very cool. Well, Kacper thanks. Thanks for taking the time. This was fun.

[01:13:04] Prefetcher: Thank you so much, Jeremy, for having me over. It was a pleasure.

Tom MacWright on Shutting down Placemark

Thu, 06 Feb 2025 06:38:00 -0000

Tom MacWright is a prolific contributor in the geospatial open source community. He made geojson.io, Mapbox Studio, and was the lead developer on the OpenStreetMap editor. He's currently on the team at Val Town.

In 2021 he bootstrapped a solo business and created the Placemark mapping application.

He acquired customers and found steady growth but after spending two years on the project he decided it was financially unsustainable. He open sourced the code and shut down the business.

In this interview Tom speaks candidly about why geospatial is difficult, chasing technical rabbit holes, the mental impact of bootstrapping, and his struggles to grow a customer base.

If you're interested in geospatial or the good and bad of running a solo business I think you'll enjoy this conversation with Tom.

Geospatial Companies mentioned

Transcript

You can help correct transcripts on GitHub.

[00:00:00] Introduction

Jeremy: Today I'm talking to Tom MacWright. He worked at Mapbox as a, a very early employee. He's had a lot of experience in the geospatial community, the open source community. One of his most recent projects was a mapping project called Placemark he started and ran on his own.

So I wanted to talk to Tom about his experience going solo and, eventually having to, shut that down. Tom, thanks for agreeing to chat today.

Tom: Yeah, thanks for having me.

[00:00:32] Tools and Open Source at Mapbox

Jeremy: So maybe to give everyone some context on, what your background was before you started Placemark. Um, let's talk a little bit about your experience at, at Mapbox. What did you work on there and, and what would you say are like the big things you learned from that experience?

Tom: Yeah, so if you include the time that I was at Development Seed, which essentially turned into Mapbox, I kind of signed the paper to get fired from Development Seed and hired at Mapbox within the same 20 seconds. Uh, I was there for eight and a half years. so it was a lifetime in tech years. and the company really evolved from, uh, working for Human Rights Watch and Amnesty International and the World Bank and doing these small, little like micro websites to the point at which I left it.

It had. Raised a lot of money, had a lot of employees. I think it was 350 or so when I left. and yeah, just expanded into a lot of different, uh, try trying to own more and more of the mapping stack. but yeah, I was kind of really focused on the creative and tooling side of it. that's kind of where I see a lot of the, the fun and programming is making these tools where, uh, they can give people the same kind of fun like interaction loop that programming has where you, you know, you do a little bit of math and you see the result and you're able to just play with, uh, what you're working on, letting people have that in other domains.

so it was really cool to figure out how to get A map design tool where somebody changes the background color and it just automatically changes that in your browser. and it covered like data editing. It covered, um, map styling and we did, uh, three different versions of that tool over the years.

and then Mapbox is also a company that was, it came from, kind of people who are working on the Howard Dean campaign. And so it was pretty ideological and part of the ideology was being pretty hardcore about open source. we hired a lot of people who were working on open source projects before and basically just paid them to work on the open source projects, uh, for their whole time there.

And during my time there, I just tried to make as much of my work, uh, open as possible, which was, you know, at the time it was, it was pretty great. I think in the long term it's been, o open source has changed a lot. but during the time that we were there, we both kind of, helped things like leaflet and mapnik and openstreetmap, uh, but also made like some larger contributions to the open source world.

yeah, that, that's kind of like the, the internal company facing side. And also like what I try to create as like a more of a, uh, enduring work. I think the open source stuff will hopefully have more of a, a long term, uh, benefit.

[00:03:40] How open source has changed (value capture by large companies)

Jeremy: When I was working on a project that needed offline maps, um, we couldn't use Google Maps or any of the, the other publicly available, cloud APIs. So yeah, we actually used a, a tool, called Tile Mill that I, I hadn't known that you'd worked on, but recently found out you did.

So that actually let us pull in OpenStreetMap data and then use this style, uh, language called carto to, to basically let us choose what the colors would be and how the different, uh, the roads and the buildings would look. What's kind of interesting to me is that it being open source really let us, um, build something we otherwise wouldn't have been able to do.

But like, at the same time, we also didn't pay Mapbox any money. (laughs) So I'm, I'm kind of curious, like, if it's changed, like what the thinking was in terms of, you know, we pay for people to build all these things. We make it open source. but then people may just not ever pay us, you know, for all these things we did.

Tom: Yeah. Yeah. I think that the main thing that's changed since the era of tilemill is, the dominance of cloud platforms. Like back then, I think, uh, Mapbox was still using, we were using like a little bit of AWS but people were still just on like VPSs and, uh, configuring things in cPanel and sometimes even running their own servers.

And the, the danger of people using the product for free was such a small thing for us. especially when tile Mill was also funded by the Knight Foundation, so, you know, that at least paid half of my salary for, or, well, sorry, probably, yeah, maybe half of my salary for the first year that I was there and half of three other people's salaries.

but that, yeah, so like when we built Tile Mill, a few companies have really like built on those same tools. Uh, there's a company called Carto coincidentally, they had the same name as Carto CSS, and they built on a lot of the same stack they built on mapnik. Um, and it was, was...

I mean, I'm not gonna say that it was all like, you know, sunshine and roses, but it was never a thing that we talked about in terms of like this being a brutal competition between us and these other startups.

Mapbox eventually closed source some stuff. they made it a source available license. and eventually Mapbox Studio was a closed source product. Um, and that was actually a decision that I advocated for. And that's mostly just because at one point, Esri, Microsoft, Amazon, all had whitelisted versions of Mapbox code, which, uh, hurts a little bit on a personal level and also makes it pretty hard to think about.

working almost like it. You don't want to go to your scrappy open source company and do unpaid labor for Amazon. Uh, you know, Bezos can afford to pay for the labor himself. that's just kind of my personal, uh, that I'm obviously, I haven't worked there in a long time, so I'm not speaking for the company, but that's kind of how it felt like.

and it yeah, kind of changed the arithmetic of open source in this way that. It made it less fun and, more risky, um, for people I think.

[00:07:11] Don't worry about the small free users

Jeremy: Yeah. So it sounds like the thinking was if someone on a small team or an individual, they took the open source software and they used it for their own projects, that was fine. Like you expected that and didn't worry about it. It's more that when these really large organizations like a, a Microsoft comes in and, just like you said, white labels the software, and doesn't really contribute significantly back.

That's, that's when it, the, the thinking sort of shifted.

Tom: Yeah, like a lot of the people who can't pay full price in USD to use your product are great users and they're doing cool stuff. Like when I was working on Placemark and when I was like selling. The theme for my blog, I would get emails from like some kid in India and it's like, you know, you're selling this for a hundred dollars, which is a ton of money.

And like, you know, why, why should I care? Why shouldn't I like, just send them the zip file for free? it's like nothing to me and a lot to them. and mapping tools are really, really expensive. So the fact that Mapbox was able to create a free alternative when, you know, ArcGIS was $500 a month sometimes, um, depending on your license, obviously.

That's, that's good. You're always gonna find a way for, like, your salespeople are gonna find a way to charge the big companies a lot of money. They're great at that. Um, and that's what matters really for your, for the revenue.

[00:08:44] ESRI to Google Maps with little in-between

Jeremy: That's a a good point too about like the, my impression of the, the mapping space, and maybe this has changed more recently, but you had the, probably the biggest player Esri, who's selling things at enterprise prices and then there were, or there are like a few open source options. but they feel like the, the barrier to entry feels a little high.

And so, and then I guess you have stuff like Google Maps, right? That's, um, that's very accessible, but it's pretty limited, so. There's this big gap, it feels like right between the, the Esri and the, the Google Maps and open source. It's, it's sort of like, there's almost like there's no sweet spot. guess May, maybe it's just because people's uses are so different, but I'm, I'm not sure, um, what makes maps so unique in that way

Tom: Yeah, I have come to understand what Esri and QGIS do as like an extension of what CAD is like. And if you've used CAD software recently, it's just as crazy and as expensive and as powerful. and it's really hard to capture like the people who are motivated enough to make a map but don't want to go down the whole rabbit hole.

I think that was one of the hardest things about Placemark was trying to be in the middle of those things and half of the people were mystified by the complexity and half the people wanted more complexity. Uh, and I just couldn't figure out how to get it to the right in between spot.

[00:10:25] Placemark and its origins in geojson.io

Jeremy: Yeah. So let's, let's talk a little bit about Placemark then, in terms of from its start. What was your, your goal with Placemark and, and what was the product itself?

Tom: So the seed of the idea for Placemark, uh, is this website called geojson.io, uh, which is still around. And, Chris Fong (correction -- Whong) at, at Mapbox is still, uh, developing it. And that had become pretty useful for a lot of people who I knew in the industry who were in this position of managing geospatial data but not wanting to boot up ArcGIS uh, geojson.io is based on, I just tweeted, I was like, why?

Why is there not a thing where you can edit data on a map and have a GeoJSON representation and just go Back and forth between the two really easily. and it started with that, and then it kind of grew to be a little bit more powerful. And then it was just a tool that was useful for everyone. And my theory was just that I wanted that to be more useful.

And I knew just like anything else that you build and you work on for a long time, you know exactly how it could be so much better. And, uh, all the things that you would do better if you did it again. And I was, uh, you know, hoping that there was something where like if you make that more powerful and you make it something that's like so essential that somebody's using every day, then maybe there's some some value in that.

And so Placemark kind of started as being like, oh, this is the thing where if you're tasking a satellite and you need a bounding box on a specific city, this is the easiest way to do that. Um, and it grew a little bit into being like a tool for collaborating because people were collaborating on it.

And I thought that that would be, you know, an interesting thing to support. but yeah, I think it, it like tried to be in that middle of like, not exactly Google my Maps and certainly a lot, uh, simpler than, uh, QGIS or ArcGIS

Jeremy: something I noticed, so I've actually used geojson.io as well when I was first learning how to put stuff on a map and learning that GeoJSON was a format that a lot of things were using, it was actually really helpful to, to be able to draw, uh, polygons and see, okay, this is how the JSO looks and all that stuff.

And it was. Like just very simple. I think there's something like very powerful about, websites or applications like that where it, it does this one thing and when you go there, you're like, oh, okay, I, I, I know what I'm doing and it's, it's, uh, you know, it's gonna help me do the, this very specific thing I'm trying to do.

[00:13:16] Placemark use cases (Farming, Transportation, Interior mapping, Satellite viewsheds)

Jeremy: I think with Placemark, so, one question I would have is, you gave an example of, uh, someone, I think you said for a satellite, they're, are they drawing the, the area? What, what was the area specifically for?

Tom: the area of interest, the area where they want the, uh, to point the camera.

Jeremy: so yeah, with, with Placemark, I mean, were there, what were some of the specific customers or use cases you had in mind?

'cause that's, that's something about. Um, placemark as a product I noticed was it's sort of like, here's this thing where you can draw polygons put markers and there's all these like things you can do, but I think unless you already have the specific use case, it's not super clear, who uses it for what.

So maybe you could give some examples of what you had in mind.

Tom: I didn't have much in mind, but I can tell you what people, what some people used it for. so some of the more interesting uses of it, a bunch of, uh, farming oriented use cases, uh, especially like indoor and small scale farming. Um, there were some people who, uh, essentially had a bunch of flower farms and had polygons on the map, and they wanted to, uh, mark the ones that had mites or needed to be watered, other things that could spread in a geometric way.

And so it's pretty important to have that geospatial component to it. and then a few places were using it for basically transportation planning. Um, so drawing out routes of where buses would go, uh, in Luxembourg. And, then there was also a little bit of like, kind of interesting, planning of what to buy more or less.

Uh, so something of like, do we want to buy this tract of land or do we wanna buy this tract of land or do we wanna buy access to this one high speed internet cable or this other high speed internet cable? and yeah, a lot of those things were kind of like emergent use cases. Um, there's a lot of people who were doing either architecture or internal or in interior mapping essentially.

Jeremy: Interior, you mean, inside of a building

Tom: yeah.

yeah.

Jeremy: Hmm. Okay.

Tom: Which I don't think it was the best tool for. Uh, but you know, people used it for that.

Jeremy: Interesting. Yeah. I guess, would people normally use some kind of a CAD tool for that, or

Tom: Yeah. Uh, there's CAD tools and there are a few, uh, companies that do just, there's a company that just does interior maps especially of airports, and that's their whole business model. Um, but it's, it's kind of an interesting, uh, problem because most CAD architecture work is done with like a local coordinate system, and you have like very good resolution of everything, and then you eventually place it in geo geospatial space.

Uh, but if you do it all in latitude and longitude, you know, you're, you're moving a door and it's moving the 10th or 12th decimal point, and eventually you have some precision problems.

Jeremy: So it's almost like if you start with latitude and longitude, it's hard to go the other way. Right? you have to start more specific and then you can move it into the, the geospatial, uh, area.

Tom: Yeah. Uh, that's kind of why we have local projections for towns is that you can do a lot of work just in that local projection. And the numbers are kind of small 'cause your town's small, relatively.

Jeremy: yeah, those are kind of interesting. So it sounds like just anytime somebody wants to, like you gave the example of transportation planning or you want to visually see where things are, like your crops or things like that, and that, that kind of makes sense. I mean, I think if you just think about paper maps, if somebody wants to sketch something out and, and sort of track the layout of something, this could serve the same purpose but be editable.

and like you said, I think it's also. Collaborative so you can have multiple people editing the same, um, map. that makes sense. I think something that I believe I saw on your website is you said though that it was, it's like an editing tool, but it's not necessarily a visualization tool.

Uh, I'm kind of curious what you, what you meant by that.

[00:17:39] An editing tool that allows you to export data not a visualization tool

Tom: Yeah, I, when you say a map, I think there's, people can interpret that as everything from raw data to satellite imagery and raster data. and then a lot of it is like, can I use this to make a choropleth map of the voter turnout in our, in my country? and that placemark did a little bit, but I think that it was, it was never going to be the, the thing that it did super well.

and so, yeah, and also like the, the two things kind of, don't mesh all that well. Like if you have a scale point map and you have that kind of visualization of it and then you're editing the points at the same time and you're dragging around these like gigantic points because this point means a lot of population, it just doesn't really make that much sense.

There are probably ways to square that circle and have different views, but, uh, I felt like for visualizations, I mean partly I just think data wrapper is kind of great and uh, I had already worked for observable at that point, which is also, which I think also does like great visualization work.

Jeremy: Would that be the case of somebody could make a map inside a placemark and then they would take the GeoJSON and then import that into another visualization tool? Is that what you were kind of imagining people would do?

Tom: Yeah. Yeah, exactly.

Jeremy: And I could see from the customer's perspective, a lot of them, they may have that end, uh, visualization in mind. So they might look for a tool that kind of just does both. Right.

Tom: Yeah. Yeah. Certain people definitely, wanted that. And yeah, it was an interesting direction to go down. I think that market was going to be a lot different than the people who wanted to manage and edit data. And also, I, one thing that I had in mind a lot, uh, was if Placemark didn't work out, how much would people be burned?

and I think if I, if I built it in a way that like everyone was heavily relying on the API and embeds, people would be suffer a lot more, if I eventually had to shut it down. every API that you release is really a, a long-term commitment. And instead for me, like guilt wise, having a product where you can easily export everything that you ever did in any format that you want was like the least lock in, kind of.

Jeremy: Yeah. And I imagine the, the scope of the project too, you're making it much smaller if you, if you stick to that editing experience and not try to do everything.

Tom: Yeah. Yeah. I, the scope was already pretty big. as you can tell from the open source project, it's, it's bigger than I wish it was. the whole time I was really hoping that I could figure out some niche that was much more compact. there's, I forget the name, but there's somebody who has a, an application that's very similar to Placemark in.

Technical terms, but is just a hundred percent focused on planning septic systems. And I'm just like, if I just did this just for septic systems, like would that be a much, would that be 10,000 lines of code instead of 40,000 lines of code? And it would be able to perfectly serve those customers. but you know, that I didn't do enough experimentation to figure that out.

Um, I, that's, I think one thing that I wish I had done a lot more was, pivot and do experiments.

Jeremy: that septic example, do you know if it's a, a business in and of itself where it can actually support one person or a staff of people? Or is it, is that market just too small?

Tom: I think it's still a solo bootstrapped project. yeah. And it's, it's so hard to tell whether a company's doing well or not. I could ask the person over DM.

[00:21:58] Built the base technology before going public

Jeremy: So when you were first starting. placemark. You were, you were doing it as a solo, developer. A solo entrepreneur, reallyyou worked on it for quite a while, I think before you announced, right? Like maybe a year or so?

Tom: Yeah, yeah. Almost, almost a year, I think, maybe, maybe 10 months in the dark.

Jeremy: I think that there's, there was a lot of overlap between the different directions that I would eventually go in and. So just building a collaborative editor that can edit map data fairly quickly and checks all the boxes of being able to import and export things, um, that is, was a lot of work. and I mean also I, I was, uh, freelancing during part of it, so it wasn't a hundred percent of my time.

Tom: But that, that core, I think even now if I were to build something similar, I would probably still use that work. because that, whether you're doing the septic planning application or you're doing a general purpose kind of map editor or some kind of social application, a lot of that stuff will be in common.

Um, and so I wanted to really get, like, to figure out that problem space and get a few solutions that I could live with.

Jeremy: The base. libraries or technologies you were gonna pick to get the map and have the collaborative aspect. Those are all things you wanted to get settled first. And then you figured, okay, once I have this base, then I can go find the, you know, the, the, the customers or, or find the specifics of what I'm gonna build.

Tom: Yeah, exactly.

Jeremy: I I think you had said that going forward when you're gonna work on another project, you would probably still start the same way.

[00:23:51] Geospatial is a tough industry, no public companies

Tom: if I was working on a project in the geospatial space, I would probably heavily reference the work that I already did here. but I don't know if I'll go back to, to maps again. It's a tough industry.

Jeremy: Is it because of the, the customer base? Is it because like people don't really understand the market in terms of who actually needs the maps? I'm kind of curious what you feel makes it tough.

Tom: I think, well there are no, there are no public mapping companies. Esri is I think one of the 10 largest private companies in the us. but it's not like any of these geospatial companies have ever been like a pure play. And I think that makes it hard. I think maps are just, they're kind of like fonts in a way in which they are this.

Very deep well of complexity, which is absolutely fascinating. If you're in it, it's enough fun and engineering to spend an entire career just working on that stuff. And then once you're out of it, you talk to somebody and you're just like, oh, I work on this thing. And they're like, oh, that you Google maps.

Um, or, you know, I work at a font type like a, you know, a type factory and it's like, oh, do you make, uh, you know, courier in, uh, word. It's really infrastructure, uh, that we mostly take for granted, which is, that's, that means it's good in some ways. but at the same time, I, it's hard to really find a niche in which the mapping component is that, that is that useful.

A lot of the companies that are kind of mapping companies. Like, I think you could say that like Strava and Palantir are kind of geospatial companies, both of them. but Strava is a fitness company and Palantir is a military company. so if you're, uh, a mapping expert, you kind of have to figure out what, how it ties into the real world, how it ties into the business world and revenue.

And then maps might be 50% of the solution or 75% of the solution, but it's probably not going to be, this is the company that makes mapping software.

Jeremy: Yeah, it's more like, I have this product that I'm gonna sell and it happens to have a map as a part of it. versus I'm going to sell you, tools that, uh, you know, help you make your own map. That seems like a, a harder, harder sell.

Tom: yeah. And especially pro tools like the. The idea of people being both invested in terms of paying and invested in terms of wanting to learn the tool. That's, uh, that's a lot to ask out of people.

[00:26:49] Knowing the market is tough but going for it anyways

Jeremy: I think the things we had just talked about, about mapping being a tough industry and about there being like the low end is taken care of by Google, the high end is taken care of by Esri with ArcGIS. Uh, I think you mentioned in a blog post that when you started Placemark you, you, you knew all this from the start.

So I'm kind of curious, like, knowing that, what made you decide like, I'm gonna, I'm gonna go for it and, you know, do it anyways.

Tom: uh, I, well, I think that having seen, I, like I am a co-founder of val.town now, and every company that I've worked for, I've been pretty early enough to see how the sausage is made and the sausage is made with chaos. Like every company doesn't know what it's doing and is in an impossible fight against some Goliath figure.

And the product that succeeds, if it ever does succeed, is something that you did not think of two or three years in advance. so I looked at this, I looked at the odds, and I was like, oh, these are the typical odds, you know, maybe someday I'll see something where it's, uh, it's an obvious open blue water market opportunity.

But I think for the, for the most part, I was expecting to grind. Uh, you know, like even, even if, uh, the odds were worse, I probably would've still done it. I think I, I learned a lot. I should have done a lot more marketing and business and, but I have, I have no regrets about, you know, taking, taking a one try at solving a very hard to solve problem.

Jeremy: Yeah, that's a good point in that the, the odds, like you said, are already stacked against you. but sometimes you just gotta try it and see how it goes,

Tom: Yeah. And I had the, like I was at a time where I was very aware of how my life was set up. I was like, I could do a startup right now and kind of burn money for a little while and have enough time to work on it, and I would not be abandoning an infant child or, you know, like all of the things that, all the life responsibilities that I will have in the near future.

Um. So, you know, uh, the, the time was then, I guess,

[00:29:23] Being a solo developer

Jeremy: And comparing it to your time at Mapbox and the other startups and, and I suppose now at val.town, when you were working on Placemark, you're the sole developer, you're in charge of everything. how did that feel? Did you enjoy that experience or was it more like, I, I really wish I had other people to, you know, to kind of go through this with,

Tom: Uh, around the end I started to chat with people who, like might be co-founders and I even entertained some chats with, uh, venture capital people. I am fine with the, the day to day of working on stuff alone of making a lot of decisions. That's what I have done in a lot of companies anyway. when you're building the prototype or turning a prototype into something that can be in production, I think that having, uh, having other people there,

It would've been better for my mentality in terms of not feeling like it was my thing.

Um, you know, like feeling detached enough from the product to really see its flaws and really be open to, taking more radical shifts in approach. whereas when it's just you, you know, it's like you and the customers and your email inbox and, uh, your conscience and your existential dread. Uh, and you know, it's not like a co-founder or, uh, somebody to work with is gonna solve all of that stuff for you, but, uh, it probably would've been maybe a little bit better.

I don't know. but then again, like I've also seen those kinds of relationships blow up a lot. and I wanted to kind of figure out what I was doing before, adding more people, more complexity, more money into the situation. But maybe you, maybe doing that at the beginning is kind of the same, you know, like you, other people are down for the same kind of risk that you are.

Jeremy: I'm sure it's always different trade offs. I mean, I, I think there probably is a power to being able to unilaterally say like, Hey, this is, this is what I wanna do, so I'm gonna do it.

Tom: Yeah.

[00:31:52] Spending too much time on multiplayer without a business case

Jeremy: You mentioned how there were certain flaws or things you may not have seen because you were so in it.

Looking back, what, what were some of those things?

Tom: I think that, uh, probably the, I I don't think that most technical decisions are all that important, um, that it never seems like the thing that means life or death for companies. And, you know, Facebook is still on PHP, they've fought, fixed, the problem with, with money. but I think I got rabbit holed into a few things where if I had like a business co-founder, then they would've grilled me about like, why are we spending?

The, the main thing that comes to mind, uh, is real time multiplayer, real time. It was a fascinating problem and I was so ready to think about that all the time and try to solve it. And I think that took up a lot of my time and energy. And in the long term, most people are not editing a map. At the same time, seeing the cursors move around is a really fun party trick, and it's great for marketing, but I think that if I were to take a real look at that, that was, that was a mistake.

Especially when the trade off was things that actually mattered. Like the amount of time, the amount, the amount of data that the, that could be handled at. At the same time, I could have figured out ways to upload a one gigabyte or two gigabyte or three gigabyte shape file and for it to just work in that same time, whereas real time made it harder to solve that problem, which was a lot closer to what, Paying customers cared about and where people's expectations were?

Jeremy: When you were working on this realtime collaborative functionality, was this before the product was public? Was this something you, built from the start?

Tom: Yeah. I built the whole thing without it and then added it in. Not as like a rewrite, but like as a, as a big change to a lot of stuff.

Jeremy: Yeah, I, I could totally see how that could happen because you are trying to envision people using this product, and you think of something like Google Docs, right? It's very powerful to be typing in a document and see the other cursors and, um, see other people typing.

So, I could see how you, you would make that leap and say like, oh, the map should, should do that too. Yeah.

[00:34:29] Financial pressures of bootstrapping, high COL, and healthcare

Tom: Yeah. Yeah. Um, and, you know, Figma is very cool. Like the, it's, it's amazing. It's an amazing thing. But the Figma was in the dark for way longer than I was, and uh, Evan is a lot smarter than I was.

Jeremy: He probably had a big bag of money too. Right.

Tom: Yeah.

Jeremy: I, I don't actually know the history of Figma, but I'm assuming it's, um, it's VC funded, right?

Tom: Uh, yeah, they're, they're kind of famous for just having, I don't think they raised that much in the beginning, but they just didn't hire very much and it was just like the two co-founders, or two or three people and they just kept building for long time. I feel like it's like well over three years.

Jeremy: Oh wow. Okay. I think like in your case, I, I saw a comment from you where you were saying, this was your sole source of income and you gotta pay for your health insurance, and so you have no outside investments. So, the pressures are, are very different I think.

Tom: Yeah. Yeah. And that's really something to on, to appreciate about venture capital. It gives you the. Slack in your, in your budget to make some mistakes and not freak out about it. and sadly, the rent is not going down anytime soon in, in Brooklyn, and the health insurance is not going down anytime soon.

I think it's, it's kind of brutal to like leave a job and then realize that like, you know, to, to be admitted to a hospital, you have to pay $500 a month.

Jeremy: I'm, I'm sure that was like, shocking, right? The first time you had to pay for it yourself.

Tom: Yeah. And it's not even good. Uh, we need to fix this like that. If there's anything that we could do to fix entrepreneurship in this country, it's just like, make it possible to do this without already being wealthy. Um, it was, it was a constant stress.

[00:36:29] Growth and customers

Jeremy: As you worked on it, and maybe especially as you, after you had shipped, was there a period where. You know, things were going really well in terms of customers and you felt like, okay, this is really gonna work.

Tom: I was, so, like, I basically started out by dropping, I think $5,000 in the business bank account. And I was like, if I break even soon, then I'll be happy. And I broke even in the first month. And that was amazing. I mean, the costs were low and everything, but I was really happy to just be at that point and that like, it never went down.

I think that probably somebody with more, uh, determination would've kept going after, after I had stopped. but yeah, like, and also The people who used Placemark, who I actually chatted with, and, uh, all that stuff, they were awesome. I wish that there were more of them. but like a lot of the customers were doing cool stuff.

They were supportive. They gave me really informative feedback. Um, and that felt really good. but there was never a point at which like the, uh, the growth scale looked like, oh, we're going to hit a point at which this will be a sustainable business within a year. I think it, according to the growth when I left it, it would've been like maybe three years until I would've been, able to pay my rent and health insurance and, live a comfortable life in, in New York.

Jeremy: So when you mentioned you broke even that was like the expenses into the business, but not for actually like rent and health insurance and food and all that. Okay. Okay. can you say like roughly how much was coming in or how many customers you had?

Tom: Uh, yeah, the revenue initially I think was, uh, 1500 MRR, and eventually it was like 4,000 or so.

Jeremy: And the growth was pretty steady.

[00:38:37] Bootstrapping vs fundraising

Tom: Um, so yeah, I mean, the numbers where you're just like, maybe I could have kept going. but it's, the other weird thing about VCs is just that I think I have this rich understanding of like, if you're, if you're running a business that will be stressful, but be able to pay your bills and you're in control of it, versus running a startup where you might make life changing money and then not have to run a business again.

It's like the latter is kind of better. Uh, if stress affects you a lot, and if you're not really wedded to being super independent. so yeah, I don't know between the two ways of like living your life, I, I have some appreciation for, for both. doing what Placemark entailed if I was living cheaply in a, in a cheap city and it didn't stress me out all the time, would've been a pretty good deal.

Um, but doing it in Brooklyn with all the stress was not it, it wasn't affecting my life in positive ways and I, I wanted to, you know, go see shows at night with my friends and not worry about the servers going down.

Jeremy: Even putting the money aside, I think that's being the only person responsible for the app, right? Probably feels like you can't really take a vacation. Right.

Tom: Yeah, I did take a vacation during it. Like I went to visit my partner who was in, uh, Germany at the time, and we were like on a boat, uh, between Germany, across the lake to Switzerland, and like the servers went down and I opened up my laptop and fixed the servers. It's just like, that is, it's a sacrifice that people make, but it is hard.

Jeremy: There's, there's on call, but usually it's not just you 24 7.

Tom: Yeah. If you don't pick up somebody else

[00:40:28] Financial stress and framing money spent as an investment

Jeremy: Yeah, yeah, yeah, I guess at what point, because I'm trying to think. You started in 2021 and then maybe wrapped up, was it sometime in 2024?

Tom: Uh, I took a job in, uh, I, I mean I joined val.town in the early 2023 and then wrapped up in November, 2023.

Jeremy: At what point did you really start feeling the, the stress? Like I, I imagine maybe when you first started out, you said you were doing consulting and stuff, so, um, probably things were okay, but once you kind of shifted away from that, is that kind of when the, the, the worries about money started coming in?

Tom: Yeah. Um, I think maybe it was like six or eight months, um, in. Just that I felt like I wasn't finding, uh, like a, a way to grow the product without adding lots of complexity to it. and being a solo founder, the idea of succeeding, but having built like this hulking mess of a product felt just as bad as not succeeding.

like ideally it would be something that I could really be happy maintaining for the long term. Uh, but I was just seeing like, oh, maybe I could succeed by adding every feature in QGIS and that's just not, not a, not something that I wanted to commit to. but yeah, I don't, I don't know. I've been, uh, do you know, uh, Ramit Sethie he's like a,

Jeremy: I don't.

Tom: an internet money guy.

He's less scummy than the rest of them, but still, I. an internet money guy. Um, but he does adjust a lot of stuff about like, money psychology. And that has made me realize that a lot of what I thought at the time and even think now is kind of a rational, you know, like, I think one of the main things that I would do differently is just set a budget for Placemark.

Like if I had just set away, like, you know, enough money to live on for a year and put that in, like the, this is for Placemark bucket, then it would've felt better to me then having it all be ad hoc, month to month, feeling like you're burning money instead of investing money in a thing. but yeah, nobody told me, uh, how to, how to think about it then.

Uh, yeah, you only get experience by experiencing it.

Jeremy: You're just seeing your, your bank account shrinking and there's this, psychological toll, right?

Where you're not, you're not used to that feeling and it, it probably feels like something's wrong,

Tom: Yeah, yeah. I'm, I think it, I'm really impressed by people who can say, oh, I invested, uh, you know, 50 or a hundred thousand dollars into this business and was comfortable with that risk. And like, maybe it works out, maybe it doesn't. Maybe you just like threw a lot of money down into that. and the people, I think with the healthy, productive, uh, relationship with it. Do think of it as like, oh, I, I paid for kind of a bet on a risk. and that's, that's what I was doing anyway. You know, like I was paying my rent and my health insurance and spending all my time working on the product instead of paying, uh, freelance work. but if you don't frame it that way, it doesn't feel like an investment.

It feels like you're making a risky gamble.

Jeremy: Yeah. And I think that makes sense to, to actually, I think, like you were saying, have a separate account or a separate thing set aside where you are like, this is, this is this money for this purpose. And like you said, look at it as an investment, which with regular investments can go down.

Tom: Yeah, exactly. Yeah.

Jeremy: Yeah

[00:44:26] In hindsight might have raised money or tried smaller bets

Jeremy: Were there, there other things, whether technical or or business wise, that, that if you were to to do it again, you would do differently?

Tom: I go back and forth on whether I should have raised venture capital. there are, there's kind of a, an assumption in venture capital that once you're on it, you have to go the whole way. You have to become a billion dollar company, uh, or at least really tell people that you're going to be a billion dollar company and I am not. yeah, I, I don't know. I've seen, I've seen other companies in my space, or like our friends of my current company who are not really targeting that, or ones who were, and then they had somewhere in between the billion dollar and the very small outcome. Uh, and that's a little bit of a point in the favor of accepting a big pile of money from the venture capitalists. I'm also a little bit biased right now because val.town has one investor and he's like the, the best venture capitalist that I have ever met. Big fan. don't quote me on that. If he sacks me in like a year, we'll see. Um, but uh, yeah, there, I, I think that I understand more why people take that approach. or I've understood more why people take like the venture capital but not taking $300 million from SoftBank approach. yeah, and I don't know, I think that, trying a lot of things also seems really appealing. Uh, people who do the same kind of.

of Maybe 10 months, but they build four or five different products or three different products instead of just one.

I think that, that feels, feels like a good idea to me.

Jeremy: And in doing that, would that be more of a, like as a solo entrepreneur or you, you're thinking you would take investment and then say, I'm gonna try all these things with, with your money.

Tom: Oh, I've seen both. I, that I, yeah, one friend's company has pivoted like four times between very different ideas and yeah, it, it's one way to do it, but I think in the long term, I would want to do that as a solo developer and try to figure out, you know, something. but yeah, I, I think, uh, so much of it is mindset, that even then if I was working on like three different projects, I think I.

My qualifications for something being worth, really adopting and spending all my time doing, you just have to accept, uh, a lot of hits and a lot of misses and a lot of like keeping things alive and finding out how to turn them into something. I am really inspired by my friends who like started around the same time that I did and they're not that much further in terms of revenue and they're like still, still doing it because that is what they want to do in life.

and if you develop the whole ecosystem and mindset around it, I think that's somewhere that people can stay and, and be happy. just trying to find, trying to find a company that they own and control and they like.

Jeremy: While, while making the the expenses work.

Tom: Yeah.

Yeah. that's the, that's the hard part, like freelancing on the side also. I probably could have kept that up. I liked my freelance clients. I would probably still work with them as well. but I kind of just wanted the, I wanted the focus, I wanted the motivation of, of being without a net.

Jeremy: Yeah, I mean, energy wise, do you think that that would've worked? I mean, I imagine that Placemark took a lot of your time when you were working full time, so you're trying to balance, you know, clients and all your customers and everything you're doing with the software. It just feels like it might be a lot.

Tom: Yeah. Yeah. Maybe with different freelance clients. I, I loved my freelance clients because I, after. leaving config. I, I wanted to work on climate change stuff and so I was working for climate change foundations and that is not the way to max out your paycheck. It's the way to feel good about your conscience.

And so I still feel great about those projects, but in the future, yeah, I would probably just work for, uh, you know, a hedge fund or something.

[00:49:02] Marketing to developers but not potential customers

Jeremy: I think something you mentioned in one of your posts is that you maybe could have spent more time or had a different approach with marketing. Maybe you could kind of say what you did do and then what maybe worked and what didn't.

Tom: Yeah. So I like my sweet spot is writing documentation and blog posts and technical stuff. And so I did a lot of that and a lot of that like worked in a way that didn't matter. I am at this point, weirdly good at writing stuff that gets on Hacker News. I've written a lot of stuff that's gotten to the top of Hacker News and unfortunately, writing about your technical approach and your geospatial project for handling errors, uh, in your JavaScript code is not really a way to get customers.

and I think doing a lot of documentation was also great, but it was also,

I think that the, the thing that was missing is the thing that I think Mapbox does fairly well now, in which the homepage really pushes you toward use cases immediately. and I should have been saying to each customer who had anything compelling as a use case, like, let's write an article about you and what you're doing, and here's how you use this in your industry.

and that probably would've also been like a good, a good way to figure out which of those verticals was the one that was most worth spending all the time on. yeah. So it, it was, it was a lot of good marketing to nerds. and it could have been better in terms of marketing to actual customers and to people who are making the buying decisions.

Jeremy: Yeah. Looking at the, the Placemark blog, I can definitely see how as a developer, a lot of the posts are appealing to me, right? It's about how you worked on a technical challenge or decisions you made, but maybe less so to somebody who they wanna. Draw a map to manage their crops. They're like, I don't care about any of this.

Right.

Tom: Yeah, like the Mapbox blog used to be, just all that stuff as well. We would write about designing protocol buffer layouts, and it was amazing for hiring and amazing for getting nerds in the door. But now it's just, Toyota is launching with, Mapbox Maps or something like that. And that's, that's what you, you should do if you're trying to sell a product.

Jeremy: Yeah. And I think the, the sort of technical aspect, it makes sense too. If you're venture funded and you are looking to hire, right? You wanna build your team and you just want to increase like, the amount of stuff you're building and not worrying so much about, am I gonna have a paycheck next

Tom: Yeah. Yeah. I, I just kind of do it because it's fun, which is not the right reason to do it, but, Yeah, I mean, I still write my blog mostly just because it's, it's a fun thing to do, but it's not the best way to, um, to run a business.

Jeremy: Yeah. Well, the fun part is important too though.

Tom: Yeah. Yeah. That's, that's maybe the whole thing. May, that's maybe the most important thing, but you can't do it if you don't do the, the money part.

[00:52:35] Most customers came from existing audience

Jeremy: Right. So the people who did find you, was it mostly word of mouth from people who did identify with the technical posts, or were there places that surprised you, that people found you?

Tom: Uh, a lot of it was people who were familiar with the Mapbox ecosystem or with, with me. and then eventually, yeah, a few of the users came in through, um, through Hacker News, but it was mostly, mostly word of mouth also. The geospatial community is like fairly tight and it's, and it's not too hard to be the person who writes the article about some geospatial challenge that everyone finds.

Jeremy: Hmm. Okay. Yeah, that's a good point about like being in that community, especially since you've done so much work in geospatial and in open source that you have this little, this built-in audience, I guess.

Tom: yeah. Which I appreciate. It makes me nervous, but yeah.

[00:53:43] Val.town marketing to developers

Jeremy: Comparing that to something like val.town, how is val.town marketing? How is it finding users? 'cause from what I can tell, it's, it's getting a lot of, uh, a lot of people coming in, right?

Tom: Yeah. Uh, well, right now our, our kind of target user, or the user that we think of is a hobbyist, is somebody who's, sometimes a pro developer or somebody, sometimes just somebody who's really interested in the field. And so writing these things that are just about, you know, programming, does super well.

Uh, but it, we have exactly the same problem and that that is kind of being revamped as we speak. uh, we hired somebody who actually knows marketing and has a good sense for it. And so a lot of that stuff is shifting to show you what you can do with val.town because it, it suffers from the same problem as well.

It's an empty text field in which you can type, type script, code, and it runs. And knowing what you can do with that or what you should do with that is, is hard if you don't have a grasp of TypeScript and web applications. so pretty soon we'll have pages which are like, here's how to connect linear and GitHub with OW Town, or, you know, two nouns connect them, for all of those companies and to do automations and all these like concrete applications.

I think that's, you have to do it. You have to figure it out.

Jeremy: Just briefly for someone who hasn't heard of val.town, like what, what does it do?

Tom: Uh, val.town is a social website, so it has comments and likes and all of that stuff. but it's for writing these little snippets of TypeScript and JavaScript code that run. So a lot of them are websites, some of them are automations, so they receive emails or send emails or connect one service to another.

And yeah, it's, it's like combining some aspects of, GitHub or like a code platform, uh, but with the assumption that every time that you save, everything's instantly deployed.

Jeremy: So it's maybe a little bit like, um, like a glitch, I guess?

Tom: Uh, yeah. Yeah, it takes a lot of experience, a lot of, uh, inspiration from Glitch.

Jeremy: And I, I think, like you had mentioned, you enjoy writing the, the technical blog posts and the documentation. And so at least with val.town, your audience is developers versus, the geospatial community who probably largely doesn't care about, TypeScript and the, the different technical decisions there.

Tom: Yeah, it, it makes it easier, that's for sure. The customer is, is me.

[00:56:30] Shifting from solo to in-person teams

Jeremy: Nice. Yeah. Looking at, you know, you, you worked as a, a solo developer for Placemark, and then now you've got a team of, is it like maybe five

Tom: Uh, it is seven at the moment.

Jeremy: Seven people. Okay. Are you all in person or is it, remote

Tom: We all sit around two tables in Brooklyn. It's very nice.

Jeremy: So how did that feel? Like shifting from, I'm in, I don't know if you worked from home while you were working on Placemark or if you were in coworking spaces, but you're, you're shifting from I'm like in my own head space doing everything myself to, to, I'm in a room with all these people and we're like working on this thing together.

I'm kind of curious like how that felt for you.

Tom: Yeah, it's been a big difference. And I think that I was just talking with, um, one, one of our, well an engineer at, at val.town about how everyone kind of had, had been working remote for obvious pandemic world reasons. And this kind of privilege of just being around the same table, if that's what you like is, a huge difference in terms of, I just remember having to.

Trick myself into going on a walk around the block because I would get into such a dark mental head space of working on the same project for eight hours straight and skipping lunch. and now there's a little bit more structure. yeah, it's, it's been, it's been a overall, an improvement. Some days I wish that I could go on a run at noon 'cause that's the warmest time of the day.

but, uh, overall, like it makes things so much easier. just reading the emotions in people's faces when they're telling you stuff and being able to, uh, not get into discussions that you don't need to get into because you can talk and just like understand each other very quickly. It's, it's very nice.

I don't wanna force everyone to do it, you know, but it it for the people who want it, they, they, uh, really enjoy it.

Jeremy: Yeah. I think if you have the right set of people, it's definitely more enjoyable. And um, if you don't, maybe not so

Tom: Yeah, we haven't hired any, like, extremely loud chewers yet or anything like that, but yeah, maybe my story will change.

Jeremy: No, no one microwaving fish.

Tom: No, there's, uh, yeah, thankfully the microwave is outside of the office.

Jeremy: Do you live close to the office?

Tom: Yeah. Yeah. Like most of the team is within a 20 or 30 minute walk of the office and

it's very fortunate. I think there's been something of a mass migration to New York. A lot of us didn't live in New York before four years ago, and now all of us do. it's, it's, uh, it's very comfortable to be here.

Jeremy: I think that makes, uh, such a big difference. 'cause I think the majority of people, at least within the US you know, you're, you're getting in your car, you're sitting in traffic. and I know people who, during the pandemic, they actually moved further, right? Because they went, oh, like, uh, I don't need to come into the office.

but yeah, if you are close enough where you can walk, yeah, I think that makes a big difference.

Tom: Oh yeah. If I had to drive to work, I think my blood pressure would be so much higher. Uh, especially in New York. Oh, I feel so bad for the people who have to drive, whereas I'm just walking with, you know, a bagel in hand, enjoying listening to the birds.

Jeremy: Yeah. Yeah. well now they have, what is it, the congestion pricing in

Tom: Yeah. Yeah. We're all in Brooklyn, so it doesn't affect us that much, but it's supposedly, it's, it's working great. Um, yeah. I hope we can keep it.

Jeremy: I've never driven in New York and I, I wouldn't want to

Tom: Yeah. It's only for the brave or the crazy.

[01:00:37] The value of public writing and work

Jeremy: I think that's probably a good place to, to wrap up, but is there any other thoughts you had or things you wanted to mention?

Tom: No, I've just, uh, thank you so much. This has been, this has been a lot of fun. You're, you're very good at this as well. I feel like it's, uh,

Jeremy: Thank you

Tom: It's not easy to, to steer a conversation in a way that makes awkward people sound, uh, normal.

Jeremy: I wouldn't say that, but um, what's been actually pretty helpful to me is, you have such a body of work, I guess I would say, in terms of your blogging and, just the amount that you write and the long history of projects that, that there's, you know, there's a lot to talk about and I'm sure it helps, helps your thought process as well.

Tom: Yeah. I, I've been lucky to have a lot of jobs where people, where companies were like, cool with publishing everything, you know? so a lot of what I've done is, uh, is public. it's, it's, uh, I'm very, very thankful for like, early on that being a big part of company culture.

Jeremy: And you can definitely tell, I think for people who look at the Placemark blog posts or, or now your, your val.town blog posts, like there's, there's a clear difference when somebody like is very intentional and, um, you know, it's good at writing versus you're doing it because, um, it's your corporate responsibility or whatever, like people can tell.

Yeah.

Tom: Yeah. You can't fake being interested. so you gotta work on things that are interesting.

Jeremy: Tom, thanks again for, for agreeing to chat. This was fun.

Tom: Yeah thank you so much.

Paul Frazee on Bluesky and ATProto

Thu, 16 Jan 2025 04:22:47 -0000

Paul Frazee is the CTO of Bluesky. He previously worked on the Beaker browser and the peer-to-peer social media protocol Secure Scuttlebutt.

Paul discusses how Bluesky and ATProto got started, scaling up a social media site, what makes ATProto decentralized, lessons ATProto learned from previous peer-to-peer projects, and the challenges of content moderation.

Episode transcript available here. My Bluesky profile.

Transcript

You can help correct transcripts on GitHub.

[00:00:00] Jeremy: Today I am talking to Paul Frazee. He's the current CTO of bluesky, and he previously worked on other decentralized applications like Beaker and Secure Scuttlebutt.

[00:00:15] Paul: Thanks for having me.

What's bluesky

[00:00:16] Jeremy: For people who aren't familiar with bluesky, what is it?

[00:00:20] Paul: So bluesky is an open social network, simplest way to put it, designed in particular for high scale. That's kind of one of the big requirements that we had when we were moving into it. and it is really geared towards making sure that the operation of the social network is open amongst multiple different organizations.

[00:00:44] So we're one of the operators, but other folks can come in, spin up the software, all the open source software, and essentially have a full node with a full copy of the network active users and have their users join into our network. And they all work functionally as one shared application.

[00:01:03] Jeremy: So it, it sounds like it's similar to Twitter but instead of there being one Twitter, there could be any number and there is part of the underlying protocol that allows them to all connect to one another and act as one system.

[00:01:21] Paul: That's exactly right. And there's a metaphor we use a lot, which is comparing to the web and search engines, which actually kind of matches really well. Like when you use Bing or Google, you're searching the same web. So on the AT protocol on bluesky, you use bluesky, you use some alternative client or application, all the same, what we're we call it, the atmosphere, all one shared network,

[00:01:41] Jeremy: And more than just the, the client. 'cause I think sometimes when people think of a client, they'll think of, I use a web browser. I could use Chrome or Firefox, but ultimately I'm connecting to the same thing. But it's not just people running alternate clients, right?

[00:01:57] Paul: Their own full backend to it. That's right. Yeah. Yeah. The anchoring point on that being the fire hose of data that runs the entire thing is open as well. And so you start up your own application, you spin up a service that just pipes into that fire hose and taps into all the activity.

History of AT Protocol

[00:02:18] Jeremy: Talking about this underlying protocol maybe we could start where this all began so people get some context for where this all came from.

[00:02:28] Paul: For sure. All right, so let's wind the clock back here in my brain. We started out 2022, right at the beginning of the year. We were formed as a, essentially a consulting company outside of Twitter with a contract with Twitter. And, uh, our goal was to build a protocol that could run, uh, Twitter, much like the way that we just described, which set us up with a couple of pretty specific requirements.

[00:02:55] For one, we had to make sure that it could scale. And so that ended up being a really important first requirement. and we wanted to make sure that there was a strong kind of guarantees that the network doesn't ever get captured by any one operator. The idea was that Twitter would become the first, uh, adopter of the technology.

[00:03:19] Other applications, other services would begin to take advantage of it and users would be able to smoothly migrate their accounts in between one or the other at any time. Um, and it's really, really anchored in a particular goal of just deconstructing monopolies. Getting rid of those moats that make it so that there's a kind of a lack of competition, uh, between these things.

[00:03:44] And making sure that, if there was some kind of reason that you decided you're just not happy with what direction this service has been going, you move over to another one. You're still in touch with all the folks you were in touch with before. You don't lose your data. You don't lose your, your your follows. Those were the kind of initial requirements that we set out with. The team by and large came from, the decentralized web, movement, which is actually a pretty, large community that's been around since, I wanna say around 2012 is when we first kind of started to form. It got really made more specifically into a community somewhere around 2015 or 16, I wanna say.

[00:04:23] When the internet archives started to host conferences for us. And so that gave us kind of a meeting point where all started to meet up there's kind of three schools of thought within that movement. There was the blockchain community, the, federation community, and the peer-to-peer community.

[00:04:43] And so blockchain, you don't need to explain that one. You got Federation, which was largely ActivityPub Mastodon. And then peer-to-peer was IPFS, DAT protocol, um, secure scuttlebutt. But, those kinds of BitTorrent style of technologies really they were all kind of inspired by that.

[00:05:02] So these three different kind of sub communities we're all working, independently on different ways to attack how to make these open applications. How do you get something that's a high scale web application without one corporation being the only operator? When this team came together in 2022, we largely sourced from the peer-to-peer group of the decentralized community.

Scaling limitations of peer-to-peer

[00:05:30] Paul: Personally, I've been working in the space and on those kinds of technologies for about 10 years at that stage. And, the other folks that were in there, you know, 5-10 each respectively. So we all had a fair amount of time working on that. And we had really kind of hit some of the limitations of doing things entirely using client devices. We were running into challenges about reliability of connections. Punching holes to the individual device is very hard. Synchronizing keys between the devices is very hard. Maintaining strong availability of the data because people's devices are going off and on, things like that. Even when you're using the kind of BitTorrent style of shared distribution, that becomes a challenge.

[00:06:15] But probably the worst challenge was quite simply scale. You need to be able to create aggregations of a lot of behavior even when you're trying to model your application as largely peer wise interactions like messaging. You might need an aggregation of accounts that even exist, how do you do notifications reliably?

[00:06:37] Things like that. Really challenging. And what I was starting to say to myself by the end of that kind of pure peer-to-peer stent was that it can't be rocket science to do a comment section. You know, like at some point you just ask yourself like, how, how hard are we willing to work to, to make these ideas work?

[00:06:56] But, there were some pretty good pieces of tech that did come out of the peer-to-peer world. A lot of it had to do with what I might call a cryptographic structure. things like Merkel trees and advances within Merkel Trees. Ways to take data sets and reduce them down to hashes so that you can then create nice signatures and have signed data sets at rest at larger scales.

[00:07:22] And so our basic thought was, well, all right, we got some pretty good tech out of this, but let's drop that requirement that it all run off of devices. And let's get some servers in there. And instead think of the entire network as a peer-to-peer mesh of servers. That's gonna solve your scale problem.

[00:07:38] 'cause you can throw big databases at it. It's gonna solve your availability problems, it's gonna solve your device sync problems. But you get a lot of the same properties of being able to move data sets between services. Much like you could move them between devices in the peer-to-peer network without losing their identifiers because you're doing this in direction of, cryptographic identifiers to the current host.

[00:08:02] That's what peer-to-peer is always doing. You're taking like a public key or hash and then you're asking the network, Hey, who has this? Well, if you just move that into the server, you get the same thing, that dynamic resolution of who's your active host. So you're getting that portability that we wanted real bad.

[00:08:17] And then you're also getting that kind of in meshing of the different services where each of them is producing these data sets that they can sink from each other. So take peer-to-peer and apply it to the server stack. And that was our kind of initial thought of like, Hey, you know what? This might work.

[00:08:31] This might solve the problems that we have. And a lot of the design fell out from that basic mentality.

Crytographic identifiers and domain names

[00:08:37] Jeremy: When you talk about these cryptographic identifiers, is the idea that anybody could have data about a person, like a message or a comment, and that could be hosted different places, but you would still know which person that originally came from. Is that, is that the goal there?

[00:08:57] Paul: That's exactly it. Yeah. Yeah. You wanna create identification that supersedes servers, right? So when you think about like, if I'm using Twitter and I wanna know what your posts are, I go to twitter.com/jeremy, right? I'm asking Twitter and your ID is consequently always bound to Twitter. You're always kind of a second class identifier.

[00:09:21] We wanted to boost up the user identifier to be kind of a thing freestanding on its own. I wanna just know what Jeremy's posts are. And then once you get into the technical system it'll be designed to figure out, okay, who knows that, who can answer that for you? And we use cryptographic identifiers internally.

[00:09:41] So like all the data sets use these kind of long URLs to identify things. But in the application, the user facing part, we used domain names for people. Which I think gives the picture of how this all operates. It really moves the user accounts up into a free standing first class identifier within the system.

[00:10:04] And then consequently, any application, whatever application you're using, it's really about whatever data is getting put into your account. And then that just exchanges between any application that anybody else is using.

[00:10:14] Jeremy: So in this case, it sounds like the identifier is some long string that, I'm not sure if it's necessarily human readable or not. You're shaking your head no.

[00:10:25] Paul: No.

[00:10:26] Jeremy: But if you have that string, you know it's for a specific person. And since it's not really human readable, what you do is you put a layer on top of it which in this case is a domain that somebody can use to look up and find the identifier.

[00:10:45] Paul: Yeah, yeah, yeah. So we just use DNS. Put a TXT record in there, map into that long string, or you could do a .well-known file on a web server if that's more convenient for you. And then the ID that's behind that, the non-human readable one, those are called DIDs which is actually a W3C spec. Those then map to a kind of a certificate. What you call a DID document that kind of confirms the binding by declaring what that domain name should be. So you get this bi-directional binding. And then that certificate also includes signing keys and active servers. So you pull down that certificate and that's how the discovery of the active server happens is through the DID system.

What's stored on a PDS

[00:11:29] Jeremy: So when you refer to an active server what is that server and what is that server storing?

[00:11:35] Paul: It's kinda like a web server, but instead of hosting HTML, it's hosting a bunch of JSON records. Every user has their own document store of JSON documents. It's bucketed into collections. Whenever you're looking up somebody on the network you're gonna get access to that repository of data, jump into a collection.

[00:11:58] This collection is their post collection. Get the rkey (Record Key), and then you're pulling out JSON at the end of it, which is just a structured piece of stuff saying here's the CreatedAt, here's the text, here's the type, things like that. One way you could look at the whole system is it's a giant, giant database network.

Servers can change, signing keys change, but not DID

[00:12:18] Jeremy: So if someone's going to look up someone's identifier, let's say they have the user's domain they have to go to some source, right? To find the user's data. You've mentioned, I think before, the idea that this is decentralized and by default I would, I would picture some kind of centralized resource where I send somebody a domain and then they give me back the identifier and the links to the servers.

[00:12:46] So, so how does that work in practice where it actually can be decentralized?

[00:12:51] Paul: I mentioned that your DID that non-human readable identifier, and that has that certificate attached to it that lists servers and signing keys and things like that.

[00:13:00] So you're just gonna look up inside that DID document what that server is your data repository host. And then you contact that guy and say, all right, I'm told you're hosting this thing. Here's the person I'm looking for, hand over the hand over the data. It's really, you know, pretty straightforward.

[00:13:18] The way that gets decentralized is by then to the fact that I could swap out that active server that's in my certificate and probably wanna rotate the signing keys 'cause I've just changed the, you know. I don't want to keep using the same signing keys as I was using previously because I just changed the authority.

[00:13:36] So that's the migration change, change the hosting server, change out the signing keys. Somebody that's looking for me now, they're gonna load up my document, my DID document. They're gonna say, okay, new server, new keys. Pull down the data. Looks good, right? Matches up with the DID doc.

[00:13:50] So that's how you get that level of portability. But when those changes happen, the DID doesn't change, right? The DID document changes. So there's the level of indirection there and that's pretty important because if you don't have a persistent identifier whenever you're trying to change out servers, all those backlinks are gonna break.

[00:14:09] That's the kind of stuff that stops you from being able to do clean migrations on things like web-based services. the only real option is to go out and ask everybody to update their data. And when you're talking about like interactions on the social network, like people replying to each other, there's no chance, right?

[00:14:25] Every time somebody moves you're gonna go back and modify all those records. You don't even control all the records from the top down 'cause they're hosted all over the web. So it's just, you can't do it. Generally we call this account portability, that you're kinda like phone number portability that you can change your host, but, so that part's portable, but the ID stays the same.

[00:14:45] And keeping that ID the same is the real key to making sure that this can happen without breaking the whole system.

[00:14:52] Jeremy: And so it, it sounds like there's the decentralized id, then there's the decentralized ID document that's associated with that points you to where the actual location of your, your data, your posts, your pictures and whatnot. but then you also mentioned that they could change servers.

[00:15:13] So let's say somebody changes where their data is, is stored, that would change the servers, I guess, in their document. But

[00:15:23] then how do all of these systems. Know okay. I need to change all these references to your old server, to these new servers,

[00:15:32] Paul: Yeah. Well, the good news is that you only have to, you, you got the public data set of all the user's activity, and then you have like internal caches of where the current server is. You just gotta update those internal caches when you're trying to contact their server. Um, so it's actually a pretty minimal thing to just like update like, oh, they moved, just start talking to update my, my table, my Redis, that's holding onto that kind of temporary information, put it on ttl, that sort of thing.

Most communication won't be between servers, it will be from event streams

[00:16:01] Paul: And, honestly, in practice, a fair amount of the system for scalability reasons doesn't necessarily work by servers directly contacting each other. It's actually a little bit more like how, I told you before, I'm gonna use this metaphor a lot, the search engines with the web, right? What we do is we actually end up crawling the repositories that are out in the world and funneling them into event streams like a Kafka. And that allows the entire system to act like a data processing pipeline where you're just tapping into these event streams and then pushing those logs into databases that produce these large scale aggregations.

[00:16:47] So a lot of the application behavior ends up working off of these event logs. If I reply to somebody, for instance, I don't necessarily, it's not, my server has to like talk to your server and say, Hey, I'm replying to you. What I do is I just publish a reply in my repository that gets shot out into the event logs, and then these aggregators pick up that the reply got created and just update their database with it.

[00:17:11] So it's not that our hosting servers are constantly having to send messages with each other, you actually use these aggregators to pull together the picture of what's happening on the network.

[00:17:22] Jeremy: Okay, so like you were saying, it's an event stream model where everybody publishes the events the things that they're doing, whether that's making a new post, making a reply, that's all being posted to this event stream. And then everybody who provides, I'm not sure if instances is the right term, but an implementation of the atmosphere protocol (Authenticated Transfer protocol).

[00:17:53] They are listening for all those changes and they don't necessarily have to know that you moved servers because they're just listening for the events and you still have the same identifier.

[00:18:10] Paul: Generally speaking. Yeah. 'cause like if you're listening to one of these event streams what you end up looking for is just the signature on it and making sure that the signature matches up. Because you're not actually having to talk to their live server. You're just listening to this relay that's doing this aggregation for you.

[00:18:27] But I think actually to kind of give a little more clarity to what you're talking about, it might be a good idea to refocus how we're talking about the system here. I mentioned before that our goal was to make a high scale system, right? We need to handle a lot of data. If you're thinking about this in the way that Mastodon does it, the ActivityPub model, that's actually gonna give you the wrong intuition.

Designing the protocol to match distributed systems practices (Event sourcing / Stream processing)

[00:18:45] Paul: 'cause we chose a dramatically different system. What we did instead was we picked up, essentially the same practices you're gonna use for a data center, a high scale application data center, and said, all right, how do you tend to build these sorts of things? Well, what you're gonna do is you're gonna have, multiple different services running different purposes.

[00:19:04] It gets pretty close to a microservices approach. You're gonna have a set of databases, and then you're going to, generally speaking for high scale, you're gonna have some kind of a kafka, some kind of a event log that you are tossing changes about the state of these databases into. And then you have a bunch of secondary systems that are tapping into the event log and processing that into, the large scale, databases like your search index, your, nice postgres of user profiles.

[00:19:35] And that makes sure that you can get each of these different systems to perform really well at their particular task, and then you can detach them in their design. for instance, your primary storage can be just a key value store that scales horizontally. And then on the event log, you, you're using a Kafka that's designed to handle.

[00:19:58] Particular semantics of making sure that the messages don't get dropped, that they come through at a particular throughput. And then you're using, for us, we're using like ScyllaDB for the big scale indexes that scales horizontally really well. So it's just different kind of profiles for different pieces.

[00:20:13] If you read Martin Kleppman's book, data Intensive applications I think it's called or yeah. A lot of it gets captured there. He talks a lot about this kind of thing and it's sometimes called a kappa architecture is one way this is described, event sourcing is a similar term for it as well.

[00:20:30] Stream processing. That's pretty standard practices for how you would build a traditional high scale service. so if you take, take this, this kind of microservice architecture and essentially say, okay, now imagine that each of the services that are a part of your data center could be hosted by anybody, not just within our data center, but outside of our data center as well and should be able to all work together.

[00:20:57] Basically how the AT Proto is designed. We were talking about the data repository hosts. Those are just the primary data stores that they hold onto the user keys and they hold onto those JSON records. And then we have another service category we call Relay that just crawls those data repositories and sucks that in that fire hose of data we were talking about that event log.

App views pull data from relay and produces indexes and threads

[00:21:21] Paul: And then we have what we call app views that sit there and tail the index and tail the log, excuse me, and produce indexes off of it, they're listening to those events and then like, making threads like okay, that guy posted, that guy replied, that guy replied.

[00:21:37] That's a thread. They assemble it into that form. So when you're running an application, you're talking to the AppView to read the network, and you're talking to the hosts to write to the network, and each of these different pieces sync up together in this open mesh. So we really took a traditional sort of data center model and just turned it inside out where each piece is a part of the protocol and communicate it with each other and therefore anybody can join into that mesh.

[00:22:07] Jeremy: And to just make sure I am tracking the data repository is the data about the user. So it has your decentralized identifier, it has your replies, your posts, And then you have a relay, which is, its responsibility, is to somehow find all of those data repositories and collect them as they happen so that it can publish them to some kind of event stream.

[00:22:41] And then you have the AppView which it's receiving messages from the relay as they happen, and then it can have its own store and index that for search. It can collect them in a way so that it can present them onto a UI. That's sort of thing that's the user facing part I suppose.

[00:23:00] Paul: Yeah, that's exactly it. And again, it's, it's actually quite similar to how the web works. If you combine together the relay and the app view, you got all these different, you know, the web works where you got all these different websites, they're hosting their stuff, and then the search engine is going around, aggregating all that data and turning it into a search experience.

[00:23:19] Totally the same model. It's just being applied to, more varieties of data, like structured data, like posts and, and replies, follows, likes, all that kinda stuff. And then instead of producing a search application at the end. I mean, it does that too, but it also produces a, uh, you know, timelines and threads and, um, people's profiles and stuff like that.

[00:23:41] So it's actually a pretty bog standard way of doing, that's one of the models that we've seen work for large scale decentralized systems. And so we're just transposing it onto something that kind of is more focused towards social applications

[00:23:58] Jeremy: So I think I'm tracking that the data repository itself, since it has your decentralized identifier and because the data is cryptographically signed, you know, it's from a specific user. I think the part that I am still not quite sure about is the relays. I, I understand if you run all the data repositories, you know where they are, so you know how to collect the data from them.

[00:24:22] But if someone's running another system outside of your organization, how do they find, your data repositories? Or do they have to connect to your relay? What's the intention for that?

Data hosts request relays to pull their data

[00:24:35] Paul: That logic runs, again, really similar to how search engines find out about websites. So there is actually a way for, one of these, data hosts to contact Relay and say, Hey, I exist. You know, go ahead and get my stuff. And then it'll be up to the relay to decide like if they want it or not.

[00:24:52] Right now, generally we're just like, yeah, you know, we, we want it. But as you can imagine, as the thing matures and gets to higher scale, there might be some trust kind of things to worry about, you know? So that's kind of the naive operation that currently exists. But over time as the network gets bigger and bigger, it'll probably involve some more traditional kind of spiraling behaviors because as more relays come into the system, each of these hosts, they're not gonna know who to talk to.

Relays can bootstrap who they know about by talking to other relays

[00:25:22] Paul: You're trying to start a new relay. What they're gonna do is they're going to discover all of the different users that exist in the system by looking at what data they have to start with. Probably involve a little bit of a manual feeding in at first, whenever I'm starting up a relay, like, okay, there's bluesky's relay.

[00:25:39] Lemme just pull what they know. And then I go from there. And then anytime you discover a new user you don't have, you're like, oh, I wanna look them up. Pull them into the relay too. Right. So there's a, pretty straightforward, discovery process that you'll just have to bake into a relay to, to make sure you're calling as much the network as possible.

ActivityPub federation vs AT Proto

[00:25:57] Jeremy: And so I don't think we've defined the term federation, but maybe you could explain what that is and if that is what this is.

[00:26:07] Paul: We are so unsure.

[00:26:10] Jeremy: Okay.

[00:26:11] Paul: Yeah. This has jammed is up pretty bad. Um, because I think everybody can, everybody pretty strongly agrees that ActivityPub is federation, right? and ActivityPub kind of models itself pretty similarly to email in a way, like the metaphors they use is that there's inboxes and outboxes and, and every ActivityPub server they're standing up the full vertical stack.

[00:26:37] They set up, the primary hosting, the views of the data that's happening there. the interface for the application, all of it, pretty traditional, like close service, but then they're kind of using the perimeter. they're making that permeable by sending, exchanging, essentially mailing records to each other, right?

[00:26:54] That's their kind of logic of how that works. And that's pretty much in line with, I think, what most people think of with Federation. Whereas what we're doing isn't like that we've cut, instead of having a bunch of vertical stacks communicating horizontally with each other, we kind of sliced in the other direction.

[00:27:09] We sliced horizontally into, this microservices mesh and have all the different, like a total mix and match of different microservices between different operators. Is that federation? I don't know. Right. we tried to invent a term, didn't really work, you know, At the moment, we just kind of don't worry about it that much, see what happens, see what the world sort of has to say to us about it.

[00:27:36] and beyond that, I don't know.

[00:27:42] Jeremy: I think people probably are thinking of something like, say, a Mastodon instance when you're, when you're talking about everything being included, The webpage where you view the posts, the Postgres database that's keeping the messages.

[00:28:00] And that same instance it's responsible for basically everything.

[00:28:06] Paul: mm-Hmm

[00:28:06] Jeremy: And I believe what you're saying is that the difference with, the authenticated transfer protocol, is that the

[00:28:15] Paul: AT Protocol, Yep.

[00:28:17] Jeremy: And the difference there is that you've, at the protocol level, you've split it up into the data itself, which can be validated completely separately from other parts of the system.

[00:28:31] You could have the JSON files on your hard drive and somebody else can have that same JSON file and they would know that who the user is and that these are real things that user posted. That's like a separate part. And then the relay component that looks for all these different repositories that has people's data, that can also be its own independent thing where its job is just to output events.

[00:29:04] And that can exist just by itself. It doesn't need the application part, the, the user facing part, it can just be this event stream on itself. and that's the part where it sounds like you can make decisions on who to, um, collect data from. I guess you have to agree that somebody can connect to you and get the users from your data repositories.

[00:29:32] And likewise, other people that run relays, they also have to agree to let you pull the users from theirs.

[00:29:38] Paul: Yeah, that's right. Yeah.

[00:29:41] Jeremy: And so I think the Mastodon example makes sense. And, but I wonder if the underlying ActivityPub protocol forces you to use it in that way, in like a whole full application that talks to another full application.

[00:29:55] Or is it more like that's just how people tend to use it and it's not necessarily a characteristic of the protocol.

[00:30:02] Paul: Yeah, that's a good question actually. so, you know, generally what I would say is pretty core to the protocol is the expectations about how the services interact with each other. So the mailbox metaphor that's used in ActivityPub, that design, if I reply to you, I'll update my, local database with what I did, and then I'll send a message over to your server saying, Hey, by the way, add this reply.

[00:30:34] I did this. And that's how they find out about things. That's how they find out about activity outside of their network. that's the part that as long as you're doing ActivityPub, I suspect you're gonna see reliably happening. That's that, I can say for sure that's a pretty tight requirement.

[00:30:50] That's ActivityPub. If you wanted to split it up the way we're talking about, you could, I don't know, I don't know if you necessarily would want to. Because I don't know. That's actually, I think I'd have to dig into their stack a little bit more to see how meaningful that would be. I do know that there's some talk of introducing a similar kind of an aggregation method into the ActivityPub world which I believe they're also calling a relay and to make things even more complicated.

[00:31:23] And NOSTR has a concept of a relay. So these are three different protocols that are using this term. I think you could do essentially what a search engine does on any of these things. You could go crawling around for the data, pull them into a fire hose, and then, tap into that aggregation to produce, bigger views of the network.

[00:31:41] So that principle can certainly apply anywhere. AT Protocol, I think it's a little bit, we, we focused in so hard from that on that from the get go, we focus really hard on making sure that this, the data is, signed at rest. That's why it's called the authenticated transfer protocol. And that's a nice advantage to have when you're running a relay like this because it means that you don't have to trust the relay.

[00:32:08] Like generally speaking, when I look at results from Google, you know, I'm trusting pretty well that they're accurately reflecting what's on the website, which is fine. You know, there's, that's not actually a huge risk or anything. But whenever you're trying to build entire applications and you're using somebody else's relay, you could really run into things where they say like, oh, you know what Paul tweeted the other day, you know, I hate dogs.

[00:32:28] They're like, no, I didn't. That's a lie, right? You just sneak in Little lies like that over a while, it becomes a problem. So having the signatures on the data is pretty important. You know, if you're gonna be trying to get people to cooperate, uh, you gotta manage the trust model. I know that ActivityPub does have mechanisms for signed records.

Issuers with ActivityPub identifiers

[00:32:44] Paul: I don't know how deep they go if they could fully replace that, that utility. and then Mastodon ActivityPub, they also use a different identifier system that they're actually taking a look at DIDs um, right now, I don't know what's gonna happen there. We're, we're totally on board to, you know, give any kind of insight that we got working on 'em.

[00:33:06] But at, at the moment, they use I think it's WebFinger based identifiers they look like emails. So you got host names in there and those identifiers are being used in the data records. So you don't get that continuous identifier. They actually do have to do that hey, I moved update your records sort of thing.

[00:33:28] And that causes it to, I mean, it works like decently well, but not as well as it could. They got us to the point where it moves your profile over and you update all the folks that were following you so they can update their follow records, but your posts, they're not coming right, because that's too far into that mesh of interlinking records.

[00:33:48] There's just no chance. So that's kind of the upper limit on that, it's a different set of choices and trade-offs. You're always kind of asking like, how important is the migration? Does that work out? Anyway, now I'm just kind of digging into differences between the two here.

Issues with an identifier that changes and updating old records

[00:34:07] Jeremy: So you were saying that with ActivityPub, all of the instances need to be notified that you've changed your identifier but then all of the messages that they had already received. They don't point to the new identifier somehow.

[00:34:24] Paul: Yeah. You run into basically just the practicalities of actual engineering with that is what happens, right? Because if you imagine you got a multimillion user social network. They got all their posts. Maybe the user has like, let's say a thousand posts and 10,000 likes. And that, activity can range back three years.

[00:34:48] Let's say they changed their identifier, and now you need to change the identifier of all those records. If you're in a traditional system that's already a tall order, you're going back and rewriting a ton of indexes, Anytime somebody replied to you, they have these links to your posts, they're now, you've gotta update the identifiers on all of those things.

[00:35:11] You could end up with a pretty significant explosion of rewrites that would have to occur. Now that's, that's tough. If you're in a centralized model. If you're in a decentralized one, it's pretty much impossible because you're now, when you notify all the other servers like, Hey, this, this changed. How successful are all of them at actually updating that, that those, those pointers, it's a good chance that there's things are gonna fall out of correctness. that's just a reality of it. And if, so, if you've got a, if you've got a mutable identifier, you're in trouble for migrations. So the DID is meant to keep it permanent and that ends up being the anchoring point. If you lose control of your DID well, that's it.

Managing signing keys by server, paper key reset

[00:35:52] Paul: Your, your account's done. We took some pretty traditional approaches to that, uh, where the signing keys get managed by your hosting server instead of like trying to, this may seem like really obvious, but if you're from the decentralization community, we spend a lot of time with blockchains, like, Hey, how do we have the users hold onto their keys?

[00:36:15] You know, and the tooling on that is getting better for what it's worth. We're starting to see a lot better key pair management in like Apple's ecosystem and Google's ecosystem, but it's still in the range of like, nah, people lose their keys, you know? So having the servers manage those is important.

[00:36:33] Then we have ways of exporting paper keys so that you could kind of adversarially migrate if you wanted to. That was in the early spec we wanted to make sure that this portability idea works, that you can always migrate your accounts so you can export a paper key that can override.

[00:36:48] And that was how we figured that out. Like, okay, yeah, we don't have to have everything getting signed by keys that are on the user's devices. We just need these master backup keys that can say, you know what? I'm done with that host. No matter what they say, I'm overriding what they, what they think. and that's how we squared that one.

[00:37:06] Jeremy: So it seems like one of the big differences with account migration is that with ActivityPub, when you move to another instance, you have to actually change your identifier.

[00:37:20] And with the AT protocol you're actually not allowed to ever change that identifier. And maybe what you're changing is just you have say, some kind of a lookup, like you were saying, you could use a domain name to look that up, get a reference to your decentralized identifier, but your decentralized identifier it can never change.

[00:37:47] Paul: It, it, it can't change. Yeah. And it shouldn't need to, you know what I mean? It's really a total disaster kind of situation if that happens. So, you know that it's designed to make sure that doesn't happen in the applications. We use these domain name handles to, to identify folks. And you can change those anytime you want because that's really just a user facing thing.

[00:38:09] You know, then in practice what you see pretty often is that you may, if you change hosts, if you're using, we, we give some domains to folks, you know, 'cause like not everybody has their own domain. A lot of people do actually, to our surprise, people actually kind of enjoy doing that. But, a lot of folks are just using like paul.bsky.social as their handle.

[00:38:29] And so if you migrated off of that, you probably lose that. Like your, so your handle's gonna change, but you're not losing the followers and stuff. 'cause the internal system isn't using paul.bsky.social, it's using that DID and that DID stays the same.

Benefits of domain names, trust signal

[00:38:42] Jeremy: Yeah. I thought that was interesting about using the domain names, because when you like you have a lot of users, everybody's got their own sub-domain. You could have however many millions of users. Does that become, does that become an issue at some point?

[00:39:00] Paul: Well, it's a funny thing. I mean like the number of users, like that's not really a problem 'cause you run into the same kind of namespace crowding problem that any service is gonna have, right? Like if you just take the subdomain part of it, like the name Paul, like yeah, only, you only get to have one paul.bsky.social.

[00:39:15] so that part of like, in terms of the number of users, that part's fine I guess. Uh, as fine as ever. where gets more interesting, of course is like, really kind of around the usability questions. For one, it's, it's not exactly the prettiest to always have that B sky.social in there. If we, if we thought we, if we had some kind of solution to that, we would use it.

[00:39:35] But like the reality is that, you know, now we're, we've committed to the domain name approach and some folks, you know, they kind of like, ah, that's a little bit ugly. And we're like, yeah that's life. I guess the plus side though is that you can actually use like TLD the domain. It's like on pfrazee.com.

[00:39:53] that starts to get more fun. it can actually act as a pretty good trust signal in certain scenarios. for instance, well-known domain names like nytimes.com, strong authentication right there, we don't even need a blue check for it. Uh, similarly the .gov, domain name space is tightly regulated.

[00:40:14] So you actually get a really strong signal out of that. Senator Wyden is one of our users and so he's, I think it's wyden.senate.gov and same thing, strong, you know, strong identity signal right there. So that's actually a really nice upside. So that's like positives, negatives.

[00:40:32] That trust signal only works so far. If somebody were to make pfrazee.net, then that can be a bit confusing. People may not be paying attention to .com vs .net, so it's not, I don't wanna give the impression that, ah, we've solved blue checks. It's a complicated and multifaceted situation, but, it's got some juice.

[00:40:54] It's also kinda nice too, 'cause a lot of folks that are doing social, they're, they've got other stuff that they're trying to promote, you know? I'm pretty sure that, uh, nytimes would love it if you went to their website. And so tying it to their online presence so directly like that is a really nice kind of feature of it.

[00:41:15] And tells a I think a good story about what we're trying to do with an open internet where, yeah, everybody has their space on the internet where they can do whatever they want on that. And that's, and then thethese social profiles, it's that presence showing up in a shared space. It's all kind of part of the same thing.

[00:41:34] And that that feels like a nice kind of thing to be chasing, you know? And it also kind of speaks well to the naming worked out for us. We chose AT Protocol as a name. You know, we back acronymed our way into that one. 'cause it was a @ simple sort of thing. But like, it actually ended up really reflecting the biggest part of it, which is that it's about putting people's identities at the front, you know, and make kind of promoting everybody from a second class identity that's underneath Twitter or Facebook or something like that.

[00:42:03] Up into. Nope, you're freestanding. You exist as a person independently. Which is what a lot of it's about.

[00:42:12] Jeremy: Yeah, I think just in general, not necessarily just for bluesky, if people had more of an interest in getting their own domain, that would be pretty cool if people could tie more of that to something you basically own, right?

[00:42:29] I mean, I guess you're leasing it from ICANN or whatever, but,

[00:42:33] yeah, rather than everybody having an @Gmail, Outlook or whatever they could actually have something unique that they control more or less.

[00:42:43] Paul: Yeah. And we, we actually have a little experimental service for registering domain names that we haven't integrated into the app yet because we just kind of wanted to test it out and, and kind of see what that appetite is for folks to register domain names way higher than you'd think we did that early on.

[00:43:01] You know, it's funny when you're coming from decentralization is like an activist space, right? Like it's a group of people trying to change how this tech works. And sometimes you're trying to parse between what might come off as a fascination of technologists compared to what people actually care about.

[00:43:20] And it varies, you know, the domain name thing to a surprising degree, folks really got into that. We saw people picking that up almost straight away. More so than certainly we ever predicted. And I think that's just 'cause I guess it speaks to something that people really get about the internet at this point.

[00:43:39] Which is great. We did a couple of other things that are similar and we saw varied levels of adoption on them. We had similar kinds of user facing, opening up of the system with algorithms and with moderation. And those have both been pretty interesting in and of themselves.

Custom feed algorithms

[00:43:58] Paul: So with algorithms, what we did was we set that up so that anybody can create a new feed algorithm. And this was kind of one of the big things that you run into whenever you use the app. If you wanted to create a new kind of for you feed you can set up a service somewhere that's gonna tap into that fire hose, right?

[00:44:18] And then all it needs to do is serve a JSON endpoint. That's just a list of URLs, but like, here's what should be in that feed. And then the bluesky app will pick that up and, and send that, hydrate in the content of the posts and show that to folks. I wanna say this is a bit of a misleading number and I'll explain why but I think there's about 35,000 of these feeds that have been created.

[00:44:42] Now, the reason it's little misleading is that, I mean, not significantly, but it's not everybody went, sat down in their IDE and wrote these things. Essentially one of our users created, actually multiple of our users made little platforms for building these feeds, which is awesome. That's the kinda thing you wanna see because we haven't gotten around to it.

[00:44:57] Our app still doesn't give you a way to make these things. But they did. And so lots of, you know, there it is. Cool. Like, one, one person made a kind of a combinatorial logic thing that's like visual almost like scratch, it's like, so if it has this hashtag and includes these users, but not those users, and you're kind of arranging these blocks and that constructs the feed and then probably publish it on your profile and then folks can use it, you know?

[00:45:18] And um, so that has been I would say fairly successful. Except, we had one group of hackers do put in a real effort to make a replacement for you feed, like magic algorithmic feed kind of thing. And then they kind of kept up going for a while and then ended up giving up on it. Most of what we see are actually kind of weird niche use cases for feeds.

[00:45:44] You get straightforward ones, like content oriented ones like a cat feed, politics feed, things like that. It's great, some of those are using ML detection, so like the cat feed is ML detection, so sometimes you get like a beaver in there, but most of the time it's a cat. And then we got some ones that are kind of a funny, like change in the dynamic of freshness.

[00:46:05] So, uh, or or selection criteria, things that you wouldn't normally see. Um, but because they can do whatever they want, you know, they try it out. So like the quiet posters ended up being a pretty successful one. And that one just shows people you're following that don't post that often when they do just those folks.

[00:46:21] It ended up being, I use that one all the time because yeah, like they get lost in the noise. So it's like a way to keep up with them.

Custom moderation and labeling

[00:46:29] Paul: The moderation one, that one's a a real interesting situation. What we did there essentially we wanted to make sure that the moderation system was capable of operating across different apps so that they can share their work, so to speak.

[00:46:43] And so we created what we call labeling. And labeling is a metadata layer that exists over the network. Doesn't actually live in the normal data repositories. It uses a completely different synchronization because a lot of these labels are getting produced. It's just one of those things where the engineering characteristics of the labels is just too different from the rest of the system.

[00:47:02] So we created a separate synchronization for this, and it's really kind of straightforward. It's, here's a URL and here's a string saying something like NSFW or Gore, or you know, whatever. then those get merged onto the records brought down by the client and then the client, you know, based on the user's preferences.

[00:47:21] We'll put like warning screens up, hide it, stuff like that. So yeah, these label streams can then, you know, anybody that's running a moderation service can, you know, are publishing these things and so anybody can subscribe to 'em. And you get that kind of collaborative thing we're always trying to do with this.

[00:47:34] And we had some users set up moderation services and so then as an end user you find it, it looks like a profile in the app and you subscribe to it and you configure it and off you go. That one has had probably the least amount of adoption throughout all of 'em. It's you know, moderation.

[00:47:53] It's a sticky topic as you can imagine, challenging for folks. These moderation services, they do receive reports, you know, like whenever I'm reporting a post, I choose from all my moderation services who I wanna report this to. what has ended up happening more than being used to actually filter out like subjective stuff is more kind of like either algorithmic systems or what you might call informational.

[00:48:21] So the algorithmic ones are like, one of the more popular ones is a thing that's looking for, posts from other social networks. Like this screenshot of a Reddit post or a Twitter post or a Facebook post. Because, which you're kinda like, why, you know, but the thing is some folks just get really tired of seeing screenshots from the other networks.

[00:48:40] 'cause often it's like, look what this person said. Can you believe it? You know, it's like, ah. Okay, I've had enough. So one of our users aendra made a moderate service that just runs an ML that detects it, labels it, and then folks that are tired of it, they subscribe to it and they're just hide it, you know?

[00:48:57] And so it's like a smart filter kind of thing that they're doing. you know, hypothetically you could do that for things like spiders, you know, like you've got arachniphobia, things like that. that's like a pretty straightforward, kind of automated way of doing it. Which takes a lot of the spice, you know, outta out of running moderation.

[00:49:15] So that users have been like, yeah, yeah, okay, we can do that.

[00:49:20] Those are user facing ways that we tried to surface the. Decentralized principle, right? And make take advantage of how this whole architecture can have this kind of a pluggability into it.

Users can self host now

[00:49:33] Paul: But then really at the end of the day, kind of the important core part of it is those pieces we were talking about before, the hosting, the relay and the, the applications themselves, having those be swappable in completely. so we tend to think of those as kind of ranges of infrastructure into application and then into particular client side stuff.

[00:49:56] So a lot of folks right now, for instance, they're making their own clients to the application and those clients are able to do customizations, add features, things like that, as you might expect,

[00:50:05] but most of them are not running their own backend. They're just using our backend. But at any point, it's right there for you. You know, you can go ahead and, and clone that software and start running the backend. If you wanted to run your own relay, you could go ahead and go all the way to that point.

[00:50:19] You know, if you wanna do your own hosting, you can go ahead and do that. Um, it's all there. It's really just kind of a how much effort your project really wants to take. That's the kind of systemically important part. That's the part that makes sure that the overall mission of de monopolizing, social media online, that's where that really gets enforced.

[00:50:40] Jeremy: And so someone has their own data repository with their own users and their own relay. they can request that your relay collect the information from their own data repositories. And that's, that's how these connections get made.

[00:50:58] Paul: Yeah. And, and we have a fair number of those already. Fair number of, we call those the self hosters right? And we got I wanna say 75 self hoster going right now, which is, you know, love to see that be more, but it's, really the folks that if you're running a service, you probably would end up doing that.

[00:51:20] But the folks that are just doing it for themselves, it's kind of the, the nerdiest of the nerds over there doing that. 'cause it doesn't end up showing itself in the, in the application at all. Right? It's totally abstracted away. So it, that, that one's really about like, uh, measure your paranoia kind of thing.

[00:51:36] Or if you're just proud of the self-hosting or, or curious, you know, that that's kind of where that sits at the moment.

AT Protocol beyond bluesky

[00:51:42] Jeremy: We haven't really touched on the fact that there's this underlying protocol and everything we've been discussing has been centered around the bluesky social network where you run your own, instance of the relay and the data repositories with the purpose of talking to bluesky, but the protocol itself is also intended to be used for other uses, right?

[00:52:06] Paul: Yeah. It's generic. The data types are set up in a way that anybody can build new data types in the system. there's a couple that have already begun, uh, front page, which is kind of a hacker news clone. There's Smoke Signals, which is a events app. There's Blue Cast, which is like a Twitter spaces, clubhouse kind of thing.

[00:52:29] Those are the folks that are kind of willing to trudge into the bleeding edge and deal with some of the rough edges there for pretty I think, obvious reasons. A lot of our work gets focused in on making sure that the bluesky app and that use case is working correctly.

[00:52:43] But we are starting to round the corner on getting to a full kind of how to make alternative applications state. If you go to the atproto.com, there's a kind of a introductory tutorial where that actually shows that whole stack and how it's done. So it's getting pretty close. There's a couple of still things that we wanna finish up.

[00:53:04] jeremy so in a way you can almost think of it as having an eventually consistent data store on the network, You can make a traditional web application with a relational database, and the source of truth can actually be wherever that data repository is stored on the network.

[00:53:24] paul Yeah, that's exactly, it is an eventually consistent system. That's exactly right. The source of truth is there, is their data repo. And that relational database that you might be using, I think the best way to think about it is like secondary indexes or computed indexes, right? They, reflect the source of truth.

[00:53:43] Paul: This is getting kind of grandiose. I don't tend to poses in these terms, but it is almost like we're trying to have an OS layer at a protocol level. It's like having your own

[00:53:54] Network wide database or network-wide file system, you know, these are the kind of facilities you expect out of a platform like an os And so the hope would be that this ends up getting that usage outside of just the initial social, uh, app, like what we're doing here.

[00:54:12] If it doesn't end up working out that way, if this ends up, you know, good for the Twitter style use case, the other one's not so much, and that's fine too. You know, that's, that's our initial goal, but we, we wanted to make sure to build it in a way that like, yeah, there's evolve ability to, it keeps, it, keeps it, make sure that you're getting kinda the most utility you can out of it.

Peer-to-peer and the difficulty of federated queries

[00:54:30] Jeremy: Yeah, I can see some of the parallels to some of the decentralized stuff that I, I suppose people are still working on, but more on the peer-to-peer side, where the idea was that I can have a network host this data. but, and in this case it's a network of maybe larger providers where they could host a bunch of people's data versus just straight peer to peer where everybody has to have a piece of it.

[00:54:57] And it seems like your angle there was really the scalability part.

[00:55:02] Paul: It was the scalability part. And there's great work happening in peer-to-peer. There's a lot of advances on it that are still happening. I think really the limiter that you run into is running queries against aggregations of data. Because you can get the network, you know, BitTorrent sort of proved that you can do distributed open horizontal scaling of hosting.

[00:55:29] You know, that basic idea of, hey, everybody's got a piece and you sync it from all these different places. We know you can do things like that. What nobody's been able to really get into a good place is running, queries across large data sets. In the model like that, there's been some research in what is, what's called federated queries, which is where you're sending a query to multiple different nodes and asking them to fulfill as much of it as they can and then collating the results back. But it didn't work that well. That's still kind of an open question and until that is in a place where it can like reliably work and at very large scales, you're just gonna need a big database somewhere that does give the properties that you need. You need these big indexes. And once we were pretty sure of that requirement, then from there you start asking, all right, what else about the system

[00:56:29] Could we make easier if we just apply some more traditional techniques and merge that in with the peer-to-peer ideas? And so key hosting, that's an obvious one. You know, availability, let's just have a server. It's no big deal. But you're trying to, you're trying to make as much of them dumb as possible.

[00:56:47] So that they have that easy replaceability.

Moderation challenges

[00:56:51] Jeremy: Earlier you were talking a a little bit about the moderation tools that people could build themselves. There was some process where people could label posts and then build their own software to determine what a feed should show per a person.

[00:57:07] Paul: Mm-Hmm

[00:57:07] Jeremy: But, but I think before that layer for the platform itself, there's a base level of moderation that has to happen.

[00:57:19] Paul: yeah.

[00:57:20] Jeremy: And I wonder if you could speak to, as the app has grown, how that's handled.

[00:57:26] Paul: Yeah. the, you gotta take some requirements in moderation pretty seriously to start. And with decentralization. It sometimes that gets a little bit dropped. You need to have systems that can deal with questions about CSAM. So you got those big questions you gotta answer and then you got stuff that's more in the line of like, alright, what makes a good platform?

[00:57:54] What kind of guarantees are we trying to give there? So just not legal concerns, but you know, good product experience concerns. That's something we're in the realm of like spam and and abusive behavior and things like that. And then you get into even more fine grain of like what is a person's subjective preference and how can they kind of make their thing better?

[00:58:15] And so you get a kind of a telescoping level of concerns from the really big, the legal sort of concerns. And then the really small subjective preference kind of concerns. And that actually that telescoping maps really closely to the design of the system as well. Where the further you get up in the kind of the, in that legal concern territory, you're now in core infrastructure.

[00:58:39] And then you go from infrastructure, which is the relay down into the application, which is kind of a platform and then down into the client. And that's where we're having those labelers apply. And each of them, as you kind of move closer to infrastructure, the importance of the decision gets bigger too.

[00:58:56] So you're trying to do just legal concerns with the relay right? Stuff that you objectively can, everybody's in agreement like Yeah, yeah, yeah. You know, no bigs don't include that. The reason is that at the relay level, you're anybody that's using your relay, they depend on the decisions you're making, that sort of selection you're doing, any filtering you're doing, they don't get a choice after that.

[00:59:19] So you wanna try to keep that focus really on legal concerns and doing that well. so that applications that are downstream of it can, can make their choices. The applications themselves, you know, somebody can run a parallel I guess you could call it like a parallel platform, so we got bluesky doing the microblogging use case, other people can make an application doing the microblogging use case. So there's, there's choice that users can easily switch, easily enough switch between, it's still a big choice.

[00:59:50] So we're operating that in many ways. Like any other app nowadays might do it. You've got policies, you know, for what's acceptable on the network. you're still trying to keep that to be as, you know, objective as possible, make it fair, things like that. You want folks to trust your T&S team. Uh, but from the kind of systemic decentralization question, you get to be a little bit more opinionated.

[01:00:13] Down all the way into the client with that labeling system where you can, you know, this is individuals turning on and off preferences. You can be as opinionated as you want on that letter. And that's how we have basically approached this. And in a lot of ways, it really just comes down to, in the day to day, you're the moderation, the volume of moderation tasks is huge.

[01:00:40] You don't actually have high stakes moderation decisions most of the time. Most of 'em are you know pretty straightforward. Shouldn't have done that. That's gotta go. You get a couple every once in a while that are a little spicier or a policy that's a little spicier. And it probably feels pretty common to end users, but that just speaks to how much moderation challenges how the volume of reports and problems that come through.

[01:01:12] And we don't wanna make it so that the system is seized up, trying to decentralize itself. You know, it needs to be able to operate day to day. What you wanna make is, you know, back pressure, you know, uh, checks on that power so that if an application or a platform does really start to go down the wrong direction on moderation, then people can have this credible exit.

[01:01:36] This way of saying, you know what, that's a problem. We're moving from here. And somebody else can come in with different policies that better fit people's people's expectations about what should be done at, at these levels. So yeah, it's not about taking away authority, it's about checking authority, you know, kind of a checks and balances mentality.

[01:01:56] Jeremy: And high level, 'cause you saying how there's such a high volume of, of things that you know what it is, you'd know you wanna remove it, but there's just so much of it. So is there, do you have automated tools to label these things? Do you have a team of moderators? Do they have to understand all the different languages that are coming through your network?

[01:02:20] Yes, yes, yes and yes. Yeah. You use every tool at your disposal to, to stay on top of it. cause you're trying to move as fast as you can, folks. The problems showing up, you know, the slower you are to respond to it, the, the more irritating it is to folks. Likewise, if you make a, a missed call, if somebody misunderstands what's happening, which believe me, is sometimes just figuring out what the heck is going on is hard.

[01:02:52] Paul: People's beefs definitely surface up to the moderation misunderstanding or wrong application. Moderators make mistakes so you're trying to maintain a pretty quick turnaround on this stuff. That's tough. And you, especially when to move fast on some really upsetting content that can make its way through, again, illegal stuff, for instance, but more videos, stuff like that, you know, it's a real problem.

[01:03:20] So yeah, you're gotta be using some automated systems as well. Clamping down on bot rings and spam. You know, you can imagine that's gotten a lot harder thanks to LLMs just doing text analysis by dumb statistics of what they're talking about that doesn't even work anymore.

[01:03:41] 'cause the, the LLMs are capable of producing consistently varied responses while still achieving the same goal of plugging a online betting site of some kind, you know? So we do use kind of dumb heuristic systems for when it works, but boy, that won't work for much longer.

[01:04:03] And we've already got cases where it's, oh boy, so the moderation's in a dynamic place to say the least right now with, with LLMs coming in, it was tough before and now it's, now it's real interesting. Uh, so you use everything you can and you keep playing the cat and mouse game. And that's that.

[01:04:26] Jeremy: Are the tools that you use were they all built in-house or are there...

[01:04:31] Paul: Not all of them. We do use a couple of services that are out there. For one with the CSAM stuff, there's a company called Thorn that is actually set up to handle that. Obviously it's a very sensitive kind of thing. They maintain these hash fingerprint databases that we check in every post, Hey, this is on your database.

[01:04:58] We also use a service called Hive to help us automatically detect pornography and gore. we're still tuning that one, but it hits significantly. It's correct more than it's incorrect. And we encourage users to self label their stuff. 'cause you know good actors don't want to be posting NSFW to people that don't wanna get it.

[01:05:23] We do see people do it, but we still need to have that extra check. And so we're running that kind of detection on it and things like that. And then we got a couple of in-house things, something called automod which you could find the source for. That's just sitting there and it's a Go code base that's sitting there and running all kinds of heuristics trying to detect kinds of patterns of behavior that we've identified as being a problem.

[01:05:48] Often that has to do with spam and then it's people.

[01:05:53] Jeremy: I think with any application that has user generated content, it just seems like such a huge challenge to deal with.

[01:06:06] Paul: It is, it really is. And, it's the thing, you know, it shows up in a lot of places. What we're talking about is kind of the, what's funny is what we're talking about is the obvious way that it shows up. And even that it's not really apparent to the average person just how much that really is.

[01:06:23] It's a pretty big task, but it also makes its way into product development. And is one of those hidden costs that shows up all over the place? There's that meme about, oh, I can build Twitter at a weekend, kind of thing.

[01:06:45] Jeremy: Right.

[01:06:46] Paul: Which, there's a lot of reasons why, building a a microblogging app, had a lot of time to reflect on how many weekends it was taken to build it, you know, and why, why it took longer than a weekend for us.

[01:07:04] Certainly targeting a lot of different platforms, will do it to you. React native makes that a lot easier these days and I give them a plug. They've been great. The moderation, the safety, that, that part of the design that you gotta work into all new features, that is a real hidden cost of product development that you gotta take seriously from the get go.

[01:07:28] And you gotta look at every, everything you do and say, what, how's this gonna get messed with where people are gonna do with this? And I think our track record has been pretty good. There're definitely moments where we get something out and go, ah, you know, didn't think of that. Which is, that's just how it goes.

[01:07:45] But that shows up everywhere.

[01:07:48] Jeremy: I didn't even think about the fact where you had mentioned the LLMs, where now people just have the power to generate posts that are indistinguishable from people. So I don't know. I don't know where we're going here, but...

[01:08:04] Paul: Somewhere new.

[01:08:10] Jeremy: Any closing thoughts or other things you want to mention?

[01:08:13] Paul: Yeah, this has been an interesting project. I'm happy with how it's gone so far for sure. The first question that we had when we started out was just whether or not it would scale, because historically folks that work in the decentralization space, that's what jams us up pretty bad. In obvious ways, peer to peer.

[01:08:31] But even blockchains actually got jammed up real bad on scaling. And that's kind of why the NFT bubble popped as hard as it did when it did was everybody knows the transaction fees got really high. It's 'cause the system got too crowded. So making these big scale systems that have open operation is a real challenge.

[01:08:51] And that's the part that so far, we're up to now 11 million registered accounts, and we sit at somewhere around 1.5 to 2 million DAU and we haven't identified anything that makes us think, oh, that's a scaling limit for this design. We're pretty sure that it can make its way all the way to the kind of scale that people expect out of this.

[01:09:12] So I'm really happy to see just kind of from the engineering side of things, like getting that validated has felt pretty good about this project. And so I'm, I'm pretty curious to see kind of how this all unfolds moving forward. I'm feeling very happy with the outcome so far and just actually genuinely interested on a personal level to see how, how these dynamics play out over time.

[01:09:36] Really looking forward to seeing somebody else start to operate full nodes like we are. I think that'll be a great milestone that we're trying to head for. And hopefully in a year or two, I think we kind of need to make a little social network in a box, make it a little bit easier for somebody to just press a button and get the whole thing going.

[01:09:55] That's a milestone in the future I'm looking forward to. other than that. a lot of my day to day just comes down to, A lot of it's just running the social app and learning all the lessons that come along with that, which has been a trip in and of itself.

[01:10:07] But you know, the, all the source code, um, is available on GitHub. So if you wanna see how we do it, it's right there. And if you're an application developer, I think in particular, the social app repo is pretty interesting to look at. That's a React native repository. You can see the full thing.

[01:10:26] Paul: Lots of good forkable code in there if you're building an app. So I would definitely check that out. All the protocol code is up there. We got specs up on the app protocol website, so atproto.com, you can check all that out. Some good intro material, things like that. You want to learn a little bit more about how all this works?

[01:10:43] Find out on atproto.com. Yeah, so it's it's all up there. It's all ready to, ready to look at.

[01:10:50] Jeremy: And if people wanna follow you or see what you're up to,

[01:10:53] Paul: You're gonna jump on bluesky and on pfrazee.com.

[01:10:57] Jeremy: Paul, thank you so much for taking the time.

[01:11:00] Paul: Absolutely. Thank you.

Mayra Navarro on Getting to RubyConf (RubyConf 2023)

Thu, 30 Nov 2023 03:43:00 -0000

Mayra Navarro is an organizer of WNB.rb and Ruby Perú.

Mayra shares how the Ruby community helped her get to RubyConf, going from project manager to developer, and the different ways people learn and communicate.

This is the final interview recorded at RubyConf 2023 in San Diego.

Groups

People

Conferences

Transcript

You can help correct transcripts on GitHub.

[00:00:00] Jeremy: I hope you've been enjoying the conversations from RubyConf. Before we get started. I just want to say thank you to everyone. I met at the conference, all the guests who were so generous with their time. And to Irene from RubyConf for arranging a space and helping me connect with guests.

[00:00:16] This final interview is with Mayra Navarro. She's an organizer of the Ruby community in Lima, Peru and a member of the women and non-binary Ruby online community. She's going to tell us how the community pulled together. Both friends of hers and strangers she had never met. To get her to RubyConf this year, we start the story in April where she's just finished attending the Ruby on rails conference, RailsConf in Atlanta.

Getting to RubyConf

[00:00:44] Mayra: So the thing is, in the last RailsConf, the last day that I was in the US, um, the day that I was returning to Peru, I got fired. (laughs) Yes. So I was with all the stress.

[00:01:01] All the luggages that I had to pick or maybe overweight or something like that. And then I received that

[00:01:08] Jeremy: Oh my gosh.

[00:01:09] Mayra: And I, oh my gosh, what I can do?

[00:01:13] Jeremy: That's terrible.

[00:01:15] Mayra: It was awful. Since that, I think it was May 1st or something like that. And I was looking for job like everybody else who were fired all these times It was a difficult time for me. my plan was just before September I get a job so I have enough money and not using my, my savings for, for going to the RailsConf. Sorry, the RubyConf. but, uh, eventually at the end of September, October, I didn't get anything.

[00:01:44] I'm Christian. So I, well, God doesn't want me to come here.

[00:01:50] it there must be a reason, but there was something inside me that. I just to have to do something else. and I thank to my mom because she's someone that is always fighting for what she wants.

[00:02:02] So I say, okay, I went to sleep that day that I say, okay, maybe I don't want to go. So next day I have the idea. Maybe you didn't use your last card. There is something else. That is something that I have from my mom. I can feel that. I say, what if you ask for money? Well, like a fundraising, I learned about that word later, and I say what, what else could you lose you don't have anything to lose right now.

[00:02:30] So I say, okay, I'm going to write something. I asked Cody Norman, that is someone that I really appreciate right now. I asked him about suggestions, if that's a good idea or no, maybe not. He said, yes, you can do it. Uh, and I asked him if he could help me with the speech because I tried to write something and also I'm not good at writing things on Twitter and especially asking for money because I had to be open myself and be vulnerable to do that.

[00:03:02] And it was like, uh, the last break for myself

[00:03:05] a... I sent the speech to Cody, he helped me to update some things that I have to just improve. And I did it. I, also, I didn't know how to open a GoFundMe campaign because that's only for the US and Mexico. I think it doesn't exist in Peru.

[00:03:23] So he said, Oh, there is another page that you can go.

[00:03:27] I did it. So I just published that. I didn't open that until three, four hours later, because I was like, no, I don't want to see. And then I, I open it and I started to contact with the people who.

[00:03:44] Well, who knows me because I like to be connected with a lot of people. I'm part of the FL RB even being in Peru, I am part of the FL RB. I go attend to the Atlanta Ruby Group. I go, I, I know a lot of people because of the conference. I try to help to the woman and non binary community also. I am organizer of the Ruby Peru, but I didn't want to ask them money for them, but I have some close friends from the conference that I, that I go for all.

[00:04:13] All these two years. So they helped me to share that. And in two days, I got the money.

[00:04:20] It was like a, I can't believe it. It is what, and I'm not good open myself for things like that. I love helping people, but it's difficult when I, you have to help yourself. So. All these people who I could see their names because it's, it's transferred to PayPal.

[00:04:39] So I could see their name is like, uh, I really appreciate the thing that they don't know me. Some people, they don't know me, but, but I know them. I know who you are, if you're listening to this and I thank you appreciate for doing that. I also had the opportunity because I need to talk about this. I got a ticket from HexDev, uh, from Stephanie.

[00:05:01] Jeremy: Oh... Hexdevs. Yeah.

[00:05:02] Mayra: Yes. And, also I applied for the volunteer positions, just in case But I got the volunteer position. So what I did is, besides all my expense, I mean, that trip and also the hotel, expenses, I don't know, does it work? I, I said, I'm going to give this ticket that I have left to one of the women in the wnb community.

[00:05:25] So I did that and say, I have a ticket and also I can share the room. I don't want to say her name because She's trying not to be too connected to social media, but she, she accepted sharing the room with me.

[00:05:40] Jeremy: Yeah.

[00:05:43] Mayra: So she's already here with me and I feel so happy because People not only helping me, they helping me to help and it feel like, wow.

[00:05:53] Yeah. And that is, that is my story. And I still, well, I accepted to come to talk about this today because I received a job offer in the morning that I accepted.

[00:06:06] So I wanted to. Send a happy ending for all my story about this. Yeah. And especially because I know that in the next conference I'll be with my own money, I could say, expenses.

Asking for help

[00:06:20] Jeremy: So was this all this year, the RailsConf? That was this year where you, you went to the conference, you enjoyed the talks and you were employed. And so it was the day that it was over that you, you found out that you got laid off. Wow. it's this, you have this high, right, you've met all your friends and, you know, you're learning all this stuff and you're really excited.

[00:06:42] And then you get this notice and it's like, what, what happened?

[00:06:47] Mayra: Yeah. That is what's happening.

[00:06:50] Jeremy: then you, you kind of, like you said, you opened yourself up but yeah, it takes courage to say like, Hey, I need, you know, I need some help.

[00:06:57] Mayra: Yeah, it's, this is something that I learned about this is always asking for help. This is something that I have been I bring into my life is always asking for help. I know as a woman, uh, as a woman, I have the thing that Try to be strong sometimes. I can do it by myself. I don't need help.

[00:07:16] I don't need help, but sometimes you need, as human, you can open yourself. It's not something related to

[00:07:23] gender it's more like humanity. That's how I feel right now. That is the feeling that I have and that is what is going to keep with me. For the rest of my life, I know. (laughs)

[00:07:33] Jeremy: Yeah. Cause I, I think when, when people don't know, they, they might assume because you're so involved, right. With your, your local community and then the community internationally where people just assume that, Hey, Mayra is doing great. Right? She's, she's got no problems, no issues, and, there's just no way for people to know unless, you know, you, you share, and then that way people can help you,

[00:07:58] Mayra: Yes.

[00:07:59] Jeremy: That's a great story. I'm, I'm, I'm glad that, like, getting laid off is never good, right? That's never fun. But at least... Uh, things positive came out of it in terms of people coming out to support you, but also like you said, being able to support another, you know, another member of our community,

[00:08:19] Mayra: And I would do it, and I would do it again.

[00:08:21] I know that.

Attending conferences

[00:08:23] Jeremy: You know, now that you've gotten to come out, how, how has the, the conference been for you? Like,

[00:08:29] Mayra: It was really good. I feel less insecure than the last time that I was here. (laughs) Actually, my first RubyConf was RubyConf Mini in Rhode Island. So this is like a, the Ruby real not the real one. It's just my more it's different.

[00:08:48] Mayra: But, uh, the same time is. It's closer. That's how I feel it. I mean, this is my fifth conference. My first one was in Colombia. RubyConf Colombia. And I got a ticket as a scholarship. but until now I can say that it's like a, the feeling of the Ruby community, not only Ruby on Rails. Ruby community is like a It's really positive. It has changed my life so much since the first time that I joined to community that it's, I'm so happy to be developer instead of what, you know, everybody switched jobs.

[00:09:20] I did too so it's like, uh, I won't regret.

From project manager to developer

[00:09:24] Jeremy: Hmm. Tell, tell me a little bit about that. How did you get interested in Ruby or, or have been involved with the community?

[00:09:31] Mayra: I wasn't, I it was because money. Because it is.

[00:09:34] Jeremy: That's a good reason. That's a good reason.

[00:09:36] Mayra: Yeah. The thing is, I am graduated, uh, of the university. Uh, in Peru, but I was project manager before, well, I've switched a lot of careers because I was looking what I wanted to do and eventually I was project manager also. but I hated me in that position wasn't really good and it wasn't the company.

[00:09:59] I knew it was me. I wasn't satisfied with my job and also I didn't like that, uh, working from nine to six every day in an office or something like that. It wasn't for me. So I remember that someone, one friend on Facebook shared something like a bootcamp that was about to start in Peru, that they were teaching Ruby on Rails.

[00:10:21] I didn't know what was Ruby on Rails at the time. And then, and also React and JavaScript because, and you have to pay only if you get your first job.

[00:10:32] Jeremy: Oh, this is like a bootcamp.

[00:10:34] Mayra: Yeah, it's bootcamp, the first one that I met, I know, but that time there was someone called Laboratoria, but it's only related to JavaScript, but this one's a little bit more complete. So I apply, I didn't know that I could make it. I did it and it was an intensive bootcamp, six months from nine to six, but also I remember I didn't leave.

[00:10:59] The place until 11 o'clock PM, because we were all 19 people there. So we really wanted to change our future. When everything ends, there was a moment when I, I could feel that Ruby also, especially Ruby on Rails, it was like, uh, something that I really like, uh, the syntax. Things like that. And also our teachers used to say, I can see who could be backend, who's going to be frontend.

[00:11:31] I consider myself full stack, but she, she used to say that. I remember that.

[00:11:36] Jeremy: So, which one were you?

[00:11:38] Mayra: I was the Ruby side. The backend.

[00:11:43] Mayra: Then I got an internship in the same companies who that was promoting the, the bootcamp. After three months, the, the, the internship ended because it was part of the contract but I wanted really to work in a place that they had Ruby on Rails.

[00:12:05] So that's what I got. It took me more time than the rest of my friends, it's maybe it was like. four of us got a job in Ruby on Rails, uh, and I got mine. I remember my first job, full time job for Ruby on Rails was for the government of Peru. Actually, they use, they use Ruby on Rails for, CMS that they manage, that is called gov.pe. So I started working there. So it was a nice experience and I love, and I learned a lot about that experience also. So that is my story how it started.

[00:12:44] Jeremy: Yeah. So you had talked about your friend and your friend referred you to the bootcamp, had you ever done any programming or anything related?

[00:12:54] Mayra: When I was project manager, I had the opportunity to, to manage developers. I have developers in charge. So, but I had the kind of person that even I was. Your product manager, I try to help you to solve some things, like something that I say is a pseudocode. Instead of coding, I tried to give you the pseudocode that the things that you could do. So with that, maybe I can help. Well, my, my goal wanted to help you to unlock if you, you, you got stuck in something.

[00:13:30] So with that, I just have a little bit of knowledge of what to do, but I. I felt that I hadn't the tools or I hadn't the skill to do that. That's why I decided to, to study in a bootcamp because they can teach me about the, that kind of tools because I couldn't study by myself. I couldn't. I can understand how the things goes right now, but at that time I, I was, I was lost.

[00:13:56] Jeremy: But that's like, the, the skill that you already had as a project manager, being able to write the pseudocode and, and talk to your developers about the type of problem they're, they're trying to solve. That, that's a really important skill already. So I think, like, going into the bootcamp, nothing was totally new.

[00:14:15] Right? I, I think that's really great that, that you got to see that beforehand and, and hopefully get a sense for like, that you might. Enjoy this, this sort of thing too, right?

[00:14:25] Mayra: I love solving puzzles, so that was puzzle for me. I started with Code Wars. I know everybody started with that, but it's like a resolving puzzles. I need something there. And one of the things I really love I love helping people. I discovered that when I was helpdesk before. eventually all this time, even this time without job, I realized I can bring that.

[00:14:47] Oh, I being more. aware about that I can help people just coding. So that's, that's what I've been doing all these months because I try to understand about gems or learning things more, but my focus is always going to be helping people.

[00:15:03] Jeremy: I, I think that's really great that that's something that really appeals to you because that's something everybody needs.

[00:15:12] Mayra: You know, the word that comes to my mind all the time that I say this is server. If you think about the word server, it's what I do. It receives something and gives you something. It's all the time. It's, you know, I know it's a machine or something like that, but if you think about the word, you are receiving something and giving something.

[00:15:36] In all the time you are waiting for a, for a request to give something. So that is the word that it's, is for me, it's kind of helping people.

[00:15:45] Jeremy: Hmm. we all serve one another in, in, in one way or another. Yeah. the boot camp, you said it was, uh, six months, and... You were, you were staying till 11 at night. So what, what was that experience like?

How different people communicate and learn

[00:16:02] Mayra: it's a nightmare, don't do it. (laughs) No, it was fun. that bootcamp changed my life. Before that bootcamp, I would say, I'm not going to say I was lack of some of the skills. The thing is, I didn't know how to, how to communicate. And one of the things that I learned besides that you need English or things like that, it's more about how to communicate with people because, there are multiple ways.

[00:16:28] You can't talk in a way, for example, there's something that I'm always going to remember about it is you would prefer. WhatsApp, for example, and I would prefer Slack, or maybe voice, voice records instead of writing, or maybe an email. So, there must be a point between you and me. that might help us to communicate in a good way.

[00:16:53] I, me, myself, especially, don't have to be forced people to do it in my way. It's a way of two, for two, you and me, and the best way, and try to do the same for the rest of the team, for example, or the rest of the people. Maybe they don't prefer this in this way, maybe another way. I have to be open to that.

[00:17:12] Before the bootcamp I didn't know anything about it, but, and I tried to do it in my way. And then, right, thinking about it, it was a little bit selfish, but I need to learn. I need to be aware about

[00:17:25] Jeremy: You're, you're referring to the way people. Communicate, or the ways people learn, or...

[00:17:31] Mayra: Communicate actually. If we talk about the people, how the people learn. I am the example of, I am bad at listening book, podcasts.

[00:17:41] Jeremy: Uh... Oh (laughs)...

[00:17:42] Mayra: I'm sorry, I, I have ADHD. So it's hard for me to follow videos and podcasts because I have to. Pause, uh, Rewind, and Play again. And this is something when I miss so many ideas. So I prefer reading blogs or maybe transcriptions because it helps me just do reading.

[00:18:04] And then return and continue reading when I can't understand something. it was difficult for me just to understand also that people prefer videos. Yes, it's not only my way to learn, it's their way to learn. And also we need to be open to that. Even when I used to, I mentor a couple of... people So, I had to be open to that also.

[00:18:29] I am the kind of person you can write me at 2 a. m. and if I am awake, I'm going to answer you because maybe you are desperate for an answer at the time, but I can understand there are no people who are not. Who doesn't like, don't like that, so I try to be open to that or maybe improve our communications or maybe give some rules and not to think that everything is personal, right?

[00:18:56] It's just, I hope the best of you. And I try that you get the best of me. (laughs)

[00:19:03] Jeremy: Yeah, it's understanding their expectations, what they feel comfortable with, so that you know, It's okay if I send Mayra a message at 2 AM, but if I send someone else a message at 2 AM, it, it, maybe their phone dings and, you know, now they're distracted and, yeah, so, that, that makes sense.

[00:19:26] Mayra: Yeah. If you, for example, I, I, I'm going to ask you because I learned that, can I send you a message at that time? But I, even I have to think about the time zone.

[00:19:36] Jeremy: Yeah, oh, that's true, that's true.

[00:19:38] Mayra: For example, because now I, I would have think about just my friends from Peru, but now I have friends all over the world because of the Woman Non-binary community.

[00:19:49] So I have to think about things like that when I write. So what, all the things that has been useful for me is, for example, is like when you schedule a message that has been useful for me when I have to ask or sending messages.

[00:20:03] Jeremy: It's interesting that you mentioned how with learning you prefer blogs and, and books and things like that because this may be a generalization, but maybe with, newer developers or, or younger people, a lot of them really like the video form Yesterday I was, interviewing, David Copeland, who he, he wrote a book about, sustainable web development with Ruby on Rails, and, yeah, we were talking a little bit about, it's like, so many people want video, is there a, is there still a place for me with my, you know, my, my book?

[00:20:39] And stuff like that, and I think it's important to remember that there's people like yourself, and, um, I, I'm partly the same way, like, I like to be able to have the text so that I can read it at my own pace and copy and paste stuff and stuff like that. But you're right that everybody learns differently, so it, it makes sense for there to be the videos, for there to be, podcasts and blog posts.

[00:21:05] There's different people who learn different ways

[00:21:07] Mayra: And also, some people, including me, needs to pay for something if you want to learn something, because sometimes when it is free, you won't have the value that

[00:21:22] Mayra: it has.

[00:21:23] Jeremy: I totally understand that, yeah.

Accessibility for videos and podcasts

[00:21:26] Mayra: And there is something I would like to mention after you talk about this is I open to videos and podcasts, I can't take my time because I have now a lot of friends who, who create this type kind of content, but I like to remember that there are people with other difficult things.

[00:21:45] No, it's, it's related to accessibility because when you have deaf people, they need. Transcriptions, they need closed captions. So maybe you are in a place with a lot of noise and it can help you, even if the video has closed captions, so people can read it. So it is something like, it's not me. It's more to be more open to people who are really has a disability.

[00:22:12] That's a word that was like, so yeah. Or maybe they, their main language is not English and a lot of the content or the majority of the content about coding is in English. So when you have the transcriptions or you have the blog, you can translate it. And it's easy for that is access to them.

[00:22:31] Jeremy: Yeah, that's definitely true. And I think even past people who aren't native speakers or have a disability, if you aren't in either of those categories, there's still a lot of people who they want to have a transcript or outside. Ruby or programming, people who watch movies and TV shows, a lot of them turn on the subtitles and they're native English speakers.

[00:22:55] The dialogue is in English. They still want to see the subtitles.

[00:22:59] Yeah, I think it's becoming very common. So, to your point when you have video having a way for people to still get the information another way, I think is helpful for everyone. Yeah.

[00:23:15] Mayra: And it isn't too difficult nowadays because now you have AI or maybe, programs that can get you the, at least the closest words.

[00:23:24] Jeremy: It gets you maybe 90 percent of the way there, so it definitely saves time, but I will say it is still work.

[00:23:33] Mayra: Yes. Yes. It still work. Actually, it's because it could be a couple of words that need that maybe the, the program needs to improve. You can improve how the program, uh, translated, but it is something behind the meaning that you still can keep.

[00:23:49] Jeremy: It being there and not perfect is kind of better than having nothing, I guess, yeah.

What's next

[00:23:55] Jeremy: Now that you've spent time at RubyConf, like, what are, like what are your plans next?

[00:24:02] Mayra: There is a story behind all this, but I'm going to, the TLDR

[00:24:06] Jeremy: Okay.

[00:24:08] Mayra: Could be easy because I have a plenty of time without working to make a lot of thoughts in my mind. So it's just like, uh, I would like to explore more about Ruby, Ruby without Rails, something like that. So one of the things that I would like to do after this is just.

[00:24:27] I would like to investigate more about the use of Ruby out of what is what application. It is something I was talking like a couple of hours ago, because there I found a blog about how is, how are the conference. The Ruby conference in Japan. So it was really interesting. It was, an article that is really old, but it got my attention because I never thought about it because I came from a boot camp and it was like a.

[00:24:59] There is something else. So I could see a couple of talks about talking about, for example, Rack.

[00:25:06] Mayra: So I will like how, oh, for example, we have another talk about how to create desktop applications with Ruby. So that is something that I would like to investigate. I would like to try also with the Ruby Peru community.

[00:25:20] We decided to choose to investigate more about it and prepare talks about it in Spanish, because that is the mission. Our vision of our community is to create content in Spanish.

[00:25:34] And also I was planning to give a lightning talk, but I wasn't ready yet to do it because I was nervous about, because I applied for jobs or things like that, I hadn't the time to prepare that, but actually I would like. I dunno if you heard about Dave Kimura and Drifting Ruby?

[00:25:52] Jeremy: Uh, yes. Yes. They, they do the videos, right.

[00:25:54] Mayra: He has a, blog on how to implement some kind of, uh, when your test test fails, you saw the light can change to red or green based on that The test that you are running. So it was really interesting. It isn't related to rails, but it, it is based on ruby so it's like, wow, I want to learn how to do that.

Woman and non-binary community (WNB-rb.dev)

[00:26:17] Jeremy: Anything else you want to mention or think we should have talked about?

[00:26:22] Mayra: Yeah, because I am a member of the Women and Non-binary community. So if you are a woman or a non-binary person, you're invited to our community, we are open to, to you and we have meetups monthly. Uh, we have book clubs and we are always open to new ideas to share, to help you to do. that's it, I think. Yes.

[00:26:47] Jeremy: And where can they find you if they're interested in that?

[00:26:51] Mayra: Uh, we have a webpage,

[00:26:53] wnb-rb.dev That is dev in English, I think.

[00:27:00] Yes. And there you can find us. And also there's a form, where you can give us your, your info. It won't be shared only. No, it won't be shared only for the organizers. And that's all. We keep your privacy there.

[00:27:17] Jeremy: Very cool. So

[00:27:19] wnb-rb.dev

[00:27:23] Mayra: yes, it is.

[00:27:26] Jeremy: Well, Myra thank you so much for spending time to talk with me today.

[00:27:30] Mayra: Thank you and sorry for my English. Ha ha ha ha

[00:27:34] Jeremy: Your English is good, your English is much better than my Spanish.

[00:27:38] Mayra: Okay.

Mike Perham on Keeping it solo (RubyConf 2023)

Tue, 21 Nov 2023 06:52:45 -0000

Mike Perham is the creator of Sidekiq, a background job processor for Ruby. He's also the creator of Faktory a similar product for multiple language environments.

We talk about the RubyConf keynote and Ruby's limitations, supporting products as a solo developer, and some ideas for funding open source like a public utility.

Recorded at RubyConf 2023 in San Diego.

A few topics covered:

Sidekiq (Ruby) vs Faktory (Polyglot)
Why background job solutions are so common in Ruby
Global Interpreter Lock (GIL)
Ractors (Actor concurrency)
Downsides of Multiprocess applications
When to use other languages
Getting people to pay for Sidekiq
Keeping a solo business
Being selective about customers
Ways to keep support needs low
Open source as a public utility

Mike

Ruby

Transcript

You can help correct transcripts on GitHub.

Introduction

[00:00:00] Jeremy: I'm here at RubyConf San Diego with Mike Perham. He's the creator of Sidekiq and Faktory.

[00:00:07] Mike: Thank you, Jeremy, for having me here. It's a pleasure.

Sidekiq

[00:00:11] Jeremy: So for people who aren't familiar with, I guess we'll start with Sidekiq because I think that's what you're most known for. If people don't know what it is, maybe you can give like a small little explanation.

[00:00:22] Mike: Ruby apps generally have two major pieces of infrastructure powering them. You've got your app server, which serves your webpages and the browser. And then you generally have something off on the side that... It processes, you know, data for a million different reasons, and that's generally called a background job framework, and that's what Sidekiq is.

[00:00:41] It, Rails is usually the thing that, that handles your web stuff, and then Sidekiq is the Sidekiq to Rails, so to speak.

[00:00:50] Jeremy: And so this would fit the same role as, I think in Python, there's celery. and then in the Ruby world, I guess there is, uh, Resque is another kind of job.

[00:01:02] Mike: Yeah, background job frameworks are quite prolific in Ruby. the Ruby community's kind of settled on that as the, the standard pattern for application development. So yeah, we've got, a half a dozen to a dozen different, different examples throughout history, but the major ones today are, Sidekiq, Resque, DelayedJob, GoodJob, and, and, and others down the line, yeah.

Why background jobs are so common in Ruby

[00:01:25] Jeremy: I think working in other languages, you mentioned how in Ruby, there's this very clear, preference to use these job scheduling systems, these job queuing systems, and I'm not. I'm not sure if that's as true in, say, if somebody's working in Java, or C sharp, or whatnot. And I wonder if there's something specific about Ruby that makes people kind of gravitate towards this as the default thing they would use.

[00:01:52] Mike: That's a good question. What makes Ruby... The one that so needs a background job system. I think Ruby, has historically been very single threaded. And so, every Ruby process can only do so much work. And so Ruby oftentimes does, uh, spin up a lot of different processes, and so having processes that are more focused on one thing is, is, is more standard.

[00:02:24] So you'll have your application server processes, which focus on just serving HTTP responses. And then you have some other sort of focused process and that just became background job processes. but yeah, I haven't really thought of it all that much. But, uh, you know, something like Java, for instance, heavily multi threaded.

[00:02:45] And so, and extremely heavyweight in terms of memory and startup time. So it's much more frequent in Java that you just start up one process and that's it. Right, you just do everything in that one process. And so you may have dozens and dozens of threads, both serving HTTP and doing work on the side too. Um, whereas in Ruby that just kind of naturally, there was a natural split there.

Global Interpreter Lock

[00:03:10] Jeremy: So that's actually a really good insight, because... in the keynote at RubyConf, Mats, the creator of Ruby, you know, he mentioned the, how the fact that there is this global, interpreter lock,

[00:03:23] or, or global VM lock in Ruby, and so you can't, really do multiple things in parallel and make use of all the different cores. And so it makes a lot of sense why you would say like, okay, I need to spin up separate processes so that I can actually take advantage of, of my, system.

[00:03:43] Mike: Right. Yeah. And the, um, the GVL. is the acronym we use in the Ruby community, or GIL. Uh, that global lock really kind of is a forcing function for much of the application architecture in Ruby. Ruby, uh, applications because it does limit how much processing a single Ruby process can do. So, uh, even though Sidekiq is heavily multi threaded, you can only have so many threads executing.

[00:04:14] Because they all have to share one core because of that global lock. So unfortunately, that's, that's been, um, one of the limiter, limiting factors to Sidekiq scalability is that, that lock and boy, I would pay a lot of money to just have that lock go away, but. You know, Python is going through a very long term experiment about trying to remove that lock and I'm very curious to see how well that goes because I would love to see Ruby do the same and we'll see what happens in the future, but, it's always frustrating when I come to another RubyConf and I hear another Matt's keynote where he's asked about the GIL and he continues to say, well, the GIL is going to be around, as long as I can tell.

[00:04:57] so it's a little bit frustrating, but. It's, it's just what you have to deal with.

Ractors

[00:05:02] Jeremy: I'm not too familiar with them, but they, they did mention during the keynote I think there Ractors or something like that. There, there, there's some way of being able to get around the GIL but there are these constraints on them. And in the context of Sidekiq and, and maybe Ruby in general, how do you feel about those options or those solutions?

[00:05:22] Mike: Yeah, so, I think it was Ruby 3. 2 that introduced this concept of what they call a Ractor, which is like a thread, except it does not have the global lock. It can run independent to the global lock. The problem is, is because it doesn't use the global lock, it has pretty severe constraints on what it can do.

[00:05:47] And the, and more specifically, the data it can access. So, Ruby apps and Rails apps throughout history have traditionally accessed a lot of global data, a lot of class level data, and accessed all this data in a, in a read only fashion. so there's no race conditions because no one's changing any of it, but it's still, lots of threads all accessing the same variables.

[00:06:19] Well, Ractors can't do that at all. The only data Ractors can access is data that they own. And so that is completely foreign to Ruby application, traditional Ruby applications. So essentially, Ractors aren't compatible with the vast majority of existing Ruby code. So I, I, I toyed with the idea of prototyping Sidekiq and Ractors, and within about a minute or two, I just ran into these, these, uh...

[00:06:51] These very severe constraints, and so that's why you don't see a lot of people using Ractors, even still, even though they've been out for a year or two now, you just don't see a lot of people using them, because they're, they're really limited, limited in what they can do. But, on the other hand, they're unlimited in how well they can scale.

[00:07:12] So, we'll see, we'll see. Hopefully in the future, they'll make a lot of improvements and, uh, maybe they'll become more usable over time.

Downsides of multiprocess (Memory usage)

[00:07:19] Jeremy: And with the existence of a job queue or job scheduler like Sidekiq, you're able to create additional processes to get around that global lock, I suppose. What are the... downsides of doing so versus another language like we mentioned Java earlier, which is capable of having true parallelism in the same process.

[00:07:47] Mike: Yeah, so you can start up multiple Ruby processes to process things truly in parallel. The issue is that you do get some duplication in terms of memory. So your Ruby app maybe take a gigabyte per process. And, you can do copy on write forking. You can fork and get some memory sharing with copy on write semantics on Unix operating systems.

[00:08:21] But you may only get, let's say, 30 percent memory savings. So, there's still a significant memory overhead to forking, you know, let's say, eight processes versus having eight threads. You know, you, you, you may have, uh, eight threads can operate in a gigabyte process, but if you want to have eight processes, that may take, let's say, four gigabytes of RAM.

[00:08:48] So you, you still, it's not going to cost you eight gigabytes of RAM, you know, it's not like just one times eight, but, there's still a overhead of having those separate processes.

[00:08:58] Jeremy: would you say it's more of a cost restriction, like it costs you more to run these applications, or are there actual problems that you can't solve because of this restriction.

[00:09:13] Mike: Help me understand, what do you mean by restriction? Do you mean just the GVL in general, or the fact that forking processes still costs memory?

[00:09:22] Jeremy: I think, well, it would be both, right? So you're, you have two restrictions right now. You have the, the GVL, which means you can't have parallelism within the same process. And then your other option is to spin up a bunch of processes, which you have said is the downside there is that you're using a lot more RAM.

[00:09:43] I suppose my question is that Does that actually stop you from doing anything? Like, if you throw more money at the problem, you go like, we're going to have more instances, I'll pay for the RAM, it's fine, can that basically get you out of these situations or are these limitations actually stopping you from, from doing things you could do in other languages?

[00:10:04] Mike: Well, you certainly have to manage the multiple processes, right? So you've gotta, you know, if one child process crashes, you've gotta have a parent or supervisor process watching all that and monitoring and restarting the process. I don't think it restricts you. Necessarily, it just, it adds complexity to your deployment.

[00:10:24] and, and it's just a question of efficiency, right? Instead of being able to deploy on a, on a one gigabyte droplet, I've got to deploy to a four gigabyte droplet, right? Because I just, I need the RAM to run the eight processes. So it, it, it's more of just a purely a function of how much money am I going to have to throw at this problem.

[00:10:45] And what's it going to cost me in operational costs to operate this application in production?

When to use other languages?

[00:10:53] Jeremy: So during the. Keynote, uh, Matz had mentioned that Rails, is really suitable as this one person framework, like you can have a very small team or maybe even yourself and, and build this product. And so I guess from... Your perspective, once you cross a certain threshold, is like, what Ruby and what Sidekiq provides not enough, and that's why you need to start looking into other languages?

[00:11:24] Or like, where's the, turning point, or the, if you

[00:11:29] Mike: Right, right. The, it's all about the problem you're trying to solve, right? At the end of the day, uh, the, the question is just what are we trying to solve and how are we trying to solve it? So at a higher level, you got to think about the architecture. if the problem you're trying to solve, if the service you're trying to build, if the app you're trying to operate.

[00:11:51] If that doesn't really fall into the traditional Ruby application architecture, then you, you might look at it in another language or another ecosystem. something like Go, for instance, can compile down to a single binary, which makes deployment really easy. It makes shipping up a product. on to a user's machine, much simpler than deploying a Ruby application onto a user's desktop machine, for instance, right?

[00:12:22] Um, Ruby does have this, this problem of how do you package everything together and deploy it somewhere? Whereas Go, when you can just compile to a single binary, now you've just got a single thing. And it's just... Drop it on the file system and execute it. It's easy. So, um, different, different ecosystems have different application architectures, which empower different ways of solving the same problems.

[00:12:48] But, you know, Rails as a, as a one man framework, or sorry, one person framework, It, it, I don't, I don't necessarily, that's a, that's sort of a catchy marketing slogan, but I just think of Rails as the most productive framework you can use. So you, as a single person, you can maximize what you ship and the, the, the value that you can create because Rails is so productive.

[00:13:13] Jeremy: So it, seems like it's maybe the, the domain or the type of application you're making. Like you mentioned the command line application, because you want to be able to deliver it to your user easily. Just give them a binary, something like Go or perhaps Rust makes a lot more sense. and then I could see people saying that if you're doing something with machine learning, like the community behind Python, it's, they're just, they're all there.

[00:13:41] So

Room for more domains in Ruby

[00:13:41] Mike: That was exactly the example I was going to use also. Yeah, if you're doing something with data or AI, Python is going to be a more, a more traditional, natural choice. that doesn't mean Ruby can't do it. That doesn't mean, you wouldn't be able to solve the problem with Ruby. And, and there's, that just also means that there's more space for someone who wants to come in and make an impact in the Ruby community.

[00:14:03] Find a problem that Ruby's not really well suited to solving right now and build the tooling out there to, to try and solve it. You know, I, I saw a talk, from the fellow who makes the Glimmer gem, which is a native UI toolkit. Uh, a gem for building native UIs in Ruby, which Ruby traditionally can't do, but he's, he's done an amazing job at sort of surfacing APIs to build these, um, these native, uh, native applications, which I think is great.

[00:14:32] It's awesome. It's, it's so invigorating to see Ruby in a new space like that. Um, I talked to someone else who's doing the Polars gem, which is focused on data processing. So it kind of takes, um, Python and Pandas and brings that to Ruby, which is, is awesome because if you're a Ruby developer, now you've got all these additional tools which can allow you to solve new sets of problems out there.

[00:14:57] So that's, that's kind of what's exciting in the Ruby community right now is just bring it into new spaces.

Faktory

[00:15:03] Jeremy: In addition to Sidekiq, you have, uh, another product called Faktory, I believe. And so does that serve a, a similar purpose? Is that another job scheduling, job queueing system?

[00:15:16] Mike: It is, yes. And it's, it's, it's similar in a way to Sidekiq. It looks similar. It's got similar concepts at the core of it. At the end of the day, Sidekiq is limited to Ruby. Because Sidekiq executes in a Ruby VM, it executes the jobs, and the jobs are, have to be written in Ruby because you're running in the Ruby VM.

[00:15:38] Faktory was my attempt to bring, Sidekiq functionality to every other language. I wanted, I wanted Sidekiq for JavaScript. I wanted Sidekiq for Go. I wanted Sidekiq for Python because A, a lot of these other languages also could use a system, a background job system. And the problem though is that.

[00:16:04] As a single man, I can't port Sidekiq to every other language. I don't know all the languages, right? So, Faktory kind of changes the architecture and, um, allows you to execute jobs in any language. it, it replaces Redis and provides a server where you just fetch jobs, and you can use it from it.

[00:16:26] You can use that protocol from any language to, to build your own worker processes that execute jobs in whatever language you want.

[00:16:35] Jeremy: When you say it replaces Redis, so it doesn't use Redis, um, internally, it has its own.

[00:16:41] Mike: It does use Redis under the covers. Yeah, it starts Redis as a child process and, connects to it over a Unix socket. And so it's really stable. It's really fast. from the outside, the, the worker processes, they just talk to Faktory. They don't know anything about Redis at all.

[00:16:59] Jeremy: I see. And for someone who, like we mentioned earlier in the Python community, for example, there is, um, Celery. For someone who is using a task scheduler like that, what's the incentive to switch or use something different?

[00:17:17] Mike: Well, I, I always say if you're using something right now, I'm not going to try and convince you to switch necessarily. It's when you have pain that you want to switch and move away. Maybe you have Maybe there's capabilities in the newer system that you really need that the old system doesn't provide, but Celery is such a widely known system that I'm not necessarily going to try and convince people to move away from it, but if people are looking for a new system, one of the things that Celery does that Faktory does not do is Celery provides like data adapters for using store, lots of different storage systems, right?

[00:17:55] Faktory doesn't do that. Faktory is more, has more of the Rails mantra of, you know, Omakase where we choose, I choose to use Redis and that's it. You don't, you don't have a choice for what to use because who cares, you know, at the end of the day, let Faktory deal with it. it's, it's not something that, You should even necessarily be concerned about it.

[00:18:17] Just, just try Faktory out and see how it works for you. Um, so I, I try to take those operational concerns off the table and just have the user focus on, you know, usability, performance, and that sort of thing. but it is, it's, it's another background job system out there for people to try out and see if they like that.

[00:18:36] And, and if they want to, um, if they know Celery and they want to use Celery, more power to Faktory them.

Sidekiq (Ruby) or Faktory (Polyglot)

[00:18:43] Jeremy: And Sidekiq and Faktory, they serve a very similar purpose. For someone who they have a new project, they haven't chosen a job. scheduling system, if they were using Ruby, would it ever make sense for them to use Faktory versus use Sidekiq?

[00:19:05] Mike: Uh Faktory is excellent in a polyglot situation. So if you're using multiple languages, if you're creating jobs in Ruby, but you're executing them in Python, for instance, um, you know, if you've, I have people who are, Creating jobs in PHP and executing them in Python, for instance. That kind of polyglot scenario, Sidekiq can't do that at all.

[00:19:31] So, Faktory is useful there. In terms of Ruby, Ruby is just another language to Faktory. So, there is a Ruby API for using Faktory, and you can create and execute Ruby jobs with Faktory. But, you'll find that in the Ruby community, Sidekiq is much widely... much more widely used and understood and known. So if you're just using Ruby, I think, I think Sidekiq is the right choice.

[00:19:59] I wouldn't look at Faktory. But if you do need, find yourself needing that polyglot tool, then Faktory is there.

Temporal

[00:20:07] Jeremy: And this is maybe one, maybe one layer of abstraction higher, but there's a product called Temporal that has some of this job scheduling, but also this workflow component. I wonder if you've tried that out and how you think about that product?

[00:20:25] Mike: I've heard of them. I don't know a lot about the product. I do have a workflow API, the Sidekiq batches, which allow you to fan out jobs and then, and then execute callbacks when all the jobs in that, in that batch are done. But I don't, provide sort of a, a high level. Graphical Workflow Editor or anything like that.

[00:20:50] Those to me are more marketing tools that you use to sell the tool for six figures. And I don't think they're usable. And I don't think they're actually used day to day. I provide an API for developers to use. And developers don't like moving blocks of code around in a GUI. They want to write code. And, um, so yeah, temporal, I, like I said, I don't know much about them.

[00:21:19] I also, are they a venture capital backed startup?

[00:21:22] Jeremy: They are, is my understanding,

[00:21:24] Mike: Yeah, that, uh, any, any sort of venture capital backed startup, um, who's building technical infrastructure. I, I would look long and hard at, I'm, I think open source is the right core to build on. Of course I sell commercial software, but. I'm bootstrapped. I'm profitable.

[00:21:46] I'm going to be around forever. A VC backed startup, they tend to go bankrupt, because they either get big or they go out of business. So that would be my only comment is, is, be a little bit leery about relying on commercial venture capital based infrastructure for, for companies, uh, long term.

Getting people to pay for Sidekiq

[00:22:05] Jeremy: So I think that's a really interesting part about your business is that I think a lot of open source maintainers have a really big challenge figuring out how to make it as a living. The, there are so many projects that they all have a very permissive license and you can use them freely one example I can think of is, I, I talked with, uh, David Kramer, who's the CTO at Sentry, and he, I don't think they use it anymore, but they, they were using Nginx, right?

[00:22:39] And he's like, well, Nginx, they have a paid product, like Nginx. Plus that or something. I don't know what the name is, but he was like, but I'm not going to pay for it. Right. I'm just going to use the free one. Why would I, you know, pay for the, um, the paid thing? So I, I, I'm kind of curious from your perspective when you were coming up with Sidekiq both as an open source product, but also as a commercial one, how did you make that determination of like to make a product where it's going to be useful in its open source form?

[00:23:15] I can still convince people to pay money for it.

[00:23:19] Mike: Yeah, the, I was terrified, to be blunt, when I first started out. when I started the Sidekiq project, I knew it was going to take a lot of time. I knew if it was successful, I was going to be doing it for the next decade. Right? So I started in 2012, and here I am in 2023, over a decade, and I'm still doing it.

[00:23:38] So my expectation was met in that regard. And I knew I was not going to be able to last that long. If I was making zero dollars, right? You just, you burn out. Nobody can last that long. Well, I guess there are a few exceptions to that rule, but yeah, money, I tend to think makes things a little more sustainable for sure.

[00:23:58] Especially if you can turn it into a full time job solving and supporting a project that you, you love and, and is, is, you know, your, your, your baby, your child, so to speak, your software, uh, uh, creation that you've given to the world. but I was terrified. but one thing I did was at the time I was blogging a lot.

[00:24:22] And so I was telling people about Sidekiq. I was telling people what was to come. I was talking about ideas and. The one thing that I blogged about was financial experiments. I said bluntly to the, to, to the Ruby community, I'm going to be experimenting with financial stability and sustainability with this project.

[00:24:42] So not only did I create this open source project, but I was also publicly saying I I need to figure out how to make this work for the next decade. And so eventually that led to Sidekiq Pro. And I had to figure out how to build a closed source Ruby gem, which, uh, There's not a lot of, so I was kind of in the wild there.

[00:25:11] But, you know, thankfully all the pieces came together and it was actually possible. I couldn't have done it if it wasn't possible. Like, we would not be talking if I couldn't make a private gem. So, um, but it happened to work out. Uh, and it allowed me to, to gate features behind a paywall effectively. And, and yeah, you're right.

[00:25:33] It can be tough to make people pay for software. but I'm a developer who's selling to other developers, not, not just developers, open source developers, and they know that they have this financial problem, right? They know that there's this sustainability problem. And I was blunt in saying, this is my solution to my sustainability.

[00:25:56] So, I charge what I think is a very fair price. It's only a thousand dollars a year to a hobbyist. That may seem like a lot of money to a business. It's a drop in the bucket. So it was easy for developers to say, Hey, listen, we want to buy this tool for a thousand bucks. It'll ensure our infrastructure is maintained for the next decade.

[00:26:18] And it's, and it's. And it's relatively cheap. It's way less than, uh, you know, a salary or even a laptop. So, so that's, that's what I did. And, um, it's, it worked out great. People, people really understood. Even today, I talk to people and they say, we, we signed up for Sidekiq Pro to support you. So it's, it's, it's really, um, invigorating to hear people, uh, thank me and, and they're, they're actively happy that they're paying me and our customers.

[00:26:49] Jeremy: it's sort of, uh, maybe a not super common story, right, in terms of what you went through. Because when I think of open core businesses, I think of companies like, uh, GitLab, which are venture funded, uh, very different scenario there. I wonder, like, in your case, so you started in 2012, and there were probably no venture backed competitors, right?

[00:27:19] People saying that we're going to make this job scheduling system and some VC is going to give me five million dollars and build a team to work on this. It was probably at the time, maybe it was Rescue, which was...

[00:27:35] Mike: There was a venture backed system called IronMQ,

[00:27:40] Jeremy: Hmm.

[00:27:41] Mike: And I'm not sure if they're still around or not, but they... They took, uh, one or more funding rounds. I'm not sure exactly, but they were VC backed. They were doing, background jobs, scheduled jobs, uh, you know, running container, running container jobs. They, they eventually, I think, wound up sort of settling on Docker containers.

[00:28:06] They'll basically spin up a Docker container. And that container can do whatever it wants. It can execute for a second and then shut down, or it can run for, for however long, but they would, um, yeah, I, yeah, I'll, I'll stop there because I don't know the actual details of exactly their system, but I'm not sure if they're still around, but that's the only one that I remember offhand that was around, you know, years ago.

[00:28:32] Yeah, it's, it's mostly, you know, low level open source infrastructure. And so, anytime you have funded startups, they're generally using that open source infrastructure to build their own SaaS. And so SaaS's are the vast majority of where you see sort of, uh, commercial software.

[00:28:51] Jeremy: so I guess in that way it, it, it gave you this, this window or this area where you could come in and there wasn't, other than that iron, product, there wasn't this big money that you were fighting against. It was sort of, it was you telling people openly, I'm, I'm working on this thing.

[00:29:11] I need to make money so that I can sustain it. And, if you, yeah. like the work I do, then, you know, basically support me. Right. And, and so I think that, I'm wondering how we can reproduce that more often because when you see new products, a lot of times it is VC backed, right?

[00:29:35] Because people say, I need to work on this. I need to be paid. and I can't ask a team to do this. For nothing, right? So

[00:29:44] Mike: Yeah. It's. It's a wicked problem. Uh, it's a really, really hard problem to solve if you take vc you there, that that really kind of means that you need to be making tens if not hundreds of millions of dollars in sales. If you are building a small or relatively small. You know, put small in quotes there because I don't really know what that means, but if you have a small open source project, you can't charge huge amounts for it, right?

[00:30:18] I mean, Sidekiq is a, I would call a medium sized open source project, and I'm charging a thousand bucks for it. So if you're building, you know, I don't know, I don't even want to necessarily give example, but if you're building some open source project, and It's one of 300 libraries that people's applications will depend on.

[00:30:40] You can't necessarily charge a thousand dollars for that library. depending on the size and the capabilities, maybe you can, maybe you can't. But there's going to be a long tail of open source projects that just, they can't, they can't charge much, if anything, for them. So, unfortunately, we have, you know, these You kind of have two pathways.

[00:31:07] Venture capital, where you've got to sell a ton, or free. And I've kind of walked that fine line where I'm a small business, I can charge a small amount because I'm bootstrapped. And, and I don't need huge amounts of money, and I, and I have a project that is of the right size to where I can charge a decent amount of money.

[00:31:32] That means that I can survive with 500 or a thousand customers. I don't need to have a hundred million dollars worth of customers. Because I, you know, when I started the business, one of the constraints I said is I don't want to hire anybody. I'm just going to be solo. And part of the, part of my ability to keep a low price and, and keep running sustainably, even with just You know, only a few hundred customers is because I'm solo.

[00:32:03] I don't have the overhead of investors. I don't have the overhead of other employees. I don't have an office space. You know, my overhead is very small. So that is, um, you know, I just kind of have a unique business in that way, I guess you might say.

Keeping the business solo

[00:32:21] Jeremy: I think that's that's interesting about your business as well But the fact that you've kept it you've kept it solo which I would imagine in most businesses, they need support people. they need, developers outside of maybe just one. Um, there's all sorts of other, I don't think overhead is the right word, but you just need more people, right?

[00:32:45] And, and what do you think it is about Sidekiq that's made it possible for it to just be a one person operation?

[00:32:52] Mike: There's so much administrative overhead in a business. I explicitly create business policies so that I can run solo. you know, my support policy is officially you get one email ticket or issue per quarter. And, and anything more than that, I can bounce back and say, well, you're, you're requiring too much support.

[00:33:23] In reality, I don't enforce that at all. And people email me all the time, but, but things like. Things like dealing with accounting and bookkeeping and taxes and legal stuff, licensing, all that is, yeah, a little bit of overhead, but I've kept it as minimal as I can. And part of that is I don't want to hire another employee because then that increases the administrative overhead that I have.

[00:33:53] And Sidekiq is so tied to me and my knowledge that if I hire somebody, they're probably not going to know Ruby and threading and all the intricate technical detail necessary to build and maintain and support the system. And so really you'll kind of regress a little bit. We won't be able to give as good support because I'm busy helping that other employee.

Being selective about customers

[00:34:23] Mike: So, yeah, it's, it's a tightrope act where you've got to really figure out how can I scale myself as far as possible without overwhelming myself. The, the overwhelming thing that I have that I've never been able to solve. It's just dealing with billing inquiries, customers, companies, emailing me saying, how do we buy this thing?

[00:34:46] Can I get an invoice? Every company out there, it seems wants an invoice. And the problem with invoicing is it takes a lot more. manual labor and administrative overhead to issue that invoice to collect payment on the invoice. So that's one of the reasons why I have a very strict policy about credit card only for, for the vast majority of my customers.

[00:35:11] And I demand that companies pay a lot more. You have to have a pretty big enterprise license if you want an invoice. And if the company, if the company comes back and complains and says, well, you know, that's ridiculous. We don't, we don't want to pay that much. We don't need it that much. Uh, you know, I, I say, okay, well then you have two, two things, two, uh, two things.

[00:35:36] You can either pay with a credit card or you can not use Sidekiq. Like, that's, that's it. I'm, I don't need your money. I don't want the administrative overhead of dealing with your accounting department. I just want to support my, my customers and build my software. And, and so, yeah, I don't want to turn into a billing clerk.

[00:35:55] So sometimes, sometimes the, the, the best thing in business that you can do is just say no.

[00:36:01] Jeremy: That's very interesting because I think being a solo... Person is what probably makes that possible, right? Because if you had the additional staff, then you might say like, Well, I need to pay my staff, so we should be getting, you know, as much business as

[00:36:19] Mike: Yeah. Chasing every customer you can, right. But yeah.

[00:36:22] Every customer is different. I mean, I have some customers that just, they never contact me. They pay their bill really fast or right on time. And they're paying me, you know, five figures, 20, a year. And they just, it's a, God bless them because those are, are the.

[00:36:40] Best customers to have and the worst customers are the ones who are paying 99 bucks a month and everything that they don't understand or whatever is a complaint. So sometimes, sometimes you, you want to, vet your customers from that perspective and say, which one of these customers are going to be good?

[00:36:58] Which ones are going to be problematic?

[00:37:01] Jeremy: And you're only only person... And I'm not sure how many customers you have, but

[00:37:08] Mike: I have 2000

[00:37:09] Jeremy: 2000 customers.

[00:37:10] Okay.

[00:37:11] Mike: Yeah.

[00:37:11] Jeremy: And has that been relatively stable or has there been growth

[00:37:16] Mike: It's been relatively stable the last couple of years. Ruby has, has sort of plateaued. Um, it's, you don't see a lot of growth. I'm getting probably, um, 15, 20 percent growth maybe. Uh, so I'm not growing like a weed, like, you know, venture capital would want to see, but steady incremental growth is, is, uh, wonderful, especially since I do very little.

[00:37:42] Sales and marketing. you know, I come to RubyConf I, I I tweet out, you know, or I, I toot out funny Mastodon Toots occasionally and, and, um, and, and put out new releases of the software. And, and that's, that's essentially my, my marketing. My marketing is just staying in front of developers and, and, and being a presence in the Ruby community.

[00:38:06] But yeah, it, it's, uh. I, I, I see not a, not a huge amount of churn, but I see enough sales to, to, to stay up and keep my head above water and to keep growing, um, slowly but surely.

Support needs haven't grown

[00:38:20] Jeremy: And as you've had that steady growth, has the support burden not grown with it?

[00:38:27] Mike: Not as much because once customers are on Sidekiq and they've got it working, then by and large, you don't hear from them all that much. There's always GitHub issues, you know, customers open GitHub issues. I love that. but yeah, by and large, the community finds bugs. and opens up issues. And so things remain relatively stable.

[00:38:51] I don't get a lot of the complete newbie who has no idea what they're doing and wants me to, to tell them how to use Sidekiq that I just don't see much of that at all. Um, I have seen it before, but in that case, generally, I, I, I politely tell that person that, listen, I'm not here to educate you on the product.

[00:39:14] It's there's documentation in the wiki. Uh, and there's tons of, of more Ruby, generic Ruby, uh, educational material out there. That's just not, not what I do. So, so yeah, by and large, the support burden is, is not too bad because once people are, are up and running, it's stable and, and they don't, they don't need to contact me.

[00:39:36] Jeremy: I wonder too, if that's perhaps a function of the price, because if you're a. new developer or someone who's not too familiar with how to do job processing or what they want to do when you, there is the open source product, of course. but then the next step up, I believe is about a hundred dollars a month.

[00:39:58] And if you're somebody who is kind of just getting started and learning how things work, you're probably not going to pay that, is my guess. And so you'll never hear from them.

[00:40:11] Mike: Right, yeah, that's a good point too, is the open source version, which is what people inevitably are going to use and integrate into their app at first. Because it's open source, you're not going to email me directly, um, and when people do email me directly, Sidekiq support questions, I do, I reply literally, I'm sorry I don't respond to private email, unless you're a customer.

[00:40:35] Please open a GitHub issue and, um, that I try to educate both my open source users and my commercial customers to try and stay in GitHub issues because private email is a silo, right? Private email doesn't help anybody else but them. If I can get people to go into GitHub issues, then that's a public record.

[00:40:58] that people can search. Because if one person has that problem, there's probably a dozen other people that have that same problem. And then that other, those other 11 people can search and find the solution to their problem at four in the morning when I'm asleep. Right? So that's, that's what I'm trying to do is, is keep, uh, keep everything out in the open so that people can self service as much as possible.

Sidekiq open source

[00:41:24] Jeremy: And on the open source side, are you still primarily the main contributor? Or do you have other people that are

[00:41:35] Mike: I mean, I'd say I do 90 percent of the work, which is why I don't feel guilty about keeping 100 percent of the money. A lot of open source projects, when they look for financial sustainability, they also look for how can we split this money amongst the team. And that's, that's a completely different topic that I've.

[00:41:55] is another reason why I've stayed solo is if I hire an employee and I pay them 200, 000 a year as a developer, I'm meanwhile keeping all the rest of the profits of the company. And so that almost seems a little bit unfair. because we're both still working 40 hours a week, right? Why am I the one making the vast majority of the, of the profit and the money?

[00:42:19] Um, so, uh, I've always, uh, that's another reason why I've stayed solo, but, but yeah, having a team of people working on something, I do get, regular commits, regular pull requests from people, fixing a bug that they found or just making a tweak that. that they saw, that they thought they could improve.

[00:42:42] A little more rarely I get a significant improvement or feature, as a pull request. but Sidekiq is so stable these days that it really doesn't need a team of people maintaining it. The volume of changes necessary, I can easily keep up with that. So, I'm still doing 90 95 percent of the work.

Are there other Sidekiq-like opportunities out there?

[00:43:07] Jeremy: Yeah, so I think Sidekiq has sort of a unique positioning where it's the code base itself is small enough where you can maintain it yourself and you have some help, but primarily you're the main maintainer. And then you have enough customers who are willing to, to pay for the benefit it gives them on top of what the open source product provides.

[00:43:36] cause it's, it's, you were talking about how. Every project people work on, they have, they could have hundreds of dependencies, right? And to ask somebody to, to pay for each of them is, is probably not ever going to happen. And so it's interesting to think about how you have things like, say, you know, OpenSSL, you know, it's a library that a whole bunch of people rely on, but nobody is going to pay a monthly fee to use it.

[00:44:06] You have things like, uh, recently there was HashiCorp with Terraform, right? They, they decided to change their license because they, they wanted to get, you know, some of that value back, some of the money back, and the community basically revolted. Right? And did a fork. And so I'm kind of curious, like, yeah, where people can find these sweet spots like, like Sidekiq, where they can find this space where it's just small enough where you can work on it on your own and still get people to pay for it.

[00:44:43] It's, I'm trying to picture, like, where are the spaces?

Open source as a public utility

[00:44:48] Mike: We need to look at other forms of financing beyond pure capitalism. If this is truly public infrastructure that needs to be maintained for the long term, then why are we, why is it that we depend on capitalism to do that? Our roads, our water, our sewer, those are not Capitalist, right? Those are utilities, that's public infrastructure that we maintain, that the government helps us maintain.

[00:45:27] And in a sense, tech infrastructure is similar or could be thought of in a similar fashion. So things like Open Collective, things like, uh, there's a, there's a organization in Europe called NLNet, I think, out of the Netherlands. And they do a lot of grants to various open source projects to help them improve the state of digital infrastructure.

[00:45:57] They support, for instance, Mastodon as a open source project that doesn't have any sort of corporate backing. They see that as necessary social media infrastructure, uh, for the long term. And, and I, and I think that's wonderful. I like to see those new directions being explored where you don't have to turn everything into a product, right?

[00:46:27] And, and try and market and sale, um, and, and run ads and, and do all this stuff. If you can just make the case that, hey, this is, this is useful public infrastructure that so many different, um, Technical, uh, you know, applications and businesses could rely on, much like FedEx and DHL use our roads to the benefit of their own, their own corporate profits.

[00:46:53] Um, why, why, why shouldn't we think of tech infrastructure sort of in a similar way? So, yeah, I would like to see us explore more. in that direction. I understand that in America that may not happen for quite a while because we are very, capitalist focused, but it's encouraging to see, um, places like Europe, uh, a little more open to, to trialing things like, cooperatives and, and grants and large long term grants to, to projects to see if they can, uh, provide sustainability in, in, you know, in a new way.

[00:47:29] Jeremy: Yeah, that's a good point because I think right now, a lot of the open source infrastructure that we all rely on, either it's being paid for by large companies and at the whim of those large companies, if Google decides we don't want to pay for you to work on this project anymore, where does the money come from?

[00:47:53] Right? And on the other hand, there's the thousands, tens of thousands of people who are doing it. just for free out of the, you know, the goodness of their, their heart. And that's where a lot of the burnout comes from. Right. So I think what you're saying is that perhaps a lot of these pieces that we all rely on, that our, our governments, you know, here in the United States, but also around the world should perhaps recognize as this is, like you said, this is infrastructure, and we should be.

[00:48:29] Paying these people to keep the equivalent of the roads and, and, uh, all that working.

[00:48:37] Mike: Yeah, I mean, I'm not, I'm not claiming that it's a perfect analogy. There's, there's, there's lots of questions that are unanswered in that, right? How do you, how do you ensure that a project is well maintained? What does that even look like? What does that mean? you know, you can look at a road and say, is it full of potholes or is it smooth as glass, right?

[00:48:59] It's just perfectly obvious, but to a, to a digital project, it's, it's not as clear. So, yeah, but, but, but exploring those new ways because turning everybody into a businessman so that they can, they can keep their project going, it, it, it itself is not sustainable, right? so yeah, and that's why everything turns into a SaaS because a SaaS is easy to control.

[00:49:24] It's easy to gatekeep behind a paywall and it's easy to charge for, whereas a library on GitHub. Yeah. You know, what do you do there? You know, obviously GitHub has sponsors, the sponsors feature. You've got Patreon, you've got Open Collective, you've got Tidelift. There's, there's other, you know, experiments that have been run, but nothing has risen to the top yet.

[00:49:47] and it's still, it's still a bit of a grind. but yeah, we'll see, we'll see what happens, but hopefully people will keep experimenting and, and maybe, maybe governments will start. Thinking in the direction of, you know, what does it mean to have a budget for digital infrastructure maintenance?

[00:50:04] Jeremy: Yeah, it's interesting because we, we started thinking about like, okay, where can we find spaces for other Sidekiqs? But it sounds like maybe, maybe that's just not realistic, right? Like maybe we need more of a... Yeah, a rethinking of, I guess the, the structure of how people get funded. Yeah.

[00:50:23] Mike: Yeah, sometimes the best way to solve a problem is to think at a higher level. You know, we, the, the sustainability problem in American Silicon Valley based open source developers is naturally going to tend toward venture capital and, and capitalism. And I, you know, I think, I think that's, uh, extremely problematic on a, on a lot of different, in a lot of different ways.

[00:50:47] And, and so sometimes you need to step back and say, well, maybe we're, maybe we just don't have the right tool set to solve this problem. But, you know, I, I. More than that, I'm not going to speculate on because it is a wicked problem to solve.

[00:51:04] Jeremy: Is there anything else you wanted to, to mention or thought we should have talked about?

[00:51:08] Mike: No, I, I, I loved the talk, of sustainability and, and open source. And I, it's, it's a, it's a topic really dear to my heart, obviously. So I, I am happy to talk about it at length with anybody, anytime. So thank you for having me.

[00:51:25] Jeremy: All right. Thank you very much, Mike.

Sara Jackson on Teaching in Kanazawa (RubyConf 2023)

Sat, 18 Nov 2023 18:35:26 -0000

Sara is a team lead at thoughtbot.

She talks about her experience as a professor at Kanazawa Technical College, giant LAN parties in Rochester, transitioning from Java to Ruby, shining a light on maintainers, and her closing thoughts on RubyConf.

Recorded at RubyConf 2023 in San Diego.

A few topics covered:

Being an Assistant Arofessor in Kanazawa
Teaching naming, formatting, and style
Differences between students in Japan vs US
Technical terms and programming resources in Japanese
LAN parties at Rochester
Transitioning from Java to Ruby
Consulting
The forgotten maintainer
RubyConf

Japan

Rochester

Transcript

You can help correct transcripts on GitHub.

[00:00:00] Jeremy: I'm here at RubyConf, San Diego, with Sara Jackson, thank you for joining me today.

[00:00:05] Sara: Thank you for having me. Happy to be here.

[00:00:07] Jeremy: Sara right now you're working at, ThoughtBot, as a, as a Ruby developer, is that right?

[00:00:12] Sara: Yes, that is correct.

Teaching in Japan

[00:00:14] Jeremy: But I think before we kind of talk about that, I mean, we're at a Ruby conference, but something that I, I saw, on your LinkedIn that I thought was really interesting was that you were teaching, I think, programming in. Kanazawa, for a couple years.

[00:00:26] Sara: Yeah, that's right. So for those that don't know, Kanazawa is a city on the west coast of Japan. If you draw kind of a horizontal line across Japan from Tokyo, it's, it's pretty much right there on the west coast. I was an associate professor in the Global Information and Management major, which is basically computer science or software development. (laughs) Yep.

[00:00:55] Jeremy: Couldn't tell from the title.

[00:00:56] Sara: You couldn't. No.. so there I was teaching classes for a bunch of different languages and concepts from Java to Python to Unix and Bash scripting, just kind of all over.

[00:01:16] Jeremy: And did you plan the curriculum yourself, or did they have anything for you?

[00:01:21] Sara: It depended on the class that I was teaching. So some of them, I was the head teacher. In that case, I would be planning the class myself, the... lectures the assignments and grading them, et cetera. if I was assisting on a class, then usually it would, I would be doing grading and then helping in the class. Most of the classes were, uh, started with a lecture and then.

Followed up with a lab immediately after, in person.

[00:01:54] Jeremy: And I think you went to, is it University of Rochester?

[00:01:58] Sara: Uh, close. Uh, Rochester Institute of Technology. So, same city. Yeah.

[00:02:03] Jeremy: And so, you were studying computer science there, is that right?

[00:02:07] Sara: I, I studied computer science there, but I got a minor in Japanese language. and that's how, that's kind of my origin story of then teaching in Kanazawa. Because Rochester is actually the sister city with Kanazawa. And RIT has a study abroad program for Japanese learning students to go study at KIT, Kanazawa Institute of Technology, in Kanazawa, do a six week kind of immersive program.

And KIT just so happens to be under the same board as the school that I went to teach at.

[00:02:46] Jeremy: it's great that you can make that connection and get that opportunity, yeah.

[00:02:49] Sara: Absolutely. Networking!

[00:02:52] Jeremy: And so, like, as a student in Rochester, you got to see how, I suppose, computer science education was there. How did that compare when you went over to Kanazawa?

[00:03:02] Sara: I had a lot of freedom with my curriculum, so I was able to actually lean on some of the things that I learned, some of the, the way that the courses were structured that I took, I remember as a freshman in 2006, one of the first courses that we took, involved, learning Unix, learning the command line, things like that.

I was able to look up some of the assignments and some of the information from that course that I took to inform then my curriculum for my course,

[00:03:36] Jeremy: That's awesome. Yeah. and I guess you probably also remember how you felt as a student, so you know like what worked and maybe what didn't.

[00:03:43] Sara: Absolutely. And I was able to lean on that experience as well as knowing. What's important and what, as a student, I didn't think was important.

Naming, formatting, and style

[00:03:56] Jeremy: So what were some examples of things that were important and some that weren't?

[00:04:01] Sara: Mm hmm. For Java in particular, you don't need any white space between any of your characters, but formatting and following the general Guidelines of style makes your code so much easier to read. It's one of those things that you kind of have to drill into your head through muscle memory. And I also tried to pass that on to my students, in their assignments that it's.

It's not just to make it look pretty. It's not just because I'm a mean teacher. It is truly valuable for future developers that will end up reading your code.

[00:04:39] Jeremy: Yeah, I remember when I went through school. The intro professor, they would actually, they would print out our code and they would mark it up with red pen, basically like a writing assignment and it would be like a bad variable name and like, white space shouldn't be here, stuff like that. And, it seems kind of funny now, but, it actually makes it makes a lot of sense.

[00:04:59] Sara: I did that.

[00:04:59] Jeremy: Oh, nice.

[00:05:00] Sara: I did that for my students. They were not happy about it. (laughs)

[00:05:04] Jeremy: Yeah, at that time they're like, why are you like being so picky, right?

[00:05:08] Sara: Exactly. But I, I think back to my student, my experience as a student. in some of the classes I've taken, not even necessarily computer related, the teachers that were the sticklers, those lessons stuck the most for me. I hated it at the time. I learned a lot.

[00:05:26] Jeremy: Yeah, yeah. so I guess that's an example of things that, that, that matter. The, the aesthetics or the visual part for understanding. What are some things that they were teaching that you thought like, Oh, maybe this isn't so important.

[00:05:40] Sara: Hmm.

Pause for effect. (laughs) So I think that there wasn't necessarily Any particular class or topic that I didn't feel was as valuable, but there was some things that I thought were valuable that weren't emphasized very well. One of the things that I feel very strongly about, and I'm sure those of you out there can agree. in RubyWorld, that naming is important.

The naming of your variables is valuable. It's useful to have something that's understood. and there were some other teachers that I worked with that didn't care so much in their assignments. And maybe the labs that they assigned had less than useful names for things. And that was kind of a disappointment for me.

[00:06:34] Jeremy: Yeah, because I think it's maybe hard to teach, a student because a lot of times you are writing these short term assignments and you have it pass the test or do the thing and then you never look at it again.

[00:06:49] Sara: Exactly.

[00:06:50] Jeremy: So you don't, you don't feel that pain. Yeah,

[00:06:53] Sara: Mm hmm. But it's like when you're learning a new spoken language, getting the foundations correct is super valuable.

[00:07:05] Jeremy: Absolutely. Yeah. And so I guess when you were teaching in Kanazawa, was there anything you did in particular to emphasize, you know, these names really matter because otherwise you or other people are not going to understand what you were trying to do here?

[00:07:22] Sara: Mm hmm. When I would walk around class during labs, kind of peek over the shoulders of my students, look at what they're doing, it's... Easy to maybe point out at something and be like, well, what is this? I can't tell what this is doing. Can you tell me what this does? Well, maybe that's a better name because somebody else who was looking at this, they won't know, I don't know, you know, it's in your head, but you will not always be working solo.

my school, a big portion of the students went on to get technical jobs from after right after graduating. it was when you graduated from the school that I was teaching at, KTC, it was the equivalent of an associate's degree. Maybe 50 percent went off to a tech job. Maybe 50 percent went on to a four year university.

And, and so as students, it hadn't. Connected with them always yet that oh, this isn't just about the assignment. This is also about learning how to interact with my co workers in the future.

Differences between students

[00:08:38] Jeremy: Yeah, I mean, I think It's hard, but, group projects are kind of always, uh, that's kind of where you get to work with other people and, read other people's code, but there's always that potential imbalance of where one person is like, uh, I know how to do this.

I'll just do it. Right? So I'm not really sure how to solve that problem. Yeah.

[00:09:00] Sara: Mm hmm. That's something that I think probably happens to some degree everywhere, but man, Japan really has groups, group work down. They, that's a super generalization. For my students though, when you would put them in a group, they were, they were usually really organized about who was going to do what and, kept on each other about doing things

maybe there were some students that were a little bit more slackers, but it was certainly not the kind of polarized dichotomy you would usually see in an American classroom.

[00:09:39] Jeremy: Yeah. I've been on both sides. I've been the person who did the work and the slacker.

[00:09:44] Sara: Same.

[00:09:46] Jeremy: And, uh, I feel bad about it now, but, uh,

[00:09:50] Sara: We did what we had to do.

[00:09:52] Jeremy: We all got the degree, so we're good. that is interesting, though. I mean, was there anything else, like, culturally different, you felt, from, you know, the Japanese university?

[00:10:04] Sara: Yes. Absolutely. A lot of things. In American university, it's kind of the first time in a young person's life, usually, where they have the freedom to choose what they learn, choose where they live, what they're interested in. And so there's usually a lot of investment in your study and being there, being present, paying attention to the lecture.

This is not to say that Japanese college students were the opposite. But the cultural feeling is college is your last time to have fun before you enter the real world of jobs and working too many hours. And so the emphasis on paying Super attention or, being perfect in your assignments. There was, there was a scale.

There were some students that were 100 percent there. And then there were some students that were like, I'm here to get a degree and maybe I'm going to sleep in class a little bit. (laughs) That is another major difference, cultural aspect. In America, if you fall asleep in a meeting, you fall asleep in class, super rude.

Don't do it. In Japan, if you take a nap at work, you take a nap in class, not rude. It's actually viewed as a sign of you are working really hard. You're usually working maybe late into the night. You're not getting enough sleep. So the fact that you need to take maybe a nap here or two here or there throughout the day means that you have put dedication in.

[00:11:50] Jeremy: Even if the reason you're asleep is because you were playing games late at night.

[00:11:54] Sara: Yep.

[00:11:55] Jeremy: But they don't know that.

[00:11:56] Sara: Yeah. But it's usually the case for my students.

[00:11:59] Jeremy: Okay. I'm glad they were having fun at least

[00:12:02] Sara: Me too.

Why she moved back

[00:12:04] Jeremy: That sounds like a really interesting experience. You did it for about two years? Three years.

[00:12:12] Sara: So I had a three year contract with an option to extend up to five, although I did have a There were other teachers in my same situation who were actually there for like 10 years, so it was flexible.

[00:12:27] Jeremy: Yeah. So I guess when you made the decision to, to leave, what was sort of your, your thinking there?

[00:12:35] Sara: My fiance was in America

[00:12:37] Jeremy: Good.

[00:12:37] Sara: he didn't want to move to Japan

[00:12:39] Jeremy: Good, reason.

[00:12:39] Sara: Yeah, he was waiting three years patiently for me.

[00:12:44] Jeremy: Okay. Okay. my heart goes out there . He waited patiently.

[00:12:49] Sara: We saw each other. We, we were very lucky enough to see each other every three or four months in person. Either I would visit America or he would come visit me in Kanazawa.

[00:12:59] Jeremy: Yeah, yeah. You, you couldn't convince him to, to fall in love with the country.

[00:13:03] Sara: I'm getting there

[00:13:04] Jeremy: Oh, you're getting Oh,

[00:13:05] Sara: it's, We're making, we're making way.

[00:13:07] Jeremy: Good, that's good. So are you taking like, like yearly trips or something, or?

[00:13:11] Sara: That was, that was always my intention when I moved back so I moved back in the Spring of 2018 to America and I did visit. In 2019, the following year, so I could attend the graduation ceremony for the last group of students that I taught.

[00:13:26] Jeremy: That's so sweet.

[00:13:27] Sara: And then I had plans to go in 2020. We know what happened in 2020

[00:13:32] Jeremy: Yeah.

[00:13:33] Sara: The country did not open to tourism again until the fall of 2022.

But I did just make a trip last month.

[00:13:40] Jeremy: Nice

[00:13:40] Sara: To see some really good friends for the first time in four years.

[00:13:43] Jeremy: Amazing, yeah. Where did you go?

[00:13:46] Sara: I did a few days in Tokyo. I did a few days in Niigata cause I was with a friend who studied abroad there. And then a few days in Kanazawa.

[00:13:56] Jeremy: That's really cool, yeah. yeah, I had a friend who lived there, but they were teaching English, yeah. And, I always have a really good time when I'm out there, yeah.

[00:14:08] Sara: Absolutely. If anyone out there visiting wants to go to Japan, this is your push. Go do it. Reach out to me on LinkedIn. I will help you plan.

[00:14:17] Jeremy: Nice, nice. Um, yeah, I, I, I would say the same. Like, definitely, if you're thinking about it, go. And, uh, sounds like Sara will hook you up.

[00:14:28] Sara: Yep, I'm your travel guide.

Technical terms in Japanese

[00:14:31] Jeremy: So you, you studied, uh, you, you said you had a minor in Japanese? Yeah. So, so when you were teaching there, were you teaching classes in English or was it in Japanese?

[00:14:42] Sara: It was a mix. Uh, when I was hired, the job description was no Japanese needed. It was a very, like, Global, international style college, so there was a huge emphasis on learning English. They wanted us to teach only in English. My thought was, it's hard enough learning computer science in your native language, let alone a foreign language, so my lectures were in English, but I would assist the labs in japanese

[00:15:14] Jeremy: Oh, nice. Okay. And then, so you were basically fluent then at the time. Middle. Okay. Yeah. Yeah. Hey, well, I think if you're able to, to help people, you know, in labs and stuff, and it's a technical topic, right? So that's gotta be kind of a, an interesting challenge

[00:15:34] Sara: I did learn a lot of new computer vocabulary. Yes.

[00:15:39] Jeremy: So the words are, like, a lot of them are not the same? Or, you know, for, for specifically related to programming, I guess.

[00:15:46] Sara: Hmm. Yeah, there are Japanese specific words. There's a lot of loan words that we use. We. Excuse me. There's a lot of loan words that Japanese uses for computer terms, but there's plenty that are just in Japanese. For example, uh, an array is hairetsu.

[00:16:08] Jeremy: Okay.

[00:16:08] Sara: And like a screen or the display that your monitor is a gamen,

but a keyboard would be keyboard... Kībōdo, probably.

[00:16:20] Jeremy: Yeah. So just, uh, so that, they use that as a loan word, I guess. But I'm not sure why not the other two.

[00:16:27] Sara: Yeah, it's a mystery.

[00:16:29] Jeremy: So it's just, it's just a total mix. Yeah. I'm just picturing you thinking like, okay, is it the English word or is it the Japanese word? You know, like each time you're thinking of a technical term. Yeah.

[00:16:39] Sara: Mm hmm. I mostly, I, I I went to the internet. I searched for Japanese computer term dictionary website, and kind of just studied the terms. I also paid a lot of attention to the Japanese professors when they were teaching, what words they were using. Tried to integrate.

Also, I was able to lean on my study abroad, because it was a technical Japanese, like there were classes that we took that was on technical Japanese.

Computer usage, and also eco technology, like green technology. So I had learned a bunch of them previously.

[00:17:16] Jeremy: Mm. So was that for like a summer or a year or something

[00:17:20] Sara: It was six weeks

[00:17:21] Jeremy: Six weeks.

[00:17:21] Sara: During the summer,

[00:17:22] Jeremy: Got it. So that's okay. So like, yeah, that must have been an experience like going to, I'm assuming that's the first time you had been

[00:17:30] Sara: It was actually the second time

[00:17:31] Jeremy: The second

[00:17:32] Sara: Yeah. That was in 2010 that I studied abroad.

[00:17:35] Jeremy: And then the classes, they were in Japanese or? Yeah.

Yeah. That's, uh, that's, that's full immersion right there.

[00:17:42] Sara: It was, it was very funny in the, in the very first lesson of kind of just the general language course, there was a student that was asking, I, how do I say this? I don't know this. And she was like, Nihongo de.

[00:17:55] Jeremy: Oh (laughs) !

[00:17:56] Sara: You must, must ask your question only in

[00:17:59] Jeremy: Yeah,

Programming resources in Japanesez

[00:17:59] Jeremy: yeah. yeah. That's awesome. So, so it's like, I guess the, the professors, they spoke English, but they were really, really pushing you, like, speak Japanese. Yeah, that's awesome. and maybe this is my bias because I'm an English native, but when you look up. Resources, like you look up blog posts and Stack Overflow and all this stuff.

It's all in English, right? So I'm wondering for your, your students, when, when they would search, like, I got this error, you know, what do I do about it? Are they looking at the English pages or are they, you know, you know what I mean?

[00:18:31] Sara: There are Japanese resources that they would use. They love Guguru (Google) sensei.

[00:18:36] Jeremy: Ah okay. Okay.

[00:18:38] Sara: Um, but yeah, there are plenty of Japanese language stack overflow equivalents. I'm not sure if they have stack overflow specifically in Japanese. But there are sites like that, that they, that they used. Some of the more invested students would also use English resources, but that was a minority.

[00:19:00] Jeremy: Interesting. So there's a, there's a big enough community, I suppose, of people posting and answering questions and stuff where it's, you don't feel like, there aren't people doing the same thing as you out there.

[00:19:14] Sara: Absolutely. Yeah. There's, a large world of software development in Japan, that we don't get to hear. There are questions and answers over here because of that language barrier.

[00:19:26] Jeremy: Yeah. I would be, like, kind of curious to, to see, the, the languages and the types of problems they have, if they were similar or if it's, like, I don't know, just different.

[00:19:38] Sara: Yeah, now I'm interested in that too, and I bet you there is a lot of research that we could do on Ruby,

since Ruby is Japanese.

[00:19:51] Jeremy: Right. cause something I've, I've often heard is that, when somebody says they're working with Ruby, Here in, um, the United States, a lot of times people assume it's like, Oh, you're doing a Rails app,

[00:20:02] Sara: Mm hmm.

[00:20:03] Jeremy: Almost, almost everybody who's using Ruby, not everyone, but you know, the majority I think are using it because of Rails.

And I've heard that in Japan, there's actually a lot more usage that's, that's not tied to Rails.

[00:20:16] Sara: I've also heard that, and I get the sense of that from RubyKaigi as well. Which I have never been lucky enough to attend. But, yeah, the talks that come out of RubyKaigi, very technical, low to the metal of Ruby, because there's that community that's using it for things other than Rails, other than web apps.

[00:20:36] Jeremy: Yeah, I think, one of the ones, I don't know if it was a talk or not, but, somebody was saying that there is Ruby in space.

[00:20:42] Sara: That's awesome. Ruby's everywhere.

LAN parties in college

[00:20:44] Jeremy: So yeah, I guess like another thing I saw, during your time at Rochester is you were, involved with like, there's like a gaming club I wonder if you could talk a little bit about your experience with that.

[00:20:55] Sara: Absolutely, I can. So, at RIT, I was an executive board member for three or four years at the Electronic Gaming Society. EGS for short, uh, we hosted weekly console game nights in, the student alumni union area, where there's open space, kind of like a cafeteria. We also hosted quarterly land parties, and we would actually get people from out of state sometimes who weren't even students to come.

Uh, and we would usually host the bigger ones in the field house, which is also where concerts are held.

And we would hold the smaller ones in conference rooms. I think when I started in 2006, the, the, the LANs were pretty small, maybe like 50, 50 people bring your, your, your huge CRT monitor tower in.

[00:21:57] Jeremy: Oh yeah,

[00:21:57] Sara: In And then by the time I left in 2012. we were over 300 people for a weekend LAN party, um, and we were actually drawing more power than concerts do.

[00:22:13] Jeremy: Incredible. what were, what were people playing at the time? Like when they would the LANs like,

[00:22:18] Sara: Yep. Fortnite, early League of Legends, Call of Duty. Battlegrounds. And then also just like fun indie games like Armagedtron, which is kind of like a racing game in the style of

[00:22:37] Jeremy: okay. Oh, okay,

[00:22:39] Sara: Um, any, there are some like fun browser games where you could just mess with each other. Jackbox. Yeah.

[00:22:49] Jeremy: Yeah, it's, it's interesting that, you know, you're talking about stuff like Fortnite and, um, what is it? Battlegrounds is

[00:22:55] Sara: not Fortnite. Team Fortress.

[00:22:58] Jeremy: Oh Team Fortress!

[00:22:59] Sara: Sorry. Yeah.

Oh, yeah, I got my, my names mixed up. Fortnite, I think, did not exist at the time, but Team Fortress was big.

[00:23:11] Jeremy: Yeah.

that's really cool that you're able to get such a big group there. is there something about Rochester, I guess, that that was able to bring together this many people for like these big LAN events? Because I'm... I mean, I'm not sure how it is elsewhere, but I feel like that's probably not what was happening elsewhere in the country.

[00:23:31] Sara: Yeah, I mean, if you've ever been to, um, DreamHack, that's, that's a huge LAN party and game convention, that's fun. so... EGS started in the early 2000s, even before I joined, and was just a committed group of people. RIT was a very largely technical school. The majority of students were there for math, science, engineering, or they were in the computer college,

[00:24:01] Jeremy: Oh, okay.

[00:24:01] Sara: GCIS, G C C I S, the Gossano College of Computing and Information Sciences.

So there was a lot of us there.

[00:24:10] Jeremy: That does make sense. I mean, it's, it's sort of this, this bias that when there's people doing, uh, technical stuff like software, um, you know, and just IT,

[00:24:21] Sara: Mm

hmm.

[00:24:23] Jeremy: there's kind of this assumption that's like, oh, maybe they play games. And it seems like that was accurate

[00:24:27] Sara: It was absolutely accurate. And there were plenty of people that came from different majors. but when I started, there were 17, 000 students and so that's a lot of students and obviously not everyone came to our weekly meetings, but we had enough dedicated people that were on the eboard driving, You know, marketing and advertising for, for our events and things like that, that we were able to get, the good community going.

I, I wasn't part of it, but the anime club at RIT is also huge. They run a convention every year that is huge, ToraCon, um. And I think it's just kind of the confluence of there being a lot of geeks and nerds on campus and Rochester is a college town. There's maybe like 10 other universities in

[00:25:17] Jeremy: Well, sounds like it was a good time.

[00:25:19] Sara: Absolutely would recommend.

Strong Museum of Play

[00:25:22] Jeremy: I've never, I've never been, but the one thing I have heard about Rochester is there's the, the Strong Museum of Play.

[00:25:29] Sara: Yeah, that place is so much fun, even as an adult. It's kind of like, um, the, the Children's Museum in Indiana for, for those that might know that. it just has all the historical toys and pop culture and interactive exhibits. It's so fun.

[00:25:48] Jeremy: it's not quite the same, but it, when you were mentioning the Children's Museum in, um, I think it's in St. Louis, there's, uh, it's called the City Museum and it's like a, it's like a giant playground, you know, indoors, outdoors, and it's not just for kids,

right? And actually some of this stuff seems like kind of sketch in terms of like, you could kind of hurt yourself, you

know, climbing

[00:26:10] Sara: When was this made?

[00:26:12] Jeremy: I'm not sure, but,

uh,

[00:26:14] Sara: before regulations maybe. ha.

[00:26:16] Jeremy: Yeah. It's, uh, but it's really cool. So at the, at the Museum of Play, though, is it, There's like a video game component, right? But then there's also, like, other types of things,

[00:26:26] Sara: Yeah, they have, like, a whole section of the museum that's really, really old toys on display, like, 1900s, 1800s. Um, they have a whole Sesame Street section, and other things like that.

Yeah.

From Java to Ruby

[00:26:42] Jeremy: Check it out if you're in Rochester. maybe now we could talk a little bit about, so like now you're working at Thoughtbot as a Ruby developer. but before we started recording, you were telling me that you started, working with Java. And there was like a, a long path I suppose, you know, changing languages.

So maybe you can talk a little bit about your experience there.

[00:27:06] Sara: Yeah. for other folks who have switched languages, this might be a familiar story for you, where once you get a job in one technology or one stack, one language, you kind of get typecast after a while. Your next job is probably going to be in the same language, same stack. Companies, they hire based on technology and So, it might be hard, even if you've been playing around with Ruby in your free time, to break, make that barrier jump from one language to another, one stack to another.

I mean, these technologies, they can take a little while to ramp up on. They can be a little bit different, especially if you're going from a non object oriented language to an object oriented, don't. Lose hope. (laughs) If you have an interest in Ruby and you're not a Rubyist right now, there's a good company for you that will give you a chance.

That's the key that I learned, is as a software developer, the skills that you have that are the most important are not the language that you know. It's the type of thinking that you do, the problem solving, communication, documentation, knowledge sharing, Supporting each other, and

as Saron the keynote speaker on Wednesday said, the, the word is love.

[00:28:35] Jeremy:

[00:28:35] Sara: So when I was job hunting, it was really valuable for me to include those important aspects in my skill, in my resume, in my CV, in my interviews, that like, I'm newer to this language because I had learned it at a rudimentary level before. Never worked in it really professionally for a long time. Um, when I was applying, it was like, look, I'm good at ramping up in technologies.

I have been doing software for a long time, and I'm very comfortable with the idea of planning, documenting, problem solving. Give me a chance, please. I was lucky enough to find my place at a company that would give me a chance. Test Double hired me in 2019 as a remote. Software Consultant, and it changed my life.

[00:29:34] Jeremy: What, what was it about, Ruby that I'm assuming that this is something that you maybe did in your spare time where you were playing with Ruby or?

[00:29:43] Sara: I am one of those people that don't really code in their spare time, which I think is valuable for people to say. The image of a software developer being, well, if you're not coding in your spare time, then you're not passionate about it. That's a myth. That's not true. Some of us, we have other hobbies. I have lots of hobbies.

Coding is not the one that I carry outside of the workplace, usually, but, I worked at a company called Constant Contact in 2014 and 2015. And while I was there, I was able to learn Ruby on Rails.

[00:30:23] Jeremy: Oh, okay. So that was sort of, I guess, your experience there, on the job. I guess you enjoyed something about the language or something about Rails and then that's what made you decide, like, I would really love to, to... do more of this

[00:30:38] Sara: Absolutely. It was amazing. It's such a fun language. The first time I heard about it was in college, maybe 2008 or 2009. And I remember learning, this looks like such a fun language. This looks like it would be so interesting to learn. And I didn't think about it again until 2014. And then I was programming in it.

Coming from a Java mindset and it blew my mind, the Rails magic also, I was like, what is happening? This is so cool. Because of my typecasting sort of situation of Java, I wasn't able to get back to it until 2019. And I don't want to leave. I'm so happy. I love the language. I love the community. It's fun.

[00:31:32] Jeremy: I can totally see that. I mean, when I first tried out Rails, yeah, it, like, you mentioned the magic, and I know some people are like, ah, I don't like the magic, but when, I think, once I saw what you could do, And how, sort of, little you needed to write, and the fact that so many projects kind of look the same.

Um, yeah, that really clicked for me, and I really appreciated that. think that and the Rails console. I think the console is amazing.

[00:32:05] Sara: Being able to just check real quick. Hmm, I wonder if this will work. Wait, no, I can check right now. I

[00:32:12] Jeremy: And I think that's an important point you brought up too, about, like, not... the, the stereotype and I, I kind of, you know, showed it here where I assumed like, Oh, you were doing Java and then you moved to Ruby. It must've been because you were doing Ruby on the side and thought like, Oh, this is cool.

I want to do it for my job. but I, I thought that's really cool that you were able to, not only that you, you don't do the programming stuff outside of work, but that you were able to, to find an opportunity where you could try something different, you know, in your job where you're still being paid. And I wonder, was there any, was there any specific intention behind, like, when you took that job, it was so that I can try something different, or did it just kind of happen? I'm curious what your...

The appeal of consulting

[00:32:58] Sara: I was wanting to try something different. I also really wanted to get into consulting.

[00:33:04] Jeremy: Hmm.

[00:33:05] Sara: I have ADHD. And working at a product company long term, I think, was never really going to work out for me. another thing you might notice in my LinkedIn is that a lot of my stays at companies have been relatively short.

Because, I don't know, I, my brain gets bored. The consultancy environment is... Perfect. You can go to different clients, different engagements, meet new people, learn a different stack, learn how other people are doing things, help them be better, and maybe every two weeks, two months, three months, six months, a year, change and do it all over again.

For some people, that sounds awful. For me, it's perfect.

[00:33:51] Jeremy: Yeah, I hadn't thought about that with, with consulting. cause I, I suppose, so you said it's, it's usually about half a year between projects or is

[00:34:01] Sara: varies

[00:34:01] Jeremy: It varies widely.

[00:34:02] Sara: Widely. I think we try to hit the sweet spot of 3-6 months. For an individual working on a project, the actual contract engagement might be longer than that, but, yeah.

Maintainers don't get enough credit

[00:34:13] Jeremy: Yeah. And, and your point about how some people, they like to jump on different things and some people like to, to stick to the same thing. I mean, that, that makes a lot of, sense in terms of, I think maintaining software and like building new software. It's, they're both development,

[00:34:32] Sara: Mm hmm.

[00:34:32] Jeremy: they're very different.

Right.

[00:34:35] Sara: It's so funny that you bring that up because I highly gravitate towards maintaining over making. I love going to different projects, but I have very little interest in Greenfield, very little interest in making something new. I want to get into the weeds, into 10 years that nobody wants to deal with because the weeds are so high and there's dragons in there.

I want to cut it away. I want to add documentation. I want to make it better. It's so important for us to maintain our software. It doesn't get nearly enough credit. The people that work on open source, the people that are doing maintenance work on, on apps internally, externally, Upgrades, making sure dependencies are all good and safe and secure.

love that stuff.

[00:35:29] Jeremy: That's awesome. We, we need more of you. (laughs)

[00:35:31] Sara: There's plenty of us out there, but we don't get the credit (laughs)

[00:35:34] Jeremy: Yeah, because it's like with maintenance, well, I would say probably both in companies and in open source when everything is working.

Then Nobody nobody knows.

Nobody says anything. They're just like, Oh, that's great. It's working. And then if it breaks, then everyone's upset.

[00:35:51] Sara: Exactly.

[00:35:53] Jeremy: And so like, yeah, you're just there to get yelled at when something goes wrong.

But when everything's going good, it's like,

[00:35:59] Sara: A job well done is, I was never here.

[00:36:02] Jeremy: Yeah. Yeah. Yeah. I don't know how. To, you know, to fix that, I mean, when you think about open source maintainers, right, like a big thing is, is, is burnout, right? Where you are keeping the internet and all of our applications running and, you know, what you get for it is people yelling at you and the issues, right?

[00:36:23] Sara: Yeah, it's hard. And I think I actually. Submitted a talk to RubyConf this year about this topic. It didn't get picked. That's okay.

Um, we all make mistakes.

I'm going to try to give it somewhere in the future, but I think one of the important things that we as an industry should strive for is giving glory.

Giving support and kudos to maintenance work. I've been trying to do that. slash I have been doing that at ThoughtBot by, at some cadence. I have been putting out a blog post to the ThoughtBot blog called. This week in open source, the time period that is covered might be a week or longer in those posts.

I give a summary of all of the commits that have been made to our open source projects. And the people that made those contributions with highlighting to new version releases, including patch level. And I do this. The time I, I, I took up the torch of doing this from a co worker, Mike Burns, who used to do it 10 years ago. I do this so that people can get acknowledgement for the work they do, even if it's fixing a broken link, even if it's updating some words that maybe don't make sense. All of it is valuable.

[00:37:54] Jeremy: Definitely. Yeah. I mean, I, I think that, um, yeah, what's visible to people is when there's a new feature or an API change and Yeah, it's just, uh, people don't, I think a lot of people don't realize, like, how much work goes into just keeping everything running.

[00:38:14] Sara: Mm hmm. Especially in the world of open source and Ruby on Rails, all the gems, there's so many different things coming out, things that suddenly this is not compatible. Suddenly you need to change something in your code because a dependency, however many steps apart has changed and it's hard work. The people that do those things are amazing.

[00:38:41] Jeremy: So if anybody listening does that work, we, we appreciate you.

[00:38:45] Sara: We salute you.

Thank you. And if you're interested in contributing to ThoughtBot open source, we have lots of repos. There's one out there for you.

Thoughts on RubyConf

[00:38:54] Jeremy: You've been doing programming for quite a while, and, you're here at, at RubyConf. I wonder what kind of brings you to these, these conferences? Like, what do you get out of them? Um, I guess, how was this one? That sort of thing.

[00:39:09] Sara: Well, first, this one was sick. This one was awesome. Uh, Ruby central pulled out all the stops and that DJ on Monday. In the event, in the exhibit hall. Wow. Amazing. So he told me that he was going to put his set up on Spotify, on Weedmaps Spotify, so go check it out. Anyway, I come to these conferences for people.

I just love connecting with people. Those listening might notice that I'm an extrovert. I work remotely. A lot of us work remotely these days. this is an opportunity to see some of my coworkers. There's seven of us here. It's an opportunity to see people I only see at conferences, of which there are a lot.

It's a chance to connect with people I've only met on Mastodon, or LinkedIn, or Stack Overflow. It's a chance to meet wonderful podcasters who are putting out great content, keeping our community alive. That's, that's the key for me. And the talks are wonderful, but honestly, they're just a side effect for me.

They just come as a result of being here.

[00:40:16] Jeremy: Yeah, it's kind of a unique opportunity, you know, to have so many of your, your colleagues and to just all be in the same place. And you know that anybody you talk to here, like if you talk about Ruby or software, they're not going to look at you and go like, I don't know what you're talking about.

Like everybody here has at least that in common. So it's, yeah, it's a really cool experience to, to be able to chat with anybody. And it's like, You're all on the same page,

[00:40:42] Sara: Mm hmm. We're all in this boat together.

[00:40:45] Jeremy: Yup, that we got to keep, got to keep afloat according to matz

[00:40:49] Sara: Gotta keep it afloat, yeah.

[00:40:51] Jeremy: Though I was like, I was pretty impressed by like during his, his keynote and he had asked, you know, how many of you here, it's your first RubyConf and it felt like it was over half the room.

[00:41:04] Sara: Yeah, I got the same sense. I was very glad to see that, very impressed. My first RubyConf was and it was the same sort of showing of

[00:41:14] Jeremy: Nice, yeah. Yeah, actually, that was my first one, too.

[00:41:17] Sara: Nice!

[00:41:19] Jeremy: Uh, that was Nashville, Yeah, yeah, yeah. So it's, yeah, it's really interesting to see because, the meme online is probably like, Ah, Ruby is dead, or Rails is dead.

But like you come to these conferences and yeah, there's, there's so many new people.

There's like new people that are learning it and experiencing it and, you know, enjoying it the same way we are. So I, I really hope that the, the community can really, yeah, keep this going.

[00:41:49] Sara: Continue, continue to grow and share. I love that we had first timer buttons, buttons where people could self identify as this is my first RubyConf and, and then that opens a conversation immediately. It's like, how are you liking it? What was your favorite talk?

[00:42:08] Jeremy: Yeah, that's awesome. okay, I think that's probably a good place to start wrapping it. But is there anything else you wanted to mention or thought we should have talked about?

[00:42:18] Sara: Can I do a plug for thoughtbot?

[00:42:20] Jeremy: yeah, go for it.

[00:42:21] Sara: Alright. For those of you out there that might not know what ThoughtBot does, we are a full software lifecycle or company lifecycle consultancy, so we do everything from market fit and rapid prototyping to MVPs to helping with developed companies, developed teams, maybe do a little bit of a Boost when you have a deadline or doing some tech debt.

Pay down. We also have a DevOps team, so if you have an idea or a company or a team, you want a little bit of support, we have been around for 20 years. We are here for you. Reach out to us at thoughtbot.com.

[00:43:02] Jeremy: I guess the thing about Thoughtbot is that, within the Ruby community specifically, they've been so involved with sponsorships and, and podcasts. And so, uh, when you hear about consultancies, a lot of times it's kind of like, well, I don't know, are they like any good? Do they know what they're doing?

But I, I feel like, ThoughtBot has had enough, like enough of a public record. I feel It's like, okay, if you, if you hire them, um, you should be in good hands.

[00:43:30] Sara: Yeah. If you have any questions about our abilities, read the blog.

[00:43:35] Jeremy: It is a good blog. Sometimes when I'm, uh, searching for how to do something in Rails, it'll pop up,

[00:43:40] Sara: Mm hmm. Me too. Every question I ask, one of the first results is our own blog. I'm like, oh yeah, that makes sense.

[00:43:47] Jeremy: Probably the peak is if you've written the blog.

[00:43:50] Sara: That has happened to my coworkers They're like, wait, I wrote a blog about this nine years ago.

[00:43:55] Jeremy: Yeah, yeah. So maybe, maybe that'll happen to you soon. I, I know definitely people who do, um, Stack Overflow. And it's like, Oh, I like, this is a good answer. Oh, I wrote this. (laughs) yeah. Well, Sara, thank you so much for, for chatting with me today.

[00:44:13] Sara: Absolutely, Jeremy. Thank you so much for having me. I was really glad to chat today.

David Copeland on Medium Sized Decisions (RubyConf 2023)

Fri, 17 Nov 2023 05:35:01 -0000

David was the chief software architect and director of engineering at Stitch Fix. He's also the author of a number of books including Sustainable Web Development with Ruby on Rails and most recently Ruby on Rails Background Jobs with Sidekiq.

He talks about how he made decisions while working with a medium sized team (~200 developers) at Stitch Fix.

The audio quality for the first 19 minutes is not great but the correct microphones turn on right after that.

Recorded at RubyConf 2023 in San Diego.

A few topics covered:

Ruby's origins at Stitch Fix
Thoughts on Go
Choosing technology and cloud services
Moving off heroku
Building a platform team
Where Ruby and Rails fit in today
The role of books and how different people learn
Large Language Model's effects on technical content

Transcript

You can help correct transcripts on GitHub.

Intro

[00:00:00] Jeremy: Today. I want to share another conversation from RubyConf San Diego. This time it's with David Copeland. He was a chief software architect and director of engineering at stitch fix.

And at the start of the conversation, you're going to hear about why he decided to write the book, sustainable web development with Ruby on rails. Unfortunately, you're also going to notice the sound quality isn't too good. We had some technical difficulties. But once you hit the 20 minute mark of the recording, the mics are going to kick in. It's going to sound way better.

So I hope you stick with it. Enjoy.

Ruby at Stitch Fix

[00:00:35] David: Stitch Fix was a Rails shop. I had done a lot of Rails and learned a lot of things that worked and didn't work, at least in that situation.

And so I started writing them down and I was like, I should probably make this more than just a document that I keep, you know, privately on my computer. Uh, so that's, you know, kind of, kind of where the genesis of that came from and just tried to, write everything down that I thought what worked, what didn't work.

Uh, if you're in a situation like me. Working on a product, with a medium sized, uh, team, then I think the lessons in there will be useful, at least some of them. Um, and I've been trying to keep it up over, over the years. I think the first version came out a couple years ago, so I've been trying to make sure it's always up to date with the latest stuff and, and Rails and based on my experience and all that.

[00:01:20] Jeremy: So it's interesting that you mention, medium sized team because, during the, the keynote, just a few moments ago, Matz the creator of Ruby was talking about how like, Oh, Rails is really suitable for this, this one person team, right? Small, small team. And, uh, he was like, you're not Google. So like, don't worry about, right.

Can you scale to that level? Yeah. Um, and, and I wonder like when you talk about medium size or medium scale, like what are, what are we talking?

[00:01:49] David: I think probably under 200 developers, I would say. because when I left Stitch Fix, it was closing in on that number of developers. And so it becomes, you know, hard to...

You can kind of know who everybody is, or at least the names sound familiar of everybody. But beyond that, it's just, it's just really hard. But a lot of it was like, I don't have experience at like a thousand developer company. I have no idea what that's like, but I definitely know that Rails can work for like...

200 ish people how you can make it work basically. yeah.

[00:02:21] Jeremy: The decision to use Rails, I'm assuming that was made before you joined?

[00:02:26] David: Yeah, the, um, the CTO of Stitch Fix, he had come in to clean up a mess made by contractors, as often happens. They had used Django, which is like the Python version of Rails.

And he, the CTO, he was more familiar with Rails. So the first two developers he hired, also familiar with Rails. There wasn't a lot to maintain with the Django app, so they were like, let's just start fresh, fresh with Rails. yeah, but it's funny because a lot of the code in that Rails app was, like, transliterated from Python.

So you could, it would, it looked like the strangest Ruby code in the world because it was basically, there was no test. So they were like, let's just write the Ruby version of this Python just so we know it works. but obviously that didn't, didn't last forever, so.

[00:03:07] Jeremy: So, so what's an example of a, of a tell?

Where you're looking at the code and you're like, oh, this is clearly, it came from Python.

[00:03:15] David: You'd see like, very, very explicit, right? Like Python, there's a lot of like single line things. very like, this sounds like a dig, but it's very simple looking code. Like, like I don't know Python, but I was able to change this Django app.

And I had to, I could look at it and you can figure out immediately how it works. Cause there's. Not much to it. There's nothing fancy. So, like, this, this Ruby code, there was nothing fancy. You'd be like, well, maybe they should have memoized that, or maybe they should have taken that into another class, or you could have done this with a hash or something like that.

So there was, like, none of that. It was just, like, really basic, plain code like you would see in any beginning programming language kind of thing. Which is at least nice. You can understand it. but you probably wouldn't have written it that way at first in Ruby.

Thoughts on Go

[00:04:05] Jeremy: Yeah, that's, that's interesting because, uh, people sometimes talk about the Go programming language and how it looks, I don't know if simple is the right word, but it's something where you look at the code and even if you don't necessarily understand Go, it's relatively straightforward.

Yeah. I wonder what your thoughts are on that being a strength versus that being, like,

[00:04:25] David: Yeah, so at Stitch Fix at one point we had a pro, we were moving off of Heroku and we were going to, basically build a deployment platform using ECS on AWS. And so the deployment platform was a Rails app and we built a command line tool using Ruby.

And it was fine, but it was a very complicated command line tool and it was very slow. And so one of the developers was like, I'm going to rewrite it in Go. I was like, ugh, you know, because I just was not a big fan. So he rewrote it in Go. It was a bazillion times faster. And then I was like, okay, I'm going to add, I'll add a feature to it.

It was extremely easy. Like, it's just like what you said. I looked at it, like, I don't know anything about Go. I know what is happening here. I can copy and paste this and change things and make it work for what I want to do. And it did work. And it was, it was pretty easy. so there's that, I mean, aesthetically it's pretty ugly and it's, I, I.

I can't really defend that as a real reason to not use it, but it is kind of gross. I did do Go, I did a small project in Go after Stitch Fix, and there's this vibe in Go about like, don't create abstractions. I don't know where I got that from, but every Go I look at, I'm like we should make an abstraction for this, but it's just not the vibe.

They just don't like doing that. They like it all written out. And I see the value because you can look at the code and know what it does and you don't have to chase abstractions anywhere. But. I felt like I was copying and pasting a lot of, a lot of things. Um, so I don't know. I mean, the, the team at Stitch Fix that did this like command line app in go, they're the platform team.

And so their job isn't to write like web apps all day, every day. There's kind of in and out of all kinds of things. They have to try to figure out something that they don't understand quickly to debug a problem. And so I can see the value of something like go if that's your job, right? You want to go in and see what the issue is.

Figure it out and be done and you're not going to necessarily develop deep expertise and whatever that thing is that you're kind of jumping into. Day to day though, I don't know. I think it would make me kind of sad. (laughs)

[00:06:18] Jeremy: So, so when you say it would make you kind of sad, I mean, what, what about it? Is it, I mean, you mentioned that there's a lot of copy and pasting, so maybe there's code duplication, but are there specific things where you're like, oh, I just don't?

[00:06:31] David: Yeah, so I had done a lot of Java in my past life and it felt very much like that. Where like, like the Go library for making an HTTP call for like, I want to call some web service. It's got every feature you could ever want. Everything is tweakable. You can really, you can see why it's designed that way.

To dial in some performance issue or solve some really esoteric thing. It's there. But the problem is if you just want to get an JSON, it's just like huge production. And I felt like that's all I really want to do and it's just not making it very easy. And it just felt very, very cumbersome. I think that having to declare types also is a little bit of a weird mindset because, I mean, I like to make types in Ruby, I like to make classes, but I also like to just use hashes and stuff to figure it out.

And then maybe I'll make a class if I figure it out, but Go, you can't. You have to have a class, you have to have a type, you have to think all that ahead of time, and it just, I'm not used to working that way, so it felt, I mean, I guess I could get used to it, but I just didn't warm up to that sort of style of working, so it just felt like I was just kind of fighting with the vibe of the language, kind of.

Yeah,

[00:07:40] Jeremy: so it's more of the vibe or the feel where you're writing it and you're like this seems a little too... Explicit. I feel like I have to be too verbose. It just doesn't feel natural for me to write this.

[00:07:53] David: Right, it's not optimized for what in my mind is the obvious case.

And maybe that's not the obvious case for the people that write Go programs. But for me, like, I just want to like get this endpoint and get the JSON back as a map. Not any easier than any other case, right? Whereas like in Ruby, right? And you can, I think if you include net HTTP, you can just type get. And it will just return whatever that is.

Like, that's amazing. It's optimized for what I think is a very common use case. So it makes me feel really productive. It makes me feel pretty good. And if that doesn't work out long term, I can always use something more complicated. But I'm not required to dig into the NetHttp library just to do what in my mind is something very simple.

[00:08:37] Jeremy: Yeah, I think that's something I've noticed myself in working with Ruby. I mean, you have the standard library that's very... Comprehensive and the API surface is such that, like you said there, when you're trying to do common tasks, a lot of times they have a call you make and it kind of does the thing you expected or hoped for.

[00:08:56] David: Yeah, yeah. It's kind of, I mean, it's that whole optimized for programmer happiness thing. Like it does. That is the vibe of Ruby and it seems like that is still the way things are. And, you know, I, I suppose if I had a different mindset, I mean, because I work with developers who did not like using Ruby or Rails.

They loved using Go or Java. And I, I guess there's probably some psychological analysis we could do about their background and history and mindset that makes that make sense. But, to me, I don't know. It's, it's nice when it's pleasant. And Ruby seems pleasant. (laughs)

Choosing Technology

[00:09:27] Jeremy: as a... Software Architect, or as a CTO, when, when you're choosing technology, what are some of the things you look at in terms of, you know?

[00:09:38] David: Yeah, I mean, I think, like, it's a weird criteria, but I think what is something that the team is capable of executing with? Because, like, most, right, most programming languages all kind of do the same thing. Like, you can kind of get most stuff done in most common popular programming languages. So, it's probably not...

It's not true that if you pick the wrong language, you can't build the app. Like, that's probably not really the case. At least for like a web app or something. so it's more like, what is the team that's here to do it? What are they comfortable and capable of doing? I worked on a project with...

It was a mix of like junior engineers who knew JavaScript, and then some senior engineers from Google. And for whatever reason someone had chosen a Rails app and none of them were comfortable or really yet competent with doing Ruby on Rails and they just all hated it and like it didn't work very well.

Um, and so even though, yes, Rails is a good choice for doing stuff for that team at that moment. Not a good choice. Right. So I think you have to go in and like, what, what are we going to be able to execute on so that when the business wants us to do something, we just do it. And we don't complain and we don't say, Oh, well we can't because this technology that we chose, blah, blah, blah.

Like you don't ever want to say that if possible. So I think that's. That's kind of the, the top thing. I think second would be how widely supported is it? Like you don't want to be the cutting edge user that's finding all the bugs in something really. Like you want to use something that's stable. Postgres, MySQL, like those work, those are fine.

The bugs have been sorted out for most common use cases. Some super fancy edge database, I don't know if I'd want to be doing, doing that you know?

Choosing cloud services

[00:11:15] Jeremy: How do you feel about the cloud specific services and databases? Like are you comfortable saying like, oh, I'm going to use... Google Cloud, BigQuery. Yeah.

[00:11:27] David: That sort of thing. I think it would kind of fall under the same criteria that I was just, just saying like, so with AWS it's interesting 'cause when we moved from Heroku to AWS by EC2 RDS, their database thing, uh, S3, those have been around for years, probably those are gonna work, but they always introduce new things.

Like we, we use RabbitMQ and AWS came out with. Some, I forget what it was, it was a queuing service similar to Rabbit. We were like, Oh, maybe we should switch to that. But it was clear that they weren't really ready to support it. So. Yeah, so we didn't, we didn't switch to that. So I, you gotta try to read the tea leaves of the provider to see are they committed to, to supporting this thing or is this there to get some enterprise client to move into the cloud.

And then the idea is to move off of that transitional thing into what they do support. And it's hard to get a clear answer from them too. So it takes a little bit of research to figure out, Are they going to support this or not? Because that's what you don't want. To move everything into some very proprietary cloud system and have them sunset it and say, Oh yeah, now you've got to switch again.

Uh, that kind of sucks. So, it's a little trickier.

[00:12:41] Jeremy: And what kind of questions or research do you do? Is it purely a function of this thing has existed for X number of years so I feel okay?

[00:12:52] David: I mean, it's kind of similar to looking at like some gem you're going to add to your project, right? So you'll, you'll look at how often does it change?

Is it being updated? Uh, what is the documentation? Does it look like someone really cared about the documentation? Does the documentation look updated? Are there issues with it that are being addressed or, or not? Um, so those are good signals. I think, talking to other practitioners too can be good. Like if you've got someone who's experienced.

You can say, hey, do you know anybody back channeling through, like, everybody knows somebody that works at AWS, you can probably try to get something there. at Stitch Fix, we had an enterprise support contract, and so your account manager will sometimes give you good information if you ask. Again, it's a, they're not going to come out and say, don't use this product that we have, but they might communicate that in a subtle way.

So you have to triangulate from all these sources to try to. to try to figure out what, what you want to do.

[00:13:50] Jeremy: Yeah, it kind of

makes me wish that there was a, a site like, maybe not quite like, can I use, right? Can I use, you can see like, oh, can I use this in my browser? Is there, uh, like an AWS or a Google Cloud?

Can I trust this? Can I trust this? Yeah. Is this, is this solid or not?

[00:14:04] David: Right, totally. It's like, there's that, that site where you, it has all the Apple products and it says whether or not you should buy it because one may or may not be coming out or they may be getting rid of it. Like, yeah, that would... For cloud services, that would be, that would be nice.

[00:14:16] Jeremy: Yeah, yeah. That's like the Mac Buyer's Guide. And then we, we need the, uh, the technology. Yeah. Maybe not buyers. Cloud Provider Buyer's Guide, yeah. I guess we are buyers.

[00:14:25] David: Yeah, yeah, totally, totally.

[00:14:27] Jeremy: it's interesting that you, you mentioned how you want to see that, okay, this thing is mature. I think it's going to stick around because, I, interviewed, someone who worked on, I believe it was the CloudWatch team.

Okay. Daniel Vassalo, yeah. so he left AWS, uh, after I think about 10 years, and then he wrote a book called, uh, The Good Parts of AWS. Oh! And, if you read his book, most of the services he says to use are the ones that are, like, old. Yeah. He's, he's basically saying, like, S3, you know you're good.

Yeah. Right? but then all these, if you look at the AWS webpage, they have who knows, I don't know how many hundreds of services. Yeah. He's, he's kind of like I worked there and I would not use, you know, all these new services. 'cause I myself, I don't trust

[00:15:14] David: it yet. Right. And so, and they're working there?

Yeah, they're working there. Yeah. No. One of the VPs at Stitch Fix had worked on Google Cloud and so when we were doing this transition from Heroku, he was like, we are not using Google Cloud. I was like, really? He's like AWS is far ahead of the game. Do not use Google Cloud. I was like, all right, I don't need any more info.

You work there. You said don't. I'm gonna believe you. So

[00:15:36] Jeremy: what, what was his did he have like a core point?

[00:15:39] David: Um, so he never really had anything bad to say about Google per se. Like I think he enjoyed his time there and I think he thought highly of who he worked with and what he worked on and that sort of thing.

But his, where he was coming from was like AWS was so far ahead. of Google on anything that we would use, he was like, there's, there's really no advantage to, to doing it. AWS is a known quantity, right? it's probably still the case. It's like, you know, you've heard the nobody ever got fired for using IBM or using Microsoft or whatever the thing is.

Like, I think that's, that was kind of the vibe. And he was like, moving all of our infrastructure right before we're going to go public. This is a serious business. We should just use something that we know will work. And he was like, I know this will work. I'm not confident about. Google, uh, for our use case.

So we shouldn't, we shouldn't risk it. So I was like, okay, I trust you because I didn't know anything about any of that stuff at the time. I knew Heroku and that was it. So, yeah.

[00:16:34] Jeremy: I don't know if it's good or bad, but like you said, AWS seems to be the default choice. Yeah. And I mean, there's people who use Azure.

I assume it's mostly primarily Microsoft. Yeah. And then there's Google Cloud. It's not really clear why you would pick it, unless there was a specific service or something that only they had.

[00:16:55] David: Yeah, yeah. Or you're invested in Google, you know, you want to keep everything there. I mean, I don't know. I haven't really been at that level to make that kind of decision, and I would probably choose AWS for the reasons discussed, but, yeah.

Moving off Heroku

[00:17:10] Jeremy: And then, so at Stitch Fix, you said you moved off of Heroku

[00:17:16] David: yeah.

Yeah, so we were heavy into Heroku. I think that we were told that at one point we had the biggest Heroku Postgres database on their platform. Not a good place to be, right? You never want to be the biggest customer person, usually. but the problem we were facing was essentially we were going to go public.

And to do that, you're under all the scrutiny. about many things, including the IT systems and the security around there. So, like, by default, a Postgres, a Heroku Postgres database is, like, on the internet. It's only secured by the password. all their services are on the internet. So, not, not ideal. they were developing their private cloud service at that time.

And so that would have given us, in theory, on paper, it would have solved all of our problems. And we liked Heroku and we liked the developer experience. It was great. but...

Heroku private spaces, it was still early. There's a lot of limitations that when they explained why those limitations, they were reasonable. And if we had. started from scratch on Heroku Private Spaces. It probably would have worked great, but we hadn't.

So we just couldn't make it work. So we were like, okay, we're going to have to move to AWS so that everything can be basically off the internet. Like our public website needs to be on the internet and that's kind of it. So we need to, so that's basically was the, was the impetus for that. but it's too bad because I love Heroku.

It was great. I mean, they were, they were a great partner. They were great. I think if Stitch Fix had started life a year later, Private Spaces. Now it's, it's, it's way different than it was then. Cause it's been, it's a mature product now, so we could have easily done that, but you know, the timing didn't work out, unfortunately.

[00:18:50] Jeremy: And that was a compliance thing to,

[00:18:53] David: Yeah. And compliance is weird cause they don't tell you what to do, but they give you some parameters that you need to meet. And so one of them is like how you control access. So, so going public, the compliance is around the financial data and. Ensuring that the financial data is accurate. So a lot of the systems at Stichfix were storing the financial data.

We, you know, the warehouse management system was custom made. Uh, all the credit card processing was all done, like it was all in some databases that we had running in Heroku. And so those needed to be subject to stricter security than we could achieve with just a single password that we just had to remember to rotate when someone like left the team.

So that was, you know, the kind of, the kind of impetus for, for all of that.

[00:19:35] Jeremy: when you were using Heroku, Salesforce would have already owned it then. Did you, did you get any sense that you weren't really sure about the future of the platform while you're on it or,

[00:19:45] David: At that time, no, it seemed like they were still innovating. So like, Heroku has a Redis product now. They didn't at the time we wish that they did. They told us they're working on it, but it wasn't ready. We didn't like using the third parties. Kafka was not a thing. We very much were interested in that.

We would have totally used it if it was there. So they were still. Like doing bigger innovations then, then it seems like they are now. I don't know. It's weird. Like they're still there. They still make money, I assume for Salesforce. So it doesn't feel like they're going away, but they're not innovating at the pace that they were kind of back in the day.

[00:20:20] Jeremy: it used to feel like when somebody's asking, I want to host a Rails app. Then you would say like, well, use Heroku because it's basically the easiest to get started. It's a known quantity and it's, it's expensive, but, it seemed for, for most people, it was worth it. and then now if I talk to people, it's like.

Not what people suggest anymore.

[00:20:40] David: Yeah, because there's, there's actual competitors. It's crazy to me that there was no competitors for years, and now there's like, Render and Fly. io seem to be the two popular alternatives. Um, I doubt they're any cheaper, honestly, but... You get a sense, right, that they're still innovating, still building those platforms, and they can build with, you know, all of the knowledge of what has come before them, and do things differently that might, that might help.

So, I still use Heroku for personal things just because I know it, and I, you know, sometimes you don't feel like learning a new thing when you just want to get something done, but, yeah, I, I don't know if we were starting again, I don't know, maybe I'd look into those things. They, they seem like they're getting pretty mature and.

Heroku's resting on its laurels, still.

[00:21:26] Jeremy: I guess I never quite the mindset, right? Where you You have a platform that's doing really well and people really like it and you acquire it and then it just It seems like you would want to keep it rolling, right? (laughs)

[00:21:38] David: Yeah, it's, it is wild, I mean, I guess... Why did you, what was Salesforce thinking they were going to get? Uh, who knows maybe the person at Salesforce that really wanted to purchase it isn't there. And so no one at Salesforce cares about it. I mean, there's all these weird company politics that like, who knows what's going on and you could speculate.

all day. What's interesting is like, there's definitely some people in the Ruby community who work there and still are working there. And that's like a little bit of a canary for me. I'm like, all right, well, if that person's still working there, that person seems like they're on the level and, and, and, and seems pretty good.

They're still working there. It, it's gotta be still a cool place to be or still doing something, something good. But, yeah, I don't know. I would, I would love to know what was going on in all the Salesforce meetings about acquiring that, how to manage it. What are their plans for it? I would love to know that stuff.

[00:22:29] Jeremy: maybe you had some experience with this at Stitch Fix But I've heard with Heroku some of their support staff at least in the past they would, to some extent, actually help you troubleshoot, like, what's going on with your app. Like, if your app is, like, using a whole bunch of memory, and you're out of memory, um, they would actually kind of look into that, for you, which is interesting, because it's like, that's almost like a services thing than it is just a platform.

[00:22:50] David: Yeah. I mean, they, their support, you would get, you would get escalated to like an engineer sometimes, like who worked on that stuff and they would help figure out what the problem was. Like you got the sense that everybody there really wanted the platform to be good and that they were all sort of motivated to make sure that everybody.

You know, did well and used the platform. And they also were good at, like a thing that trips everybody up about Heroku is that your app restarts every day. And if you don't know anything about anything, you might think that is stupid. Why, why would I want that? That's annoying. And I definitely went through that and I complained to them a lot.

And I'm like, if you only could not restart. And they very patiently and politely explained to me why that it needed to do that, they weren't going to remove that, and how to think about my app given that reality, right? Which is great because like, what company does that, right? From the engineers that are working on it, like No, nobody does that.

So, yeah, no, I haven't escalated anything to support at Heroku in quite some time, so I don't know if it's still like that. I hope it is, but I'm not really, not really sure.

Building a platform team

[00:23:55] Jeremy: Yeah, that, uh, that reminds me a little bit of, I think it's Rackspace? There's, there's, like, another hosting provider that was pretty popular before, and they... Used to be famous for that type of support, where like your, your app's having issues and somebody's actually, uh, SSHing into your box and trying to figure out like, okay, what's going on?

which if, if that's happening, then I, I can totally see where the, the price is justified. But if the support is kind of like dropping off to where it's just, they don't do that kind of thing, then yeah, I can see why it's not so much of a, yeah,

[00:24:27] David: We used to think of Heroku as like they were the platform team before we had our own platform team and they, they acted like it, which was great.

[00:24:35] Jeremy: Yeah, I don't have, um, experience with, render, but I, I, I did, talk to someone from there, and it does seem like they're, they're trying to fill that role, um, so, yeah, hopefully, they and, and other companies, I guess like Vercel and things like that, um, they're, they're all trying to fill that space,

[00:24:55] David: Yeah, cause, cause building our own internal platform, I mean it was the right thing to do, but it's, it's a, you can't just, you have to have a team on it, it's complicated, getting all the stuff in AWS to work the way you want it to work, to have it be kind of like Heroku, like it's not trivial. if I'm a one person company, I don't want to be messing around with that particularly. I want to just have it, you know, push it up and have it go and I'm willing to pay for that. So it seems logical that there would be competitors in that space. I'm glad there are. Hopefully that'll light a fire under, under everybody.

[00:25:26] Jeremy: so in your case, it sounds like you moved to having your own platform team and stuff like that, uh, partly because of the compliance thing where you're like, we need our, we need to be isolated from the internet. We're going to go to AWS. If you didn't have that requirement, do you still think like that would have been the time to, to have your own platform team and manage that all yourself?

[00:25:46] David: I don't know. We, we were thinking an issue that we were running into when we got bigger, um, was that, I mean, Heroku, it, It's obviously not as flexible as AWS, but it is still very flexible. And so we had a lot of internal documentation about this is how you use Heroku to do X, Y, and Z. This is how you set up a Stitch Fix app for Heroku.

Like there was just the way that we wanted it to be used to sort of. Just make it all manageable. And so we were considering having a team spun up to sort of add some tooling around that to sort of make that a little bit easier for everybody. So I think there may have been something around there. I don't know if it would have been called a platform team.

Maybe we call, we thought about calling it like developer happiness or because you got developer experience or something. We, we probably would have had something there, but. I do wonder how easy it would have been to fund that team with developers if we hadn't had these sort of business constraints around there.

yeah, um, I don't know. You get to a certain size, you need some kind of manageability and consistency no matter what you're using underneath. So you've got to have, somebody has to own it to make sure that it's, that it's happening.

[00:26:50] Jeremy: So even at your, your architect level, you still think it would have been a challenge to, to. Come to the executive team and go like, I need funding to build this team.

[00:27:00] David: You know, certainly it's a challenge because everybody, you know, right? Nobody wants to put developers in anything, right? There are, there are a commodity and I mean, that is kind of the job of like, you know, the staff engineer or the architect at a company is you don't have, you don't have the power to put anybody on anything you, you have the power to Schedule a meeting with a VP or the CTO and they will listen to you.

And that's basically, you've got to use that power to convince them of what you want done. And they're all reasonable people, but they're balancing 20 other priorities. So it would, I would have had to, it would have been a harder case to make that, Hey, I want to take three engineers. And have them write tooling to make Heroku easier to use.

What? Heroku is not easy to use. Why aren't, you know, so you really, I would, it would be a little bit more of a stretch to walk them through it. I think a case could be made, but, definitely would take some more, more convincing than, than what was needed in our case.

[00:27:53] Jeremy: Yeah. And I guess if you're able to contrast that with, you were saying, Oh, I need three people to help me make Heroku easier. Your actual platform team on AWS, I imagine was much larger, right?

[00:28:03] David: Initially it was, there was, it was three people did the initial move over. And so by the time we went public, we'd been on this new system for, I don't know, six to nine months. I can't remember exactly. And so at that time the platform team was four or five people, and I, I mean, so percentage wise, right, the engineering team was maybe almost 200, 150, 200.

So percentage wise, maybe a little small, I don't know. but it kind of gets back to the power of like the rails and the one person framework. Like everything we did was very much the same And so the Rails app that managed the deployment was very simple. The, the command line app, even the Go one with all of its verbosity was very, very simple.

so it was pretty easy for that small team to manage. but, Yeah, so it was sort of like for redundancy, we probably needed more than three or four people because you know, somebody goes out sick or takes a vacation. That's a significant part of the team. But in terms of like just managing the complexity and building it and maintaining it, like it worked pretty well with, you know, four or five people.

Where Rails fits in vs other technology

[00:29:09] Jeremy: So during the Keynote today, they were talking about how companies like GitHub and Shopify and so on, they're, they're using Rails and they're, they're successful and they're fairly large. but I think the thing that was sort of unsaid was the fact that. These companies, while they use Rails, they use a lot of other, technology as well.

And, and, and kind of increasing amounts as well. So, I wonder from your perspective, either from your experience at StitchFix or maybe going forward, what is the role that, that Ruby and Rails plays? Like, where does it make sense for that to be used versus like, Okay, we need to go and build something in Java or, you know, or Go, that sort of thing?

[00:29:51] David: right. I mean, I think for like your standard database backed web app, it's obviously great. especially if your sort of mindset bought into server side rendering, it's going to be great at that. so like internal tools, like the customer service dashboard or... You know, something for like somebody who works at a company to use.

Like, it's really great because you can go super fast. You're not going to be under a lot of performance constraints. So you kind of don't even have to think about it. Don't even have to solve it. You can, but you don't have to, where it wouldn't work, I guess, you know, if you have really strict performance.

Requirements, you know, like a, a Go version of some API server is going to use like percentages of what, of what Rails would use. If that's meaningful, if what you're spending on memory or compute is, is meaningful, then, then yeah. That, that becomes worthy of consideration. I guess if you're, you know, if you're making a mobile app, you probably need to make a mobile app and use those platforms.

I mean, I guess you can wrap a Rails app sort of, but you're still making, you still need to make a mobile app, that does something. yeah. And then, you know, interestingly, the data science part of Stitch Fix was not part of the engineering team. They were kind of a separate org. I think Ruby and Rails was probably the only thing they didn't use over there.

Like all the ML stuff, everything is either Java or Scala or Python. They use all that stuff. And so, yeah, if you want to do AI and ML with Ruby, you, it's, it's hard cause there's just not a lot there. You really probably should use Python. It'll make your life easier. so yeah, those would be some of the considerations, I guess.

[00:31:31] Jeremy: Yeah, so I guess in the case of, ML, Python, certainly, just because of the, the ecosystem, for maybe making a command line application, maybe Go, um, Go or Rust, perhaps,

[00:31:44] David: Right. Cause you just get a single binary. Like the problem, I mean, I wrote this book on Ruby command line apps and the biggest problem is like, how do I get the Ruby VM to be anywhere so that it can then run my like awesome scripts? Like that's kind of a huge pain. (laughs) So

[00:31:59] Jeremy: and then you said, like, if it's Very performance sensitive, which I am kind of curious in, in your experience with the companies you've worked at, when you're taking on a project like that, do you know up front where you're like, Oh, the CPU and memory usage is going to be a problem, or is it's like you build it and you're like, Oh, this isn't working.

So now I know.

[00:32:18] David: yeah, I mean, I, I don't have a ton of great experience there at Stitch Fix. The biggest expense the company had was the inventory. So like the, the cost of AWS was just de minimis compared to all that. So nobody ever came and said, Hey, you've got to like really save costs on, on that stuff. Cause it just didn't really matter.

at the, the mental health startup I was at, it was too early. But again, the labor costs were just far, far exceeded the amount of money I was spending on, on, um, you know, compute and infrastructure and stuff like that. So, Not knowing anything, I would probably just sort of wait and see if it's a problem.

But I suppose you always take into account, like, what am I actually building? And like, what does this business have to scale to, to make it worthwhile? And therefore you can kind of do a little bit of planning ahead there. But, I dunno, I think it would kind of have to depend.

[00:33:07] Jeremy: There's a sort of, I guess you could call it a meme, where people say like, Oh, it's, it's not, it's not Rails that's slow, it's the, the database that's slow. And, uh, I wonder, is that, is that accurate in your experience, or,

[00:33:20] David: I mean, most of the stuff that we had that was slow was the database, because like, it's really easy to write a crappy query in Rails if you're not, if you're not careful, and then it's really easy to design a database that doesn't have any indexes if you're not careful. Like, you, you kind of need to know that, But of course, those are easy to fix too, because you just add the index, especially if it's before the database gets too big where we're adding indexes is problematic. But, I think those are just easy performance mistakes to make. Uh, especially with Rails because you're not, I mean, a lot of the Rails developers at Citrix did not know SQL at all.

I mean, they had to learn it eventually, but they didn't know it at all. So they're not even knowing that what they're writing could possibly be problematic. It's just, you're writing it the Rails way and it just kind of works. And at a small scale, it does. And it doesn't matter until, until one day it does.

[00:34:06] Jeremy: And then in, in the context of, let's say, using ActiveRecord and instantiating the objects, or, uh, the time it takes to render templates, that kinds of things, to, at least in your experience, that wasn't such of an issue.

[00:34:20] David: No, and it was always, I mean, whenever we looked at why something was slow, it was always the database and like, you know, you're iterating over some active records and then, and then, you know, you're going into there and you're just following this object graph. I've got a lot of the, a lot of the software at Stitch Fix was like internal stuff and it was visualizing complicated data out of the database.

And so if you didn't think about it, you would just start dereferencing and following those relationships and you have this just massive view and like the HTML is fine. It's just that to render this div, you're. Digging into some active record super deep. and so, you know, that was usually the, the, the problems that we would see and they're usually easy enough to fix by making an index or.

Sometimes you do some caching or something like that. and that solved most of the, most of the issues

[00:35:09] Jeremy:

The different ways people learn

[00:35:09] Jeremy: so you're also the author of the book, Sustainable Web Development with Ruby on Rails. And when you talk to people about like how they learn things, a lot of them are going on YouTube, they're going on, uh, you know, looking for blogs and things like that.

And so as an author, what do you think the role is of, of books now?

Yeah,

[00:35:29] David: I have thought about this a lot, because I, when I first got started, I'm pretty old, so books were all you had, really. Um, so they seem very normal and natural to me, but... does someone want to sit down and read a 400 page technical book? I don't know. so Dave Thomas who runs Pragmatic Bookshelf, he was on a podcast and was asked the same question and basically his answer, which is my answer, is like a long form book is where you can really lay out your thinking, really clarify what you mean, really take the time to develop sometimes nuanced, examples or nuanced takes on something that are Pretty hard to do in a short form video or in a blog post.

Because the expectation is, you know, someone sends you an hour long YouTube video, you're probably not going to watch that. Two minute YouTube video is sure, but you can't, you can't get into so much, kind of nuanced detail. And so I thought that was, was right. And that was kind of my motivation for writing.

I've got some thoughts. They're too detailed. It's, it's too much set up for a blog post. There's too much of a nuanced element to like, really get across. So I need to like, write more. And that means that someone's going to have to read more to kind of get to it. But hopefully it'll be, it'll be valuable. one of the sessions that we're doing later today is Ruby content creators, where it's going to be me and Noel Rappin and Dave Thomas representing the old school dudes that write books and probably a bunch of other people that do, you know, podcasts videos. It'd be interesting to see, I really want to know how do people learn stuff?

Because if no one reads books to learn things, then there's not a lot of point in doing it. But if there is value, then, you know. It should be good and should be accessible to people. So, that's why I do it. But I definitely recognize maybe I'm too old and, uh, I'm not hip with the kids or, or whatever, whatever the case is.

I don't know.

[00:37:20] Jeremy: it's tricky because, I think it depends on where you are in the process of learning that thing. Because, let's say, you know a fair amount about the technology already. And you look at a book, in a lot of cases it's, it's sort of like taking you from nothing to something. And so you're like, well, maybe half of this isn't relevant to me, but then if I don't read it, then I'm probably missing a lot still.

And so you're in this weird in be in between zone. Another thing is that a lot of times when people are trying to learn something, they have a specific problem. And, um, I guess with, with books, it's, you kind of don't know for sure if the thing you're looking for is going to be in the book.

[00:38:13] David: I mean, so my, so my book, I would not say as a beginner, it's not a book to learn how to do Rails. It's like you already kind of know Rails and you want to like learn some comprehensive practices. That's what my book is for. And so sometimes people will ask me, I don't know Rails, should I get your book? And I'm like, no, you should not.

but then you have the opposite thing where like the agile web development with Rails is like the beginner version. And some people are like, Oh, it's being updated for Rails 7. Should I get it? I'm like, probably not because How to go from zero to rails hasn't changed a lot in years. There's not that much that's going to be new.

but, how do you know that, right? Hopefully the Table of Contents tells you. I mean, the first book I wrote with Pragmatic, they basically were like, The Table of Contents is the only thing the reader, potential reader is going to have to have any idea what's in the book. So, You need to write the table of contents with that in mind, which may not be how you'd write the subsections of a book, but since you know that it's going to serve these dual purposes of organizing the book, but also being promotional material that people can read, you've got to keep that in mind, because otherwise, how does anybody, like you said, how does anybody know what's, what's going to be in there?

And they're not cheap, I mean, these books are 50 bucks sometimes, and That's a lot of money for people in the U. S. People outside the U. S. That's a ton of money. So you want to make sure that they know what they're getting and don't feel ripped off.

[00:39:33] Jeremy: Yeah, I think the other challenge is, at least what I've heard, is that... When people see a video course, for whatever reason, they, they set, like, a higher value to it. They go, like, oh, this video course is, 200 dollars and it's, like, seems like a lot of money, but for some people it's, like, okay, I can do that. But then if you say, like, oh, this, this book I've been researching for five years, uh, I want to sell it for a hundred bucks, people are going to be, like no. No way.,

[00:40:00] David: Yeah. Right. A hundred bucks for a book. There's no way. That's a, that's a lot. Yeah. I mean, producing video, I've thought about doing video content, but it seems so labor intensive. Um, and it's kind of like, It's sort of like a performance. Like I was mentioning before we started that I used to play in bands and like, there's a lot to go into making an even mediocre performance.

And so I feel like, you know, video content is the same way. So I get that it like, it does cost more to produce, but, are you getting more information out of it? I, that, I don't know, like maybe not, but who knows? I mean, people learn things in different ways. So,

[00:40:35] Jeremy: It's just like this perception thing, I think. And, uh, I'm not sure why that is. Um,

[00:40:40] David: Yeah, maybe it's newer, right? Maybe books feel older so they're easier to make and video seems newer. I mean, I don't know. I would love to talk to engineers who are like... young out of college, a few years into their career to see what their perception of this stuff is. Cause I mean, there was no, I mean, like I said, I read books cause that's all there was.

There was no, no videos. You, you go to a conference and you read a book and that was, that was all you had. so I get it. It seems a whole video. It's fancier. It's newer. yeah, I don't know. I would love to hear a wide variety of takes on it to see what's actually the, the future, you know?

[00:41:15] Jeremy: sure, yeah. I mean, I think it probably can't just be one or the other, right? Like, I think there are... Benefits of each way. Like, if you have the book, you can read it at your own pace without having to, like, scroll through the video, and you can easily copy and paste the, the code segments,

[00:41:35] David: Search it. Go back and forth.

[00:41:36] Jeremy: yeah, search it.

So, I think there's a place for it, but yeah, I think it would be very interesting, like you said, to, to see, like, how are people learning,

[00:41:45] David: Right. Right. Yeah. Well, it's the same with blogs and podcasts. Like I, a lot of podcasters I think used to be bloggers and they realized that like they can get out what they need by doing a podcast. And it's way easier because it's more conversational. You don't have to do a bunch of research. You don't have to do a bunch of editing.

As long as you're semi coherent, you can just have a conversation with somebody and sort of get at some sort of thing that you want to talk about or have an opinion about. And. So you, you, you see a lot more podcasts and a lot less blogs out there because of that. So it's, that's kind of like the creators I think are kind of driving that a little bit.

yeah. So I don't know.

[00:42:22] Jeremy: Yeah, I mean, I can, I can say for myself, the thing about podcasts is that it's something that I can listen to while I'm doing something else. And so you sort of passively can hopefully pick something up out of that conversation, but... Like, I think it's maybe not so good at the details, right? Like, if you're talking code, you can talk about it over voice, but

can you really visualize it?

Yeah, yeah, yeah. I think if you sit down and you try to implement something somebody talked about, you're gonna be like, I don't know what's happening.

[00:42:51] David: Yeah.

[00:42:52] Jeremy: So, uh, so, so I think there's like these, these different roles I think almost for so like maybe you know the podcast is for you to Maybe get some ideas or get some familiarity with a thing and then when you're ready to go deeper You can go look at a blog post or read a book I think video kind of straddles those two where sometimes video is good if you want to just see, the general concept of a thing, and have somebody explain it to you, maybe do some visuals.

that's really good. but then it can also be kind of detailed, where, especially like the people who stream their process, right, you can see them, Oh, let's, let's build this thing together. You can ask me questions, you can see how I think. I think that can be really powerful. at the same time, like you said, it can be hard to say, like, you know, I look at some of the streams and it's like, oh, this is a three hour stream and like, well, I mean, I'm interested.

I'm interested, but yeah, it's hard enough for me to sit through a, uh, a three hour movie,

[00:43:52] David: Well, then that, and that gets into like, I mean, we're, you know, we're at a conference and they, they're doing something a little, like, there are conference talks at this conference, but there's also like. sort of less defined activities that aren't a conference talk. And I think that could be a reaction to some of this too.

It's like I could watch a conference talk on, on video. How different is that going to be than being there in person? maybe it's not that different. Maybe, maybe I don't need to like travel across the country to go. Do something that I could see on video. So there's gotta be something here that, that, that meets that need that I can't meet any other way.

So it's all these different, like, I would like to think that's how it is, right? All this media all is a part to play and it's all going to kind of continue and thrive and it's not going to be like, Oh, remember books? Like maybe, but hopefully not. Hopefully it's like, like what you're saying. Like it's all kind of serving different purposes that all kind of work together.

Yeah.

[00:44:43] Jeremy: I hope that's the case, because, um, I don't want to have to scroll through too many videos.

[00:44:48] David: Yeah. The video's not for me.

Large Language Models

[00:44:50] Jeremy: I, I like, I actually do find it helpful, like, like I said, for the high level thing, or just to see someone's thought process, but it's like, if you want to know a thing, and you have a short amount of time, maybe not the best, um, of course, now you have all the large language model stuff where you like, you feed the video in like, Hey, tell, tell, tell me, uh, what this video is about and give me the code snippets and all that stuff.

I don't know how well it works, but it seems

[00:45:14] David: It's gotta get better. Cause you go to a support site and they're like, here's how to fix your problem, and it's a video. And I'm like, can you just tell me? But I'd never thought about asking the AI to just look at the video and tell me. So yeah, it's not bad.

[00:45:25] Jeremy: I think, that's probably where we're going. So it's, uh, it's a little weird to think about, but,

[00:45:29] David: yeah, yeah. I was just updating, uh, you know, like I said, I try to keep the book updated when new versions of Rails come out, so I'm getting ready to update it for Rails 7.

1 and in Amazon's, Kindle Direct Publishing as their sort of backend for where you, you know, publish like a Kindle book and stuff, and so they added a new question, was AI used in the production of this thing or not? And if you answer yes, they want you to say how much, And I don't know what they're gonna do with that exactly, but I thought it was pretty interesting, cause I would be very disappointed to pay 50 for a book that the AI wrote, right?

So it's good that they're asking that? Yeah.

[00:46:02] Jeremy: I think the problem Amazon is facing is where people wholesale have the AI write the book, and the person either doesn't review it at all, or maybe looks at a little, a little bit. And, I mean, the, the large language model stuff is very impressive, but If you have it generate a technical book for you, it's not going to be good.

[00:46:22] David: yeah. And I guess, cause cause like Amazon, I mean, think about like Amazon scale, like they're not looking at the book at all. Like I, I can go click a button and have my book available and no person's going to look at it. they might scan it or something maybe with looking for bad words. I don't know, but there's no curation process there.

So I could, yeah. I could see where they could have that, that kind of problem. And like you as the, as the buyer, you don't necessarily, if you want to book on something really esoteric, there are a lot of topics I wish there was a book on that there isn't. And as someone generally want to put it on Amazon, I could see a lot of people buying it, not realizing what they're getting and feeling ripped off when it was not good.

[00:47:00] Jeremy: Yeah, I mean, I, I don't know, if it's an issue with the, the technical stuff. It probably is. But I, I know they've definitely had problems where, fiction, they have people just generating hundreds, thousands of books, submitting them all, just flooding it.

[00:47:13] David: Seeing what happens.

[00:47:14] Jeremy: And, um, I think that's probably... That's probably the main reason why they ask you, cause they want you to say like, uh,

yeah, you said it wasn't.

And so now we can remove your book.

[00:47:24] David: right. Right. Yeah. Yeah.

[00:47:26] Jeremy: I mean, it's, it's not quite the same, but it's similar to, I don't know what Stack Overflow's policy is now, but, when the large language model stuff started getting big, they had a lot of people answering the questions that were just. Pasting the question into the model

[00:47:41] David: Which because they got it from

[00:47:42] Jeremy: and then

[00:47:43] David: The Got model got it from Stack Overflow.

[00:47:45] Jeremy: and then pasting the answer into Stack Overflow and the person is not checking it.

Right. So it's like, could be right, could not be right. Um, cause, cause to me, it's like, if, if you generate it, if you generate the answer and the answer is right, and you checked it, I'm okay with that.

[00:48:00] David: Yeah. Yeah.

[00:48:01] Jeremy: but if you're just like, I, I need some karma, so I'm gonna, I'm gonna answer these questions with, with this bot, I mean, then maybe

[00:48:08] David: I could have done that. You're not adding anything.

Yeah, yeah.

[00:48:11] Jeremy: it's gonna be a weird, weird world, I think.

[00:48:12] David: Yeah, no kidding. No kidding.

[00:48:15] Jeremy: that's a, a good place to end it on, but is there anything else you want to mention,

[00:48:19] David: No, I think we covered it all just yeah, you could find me online. I'm Davetron5000 on Ruby. social Mastodon, I occasionally post on Twitter, but not that much anymore. So Mastodon's a place to go.

[00:48:31] Jeremy: David, thank you so much

[00:48:32] David: All right. Well, thanks for having me.

ChaelCodes on The Joy of Programming Games and Streaming (RubyConf 2023)

Wed, 15 Nov 2023 08:06:44 -0000

Episode Notes

Rachael Wright-Munn (ChaelCodes) talks about her love of programming games (games with programming elements in them, not how to make games!), starting her streaming career with regex crosswords, and how streaming games and open source every week led her to a voice acting role in one of her favorite programming games.

Recorded at RubyConf 2023 in San Diego.

mastodon twitch Personal website

Programming Games mentioned:

Transcript

You can help edit this transcript on GitHub.

Jeremy: I'm here at RubyConf San Diego with Rachel Wright-Munn, and she goes by Chaelcodes online. Thanks for joining me today.

Rachael: Hi, everyone. Hi, Jeremy. Really excited to be here.

Jeremy: So probably the first thing I'll ask about is on your web page, and I've noticed you have streams, you say you have an interest in not just regular games, but programming games, so.

Rachael: Oh my gosh, I'm so glad you asked about this. Okay, so I absolutely love programming games. When I first started streaming, I did it with Regex Crossword. What I really like about it is the fact that you have this joyful environment where you can solve puzzles and work with programming, and it's really focused on the experience and the joy. Are you familiar with Zach Barth of Zachtronics?

Jeremy: Yeah. So, I've tried, what was it? There's TIS-100. And then there's the, what was the other one? He had one that's...

Rachael: Opus Magnum? Shenzhen I/O?

Jeremy: Yeah, Shenzhen I/O.

Rachael: Oh, my gosh. Shenzhen I/O is fantastic. I absolutely love that. The whole conceit of it, which is basically that you're this electronics engineer who's just moved to Shenzhen because you can't find a job in the States. And you're trying to like build different solutions for these like little puzzles and everything.

It was literally one of the, I think that was the first programming game that really took off just because of the visuals and everything.

And it's one of my absolute favorites. I really like what he says about it in terms of like testing environments and the developer experience.

Cause it's built based on assembly, right?

He's made a couple of modifications. Like he's talked about it before where it's like The memory allocation is different than what it would actually look like in assembly and the way the registers are handled I believe is different, I wouldn't think of assembly as something that's like fun to write, but somehow in this game it is. How far did you get in it?

Jeremy: Uh, so I didn't get too far. So, because like, I really like the vibe and sort of the environment and the whole concept, right, of you being like, oh, you've been shipped off to China because that's the only place that these types of jobs are, and you're working on these problems with bad documentation and stuff like that. And I like the whole concept, but then the actual writing of the software, I was like, I don't know.

Rachael: And it's so hard, one of the interesting things about that game is you have components that you drop on the board and you have to connect them together and wire them, but then each component only has a specific number of lines. So like half the time I would be like, oh, I have this solution, but I don't have enough lines to actually run it or I can't fit enough components, then you have to go in and refactor it and everything. And it's just such a, I don't know, it's so much fun for me. I managed to get through all of the bonus levels and actually finish it. Some of them are just real, interesting from both a story perspective and interesting from a puzzle perspective. I don't wanna spoil it too much. You end up outside Shenzhen, I'll just say that.

Jeremy: OK. That's some good world building there.

Rachael: Yeah.

Jeremy: Because in your professional life, you do software development work. So I wonder, what is it about being in a game format where you're like, I'm in it. I can do it more. And this time, I'm not even being paid. I'm just doing it for fun.

Rachael: I think for me, software development in general is a very joyful experience. I love it. It's a very human thing. If you think about it like math, language, all these things are human concepts and we built upon that in order to build software in our programs and then on top of that, like the entire purpose of everything that we're building is for humans, right? Like they don't have rats running programs, you know what I mean? So when I think about human expression and when I think about programming, these two concepts are really closely linked for me and I do see it as joyful, But there are a lot of things that don't spark joy in our development processes, right? Like lengthy test suites, or this exhausting back and forth, or sometimes the designs, and I just, I don't know how to describe it, but sometimes you're dealing with ugly code, sometimes you're dealing with code smells, and in your professional developer life, sometimes you have to put up with that in order to ship features. But when you're working in a programming game, It's just about the experience. And also there is a correct solution, not necessarily a correct solution, but like there's at least one correct solution. You know for a fact that there's, that it's a solvable problem.

And for me, that's really fun. But also the environment and the story and the world building is fun as well, right? So one of my favorite ones, we mentioned Shenzhen, but Zachtronics also has Exapunks. And that one's really fun because you have been infected by a disease. And like a rogue AI is the only one that can provide you with the medicine you need to prevent it. And what this disease is doing is it is converting parts of your body into like mechanical components, like wires and everything. So what you have to do as an engineer is you have to write the code to keep your body running. Like at one point, you were literally programming your heart to beat. I don't have problems like that in my day job. In my day job, it's like, hey, can we like charge our customers more? Like, can we put some banners on these pages? Like, I'm not hacking anybody's hearts to keep them alive.

Jeremy: The stakes are a little more interesting. Yeah, yeah.

Rachael: Yeah, and in general, I'm a gamer. So like having the opportunity to mix two of my passions is really fun.

Jeremy: That's awesome. Yeah, because that makes sense where you were saying that there's a lot of things in professional work where it's you do it because you have to do it. Whereas if it's in the context of a game, they can go like, OK, we can take the fun problem solving part. We can bring in the stories. And you don't have to worry about how we're going to wrangle up issue tickets.

Rachael: Yeah, there are no Jira tickets in programming games.

Jeremy: Yeah, yeah.

Rachael: I love what you said there about the problem solving part of it, because I do think that that's an itch that a lot of us as engineers have. It's like we see a problem, and we want to solve it, and we want to play with it, and we want to try and find a way to fix it. And programming games are like this really small, compact way of getting that dopamine hit.

Jeremy: For sure. Yeah, it's like. Sometimes when you're doing software for work or for an actual purpose, there may be a feeling where you want to optimize something or make it look really nice or perform really well. And sometimes it just doesn't matter, right? It's just like we need to just put it out and it's good enough. Whereas if it's in the context of a game, you can really focus on like, I want to make this thing look pretty. I want to feel good about this thing I'm making.

Rachael: You can make it look good, or you can make it look ugly. You don't have to maintain it. After it runs, it's done. Right, right, right. There's this one game. It's 7 Billion Humans. And it's built by the creators of World of Goo. And it's like this drag and drop programming solution. And what you do is you program each worker. And they go solve a puzzle. And they pick up blocks and whatever. But they have these shredders, right? And the thing is, you need to give to the shredder if you have like a, they have these like little data blocks that you're handing them. If you're not holding a data block and you give to the shredder, the worker gives themself to the shredder. Now that's not ideal inside a typical corporate workplace, right? Like we don't want employees shredding themselves. We don't want our workers terminating early or like anything like that. But inside the context of a game, in order to get the most optimal solution, They have like a lines of code versus fastest execution and sometimes in order to win the end like Lines of code. You just kind of have to shred all your workers at the, When I'm on stream and I do that when I'm always like, okay everybody close your eyes That's pretty good it's Yeah, I mean cuz like in the context of the game.

Jeremy: I think I've seen where they're like little They're like little gray people with big eyes Yes, yes, yes, yes. Yeah, so it's like, sorry, people. It's for the good of the company, right?

Rachael: It's for my optimal lines of code solution. I always draw like a, I always write a humane solution before I shred them.

Jeremy: Oh, OK. So it's, you know, I could save you all, but I don't have to.

Rachael: I could save you all, but I would really like the trophy for it. There's like a dot that's going to show up in the elevator bay if I shred you.

Jeremy: It's always good to know what's important. But so at the start, you mentioned there was a regular expression crossword or something like that. Is that how you got started with all this?

Rachael: My first programming game was Regex Crossword. I absolutely loved it. That's how I learned Regex.

Rachael: I love it a lot. I will say one thing that's been kind of interesting is I learned Regex through Regex Crossword, which means there's actually these really interesting gaps in my knowledge. What was it? at Link Tech Retreat, they had like a little Regex puzzle, and it was like forward slash T and then a plus, right? And I was like, I have no idea what that character is, right? Like, I know all the rest of them. But the problem is that forward slash T is tab, and Regex crossword is a browser game. So you can't have a solution that has tab in it. And have that be easy for users. Also, the idea of like greedy evaluation versus lazy evaluation doesn't apply, because you're trying to find a word that satisfies the regex. So it's not necessarily about what the regex is going to take. So it's been interesting finding those gaps, but I really think that some of the value there was around how regex operates and the rules underlying it and building enough experience that I can now use the documentation to fill in any gaps.

Jeremy: So the crossword, is it where you know the word and you have to write a regular expression to match it? Or what's the?

Rachael: They give you regex. And there's a couple of different versions, right? The first one, you have two regex patterns. There's one going up and down, and there's one going left and right. And you have to fill the crossword block with something that matches both regular expressions.

Rachael: Then we get into hexagonal ones. Yeah, where you have angles and a hexagon, and you end up with like three regular expressions. What's kind of interesting about that one is I actually think that the hexagonal regex crosswords are a little bit easier because you have more rules and constraints, which are more hints about what goes in that box.

Jeremy: Interesting. OK, so it's the opposite of what I was thinking. They give you the regex rules, and then you put in a word that's going to satisfy all the regex you see.

Rachael: Exactly. When I originally did it, they didn't have any sort of hints or anything like that. It was just empty. Now it's like you click a box, and then they've got a suggestion of five possible letters that could go in there. And it just breaks my heart. I liked the old version that was plainer, and didn't have any hints, and was harder. But I acknowledge that the new version is prettier, and probably easier, and more friendly. But I feel like part of the joy that comes from games, that comes from puzzles, It comes from the challenge, and I miss the challenge.

Jeremy: I guess someone, it would be interesting to see people who are new to it, if they had tried the old way, if they would have bounced off of it.

Rachael: I think you're probably right. I just want them to give me a toggle somewhere.

Jeremy: Yeah, oh, so they don't even let you turn off the hints, they're just like, this is how it is.

Rachael: Yep.

Jeremy: Okay. Well, we know all about feature flags.

Rachael: And how difficult they are to maintain in perpetuity.

Jeremy: Yeah, but no, that sounds really cool because I think some things, like you can look up a lot of stuff, right? You can look up things about regex or look up how to use them. But I think without the repetition and without the forcing yourself to actually go through the motion, without that it's really hard to like learn and pick it up.

Rachael: I completely agree with you. I think the repetition, the practice, and learning the paradigm and patterns is huge. Because like even though I didn't know what forward slash t plus was, I knew that forward slash t was going to be some sort of character type.

Jeremy: Yeah, it kind of reminds me of, there was, I'm not sure if you've heard of Vim Adventures, but...

Rachael: I did! I went through the free levels. I had a streamerversary and my chat had completed a challenge where I had to go learn Vim. So I played a little bit of Vim Adventures.

Jeremy: So I guess it didn't sell you.

Rachael: Nope, I got Vim Extensions turned on.

Jeremy: Oh, you did?

Rachael: Yeah, I have the Vim extension turned on in VS Code. So I play with a little bit of sprinkling of Vim in my everyday.

Jeremy: It's kind of funny, because I am not a Vim user in the sense that I don't use it as my daily editor or anything like that. But I do the same thing with the extensions in the browser. I like being able to navigate with the keyboard and all that stuff.

Rachael: Oh, that is interesting. That's interesting. You have a point like memorizing all of the different patterns when it comes to like Keyboard navigation and things like that is very similar to navigating in Vim. I often describe writing code in Vim is kind of like solving a puzzle in order to write your code So I think that goes back to that Puzzle feeling that puzzle solving feeling we were having we were talking about before.

Jeremy: Yeah, I personally can't remember, but whenever I watch somebody who's, really good at using Vim, it is interesting to see them go, oh, yes, I will go to the fifth word, and I will swap out just this part. And it's all just a few keystrokes, yeah.

Rachael: Very impressive. Can be done just as well with backspace and, like, keyboard, like, little arrows and everything. But there is something fun about it and it is... Faster-ish.

Jeremy: Yeah, I think it's like I guess it depends on the person, but for some people it's like they, they can think and do things at the speed that they type, you know, and so for them, I guess the the flow of, I'm doing stuff super fast using all these shortcuts is probably helpful to them.

Rachael: I was talking to someone last night who was saying that they don't even think about it in Vim anymore. They just do it. I'm not there yet. (laughs)

Jeremy: Yeah, I'll probably never be there (laughs)

But yeah, it is something to see when you've got someone who's really good at it.

Rachael: Definitely. I'm kind of glad that my chat encouraged and pressured me to work with Vim. One of the really cool things is when I'm working on stuff, I'll sometimes be like, oh, I want to do this. Is there a command in Vim for that? And then I'll get multiple suggestions or what people think, and ideas for how I can handle things better. Someone recently told me that if you want to delete to the end of a line, you can use capital D. And this whole time I was doing lowercase d dollar sign.

Jeremy: Oh, right, right, right. Yeah. Yeah, it's like there's so many things there that, I mean, we should probably talk about your experiences streaming. But that seems like a really great benefit that you can be working through a problem or just doing anything, really. And then there's people who they're watching, and they're like, I know how to do it better. And they'll actually tell you, yeah.

Rachael: I think that being open to that is one of the things that's most important as a streamer. A lot of people get into this cycle where they're very defensive and where they feel like they have to be the expert. But one of the things that I love about my chat is the fact that they do come to me with these suggestions. And then I can be open to them, and I can learn from them. And what I can do is I can take those learnings from one person and pass it on to the other people in chat. I can become a conduit for all of us to learn.

Jeremy: So when you first decided to start streaming, I guess what inspired you to give it a shot? Like, what were you thinking?

Rachael: That's a great question. It's also kind of a painful question. So the company that I was working for, I found out that there were some pay issues with regards to me being a senior, promotion track, things like that. And it wasn't the first time this had happened, right? Like, I often find that I'm swapping careers every two to three years because of some miserable experience at the company. Like you start and the first year is great. It's fantastic. It's awesome. But at the end of it, you're starting to see the skeletons and that two to three years later you're burnt out. And what I found was that every two to three years I was losing everything, right? Like all of my library of examples, the code that I would reference, like that's in their private repo. When it came to my professional network, the co -workers that liked and respected me, we had always communicated through the workplace Slack. So it's really hard to get people to move from the workplace Slack to like Instagram or Twitter or one of those other places if that's not where, if that's not a place where you're already used to talking to them. And then the other thing is your accomplishments get wiped out, right? Like when you start at the next company and you start talking about promotion and things like that, the work that you did at previous companies doesn't matter. They want you to be a team lead at that company. They want you to lead a massive project at that company and that takes time. It takes opportunities and Eventually, I decided that I wanted to exist outside my company.

Like I wanted to have a reputation that went beyond that and that's what originally inspired me to stream And it's pretty hard to jump from like oh.

I got really frustrated and burnt out at my company to I've got it I'm gonna do some regex crossword on stream, but honestly, that's what it was right was I just wanted to slowly build this reputation in this community outside of of my company and it's been enormously valuable in terms of my confidence, in terms of my opportunities. I've been able to pick up some really interesting jobs and I'm able to leverage some of those experiences in really clear professional ways and it's really driven me to contribute more to open source. I mentioned that I have a lot of people like giving me advice and suggestions and feedback.

That's enormously helpful when you're going out there and you're trying to like get started in open source and you're trying to build that confidence and you're trying to build that reputation. I often talk about having a library of examples, right? Like your best code that you reference again and again and again. If I'm streaming on Twitch, everything that I write has to be open source because I'm literally showing it on video, right? So it's really encouraged me to build that out. And now when I'm talking to my coworkers and companies, I can be like, oh, we need to talk about single table inheritance. I did that in Hunter's Keepers. Why don't we go pull that up and we'll take a look at it. Or are we building a Docker image? I did that in Hunter's Keepers and Conf Buddies. Why don't we look at these, compare them, and see if we can get something working here, right? Like I have all of these examples, and I even have examples from other apps as well. Like I added Twitch Clips to 4M. So when I want to look at how to build a liquid tag, because Jekyll uses liquid tags as well. So when I'm looking at that, I can hop to those examples and hop between them, and I'm never going to lose access to them.

Jeremy: Yeah, I mean, that's a really good point where I think a lot of people, they do their work at their job and it's never going to be seen by anyone and you can sort of talk about it, but you can't actually show anybody what you did. So it's like, and I think to that point too, is that there's some knowledge that is very domain specific or specific to that company. And so when you're actually doing open source work, it's something that anybody can pick up and use and has utility way beyond just your company. And the whole point of creating this record, that makes a lot of sense too, because if I wanna know if you know how to code, I can just see like, wow, she streams every Thursday. She's clearly she knows what she's doing and you know, you have these also these open source contributions as well So it's it's sort of like it's not this question of if I interview you It's it's not I'm just going off of your word that and I believe what you're saying. But rather it's kind of the proof is all it's all out there.

Rachael:

Oh, definitely if I were to think about my goals and aspirations for the future I've been doing this for four years still continuing But I think I would like to get to the point where I don't really have to interview.

Where an interview is more of a conversation between me and somebody who already knows they want to hire me.

Jeremy: Have you already started seeing a difference?

Like you've been streaming for about four years I think

Rachael: I had a really interesting job for about eight months doing developer relations with New Relic. That was a really interesting experience. And I think it really pushed the boundaries of what I understood myself to be capable of because I was able to spend 40 hours a week really focused on content creation, on blogging, on podcasting, on YouTube videos and things like that. Obviously there was a lot of event organization and things like that as well. But a lot of the stuff that came out of that time is some of my best work. Like I, I'm trying to remember exactly what I did while I was at New Relic, but I saw a clear decrease afterwards. But yeah, I think that was probably close to the tipping point. I don't for sure know if I'm there yet, right? Like you never know if you're at the point where you don't have to interview anymore until you don't have to interview. But the last two jobs, no, I haven't had to interview.

Jeremy: So, doing it full -time, how did you feel about that versus having a more traditional lead or software developer role?

Rachael: It was definitely a trade-off. So I spent a lot less time coding and a lot more time with content, and I think a little bit of it was me trying to balance the needs and desires of my audience against the needs and desires of my company. For me, and this is probably going to hurt my chances of getting one of those jobs where I don't have to interview in the future, but my community comes first, right? They're the people who are gonna stick with me when I swap between jobs, but that was definitely something that I constantly had to think about is like, how do I balance what my company wants from me with the responsibility that I have to my community? But also like my first talk, your first open source contribution, which was at RubyConf Denver, Like, that was written while I was at New Relic. Like, would I have had the time to work on a talk in addition to the streaming schedule and everything else? Um, for a period of time, I was hosting Ruby Galaxy, which was a virtual meetup. It didn't last very long, and we have deprecated it. Um, I deprecated it before I left the company because I wanted to give it, like, a good, clean ending versus, um, necessarily having it, like, linger on and be a responsibility for other people. but...

I don't think I would have done those if I was trying to balance it with my day job. So, I think that that was an incredible experience. That said, I'm very glad it's over. I'm very glad that the only people I'm beholden to are my community now.

Jeremy: So, is it the sheer amount that you had to do that was the main issue? Or is it more that that tension between, like you said, serving your audience and your community versus serving your employer?

Rachael: Oh, a lot of it was tension. A lot of it was hectic, event management in general. I think if you're like planning and organizing events, that's a very challenging thing to do. And it's something that kind of like goes down to the deadline, right? And it's something where everybody's trying to like scramble and pull things together and keep things organized. And that was something that I don't think I really enjoyed. I like to have everything like nice and planned out and organized and all that sort of stuff, and I don't think that that's Something that happens very often in event management at least not from my experience So these were like in -person events or what types of events like I actually skipped out before the in -person events.

They would have been in -person events. We had future stack at New Relic, which is basically like this big gathering where you talk about things you can do with New Relic and that sort of stuff. We all put together talks for that. We put together an entire like. Oh gosh, I'm trying to remember the tool that we use, but it was something similar to gather round where you like interact with people. And there's just a lot that goes into that from marketing to event planning to coordinating with everyone. I'm grateful for my time at New Relic and I made some incredible friends and some incredible connections and I did a lot, but yeah, I'm very glad I'm not in DevRel anymore. I don't, if you ask any DevRel, They'll tell you it's hectic, they'll tell you it's chaotic, and they'll tell you it's a lot of work.

Jeremy: Yeah. So it sounds like maybe the streaming and podcasting or recording videos, talks, that part you enjoy, but it's the I'm responsible for planning this event for all these people to, you know. That's the part where you're like, OK, maybe not for me.

Rachael: Yeah, kind of. I describe myself as like a content creator because I like to just like dabble and make things, right? Like I like to think about like, what is the best possible way to craft this tweet or this post or like to sit there and be like, okay, how can I structure this blog post to really communicate what I want people to understand? When it comes to my streams, what I actually do is I start with the hero's journey as a concept. So every single stream, we start with an issue in the normal world, right? And then what we do is we get drawn into the chaos realm as we're like debugging and trying to build things and going Back and forth and there's code flying everywhere and the tests are red and then they're green and then they're red and then they're green and then finally at the end we come back to the normal world as we create this PR and, Submit it neither merge it or wait for maintainer feedback. And for me that Story arc is really key and I like I'm a little bit of an artist. I like the artistry of it. I like the artistry of the code, and I like the artistry of creating the content. I think I've had guests on the show before, and sometimes it's hard to explain to them, like, no, no, no, this is a code show. We can write code, and that's great, but that's not what it's about. It's not just about the end product. It's about bringing people along with us on the journey.

And sometimes it's been three hours, and I'm not doing a great job of bringing people along on the journey so like you know I'm tooting my own horn a little bit here but like that is important to me.

Jeremy: So when you're working through a problem, When you're doing it on stream versus you're doing it by yourself, what are the key differences in how you approach the problem or how you work through it?

Rachael: I think it's largely the same. It's like almost exactly the same. What I always do is, when I'm on stream, I pause, I describe the problem, I build a test for it, and then I start working on trying to fix what's wrong. I'm a huge fan of test -driven development. The way I see it, you want that bug to be reproducible, and a test gives you the easiest way to reproduce it. For me, it's about being easy as much as it is about it being the right way or not.

But yeah, I would say that I approach it largely in the same way. I was in the content creator open space a little bit earlier, and I had to give them a bit of a confession. There is one small difference when I'm doing something on stream versus when I'm doing something alone. Sometimes, I have a lot of incredible senior staff, smart, incredible people in my chat. I'll describe the problem in vivid detail, and then I'll take my time writing the test, and by the time I'm done writing the test, somebody will have figured out what the problem is, and talk back to me about it. I very rarely do that. It's more often when it's an ops or an infrastructure or something like that. A great example of this is like the other day I was having an issue, I mentioned the Vim extensions. If I do command P on the code section, Vim extensions was capturing that, and so it wasn't opening the file. So one of my chatters was like, oh, you know, you can fix that if you Google it. I was like, oh, I don't know. I mean, I could Google it, but it will take so long and distract from the stream. Literally less than 15 minutes later a chatter had replied with like, here's exactly what to add to your VS Code extension, and I knew that was gonna happen. So that's my little secret confession. That's the only difference when I'm debugging things on stream is sometimes I'll let chat do it for me.

Jeremy: Yeah, that's a superpower right there.

Rachael: It is, and I think that happens because I am open to feedback and I want people to engage with me and I support that and encourage that in my community. I think a lot of people sometimes get defensive when it comes to code, right? Like when it comes to the languages or the frameworks that we use, right? There's a little bit of insecurity because you dive so deep and you gain so much knowledge that you're kind of scared that there might be something that's just as good because it means you might not have made the right decision. And I think that affects us when it comes to code reviews. I think it affects us when we're like writing in public.

And I think, yeah, and I think it affects a lot of people when they're streaming, where they're like, if I'm not the smartest person in the room, and why am I the one with a camera and a microphone? But I try to set that aside and be like, we're all learning here.

Jeremy: And when people give that feedback, and it's good feedback, I think it's really helpful when people are really respectful about it and kind about it. Have you had any issues like having to moderate that or make sure it stays positive in the context of the stream?

Rachael: I have had moderation issues before, right? Like, I'm a woman on the internet, I'm going to have moderation issues. But for me, when it comes to feedback and suggestions, I try to be generous with my interpretation and my understanding of what they're going with. Like people pop in and they'll say things like, Ruby is dead, Rails is dead. And I have commands for that to like remind them, no, actually Twitch is a Rails app. So like, no, it's definitely not dead. You just used it to send a message. But like, I try to be understanding of where people are coming from and to meet them where they are, even if they're not being the most respectful. And I think what I've actually noticed is that when I do that, their tone tends to change. So I have two honorary trolls in my chat, Kego and John Sugar, and they show up and they troll me pretty frequently. But I think that that openness, that honesty, like that conversation back and forth it tends to defuse any sort of aggressive tension or anything.

Jeremy: Yeah, and it's probably partly a function of how you respond, and then maybe the vibe of your stream in general probably brings people that are.

Rachael: No, I definitely agree. I think so.

Jeremy: Yeah.

Rachael: It's the energy, you get a lot of the energy that you put out.

Jeremy: And you've been doing this for about four years, and I'm having trouble picturing what it's even like, you know, you've never done a stream and you decide I'm gonna turn on the camera and I'm gonna code live and, you know, like, what was kind of going through your mind? How did you prepare? And like, what did, like, what was that like?

Rachael: Thank you so much. That's a great question. So, actually, I started with Regex Crossword because it was structured, right? Like, I didn't necessarily know what I wanted to do and what I wanted to work on, but with Regex Crossword, you have a problem and you're solving it. It felt very structured and like a very controlled environment, and that gave me the confidence to get comfortable with, like, I'm here, I have a moderator, right? Like we're talking back and forth, I'm interacting with chatters, and that allowed me to kind of build up some skills. I'm actually a big fan of Hacktoberfest. I know a lot of people don't like it. I know a lot of people are like, oh, there are all these terrible spam PRs that show up during Hacktoberfest and open source repositories. But I'm a really big fan because I've always used it to push my boundaries, right? Like every single year, I've tried to take a new approach on it. So the first year that I did it, I decided that what I wanted to do to push my boundaries was to actually work on an application. So this one was called Hunter's Keepers. It was an app for managing characters in Monster of the Week and it was a Reels app because that's what I do professionally and that's what I like to work on.

So I started just building that for Hacktoberfest and people loved it. It got a ton of engagement, way more than Regex Crossword and a little bit, like those open source streams continue to do better than the programming games, but I love the programming games so much that I don't wanna lose them, but that's where it kind of started, right? Was me sitting there and saying like, oh, I wanna work on these Rails apps. The Hacktoberfest after that one, And I was like, OK, I worked on my own app in the open, and I've been doing that for basically a year. I want to work on somebody else's app. So I pushed myself to contribute to four different open source repositories. One of the ones I pushed myself to work on was 4M. They did not have Twitch clips as embeds. They had YouTube videos and everything else. And I looked into how to do it, and I found out how liquids tags work, and I had a ton of other examples. I feel like extensions like that are really great contributions to open source because it's an easy way with a ton of examples that you can provide value to the project, and it's the sort of thing where, like, if you need it, other people probably need it as well. So I went and I worked on that, and I made some Twitch clips.

And that was like one of my first like external open source project contributions. And that kind of snowballed, right? Because I now knew how to make a liquid tag. So when I started working on my Jekyll site, and I found out that they had liquid tags that were wrapped in gems, I used that as an opportunity to learn how to build a gem. And like how to create a gem that's wrapped around a liquid tag.

And that exists now and is a thing that I've done. And so it's all of these little changes and moments that have stacked on top of each other, right? Like it's me going in and saying, OK, today I'd like to customize my alerts. Or like, today I'd like to buy a better microphone and set it up and do these changes. It's not something that changed all at once, right? It's just this small putting in the time day by day, improving.

I say like the content gears are always grinding. You always need something new to do, right? And that's basically how my stream has gone for the last four years, is I'm just always looking for something new to do.

We haven't talked about this yet, but I'm a voice actress in the programming video game, One Dreamer.

And I actually collaborated with the creator of another one, Compressor, who like reached out to me about that Steam key.

But the reason that I was able to talk to these people and I was able to reach out to them is rooted in Regex Crossword, right? Cause I finished Regex Crossword and Thursday night was like my programming game stream. And I loved them, so I kept doing them. And I kept picking up new games to play, and I kept exploring new things. So at the end of it, I ended up in this place where I had this like backlog in knowledge and history around programming games. So when Compressor was developed, I think he's like the creator, Charlie Bridge is like a VP at Arm or something. And okay, I should back up a little bit. Compressor is this game where you build CPUs with Steam. So it's like Steam Punk, like, electrical engineering components. Ah, it's so much fun. And like, the characters are all cool, because it's like you're talking to Nikola Tesla, and like Charles Babbage, and Ada Lovelace, and all this sort of stuff. It's just super fun.

But the reason he reached out to me was because of that reputation, that backlog, that feedback. Like, when you think about how you became a developer, right, it's day by day, right? when you develop your experience. There's a moment where you look back and you're like, I just have all of these tools in my toolkit. I have all of these experiences. I've done all these things, and they just stack to become something meaningful. And that's kind of how it's gone with my stream, is just every single day I was trying to push, do something new. Well, not every day. Sometimes I have a lazy day, but like, but like I am continuously trying to find new ground to tread.

Jeremy: Yeah, I mean that's really awesome thinking about how it went from streaming you solving these regex crosswords to all the way to ending up in one of these games that you play. Yeah, that's pretty pretty cool.

Rachael: By the way, that is my absolute favorite game. So the whole reason that I'm in the game is because I played the demo on stream.

Jeremy: Oh, nice.

Rachael: And I loved it. Like I immediately was like, I'm going to go join the creators discord. This is going to be my game of the year. I can't wait to like make a video on this game. What's really cool about this one is that it uses programming as a mechanic and the story is the real driver. It's got this emotional impact and story. The colors are gorgeous and the way you interact with the world, like it is a genuine puzzle game where the puzzles are small, little, simple programming puzzles. And not like I walk up to this and like I solve a puzzle and the door opens.

No, it's like you're interacting with different components in the world and wiring them together in order to get the code working.

The whole premise is that there's an indie game developer who's gone through this really traumatic experience with his game, and now he's got the broken game, and he's trying to fix it in time for a really important game demo.

I think it's like, it's like Vig something. Video game indie gaming.

But what happened is I started following the creator, and I was super interested in them. And then he actually reached out to me about like the Steve workshop and then he was looking for people to voice act and I was like me please yes so yeah that's how I got involved with it yeah that's awesome it's like everything came full circle I guess it's like where you started and yeah no absolutely it's amazing.

Jeremy: And so what was that experience like the voice acting bit? I'm assuming you didn't have professional experience with that before.

Rachael: No, no, no, no. I had to do a lot of research into like how to voice act. My original ones were tossed out. I just, OK, so there's one line in it. This is going to this is so embarrassing. I can't believe I'm saying this on a podcast. There's one line that's like, it's a beautiful day to code. It's like a, because I'm an NPC, right? So like you can keep interacting with me and one of the like cycling ones is like, it's a beautiful day to code. Well, I tried to deliver it wistfully. Like I was staring out a window and I was like, it's a beautiful day to code. And every single person who heard it told me that it sounded like somewhat sensual, sexy. And I was dying because I had just sent this to this like indie game developer that like I appreciated and he replies back and he's like, I'm not sure if there was an audio issue with some of these, but could you like rerecord some of these? So I was very inexperienced. I did a lot of practicing, a lot of vocal exercises, but I think that it turned out well.

Jeremy: That's awesome. So you kind of just kept trying and sending samples, or did they have anybody like try and coach you?

Rachael: No, I just kept sending samples. I did watch some YouTube videos from like real voice actors. To try and like figure out what the vocal exercises were. One of the things that I did at first was I sent him like one audio, like the best one in my opinion. And he replied back being like, no, just record this like 10, 20 times. Send it to me and I'll chop the one I want.

Jeremy: So the, anytime you did that, the one they picked, was it ever the one you thought was the best one?

Rachael: Oh gosh, I don't think I actually like, Wow, I don't think I've gone back over the recordings to figure out which one I thought was the best one. Or like checked which one he picked out of the ones that I recorded. Oh, that's interesting. I'm going to have to do that after this.

Jeremy: You're going to listen to all the, it's a beautiful day to code.

Rachael: The final version is like a nice, neutral like, it's a beautiful day to code. One of the really cool things about that, though, is my character actually triggers the end of game scene, which is really fun. You know how you get a little hint that's like, oh, this is where the end of the game is, my character gets to do that.

Jeremy: That's a big responsibility.

Rachael: It is. I was so excited when I found out.

Jeremy: That's awesome. Cool. Well, I think that's probably a good place to wrap it up on. But is there anything else you want to mention, or any games you want to recommend?

Rachael: Oh, I think I mentioned all of them. I think if you look at Code Romantic, AXA Punks, Bitburner, is an idle JavaScript game that can be played in the browser where you write the custom files and build it and you're going off and hacking servers and stuff like that. It's a little light on story. One Dreamer, yeah. I think if you look at those four to five games, you will find one you like. Oh, it's 7 Billion Humans.

Jeremy: Oh, right, yeah.

Rachael: I haven't written the blog post yet, but that's my five programming video games that you should try if you've never done one before. 7 million humans is on mobile, so if you've got a long flight back from RubyConf, it might be a great choice.

Jeremy: Oh, there you go.

Rachael: Yeah. Other than that, it can be found at chael.codes, chael.codes/links for the socials, chael.codes/about for more information about me. And yeah, thank you so much for having me. This has been so much fun.

Jeremy: Awesome. Well, Rachel, thank you so much for taking the time.

Rachael: Thank you.

Daniel Zingaro and Leo Porter on learning to program with LLMs

Wed, 20 Sep 2023 19:07:04 -0000

Dr. Daniel Zingaro and Dr. Leo Porter are co-authors of the book Learn AI-Assisted Python Programming.

Leo will teach an introductory computer science course this quarter at UCSD using this book. We discuss how tools like GitHub Copilot let people new to programming focus on breaking down problems instead of language syntax.

Dr. Zingaro is an Associate Professor of Computer Science at University of Toronto Mississauga and Dr. Porter is an Associate Professor at University of California San Diego.

This episode was originally posted on Software Engineering Radio.

Topics covered:

Making programming more accessible
Teaching problem decomposition instead of language syntax
The importance of reading and testing untrusted generated code
The rise of throwaway or one-off code
Concerns about relying on commercial tools
Rethinking how to assess students

Transcript

You can help edit this transcript on GitHub.

Note the timestamps and audio for this transcript will not completely match.

Intro

[00:00:00] Jeremy: Today I'm talking to Dr. Leo Porter. He's an associate teaching professor of computer science at the University of California San Diego, and he co-founded the computing education research laboratory there.

I'm also joined by Dr. Daniel Zingaro who is an associate teaching professor of computer science at the University of Toronto. And he's also the author of the book, learn to Code by Solving Problems and the Book, Algorithmic Thinking. They are co-authors of the book, learn AI Assisted Python programming.

Leo and Dan, welcome to Software Engineering Radio.

[00:00:37] Leo: Thank you for having us, Jeremy. I really appreciate your podcast, so thanks. Great to be here.

[00:00:41] Dan: Thanks Jeremy.

Writing a book for Leo's CS1 class

[00:00:43] Jeremy: The first thing we could start with is, is why this book? And, and why now? How did you decide on like, okay, this is the thing we need to do now.

[00:00:51] Leo: So, uh, this is Dan. Uh, so Dan, um, like really early when LLMs first kind of were coming out and being seen on the scene for programming, uh, he started playing with them, uh, for programming projects. And I think Dan really quickly realized that they'd had this, a big impact on how we teach programming. so he reached out to me, uh, and said, I really need to give em a try.

And, uh, after I played with them for a little while, I had the exact same realization that this is gonna change, uh, how we teach programming, uh, in a pretty dramatic way. So having realized that, having realized that we had to change our, uh, introductory CS1 courses, we knew we needed to do that, but in order to teach that class, we'd have to have a book that we could assign our students that that would go along with the class.

And so we knew we had to change the class, but we also knew we had to have a book for it. And given the, the timeline to write books, we started in the book first. Um, and so that's how it got started.

LLMs for Syntax, Humans for breaking down problems

[00:01:45] Dan: I guess we figured out that our course had to change first, before we knew exactly, um, how it had to change. One thing we, um, learned early on was that the kinds of assignments we give in our introductory courses, they're just solved by, by these tools like ChatGPT and copilot.

So, uh, we knew something had to change, and then it is just a matter of figuring out what. And so we spent, um, quite a bit of time with these tools and we started to realize that what's gonna change is the skills that our students need to learn, uh, to be effective using these tools. So like b before these tools, we would spend a lot of time teaching syntax.

Um, and students struggle quite a bit with learning syntax, which I mean, it's very, it's, it's very frustrating, right? Cuz you can't even do anything until you get the syntax right? And you're getting all these errors like missing colons and, you know, mismatched braces and stuff like that. Uh, so it's actually good, that, the LLMs are doing the syntax for the students.

But you know, just because that skill's, uh, not needed as much, uh, doesn't mean that there aren't still skills for students to learn. So instead of syntax, other things become more important. Uh, so for example, uh, Leo and I, realize that reading code is gonna be extremely important even more so than before.

I think if, if that, if that's even possible. Uh, and that's because sometimes you're gonna get back code that just doesn't work. And so we realized that students are gonna need to be able to read, the response that they get to see if the code looks reasonable, or not, right? And then if the code, uh, I is unreasonable, then they need to read more code, uh, and look at other solutions, right, that they get from the, uh, LLM.

Uh, there are other, uh, things they can do as well, like messing around with the prompt and so on. But they're gonna need to be able to read code, uh, throughout the process. And then, so we just kind of kept on using these tools and documenting the skills that students are gonna need.

And we just kinda realized that all the skills students are gonna need are skills we would want to teach anyway. So like, uh, one more example is testing, right? So, students may now not have, uh, an understanding of every last detail of, you know, the Python language like they would before. And so then that makes testing even more important, right?

Than it was they need to verify that the code they're getting is correct. And so they have to be very good at writing test cases. and, and, you know, similar, similar for debugging, we need our students to have strong debugging skills, again, even potentially stronger than before, right? Because if the code isn't working, they need to first determine what the code is doing to be able to fix it. And then I guess one more I'll mention is problem decomposition. And this is a big one. I think this is gonna come up a couple times probably in our talk today, but LLMs struggle when you give them tasks that are too large and students need to know how to break problems down into small components so that, that, LLM can solve each one and, you know, have a good chance of getting it right.

[00:04:56] Leo: Yeah, I, I think, um, kind of to, to piggyback off of that, you, you may be hearing these skills and saying, oh, these are absolutely essential skills. Every software engineer should know, uh, these are being taught right now. Right? Um, and the answer is not really, like these aren't core topics in a lot of introductory CS classes because so much time is spent on syntax.

And so fairly early on when we kind of realized these skills would be so essential, Uh, we got really excited because these are skills we want to teach in our classes, and the LLMs are now giving us the ability to do that more.

[00:05:27] Dan: Mm-hmm.

[00:05:28] Jeremy: I think that's interesting about the syntax comment because you were saying how reading is gonna be more important than ever because you have LLM generating the code. Um, and you need to understand that code that's being generated and understand that it does what it, uh, you think it does.

And so I wonder if when you say you spend less time on syntax, is it because you feel like they're gonna generate this code and they're sort of organically gonna pick up syntax that way versus having to focus on it at the start? I'm just trying to picture what you see changing there.

[00:06:05] Dan: Yeah, Jeremy. So, uh, I, I was, I guess speaking specifically about syntax errors, which don't generally happen when you're using LLMs, and I also agree with you, you need to know what the code is doing, but, um, you can do that without worrying about each specific piece of syntax. Like, um, you're gonna need to know what the keywords do for sure, but, missing, you know, brackets and colons and, uh, oh, there needs to be like a blank line here.

indentation, uh, a lot of this kind of thing. Is done for the most part, correctly by the LLMs. So yeah, I agree with you. You need to be able to identify the structures. So in our, in our book actually, Leo and I have, um, a couple of chapters on reading code and, I don't think we ever break breakdown, a line of code into its individual tokens.

We do talk about the main structures, like ifs and loops and functions and all that. but compared to other books, I, I think or other, uh, other ways of teaching where you would focus on the micro level, we try to focus on the line level now, cuz we want our students to be able to grasp what each line is doing, I guess more than each token.

[00:07:27] Leo: Yeah, maybe to, to add to that a bit, it's almost, uh, if you think about the advent of block-based languages, it was to make sure that the, essentially the, the author can't make syntax mistakes, right? Is the whole purpose of kind of block-based languages. And they're, they're huge for introductory programming, especially in like K through 12.

in a sense, LLMs do this because they'd never give you back wrong syntax, or they almost, almost never give you back wrong syntax. And so it takes away that kind of cognitive burden of making sure you handle the, the token level. as uh Dan was saying

LLM generated code needs test cases to catch logical errors

[00:08:00] Jeremy: I, I'm curious, so you said the syntax is correct, but what are the, the typical mistakes you see coming back from these LLMs?

Is it a, a logical mistake or is it ever something that. Actually doesn't compile. I'm, I'm kind of curious what your experience has been.

[00:08:19] Leo: I think the, uh, more common errors that we've been seeing are logical. So it misinterprets the prompt that you're giving it. It essentially tries to solve a problem that's different than what you're trying to solve. It may have bugs in it, so it is in fact trying to solve the right problem, but it, it's off by one, um, is maybe replicating some mistake that it found in, in the large code base.

And so most mistakes are gonna be you need to write test cases, run it. That mistake is then gonna show up when the test cases catch it, and then you'll have to try to fix it. if the students can read the code, uh, if we train them well to read the code, often you'll look at the response. And if the response is just not even trying to solve the right problem, you can usually pick that up pretty quick.

Uh, and I think, I think the students will be learn to do that and then they can just say, okay, this is clearly not the right answer. And, and use the different tools in say vscode to find another answer, and then pick one that's right or change their prompt to get a response that's right. Go through that whole flow.

But then some point or other it will give an answer that looks right. And then I think all of us as software engineers know that even the code looks right, it may not be. And so then they have to actually write the test cases, get some level of confidence that's actually working right before they'll know.

And so sometimes, sometimes, you know, really quick is that it's just clearly wrong at solving the wrong problem. And sometimes it looks right, but it actually has some bugs that need to be fixed.

[00:09:49] Dan: I guess one thing that struck me is how much a change in the prompt can, can matter. Uh, Leo, you know, um, we've, we've seen this over and over again where we'll write a prompt. It seems fine to us. And then we'll realize, oh, there are actually two different ways of interpreting this. and, uh, the ambiguity of, of English strikes again, right?

And so it's just amazing to me how clarifying the prompts, how many times that fixes the code. Not always. We've definitely have examples where that's not the case, but, um, more, more often than not, in my experience, changing the prompt, uh, appropriately has a bigger than, than, um, anticipated effect on the, on the code.

It's amazing.

[00:10:36] Leo: And for thinking of the prompt, uh, in terms of like doc strings for functions, uh, adding the test cases certainly help. Um, sometimes it is, surprising sometimes that you can add the test cases to the prompt and it'll still give you back code that does not actually pass that test case because it, vscode and copilot doesn't actually run the code that comes back from the LLM.

Uh, but I do find the test cases do tend to help with the quality response you get back.

[00:11:01] Jeremy: As a part of your prompt, you're asking it to implement some functionality, and you're also asking it to write these tests for that same functionality?

[00:11:11] Leo: Oh no, sorry. I, I, it's more the, um, doc test kind of format. So it, it, um, you're writing, let's say you, you've written your function signature and then you have the description of the function in a doc string. And then at towards the end of the doc string, I'm articulating the test cases that I intend to use.

Um, and the articulating the test cases that I intend to use helps it come with a better prompt. Um, I haven't found it to be great at writing test cases. I haven't spent a ton of time with this, but the time that I have spent, it tends to want to do almost like a brute force search of all possible inputs, uh, as opposed to doing, okay, well here's a couple common.

Here are the edge cases. Now I can feel fairly good about it. It doesn't seem to have that, um, intuition yet.

[00:11:55] Jeremy:

[00:11:55] Leo: For the most part, we're writing the test cases our ourselves, and we're gonna be teaching the students how to write the test cases

themselves

[00:12:01] Dan: Yeah,

Yeah. So Leo and I have actually made a conscious decision to have students write test cases from scratch. Even though you could play around with the LLM and have it, you know, try to generate test cases, whether it's flawed or not, we still want students to do this from scratch. We think that writing test cases is a skill we want our students to have.

[00:12:23] Jeremy: Sometimes what these models will generate, like you were saying, has logical errors. And hopefully if you're writing the test cases, you've put some thought into 'em, and your test cases are actually checking the correct behavior.

So then you have the LLM generate the implementation. It's running against tests where you know what the correct answer should be. And so if it generates something that's incorrect, you've, you've kind of caught it. You're not totally relying on it. Telling you everything is, is good, you know? Um, It's confidence in something that's like you personally can't see.

It's just what the machine gave you.

[00:13:05] Dan: Maybe it takes away one layer of uncertainty too, Jeremy, right? Like, so the code could be wrong, right? And then if it generates test cases, okay, the test cases could be wrong too. And maybe you get unlucky and two wrongs make a right and then your test cases pass for the wrong reason. So yeah, we really wanna hone this skill in our students.

And, and like Leo said earlier, these intro courses used to be so full of low level syntax concerns that we, we didn't do testing properly. I mean, you know, we all try to cover testing, but I think we're gonna be able to cover it a lot more, detailed now.

LLMs could encourage students to test more since their output is untrusted

[00:13:41] Leo: And I, I think we're enthusiastic about, uh, how students will approach testing when you're working with the LLM is what we. This is fairly anecdotal, but uh, when they interact with us talking about testing, often students aren't testing their code because they wrote it. And so of course it's Right. Right.

This is like this really famous, uh, kind of bug in human thinking, right? Is that if you write it, of course the computer's gonna interpret what you're saying, right? Um, and so students tend to trust their code in a way that professional software engineers never would. and I think because it's coming from this third party that you know is wrong, it's coming from the LLM that can, that can often make mistakes.

I think they're gonna be more inclined to actually engage in those testing practices. Uh, kind of knowing about the fallibility of the LLM,

[00:14:27] Jeremy: You're shifting the order. I mean, there is test driven development that some people practice, but I feel like probably what's most common is you write the implementation yourself and then, then you'll go and see like, oh, did this thing I, I wrote.

Did it do what I thought it should do? Um, whereas this is kind of flipping it, where it's the large language model is gonna write my code, so I'm just gonna start with the test and then I'll ask it to, to write me the code. And maybe that will kind of make test driven development be the default.

[00:15:02] Leo: So yeah, I, I, I think that students may wanna engage more in kind of test driven development because they wanna think more about, uh, what exactly should this function be doing? Uh, how should behave, what kind of inputs and output should it expect? And then it can kind of write the prompt to co-pilot or whatever LLM is using, uh, to express those inputs and outputs.

Well, they're more apt to get good answer from the LLM and they've kind already got their test cases worked out as well, so they can immediately just go right into the testing agency if the prompt came back right.

Using LLMs at the function level instead of a broader scope

[00:15:35] Jeremy: And you mentioned writing a prompt to implement a specific function. Have you found that they work well at the function level? But if you try to ask it to build something more broad, that that's kind of when it has problems?

[00:15:53] Dan: So, I think in general, LLMs do work best at the function level. We have tried to get it to generate bigger apps, collections of functions, and it can work, but sometimes it does, uh, it does do worse.

But also we want students to do the problem decomposition for themselves and break up the problem into individual functions. Even though maybe the LLM could work, uh, with, uh, bigger chunks of code, we want students to do it. And one reason is so that they can customize what they get from the LLM. So, in the book, we have a bunch of examples where you could probably just throw it at the LLM and get an answer and, you know, eventually get it to work.

But I think at that point, making changes to it might be trickier than it would be if you knew, uh, the architecture of what you were, what you were building. So in the book, we have a bunch of top-down design diagrams, and we want students to understand what they're building at that level, like at the function level instead of, like we said earlier, instead of like at the token level or the line level.

Potential issues with outsourcing high level design to an LLM

[00:17:03] Jeremy: And so like in this example, you're thinking more from a, a learning perspective. You want the student to look at the big picture, figure out, okay, what are all the different functions or parts of my application? Break that down and then feed those individually. To, um, these large language models. I, I'm wondering from like, let's say you're a, a professional software engineer and your interest is more in I want to make the thing and less so, in I want to learn how to make the thing.

in that case, do you feel like you could feel confident in, in giving the large language model a larger piece of the design, or do you still feel like it's good to have that overall structure done by the, the developer and then just be very targeted about how you use the large language model?

[00:18:03] Leo: I think that's a tricky question because we haven't worked with these tools heavily in a professional programming setting. I think often when we're thinking about large design of software, you're gonna be working on teams, talking with other members of the team about the interfaces and things like that.

And so I'd be pretty hesitant to to outsource that, that thinking to the, the l lm cuz you, the communication between the teams still has to happen. Uh, even if it weren't for that. Um, I kinda think of it as a probabilities. So essentially whenever you ask co copilot or any of these LMS to, to do a task, the more it has to right, get the kind of more likely it's gonna make a mistake.

Um, and so, uh, that's kind of why I like the functional level. It seems like I. Partially because it's not that much code that tends to write. Um, so you help to avoid kinda the probabilistic problem, but also because it's learned on a huge code base that has lots and lots of functions that have been implemented.

It tends to do well at that, that solving the function kind of task.

[00:19:10] Jeremy: Yeah. And I, I think the way you put it as outsourcing that designer, that decision is, is interesting because yeah, if you are working on a team and whether it's in code review or just in a discussion, often people will ask, well, well, why did you do it this way? Or Why, why is this the, you know, the good way to design it?

And if you kind of handed that off to an l l m, maybe your answer is, I don't know. It's just what it it told me, which (laughs)

[00:19:39] Dan: Yeah.

[00:19:42] Leo: That isn't an answer I want to u use talking to my boss. Right. Well the chat GPT told me I should have it this way. That doesn't seem like a good answer.

Choosing GitHub Copilot for CS1

[00:19:50] Jeremy: I think we, we've kind of been talking in more a general sense of working with LLMs and you've mentioned how you're gonna be teaching introductory computer science courses this coming, quarter or semester. And so when you teach these classes, what tools are you gonna recommend your students use?

And yeah, maybe you could go into that a bit.

[00:20:13] Leo: Absolutely. So we're gonna be recommending, um, At least, at least for my class, I'm gonna be recommending that they use, uh, vs code with copilot. Um, I just like the integration of the IDE with the, uh, interactions with the LLM uh, I think it avoids just a whole bunch of copy pasting from another interface into your IDE to then, uh, run it.

I think it also reduces the barrier of them kinda immediately getting the code and then testing it right there in the environment. I'm sure any of the other tools would work, it's just, that seems to have worked well for us, uh, when we were writing the book. And that's, that's actually the technique we recommend in the book as well.

Um, so that would be the primary tool for the students writing the code.

In addition to having them using copilot with, uh, in the IDE for a lot of the code generation, depending on where things are at with copilot x, um, which is right now, um, available through wait list. Uh, if that's, that's available publicly, I think we're gonna be recommending that because it has a copilot chat feature, uh, which can be really nice to interact with.

And, uh, the main use that, that we're gonna be encouraging students to use, whether it be co-pilot chat or a ChatGPT is in just a conversation with the LLM about, particularly modules and libraries. So if you are diving into, merging PDFs, which, uh, Dan did a great job in one of the chapters in our book talking about, if you wanna dive into that, well, what libraries should we be using in Python for that.

Uh, and we found that the LLMs do a really good job at this, of actually saying, here are the different libraries you could use. Here are the pros and cons of them. These are the ones that, uh, need to be actually have additional install done. Or these ones that come in with, vanilla Python. they're actually really good at kind of giving you the what you should use for the various libraries.

Um, and so that's, that's one other way that we were gonna be encouraging the students to use the LLM.

Types of questions to ask the LLM

[00:22:07] Dan: Yeah. So whenever the students or the junior programmer, doesn't know how or doesn't think they can, uh, do something in base Python, we have them interact with the chat and, and ask. So another example that comes to mind from the book is we have a chapter writing some games.

And so for most games, including the two that, uh, we've got in the book, you need to be able to generate random numbers, right? So how do you do that? And so in the past you would've used a search engine stack overflow or something, and you would've found, some sample code and you would've pasted it in to your file and made variable name changes and things like that.

And so what we do now is we ask chat, okay, I need to generate some random numbers. How do I do it? And then it will come back to you with a few options, and then you can systematically work through those options if you like. Uh, and you can ask, okay, is this one built into Python or not? And then it will tell you, oh, this one's not.

We don't need to memorize API docs

[00:23:11] Dan: And you say, oh, well, okay, so like, how do I install this? And then no, does it work on all OSS or just Windows? Right? So, uh, we guide the reader through these questions that you could have, uh, to help you make a decision. Um, and I think what I like the most about this is not having to learn. APIs, like yet another api.

Like I don't, I don't think I have room, you know, in my, like, brain for any more APIs. And, and what's cool is I, I've forgotten like every API that, uh, we've used in the book. So we have like examples of emerging PDFs and, uh, removing duplicate images from directories, uh, from like people's phones, and, and stuff like that.

And I don't know, I don't know which library it's using. Uh, and I'm, I'm totally okay with that, right? Like I just, I, I wanted to get the job done. I wanted to write a tool, and the tool got written and it used some sort of library and it worked great. And I didn't have to look through the documentation for that library and figure out like, which functions do I have to call and things like that.

So, I, I know it, it can be fun, you know, it could be fun to really learn an API well, but a lot of people, they don't want to program for programming sake. Like, they just wanna get work done, right? So, you know, while I, I, I fully admit to, enjoying programming just for the sake of programming. I do a lot of competitive programming problems just for fun.

You know, it's like Sunday morning and it's like, Hey, yeah, I got like an hour and I got an hour to work on something. Let me work on this little competitive programming problem. But, uh, a lot of people, they're not motivated by that. They're motivated by consequences of code. And this is one thing about LLMs that I'm very excited about, is you can just, make a lot more progress, without having to learn what these, people may believe is just useless knowledge, right?

Like, does it really matter how I should invoke this api Right, to merge PDF files? I mean, the answer for many people is no. Like, they just want the result to happen. And I love how we can kinda match what they, uh, deem important, right? With the LLMs, it's like a new level of abstraction, for for many people.

LLMs make building software possible for more people

[00:25:28] Leo: There's a couple of audiences that come to our introductory classes, and what Dan's talking about here is one of the things I'm most excited about with this, and that's the students who come and take just one. Programming class. I know it's probably a different audience than, uh, a lot of the people listening right now.

Um, but the people who just take one programming class, it's required for, for their major. They, I just wanted to explore it a little bit, but they, they don't go into this as a, as a career. I think a lot of those students right now, uh, if you ask them a year later to program something, do any of these tasks that we're talking about right now, I doubt they're able to, even if they did really well in that class.

Uh, and that's really disappointing, right? If they've taken a programming class, they should be able to, to do something with that, a year or even five years later. And I really believe that if you teach them the skills of interacting with these LLMs, they'll be able to do these tasks later. They'll be able to come back and go, you know, I don't remember any of the Python syntax.

I don't remember, uh, even how to get started with this. But you know what, I'm just gonna ask, uh, copilot, how do, how do I go about merging these PDFs, having this directory? And then, uh, the copilot chat comes back and says, oh, you might use this and that. And then they go, oh, I remember, I remember how to, how to write these functions.

And I just said, you have to go over a prompt. I think they could really do it. And that, that's a bit of a game changer, right? That means a larger portion of our society will be able to, uh, write code and using a useful way. And I'm just really excited about that. I think it's gonna be really nice, uh, after the changes happen.

More people might stick with Computer Science

[00:26:58] Jeremy: I can totally see in the context of someone who's, not seeing it as a career, or someone who is like, hasn't done it in a while. It could be. These tools can be incredibly useful, right? Or it can even get you interested in this field at all, right? Like a lot of people, they, they struggle through the syntax and then they decide like, oh, this is not for me.

Even though like they had something really cool they wanted to build and, and maybe these kind of tools can, can get them over that hump.

[00:27:31] Leo: Exactly. I think there's a population of students, um, and it varies a bit by demographics, who come to computer science, with really the best motives in mind, right? They wanna make their goals in their life are to make the world a better place, and they want to achieve those goals. And if you spend the first three quarters or three semesters working with them and all they're seeing is syntax and they're not actually solving anything meaningful, um, it starts to create this disconnect of what their goals are for their life and what they think the goals of are, are career are.

Of course as, as, as a computer science, I wanna say, stick it out. You know, if you, if you go into the fourth, fifth class, you'll start seeing how these are really useful tools that can make society a better place. But it'd be really nice to front load that and have them solving useful problems much earlier and seeing that, uh, computer science, uh, can be used in really nice ways.

Efficency can be taught later

[00:28:26] Jeremy: And, and so within the, the context of. People who are studying computer science will eventually, who may become professional software developers, things like that. Something more long term where it becomes more of a craft, the, the code that comes back from these large language models. Sometimes it could be something that's like not maybe the most easy to read or it may be doing something inefficiently.

And I'm wondering from your perspective how users of these tools should, should think about that and, and recognize when that's a problem.

[00:29:06] Dan: We in, in, in the first couple of courses, typically in the CS program, um, we don't spend much time on efficiency. the reason is that there's just so much to learn early on, and, um, we worry about overwhelming people with, know, too much, for them to, to process it at once. And we don't wanna prevent students from becoming interested, by.

Giving them all of these requirements early on. So typically we, you know, we push efficiency, down the, down the road into like a data structures course, for example. But your question points to another reason why, we've decided to teach some of the skills we teach early on. So if, if a student, you know, came up to Leo or, or me and said, Hey, you know, like I wanna generate efficient code, how do I do it?

My answer would, would be, so like, get, get familiar with programming first, but you are learning the skills necessary where you'll be able to look at code later because you know how to read it still, right? It's not, uh, something that you don't understand. You're gonna, you're gonna know it. We're gonna spend lots of time on code reading, and so later I think we can just teach efficiency the way we always did.

Um, so, you know, doing, uh, time complexity analysis on, on the code and they're still gonna understand what the code is doing. So, um, I, I, I don't think this is going to, this is going to change much in, in the earliest courses.

LLMs can expose students to different types of code

[00:30:35] Leo: To the, to the point about code readability, I might add that, uh, certainly they're gonna get back some, some code that's maybe not the best style and it may not be as readable. Uh, but what's kinda interesting is that students aren't exposed to a lot of different styles kind of in our existing courses, right?

They, they see the code that they write and they see the code that the professor writes and gives them, and there's not much else. And so, I mean, we're gonna need data and we're gonna need research to, to, to know this for sure, but it, it, I suspect them seeing lots of different code styles and having to read those different code styles may actually inform them better than we do now about what makes code more readable.

Uh, and then they might be able to employ that as they go forward.

[00:31:21] Jeremy: And, and when you're saying they're gonna read different styles and things like that, are you referring to code they're gonna see from the LLM or are you talking about them reading just other code bases in their classes or their professional work?

[00:31:39] Leo: Oh, I'm sorry. Yeah, I was referring to the code. They'll see from the LLM Right

[00:31:43] Jeremy: Oh I see

[00:31:43] Leo: LLM will come back in all these different ways. They'll have different styles and they'll, uh, have different approaches to solving it. Right? Sometimes they'll, uh, come back with like this one line

Lambda expression thing that solves it, and they'll have no idea how that works.

And they'll, they'll ask for a different answer and they'll get, uh, a much more, uh, user-friendly first, uh, first programing experience kind of code back. And they'll be able to understand that and go, okay, this is the kind of code that I wanna see. Not this thing that was completely non-readable.

[00:32:11] Dan: Yeah, Leo, I just thought of something. So, uh, so you know, by default you can get it to give you 10, uh, code segments to solve the problem, right? So it'd be kind of cool, if we ask students about each of them, right? Each of the 10, which ones are right, which ones have bugs, which ones have good style, which ones have bad style, it's like a built-in learning opportunity right there.

So yeah.

[00:32:34] Leo: Oh, it's true. Yeah. And, and so the 10 things that, uh, Dan I was referring to is if you do control, enter in vs code when you're working with a copilot, it'll give you back 10. Possible responses. And you're totally right Dan. You could just say of these 10, how readable are they? Are they right? Um, there's lots of fun things you can do to ask students questions.

[00:32:51] Dan: and often many of them are right with just subtly different ways of, of, of, of solving the problem. I mean, I'll, I'll admit to having some fun looking through all of the suggestions just to kind of see what the variability is and when there's a lot of variability. I really like it because, uh, like Leo said, it exposes people to different styles they may not have seen before.

And, um, may it may, it may, um, encourage you to ask questions, right? Like, why does this one work? Right? I've tested it. It doesn't look like it should work. Why does it work? I feel like that's the beginning of a pr pretty powerful learning experience right there.

[00:33:30] Jeremy: Yeah, that makes sense to me because I, I think about how when a lot of people are doing software development before all these LLMs, they will search on the internet and go, okay, what's an existing answer for this thing I'm trying to do?

They'll find a post on Stack Overflow and they'll find the accepted answer and it'll be like, okay, this is it. This is the solution. Whereas, at least in this case, it seems like you can go like, okay, well here's, here's 10, 10 potential solutions, and at least you get a little bit more exposure to, um, what are the different ways you could do it.

[00:34:06] Leo: Exactly, and, and it's nice for 'em to see these different options. And I think there is, for professional software engineers seeing that stack overflow post, like, here's the accepted answer, integrating that into your code isn't a big jump for, for a lot of us. Um, but I do wanna stress that for the intro students, it often is a really big jump.

Uh, just the, oh, how do I change around this? Oh, this was the interface for this function, but I'm been asked to have this other interface with a function and, and they really can struggle in that domain. And so I think copilot and these LLMs are nice in that they give back answers that are more tailored to the existing code that they're working with, um, and will reduce that barrier of them trying to incorporate the answer.

Optimization can come later, most code is straightforward

[00:34:50] Jeremy: So it seems kind of overall, when you're talking about people who are using programming in a more professional capacity, the code style and efficiency that will probably be taught very similarly to however it is now, where you basically have to get exposed to different styles and types of code, get exposed to the algorithms and and that will allow you to read the answers you get back better.

So the answers you get back from the LLM with the knowledge you gain from these later courses, you'll be able to tell like, oh, okay, this is, this. Level of complexity, or this has like, you know, exponential, performance implications, that kind of thing.

[00:35:43] Leo: So I think the performance piece is really important. Um, and I appreciate your, you bringing it up. I think, I'm, I'm kind of curious, uh, uh, what percentage of the time professional programmers are really spent, uh, are spending optimizing, uh, the code that they write?

Um, I suspect a lot of the code that's written, uh, is pretty straightforward. Uh, you, you already know how to work with the database you're working with. You already know how to write the queries for that. You're, you're, you're just, uh, you're still doing something that, that's certainly thought provoking, but it's not the hard work of, oh, how am I gonna write design the right algorithm for this to get the exact best runtime?

And so I think there are some times that that does matter, but those may be the times that the LLMs aren't as helpful and there's still gonna be a, a pretty big need for programmers who know how to do that, uh, themselves.

[00:36:33] Jeremy: Yeah. I mean, I, I think that of course this is gonna vary from industry to industry, but Dan, you were talking about learning APIs and I feel like a lot of jobs are learning APIs and gluing them together.

[00:36:49] Dan: Yeah. Um, I would agree, but I wonder what can happen if some of that's automated. Right? So maybe, people who are gluing APIs together will be able to. Get even more done, right? Incorporate even more, APIs in the same amount of time that they've been doing it. Now, I don't, I don't know if that job changes as dramatically as it, it seems, um, I guess there's this tension between people, having to change jobs or become more efficient in the current job.

And, you know, obviously I, I hope it's the latter and there is some recent evidence that it could end up being, the latter, just more productive people overall, building, know, bigger software in incorporating more APIs than, than before and, and not overloading yourself. So, we'll, we'll see, you know, how it, how it all, um, how it all turns out.

But I'm, I'm hopeful that we'll just be doing our jobs better.

Reading code as a skill

[00:37:51] Jeremy: In that, that context, sometimes people will say that the, the reading of code and comprehending code can sometimes be more difficult than writing the, the code. And in fact, can sometimes take you more time, like, let's say you've built out a project and now you need to add new features.

Well, to add the feature, you have to understand the, the code base that existed before and so. When we talk about LLMs and the context of not programming, but just general writing, people talk about the fact that it's easy to generate more writing, right? We can generate more documents, blog posts, more articles, that sort of thing.

And with code, it sounds like it'll be similar, right? Where it'll be easier for us to write more code, generate more code. Um, but I wonder if either of you have thought or, or think it's a concern that we'll be generating so much code that now we'll have so much we won't be able to even have the time to understand all of it,

[00:38:55] Leo: I haven't thought much about the generating so much code that you can't understand. I mean, I think if, if we're generating code, I, I'm really hoping someone's testing and making sure it works right and stuff. And so I guess it depends on what kind of, uh, what level of the interface are we, we looking at.

Um, but I have thought about a fair bit about the, the, what you described early on in your question, which was. Diving into a big code base, figuring out what needs to be changed and changing it, that is a really common task, especially for like new software engineers, uh, in their, their first jobs. Right.

And it is also one that's really well documented in the, the education literature, uh, education literature, uh, that we aren't teaching them to do. Like we almost always are giving them, uh, right, these functions are really well defined or, uh, write the code all from yourself, but we rarely ever give them large code bases to learn from.

Now I don't think diving into a large code base and trying to understand how it works is the right thing for like an intro class. And then we're mainly talking about, uh, students first learning your program here. Uh, but I am encouraged that we are teaching code reading as kind of a first level skill when I think current programming courses teach code reading right?

In parallel with writing. So a lot of the writing's happening very early before they even know how to read well. Um, and so I think there's some optimism here that if we teach code reading first and make it a core skill, they'll be better set up in the later classes to maybe take on those large projects where they tackle the exact problem you're describing, which is also the exact thing they're gonna have to do when they get to, to their jobs.

The amount of code we throw away may increase exponentionally

[00:40:37] Jeremy: Yeah, it, it also kind of, I wonder sometimes when you're writing code, you'll write it in a certain way because it's tedious to write a lot of code, right? Like you'll, you'll make something generic in such a way where you can reuse it, and maybe reduce the amount of lines of code. But then when you have something, generate that code, maybe it'll be a solution that.

Is a lot more code than you would've written personally, and it works. But, by nature, the fact that it was easy to generate, you chose that solution versus one that, that maybe was more generic and um, had less code. I, I'm not sure if that makes sense, but I'm kind of curious if the use of these models will sort of change maybe how we write code

[00:41:30] Dan: I'm kind of wondering if the amount of code we throw away is going to increase exponentially. Because, because, um, you spend time working on something, you're probably gonna keep it. But I, I wonder because, uh, Jeremy, like what you said, it's, it's so easy to generate code now. so I, I've had this thought where, what, not sure how, how, um, how much I believe myself here, but, uh, should we be storing the, the prompt, like not the dot py file, right?

Like just store the prompt and then if you do have to regenerate the code later, maybe you gotta make some tweaks or something. You just change the prompt and then, and then rerun it. So, because, because, because code is, um, It's not there yet, but it's, it's becoming free, right? It's becoming, you can generate as much of it as you want.

And so I, I wonder how much, how much of it is, so there's, there's a lot of code already that you write once, and you run it once and then, and then you get rid of it or lose it or whatever. And I wonder if that, that practice will increase. So it's like, okay, you know, I wanna do this data analysis.

Okay. So you write a prompt, you get some code, you generate some graph, and then you just don't even think about it. You just get rid of it, and then maybe later you want another similar analysis and you just do it again. Right. So I kind of wonder, because there's maybe less ownership now of code, right?

You didn't like sweat as much to write the code. So maybe, maybe more of it gets thrown away.

[00:43:03] Leo: I, I completely see what you're saying, Dan. So you have the prompt and you had it perform some form of data analysis and you wanna tweak it to do a slightly different data analysis. Uh, I wouldn't go into the, I mean, right now if I wrote the code from scratch, I would go into the code and find that one spot that I need to change and I would tweak it.

But if I'm just generating the code, I would just tweak the prompt and then get a new piece of code that does exactly what I want there without having to, to

[00:43:26] Dan: yeah. You know, how, how, it can take a, a long time to re-familiarize yourself with a program that you wrote six months ago. You know, it's like, oh, I, I called this variable temp one. Like, what's this for again? Right. you know, maybe, yeah,

[00:43:41] Leo: Wait, I think we've all been there.

Keeping the prompt instead of the code

[00:43:43] Dan: Uh, but yeah, I don't know. It's just, just a thought I've been having.

It's like, it, so, so when, when, now when, when I hear people talking about code maintenance, for example, like using, you know, good variable names and consistent style and stuff, in my head I'm thinking, well, you know, is, is the code the artifact now? Is it still the artifact? And right now, you know, of course it is.

But, um, but, you know, fast forward a little while, maybe, maybe some of what I just said, uh, sort of becomes true eventually.

[00:44:11] Leo: That's getting to perhaps kind a larger issue about what is the interface that we're, we work with as programmers. I've been thinking about this a lot, uh, just because I, I teach my, my background's. I have a PhD in computer architecture, and so I teach the classes that do machine code and assembly code, and they're, they're, they're core classes for computer scientists because you need to know how computers work.

And, um, I think that's a core component, understanding that, But we don't start by teaching the students machine code. Like no one wants to learn how to program a machine. Um, at least I can't imagine anyone wanting to learn that. Um, and we've kind of cognitively picked Python or Java right now, the most common two programming language to learn from.

Because they're easy to learn, they're easy to, to read. The code tends to be more understandable when you read it. It tends to be a little bit more forgiving when you write it. Um, and so we picked these because we think they're nice interfaces. They're, they're convenient for programmers and they're convenient for, for new learners.

And it just seems to make sense that the LLM may be that next step of interface that we start choosing. The, the catch is because it can be wrong. It's not like a compiler. A compiler is deterministic. It's gonna be, uh, shy of that. Maybe one time in your career you find a compiler bug, like the compiler's always right.

This time the LLM isn't always right and so I, I'm not sure how this is all gonna play out. Um, you can imagine the LLM as the new interface and all we ever store is, is code prompts and we don't ever even see the code, perhaps as one scenario. And the other is we, we do in fact still interact with the LLMs and still interact after the code.

Um, but I think it's too early to kind of know where this is all gonna fall. But, um, we could see some big shifts, I think, in the field over the next few years.

[00:45:52] Jeremy: Yeah, I think that's pretty interesting to think about what, what Dan had mentioned where yeah, you could check in your prompt and maybe a set of test cases for the app that's supposed to come out and yeah, maybe that's your alternative to the actual source code. Um, especially for things that, like you were saying, are, are used not that frequently or maybe you only use it once and so the, um, the quality of the actual code is.

Maybe less so important in terms of readability and things like that. And as long as you can reliably reproduce that thing, yeah, maybe, maybe that does make sense.

[00:46:39] Leo: The reliable reproduction could be the tricky part. And you there may be even saying that you, you start doing where you tag don't, don't try to reproduce this. Like, we actually spend a whole bunch of time on this. It's super optimized. Like, don't think the LLMs gonna give you this answer again. So, uh, keep the code along with the prompt.

Keep the code too. Don't, don't scratch that because the LLMs not gonna do better. Um, and then in some cases you're like, yeah, the LLM's gonna do a pretty good job on this and

[00:47:07] Dan: Yeah. Leo, maybe we have to Maybe we have to distinguish between code that you can just get out of an LLM no problem. And code that people have spent time working on. I like that. Yeah. Yeah,

[00:47:21] Leo: some you're like, hashtag don't change.

[00:47:23] Dan: Humans were here.

[00:47:25] Leo: exactly.

The concerns about relying on commercial tools

[00:47:27] Jeremy: Yeah. this is the 30th iteration of this code we generated and we verified that this one's good. So just, just, it's a interesting, interesting future. We, we might be heading into, so, so one thing you, you mentioned a little bit earlier is that the tools that you're gonna recommend to your students, it sounds like it's primarily going to be GitHub copilot and GitHub copilot X for the, the chat interface.

And one thing about these tools is these are tools by commercial companies, right? These are tools by OpenAI and Microsoft. They're tools that you have to pay a subscription fee to use. You have to send your code to a commercial server. And I wonder if that aspect concerns you at all. The, the fact that the foundations that our students are learning on is kind of reliant on these companies and these cloud services.

[00:48:31] Leo: I think it's an amazing question. Uh, I think to some degree these are the tools that professional software engineers are using, and so we need, there's, there's a bit of an obligation as instructors to teach them the tools that they're gonna be using as professionals going forward. I think right now they're free.

Uh, to use for, for education's sake. and so as long as that stays the case, I'm a little, more comfortable with it. If it started to move to a pay model for education, I think there could be some really big problems with equity. and I think it's not just true for, for computer science, but I'll start with computer science.

I mean, if it's computer science and we start making it where you would have to pay to get access to these models or use these models, then whether we tell the students they can use it or not, they still can use them. And so there's gonna be some students that, the wealthier students who may have access to these, who are being able to learn better from these, being able to solve better homeworks with these, that's super scary.

And you could imagine the same thing for even just K through 12 education, right? If you're thinking about them writing essays for homeworks or anything else, if it's a pay model, then the students who have, uh, the money will pay for it and get access to these tools. And the students who don't, won't.

You could imagine the, all these kind of socioeconomic, uh, divides that already exist, only being exacerbated by these tools if they switch to this pay model. Um, so that has me very worried. Um, and there's some real ethical issues we have to think about when we're, we're using them. Yeah. Um, the other ethical issue I kinda wanna mention is just the, the copyright and the notion of ownership.

Um, and I think it's important for us as instructors to engage students in the conversation about what it means to create content and intellectual property and how these models are built and what they're building off of. Um, and just engage in that ethical conversation with the students. I don't think we as a society have figured this out.

I don't, I think there's gonna be some time both legally and ethically before we have the right answers. but at the very least, you need to talk to the students about, uh, these challenges so they know what's going on and they can engage in the debate.

[00:50:45] Dan: Yeah, just to underscore that, Leo, this is the reason we're doing research on the first version of the course that Leo's teaching. We need research on the impact of LLMs, on students. especially, we need to know if students benefit from this, in what ways they benefit. How are these benefits distributed across demographic groups?

We have a long and sad history in, computer science of inequities, in who takes our courses, who succeeds in our courses. we're very aware of this and it's, uh, unacceptable to make that situation, uh, worse than it already is. So, um, we're, we're gonna be carefully doing our research on this, uh, first offering of the course.

A downside is students might bypass fundamentals

[00:51:30] Jeremy: So we've mostly been talking about the benefits of using these tools in classes and in education. we just mentioned the possible inequities if you don't have access to those things, I, I wonder if from either of you, if there are negatives you see to this technology, whether that's the impact on what people learn or in anything else.

Like are there downsides you see to the use of this technology?

[00:52:04] Dan: Yeah. So in addition to, uh, the important, uh, inequity concerns that, uh, we just talked about, I have a concern about students using the tools in ways that. Don't help them learn the skills we think they need. So it's a, it's a, it's a power tool and you can, uh, you can get pretty far, I think with, without, um, being systematic in, in how you work with it and without testing, without debugging, um, it's, you know, it's, it's kind of magic right now.

And so I can imagine, a lot of students just taking off at, you know, a hundred miles an hour. and so I'm one, one of, one of, uh, the things we have to worry about in these initial courses is, convincing students that there really are principles to using this technology. You can't just type something and get an answer and then go party.

and, and, and so that, that is one of my concerns. That's one of the negatives. It's super powerful. And, like, like, so before you, you can't just type some Python and make it work and, but now you can sort of type in whatever you want and kind of get something back. and so part of our job as educators is to help students use these tools, in in a way that.

Will ensure their long-term success with, with these tools, right? So, I, I'm not saying that they can't just do whatever they want and, and make some of their first assignments work. I, I think they could, I think they could be like un principled with the prompts and just throw it in there and get code and, you know, submit that, submit that code.

But, uh, we're, we're going, you know, we're going for longer term, uh, effectiveness here, right? We have students who may not take another CS course. We need to keep them in mind. We have students who are gonna wanna eventually be software engineers, uh, security experts, PhDs in computer science, right? So we have a number of audiences that we're talking about, and we think they all need to know the fundamental skills of programming still.

Even though, you know, they have this, this power tool at their expense now.

[00:54:07] Leo: Speaking of the fundamental skills for programming, I, because of my, my hardware background, I'm this huge fan of teaching mental models in classes. Like what is the mental model of computation? Like, how, how do you imagine the computer is executing as you write the code? And, uh, ideally a professional computer scientist should be able to take, okay, well this is kind of the, my interpretation, this is my mental model for when I'm working at Python.

If I really, really wanna drill this down, I can turn that into assembly. And if I really had to and turn to machine and even think about how this is working within the cash subsystems and virtual memory and all these things, I want 'em to be able to play those things out. We are changing the first class, and I think the first class is gonna be doing some things much better than before, like teaching problem decomposition and things like that.

I'll, I'll mention that in a second, but, we are doing some things better. but we may not be teaching at how is the computer working as well. And so you can't just change one course and think the rest of the curriculum's gonna work. And so I think the entire curriculum's gonna need to adjust some, um, in, in a way of just adapting to these LLMs.

Rethinking how to assess students

[00:55:10] Leo: Um, the second piece for things getting potentially more challenging, uh, is instructors, we're in a good place right now as instructors, uh, in terms of how we assign and grade homework. Um, so grading, uh, this probably isn't gonna be a shock, is not one of our favorite things to do as faculty. I mean, it's actually really important.

Uh, it's, it's central to us understanding how our students have learned, but it's generally not the most favorite thing that we do. And what a lot of instructors have done, myself included, is for much the introductory sequence. We have created assignments that can be entirely auto grade. So we define functions incredibly well.

Like, write a really good description, this is exactly what it needs to do, and the students write that one piece of code and, uh, whether we like it or not. That is exactly when copilot does very well, and the LLMs do really well. And so the LLMs are gonna solve those very easily already. So we have to fix our assignments just like it, it's a given.

Um, but it means that we're probably gonna have to rethink how we do assessment. Um, and so we're probably gonna be writing assignments that are much more open-ended and we're probably gonna have to be grading those, uh, putting more care and time integrating those potentially by hand. Uh, but I think these are all good things for the community and for the field.

Um, but you can imagine how it's gonna be a bit of a, a shift for faculty and, and may take some time, uh, to be adopted as a result.

[00:56:41] Jeremy: And, and so if you're shifting to homework that is more broad in scope, has more code, needs more human eyes on it, how how does that scale within the educators side? Right. You were, you were talking about how you've got, um, things that could be auto graded before and then now you're letting somebody generate this whole project.

How does that work from your end?

[00:57:09] Leo: I, I think there's a few things that are at play. Um, we, at, at large institutions like Dan and I are at, we have kind of armies of, uh, instructor assistants, instructional assistants that help us, uh, and so we can engage 'em in, in various tasks. And so, uh, one of the roles they heavily have now is helping students in the labs solve these auto grade assignments.

and so you can imagine they will still be in the labs helping the students with these creative assignments, but now they're gonna have to have potentially a larger role in assessing the success of those. Um, there's been some really creative work, uh, in, in assessment and so I'll, I'll, I'll mention a couple of the ones, but there's, I, I'm sure I'm gonna be omitting some.

But, uh, one is, Students could complete their project, and then they have to record a short video of them explaining the code that was in their project and how it worked, and you actually assess them on that video and their explanation of the code and how it works. Right? Because those can be perhaps shorter than trying to go through a really big project and, and see how it works.

Um, there's a tool out of a UIUC, um, called Prairie Learn that helps with, um, uh, these are still auto graded, but uh, it helps with the, the test setting where you can write questions and have them, uh, graded kinda in a, in a exam or homework setting. the, the neat feature of that is that it can be randomized and so you don't have to worry as much about students kind of leaking information to each other about, test content from quarter to quarter.

And so, because the randomization, they have to learn, actually learn the skills, and so you can, um, kinda engage with 'em in these test centers. And so right now a big grading burden on, on faculty is exams. And so you can actually give more exams, give more frequent feedback to the students and with, without the same grading burden.

and so that, that's the other kinda exciting assessment piece.

[00:59:01] Dan:

Current assessment is not effective

[00:59:01] Jeremy: In the different types of assessments, like the example of the video you gave, I'm just thinking to myself, well, the person could ask copilot or ChatGPT to give 'em a script, right? And they can rehearse that when they, um, send you a video.

[00:59:18] Leo: I think, but I think that's, um, I think this is a philosophical shift in assessment that's kind of been gaining momentum over the years and that's that the assignments are all formative and they should all be. Pretty low stakes and the students should be doing them for the process of learning. and then, and, and it's unfair in some ways. There's a, there's a lot of things right now where you kind of grade them on, were you present at this time?

Did you, did you meet this deadline at this time? Which if you're thinking about the, a diverse population of students, like you can imagine like a, a working mother who's also trying to do this, grading them on where you here at this time doesn't feel very equitable to me. And so there's this whole movement for grading for equity that shifts much of the assessment onto the exams.

And so, yeah, the students could, uh, find multiple ways to cheat on the homeworks, but that's not the point of the homework and the homework's just to learn. It's a small scale, the grade, so. But you still then have those kinda controlled environments where they're taking these tests and that's where the grade actually comes from.

Um, it's gonna take some time to make that shift, at, at, at least at a number of schools, my own included assess that those ho take home assignments are a huge portion of the grade. And students will love that because they can get all this help. And they can, especially with the auto graders, that they don't even write their own test cases.

They just use the auto graders, the test cases. Right. Um, which is really depressing. Um, and they go to the, the, the instructional staff. The instructional staff tends to, to give away the answers. That's actually a paper that we, uh, published a few years ago. Um, and so the students love this high stakes, but tons of help version of assessment, but that may not actually measure their, their level of knowledge.

And so it's gonna take a little bit of adjustment, for students and for faculty to do the shift, uh, to where the, as assess the, the exams are the

Give students something interesting to build and don't worry about cheating

[01:01:09] Dan: Yeah. Also, I'm, I'm not convinced that cheating is gonna be a problem here. it's very possible, for example, that students cheat on our previous assignments because the assignments were not authentic. Um, you know, in industry you're never going to, no one's gonna come up to you and say, Hey, like, from scratch, you know, write this exact function, takes two lists and determines, you know, how many values are equal between them.

It's like, it's like, that's not gonna happen, right? You're gonna be doing something that has some sort of business purpose. And I kind of wonder, um, and this, this will, you know, this will play out, um, one way or another in the next, in the next, uh, few months. But I kind of wonder if we give students authentic tasks.

Now you're cheating yourself right out of doing some, some something of value, right? Like before you were. You were probably cheating yourself out of a learning opportunity, but how, how can, you know, how can students know that? Right. The assignments boring, right? It's like, write all these functions and then something, something happens because of the magic, you know, starter glue code we wrote.

So I don't, I don't know. I feel like if you give students opportunities to learn what they want to learn, um, there's, I don't, I don't know. I don't, I just don't think there's a reason to cheat. And, and also, I mean, um, I, I've been much happier in my career recently when I don't worry about it. So it's like, okay, I've got a bunch of students, some of them are gonna cheat, some of them are not.

And I'm here to talk to the ones who, who wanna learn. So, I don't know. A lot of people were on some email lists, for example, and a lot of people seem to be panicking about it. And I, I kind of think, you know, buddy, you had a huge cheating problem before. I don't think it's gonna become worse now that you're giving students authentic work to do.

Right? They, they all want to be using, uh, you know, programming to, you know, to do their jobs better or make their lives better, or the world better. They don't wanna waste their own time. But if you give them a decontextualized task, it's like, it's super tempting to just cheat, right? Because what's the point?

Right? And so, um, I, I, I'm, I'm very hopeful. I, I, I am not convinced that that cheating is gonna be a problem.

[01:03:23] Jeremy: Yeah, that's a good point, and I think it's very motivating for any student or anybody who's learning a thing to, to be able to see a clear, connection to like an actual thing that I made, versus I'm writing functions to pass these test cases is like not very, not very interesting, uh, intellectually.

So I think if you structure the, the projects where it's like, oh, am I gonna actually make this thing that does this thing That seems pretty cool, then yeah, that's definitely more motivating

to, to actually go through with it.

[01:04:00] Dan: Like, just off the top of my head, imagine if every student had to make a landing page, like a website who's gonna cheat? Like what? I want a landing page. Like,

I, I want that. And, and all student and students are gonna want that too. And so it's like, well, okay. Like I, I, I may as well make it right. Like this has a, this has a purpose.

So, Leo. Leo, I'm curious, you've been, you've been, uh, patiently listening to, to that. I'm curious what you think about

[01:04:29] Leo: Oh, I, I, I, I can't agree more. I think the, um, I mean, we can leverage the research, right? The computing and context is kinda this well established thing that if you teach computing in a context that's meaningful to the students, they tend to learn more and engage more, and wanna stay in the major more. Um, and I think we're just gonna be able to do, we do this right?

We, for convenience sake, and because of the scale of the number of students that we've had in our classes, we've kind of moved away from that and gone to these auto graded nots of exciting assignments. And I think we're, this is the impetus we need to go back to fun, creative, interesting assignments that the students are gonna put time and care into because they want to, not because they have to.

Problem Decomposition

[01:05:10] Jeremy: So it, it sounds like through our discussion, you're, you're really excited about, bringing large language models into the classroom and kind of what that means for you and your, your students. And I wonder if there's anything we didn't really touch on or maybe something that was unexpected that you think is gonna make a really big difference, to you and your students.

[01:05:33] Leo: I think one of the things that we haven't touched on yet that I'm, I'm really excited about is, the piece of problem decomposition. And so over the years, because of this trend towards auto grading, uh, what's happened is, all the cognitive work of taking a, a big, computing task and breaking into smaller pieces, deciding what classes should exist, what functions should exist, all those interfaces, all that work that I think is really interesting and exciting.

It is now done for students because the auto grading structure just makes it so you have to have these functions and they just code the functions. and so I think that's really concerning just from a software engineer in perspective, that students are, are learning how to program without learning those core abilities as, as software engineers to take a large problem, break it down, figure out what the right interfaces are, and that's a lot of, that's actually more art than science, I'd argue.

And so the more time you have to practice it, the better. And I am incredibly excited that LLMs are kind of forcing our hand to make us step back, give larger programming tasks to them, and teach them the process of problem decomposition explicitly in a way that, in a way that we've never really, never done before.

I think that's, uh, that's a good place to, to wrap it up on so if people want to hear more about your upcoming book or maybe even enroll in in your class, Leo, where can they get some more information? I.

Both Dan and I have active LinkedIn pages and we're happy to have folks, uh, follow us there. Manning Publishing is the, publisher for our book. Um, and so we have that book out on early access right now. Um, it should be available, uh, entirely electronically by August in time for the start of the fall quarter.

Um, and then it should be out in print, uh, shortly thereafter.

[01:07:25] Jeremy: Cool. Well, this has been an interesting discussion. I mean, large language models are kind of that's the thing right now. Everybody's trying to, to stuff it into every single product. And I think getting both of your perspective on where it fits in in education has been, has been very interesting.

So thank you. Thank you very much for coming on the show

[01:07:46] Leo: Thank you Jeremy, for inviting us and for running such great podcast. We really appreciate it.

[01:07:52] Dan: Thanks Jeremy.

Anita Zhang and Alvaro Levia on systemd at Meta

Wed, 14 Jun 2023 07:00:00 -0000

systemd is a service manager for Linux. It is the first process that runs on many Linux distributions and manages all other user processes. It includes utilities for logging, process isolation, process dependencies, socket activation, and many other tasks.

psystemd is a python library to communicate with systemd over dbus from python as an alternative to shelling out from an application to control services.

Anita Zhang is an engineerd managerd at Meta and Alvaro Levia is a production engineer at Meta.

I attended their systemd workshop at the Southern California Linux Expo.

Topics covered:

What's systemd?
Giving talks and workshops
cgroups and namespaces
systemd timers vs cron
Migrating from CentOS 6 to 7
Production engineers need to go lower in the stack to debug applications
Meta's Linux userspace team
Use of public cloud at Meta
Meta's bootcamp
Pystemd

Mastodon

Workshop

systemd workshop

Conference talks

Transcript

You can help edit this transcript on GitHub.

Introductions

[00:00:00] Jeremy: So today I'm talking to Avaro Leiva and Anita Zhang. Avaro is the author of the pystemd library and he's a production engineer at Meta. And Anita is an engineerd managerd at Meta, and I'll let her explain that further.

[00:00:19] Jeremy: But thank you both for joining me today.

[00:00:21] Anita: Yeah, thanks for having us.

[00:00:24] Jeremy: I guess where we could start, Anita, maybe you could explain a little bit your, your title that I just gave you there.

engineerd managerd

[00:00:31] Anita: Yeah, so by default I, I should be a software engineering manager, but when I transitioned to management, I was not, Ready to go public with, um, my transition.

So I kind of hid it by, changing the title. we have some weird systems in place that grep on like the word engineer. So I had to keep engineer in there somehow. and so I kind of polled my friends what I should change my title to, and they're like, oh, you're gonna support the systemd team, so you should change it to like managerd.

So I was like, sounds good. engineerd, managerd.

I didn't wanna get kicked out of any workplace groups, for example, that required me to be an engineer.

[00:01:15] Jeremy: Oh, okay.

[00:01:17] Anita: Or like engineering function, I guess.

[00:01:19] Jeremy: Yeah. Yeah. And you just gotta title it yourself, so as long as you got engineer in it, you're good.

[00:01:24] Anita: Yeah, pretty much. Some people have really fun titles like Chief Potato Officer and things like that.

[00:01:32] Jeremy: So what groups does the, uh, the potato officer get to go in?

[00:01:37] Anita: Yeah. Not the C level ones. (laughs)

What's systemd?

[00:01:42] Jeremy: I guess maybe to, to start, we should explain to people who aren't familiar, uh, what systemd is. So if either of you wanna wanna take that one.

[00:01:52] Alvaro: so people who doesn't know, right?

So systemd is today is your init system, right? Is the thing that manage your, your process. and the best way to understand this, it is like when your computer, it needs to execute something. And that's something is what we call pid one. And that pid one is the thing that is gonna manage everything from now from there on, right?

Uh, in the most basic level, if you remember how to, how does program start, how does like an idea becomes a program? Uh, you need to fork exec, right? So that means that something has to be at the top of that tree and that is systemd. now that can be anything, right? So there was a time where that was like systemv and there was also like upstart, uh, today's systemd is the thing that, uh, it's shipped in most distributions.

[00:02:37] Jeremy: Yeah, because I, I definitely remember when I first started working with Linux, uh, it was with CentOS 6, and when I would want to run a service, I would have to go and write a bash script and kind of have all these checks for, is this thing running?

Does it have permission to these things, which user is it running as, and so there was a lot of stuff that I remember having to do before systemd came out.

[00:03:08] Alvaro: The good old days as we call them,

[00:03:11] Jeremy: Or the bad old days.

[00:03:13] Anita: Yeah. Depending on who you ask.

[00:03:15] Alvaro: Yeah. So, so that is super interesting because, um, During those time, like you said, you have to write a first script.

That means that you were basically yourself, your own service manager, right? So ideas as simple as, is my program running? There was no real answer. You have to figure it out, right? So if you run a program, uh, you maybe would create a pid file which hold the p or the pid of the process, of the main process, right?

And then something needs to check, oh, is this file exist? Does the file exist and does the content of this file actually match to a process? And then you grab the process. So it was all these ideas that you had to do, and then for, you have to do it for every single software that you would deploy on your machine, right?

That also makes really hard to parallelize stuff, right? Because you have no concept of dependencies. So if your computer has to put, uh, I, I dunno if you remember like long time ago, like Linux machine would, takes like five minutes to boot like your desktop. I remember like openSUSE. I can't remember, like 2008, 2007.

Uh, it would take like five minutes to boot and then Ubuntu came and, and it start like immediately. And it was because, you can parallelize things, but you cannot do that if all you're running are bash script.

Why was systemd chosen to be included in Linux distributions?

[00:04:26] Jeremy: I remember before the Linux distributions didn't include it.

And I wonder if you have any insight into how systemd got chosen to be the thing to manage our processes and basically how we got to where we are today.

[00:04:44] Anita: I mean, we can kind of speculate a little bit. at the time when Lennart started systemd, um, with. Kai Sievers probably messed up his name there. Um, they were all at Red Hat and Red Hat manages fedora these days and I believe fedoras kind of like the bleeding edge for a lot of the new software ideas. Um, and when they picked up systemd as the defaults, um, eventually it started to trickle down to the rest of their distributions through RHEL and to CentOS and at the same time, I think other distributions started to see how useful it was in terms of managing all the different processes and services.

Um, I know Debian at one point had kind of a vote on like whether they should make systemd either default or like, make it easy to switch between both. And then they decided to just stick with systemd because it's, I mean, the public agrees that it's like easy to use and it's more useful.

It abstracts away a lot of things that they had to manually do before

Who is interested in systemd? Who comes to your talks and workshops?

[00:05:43] Jeremy: Something I've been kind of curious about. So just this year at SCaLE uh, you ran a, a workshop teaching people how to use systemd and, and sort of what it is about. I guess when, when you get people coming to these workshops, what are they typically, where are they typically coming from? Are they like system administrators or are they software developers?

Like when you run these workshops, who are you looking for as your audience?

[00:06:13] Alvaro: To be fair, this was the first time that we actually did a workshop for this. But we have like, talk about this in, in many like conferences. here's what happened, right? So every time that you put systemd on the title of, uh, of a talk, you are like baiting people into coming in, right?

Because you do want to hear like some people who are still like reluctant from that war that happened like a few years ago. Between systemd and Ups tart right? most of the people who we get are, I would say like, software engineers, people who do software, and at least the question that I always get a lot, it is like, why should I care about systemd um, if I run everything on my containers in my Docker containers, right?

The other type of audience that you get, you do get system administrators. Uh, but in general those people only cares about starting and stopping services don't really care about like the, like the nice other features that systemd has to offer. And then you get people who just wanna start like flame wars and I'm here for them.

Why give talks and workshops on systemd?

[00:07:13] Jeremy: In previous years, you've given conference talks and, and things like that related to systemd. And I wonder for, for both of you where, where the, the interests came from, where this is something that you feel strongly enough about that you wanna give talks about it.

Because it's like, a lot of times when people give a conference talk, it's about, like new front end technology or some, you know, new shiny thing. Whereas systemd is like, it's like very valuable, but it's something that I feel like a lot of people don't think about.

And so I'm just kind of curious where the interest came for, for both of you.

[00:07:52] Anita: I think I just like giving talks and teaching in general. So if I have work that I found really exciting or interesting, then I'd want to like tell people about it and like teach them and like show them something cool. I think systemd is kind of a really good topic in that case because a lot of people want to learn more about it.

Today there's like lots of new developments going on in systemd. So there's like a lot of basic stuff that you can learn, but also a lot of new advanced topics that are changing every year as well. aside from that, there's also like more generally applicable things. Like everyone wants to know how to debug something if you're like a software engineer or developer or even a sysadmin.

Um, so last year I did a debugging talk. there's a lot of overlap I'd say how about you Alvaro?

[00:08:48] Alvaro: For me, it, my interest in systemd started in, back when I was working on Instagram, we needed to migrate from CentOS6 to CentOS7. and that was the transition where you would have like a random init system to systemd, right?

So we needed to migrate all of our scripts from like shell script to whatever shell script is going to interact with systemd. And that's when I was like, I don't like this. So I also have a thing where if I find something that doesn't have an Python API for it, I go and create a Python api. So I, I create pystemd like during that time.

And I guess for me, the first reaction was when I was digging up systemd was like, whoa, can systemd do that? Like, like really, like I can like manage, network firewalls, right? Can I, can I stop my service from actually accessing the internet without having to deal with iptables at the time?

So that's kind of like the feeling that I wanted to show people when I, when we do these these talks and, and these workshops, right? It's why like most of our talks, eh, have light demos in them because we do want to show people like, Hey, like, this is real. You can use it.

[00:09:55] Jeremy: I don't know if this was a conscious decision on your part, but the thing about things like systemd is they, they feel like more foundational things that don't change that quickly.

Like if you look at front end development, for example, at at meta you've got React, and that ecosystem changes so often that it's like there's always this new thing, you learn the way to do it and then it changes, right? Whereas I feel like when you're in the Linux user space and you're with systemd, like they're adding new things, but the, the.

Foundations kind of stay the same. I'm not sure if that sounds accurate to both of you.

[00:10:38] Anita: Yeah, I'd say a lot of the, there are a lot of stable building blocks in systemd, but at Meadow we also have a kernel team, which is working on like new kernel features all the time. They take years possibly to adopt, but with systemd, if we're able to influence the community and like get those kernel features in earlier, then like we can start to really shape what the future of operating systems look like.

So it's not, it's very like not short term, uh, work that we're doing. It's a lot of long term, uh, work.

[00:11:11] Jeremy: Yeah, that's, that's interesting in that I didn't even think about the fact that you are sitting at the, the user level with systemd, but you kind of know what you want. And so if there's things that the kernel can do to support that, you're having that involvement.

With the open source community, make sure that you have your, your say get put in there. Yeah.

[00:11:33] Anita: Mm-hmm.

[00:11:35] Alvaro: It, it goes both way, right? So one part it is like, yeah, sure, we want features and we create them. Um, and we actually wanted to those to be upstream because we like, one thing that you should, you should never do is manage internal patches for like, things like the kernel, because that's rebase hell.

Um, but you also want to be like part of the community and, and, and, and get the benefit of like, being part of it.

Who should care about systemd?

[00:11:59] Jeremy: And so, like one thing you mentioned ear earlier, Alvaro, is that people will sometimes ask you, I'm running my application in, in Docker containers. Why do I care about systemd? So, so maybe you could explain like, how you would respond to that. Yeah.

[00:12:17] Alvaro: Well for more, for most people who actually run their application container I'd say like, no, you probably shouldn't care.

Right? Like, you're good where you are. But in general, like, like system is foundational in the sense that it is the first thing that your computer boots your computer doesn't boot off of Docker or Kubernetes or, or any like that. So like something has to run these applications. there's also like a lot of value is that not all applications exist in the vacuum.

Like, uh, like let me give you an example. Like if you have a web server, When people are uploading stuff to the web server, you will upload temporary things and then you have to clean it up after a while. So you may want to take advantage of systemd timers or cron or, or whatever you want, right?

While the classical container view is that your pid one of the container is the application that you're running, right? So you do want to have like this whole ecosystem, Not all companies can run on containers. not everything can run in containers. So that's basically where all the things start to, to getting into shape.

There's a lot of value in understanding how programs actually like exist, right? With the thing that I told you at the beginning of how an idea becomes a program understanding like, like you hit, you are in your bash, right? And you hit ls Star full enter, right? What happened in your machine?

Understanding all the things, uh, there is a lot of value and understanding how systemd works. It's, it, it provides, uh, like that knowledge for you.

[00:13:39] Jeremy: So for the average engineer at Meta who is relying on your team to deploy their, their code, I guess, if that's the right term, do you think that they're ever needing to think about systemd or is that kind of more like the responsibility of your team and they're just worried about like, I put my thing into my container and I don't, I don't worry about it.

[00:14:04] Anita: I think there's like a whole level of the stack that sh ideally we should not even care or know that we're running systemd below them. I think that's, say we're doing our job well, cuz then the abstraction is good enough that they don't have to worry about it. But there's like a whole class of engineers below that that have to, you know, support the systems that run our on bare metal and infrastructure and make it happen.

And those are the people who really care about what we're putting in systemd or like what the corner cases are and things like that.

[00:14:37] Jeremy: Yeah, that, that makes sense. I mean, one of the talks that was at SCaLE was, uh, Brian Cantrill um, he gave a talk about the forgotten operator, and he was talking about how people forget that there are actual servers behind all the things we're deploying to, right?

[00:14:55] Anita: Mm-hmm.

[00:14:55] Jeremy: There is a person that you're racking the machines and plugging the power, and like, even though there's all these abstractions in front, that still exists. And so it sounds like things happening at the kernel level and the Linux user space and systemd that's also true because all this infrastructure that people are using to deploy their software on your team is the one who has to keep that running and to keep that running, they need to understand, uh, systemd and, and all these foundational Linux pieces.

Yeah.

[00:15:27] Anita: Mm-hmm. Yeah.

[00:15:29] Alvaro: Like with that said um, I, and maybe it's because I'm very close to to, to the source. Um, and, and you know, like, like I said, like when, when all your tool is a hammer, everything looks like a nail? Well, that hammer for me, a lot of the times it is like even like cgroups or, or namespaces or even like systemd itself, right?

there is a lot of times where, um, like for instance, a few years ago we have not, like, like last year or something, uh, we had an application that was very was very hard to load, right? It used a lot of memory. And so we start with, with a model where we would load like a, like a parent process and then child process would deal with, with, um, with the actual work of the thing, the classical model of our server.

Now, the thing is that each of the sub process that would run would need to run, uh, on a separate set of privileges, right? So it would really need to run as different users. And that was like very easy to do. But now we actually wanted to some process to run with a, with only view of the file system while the parent process actually doesn't have to do that, right?

Uh, or we want to limit the amount of CPU that a child process would use. So like all of these things, we were able like to, to swap it out uh, with using like systemd and, and, uh, like, like a good, Strategy for like, you create a process, you create a new cgroup, you put that into the cgroup, you create the namespace, uh, you add this process into that namespace, and then you have like all this architecture, and it's pretty free because forking it's free in general.

[00:17:01] Anita: Actually, Alvaro's comment reminded me of like why we even ended up building the systemd team in the first place. It's kind of like if we have all these teams trying to touch cgroups on their own or like manage processes on their own, they're all gonna do it a different way and not, all of them will be ideal or like, to put it bluntly, I guess, we're really aiming to try and provide like a unified, really good foundational experience, for the layers above us.

And so, systemd and the other things that go into the operating system are a step to get there.

What are cgroups and namespaces?

[00:17:40] Jeremy: And so for someone who's not familiar with the concept of cgroups or of namespaces, could you kind of give like a brief description?

[00:17:50] Anita: so namespaces are, uh, we're talking about the kernel feature where, um, there are different ways to isolate, uh, different resources to the process or like, so that they have their own view of certain things, the network or, the processes and things like that. Um, and Cgroup stand for control groups.

It's, at meta we only use Cgroups v2 which is a way to organize your processes into, Kind of like a directory view. but processes will be grouped into different, folders, shall you say, but that allows you to, uh, manage the resources between different groups of processes, which is how systemd does its services.

[00:18:33] Alvaro: So a, a control group will allow you to impose restrictions on how each system uses the resources, right? So with a cgroup, you can say, only use 20% of cpu, and the, and the kernel will take care of that. Uh, while namespace it is basically how you view the system around you. So like your mount directory like, like where does your home points to?

that's, I would say it's more on the namespace side of things. So one is the view then one is the actual, the restrictions. And like Anita said, like systemd does a very clever thing. It doesn't have two, is not the. It's not why cgroups exist, but every time that you start a systemd service, systemd will create a cgroup for that service and will put every process in that cgroup, even though all cgroups would end up being the same, for instance.

But eh, you can now like have a consolidated list of what process belong to a service. So a simple question like, like what services has my Apache web service started? That's show you how old I am. But yeah, you can answer that now because you just look at the cgroup, you don't look at the process tree.

[00:19:42] Jeremy: So it, it sounds like the, the namespacing is maybe more for the purposes of security, like you said, giving you a certain view of your, your system. and the cgroups are more for restricting resources, but also, like you said, being able to see what are all the processes, um, are associated. Um, so that you, you don't have a process that spins up other processes and then you don't know who owns those, and then you don't know how to shut 'em all down.

That, that takes care of that for you.

[00:20:17] Alvaro: So I, I always reluctant to use the word security or privacy. I would like to use the word isolation. Yeah. And then if people want to impose the idea of security and privacy to those, that's fine, but it's, but it's mostly about isolation.

[00:20:32] Anita: Yeah. Namespaces are what back all the container technologies are.

Anytime you run things in a container, it's probably using some kind of name spacing. But yeah, you, you kind of hit the nail in the head. Isolation versus like resource control

[00:20:46] Alvaro: As Anita just said that's what fits on containers, uh, namespaces and cgroup like a big mix of those. But that doesn't mean that the only reason why those things exist are for containers.

You can take advantage of those technologies without actually having to think of a container.

systemd timers vs cron

[00:21:04] Jeremy: Something you had mentioned a little bit earlier is, is how systemd has other features and one of them was, was timers. And I was kind of curious, cuz you said you could, you wanna schedule a job, you can run it using cron or you can run it using systemd timers.

And it, I feel like whenever I see people scheduling jobs, they're always talking about cron but, but not so much about systemd timers. So I was curious if you had any thoughts on that.

[00:21:32] Anita: I don't know. I feel like it's used pretty interchangeably these days. Um, like even when people say cron they're actually running a systemd timer with the cron format, for their time.

[00:21:46] Alvaro: So the, the advantage of of systemd timers over cron is, is basically two, right? The first one it is that, you get more control on the time, right?

So you have monotonic and absolute times, right? Which is basically like, you can say like this, start five minutes after the previous run. Or you can say this, start after five minutes after the vote, right? So those are two type of time, that is the first one, uh, which may be irrelevant for most people, but that's it.

Uh, the other one is that you actually have advantage over the, you take full advantage of systemd, right? In current you say run this process, right? And how that process run, it's basically controlled by the process itself, right? So if you, uh, like if the crontab is for the user, that's good for you, but if you want to like nice it or make it use less cpu, that's what it is.

Well, with systemd you say, This cron will start the service and the service, you take full fledged advantage of all the things a service can do.

[00:22:45] Jeremy: From what I could tell, looking at the, the timers api, it, it felt like it would be a lot easier to kind of see when things ran, get, you know, get a log of, I ran this time job and it, it failed.

Um, it seemed like systemd had a lot more kind of built in to, to kind of look into that. but, uh, yeah, like Anita was saying, like when you, you hear kind of cron all the time, but like you said, maybe it's, maybe they're not actually using cron all the time. They're just saying cron

[00:23:18] Alvaro: Well, I would say this for cron like the, the time, the time, uh, syntax for it, it's pretty, it's pretty easy to understand, even though I never remember where, I remember where weekday is, right? The fourth, which one is which?

[00:23:32] Jeremy: I, I'm with Anita. I need to look it up whenever I'm gonna use it. (laughs)

[00:23:36] Anita: Yeah. I use a cron translator when I have to use cron format.

[00:23:41] Alvaro: This is like, like a flags to tar, right?

Like, I never remember which, which flags to put.

[00:23:48] Anita: Yeah, that's true.

[00:23:50] Alvaro: We didn't talk about this, we haven't talked about systemd-run, but one of the advantages of the, one of the advantages of using timers is that you can schedule them on demand, right? So like cron if you wanna schedule something over time, you need to modify the cron the cron file.

Uh, and that's, it's problem right? With systemd, you can have like ephemeral units and so you can say like, just for now, go and run this process five hours from now. Like, and after that, just forget about it.

[00:24:21] Jeremy: Yeah, the, during the workshop you mentioned systemd-run and I hadn't even heard of it. And after I saw that I was like, wow, this, this could be really useful.

[00:24:32] Alvaro: It is quite useful.

How have things changed at meta?

[00:24:34] Jeremy: One of the things you had mentioned, I, I guess you've, you've been at Meta for, for quite a while and you were talking about how you started with having all these scripts you were running on CentOS 6 and getting off of that to something more standard. I wonder if you could speak a little bit to that, that process.

Like what did things look like then and, and how have they they changed over the years?

[00:25:01] Alvaro: I would say the following thing, right? Like Anita said, like for most engineers, the day to day of things don't really change that much, because this is foundational things, right? So if you have to fundamentally change the way that you run applications every couple of years, then you waste a lot of time, right?

It's not the same as you say, like react where, or, or in the old days, angular where angular one, angular two, angular three, and then it's gone, right? Like, so, so I, I would say it like for the average engineers things don't change that much, uh, for the other type of engineers, like, like us who we, who that we really care about, like how things run.

like having a, an API where you can like query the state of your service. Like if like asking like, is my service running with an API that returns true or false, that is actually like a volume value that you can like, Transferring in your application, uh, that, that helps a lot on, on distributed systems.

a lot of like our container infrastructure that we use internally at Meta is based on a lot of these ideas and technologies.

[00:26:05] Anita: Yeah, thinking back to the CentOS 6 to 7 migration, I wasn't on like the any operating systems team at the time, but I was working with them and I also was on a team that had to migrate, figure out how to migrate our scripts and things over. so the one thing that did make it easy is that the OS team, uh, we deploy all our things using Chef.

Maybe you've heard like Puppet and Ansible, that's our version, the Open Source Chef code. Um, and they wrote some really good documentation on how to migrate, from Runit, which is what we were using before to systemd. it was. a very large scale effort across multiple teams to kind of make sure their stuff works, do the OS upgrade and then get used to using systemd.

[00:26:54] Jeremy: And so the, the team who is performing this migration, that's not the product team. That would be the, is it production engineering? Is that, is that what you called that?

[00:27:09] Alvaro: So, so I was at the other side of, of that, of that table where I, the same as Anita, we were doing the migration more how most things work at Facebook is that it's a combination of the team that is responsible for the technology and the teams who uses the technology.

Right. So we are a company, so we. Can like, move together. it's the same thing when you upgrade kernels. Most of the time the kernel team will do the effort to upgrade the kernels, and when they hit a roadblock or something, they will call for the owner of the service and the owner of the service can help debug uh, for the case of CentOS 6 and CentOS 7, eh, I was the PE at Instagram P Stand for Production Engineer.

I was the PE at Instagram who did most of the migration of our fleet. So I, I rewrote most of the things because I understand how our things work, and the OS team provide like the support to understanding like, like when can I use some things, when can I use not other things. There was the equivalent of ChatGPT at those days, right?

I was just ask them how to do stuff. They will gimme recipes. so, so it it's kind of like, like a mix, uh, work, uh, between those two teams. Uh, Anita, maybe you can talk a little bit about what you talk when you were upgrading the version of systemd and you found a bug?

[00:28:23] Anita: Oh, the, like regular systemd upgrades nowadays?

I, I'd say it's a lot easier these days. I mean, since the, at the time when we did the CentOS 6 to 7 migration, it was like, our fleet was a lot more fragmented. I'd say nowadays it's a lot more homogenous, which makes, which makes it easier. yeah, in the early versions there were some kind of obscure like, interactions with the kernel or like, um, we, we make pretty heavy use of systemd to run our container system.

So, uh, if we run into any corner cases, um, like pretty obscure stuff sometimes, because we make pretty heavy use of the resource control properties. we usually those end up on the GitHub tracker, things like that.

[00:29:13] Alvaro: That's the side effect of hiring very smart people. They do very smart things that are very hard to understand. (laughs)

[00:29:21] Jeremy: That's kind of an interesting point about you, you saying you're using these, these features, you know, of the kernel very heavily because, you're kind of running your own infrastructure, I think even your own data centers, so you're kind of forced to go to this level, it sounds like just because of the sheer number of services you're running and the fact that like, you have to find a way to pack 'em all onto the same machine.

Does that, does that sound right?

[00:29:54] Anita: Yeah, I'd say at, at our scale, like it's more cost effective to act, own the servers and run all everything on it ourselves versus like, you know, leasing from, uh, AWS or something, which we've also explored in the past. But that also means we need more engineers to build and run things on our servers.

[00:30:16] Jeremy: Yeah. So the, the distinction between, let's say you're a, a small company or a mid-size company and you pay AWS or, or Google to, to do your hosting for you, then you may not necessarily get exposed to a lot of the, the kernel level problems or even the Linux user space problems because you're, you're working at a higher level and that's why you don't necessarily encounter those kinds of things.

[00:30:46] Anita: I'd say not, not necessarily. I think, once you get even like slightly lower in the stack where you're just like on your own server, Then you will want to start really looking into like what systemd's doing, how does it interact with other, uh, services, um, on your server, and how can you like connect these different features together?

[00:31:08] Alvaro: One of the things that every developer who who works like has to worry about is log right, and that, and that's the first time that you actually start interacting with systemdata available, right? So you have to understand, like maybe it's not just tail /var/log foo, but log right. Maybe it's just journalctl and it's like, what?

But yeah.

[00:31:32] Jeremy: Yeah. That's a good point too about whenever you're working with the operating system, like you're deploying onto a Linux machine. Regardless of the distribution, if you're the person who's responsible for that, you, you need to know this stuff. Right. Otherwise it's kind of like, you're just putting stuff out there and hoping for the best.

Yeah.

[00:31:54] Alvaro: Yeah. There, there's also another thing that, I dunno if I've said this before, but, a lot of the times you don't have to know these technologies, but knowing them will help you do your work better.

[00:32:05] Jeremy: Yeah, totally. I mean, I think that that applies to pretty much anything in, in development, right? I, I've heard often that some people will say, you take the level that you work at currently and then kind of just go down one level. Right. And then, so you can kind of see what's underneath that. And you don't necessarily need to keep digging, cuz eventually if you keep digging, you're getting into, you know, machine instructions and whatnot.

But, um, Yeah, maybe just one level is, is good to, to give you a better sense of what's

happening.

Production engineers need to go lower in the stack to be able to debug applications

[00:32:36] Alvaro: Um, every time that I, that I, that somebody ask me like, what is the difference between a PE and a SWE, uh, software engineer, production engineer, typical conference, uh, one of the biggest difference that I, that I say is that a PE would tends to ask a lot of questions going the same thing that you're saying, we're trying to go down the stack, right?

And I always ask the following question, eh, do you know how time dot sleep is implemented? Right? Do you like, like if you, if you were to see time dot sleep on your Python program, like do you actually know what is doing under the hood, right? Is it a while true? While the time, is it doing a signal interrupt?

Is it doing a select on a file descriptor with a timeout? Like what is it doing? would you be able to implement it? And the reason why I say this, because like when you're debugging an application, like somebody something's using your cpu, right? And then you see that line on your code, you. You can debug every single line of your code.

But also there's a lot of value to say like, no time.sleep doesn't cause CPU to spike. Right. Because it's implemented in a way that it would not be possible to do that.

Meta's linux user space team

[00:33:39] Jeremy: Another thing that I think might be kind of interesting to talk about is, so Meta has this Linux user space team. And I, I wonder like including your role in it, but just as a whole, like what does that actually mean day to day?

Like, what are the kinds of problems people are facing that, a user space team would be handling?

[00:34:04] Anita: Hmm. It's kind of large cuz now that the team's grown out to encompass a few other things as well. But I'll focus on the Linux user space part. the team started off, on the software engineering side as the systemd developer team.

So our job was really to contribute to the community. and both, you know, help with, problems and bugs that show up in upstream, um, while also bringing in new features, that we think would be useful both at Meta and to like, folks, in the Linux community as a whole. so we still play a heavy role in, systemd.

We also support it, uh, within the fleet, like we roll out new releases and things like that. but we're also working on a few other projects in. User space. Um, BP filter is one of them, which is, uh, how can we convert like IP tables and network filtering, into BPF programs. Um, on the production engineering side, they focus a lot on, the community engagements.

So in addition to supporting CentOS they also handle, or they like support several packages in Fedora, Debian and other distributions, really figuring out how we can, be a better member of the open source community, and, you know, make connections there and things like that.

[00:35:30] Jeremy: And, and what was your, your process for getting in involved with this team?

Because it sounded like maybe it either didn't exist at the start, or it was really small and, and now it's really, really grown.

[00:35:44] Anita: So I was kind of the first member of like the systemd team, if you would call it that. Um, it spun out of containers. So my manager at the time, who's now my director, was he kind of made a call out on workplace looking for people who'd be willing to, contribute to systemd.

He was, supporting the containers team at the time who after the CentOS 7 migration, they realized the potential that systemd could have, making their jobs a lot easier when it came to developing the container backend. and so along with that, they also needed someone to help, you know, fix bugs, put in new features and things that would, tie into the goals of the containers team.

Um, and eventually now our host management team, I was the first person who reached out to him and said, Hey, I wanna give this a try. I was on the security team at the time and I always had dreams of going back into like, operating systems development and getting better at it. So yeah, that's kind of how I ended up in this space.

A few years later, he decided, Hey, we should build a team and you should like hire some people who will also do this with you and increase our investments in systemd. so that's how we kind of built out the Linux user space team to encompass systemd and more like operating system, projects.

Working on the internal security team vs the linux userspace team

[00:37:12] Jeremy: And so when you were working on the security team before, was that on software internal to meta or were you also involved with, you know, the open source, user space side as well?

[00:37:24] Anita: That was all internal at the time. Which was kind of a regret because there was a lot of stuff that I would've liked to talk about externally.

But I think, moving to Linux user space made me realize like, oh, there's so much more potential in open source projects, in security, which is still like very closed source from our side.

[00:37:48] Jeremy: And, and so like in your experience, what have been some of the big differences? I mean, definitely getting to talk about it is a big one.

but like in terms of your day-to-day, what are the big differences between working on something internal versus something that that's open source?

[00:38:04] Anita: I have to talk more with external folks. we're, pretty regular members of like the systemd like conclave sync that we have with the other upstream maintainers.

Um, Oh yeah. There's a lot more like cross company or an external open source community building that we have to do. it kind of puts into perspective like how we manage our time and also our relationships versus like internally, like everyone you work with works at Meta. we kind of have, uh, some shared leadership at the top.

it is a little faster to turn around, um, because, you know, you can just ping people on work chat. But the, all of the systems there are closed source. So, um, there's not like this swath of people outside that you can ask about when it comes to open source things.

[00:38:58] Jeremy: You can't, can't look in, discord or whatever for questions about, internal meta infrastructure to other people.

It's gotta be. all in the same place. Yeah.

[00:39:10] Anita: Yeah. And I'd say with like the open source projects, there's a lot of potential to tap into, expertise and talent that just doesn't exist internally. That's what I found really valuable, cuz people have really great ideas outside as well. Um, and we should like, listen to them and figure out how to build that into their systems and also ours

Alvaro's work at meta

[00:39:31] Jeremy: And, Avaro, I don't know when you first started, was that on internal, infrastructure and tooling as well?

[00:39:39] Alvaro: Yeah, so, um, my path is different than Anita and actually my path and Anita doesn't share any common edges. so I, I don't work at the user space or the Linux kernel or anything. I always work in teams adjacent to it. Uh, but. It's always been very interesting to know these technologies, right? So I started working on Instagram and then I did a lot of the work in containers in migrations at where, where we build psystemd

and also like getting to know more about that technologies. We did, uh, a small pilot on using casync which is a very old tool that like, it's only for the fans, (laughs) it's still on systemd repository, I dunno if that's used or anything, but it was like a very cool idea of how to distribute images. Uh, and in Instagram we do very fast deployments.

So we deploy, or back then we used to deploy the source code, of Instagram every seven minutes, right? So every seven minutes, every time that a developer did commit to master, uh, we pushed that into production in less than an hour and we did that every seven minutes. So we were like planning to, to use those technologies for that.

Um, And then I moved to another team inside of Meta, which is called Cloud Foundation, where we do a lot of like cloud infrastructure, uh, like public cloud. Uh, that's the area, that is very much not talked much about. but I keep like contributing to, to like this world. never really work on, on, on those teams inside of Meta.

[00:41:11] Jeremy: So I guess it's your, your team is responsible for working with the engineers who work on product to be able to take their code and, and deploy it. And it's kind of like you work in combination with the user space team or the systemd team to make sure that what you're doing can be supported by them.

Is that kind of an accurate description?

[00:41:35] Alvaro: Yeah, that's, that's, that's definitely not an exhaustive description, but yeah, that's the, we, we, we do that.

Public cloud at meta

[00:41:42] Jeremy: It's interesting that you're, you're talking about public cloud now. So when you move to public cloud, are you using VMs kind of like you would in a data center, or is it, you're actually looking at the more managed services and things like that?

[00:41:57] Alvaro: So I'm gonna take a small detour and say like, something that is funny. When I got hired by Facebook, we were working on Instagram. So Instagram was just an acquisition for, for, for meta right. And Instagram ran on AWS. So why wasn't the original team who were moving stuff from AWS into the internal data centers at Meta?

On the team that I work now, uh, we work to support workloads that cannot run on meta infrastructure either for legal reasons, or for, for practical reasons. Right, because we don't have the hardware, uh, capability or legal resource because the government ask us, like, this cannot be on, on your data center or security, right?

We don't wanna run this, this binary that we don't understand on our network. We do want to work in isolation. and the same thing that Anita was saying, where their team are building the common ways of using these tools, like systemd, and user space. we do the same thing, but for using cloud technologies.

So in a way that is more similar to meta. So that's the detour now the, to answer your actual question, uh, we do a potpourri of things, right? So since we manage infrastructure and then teams deploy their code, they are better suited to know how their code, gets to run. Uh, with that said, we do have our preferred ways of how you would run stuff.

and it's a combination of user containers, uh, open source containers, and and also like VMs

There's a big difference between VMs and meta and in public cloud

[00:43:23] Jeremy: So it, it sounds like in this case, you're, you're still using VMs even in public cloud, so the way that you do deployments, the location is different, but the actual software and infrastructure that you're running is, is similar.

[00:43:39] Alvaro: So there's there's a lot of difference. Between the two things, right. So, the uniformity of hardware at Facebook, or our data centers, makes deploying things very simple, right? while in, in the cloud, you first, you don't get that uniformity because everybody like builds their AMIs as, as they want to build it.

But also like a meta, we use, one operating system, in the cloud, you are a little bit more free of what you want. And one of the reasons why you want to go to the cloud is because you can run stuff on. On, on, on the way that that meta will run. Right? So, so even though we have something that are similar, it's not as simple like, oh, just change your deployment from like this data center to like whatever us is one think you would

run.

[00:44:28] Jeremy: Can, can you give an example of something where you wouldn't be able to run it on Meta's, image that they would choose to go to public cloud to run a different image for?

[00:44:41] Alvaro: So, um, so in, in general, like if the government ask us, like, this is not necessarily like, like the US government, right? So, and like if the government ask us like, hey, like you need to keep this transaction on, on our territory, right?

for logs, for all the reasons, for whatever, right? like, and, and we wanted to be in the place, we would have to comply. And that's where we will probably use this, this kind of technologies security is another one that is pretty good. And the other one, it is like, in it general, like, like, uh, like disaster recovery, right?

If, if meta is down in a way where we cannot communicate with each other using metas technologies, right? Like you would need to have like a bootstrap point.

[00:45:23] Jeremy: Is, is it the case where you are not able to put, uh, meta's image up into public cloud? Because you were, The examples you gave was more about location, right?

Where you're saying we need to host in public cloud because it needs to be in this country, but then I think you were also saying the, the actual images you would use on AWS right. Would be. I don't know, maybe you'd be using Amazon Linux or maybe you'd be using a different, os entirely.

And is that mainly because you're just not able to deploy the same images you have, uh, in-house?

[00:46:03] Alvaro: So in, in, in general, uh, this is kind of like very hard to to explain, but, but, uh, if, if we would have to deploy code to a, machine and that machine would, would, would be accessed by people who are not like meta employees and we have no way of getting them to sign NDAs then we would not deploy meta code into that machine.

Uh, because that's Sorry. No, not Pi PI's personal information. I was, uh, ip, sorry, that's that's the word. Yeah. Yeah.

[00:46:31] Jeremy: So, okay. So if there's, so if you're in public cloud, there's certain things that you just won't put there just because. Those are only allowed to run on Metas own infrastructure.

[00:46:44] Alvaro: Yeah

Meta's bootcamp

[00:46:44] Jeremy: Earlier you were talking about Instagram was an acquisition and they were in AWS were, were you there at the time or you joined, after?

[00:46:54] Alvaro: No, I joined. I joined after I joined to, to meta. The way that Meta does hiring, at least for my area, is that you get hired as a production engineer, but you don't get assigned to a team.

So you go through a process called boot camp where you get to try different teams and figure out what things you like. I try a couple of different teams, turns out that I like it to work at the Instagram.

[00:47:15] Jeremy: And so at that time they were already running on Facebook's internal infrastructure and they had migrated off of AWS

[00:47:24] Alvaro: We were on the process of finishing that migration.

[00:47:28] Jeremy: So by the time you were there, yeah. Basically get, getting everything out of AWS and then into meta's internal.

[00:47:35] Alvaro: Yeah. And, and, and everything is, is a very hard terms to, to define. Uh, I would say like, like most of all, like the bulk of things we were putting it in inside, like, at least what we call our Django servers.

Like they were all just moving into internal infrastructure.

How Anita started

[00:47:52] Jeremy: This kind of touches on the, the whole boot camp thing, but, Anita, I saw that you, you interned at Facebook and then you took a position there, when you ended up taking a position, I'm kind of curious what were the different projects you looked at or, or how did you end up settling on the one you chose?

[00:48:11] Anita: Yeah, I interned, um, and I joined straight out of university. I went into bootcamp similar to Alvaro and I got the chance to explore several different teams. I knew I was never gonna do UI that was just like not my thing. Um, so I focused, uh, my search on all like backend infrastructure teams. Um, obviously security, uh, was one of them because that's the team I was in interning on.

Um, I also explored, the kind of testing infra team. we call it sandcastle. It runs our internal like unit tests and things. and I also explored one of the, ads infrastructure backend teams. so it was mainly just, you know, getting to know the people, um, seeing which projects appealed to me the most.

Um, and then, you know, I kind of chose based on that, I, I think I've always chosen. My work based on how interesting the project sounded, uh, which has worked out in my favor as far as I could tell.

How Alvaro started

[00:49:14] Jeremy: How, how about, you Alvaro what were the, the different projects you looked at when you first started?

[00:49:20] Alvaro: So, As a PE you do have a more restrictive, uh, number of teams that you can, that you can join.

Uh, like I don't get an option to work in ui. Not that I wanted, but, (laughs) I, I, it's, it's so long ago. Uh, I remember I did look at, um, at MySQL as a team, uh, that was also one of the cool team. Uh, we had at that time, uh, distribute, uh, engine, uh, to, to run work, like if, like celery or something like that. But internally, I really like the constable distribute like workloads, um, and.

I can't remember. I think I did put, come with the Messenger team, that I, I ended up having like a good relationship with their TL their tech lead, uh, but never actually like joined that team. And I believe because she have me have a, a PHP task and it was like, no, I'm not down for doing PHP

[00:50:20] Jeremy: Only Python. Huh?

[00:50:21] Alvaro: Exactly. Python.

Python. Because it's just above C level.

Psystemd

[00:50:27] Jeremy: I mean related to that, you, you started the, the psystemd project. And so I wonder if you could explain what the context behind that was. Like what sparked I need to make this, this library?

[00:50:41] Alvaro: So it's, it's a confluence of two things. The first one, it is like, again, if I see something that doesn't have a Python API for it, I.

Feels the strong urge to create one. I have done this a couple of times, mostly internally, but also externally. that was one. And when, while we were doing the migration, I, I, I honestly, I really hate text processing. So the classical thing was like, if you wanna know if your application's running, you do systemctl, you shell out to systemctl status, then parse the output, find the, find the status column.

Okay. And I didn't like that. And I start reading about like, systemd uh, and I got in contact with the or I saw like the dbus implementation of systemd. And that was, I thought that was a very interesting idea how that opened all the doors. Right? Uh, so I got a demo working like in a couple of hours.

and then I said like, okay, now how do we make this pythonic? And then I created that and I just created, again, just for migrating Instagram. That was the idea. Then, uh, one of the team members who work with Anita, but also one who doesn't work with us anymore, they saw this and said like, Hey, like this looks like a good thing to open source it.

So it was like, sure, like I'm happy to opensource it. So we opensource it and then we went to all System Go, which is a very nice interesting conference that happened in Berlin where like all the head for like user space get together. and, and I talk about it and people seems to like it, and that's the story of that.

[00:52:15] Jeremy: And so this was replacing, I guess, like you were saying, a lot of people were shelling out and running cat commands and things like that from their Python scripts. And this was meant to be a layer on top of that.

[00:52:30] Alvaro: Yes. So it, it does a couple of things. So first of all, inspecting the processes or, or like the services, getting that information out.

That's one of the main usage. But also like starting or stopping or like doing all that operations that you want to do. Uh, knowing the state of, of, of services, uh, that's also another thing that people take advantage of. The other thing that people take advantage of is to modify the status of the, of the processes at runtime, like changing properties, like increasing or decreasing the CPU threshold.

because systemd provides a very nice API or interface to modify the cgroups properties that otherwise you would need to kind of understand the tree structure that, uh, that, that whatever. so that's what people tend to use this mostly internally.

[00:53:23] Jeremy: And so it, it sounds like at least on the production engineering side, you're primarily working in, in Python.

is that something that's the teams before were using Python and so everybody just continues using Python? Or is there kind of like more structure or thought put into that?

[00:53:41] Alvaro: I would say the following thing about it, um, like in in general, uh, there's, there's not a direction on which language you should use.

It's pretty natural which language you should use, but with without said, there's not a Potpourri of languages inside of, of meta. most teams use c c plus plus Python and rust and that's it. There's go, that appears every once in a while there. Sorry, I should not talk about this like, like, or talk like this about this, but eh, there are team who are actually like very fond of go and they use it and they contribute a lot to that space.

It's just not. That much, uh, use internally. I have always gravitated towards Python. That has been the language that teach me how to do real coding. and that's the language that got me a job at meta. So I tends to work mostly on that. Yeah.

[00:54:31] Anita: Hey, you forgot hack Alvaro. Our web services. (laughs)

[00:54:37] Alvaro: Yes. Yes. Uh, so I would say like, the most used language at Meta is actually PHP

it's just like used by, by one particular product. That, that is the Facebook product. Yes. So our, our entire web interface, eh, or web stack uses a combination of hack, which is a compiled php, which is better than uncompiled php, also known as vanilla php. Uh, there is a lot of like GraphQL, React, and, I think that's it.

[00:55:07] Anita: Infrastructure is largely like c plus plus Python, and now Rust is getting a huge following as well.

[00:55:15] Alvaro: Yeah. Like, like Rust. Rust is, I I would say it's the fastest growing language inside, inside of Meta.

And the thing is that there is also what you call like the bootstrap problem. Um, there's like today, if I wanted do my python program and I have a function that fails one every three times, I can add a decorator that is retry, that retries every time that something fails for a timeout, right? And that's built in and it's there used and it's documented.

And I can look at source code that uses this to understand how, how works. When you start with a new language, you don't get the things. So people have to build them. So there's the bootstrap problem.

[00:55:55] Jeremy: That's also an opportunity as well, right? Like if you are the ones building sort of the foundations, then you, you have an opportunity to be the ones who have the core libraries that people are, are using every day.

Whereas if a language has been around a while, it's kind of, some of that stuff is already set, right? And you may or may not like the APIs, but that's what people use. So that's what we, that's what we do.

One of the last things I'd kind of like to ask, so Anita, you moved into management in just the last year or two or so, and I'm kind of curious what your experience has. Been like, was that a conscious decision where you wanted to go from engineering, uh, software engineering to management?

Or maybe you could talk a little bit to that.

[00:56:50] Anita: Oh man, it hasn't even been a year yet. I feel like so much time has passed already. Uh, no, I never had any plans to go into management. I love being an engineer. I love being in the code. but, I'd say my, my current manager and uh, my director, you know, who hired me into the Linux user space team, kind of.

Sold me a little bit on the idea of like, Hey, if you wanna like, keep pushing more projects, you wanna build out the team that you wanna see working on these things, um, you can consider going into management, taking it slow in a, what we call a T L M role, which is like a tech lead manager, role where you kind of spend some time doing development, and leading the team while also supporting, the engineers as a manager doing the hiring and the relationship building and things that you do in management.

so that actually worked out quite well for me, despite Alvaro shaking his head at first. I really enjoyed being able to split my time into kind of the key projects that I really wanted to work on, um, while also supporting the engineers and having them build out, um, New features in systemd and kind of getting their own foothold in the community as well.

but I'd say like in the past few months, it's been pretty crazy. I, I probably naively thought that I'd have a little more control over, I don't know. My destiny has a manager and that's like a hundred percent not true. (laughs) Um, you're, you are kind of at both the whims of your engineers and also the people above you.

And you kind of have to strike that balance. But, um, my favorite part still, just being able to hide the nasty stuff away from the engineers, let them focus on their work and enjoy what engineers wanna do best, which is just like coding, designing, and like, you know, doing fun, open source stuff.

[00:58:56] Alvaro: I will say like, Anita may laugh about me for, because like she's on the other side, but one thing that I least I find very cool at Meta is that managers are not seen as your boss. Right? They're still like a teammate who just basically has a different roles. This is why like when you're an engineer, you can transition to be a manager and that's, it's not considered a promotion that's considered like a, a like an horizontal step and vice versa, you can come back, right.

from a manager into, into like an engineer. Yeah.

[00:59:25] Jeremy: That was what I would say. And, uh, I guess when you were shaking your head, I'm guessing this means you, you don't wanna become a manager anytime soon.

[00:59:35] Alvaro: So I, I never closed the door on that, but I was checking my head to the work of a tlm. Right. Uh, so the tlm TL stands for Tech Lead and m stands for manager.

so you're basically both, but with the time of only one. So, uh, Anita was able to pull it off. I don't think I would be able to pull up like, double duty on that.

[00:59:56] Anita: Yeah. Unfortunately I support too many people now to do the TL stuff as deeply as I used to, but I still have find some time to code a little bit here and there.

[01:00:09] Jeremy: So you were talking a little bit about how things have been crazy the last few months. If, if someone is making the transition into management, like what are the kinds of things that you would tell them to, to look out for or to be aware that's coming?

[01:00:27] Anita: Um, when I, before I transitioned, I talked to a lot of managers about like, oh, what was like, you know, the hardest part about management.

And they all have kind of their own horror story about what happened to them when they transitioned or even like, difficult things that happened to them during management. I'd say don't expect it to be easy. you're gonna make a lot of mistakes usually in like the interpersonal relationship side, and it's really just about learning how to learn from your mistakes, pick back up and do better next time.

I think, um, you know, if people like books, the Making of a Manager by Julie Jo, she was a designer, and also a manager, at then Facebook. She's no longer here. but she has a really good book on like what you can expect when you transition into management. the other thing I'd say is don't go into management without having a management chain that you can really trust.

I'd say that can kind of make or break your first few years as a manager, whether you'll enjoy it or not, or even like whether you'll be able to get through the hard times.

[01:01:42] Jeremy: Good point. Yeah. I mean, I think whenever you take on anything new, right? Having the support of the people above you or just around you as well is like, that makes such a big difference, right? Even like the situation can be bad, but if everyone is supportive, then you can, you can get through it.

[01:02:02] Anita: Yeah, that's absolutely right.

[01:02:04] Jeremy: I think that's a good place to wrap up unless either of you have anything else that you thought we should have talked about.

so if people want to check out what you're working on, what you're up to, um, how can they find you?

[01:02:20] Anita: well, I guess we're both on matrix now. Uh, I'm Anita Zha on Matrix, a n i t a z h A. we both have Twitters as well. If you just search up our names. Nope. Yeah, you're on Twitter. Yeah.

[01:02:36] Alvaro: There is an impostor with my name, right? Actually it's not an impostor. It's just me. I just never log into Twitter anymore.

[01:02:40] Anita: We both have Mastodon now as well?

Yes. Fosstodon we're both frequently at conferences as well. what's, what's coming up next? I think it's, uh, devconf cZ in the Czech Republic. and then, uh, all systems go in September.

[01:02:57] Alvaro: You sent something in Canada?

[01:03:01] Anita: Oh, yeah. L F F L F S M M B P F is coming up. That's a, that's more of a kernel conference, though.

[01:03:09] Alvaro: An acryonym that is longer than the actual word. Yes. Yeah.

[01:03:12] Jeremy: That's a lot. That's a lot of letters.

[01:03:14] Anita: It's a, it's a mouthful. (laughs)

[01:03:18] Jeremy: That's very neat that you get to, to kind of go to all these different conferences and, and actually get, to meet the people in, in person that are, you know, working with the same things you are and, get to be in the same room.

I think that's a, that's a real privilege. Yeah.

[01:03:35] Anita: Yeah, for sure.

[01:03:38] Jeremy: All right. Well, Anita and Alvaro, thank you so much for chatting with me today.

[01:03:43] Alvaro: Thank you for hosting.

[01:03:45] Anita: Yeah. Thanks for the opportunity. This is a lot of fun.

David Cramer on Application Monitoring with Sentry

Wed, 14 Jun 2023 05:19:41 -0000

Sentry is an application monitoring tool that surfaces errors and performance problems. It minimizes the need to manually look at logs or dashboards by identifying common problems across applications and frameworks.

David Cramer is the co-founder and CTO of Sentry.

This episode originally aired on Software Engineering Radio.

Topics covered:

What's Sentry?
Treating performance problems as errors
Why you might no need logs
Identifying common problems in applications and frameworks
Issues with Open Telemetry data
Why front-end applications are difficult to instrument
The evolution of Sentry's architecture
Switching from a permissive license to the Business Source License

Transcript

You can help edit this transcript on GitHub.

[00:00:00] Jeremy: Today I'm talking to David Kramer. He's the founder and CTO of Sentry. David, welcome to Software Engineering Radio.

[00:00:08] David: Thanks for having me. Excited for today's conversation.

What's Sentry?

[00:00:11] Jeremy: I think the first thing we could start with is defining what Sentry is. I know some people refer to it as an error tracker. Some people have referred to it as, an application performance monitoring tool. I wonder if you could kind of describe in, in your words what it is.

[00:00:30] David: You know, as somebody who doesn't work in marketing, I just tell it how it is. So Sentry started out doing error monitoring, which. You know, dependent on who you talk to, you might just think of as logging, right? Like that's the honest truth. It is just logging just a different shape or form. these days it's hard to not classify us as just an APM tool that's like the industry that exists.

It's like the tools people understand. So I would just say it's an APM tool, right? We do a bunch of things within that space, and maybe it's not, you know, item for item the same as say a product like New Relic. but a lot of the overlap's there, so it's like errors performance, which is like latency and sort of throughput.

And then we have some stuff that just goes a little bit deeper within that. The, the one thing i would say that is different for us versus a lot of these tools is we actually only do application monitoring. So we don't do any since like systems or infrastructure monitoring. Meaning Sentry is not gonna tell you when you need to replace a hard drive or even that you need new hard, like more disk space or something like that because it's just, it's a domain that we don't think is relevant for sort of our customers and product.

Application Performance Monitoring is about finding crashes and performance problems that users would associate with bugs

[00:01:31] Jeremy: For people who aren't familiar with the term application performance monitoring, what is that compared to just error tracking?

[00:01:41] David: The way I always reason about it, this is what I tell new hires and what I would tell, like my mother, if I had to explain what I do, is like, you load Uber and it crashes. We all know that's bad, right? That's error monitoring. We capture the crash report, we send it to developers. You load Uber and it's a 30 second spinner, like a loading indicator as a customer.

Same outcome for me. I assume the app is broken, right? So we also know that's bad. Um, but that's different than a crash. Okay. Sentry captures that same thing and send it to developers. lastly the third example we use, which is a little bit more. I think, untraditional, but a non-traditional rather, uh, you load the Uber app and it's like a blank screen or there's no button to submit, like log in or something like this.

So it's kind of like a, it's broken, but it maybe isn't erroring and it's not like a slow thing. Right. Same outcome. It's probably a bug of some sorts. Like it's what an end user would describe it as a bug. So for me, APM just translates to there are bugs, user perceived bugs in your application and we're able to monitor and, and help the software teams sort of prioritize and resolve those, those concerns.

[00:02:42] Jeremy: Earlier you were talking about actual crashes, and then your second case is, may be more of if the app is running slowly, then that's not necessarily a crash, but it's still something that an APM would monitor.

[00:02:57] David: Yeah. Yeah. And I, I think to be fair, APM, historically, it's not a very meaningful term. Like I as a, when I was more of just an individual contributor, I would associate APM to, like, there's a dashboard that will tell me what's slow in my application, which it does. And that is kind of core to APM, but it would also, none of the traditional tools, pre sentry would actually tell you why it's broken, like when there's an error, a crash.

It was like most of those tools were kind of useless. And I don't know, I do actually know, but I'm gonna pretend I don't know about most people and just say for myself. But most of the time my problems are errors. They are not like it's fast or slow, you know? and so we just think of it as like it's a holistic thing to say, when I've changed the application and something's broken, or it's a bug, you know, what is that bug?

How do we help people fix it? And that comes from a lot of different, like data signals and things like that. the end result is still the same. You either are gonna fix it or it's not important and you ignore it. I don't know. So it's a pretty straightforward, premise for us. But again, most companies in the space, like the traditional company is when you grow a big company, what happens is like you build one thing and then you build lots of check boxes to sell more things.

And so I think a lot of the APM vendors, like they've created a lot of different products. Like RUM is a good example of another acronym that lives with an APM. And I would tell you RUM is completely meaningless. It, it stands for real user monitoring. And so I'm like, well, what's not real about monitoring the application?

Well, nothing's not real, but like they created a new category because that's how marketing engines work. And that new category is more like analytics than it is like application telemetry. And it's only because they couldn't collect the app, the application telemetry at the time. And so there's just a lot of fluff, i would say.

But at the end of the day too, like developers or engineering teams, it's like new version of the application. You broke something, let's tell you about it so you can fix it.

You might not need logging or performance monitoring

[00:04:40] Jeremy: And, and so earlier you were saying how this is a kind of logging, but there's also other companies, other products that are considered like logging infrastructure. Like I, I would think of companies like Paper Trail or Log Tail. So what space does Sentry fill that's that's different than that kind of logging?

[00:05:03] David: Um, so the way I always think about it, and this is both personally true, and what I advise other folks is when you're building something new, when you start from zero, right, you can often take Sentry put it in, and that's good enough. You don't even need performance monitoring. You just need like errors, right?

Like you're just causing bugs all the time. And you could do that with logging, but like the delta between air monitoring and logging is night and day. From a user experience, like error monitoring for us, or what we built at the very least, aggregates the errors. It, it helps you understand the frequency. It helps you when they're new versus old.

it really gives you a lot of detail where logs don't, and so you don't need logging often. And I will tell you today at Sentry. Engineers do not use logs for the most part. Uh, I had a debate with one of our, our team members about it, like, why does he use logs recently? But you should not need them because logs serve a different purpose.

Like if you have traces which tell you like, like fast and slow in a bunch of other network data and you have this sort of crash report collection or error monitoring thing, logs become like a compliance or an audit trail or like a security forensics, tool, and there's just not a lot of value that you would get out of them otherwise, like once in a while, maybe there's like some weird obscure use case, but generally speaking, you can just pretend that you don't need logs most days.

Um, and to me that's like an evolution of the industry. And so when, when Sentry is getting started, most people were still logs. And if you go talk to SRE teams, they're like, oh, login is what we know. Some of that's changed a little bit, but. But at the end of the day, they should only be needed for more complicated audit trails because they're just not a good solution to the problem.

It's just free form data. Structured or not, doesn't really matter. It's not aggregated. It's not something that you can really use. And it's why whenever you see logging tools, um, not even the papertrails of the world, but the bigger ones like Splunk or Cabana, it's like this weird, what we describe as choose your own adventure.

Like go have fun, build your dashboards and try to make the logs useful kind of story. Whereas like something like Sentry, it's just like, why would you waste any time trying to build dashboards when we can just tell you when something new is broken? Like that's the ideal situation.

[00:06:59] Jeremy: So it sounds like maybe the distinction is with a more general logging tool, like you mentioned Splunk and Kibana it's a collection of all this information. of things happening, even though nothing's necessarily wrong, whereas Sentry is more Sentry is it's going to log things, but it's only going to log things if Sentry believes something is wrong, either because of a crash or because of some kind of performance issue.

People don't want to dig through logs or dashboards, they want to be told when something is wrong and whyMost software is built the same way, so we know common problems

[00:07:28] David: Yeah. I, i would say it's about like actionability, right? Like, like nobody wants to spend their time digging through logs, digging through dashboards. Metrics are another good example of this. Like just charts with metrics on them. Yeah. They tell me something's happening. If there's lots of log statements, they tell me something's going on, but they're not, they're not optimized to like, help me solve a problem, right?

And so our philosophy was always like, we haven't necessarily nailed this in all cases for what it's worth, but. It was like, the goal is we identify an actual problem, like close to like a root cause kind of problem, and we escalate that up and that's it. Uh, versus asking somebody to like go have to like build these dashboards, build these things, figure out what data matters and all this because most software looks exactly the same.

Like if you have a web service, it doesn't matter what language it's written in, it doesn't matter how different you think your architecture is from somebody else's, they're all the same. It's like you've got a request, you've got a database, you've got some cache, you've got all these like known, known quantity things, and the slowness comes from the same places.

Errors are structured while logs are not

[00:08:25] David: The errors come from the same places. They're all exhibiting the same kinds of behavior. So logging is very unstructured. And what I mean by that is like there's no schema. Like you can hypothetically like make it JSON and everybody does that, but it's still unstructured. Whereas like errors, it's, it's a tight schema.

It's like there's a type of error, there's a message for the error, there's a stack trace, there's all these things that you know. Right. And as soon as you know and you define those things, you can just build better products. And so distributed tracing is similar. Hypothetically, it's a little bit abstract to be fair, but hypothetically, distributed tracing is creating a schema out of basically network annotations.

And somebody will yell at me for just simplifying it to that. I would tell 'em that's what it is. But, same goal in mind. If you know what the data is, you can take action on it. It's not quite entirely true. Um, because tracing is much more freeform. For example, it doesn't say if you have a SQL statement, it should be like this, it should be formatted this way, things like that.

whereas like stack traces, there's a file name, there's there's a line number, there's like all these things, right? And so that's how I think about the delta between what is useful information and what isn't, I guess. And what allows you to actually build things like Sentry versus just build abstract exploration.

Inferring problems rather than having user identify them

[00:09:36] Jeremy: Kind of paint the picture of how someone would get started with a tool like Sentry. Do they need to tell Sentry anything about their application? Do they need to modify their source code at all? give us a picture of how that works.

[00:09:50] David: Yeah, like one of our fundamentals, which I think applies for any real business these days is you've gotta like reduce user friction, right? Like you've gotta make it dead simple to use. Uh, and for us there were, there was like kind of a fundamental driving constraint behind that. So in many situations, um, APM vendors especially will require you to run an agent a basically like some kind of process that runs on your servers somewhere.

Well, if you look at modern tech stacks, that doesn't really work because I don't run the servers half my stuff's in the browser, or it's a mobile app or a desktop app, and. Even if I do have those servers, it's like an entirely different team that controls them. So deploying like a sidecar, an agent is actually like much more complicated.

And so we, we looked at that and also because like, it's much easier to have control if you just ship within the application. We're like, okay, let's build like an SDK and dependency that just injects into the, the application that runs, set an API key and then you're done. And so what that translates for Sentry is we spend a lot of time knowing what Django is or what Rails is or what expresses like all these frameworks.

And just knowing how to plug into the right signals in those frameworks. And then at that point, like the user doesn't have to do anything. And so like the ideal outcome for Sentry is like you install the dependency in whatever language makes sense, right? You somehow configure the API key and maybe there's a couple other minor settings you add and that gives you the bare bones and that's it.

Like it should just work from there. Now there's a lot you can do on top of that to enrich data and whatnot, but for the most part, especially for errors, like that's good enough. And that, that's always been a fundamental goal of ours. And I, I think we actually do it phenomenally well.

[00:11:23] Jeremy: So it sounds like it infers things about the application without manual configuration. Can you give some examples of the kind of things that Sentry knows without the user having to tell it?

[00:11:38] David: Yeah. So a good example. So on the errors side, we know literally everything because an error object in each language has all these attributes with it. It, it gives you the stack trace, it gives you a lot of these things. So that one's straightforward. On the performance side, we use a combination of leveraging some like open source, I guess implementations, like open telemetry where it's got all this instrumentation already and we can just soak that in, um, as well as we automatically instrument a bunch of stuff.

So for example, say you've got like a Python application and you're using, let's say like SQL Alchemy or something. I don't actually know if this is how our SDK works right now, but, we will build something that's aware of that library and make sure it can automatically instrument the things it needs to get the right information out of it.

And be fair. That's always been true for like APM vendors and stuff like that. The delta is, we've often gone a lot deeper. And so for Python for example, you plug it into an application, we'll capture things like the error, error object, which is like exception class name exception value, right? Stack trace, file, name, line number, all those normal things,

function name. We'll also collect source code. So we'll, we'll give you sort of surrounding source code blocks for each line in the stack trace, which makes it infinitely easier to consume. And then in Python and, and php, and I forget if we do this anywhere else right now, we'll actually even allow you to collect what are called stack locals.

So it'll, it'll give you basically the variables that are defined almost like a debugger. And that is actually, actually like game changing from a development point of view. Because if I can go look in production when there's an incident or a bug and I can actually see the state of the application. , I, I never need to know like, oh, what was going on here?

Oh, what if like, do I need to go reproduce this somehow? I always have the right information. And so all of that for us is automatic and we only succeed like, it, it's, it's like by definition inside of Sentry, it has to be automatic. Like if we ask the user to do anything whatsoever, we're failing. And so whenever we design any product or anything, and to be fair, this is how every product company should operate.

it's gotta be with as little user input as humanly possible. And so you can't always pull that off. Sometimes you have to have users configure stuff, but the goal should always be no input.

Detecting errors through unhandled exceptions

[00:13:42] Jeremy: So you, you're talking about getting a stack trace, getting, the state of variables, source code. That sounds like that's primarily gonna be through unhandled exceptions. Would you say that's, that's the primary way that you get error?

[00:13:58] David: Yeah, you can integrate in other ways. So you can like trigger our API to capture an, uh, an exception. You can also, for better or worse, it's not always good. You can integrate through logging adapters. So if you're already using a logging framework and you log their errors there, we can often capture those.

However, I will say in most cases, people use the logging APIs wrong and the data becomes junk. A good, a good example of this is like, uh, it varies per language. So I'm just gonna go to Python because Python is like sort of core to Sentry. Um, in Python you have the ability to log messages, you can log them as errors, you can log like actual error objects as errors.

But what usually happens is somebody does a try-catch. They, they capture the error they rescue from it. They create a logging call, like log dot error or something, put the, the error message or value in there. And then they send that upstream. And what happens is the stack trace is gone because we don't know that it's an error object.

And so for example, in Python, there's actually an an A flag. You pass the logging call to make sure that stack trace stays present. But if you don't know that the data becomes junk all of a sudden, and if we don't have a stack trace, we can't actually aggregate data because like there's just not enough information to like, to run hashing on it.

And so, so there are a lot of ways, I guess, to capture the information, but there are like good ways and there are bad ways and I think it, it's in everybody's benefit when they design their, their apt to like build some of these abstractions. And so like as an example, when, whenever I would start a new project these days, I will add some kind of helper function for me to like log an exception when I like, try catch and then I can just plug in whatever I need later if I want to enrich the data or if I wanna send that to Sentry manually or send it to logs manually.

And it just makes life a lot easier versus having to go back and like augment every single call in the code base.

[00:15:37] Jeremy: So it, it sounds like. When you're using a tool like Sentry, there's gonna be the, the unhandled exceptions, which are ones that you weren't expecting. So those should I guess happen without you catching them. And then the ones that you perhaps do anticipate, but you still consider to be a problem, you would catch that and then you would add some kind of logging statement to your code that talks to Sentry directly.

Finding issues like performance problems (N+1 queries) that are not explicit errorsz

[00:16:05] David: Potentially. Yeah. It becomes a, a personal choice to be fair at that, at that point. but yeah, the, the way, one of the ways we've been thinking about this lately, because we've been changing our error monitoring product to not just be about errors, so we call it issues, and that's in the guise of like, it's like an issue tracker, a bug tracker.

And so we started, we started putting what are effectively like, almost like static analysis concerns inside of this issue tracker. So for example, In our performance monitor, we'll do something called like detect n plus one queries, which is where you execute a, a duplicate query in a loop. It's not necessarily an error.

It might not be causing a problem, but it could be causing a problem in the future. But it's like, you know, the, the, the qualities of it are not the same as an error. Like it's not necessarily causing the user to experience a bug. And so we've started thinking more about this, and, and this is the same as like logging errors that you handle.

It's like, well, they're not really, they're not really bugs. It's like expected behavior, but maybe you still want to keep it like tracking somewhere. And I think about like, you know, Lins and things like that, where it's like, well, I've got some things that I definitely should be fixing. Then I've got a bunch of other stuff that's like informing me that maybe I should take action on or not.

But only I, the human can really know at the end of the day, right, if I, if I should prioritize that or not. And so that's how I kind of think about like, if I'm gonna try catch and then log. Yeah, you should probably collect that data. It's probably less important than like the, these other concerns, like, like an actual unhandled exception.

But you do, you do want to know that they're happening and whatnot. And so, I dunno, Sentry has not had a strong opinion on this historically. We're just like, send us whatever you want to capture in this regard, and you can pay for it, that's fine. It's like usage based, you know? we're starting to think a lot more about what should that look like if we, if we go back to like, what's the, what's the opinion we have for how you should use the product or how you should solve these kinds of software problems.

[00:17:46] Jeremy: So you gave the example of detecting n plus one queries is, is that like being aware of the framework or the ORM the person is using and that's how you're determining this? Or is it at more of a lower level than that?

[00:18:03] David: it is, yeah. It's at the framework level. So this is actually where Open Telemetry causes a lot of harm, uh, for us because we need to know what a database query is. Uh, we need to know like the structure of the query because we actually wanna parse it out in a lot of cases. Cause we actually need to identify if it's duplicate, right?

And we need to know that it's a database query, not a random annotation that you've added. Um, and so what we do is within these traces, which is like if you, if you don't know what a trace is, it's basically just like, it's a tree, like a tree structure. So it's like A calls B, calls C, B also calls D and E and et cetera, right?

And so you just, you know, it's a trace. Um, and so we actually just look at that trace data. We try to find these patterns, which is like, okay, B was a, a SQL query or something. And every single sibling of B is that same SQL query, but sort of removing certain parameters and stuff for the value. So we'll look at that data and we'll try to pull out anomalies.

So m plus one is an example of like a fairly obvious anti pattern that everybody knows is bad and can be optimized. Uh, but there's a lot of other that are a little bit more subjective. I'll give you an example. If you execute three SQL statements back to back, one could argue that you could just batch those SQL statements together.

I would argue most of the time it doesn't matter and I don't need to do that. And also it's not guaranteed that that is better. So it becomes much more like, well, in my particular situation this is valuable, but in this other situation it might not be. And that's where I go back to like, it's almost like a linter, you know?

But we're trying to infer all of that from the data stream. So, so Sentry's kind of, we're kind of a backwards product company. So we build our product from a technology vision, not from customers want this, or we have this great product vision or anything like that. And so in our case, the technology vision is like, there's a lot of application data that comes in, a lot of telemetry, right?

Errors, traces. We have a bunch of other streams now. within that telemetry there is like signal. And so one, it's all structured data so we know what it is so we can actually interpret it. And then we can identify that signal that might be a problem. And that signal in our case is often going to translate to like this issue concept.

And then the goal is like, well, can we identify these problems for people and surface them versus the choose your own adventure model, which is like, we'll just capture everything and feed it to the user and they can figure out what matters. Because again, a web service is a web service. A database is a database.

They're all the same problems for everybody. All you know, it's just, and so that's kind of the model we've built and are continuing to evolve on and, and so far works pretty well to, to curate a lot of these workflows.

Want to infer everything, but there are challenges

[00:20:26] Jeremy: You talked a little bit about how people will sometimes use tracing. And in cases like that, they may need some kind of session ID to track. Somebody making a call to a service and that talks to a database and that talks to other services. And you, inside of your application, you have to instrument some way of tracking.

This all came from this one request. Is that something that Sentry can infer or is there something that the developer has to put into play so that you can track that sort of thing?

[00:21:01] David: Yeah, so it's, it's like a bit of both. And i would say our goal is that we can infer everything. The reality is there is so much complexity and there's too much of a, like, too many technologies in the world. Like I was complaining about this the other day, like, the classic example on web service is if we have a middleware hook, We kind of know request response, usually that's how middleware would work, right?

And so we can infer a lot from there. Like basically we can infer the boundaries, which is a really big deal. Okay. That's one thing is boundaries is a problem. What we, we describe that as a transaction. So like when the request starts. When the request ends, right? That's a very important boundary for everybody to understand because when I'm working on the api, I care about the API boundary.

I actually don't care about what the database is doing at its low level or what the JavaScript application might be doing above it. I want my boundary. So that's one that we kind of can do. But it's hard in a lot of situations because of the way frameworks and technology has been designed, but at least traditional stuff like a, a traditional web stack, it works like a Rails app or a DDjango app or PHP app kind of thing, right?

And then within that it becomes, well, how do you actually build a trace versus just have a bunch of arbitrary labels? And so we have a bunch of complicated tech within each language that tries to establish that tree. and then we annotate a lot of things along the way. And so we will either leverage Open Telemetry, which is an open format spec that ideally has very high quality data.

Ideally, not realistically, but ideally it has high quality data. Every library author implements it great, everybody's happy. We don't have to do anything ever again. The reality is that data is like all over the map because there's not like strict requirements for what, how the data should be labeled and stuff.

And not everything even has that data. Like not everything's instrumented with open telemetry. So we also have a bunch of stuff that, unrelated to using that we'll say, okay, we know what this library is, we're gonna try to infer some characteristics from this library, or we know what maybe like the DDjango template engine is.

So we're gonna try to infer like when the template renders so you can capture that block of information. it is a very imperfect science and I would tell you like it's not, even though like Open Telemetry is a very fun topic for people. It is not necessarily good, like it's not in a good state. Could will it ever be good?

I don't know in all honesty, but like the data quality is like all over the map and so that's honestly one of our biggest challenges to making this experience that, you know, tells you what's going on in your database so it tells you what's going on with the cash or things like this is like, I dunno, the cash might be called something completely random in one implementation and something totally different in another.

And so it's a lot of like, like data normalization that you have to deal with. But for the most part, those libraries of things you don't control can and will be instrumented. Now the other interesting thing, which we'll see how this works out, so, so one thing Sentry tries to do there, we have all these layers of telemetry, so we have errors and traces, right?

Those are pretty high level concepts. We also have profiling data, which is very, very, very, very low level. So it's usually only if you have like disc. I like. It's where is all the CPU time being spent in my application? Mostly not waiting. Like waiting's usually like a network call, right? But it's like, okay, I have a loop that's doing a lot of math, or I'm writing a bunch of stuff to disc and that's really slow.

Like often those are not instrumented or it's like these black box areas of a performance. And so what we're trying to do with profiling data, instead of just showing you flame charts and stuff, is actually say, could we fill in these gaps in these traces? Like basically like, Hey, I've got a long period of time where the app's doing something.

You know, here's an API call, here's the database stuff. But then there's this block, okay, what's that function or something? Can we pull that out of the profiling data? And so in that case, again, that's just automatic because the profile actually knows everything about the application and know it. It has full access to the function and the stack and everything, right?

And so the dream is that you would just always have everything filled in the, the customer never has to do anything with one minor asterisk. And the asterisk is what I would call like business context. So a good example would be, You might wanna associate requests with a specific customer or something like that.

Like you might wanna say, well it's uh, I don't know, Goldman Sachs or one of these big companies or something. So you can know like, well when Goldman Sachs is having performance issues or whatever it is, oh maybe I should focus on them cuz maybe they pay you a lot of money or something. Right. Sentry would never know that at the end of the day.

So we also have these like kind of tagging contextual APIs that will say like, tell us some informations, maybe it's like customer, maybe it's something else that's relevant to your application. And we'll keep that data associated with the telemetry that's like present, you know, um, but the, at least the telemetry, like again, application's just worth the same, should be, there should be a day in the next few years that it's just all automatic.

and again, the only challenge today is like, can it be high quality and automatic? And so that, that's like to be determined.

[00:25:50] Jeremy: What you're kind of saying is the ideal is being able to look at this profiling information and be able to build a full picture of. a, a call from beginning to end, all the different things to talk to, but I guess what's the, what's the reality today? Like, what, what is Sentry able to determine, in the world we live in right now?

[00:26:11] David: So we've done a lot of this like performance detection stuff already. So we actually can do a lot now. We put a lot of time into it and I, I will tell you, if you look at other tools trying to do tracing, their approach is much more abstract. It's like your traditional monitoring tool that's like, we're just gonna collect a lot of signals and maybe we'll find magic anomaly detection or something going on in it, which, you know, props, but that can figure that out.

But, a lot of what we've done is like, okay, we kind of know what this data looks like. Let's go after this very like known quantity problem. Let's normalize the data. And let's make it happen like that's today. Um, the enrichment of profiles is new for us, but it, we actually can already do it. It's not perfect.

Detection of blocking the UI thread in mobile apps

[00:26:49] David: Um, and I think we're launching something in April or May, something around the, that timeframe where hopefully for the, the technologies we can instrument, we're actually able to surface that in a useful way. but as an example that, that concept that I was talking about, like with n plus one queries, the team built something using profiling data.

and I think this, this might be for like a mobile app more so than anything where mobile apps have this problem of, it's, you've got a main thread and if you block that main thread, the app is basically frozen. You see this on desktop apps all the time. You, you very rarely see it on web apps anymore.

But, but it's a really big problem when you have a web, uh, a mobile or desktop app because you don't want that like thing to be non-responsive. Right? And so one of the things they did was detect when you're doing like file io on the main thread, you know, right. When you're writing a disc, which is probably a slow thing or something like that, that's gonna block the whole thing.

Because you should just do it on a separate thread. It's like an easy fix, potentially may not be a problem, but it could become a problem. Same thing as n plus one. But what's really interesting about it is what the team did is like they used the profiling data to detect it because we already know threads and everything in there, and then they actually recreated a stack trace out of that profiling data when it's surfaced.

So it's actually like useful data with that. You could like that I or you as a developer might know how to take and actually be like, oh, this is where it happens at the source code. I can actually figure it out and go fix it myself. And to me, like as like I, I'm still very much in the weeds with software that is like one of the biggest gaps to most things.

Is it just, it doesn't make it easy to consume or like take action on, right? Like if I've got a, a chart that says my error rate is high, what am I gonna do with that? I'm like, okay, what's breaking? That's immediately my next question. Right? Okay. This is the error. Where is that error happening at? Again, my next question, it, it's literally just root cause analysis, right?

Um, and so that, that to me is very exciting. and I, I don't know that we're the first people to do that, I'm not sure. But like, if we can make that kind of data, that level of actionable and consumable, that's like a big deal for me because I will tell you is like I have 20 years of software experience. I still hate flame charts and like I struggle to use them.

Like they're not a friendly visualization. They're almost like a, a hypothetically necessary evil. But I also think one where nobody said like, do we even need to use that? Do we need that to be like the way we operate? and so anyways, like I guess that's my long-winded way of saying like, I'm very excited for how we can leverage that data and change how it's used.

[00:29:10] Jeremy: Yeah. So it sounds like in this example, both in the mobile app blocking the UI or the n plus one query is the Sentry, suppose, SDK or instrumentation that's hooked inside of your application. There are certain behaviors that it knows are, are not like ideal I guess, just based on. people's prior experience, like your own developers know that, hey, if you block the UI thread in this mobile application, then you're gonna have performance problems.

And so that way, rather than just telling you, Hey, your app is slow, it can tell you your app is slow and it's because you're blocking the UI thread.

Don't just aggregate metrics, the error tracker should have an opinion on what actual problems are

[00:29:55] David: Exactly, and I, and I actually think, I don't know why so many people don't recognize this gap, because at the end of the day, like, I don't know, I don't need more people to tell me response times are bad or anything. I need you to have an opinion about what's good because. The only way it's like math education, right?

Like, yeah, you learn the basics, but you're not expected to say, go to calc, but, and then like, do all the fundamentals. You're like, don't get a calculator and start simplifying the problem. Like, yeah, we're gonna teach you a few of these things so you understand it. We're gonna teach you how to use a calculator and then just use the calculator and then make it easier for everybody else.

But we're also not teaching you how to build a calculator because who cares? Like, that's not the purpose of it. And so for me, this is like, we should be helping people sort of get to the finish line instead of making them run the entirety of the race over and over if they don't need to. I don't, I don't know if that's a good analogy, but that has been the biggest gap, I think, in so much of this software throughout the industry.

And it's, it's, it's common everywhere. And there's no reason for that gap to exist these days. Like the technology's fine. And the technology's been fine for like 10 years. Like Sentry started in oh eight at this point. And I think there was only one other company I recall at the time that was doing anything that was even similar to like air monitoring and Sentry when we built it, we're just like, what if we just go deeper?

What if we collect all this information that will help you debug the problem instead of just stopping it like a log aggregator or something kind of thing, so we can actually have an opinion about it. And I, I genuinely, it baffles me that more people do not think this way because it was not a hard problem at the time.

It's certainly not hard these days, but there's still very, I mean, a lot more people do it now. They've seen Sentry successful and there's a lot of similar implementations, but it's, it's just amazes me. It's like, why don't you, why don't people try to make the data more actionable and more useful, the teams versus just collect more of it, you know?

40 people working on learning the common issues with languages and frameworks

[00:31:41] Jeremy: it, it sounds like maybe the, the popularity of the stack the person is using or of the framework means that you're gonna have better insights, right? Like if somebody makes a, a Django application or a Rails application, there's all these lessons that your team has picked up in terms of, Hey, if you use the ORM this way, your application is gonna be slow.

Whereas if somebody builds something totally homegrown, you won't know these patterns and you won't be able to like help as much basically.

[00:32:18] David: Yeah. Yeah, that's exactly, and, and you might think that that is a challenge, but then you look at how many employees exist at like large tech companies and it's, it's not that big of a deal, like, , you might even think collecting all the information for each, like programming, runtime or framework is a challenge.

We have like 40 people that work on that and it's totally fine. Like, and, and so I think actually all these scale just fine. Um, but you do have to understand like the domain, right? And so the counter version of this is if you look at say like browser applications, like very rich, uh, single page application type experiences.

It's not really obvious like what the opinions are. Like, like if, if you, and this is like real, like if you go to Sentry, it's, it's kind of slow, like the app is kind of slow. Uh, we even make fun of ourselves for how slow it is cuz it's a lot of JavaScript and stuff. If you ask somebody internally, Hey, how would we make pick a page fast?

They're gonna have no clue. Like, even if they have like infinite domain experience, they're gonna be like, I'm not entirely sure. Because there's a lot of like moving parts and it's not even clear what like, like good is right? Like we know n plus one is bad. So we can say not doing that is the better solution.

And so if you have a JavaScript app, which is like where a lot of the slowness will come from is like the render times itself. Like how do you fix it? You, you can't actually build a product that tells you what to fix without knowing how to fix it, right? And so some of these newer and very fast moving targets are, are frankly very difficult for us.

Um, and so that's one thing that I think is a challenge for the entire industry. And so, like, as an example, a lot of the browser folks have latched onto web vitals, which are just metrics that hopefully tell you something about the application, but they're not always actionable either. It'll be like, the idea with like web vitals is like, okay, time to interactive is an an important metric.

It's like how long until the page loads that a user can do what they're probably there to do. Okay. Like abstractly, it makes sense to us, but like put into action. How do I optimize time to interactive? Don't block the page. That's one thing. I don't know. Defer assets, that's another thing. Okay. So you've gotta like, you've gotta build a technology that knows these assets could be deferred and aren't.

Okay, which ones can be deferred? I don't know. Like, it, it, it's like such a deep rabbit hole. And then the problem is, six months from now, the tech will have completely changed, right? And it won't have like, necessarily solved some of these problems. It will just have changed and they're now a completely different shape of problem.

But still the same fundamental like user experience is the same, you know? Um, and to me that's like the biggest challenge in the industry right now is that like dilemma of the browser at the end of the day. And so even from our end, we're like, okay, maybe we should step back, focus on servers again, focus on web services.

Those are known quantities. We can do that really well. We can sort of change that to be better than it's been in the past and easier to consume with things like our n plus one detections. Um, and then take like a holistic, fresh look at browser and say, okay, now how would we solve this to make sure we can actually really latch onto the problems that like people have and, and we understand, right?

And, you know, we'll see when we get there. I don't think any product does a great job these days for helping, uh, solve those problems. . But I think even without the, the products, like I said, like even our team would be like, fixing this is gonna take months because it's gonna take months just to figure out exactly where the, the common bottlenecks are and all these other things within an application. And so I, I guess what I mean to say with that is there's a lot of opportunity, I think with the moving landscape of technology, we can find a way to, whether it's standardized or Sentry, can find a way to make that data actionable want it something in between there.

There are many ways to build things on the frontend with JavaScript which makes it harder to detect common problems compared to backend

[00:35:52] Jeremy: So it sounds like what you're saying, With the, the back end, there's almost like a standard way of doing things or a way that a lot of people do it the same way. Whereas on the front end, even if you're looking at a React application, you could look at tenant react applications and they could all be doing state management a totally different way.

They could be like the, the way that the application is structured could be totally different, and that makes it difficult for you to infer sort of these standard patterns on the front end side.

[00:36:32] David: Yeah, that's definitely true. And it, it goes, it's even worse than that because well, one, there's just like the nature of JavaScript, which is asynchronous in the sense of like, it's a lot of callbacks and things like that. And so that already makes it hard to understand what's going on, uh, where things are happening.

And then you have these abstractions like React, which are very good, but like they pull a lot of that away. And so, as an example of a common problem, you load the application, it has to do a lot of stuff to make the page render. You might call that hydration or whatever. Okay. And then there's a completely different state, which is going from, it's already hydrated.

Page one, I, I've done an interaction or something. Or maybe I've navigated a page too, that's an entirely different, like, sort of performance problem. But that hydration time, that's like a known thing. That's kind of like time to interactive, right? But if the problem is in your framework, which a lot of it is like a lot of the problems today exist because of frameworks, not because of the technology's bad or the framework's bad, but just because it's abstracted and it's really hard to make it work in all these situations, it's complicated.

And again, they have the same problem where it's like changing non sem. And so if the problem is the framework is somehow incorrectly re rendering the page as an example, and this came up recently, for some big technology stack, it's re rendering the page. That's a really bad problem for the, the customer because it's making the, it's probably actually causing a lot of CPU seconds.

This is why like your Chrome browser tabs are using so much memory in cpu, right? How do you fix that? Can you even fix that? Do you just say, I don't know, blame the technology? Is that the solution? Maybe that is right, but how would we even blame the technology like that alone, just to identify why it's happening.

and you need to know the why. Right? Like, that is such a hard problem these days. And, and personally, I think the only solution is if the industry sort of almost like standardizes on a way to like, on a belief of how this should be optimized and how it should be measured and monitored kind of thing.

Because like how errors work is like a standardization effectively. It may not be like a formal like declaration of like, this is what an error is, but more or less they always have the same attributes because we've all kind of understood that. Like those are the valuable things, right? Okay. I've got a server rendered application that has client interaction, which is sort of the current generation of the technology.

We need to standardize on what, like that web request, like response life cycle is, right? and what are the moving targets within there. And it just, to me, I, I honestly feel like a lot of what we use every day in technology is like beta. Right. And it's, I think it's one of the reasons why we're constantly always having to up, like upgrade and, and refactor and, and, and shift dependencies and things like that because it is not, it's very much a prototype, right?

It's a moving target, which I personally do not think is great for the industry because like customers do not care. They do not care that you're using some technology that like needs a change every few months and things like that. now it has improved things to be fair. Like web applications are much more like interactive and responsive sometimes.

Um, but it is a very hard problem I think for a lot of people in the world.

[00:39:26] Jeremy: And, and when you refer to, to things feeling like beta, I suppose, are, are you referring to the frameworks people are using or the libraries they're using to support their front end development? I, I'm curious what you're, you're thinking there.

[00:39:41] David: Um, I think it's everything. Even like the browser APIs are constantly shifting. It's, that's gotten a little bit better. But even the idea like type script and stuff, it's just like we're running like basically compilers to make all this code work. And, and so the, even that they're constantly adding features just because they can, which means behaviors are constantly changing.

But like, if you look at a real world example, like React is like the, the most dominant technology. It's very well designed for managing the dom. It's basically just a rendering engine at the end of the day. It's like it's managed to process updates to the dom. Okay. Makes sense. But we've all learned that these massive single page applications where you build all your application logic and loaded into a bundle is a problem.

Like, like, I don't know how big Sentry's bundle is, but it's multiple megs in size and it takes a little while for like a, even on fast fiber here in the Bay Area, it takes a, you know, several seconds for the UI to load. And that's not ideal. Like, it's like at some point half of us became okay with this.

So we're like, okay, what we need to do is go back, literally just go back 10 years and we need to render it on the server. And then we need some stuff that makes interactions, you know, highly responsive in the UI or dynamic content in the ui, you know, bring, it's like bringing back jQuery or something.

And so we're kind of going full circle, but that is actually like very complicated because the way people are trying to do is like, okay, we wanna, we wanna have the rendering engine operate the same on the server and is on as on the client, right? So it's like we just write one, path of code that basically it's like a template engine to some degree, right?

And okay, that makes sense. Like we can all get behind that kind of model. But that is actually really hard to make work with a lot of people's software and, and I think the challenge and framers have adopted it, right? So they've taken this, so for example, it's like, uh, react server components, which is basically just like, can we render it on the server and then also keep that same interaction in the ui.

But the problem is like frameworks take that, they abstract it and so it's another layer of complexity on something that is already enormously complex. And then they add their own flavor onto it, like their own opinions for maybe what the world way the world is going. And I will say like personally, I find those.

Those flavors to be very hard to adapt to like things that are tried and true or importantly in this context, things that we know how to monitor and fix, right? And so I, I don't know what, what the be all end all is, but my thesis on this is you need to treat the UI like a template engine, and that's it.

Remove all like complexity behind it. And so if you think about that, the term I've labeled it as, which I did not come up with, I saw this from somebody at some point, is like, it's like your front end as a service. Like you need to take that application that renders on the server and the front end, and it's just an entirely different application, which is annoying.

and it just calls your APIs and that's how it gets the data it needs. So you're literally just treating it as if it's like a single page application that can't connect to your database. But the frameworks have not quite done that. And they're like, no, no, no. We'll connect to the database and we'll do all this stuff, but then it doesn't work because you've got, like, it works this way on the back end and this way on the front end anyways.

Again, long winded way of saying like, it's very complicated. I don't think the technology can solve it today. I think the technology has to change before these problems can actually genuinely become solvable. And that's why I think the whole thing is like a beta, it's like, it's very much like a moving target that we're eventually we'll get there and it's definitely had value, but I don't know that, um, responsiveness for low latency connections is where the value has been created.

You know, for like folks with bad internet and say remote Africa or something, like I'm sure the internet is not a very fun place for them to use these days.

Some frontend code runs on the server and some in the browser which creates challenges

[00:43:05] Jeremy: I guess one of the things you mentioned is there's this, almost like this split where you have the application running on the server. It has its own set of rules because it, like you said, has access to the database and it can do things that you can't do in the browser, and then you have to sort of run the same application in the browser, but it's not quite the same application because it doesn't have access to the same things in the browser.

So you have this weird disconnect, I suppose.

[00:43:35] David: Yeah. Yeah. And, and, and then the challenges is like a developer that's actually complicated for you from the experience point of view, cuz you have to know somehow, okay, these things are ta, these are actually running on the server and only on the server. And like, so I think the two biggest technologies that try to do this, um, or at least do it well enough, or the two that I've used, there might be some others, um, are NextJS and remix and they have very different takes on how to do this.

But, remix is the one I use most recently. So I, I'll comment on that. But like, there's a, a way that you kind of say, well, this only runs on, I think the client as an example. And that helps you a little bit. You're like, okay, this is only gonna render on the client. I can, I actually can think about that and reason about that.

But then there's this thing like, okay, sometimes this runs on the server, only this part runs on the server. And it's, it just becomes like the mental capacity to figure out what's going on and debug it is like so difficult. And that database problem is like the, the normal problem, right? Like of like, I can only query the database on the server because I need secure credentials or something.

Okay. I understand that as a developer, but I don't understand how to make sure the application is doing what I expect it to do and how to fix it if something goes wrong. And that, that's why I think. , I'm a, I'm a believer in constraints. The only way you make progress is you simplify problems. Like you just give up on solving the complicated thing and you make the problem simpler.

Right? And so for me, that's why I'm like, just take the database outta the equation. We can create APIs from the client, from the server, same security levels. Okay? Make it so it can only do that and it has to be run as almost like a UI only thing. Now that creates complexity cuz you have to run this other service, right?

And, and like I personally do not wanna have to spin up a bunch of containers just to write like a simple like web application. but again, I, I think the problem has not been simplified yet for a lot of folks. Like React did this to be fair, um, it made it a lot easier to, to build UI that was responsive and, and just updated values when they changed, you know, which was a big deal for a long period of time.

But I feel like everything after has not quite reached that that area, whereas it's simple and even react is hard to debug when it doesn't do what you want it to do. So I don't know, there, there's so gaps I guess is what i would say. And. Hopefully, hopefully, you know, in the next five years we'll kind of see this come to completion because it does feel like it's, it's getting closer to that compromise.

You know, where like we used to have pure server rendered apps with some weird janky JavaScript on top. Now we've got this bridge of really complicated, you know, JavaScript on top, and the server apps are also complicated and it's just, it's a nightmare. And then this newer generation of these frameworks that work for some types of technology, but not all.

And, and we're kind of almost coming full circle to like server rendered, you know, everything. But with like allowing the same level of interactions that we've been desiring, I guess, on the web. So, and I, fingers crossed this gets better, but right now I do not see like a clear like, oh, it's definitely there.

I can see it coming. I'm like, well, we're kind of making progress. I don't love being the beta tester of the whole thing, but we're kind of getting there. And so, you know, we'll see.

There are multiple ways to write mobile apps as well (flutter, react native, web views)

[00:46:36] Jeremy: I guess you, you've been saying this whole shifting landscape of how Front End works has made it difficult for Sentry to provide like automatic instrumentation and things like that for, for mobile apps. Is that a different story? Like is it pretty standardized in terms of how do you instrument an Android app or an iOS app.

[00:46:58] David: Sort of, but also, no, like, a good example here is like early days mobile, it's a native application. You ship a binary known quantity, right? Or maybe you embedded a web browser, but like, that was like a very different thing. Okay. And then they did things where like, okay, more of it's like embedded web browser type stuff, or dynamically render content.

So that's now a moving target. the current version of that, which I'm not a mobile dev so like people have strong opinions on both sides of this fence, but it's like, okay, do you use like a, a hybrid framework which allows you to build. Say, uh, react native, which is like arou you to sort of write a JavaScript ish thing and it runs on both Android and mobile, but not really well on either.

Um, or do you write a native, native app, which is like a known quantity, but then you may maintain like two code bases, have two degrees of expertise and stuff. Flutters the same thing. so there's still that version of complexity that goes on within it. And I, I think people care less about mobile cuz it impacts people less.

Like, you know, there's that whole generation of like, oh, mobile's the future, everything's gonna be mobile, let's not become true. Uh, mobile's very important, but like we have desktops still. We use web software all the time, half the time on mobile. We're just using the web software at the end of the day, so at least we know that's a thing.

And I think, so I think that investment in mobile has died down some. Um, but some companies like mobile is like their main experience or one of their driving experience is like a, like a company like DoorDash, mobile is as important as web, if not more, right? Because of like the types of customers.

Spotify probably same thing, but I don't know, Sentry. We don't need a mobile app, who cares? It's irrelevant to the problem space, right? And so I, I think it's just not quite taken on. And so mobile is still like this secondary citizen at a lot of companies, and I think the evolution of it has been like complicated.

And so I, I think a lot of the problems are known, but maybe people care less or there's just less customers. And so the weight doesn't, like, the weight is wildly different. Like JavaScript's probably like a hundred times the size from an investment point of view for everyone in the world than say mobile applications are, is how I would think about it.

And so whether mobile is or isn't solved is almost irrelevant to the, the, the like general problem at hand. and I think at the very least, like mobile applications, there's like, there's like a tool chain where you can debug a lot of stuff that works fairly well and hasn't changed over the years, whereas like the web you have like browser tools, but that's about it. So.

Mobile apps can have large binaries or pull in lots of dependencies at runtime

[00:49:16] Jeremy: So I guess with mobile. Um, I was initially thinking of native apps, but you're, you're bringing up that there's actually people who would make a native app that's just a web view for a webpage, or there's React native or there's flutters, so there's actually, it really isn't standard how to make a mobile app.

[00:49:36] David: Yeah. And even within those, it comes back to like, okay, is it now the same problem where we're loading in a bunch of JavaScript or downloading a bunch of JavaScript and content remotely and stuff? And like, you'll see this when you install a mobile app, and sometimes the binaries are huge, right?

Sometimes they're really small, and then you load it up and it's downloading like several gigs of data and stuff, right? And those are completely different patterns. And even within those like subsets, I'm sure the implementations are wildly different, right? And so, you know, I, that may not be the same as like the runtime kind of changing, but I remember there was this, uh, this must be a decade ago.

I, I used, I still am a gamer, but. Um, early in my career I worked a lot with like games like World of Warcraft and stuff, and I remember when games started launching progressive loading where it's like you could download a small chunk of the game and actually start playing and maybe the textures were lower, uh, like resolution and everything was lower fidelity and, and you could only go so far until the game fully installed.

But like, imagine like if you're like focused on performance or something like that, measuring it there is completely different than measuring it once, say everything's installed, you know? And so I think those often become very complex use cases. And I think that used to be like an extreme edge case that was like such a, a hyper-specific optimization for like what The Warcraft, which is like one of the biggest games of all time that it made sense, you know, okay, whatever.

They can build their own custom tooling and figure it out from there. And now we've taken that degree of complexity and tried to apply it to everything in the world. And it's like uhoh, like nobody has the teams or the, the, the talent or the, the experience to necessarily debug a lot of these complicated problems just like Sentry like.

You know, we're not dealing with React internals. If something's wrong in the React internals, it's like somebody might be able to figure it out, but it's gonna take us so much time to figure out what's going on, versus, oh, we're rendering some html. Cool. We understand how it works. It's, it's a known, known problem.

We can debug it. Like there's nothing to even debug most of the time. Right. And so, I, I don't know, I think the industry has to get to a place where you can reason about the software, where you have the calculator, right. And you don't have to figure out how the calculator works. You just can trust that it's gonna work for you.

How Sentry's stack has become more complex over time

[00:51:35] Jeremy: so kind of. Shifting over a little bit to Sentry's internals. You, you said that Sentry started in, was it 2008 you said?

[00:51:47] David: Uh, the open source project was in 2008. Yeah.

[00:51:50] Jeremy:

The stack that's used in Sentry has evolved. Like I remembered that there was a period where I think you could run it with a pretty minimal stack, like I think it may have even supported SQLite.

[00:52:02] David: Yeah.

[00:52:03] Jeremy: And so it was something that people could run pretty easily on their own.

But things have, have obviously changed a lot. And so I, I wonder if you could speak to sort of the evolution of that process. Like when do you decide like, Hey, this thing that I built in 2008, Is, you know, not gonna cut it. And I really need to re-architect what this system is.

[00:52:25] David: Yeah, so I don't know if that's actually the reality of why things have changed, that it's like, oh, this doesn't work anymore. We've definitely introduced complexity in the sense of like, probably the biggest shift for Sentry was like, it used to be everything, and it was a SQL database, and everything was kind of optional.

I think half that was maintainable because it was mostly built by. And so I could maintain like an architectural vision that kept it minimal. I had the experience to figure it out and duct tape the right things. Um, so that was one thing. And I think eventually, you know, that doesn't scale as you're trying to do more and build more into the product.

So there's some complexity there. but for the most part you can, it can still be a SQL database, whatever. Could it be SQLite forever? Probably not. But it was, it was gold when you could run SQLite in the development environment cuz it was super fast, right? Um, and so then there was like the evolution of that to where we started adding more complex services.

So one example was, uh, we needed a key value store and we needed to like store blob data because storing them in the SQL database was highly inefficient and becoming very expensive. And so we wrote just an abstraction that could sort in the database or, and at the, the technology at the time was React, uh, Riak, not to be confused with React.

And that was just like, you can almost think of it like, uh, big Table or naively S3 or something. It's literally just like get set, delete where the operation's available to us. And then actually we're totally fine. So it meant like we could swap it out for more scalable technology, but you could also ignore swapping it out and at some point you've got like a maintenance burden for like handling these different code paths, right?

Because that's also the era when Sentry was only open source or primarily open source. So primarily just ran it yourself, you know? then we started adopting SaaS we're starting to build more complexity. Basically the constraints we put into place are actually causing problems now. So we need to change the constraints, which means supporting all these things, it's hard.

So that's one thing that happens right now. I would not excuse it and say you couldn't still keep things simple. I think that's a thing, but that takes a lot of rigor and at some point you're like, is it worth it? You know? Especially when primarily, uh, especially in this day and age, things are just SaaS services and we've all kind of accepted that.

But that was one, and then there was a big shift where we wanted to consume and make scalable and fast a lot of data, right? Like there's already lots of errors. Like, I don't know how many, like. I, I'm, I'm certain Sentry like sucks in hundreds of thousands of events a second on a bad day. On a slow day.

So I'm sure it's a huge amount of data that we get these days. and that takes a certain scale, right? And it's actually very hard to handle small scale and large scale at some point. And obviously we're optimizing for the SaaS. So at some point we're like, okay, let's a, we need a better solution for storing the events, something that we'll be able to like, write them in, search them very fast, take a lot of the load off the database, and so we adopt a click house.

And I think that's kind of when, that's like the, whatever the saying is, like straw that broke the camel's back or something. I don't, I don't know the saying, but, um, where everything started to get more complex and that's when all of a sudden it's like, you need like 15 docker containers and, and this isn't even microservices.

This isn't even like the worst version of this. Hell, you know, it's just like, and I think it's, it's unfortunate that that's like what it's come to is like you have to run a lot of these complex services. And I think this is not even representative of just centuries growth. It's like the whole industry where we've.

We've made things a little bit more complicated than it needs to be in a lot of cases. and so I, I think it's just like, it was that shift, and then it's like when we wanted to build performance monitoring, we needed all these new things. And even in the product today, like if you just want to use it for air monitoring, whether you're a SaaS customer or self-hosted, there's still a lot going on, right?

Okay, maybe that's fine. Maybe that's a reality. But right now, like this year, we're gonna redo a lot of the product because it's too complicated. There's too much going on, and it, it becomes like a distraction. Like you load up the AWS console or even gcps just as bad these days where there's like 800 different services and you're like, I don't know what, like, who cares?

Like why do I need any of this? And half of 'em are the same thing. And that is the worst part is half of them are the same thing, or, or they're not achieving different goals, would be how I think about it. And Sentry is not that. But the biggest companies in our space are that like, if you use Datadog, it's the same thing.

And there's like go to the Datadog's pricing page and you're like, oh my God, there are so many different things in here. I don't even know what the difference is between some of these products. Right. And so we're, we're very conscious of this, and I think, I don't know that we can reduce the technology complexity that much if at all these days, other than maybe making it a little bit easier to not have to run everything for, for the open source side of things.

But the product burden is still there when you're like, how do I even reason about all this stuff that's going on and why does it even need to happen this way? So I, I don't know, it, it's like a sad thing to some degree, but, uh, at the same time, like we've been able to build like such a significant product that does, it works for so many different people and it, it solves so many different problems.

And it's still, you can, not as easily, but you can still can run it on your own hardware. And I'm amazed when like a single human runs it on like a VPS or something. which it for, for newer generation people is, a virtual private server. Okay. Like a Linode or a digital ocean, something cheap. Um, because it is a lot of stuff, it's a lot of services to run and a lot of them are Java and they take a lot of memories.

As the product transitioned from a self hosted protect to a SaaS product the ability to pick and choose components for self hosting went away and the number of required components went up

[00:57:27] Jeremy: What it sounds like is that that version of the software that used to be able to run with pretty minimal dependencies, I, I think, you know, you would have a relational database. I think you had read us in there, and now there's like you said, click house, there's zookeeper, there's, is there Kafka?

[00:57:46] David: There is Kafka.

[00:57:47] Jeremy: There's a, there's a lot of different pieces and from what you're telling me is, is it's kind of a function of going from this company that's like, here, here's this open source software that anybody can run themselves to. This is really a SaaS company. This is like, We need to focus on being able to run this at scale and the use case of somebody running it themselves or at their own business is just like kind of not that important anymore.

At least to, you know, that that's not the priority. And, and that's why you kind of made these decisions because they were suited to building a, a SaaS platform that can service a lot of customers.

[00:58:28] David: Yeah, that that is a huge chunk of it. I think there was also the thing, like for example, that key value store. At some point we sported Cassandra, we never used Cassandra ourself, but it was in the code base. We're not gonna maintain that. It might be broken for all we know. Yeah, there's a CI system and at some point you've gotta say like, no, that's not happening.

And that was actually a good decision. Like less variability is huge for infrastructure, right? And that doesn't require it to be more complicated. But like if we just say too bad you can't run Cassandra, run a key val, like run, I don't know, you know, big table or something that we actually use ourselves.

It's like maybe you don't want to, but it's like we're not making any money and nobody's maintaining all these like adapters for these different databases like we supported like Django does. So we sort of did. But like MSSQL, which I assume is still a thing, I'm not sure, um, no idea if it ever worked correctly, was performant or not.

There is duplication in dependencies (RabbitMQ and Kafka) but the engineering work required to remove one of them is not worth it

[00:59:20] David: You know, that, and that's the same dilemma, SQLite, right? And so I think SQLite was fine for development environments. Who cares for production? No chance does it function for us. But, um, I don't know. So I think it's sort of a necessary evil, but, but it doesn't have to be complicated. So a good example today is we still require something like Rabbit, MQ and Kafka.

Why do we need two things that are effectively brokers? We don't. It's just like, , like why would it be a priority to replace like the old system that powers all this or combine it with the new system that powers these new things? It's like, it's a lot of engineering work, right? And so there is that compromise of velocity in there that we're forever facing.

And this doesn't even begin to touch on like the newer stuff that we're gonna open source. Like I don't even think we've open sourced, uh, one of our first acquisition, which was the profiling stuff. I think that might still be closed source. We're gonna open source that, uh, we acquired code cov recently.

We're gonna open source that. Um, and so those are more moving parts that even add more complexity to the product. And so one of the, and to be fair, this complexity, uh, cascades in the development environment and experience and stuff. So, so it's worthwhile to fix. but as an example, you should be able to say, I'm just working on maybe the errors stuff or the issues product, so I don't need to run all this other stuff.

And that same decision can also say, oh, the customer only wants to like use this for the errors product so they don't need to run all this other stuff. Right? You like, you can solve both of those goals and I think it just requires. Intent behind it doesn't necessarily mean it will make it that much simpler, but it is a path towards enabling that simplicity.

[01:00:51] Jeremy: it's also probably aligning with how the business is structured, right? Like, I don't think you offer a self-hosted, commercially supported plan, right? So in that regard there, there isn't really. The need from the company side to, to say like, Hey, let's, let's make these cases where you can deploy it and only install two of the dependencies because you only need that much.

it, it's like very different from, a, a system or a product that was not really a SaaS or, or it could be a SaaS but but also had a on-premise, option. Like you could take I, I suppose Atlassian's Suite as an example where they have their cloud service, but there's also a lot of businesses that run it onsite.

So there's probably different decisions that, that they have to make.

Made an intentional decision to not build on-premise software so they could focus on product instead of support

[01:01:44] David: Yeah. Yeah, a hundred percent. And I'm actually curious, like, you know, the model from a technology angle is probably similar to GitLab and I, I, I'm not a GitLab user. but I wonder if, if folks like that have the same complexity where they're like, here's our 15 different products and it's sort of open sourcey, you know, and we, ours at least was intentional.

We're like, we're not building on-prem software. We don't wanna be in that business. We don't want these support contracts for Cassandra and random shit. We basically don't want to have to have an outsourced ops team, you know, for this work. It's just not interesting. It's not building a product at the end of the day.

And so I think it was like, that was probably one of the best decisions we ever made was like, yes, it's open source. We're building an open source service, like a SaaS service at the end of the day. And that's the reality of things. Right. And that was such a big deal for us, because yeah, like I, I, it's like it's hard to keep the complexity in check no matter what if, if you get to any reasonable size.

Right? And like big ambitions are one of the most important things you can have, in my opinion, for building anything. Even if it's not like, oh, we wanna build a big company. You want to be able to build something that solves like significant scope of problems, right? And there's more value for Sentry as a whole if we can solve more problems under sort of the same kind of landscape, right?

If nothing else, because whether it's true or not, the the reality is like, Nobody wants to run 20 different tools or whatever it is, right? Like usually what happens is you have to, because like one tool cannot rule them all, but if the problem is the same shape, the technology should be able to be reused to solve that problem.

And that, that's where we go back to like, well, we have errors and we have n plus one issues, and they're kind of the same. So how do we make the product just surface both of those use cases, right? Same technology, same product for the most part. and then inherently here comes the complexity. So,

Choosing what to open source and choosing the Business Source License for the main application to prevent competitors from reselling their service while allowing people to learn from the source code

[01:03:26] Jeremy: Uh, you, you've mentioned a. A little bit about how there are parts of Sentry that are open source, and I know that there's different licenses applied to different parts of the product, so I wonder if you could explain sort of how those decisions are made and, and maybe you could explain the, the business source license as well.

[01:03:49] David: Yeah. So once upon a time Sentry was BSD licensed, uh, and that was the server itself. And Sentry's got a lot of open source stuff or a lot of projects and services and stuff. Um, but the core of the server was like super liberally licensed. Everything else was fairly liberally licensed. Like we never used GPLs uh, we never used proprietary licenses.

We had some closed source stuff, but it's irrelevant. It was like our data analytics and billing code and stuff like that. But, so as all this, it was all just like, like liberal free. You could use it. No strings, no open core, no nothing. Uh, and then we had constant annoying conversations with people trying to sell our software.

And to be clear, Sentry has always been built by myself and people that we've employed at the company. Sometimes people contribute small patches, but that's wildly different than maintaining or actually developing the software. And so it's best to think of Sentry as like, yes, it's open source, but it was built by our company.

the SDKs and integrations, that's different, but the core service, right? And so we had this thing where people, like companies were trying to sell it, they didn't wanna give back financially or they didn't contribute both basically. And so we're like, this is annoying and this is literally the decision tree.

This is annoying. Let's stop them from doing that so we can no longer think about this problem. And so at that point we said we're gonna change how we do licensing at Sentry for open source. If it's part of the service, the core service, it becomes bsl, which I'll explain in a minute. If it's anything else that doesn't run the core service like an SDK or anything that needs embedded in customer applications, it must use a liberal license.

And so, , we did this because I did not want an open core model. An open core, if you're not familiar, usually what it translates to is here's the shitty version of the open source product that you can use for free, no strings attached, and here's the good version. That costs minimum 50 to a hundred thousand dollars a year.

Something along those lines. It becomes a obnoxious selling pair or paradigm. I hate that model so much as a, as a consumer that I refuse to build it. And if it was not for companies trying to sell our stuff, we would still be BSD or Apache license for the server, right? Unfortunately, in humanity you can't take on good faith and definitely in like capitalism, you cannot operate on good faith, so, so we relicensed the BSL and bsl, the way I think about it is eventually open source.

And so open source has a lot of different angles. You can think of it as open source, like free software that I can use just for free or open source is like, I can take that software, do whatever I want with it, which is the most extreme version or open source. I can use the code in other ways or something like that, right?

There's all these kind of variations of the thing, but I think the most are like. I can do whatever I want with it, or it's free. And I actually think a lot of people mostly care about open source from, its the, its free angle. And so bsl, what it lets us do is say it's free. You can't sell it. So we, we blocked off people from like, sort of cannibalizing our ability to fund the development.

and in three years, which is the time horizon we picked, you can do, um, I think it the, the lowest you can do with BSLs four years or the high longest duration, you can do something like every year. It's like your personal choice. after three years it becomes Apache licensed and a lot of open source advocates who will be like, well, like three years is a lifetime.

I'm like, that's cool. We're not here to let other people build businesses at a Sentry. So I could, I could care less about those arguments, right? I want people to be able to run it self-hosted because like I want everybody to be able to use our. I don't need people to be able to sell it. I don't need people to be able to take it and do whatever they want with it.

It's, it's irrelevant like right to the world. But after three years, all that knowledge that we've gained in that prior art becomes public domain. Right? And so we still achieve almost like this knowledge share. Um, and you can, it's still source. You can view the source, you can like be inspired by it. And it's, it's software.

It's like, it's only so protected at the end of the day. Um, and so that's what it is for us, right? So it allows us to keep it open source. So like what is on Sentry io again, other than some proprietary stuff that's like billing code is literally like that mainline branch is what we're shipping to production at the end of the day.

And I mean, that's cool like that, that's like a lot of the ways in the spirit of open source, but it, it doesn't pray for humanity to be like nice humans all the time. Right? Like it protects the business and the development and the, you know, tens to hundreds of millions of dollars we poured into like r and d over the years, right?

[01:07:55] Jeremy: it, it sounds like the decision you made is probably very similar to other vendors like Mongo and, elastic trying to think of some other examples. I guess Cockroach is

[01:08:08] David: Yeah. Cockroach is another good one. It is very similar. The difference is like, there's very few companies that operate like Sentry in the world that are like a SaaS service that happens to be open source. Most of those are like infrastructure that you probably wanna run yourself. And most of those, they're billing.

Uh, their, the way they make money is actually still you running that software yourself, right? And so they may not be open core, some of them are, but they're still a lot of, it's fundamentally not a cloud service that they're selling. Like they will try. But like, you know, if, if you're looking at like Elastic, most people use it because they can run it on prem.

Otherwise, I'm just gonna use whatever the Google checkboxy button click thing is at the end of the day. but there are a lot of people, even the BSL model, there's a lot of people use it. Like Cockroach is one of the first that I recall that used it and we were inspired and, and learned about it from.

I could not tell you who else uses it, but there, it, it's become a more popular thing. And whether it's the right solution or not, there needs to be something that is mainstream that people use that achieves this, that basically is the bsl. And I hopefully it's a BSL or some knockoff. Because the problem is if you have 15 different flavors of this license, legal review is not fun.

And if legal review is not fun, it gets blocked in companies. Right? And so Mongo is its own license. Elastic I think is its own license. They're both like proprietary things that are one-offs that have to go through a legal review. BSL is a known quantity with a clause. So all legal has to do is like review that clause and say, does this clause, is that safe enough for us or not?

Right? So it's a very like simple decision at that point. And so, I don't know, hopefully the industry figures that out instead of constantly bickering about what is and isn't open source. Cuz that would be a much better spend of people's time these days.

Early on, pick a permissive license and worry about the licensing later. If you're successful then you can change it later if needed.

[01:09:44] Jeremy: if you're building a new product, I mean, when you look back at, at Sentry, started with the BSD license, which is a very permissive license. what's the, things you would look at to decide that you're gonna start this new product with a permission permissive license versus go straight to the the BSL license.

[01:10:04] David: Yeah, I honestly, I would just not get distracted by that choice early on in a project. I think, I think it depends on what you're building, in all honesty. Like if I were building something that was designed, it was like infrastructure that was like so low friction that you could just spin it up and it worked like very well.

Amazon's gonna sell that shit. They're gonna take it. They're gonna be like, cool, we can spin this up for you too. And then they're gonna, I mean, you're gonna struggle, right? Like I think now is that actually what would happen? I don't know. But that's a risk, right? And I think this was like the kind of risk people like Cockroach were worried about was like Amazon might sell it.

Because they saw that like elastic, that happened to them. Now Elastic is done fine. Nobody should be sad for Elastic. They've made plenty of money with Amazon selling it on top, like it doesn't really matter. Right? like pity the executives there or something that doesn't really matter. But for a early stage project, it's so irrelevant.

Like if you make your license restrictive, you will get less adoption. So unless the, your open source project is only gonna be used by a hundred companies in the world, having a restrictive license is only gonna harm you. Right. And so you can do it, but I just wouldn't. And open core is the same thing.

Like I, it's just like, yeah, like I, I don't know. Like this is the glory of sass. It's like you can build an open source thing and not care that it's open source now because you can just sell a cloud service and people will buy the cloud service no matter what. Right. Like centuries model when we raised our seed funding, , it was, it was BSD license time, super free, super open.

And people were like, well, why would anybody pay for it? And I'm like, it doesn't matter if they don't pay for Sentry, they won't pay for anything in the industry because we will build the best thing and it will be so free and good that nobody will have a choice, like, but to use it. Right? And the reality is, if you are a reasonable company, you don't wanna spend a bunch of engineering hours and time and money on running this random service that you can just outsource.

Like that's true for everything in the world, right? And, and that, that was the reality for Sentry. Like I said, there was no like actual risk to the business. It was just like, frankly, it was just pissing me off that I had this one company that was constantly trying to sell our shit. And so I'm like, you know what?

This is my middle finger to you is the BSL license and I can never think about you again now. but we were already a successful business when that that came along. Like people already use the SaaS service. Not everybody, but plenty of people. And so for me it was just like, is your goal that you want lots of customers using your thing, use a permissive license?

Like you're, otherwise, you're like over optimizing for a non-com. It's the same thing with like people that they'll build like random open source stuff and they use like GPL or something. Do you really want your, you, it's almost like you wanna put your politics on another company. Like you wanna force them to contribute back, who cares?

They're either gonna contribute back or not. Like it's still open source at the end of the day. Don't like restrict people's usage if you're just building like something on the side. And so like we actually have a policy internally, we won't license anything as a GPL variant. Like it, it's just not something we think is the right approach to open source.

Right. And again, you can have an opinion on it doesn't really matter to us, but we're like, no, if it's open source, if it's actually intended to be like the free open source, it should be free without the constraints. Who cares if somebody monetizes it? Like that's the beauty of open source is like, it helps everybody, you know, it's, it's almost like a, a charity to some degree.

So, so I'd say I have a lot of strong opinions about it, but mostly it's like, don't distract yourself with things that don't matter. Cause it's like a broad statement for anything, let alone the open source licensing.

[01:13:22] Jeremy: Yeah, that, that, that makes sense. I mean, you, you don't know that your product is gonna get adoption to begin with. Right? So it's probably, you would take a path similar to Sentry where you have something permissive, you see, do people care? Is this actually a business? And then once it becomes one, then you can maybe worry about licensing.

[01:13:42] David: Exactly. Exactly. Yeah. Because you can always change the license later or you can find another thing to augment the software you've built that allows you to monetize it, which is super common. Like you see this like, uh, there's been a lot of stuff that's raised funding recently. The core is still open source, true free open source.

And then they're like, this is how we're gonna monetize development via some mechanism, like some cloud services or something. That's not true for everybody. You can't just take open source and monetize it. But like, if that's what you wanted to do, why would, why'd you build open source in the first place, right?

Like um, just save yourself a headache. Like I think Nginx I dunno how well they've done as a business. But like, what are you gonna monetize in Nginx? It already just works. Oh, you got some like cloud analytics. It's cool. Like maybe they were useful. I never used them. Um, but like, I'm not gonna buy that.

But that was just like, oh, Nginx is so successful. Could we monetize it? Versus could we build a business that actually is open source? One's like very intentional versus, you know, capitalistic, i would say, or predatory to some degree. I'm not talking shit about the Nginx folks. I love the, the product and the business, but you know, it's a much harder thing to do.

[01:14:47] Jeremy: You, you love the product, but don't need to pay for it.

[01:14:50] David: Exactly. Yeah. I don't even know. Uh, we probably still use the product, but these days that's more of a commodity than anything like the web server. So,

[01:14:59] Jeremy: So as as we wrap up, is there anything else that you thought we should have brought up?

[01:15:07] David: um, no, I, I mean, I think this was, this was fun. you know, I, I'm very much an authentic person. I, I think it's like I'm not a marketer. I like talking about things that add value to folks' lives, not just like pitching products and stuff. So I think it's good to go deeper on the, the things which is fun.

Um, I don't know, Sentry's doing a lot of cool stuff. Hopefully people like it. If you don't tell us GitHub issues, tweet at me. We take every piece of feedback very, very seriously. Um, so yeah,

[01:15:33] Jeremy: And if, uh, people wanna see what you're up to, check out Sentry, where should they head?

[01:15:39] David: uh, Sentry io we're get Sentry on Twitter. Hopefully by the time this airs, Twitter still exists. But, um, yeah, on Sentry io GitHub, we're get Sentry. Sentry somewhere. You search for us, you'll find us. We're very, very, um, active on GitHub though, so that's a good, good forum if you wanna participate that way.

[01:15:57] Jeremy: David, thank you so much for coming on Software Engineering Radio.

[01:16:01] David: Absolutely. And, and thanks again for having me.

Luca Casonato on Deno

Thu, 02 Mar 2023 05:03:28 -0000

Luca Casonato is the tech lead for Deno Deploy and a TC39 delegate.

Deno is a JavaScript runtime from the original creator of NodeJS, Ryan Dahl.

Topics covered:

What's a JavaScript runtime
How V8 is used
Why Deno was created
The W3C WinterCG for server-side JavaScript
Why it's difficult to ship new features in Node
The benefits of web standards
Creating an all-inclusive toolset like Rust and Go
Deno's node compatibility layer
Use cases for WebAssembly
Benefits and implementation of Deno Deploy
Reasons to deploy on the edge
What's coming next

Luca

Deno

Deno Users

Transcript

You can help edit this transcript on GitHub.

[00:00:07] Jeremy: Today I'm talking to Luca Casonato. He's a member of the Deno Core team and a TC 39 Delegate.

[00:00:06] Luca: Hey, thanks for having me.

What's a runtime?

[00:00:07] Jeremy: So today we're gonna talk about Deno, and on the website it says, Deno is a runtime for JavaScript and TypeScript. So I thought we could start with defining what a runtime is.

[00:00:21] Luca: Yeah, that's a great question. I think this question actually comes up a lot. It's, it's like sometimes we also define Deno as a headless browser, or I don't know, a, a JavaScript script execution tool. what actually defines runtime? I, I think what makes a runtime a runtime is that it is a, it's implemented in native code.

It cannot be self-hosted. Like you cannot self-host a JavaScript runtime. and it executes JavaScript or TypeScript or some other scripting language, without relying on, well, yeah, I guess it's the self-hosting thing. Like it's, it's essentially a, a JavaScript execution engine, which is not self-hosted.

So yeah, it, it maybe has IO bindings, but it doesn't necessarily need to like, it. Maybe it allows you to read the, from the file system or, or make network calls. Um, but it doesn't necessarily have to. It's, I think the, the primary definition is something which can execute JavaScript without already being written in JavaScript.

How V8 and JavaScript runtimes are related

[00:01:20] Jeremy: And when we hear about JavaScript run times, whether it's Deno or Node or Bun, or anything else, we also hear about it in the context of v8. Could you explain the relationship between V8 and a JavaScript run time?

[00:01:36] Luca: Yeah. So V8 and, and JavaScript core and Spider Monkey, these are all JavaScript engines. So these are the low level virtual machines that can execute or that can parse your JavaScript code. turn it into byte code, maybe turn it into, compiled machine code, and then execute that code. But these engines, Do not implement any IO functions.

They do not. They implement the JavaScript spec as is written. and then they provide extension hooks for, they call these host environments, um, like environments that embed these engines to provide custom functionalities to essentially poke out of the sandbox, out of the, out of the virtual machine. Um, and this is used in browsers.

Like browsers have, have these engines built in. This is where they originated from. Um, and then they poke holes into this, um, sandbox virtual machine to do things like, I don't know, writing to the dom or, or console logging or making fetch calls and all these kinds of things. And what a runtime essentially does, a JavaScript runtime is it takes one of these engines and.

It then provides its own set of host APIs, like essentially its own set of holes. It pokes into the sandbox. and depending on what the runtime is trying to do, um, the weight will do. This is gonna be different and, and the sort of API that is ultimately exposed to the end user is going to be different.

For example, if you compare Deno and node, like node is very loosey goosey, about how it pokes holds into the sandbox, it sort of just pokes them everywhere. And this makes it difficult to enforce things like, runtime permissions for example. Whereas Deno is much more strict about how it, um, pokes holds into its sandbox.

Like everything is either a web API or it's behind in this Deno name space, which means that it's, it's really easy to find, um, places where, where you're poking out of the sandbox. and really you can also compare these to browsers. Like browsers are also JavaScript run times. Um, they're just not headless.

JavaScript run times, but JavaScript run times that also have a ui. and. . Yeah. Like there, there's, there's a whole Bunch of different kinds of JavaScript run times, and I think we're also seeing a lot more like embedded JavaScript run times. Like for example, if you've used React Native before, you, you may be using Hermes as a, um, JavaScript engine in your Android app, which is like a custom JavaScript engine written just for, for, for React native.

Um, and this also is embedded within a, like react native run time, which is specific to React native. so it's also possible to have run times, for example, that are, that can be where the, where the back backing engine can be exchanged, which is kind of cool.

[00:04:08] Jeremy: So it sounds like V8's role, one way to look at it is it can execute JavaScript code, but only pure functions. I suppose you

[00:04:19] Luca: Pretty much. Yep.

[00:04:21] Jeremy: Do anything that doesn't interact with IO so you think about browsers, you were mentioning you need to interact with a DOM or if you're writing a server side application, you probably need to receive or make HTTP requests, that sort of thing.

And all of that is not handled by v8. That has to be handled by an external runtime.

[00:04:43] Luca: Exactly Like, like one, one. There's, there's like some exceptions to this. For example, JavaScript technically has some IO built in with, within its standard library, like math, random. It's like random number. Generation is technically an IO operation, so, Technically V8 has some IO built in, right? And like getting the current date from the user, that's also technically IO

So, like there, there's some very limited edge cases. It's, it's not that it's purely pure, but V8 for example, has a flag to turn it completely deterministic. which means that it really is completely pure. And this is not something which run times usually have. This is something like the feature of an engine because the engine is like so low level that it can essentially, there's so little IO that it's very easy to make deterministic where a runtime higher level, um, has, has io, um, much more difficult to make deterministic.

[00:05:39] Jeremy: And, and for things like when you're working with JavaScript, there's, uh, asynchronous programming

[00:05:46] Luca: mm-hmm.

Concurrent JavaScript execution

[00:05:47] Jeremy: So you have concurrency and things like that. Is that a part of V8 or is that the responsibility of the run time?

[00:05:54] Luca: That's a great question. So there's multiple parts to this. There's the part, um, there, there's JavaScript promises, um, and sort of concurrent Java or well, yes, concurrent JavaScript execution, which is sort of handled by v8, like v8. You can in, in pure v8, you can create a promise, and you can execute some code within that promise.

But without IO there's actually no way to defer time, uh, which means that in with pure v8, you can either, you can create a promise. Which executes right now. Or you can create a promise that never executes, but you can't create a promise that executes in 10 seconds because there's no way to measure 10 seconds asynchronously.

What run times do is they add something called an event loop on top of this, um, on top of the base engine and that event loop, for example, like a very simple event loop, for example, might have a timer in it, which every second looks at if there's a timer schedule to run within that second.

And if it does, if, if that timer exists, it'll go call out to V8 and say, you can now execute that promise. but V8 is still the one that's keeping track of, of like which promises exist, and the code that is meant to be invoked when they resolve all that kind of thing. Um, but the underlying infrastructure that actually invokes which promises get resolved at what point in time, like the asynchronous, asynchronous IO is what this is called.

This is driven by the event loop, um, which is implemented by around time. So Deno, for example, it uses, Tokio for its event loop. This is a, um, an event loop written in Rust. it's very popular in the Rust ecosystem. Um, node uses libuv. This is a relatively popular runtime or, or event loop, um, implementation for c uh, plus plus.

And, uh, libuv was written for Node. Tokio was not written for Deno. But um, yeah, Chrome has its own event loop implementation. Bun has its own event loop implementation.

[00:07:50] Jeremy: So we, we might go a little bit more into that later, but I think what we should probably go into now is why make Deno, because you have Node that's, uh, currently very popular. The co-creator of Deno, to my understanding, actually created Node. So maybe you could explain to our audience what was missing or what was wrong with Node, where they decided I need to create, a new runtime.

Why create a new runtime? (standards compliance)

[00:08:20] Luca: Yeah. So the, the primary point of concern here was that node was slowly diverging from browser standards with no real path to, to, to, re converging. Um, like there was nothing that was pushing node in the direction of standards compliance and there was nothing, that was like sort of forcing node to innovate.

and we really saw this because in the time between, I don't know, 2015, 2018, like Node was slowly working on esm while browsers had already shipped ESM for like three years. , um, node did not have fetch. Node hasn't had, or node only at, got fetch last year. Right? six, seven years after browsers got fetch.

Node's stream implementation is still very divergent from, from standard web streams. Node was very reliant on callbacks. It still is, um, like promises in many places of the Node API are, are an afterthought, which makes sense because Node was created in a time before promises existed. Um, but there was really nothing that was pushing Node forward, right?

Like nobody was actively investing in, in, in improving the API of Node to be more standards compliant. And so what we really needed was a new like Greenfield project, which could demonstrate that actually writing a new server side run. Is A viable, and b is totally doable with an API that is more standards combined.

Like essentially you can write a browser, like a headless browser and have that be an excellent to use JavaScript runtime, right? And then there was some things that were I on top of that, like a TypeScript support because TypeScript was incredibly, or is still incredibly popular. even more so than it was four years ago when, when Deno was created or envisioned, um, this permission system like Node really poked holes into the V8 sandbox very early on with, with like, it's gonna be very difficult for Node to ever, ever, uh, reconcile this, this.

Especially cuz the, some, some of the APIs that it, that it exposes are just so incredibly low level that like, I don't know, you can mutate random memory within your process. Um, which like if you want to have a, a secure sandbox like that just doesn't work. Um, it's not compatible. So there was really needed to be a place where you could explore this, um, direction and, and see if it worked.

And Deno was that. Deno still is that, and I think Deno has outgrown that now into something which is much more usable as, as like a production ready runtime. And many people do use it, in production. And now Deno is on the path of slowly converging back with Node, um, in from both directions. Like Node is slowly becoming more standards compliant. and depending on who you ask this was, this was done because of Deno and some people said it would had already been going on and Deno just accelerated it. but that's not really relevant because the point is that like Node is becoming more standard compliant and, and the other direction is Deno is becoming more node compliant.

Like Deno is implementing node compatibility layers that allow you to run code that was originally written for the node ecosystem in the standards compliant run time. so through those two directions, the, the run times are sort of, um, going back towards each other. I don't think they'll ever merge. but we're, we're, we're getting to a point here pretty soon, I think, where it doesn't really matter what runtime you write for, um, because you'll be able to write code written for one runtime in the other runtime relatively easily.

[00:12:03] Jeremy: If you're saying the two are becoming closer to one another, becoming closer to the web standard that runs in the browser, if you're talking to someone who's currently developing in node, what's the incentive for them to switch to Deno versus using Node and then hope that eventually they'll kind of meet in the middle.

[00:12:26] Luca: Yeah, so I think, like Deno is a lot more than just a runtime, right? Like a runtime executes JavaScript, Deno executes JavaScript, it executes type script. But Deno is so much more than that. Like Deno has a built-in format, or it has a built-in linter. It has a built-in testing framework, a built-in benching framework.

It has a built-in Bundler, it, it like can create self-hosted, um, executables. yeah, like Bundle your code and the Deno executable into a single executable that you can trip off to someone. Um, it has a dependency analyzer. It has editor integrations. it has, Yeah. Like I could go on for hours, (laughs) about all of the auxiliary tooling that's inside of Deno, that's not a JavaScript runtime.

And also Deno as a JavaScript runtime is just more standards compliant than any of the other servers at Runtimes right now. So if, if you're really looking for something which is standards complaint, which is gonna like live on forever, then it's, you know, like you cannot kill off the Fetch API ever.

The Fetch API is going to live forever because Chrome supports it. Um, and the same goes for local storage and, and like, I don't know, the Blob API and all these other web APIs like they, they have shipped and browsers, which means that they will be supported until the end of time. and yeah, maybe Node has also reached that with its api probably to some extent.

but yeah, don't underestimate the power of like 3 billion Chrome users. that would scream immediately if the Fetch API stopped working Right?

[00:13:50] Jeremy: Yeah, I, I think maybe what it sounds like also is that because you're using the API that's used in the browser places where you deploy JavaScript applications in the future, you would hope that those would all settle on using that same API so that if you were using Deno, you could host it at different places and not worry about, do I need to use a special API maybe that you would in node?

WinterCG (W3C group for server side JavaScript)

[00:14:21] Luca: Yeah, exactly. And this is actually something which we're specifically working towards. So, I don't know if you've, you've heard of WinterCG? It's a, it's a community group at the W3C that, um, CloudFlare and, and Deno and some others including Shopify, have started last year. Um, we're essentially, we're trying to standardize the concept of what a server side JavaScript runtime is and what APIs it needs to have available to be standards compliant.

Um, and essentially making this portability sort of written down somewhere and like write down exactly what code you can write and expect to be portable. And we can see like that all of the big, all of the big players that are involved in, in, um, building JavaScript run times right now are, are actively, engaged with us at WinterCG and are actively building towards this future.

So I would expect that any code that you write today, which runs. in Deno, runs in CloudFlare, workers runs on Netlify Edge functions, runs on Vercel's Edge, runtime, runs on Shopify Oxygen, is going to run on the other four. Um, of, of those within the next couple years here, like I think the APIs of these is gonna converge to be essentially the same.

there's obviously gonna always be some, some nuances. Um, like, I don't know, Chrome and Firefox and Safari don't perfectly have the same API everywhere, right? Like Chrome has some web Bluetooth capabilities that Safari doesn't, or Firefox has some, I don't know, non-standard extensions to the error object, which none of the other runtimes do.

But overall you can expect these front times to mostly be aligned. yeah, and I, I think that's, that's really, really, really excellent and that, that's I think really one of the reasons why one should really consider, like building for, for this standard runtime because it, it just guarantees that you'll be able to host this somewhere in five years time and 10 years time, with, with very little effort.

Like even if Deno goes under or CloudFlare goes under, or, I don't know, nobody decides to maintain node anymore. It'll be easy to, to run somewhere else. And also I expect that the big cloud vendors will ultimately, um, provide, manage offerings for, for the standards compliant JavaScript on time as well.

Is Node part of WinterCG?

[00:16:36] Jeremy: And this WinterCG group is Node a part of that as well?

[00:16:41] Luca: Um, yes, we've invited Node, um, to join, um, due to the complexities of how node's, internal decision making system works. Node is not officially a member of WinterCG. Um, there is some individual members of the node, um, technical steering committee, which are participating. for example, um, James m Snell is, is the co-chair, is my co-chair on, on WinterCG.

He also works at CloudFlare. He's also a node, um, TSC member, Mateo Colina, who has been, um, instrumental to getting fetch landed in Node, um, is also actively involved. So Node is involved, but because Node is node and and node's decision making process works the way it does, node is not officially listed anywhere as as a member.

but yeah, they're involved and maybe they'll be a member at some point. But, yeah, let's. , see (laughs)

[00:17:34] Jeremy: Yeah. And, and it, so it, it sounds like you're thinking that's more of a, a governance or a organizational aspect of note than it is a, a technical limitation. Is that right?

[00:17:47] Luca: Yeah. I obviously can't speak for the node technical steering committee, but I know that there's a significant chunk of the node technical steering committee that is, very favorable towards, uh, standards compliance. but parts of the Node technical steering committee are also not, they are either indifferent or are actively, I dunno if they're still actively working against this, but have actively worked against standards compliance in the past.

And because the node governance structure is very, yeah, is, is so, so open and let's, um, and let's, let's all these voices be heard, um, that just means that decision making processes within Node can take so long, like. . This is also why the fetch API took eight years to ship. Like this was not a technical problem.

and it is also not a technical problem. That Node does not have URL pattern support or, the file global or, um, that the web crypto API was not on this, on the global object until like late last year, right? Like, these are not technical problems, these are decision making problems. Um, and yeah, that was also part of the reason why we started Deno as, as like a separate thing, because like you can try to innovate node, from the inside, but innovating node from the inside is very slow, very tedious, and requires a lot of fighting.

And sometimes just showing somebody, from the outside like, look, this is the bright future you could have, makes them more inclined to do something.

Why it takes so long to ship new features in Node

[00:19:17] Jeremy: Do, do you have a sense for, you gave the example of fetch taking eight years to, to get into node. Do you, do you have a sense of what the typical objection is to, to something like that? Like I, I understand there's a lot of people involved, but why would somebody say, I, I don't want this

[00:19:35] Luca: Yeah. So for, for fetch specifically, there was a, there was many different kinds of concerns. Um, one of the, I, I can maybe list two of them. One of them was for example, that the fetch API is not a good API and as such, node should not have it. which is sort of. missing the point of, because it's a standard API, how good or bad the API is is much less relevant because if you can share the API, you can also share a wrapper that's written around the api.

Right? and then the other concern was, node does need fetch because Node already has an HTTP API. Um, so, so these are both kind of examples of, of concerns that people had for a long time, which it took a long time to either convince these people or, or to, push the change through anyway. and this is also the case for, for other things like, for example, web, crypto, um, like why do we need web crypto?

We already have node crypto, or why do we need yet another streams? Implementation node already has four different streams implementations. Like, why do we need web streams? and the, the. Like, I don't know if you know this XKCD of, there's 14 competing standards. so let's write a 15th standard, to unify them all.

And then at the end we just have 15 competing standards. Um, so I think this is also the kind of concern that people were concerned about, but I, I think what we've seen here is that this is really not a concern that one needs to have because it ends up that, or it turns out in the end that if you implement web APIs, people will use web APIs and will use web APIs only for their new code.

it takes a while, but we're seeing this with ESM versus require like new code written with require much less common than it was two years ago. And, new code now using like Xhr, whatever it's called, form request or. You know, the one, I mean, compared to using Fetch, like nobody uses that name.

Everybody uses Fetch. Um, and like in Node, if you write a little script, like you're gonna use Fetch, you're not gonna use like Nodes, htp, dot get API or whatever. and we're gonna see the same thing with Readable Stream. We're gonna see the same thing with Web Crypto. We're gonna see, see the same thing with Blob.

I think one of the big ones where, where Node is still, I, I, I don't think this is one that's ever gonna get solved, is the, the Buffer global and Node. like we have the Uint8, this Uint8 global, um, and like all the run times including browsers, um, and Buffer is like a super set of that, but it's in global scope.

So it, it's sort of this non-standard extension of unit eight array that people in node like to use and it's not compatible with anything else. Um, but because it's so easy to get at, people use it anyway. So those are, those are also kind of problems that, that we'll have to deal with eventually. And maybe that means that at some point the buffer global gets deprecated and I don't know, probably can never get removed.

But, um, yeah, these are kinds of conversations that the no TSE is going have to have internally in, I don't know, maybe five years.

Write once, have it run on any hosting platform

[00:22:37] Jeremy: Yeah, so at a high level, What's shipped in the browser, it went through the ECMAScript approval process. People got it into the browser. Once it's in the browser, probably never going away. And because of that, it's safe to build on top of that for these, these server run times because it's never going away from the browser.

And so everybody can kind of use it into the future and not worry about it. Yeah.

[00:23:05] Luca: Exactly. Yeah. And that's, and that's excluding the benefit that also if you have code that you can write once and use in both the browser and the server side around time, like that's really nice. Um, like that, that's the other benefit.

[00:23:18] Jeremy: Yeah. I think that's really powerful. And that right now, when someone's looking at running something in CloudFlare workers versus running something in the browser versus running something in. it's, I think a lot of people make the assumption it's just JavaScript, so I can use it as is. But it, it, there are at least currently, differences in what APIs are available to you.

[00:23:43] Luca: Yep. Yep.

Why bundle so many things into Deno?

[00:23:46] Jeremy: Earlier you were talking about how Deno is more than just the runtime. It has a linter, formatter, file watcher there, there's all sorts of stuff in there. And I wonder if you could talk a little bit to the, the reasoning behind that

[00:24:00] Luca: Mm-hmm.

[00:24:01] Jeremy: Having them all be separate things.

[00:24:04] Luca: Yeah, so the, the reasoning here is essentially if you look at other modern run time or mo other modern languages, like Rust is a great example. Go is a great example. Even though Go was designed around the same time as Node, it has a lot of these same tools built in. And what it really shows is that if the ecosystem converges, like is essentially forced to converge on a single set of built-in tooling, a that built-in tooling becomes really, really excellent because everybody's using it.

And also, it means that if you open any project written by any go developer, any, any rest developer, and you look at the tests, you immediately understand how the test framework works and you immediately understand how the assertions work. Um, and you immediately understand how the build system works and you immediately understand how the dependency imports work.

And you immediately understand like, I wanna run this project and I wanna restart it when my file changes. Like, you immediately know how to do that because it's the same everywhere. Um, and this kind of feeling of having to learn one tool and then being able to use all of the projects, like being able to con contribute to open source when you're moving jobs, whatever, like between personal projects that you haven't touched in two years, you know, like being able to learn this once and then use it everywhere is such an incredibly powerful tool.

Like, people don't appreciate this until they've used a runtime or, or, or language which provides this to them. Like, you can go to any go developer and ask them if they would like. There, there's this, there's this saying in the Go ecosystem, um, that Go FMT is nobody's favorite, but, or, uh, wait, no, I don't remember what the, how the saying goes, but the saying essentially implies that the way that go FMT formats code, maybe not everybody likes, but everybody loves go F M T anyway, because it just makes everything look the same.

And like, you can read your friend's code, your, your colleagues code, your new jobs code, the same way that you did your code from two years ago. And that's such an incredibly powerful feeling. especially if it's like well integrated into your IDE you clone a repository, open that repository, and like your testing panel on the left hand side just populates with all the tests, and you can click on them and run them.

And if an assertion fails, it's like the standard output format that you're already familiar with. And it's, it's, it's a really great feeling. and if you don't believe me, just go try it out and, and then you will believe me, (laughs)

[00:26:25] Jeremy: Yeah. No, I, I'm totally with you. I, I think it's interesting because with JavaScript in particular, it feels like the default in the community is the opposite, right? There's so many different ways. Uh, there are so many different build tools and testing frameworks and, formatters, and it's very different than, like you were mentioning, a go or a Rust that are more recent languages where they just include that, all Bundled in. Yeah.

[00:26:57] Luca: Yeah, and I, I think you can see this as well in, in the time that average JavaScript developer spends configuring their tooling compared to a rest developer. Like if I write Rust, I write Rust, like all day, every day. and I spend maybe two, 3% of my time configuring Rust tooling like. Doing dependency imports, opening a new project, creating a format or config file, I don't know, deleting the build directory, stuff like that.

Like that's, that's essentially what it means for me to configure my rest tooling. Whereas if you compare this to like a front-end JavaScript project, like you have to deal with making sure that your React version is compatible with your React on version, it's compatible with your next version is compatible with your ve version is compatible with your whatever version, right?

this, this is all not automatic. Making sure that you use the right, like as, as a front end developer, you developer. You don't have just NPM installed, no. You have NPM installed, you have yarn installed, you have PNPM installed. You probably have like, Bun installed. And, and, and I don't know to use any of these, you need to have corepack enabled in Node and like you need to have all of their global bin directories symlinked into your or, or, or, uh, included in your path.

And then if you install something and you wanna update it, you don't know, did I install it with yarn? Did I install it with N pNPM? Like this is, uh, significant complexity and you, you tend to spend a lot of time dealing with dependencies and dealing with package management and dealing with like tooling configuration, setting up esent, setting up prettier.

and I, I think that like, especially Prettier, for example, really showed, was, was one of the first things in the JavaScript ecosystem, which was like, no, we're not gonna give you a config where you, that you can spend like six hours configuring, it's gonna be like seven options and here you go. And everybody used it because, Nobody likes configuring things.

It turns out, um, and even though there's always the people that say, oh, well, I won't use your tool unless, like, we, we get this all the time. Like, I'm not gonna use Deno FMT because I can't, I don't know, remove the semicolons or, or use single quotes or change my tab width to 16. Right? Like, wait until all of your coworkers are gonna scream at you because you set the tab width to 16 and then see what they change it to.

And then you'll see that it's actually the exact default that, everybody uses. So it'll, it'll take a couple more years. But I think we're also gonna get there, uh, like Node is starting to implement a, a test runner. and I, I think over time we're also gonna converge on, on, on, on like some standard build tools.

Like I think ve, for example, is a great example of this, like, Doing a front end project nowadays. Um, like building new front end tooling that's not built on Vite Yeah. Don't like, Vite's it's become the standard and I think we're gonna see that in a lot more places.

We should settle on what tools to use

[00:29:52] Jeremy: Yeah, though I, I think it's, it's tricky, right? Because you have so many people with their existing projects. You have people who are starting new projects and they're just searching the internet for what they should use. So you're, you're gonna have people on web pack, you're gonna have people on Vite, I guess now there's gonna be Turbo pack, I think is another one that's

[00:30:15] Luca: Mm-hmm.

[00:30:16] Jeremy: There's, there's, there's all these different choices, right? And I, I think it's, it's hard to, to really settle on one, I guess,

[00:30:26] Luca: Yeah,

[00:30:27] Jeremy: uh, yeah.

[00:30:27] Luca: like I, I, I think this is, this is in my personal opinion also failure of the Node Technical Steering committee, for the longest time to not decide that yes, we're going to bless this as the standard format for Node, and this is the standard package manager for Node. And they did, they sort of did, like, they, for example, node Blessed NPM as the standard, package manager for N for for node.

But it didn't innovate on npm. Like no, the tech nodes, tech technical steering committee did not force NPM to innovate NPMs, a private company ultimately bought by GitHub and they had full control over how the NPM cli, um, evolved and nobody forced NPM to, to make sure that package install times are six times faster than they were.

Three years ago, like nobody did that. so it didn't happen. And I think this is, this is really a failure of, of the, the, the, yeah, the no technical steering committee and also the wider JavaScript ecosystem of not being persistent enough with, with like focus on performance, focus on user experience, and, and focus on simplicity.

Like things got so out of hand and I'm happy we're going in the right direction now, but, yeah, it was terrible for some time. (laughs)

Node compatibility layer

[00:31:41] Jeremy: I wanna talk a little bit about how we've been talking about Deno in the context of you just using Deno using its own standard library, but just recently last year you added a compatibility shim where people are able to use node libraries in Deno.

[00:32:01] Luca: Mm-hmm.

[00:32:01] Jeremy: And I wonder if you could talk to, like earlier you had mentioned that Deno has, a different permissions model.

on the website it mentions that Deno's HTTP server is two times faster than node in a Hello World example. And I'm wondering what kind of benefits people will still get from Deno if they choose to use packages from Node.

[00:32:27] Luca: Yeah, it's a great question. Um, so I think a, again, this is sort of a like, so just to clarify what we actually implemented, like what we have is we have support for you to import NPM packages. Um, so you can import any NPM package from NPM, from your type script or JavaScript ECMAScript module, um, that you have, you already have for your Deno code.

Um, and we will under the hood, make sure that is installed somewhere in some directory globally. Like PNPM does. There's no local node modules folder you have to deal with. There's no package of Jason you have to deal with. Um, and there's no, uh, package. Jason, like versioning things you need to deal with.

Like what you do is you do import cowsay from NPM colon cowsay at one, and that will import cowsay with like the semver tag one. Um, and it'll like do the sim resolution the same way node does, or the same way NPM does rather. And what you get from that is that essentially it gives you like this backdoor to a callout to all of the existing node code that Isri been written, right?

Like you cannot expect that Deno developers, write like, I don't know. There was this time when Deno did not really have that many, third party modules yet. It was very early on, and I don't know the, you either, if you wanted to connect to Postgres and there was no Postgres driver available, then the solution was to write your own Postgres driver.

And that is obviously not great. Um, (laughs) . So the better solution here is to let users for these packages where there's no Deno native or, or, or web native or standard native, um, package for this yet that is importable with url. Um, specifiers, you can import this from npm. Uh, so it's sort of this like backdoor into the existing NPM ecosystem.

And we explicitly, for example, don't allow you to, create a package.json file or, import bare node specifiers because we don't, we, we want to stay standards compliant here. Um, but to make this work effectively, we need to give you this little back door. Um, and inside of this back door. All hell is like, or like everything is terrible inside there, right?

Like inside there you can do bare specifiers and inside there you can like, uh, there's package.json and there's crazy node resolution and underscore underscore DIRNAME and common js. And like all of that stuff is supported inside of this backdoor to make all the NPM packages work. But on the outside it's exposed as this nice, ESM only, NPM specifiers.

and the, the reason you would want to use this over, like just using node directly is because again, like you wanna use TypeScript, no config, like necessary. You want to use, you wanna have a formatter you wanna have a linter, you wanna have tooling that like does testing and benchmarking and compiling or whatever.

All of that's built in. You wanna run this on the edge, like close to your users and like 30 different, 35 different, uh, points of presence. Um, it's like, Okay, push it to your git repository. Go to this website, click a button two times, and it's running in 35 data centers. like this is, this is the kind of ex like developer experience that you can, you do not get.

You, I will argue that you cannot get with Node right now. Like even if you're using something like ts-node, it is not possible to get the same level of developer experience that you do with Deno. And the, the, the same like speed at which you can iterate, iterate on your projects, like create new projects, iterate on them is like incredibly fast in Deno.

Like, I can open a, a, a folder on my computer, create a single file, may not ts, put some code in there and then call Deno Run may not. And that's it. Like I don't, I did not need to do NPM install I did not need to do NPM init -y and remove the license and version fields and from, from the generated package.json and like set private to true and whatever else, right?

It just all works out of the box. And I think that's, that's what a lot of people come to deno for and, and then ultimately stay for. And also, yeah, standards compliance. So, um, things you build in Deno now are gonna work in five, 10 years, with no hassle.

Node shims and testing

[00:36:39] Jeremy: And so with this compatibility layer or this, this shim, is it where the node code is calling out to node APIs and you're replacing those with Deno compatible equivalents?

[00:36:54] Luca: Yeah, exactly. Like for example, we have a shim in place that shims out the node crypto API on top of the web crypto api. Like sort of, some, some people may be familiar with this in the form of, um, Browserify shims. if anybody still remembers those, it's essentially. , your front end tooling, you were able to import from like node crypto in your front end projects and then behind the scenes your web packs or your browser replies or whatever would take that import from node crypto and would replace it with like the shim that was essentially exposed the same APIs node crypto, but under the hood, wasn't implemented with native calls, but was implemented on top of web crypto, or implemented in user land even.

And Deno does something similar. there's a couple edge cases of APIs that there's, where, where we do not expose the underlying thing that we shim to, to end users, outside of the node shim. So like there's some, some APIs that I don't know if I have a good example, like node nextTick for example.

Um, like to properly be able to shim node nextTick, you need to like implement this within the event loop in the runtime. and. , you don't need this in Deno, because Deno, you use the web standard queueMicrotask to, to do this kind of thing. but to be able to shim it correctly and run node applications correctly, we need to have this sort of like backdoor into some ugly APIs, um, which, which natively integrate in the runtime, but, yeah, like allow, allow this node code to run.

[00:38:21] Jeremy: A, anytime you're replacing a component with a, a shim, I think there's concerns about additional bugs or changes in behavior that can be introduced. Is that something that you're seeing and, and how are you accounting for that?

[00:38:38] Luca: Yeah, that's, that's an excellent question. So this is actually a, a great concern that we have all the time. And it's not just even introducing bugs, sometimes it's removing bugs. Like sometimes there's bugs in the node standard library which are there, and people are relying on these bugs to be there for the applications to function correctly.

And we've seen this a lot, and then we implement this and we implement from scratch and we don't make that same bug. And then the test fails or then the application fails. So what we do is, um, we actually run node's test suite against Deno's Shim layer. So Node has a very extensive test suite for its own standard library, and we can run this suite against, against our shims to find things like this.

And there's still edge cases, obviously, which node, like there was, maybe there's a bug which node was not even aware of existing. Um, where maybe this, like it's is, it's now standard, it's now like intended behavior because somebody relies on it, right? Like the second somebody relies on, on some non-standard or some buggy behavior, it becomes intended.

Um, but maybe there was no test that explicitly tests for this behavior. Um, so in that case we'll add our own tests to, to ensure that. But overall we can already catch a lot of these by just testing, against, against node's tests. And then the other thing is we run a lot of real code, like we'll try run Prisma and we'll try run Vite and we'll try run NextJS and we'll try run like, I don't know, a bunch of other things that people throw at us and, check that they work and they work and there's no bugs. Then we did our job well and our shims are implemented correctly. Um, and then there's obviously always the edge cases where somebody did something absolutely crazy that nobody thought possible. and then they'll open an issue on the Deno repo and we scratch our heads for three days and then we'll fix it.

And then in the next release there'll be a new bug that we added to make the compatibility with node better. so yeah, but I, yeah. Running tests is the, is the main thing running nodes test.

Performance should be equal or better

[00:40:32] Jeremy: Are there performance implications? If someone is running an Express App or an NextJS app in Deno, will they get any benefits from the Deno runtime and performance?

[00:40:45] Luca: Yeah. It's actually, there is performance implications and they're usually. The opposite of what people think they are. Like, usually when you think of performance implications, it's always a negative thing, right? It's always okay. Like you, it's like a compromise. like the shim layer must be slower than the real node, right?

It's not like we can run express faster than node can run, express. and obviously not everything is faster in Deno than it is in node, and not everything is faster in node than it is in Deno. It's dependent on the api, dependent on, on what each team decided to optimize. Um, and this also extends to other run times.

Like you can always cherry pick results, like, I don't know, um, to, to make your runtime look faster in certain benchmarks. but overall, what really matters is that you do not like, the first important step for for good node compatibility is to make sure that if somebody runs your code or runs their node code in Deno or your other run type or whatever, It performs at least the same.

and then anything on top of that great cherry on top. Perfect. but make sure the baselines is at least the same. And I think, yeah, we have very few APIs where we behave, where we, where, where like there's a significant performance degradation in Deno compared to Node. Um, and like we're actively working on these things.

like Deno is not a, a, a project that's done, right? Like we have, I think at this point, like 15 or 16 or 17 engineers working on Deno, spanning across all of our different projects. And like, we have a whole team that's dedicated to performance, um, and a whole team that's dedicated node compatibility.

so like these things get addressed and, and we make patch releases every week and a minor release every four weeks. so yeah, it's, it's not a standstill. It's, uh, constantly improving.

What should go into the standard library?

[00:42:27] Jeremy: Uh, something that kind of makes Deno stand out as it's standard library. There's a lot more in there than there is in in the node one.

[00:42:38] Luca: Mm-hmm.

[00:42:39] Jeremy: Uh, I wonder if you could speak to how you make decisions on what should go into it.

[00:42:46] Luca: Yeah, so early on it was easier. Early on, the, the decision making process was essentially, is this something that a top 100 or top 1000 NPM library implements? And if it is, let's include it. and the decision making is still short of based on that. But right now we've already implemented most of the low hanging fruit.

So things that we implement now are, have, have discussion around them whether we should implement them. And we have a process where, well we have a whole team of engineers on our side and we also have community members that, that will review prs and, and, and make comments. Open issues and, and review those issues, to sort of discuss the pros and cons of adding any certain new api.

And sometimes it's also that somebody opens an issue that's like, I want, for example, I want an API to, to concatenate two unit data arrays together, which is something you can really easily do node with buffer dot con cat, like the scary buffer thing. and there's no standards way of doing that right now.

So we have to have a little utility function that does that. But in parallel, we're thinking about, okay, how do we propose, an addition to the web standards now that makes it easy to concatenate iterates in the web standards, right? yeah, there's a lot to it. Um, but it's, it's really, um, it's all open, like all of our, all of our discussions for, for, additions to the standard library and things like that.

It's all, all, uh, public on GitHub and the GitHub issues and GitHub discussions and GitHub prs. Um, so yeah, that's, that's where we do that.

[00:44:18] Jeremy: Yeah, cuz to give an example, I was a little surprised to see that there is support for markdown front matter built into the standard library. But when you describe it as we look at the top a hundred thousand packages, are people looking at markdown? Are they looking at front matter? I, I'm sure there's a fair amount that are so that that makes sense.

[00:44:41] Luca: Yeah, like it sometimes, like that one specifically was driven by, like, our team was just building a lot of like little blog pages and things like that. And every time it was either you roll your own front matter part or you look for one, which has like a subtle bug here and the other one has a subtle bug there and really not satisfactory with any of them.

So, we, we roll that into the standard library. We add good test coverage for it good, add good documentation for it, and then it's like just a resource that people can rely on. Um, and you don't, you then don't have to make the choice of like, do I use this library to do my front meta parsing or the other library?

No, you just use the one that's in the standard library. It's, it's also part of this like user experience thing, right? Like it's just a much nicer user experience, not having to make a choice, about stuff like that. Like completely inconsequential stuff. Like which library do we use to do front matter parsing? (laughs)

[00:45:32] Jeremy: yeah. I mean, I think when, when that stuff is not there, then I think the temptation is to go, okay, let me see what node modules there are that will let me parse the front matter. Right. And then it, it sounds like probably ideally you want people to lean more on what's either in the standard library or what's native to the Deno ecosystem.

Yeah.

[00:46:00] Luca: Yeah. Like the, the, one of the big benefits is that the Deno Standard Library is implemented on top of web standards, right? Like it's, it's implemented on top of these standard APIs. so for example, there's node front matter libraries which do not run in the browser because the browser does not have the buffer global.

maybe it's a nice library to do front matter pricing with, but. , you choose it and then three days later you decide that actually this code also needs to run in the browser, and then you need to go switch your front matter library. Um, so, so those are also kind of reasons why we may include something in Strand Library, like maybe there's even really good module already to do something.

Um, but if there's certain reliance on specific node features that, um, we would like that library to also be compatible with, with, with web standards, we'll, uh, we might include in the standard library, like for example, YAML Parser, um, or the YAML Parser in the standard library is, is a fork of, uh, of the node YAML module.

and it's, it's essentially that, but cleaned up and, and made to use more standard APIs rather than, um, node built-ins.

[00:47:00] Jeremy: Yeah, it kind of reminds me a little bit of when you're writing a front end application, sometimes you'll use node packages to do certain things and they won't work unless you have a compatibility shim where the browser can make use of certain node APIs. And if you use the APIs that are built into the browser already, then you won't, you won't need to deal with that sort of thing.

[00:47:26] Luca: Yeah. Also like less Bundled size, right? Like if you don't have to shim that, that's less, less code you have to ship to the client.

WebAssembly use cases

[00:47:33] Jeremy: Another thing I've seen with Deno is it supports running web assembly.

[00:47:40] Luca: Mm-hmm.

[00:47:40] Jeremy: So you can export functions and call them from type script. I was curious if you've seen practical uses of this in production within the context of Deno.

[00:47:53] Luca: Yeah. there's actually a Bunch of, of really practical use cases, so probably the most executed bit of web assembly inside of Deno right now is actually yes, build like, yes, build has a web assembly, build like yeses. Build is something that's written and go. You have the choice of either running. Um, natively in machine code as, as like an ELF process on, on Linux or on on Windows or whatever.

Or you can use the web assembly build and then it runs in web assembly. And the web assembly build is maybe 50% slower than the, uh, native build, but that is still significantly faster than roll up or, or, or, or I don't know, whatever else people use nowadays to do JavaScript Bun, I don't know. I, I just use es build always, um,

So, um, for example, the Deno website, is running on Deno Deploy. And Deno Deploy does not allow you to run Subprocesses because it's, it's like this edge run time, which, uh, has certain security permissions that it's, that are not granted, one of them being sub-processes. So it needs to execute ES build. And the way it executes es build is by running them inside a web assembly.

Um, because web assembly is secure, web assembly is, is something which is part of the JavaScript sandbox. It's inside the JavaScript sandbox. It doesn't poke any holes out. Um, so it's, it's able to run within, within like very strict security context. . Um, and then other examples are, I don't know, you want to have a HTML sanitizer, which is actually built on the real HTML par in a browser.

we, we have an hdml sanitizer called com or, uh, ammonia, I don't remember. There's, there's an HTML sanitizer library on denoland slash x, which is built on the html parser from Firefox. Uh, which like ensures essentially that your html, like if you do HTML sanitization, you need to make sure your HTML par is correct, because if it's not, you might like, your browser might parse some HTML one way and your sanitizer pauses it another way and then it doesn't sanitize everything correctly.

Um, so there's this like the Firefox HTML parser compiled to web assembly. Um, you can use that to. HTML sanitization, or the Deno documentation generation tool, for example. Uh, Deno Doc, there's a web assembly built for it that allows you to programmatically, like generate documentation for, for your type script modules.

Um, yeah, and, and also like, you know, deno fmt is available as a WebAssembly module for programmatic access and a Bunch of other internal Deno, programs as well. Like, or, uh, like components, not programs.

[00:50:20] Jeremy: What are some of the current limitations of web assembly and Deno for, for example, from web assembly, can I make HTTP requests? Can I read files? That sort of thing.

[00:50:34] Luca: Mm-hmm. . Yeah. So web assembly, like when you spawn as web assembly, um, they're called instances, WebAssembly instances. It runs inside of the same vm, like the same, V8 isolate is what they're called, but. it does not have it, it's like a completely fresh sandbox, sort of, in the sense that I told you that between a runtime and like an engine essentially implements no IO calls, right?

And a runtime does, like a runtime, pokes holds into the, the, the engine. web assembly by default works the same way that there is no holes poked into its sandbox. So you have to explicitly poke some holes. Uh, if you want to do HTTP calls, for example, when, when you create web assembly instance, it gives you, or you can give it something called imports, uh, which are essentially JavaScript function bindings, which you can call from within the web assembly.

And you can use those function bindings to do anything you can from JavaScript. You just have to pass them through explicitly. and. . Yeah. Depending on how you write your web assembly, like if you write it in Rust, for example, the tooling is very nice and you can just call some JavaScript code from your Rust, and then the build system will automatically make sure that the right function bindings are passed through with the right names.

And like, you don't have to deal with anything. and if you're writing go, it's slightly more complicated. And if you're writing like raw web assembly, like, like the web assembly, text format and compiling that to a binary, then like you have to do everything yourself. Right? It's, it's sort of the difference between writing C and writing JavaScript.

Like, yeah. What level of abstraction do you want? It's definitely possible though, and that's for limitations. it, the same limitations as, as existing browsers apply. like the web assembly support in Deno is equivalent to the web assembly support in Chrome. so you can do, uh, many things like multi-threading and, and stuff like that already.

but especially around, shared mutable memory, um, and having access to that memory from JavaScript. That's something which is a real difficulty with web assembly right now. yeah, growing web assembly memory is also rather difficult right now. There's, there's a, there's a couple inherent limitations right now with web assembly itself.

Um, but those, those will be worked out over time. And, and Deno is like very up to date with the version of, of the standard, it, it implements, um, through v8. Like we're, we're, we're up to date with Chrome Beta essentially all the time. So, um, yeah. Any, anything you see in, in, in Chrome beta is gonna be in Deno already.

Deno Deploy

[00:52:58] Jeremy: So you talked a little bit about this before, the Deno team, they have their own, hosting. Platform called Deno Deploy. So I wonder if you could explain what that is.

[00:53:12] Luca: Yeah, so Deno has this really nice, this really nice concept of permissions which allow you to, sorry, I'm gonna start somewhere slightly, slightly unrelated. Maybe it sounds like it's unrelated, but you'll see in a second. It's not unrelated. Um, Deno has this really nice permission system which allows you to sandbox Deno programs to only allow them to do certain operations.

For example, in Deno, by default, if you try to open a file, it'll air out and say you don't have read permissions to read this file. And then what you do is you specify dash, dash allow read um, maybe you have to give it. they can either specify, allow, read, and then it'll grant to read access to the entire file system.

Or you can explicitly specify files or folders or, any number of things. Same goes for right permissions, same goes for network permissions. Um, same goes for running subprocesses, all these kind of things. And by limiting your permissions just a little bit. Like, for example, by just disabling sub-processes and foreign function interface, but allowing everything else, allowing reeds and allowing network access and all that kind of stuff.

we can run Deno programs in a way that is significantly more cost effective to you as the end user than, and, and like we can cold start them much faster than, like you may be able to with a, with a more conventional container based, uh, system. So what, what do you, what Deno Deploy is, is a way to run JavaScript or Deno Code, on our data centers all across the world with very little latency.

like you can write some JavaScript code which execute, which serves HTTP requests deploy that to our platform, and then we'll make sure to spin that code up all across the world and have your users be able to access it through some URL or, or, or some, um, custom domain or something like that. and this is some, this is very similar to CloudFlare workers, for example.

Um, and it's like Netlify Edge functions is built on top of Deno Deploy. Like Netlify Edge functions is implemented on top of Deno Deploy, um, through our sub hosting product. yeah, essentially Deno Deploy is, is, um, yeah, a cloud hosting service for JavaScript, um, which allows you to execute arbitrary JavaScript.

and there there's a couple, like different directions we're going there. One is like more end user focused, where like you link your GitHub repository and. Like, we'll, we'll have a nice experience like you do with Netlify and Versace, that word like your commits automatically get deployed and you get preview deployments and all that kind of thing.

for your backend code though, rather than for your front end websites. Although you could also write front-end websites and you know, obviously, and the other direction is more like business focused. Like you're writing a SaaS application and you want to allow the user to customize, the check like you're writing a SaaS application that provides users with the ability to write their own online store.

Um, and you want to give them some ability to customize the checkout experience in some way. So you give them a little like text editor that they can type some JavaScript into. And then when, when your SaaS application needs to hit this code path, it sends a request to us with the code, we'll execute that code for you in a secure way.

In a secure sandbox. You can like tell us you, this code only has access to like my API server and no other networks to like prevent data exfiltration, for example. and then you do, you can have all this like super customizable, code in inside of your, your SaaS application without having to deal with any of the operational complexities of scaling arbitrary code execution, or even just doing arbitrary code execution, right?

Like it's, this is a very difficult problem and give it to someone else and we deal with it and you just get the benefits. yeah, that's Deno Deploy, and it's built by the same team that builds the Deno cli. So, um, all the, all of your favorite, like Deno cli, or, or Deno APIs are available in there.

It's just as web standard is Deno, like you have fetch available, you have blob available, you have web crypto available, that kind of thing. yeah.

Running code in V8 isolates

[00:56:58] Jeremy: So when someone ships you their, their code and you run it, you mentioned that the, the cold start time is very low. Um, how, how is the code being run? Are people getting their own process? It sounds like it's not, uh, using containers. I wonder if you could explain a little bit about how that works.

[00:57:20] Luca: Yeah, yeah, I can, I can give a high level overview of how it works. So, the way it works is that we essentially have a pool of, of Deno processes ready. Well, it's not quite Deno processes, it's not the same Deno CLI that you download. It's like a modified version of the Deno CLI based on the same infrastructure, that we have spun up across all of our different regions across the world, uh, across all of our different data centers.

And then when we get a request, we'll route that request, um, the first time we get request for that, that we call them deployments, that like code, right? We'll take one of these idle Deno processes and will assign that code to run in that process, and then that process can go serve the requests. and these process, they're, they're, they're isolated and they're, you.

it's essentially a V8 isolate. Um, and it's a very, very slim, it's like, it's a much, much, much slimmer version of the Deno cli essentially. Uh, which the only thing it can do is JavaScript execution and like, it can't even execute type script, for example, like type script is we pre-process it up front to make the the cold start faster.

and then what we do is if you don't get a request for some amount of. , we'll, uh, spin down that, um, that isolate and, uh, we'll spin up a new idle one in its place. And then, um, if you get another request, I don't know, an hour later for that same deployment, we'll assign it to a new isolate. And yeah, that's a cold start, right?

Uh, if you have an isolate which receives, or a, a deployment rather, which receives a Bunch of traffic, like let's say you receive a hundred requests per second, we can send a Bunch of that traffic to the same isolate. Um, and we'll make sure that if, that one isolate isn't able to handle that load, we'll spin it out over multiple isolates and we'll, we'll sort of load balance for you.

Um, and we'll make sure to always send to the, to the point of present that's closest to, to the user making the request. So they get very minimal latency. and they get we, we've these like layers of load balancing in place and, and, and. I'm glossing over a Bunch of like security related things here about how these, these processes are actually isolated and how we monitor to ensure that you don't break out of these processes.

And for example, Deno Deploy does, it looks like you have a file system cuz you can read files from the file system. But in reality, Deno Deploy does not have a file system. Like the file system is a global virtual file system. which is, is, uh, yeah, implemented completely differently than it is in Deno cli.

But as an end user you don't have to care about that because the only thing you care about is that it has the exact same API as the Deno cli and you can run your code locally and if it works there, it's also gonna work in deploy. yeah, so that's, that's, that's kind of. High level of Deno Deploy. If, if any of this sounds interesting to anyone, by the way, uh, we're like very actively hiring on, on Deno Deploy.

I happen to be the, the tech lead for, for a Deno Deploy product. So I'm, I'm always looking for engineers, to, to join our ranks and, and build cool distributed systems. Deno.com/jobs.

[01:00:15] Jeremy: for people who aren't familiar with the isolates, are these each run in their own processes, or do you have a single process and that has a whole Bunch of isolates inside it?

[01:00:28] Luca: in, in the general case, you can say that we run, uh, one isolate per process. but there's many asterisks on that. Um, because, it's, it's very complicated. I'll just say it's very complicated. Uh, in, in the general case though, it's, it's one isolate per process.

Yeah.

Configuring permissions

[01:00:45] Jeremy: And then you touched a little bit on the permissions system. Like you gave the example of somebody could have a website where they let their users give them code to execute. how does it look in terms of specifying what permissions people have? Like, is that a configuration file? Are those flags you pass in?

What, what does that look?

[01:01:08] Luca: Yeah. So, so that product is called sub hosting. It's, um, slightly different from our end user platform. Um, it's essentially a service that allows you to, like, you email us, well, we'll send you a, um, onboard you, and then what you can do is you can send HTTP requests to a certain end point with a, authentication token and.

a reference to some code to execute. And then what we'll do is, we'll, um, when we receive that HTTP request, we'll fetch the code, it's spin up and isolate, execute the code. execute the code. We serve the request, return you the response, um, and then we'll pipe logs to you and, and stuff like that.

and the, and, and part of that is also when we, when we pull the, um, the, the code for to spin up the isolate, that code doesn't just include the code that we're executing, but also includes things like permissions, and, and various other, we call this isolate configuration. Um, you can inspect, this is all public.

we have public docs for this at Deno.com/subhosting. I think. Yes, Deno.com/subhosting.

[01:02:08] Jeremy: And is that built on top of something that's a part of the public Deno project, the open source part? Or is this specific to this sub hosting product?

[01:02:19] Luca: Um, so the underlying engine or underlying runtime that executes the code here, like all of the code execution is performed by code, which is execute, which is public. Like all our, our, yeah, it uses, the Deno CLI just strips out a Bunch of stuff. It doesn't need the orchestration code, is not public.

The orchestration code is proprietary. and yeah, if you have use cases that where you would like to run this orchestration code on your own infrastructure, and yeah, you have interesting use cases, please email us. We would love to hear from you.

[01:02:51] Jeremy: separate from the, the orchestration, if it's more of an example of, let's say I deploy a Deno application and in the case that someone was able to get some, like malicious code or URLs into my application, could I tell Deno I only want this application to be able to call out to these URLs for just as an example?

[01:03:18] Luca: yes. So it's, it's slightly more complicated because you can't actually tell it that it can only call out to specific URLs, but you can tell it to call out only to specific domains or IP addresses. which sort of the same thing, but, uh, just slightly different layer of abstraction. Yeah, you can do that.

the allow net flag, allows you to specify a set of domains to allow requests to those domains. Yes,

[01:03:41] Jeremy: I see. So on the, user facing open source part, there are configuration flags where you could say, I want this application to be able to access these domains, or I don't want it to be able to use IO or whatever

[01:03:56] Luca: Yeah, exactly.

[01:03:57] Jeremy: their, their flags.

[01:03:59] Luca: Yeah. And, and on, on subhosting, this is done via the isolate configuration, which is like a JSON blob. And, yeah, like there, there's, it's, but ultimately it's all, it's all sort of, the same concept, just slightly different interfaces because like, like the, the subhosting one needs to be programmatic interface instead of, uh, something you type as an end user.

Right?

Why deploy your application on the edge?

[01:04:20] Jeremy: One of the things you mentioned about Deno Deploy is it's, centered around deploying your application code to a Bunch of different locations. And you also mentioned the, the cold start times very low.

could you kind of give the case for wanting your application code at a Bunch of different sites?

[01:04:38] Luca: Mm-hmm. . Yeah. So the, the, the, the main benefit of this is that when your user makes request for your application, um, you don't have to round trip back to, um, wherever centrally hosted your application would otherwise be. Like, if you are, a, a startup, even if you're just in the US for example, it's nice to have, points of presence, not just on one of the US coasts, but on both of the US coasts because that means that your round trip time is not gonna be a hundred milliseconds, but it's gonna be 20 milliseconds.

this sort of relies on. the, like, this doesn't, there's obviously always the problem here that if your database lives in only one of the two coasts, you still need to do the round trip. And there's solutions to this, which is one caching, uh, that's the, the, the obvious sort of boring solution. Um, and then there's the solution of using databases which are built exactly for this.

For example, CockroachDB is a database which is Postgres compatible, but it's really built for, um, global distribution and built for being able to shard data across regions and have different, um, primary regions for different, uh, shards of your, of your, of your tables. which means, for example, you could have the, your users on the East coast, their data could live on a database in the east coast and your users on the west coast, their data could live on a database on the west coast.

and. your like admin panel needs to show all of them. It has an aggregate view over both coasts, right? like this is something which, which something like CockroachDB can do and it can be a really great, um, great thing here. And, we acknowledge that this is not something which is very easy to do right now and Deno tries to make everything very easy.

So you can imagine that this is something we're working on and we're working on, on database solutions. And actually I should more generally say persistent solutions that allow you to persist data, in a way that makes sense for an edge system like this. Um, where the data has persisted close to users that need it.

Consistency in local development vs deployment

[01:06:44] Luca: Um, and data is cached around the world. and you still have sort of semantics, which, which are consistent with the semantics that you have, when you're locally developing your application. Like you don't want, for example, your local application development. , you don't want there to be like strong consistency there, but then in production you have eventual consistency where suddenly, I don't know, all of your code breaks because you didn't, your US west region didn't pick up the changes from US east because it's eventually consistent, right?

I mean, this is a problem that we see with a lot of the existing solutions here. like specifically CloudFlare KV for example. CloudFlare KV is, um, a single primary is a system with, with single primary, um, right regions, where there's just a Bunch of caching going on. And this leads to ventral consistency, which can be very confusing for, for end user developers.

Um, especially because if you're using this locally, the local emulator does not emulate the eventual consistency, right? so this, this, this can become very confusing very quickly. And so a, anything that we build in, in this persistence field, for example, we take very, we very seriously, um, Weigh these trade offs and make sure that if there's something that's eventually consistent, it's very clear and it works the same way, the same eventually consistent way in the cli.

[01:08:03] Jeremy: So for someone, let's say they haven't made that jump yet to use a cockroach. They, they just have their. their database instance in AWS East or whatever. does having the code at the edge where it all ends up needing to go to east, is that better than having the code be located next to the database?

[01:08:27] Luca: Yeah. Yeah. It, it, it totally does. Um, there's, there's def there's different, um, there, there's trade-offs here, right? Obviously, like if you have a, a, if you have an admin panel, for example, or a, a like user dashboard, which is very, very reliant on data from your database, and for every single request needs to fetch fresh data from the database, then maybe the trade off isn't worth it.

But most applications are not like that. Most applications are, for example, you have a landing page and that landing page needs to do AB tests. and those AB tests are based on some heuristic that you can fetch from the database every five seconds. That's fine. Like, it doesn't need to be perfect, right?

So you, you have caching in place, which, um, like by doing this caching locally to the user, um, and, and still being able to programmatically control this, like based on, I don't know, the user's user agent or, the IP address of the user or the region of the user, or. the past browsing history of that user through as, as measured by their cookies or whatever else, right?

being able to do these highly user customized actions very close to the user, means that like latency is, is like, this is a much better user experience than if you have to do the roundtrip, especially if you're a, a startup or, or, or, or a, um, service which is globally distributed and, and serves not just users in the US or the EU but like all across the world.

Caching options

[01:09:52] Jeremy: And when you talk about caching in the context of Deno Deploy, is there a cache native to the system or are you expecting someone to have, uh, a Redis or a memcached, that sort of thing?

[01:10:07] Luca: Yeah. So Deno Deploy, actually has, there's a built, there's a, there's a web cache api, um, which is also the web cache API that's used by service workers and, and others. and CloudFlare also implements this cache api. Um, and this is something that's implemented in Deno cli, and it's gonna be coming to Deploy this quarter, which is, that's the native way to do caching, and otherwise you can also use Redis you can use services like Upstash or, uh, even like primitive in-memory caches where it's just an LRU that's in memory, like a, like a JavaScript data structure, right?

or even just a JavaScript map or JavaScript object, with a, with a time on it. And you automatically, and like every time you read from it and the time is above some certain threshold, you delete the cache and go fetch it again, right? Like this is, there's many things that you could consider a cache that are not like Redis or, or, or, or like the web cache api.

So there's, there's ways to do that. And there's also a Bunch of, like, modules on, in the standard library, or not in the standard library story in the, in the third party module registry and also on NPM that you can use to, to implement different cache behaviors.

[01:11:15] Jeremy: And when you give the example of a in memory cache, when you're running in Deno deploy, you're running in these isolates, which presumably can be shut down at any time. So what kind of guarantees do users have that whatever they put into memory will still be there?

[01:11:34] Luca: none like the, it's, it's a cache, right? The cache can be evicted at any time. Your isolate can be restarted at any time. It can be shut down. You can be moved to a different region. The data center could go for, go down for maintenance. Like this is something your application has to be built in, in a way that it is tolerant to, to restarts essentially.

but because it's a cache, that's fine. Because if the cache expires or, or, or the cache is cleared through some external means, the worst thing that happens is that you have a cold request again, right? And, if you're serving like a hundred requests a second, I can essentially guarantee to you that not every single request will invoke a cold start.

Like, I can guarantee to you that probably less than 0.1% of requests will, will cause a cold start. this is not like SLA anywhere. Um, because it's like totally up to, to however the, the system decides to scale you. but yeah, like it's, it, it would be very wasteful for us, for example, to spin up a new isolate for every request.

So we don't, we reuse isolates wherever possible. yeah. It's like it's in our best interest to not cold start you, um, because it's expensive for us to do all the CPU work to, to cold start an isolate, right?

Working with CDNs

[01:12:47] Jeremy: and typically with applications, people will put a, a CDN in front and they'll use things like cache control headers to be able to serve straight from the CDN Is that a supported use case with Deno Deploy or are there anything that, anything that people should be aware of when they're doing that sort of thing?

[01:13:09] Luca: Yeah, so you can do that. Um, like you could put a cache in front of Deploy but in most cases it's really not necessary. Um, because the main reasons people use CDNs is, it is essentially to like do this global distribution problem, right? Like you, you want to be able to cache close to users, but if your end application is already executing close to users, the cost of a, of a, of serving something, like serving a request from a JavaScript cache is like marginal.

It's so low. there's, there's like no nearly no CPU time involved here. it's, it's network bandwidth. That's the, that's the limiting factor and that's the limiting factor for all CDNs. Uh, so, so whether you're serving on Deploy or you have a, a separate CDN that you put in front of it, hmm. not really that big a difference.

Like you can do it. but I don't know. Deno.com doesn't, or, or, and Deno.land, like they don't have a CDN in front of them. They're running bare on, on Deno Deploy and, yeah, it's fine.

[01:14:06] Jeremy: So for, even for things like images, for example, something that. Somebody might store in object storage and put a CDN in in front.

[01:14:17] Luca: Mm-hmm.

[01:14:18] Jeremy: are you suggesting that people could put it on Deno deployed directly or just kind of curious what your thoughts are there?

[01:14:26] Luca: Yeah. Uh, like if you have a blog and your profile image is, is part of your blog, right? And you can put that in your static file folder and serve that directly from your Deno Deploy application, like that's totally cool. Uh, you should do that because that's obvious and that's the obvious way to do things.

if you're specifically building like a, image serving CDN , go reach out to us because we'd love to work with you. But also, um, like there's probably different constraints that you have. Um, like you probably very, very, very much care about network bandwidth costs, um, because that is like your one number one primary cost factor.

so yeah, it's just what's the trade off? What, what trade-offs are you willing to make? Like does some other provider give you a lower network bandwidth cost? I would argue that if you're building an, like an image cdn, then you'd probably, like, even if you have to write your application code in Haskell or in whatever, it's probably worth it if you can get like a cent cheaper gigabyte transfer fees.

just because that is like 100% of your, of your costs, um, is, is network bandwidth. So it's really a trade off based on what, what you're trying to build.

Workloads currently not handled by Deno Deploy (Coming soon)

[01:15:36] Jeremy: And if I understand correctly, Deno Deploy, it's centered around applications. That take HTTP requests. So it could be a website, it could be an API that sort of thing. and sometimes when people build applications, they have other things surrounding them. They'll, they'll need scheduled jobs. They may need some form of message queue, things like that.

Things that don't necessarily fit into what Deno Deploy currently hosts. And so I wonder for things like that, what you recommend people would do while working with Deno Deploy.

[01:16:16] Luca: Yeah. Great question. unfortunately I can't tell you too much about that without, like, spoiling everything (laughs), but what I'm gonna say is you should keep your eyes peeled on our blog over the next two to three months here. I consider message queues and like, especially message queues they are a persistence feature and we are currently working on persistence features.

So yeah, that's all I'm gonna say. But, uh, you can expect Deno deployed to do things other than, um, just HTTP requests in the not so far. Future, and like cron jobs and stuff like that. Also, uh, at some point, yeah.

Who's using deno?

[01:16:54] Jeremy: All right. We'll look, we'll look out for that I guess as we wrap up, maybe you could give some examples of who's using Deno and, and what types of projects do you think are are ideal for Deno?

[01:17:11] Luca: Yeah. yeah. Uh, Deno or Deno Deploy, like do you know, like, do you know as in all of Deno or Deno deploy specifically?

[01:17:17] Jeremy: I, I mean, I guess either (laughs)

[01:17:19] Luca: Okay. . Okay. Okay. Yeah, yeah. Uh, let's, let's do it. So, one really cool use case, for example, for Deno is Slack. Uh, slack has this app platform that they're building, um, which allows you to execute arbitrary JavaScript from within inside of Slack, in response to like slash commands and like actions. I dunno if you've ever seen like those little buttons you can have in messages if you press one of those buttons, like that can execute some Deno code.

And Slack has built like this entire platform around that, and it makes use of Deno's like security features and, and built in tooling and, and all that kind of thing. Um, and that's really cool. And Netlify has built edge functions like, which is like a really, really awesome primitive they have for, for being able to customize outgoing requests to even, come up with completely new requests on the spot, um, as part of their CDN layer.

Uh, also built on top of Deno. And GitHub has built, like this platform called, flat, which allows you to like sort of, um, on cron schedules, pull data, um, into git repositories and, and process that and, and post-process that and, and, and do do things with that. And it's integrated with GitHub actions, all kind of thing.

It's kind of cool. Supabase also has some Edge has like an Edge functions product that's built on top of Deno. I'm just thinking about other, like those are, those are the obvious ones that are on the homepage. there's, I, I know for example, there's a image CDN actually that's serves images with Deno, like 400 million of them a day.

kind of related to what we were talking about earlier. Actually, I don't know if it's still 400 million. I think it's more, um, the last data I got from them was like maybe eight months ago. So probably more at this point. Um, . Yeah. A Bunch of cool, cool, cool things like that. Um, we have like a really active discord channel and there's always people showcasing what kind of stuff they built in there that we have a showcase channel.

I think that's like, if, if you're really interested in like what people are, what cool things people are building with, you know, that's like, that's a great place to, to look. I think actually we maybe also have a showcase. Do we have Deno.land/showcase? I don't remember. Show case. Oh yeah, we do Deno.com/showcase, which is a page of like a Bunch of Yeah. Projects built with Deno or, or, or products using Deno or, um, other things like that.

[01:19:35] Jeremy: Cool. if people wanna learn more about Deno or see what you're up to, where should they head?

[01:19:42] Luca: Yeah. Uh, if you wanna learn more about Deno Cli, head to Deno.land. If you wanna learn more about Deno Deploy, head to Deno.com/deploy. Um, if you want to chat to me, uh, you can hit me up on my website, lcas.dev. if you wanna chat about Deno, you can go to discord.gg/deno. yeah, and if you're interested in any of this and thought that maybe you have something to contribute here, you can either become an open source contributor on our open source project, or this is really something you wanna work on and you like distributed systems or systems engineering or fast performance, head to deno.com/jobs and, send in your resume.

We're, we're very actively hiring and, be super excited to, to, work with you.

[01:20:20] Jeremy: All right, Luca. Well thank you so much for coming on Software Engineering Radio.

[01:20:24] Luca: Thank you so much for having me.

Megan Cutrofello on Leaguepedia

Tue, 10 Jan 2023 06:07:12 -0000

Leaguepedia is a MediaWiki instance that covers tournaments, teams, and players in the League of Legends esports community. It's relied on by fans, analysts, and broadcasters from around the world.

Megan "River" Cutrofello joined Leaguepedia in 2014 as a community manager and by the end of her tenure in 2022 was the lead for Fandom's esports wikis.

She built up a community of contributing editors in addition to her role as the primary MediaWiki developer.

She writes on her blog and is a frequent speaker at the Enterprise MediaWiki Conference

Topics covered:

When to use MediaWiki
Visual vs code editor
MediaWiki's rough syntax
Templates and markup
Limiting user input to simplify pages
Choosing not to transliterate long player names in certain languages
Handling mobile clients
Building aliases for search results
Creating a single source of truth
Roster changes and caching
Cargo (Query data in MediaWiki templates using SQL)
Hiding implementation details from editors
Optimizing for the editor, not a clean codebase
Training your users to use workarounds
MediaWiki only supports es5
The wiki aesthetic
Who is working on the wiki + onboarding
Who is using the wiki
The future of Leaguepedia
How Megan got into wiki development
Issues as opportunities to onboard

MediaWiki extensions

CharInsert - Add code snippets into the MediaWiki editor
Semantic MediaWiki (SMW) - Store and query data inside Wiki pages
Cargo - Replaced SMW at Leaguepedia

Conference Talks

Other podcast appearances

Between the Brackets

Transcript

You can help edit this transcript on GitHub.

[00:00:00] Jeremy: Today I'm talking to Megan Cutrofello. She managed the Leaguepedia eSports wiki for eight years, and in 2017 she got an award for being the unsung hero of the year for eSports. So Megan, thanks for joining me today.

[00:00:17] Megan: Thanks for having me.

[00:00:19] Jeremy: A lot of the people I talk to are into web development, so they work with web frameworks and things like that. And I guess when you think about it, wikis are web development, but they're kind of their own world, I suppose. for someone who's going to build some kind of a site, like when does it make sense for them to use a wiki versus, uh, a content management system or just like a more traditional web framework?

[00:00:55] Megan: I think it makes the most sense to use a wiki if you're going to have a lot of contributors and you don't want all of your contributors to have access to your server.

also if your contributors aren't necessarily as tech savvy as you are, um, it can make sense to use a wiki. if you have experience with MediaWiki, I guess it makes sense to use a Wiki.

Anytime I'm building something, my instinct is always, oh, I wanna make a Wiki (laughs) . Um, so even if it's not necessarily the most appropriate tool for the job, I always. My, my first thought is, hmm, let's see, I'm, I'm making a blog. Should I make my blog in in MediaWiki? Um, so, so I always, I always wanna do that. but I think it's always, when you're collaborating is pretty much, you always wanna do MediaWiki

[00:01:47] Jeremy: And I, I think that's maybe an important point when you say people are collaborating. When I think about Wikis, I think of Wikipedia, uh, and the fact that I can click the edit button and I can see the markup right there, make a change and, and click save. And I didn't even have to log in or anything. And it seems like that workflow is built into a wiki, but maybe not so much into your typical CMS or WordPress or something like that.

[00:02:18] Megan: Yeah. Having a public ability to solicit contributions from anyone. so for Leaguepedia, we actually didn't have open contributions from the public. You did have to create an account, but it's still that open anyone can make an account and all you have to do is like, go through that one step of create an account.

Admittedly, sometimes people are like, I don't wanna make an account that's so much work. And we're like, just make the account. Come on. It's not that hard. but, uh, you still, you're a community and you want people to come and contribute ideas and you want people to come and be a part of that community to, document your open source project or, record the history of eSports or write down all of the easter eggs that you find in a video game or in a TV show, or in your favorite fantasy novels.

Um, and it's really about community and working together to create something where the whole is bigger than the sum of its parts.

[00:03:20] Jeremy: And in a lot of cases when people are contributing, I've noticed that on Wikipedia when you edit, there's an option for a, a visual editor, and then there's one for looking at the raw markup. in, in your experience, are people who are doing the edits, are they typically using the visual editor or are they mostly actually editing the, the markup?

[00:03:48] Megan: So we actually disabled the Visual editor on Leaguepedia, because the visual editor is not fantastic at knowing things about templates. Um, so a template is when you have one page that gets its content pulled into the larger page, and there's a special syntax for that, and the visual editor doesn't know a lot about that.

Um, so that's the first reason. And then the second reason is that, there's this, uh, one extension that we use that allows you to make a clickable, piece of text. It's called (https://www.mediawiki.org/wiki/Extension:CharInsert) CharInserts, uh, for character inserts. so I made a lot of these things that is sort of along the same philosophy as Visual Editor, where it's to help people not have to have the same burden of knowledge, of knowing every exact piece of source that has to be inserted into the page. So you click the thing that says like, um, insert a pick and band prefill, and then a little piece of JavaScript fires and it inserts a whole bunch of Wiki text and then you just enter the champions in the correct places. In the prefills of champions are like the characters that you play in, uh, league of Legends.

And so then you have like the text is prefilled for you and you only have to fill in into this outline. so Visual Editor would conflict with CharInserts, and I much preferred the CharInserts approach where you have this compromise in between the never interacting with source and having to have all of the source memorized.

So between the fact that Visual Editor like is not a perfect tool and has these bugs in it, and also the fact that I preferred CharInserts, we didn't use Visual Editor at all. I know that some wikis do like to use Visual Editor quite a bit, and especially if you're not working with these templates where you have all of these prefills, it can be a lot more preferred to use Visual Editor.

Visual Editor is an experience much more similar to editing something like Microsoft Word, It doesn't feel like you're editing code. and editing code is, I mean, it's scary. Like for, and when I said like, MediaWiki is when you have editors who aren't as tech savvy, as the person who set up the Wiki.

for people who don't have that experience, I mean, when you just said like you have to edit a wiki, like someone who's never done that before, they can be very intimidated by it. And you're trying to build a sense of community. You don't want to scare away your potential editors. You want everyone to be included there.

So you wanna do everything possible to make everyone feel safe, to contribute their ideas to the Wiki. and if you make them have to memorize syntax, like even something that to me feels as simple as like two open brackets and then the name of a page, and then two closed brackets means linking the page.

Like, I mean, I'm used to memorizing a lot of syntax because like, I'm a programmer, but someone who's never written code before, I mean, they're not used to memorizing things like that. So they wanna be able to click a button that says insert link, and then type the name of the page in the middle of the things that pop up there.

Um, so visual editor is. It's a lot safer to use. so a lot of wikis do prefer that. and if it, if it didn't have the bugs with the type of editing that my Wiki required, and if we weren't using CharInserts so much, we definitely would've gone for it. But, um, it wasn't conducive to the wiki that I built, so we didn't use it at all.

[00:07:42] Jeremy: And the, the compromise you're referring to, is it where the editor sees the raw markup, but then they can, there's like little buttons on the side they can click and they'll know, okay, if I click this one, then it's going to give me the text for creating a list or something like that.

[00:08:03] Megan: Yeah, it's a little bit more high level than creating a list because I would never even insert the raw syntax for creating a list. It would be a template that's going to insert a list at the very end. but basically that, yeah,

[00:08:18] Jeremy: And I, I know for myself, even though I do software development, if I click at it on a wiki and there's all the different curly brace tags, there's the square tags, and. I think if you spend some time with it, you can kind of get a sense of what it means. But for the average person who doesn't work with software in their day to day, do, do you find that, is that a big barrier for them where they, they click edit and there's all this stuff that they don't really understand?

Is that where some people just, they go, oh, I don't, I don't know what to do.

[00:08:59] Megan: I think the biggest barrier is actually clicking at it in the first place. so that was a big barrier to me actually. I didn't wanna click at it in the first place, and I guess my reasons were maybe a little bit different where for me it was like, I know that if I click edit, this is going to be a huge rabbit hole and I'm going to learn way too much about wikis and this is going to consume my entire life and look where I ended up.

So I guess I was pretty right about that. I don't know if other people feel the same way or if they just like, don't wanna get involved at all. but I think once people, click edit, they're able to figure it out pretty well. I think there's, there's two barriers or maybe three barriers. the first one is clicking edit in the first place.

The second one is if they learn to code templates at all. Media Wiki syntax is literally the worst I have encountered other than programming languages that are literally parodies. So like the white space language is worse (laughs https://en.wikipedia.org/wiki/Whitespace_(programming_language)) , but like it's two curly braces for a template and it's three curly braces for a variable.

And like, are you actually kidding me? One of my blog posts is like a plea to editors to write a comment saying the name of the template that they're ending because media wiki like doesn't provide any syntax for what you're ending. And there's no, like, there's no indentation. So you can't visually see what you're ending.

And there's no. So when I said the white sp white space language, that was maybe appropriate because MediaWiki prints all of the white space because it's really just like, PHP functions that are put into the text that you're literally putting onto the page. So any white space that you put gets printed.

So the only way to put white space into your code is if you comment it out. So anytime you wanna put a new line, you have to comment out your new line. And if you wanna indent your code, you have to comment out the indents. So it's just, I, I'm , I'm not exaggerating here. It's, it's just the worst. Occasionally you can put a little bit of white space. Because there's like some divisions in parser functions that get handled when it gets sent to the parser. And, but I mean, for the most part it's just, it's just terrible. so if I'm like writing an if statement, I'll write if, and then I'll write a commented out endif at the end, so once an editor starts to write templates, like with parser functions and stuff, that's another big barrier because, and that's not because like people don't know how to code, it's just because the MediaWiki language, and I use language very loosely, it's like this collection of PHP functions poured into this just disaster

It's just, it's not good! (laughs) And the, the next barrier is when people start to jump to Lua, which is just, I mean, it's just Lua where you can write Lua modules and then, Lua is fine. It's great, it has white space and you can make new lines and it's absolutely fine and you can write an entire code base and as long as you're writing Lua, it's, it's absolutely fantastic and there's nothing wrong with it anymore (laughs)

So as much as I just insulted the MediaWiki language, like writing Lua in MediaWiki is great (laughs) . So for, for most of my time I was writing Lua. Um, and I have absolutely no complaints about that except that Lua is one index, but actually the one indexing of Lua is fine because MediaWiki itself is one indexed.

So people complain about Lua being one index, and I'm like, what are you talking about? If it's, if another language were used, then you'd have all of this offsetting when you go to your scripting language because you'd have like the first argument from your template in MediaWiki going into your scripting language, and then you'd have to offset it to zero and everyone would be like vastly confused about what's going on.

So you should be thankful that they picked a language that's one index because it saves you all of this headache. So anyway, sorry for that tangent, but it's very good that we picked a one index language.

[00:13:17] Jeremy: When you were talking about the, the if statement and having to put in comments to have white space, is it, cuz like when I think about an if statement in most languages, the, the if statement isn't itself rendering anything, it's like deciding if you're going to do something inside of the, if so. like what, what would that white space do if you didn't comment it out in the context of the if?

[00:13:44] Megan: So actually you would be able to put some white space inside of an if statement, but you would not be able to put any white space after an if statement. and there, most likely inside of the if statement, you're printing variables or putting other parser functions. and the other parser functions also end in like two curly braces.

And, depending on what you're printing, you're likely ending with a series of like five or eight, or, I don't know, some very large set of curly braces. And so what I like to do is I would like to be able to see all of the things that I'm ending with, and I wanna know like how far the nesting goes, right.

So I wanna write like an end if, and so I have to comment that out because there's no like end if statement. so I comment out an end if there, it's more that you can't indent the statements inside of the if, because anything that you would be printing inside of your code would get printed. So if I like write text inside of the code, then that indentation would get printed into the page.

And then if I put any white space after the if statement, then that would also get printed. So technically you can have a little bit of white space before the curly braces, but that's only because it's right before the curly braces and PHP will strip the contents right inside of the parser function.

So basically if PHP is stripping something, then you're allowed to have white space there. But if PHP isn't stripping anything, then all of the white space is going to be printed and it's like so inconsistent that for the most part it's not safe to put white space anywhere because you don't, you have to like keep track of am I in a location where PHP is going to be stripping something right now or not?

and I, I wanna know what statement or what variable or what template I'm closing at any location. So I always want to, write out what I'm closing everywhere. And then I have to comment that because there was no foresight to put like an end

if clause in this white space, sensitive language.

[00:16:22] Jeremy: Yeah, I, I think I see what you mean. So you have, if you're gonna start an, if you have the, if inside these curly braces, but then, inside the, if you typically are going to render some text to the page, and so intuitively you would indent it so that it's indented in from the if statement. But then if you do that, then it's gonna be shifted to the right on, on the Wiki.

Did I get that right?

[00:16:53] Megan: Yeah. So you have the flexibility to put white space immediately because PHP will strip immediately, but then you don't have flexibility to put any white space after that, if that makes sense.

[00:17:11] Jeremy: So, so when you say immediately, is that on the following line or is that

[00:17:15] Megan: yeah, so any white space before the first clause, you have flexibility. So like if you were to put an if statement, so it's like if, and then there's a colon, all of the next white space will get stripped. Um, so then you can put some text, but then, if you wanted to like put some text and then another if statement nested within the first if statement.

It's not like Lua where you could like assign a variable and then put a comment and then put some more white space and then put another statement. And it's white space insensitive because you're just writing code and you haven't returned anything yet.

it, it's more like Jinja (View templating language) than Python for, for an analogy.

So everything is getting printed because you're in like a, this templating language, not actually a programming language. Um, so you have to work as if you're in a templating language about, you know, 70% of the time , unless you're in this like very specific location where PHP is stripping your white space because you're at the edge of an argument that's being sent there.

So it's like incredibly inconsistent. And every now and then you get to like, pretend that you're in an actual language and you have some white space, that you can indent or whatever. it's just incredibly

inconsistent, which is like what you absolutely want out of a programming language (laughs)

yeah, it's like you're, you're writing templates, but like, it seems like because of the fact that it's using php, there's

[00:18:56] Jeremy: weird exceptions to the behavior.

Yeah.

[00:18:59] Megan: Exactly. Yeah.

[00:19:01] Jeremy: and then you also mentioned these, these templates. So, if I understand correctly, this is kind of like how a lot of web frameworks will have, partials, I guess, where you'll, you'll be able to have a webpage, but it's made up of different I don't know if you would call them components, but you're able to build a full page that's made up of a bunch of different pieces.

So you could have a

[00:19:31] Megan: Yeah Yeah that's a good analogy.

[00:19:33] Jeremy: Where it's like, here's my table of contents, or here's my info box, or things like that. And those are all things that you would create a MediaWiki template for, and then somehow the, the data gets passed into those templates and the template decides how to, to render it out.

[00:19:55] Megan: Yeah.

[00:19:56] Jeremy: And for these, these templates, I, I noticed on some of the Leaguepedia pages, I noticed there's some html in some of them. I was curious if that's typical to write them with HTML or if there are different ways native to Media Wiki for, for, creating these templates.

[00:20:23] Megan: Um, it depends on what you're doing. MediaWiki has a special syntax for tables specifically. I would say that it's not necessarily recommended to use the special syntax because occasionally you can get things to not work out fantastically if people slightly break things. But it's easier to use it.

So if you know that everything's going to work out perfectly, you can use it. and it's a simple shortcut. if you go to the help page about tables on Wikipedia, everything is explained, and not all HTML works, um, for security reasons. So there's like a list of allowed, things that you can use, allowed tags, so you can't put like forms and stuff natively, but there's the widgets extension that you can use and widgets just automatically renders all html that you put inside of a widget.

Uh, and then the security layer there is that you have to have a special permission to edit a widget. so, you only give trusted people that permission and then they can put the whatever html they want there. So, we have a few forms on Leaguepedia that are there because I edited, uh, whichever widgets, and then put the widgets into a Lua module and then put the Lua module into a template and then put the template onto the page.

I was gonna say, it's not that complicated. It's not as complicated as it sounds, but I guess it really is as complicated as it sounds (laughs) . Um, so, uh, I, I won't say that. I don't know how standard it is on other wikis to use that much html, I guess Leaguepedia is pretty unique in how complicated it is.

There aren't that many wikis that do as many things as we did there. but tables are pretty common. I would say like putting divs places to style them, uh, is also pretty common. but beyond that, usually there's not too many HTML elements just because you typically wanna be mobile friendly and it's relatively hard to stay mobile friendly within the bounds of MediaWiki if you're like putting too many elements everywhere.

And then also allowing users to put whatever content inside of them that they want. The reason that we were able to get away with it is because despite the fact that we had so many editors, our content was actually pretty limited. Like if there's a bracket, it's only short team names going into it.

So, and short team names were like at most five or six characters long, so we don't have to worry about like overflow of team names. Although we designed the brackets to support overflow of team names, and the team names would wrap around and the bracket would not break. And a lot of CSS Magic went into making that work that, we worked really hard on and then did not end up using (laughsz)

[00:23:39] Jeremy: Oh no.

[00:23:41] Megan: Only short team names go into brackets.

But, that's okay. uh, and then for example, like in, uh, schedules and stuff, a lot of fields like only contain numbers or only contain timestamps. there's like a lot of tables again where like there's only two digit numbers with one decimal point and stuff like that. So a lot of the stuff that I was designing, I knew the content was extremely constrained, and if it wasn't then I said, well, too bad.

This is how I'm telling you to put the content . Um, and for technical reasons, that's the content that's gonna go here and I don't care. so there's like, A lot of understanding that if I said for technical reasons, this is how we have to do it. Then for technical reasons, that was how we had to do it.

And I was very lucky that all of the people that I worked with like had a very big appreciation with like, for technical reasons, like argument over. This is what's happening. And I know that with like different people on staff, like they would not be willing to compromise that way. Um, so I always felt like extremely lucky that like if I couldn't figure out a way to redesign or recode something in order to be more flexible, then like that would just be respected.

And that was like how we designed something. But in general, like it's, if you are not working with something as rigid as, I mean, and like the history of eSports sounds like a very fluid thing, but when you think about it, like it's mostly names of teams, names of players and statistics. There's not that much like variable stuff going on with it.

It's very easy to put in relational databases. It's very easy to put in fixed width tables. It's very easy to put in like charts that look the same on every single page. I'm not saying. It was always easy to like write everything that I wrote, and it's not, it wasn't always easy to like, deal with designs and stuff, but like relative to other topics that you can pick, it was much easier to put constraints on what was going to go where because everything was very similar across regions, across, although actually one thing.

Okay, so this will be like the, the exception that proves the rule. uh, we would trans iterate players' names when we, showed them in team rosters. So, uh, for example, when we were showing the hangul, the Korean player's names, we would show an English translation also.

Um, and we would do this for every single alphabet. but Hungarian players' names are really, really, really long. And so the transliteration doesn't fit in the table when we show the translation to the Roman alphabet. And so we couldn't do this, so we actually had to make a cargo table. Of alphabets that are allowed to be transliterated into the Roman alphabet, uh, when we have players names in that alphabet.

So we had, like, hangul was allowed and Arabic was allowed, and I can't remember the exact list, but we had like three alphabet, three or four alphabets were allowed and the rest of the alphabets were dis allowed to be transliterate into, uh, the Roman alphabet. and so again, we made up a rule that was like a hard rule across the entire Wiki where we forced the set of alphabets that were transliterated so that this tables could be the same size roughly across every single team page because these Hungarian player names are too long (laughs)

So I guess even this exception ended up being part of the rule of everything had to be standardized because these tables were just way too wide and they were running into the info box. They couldn't fit on the side. so it's really hard when you have like arbitrary user entered content to fit it into the HTML that you design.

And if you don't have people who all agree to the same standards, I mean, Even when we did have people who agreed to all of the same standards, it was really, really, really hard. And we ended up having things like a table of which alphabets to transliterate. Like that's not the kind of thing that you think you're going to end up having when you say, let's catalog the history of League of Legends eSports,

[00:28:40] Jeremy: And, and so when, let's say you had a language that you couldn't trans iterate, what would go into the table.

[00:28:49] Megan: uh, just the native alphabet.

[00:28:51] Jeremy: Oh I see. Okay.

[00:28:53] Megan: Yeah. And then if they went to the player page, then you would be able to see it transliterated. But it wouldn't show up on the team page.

[00:29:00] Jeremy: I see. And then to help people visualize what some of these things you're talking about look like when you're talking about a, a bracket, it's, is it kind of like a tree structure where you're showing which teams are facing which teams and okay,

[00:29:19] Megan: We had a very cool, CSS grid structure that used like before and after pseudo elements to generate the lines, uh, between the teams and then the teams themselves were the elements of the grid. Um, and it's very cool. Uh, I didn't design it. Um, I have a friend who I very, very fortunately have a friend who's amazing at CSS because I am like mediocre at css and she did all of our CSS for us.

And she also like did most of our designs too. Uh, so the Wiki would not be like anything like what it is without her.

[00:30:00] Jeremy: And when you're talking about making sure the designs fit on desktop and, and mobile, um, I think when you were talking earlier, you're talking about how you have these, these templates to build these tables and the, these, these brackets. Um, so I guess in which part of the wiki is it ensuring that it looks different or that it fits when you're working with these different screen sizes

[00:30:32] Megan: Usually it's a peer CSS solution. Every now and then we hide an element on mobile altogether, and some of that is actually MediaWiki core, for example, in, uh, nav boxes don't show up on mobile. And that's actually on Wikipedia too. Uh, well, I guess, yeah. I mean, being MediaWiki core, So if you've ever noticed the nav boxes that are at the bottom of pages on Wikipedia, just don't show up on like en.m.wikipedia.org.

and that way you're not like loading, you're not loading, but display noneing elements on mobile. but for the most part it's pure CSS Solutions. Um, so we use a lot of, uh, display flex to make stuff, uh, appropriate for mobile. Um, some media roles. sometimes we display none stuff for mobile. Uh, we try to avoid that because obviously then mobile users aren't getting like the full content.

Occasionally we have like overflow rules, so you're getting scroll bars on mobile and then every now and then we sort of just say, too bad if you're on mobile, you're gonna have not the greatest solution or not the greatest, uh, experience. that's typically for large data tables. so the general belief at fandom was like, if you can't make it a good experience on mobile, don't put it on the Wiki.

And I just think that's like the worst philosophy because like then no one gets a good experie. And you're just putting less content on the Wiki so no one gets to enjoy it, and no one gets to like use the content that could exist. So my philosophy has always been like the, the, core overview pages should be, as good as possible for both PC and mobile.

And if you have to optimize for one, then you slightly optimize for mobile because the majority of traffic is mobile. but attempt not to optimize for either one and just make it a good experience on both. but then the pages behind that, I say behind because we like have tabs views, so they're like sort of literally behind because it looks like folders sort of, or it looks like the tabs in a folder and you can, like, I, I don't know, it, it looks like it's behind (laughs) , the, the more detailed views where it's just really hard to design for mobile and it's really easy to design for pc and it just feels very unlikely that users on mobile are going to be looking at these pages in depth.

And it's the sort of thing. A PC user is much more likely to be looking at, and you're going to have like multiple windows open and you're gonna be tapping between them and you're gonna be doing all of your research at PC. You absolutely optimize this for PC users. Like, what the hell this is? These are like stats pages.

It's pages and pages and pages of stats. It's totally fine to optimize this for PC users. And if the option is like, optimized for PC users or don't create it at all, what are you thinking To not create it at all, like make it a good experience for someone?

So I don't, I don't understand that philosophy at all.

[00:34:06] Jeremy: Did you, um, have any statistics in terms of knowing on these types of pages, these pages that are information dense or have really big tables? Could you tell that? Oh, most of the people coming here are on computers or, or larger screens.

[00:34:26] Megan: I didn't have stats for individual pages. Um, mobile I accidentally lost Google Analytics access at some point, and honestly I wasn't interested enough to go through the process of trying to get it back. when I had it, it didn't really affect what I put time into, because it was, it was just so much what I expected it to be.

That it, it didn't really affect much. What I actually spent the most time on was looking, so you can, uh, you get URLs for search results. And so I would look through our search results, and I would look at the URL of the failed search results and, so there would be like 45 results for this particular failed search.

And then I would turn that into a redirect for what I thought the target was supposed to be. So I would make sure that people's failed searches would actually resolve to the correct thing. So if they're like typo something, then I make the typo actually resolve. So we had a lot of redirects of like common typos, or if they're using the wrong name for a tournament, then I make the wrong name for the tournament resolve.

So the analytics were actually really helpful for that. But beyond that, I, I didn't really find it that useful.

[00:35:48] Jeremy: And then when you're talking about people searching, are these people using a search box on the Wiki itself And not finding what they were looking for?

[00:36:00] Megan: Yeah. So like the internal search, so like if you search Wikipedia for like New York City, but you spell it C I Y T, , then you're not going to get a result. But it might say, did you mean New York City t y? If like 45 people did that in one month, then that would show up for me. And then I don't want them to be getting, like, that's a bad experience.

Sure. They're eventually getting there, but I mean, I don't want them to have to spend that extra time. So I'm gonna make an automatic redirect from c Y T to c i t Y

[00:36:39] Jeremy: And, and. Maybe we should have talked about this a little earlier, but the, all the information on Leaguepedia is, it's about all of the different matches and players, um, who play League of Legends. so when you edit a, a page on Wikipedia, all of that information, or a lot of it I think is, is hand entered by, by people and on Leagueapedia, which has all this information about like what, how teams did in a tournament or, intricate stats about how a game went.

That seems like a lot of information for someone to be hand entering. So I was wondering how much of that information is somebody actually manually editing those things and how much is, is done automatically or programmatically.

[00:37:39] Megan: So it's mostly hand entered. We do have a little bit of it that's automated, via a couple scripts, but for the most part it's hand entered. But after being handed, entered into a couple of data pages, it gets propagated a lot of times based on a bunch of Lua modules and the cargo extension. So when I originally joined the Wiki back in 2014, it was hand entered.

Not just once, but probably, I don't know, seven times for tournament results and probably 10 or 12 times for roster changes. It was, it was a lot. And starting in 2017, I started rewriting all of the code so that it was entered exactly one time for everything. Tournament results get entered one time into a data page and roster changes get entered one time into a data page.

And, for roster changes, that was very difficult because, for a roster change that needs to update the team history on a player page, which goes, from a join to a leave and it needs to update the, the like roster, change portal for the off season, which goes from a leave to a join because it's showing like the deltas over the off season.

And it needs to update the current team in the, player's info box, which means that the current team has to be calculated from all of the deltas that have ever occurred in that player's history and it needs to update. Current rosters in the team pages, which means that the team page needs to know all of the current players who are currently on the team, which again, needs to know all of the deltas from all of history because all that you're entering is the roster changes.

You're not entering anyone's current team. So nowhere on the wiki does it ever store a current team anymore. It only stores the roster changes. So that was a lot of code to write and deciding even what was going to be entered was a lot because, all I knew was that I was going to single source of truth that somehow and I needed to decide what was I going to single source of truth.

So I decided, um, that I was going to be this Delta and then deciding what to do with that, uh, how to store it in a relational database. It was, it was a big project. and I didn't have a background as a developer either. so this was like, I don't know, this was like my third big project ever. So, that was, that was pretty intense.

but it was, it was a lot of fun. so it is hand entered but I feel like that's underselling it a little bit.

[00:40:52] Jeremy: Yeah, cuz I was initially, I was a little confused when you mentioned how somebody might need to enter the same information multiple times. But, if I understood correctly, it would be if somebody's changing which team they're on, they would have to update, for example, the player's page and say like, oh, this player is on this team now.

And then you would have to go to their old team and remove them from the roster there.

Go to the new team, add them to the roster there, And you can see where it would kind

[00:41:22] Megan: Yeah. And then there's the roster, there's the roster nav box, and there's like the old team, you have to say, like the next team. Cuz in the previous players list, like we show former team members from the old team and you have to say like the next team. Uh, so if they had like already left their old team, you'd have to say like, new team.

Yeah, there's a, there's a lot of, a lot of places.

[00:41:50] Jeremy: And so now what it sounds like is, I'm not sure this is exactly how it works, but if you go to any location that would need that information, which team is this player on? When you go to that page, for example, if you were to go to, uh, a teams page, then it would make a SQL query to figure out I guess who most recently had a, I forget what you called it, but like a join row maybe, or like a, they, they had the action of joining this team, and now, now there's a row in the database that says they did this.

[00:42:30] Megan: it actually looks at the ten-- so I have an in in between table called tenures. And so it looks at the tenures table instead of querying all the way through the joins and leaves table and doing like the whole list of deltas. yeah. So, and it's also cached so you, it doesn't do the SQL query every time that you load the page.

So the only time that the SQL queries actually happen is if you do a save on the page. And then otherwise the entire generated HTML of the page is actually cached on the server. So you're, you're not doing that many database queries every time you load the page, so don't worry about that. but there, there can actually be something like a hundred SQL queries sometimes, when you're, saving a page.

So it would be absolute murder if you were doing that every time you went to the page. But yeah, it works. Something like that.

[00:43:22] Jeremy: Okay, so this, this tenures table is, that's kind of like what's the current state of all these players and where they are, and then.

[00:43:33] Megan: Um, the, the tenures table, caches sort of, or I guess the tenure table captures is a better word than caches um, every, join to leave historically from every team. Um, and then I save that for two reasons. The first one is so that I don't have to recompute it, uh, when I'm doing the team's table, because I have to know both the current members and the former members.

And then the second reason is also that we have a public api and so people can query that.

if they're building tools, like a lot of people use the public api, uh, for various things. And, one person built like, sort of like a six degrees of Kevin Bacon except for League of Legends, uh, using our tenures tables.

So, part of the reason that that exists is so that uh, people can use it for whatever projects that they're doing.

Cause the join, the join leave table is like pretty unfriendly and I didn't wanna have to really document that for anyone to use. So I made tenures so that that was the table I could document for people to use.

[00:44:39] Jeremy: Yeah. That, that's interesting in that, yeah, when you provide an api, then there's so many different things people can do that even if your wiki didn't really need it, they can build their own apps or their own pages built on all this information you've aggregated.

[00:44:58] Megan: Yeah. It's nice because then when someone says like, oh, can you build this as a feature request? I can say no, but you can (laughs)

[00:45:05] Jeremy: Well you've, you've done the, the hard part for them (laughs)

[00:45:09] Megan: Yeah. exactly.

[00:45:11] Jeremy: So that's cool. Yeah. that's, that's interesting too about the, the caching because yeah, I guess when you think about a wiki, most of the people who are visiting it are just visiting it to see what's on there. So the, provided that they're not logged in and they don't need anything specific to them. Yeah, you should be able to cache the whole response. It sounds like.

[00:45:41] Megan: Yeah. yeah. Caching was actually a nightmare with this in this particular thing. the, the team roster changes, because, so cargo, which I mentioned a couple times is the database extension that we used. Um, and it's basically a SQL wrapper that like, doesn't port 80% of the features that SQL has. so you can create tables and you can query, but you can't make, uh, like sub-select queries.

So your queries have to be like very simple. which is good for like most users of MediaWiki because like the average MediaWiki user doesn't have that much coding experience, but if you do have coding experience, then you're like, what, what, what am I doing? I can't, I can't do anything. Um, but it's a very powerful tool, still compared to most of what you could do with Media Wiki without this, basically you're adding a database layer to your software stack, which I mean, I, I, that's what you're doing, (laughs)

Um, so you get a huge amount of power from adding cargo to a wiki. Um, in exchange it's, it's very performance. It's like, it's, it, it's resource heavy. uh, it hurts your performance a lot. and if you don't need it, then you shouldn't use it. But frequently you need it when you're doing, difficult or not necessarily difficult, but like intensive things.

Um, anytime that you need to pull data from one page to another, you wanna use something like that. Um,

So cargo, uh, one of the things that it doesn't do is it doesn't allow you to, uh, set a primary key easily. so you have to like, just like pretend that one row in the table is your primary key, basically. it internally automatically sets one, but it won't be static or it won't be the same every time that you rebuild the table because it rebuilds the table in a random order and it just uses an auto increment primary key.

So you set a row in the table to pretend to be your ran, to pretend to be your primary key. But editors don't know what, your editors don't understand anything about primary keys. And you wanna hide this from them completely. Like, you cannot tell an editor, protect this random number, please don't change this.

So you have to hide it completely. So if you're making your own auto increment, like an editor cannot know that that exists. Like this is back to when we were talking about like visual editor. This is like, one of the things about making the wiki safe for people is like not exposing them to the internals of like, anything scary like that.

So for example, if an editor accidentally reorders two rows and your roster change data like that has to not matter. Because that can't break the entire wiki. They, you can't make an editor like freak out because they just reordered two rows in, in the page. And you can't put like a scary notice somewhere saying, under no circumstances reorder two rows here.

Like, that's gonna scare people away. And you wanna be very welcoming and say like, it's impossible to break this page no matter how hard you tried. Don't worry. Anything you do, we can just fix it. Don't worry. But the thing is that everything's going to be cached. And so in particular, um, when I said I made that tenures table, one thing I did not wanna do was resave every single row from the join leave table.

So you had to join back to, sorry, I'm going to use, join in two different connotations. you had to join back to the join leave table in order to get like all of the auxiliary data, like all of the extra columns, like, I don't know, like role, date, team name and stuff. Because otherwise the tenures table would've had like 50 columns or something.

So I needed to store the fake primary key in the tenures table, but the tenures table is cached on the player page and the join leave table is on the data page, which means that I need to purge the cache on the player page anytime that someone edits the data on the data page. Which means that, so there's like some JavaScript that does that, but if someone like changes the order of the lines, then that primary key is going to change because I have an auto increment going on.

And so I had to like very, very carefully pick a primary key here so that it was literally impossible for any kind of order change to affect what the primary key was so that the cash on the player page wasn't going to be changed by anything that the editor did in unless they were going to then update the cash on that player page after making that change.

If that makes sense. So after an editor makes a change on the news page, they're going to press a button to update the cache on the player page, but they're only going to update the player page for the one line that they change on the news page. These, uh, primary keys had to be like super invariant for accidental row moves, or also later on, like entire moves of separating a bunch of these data pages into like separate subpages because the pages were getting too big and it was like timing out the server because there were too many stores to the database on a single page every time you save the page.

And anyway, it took me like five iterations of making the primary key like more and more specific to the single line because my auto increment was like originally including every single line I was auto incrementing and then I auto incremented only when that single player was was involved. And then I auto incremented only when that player and the team was involved.

And then I reset the auto increment for that date. So, and it was just got like more and more convoluted what my primary key was. It was, it was a mess.

Anyway, this is just like another thing when you're working with volunteers who don't know what's going on and they're editing the page and they can contribute content, you have to code for the editor and not code for like minimizing complexity,

The editor's experience matters more than the cleanliness of your code base, and you just end up with these like absolute messes that make no sense whatsoever because the editor's experience matters and you always have to code to the editor. And Media Wiki is all about community, and the editor just becomes part of the software and part of the consideration of your code base, and it's very, very different from any other kind of development because they're like, the UX is just built so deeply into how you're developing.

[00:53:33] Jeremy: if I am following correctly, when I, when I think of using SQL when you were first talking about cargo and you were talking about how you make your own tables, and I'm envisioning the, the columns and the rows and, it's very common for the primary key to either be auto incrementing or some kind of GUID

But then if I understood correctly, I think what you were saying is that anytime an editor makes changes to the data, it regenerates the whole table. Is that did I get that right?

[00:54:11] Megan: It regenerates all of the rows on that page.

[00:54:14] Jeremy: and when you talk about this, these

data pages, there's some kind of media wiki or cargo specific markup where people are filling in what is going to go into the rows. And the actual primary key that's in MySQL is not exposed anywhere when they're editing the data.

[00:54:42] Megan: That's right

[00:54:44] Jeremy: And so when you're talking about trying to come up with a primary key, um, I'm trying to, I guess I'm trying to picture

[00:54:57] Megan: So usually I do page name underscore an auto increment. But then if people can rearrange the rows which they do because they wanna get the rows chronological, but some people just put it at the top of the page and then other people are like, oh my God, it's not chronological. And then they fix it and then other people are like, oh my God, you messed up the time zone.

And then they rearrange it again. Then, I mean, normally I wouldn't care because I don't really care like what the primary key is. I just care that it exists. But then because I have it cached on these player pages, I really, really do care what the primary key is. And because I need the primary key to actually agree with what it is on the data page, because I'm actually joining these things together.

and people aren't going to be updating the cache on the player page if they don't think that they edited the row because rearranging isn't actually editing and people aren't going to realize that. And again, this is burden of knowledge. People can't, I can't make them know that because they have to feel safe to make any edits.

It's bad enough that they have to know that they have to click this button to update the cache after making an edit in the first place. so, the auto increment isn't enough, so it has to be like an auto increment, but only within the set of rows that incorporate that one player. And then rearranging is pretty safe because they'd have to rearrange two pieces of news, including the same player.

And that's really unlikely to happen. It's really unlikely that someone's going to flip the order of two pieces of news that involve the same player without realizing that they're actually are editing that single player except maybe they are. So then I include the team in that also. So they'd have to rearrange two pieces of news, including the same player and the same team.

And that's like unlikely to happen in the first place. And then like, maybe a mistake happens like once a year. And at the end of the day, the thing that really saves us is that we're a wiki. We're not an official source. And so if we have a mistake once a year, like no one cares really. So we're not going for like five nines or anything.

We're going for like, you know, two (laughs) . Um, so

[00:57:28] Jeremy: so

[00:57:28] Megan: We were having like mistakes constantly until I added player and team and date to the set of things that I was auto incrementing against. and once I got all of those, it was pretty stable.

[00:57:42] Jeremy: And for the caching part, so when you're making a cargo query or a SQL query on one page and it needs to join on or get data from another page, it goes to this cache that you have instead of going directly to the actual table in the database. And the only way to get the right data is for the editor to click this button on the website that tells it to update the cache did I get that right?

[00:58:23] Megan: Not quite. So it, well, or Yes, you did sort of, it goes to the actual table. The issue here is that, the table was last updated, the last time that a page was saved. And the last time the data got saved was the last time that the page that contains the parser function that generates those rows got saved.

So, let me say that again. So, some of the data is being saved from the data page where the users manually enter it, and that's fine because the only time that gets updated is when the users manually enter it and then the page gets saved. But then these tenures tables are stored by my lua code on the player pages, and those aren't going to get updated unless the player page gets blank edited or null edited, or a save action happens from the player page.

And so the way to make a, an edit happen from the player page is either to manually go there and click edit, and then click save, which is called a blank edit because. Blank edited, you didn't do anything but you pressed save or to use my JavaScript gadget, which is clicking a button from the data page that just basically does that for you using the api.

And then that's going to update the table and then the database table, because that's where the, the cargo parser function is that writes to the database and updates the tables there. with the information, Hey, the primary key changed, because that's where the parser function is physically located in the wiki because one of them is on the data page and one of them is on the player page.

So you get this disconnect in the cache where it's on two different pages and so you have to press a save action in both of them before the table is consistent again.

[01:00:31] Jeremy: Okay. It be, it's, so this is really all about the tenure table, which the user will never mod or the editor will never modify directly. You need your code running on the data page and the player's page to run, to update the The tenure table?

[01:00:55] Megan: Yeah, exactly.

[01:00:57] Jeremy: yeah, it's totally hidden that this exists to the editor, but it's something that, that you as the person who put this all together, um, have to always be aware of, yeah.

[01:01:11] Megan: Right. So there was just so many things like this, where you just had to press this one button. I call it refresh overview because originally it was on a tournament page and you had to press, the refresh overview button to purge the cache on the overview page of the tournament. after editing the data and you would refresh, overview, to deal with this cache lag.

And everyone knew you have to refresh overview, otherwise none of your data entry is gonna like, be worth anything because it's not, the cache is just gonna lag. but every editor learned, like if there's a refresh overview button, make sure you press the refresh overview button, , otherwise nothing's gonna happen.

Um, and there is just like tons of these littered across the Wiki. and like to most people, it just like, looks like a simple little button, but like so many things happen when you press this button.

so it is, it is very important.

[01:02:10] Jeremy: Are there, no ways inside of media wiki to if somebody edits one page, for example, to force it to go and, do, I forget what you called it, like a blank save or blank edit on another page?

[01:02:27] Megan: So that wouldn't even really work because, we had 11,000 player pages. And you don't know which one the user just edited. so it, it's unclear to MediaWiki what just happened when the user just edited some part of the data page. and like the whole point here is that I can't even blank edit every single player page that the data page links to because the data page probably links to, I don't know, 200 different player pages.

So I wanna link, I wanna blank it like the five that this one news line links to. so I do that, through like HTML attributes, in the JavaScript,

[01:03:14] Jeremy: Oh, so that's why you're using JavaScript so that you can tell what the person edited because there isn't really a way to know natively in, in MediaWiki. what just changed?

[01:03:30] Megan: there's like a diff so I could, like, MediaWiki knows the characters that got changed, but it doesn't really know like semantically what happened. So it doesn't know, like, oh, a link to this just got edited and especially because, I mean it's like templates that got edited, not really like the final HTML or anything.

So Media Wiki has no idea what's going on. so yeah, so the JavaScript, uh, looks at the HTML attributes and then runs a couple API queries, and then the blank edits happen and then a couple purges after that so that the cache gets purged after the blank edit.

[01:04:08] Jeremy: Yeah. So it, it seems like on these Wiki pages, you have the html, you have the CSS you have the ability to describe these data pages, which I, I guess in the end, end up being rows in in SQL. And then finally you have JavaScript. So it kind of seems like you can do almost everything in the context of a a Wiki page.

You have so many, so

many of these tools at your, at your disposal.

[01:04:45] Megan: Yeah. Except write es6 code.

[01:04:48] Jeremy: Oh, still, still only es5.

[01:04:52] Megan: Yeah,

[01:04:52] Jeremy: Oh no. do, do you know if that's something that they are considering changing or

[01:05:01] Megan: There's a Phabricator ticket open.

[01:05:05] Jeremy: How, um, how, how many years?

[01:05:06] Megan: It has a lot of comments, oh a lot of years. I think it's since like 2014 or something

[01:05:14] Jeremy: Oh yeah. I, I guess the, the one maybe, well now now the browsers all, all support es6, but I, I guess one of the things, it sounds like media wiki, maybe side stepped is the whole, front end ecosystem in, in terms of node packages and build tools and things like that. is, is that right? It's basically you can write JavaScript and there, yeah,

[01:05:47] Megan: You can even write jQuery.

[01:05:49] Jeremy: Oh, okay. That's built in as well.

[01:05:52] Megan: Yeah .So I have to admit, like my, my front end knowledge is like a decade out of date or something because it's like what MediaWiki can do and there's like this entire ecosystem out there that I just like, don't have access to. And so I like barely know about. So I have this like side project that uses React that I've like, kind of sort of been working on.

And so like I know this tiny little bit of react and I'm like, why? Why doesn't MediaWiki do this?

Um, they are adding Vue support. So in theory I'll get to learn vue so that'll be fun.

[01:06:38] Jeremy: So I'm, I'm curious, just from the limited experience you've had, outside of,

MediaWiki, are, are there like specific things, uh, in your experience working with React where you're, you really wish you had in inside of Media Wiki?

[01:06:55] Megan: Well, really the big thing is like es6, like I really wish we could use arrow functions , like that would be fantastic. Being able to build components would be really nice. Yeah, we can't do that.

[01:07:09] Jeremy: I, I suppose you, you've touched a little bit on performance before, but I, I guess that's one thing about Wikis is that, putting what's happening in the back end, aside the, the front end experience of Wikis, they, they feel pretty consistent since they're generally mostly server rendered.

And the actual JavaScript is, is pretty light, at least from, from Wikis I've seen.

[01:07:40] Megan: Yeah. I mean you can add as much JavaScript as you want, so I guess it depends on what the users decide to do. But it's, it's definitely true that wikis tend to load faster than some websites that I've seen.

[01:07:54] Jeremy: Yeah, I mean, I guess when you think of a wiki, it's, you're there cuz you wanna get specific information and so the goal is not to necessarily reproduce like some crazy complex app or something. It's, It's, to get you the, the, information. Yeah.

[01:08:14] Megan: Yeah. No, that's actually one thing that I really like about Wikis also is that you don't have the pressure to make them look nice. I know that some people are gonna hear that and just like, totally cringe and be like, oh my God, what is she saying? ? Um, but it's actually really true. Like there's an aesthetic that Wikis and Media Wiki in particular have, and you kind of stick to that.

And within that aesthetic, I mean, you make them look as nice as you can. Um, and you certainly don't wanna like, make them deliberately ugly, but there's not a pressure to like go over the top with like marketing and branding and like, you know, you, you just make them look reasonably nice. And then the focus is on the information and the focus is on making the information as easy to understand as possible.

And a wiki that looks really nice is a wiki that's very understandable and very intuitive, and one where you. I mean, one, that the information is the joy and, you know, not, not the presentation, I guess. So it's like the presentation of the information instead of the presentation of the brand. so I, I really appreciate that about wikis.

[01:09:30] Jeremy: Yeah, that's a good point about the aesthetics in the sense of like, they have a certain look and yeah, maybe it's an authoritative look, , which, uh, is interesting cuz it's, like a, a wiki that I'll, I'll commonly go to for example, is there's the, the PC gaming Wiki. And when you look at how it's styled, it feels like very dated or it doesn't look like, I guess you could say normal webpages, but it's very much in line with what you expect a wiki to look like.

So it's, it's interesting how they have that, shared aesthetic, I guess.

[01:10:13] Megan: Yeah. yeah. No, I really like it. The Wiki experience,

[01:10:18] Jeremy: We, we kind of touched on this near the beginning, but sometimes when. I would see wikis and, and projects like Leaguepedia I would kind of wonder, you know, what's the decision between or behind it being a wiki versus something being like a custom CMS in, in the case of Leaguepedia but, you know, talking to you about how it's so, like wikis are structured so that people can contribute.

and then like you were saying, you have like this consistent look that brings the data to the user. Um, I actually, it gives me a better understanding of why so many people choose wikis as, as ways to present this information.

[01:11:07] Megan: Yeah, a a lot of people have asked me over the years why, why MediaWiki when it always feels like I'm jumping through so many hoops. Um, I mean, when I just described the caching thing to you, and that's just like one of, I don't know, dozens of struggles that I've had where, MediaWiki has gotten in the way of what I need to do.

Because really Leaguepedia is an entire software layer on top of MediaWiki, and so you might ask why. Why MediaWiki? Why not just build the software layer on top of something easier? And my answer is always, it's about the community. MediaWiki lends itself so well to community and people enjoy contributing to wikis and wikis. Wikis are just kind of synonymous with community, and they always have been. And Wikipedia sort of set the example when they launched, and it's sort of always been that way. And, you know, I feel like I'm a part of a community when I say a Wiki. And if it was just if it were a custom site that had the ability to contribute to it, you know, it just feels like it's not the same.

[01:12:33] Jeremy: I think just even seeing the edit button on Wikis is such a different experience than having the expectation, well, I guess in the case of Leaguepedia, you do have to create an account, but even without creating the account, you can still click edit and you can look at the source and you can see how all this information, or a lot of it, how it got filled in.

And I feel like it's kind of more similar to the earlier days of webpages where people could right click a site and click view source and then look at the HTML and the css, and kind of see how it was put together. versus, now with a lot of sites, the, the code has been minified or there's build tools involved so that when you look at view source on websites, it just looks crazy and you're not sure what's going on.

So I, I, I feel like wikis in some ways are, kind of closer to the, the spirit of, like the earlier H

T M L sites. Yeah.

[01:13:46] Megan: And the knowledge transfers too. If you've edit, if you've, if you've ever edited Wikipedia, then you know that like open bracket, open bracket, closed bracket. Closed bracket is how you link a page. and that knowledge transfers to admittedly maybe a little bit less so for Leaguepedia, since there, you need to know how all the templates work and there's not so much direct source editing.

it's mostly like clicking the CharInsert prefills. but there's still a lot of cross knowledge transfer, if you've edited one wiki and then change to editing another. And then it goes the other way too. If you edit Leaguepedia, then you want to go at it for the Zelda Wiki, that knowledge will transfer.

[01:14:38] Jeremy: And, and talking about the community and the editors. I, I imagine on Wikipedia, most of the people editing are volunteers. Is it the same with Leaguepedia in your experience?

[01:14:55] Megan: Um, yeah, so I was contracted, uh, or I was not contracted. My LLC was contract and then I subcontracted. Um, it changed a bit over the years, um, as people left. Uh, so at first I subcontracted quite a few people. Um, and then I guess, as you can imagine, as, there was a lot more data entry that had to be done at the start.

And less had to be done later on, as I, expanded the code base so that it was more a single source of truth, and less stuff had to be duplicated. And I guess it was, it probably became a lot more fun too, uh, when you didn't have to edit, enter the same thing multiple times. but, uh, a bunch of people, uh, moved on over the years.

and so by the end I was only subcontracting, three people. Um, and everyone else was volunteer.

[01:15:55] Jeremy: And and the people that you were subcontracting, that was for only data entry, or was that also for the actual code?

[01:16:05] Megan: No, that wasn't for data entry at all. Um, and actually that was for all of my wikis, uh, because I was. Managing like all of the eSports wikis. or one of them was for Call of Duty and Halo, uh, to manage those wikis. One of them was for, uh, just the Call of Duty Wiki. and then one of them was for Leaguepedia to do staff onboarding.

[01:16:28] Jeremy: okay. So this is, um, this is to help people contribute to all of these wikis. That's, that's what these, these, uh, subcontractors we're focusing on.

[01:16:41] Megan: Yeah,

[01:16:44] Jeremy: I guess that, that makes sense when we've been talking about the complexity, uh, what's behind Leaguepedia, but there's a lot that the editors, it sounds like, have to learn as well to be able to know basically where to go and how to fill everything out and Yeah.

[01:17:08] Megan: So basically, for the major leagues, in League of Legends, um, we required some onboarding before you could cover them because we wanted results entered within like, about one to four minutes. of the game centering, or sorry, of the games ending. Um, so that was like for North America, Korea, China, Europe, and for the, like for some regions, like the really minor ones, like second tier leagues in, like for example the national leagues in Europe, second tier or something, we kind of didn't really care if it was entered immediately.

And so anyone who wanted to enter could just enter, uh, information. So we did want the experience to be easy enough that people could figure it out on their own. and we didn't really, uh, require onboarding for that. There was like a gradation of how much onboarding we required. But typically we tried to give training as much as we could.

Um, it, it was sort of dependent on how fast people expected the results and how available someone was to provide training. so like for Latin America, there was like a lot of people who were available to provide trainings. So even like the more minor leagues, people got training there. for example, But yeah, it was, it was very collaborative.

and a lot of people, a lot of people got involved, so, yeah.

[01:18:50] Jeremy: And in terms of having this expectation of having the results in, in just a few minutes and things like that, is it, where are, are these people volunteers where they would volunteer for a slot and then there was just this expectation? Or how did that work?

work

[01:19:09] Megan: Yeah. So, um, a lot of people volunteered with us as resume experience to try and get jobs in eSports. Um, and some people just volunteered with us because they wanted to give back to the community because, we're like a really valuable resource for the community. And I mean, without volunteer contribution we wouldn't have existed.

So it was like understood that we needed people's help in order to continue existing. So some people, volunteered for that reason. Some people just found it fun to help out. so there's like a range of reasons to contribute.

[01:19:46] Jeremy: And, and you were talking about how there's some people who they, they really need this data in, in that short time span. you know, who, who are we talking about here? Are these like commentators? Are these journalists? I'm just curious who's, who's,

looking for this in such a short time span

[01:20:06] Megan: Well, fans would look for the data immediately. sometimes if we entered a wrong result, someone would like come into our discord and be like, Hey, the result of this is wrong. you know, within seconds of the wrong result going up. So we knew that people were like looking at the Wiki, like immediately.

But everyone used the data, commentators at Riot. journalists. Fans, yeah. like everyone is using it.

[01:20:33] Jeremy: and since it's so important to, like you're mentioning Riot or the tournament organizers, things like that. What kind of relationship do you have with them? Do they provide any kind of support or is it mostly just, it's something they just use

[01:20:54] Megan: I, so there is, um, I definitely talk to people at Riot pretty regularly. and we. we got like resources from them, so, they'd give us player photos to put up, and like answers to questions and stuff. but for the most part it was just something that they'd use.

[01:21:15] Jeremy: and, and so like now that unfortunately your, your contract wasn't renewed with Leaguepedia like where do you, I guess see the, the future of Leaguepedia but, but also all these other eSports wikis going, is this something that's gonna be more just community driven or, I'm, I guess I'm trying to understand, you know, how this, the gap gets filled.

[01:21:47] Megan: Yeah, I'm, I'm not sure. Um, they're doing an update to Media Wiki 1.39 next year. we'll see if stuff majorly breaks during that. probably no one's gonna be able to fix it if it does. Who knows? (laughs) um, yeah, I don't know. There's another site that hosts, uh, eSports wikis called Liquipedia

um, so it's possible that they'll adopt some of the smaller wikis. Um, I think it's pretty unlikely that they'll want to take Leaguepedia, um, just because it's too complicated of a wiki. but yeah, I, I, I don't know.

[01:22:31] Jeremy: it kind of feels like one of these things where I guess whoever is in charge of making these decisions may not fully understand the implications or, or what it takes to, to run such a, a large wiki. yeah, I guess it'll be interesting to, to see if it ends up being like you said, one, one big mess.

[01:22:58] Megan: Yeah. I got them through the 1.37 upgrade by submitting like three or four patches to cargo, during that time and discovering that the patches needed to be made prior to the upgrade happening. So, you know, I don't think that they're going to update cargo during the 1.39 upgrade and it's cargo changes that have the biggest disruption.

So they're probably safe from that. and, and I don't think 1.39 has any big parser changes. I think that's later, but yeah, there'll probably still be like a bunch of CSS changes and who knows if anyone's going to fix the follow up from that.

So, yeah, we'll see.

[01:23:46] Jeremy: Yeah, that's, um, that's kind of interesting to know too that, these upgrades to MediaWiki and, and to extensions like cargo, that they change so significantly that they require pull requests. Is that, like, is that pretty common in terms of when you do an upgrade of a MediaWiki that there there are these individual things you need to do and it's not just update package.

[01:24:18] Megan: well the cargo change was the first time that we had upgraded in like two and a half years or something. so that one in particular, I think it was expected that that one wasn't going to go so smoothly. generally updates go not that badly. I say with rising intonation, (laughs) , um, if you keep up to date with stuff, it's generally pretty okay.

Cargo is probably one of the less stable ones just because it's a relatively small contributor base, and so kind of crazy things happen sometimes. Um, Semantic Media Wiki is a lot more stable. Uh, but then the downside is that if you have a feature request for SMW it's harder to get pushed through.

But cargo still changes a lot. The big change with cargo, like the big problematic change with cargo was a tiny bug fix that just so happened to change every empty string value to nil in Lua,

You know, no big deal or anything, whatever.

[01:25:42] Jeremy: That, that's, uh, that's a good one right there.

[01:25:47] Megan: I mean,

I I don't know how no one noticed this for like a year and a half or something man,

It was a tiny bug fix.

[01:26:02] Jeremy: Mm.

[01:26:03] Megan: Like it was checked in as a bug fix and it really was a bug fix. I tracked down the guy who made the patch and I was like, I can't reproduce this bug. Can I just revert it? And he was like, I can't reproduce it either.

[01:26:21] Jeremy: Oh, wow. (laughs)

[01:26:23] Megan: And I was like, well, that's great. And I ended up just leaving it in, but then changing them back to empty string.

Um, when the extension was first released, null database values were sent to Lua as empty string due to a bug in the first place. Because null databases, null database values should just be nil in Lula. Like, come on here, . But they were sent as empty string.

And so for like five years, people were writing code, assuming that you would never get a nil value for anything that you requested from the database. So you can't make a breaking change like that without putting a config value that defaults to true.

[01:27:10] Jeremy: Yeah.

[01:27:11] Megan: So I added a legacy, nil value, legacy Lua, nil value as empty string config value or something, and, defaulted it to true and wrote in the documentation that it was recommended that you set it to false.

Or maybe I defaulted it to false. I, I don't remember what I set the default to, but I wrote in the documentation something about how you should, if possible, set this to false, but if you have a large code base, you probably need this . And then we set up Platform Ride to True, and that's the story of how I saved the shit out of our 1.37 upgrade this year.

[01:27:57] Jeremy: Oh yeah, that's, um, that's a rough one. Changing, changing people's data is very scary.

[01:28:05] Megan: Yeah, I mean, it was totally unintended. and I don't know how no one noticed this either. I mean, I guess the answer is that not very many people do the kind of stuff that I do working with Lua and Cargo in this much depth. but a fairly significant number of fandom Wikis do, and this would've just been an absolute disaster.

And the semi ironic thing is that, I, I have a wrapper that fixes the initial cargo bug where I detect every empty string value and then cast it to nil after I get my data from cargo. So I would've been completely unaffected by this. And my wiki was the primary testing wiki for cargo on the 1.37 patch. So we wouldn't have caught this, it would've gone to live

[01:28:56] Jeremy: Wow.

[01:28:58] Megan: So we got extremely lucky that I found out about this ahead of time prior to us QAing and fixed this bug

because it would've gone straight to live.

[01:29:10] Jeremy: that's wild yeah, it's just like kind of catastrophic, right? It's like, if it happens, I feel like whoever is managing the wikis is gonna be very confused. Like, why, why is everything broken? I don't, I don't understand.

[01:29:25] Megan: Right? And this is like so much broken stuff that it's like very difficult to track down what's going on. I actually had a lot of trouble figuring out what was wrong in the code base.

Causing this error. And I submitted an incorrect patch at first, and then the incorrect patch got merged, and then I had to like roll back the incorrect patch.

And then I got a merge conflict on the incorrect patch. And it, it was, it was bad. It took me three patches to get this right.

Um, But eventually, eventually I got there.

[01:30:02] Jeremy: Yeah. that's software, I guess ,

[01:30:06] Megan: Yeah.

[01:30:07] Jeremy: the, the, the thing you were trying to avoid all these years.

[01:30:10] Megan: Yeah,

[01:30:13] Jeremy: you're in it now.

[01:30:14] Megan: It really was, that was actually the reason that I went in, I got into the Wiki in the first place, um, and into e-sports. Uh, was that after Caltech, I wanted to like get away from STEM altogether. I was like, I've had enough of this. Caltech was too much, get me away, (laughs) .

And I wanted to do like event management or tournament organization or something.

And so I wanted to work in eSports. and that was like my life plan. And I wanted nothing to do with STEM and I didn't wanna work in software. I didn't wanna do math. I was a math major. I didn't wanna do math. I didn't wanna go to grad school. I wanted absolutely nothing to do with this. So that was my plan.

And somehow I stumbled and ended up in software.

[01:31:02] Jeremy: Well, at least you got the eSports part.

[01:31:05] Megan: Yeah, so that, that worked out. And really for the first couple of years I was doing like community management and social media and stuff.

Um, and I did stay away from software for about the first two years, so it lasted about two whole years.

[01:31:24] Jeremy: What ended up pulling you in?

[01:31:26] Megan: Um, actually, so when, when I signed back with Gamepedia, our developer just sort of disappeared and I was like, well, shit, I guess that's me now. (laughs)

So we had someone else writing all of our templates for a couple years, so I was able to just like make a lot of feature requests. and I'm very good at making feature requests.

If, if I ever have like, access to someone else who's writing code for me, I'm like, fantastic at just making a ton of like really minor feature requests, and just like taking off all of their time with like a billion tiny QA issues.

[01:32:09] Jeremy: You you are the backlog,

[01:32:12] Megan: Yeah, I really, um, I, there's another OSS project that I've been working on, um, which is a Discord bot and. We, our, our backlog just expands and expands and

[01:32:26] Jeremy: Oh yeah. You know what, I, I think I did look at that. I, I looked at the issues and, usually when you look at a, the issues for an open source project, it's, it's all these people using it, right? That are like, uh, there's this thing I want, but then I looked and it was all, it was all you. So I guess that's okay cuz you're, you're in the position to do something about it.

[01:32:47] Megan: The, the part that you don't know is that I'm like constantly begging other people to open tickets too.

[01:32:53] Jeremy: Really?

[01:32:55] Megan: Yeah. Like constantly. I'm like, guys, it can't just be me opening tickets all the time.

[01:33:04] Jeremy: Yeah. Yeah. If it was, if it was someone else's project, I would be like, oh, this is, uh, .

I don't know about this. But when it's your own, you know, okay. It's, it's basically like, um, it's like a roadmap I guess.

[01:33:20] Megan: Yeah. Some of them have been open for, for quite a long time, but actually a couple months ago we closed one that had been open since, I think like April, 2020.

[01:33:31] Jeremy: Oh, nice.

[01:33:32] Megan: That was quite an event.

[01:33:34] Jeremy: Yeah, it's open source, So you can do whatever you want, right. (laughs)

[01:33:41] Megan: We even have a couple good first issues that are actually good first issues.

[01:33:46] Jeremy: Yeah. Not, not getting any takers?

[01:33:49] Megan: No, we sometimes do. Yeah. I actually, we, so some of them are like semi-important features, but I like feel really bad if I ever do the good first issues myself because like somewhere else could do them. And so like, if it's like a one line ticket, I would just, I feel so much guilt for doing it myself.

[01:34:09] Jeremy: Oh, I see what you mean.

[01:34:10] Megan: I'm like, Yeah. so I just like, I can't do them. But then I'm like, oh, but this is really important. But then I'm like, oh, but we might get someone else who, and I just, I never know if I should just take the plunge and do it myself, so.

[01:34:22] Jeremy: yeah. No, that's, that's a good point. It's, it's like, like these opportunities, right. For people to, and it could, it could make a big difference for them. And then for you, it's like, I could do this in 10 minutes or whatever. ,

Uh, I, I guess it all depends on how annoyed you are by the thing not being there,

[01:34:43] Megan: Right. I know because my entire background is like community and getting new people to onboard and like the potential new contributor is worth like 10 times, like, The one PR that I can make. So I should just like absolutely leave it open for the next year.

[01:35:02] Jeremy: Yeah. Yeah, no, that's a, that's a good way of, of looking at it. I mean, I I think when you talk about open source or, or even wikis, that that sort of community aspect is, is so, so important, right? Because if it's just, if it's just one person, then I mean, it kind of, it lives or dies with the one person, right?

It, it's, it's so different when you actually get a bunch of people involved. And I think that's something like a lot of, a lot of projects struggle with

[01:35:38] Megan: Yeah. That's actually, as much as I'm like bitter about the fact that I was let go from my own project, I think the thing that I should, in a sense be the most proud of is that I grew my project to a place where that was able to happen in a sense. Like, I built this and I built it to a place where it was sustainable.

Although, we'll see how sustainable it was, (laughs) . but like I'm not needed for the day to day. and that means that like I successfully built a community.

[01:36:18] Jeremy: Yeah, no, you should be really proud about that because it's, it's not only like the, the code, right? Like over the years it sounds like you gradually made it easier and easier to contribute, but then also being able to get all these volunteers together and build a community on the discord and, and elsewhere.

Yeah, no, I think that's, I think that's really great to be able to, to do, do something like that.

[01:36:50] Megan: Thanks.

[01:36:53] Jeremy: I think that's, that's a good place to, to wrap up, but is there anything else you wanted to, to mention or do you want to tell people where to check out, uh, what you're up to?

[01:37:05] Megan: Yeah, I, I have a blog that's a little bit inactive for the past couple months, because I recently had surgery, but I, I've been saying for like five weeks that I will start, posting there again. So hopefully that happens soon. Uh, but it's river.me, and so you can check that out.

[01:37:27] Jeremy: Cool. Well, yeah, Megan, I just wanna say thanks for, for taking the time. This was, this was really interesting. the world of wikis is like this, it's like a really big part of the internet that, um, I use wikis, but I, I've never really understood kind of what's going on in, in terms of the actual technology and the community. so so thank you for, for sharing that.

[01:37:53] Megan: Yeah. Thanks so much for having me.

Victor Adossi on Yak Shaving

Mon, 02 Jan 2023 06:07:12 -0000

Victor is a software consultant in Tokyo who describes himself as a yak shaver. He writes on his blog at vadosware and curates Awesome F/OSS, a mailing list of open source products. He's also a contributor to the Open Core Ventures blog.

Before our conversation Victor wrote a structured summary of how he works on projects. I recommend checking that out in addition to the episode.

Topics covered:

Most people should use Dokku or CapRover
But he uses Kubernetes anyways
Hosting a Database in Kubernetes
Learning technology
You don't really know a thing until something goes wrong
History of Frontend Development
Context from lower layers of the stack and historical projects
Good project pages have comparisons to other products
Choosing technologies
Language choice affects maintainability
Knowing an ecosystem
Victor's preferred stack
Technology bake offs
Posting findings means you get free corrections
Why people use medium instead of personal sites

Victor

VADOSWARE - Blog
How Victor works on Projects - Companion post for this episode
Awesome FOSS - Curated list of OSS projects
NimbusWS - Hosted OSS built on top of budget cloud providers
Unvalidated Ideas - Startup ideas for side project inspiration
PodcastSaver - Podcast index that allows you to choose Postgres or MeiliSearch and compare performance and results of each

Victor's preferred stack

Docker - Containers
Kubernetes - Container provisioning (Though at the beginning of the episode he suggests Dokku for single server or CapRover for multiple)
TypeScript - JavaScript with syntax for types. Victor's default choice.
Rust - Language he uses if doing embedded work, performance is critical, or more correctness is desired
Haskell - Language he uses if correctness and type system is the most important for the project
Postgresql - General purpose database that's good enough for most use cases including full text search.
KeyDB - Redis compatible database for caching. Acquired by Snap and then made open source. Victor uses it over Redis because it is multi threaded and supports flash storage without a Redis Enterprise license.
Pulumi - Provision infrastructure with the languages you're already using instead of a specialized one or YAML
Svelte and SvelteKit - Preferred frontend stack. Previously used Nuxt.

Search engines

Postgres Full Text Search vs the rest
Optimizing Postgres Text Search with Trigrams
OpenSearch - Amazon's fork of Elasticsearch
typesense
meilisearch
sonic
Quickwit

JavaScript build tools

JavaScript frameworks

Frameworks built on top of frameworks

Next - React
Nuxt - Vue
SvelteKit - Svelte
Astro - Multiple

Historical JavaScript tools and frameworks

Underscore
jQuery
MooTools
Backbone
AngularJS
Knockout
Aurelia
GWT
Bower - Frontend package manager
Grunt - Task runner
Gulp - Task runner

Transcript

You can help edit this transcript on GitHub.

[00:00:00] Jeremy: This episode, I talk to Victor Adossi who describes himself as a yak shaver. Someone who likes trying a whole bunch of different technologies, seeing the different options. We talk about what he uses, the evolution of front end development, and his various projects.

Talking to just different people it's always good to get where they're coming from because something that works for Google at their scale is going to be different than what you're doing with one of your smaller projects.

[00:00:31] Victor: Yeah, the context. Of course in direct conflict with that statement, I definitely use Google technology despite not needing to at all right? Like, you know, 99% of people who are doing like people like to call it indiehacking or building small products could probably get by with just Dokku. If you know Dokku or like CapRover. Are two projects that'll be like, Oh, you can just push your code here, we'll build it up like a little mini Heroku PaaS thing and just go on one big server, right? Like 99% of the people could just use that. But of course I'm not doing that. So I'm a bit of a hypocrite in that sense.

I know what I should be doing, but I'm not doing that. I am writing a Kubernetes cluster with like five nodes for no reason. Uh, yeah, I dunno, people don't normally count the controllers.

[00:01:24] Jeremy: Dokku and CapRover, I think those are where it's supposed to create a heroku like experience I think it's based off of the heroku buildpacks right? At least Dokku is?

[00:01:36] Victor: Yeah Buildpacks has actually been spun out into like a community thing so like pivotal and heroku, it's like buildpacks.io, they're trying to build a wider standard around it so that more people can get involved.

And buildpacks are actually obviously fantastic as a technology and as a a process piece. There's not much else like them and you know, that's obvious from like Heroku's success and everything. I know Dokku uses that. I don't know that Caprover does, but I haven't, I haven't really run Caprover that much.

They, they probably do. Like at this point if you're going to support building from code, it seems silly to try and build your own buildpacks. Cause that's what you will do, eventually. So you might as well use what's there.

Anyway, this is like just getting to like my personal opinions at this point, but like, if you think containers are a bad idea in 2022, You're wrong, you should, you should stop. Like you should, you should stop. Think about it. I mean, obviously there's not, um, I got a really great question at an interview once, which is, where are containers a bad idea?

That's probably one of the best like recent interview questions I've ever gotten cause I was like, Oh yeah, I mean, like, you can't, it can't be perfect everywhere, right? Nothing's perfect everywhere. So it's like, where is it? Uh, and of course the answer was networking, right? (unintelligible)

So if you need absolute performance, but like for just about everything else. Containers are kind of it at this point. Like, time has born it out, I think. So yeah, I always just like bias at taking containers at this point. So I'm probably more of a CapRover person than a Dokku person, even though I have not used, I don't use CapRover.

[00:03:09] Jeremy: Well, like something that I've heard with containers, and maybe it's changed recently, but, but something that was kind of holdout was when people would host a database sometimes they would oh we just don't wanna put this in a container and I wonder if like that matches with your thinking or if things have changed.

[00:03:27] Victor: I am not a database administrator right like I read postgres docs and I read the, uh, the Postgres documentation, and I think I know a bit about postgres but I don't commit right like so and I also haven't, like, oh, managed X terabytes on one server that you are making sure never goes down kind of deal.

But the stickiness for me, at least from when I've run, So I've done a lot of tests with like ZFS and Postgres and like, um, and also like just trying to figure out, and I run Postgres in Kubernetes of course, like on my cluster and a lot of the stuff I found around is, is like fiddly kernel things like sort of base kernel settings that you need to have set.

Like, you know, stuff like should you be using transparent huge pages, like stuff like that. But once you have that settled. Containers are just processes with name spacing and resource control, right? Like, that's it. there are some other ins and outs, but for the most part, if you're fine running a process, so people ran processes, right?

And they were just completely like unprotected. Then people made users for the processes and they limited the users and ran the processes, right? Then the next step is now you can run a process and then do the limiting the name spaces in cgroups dynamically. Like there, there's, there's sort of not a humongous difference, unless you're hitting something very specific.

Uh, but yeah, databases have been a point of contention, but I think, Kelsey Hightower had that tweet yeah. That was like, um, don't run databases in Kubernetes. And I think he called it back.

[00:04:56] Victor: I don't know, but I, I know that was uh, was one of those things that people were really unsure about at first, but then after people sort of like felt it out, they were like, Oh, it's actually fine. Yeah.

[00:05:06] Jeremy: Yeah I vaguely remember one of the concerns having to do with persistent storage. Like there were challenges with Kubernetes and needing to keep that storage around and I don't know if that's changed yeah or if that's still a concern.

[00:05:18] Victor: Uh, I'd say that definitely has changed. Uh, and it was, it was a concern, depending on where you were. Mostly people who are running AKS or EKS or you know, all those other managed Kubernetes, they're just using EBS or like whatever storage provider is like offering for storage.

Most of those people don't actually have that much of a problem with, storage in general.

Now, high performance storage is obviously different, right? So like, so you'll, you're gonna have to start doing manual, like local volume management and stuff like that. it was a problem, because obviously CSI (Kubernetes Container Storage Interface) didn't exist for some period of time, and like there was, it was hard to know what to do for if you were just running a Kubernetes cluster. I think a lot of people were just using local, first of all, local didn't even exist for a bit.

Um, they were just using host path, right? And just like, Oh, it's on the disk somewhere. Where do we, we have to go get it right? Or we have to like, sort of manage that. So that was something most people weren't ready for, especially if you were just, if you weren't like sort of a, a, a traditional sysadmin and used to doing that stuff.

And then of course local volumes came out, but I think they still had to be, um, pre-provisioned. So that's sysadmin stuff that most people, you know, maybe aren't, aren't necessarily ready for. Uh, and then most of the general solutions were slow. So like, I used Longhorn (https://longhorn.io) for a long time and Longhorn, Longhorn's great. And super easy to set up, but it can be slower and you can have some, like, delays in mount time. it wasn't ideal for, for most people.

So yeah, I, overall it's true. Databases, Databases in Kubernetes were kind of fraught with peril for a while, but it wasn't for the reason that, it wasn't for the fundamental reason that Kubernetes was just wrong or like, it wasn't the reason most people think of, which is just like, Oh, you're gonna break your database.

It's more like, running a database is hard and Kubernetes hasn't solved all the hard problems. Like, cuz that's what Kubernetes does. It basically solves a lot of problems in a very generic way. Right. So it just hadn't solved all those problems yet at this point. I think it's got decent answers on a lot of them.

So I, I mean, I don't know. I I do it. Don't, don't take what I'm saying to your, you know, PM meeting or your standup meeting, uh, anyone who's listening. But it's more like if you could solve the problems with databases in the sense before. You could probably solve 'em on Kubernetes now with a good understanding of Kubernetes.

Cause at the end of the day, it's all the same stuff. Just Kubernetes makes it a little easier to, uh, do it dynamically.

[00:07:50] Jeremy: It sounds like you could do it before, but some of the, I guess the tools or the ways of doing persistent storage were not quite there yet, or they were difficult to use. And so that was why people at the start were like, Okay, maybe it's not a good idea, but, now maybe there's some established practices for how you should run a database in Kubernetes.

And I, I suppose the other aspect too is that, like you were saying, Kubernetes is its own thing. You gotta learn Kubernetes and all its intricacies. And then running a database is also its own challenge. So if you stack the two of them together and, and the path was not really clear then maybe at the start it wasn't the best idea. Um, uh, if somebody was going to try it out now, was there like a specific resource you looked at or a specific path to where like okay this is is how I'm going to do it.

[00:08:55] Victor: I'll just say what I normally recommend to everybody.

Cause it depends on which path you wanna go right? If you wanna go down like running a database path first and figure that out, fill out that skill tree. Like go read the Postgres docs.

Well, first of all, use Postgres. That's the first tip there. But like, read those documents. And obviously you don't have to understand everything. You won't understand everything. But knowing the big pieces and sort of letting your brain see the mention of like a whole bunch of things, like what is toast?

Oh, you can do compression on columns. Like, you can do some, some things concurrently. Um, you know, what ALTER TABLE looks like. You get all that stuff kind of in your head. Um, and then I personally really believe in sort of learning by building and just like iterating. you won't get it right the first time. It's just like, it's not gonna happen. You're get, you can, you can get better the first time, right? By being really prepared and like, and leave yourself lots of outs, but you kind of have to like, get it out there. Do do your best to make sure that you can't fail, uh, catastrophically, right?

So this is like, goes back to that decision to like use ZFS as the bottom of this I'm just like, All right, well, I, I'm not a file systems expert, but if I. I could delegate some of that, you know, some of that, I can get some of that knowledge from someone else. Um, and I can make it easier for me to not fail catastrophically.

For the database side, actually read documentation on Postgres or the whatever database you're going to use, make sure you at least understand that. Then start running it like locally or whatever. Again, Docker use, use Docker locally.

It's, it's, it's fine. and then, you know, sort of graduate to running sort of more progressively, more complicated versions. what I would say for the Kubernetes side is actually similar. the Kubernetes docs are really good. they're very large. but they're good.

So you can actually go through and know all the, like, workload, workload resources, know, like what a config map is, what a secret is, right? Like what etcd is doing in this whole situation. you know, what a kublet is versus an API server, right? Like the, the general stuff, like if you go through all that, you should have like a whole bunch of ideas at least floating around in your head. And then once you try and start setting up a server, they will all start to pop up again, right? And they'll all start to like, you, like, Oh, okay, I need a CNI (Container Networking) plugin because something needs to make the services available, right? Or something needs to power the ingress, right? Like, if I wanna be able to get traffic, I need an ingress object.

But what listens, what does that, what makes that ingress object do anything? Oh, it's an ingress controller. nginx, you know, almost everyone's heard of nginx, so they're like, okay. Um, nginx, has an ingress control. Actually there's, there used to be two, I assume there's still two, but there's like one that's maintained by Kubernetes, one that's maintained by nginx, the company or whatever.

I use traefik, it's fantastic. but yeah, so I think those things kind of fall out and that is almost always my first way to explain it and to start building. And tinkering iteratively. So like, read the documentation, get a good first grasp of it, and then start building yourself because you'll, you'll get way more questions that way.

Like, you'll ask way more questions, you won't be able to make progress. Uh, and then of course you can, you know, hop into slacks or like start looking around and, and searching on the internet. oh, one of the things that really helped me out early learning Kubernetes was, Kelsey Hightower's, um, learn Kubernetes the hard way. I'm also a big believer in doing things the hard way, at least knowing what you're choosing to not know, right? distributing file system, Deltas, right? Or like changes to a file system over the network is not a new problem. Other people have solved it. There's a lot of complexity there. but if you at least know the sort of surface level of what the thing does and what it's supposed to do and how it's supposed to do it, you can make a decision on, Oh, how deep am I going to go?

Right? To prevent yourself from like, making a mistake or going too deep in the rabbit hole. If you have an idea of the sort of ecosystem and especially like, Oh, here, like the basics of how I can use this thing, that's generally very good. And doing things the hard way is a great way to get a, a feel for that, right?

Cause if you take some chunk and like, you know, the first level of doing things the hard way, uh, or, you know, Kelsey Hightower's guide is like, get a machine, right? Like, so, like, if you somehow were like, Oh, I wanna run a Kubernetes cluster. but, you know, I don't want use necessarily EKS and you wanna learn it the hard way.

You have to go get a machine, right? If you, if you're not familiar, if you run on Heroku the whole time, like you didn't manage your own machines, you gotta go like, figure out EC2, right? Or, I personally use, hetzner I love hetzner, so you have to go figure out hetzner, digital ocean, whatever.

Right. And then the next thing's like, you know, the guide's changed a lot, and I haven't, I haven't looked at it in like, in years, actually a while since I, since I've sort of been, I guess living it, but it's, it's like generate certificates, right? So if you've never dealt with SSL and like, sort of like, or I should say TLS uh, and generating certificates and how that whole dance works, right?

Which is fascinating because it's like, oh, right, nothing's secure on the internet, except that we distribute root certificates on computers that are deployed in every OS, right? Like, that's a sort of fundamental understanding you may not go deep enough to realize, but if you are fascinated by it, trying to do it manually would lead you down that path.

You'd be like, Oh, what, like what is this thing? What is a CSR? Like, why, who is signing my request? Right? And it's like, why do we trust those people? Right? And it's like, you know, that kind of thing comes out and I feel like you can only get there from trying to do it, you know, answering the questions you can.

Right. And again, it takes some judgment to know when you should not go down a rabbit hole. uh, and then iterating. of course there are people who are excellent at explaining. you can find some resources that are shortcuts. But, uh, I think particularly my bread and butter has been just to try and do it the hard way.

Avoid pitfalls or like rabbit holes when you can. But know that the rabbit hole is there, and then keep going. And sometimes if something's just too hard, you're not gonna get it the first time. Like maybe you'll have to wait like another three months, you'll try again and you'll know more sort of ambiently about everything else.

You get a little further that time. that's how I feel about that. Anyway.

[00:15:06] Jeremy: That makes sense to me. I think sometimes when people take on a project, they try to learn too many things at the same time. I, I think the example of Kubernetes and Postgres is pretty good example, where if you're not familiar with how do I install Postgres on bare metal or a vm, trying to make sense of that while you're trying to into is probably gonna be pretty difficult.

So, so splitting them up and learning them individually, that makes a lot of sense to me. And the whole deciding how deep you wanna go. That's interesting too, because I think that's very specific to the person right because sometimes you wanna go a little deeper because otherwise you don't understand how the two things connect together.

But other times it's just like with the example with certificates, some people they may go like, I just put in let's encrypt it gives me my cert I don't care right then, and then, and some people they wanna know like okay how does the whole certificate infrastructure work which I think is interesting, depending on who you are, maybe you go ahh maybe it doesn't really matter right.

[00:16:23] Victor: Yeah, and, you know, shout out to Let's Encrypt . It's, it's amazing, right? think Singlehandedly the most, most of the deployment of HTTPS that happens these days, right? so many so many of like internet providers and uh, sort of service providers will use it right?

Under the covers. Like, Hey, we've got you free SSL through Let's Encrypt, right? Like, kind of like under the, under the covers. which is awesome. And they, and they do it. So if you're listening to this, donate to them. I've done it. So now that, now the pressure is on whoever's listening, but yeah, and, and I, I wanna say I am that person as well, right?

Like, I use, Cert Manager on my cluster, right? So I'm just like, I don't wanna think about it, but I, you know, but I, I feel like I thought about it one time. I have a decent grasp. If something changes, then I guess I have to dive back in. I think it, you've heard the, um, innovation tokens idea, right?

I can't remember the site. It's like, um, do, like do boring tech or something.com (https://boringtechnology.club/) . Like it shows up on sort of hacker news from time to time, essentially. But it's like, you know, you have a certain amount of tokens and sort of, uh, we'll call them tokens, but tolerance for complexity or tolerance for new, new ideas or new ways of doing things, new processes.

Uh, and you spend those as you build any project, right? you can be devastatingly effective by just sticking to the stack, you know, and not introducing anything new, even if it's bad, right? and there's nothing wrong with LAMP stack, I don't wanna annoy anybody, but like if you, if you're running LAMP or if you run on a hostgator, right?

Like, if you run on so, you know, some, some service that's really old but really works for you isn't, you know, too terribly insecure or like, has the features you need, don't learn Kubernetes then, right? Especially if you wanna go fast. cuz you, you're spending tokens, right? You're spending, essentially brain power, right?

On learning whatever other thing. So, but yeah, like going back to that, databases versus databases on Kubernetes thing, you should probably know one of those before you, like, if you're gonna do that, do that thing. You either know Kubernetes and you like, at least feel comfortable, you know, knowing Kubernetes extremely difficult obviously, but you feel comfortable and you feel like you can debug.

Little bit of a tangent, but maybe that's even a better, sort of watermark if you know how to debug a thing. If, if it's gone wrong, maybe one or five or 10 or 20 times and you've gotten out. Not without documentation, of course, cuz well, if you did, you're superhuman.

But, um, but you've been able to sort of feel your way out, right? Like, Oh, this has gone wrong and you have enough of a model of the system in your head to be like, these are the three places that maybe have something wrong with them. Uh, and then like, oh, and then of course it's just like, you know, a mad dash to kind of like, find, find the thing that's wrong.

You should have confidence about probably one of those things before you try and do both when it's like, you know, complex things like databases and distributed systems management, uh, and orchestration.

[00:19:18] Jeremy: That's, that's so true in, in terms of you are comfortable enough being able to debug a problem because it's, I think when you are learning about something, a lot of times you start with some kind of guide or some kind of tutorial and you follow the steps. And if it all works, then great.

Right? But I think it's such a large leap from that to something went wrong and I have to figure it out. Right. Whether it's something's not right in my Dockerfile or my postgres instance uh, the queries are timing out. so many things that could go wrong, that is the moment where you're forced to figure out, okay, what do I really know about this not thing?

[00:20:10] Victor: Exactly. Yeah. Like the, the rubber's hitting the road it's uh you know the car's about to crash or has already crashed like if I open the bonnet, do I know what's happening right or am I just looking at (unintelligible).

And that's, it's, I feel sort a little sorry or sad for, for devs that start today because there's so much. Complexity that's been built up. And a lot of it has a point, but you need to kind of have seen the before to understand the point, right? So I like, I like to use front end as an example, right? Like the front end ecosystem is crazy, and it has been crazy for a very long time, but the steps are actually usually logical, right?

Like, so like you start with, you know, HTML, CSS and JavaScript, just plain, right? And like, and you can actually go in lots of directions. Like HTML has its own thing. CSS has its own sort of evolution sort of thing. But if we look at JavaScript, you're like, you're just writing JavaScript on every page, right?

And like, just like putting in script tags and putting in whatever, and it's, you get spaghetti, you get spaghetti, you start like writing, copying the same function on multiple pages, right? You just, it, it's not good. So then people, people make jquery, right? And now, now you've got like a, a bundled set of like good, good defaults that you can, you can go for, right?

And then like, you know, libraries like underscore come out for like, sort of like not dom related stuff that you do want, you do want everywhere. and then people go from there and they go to like backbone or whatever. it's because Jquery sort of also becomes spaghetti at some point and it becomes hard to manage and people are like, Okay, we need to sort of like encapsulate this stuff somehow, right?

And like the new tools or whatever is around at the same timeframe. And you, you, you like backbone views for example. and you have people who are kind of like, ah, but that's not really good. It's getting kind of slow.

Uh, and then you have, MVC stuff comes out, right? Like Angular comes out and it's like, okay, we're, we're gonna do this thing called dirty checking, and it's gonna be, it's gonna be faster and it's gonna be like, it's gonna be less sort of spaghetti and it's like a little bit more structured. And now you have sort of like the rails paradigm, but on the front end, and it takes people to get a while to get adjusted to that, but then that gets too heavy, right?

And then dirty checking is realized to be a mistake. And then, you get stuff like MVVM, right? So you get knockout, like knockout js and you got like Durandal, and like some, some other like sort of front end technologies that come up to address that problem. Uh, and then after that, like, you know, it just keeps going, right?

Like, and if you come in at the very end, you're just like, What is happening? Right? Like if it, if it, if someone doesn't sort of boil down the complexity and reduce it a little bit, you, you're just like, why, why do we do this like this? Right? and sometimes there's no good reason.

Sometimes the complexity is just like, is unnecessary, but having the steps helps you explain it, uh, or helps you understand how you got there. and, and so I feel like that is something younger people or, or newer devs don't necessarily get a chance to see. Cause it just, it would take, it would take very long right? And if you're like a new dev, let's say you jumped into like a coding bootcamp. I mean, I've got opinions on coding boot camps, but you know, it's just like, let's say you jumped into one and you, you came out, you, you made it. It's just, there's too much to know. sure, you could probably do like HTML in one month.

Well, okay, let's say like two weeks or whatever, right? If you were, if you're literally brand new, two weeks of like concerted effort almost, you know, class level, you know, work days right on, on html, you're probably decently comfortable with it. Very comfortable. CSS, a little harder because this is where things get hard.

Cause if you, if you give two weeks for, for HTML, CSS is harder than HTML kind of, right? Because the interactions are way more varied. Right? Like, and, and maybe it's one of those things where you just, like, you, you get somewhat comfortable and then just like know that in the future you're gonna see something you don't understand and have to figure it out. Uh, but then JavaScript, like, how many months do you give JavaScript? Because if you go through that first like, sort of progression that I, I I, I, I mentioned everyone would have a perfect sort of, not perfect but good understanding of the pieces, right? Like, why did we start transpiling at all? Right? Like, uh, or why did you know, why did we adopt libraries?

Like why did Bower exist? No one talks about Bower anymore, obviously, but like, Bower was like a way to distribute front end only packages, right? Um, what is it? Um, Uh, yes, there's grunt. There's like the whole build system thing, right? Once, once we decide we're gonna, we're gonna do stuff to files before we, before we push. So there's grunt, there's, uh, gulp, which is like grunt, but like, Oh, we're gonna do it all in memory. We're gonna pipe, we're gonna use this pipes thing to make sure everything goes fast. then there's like, of course that leads like the insanity that's webpack. And then there's like parcel, which did better.

There's vite there's like, there's all this, there's this progression, but how many months would it take to know that progression? It, it's too long. So they end up just like, Hey, you're gonna learn react. Which is the right thing because it's like, that's what people hire for, right? But then you're gonna be in react and be like, What's webpack, right?

And it's like, but you can't go down. You can't, you don't have the time. You, you can't sort of approach that problem from the other direction where you, which would give you better understanding cause you just don't have the time. I think it's hard for newer devs to overcome this.

Um, but I think there are some, there's some hope on the horizon cuz some things are simpler, right? Like some projects do reduce complexity, like, by watching another project sort of innovate so like react. Wasn't the first component, first framework, right? Like technically, I, I think, I think you, you might have to give that to like, to maybe backbone because like they had views and like marionette also went with that.

Like maybe, I don't know, someone, someone I'm sure will get in like, send me an angry email, uh, cuz I forgot you Moo tools or like, you know, Ember Ember. They've also, they've also been around, I used to be a huge Ember fan, still, still kind of am, but I don't use it. but if you have these, if you have these tools, right?

Like people aren't gonna know how to use them and Vue was able to realize that React had some inefficiencies, right? So React innovates the sort of component. So Reintroduces the component based model component first, uh, front end development model. Vue sees that and it's like, wait a second, if we just export this like data object, and of course that's not the only innovation of Vue, but if we just export this data object, you don't have to do this fine grained tracking yourself anymore, right?

You don't have to tell React or tell your the system which things change when other things change, right? Like you, you don't have to set up this watching and stuff, right? Um, and that's one of the reasons, like Vue is just, I, I, I remember picking up Vue and being like, Oh, I'm done. I'm done with React now.

Because it just doesn't make sense to use React because they Vue essentially either, you know, you could just say they learned from them or they, they realize a better way to do things that is simpler and it's much easier to write. Uh, and you know, functionally similar, right? Um, similar enough that it's just like, oh they boil down some of that complexity and we're a step forward and, you know, in other ways, I think.

Uh, so that's, that's awesome. Every once in a while you get like a compression in the complexity and then it starts to ramp up again and you get maybe another compression. So like joining the projects that do a compression. Or like starting to adopting those is really, can be really awesome. So there's, there's like, there's some hope, right?

Cause sometimes there is a compression in that complexity and you you might be lucky enough to, to use that instead of, the thing that's really complex after years of building on it.

[00:27:53] Jeremy: I think you're talking about newer developers having a tough time making sense of the current frameworks but the example you gave of somebody starting from HTML and JavaScript going to jquery backbone through the whole chain, that that's just by nature of you've put in a lot of time right you've done a lot of work working with each of these technologies you see the progression

as if someone is starting new just by nature of you being new you won't have been able to spend that time

[00:28:28] Victor: Do you think it could work? again, the, the, the time aspect is like really hard to get like how can you just avoid spending time um to to learn things that's like a general problem I think that problem is called education in the general sense.

But like, does it make sense for a, let's say a bootcamp or, or any, you know, school right? To attempt to guide people through the previous solutions that didn't work, right? Like in math, you don't start with calculus, right? It just wouldn't, it doesn't make sense, right? But we try and start with calculus in software, right?

We're just like, okay, here's the complexity. You've got all of it. Don't worry. Just look at this little bit. If, you know, if the compiler ever spits out a weird error uh oh, like, you're, you're, you're in for trouble cuz you, you just didn't get the. get the basics. And I think that's maybe some of what is missing.

And the thing is, it is like the constraints are hard, right? No one has infinite time, right? Or like, you know, even like, just tons of time to devote to learning, learning just front end, right? That's not even all of computing, That's not even the algorithm stuff that some companies love to throw at you, right?

Uh, or the computer sciencey stuff. I wonder if it makes more sense to spend some time taking people through the progression, right? Because discovering that we should do things via components, let's say, or, or at least encapsulate our functionality to components and compose that way, is something we, we not everyone knew, right?

Or, you know, we didn't know wild widely. And so it feels like it might make sense to touch on that sort of realization and sort of guide the student through, you know, maybe it's like make five projects in a week and you just get progressively more complex. But then again, that's also hard cause effort, right?

It's just like, it's a hard problem. But, but I think right now, uh, people who come in at the end and sort of like see a bunch of complexity and just don't know why it's there, right? Like, if you've like, sort of like, this is, this applies also very, this applies to general, but it applies very well to the Kubernetes problem as well.

Like if you've never managed nginx on more than one machine, or if you've never tried to set up a, like a, to format your file system on the machine you just rented because it just, you know, comes with nothing, right? Or like, maybe, maybe some stuff was installed, but, you know, if you had to like install LVM (Logical Volume Manager) yourself, if you've never done any of that, Kubernetes would be harder to understand.

It's just like, it's gonna be hard to understand. overlay networks are hard for everyone to understand, uh, except for network people who like really know networking stuff. I think it would be better. But unfortunately, it takes a lot of time for people to take a sort of more iterative approach to, to learning.

I try and write blog posts in this way sometimes, but it's really hard. And so like, I'll often have like an idea, like, so I call these, or I think of these as like onion, onion style posts, right? Where you either build up an onion sort of from the inside and kind of like go out and like add more and more layers or whatever.

Or you can, you can go from the outside and sort of take off like layers. Like, oh, uh, Kubernetes has a scheduler. Why do they need a scheduler? Like, and like, you know, kind of like, go, go down. but I think that might be one of the best ways to learn, but it just takes time. Or geniuses and geniuses who are good at two things, right?

Good at the actual technology and good at teaching. Cuz teaching is a skill and it's very hard. and, you know, shout out to teachers cuz that's, it's, it's very difficult, extremely frustrating. it's hard to find determinism in, in like methods and solutions.

And there's research of course, but it's like, yeah, that's, that's a lot harder than the computer being like, Nope, that doesn't work. Right? Like, if you can't, if you can't, like if you, if the function call doesn't work, it doesn't work. Right. If the person learned suboptimally, you won't know Right. Until like 10 years down the road when, when they can't answer some question or like, you know, when they, they don't understand. It's a missing fundamental piece anyway.

[00:32:24] Jeremy: I think with the example of front end, maybe you don't have time to walk through the whole history of every single library and framework that came but I think at the very least, if you show someone, or you teach someone how to work with css, and you have them, like you were talking about components before you have them build a site where there's a lot of stuff that gets reused, right? Maybe you have five pages and they all have the same nav bar.

[00:33:02] Victor: Yeah, you kind of like make them do it.

[00:33:04] Jeremy: Yeah. You make 'em do it and they make all the HTML files, they copy and paste it, and probably your students are thinking like, ah, this, this kind of sucks

[00:33:16] Victor: Yeah

[00:33:18] Jeremy: And yeah, so then you, you come to that realization, and then after you've done that, then you can bring in, okay, this is why we have components.

And similarly you brought up, manual dom manipulation with jQuery and things like that. I, I'm sure you could come up with an example of you don't even necessarily need to use jQuery. I think people can probably skip that step and just use the the, the API that comes with the browser.

But you can have them go in like, Oh, you gotta find this element by the id and you gotta change this based on this, and let them experience the. I don't know if I would call it pain, but let them experience like how it was. Right. And, and give them a complex enough task where they feel like something is wrong right. Or, or like, there, should be something better. And then you can go to you could go straight to vue or react. I'm not sure if we need to go like, Here's backbone, here's knockout.

[00:34:22] Victor: Yeah. That's like historical. Interesting.

[00:34:27] Jeremy: I, I think that would be an interesting college course or something that.

Like, I remember when, I went through school, one of the classes was programming languages. So we would learn things like, Fortran and stuff like that. And I, I think for a more frontend centered or modern equivalent you could go through, Hey, here's the history of frontend development here's what we used to do and here's how we got to where we are today.

I think that could be actually a pretty interesting class yeah

[00:35:10] Victor: I'm a bit interested to know you learned fortran in your PL class.

I, think when I went, I was like, lisp and then some, some other, like, higher classes taught haskell but, um, but I wasn't ready for haskell, not many people but fortran is interesting, I kinda wanna hear about that.

[00:35:25] Jeremy: I think it was more in terms of just getting you exposed to historically this is how things were. Right. And it wasn't so much of like, You can take strategies you used in Fortran into programming as a whole. I think it was just more of like a, a survey of like, Hey, here's, you know, here's Fortran and like you were saying, here's Lisp and all, all these different languages nd like at least you, you get to see them and go like, yeah, this is kind of a pain.

[00:35:54] Victor: Yeah

[00:35:55] Jeremy: And like, I understand why people don't choose to use this anymore but I couldn't take away like a broad like, Oh, I, I really wish we had this feature from, I think we were, I think we were using Fortran 77 or something like that.

I think there's Fortran 77, a Fortran 90, and then there's, um, I think,

[00:36:16] Victor: Like old fortran, deprecated

[00:36:18] Jeremy: Yeah, yeah, yeah. So, so I think, I think, uh, I actually don't know if they're, they're continuing to, um, you know, add new things or maintain it or it's just static. But, it's, it's more, uh, interesting in terms of, like we were talking front end where it's, as somebody who's learning frontend development who is new and you get to see how, backbone worked or how Knockout worked how grunt and gulp worked.

It, it's like the kind of thing where it's like, Oh, okay, like, this is interesting, but let us not use this again. Right?

[00:36:53] Victor: Yeah. Yeah. Right. But I also don't need this, and I will never again

[00:36:58] Jeremy: yeah, yeah. It's, um, but you do definitely see the, the parallels, right? Like you were saying where you had your, your Bower and now you have NPM and you had Grunt and Gulp and now you have many choices

[00:37:14] Victor: Yeah.

[00:37:15] Jeremy: yeah. I, I think having he history context, you know, it's interesting and it can be helpful, but if somebody was. Came to me and said hey I want to learn how to build websites. I get into front end development. I would not be like, Okay, first you gotta start moo tools or GWT.

I don't think I would do that but it I think at a academic level or just in terms of seeing how things became the way they are sure, for sure it's interesting.

[00:37:59] Victor: Yeah. And I, I, think another thing I don't remember who asked or why, why I had to think of this lately. um but it was, knowing the differentiators between other technologies is also extremely helpful right? So, What's the difference between ES build and SWC, right? Again, we're, we're, we're leaning heavy front end, but you know, just like these, uh, sorry for context, of course, it's not everyone a front end developer, but these are two different, uh, build tools, right?

For, for JavaScript, right? Essentially you can think of 'em as transpilers, but they, I think, you know, I think they also bundle like, uh, generally I'm not exactly sure if, if ESbuild will bundle as well. Um, but it's like one is written in go, the other one's written in Rust, right? And sort of there's, um, there's, in addition, there's vite which is like vite does bundle and vite does a lot of things.

Like, like there's a lot of innovation in vite that has to have to do with like, making local development as fast as possible and also getting like, you're sort of making sure as many things as possible are strippable, right? Or, or, or tree shakeable. Sorry, is is is the better, is the better term. Um, but yeah, knowing, knowing the, um, the differences between projects is often enough to sort of make it less confusing for me.

Um, as far as like, Oh, which one of these things should I use? You know, outside of just going with what people are recommending. Cause generally there is some people with wisdom sometimes lead the crowd sometimes, right? So, so sometimes it's okay to be, you know, a crowd member as long as you're listening to the, to, to someone worth listening to.

Um, and, and so yeah, I, I think that's another thing that is like the mark of a good project or, or it's not exclusive, right? It's not, the condition's not necessarily sufficient, but it's like a good projects have the why use this versus x right section in the Readme, right? They're like, Hey, we know you could use Y but here's why you should use us instead.

Or we know you could use X, but here's what we do better than X. That might, you might care about, right? That's, um, a, a really strong indicator of a project. That's good cuz that means the person who's writing the project is like, they've done this, the survey. And like, this is kind of like, um, how good research happens, right?

It's like most of research is reading what's happening, right? To knowing, knowing the boundary you're about to push, right? Or try and sort of like push one, make one step forward in, um, so that's something that I think the, the rigor isn't in necessarily software development everywhere, right?

Which is good and bad. but someone who's sort of done that sort of rigor or, and like, and, and has, and or I should say, has been rigorous about knowing the boundary, and then they can explain that to you. They can be like, Oh, here's where the boundary was. These people were doing this, these people were doing this, these people were doing this, but I wanna do this.

So you just learned now whether it's right for you and sort of the other points in the space, which is awesome. Yeah. Going to your point, I feel like that's, that's also important, it's probably not a good idea to try and get everyone to go through historical artifacts, but if just a, a quick explainer and sort of, uh, note on the differentiation, Could help for sure. Yeah. I feel like we've skewed too much frontend. No, no more frontend discussion this point.

[00:41:20] Jeremy: It's just like, I, I think there's so many more choices where the, the mental thought that has to go into, Okay, what do I use next I feel is bigger on frontend.

I guess it depends on the project you're working on but if you're going to work on anything front end if you haven't done it before or you don't have a lot of experience there's so many build tools so many frameworks, so many libraries that yeah, but we

[00:41:51] Victor: Iterate yeah, in every direction, like the, it's good and bad, but frontend just goes in every direction at the same time Like, there's so many people who are so enthusiastic and so committed and and it's so approachable that like everyone just goes in every direction at the same time and like a lot of people make progress and then unfortunately you have try and pick which, which branch makes sense.

[00:42:20] Jeremy: We've been kind of talking about, some of your experiences with a few things and I wonder if you could explain the the context you're thinking of in terms of the types of projects you typically work on like what are they what's the scale of them that sort of thing.

[00:42:32] Victor: So I guess I've, I've gone through a lot of phases, right? In sort of what I use in in my tooling and what I thought was cool. I wrote enterprise java like everybody else. Like, like it really doesn't talk about it, but like, it's like almost at some point it was like, you're either a rail shop or a Java shop, for so many people.

And I wrote enterprise Java for a, a long time, and I was lucky enough to have friends who were really into, other kinds of computing and other kinds of programming. a lot of my projects were wrapped around, were, were ideas that I was expressing via some new technology, let's say. Right?

So, I wrote a lot of haskell for, for, for a while, right? But what did I end up building with that was actually a job board that honestly didn't go very far because I was spending much more time sort of doing, haskell things, right? And so I learned a lot about sort of what I think is like the pinnacle of sort of like type development in, in the non-research world, right?

Like, like right on the edge of research and actual usability. But a lot of my ideas, sort of getting back to the, the ideas question are just things I want to build for myself. Um, or things I think could be commercially viable or like do, like, be, be well used, uh, and, and sort of, and profitable things, things that I think should be built.

Or like if, if I see some, some projects as like, Oh, I wish they were doing this in this way, Right? Like, I, I often consider like, Oh, I want, I think I could build something that would be separate and maybe do like, inspired from other projects, I should say, Right? Um, and sort of making me understand a sort of a different, a different ecosystem.

but a lot of times I have to say like, the stuff I build is mostly to scratch an itch I have. Um, and or something I think would be profitable or utilizing technology that I've seen that I don't think anyone's done in the same way. Right? So like learning Kubernetes for example, or like investing the time to learn Kubernetes opened up an entire world of sort of like infrastructure ideas, right?

Because like the leverage you get is so high, right? So you're just like, Oh, I could run an aws, right? Like now that I, now that I know this cuz it's like, it's actually not bad, it's kind of usable. Like, couldn't I do that? Right? That kind of thing. Right? Or um, I feel like a lot of the times I'll learn a technology and it'll, it'll make me feel like certain things are possible that they, that weren't before.

Uh, like Rust is another one of those, right? Like, cuz like Rust will go from like embedded all the way to WASM, which is like a crazy vertical stack. Right? It's, that's a lot, That's a wide range of computing that you can, you can touch, right? And, and there's, it's, it's hard to learn, right? The, the, the, the, uh, the, the ramp to learning it is quite steep, but, it opens up a lot of things you can write, right?

It, it opens up a lot of areas you can go into, right? Like, if you ever had an idea for like a desktop app, right? You could actually write it in Rust. There's like, there's, there's ways, there's like is and there's like, um, Tauri is one of my personal favorites, which uses web technology, but it's either I'm inspired by some technology and I'm just like, Oh, what can I use this on?

And like, what would this really be good at doing? or it's, you know, it's one of those other things, like either I think it's gonna be, Oh, this would be cool to build and it would be profitable. Uh, or like, I'm scratching my own itch. Yeah. I think, I think those are basically the three sources.

[00:46:10] Jeremy: It's, it's interesting about Rust where it seems so trendy, I guess, in lots of people wanna do something with rust, but then in a lot of they also are not sure does it make sense to write in rust? Um, I, I think the, the embedded stuff, of course, that makes a lot of sense.

And, uh, you, you've seen a sort of surge in command line apps, stuff ripgrep and ag, stuff like that, and places like that. It's, I think the benefits are pretty clear in terms of you've got the performance and you have the strong typing and whatnot and I think where there's sort of the inbetween section that's kind of unclear to me at least would I build a web application in rust I'm not sure that sort of thing

[00:47:12] Victor: Yeah. I would, I characterize it as kind of like, it's a tool toolkit, so it really depends on the problem. And think we have many tools that there's no, almost never a real reason to pick one in particular right?

Like there's, Cause it seems like just most of, a lot of the work, like, unless you're, you're really doing something interesting, right?

Like, uh, something that like, oh, I need to, I need to, like, I'm gonna run, you know, billions and billions of processes. Like, yeah, maybe you want erlang at that point, right? Like, maybe, maybe you should, that should be, you know, your, your thing. Um, but computers are so fast these days, and most languages have, have sort of borrowed, not borrowed, but like adopted features from others that there's, it's really hard to find a, a specific use case, for one particular tool.

Uh, so I often just categorize it by what I want out of the project, right? Or like, either my goals or project goals, right? Depending on, and, or like business goals, if you're, you know, doing this for a business, right? Um, so like, uh, I, I basically, if I want to go fast and I want to like, you know, reduce time to market, I use type script, right?

Oh, and also I'm a, I'm a, like a type zealot. I, I'd say so. Like, I don't believe in not having types, right? Like, it's just like there's, I think it's crazy that you would like have a function but not know what the inputs could be. And they could actually be anything, right? , you're just like, and then you have to kind of just keep that in your head.

I think that's silly. Now that we have good, we, we have, uh, ways to avoid the, uh, ceremony, right? You've got like hindley Milner type systems, like you have a way to avoid the, you can, you know, predict what types of things will be, and you can, you don't have to write everything everywhere. So like, it's not that.

But anyway, so if I wanna go fast, the, the point is that going back to that early, like the JS ecosystem goes everywhere at the same time. Typescript is excellent because the ecosystem goes everywhere at the same time. And so you've got really good ecosystem support for just about everything you could do.

Um, uh, you could write TypeScript that's very loose on the types and go even faster, but in general it's not very hard. There's not too much ceremony and just like, you know, putting some stuff that shows you what you're using and like, you know, the objects you're working with. and then generally if I wanna like, get it really right, I I'll like reach for haskell, right?

Cause it's just like the sort of contortions, and again, this takes time, this not fast, but, right. the contortions you can do in the type system will make it really hard to write incorrect code or code that doesn't, that isn't logical with itself. Of course interfacing with the outside world. Like if you do a web request, it's gonna fail sometimes, right?

Like the network might be down, right? So you have to, you basically pull that, you sort of wrap that uncertainty in your system to whatever degree you're okay with. And then, but I know it'll be correct, right? But and correctness is just not important. Most of like, Oh, I should , that's a bad quote. Uh, it's not that correct is not important.

It's like if you need to get to market, you do not necessarily need every single piece of your code to be correct, Right? If someone calls some, some function with like, negative one and it's not an important, it's not tied to money or it's like, you know, whatever, then maybe it's fine. They just see an error and then like you get an error in your back and you're like, Oh, I better fix that.

Right? Um, and then generally if I want to be correct and fast, I choose rust these days. Right? Um, these days. and going back to your point, a lot of times that means that I'm going to write in Typescript for a lot of projects. So that's what I'll do for a lot of projects is cuz I'll just be like, ah, do I need like absolute correctness or like some really, you know, fancy sort of type stuff.

No. So I don't pick haskell. Right. And it's like, do I need to be like mega fast? No, probably not. Cuz like, cuz so I don't necessarily don't necessarily need rust. Um, maybe it's interesting to me in terms of like a long, long term thing, right? Like if I, if I'm think, oh, but I want x like for example, tight, tight, uh, integration with WASM, for example, if I'm just like, oh, I could see myself like, but that's more of like, you know, for a fun thing that I'm doing, right?

Like, it's just like, it's, it's, you don't need it. You don't, that's premature, like, you know, that's a premature optimization thing. But if I'm just like, ah, I really want the ability to like maybe consider refactoring some of this out into like a WebAssembly thing later, then I'm like, Okay, maybe, maybe I'll, I'll pick Rust.

Or like, if I, if I like, I do want, you know, really, really fast, then I'll like, then I'll go Rust. But most of the time it's just like, I want a good ecosystem so I don't have to build stuff myself most of the time. Uh, and you know, type script is good enough. So my stack ends up being a lot of the time just in type script, right? Yeah.

[00:52:05] Jeremy: Yeah, I think you've encapsulated the reason why there's so many packages on NPM and why there's so much usage of JavaScript and TypeScript in general is that it, it, it fits the, it's good enough. Right? And in terms of, in terms of speed, like you said, most of the time you don't need of rust.

Um, and so typescript I think is a lot more approachable a lot of people have to use it because they do front end work anyways. And so that kinda just becomes the I don't know if I should say the default but I would say it's probably the most common in terms of when somebody's building a backend today certainly there's other languages but JavaScript and TypeScript is everywhere.

[00:52:57] Victor: Yeah. Uh, I, I, I, another thing is like, I mean, I'm, of ignored the, like, unreasonable effectiveness of like rails Cause there's just a, there's tons of just like rails warriors out there, and that's great. They're they're fantastic. I'm not a, I'm not personally a huge fan of rails but that's, uh, that's to my own detriment, right?

In, in some, in some ways. But like, Rails and Django sort of just like, people who, like, I'm gonna learn this framework it's gonna be excellent. It most, they have a, they have carved out a great ecosystem for themselves. Um, or like, you know, even php right? PHP and like Laravel, or whatever. Uh, and so I'm ignoring those, like, those pockets of productivity, right?

Those pockets of like intense productivity that people like, have all their needs met in that same way. Um, but as far as like general, general sort of ecosystem size and speed for me, um, like what you said, like applies to me. Like if I, if I'm just like, especially if I'm just like, Oh, I just wanna build a backend, Like, I wanna build something that's like super small and just does like, you know, maybe a few, a couple, you know, endpoints or whatever and just, I just wanna throw it out there.

Right? Uh, I, I will pick, yeah. Typescript. It just like, it makes sense to me. I also think note is a better. VM or platform to build on than any of the others as well. So like, like I, by any of the others, I mean, Python, Perl, Ruby, right? Like sort of in the same class of, of tool.

So I I am kind of convinced that, um, Node is better, than those as far as core abilities, right? Like threading Right. Versus the just multi-processing and like, you know, other, other, other solutions and like, stuff like that. So, if you want a boring stack, if I don't wanna use any tokens, right?

Any innovation tokens I reach for TypeScript.

[00:54:46] Jeremy: I think it's good that you brought up. Rails and, and Django because, uh, personally I've done, I've done work with Rails, and you're right in that Rails has so many built in, and the ways to do them are so well established that your ability to be productive and build something really fast hard to compete with, at least in my experience with available in the Node ecosystem.

Um, on the other hand, like I, I also see what you mean by the runtimes. Like with Node, you're, you're built on top of V8 and there's so many resources being poured into it to making it fast and making it run pretty much everywhere. I think you probably don't do too much work with managed services, but if you go to a managed service to run your code, like a platform as a service, they're gonna support Node.

Will they support your other preferred language? Maybe, maybe not,

You know that they will, they'll be able to run node apps so but yeah I don't know if it will ever happen or maybe I'm just not familiar with it, but feel like there isn't a real rails of javascript.

[00:56:14] Victor: Yeah, you're, totally right.

There are, there are. It's, it's weird. It's actually weird that there, like Uh, but, but, I kind of agree with you. There's projects that are trying it recently. There's like Adonis, um, there is, there are backends that also do, like, will do basic templating, like Nest, NestJS is like really excellent.

It's like one of the best sort of backend, projects out there. I I, I but like back in the day, there were projects like Sails, which was like very much trying to do exactly what Rails did, but it just didn't seem to take off and reach that critical mass possibly because of the size of the ecosystem, right?

Like, how many alternatives to Rails are there? Not many, right? And, and now, anyway, maybe let's say the rest of 'em sort of like died out over the years, but there's also like, um, hapi HAPI, uh, which is like also, you know, similarly, it was like angling themselves to be that, but they just never, they never found the traction they needed.

I think, um, or at least to be as wide, widely known as Rails is for, for, for the, for the Ruby ecosystem, um, but also for people to kind of know the magic, cause. Like I feel like you're productive in Rails only when you imbibe the magic, right? You, you, know all the magic context and you know the incantations and they're comforting to you, right?

Like you've, you've, you have the, you have the sort of like, uh, convention. You're like, if you're living and breathing the convention, everything's amazing, right? Like, like you can't beat that. You're just like, you're in the zone but you need people to get in that zone. And I don't think node has, people are just too, they're too frazzled.

They're going like, there's too much options. They can't, it's hard to commit, right? Like, imagine if you'd committed to backbone. Like you got, you can't, It's, it's over. Oh, it's not over. I mean, I don't, no, I don't wanna, you know, disparage the backbone project. I don't use it, but, you know, maybe they're still doing stuff and you know, I'm sure people are still working on it, but you can't, you, it's hard to commit and sort of really imbibe that sort of convention or, or, or sort of like, make yourself sort of breathe that product when there's like 10 products that are kind of similar and could be useful as well.

Yeah, I think that's, that's that's kind of big. It's weird that there isn't a rails, for NodeJS, but, but people are working on it obviously. Like I mentioned Adonis, there's, there's more. I'm leaving a bunch of them out, but that's part of the problem.

[00:58:52] Jeremy: On, on one hand, it's really cool that people are trying so many different things because hopefully maybe they can find something that like other people wouldn't have thought of if they all stick same framework. but on the other hand, it's ... how much time have we spent jumping between all these different frameworks when what we could have if we had a rails.

[00:59:23] Victor: Yeah the, the sort of wasted time is, is crazy to think about it uh, I do think about that from time to time. And you know, and personally I waste a lot of my own time. Like, just, just recently, uh, something I've working on, for a long time. I came back to it after just sort of leaving it on the shelf for a while and I was like, You know what?

I should rewrite this in rust. I, I really should. and so I talked myself into it, and I'm like, You know what? It's gonna be so much easier to deploy. I'm just gonna have one binary. I'm not gonna have to deal with anything else. I'm just like, it'll be, it'll be so much better. I'll, I'll be a lot more confident in the code I write.

And then sort of going through it and like finishing this a, a chunk of it and the kind of project it is, is like I'll have a lot of sort of, different services, right? That, that, that sort of do a similar thing, but a sort of different flavor of a, of a thing, if that makes sense. And I know that I can just go back to typescript on the second one, right?

Like, I'm, I'm doing one and I'm just like, and that's what I've decided to do. Cause I'm just like, Yeah, no, this doesn't make any sense. like, I'm spending way too much time, um, when the other thing is like, is good enough. and like, I think maybe just if you feel that, if you can, like, don't know if you stay, stay aware of just like, Oh, how much friction am I encountering and maybe I should switch. Like if you know rails and you know, typescript, you should probably use Rails, if you're bought into the magic of Rails, right? And, and of course Rails is also another thing that has always has great support from, Platforms as service companies. Rails is always gonna be, you know, have great support right, Because it's just one of those places where it's so nice and cozy that, you know, people who use it are just like, the people who don't want to think about the server underneath.

[01:01:03] Jeremy: I think that combination is really powerful. Like you were talking earlier about working with Kubernetes and learning how all that works and how to run a database and all that. And if you think about the Heroku experience, right? You create your, your Rails app. You tell Heroku I want a database and then you push it. you don't have to worry about pods or Docker or any of that. They take care of all of it for you so I think that certainly there's a value to going deeper and, and learning about how to host things yourself and things like that but I can totally understand if you have the money, uh, especially if for a business would say I don't wanna do this type of ops work I don't want to learn how to set up a cluster just want to push it to a heroku and be done with it.

[01:02:00] Victor: Yeah, You don't, no one gives you an award for learning how to, like wrangle LVM right? No no gives you that. They just like, you know you either make it to market or you don't. Uh, and it's like, uh, like I, mean, I'd love to hear about what you sort optimize but I feel like all, it's all about what you want to optimize for.

Like, are you optimizing for time to market? Are you optimizing for, a code base that people won't be able to mess up later? Right? Like a lot of just, you know, seed stage startups or like just early startups or big companies, like, it doesn't matter. We'll rewrite anyway. Right? That like the eBay example was a great, was a great sort of indication of that like it will get rewritten. So maybe it doesn't make sense. Maybe it's silly to, to optimize for strong code base the beginning. Um,

[01:02:45] Jeremy: I think it, uh, at the beginning, especially if you don't have an established audience, like you're not getting any money, then pick something that the team knows and that, you know, um, or at least the majority does, because that, I think, makes the biggest difference in speed. Speed. Because let's, let's say you, you were giving an example of I would use haskell if I need to be correct, and I would use rust if I need to be fast. but if you are picking something everybody knows and you don't have some very specific requirement, for the most part, if you're using something you already know, it's going to be built faster, it's going to be easier to read and maintain and it'll probably be more correct just because you're more familiar with that whole...

[01:03:50] Victor: So I, I agreed right up until the last point I feel like correctness is one of those, if you use a tool that lets you be too sloppy you can't stop people from being sloppy Right? Uh, like I think, and this is actually something I was thinking of earlier today, is like, I think writing good code is either people being disciplined or better systems, and of course it doesn't matter in every case, Right. and so like, so in cases where like, it, it's just not that important and, and it's better to just let it error and then someone just goes and like, fixes it, right? But if you do that too long, you get you can get spaghetti, right? You can get either spaghetti or you can get a code base that's suffering from a lot of technical debt. Uh, and it, it won't be a problem early on, but when it is, it's a big problem, right? and can drain a lot of, a lot of time. but 99% of the time, I agree.

You don't need anything other than like TypeScript or Rails or like Django, or you could, you could use perl if you want php obviously, like, you know, Right? Like, you, you could get very far, very fast with those. And often it's like, not even necessary to go anywhere else. But the only little thing I'd say is just like, I find that it's, It's so hard to be correct if you're not getting any help from your compiler, right?

Like, for me, at the very least, right? Like, if you're not getting any help from the language, it's so hard to like, write stuff. That's correct. That doesn't ship with bugs in it. Right? There was, um, there's a whole period of time where everyone was getting really excited about writing tests that were like, Oh, make sure to like, write a test with negative one.

Right? Like, just like, you know, like the next level test stuff was just like, Oh, but what if you like, you know, you gotta, I mean, and this is true, right? You have to think like, how could your system possibly be broken, right? Like, like thinking of how to break a system is hard. It's different from thinking of how to build a system, right?

It's a different skill set. But like some of those things you should really just be protected from, I think a big, uh, moment in my career was like seeing option I, I'd been lucky enough to have friends that were like exploring with stuff like, um, like haskell, super early on and like common lisp and sort of like, and reading Hacker News, shout out to AJ cuz like, that's his name.

But like, there's a, there's a person that was like, just kind of like, sort of like exploring the frontier. And then I would like hear a little bit and be like, Ooh, that's interesting. And like, kind of like, take a look, but option coming in. Like, I think Java 8 was like, wait a second option should be everywhere, right?

Because it's like NPEs Null Pointer Exceptions should almost, like, they shouldn't really be a thing, right? Like, and then you are like, Oh, wait, option should be here, but that means it has to be there and it kind of like, it just infects everything. And normally stuff that infects everything is bad, right?

You're just like, Oh no, this is bad. I better take it out. But then you're like, Wait a second. But option is right because I don't know if that thing is not a null actually right. Like the language doesn't give me that. So then, you know, you kind of stumble upon non nullable types, right? As a language feature.

And so it's, it's really hard to quantify, but I think things like that can make a, a, a, a worthwhile difference in, base choice of language as far as correctness goes and in preventing. But I also know that like, people are just blazing by in rails like, just like absolutely without the care in the world, and they're doing great and they, like, they have the, all the integrations and it's all, it's working out great for them.

But I personally just like, I'm just like, I have to, I feel the compulsion. I'm just like, I feel the compulsion and I'm just like, I need to at least do typescript and then I have a little bit more protection from myself. Uh, and then I can, and then I can go on. And it's also, it's like, it's also an excuse for me to like, write less tests as well.

Like a little bit like, you know, I'm just like, you know, I, I, I, there's, there's some, there's some, Assurance that I don't have to like go back and actually write that negative one test like the first time, Right. It in practice, like technically you, you, you should, cuz like, you know, at run time it's, it is a completely different world, right?

Like typescript is like a compile time thing thing. But if you, if you write your types well enough, you, you, you're, you're protected from some amount of that. And I find that that helps me. Personally. So, so that's the, that's the one place I'm just like, ah, I do like that correctness stuff,

[01:08:13] Jeremy: Yeah. Yeah. I, I think like, I, I do agree in a general sense with languages that have static type checking where, you know, at compile time whether something can even run, that can make a big difference. Maybe correctness wasn't the right word, but I you work in an ecosystem, whether Rails or Django or something else, you kind of know all of the, the gotchas, I guess? if you're, if you're, let's say you're building a product with Haskell and you've never Haskell before, I feel like yes, you have a lot of strong guarantees from the type system, but there are going to be things about the language or the ecosystem that you, you'll just miss just because you haven't been in it.

And I think that's what I meant by correctness in that you're going to make mistakes, either logical mistakes or mistakes in structure, right? Because if you, if you think about a Rails app, one of the things that I think is really powerful is that you can go to a company day one that uses rails and if they haven't done anything too crazy, you have a general sense of where things are some extent.

And when you're building something from scratch in a language and ecosystem you don't understand, um there's just so many scrapes and cuts you have to get before you're proficient right Um, so I, so I think that is more what I was thinking of yeah.

[01:10:01] Victor: Oh yeah. I, I'd fully agree with that yeah I fully agree with that. you don't know what you, what you don't know right. When you, uh, when you start, um especially with a new ecosystem, right because you just, everything's hard. You have to go figure out everything you have to go try and decide between two libraries that do similar things despite, you know, like knowing how it's done in another language.

But you gotta like figure out how it's done in this language, et cetera. But it's like, well, you know, at least decisions are easier elsewhere sometimes, right? Like, like in the database level or like, maybe the infrastructure level or, but yeah, I, I totally get that. It's just, most of the time you just want to go with that, uh, that faster, that faster thing, you know, Feels funny to say of course.

Cuz I never do this (laughs) . for I never, like all my, all my projects on, on essentially crazy stacks. But, but I, I try and I try and be mindful about is how much of my toil right now is even a good idea, right? Like, depending on my goals. Again, like going back to like that, it depends on what you're optimizing for right if you're optimizing for learning or like getting a really good fundamental understanding of something, then yeah, sure. If you're optimizing for like getting to market? Sure. that's a different answer. If you're, if you are optimizing for, like, being able to hire developers to work alongside you, right?

Like making it easy to hire teammates in the future, that's a different set of languages maybe. so yeah, I don't know. I kind of give the, the weasel answer, which is, you know, it depends , hmm right? But, um, yeah.

[01:11:32] Jeremy: Especially if you're, you're learning or you're doing personal projects for yourself, then yeah, if you, if you want to know how to use haskell better, then yeah, go for it. Use, use haskell, um, uh, or use rust and so on. I think another thing I think about is the deployment so if personal you are running a SaaS or you're running something that you deploy internally, then I think something like Rails, Django is totally fine especially if you use a platform as a service, then there's so many resources for you. But if your goal is to give you an example, like Mastodon, right? So we have the whole,twitter substitute thing.

Mastodon is written in Rails and it has a number of dependencies, right? you have to have Sidekiq, which runs the Workers, Elastisearch for search, um, Postgres for the database and Nginx and so on. And for somebody who's running an instance for a bunch of people, totally makes sense, right?

No big deal. where I think it's maybe a little trickier is, and I don't know if this is the intent of, Mastodon or ActivityPub in but some people, they wanna host their own instance, right? Um, rather than signing up for mastodon.social and having a whole bunch of people in one instance, they wanna able to control their instance.

They wanna host it themselves. And I think for that Rails the, the resources that it requires are a little high for that kind of small usage. So, in an example like that, if I wanted something that I wanted to be able to easily give to you and you could host it, then something like a Go or a Rust I think would make a lot of sense you can run the binary, right? And, you don't have to worry about all the things that go around running a Ruby application.

So I think that's something to think about as well. And, and we talked about command line apps as well, right? If you're gonna build a command line app and you want it to run on Windows, well the person on Windows is not gonna have python or ruby so again having it in Go or in Rust makes a lot of sense there so that's another think I would think about who is it going to be given to and who is going to deploy it as well.

[01:14:25] Victor: Yeah. That's um, that's a great point, uh, because it makes me think of sort of explosion of sysadmins writing go when it first came out I, don't know if I imagined this or I think it was real, but like there were just so, uh, up until then, like most sysadmins would be they'd like obviously like get to know their routers or their, you know, their switches and their, you know, their servers and like racking, stacking doing all that stuff.

Languages and like frameworks can unlock a certain group of people or like unblock a certain group of people and like unlock their sort of productivity. So like Ansible was one of those first things that was like really sort of easy to understand and like, Oh, you can imperatively set this machine up.

But a side effect is you get a lot of sysadmins that know Python, right? So like, now a lot of like the sort of black art stuff is accessible to you. Like, or, sorry, I say accessible to you as in accessible to me as the non sysadmin, right? Cause I'm just like, Oh, I can run this like little script this person wrote, uh, in Python and it like, will do all this stuff, right?

That I, I would've never been able to do before. And maybe I learned a little bit more about that, about that system, right? And so I, I, I saw something similar and go where people were writing a bunch of tools that were just really easy to run, right? Really, really easy to run everywhere. Um, and that means easy to download, easy to like, you know, everything's easier and, A lot of hard things got a lot easier, right? Uh, and this is same with Rust. Like, I, I believe that library that most people use is like clap, I've built a few things with Clap and it's like, it gives you excellent, uh, I guess you'd call them affordances or like ability to make a high quality CLI program with very little effort, right?

Uh, and so that means you end up writing really decent binaries, right? With like, good help texts and like reasonable like, you know, options and stuff like that. and then it's really easy to deploy to Windows, right? And like other, other platforms, uh, like you said, you don't have to try and bundle Python or, or whatever else the sort of interpreter class of languages. So yeah, I think that I'd agree that like just languages and, and, and sort of frameworks can, can unlock, easier creation of certain kinds of apps and certain sort of groups of people to share their knowledge or like to, to, to make a, a tool that's more usable by everyone else.

It could be like, kind of like a, multiplicative factor right. Just like, I made this really, really intense Python script, but like now, but to use it, you'd have to like install Python on Windows, like manage your environments, whatever. Like, I don't know if you're using pyenv, maybe you are, maybe you aren't.

Do you get the wheel? Like what, what do you do with that? no, I'll just give you a, executable and you have an executable and then now you can use all the tools that like normally work with an executable or with something that like produces output and it's just faster for everybody and everybody like just, you know, gets more value

[01:17:17] Jeremy: Cool. Well, is there anything else you wanted to, to mention or, or talk about?

[01:17:26] Victor: I don't know. oh yeah, I guess, I guess I could just like say my stack, right? Um, Oh, I, I really love Sveltekit. I've been kind of all in on Sveltekit for the front end for a while now. it feels like I've used, um, I've used nuxt I've used, like, I've used a lot of frameworks, but I'm trying to think of, of frameworks that like, do the, um, like I think, I think a local, if not global maximum for front end development is power of the front end component driven sort of paradigm and server side rendering, right?

Because there's like, what are the big advantages of using something like Rails or like whatever else that, that just, just, that's completely server side is that the pages are fast, the pages are always fast. It's there, but they don't have interactivity. Right. we've taken a weird path to get here and it looks really wasteful and maybe it is really wasteful, but at this point we now have kind of both kind of like glued and like hacked into one thing.

And I think that class of tools is like, is, is is a local maximum, if not, if not global. so, so yeah. So like, there's like next, nuxt, sveltekit. There's, there's other solutions. There's Astro like there's, there's, which is Astro's really recent.

Um, there's Ember, right? Shout out to Ember right. People, people still pushing that forward as well, which is great. but yeah, so I, I've SvelteKit also, and this is again in like direct conflict to what we've talked about this entire time, which is like, use established things that get you there fast. but like SvelteKit isn't at 1.0 yet, but it is excellent.

Like, I, I am more productive in it than I ever was with Nuxt. Um, and again, Nuxt has changed a lot since I've, you know, sort of made the switch and like, you know, maybe I, maybe it deserves a rethink and like re revisiting it, but I'm so productive with SvelteKit. I just, like, I don't mind. And like half the time I'll just, I'll just use SvelteKit, uh, and my database and then be done like no middle layer.

So like no API layer. I just like stuff it into the SvelteKit app, and then use, postgres on the backend and then I'm done and, and I feel like that's been really productive, you know, again, this is outside of the, the world where you use a rails or whatever.

Um, so yeah. So that's, that's been my stack for a lot of the products I've done recently. so yeah, if I, if I had to, I guess say something about like front end, like give SvelteKit a try. It's pretty good. Uh, and obviously like databases, just use Postgres. Stop using other things. don't, don't do that.

And like infrastructure stuff, I think Kubernetes is cool, but you probably don't need it.

Uh, I like Pulumi. I feel like no one, like I've been recommending Pulumi for a long time over Terraform. So it's just like DSLs have limits and those limits are a bad idea to have when you, like, the rest of your time is spent with no limits, right.

With like just general computing. Right. So, and Pulumi is just like, you can do your general computing and infrastructure stuff too, and it's, I feel like it's, it's always, you know, been better, but, but anyway, yeah. That's like, that's kind of my stack

[01:20:26] Jeremy: So pulumi is um, it's a way to provision infrastructure, but is there a language for it?

[01:20:35] Victor: It integrates with the language you use. And Terraform has caught up in that respect, right? Cause you have that now. but how it works is still slightly different right because if I remember correctly they still generate a Terraform file and execute that file it's, still a little bit different, which is like, it's, and it's AWS' CDK as well, right? So, so the world that's sort of caught up to where, what Pulumi is doing. But you know, I, I think it was like, I don't know, terraform 12 or something like that where it was just like, we've added better for loops.

I'm like, okay, at this point, like this is, that's the indication of like, you now need general, like you, you, you're now the dsl, like DSLs can have for loops, but it's like if you're starting to like pluck, you know, general computing languages, we have really good general computing languages right there.

You know, that was kind of my, indication to be like, okay, I Pulumi is the way, uh, for me, um again, This doesn't matter cuz like at work you're gonna, you're probably using Terraform, like, you know, just every, just like, there's, you know, everyone's using certain tools and you don't have a choice. Sometimes you have to use certain tools, but I personally have my, uh, have my pet pet likes and stuff.

[01:21:49] Jeremy: How about for caching?

[01:21:53] Victor: Uh, KeyDB. I go into rabbit holes a lot. I call myself a yak shaver cause I shave a lot of yaks and it doesn't benefit anyone really except for me most of the time. But there are lots of redis alikes out there. And the best feature set is right now KeyDB.

There's like, there's, there's one called Tendis there's, um, which is like, um, a little bit like more distributed. There's like SSdb, which will do it off disk, which is, I think because we have such fast disks now, it's good enough for a bunch of applications. Right. Especially if, like, if your alternative was like, you know, a much farther away sort of, you know, calls the farther away service.

There's Pelican out of Twitter, so they have a whole, they've got like a caching, it's like a framework kind of, right? Like they, they, they've sort of built a kernel of like really interesting caching, um, originally like sort of to serve their memcache workloads and stuff. But it's kind of grown in like, in lots of directions as well.

KeyDB is, was the most compelling and still is to, to me for, from a resource usage. Multi threading, obviously, like it is multi threaded, so it is now, it's it's way faster. Right. Um, and also like it offers flash storage, using the SSD when you can. And, and that's, Those are game changers. Right. And, and of course all the, you know, usual and clusters, right? It clusters without you, you know, paying Redis Labs any money or whatever. Um, which is, which is fine. You know, people opensource projects and, and businesses have to, you know, make money. That is a thing. But yeah, KeyDB is, is my, uh, I, whenever I'm about to spin up redis, I don't, and I spin up uh, also they were bought by Snap or bought hell of an aquihire.

I think if, if you, cuz I think sometimes that has like a negative pejorative context to it. Like you didn't, like, oh, you didn't make a billion dollars, you just got aquihired or whatever. But hell of an aquihire. Um, and, and so all of it's like free now, like all of the, like all the, the premium features are becoming free.

And I'm like, this is, this is like, I won the lottery, right? Cause um, you know, you get all the, the, the awesome stuff outta KeyDB for, for free. Um, so yeah, Caching KeyDB. I do KeyDB.

[01:24:11] Jeremy: KeyDB. I haven't heard of that one.

[01:24:14] Victor: Oh yeah, it's, um, yeah it's like keydb.dev.

[01:24:17] Jeremy: Oh KeyDB.

[01:24:18] Victor: It's awesome. They did YC.

[01:24:23] Jeremy: Oh, it uses the Redis wire protocol

[01:24:28] Victor: Like Redis is like, is the leader, unless you're using memcached for some other reason and then like obvious like have to use memcached, whatever. But, um, but yeah, Redis is like the sort of app external cache dujour for basically everywhere and when I wanna run Redis, I run KeyDB.

[01:24:51] Jeremy: And for search, do you just in search in postgres or turn to something else?

[01:24:59] Victor: Oh, you've asked a dangerous question. So I recently did some, uh, some writing. So I, I, I, so recently, um, like this year, I've branched out and done a little bit more experiments in writing for companies that have an interesting you know developer product or sometimes where like, you know, my sort of like interest and stuff just aligned, right?

So like, uh, I've worked with, um, OCV Open Core Ventures, um, which is on Sid, if you know Sid from GitLab, That's his, um, his, uh, his fund, uh, and then also Supabase, which does, um, you know, awesome stuff on Postgres. And, you know, it's fully open source that, that company is amazing as well. and search has been a thing.

So Postgres has full text search, SQLite has full text search. They both have it built in. they're very good and I think great approximations for like V1s at the very least, maybe even farther. because a lot of the time if someone's in your product and they're searching something's wrong usually, right?

Like, like, unless you have vast gobs of data, like this means your UX is not good enough for something, right? Um, but um, that said, I almost always start with Postgres full text search. and then I have the, um, there, there are, there's a huge crop of new search engines, right? So if we consider open search to be new, as in like the fork of Amazon from, from Elastic search, there's that, there's a project called Meilisearch.

There's a project called TypeSense. Um, there's Sonic, uh, there's like, um, Tantivy, uh, which which is like the, can be under net. There's like quickwit, which is like shifted to logging a little bit. Like that's their like, path to sort of, um, profitability. I, I think, I think they, they sort of shifted a little bit.

there's a bunch more that I'm, I'm missing. And so that's what I wrote about and had a lot of fun writing about for Supabase very recently. And this was, um, this was something I just had written down, right? So I was just like, I need to do a blog post. And I, I write on my blog a lot, so I'm just like, Alright.

I write up yak shaves to my blog a lot and I'm, and I was just like, I need to try and just use some of these, right? Because there's so many and they all look pretty good. And they have to have learned, like the golden standard is like, uh, solr, right? Lucene, right? Like, it's like, it's like solr and lucene and like, you know, that or whatever.

And, but a lot of times you just don't need, like, you don't necessarily need every single feature of lucene. And so there are so many new projects that are look decent. Uh, and so I got a chance to, to to sort of, I was paid to do some of that experimentation, which is awesome cause I would've done it anyway.

But it's nice to be paid to do it, on search stuff. and I actually have a project I like, I liked that so much that I made a project to try and get a more representative dataset. So I started a site called podcastsaver.com I use the podcast index, right?

Which has a lot of sort of like podcast information. And, know, if someone doesn't know about podcasts, there's like an RSS feed, right? Which is kind of like a, you can think of an XMLy uh, format where people like podcasts are just a publish of a RSS feed and the RSS feed has links to where to download the actual files, right?

So it's really open, right? Um, and so I used, um, that the structure of that to index, in multiple search engines at once, right? Running alongside each other, the information from the podcast index. this is was fun for me cuz it was like an extension of that other project. It was a really good way to test them against each other.

Very fast, right? Like, or, or like in real time. So like right now, um, if you go to podcastsaver.com and you search a podcast, it will go to one of the search engines randomly. So right now there is Postgres FTS, plus Trigram. So, so there is, um, there's also a thing called, um, Tri Trigram searches another really good like, um, sort of basic search feature.

And there's Meilisearch. So both of those are implemented right now. And there's actually a little nerds link, right? Which will show you how many, how many podcasts there are, right? So, so how many documents, essentially you can kind of assume there are. Um, and it'll show you how fast each search engine did, right?

At sort of returning an answer. Now it's a little bit of a problem because I don't you need to do some manual work to figure out whether the answer was good, right? If you're really fast but give a garbage answer, that's not good. But in general, like, so you can, you can actually use the nerd tab to control, You can like switch to only Postgres, uh, and I do that with like cookies and you can, um, you can force it to go to Postgres and you can see the quality of the answers for yourself.

But they're generally, it's pretty good from both. Like it's not, it's not terrible from, from both. So I'm, I'm kind of like glossing over that part right now, but you can see the performance and it's actually, it's like meilisearch does a great job, right? Um, and you know, there's obviously some complexity in running another service and there's some other caveats and stuff like that, but it's, it's pretty good.

And over time, I want to add more. So I wanna add, you know, at the very least typesense, like people have reached out, so like, I, I made a, a comment on this, on Hacker news and like there's a long road ahead for that and like, I honestly shouldn't be working on that cuz I have other things that I'm like, you know, I, I'm really should be full time on.

Um, But like, that's a thing I'm trying to, I'm trying to do sort of grow in the future a little bit more cuz it's just like, it's so fascinating to, to like, everything's so cheap. Like computer is cheap, you know, like there's awesome projects out there with like really advanced functionality that we can just run, like, for free or not, not for free, but like, you don't have to do the work to like build a search engine.

There's like five out there. So all you, the only thing that's missing is like knowing which one's the best fit for you and like, you can just find that out. Yeah.

[01:30:46] Jeremy: Are there any I guess early conclusions in terms of you like Meilisearch because of X or?

[01:30:53] Victor: Yeah, the, the super supabase blog post, uh, was, was a little bit better in terms of, uh, takeaways. I can say that from like meilisearch is definitely faster like meilisearch was harder for me to load and like it took a, a little bit longer cuz you know, you have to do the network call.

And to be fair, if you choose Postgres, it's in the database. So like, your copying is a lot easier. Like, manipulating stuff is a lot easier. Um, but right now when I look at the stats, like Meilisearch goes way faster. It's like almost always under a hundred milliseconds, right? And that's including, you know, um, that network, you know, round trip.

Um, but you know, Postgres is like, I don't know, I just, I just, I think it's, I I'm just so, I'm so biased. Like it is not a good idea to ever bet against Postgres, right? Like, obviously meilisearch is be like, it doesn't make sense for Postgres to be better than purpose-built tools. Um, because they are fully focused, right?

Like, they should be, they should be optimal. Cuz they, they, they don't have any other sort of conflicting constraints to think about. But Postgres is very good. It's just like, it's, it's so excellent and it, it keeps moving. Like it keeps getting better. It gets better and better every year, every like, every quarter. It's hard to not bet on it. So I often, So, so, so yeah, so I just, I, if you, I, I would say based on pure performance of podcastserver.com right now, the data lends itself to saying pick meilisearch. unfortunately that data set is incomplete. I don't have typesense up. I don't have all these other like search engines up.

So, so it's, it's, it's limited. there was also, like in the supabase post, you'll see there, there was support for like, um, misspellings and stuff was different among search engines. So there's also that axis as well. But if you happen to be running on Postgres, I really do suggest just, just give Postgres FTS a try, even if it was just Trigram search.

Like even if you just do Trigram search and do like a sort of like fuzzy search bar, cause that's probably like what a V1 would look like. Anyway, try that and then go off and like, you know, and then like, if you need like crazy faceting or like, you know, you know, really advanced features, then jump off.

Uh, but I, I don't know, that's not interesting cause I feel like it already kind of confirms what I think. So I think other people, other people need to need to do this. I need other people to please replicate, uh, and uh, come up with better, better ideas than I have

[01:33:20] Jeremy: but I think that's a good start in, in terms of when you're comparing different solutions, whether it's databases or, I don't know what you call these, but what do you call an elasticsearch?

[01:33:32] Victor: Search engine.

[01:33:34] Jeremy: You go to open source projects or the company websites and they'll have their charts and go we're x times faster than Y.

But I, I think actually having a running instance where they're going against the same data, I think that's, that's helpful really for anyone trying to compare something to for someone having gone through the time. And I think that a lot of other things too not just search engines where you could have hey, I have my system and it's got, uh I don't know five different databases or something like that. I, I'm not sure the logistics of how you would do it,

[01:34:15] Victor: Like with redis. Like just like all the Redis likes, like just all run, run 'em all at the same time. Someone needs to do that

[01:34:26] Jeremy: Could be you.

[01:34:27] Victor: Ahaha no! I do too much! Like the redis thing is obvious, right? Redis is easier, like comparing these redises and there's some great blog posts out there also that like kind of do it. But like a running service is like a really good way of like showing like, oh, this is like, we hit this cache, you know, x times a second with like, and it's like this, like naturally random sort of traffic.

This is how it performed, this is how they performed against each other. These were like the, the resources allotted or whatever. But yeah, that stuffs, that stuffs really cool. I feel like people haven't done it or aren't doing it enough.

[01:35:01] Jeremy: Yeah. I guess thing about, putting together one of these tests as well, especially when you make it live is then you have to spend the time and spend the money to maintain it right and I think, uh, if somebody's not paying you to do it's gotta be uh, Yeah. You gotta want it that bad to put it together.

[01:35:22] Victor: Hey, but you know what? we can go full circle just use Kubernetes,

Its easy if you just use Kubernetes man.

[01:35:33] Jeremy: First you gotta learn... Where, where were we? First start with postgres, kubernetes.

[01:35:42] Victor: Yeah. If you wanna use Kubernetes first, you start with Postgres and... It's like, what?

[01:35:49] Jeremy: So, learn these ten other things first then you can start to build your project.

[01:35:58] Victor: Yeah, it's silly but I know people out there have the knowledge I just feel like it's, it's like, you they just need to do some of this stuff, right? Like, it's just like, they just like need to like, have the idea or just like go, just go try it Uh, and hopefully we, get more of like, in the future.

Just like, cause, cause at some point, like there's gonna be so much choice that you're like, how are you gonna decide? How does anyone decide these days? Right? Like, you know, more people have to dedicate their time to like, trying out different things, but also sharing it. Cause I think just inside companies, you do this, you do the bakeoffs, right?

Everyone does the bakeoffs to try and figure out, you know, within a week or whatever, whether, whether they should use, let's say like Buddy Base or App Smith, right? Like, just like you, just like the rest of the team has no idea what those are, right? But someone, Does the Bakeoff maybe start sharing Bakeoffs?

There it is. There's another app idea. I, I think of a lot of ideas, and this is a, there's another one, right? Just make a site where people can share their bakeoff, like just share their bakeoff results with certain products. And then that knowledge just being available to people is like, is massively, is massively valuable.

And it, it kind of helps, it helps the products that are mentioned because they can figure out what to change, right? it kind of makes the market more efficient, right? In that vague, uh, capitalistic sense where it's like, oh, like then, you know, if everyone has a chance to improve, then we get a better product at the end of the day.

But, um, yeah, I dunno, Hopefully more people more people yak shave, please, more people waste your time. Uh, not waste, uh, use your time to, uh, to yak shave. It's, it's, it's fine.

[01:37:32] Jeremy: Well I think you have something at the end of it sometimes you can yak shave and at the end it's kind of like, well, I, I played with it and oh well.

Versus you having something to show for it.

[01:37:50] Victor: Yeah, that's true. Yeah. I won't talk about all the other projects that went absolutely nowhere.

But, uh, but yeah, I think you always feel selfish if you learn something to, and I should, I should rephrase this like I am definitely a selfish person. Like you know, like, I'm not, this is not altruism, right? It's just like, but at some point it feels like, man, someone should really know this other stuff, right? Like, if you, if you've found something that's like, interesting, like it's, it's like someone should know, cuz someone who's better at it will be like, Oh, like no, this part and this part.

Like, it's like everyone kind of wins. which is, which is awesome. So, I dunno, maybe if more people have that feeling, they'll like, they'll like share some of their stuff and like maybe you do a thing and it doesn't help you, but then someone else comes along and they're like, Oh, because I read this, I know how to do this.

And like, and then if they give that back too, it's, uh, it's pretty awesome. But anyway, that's all pie in the sky,

[01:38:57] Jeremy: I think in general, the fact that you are running a blog and, you know, you do your posts on Hacker News and, and so on. The fact that you're sharing what you've learned, I think it's is super valuable. And I think that goes for anybody who is learning a new technology or working on a problem and you run into issues or things you get stuck on for sure yeah you should share that and the way I've heard described There's always someone on the internet just waiting to tell you why you're wrong.

[01:39:35] Victor: Oh yeah. Yeah.

[01:39:36] Jeremy: And provided that they're right. That can be very helpful. Right?

[01:39:40] Victor: Yeah. Yeah. I, I actually, I love I, I personally like it because if you're a hacker in the, you know, hacker news sense that's excellent. That's like a free compiler right?

It's like a free checker right? If you just sit next to someone who is amazing at X.

And you just start bouncing ideas of like, around X and like how to do whatever it is off of them, you get it compiled.

They're just like, No, you can't do that cuz of X, Y, and Z. And you're like, Oh, okay, great. I've just saved myself like, you know, months of like thinking I could do it and like, now I know I can't do it. And the internet is great cuz it gives you access to like, to those people who are like, Yeah. And knowing it first, but if you realize that like, oh, they've chosen to share some wisdom with me like that, like, you know, or, or like trying to, Right.

Assuming you're correct, Like, even if they're not correct. Um, it's, it's, it's pretty awesome. So, so I personally welcome that. Of course it doesn't feel good to be wrong, right? I don't like that. But, um, I love it when someone like take, took the time to be like, No, your, your view on this is wrong because of this.

Or like, you know, like 99% of the time you don't need that. You should have just done this, right? Cause then I learn, a lot of my posts will have updates at the top. Right. So like when someone, like, you know, when I posted the, the thing about the throat mic to like hack me is people were like, This sounds terrible,

I was like, I didn't think it was that bad, but, uh, but I was like, you know, maybe I, maybe I shouldn't use this, uh, all the time, but it, it, you know, it was, it was like obvious that, oh, I should have, I should have never made the post without including a sample of the audio at the top, right? So like, I like went back and like an update for that and then, and then people like discussing about like, Oh, you should have used a bone conducting mic instead.

Like, and like all this other stuff that I just like didn't think about. I'm like, Oh, awesome.

And then like I update the post I go on with my life, so anyway, more people please do that and don't post it on Medium. Please don't do that. Stop, stop that. If you like, if you, if you write software, do not like, please put it some, put your writing about software somewhere else, unless, I don't know, You have to or something.

[01:41:52] Jeremy: You've reached your article limit.

[01:41:57] Victor: Yeah, yeah. Oh, also shout out to the web archive. The best way to get almost any article, right? I don't think people in the general populace know this?

But like 99% of the time if you're trying to you just go to the web archive.

It's common knowledge for us. Um, but, but it's not Common knowledge for everybody else and it just feels like they're making a lot of stuff available and legally, right. Cuz like, you know, there's like the, the precedent right now I think is, is is in favor of scraping, right? If you make a thing available to the internet, right?

LinkedIn got ruled against a while ago, but like, if you make a thing available to the internet, uh, publicly available without signing in or whatever it is assumed public, right? So it's just like, yeah, whenever I read something I'm just like, ah, article limit. I hop right on. I hop right on archive today.

But, but I just feel like it's like, it's, it's sad that developers put like, put knowledge enmasse into that particular, It's not a trap. Cause I don't, it's like I don't dislike medium, I don't have any necessarily like animosity towards medium, but it's just like we should be the most capable of, putting up something like maintaining our own websites.

Right. If it's like the death of the personal website, why is it dying with developers? Like, we should be the most capable. We have no hope of the regular world putting out websites if, if it's hard for us.

[01:43:32] Jeremy: I, I mean, I think for stuff like medium maybe sometimes it's the, the technical aspect of not wanting to set up your own site but, I think a large part of it is the social aspect. Like with Medium, you have discoverability you have the likes system, if they even call it that. Um, I think that's the same reason why people can be happy to post on twitter, right?

Um, but when it comes to posting on their own blog, it's like well, I post and then nobody comes and sees it, right? Or I don't get the, I don't get the, Well, the thing is too, like, they could be seeing it but you don't get the feedback and you don't get, you don't get the dopamine hit of like, Oh, I got 40 likes on Medium or Twitter or whatever.

And I think that's one of the challenges with personal sites where I totally agree with you. Wish people would do it and do more but I also understand you are on a little bit of an island unless you can get people to come and interact with you.

[01:44:44] Victor: There's another idea, right? Like just, you know, can you build a self hostable, but decentralized by default, medium clone. there's that's like a personal site that you could easily host you know, like, almost like WordPress, like let's say, right? Um, but with the, with enough metrics, with like, with the engagement stuff built in, even though it's not like powering a company essentially, right?

Cause like the incentives behind building in the engagement, like pumping up engagement. Make sense? If you're running a company cuz you like, you know, you're trying to get MAUs up so you can do your next round or like, you know, make more revenue. Wonder if, I don't know. Yeah, it's just like, like that is a great point cuz it's like, you don't get the positive reinforcement if you don't have the likes and the things that a company would add, right?

Like, as opposed to just like, Oh, I set up nginx and like my site's up or whatever. Like, not that anyone does that these days, but, yeah, that's, that's that's interesting. It's just like, could you make it really like just increasing the engagement of doing it yourself or like, you know, having that. Huh.

[01:45:56] Jeremy: I think sites have, have tried, I mean, it's not quite the same thing, but, dev.to, if you've seen that, like, uh, they, they have, um, I can't remember what it's called, I think it's like a canonical link or something. but basically you can post on their site and then you can put the canonical link to your own website.

And so then when somebody searches on Google, the, the traffic goes to your site. It doesn't bring up dev.to.

And then, people can comment and like on dev.to so I thought it was an interesting idea. I, I don't know how many people use it or take advantage but that's one approach anyways.

[01:46:44] Victor: Yeah, that's actually, that's cool. I don't know enough about that space. I guess.

That sounds awesome. That sounds like actually, you know, useful and like a good middle ground right in like encouraging the ecosystem but also like capturing some of that, of that value, right?

In terms of like just SEO juice, I guess, if you wanna, what, what you wanna call it. But that's awesome. I don't know, I, I, I've always thought of like dev.to And, and clearly I was, you know, at least wrong in part of dev.to Is just like medium 2.0 for, but more developer focused. Um, but I will find great blog posts on there, um, you know, more often than not, and it's just like, okay, yeah, that's, that's awesome.

Like, it, it, it works. Uh, and this canonical link thing sounds actually like very good for, um, for everybody involved, so. Awesome. Sounds like they're, they're good.

[01:47:36] Jeremy: Yeah, if people wanna check out you're up to, what, what, you're working on, where should they head?

[01:47:43] Victor: Oh God. Uh, well, like, I have my blog at, um, vadosware.io, so V A D O S WARE projects I work biggest ones right now. Oh, I guess three. Um, uh, like I, we mentioned Podcast Saver, which is cool. Uh, if you need to download podcasts, do that.

Um, I send out ideas. I send out ideas every week that I think are like valuable. valuable and like things you could turn up into like a startup or a SaaS and like kind of focus on like validating. Cuz like one thing I've learned the hard way is that validating ideas is more important than having them.

Uh, cuz you can think something is good and it won't, won't attract anybody. Um, or you know, if you don't put it in front of people, they'll, it's not gonna take off. so I do that. I send that out at like unvalidatedideas.com So that's, that's a, you know, that's the domain.

I also started, um, trying to highlight FOSS projects cuz in yak shaving what you do is you come across a lot of awesome free and open source projects that are just like, oh, like this is a whole world and like this is like pretty polished and it's like pretty good and I just bookmark So I was just like, I have so many bookmarks, it doesn't make sense that I hold all of them. Um, and like I, someone else has, should see this. So I send out, and this is uh, new for me cuz I send out that newsletter every day. So it's a daily newsletter for like free and open source projects that do, you know, do whatever, like, do lots of various things.

And that is at Awesome Foss. So you can actually spell it multiple ways, but a w s m f o s s.com. So like, awesome without the vowels. Um, but also just if you spell it normally like a normal person, like awesome the word f o s s.com. Um, so that's, that's going.

And then the, the thing that's actually like taking up all my time is nimbus, um, Nimbus Web Services is what I'm calling it.

Uh, it's not out yet, there's nothing to try there, but it is, it is my attempt, to host free and open source software. But give, 10-30% back of revenue, so not profit. Right. Cause they can be different things and like, you know, see the movie industry for like, how that can go wrong, of revenue back to open source projects that, uh, that made the software that I'm hosting.

And I, I think there's more interesting things to be done there, right? Like it can, I can be more aggressive with that. Right. If it, if it works out. Cuz it's just like, you know, it scales so well, you know, see Amazon, right. but yeah, so if you're, if you're interested in that checkout, nimbusws.com.

And that's it. I've, I've plugged everything. Everything plugged.

[01:50:38] Jeremy: Yeah that last one sounds pretty, pretty ambitious. So good luck.

[01:50:42] Victor: Thanks for taking the time.

Xe Iaso on Tailscale

Sat, 01 Oct 2022 04:26:56 -0000

Xe Iaso is the Archmage of Infrastructure at Tailscale and previously worked at Heroku.

This episode originally aired on Software Engineering Radio but includes some additional discussion about their blog near the end of the episode.

Topics covered:

Use cases for VPNs
Simplifying service authentication by identifying users via IP
Peer-to-peer vs centralized "Virtual Pain Networks"
Tailscale's tech stack and why they forked the go compiler
DERP relay servers
Struggling with the iOS network extension size limit
The surprisingly small amount of infrastructure required to run a VPN
Running your company on your own product
Working at Heroku vs Tailscale
Using the socratic style of debate in technical blog posts

Related Links

Transcript

[00:00:00] Jeremy: Today I'm talking to Xe Iaso, they're the archmage of infrastructure at tailscale, and they also have a great blog everyone should check out.

Xe, welcome to software engineering radio.

[00:00:12] Xe: Thanks. It's great to be here.

[00:00:14] Jeremy: I think the first thing we should start with, is what's a, a VPN, because I think some people they may have used it to remote into their workplace or something like that. But I think the, the scope of what it's good for and what it does is a lot broader than that. So maybe you could talk a little bit about that first.

[00:00:31] Xe: Okay. a VPN is short for virtual private network. It's basically a fake network that's overlaid on top of existing networks. And then you can use that network to do whatever you would with a normal computer network. this term has been co-opted by companies that are attempting to get into the, like hide my ass style market, where, you know, you encrypt your internet information and keep it safe from hackers.

But, uh, so it makes it really annoying and hard to talk about what a VPN actually is. Because tailscale, uh, the company I work for is closer to like the actual intent of a VPN and not just, you know, like hide your internet traffic. That's already encrypted anyway with another level of encryption and just make a great access point for, uh, three letter agencies.

But are there, use cases, past that, like when you're developing a piece of software, why would you decide to use a VPN outside of just because I want my, you know, my workers to be able to get access to this stuff.

[00:01:42] Xe: So something that's come up, uh, when I've been working at tailscale is that sometimes we'll make changes to something. And it'll be changes to like the user experience of something on the admin panel or something. So in a lot of other places I've worked in order to have other people test that, you know, you'd have to push it to the cloud.

It would have to spin up a review app in Heroku or some terrifying terraform of abomination would have to put it out onto like an actual cluster or something. But with tail scale, you know, if your app is running locally, you just give like the name of your computer and the port number. And you know, other people are able to just see it and poke it and experience it.

And that basically turns the, uh, feedback cycle from, you know, like having to wait for like the state of the world to converge, to, you know, make a change, press F five, give the URL to a coworker and be like, Hey, is this Gucci?

they can connect to your app as if you were both connected to the same switch.

[00:02:52] Jeremy: You don't have to worry about, pushing to a cloud service or opening ports, things like that.

[00:02:57] Xe: Yep. It will act like it's in the same room, even when they're not it'll even work. if you're at both at Starbucks and the Starbucks has reasonable policies, like holy crap, don't allow devices to connect to each other directly. so you know, you're working on. Like your screenplay app at your Starbucks or something, and you have a coworker there and you're like, Hey, uh, check this out and, uh, give them the link.

And then, you know, they're also seeing the screenplay editor.

[00:03:27] Jeremy: in terms of security and things like that. I mean, I'm picturing it kind of like we were sitting in the same room and there's a switch and we both plugged in. Normally when you do something like that, you kind of have, full access to whatever else is on the switch. Uh, you know, provided that's not being blocked by a, a firewall.

is there like a layer of security on top of that, that a VPN service like tailscale would provide.

[00:03:53] Xe: Yes. Um, there are these things called access control lists, which are kind of like firewall rules, except you don't have to deal with like the nightmare of writing an IP tables rule that also works in windows firewall and whatever they use in Mac OS. The ACL rules are applied at the tailnet level for every device in the tailnet.

So if you have like developer machines, you can put people into groups as things like developers and say that developer machines can talk to production, but not people in QA. They can only talk to testing and people on SRE have, you know, permissions to go everywhere and people within their own teams can connect to each other. you can make more complicated policies like that fairly easily.

[00:04:44] Jeremy: And when we think about infrastructure for, for companies, you were talking about how there could be development, infrastructure, production, infrastructure, and you kind of separate it all out. when you're working with cloud infrastructure. A lot of times, there's the, I always forget what it stands for, but there's like IAM.

There's like policies that you can set up with the cloud provider that says these users can access this, or these machines can access this. And, and I wonder from your perspective, when you would choose to use that versus use something at the, the network or the, the VPN level.

[00:05:20] Xe: The way I think about it is that things like IAM enforce, permissions for like more granularly scoped things like can create EC2 instances or can delete EC2 instances or something like that. And that's just kind of a different level of thing. uh, tailscale, ACLs are more, you know, X is allowed to connect to Y or with tailscale, SSH X is allowed to connect as user Y.

and that's really different than like arbitrary capability things like IAM offers.

you could think about it as an IAM system, but the main permissions that it's exposing are can X connect to Y on Zed port.

[00:06:05] Jeremy: What are some other use cases where if you weren't using a VPN, you'd have to do a lot more work or there's a lot more complexity, kind of what are some cases where it's like, okay, using a VPN here makes a lot of sense.

(The quick and simple guide to go links https://www.trot.to/go-links)

[00:06:18] Xe: There is a service internal to tailscale called go, which is a, clone of Google's so-called go links where it's basically a URL shortener that lives at http://go. And, you know, you have go/something to get to some internal admin service or another thing to get to like, you know, the company directory and notion or something, and this kind of thing you could do with a normal setup, you know, you could set it up and have to do OAuth challenges everywhere and, you know, have to put and make sure that everyone has the right DNS configuration so that, it shows up in the right place.

And then you have to deal with HTTPS um, because OAuth requires HTTPS for understandable and kind of important reasons. And it's just a mess. Like there's so many layers of stuff like the, the barrier to get, you know, like just a darn URL, shortener up turns from 20 minutes into three days of effort trying to, you know, understand how these various arcane things work together.

You need to have state for your OAuth implementation. You need to worry about what the hell a a JWT is (sigh) . It's it it's just bad. And I really think that something like tailscale with everybody has an IP address. In order to get into the network, you have to sign in with your, auth provider, your, a provider tells tailscale who you are.

So transitively every IP address is tied to an owner, which means that you can enforce access permission based on the IP address and the metadata about it that you grab from the tailscale. daemon, it's just so much simpler. Like you don't have to think about, oh, how do I set up OAuth this time? What the hell is an oauth proxy?

Um, what is a Kubernetes? That sort of thing you just think about like doing the thing and you just do it. And then everything else gets taken care of it. It's like kind of the ultimate network infrastructure, because it's both omnipresent and something you don't have to think about. And I think that's really the power of tailscale.

[00:08:39] Jeremy: typically when you would spin up a, a service that you want your developers or your system admins, to be able to log into, you would have to have some way of authenticating and authorizing that user. And so you were talking about bringing in OAuth and having your, your service understand that.

But I, I guess what you're saying is that when you have something like tailscale, that's kind of front loaded, I guess you, you authenticate with tail scale, you get onto the network, you get your IP. And then from that point on you can access all these different services that know like, Hey, because you're on the network, we know you're authenticated and those services can just maybe map that IP that's not gonna change to like users in some kind of table. Um, and not have to worry about figuring out how do I authenticate this user.

[00:09:34] Xe: I would personally more suggest that you use the, uh, whois, uh, look up route in the tailscale daemon's local API, but basically, yeah, you don't really have to worry too much about like the authentication layer because the authentication layer has already been done. You know, you've already done your two factor with Gmail or whatever, and then you can just transitively push that property onto your other machines.

[00:10:01] Jeremy: So when you talk about this, this whois daemon, can you give an example of I'm in the network now I'm gonna make a service call to an application. what, what am I doing with this? This whois daemon?

[00:10:14] Xe: It's more of like a internal API call that we expose via tailscaled's, uh, Unix, socket. but basically you give it an IP address and a port, and it tells you who the person is. It's kind of like the Unix ident protocol in a way, except completely not. And at a high level, you know, if you have something like a proxy for Grafana, you have that proxy for Grafana, make a call to the local tailscale daemon, and be like, Hey, who was this person?

And the tailscale, daemon will spit back at JSON object. Like, oh, it's this person on this device and there you can do additional logic like maybe you shouldn't be allowed to delete things from an iOS device, you know, crazy ideas like that. there's not really support for like arbitrary capabilities and tailscaled at the time of recording, but we've had some thoughts would be cool.

[00:11:17] Jeremy: would that also include things like having roles, for example, even if it's just strings, um, that you get back so that your application would know, okay. This person, is supposed to have admin access to this service based on what I got back from, this, this service.

[00:11:35] Xe: Not currently, uh, you can probably do it via convention or something, but what's currently implemented in the actual, like, source code and user experience that they, you can't do that right now. Um, it is something that I've been, trying to think about different ways to solve, but it's also a problem.

That's a bit big for me personally, to tackle.

[00:11:59] Jeremy: there's, there's so many, I guess, different ways of doing it. That it's kind of interesting to think of a solution that's kind of built into the, the network. Yeah.

[00:12:10] Xe: Yeah. and when I describe that authentication thing to some people, it makes them recoil in shock because there's kind of a Stockholm syndrome type effect with security, for a lot of things where, the easy way to do something and the secure way to do something are, you know, like completely opposite and directly conflicting with each other in almost every way.

And over time, people have come to associate security or like corporate VPNs as annoying, complicated, and difficult. And the idea of something that isn't annoying, complicated or difficult will make people reject it, like just on principle, because you know, they've been trained that, you know, VPN equals virtual pain network and it, it's hard to get that association outta people's heads because you know, a lot of VPNs are virtual pain networks.

Like. I used to work for Salesforce and Salesforce had this corporate VPN where no matter what you did, all of your traffic would go out to the internet from their data center. I think it was in San Francisco or something. And I was in the Seattle area. So whenever I had the VPN on my latency to Google shot up by like eight times and being a software person, you know, I use Google the same way that others breathe and it, it was just not fun.

And I only had the VPN on for the bare minimum of when I needed it. And, oh God, it was so bad.

[00:13:50] Jeremy: like some people, when they picture a VPN, they picture exactly what you're describing, where all of my traffic is gonna get routed to some central point. It's gonna go connect to the thing for me and then send the result back. so maybe you could talk a little bit about why that's, that's maybe a wrong assumption, I guess, in the case of tailscale, or maybe in the case of just more modern VPN solutions.

[00:14:13] Xe: Yeah. So the thing that I was describing is what I've been lovingly calling the, uh, single point of failure as a service type model of VPN, where, you know, you have like the big server somewhere, it concentrates all the connections and, you know, like does things to make the computer feel like they've teleported over there, but overall it's a single point of failure.

And if that falls over, you know, like goodbye, VPN. everybody's just totally screwed. And in contrast, tailscale does a more peer-to-peer thing so that everyone is basically on equal footing. Everyone can send traffic directly to each other, and if it can't get directly to there, it'll use a network of, uh, relay servers, uh, lovingly called Derp and you don't have to worry about, your single point of failure in your cluster, because there's just no single point of failure.

Everything will directly communicate as much as possible. And if it can't, it'll still communicate anyway.

[00:15:18] Jeremy: let's say I start up my computer and I wanna connect to a server in a data center somewhere at the very beginning, am I connecting to some server hosted at tailscale? And then. There's some kind of negotiation process where after that I connect directly or do I just connect directly straight away?

[00:15:39] Xe: If you just turn on your laptop and log in, you know, to it signs into tailscale and gets you on the tailnet and whatnot, then it will actually start all connections via Derp just so that it can negotiate the, uh, direct connection. And in case it can't, you know, it's already connected via Derp so it just continues the connection with Derp and this creates a kind of seamless magic type experience where doing things over Derp is slower.

Yes, it is measurably slower because you know, like you're not going directly, you're doing TCP inside of TCP. And you know, that comes with a average minefield of lasers or whatever you call it. And it does work though. It's not ideal if you wanna do things like copy large amounts of data, but if you want just want ssh into prod and see the logs for what the heck is going on and why you're getting paged at 3:00 AM. it's pretty great.

[00:16:40] Jeremy: What you, you were calling Derp is it where you have servers kind of all over the world and somehow it determines which one's, I guess, is it which one's closest to your destination or which one's closest to you. I'm kind of

[00:16:54] Xe: It's really interesting. It's one of the most weird distributed systems, uh, type things that I've ever seen. It's the kind of thing that could only come outta the mind of an X Googler, but basically every tailscale, every tailscale node has a connection to all of the Derp servers and through process of, you know, latency testing.

It figures out which connection is the fastest and the lowest latency. And it calls that it's home Derp but because it's connected to everything is connected to every Derp you can have two people with different home Derps getting their packets relayed too other clients from different Derps.

So, you know, if you have a laptop in Ottawa and a laptop in San Francisco, the laptop in San Francisco will probably use the, uh, Derp that's closest to it. But the laptop in Ottawa will also use the Derp that's closest to it. So you get this sort of like asynchronous thing, and it actually works out a lot better in practice, than you're probably imagining.

[00:17:52] Jeremy: And then these servers, what was the, the technical term for them? Are they like relays or what's

[00:17:58] Xe: They're relays. Uh, they only really deal with encrypted wire guard packets, and there's, no way for us at tailscale, to see the contents of Derp messages, it is literally just a forwarder. It, it literally just forwards things based on the key ID.

[00:18:17] Jeremy: I guess if tail scale isn't able to decrypt the traffic, is, is that because the, the keys are only on the user's devices, like it's on their laptop and on the server they're trying to reach, or

[00:18:31] Xe: Yeah. The private keys are live and die with those devices or the devices they were minted on. And the public keys are given to the coordination server and the coordination server spreads those around to every device in your tailnet. It does some limiting so that like, if you don't have ACL access to something, you don't get the private key, you don't get the, uh, public key for it.

The public key, not the private key, the public key, not the private key. And yeah. Then, you know, you just go that way and it'll just figure it out. It's pretty nice.

[00:19:03] Jeremy: When we're kind of talking about situations where it can't connect directly, that's where you would use the relay. what are kind of the typical cases where that happens, where you, you aren't able to just connect directly?

[00:19:17] Xe: Hotel, wifi and paranoid network security setups, hotel wifi is the most notorious one because you know, you have like an overpriced wifi connection. And if you bring, like, I don't know like, You you're recording a bunch of footage on your iPhone. And because in, 2022. The iPhone has the USB2 connection on it.

And you know, you wanna copy that. You wanna use the network, but you can't. So you could just let it upload through iCloud or something, or, you know, do the bare minimum. You need to get the, to get the data off with Derp it wouldn't be ideal, but it would work. And ironically enough, that entire complexity involved with, you know, doing TCP inside of TCP to copy a video file over to your laptop might actually be faster than USB2, which is something that I did the math for a while ago.

And I just started laughing.

[00:20:21] Jeremy: Yeah, that that is pretty, pretty ridiculous

[00:20:23] Xe: welcome to the future, man (laughs) .

[00:20:27] Jeremy: in terms of connecting directly, usually when you have a computer on the internet, you don't have all your ports open, you don't necessarily allow, just anybody to send you traffic over UDP and so forth. let's say I wanna send, UDP data to a, a server on my network, but, you know, maybe it has some TCP ports open. I I'm assuming once I connect into the network via the VPN, I'm able to use other protocols and ports that weren't necessarily exposed. Is that correct?

[00:21:01] Xe: Yeah, you can use UDP. you can do basically anything you would do on a normal network except multicast um, because multicast is weird.

I mean, there's thoughts on how to handle multicast, but the main problem is that like wireguard, which is what is tail tailscale is built on top of, is, so called OSI model layer three network, where it's at like, you know, the IP address level and multicast is a layer two or data link layer type thing.

And, those are different numbers and, you can't really easily put, you know, like broadcast packets into IP, uh, IPV4 thinks otherwise, but, uh, in practice, no people don't actually use the broadcast address.

[00:21:48] Jeremy: so for someone who's, they, they have a project or their company wants to get started. I mean, what does onboarding look like? What, what do they have to do to get all these devices talking to one another?

[00:22:02] Xe: basically you, install tail scale, you log in with a little GUI thing or on a Linux server, you run tailscale up, and then you all log to the, to a, like a G suite account with the same domain name. So, you know, if your domain is like example.com, then everybody logs in with their example.com G suite account.

And, there is no step three, everything is allowed and everything can just connect and you can change the permissions from there. By default, the ACLs are set to a, you know, very permissive allow everyone to talk to everyone on any port. Uh, just so that people can verify that it's working, you know, you can ping to your heart's content.

You can play Minecraft with others. You can, you know, host an HTTP server. You can SSH into your development box and and write blog post with emacs, whatever you want.

[00:22:58] Jeremy: okay, you install the, the software on your servers, your workstations, your laptops, and so on. And then at, after that there's some kind of webpage or dashboard you would go in and say, I want these people to be able to access these things and

[00:23:14] Xe: Mm-hmm

[00:23:15] Jeremy: these ports and so on.

[00:23:17] Xe: you, uh, can customize the access control rules with something that looks like JSON, but with trailing commas and comments allowed, and you can go from there to customize basically anything to your heart's content. you can set rules so that people on the DevOps team can access everything, but you know, maybe marketing doesn't need access to the production database.

So you don't have to worry about that as much.

[00:23:45] Jeremy: there's, there's kind of different options for VPNs. CloudFlare access, zero tier, there's, there's some kind of, I think it's Nebula from slack or something like that. so I was kind of curious from your perspective, what's the, difference between those kinds of services and, and tailscale.

[00:24:04] Xe: I'm gonna lead this out by saying that I don't totally understand the differences between a lot of them, because I've only really worked with tailscale. I know things about the other options, but, uh, I have the most experience with tailscale but from what I've been able to tell, there are things that tailscale offers that others don't like reverse mapping of IP addresses to people, or, there's this other feature that we've been working on, where you can embed tail scale as a library inside your go application, and then write a internal admin service that isn't exposed to the internet, but it's only exposed over tailscale.

And I haven't seen a way to do those things with those others, but again, I haven't done much research. Um, I understand that zero tier has some layer, two capabilities, but I've, I don't have enough time in the day to look into.

[00:25:01] Jeremy: There's been different, I guess you would call them VPN protocols. I mean, there's people have probably worked with IP sec in some situations they may have heard of OpenVPN, wireguard. in the case of tailscale, I believe you chose to build it on top of wireguard.

So I wonder if you could talk a little bit about why, you chose wireguard and, and maybe what makes it unique.

[00:25:27] Xe: I wasn't on the team that initially wrote like the core of tailscale itself. But from what I understand, wire guard was chosen because, what overhead, uh, it's literally, you just encrypt the packets, you send it to the other server, the other server decrypts them. And you know, you're done. it's also based purely on the public key. Um, the key pairs involved. And from what I understand, like at the wireguard protocol level, there's no reason why you, why you would need an IP address at all in theory, but in practice, you kind of need an IP address because you know, everything sucks. But also wire guard is like UDP only, which I think it at it's like core implementation, which is a step up from like AnyConnect and OpenVPN where they have TCP modes.

So you can experience the, uh, glorious, trash fire of TCP in TCP. And from what I understand with wireguard, you don't need to set up a certificate authority or figure out how the heck to revoke certificates. Uh, you just have key pairs and if a node needs to be removed, you delete the key pair and you're done.

And I think that really matches up with a lot of the philosophy behind how tailscale networks work a lot better. You know, you have a list of keys and if the network changes the list of keys changes, that's, that's the end of the story.

So maybe one of the big selling points was just What has the least amount of things I guess, to deal with, or what's the, the simplest, when you're using a component that you want to put into your own product, you kind of want the least amount of things that could go wrong, I guess.

[00:27:14] Xe: Yeah. It's more like simple, but not like limiting. Like, for example, a set of tinker toys is simple in that, you know, you can build things that you don't have to worry too much about the material science, but a set of tinker toys is also limiting because you know, like they're little wooden, dowels and little circles made out of wind that you stick the dowels into, you know, you can only do so much with it.

And I think that in comparison, wireguard is simple. You know, there's just key pairs. They're just encryption. And it's simple in it's like overall theory and it's implementation, but it's not limiting. Like you can do pretty much anything you want with it.

inherently whenever we build something, that's what we want, but that's a, that's an interesting way of putting it. Yeah.

[00:28:05] Xe: Yeah. It. It can be kind of annoyingly hard to figure out how to make things as simple as they need to be, but still allow for complexity to occur. So you don't have to like set up a keyboard macro to write if error not equals nil over and over.

[00:28:21] Jeremy: I guess the next thing I'd like to talk a little bit about is. We we've covered it a little bit, but at a high level, I understand that that tailscale uses wireguard, which is the open source, VPN protocol, I guess you could call it. And then there's the client software. You're saying you need to install on each of the servers and workstations.

But there's also a, a control plane. and I wonder if you could kind of talk a little bit about I guess at a high level, what are all the different components of, of tailscale?

[00:28:54] Xe: There's the agent that you install in your devices. The agent is basically the same between all the devices. It's all written in go, and it turns out that go can actually cross compile fairly well. So you have. Your, you know, your implementation in go, that is basically the, the same code, more or less running on windows, MacOS, freeBSD, Android, ChromeOS, iOS, Linux.

I think I just listed all the platforms. I'm not sure, but you have that. And then there's the sort of control plane on tailscale's side, the control plane is basically like control, uh, which is, uh, I think a get smart reference. and that is basically a key dropbox. So, you know, you You authenticate through there. That's where the admin panel's hosted. And that's what tells the different tailscale nodes uh, the keys of all the other machines on the tailnet. And also on tailscale side there's, uh, Derp which is a fleet of a bunch of different VPSs in various clouds, all over the world, both to try to minimize cost and to, uh, have resiliency because if both digital ocean and Vultr go down globally, we probably have bigger problems.

[00:30:15] Jeremy: I believe you mentioned that the, the clients were written in go, are the control plane and the relay, the Derp portion. Are those also written in go or are they

[00:30:27] Xe: They're all written and go, yeah,

go as much as possible. Yeah.

It's kind of what happens when you have some ex go team members is the core people involved in tail scale, like. There's a go compiler fork that has some additional patches that go upstream either can't accept, uh, won't accept or hasn't yet accepted, for a while. It was how we did things like trying to shave off by bites from binary size to attempt to fit it into the iOS network extension limit.

Because for some reason they only allowed you to have 15 megabytes of Ram for both like your application and working Ram. And it turns out that 15 megabytes of Ram is way more than enough to do something like OpenVPN. But you know, when you have a peer-to-peer VPN engine, it doesn't really work that well.

So, you know, that's a lot of interesting engineering challenge.

[00:31:28] Jeremy: That was specifically for iOS. So to run it on an iPhone.

[00:31:32] Xe: Yeah. Um, and amazingly after the person who did all of the optimization to the linker, trying to get the binary size down as much as possible, like replacing Unicode packages was something that's more coefficient, you know, like basically all but compressing parts of the binary to try to save space. Then the iOS, I think 15 beta dropped and we found out that they increased the network extension Ram limit to 50 megabytes and the look of defeat on that poor person's face. I feel very bad for him.

[00:32:09] Jeremy: you got what you wanted, but you're sad about it,

[00:32:12] Xe: Yeah.

[00:32:14] Jeremy: so that's interesting too. you were using a fork of the go compiler

[00:32:19] Xe: Basically everything that is built is built using, uh, the tailscale fork, of the go compiler.

[00:32:27] Jeremy: Going forward is the sort of assumption is that's what you'll do, or is it you're, you're hoping you can get this stuff upstreamed and then eventually move off of it.

[00:32:36] Xe: I'm pretty sure that, I, I don't know if I can really make a forward looking statement like that, but, I've come to accept the fact that there's a fork of the go compiler. And as a result, it allows a lot more experimentation and a bit more of control, a bit more control over what's going on. like I'm, I'm not like the most happy with it, but I've, I understand why it exists and I'm, I've made my peace with it.

[00:33:07] Jeremy: And I suppose it, it helps somewhat that the people who are working on it actually originally worked on the, go compiler at Google. Is that right?

[00:33:16] Xe: Oh yeah. If, uh, there weren't ex go team people working on that, then I would definitely feel way less comfortable about it. But I trust that the people that are working on it, know what they're doing at least enough.

[00:33:30] Jeremy: I, I feel like, that's, that's kind of the position we put ourselves in with software in general, right? Is like, do we trust our ourselves enough to do this thing we're doing?

[00:33:39] Xe: Yeah. And trust is a bitch.

[00:33:44] Jeremy: um, I think one of the things that's interesting about tail scale is that it's a product that's kind of it's like network infrastructure, right? It's to connect you to your other devices. And that's a little different than somebody running a software as a service. And so. how do you test something that's like built to support a network and, and how is that different than just making a web app or something like that.

[00:34:11] Xe: Um, well, it's a lot more complicated for one, especially when you have to have multiple devices in the mix with multiple different operating systems. And I was working on some integration tests, doing stuff for a while, and it was really complicated. You have to spin up virtual machines, you know, you have to like make sure the virtual machines are attempting to download the version of the tailscale client you wanna test and. It's it's quite a lot in practice.

[00:34:42] Jeremy: I mean, do you have a, a lab, you know, with Android phones and iPhones and laptops and all this sort of stuff, and you have some kind of automated test suite to see like, Hey, if these machines are in Ottawa and, my servers in San Francisco, like you're mentioning before that I can get from my iPhone to this server and the data center over here, that kind of thing.

[00:35:06] Xe: What's the right way to phrase this without making things look bad. Um, it's a work in progress. It it's, it's really a hard problem to solve, uh, especially when the company is fully remote and, uh, like. Address that's listed on the business records is literally one of the founders condos because you know, the company has no office.

So that makes the logistics for a lot of this. Even more fun.

[00:35:37] Jeremy: Probably any company that's in an early stage feels the same way where it's like, everything's a work in progress and we're just gonna, we're gonna keep going and we're gonna get there. And as long as everything keeps running, we're good.

[00:35:50] Xe: Yeah. I, I don't like thinking about it in that way, because it kind of sounds like pessimistic or defeatist, but at some level it's, it, it really is a work in progress because it's, it's a hard problem and hard problems take a lot of time to solve, especially if you want a solution that you're happy with.

[00:36:10] Jeremy: And, and I think it's kind of a unique case too, where it's not like if it goes down, it's like people can't do their job. Right. So it's yeah.

[00:36:21] Xe: Actually, if tail scales like control plane goes down, I don't think people would notice until they tried to like boot up a, a reboot, a laptop, or connect a new device to their tailnet. Because once, once all the tailscale agents have all of the information they need from the control plate, you know, they just, they just continue on independently and don't have to care.

Derp is also fairly independent of the, like the key dropbox component. And, you know, if that, if that goes down Derp doesn't care at all,

[00:37:00] Jeremy: Oh, okay. So if the control plane is down, as long as you had authenticated earlier in the day, you can still, I don't know if it's cached or something, but you can still continue to reach the relay servers, the Derp servers or your,

[00:37:15] Xe: other nodes. Yeah. I, I'm pretty sure that in most cases, the control plane could be down for several hours a day and nobody would notice unless they're trying to deal with the admin panel.

[00:37:28] Jeremy: Got it. that's a little bit of a relief, I suppose, for, for all of you running it,

[00:37:33] Xe: Yeah. Um, it's also kind of hard to sell people on the idea of here is a VPN thing. You don't need to self host it and they're like, what? Why? And yeah, it can be fun.

[00:37:49] Jeremy: though, I mean, I feel like anybody who has, self-hosted a VPN, they probably like don't really wanna do it. I don't know. Maybe I'm wrong.

[00:38:00] Xe: well, so a lot of the idea of wanting to self host it is, uh, I think it's more of like trying to be self-sufficient and not have to rely on other companies, failures dictating your company's downtime. And, you know, like from some level that's very understandable. And, you know, if, you know, like tail scale were to get bought out and the new owners would, you know, like basically kill the product, they'd still have something that would work for them.

I don't know if like such a defeatist attitude is like productive. But it is certainly the opinion that I have received when I have asked people why they wanna self-host. other people, don't want to deal with identity providers or the, like, they wanna just use their, they wanna use their own identity provider.

And what was hilarious was there was one, there was one thing where they were like our old VPN server died once and we got locked out of our network. So therefore we wanna, we wanna self-host tailscale in the future so that this won't happen again.

And I'm like, buddy, let's, let's just, let's just take a moment and retrace our steps here. CAuse I don't think you mean what you think you mean.

[00:39:17] Jeremy: yeah, yeah.

[00:39:19] Xe: In general, like I suggest people that, you know, even if they're like way deep into the tailscale, Kool-Aid they still have at least one other method of getting into their servers. Ideally, two. I, I admit that I'm, I come from an SRE style background and I am way more paranoid than most, but it, I usually like having, uh, a backup just in case.

[00:39:44] Jeremy: So I, I suppose, on, on that note, let's, let's talk a little bit about your role at tailscale. the title of the archmage of infrastructure is one of the, the coolest titles I've, uh, I've seen. So maybe you can go a little bit into what that entails at, at tailscale.

[00:40:02] Xe: I started that title as a joke that kind of stuck, uh, my intent, my initial intent was that every time someone asked, I'd say, I'd have a different, you know, like mystic sounding title, but, uh, archmage of infrastructure kind of stuck. And since then, I've actually been pivoting more into developer relations stuff rather than pure software engineering.

And, from the feedback that I've gotten at the various conferences I've spoken at, they like that title, even though it doesn't really fit with developer relations work at all, it it's like it fits because it doesn't. You know, that kind of coney kind of way.

[00:40:40] Jeremy: I guess this would go more into the, the infrastructure side, but. What does the, the scale of your infrastructure look like? I mean, I, I think that you touched a little bit on the fact that you have relay servers all over the place and you've got this control plane, but I wonder if you could give people a little bit of perspective of what kind of undertaking this is.

[00:41:04] Xe: I am pretty sure at this point we have more developer laptops and the like, than we do production servers. Um, I'm pretty sure that the scale of the production of production servers are in the tens, at most. Um, it turns out that computers are pretty darn and efficient and, uh, you don't really need like a lot of computers to do something amazing.

[00:41:27] Jeremy: the part that I guess surprises me a little bit is, is the relay servers, I suppose, because, I would imagine there's a lot of traffic that goes through those. are you finding that just most of the time they just aren't needed and usually you can make a direct connection and that's why you don't need too many of these.

[00:41:45] Xe: From what I understand. I don't know if we actually have a way to tell, like what percentage of data is going over the relays versus not. And I think that was an intentional decision, um, that may have been revisited I'm operating based off of like six to 12 month old information right now. But in general, like the only state that the relay servers has is in Ram.

And whenever the relay, whenever you disconnect the server, the state is dropped.

[00:42:18] Jeremy: Okay.

[00:42:19] Xe: and even then that state is like, you know, this key is listening. It is, uh, connected, uh, in case you wanna send packets over here, I guess.

it's a bit less bandwidth than you're probably thinking it's not like enough to max it out 24/7, but it is, you know, measurable and there are some, you know, costs associated with it. This is also why it's on digital ocean and vulture and not AWS. but in general, it's a lot less than you'd think. I'm pretty sure that like, if I had to give a baseless assumption, I'd say that probably about like 85% of traffic goes directly.

And the remaining is like the few cases in the whole punching engine that we haven't figured out yet. Like Palo Alto fire walls. Oh God. Those things are a nightmare.

[00:43:13] Jeremy: I see. So it's most of the traffic actually ends up. Being straight peer to peer. Doesn't have to go through your infrastructure. And, and therefore it's like, you don't need too many machines, uh, to, to make this whole thing work.

[00:43:28] Xe: Yeah. it turns out that computers are pretty darn fast and that copying data is something that computers are really good at doing. Um, so if you have, you know, some pretty darn fast computers, basically just sitting there and copying data back and forth all day, like it, you can do a lot with shockingly little.

Um, when I first started, I believe that the Derp VMs were using like sometimes as little as one core and 512 megabytes of Ram as like a primary Derp. And, you know, we only noticed when, there were some weird connection issues for people that were only on Derp because there were enough users that the machine had ran out of memory.

So we just, you know, upped the, uh, virtual machine size and called it a day. But it's, it's truly remarkable how mu how far you can get with very little

[00:44:23] Jeremy: And you mentioned the relay servers, the, the Derp servers were on services like digital ocean and Vultr. I'm assuming because of the, the bandwidth cost, for the control plane, is, is that on AWS or some other big cloud provider?

[00:44:39] Xe: it's on AWS. I believe it's in EU central 1.

[00:44:44] Jeremy: You're helping people connect from device to device and in a situation like that. what does monitoring look like in, in incidents? Like what are you looking for to determine like, Hey, something's not working.

[00:44:59] Xe: there's monitoring with, you know, Prometheus, Grafana, all of that stuff. there are some external probing things. there's also some continuous functional testing for trying to connect to tailscale and like log in as an account. And if that fails like twice in a row, then, you know, something's very wrong and, you know, raise the alarm.

But in general. A lot of our monitoring is kind of hard at some level because you know, we're tailscale at a tailscale can't always benefit from tailscale to help operate tail scale because you know, it's tailscale. Um, so it, it still trying to figure out how to detangle the chicken and egg situation.

It's really annoying.

there's the, the term dog fooding, right? Where they're saying like, oh, we, we run, um, our own development on our own platform or our own software. but I could see when your product is network infrastructure, VPNs, where that could be a little, little dicey.

[00:46:06] Xe: Yeah, it is very annoying. But I I'm pretty sure we'll figure something out. It is just a matter of when, another thing that's come up is we've kind of wanted to use tailscale's SSH features, where you specify ACLs in your, you specify ACL rules to allow people to SSH, to other nodes as various users.

but if that becomes your main access to production, then you know, like if tailscale is down and you're tailscale, like how do you get in, uh, then there's been various philosophical discussions about this. it's also slightly worse if you use what's called check mode in SSH, where, uh, tail scale, SSH without check mode, you know, you just, it, the, the server checks against the policy rules and the ACL and if it. if it's okay, it lets you in. And if not, it says no, but with check mode, there's also this like eight hour, there's this like eight hour quote unquote lifetime for you to have like sudo mode on GitHub, where you do an auth an auth challenge with your auth aprovider. And then, you know, you're given a, uh, Hey, this person has done this thing type verification.

And if that's down and that goes through the control plane, and if the control plane is down and you're tailscale, trying to debug the control plane, and in order to get into the control plane over tailscale, you need to use the, uh, control plane. It, you know, that's like chicken and egg problem level 78,

which is a mythical level of chicken egg problem that, uh, has only been foretold in the legends of yore or something.

[00:47:52] Jeremy: at that point, it sounds like somebody just needs to, to drive to the data center and plug into the switch.

[00:47:59] Xe: I mean, It's not, it's not going to, it probably wouldn't be like, you know, we need to get a person with an angle grinder off of Craigslist type bad. Like it was with the Facebook BGP outage, but it it's definitely a chicken and egg problem in its own right.

it makes you do a lot of lateral thinking too, which is also kind of interesting.

[00:48:20] Jeremy: When, when you say lateral thinking, I'm just kind of curious, um, if you have an example of what you mean.

[00:48:27] Xe: I don't know of any example that isn't NDAed. Um, but basically, you know, tail scale is getting to the, to the point where tailscale is relying on tailscale to make tailscale function and you know, yeah. This is classic oroboros style problem.

I've heard a, uh, a wise friend of mine said that that is an ideal problem to have, which sounds weird at face value. But if you're getting to that point, that means that you're successful enough that, you know, you're having that problem, which is in itself a good thing, paradoxically.

[00:49:07] Jeremy: better to have that problem than to have nobody care about the product. Right.

[00:49:12] Xe: Yeah.

[00:49:13] Jeremy: kind of on that, that note, um, you mentioned you worked at, at Salesforce, uh, I believe that was working on Heroku. I wonder if you could talk a little about your experience working at, you know, tailscale, which is kind of more of a, you know, early startup versus, uh, an established company like Salesforce.

[00:49:36] Xe: So at the time I was working at Heroku, it definitely didn't feel like I was working at Salesforce for the majority of it. It felt like I was working, you know, at Heroku, like on my resume, I listed as Heroku. When I talked about it to people, I said, I worked at Heroku and that sales force was this, you know, mythical, Ohana thing that I didn't have to deal with unless I absolutely had to.

By the end of the time I was working at Heroku, uh, the salesforce, uh, sort of started to creep in and, you know, we moved from tracking issues in GitHub issues. Like we were used to, to using their, oh, what's the polite way to say this, their creation, which is, which was like the moral equivalent of JIRA implemented on top of Salesforce.

You had to be behind the VPN for it. And, you know, every ticket had 20 fields and, uh, there were no templates. And in comparison with tail scale, you know, we just use GitHub issues, maybe some like things in notion for doing like longer term tracking or Kanban stuff, but it's nice to not have. you know, all of the pomp and ceremony of filling out 20 fields in a ticket for like two sentences of this thing is obviously wrong and it's causing X to happen.

Please fix.

[00:51:08] Jeremy: I, I like that, that phrase, the, the creation, that's a very, very diplomatic term.

[00:51:14] Xe: I mean, I can think of other ways to describe it, but I'm pretty sure those ways wouldn't be allowed on the podcast. So

[00:51:25] Jeremy: Um, but, but yeah, I, I know what you mean for sure where, it, it feels like there's this movement from, Hey, let's just do what we need. Like let's fill in the information that's actually relevant and don't do anything else to a shift to, we need to fill in these 10 fields because that's the thing we do.

Yeah.

[00:51:48] Xe: Yeah. and in the time I've been working for tail scale, I'm like employee ID 12. And, uh, tail scale has gone from a company where I literally know everyone to just recently to the point where I don't know everyone anymore. And it's a really weird feeling. I've never been in a, like a small stage startup that's gotten to this size before, and I've described some of my feelings to other people who have been there and they're like, yeah, welcome to the club. So I figure a lot of it is normal. from what I understand, though, there's a lot of intentionality to try to prevent tail skill from becoming, you know, like Google style, complexity, organizational complexity, unless that is absolutely necessary to do something.

[00:52:36] Jeremy: it's a function of size, right? Like as you have more people, more teams, then more process comes in. that's a really tricky balance to, to grow and still keep that feeling of, I'm just doing the thing, I'm doing the work rather than all this other process stuff.

[00:52:57] Xe: Yeah, but it, I've also kind of managed to pigeonhole myself off into a corner with devrel stuff. And that's been nice. I've been working a bunch with, uh, like marketing people and, uh, helping out with support occasionally and doing a, like a godawful amount of writing.

[00:53:17] Jeremy: the, the writing, for our audience's benefit, I, I think they should, they should really check out your blog because I think that the way you write your, your articles is very thoughtful in terms of the balance of the actual example code or example scripts and the descriptions and, and some there's a little bit of a narrative sometimes too.

So,

[00:53:40] Xe: Um, I'm actually more of a prose writer just by like how I naturally write things. And a lot of the style of how I write things is, I will take elements from, uh, the Socratic style of dialogue where, you know, you have the student and the teacher. And, you know, sometimes the student will ask questions that the teacher will answer.

And I found that that's a particularly useful way to help model understanding or, you know, like put side concepts off into their own little blurbs or other things like that. I also started doing those conversation things with, uh, furry art, specifically to dunk on a homophobe that was getting very angry at furry art being in, uh, another person's blog.

And that's it, it's occasionally fun to go into the, uh, orange website of bad takes and see the comments when people complain about it. oh gosh, the bad takes are hilariously good. Sometimes.

[00:54:45] Jeremy: it's good that you have like a, a positive, mindset around that. I know some people can read, uh, that sort of stuff and go, you know, just get really bummed out.

[00:54:54] Xe: One of the ways I see it is that a lot of the time algorithms are based on like sheer numbers. So if you like get something that makes people argue in the comments, that number will go up and because there's more comments on it, it makes more people more likely to, to read the article and click on it.

So, sometimes I have been known to sprinkle, what's the polite way to say this. I've been known to sprinkle like intentionally kind of things that will, uh, get people and make them want to argue about it in the comments. Purely to make the engagement numbers rise up, which makes more people likely to read the article.

And, it's kind of a dirty practice, but you know, it makes more people read the article and more people benefit. So, you know, like it's kind of morally neutral, I guess.

[00:55:52] Jeremy: usually that, that seems like, a sketchy thing. But I feel like if it's in service to, uh, like a technical blog post, I mean, why not? Right.

[00:56:04] Xe: And a lot of the times I'll usually have the like, uh, kind of bad take, be in a little conversation blurb thing so that people will additionally argue about the characterization of, you know, the imaginary cartoon shark or whatever.

[00:56:20] Jeremy: That's good. It's the, uh, it's the Xe Xe universe that they're, they're stepping into.

[00:56:27] Xe: I've heard people describe it, uh, lovingly as the xeiaso.net cinematic universe.

I've had some ideas on how to expand it in the future with more characters that have more different kind of diverse backgrounds. But, uh, it turns out that writing this stuff is hard. Like actually very hard because you have to get this right.

You have to get the right balance of like snark satire, uh, like enlightenment. And

it's, it's surprisingly harder than you'd think. Um, but after a while, I've just sort of managed to like figure out as I'm writing where the side tangents come off and which ones I should keep and which ones I should, uh, prune and which ones can also help, Gain deeper understanding with a little like Socratic dialogue to start with a Mo like an incomplete assumption, like an incomplete picture.

And then, you know, a question of, wait, what about this thing? Doesn't that conflict with that? And like, well, yes. technically it does, but realistically we don't have to worry about that as much. So we can think about it just in terms of this bigger model and, uh, that's okay. Like, uh, I mentioned the OSI model earlier, you know, like the seven layer OSI model it's, you know, genuinely overkill for basically everything, except it's a really great conceptual model for figuring out the difference between, you know, like an ethernet cable, an ethernet, like the ethernet card, the IP stack TCP and, you know, TLS or whatever.

I have a couple talks that are gonna be up by the time this is published. Uh, one of them is my, uh, rustconf talk on my, or what was it called? I think it was called the surreal horrors of PAM or something where I discussed my experience, trying to bug a PAM module in rust, uh, for work. And, uh, it's the kind of story where, you know, it's bad when you have a break point on dlopen.

[00:58:31] Jeremy: That sounds like a nightmare.

[00:58:32] Xe: Oh yeah. Like part of the attempting to fix that process involved, going very deep. We're talking like an HTML frame set in the internet archive for sunOS documentation that was written around the time that PAM was used. Like it's things that are bad enough were like everything in the frame set, but the contents had eroded away through bit rot and you know, you're very lucky just to have what you do.

[00:59:02] Jeremy: well, I'm, I'm glad it was. It was you and not me. we'll get to, to hear about it and, and not have to go through the, the suffering ourselves.

[00:59:11] Xe: yeah. One of the things I've been telling people is that I'm not like a brilliant programmer. Like I know a bunch of people who are definitely way smarter than me, but what I am is determined and, uh, determination is a bit stronger of a force than you'd think.

[00:59:27] Jeremy: Yeah. I mean, without it, nothing gets done. Right.

[00:59:30] Xe: Yeah.

[00:59:31] Jeremy: as we wrap up, is there anything we missed or anything else you wanna mention?

[00:59:36] Xe: if you wanna look at my blog, it's on xeiaso.net. That's X, E I a S o.net. Um, that's where I post things. You can see, like the 280 something articles at time of recording. It's probably gonna get to 300 at some point, oh God, it's gonna get to 300 at some point. Um, and yeah, from, I try to post articles about weekly, uh, depending on facts and circumstances, I have a bunch of talks coming up, like one about the hilarious over engineering I did in my blog.

And maybe some more. If I get back positive responses from calls for paper submissions,

[01:00:21] Jeremy: Very cool. Well, Xe thank you so much for, for coming on software engineering radio.

[01:00:27] Xe: Yeah. Thank you for having me. I hope you have a good day and, uh, try out tailscale, uh, note my bias, but I think it's great.

Jonathan Shariat on Tragic Design

Fri, 09 Sep 2022 16:59:28 -0000

Jonathan Shariat is the coauthor of the book Tragic Design and co-host of the Design Review Podcast.

He's currently a Sr. Interaction Designer & Accessibility Program Lead at Google.

This episode originally aired on Software Engineering Radio.

Topics covered:

How poor design kills in medical environments
Causing harm with features meant to bring joy
Considerations during the product development cycle
Industry specific checklists and testing requirements
Creating guiding principles for a team
Why medical software often has poor UX
Designing for crisis situations
Why dark patterns can be bad in the long term

Related Links

Transcript

You can help edit this transcript on GitHub.

[00:00:00] Jeremy: Today I'm talking to Jonathan Shariat, he's the co-author of Tragic design. The host of the design review podcast. And he's currently a senior interaction designer and accessibility program lead at Google. Jonathan, welcome to software engineering radio.

[00:00:15] Jonathan: Hi, Jeremy, thank you So much for having me on.

[00:00:18] Jeremy: the title of your book is tragic design. And I think that people can take a lot of different meanings from that. So I wonder if you could start by explaining what tragic design means to you.

[00:00:33] Jonathan: Hmm. For me, it really started with this story that we have in the beginning of the book. It's also online. Uh, I originally wrote it as a medium article and th that's really what opened my eyes to, Hey, you know, design has, is, is this kind of invisible world all around us that we actually depend on very critically in some cases.

And So this story was about a girl, you know, a nameless girl, but we named her Jenny for the story. And in short, she came for treatment of cancer at the hospital, uh, was given the medication and the nurses that were taking care of her were so distracted with the software they were using to chart, make orders, things like that, that they miss the fact that she needed hydration and that she wasn't getting it.

And then because of that, she passed away. And I still remember that feeling of just kind of outrage. And, you know, when we hear a lot of news stories, A lot of them are outraging. they, they touch us, but some of them, some of those feelings stay and they stick with you.

And for me, that stuck with me, I just couldn't let it go because I think a lot of your listeners will relate to this. Like we get into technology because we really care about the potential of technology. What could it do? What are all the awesome things that could do, but we come at a problem and we think of all the ways it could be solved with technology and here it was doing the exact opposite.

It was causing problems. It was causing harm and the design of that, or, you know, the way that was built or whatever it was failing Jenny, it was failing the nurses too, right? Like a lot of times we blame that end user and, and it caused it. So to me, that story was so tragic. Something that deeply saddened me and was regrettable and cut short someone's uh, you know, life and that's the definition of tragic, and there's a lot of other examples with varying degrees of tragic, but, um, you know, as we look at the impact technology has, and then the impact we have in creating those technologies that have such large impacts, we have a responsibility to, to really look into that and make sure we're doing as best of job as we can and avoid those as much as possible.

Because the biggest thing I learned in researching all these stories was, Hey, these aren't bad people. These aren't, you know, people who are clueless and making these, you know, terrible mistakes. They're me, they're you, they're they're people. Um, just like you and I, that could make the same mistakes.

[00:03:14] Jeremy: I think it's pretty clear to our audience where there was a loss of life, someone, someone died and that's, that's clearly tragic. Right? So I think a lot of things in the healthcare field, if there's a real negative outcome, whether it's death or severe harm, we can clearly see that as tragic.

and I, I know in your book you talk about a lot of other types of, I guess negative things that software can cause. So I wonder if you could, explain a little bit about now past the death and the severe injury. What's tragic to you.

[00:03:58] Jonathan: Yeah. still in that line of like of injury and death, And, you know, the side that most of us will actually, um, impact, our work day-to-day is also physical harm. Like, creating this software in a car. I think that's a fairly common one, but also, ergonomics, right?

Like when we bring it back to something like less impactful, but still like multiplied over the impact of, multiplied over the impact of a product rather, it can be quite, quite big, right? Like if we're designing software in a way that's very repetitive or, you know, everyone's, everyone's got that, that like scroll, thumb, scroll, you know, issue.

Right. if, uh, our phones aren't designed well, so there's a lot of ways that it can still physically impact you ergonomically. And that can cause you a lot of problem arthritis and pain, but yeah, there's, there's other, there's other, other ways that are still really impactful. So the other one is by saddening or angry.

You know, that emotional harm is very real. And oftentimes sometimes it gets overlooked a little bit because it's, um, you know, physical harm is what is so real to us, but sometimes emotional harm isn't. But, you know, we talk about in the book, the example of Facebook, putting together this great feature, which takes your most liked photo, and, you know, celebrates your whole year by you saying, Hey, look at as a hero, you're in review this, the top photo from the year, they add some great, you know, well done illustrations behind it, of, of balloons and confetti and, people dancing.

But some people had a bad year. Some people's most liked engaged photo is because something bad happened and they totally missed. And because of that, people had a really bad time with this where, you know, they lost their child that year. They lost their loved one that year, their house burnt down. Um, something really bad happened to them.

And here was Facebook putting that photo of their, of their dead child up with, you know, balloons and confetti and people dancing around it. And that was really hard for people. They didn't want to be reminded of that. And especially in that way, and these emotional harms also come into the, in the play of, on anger.

You know, we talk about, well, one, you know, there's, there's a lot of software out there that, that, um, tries to bring up news stories that anger us and which equals engagement. Um, but also ones that, um, use dark patterns to trick us into purchasing and buying and forgetting about that free trial. So they charge us for a yearly subscription and won't refund us.

Uh, if you've ever tried to cancel a subscription, you start to see some real their their real colors. Um, so emotional harm and, uh, anger is a, is a big one. We also talk about injustice in the book where there are products that are supposed to be providing justice. Um, and you know, in very real ways like voting or, you know, getting people the help that they need from the government, or, uh, for people to see their loved ones in jail.

Um, or, you know, you're getting a ticket unfairly because you couldn't read the sign was you're trying to read the sign and you, and you couldn't understand it. so yeah, we look at a lot of different ways that design and our saw the software that we create can have very real impact on people's lives and in a negative way, if we're not careful.

[00:07:25] Jeremy: the impression I get, when you talk about tragic design, it's really about anything that could harm a person, whether physically, emotionally, you know, make them angry, make them sad. And I think the, the most liked photo example is a great one, because like you said, I think the people may be building something that, that harms and they may have no idea that they're doing it.

[00:07:53] Jonathan: Exactly like that. I love that story because not, not to just jump on the bandwagon of saying bad things about like Facebook or something. No, I love that story because I can see myself designing the exact same thing, like being a part of that product, you know, building it, you know, looking at the, uh, the, the specifications, the, um, the, the PM, you know, put it that put together and the decks that we had, you know, like I could totally see that happening.

And just never, I think, never having the thought, because our we're so focused on like delighting our users and, you know, we have these metrics and these things in mind. So that's why, like, in the book, we really talk about a few different processes that need to be part of. Product development cycle to stop, pause, and think about like, well, what are the, what are the negative aspects here?

Like what are the things that could go wrong? What are the, what are the other life experiences that are negative? Um, that could be a part of this and you don't need to be a genius to think of every single thing out there. You know, like in this example, I think just talking about, you know, like, oh, well, some people might've had, you know, if they would have taken probably like, you know, one hour out of their entire project, or maybe even 10 minutes, they might've come up with like, oh, there could be bad thing.

Right. But, um, so if you don't have that, that, that moment to pause that moment to just say, okay, we have time to brainstorm together about like how this could go wrong or how, you know, the negative of life could be impacted by this, um, feature that that's all that it takes. It doesn't necessarily mean that you need to do.

You know, giant study around the impact, potential impact of this product and all the, all the ways, but really just having a part of your process that takes a moment to think about that will just create a better product and better, product outcomes. You know, if you think about all of life's experiences and Facebook can say, Hey, condolences, and like, you know, and show that thoughtfulness that would be, uh, I would have that have higher engagement that would have higher, uh, satisfaction, right?

So they could have created a better outcome by considering these things and obviously avoid the impact negative impact to users and the negative impact to their product.

[00:10:12] Jeremy: continuing on with that thought you're a senior interaction designer and you're an accessibility program lead. And so I wonder on the projects that you work on, and maybe you can give us a specific example, but how are you ensuring that you're, you're not running up against these problems where you build something that you think is going to be really great, um, for your users, but in reality ends up being harmful and specifically.

[00:10:41] Jonathan: Yeah, one of the best ways is, I mean, it should be part of multiple parts of your cycle. If, if you want something, if you want a specific outcome out of your product development life cycle, um, it needs to be from the very beginning and then a few more times, so that it's not, you know, uh, I think, uh, programmers, uh, will all latch onto this, where they have the worst end of the stick, right?

Because a and Q and QA as well. Because, you know, any bad decision or assumption that's happened early on with, you know, the, the business team or, or the PM, you know, gets like multiplied when they talk to the designer and then gets multiplied again, they hand it off. And it's always the engineer who has to, has to put the final foot down, be like, this doesn't make sense.

Or I think users are going to react this way, or, you know, this is the implication of that, that assumption. So, um, it's the same thing, you know, in our team, we have it in the very early stage when someone's putting together the idea for the feature, our project, we want to work on it's right there. There's a few, there's like a section about accessibility and a few other sections, uh, talking about like looking out for this negative impact.

So right away, we can have a discussion about it when we're talking about like what we should do about this and the D and the different, implications of implementing it. That's the perfect place for it. You know, like maybe, maybe when you're a brainstorm. Uh, about like, what should we should do? Maybe it's not okay there because you're trying to be creative.

Right. You're trying to think. But at the very next step, when you're saying, okay, like what would it mean to build this that's exactly where I should start showing up and, you know, the discussion from the team. And it depends also the, the risk involved, right? Like, uh, it depends, which is attached to how much, uh, time and effort and resources you should put towards avoiding that risk it's risk management.

So, you know, if you work, um, like my, um, you know, colleagues, uh, or, you know, some of my friends were working in the automotive industry and you're creating a software and you're worried that it might be distracting. There might be a lot more time and effort or the healthcare industry. Um, those were, those are, those might need to take a lot more resources, but if you're a, maybe a building, um, you know, SaaS software for engineers to spin up, you know, they're, um, you know resources.

Um, there might be a different amount of resources. It never is zero, uh, because you still have, are dealing with people and you'll impact them. And, you know, maybe, you know, that service goes down and that was a healthcare service that went down because of your, you know, so you really have to think about what the risk is.

And then you can map that back to how much time and effort you need to be spending on getting that. Right. And accessibility is one of those things too, where a lot of people think that it takes a lot of effort, a lot of resources to be accessible. And it really isn't. It just, um, it's just like tech debt, you know, if, if you have ignored your tech debt for, you know, five years, and then they're saying, Hey, let's all fix all the tech debt. Yeah. Nobody's going to be on board for that as much. Versus like, if, if addressing that and finding the right level of tech debt that you're okay with and when you address it and how, um, because, and just better practice. That's the same thing with accessibility is like, if you're just building it correctly, as you go, it's, it's very low effort and it just creates a better product, better decisions.

Um, and it is totally worth the increased amount of people who can use it and the improved quality for all users. So, um, yeah, it's just kind of like a win-win situation.

[00:14:26] Jeremy: one of the things you mentioned was that this should all start. At the very beginning or at least right after you've decided on what kind of product you're going to build, and that's going to make it much easier than if you come in later and try to, make fixes then, I wonder when you're all getting together and you're trying to come up with these scenarios, trying to figure out negative impacts, what kind of accessibility, needs you need to have, who are the people who are involved in that conversation?

Like, um, you know, you have a team of 50 people who needs to be in the room from the very beginning to start working this out.

[00:15:05] Jonathan: I think it would be the same people who are there for the project planning, like, um, at, on my team, we have our eng counter counterparts there. at least the team lead, if, if, if there's a lot of them, but you know, if they would go to the project kickoff, uh, they should be there.

you know, we, we have everybody in their PM, design, engineers, um, our project manager, like anyone who wants to contribute, uh, should really be there because the more minds you have with this the better, and you'll, you'll tease out much, much more of, of of all the potential problems because you have a more, more, um, diverse set of brains and life experiences to draw from.

And so you'll, you'll get closer to that 80% mark, uh, that you can just quickly take off a lot of those big items off the table, right?

[00:16:00] Jeremy: Is there any kind of formal process you follow or is it more just, people are thinking of ideas, putting them out there and just having a conversation.

[00:16:11] Jonathan: Yeah, again, it depends which industry you're in, what the risk is. So I previously worked at a healthcare industry, um, and for us to make sure that we get that right, and how it's going to impact the patients, especially though is cancer care. And they were using our product to get early warnings of adverse effects.

Our, system of figuring that like, you know, if that was going to be an issue was more formalized. Um, in, in some cases, uh, like, like actually like healthcare and especially if the, if it's a device or, or in certain software circumstances, it's determined by the FDA to be a certain category, you literally have a, uh, governmental version of this.

So the only reason that's there is because it can prevent a lot of harm, right? So, um, that one is enforced, but there's, there's reasons, uh, outside of the FDA to have that exact formalized part of your process. And it can, the size of it should scale depending on what the risk is. So on my team, the risk is, is actually somewhat low.

it's really just part of the planning process. We do have moments where we, we, um, when we're, uh, brainstorming like what we should do and how the feature will actually work. Where we talk about like what those risks are and calling out the accessibility issues. And then we address those. And then as we are ready to, um, get ready to ship, we have another, um, formalized part of the process.

There will be check if the accessibility has been taken care of and, you know, if everything makes sense as far as, you know, impact to users. So we have those places, but in healthcare, but it was much stronger where we had to, um, make sure that we re we we've tested it. We've, uh, it's robust. It's going to work on, we think it's going to work.

Um, we, you know, we do user testing has to pass that user testing, things like that before we're able to ship it, uh, to the end user.

[00:18:12] Jeremy: So in healthcare, you said that the FDA actually provides, is it like a checklist of things to follow where you must have done this? As you're testing and you must have verified these, these things that's actually given to you by the government.

[00:18:26] Jonathan: That's right. Yeah. It's like a checklist and the testing requirement. Um, and there's also levels there. So, I have, I've only, I've only done the lowest level. I know. There's like, I think like two more levels above that. Um, and again, that's like, because the risk is higher and higher and there's more stricter requirements there where maybe somebody in the FDA needs to review it at some point.

And, um, so again, like mapping it back to the risk that your company has is, is really important to understanding that is going to help you avoid and, and build a better product, avoid, you know, the bad impact and build a better product. And, and I think that's one of the things I would like to focus on as well.

And I'd like to highlight for your, for your listeners, is that, it's not just about avoiding tragic design because one thing I've discovered since writing the book and sharing it with a lot of people. Is that the exact opposite thing is usually, you know, in a vast majority of the cases ends up being a strategically great thing to pursue for the product and the company.

You know, if you think about, that, that example with, with Facebook, okay. You've run into a problem that you want to avoid, but if you actually do a 180 there and you find ways to engage with people, when they're grieving, you find people to, to develop features that help people who are grieving, you've created a value to your users, that you can help build the company off of.

Right. Um, cause they were already building a bunch of joy features. Right. Um, you know, and also like user privacy, like I, we see apple doing that really well, where they say, okay, you know, we are going to do our ML on device. We are going to do, you know, let users decide on every permission and things like that.

And that, um, is a strategy. We also see that with like something like T-Mobile, when they initially started out, they were like one of the nobody, uh, telecoms in the world. And they said, okay, what are all the unethical bad things that, uh, our competitors are doing? They're charging extra fees, you know, um, they have these weird data caps that are really confusing and don't make any sense their contracts, you get locked into for many years.

They just did the exact opposite of that. And that became their business strategy and it, and it worked for them now. They're, they're like the top, uh, company. So, um, I think there's a lot of things like that, where you just look at the exact opposite and, you, one you get to avoid the bad, tragic design, but you also see boom, you see an opportunity that, um, become, become a business strategy.

[00:21:03] Jeremy: So, so when you referred to exact opposite, I guess you're, you're looking for the potentially negative outcomes that could happen. there was the Facebook example of, of seeing a photo or being reminded of a really sad event and figuring out can I build a product around, still having that same picture, but recontextualizing it like showing you that picture in a way that's not going to make you sad or upset, but is actually a positive.

[00:21:35] Jonathan: Yeah. I mean, I don't know maybe what the solution was, but like one example that comes to mind is some companies. Now, before mother's day, we'll send you an email and say, Hey, this is coming up. Do you want us to send you emails about mother's day? Because for some people that's Can, be very painful. That's that's very thoughtful.

Right. And that's a great way to show that you, that you care. Um, but yeah, like, you know, uh, thinking about that Facebook example, like if there's a formalized way to engage with, with grieving, like, I would use Facebook for that. I don't use Facebook very often or almost at all, but you know, if somebody passed away, I would engage right with my, my Facebook account.

And I would say, okay, look, there's like, there's this whole formalized, you know, feature around, you know, uh, and, and Facebook understands grieving and Facebook understands like this w this event and may like smooth that process, you know, creates comfort for the community that's value and engagement. that is worthwhile versus artificial engagement.

That's for the sake of engagement. and that would create, uh, a better feeling towards Facebook. Uh, I would maybe like then spend more time on Facebook. So it's in their mutual interest to do it the right way. Um, and so it's great to focus on these things to avoid harm, but also to start to see new opportunities for innovation.

And we see this a lot already in accessibility where there's so many innovations that have come from just fixing accessibility issues like closed captions. We all use it, on our TVs, in busy crowded spaces, on, you know, videos that have no, um, uh, translation for us in different places.

So, SEO is, is the same thing. Like you get a lot of SEO benefit from, you know, describing your images and, and making everything semantic and things like that. And that also helps screen readers. and different innovations have come because somebody wanted to solve an accessibility need.

And then the one I love, I think it's the most common one is readability, like contrast and tech size. Sure. There's some people who won't be able to read it at all, but it hurts my eyes to read bad contrast and bad text size. And so it just benefits. Everyone creates a better design. And one of the things that comes up so often when I'm, you know, I'm the accessibility program lead.

And so I see a lot of our bugs is so many issues that, that are caught because of our, our audits and our, like our test cases around accessibility that just our bad design and our bad experience for everyone. And so we're able to fix that. And, uh, and it's just like an another driver of innovation and there's, there's, there's a ton of accessibility examples, and I think there's also a ton of these other, you know, ethical examples or, you know, uh, avoiding harm where you just can see it. It's an opportunity area where it's like, oh, let's avoid that. But then if you turn around, you can see that there's a big opportunity to create a business strategy out of it.

[00:24:37] Jeremy: Can, can you think of any specific examples where you've seen that? Where somebody, you know, doesn't treat it as something to avoid, but, but actually sees that as an opportunity.

[00:24:47] Jonathan: Yeah. I mean, I, I think that the, um, the apple example is a really good one where from the beginning, like they, they saw like, okay, in the market, there's a lot of abuse of information and people don't like that. So they created a business strategy around that And that's become a big differentiator for them.

Right. Like they, they have like ML on the device. They do. Um, they have a lot of these permission settings, you know, the Facebook. It was very much focused right. On, on using customer data and a lot of it without really asking their permission. And so once apple said, okay, now all apps need to show what you're tracking.

And, and then, um, and asked for permission to do that. A lot of people said no, and that caused about $10 billion of loss for, for Facebook. and for, for apple, it's, you know, they advertise on that now that we're, you know, ethical that, you know, we, we source things ethically and we, we care about user privacy and that's a strong position, right?

Uh, I think there's a lot of other examples out there. Like I mentioned accessibility and others, but like it they're kind of overflowing, so it's hard to pick one.

[00:25:58] Jeremy: Yeah. And I think what's interesting about that too, is with the example of focusing on user privacy or trying to be more sensitive around, death or things like that, as I think that other people in the industry will, will notice that, and then in their own products, then they may start to incorporate those things as well.

[00:26:18] Jonathan: Yeah. Yeah, exactly what the example of with T-Mobile. once that worked really, really well and they just ate up the entire market, all the other companies followed suit, right? Like now, um, having those data caps that, you know, are, are very rare, having those surprise fees are a lot, uh, rare.

Um, you know, there's, there's no more like deep contracts that lock you in and et cetera, et cetera. A lot of those have become industry standard now. Um, and so It, and it does improve the environment for everyone because, because now it becomes a competitive advantage that everybody needs to meet. Um, so yeah, I think that's really, really important.

So when you're going through your product's life cycle, you might not have the ability to make these big strategic decisions. Like, you know, we want to, you know, not have data caps or whatever, but, you know, if you, if you're on that Facebook level and you run into that issue, you could say, well, look, what could we do to address this?

What could we could do to, to help this and make, make that a robust feature? You know, when we talk about, lot of these dating apps, one of the problems was a lot of abuse, where women were being harassed or, you know, after the day didn't go well and you know, things were happening. And so a lot of apps have now dif uh, these dating apps have differentiated themselves and attracted a lot of that market because they deal with that really well.

And they have, you know, it's built into the strategy. It's oftentimes like a really good place to start too, because one it's not something we generally think about very, very well, which means your competitors. Haven't thought about it very well, which means it's a great place to, to build products, ideas off of.

[00:27:57] Jeremy: Yeah, that's a good point because I think so many applications now are like social media applications, their messaging applications there, their video chat, that sort of thing. I think when those applications were first built, they didn't really think so much about what if someone is, you know, sending hateful messages or sending, pictures that people really don't want to see.

Um, people are doing abusive things. It was like, they just assume that, oh, people will be, people will be good to each other and it'll be fine. But, uh, you know, in the last 10 years, pretty much all of the major social media companies have tried to figure out like, okay, um, what do I do if someone is being abusive and, and what's the process for that?

And basically they all have to do something now. Um,

[00:28:47] Jonathan: Yeah. And that's a hard thing to like, if, if that, uh, unethical or that, um, bad design decision is deep within your business strategy and your company's strategy. It's hard to undo that like some companies are still, still have to do that very suddenly and deal with it. Right. Like, uh, I know Uber had a big, big part of them, like, uh, and some other companies, but, uh, we're like almost suddenly, like everything will come to a head and they'll need to deal with it.

Or, you know, like, Twitter now try to try to get, be acquired by Elon Musk. Uh, some of those things are coming to light, but, I, what I find really interesting is that these these areas are like really ripe for innovation. So if you're interested in, a startup idea or you're, or you're working in a startup, or, you know, you're about to start one, you know, there's a lot of maybe a lot of people out there who are thinking about side projects right now, this is a great way to differentiate and win that market against other well-established competitors is to say, okay, well, what are they, what are they doing right now that is unethical. And it's like, you know, core to their business strategy and doing that differently is really what will help you, to win that market. And we see that happening all the time, you know, especially the ones that are like these established, uh, leaders in the market. they can't pivot like you can, so being able to say, I'm, we're going to do this ethically.

We're going to do this, uh, with, you know, with these tragic design in mind and doing the opposite, that's going to help you to, to find your, your attraction in the market.

[00:30:25] Jeremy: Earlier, we were talking about. How in the medical field, there is specific regulation or at least requirements to, to try and avoid this kind of tragic design. Uh, I noticed you also worked for Intuit before. Uh, um, so for financial services, I was wondering if there was anything similar where the government is stepping in and saying like, you need to make sure that, these things happen to avoid, these harmful things that can come up.

[00:30:54] Jonathan: Yeah, I don't know. I mean, I didn't work on TurboTax, so I worked on QuickBooks, which is like a accounting software for small businesses. And I was surprised, like we didn't have a lot, like a lot of those robust things, we just relied on user feedback to tell us like, things were not going well. And, you know, and I think we should have, like, I think, I think that that was a missed opportunity, um, to.

Show your users that you understand them and you care, and to find those opportunity areas. So we didn't have enough of that. And there was things that we shipped that didn't work correctly right out of the box, which, you know, it happens, but had a negative impact to users. So it's like, okay, well, what do we do about that?

How do we fix that? Um, and if the more you formalize that and make it part of your process, the more you get out of it. And actually this is like, this is a good, a good, um, uh, pausing point bit that I think will affect a lot of engineers listening to this. So if you remember in the book, we talk about the Ford Pinto story and there isn't, I want to talk about this story and why I added it to the book.

Is that, uh, one, I think this is the thing that engineers deal with the most, um, and, and designers do too, which is that okay. we see the problem, but we don't think it's worth fixing. Okay. Um, so that, that's what I'm going. That's what we're going to dig into here. So it's a, hold on for a second while I explain some, some history about this car.

So the Ford Pinto, if you're not familiar is notorious, uh, because it was designed, um, and built and shipped and there, they knowingly had this problem where if it was rear-ended at even like a pretty low speed, it would burst into flames because the gas tank would rupture the, and then oftentimes the, the, the doors would get jammed.

And so it became a death trap of fire and caused many deaths, a lot of injuries. And, um, in an interview with the CEO at the time, like almost destroyed Ford like very seriously would have brought the whole company down and during the design of it, uh, and design meaning in the engineering sense. Uh, and the engineering design of it, they say they found this problem and the engineers came up with their best solution.

Was this a rubber block. Um, and the cost was, uh, I forget how many dollars let's say it was like $9. let's say $6, but this is again, uh, back then. And also the margin on these cars was very, very, very thin and very important to have the lowest price in the market to win those markets. The customers were very price sensitive, so they, uh, they being like the legal team looked at like some recent, cases where they have the value of life and started to come up with like a here's how many people would sue us and here's how much it would cost to, uh, to, to settle all those.

And then here's how much it would cost to add this to all the cars. And it was cheaper for them to just go with the lawsuits and they, they found. Um, and I think why, I think why this is so important is because of the two things that happened afterward, one, they were wrong. it was a lot more people it affected and the lawsuits were for a lot more money.

And two after all this was going crazy and it was about to destroy the company, they went back to the drawing board and what did the engineers find? They found a cheaper solution. They were able to rework that, that rubber block and and get it under the margin and be able to hit the mark that they wanted to.

And I think that's, there's a lot of focus on the first part because it's so unethical to the value of life and, and, um, and doing that calculation and being like we're willing to have people die, but in some industries, it's really hard to get away with that, but it's also very easy. To get into that.

It's very easy to get lulled into this sense of like, oh, we're just going to crunch the numbers and see how many users it affects. And we're okay with that. Um, versus when you have principals and you have kind of a hard line and you, and you care a lot more than you should. And, and you really push yourself to create a more ethical, more, a safer, you know, avoiding, tragic design, then you, there there's a solution out there.

Like you actually get to innovation, you actually get to the solving the problem versus when you just rely on, oh, you know, the cost benefit analysis we did is that it's going to take an engineer in a month to fix this and blah blah blah. But if, if you have those values, if you have those principles and you're like, you know what, we're not okay shipping this, then you'll, you'll find that.

They're like, okay, there's, there's a cheaper way to, to fix this. There's another way we could address this. And that happens so often. and I know a lot of engineers deal with that. A lot of saying like, oh, you know, this is not worth our time to fix. This is not worth our time to fix. And that's why you need those principles is because oftentimes you don't see it and it's, but it's right there at right outside of the edge of your vision.

[00:36:12] Jeremy: Yeah. I mean, with the Pinto example, I'm just picturing, you know, obviously there wasn't JIRA back then, but you can imagine that somebody's having an issue that, Hey, when somebody hits the back of the car, it's going to catch on fire. Um, and, and going like, well, how do I prioritize that? Right? Like, is this a medium ticket?

Is this a high ticket? And it's just like, it's just, it just seems insane, right? That you could, make the decision like, oh no, this isn't that big an issue. You know, we can move it down to low priority and, and, and, ship it.

Okay.

[00:36:45] Jonathan: Yeah. And, and, and that's really what principals do for you, right? Is they help you make the tough decisions. You don't need a principle for an easy one. Uh, and that's why I really encourage people in the book to come together as a team and come up with what are your guiding principles. Um, and that way it's not a discussion point every single time.

It's like, Hey, we've agreed that this is something that we, that we're going to care about. This is something that we are going to stop and, fix. Like, one of the things I really like about my team at Google is product excellence is very important to us. and. there are certain things that, uh, we're, you know, we're Okay. with, um, letting slip and fixing at a next iteration.

And, you know, obviously we make sure we actually do that. Um, so it's not like we, we, we always address everything, but because it's one of our principles. We care more. We have more, we take on more of those tickets and we take on more of those things and make sure that they ship before, um, can make sure that they're fixed before we ship.

And, and it shows like to the end user that th that this company cares and they have quality. Um, so it's one of it. You need a principal to kind of guide you through those difficult things that aren't obvious on a decision to decision basis, but, you know, strategically get you in somewhere important, you know, and, and like, like design debt or, um, our technical debt where it's like, this should be optimized, you know, this chunk of code, like, nah, but you know, in, in it grouping together with a hundred of those decisions.

Yeah. It's gonna, it's gonna slow it down every single project from here on out. So that's why you need those principles.

[00:38:24] Jeremy: So in the book, uh, there are a few examples of software in healthcare. And when you think about principles, you would think. Generally everybody on the team would be on board that we want to give whatever patient that's involved. We want to give them good care. We want them to be healthy. We don't want them to be harmed.

And given that I I'm wondering because you, you interviewed multiple people in the book, you have a few different case studies. Um, why do you think that medical software in particular seems to be, so it seems to have such poor UX or has so many issues.

[00:39:08] Jonathan: Yeah, that's a, complicated topic. I would summarize it with a few, maybe three different reasons. Um, one which I think is, uh, maybe a driving factor of, of some of the other ones. Is that the way that the medical, uh, industry works is the person who purchases the software. It's not the end user. So it's not like you have doctors and nurses voting on, on which software to use.

Um, and so oftentimes it's, it's more of like a sales deal and then just gets pushed out and they, and they also have to commit to these things like, um, the software is very expensive and, uh, initially with, you know, like in the early days was very much like it needs to be installed, maintain, there has to be training.

So there was a lot to money to be made, in those, in that software. And, and so the investment from the hospital was a lot, so they can't just be like, oh, can it be to actually, don't like this one, we're going to switch to the next one. So, because like, once it's sold, it's really easy to just like, keep that customer.

There's very little incentive to like really improve it unless you're selling them a new feature. So there's a lot of feature add ons. Because they can charge more for those, but improving the experience and all that kind of stuff. There is less of that. I think also there's just generally a lot less like, uh, understanding of design, in that field.

And there's a lot more because there's sort of like traditions of things. they end up putting a lot of the pressure and the, that responsibility on the end individuals. So, you know, you've heard recently of that nurse who made a medication error and she's going to jail for that. And sh you know, And oftentimes we blame that end, that end person.

So the, the nurse gets all the blame or the doctor gets all the blame. Well, what about the software, you know, who like made that confusing or, you know, what about the medication that looks exactly like this other medication? Or what about the pump tool that you have to, you know, type everything in very specifically, and the nurses are very busy.

They're doing a lot of work. There's a 12 hour shifts. They're dealing with lots of different patients, a lot of changing things for them to have to worry about having to type something a specific way. And yet when those problems happen, what do they do? They don't go in like redesign the devices. Are they more training, more training, more training, more training, and people only can absorb so much training.

and so I think that's part of the problem is that like, there's no desire to change. They blame the end, the wrong person, and. Uh, lastly, I think that, um, it is starting to change. I think we're starting to see like the ability for, because of the fact that the government is pushing healthcare records to be more interoperable, meaning like I can take my health records anywhere, that a lot of the power comes in where the data is.

And so, um, I'm hoping that, uh, you know, as the government and people and, um, and initiatives push these big companies, like epic to be more open, that things will improve. One is because they'll have to keep up with their competitors and that more competitors will be out there to improve things. Because I, I think that there's, there's the know-how out there, but like, because the there's no incentive to change and, and, and there's no like turnover and systems and there's the blaming of the end user.

We're not going to see a change anytime soon.

[00:42:35] Jeremy: that's a, that's a good point in terms of like, it, it seems like even though you have all these people who may have good ideas may want to do a startup, uh, if you've got all these hospitals that already locked into this very expensive system, then yeah. Where's, where's the room to kind of get in there in and have that change.

[00:42:54] Jonathan: yeah.

[00:42:56] Jeremy: Uh, another thing that you talk about in the book is about how, when you're in a crisis situation, the way that a user interacts with something is, is very different. And I wonder if you have any specific examples for software when, when that can happen.

[00:43:15] Jonathan: yeah. Designing for crisis is a very important part of every software because, it might be hard for you to imagine being in that situation, but, it, it definitely will still happen so. one example that comes to mind is, uh, you know, let's say you're working on a cloud, um, software, like, uh, AWS or Google cloud.

Right. there's definitely use cases and user journeys in your product where somebody would be very panicked. Right. Um, and if you've ever been on an on-call with, with something and it goes south, and it's a big deal, you don't think. Right. Right. Like when we're in crisis, our brains go into a totally different mode of like that fight or flight mode.

And we don't think the way we do, it's really hard to read and comprehend very hard. and we might not make this, the right decisions and things like that. So, you know, thinking about that, like maybe your, your let's say, like, going back to that, the cloud software, like let's say you're, you're, you're working on that, like.

Are you relying on the user reading a bunch of texts about this button, or is it very clear from the way you've crafted that exact button copy and how big it is? And, and it's where it is relation to a bunch of other content? Like what exactly it does. It's going to shut down the instance where it's gonna, you know, it's, it's gonna, do it at a delay or whatever, like be able to all those little decisions, like are really impactful.

And when you, when you run them through the, um, the, the furnace of, of, of, uh, um, a user journey that's relying on, on a really urgent situation, you'll obviously help that. And you'll, you'll start to see problems in your UI that you hadn't noticed before, or, or different problems in the way you're implementing things that you didn't notice before, because you're seeing it from a different way.

And that's one of the great things about, um, the, the systems and the book that we talk about around, like, thinking about how things could go wrong, or, you know, thinking about, you know, designing for crisis. Is it makes you think of some new use cases, which makes you think of some new ways to improve your product.

You know, that improvement you make to make it so obvious that someone could do it in a crisis would help everyone, even when they're not in a crisis. Um, so that, that's why it's important to, to focus on those things.

[00:45:30] Jeremy: And for someone who is working on these products, it's kind of hard to trigger that feeling of crisis. If there isn't actually a crisis happening. So I wonder if you can talk a little bit about how you, you try to design for that when it's not really happening to you. You're just trying to imagine what it would feel like.

[00:45:53] Jonathan: yeah. Um, you're never really going to be able to do that. Like, so some of it has to be simulated, One of the ways that we are able to sort of simulate what we call cognitive load. Which is one of the things that happen during a crisis. But what also happened when someone's very distracted, they might be using your product while they're multitasking.

We have a bunch of kids, a toddler constantly pulling on their arm and they're trying to get something done in your app. So, one of the ways that has been shown to help, uh, test that is, um, like the foot tapping method. So when you're doing user research, you have the user doing something else, like tapping or like, You know, uh, make it sound like they have a second task that they're doing on the side.

It's manageable, like tapping their feet and their, their hands or something. And then they also have to do your task. Um, so like you can like build up what those tabs with those extra things are that they have to do while they're also working on, uh, finishing the task you've given them. and, and that's one way to sort of simulate cognitive load.

some of the other things is, is really just, um, you know, listening to users, stories and, and find out, okay, this user was in crisis. Okay, great. Let's talk to them and interview them about that. Uh, if it was fairly recently within like the past six months or something like that. but, but sometimes you don't like, you just have to run through it and do your best.

Um, and you know, those black Swan events or those, even if you're able to simulate it yourself, like put your, put your, put yourself into that exact position and be in panic, which, you know, you're not able to, but if you were that still would only be your experience and you wouldn't know all the different ways that people could experience this.

So, and there's going to be some point in time where you're gonna need to extrapolate a little bit and, you know, extrapolate from what you know, to be true, but also from user testing and things like that. And, um, and then wait for a real data

[00:47:48] Jeremy: You have a chapter in the book on design that angers and there were, there were a lot of examples in there, on, on things that are just annoying or, you know, make you upset while you're using software. I wonder for like our audience, if you could share just like a few of your, your favorites or your ones that really stand out.

[00:48:08] Jonathan: My favorite one is Clippy because, um, you know, I remember growing up, uh, you know, writing software, writing, writing documents, and Clippy popping up. And, I was reading an article about it and obviously just like everybody else, I hated it. You know, as a little character, it was fun, but like when you're actually trying to get some work done, it was very annoying.

And then I remember, uh, a while later reading this article about how much work the teams put into clubby. Like, I mean, if you think about it now, It had a lot of like, um, so the AI that we're playing with just now, um, around like natural language processing, understanding, like what, what type of thing you're writing and coming up with contextualized responses, like it was pretty advanced for the, uh, very advanced for the time, you know, uh, adding animation triggers to that and all, all that.

Um, and they had done a lot of user research. I was like, what you did research in, like you had that reaction. And I love that example because, oh, and also by the way, I love how they, uh, took Clippy out and S and highlighted that as like one of the features of the next version of the office, uh, software.

but I love that example again, because I see myself in that and, you know, you ha you have a team doing something technologically amazing doing user research, uh, and putting out a very great product, but he totally missing. And a lot of products do that. A lot of teams do that. And why is that? It's because they're, um, they're not thinking about, uh, they're putting their, they're putting the business needs or the team's needs first and they're putting the user's needs second.

And whenever we do that, whenever we put ourselves first, we become a jerk, right? Like if you're in a relationship and you're always putting yourself first, that relationship is not going to last long or it's not going to go very well. And yet we Do that with our relationship with users where we're constantly just like, Hey, well, what is the business?

The business wants users to not cancel here so let's make it very difficult for people to cancel. And that's a great way to lose customers. That's a great way to create, this dissonance with your, with your users. And, um, and so if you, if you're, focused on like, this is what the we need to accomplish with the users, and then you work backwards from.

You're you're, you're, you're, you're lower your chances of missing it, of getting it wrong of angering your users. and const always think about like, you sometimes have to be very real with yourselves and your team. And I think that's really hard for a lot of teams because we have we don't want to look bad.

We don't want to, but what I found is those are the people who actually, um, get promoted. Like, you know, if you look at the managers and directors and stuff, those are the people who can be brutally honest. Right. Um, who can say, like, I don't think this is ready. I don't, I don't think this is good. And so you actually, I, I, you know, I've done that in the front of like our CEO and things like that.

And I've always had really good responses from them to say, like, we really appreciate that you, you know, uh, you can call that out and you can just call it like, it is like, Hey, this is what we see this user. Maybe we shouldn't do this at all. Maybe. Um, and that can, uh, you know, at Google that's one of the criteria that we have in our software engineers and the designers of being able to spot things that are, you know, things that we shouldn't should stop doing.

Um, and so I think that's really important for the development of, of a senior engineer, uh, to be able to, to know that that's something like, Hey, this project, I would want it to work, but in its current form is not good. And being able to call that out is very important.

[00:51:55] Jeremy: Do you have any specific examples where there was something that was like very obvious to you? To the rest of the team or to a lot of other people that wasn't.

[00:52:06] Jonathan: um, yeah, so here's an example I finally got, I was early on in my career and I finally got to lead in our whole project. So we are redesigning our business micro-site um, and I got to, I got, uh, assigned two engineers and another designer and I got to lead the whole. I was, I was like, this is my chance.

Right? So, and we had a very short timeline as well, and I put together all these designs. And, um, one of the things that we aligned on at the time was like as really cool, uh, so I put together this really cool design for the contact form, where you have like, essentially, I kind of like ad-lib, it looks like a letter.

and you know, by the way, give me a little bit of, of, uh, of, of leeway here. Cause this was like 10 years ago, but, uh, it was like a letter and you would say like, you're addressing it to our company. And so it had all the things we wanted to get out of you around like your company size, your team, like, and so our sales team would then reach out to this customer.

I designed it and I had shown it to the team and everybody loved it. Like my manager signed off on it. Like all the engineers signed off on it, even though we had a short timeline, they're like, yeah, well we don't care. That's so cool. We're going to build it. But as I put it through that test of like, does this make sense for the, what the user wants answers just kept saying no to me.

So I had to go and back in and pitch everybody and argue with them around not doing the cool idea that I wanted to do. And, um, eventually they came around and that form performed once we launched it performed really well. And I think about like, what if users had to go through this really wonky thing?

Like this is the whole point of the website is to get this contact form. It should be as easy and as straightforward as possible. So I'm really glad we did that. And I can think of many, many more of those situations where, you know, um, we had to be brutally honest with ourselves with like this isn't where it needs to be, or this isn't what we should be doing.

And we can avoid a lot of harm that way too, where it's like, you know, I don't, I don't think this is what we should be building. Right.

[00:54:17] Jeremy: So in the case of this form, was it more like you, you had a bunch of drop-downs or S you know, selections where you would say like, okay, these are the types of information that I want to get from the person filling out the form as a company. but you weren't looking so much at, as the person filling out the form, this is going to be really annoying.

Was

that kind

[00:54:38] Jonathan: exactly, exactly. Like, so their experience would have been like, they come up, they come at the end of this page or on like contact us and it's like a letter to our company. And like, we're essentially putting words in their mouth because they're, they're filling out the, letter. Um, and then, yeah, it's like, you know, you have to like read and then understand like what, what that part of this, the, the page was asking you and, you know, versus like a form where you're, you know, it's very easy.

Well-known bam. You're, you're you're on this page. So you're interested in, so like, get it, get them in there. So we were able to, to decide against that and that, you know, we, we also had to, um, say no to a few other things, but like we said yes, to some things that were great, like responsive design, um, making sure that our website worked at every single use case, which is not like a hard requirement at the time, but was really important to us and ended up helping us a lot because we had a lot of, you know, business people who are on their phone, on the go, who wanted to, to check in and fill out the form and do a bunch of other stuff and learn about us.

So that, that, that sales, uh, micro-site did really well because I think we made the right decisions and all those kinds of areas. And like those, those general, those principles helped us say no to the right things, even though it was a really cool thing, it probably would have looked really great in my portfolio for a while, but it just wasn't the right thing to do for the, the, the goal that we had.

[00:56:00] Jeremy: So did it end up being more like just a text box? You know, a contact us fill in. Yeah.

[00:56:06] Jonathan: You know, with usability, you know, if someone's familiar with something and it's, it's tired, everybody does it, but that means everybody knows how to use it. So usability constantly has that problem of innovation being less usable. Um, and so sometimes it's worth the trade-off because you want to attract people because of the innovation and they'll bill get over that hump with you because the innovation is interesting.

So sometimes it's worth it and sometimes it's not, and you really have to, I'd say most times it's not. Um, and So you have to find like, what is, when is it time to innovate and when is it time to do the what's tried and true. Um, and on a business microsite, I think it's time to do tried and true.

[00:56:51] Jeremy: So in your research for the book and all the jobs you've worked previously, are there certain. Mistakes or just UX things that you've noticed that you think that our audience should know about?

[00:57:08] Jonathan: I think dark patterns are one of the most common, you know, tragic design mistakes that we see, because again, you're putting the company first and the user second. And you know, if you go to a trash, sorry, if you go to a dark patterns.org, you can see a great list. Um, there's a few other sites that have a nice list of them and actually Vox media did a nice video about, uh, dark patterns as well.

So it's gaining a lot of traction, but you know, things like if you try to cancel your search, like Comcast service or your Amazon service, it's very hard. Like I think I wrote this in the book, but. Literally re researched what's the fastest way to delete it to, to, you know, uh, remove your Comcast account.

I prepared everything. I did it through chat because that was the fastest way for first, not to mention finding chat by the way was very, very hard for me. Um, so I took me, even though I was like, okay, I have to find I'm going to do it through chat. I'm gonna do all this. It took me a while to find like chat, which I couldn't find it.

So once I finally found it from that point to deleting from having them finally delete my account was about an hour. And I knew what to do going in just to say all the things to just have them not bother me. So th that's on purpose they've purposely. Cause it's easier to just say like fine, I'll take the discount thing.

You're throwing in my face at the last second. And it's almost become a joke now that like, you know, you have to cancel your Comcast every year, so you can keep the costs down. Um, you know, and Amazon too, like trying to find that, you know, delete my account is like so buried. You know, they do that on purpose and a lot of companies will do things like, you know, make it very easy to sign up for a free trial and, and hide the fact that they're going to charge you for a year high.

The fact that they're automatically going to bill you not remind you when it's about to expire so that they can like surprise, get you in to forget about this billing subscription or like, you know, if you've ever gotten Adobe software, um, they are really bad at that. They, they trick you into like getting this like monthly sufficient, but actually you've committed to a year.

And if you want to cancel early, we'll charge you like 80% of the year. And, uh, and there's a really hard to contact anybody about it. So, um, it happens quite often. If the more you read into those, um, different things, uh, different patterns, you'll start to see them everywhere. And users are really catching onto a lot of those things and are responding.

To those in a very negative way. And like, um, we recently, uh, looked at a case study where, you know, this free trial, um, this company had a free trial and they had like the standard free trial, um, uh, kind of design. And then their test was really just focusing on like, Hey, we're not going to scam you. If I had to summarize that the entire direction of the second one, it was like, you know, cancel any time.

Here's exactly how much you'll be charged. And on the, it'll be on this date, uh, at five days before that we'll remind you to cancel and all this stuff, um, that ended up performing about 30% better than the other one. And the reason is that people are now burned by that trick so much so that every time they see a free trial, they're like, forget it.

I don't, I don't want to deal with all this trickery. Like, oh, I didn't even care about to try the product versus like. We were not going to trick you. We really want you to actually try the product and, you know, we'll make sure that if you're not wanting to move forward with this, that you have plenty of time and plenty of chances to lead and that people respond to that now.

So that's what we talked about earlier in the show of doing the exact opposite. This is another example of that.

[01:00:51] Jeremy: Yeah, because I think a lot of people are familiar with, like you said, trying to cancel Comcast or trying to cancel their, their New York times subscription. And they, you know, everybody is just like, they get so mad at the process, but I think they also may be assume that it's a positive for the company, but what you're saying is that maybe, maybe that's actually not in the company's best interest.

[01:01:15] Jonathan: Yeah. Oftentimes what we find with these like dark patterns or these unethical decisions is that th they are successful because, um, when you look at the most impactful, like immediate metric, you can look at, it looks like it worked right. Like, um, you know, let's say for that, those free trials, it's like, okay, we implemented like all this trickery and our subscriptions went up.

But if you look at like the end, uh, result, um, which is like farther on in the process, it's always a lot harder to track that impact. But we all know, like when we look at each other, like when we, uh, we, we, we talk to each other about these different, um, examples. Like we know it to be true, that we all hate that.

And we all hate those companies and we don't want to engage with them. And we don't, sometimes we don't use the products at all. So, um, yeah, it, it, it's, it's one of those things where it actually has like that, very real impact, but harder to track. Um, and so oftentimes that's how these, these patterns become very pervasive is the oh, and page views went up, uh, this was, this was a really, you know, this is high engagement, but it was page views because people were refreshing the page trying to figure out where the heck to go. Right. So um, oftentimes they they're less effective, but they're easier to track

[01:02:32] Jeremy: So I think that's, that's a good place to, to wrap things up, but, um, if people want to check out the book or learn more about what you're working on your podcast, where should they head?

[01:02:44] Jonathan: Um, yeah, just, uh, check out tragic design.com and our podcast. You can find on any of your podcasting software, just search design review podcast.

[01:02:55] Jeremy: Jonathan, thank you so much for joining me on software engineering radio.

[01:02:59] Jonathan: alright, thanks Jeremy. Thanks everyone. And, um, hope you had a good time. I did.

Randy Shoup on Evolving Architecture at eBay

Wed, 17 Aug 2022 19:55:40 -0000

This episode originally aired on Software Engineering Radio.

Randy Shoup is the VP of Engineering and Chief Architect at eBay. He was previously the VP of Engineering at WeWork and Stitch Fix, a Director of Engineering at Google Cloud where he worked on App Engine, and a Chief Engineer and Distinguished Architect at eBay in 2004.

Topics covered:

eBay’s origins as a single C++ class
The five-year migration to Java services
Sharing a database between the old and new systems
Building a distributed tracing system
Working with bare metal
Why most companies should stick to cloud
Why individual services should own their own data storage
How scale has caused solutions to change
Rejoining a former company
The Accelerate Book
Improving delivery time.

Related Links:
@randyshoup

The Epic Story of Dropbox’s Exodus from the Amazon Cloud Empire

Transcript:

[00:00:00] Jeremy: Today, I'm talking to Randy Shoup, he's the VP of engineering and chief architect at eBay.

[00:00:05] Jeremy: He was previously the VP of engineering at WeWork and stitch fix, and he was also a chief engineer and distinguished architect at eBay back in 2004. Randy, welcome back to software engineering radio. This will be your fifth appearance on the show. I'm pretty sure that's a record.

[00:00:22] Randy: Thanks, Jeremy, I'm really excited to come back. I always enjoy listening to, and then also contributing to software engineering radio.

Back at, Qcon 2007, you spoke with Markus Volter he's he was the founder of SE radio. And you were talking about developing eBay's new search engine at the time.

[00:00:42] Jeremy: And kind of looking back, I wonder if you could talk a little bit about how eBay was structured back then, maybe organizationally, and then we can talk a little bit about the, the tech stack and that sort of thing.

[00:00:53] Randy: Oh, sure. Okay. Yeah. Um, so eBay started in 1995. I just want to like, you know, orient everybody. Same, same as the web. Same as Amazon, same as a bunch of stuff. So E-bay was actually almost 10 years old when I joined. That seemingly very old first time. Um, so yeah. What was ebay's tech stack like then? So E-bay current has gone through five generations of its infrastructure.

It was transitioning between the second and the third when I joined in 2004. Um, so the. Iteration was Pierre Omidyar, the founder three-day weekend three-day labor day weekend in 1995, playing around with this new cool thing called the web. He wasn't intending to build a business. He just was playing around with auctions and wanted to put up a webpage.

So he had a Perl backend and every item was a file and it lived on this little 486 tower or whatever you had at the time. Um, so that wasn't scalable and wasn't meant to be. The second generation of eBay's architecture was what we called V2 very, you know, creatively, uh, that was a C++ monolith. Um, an ISAPI DLL with essentially well at its worst, which grew to 3.4 million lines of code in that single DLL and basically in a single class, not just in a single, like repo or a single file, but in a single class.

So that was very unpleasant to work in. As you can imagine, um, eBay had about a thousand engineers at the time and they were, you know, as you can imagine, like really stepping on each other's toes and not being able to make much forward progress. So starting in, I want to call it 2002. So two years before I joined, um, they were migrating to the creatively named V3 and V3 architecture was Java, and.

you know, not microservices, but like we didn't even have that term, but it wasn't even that it was mini applications. So I'm actually going to take a step back. V2 was a monolith. So like all of eBay's code in that single DLL and like that was buying and selling and search and everything. And then we had two monster databases, a primary and a backup big Oracle machines on some hardware that was bigger, you know, bigger than refrigerators and that ran eBay for a bunch of years, before we changed the upper part of the stack, we, um, chopped up the, that single monolithic database into a bunch of, um, domain specific databases or entity specific databases, right?

So a set of databases around users, you know, sharded by the user ID could talk about all that. If you want, you know, items again, sharded by item ID transactions, sharded by transaction ID... I think when I joined, it was the several hundred instances of, uh, Oracle databases, um, you know, spread around, but still that monolithic front end.

And then in 2002, I wanna say we started migrating into that V3 that I was saying, okay. So that's, uh, that was a rewrite in Java, again, many applications. So you take the front end and instead of having it be in one big unit, it was this, uh, ER file, EAR, file, if run and people remember back to, you know, those stays in Java, um, you know, 220 different of those.

So like here is the, you know, one of them for the search pages, you know, so the, you know, one application be the search application and it would, you know, do all the search related stuff, the handful of pages around search, uh, ditto for, you know, the buying area, ditto for the, you know, checkout area, ditto for the selling area...

220 of those. Um, and that was again, domain, um, vertically sliced domains. And then the relationship between those V3, uh, applications and the databases was a many to many things. So like many applicants, many of those applications interact with items. So they would interact with those item databases. Many of them would interact with users.

And so they would interact with a user databases, et cetera, uh, happy to go into as much gory detail as you want about all that. But like, that's what, uh, but we were in the transition period. You know, when I, uh, between the V2 monolith to the V3 mini applications in, uh, 2004, I'm just going to pause there and like, let me know where you want to take it.

[00:05:01] Jeremy: Yeah. So you were saying that it was, um, it started as Perl, then it became a C++, and that's kind of interesting that you said it was all in one class, right? So it's wow. That's gotta be a gigantic

[00:05:16] Randy: I mean, completely brutal. Yeah. 3.4 million lines of code. Yeah. We were hitting compiler limits on the number of methods per class.

[00:05:22] Jeremy: Oh my gosh.

[00:05:23] Randy: I'm, uh, uh, scared that I have that. I happen to know that at least at the time, uh, Microsoft allowed you 16 K uh, methods per class, and we were hitting that limit.

So, uh, not great.

[00:05:36] Jeremy: So it's just kind of interesting to think about how do you walk through that code, right? You have, I guess you just have this giant file.

[00:05:45] Randy: Yeah. I mean, there were, you know, different methods. Um, but yeah, it was a big man. I mean, it was a monolith, it was, uh, you know, it was a spaghetti mess. Um, and you know, as you can imagine, Amazon went through a really similar thing by the way. So this wasn't soup. I mean, it was bad, but like we weren't the only people that were making that, making that a mistake.

Um, and just like Amazon, where they were, uh, they did like one update a quarter (laughs) , you know, at that period, like 2000, uh, we were doing something really similar, like very, very slow. Um, you know, updates and, uh, when we moved to V3, you know, the idea was to get to do changes much faster. And we were very proud of ourselves starting in 2004 that we, uh, upgraded the whole site every two weeks.

And we didn't have to do the whole site, but like each of those individual applications that I was mentioning, right. Those 220 applications, each of those would roll out on this biweekly cadence. Um, and they had interdependencies. And so we rolled them out in this dependency order in any way, lots of, lots of complexity associated with that.

Um, yeah, there you go.

[00:06:51] Jeremy: the V3 that, that was written in Java, I'm assuming this was a, as a complete rewrite. You, you didn't use the C++ code at all.

[00:07:00] Randy: Yeah. And, uh, it was, um, we migrated, uh, page by page. So, uh, you know, in the transition period, which lasted probably five years, um, there were pages, you know, in the beginning, all pages were served by V2. In the end, all pages are served by V3 and, you know, over time you iterate and you like rewrite in parallel, you know, rewrite and maintain in parallel the V3 version of XYZ page and the V2 version of XYZ page.

Um, and then when you're ready, you start to test out at low percentages of traffic, you know, what would, what does V3 look like? Is it correct? And when it isn't do you go and fix it, but then ultimately you migrate the traffic over, um, to fully take, get fully be in the V3 world, and then you, you know, remove or comment out or whatever.

The, the code that supported that in the V2 monolith.

[00:07:54] Jeremy: And then you had mentioned using Oracle databases. Did you have a set for V2 and a separate V3 and you were kind of trying to keep them in sync?

[00:08:02] Randy: Oh, great question. Thank you for asking that question. No, uh, no. We had the databases. Um, so again, as I mentioned, we had pre-demonolith that's my that's a technical term, uh, pre broken up the databases starting in, let's call it 2000. Uh, actually I'm almost certain that's 2000. Cause we had a major site outage in 1999, which everybody still remembers who was there at the time.

Uh wasn't me or I wasn't there at the time. Uh, but you know, you can look it up. Uh, anyway, so yeah, starting in 2000, we broke up that monolithic database into what I was telling you before those entity aligned databases. Again, one set for items, one set for users, one set for transactions, you know, dot dot, dot, um, and that division of those databases was shared.

You know, those databases were shared between. The three using those things and then V sorry, V2 using those things and V3 using those things. Um, and then, you know, so we've completely decoupled the rewrite of the database, you know, kind of data storage layer from the rewrite of the application layer, if that makes sense.

[00:09:09] Jeremy: Yeah. So, so you had V2 that was connecting to these individual Oracle databases. You said like they were for different types of entities, like maybe for items and users and things like that. but it was a shared database situation where V2 was connected to the same database as V3. Is that right?

[00:09:28] Randy: Correct and also in V3, even when done. Different V3 applications, were also connecting to the same database, again, like anybody who used user, anybody who used the user entity, which is a lot we're connecting to the user suite of databases and anybody who used the item entity, which again is a lot, um, you were connecting to the item databases, et cetera.

So yeah, it was this many to many that's, I'm trying to say many to many relationship between applications in the V3 world and databases.

[00:10:00] Jeremy: Okay. Yeah, I think I, I got it because

[00:10:03] Randy: It's easier with a diagram.

[00:10:04] Jeremy: yeah. W 'cause when you, when you think about services now, um, you think of services having dependencies on other services. Whereas in this case you would have multiple services that rather than talking to a different service, they would all just talk to the same database.

They all needed users. So they all needed to connect to the user's database.

[00:10:24] Randy: Right exactly. And so, uh, I don't want to jump ahead in this conversation, but like the problems that everybody has, everybody who's feeling uncomfortable at the moment. You're right. To feel uncomfortable because that wasn't unpleasant situation and microservices, or more generally the idea that individual services would own their own data.

And only in the only are interactions to the service would be through the service interface and not like behind the services back to the, to the data storage layer. Um, that's better. And Amazon discovered that, you know, uh, lots of people discovered that around that same, around that same early two thousands period.

And so yeah, we had that situation at eBay at the time. Uh, it was better than it was before. Right, right. Better than a monolithic database and a monolithic application layer, but it definitely also had issues. Uh, as you can imagine,

[00:11:14] Jeremy: you know, thinking about back to that time where you were saying it's better than a monolith, um, what were sort of the trade-offs of, you know, you have a monolith connecting to all these databases versus you having all these applications, connecting to all these databases, like what were the things that you gained and what did you lose if that made sense?

[00:11:36] Randy: Hmm. Yeah. Well, I mean, why we did it in the first place is develop is like isolation between development teams right? So we were looking for developer productivity or the phrase we used to use was feature velocity, you know, so how quickly would we be able to move? And to the extent that we could move independently, you know, the search team could move independently from the buying team, which could move independently from the selling team, et cetera.

Um, that was what we were gaining. Um, what were we losing? Uh, you know, when you're in a monolith situation, If there's an issue, you know, where it is, it's in the monolith. You might not know where in the monolith. Um, but like there's only one place that could be. And so an issue that one has, uh, when you break things up into smaller units, uh, especially when they have this, you know, shared, shared mutable state, essentially in the form of these databases, like who changed that column?

What, you know, what's the deal. Uh, actually we did have a solution for that or something that really helped us, which was, um, now 20, more than 20 years ago, we had something that we would now call distributed tracing where, uh, actually I talked about this way back in the 2007 thing, cause it was pretty cool, uh, at the time, uh, You know, just like the spans one would create using a modern distributed tracing, you know, open telemetry or, you know, any of the disruptive tracing vendors.

Um, just like you would do that. We, we didn't use the term span, but that same idea where, um, we could, and the goal was the same to like debug stuff. So, uh, every time we were about to make a database call, we would say, Hey, I'm about to make this data, you know, we would log we about to make this database call and then it would happen.

And then we would log whether it was successful or not successful. We could see how long it took, et cetera. Um, and so we built our own, you know, monitoring system, which, which we called central application logging or CAL, uh, totally proprietary to eBay. I'm happy to talk about whatever gory details you want to know about that, but it was pretty cool certainly way back in 2000.

It was, and that was our mitigation against the thing I'm telling you, which is, you know, when something, when not. Something is weird in the database. We can kind of back up and figure out where it might've happened, or things are slow. What's, you know, what's the deal. And, uh, you know, cause sometimes the database is slow for reasons.

Um, and what, which, what thing is, you know, from an application perspective, I'm talking to 20 different databases, but things are slow. Like what is it? And, um, CAL helped us to, to figure out both elements of that, right? Like what applications are talking to, what databases and what backend services and like debug and diagnose from that perspective.

And then for a given application, what, you know, databases in backend services are you talking to? And, um, debug that. And then we have the whole, and then we, um, we, we had monitors on those things and we would notice when databases would, where be a lot of errors or where, when database is starting in slower than they used to be.

Um, and then. We implemented what people would now call circuit breakers, where we would notice that, oh, you know, everybody who's trying to talk to database 1, 2, 3, 4 is seeing it slow down. I guess 1, 2, 3, 4 is unhappy. So now flip everybody to say, don't talk to 1, 2, 3, 4, and like, just that kind of stuff.

You're not going to be able to serve. Uh, but whatever, that's better than stopping everything. So I hope that makes sense. Like, you know, so all these, all these like modern resilience techniques, um, we always had, we had our own proprietary names for them, but you know, we, we implemented a lot of them way back when,

[00:15:22] Jeremy: Yeah. And, and I guess just to contextualize it for the audience, I mean, this was back in 2004. Oh it back in 2000.

[00:15:32] Randy: Again, because we had this, sorry to interrupt you because we have, the problem is that we were just talking about where application many applications are talking to many services and databases and we didn't know what was going on. And so we needed some visibility into what was going on.

Sorry, go ahead.

[00:15:48] Jeremy: yeah. Okay. So all the way back in 2000, there's a lot less, Services out there, like nowadays you think about so many software as a service products. if you were building the same thing today, what are some of the services that people today would just go and say like, oh, I'll just, I'll just pay for this and have this company handle it for me. You know, that wasn't available, then

[00:16:10] Randy: sure. Well, there. No, essentially, no. Well, there was no cloud cloud didn't happen until 2006. Um, and there were a few software as a service vendors like Salesforce existed at the time, but they weren't usable in the way you're thinking of where I could give you money and you would operate a technical or technological software service on my behalf.

Do you know what I mean? So we didn't have any of the monitoring vendors. We didn't have any of the stuff today. So yeah. So what would we do, you know, to solve that specific problem today? Uh, I would, as we do today, I would, uh, instrument everything with open telemetry because that's generic. Thank you, Ben Siegelman and LightStep for starting that whole open sourcing process, uh, of that thing and, and, um, getting all the vendors to, you know, respect it.

Um, and then I would shoot, you know, for my backend, I would choose one of the very many wonderful, uh, you know, uh, distributed tracing vendors of which there are so many, I can't remember, but like LightStep is one honeycomb... you know, there were a bunch of, uh, you know, backend, um, distributed tracing vendors in particular, you know, for that.

Uh, what else do you have today? I mean, we could go on for hours on this one, but like, we didn't have distributed logging or we didn't have like logging vendors, you know? So there was no, uh, there was no Splunk, there was no, um, you know, any, any of those, uh, any of the many, uh, distributed log, uh, or centralized logging vendor, uh, vendors.

So we didn't have any of those things. We didn't. like caveman, you know, we rent, we, uh, you know, had our own data. We built our own data centers. We racked our own servers. We installed all the OSS in them, you know, uh, by the way, we still do all that because it's way cheaper for us at our scale to do that.

But happy to talk about that too. Uh, anyway, but yeah, no, the people who live in, I don't know if this is where you want to go in 2022, the software developer has this massive menu of options. You know, if you only have a credit card, uh, and it doesn't usually cost that much, you can get a lot of stuff done from the cloud vendors, from the software service vendors, et cetera, et cetera.

And none of that existed in 2000.

[00:18:31] Jeremy: it's really interesting to think about how different, I guess the development world is now. Like, cause you mentioned how cloud wasn't even really a thing until 2006, all these, these vendors that people take for granted. Um, none of them existed. And so it just, uh, it must've been a very, very different time.

[00:18:52] Randy: Well, we didn't know. It was every, every year is better than the previous year, you know, in software every year. You know? So at that time we were really excited that we had all the tools and capabilities that, that we did have. Uh, and also, you know, you look back from, you know, 20 years in the future and, uh, you know, it looks caveman, you know, from that perspective.

But, uh, it was, you know, all those things were cutting edge at the time. What happened really was the big companies rolled their own, right. Everybody, you know, everybody built their own data centers, rack their own servers. Um, so at least at scale and the best you could hope for the most you could pay anybody else to do is rack your servers for you.

You know what I mean? Like there were external people, you know, and they still exist. A lot of them, you know, the Rackspaces you know Equinixes, et cetera of the world. Like they would. Have a co-location facility. Uh, and you, you know, you ask them please, you know, I'd like to buy the, these specific machines and please rack these specific machines for me and connect them up on the network in this particular way.

Um, that was the thing you could pay for. Um, but you pretty much couldn't pay them to put software on there for you. That was your job. Um, and then operating. It was also your job, if that makes sense.

[00:20:06] Jeremy: and then back then, would that be where. Employees would actually have to go to the data center and then, you know, put in their, their windows CD or their Linux CD and, you know, actually do everything right there.

[00:20:18] Randy: Yeah. 100%. Yeah. In fact, um, again, anybody who operates data centers, I mean, there's more automation, but the conceptually, when we run three data centers ourselves at eBay right now, um, and all of our, all of our software runs on them. So like we have physical, we have those physical data centers. We have employees that, uh, physically work in those things, physical.

Rack and stack the servers again, we're smarter about it now. Like we buy a whole rack, we roll the whole rack in and cable it, you know, with one big chunk, uh, sound, uh, as distinct from, you know, individual wiring and the networks are different and better. So there's a lot less like individual stuff, but you know, at the end of the day, but yeah, everybody in quotes, everybody at that time was doing that or paying somebody to do exactly that.

Right. Yeah.

[00:21:05] Jeremy: Yeah. And it's, it's interesting too, that you mentioned that it's still being done by eBay. You said you have three, three data centers. because it seems like now maybe it's just assumed that someone's using a cloud service or using AWS or whatnot. And so, oh, go ahead.

[00:21:23] Randy: I was just going to say, well, I'm just going to riff off what you said, how the world has changed. I mean, so much, right? So. Uh, it's fine. You didn't need to say my whole LinkedIn, but like I used to work on Google cloud. So I've been, uh, I've been a cloud vendor, uh, at a bunch of previous companies I've been a cloud consumer, uh, at stitch fix and we work in other places.

Um, so I'm fully aware, you know, fully, fully, personally aware of, of all that stuff. But yeah, I mean, there's this, um, you know, eBay is in the, uh, eBay is at the size where it is actually. Cost-effective very, cost-effective, uh, can't tell you more than that, uh, for us to operate our own, um, uh, our own infrastructure, right?

So, you know, you know, one would expect if Google didn't operate their own infrastructure, nobody would expect Google to use somebody else's right. Like that, that doesn't make any economic sense. Um, and, uh, you know, Facebook is in the same category. Uh, for a while, Twitter and PayPal have been in that category.

So there's like this clap, you know, there are the known hyperscalers, right. You know, the, the Google, Amazon, uh, Microsoft that are like cloud vendors in addition to consumers internally have their own, their own clouds. Um, and then there's a whole class of other, um, places that operate their own internal clouds in quotes.

Uh, but don't offer them externally and again, uh, Facebook or Meta, uh, you know, is one example. eBay's another, you know, there's a, I'm making this up. Dropbox actually famously started in the cloud and then found it was much cheaper for them to operate their own infrastructure again, for the particular workloads that they had.

Um, so yeah, there's probably, I'm making this up. Let's call it two dozen around the world of these, I'm making this term up many hyperscalers, right? Like self hyperscalers or something like that. And eBay's in that category.

[00:23:11] Jeremy: I know this is kind of a, you know, a big what if, but you were saying how once you reach a certain scale, that's when it makes sense to move into your own data center. And, uh, I'm wondering if, if E-bay, had started more recently, like, let's say in the last, you know, 10 years, I wonder if it would've made sense for it to start on a public cloud and then move to, um, you know, its own infrastructure after it got bigger, or if you know, it really did make sense to just start with your own infrastructure from the start.

[00:23:44] Randy: Oh, I'm so glad you asked that. Um, the, the answer is obvious, but like, I'm so glad you asked that because I love to make this point. No one should ever, ever start by building your own servers and your own (laughs) cloud. Like, No, there's be, uh, you should be so lucky (laughs) after years and years and years that you outgrow the cloud vendors.

Right. Um, it happens, but it doesn't happen that often, you know, it happens so rarely that people write articles about it when it happens. Do you know what I mean? Like Dropbox is a good example. So yes, 100% anytime. Where are we? 2022. Any time in, more than the last 10 years? Um, yeah, let's call it. Let's call it 2010, 2012.

Right. Um, when cloud had proved itself over and you know, many times over, um, anybody who starts since that time should absolutely start in the public cloud. There's no argument about it. Uh, and again, one should be so lucky that over time, you're seeing successive zeros added to your cloud bill, and it becomes so many zeros that it makes sense to shift your focus toward building and operating your own data centers.

That's it. I haven't been part of that transition. I've been the other way, you know, at other places where, you know, I've migrated from owned data centers and colos into, into public cloud. Um, and that's the, that's the more common migration. And again, there are, there are a handful, maybe not even a handful of, uh, companies that have migrated away, but when they do, they've done all the math, right.

I mean, uh, Dropbox has done some great, uh, talks and articles about, about their transition and boy, the math makes sense for them. So, yeah.

[00:25:30] Jeremy: Yeah. And it also seems like maybe it's for certain types of businesses where moving off of public cloud. Makes sense. Like you mentioned Dropbox where so much of their business is probably centered around storage or centered around, you know, bandwidth and, you know, there's probably certain workloads that it's like need to leave public cloud earlier.

[00:25:51] Randy: Um, yeah, I think that's fair. Um, I think that, I think that's a, I think that's an insightful comment. Again, it's all about the economics at some point, you know, it's a big investment to, uh, uh, and it takes years to develop the intern, forget the money that you're paying people, but like just to develop the internal capabilities.

So they're very specialized skill sets around building an operating data centers. So like it's a big deal. Um, and, uh, yeah. So are there particular classes of workloads where you would for the same dollar figure or whatever, uh, migrate earlier or later? I'm sure that's probably true. And again, what can absolutely imagine?

Well, when they say Dropbox in this example, um, yeah, it's because like they, they need to go direct to the storage. And then, I mean, like, they want to remove every middle person, you know, from the flow of the bytes that are coming into the storage media. Um, and it makes perfect sense for, for them. And when I understood what they were doing, which was a number of years ago, they were hybrid, right. So they had, they had completely, you know, they kept the top, you know, external layer, uh, in public cloud. And then the, the storage layer was all custom. I don't know what they do today, but people could check.

[00:27:07] Jeremy: And I'm kind of coming back to your, your first time at eBay. is there anything you felt that you would've done differently with the knowledge you have now?

but with the technology that existed, then.

[00:27:25] Randy: Gosh, that's the 20, 20 hindsight. Um, the one that comes to mind is the one we touched on a little bit, but I'll say it more starkly, the. If I could, if I could go back in time 20 years and say, Hey, we're about to do this V3 transition at eBay. I would not. I would have had us move directly to what we would now call microservices in the sense that individual services own their own data storage and are only interacted with through the public interface.

Um, there's a famous Amazon memo around that same time. So Amazon did the transition from a monolith into what we would now call microservices over about a four or five-year period, 2000 to 2005. And there was a famous Jeff Bezos memo from the early part of that, where, you know, seven, you know, requirements I can't remember them, but you know, essentially it was, you may, you may, you may never, you may never talk to anybody else's database. You may only interact with other services through their public interfaces. I don't care what those public interfaces are, so they didn't standardize around. You know, CORBA or JSON or GRPC, which didn't exist at the time, you know, like they didn't standardize around any, any particular, uh, interaction mechanism, but you did need to again, have this kind of microservice capability, that's modern terminology, um, uh, where, you know, the only services own their own data and nobody can talk in the back door.

So that is the one architectural thing that I wish, you know, with 2020 hindsight, uh, that I would bring back in my time travel to 20 years ago, because that would help. That does help a lot. And to be fair, Amazon, um, Amazon was, um, pioneering in that approach and a lot of people internally and externally from Amazon, I'm told, didn't think it would work, uh, and it, and it did famously.

So that's, that's the thing I would do.

[00:29:30] Jeremy: Yeah. I'm glad you brought that up because, when you had mentioned that, I think you said there were 220 applications or something like that at certain scales, people might think like, oh, that sounds like microservices to me. But when you, you mentioned that microservice to you means it having its own data store.

I think that's a good distinction.

[00:29:52] Randy: Yeah. So, um, I talk a lot about microservices that have for, for a decade or so. Yeah. I mean, several of the distinguishing characteristics are the micro in microservices is size and scope of the interface, right? So you can have a service oriented architecture with one big service, um, or some very small number of very large services.

But the micro in microservice means this thing does, maybe it doesn't have one operation, but it doesn't have a thousand. The several or the handful or several handfuls of operations are all about this one particular thing. So that's the one part of it. And then the other part of it that is critical to the success of that is owning the, owning your own data storage.

Um, so each service, you know, again, uh, it's hard to do this with a diagram, but like imagine, imagine the bubble of the service surrounding the data storage, right? So like people, anybody from the outside, whether they're interacting synchronously, asynchronously, messaging, synchronous, whatever HTTP doesn't matter are only interacting to the bubble and never getting inside where the, uh, where the data is I hope that makes sense.

[00:31:04] Jeremy: Yeah. I mean, I mean, it's a kind of in direct contrast to before you're talking about how you had all these databases that all of these services shared. So it was probably hard to kind of keep track of, um, who had modified data. Um, you know, one service could modify it, then another service control to get data out and it's been changed, but it didn't change it.

So it could be kind of hard to track what's going on.

[00:31:28] Randy: Yeah, exactly. Inner integration at the database level is something that people have been doing since probably the 1980s. Um, and so again, I, you know, in retrospect it looks like caveman approach. Uh, it was pretty advanced at the time, actually, even the idea of sharding of, you know, Hey, there are users and the users live in databases, but they don't all live in the same one.

Uh, they live in 10 different databases or 20 different databases. And then there's this layer that. For this particular user, it figures out which of the 20 databases it's in and finds it and gets it back. And, um, you know, that was all pretty advanced. And by the way, that's all those capabilities still exist.

They're just hidden from everybody behind, you know, nice, simple, uh, software as a service, uh, interfaces anyway, but that takes nothing away from your excellent point, which is, yeah. It's, you know, when you're, again, when there's many to many to relations, when there is this many to many relationship between, um, uh, applications and databases, uh, and there's shared mutable state in those databases that when is shared, like that's bad, you know, it's not bad to have state.

It's not bad to have mutable state it's bad to have shared beautiful state.

[00:32:41] Jeremy: Yeah. And I think anybody who's kind of interested in learning more about the, you had talked about sharding and things like that. If they go back and listen to your, your first appearance on software engineering radio, um, yeah. It kind of struck me how you were talking about sharding and how it was something that was kind of unique or unusual.

Whereas today it feels like it's very, I don't know, if quaint is the right word, but it's like, um, it's something that, that people kind of are accustomed to now.

[00:33:09] Randy: Yeah. Yeah. Um, it's obvious. Um, it seems obvious in retrospect. Yeah. You know, at the time, and by the way, he didn't invent charting. As I said, in 2007, you know, Google and Yahoo and, uh, Amazon, and, you know, it was the obvious, it took a while to reach it, but it's one of those things where once, once people have the, you know, brainwave to see, oh, you know what, we don't actually have to stop store this in one, uh, database.

We can, we can chop that database up into, you know, into chunks. And that, that looks similar to that herself similar. Um, yeah, that was, uh, that was, uh, that was reinvented by lots of, uh, Lots of the big companies at the same time again, because everybody was solving that same problem at the same time. Um, but yeah, when you look back and you, I mean, like, and honestly, like everything that I said there, it's still like this, all the techniques about how you shard things.

And there's lots of, you know, it's not interesting anymore because the problems have been solved, but all those solutions are still the solutions, if that makes any sense, but you know,

[00:34:09] Jeremy: Yeah, for sure. I mean, I think anybody who goes back and listens to it. Yeah. Like you said, it's, it's, it's very interesting because it's. it all still applies and it's like, I think the, the solutions that are kind of interesting to me are ones where it's, it's things that could have been implemented long ago, but we just later on realized like, this is how we could do it.

[00:34:31] Randy: Well part of it is, as we grow as an industry, we just, we discover new problems. You know, we, we get to the point where, you know, sharding over databases has only a problem when one database doesn't work. You know, when it, when you're the load that you put on that database is too big, or you want the availability of, you know, multiple.

Um, and so that's not a, that's not a day one problem, right? That's a day two or day 2000 and kind of problem. Right. Um, and so a lot of these things, yeah, well, you know, it's software. So like we could have done, we could have done any of these things in older languages and older operating systems and with older technology.

But for the most part, we didn't have those problems or we didn't have them at sufficiently enough. People didn't have the problem that we, you know, um, for us to have solved it as an industry, if that makes any sense.

[00:35:30] Jeremy: yeah, no, that's a good point because you think about when Amazon first started and it was just a bookstore, right. And the number of people using the site where, uh, who knows it was, it might've been tens a day or hundreds a day. I don't, I don't know. And, and so, like you said, the problems that Amazon has now in terms of scale are just like, it's a completely different world than when they started.

[00:35:52] Randy: Yeah. I mean, probably I'm making it up, but I don't think that's too off to say that it's a billion times more, their problems are a billion fold. You know, what they, what they were

[00:36:05] Jeremy: the next thing I'd like to talk about is you came back to eBay I think about has it been about two years ago.

[00:36:14] Randy: Two years yeah.

[00:36:15] Jeremy: Yeah. And, and so, so tell me about the experience of coming back to an organization that you had been at, you know, 10 years prior or however long it was like, how is your onboarding different when it's somewhere you've been before?

[00:36:31] Randy: Yeah. Sure. So, um, like, like you said, I worked at eBay from 2004 to 2011. Um, and I worked in a different role than I have today. I've worked mostly on eBay search engine. Um, and then, uh, I left to co-found a startup, which was in the 99%. So the one, you know, like didn't really do much. Uh, I joined, I worked at Google in the early days of Google cloud, as I mentioned on Google app engine and had a bunch of other roles including more recently, like you said, stitch fix and we work, um, leading those engineering teams.

And, um, so yeah, coming back to eBay as chief architect and, and, you know, leading. Developer platform, essentially a part of eBay. Um, yeah. What was the onboarding like? I mean, lots of things had changed, you know, in the, in the intervening 10 years or so. Uh, and lots had stayed the same, you know, not in a bad way, but just, you know, uh, some of the technologies that we use today are still some of the technologies we used 10 years ago, a lot has changed though.

Um, a bunch of the people are still around. So there's something about eBay that, um, people tend to stay a long time. You know, it's not really very strange for people to be at eBay for 20 years. Um, in my particular team of let's call it 150, there are four or five people that have crossed their 20 year anniversary at the company.

Um, and I also re I rejoined with a bunch of other boomerangs as the term we use internally. So it's, you know, the, um, including the CEO, by the way. So sort of bringing the band back together, a bunch of people that had gone off and worked at it, but at other places have, have come back for various reasons over the last couple of.

So it was both a lot of familiarity, a lot of unfamiliarity, a lot of familiar faces. Um, yup.

[00:38:17] Jeremy: So, I mean, having these people who you work with still be there and actually coming back with some of those people, um, what were some of the big, I guess, advantages or benefits you got from, you know, those existing connections?

[00:38:33] Randy: Yeah. Well, I mean, as with all things, you know, imagine, I mean, everybody can imagine like getting back together with friends that they had from high school or university, or like you had some people had some schooling at some point and like you get back together with those friends and there's this, you know, there's this implicit trust in most situations of, you know, because you went through a bunch of stuff together and you knew each other, uh, you know, a long time.

And so that definitely helps, you know, when you're returning to a place where again, there are a lot of familiar faces where there's a lot of trust built up. Um, and then it's also helpful, you know, eBay's a pretty complicated place and it's 10 years ago, it was too big to hold in any one person's head and it's even harder to hold it in one person said now, but to be able to come back and have a little bit of that, well, more than a little bit of that context about, okay, here's how eBay works.

And here, you know, here are the, you know, unique complexities of the marketplace cause it's very unique, you know, um, uh, in the world. Um, and so, yeah, no, I mean, it was helpful. It's helpful a lot. And then also, you know, in my current role, um, uh, my, my main goal actually is to just make all of eBay better, you know, so we have about 4,000 engineers and, you know, my team's job is to make all of them better and more productive and more successful and, uh, being able to combine.

Knowing what eBay, knowing the context about eBay and having a bunch of connections to the people that, you know, a bunch of the leaders there, uh, here, um, combining that with 10 years of experience doing other things at other places, you know, that's helpful because you know, now there are things that we do at eBay that, okay, well there, you know, you know, that this other place is doing, this has that same problem and is solving it in a different way.

And so maybe we should, you know, look into that option. So,

[00:40:19] Jeremy: so, so you mentioned just trying to make developers, work or lives easier. you start the job. How do you decide what to tackle first? Like how do you figure out where the problems are or what to do next?

[00:40:32] Randy: yeah, that's a great question. Um, so, uh, again, my, uh, I lead this thing that we internally called the velocity initiative, which is about just making us, giving us the ability to deliver. Features and bug fixes more quickly to customers. Right. And, um, so what do I figure for that problem? How can we deliver things more quickly to customers and improve, you know, get more customer value and business value?

Uh, what I did, uh, with, in collaboration with a bunch of people is what one would call a value stream map. And that's a term from lean software and lean manufacturing, where you just look end to end at a process and like say all the steps and how long those steps take. So a value stream, as you can imagine, like all these steps that are happening at the end, there's some value, right?

Like we produced some, you know, feature or, you know, hopefully gotten some revenue or like helped out the customer and the business in some way. And so value, you know, mapping that value stream. That's what it means. And, um, Looking for you look at that. And when you can see the end-to-end process, you know, and like really see it in some kind of diagram, uh, you can look for opportunities like, oh, okay, well, you know, if it takes us, I'm making this effort, it takes us a week from when we have an idea to when it shows up on the site.

Well, you know, some of those steps take five minutes. That's not worth optimizing, but some of those steps take, you know, five days and that is worth optimizing. And so, um, getting some visibility into the system, you know, looking end to end with some, with a kind of view of the system systems thinking, uh, that will give you the, uh, the knowledge about, or the opportunities about we know what can be improved.

And so that's, that's what we did. And we didn't talk with all 4,000, you know, uh, engineers are all, you know, whatever, half a thousand teams or whatever we had. Um, but we sampled. And after we talked with three teams who were already hearing a bunch of the same things, you know, so we were hearing in the whole product life cycle, which I like to divide into four stages.

I'd like to say, there's planning. How does an idea become a project or a thing that people work on a software development? How does a project or become committed code software delivery? How does committed code become a feature that people actually use? And then what I call post release iteration, which is okay, it's now there are out there on the site and we're turning it on and off for individual users.

We're learning in analytics and usage in the real world and, and experimenting. And so there were opportunities that eBay at all, four of those stages, um, which I'm happy to talk about, but what we ended up seeing again and again, uh, is that that software delivery part was our current bottleneck. So again, that's the, how long does it take from an engineer when she commits her code to, it shows up as a feature on the site.

And, you know, before we started the work. You know, two years ago before we started the work that I've been doing for the last two years with a bunch of people, um, on average and eBay was like a week and a half. So, you know, it'd be a week and a half between when someone's finished and then, okay. It gets code reviewed and, you know, dot, dot, dot it gets rolled out.

It gets tested, you know, all that stuff. Um, it was, you know, essentially 10 days. And now for the teams that we've been working with, uh, it's down to two. So we used a lot of, um, what people may be familiar with, uh, the accelerate book. So it's called accelerate by Nicole Forsgren. Um, Jez humble and Gene Kim, uh, 2018, like if there's one book anybody should read about software engineering, it's that?

Uh, so please read accelerate. Um, it summarizes almost a decade of research from the state of DevOps reports, um, which the three people that I mentioned led. So Nicole Forsgren, you know, is, uh, is a doctor, uh, you know, she's a PhD and, uh, data science. She knows how to do all this stuff. Um, anyway, so, uh, that when your, when your problem happens to be software delivery.

The accelerate book tells you all the kind of continuous delivery techniques, trunk based development, uh, all sorts of stuff that you can do to, to solve that, uh, solve those problems. And then there are also four metrics that they use to measure the effectiveness of an organization, software delivery. So people might be familiar with, uh, there's deployment frequency.

How often are we deploying a particular application lead time for change? That's that time from when a developer commits her code to when it shows up on the site, uh, change failure rate, which is when we deploy code, how often do we roll it back or hot fix it, or, you know, there's some problem that we need to, you know, address.

Um, and then, uh, meantime to re uh, meantime to restore, which is when we have one of those incidents or problems, how, how quickly can we, uh, roll it back or do that hot fix? Um, and again, the beauty of Nicole Forsgren research summarized in the accelerate book is that the science shows that companies cluster, in other words, Mostly the organizations that are not good at, you know, deployment frequency and lead time are also not good at the quality metrics of, uh, meantime to restore and change failure rate and the companies that are excellent at, you know, uh, deployment frequency and lead time are also excellent at meantime, to recover and, uh, change failure rate.

Um, so companies or organizations, uh, divided into these four categories. So there's a low performers, medium performers, high performers, and then elite performers. And, uh, eBay was solidly eBay on average at the time. And still on average is solidly in that medium performer category. So, uh, and what we've been able to do with the teams that we've been working with is we've been able to move those teams to the high category.

So just super brief. Uh, and I w we'll give you a chance to ask you some more questions, but like in the low category, all those things are kind of measured in months, right. So how long, how often are we deploying, you know, measure that in months? How long does it take us to get a commit to the site? You know, measure that in months, you know, um, where, and then the low performer, sorry.

Uh, the medium performers are like, everything's measured in weeks, right? So like, if we were deploy, you know, couple, you know, once every couple of weeks or once a week, uh, lead time is measured in weeks, et cetera. The, uh, the high-performers things are measured in days and the elite performance things are measured in hours.

And so you can see there's like order of magnitude improvements when you go from, you know, when you move from one of those kind of clusters to another cluster. Anyway. So what we were focused on again, because our problem was software delivery was moving a whole, a whole set of teams from that medium performer category where things are measured in weeks to the, uh, high-performer category, where things are missing.

[00:47:21] Jeremy: throughout all this, you said the, the big thing that you focused on was the delivery time. So somebody wrote code and, they felt that it was ready for deployment, but for some reason it took 10 days to actually get out to the actual site. So I wonder if you could talk a little bit about, uh, maybe a specific team or a specific application, where, where, where was that time being spent?

You know, you, you said you moved from 10 days to two days. What, what was happening in the meantime?

[00:47:49] Randy: Yeah, no, that's a great question. Thank you. Um, yeah, so, uh, okay, so now, so we, we, we looked end to end of the process and we found that software delivery was the first place to focus, and then there are other issues in other areas, but we'll get to them later. Um, so then for, um, to improve software delivery, now we asked individual teams, we, we, we did something like, um, you know, some conversation like I'm about to say, so we said, hi, it looks like you're deploying kind of once or twice.

If I, if I told you, you had to deploy once a day, tell me all the reasons why that's not going to work. And the teams are like, oh, of course, well, it's a build times take too long. And the deployments aren't automated and you know, our testing is flaky. So we have to retry it all the time and, you know, dot, dot, dot, dot, dot.

And we said, great, you just gave my team, our backlog. Right. So rather than, you know, just coming and like, let's complain about it. Um, which the teams work it's legit for them to complain. Uh, I was a, you know, we were able, because again, the developer program or sorry, the developer platform, you know, is as part of my team, uh, we said, great, like you just gave us, you just told us all the, all your top, uh, issues or your impediments, as we say, um, and we're going to work on them with you.

And so every time we had some idea about, well, I bet we can use Canary deployments to automate the deployment which we have now done. We would pilot that with a bunch of teams, we'd learn what works and doesn't work. And then we would roll that out to everybody. Um, So what were the impediments like? It was a little bit different for each individual team, but in some, it was, uh, the things we ended up focusing on or have been focusing on our build times, you know, so we build everything in Java still.

Um, and, uh, even though we're generation five, as opposed to that generation three that I mentioned, um, still build times for a lot of applications we're taking way too long. And so we, we spend a bunch of time improving those things and we were able to take stuff from, you know, hours down to, you know, single digit minutes.

So that's a huge improvement to developer productivity. Um, we made a lot of investment in our continuous delivery pipelines. Um, so making all the, making all the automation around, you know, deploying something to one environment and checking it there and then deploying it into a common staging environment and checking it there and then deploying it from there into the production environment.

And, um, and then, you know, rolling it out via this Canary mechanism. We invested a lot in something that we call traffic mirroring, which is a, we didn't invent. Other T other places have a different name for this? I don't know if there's a standard industry name. Some people call it shadowing, but the idea is I have a change that I'm making, which is not intended to change the behavior.

Like a lots of changes that we make, bug fixes, et cetera, uh, upgrading to new, you know, open source, dependencies, whatever, changing the version of the framework. There's a bunch of changes that we make regularly day-to-day as developers, which are like, refactorings kind of where we're not actually intending to change the behavior.

And so a tra traffic mirroring was our idea of. You have the old code that's running in production and you, and you fire a request, a production request at that old code and it responds, but then you also fire that request at the new version and compare the results, you know, did the same, Jason come back, you know, between the old version and the new version.

Um, and that's, that's a great way kind of from the outside to sort of black box detect any unintended changes in the, in the behavior. And so we definitely leveraged that very, very aggressively. Um, we've invested in a bunch of other bunch of other things, but, but all those investments are driven by what does the team, what do the particular teams tell us are getting in their way?

And there are a bunch of things that the teams themselves have, you know, been motivated to do. So my team's not the only one that's making improvements. You know, teams have. Reoriented, uh, moved, moved from branching development to trunk based development, which makes a big difference. Um, making sure that, uh, PR approvals and like, um, you know, code reviews are happening much more regularly.

So like right after, you know, a thing that some teams have started doing is like immediately after standup in the morning, everybody does all the code reviews that you know, are waiting. And so things don't drag on for, you know, two, three days, cause whatever. Um, so there's just like a, you know, everybody kind of works on that much more quickly.

Um, teams are building their own automations for things like testing site speed and accessibility and all sorts of stuff. So like all the, all the things that, you know, a team goes through in the development and roll out of their software, they were been spending a lot of time automating and making, making a leaner, making more efficient.

[00:52:22] Jeremy: So, so some of those, it sounds like the PR example is really, on the team. Like you're, you're telling them like, Hey, this is something that you internally should change how you work. for things like improving the build time and things like that. Did you have like a separate team that was helping these teams, you know, speed that process up? Or what, what was that

[00:52:46] Randy: like?

Yeah. Great. I mean, and you did give to those two examples are, are like you say, very different. So I'm going to start from, we just simply showed everybody. Here's your deployment frequency for this application? Here's your lead time for this application? Here's your change failure rate. And here's your meantime to restore.

And again, as I didn't mention before. All of the state of DevOps research and the accelerate book prove that by improving those metrics, you get better engineering outcomes and you also get better business outcomes. So like it's scientifically proven that improving those four things matters. Okay. So now we've shown to teams, Hey, you're we would like you to improve, you know, for your own good, but you know, more broadly at eBay, we would like the deployment frequency to be faster.

And we would like the lead time to be shorter. And the insight there is when we deploy smaller units of work, when we don't like batch up a week's worth of work, a month's worth of work, uh, it's much, much less risky to just deploy like an hour's worth of work. Right. And the, and the insight is the hours worth of work fits in your head.

And if you roll it out and there's an issue. First off rolling backs, no big deal. Cause you only, you know, not, you've only lost an hour of work for a temporary period of time, but also like you never have this thing, like what in the world broke? Cause like with a month's worth of work, there's a lot of things that changed and a lot of stuff that could break, but with an hour's worth of work, it's only like one change that you made.

So, you know, when, if something happens, like it's pretty much, pretty much guaranteed to be that thing anyway, that's the back. Uh, that's the backstory. And um, and so yeah, we were just working with individual teams. Oh yeah. So they were, the teams were motivated to like, see what's the biggest bang for the buck in order to improve those things.

Like how can we improve those things? And again, some teams were saying, well, you know what, a huge component of our, of that lead time between when somebody commits and it's, it's a feature on the site, a huge percentage of that. Maybe multiple days, it's like waiting for somebody to code review. Okay, great.

We can just change our team kind of agreements and our team behavior to make that happen. And then yes, to answer your question about. Were the other things like building the Canary capability and traffic mirroring and build time improvements. Those were done by central, uh, platform and infrastructure teams, you know, some of which were in my group and some of which are in peer peer groups, uh, in, in my part of the organization.

So, yeah, so I mean like providing the generic tools and, you know, generic capabilities, those are absolutely things that a platform organization does. Like that's our job. Um, and you know, we did it. And, uh, and then there are a bunch of other things like that around kind of team behavior and how you approach building a particular application that are, are, and should be completely in the control of the individual teams.

And we were trying not to be, not trying not to be, we were definitely not being super prescriptive. Like we didn't come in and we say, we didn't come in and say, alright, by next, by next Tuesday, we want you to be doing trunk based development by, you know, the Tuesday after that, we want to see test-driven development, you know, dot, dot, Um, we would just offer to teams, you know, hear it.

Here's where you are. Here's where we know you can get, because like we work with other teams and we've seen that they can get there. Um, you know, they just work together on, well, what's the biggest bang for the buck and what would be most helpful for that team? So it's like a menu of options and you don't have to take everything off the menu, if that makes sense.

[00:56:10] Jeremy: And, and how did that communication flow from you and your team down to the individual contributor? Like you have, I'm assuming you have engineering managers and technical leads and all these people sort of in the chain. How does it

[00:56:24] Randy: Yeah, thanks for asking that. Yeah. I didn't really say how we work as an initiative. So every, um, so there are a bunch of teams that are involved. Um, and we have, uh, every Monday morning, so, uh, just so happens. It's late Monday morning today. So we already did this a couple of hours ago, but once a week we get all the teams that are involved, both like the platform kind of provider teams and also the product.

Or we would say domain like consumer teams. And we do a quick scrum of scrums, like a big old kind of stand up. What have you all done this week? What are you working on next week? What are you blocked by kind of idea. And, you know, there are probably 20 or 30 teams again, across the individual platform capabilities and across the teams that, you know, uh, consume this stuff and everybody gives a quick update and they, and, uh, it's a great opportunity for people to say, oh, I have that same problem too.

Maybe we should offline try to figure out how to solve that together. You built a tool that automates the site speed stuff. That's great. I would S I would so love to have that. And, um, so it, uh, this weekly meeting has been a great opportunity for us to share wins, share, um, you know, help that people need and then get, uh, get teams to help with each other.

And also, similarly, one of the platform teams would say something like, Hey, we're about to be done or beta, let's say, you know, this new Canary capability, I'm making this up. Anybody wanna pilot that for us? And then you get a bunch of hands raised of yo, we would be very happy to pilot that that would be great.

Um, so that's how we communicate back and forth. And, you know, it's a big enough. It's kind of like engineering managers are kind of are the kind of level that are involved in that typically. Um, so it's not individual developers, but it's like somebody on most, every team, if that makes any sense. So like, that's kind of how we do that, that like communication, uh, back to the individual developers.

If that makes sense.

[00:58:26] Jeremy: Yeah. So it sounds like you would have, like you said, the engineering manager go to the standup and um, you said maybe 20 to 30 teams, or like, I'm just trying to get a picture for how many people are in this meeting.

[00:58:39] Randy: Yeah. It's like 30 or 40 people.

[00:58:41] Jeremy: Okay. Yeah.

[00:58:42] Randy: And again, it's quick, right? It's an hour. So we just go, boom, boom, boom, boom. And we've just developed a cadence of people. We have a shared Google doc and like people like write their little summaries, you know, of what they're, what they've worked on and what they're working on.

So we've over time made it so that it's pretty efficient with people's time. And. Pretty dense in a good way of like information flow, back and forth. Um, and then also separately, we meet more in more detail with the individual teams that are involved. Again, try to elicit, okay, now, where are you now?

Here's where you are. Please let us know what problems you're seeing with this part of the infrastructure or problems you're seeing in the pipelines or something like that. And we're, you know, we're constantly trying to learn and get better and, you know, solicit feedback from teams on what we can do differently.

[00:59:29] Jeremy: earlier you had talked a little bit about how there were a few services that got brought over from V2 or V3, basically kind of more legacy or older services that are, have been a part of eBay for quite some time.

And I was wondering if there were things about those services that made this process different, like, you know, in terms of how often you could deploy or, um, just what were some key differences between something that was made recently versus something that has been with the company for a long time?

[01:00:06] Randy: Yeah, sure. I mean, the stuff that's been with the company for a long time was best in class. As of when we built it, you know, maybe 15 and sometimes 20 years ago. Um, there actually, I wouldn't even less than a handful. There are, as we speak, there are two or three of those V3. Uh, clusters or applications or services still around and they should be gone in a completely migrated away from, in the next a couple of months.

So like, we're almost at the end of, um, you know, uh, moving all to more modern things. But yeah, you know, I mean, again, uh, stuff that was state-of-the-art, you know, 20 years ago, which was like deploying things once every two weeks, like that was a big deal in 2000 or 2004. Uh, and it's, you know, like that was fast in 2004 and is slow in 2022.

So, um, yeah, I mean, what's the difference? Um, yeah, I mean, a lot of these things, you know, if they haven't already been migrated, there's a reason. And it's because often that they're way in the guts of something that's really important. You know, this is the, this is a core part. I'm making these examples up and they're not even right, but like it's a core part of the payments flow.

It's a core part of, you know, uh, how, uh, sellers get paid. And those aren't examples. We have, those are modern, but you see what I'm saying? Like stuff that's like really core to the business and that's why it's kind of lasted.

[01:01:34] Jeremy: And, uh, I'm kind of curious from the perspective of some of these new things you're introducing, like you're talking about, um, improving continuous delivery and things like that. Uh, when you're working with some of these services that have been around a long time, are the teams the rate at which they deploy or the rate at which you find defects is that noticeably different from services that are more recent?

[01:02:04] Randy: I mean, and that's true of any legacy at any, at any place. Right? So, um, yeah, I mean, people are legitimately, uh, I have some trepidation that say about, you know, changing something that's, you know, been running the, running the business for a long, long time. And so, you know, it's a lot slower going, uh, exactly because it's not always completely obvious what, um, you know, what the implications are of those changes.

So, you know, we were very careful and we, you know, trust things a whole lot. And, um, you know, maybe we didn't write stuff with a whole bunch of automated tests in the beginning. And so there's a lot of manual stuff there. You know, this is pretty, you know, this is just what happens when you have, uh, you have stuff that, you know, you have a company that's, you know, been around for a long time.

[01:02:51] Jeremy: yeah, I guess just, just kind of to start wrapping up as this process of you coming into the company and identifying where the problems are and working on like, um, you know, ways to speed up delivery. Is there, there anything that kind of came up that really surprised you? I mean, you've been at a lot of different organizations. Is there anything about your experience here at eBay that was very different than what you'd seen before?

[01:03:19] Randy: No. I mean, it's a great question. I don't think, I mean, I think the thing that's surprising is how unsurprising it is. Like there's not, you know, the details are different. Like, okay. You know, we have this V3, I mean, like, you know, we have some uniquenesses around eBay, but, but, um, but I think what is maybe pleasantly surprising is all the techniques about how one.

Notice the things that are going on, uh, in terms of, you know, again, deployment, frequency, lead time, et cetera, and what techniques you would deploy to like make those things better. Um, all the standard stuff applies, you know, so again, uh, all the, all the techniques that are mentioned in the state of DevOps research and an accelerate and just all the, all the known good practices of software development, they all apply everywhere.

Um, and that's the wonderful, I think that's the wonderful thing. So like maybe the most surprising thing is how unsurprising or how, how, how applicable the, you know, the standard industry standard techniques, uh, are, I mean, I certainly hope that to be true, but that's why we, I didn't really say, but we piloted this stuff with a small number of teams.

Exactly. Because we, you know, we thought, and it would turned out to be true that they applied, but we weren't entirely sure. You know, we didn't know what we didn't know. Um, and we also needed proof points, you know, Not just out there in the world, but at eBay that these things made a difference and it turns out they do. So.

[01:04:45] Jeremy: yeah, I mean, I think it's easy for people to kind of get caught up and think like, my problem is unique or my organization is unique and, but it, but it sounds like in a lot of cases, maybe we're not so not so different.

[01:04:57] Randy: I mean, the stuff that works tends to work everywhere, the deeds there's always some detail, but, um, but yeah, I mean all aspects of, you know, the continuous delivery and kind of lean approach the software. I mean, we, the industry have yet to find a place where they don't work seriously. You have to find any place where they don't work.

[01:05:19] Jeremy: if people want to, um, you know, learn more about the work that you're doing at eBay, or just follow you in general, um, where should.

[01:05:27] Randy: Yeah. So, um, I tweet semi-regularly at, at Randy shelf. So my name all one word, R a N D Y S H O U P. Um, I'm not, I had always wanted to be a blogger. Like there is a Randy shop.com and there are some blogs on there, but they're pretty old. Um, someday I hope to be doing more writing. Um, I do a lot of conference speaking though.

So I speak at the Q con conferences. I'm going to be at the craft concert in Budapest in a couple of in week and a half, uh, as of this recording. Um, so you can often find me on, uh, on Twitter or on software conferences.

[01:06:02] Jeremy: all right, Randy. Well, thank you so much for coming back on software engineering radio.

[01:06:06] Randy: Thanks for having me, Jeremy. This was fun.

Ant Wilson on Supabase

Wed, 11 May 2022 06:07:03 -0000

This episode originally aired on Software Engineering Radio.

A few topics covered

Building on top of open source
Forking their GoTrue dependency
Relying on Postgres features like row level security
Adding realtime support based on Postgres's write ahead log
Generating an API layer based on the database schema with PostgREST
Creating separate EC2 instances for each customer's database
How Postgres could scale in the future
Monitoring postgres
Common support tickets
Permissive open source licenses

Related Links

Postgres

Transcript

You can help edit this transcript on GitHub.

[00:00:00] Jeremy: Today I'm talking to Ant Wilson, he's the co-founder and CTO of Supabase. Ant welcome to software engineering radio.

[00:00:07] Ant: Thanks so much. Great to be here.

[00:00:09] Jeremy: When I hear about Supabase, I always hear about it in relation to two other products. The first is Postgres, which is a open source relational database. And second is Firebase, which is a backend as a service product from Google cloud that provides a no SQL data store.

It provides authentication and authorization. It has a functions as a service component. It's really meant to be a replacement for you needing to have your own server, create your own backend. You can have that all be done from Firebase. I think a good place for us to start would be walking us through what supabase is and how it relates to those two products.

[00:00:55] Ant: Yeah. So, so we brand ourselves as the open source Firebase alternative

that came primarily from the fact that we ourselves do use the, as the alternative to Firebase. So, so my co-founder Paul in his previous startup was using fire store. And as they started to scale, they hit certain limitations, technical scaling limitations and he'd always been a huge Postgres fan.

So we swapped it out for Postgres and then just started plugging in. The bits that we're missing, like the real-time streams. Um, He used the tool called PostgREST with a T for the, for the CRUD APIs. And so

he just built like the open source Firebase alternative on Postgres, And that's kind of where the tagline came from.

But the main difference obviously is that it's relational database and not a no SQL database which means that it's not actually a drop-in replacement. But it does mean that it kind of opens the door to a lot more functionality actually. Um, Which, which is hopefully an advantage for us.

[00:02:03] Jeremy: it's a, a hosted form of Postgres. So you mentioned that Firebase is, is different. It's uh NoSQL. People are putting in their, their JSON objects and things like that. So when people are working with Supabase is the experience of, is it just, I'm connecting to a Postgres database I'm writing SQL.

And in that regard, it's kind of not really similar to Firebase at all. Is that, is that kind of right?

[00:02:31] Ant: Yeah, I mean, the other thing, the other important thing to notice that you can communicate with Supabase directly from the client, which is what people love about fire base. You just like put the credentials on the client and you write some security rules, and then you just start sending your data. Obviously with supabase, you do need to create your schema because it's relational.

But apart from that, the experience of client side development is very much the same or very similar the interface, obviously the API is a little bit different. But, but it's similar in that regard. But I, I think, like I said, we're moving, we are just a database company actually. And the tagline, just explained really, well, kind of the concept of, of what it is like a backend as a service. It has the real-time streams. It has the auth layer. It has the also generated APIs. So I don't know how long we'll stick with the tagline. I think we'll probably outgrow it at some point. Um, But it does do a good job of communicating roughly what the service is.

[00:03:39] Jeremy: So when we talk about it being similar to Firebase, the part that's similar to fire base is that you could be a person building the front end part of the website, and you don't need to necessarily have a backend application because all of that could talk to supabase and supabase can handle the authentication, the real-time notifications all those sorts of things, similar to Firebase, where we're basically you only need to write the front end part, and then you have to know how to, to set up super base in this case.

[00:04:14] Ant: Yeah, exactly. And some of the other, like we took w we love fire based, by the way. We're not building an alternative to try and destroy it. It's kind of like, we're just building the SQL alternative and we take a lot of inspiration from it. And the other thing we love is that you can administer your database from the browser.

So you go into Firebase and you have the, you can see the object tree, and when you're in development, you can edit some of the documents in real time. And, and so we took that experience and effectively built like a spreadsheet view inside of our dashboard. And also obviously have a SQL editor in there as well.

And trying to, create this, this like a similar developer experience, because that's where Firebase just excels is. The DX is incredible. And so we, we take a lot of inspiration from it in, in those respects.

[00:05:08] Jeremy: and to to make it clear to our listeners as well. When you talk about this interface, that's kind of like a spreadsheet and things like that. I suppose it's similar to somebody opening up pgAdmin, I suppose, and going in and editing the rows. But, but maybe you've got like another layer on top that just makes it a little more user-friendly a little bit more like something you would get from Firebase, I guess.

[00:05:33] Ant: Yeah.

And, you know, we, we take a lot of inspiration from pgAdmin. PG admin is also open source. So I think we we've contributed a few things and, or trying to upstream a few things into PG admin. The other thing that we took a lot of inspiration from for the table editor, what we call it is airtable.

And because airtable is effectively. a a relational database and that you can just come in and, you know, click to add your columns, click to add a new table. And so we just want to reproduce that experience again, backed up by a full Postgres dedicated database.

[00:06:13] Jeremy: so when you're working with a Postgres database, normally you need some kind of layer in front of it, right? That the person can't open up their website and connect directly to Postgres from their browser. And you mentioned PostgREST before. I wonder if you could explain a little bit about what that is and how it works.

[00:06:34] Ant: Yeah, definitely. so yeah, PostgREST has been around for a while. Um, It's basically an, a server that you connect to, to your Postgres database and it introspects your schemas and generates an API for you based on the table names, the column names. And then you can basically then communicate with your Postgres database via this restful API.

So you can do pretty much, most of the filtering operations that you can do in SQL um, uh, equality filters. You can even do full text search over the API. So it just means that whenever you obviously add a new table or a new schema or a new column the API just updates instantly. So you, you don't have to worry about writing that, that middle layer which is, was always the drag right.

When, what have you started a new project. It's like, okay, I've got my schema, I've got my client. Now I have to do all the connecting code in the middle of which is kind of, yeah, no, no developers should need to write that layer in 2022.

[00:07:46] Jeremy: so this the layer you're referring to, when I think of a traditional. Web application. I think of having to write routes controllers and, and create this, this sort of structure where I know all the tables in my database, but the controllers I create may not map one to one with those tables. And so you mentioned a little bit about how PostgREST looks at the schema and starts to build an API automatically.

And I wonder if you could explain a little bit about how it does those mappings or if you're writing those yourself.

[00:08:21] Ant: Yeah, it basically does them automatically by default, it will, you know, map every table, every column. When you want to start restricting things. Well, there's two, there's two parts to this. There's one thing which I'm sure we'll get into, which is how is this secure since you are communicating direct from the client.

But the other part is what you mentioned giving like a reduced view of a particular date, bit of data. And for that, we just use Postgres views. So you define a view which might be, you know it might have joins across a couple of different tables or it might just be a limited set of columns on one of your tables. And then you can choose to just expose that view.

[00:09:05] Jeremy: so it sounds like when you would typically create a controller and create a route. Instead you create a view within your Postgres database and then PostgREST can take that view and create an end point for it, map it to that.

[00:09:21] Ant: Yeah, exactly (laughs) .

[00:09:24] Jeremy: And, and PostgREST is an open source project. Right. I wonder if you could talk a little bit about sort of what its its history was. How did you come to choose it?

[00:09:37] Ant: Yeah.

I think, I think Paul probably read about it on hacker news at some point. Anytime it appears on hacker news, it just gets voted to the front page because it's, it's So awesome. And we, we got connected to the maintainer, Steve Chavez. At some point I think he just took an interest in, or we took an interest in Postgres and we kind of got acquainted.

And then we found out that, you know, Steve was open to work and this kind of like probably shaped a lot of the way we think about building out supabase as a project and as a company in that we then decided to employ Steve full time, but just to work on PostgREST because it's obviously a huge benefit for us.

We're very reliant on it. We want it to succeed because it helps our business. And then as we started to add the other components, we decided that we would then always look for existing tools, existing opensource projects that exist before we decided to build something from scratch. So as we're starting to try and replicate the features of Firebase we would and auth is a great example.

We did a full audit of what are all the authorization, authentication, authentication open-source tools that are out there and which one was, if any, would fit best. And we found, and Netlify had built a library called gotrue written in go, which did pretty much exactly what we needed. So we just adopted that.

And now obviously, you know, we, we just have a lot of people on the team contributing to, to gotrue as well.

[00:11:17] Jeremy: you touched on this a little bit earlier. Normally when you connect to a Postgres database your user has permission to, to basically everything I guess, by default, anyways. And so. So, how does that work? Where when you want to restrict people's permissions, make sure they only get to see records they're allowed to see how has that all configured in PostgREST and what's happening behind the scenes?

[00:11:44] Ant: Yeah, we, the great thing about Postgres is it's got this concept of row level security, which actually, I don't think I even rarely looked at until we were building out this auth feature where the security rules live in your database as SQL. So you do like a create policy query, and you say anytime someone tries to select or insert or update apply this policy.

And then how it all fits together is our auth server go true. Someone will basically make a request to sign in or sign up with email and password, and we create that user inside the, database. They get issued a URL. And they get issued a JSON, web token, a JWT, and which, you know, when they, when they have it on the, client side, proves that they are this, you, you ID, they have access to this data.

Then when they make a request via PostgREST, they send the JWT in the authorization header. Then Postgres will pull out that JWT check the sub claim, which is the UID and compare it to any rows in the database, according to the policy that you wrote. So, so the most basic one is you say in order to, to access this row, it must have a column you UID and it must match whatever is in the JWT.

So we basically push the authorization down into the database which actually has, you know, a lot of other benefits in that as you write new clients, You don't need to have, have it live, you know, on an API layer on the client. It's kind of just, everything is managed from the database.

[00:13:33] Jeremy: So the, the, you, you ID, you mentioned that represents the user, correct.

[00:13:39] Ant: Yeah.

[00:13:41] Jeremy: Is that, does that map to a user in post graphs or is there some other way that you're mapping those permissions?

[00:13:50] Ant: Yeah. When, so when you connect go true, which is the auth server to your Postgres database for the first time, it installs its own schema. So you'll have an auth schema and inside will be all start users with a list of the users. It'll have a uh, auth dot tokens which will store all the access tokens that it's issued.

So, and one of the columns on the auth start user's table will be UUID, and then whenever you write application specific schemers, you can just join a, do a foreign key relation to the author users table. So, so it all gets into schema design and and hopefully we do a good job of having some good education content in the docs as well.

Because one of the things we struggled with from the start was how much do we abstract away from SQL away from Postgres and how much do we educate? And we actually landed on the educate sides because I mean, once you start learning about Postgres, it becomes kind of a superpower for you as a developer.

So we'd much rather. Have people discover us because we're a firebase alternatives frontend devs then we help them with things like schema design landing about row level security. Because ultimately like every, if you try and abstract that stuff it gets kind of crappy. And maybe not such a great experience.

[00:15:20] Jeremy: to make sure I understand correctly. So you have GoTrue, which is uh, a Netlify open-source project that GoTrue project creates some tables in your, your database that has like, you've mentioned the tokens, the, the different users. Somebody makes a request to GoTrue. Like here's my username, my password go true.

Gives them back a JWT. And then from your front end, you send that JWT to the PostgREST endpoint. And from that JWT, it's able to know which user you are and then uses postgres' built in a row level security to figure out which rows you're, you're allowed to bring back. Did I, did I get that right?

[00:16:07] Ant: That is pretty much exactly how it works. And it's impressive that you garnered that without looking at a single diagram (laughs) But yeah, and, and, and obviously we, we provide a client library supabase JS, which actually does a lot of this work for you. So you don't need to manually attach the JJ JWT in a header.

If you've authenticated with supabase JS, then every request sent to PostgREST. After that point, the header will just be attached automatically, and you'll be in a session as that user.

[00:16:43] Jeremy: and, and the users that we're talking about when we talk about Postgres' row level security. Are those actual users in PostgreSQL. Like if I was to log in with psql, I could actually log in with those users.

[00:17:00] Ant: They're not, you could potentially structure it that way. But it would be more advanced it's it's basically just users in, in the auth.users table, the way, the way it's currently done.

[00:17:12] Jeremy: I see and postgrest has the, that row level security is able to work with that table. You, you don't need to have actual Postgres users.

[00:17:23] Ant: Exactly. And, and it's, it's basically turing complete. I mean, you can write extremely complex auth policies. You can say, you know, only give access to this particular admin group on a Thursday afternoon between six and 8:00 PM. You can get really, yeah. really as fancy as you want.

[00:17:44] Jeremy: Is that all written in SQL or are there other languages they allow you to use?

[00:17:50] Ant: Yeah. It's the default is plain SQL. Within Postgres itself, you can use

I think you can use, like there's a Python extension. There's a JavaScript extension, which is a, I think it's a subsets of, of JavaScripts. I mean, this is the thing with Postgres, it's super extensible and people have probably got all kinds of interpreters.

So you, yeah, you can use whatever you want, but the typical user will just use SQL.

[00:18:17] Jeremy: interesting. And that applies to logic in general, I suppose, where if you were writing a rails application, you might write Ruby. Um, If you're writing a node application, you write JavaScript, but you're, you're saying in a lot of cases with PostgREST, you're actually able to do what you want to do, whether that's serialization or mapping objects, do that all through SQL.

[00:18:44] Ant: Yeah, exactly, exactly. And then obviously like there's a lot of awesome other stuff that Postgres has like this postGIS, which if you're doing geo, if you've got like a geo application, it'll load it up with a geo types for you, which you can just use. If you're doing like encryption and decryption, we just added PG libsodium, which is a new and awesome cryptography extension.

And so you can use all of these, these all add like functions, like SQL functions which you can kind of use in, in any parts of the logic or in the role level policies. Yeah.

[00:19:22] Jeremy: and something I thought was a little unique about PostgREST is that I believe it's written in Haskell. Is that right?

[00:19:29] Ant: Yeah, exactly. And it makes it fairly inaccessible to me as a result. But the good thing is it's got a thriving community of its own and, you know, people who on there's people who contribute probably because it's written in haskell. And it's, it's just a really awesome project and it's an excuse to, to contribute to it.

But yeah. I, I think I did probably the intro course, like many people and beyond that, it's just, yeah, kind of inaccessible to me.

[00:19:59] Jeremy: yeah, I suppose that's the trade-off right. Is you have a, a really passionate community about like people who really want to use Haskell and then you've got the, the, I guess the group like yourselves that looks at it and goes, oh, I don't, I don't know about this.

[00:20:13] Ant: I would, I would love to have the time to, to invest in uh, but not practical right now.

[00:20:21] Jeremy: You talked a little bit about the GoTrue project from Netlify. I think I saw on one of your blog posts that you actually forked it. Can you sort of explain the reasoning behind doing that?

[00:20:34] Ant: Yeah, initially it was because we were trying to move extremely fast. So, so we did Y Combinator in 2020. And when you do Y Combinator, you get like a part, a group partner, they call it one of the, the partners from YC and they add a huge amount of external pressure to move very quickly. And, and our biggest feature that we are working on in that period was auth.

And we just kept getting the question of like, when are you going to ship auth? You know, and every single week we'd be like, we're working on it, we're working on it. And um, and one of the ways we could do it was we just had to iterate extremely quickly and we didn't rarely have the time to, to upstream things correctly.

And actually like the way we use it in our stack is slightly differently. They connected to MySQL, we connected to Postgres. So we had to make some structural changes to do that. And the dream would be now that we, we spend some time upstream and a lot of the changes. And hopefully we do get around to that.

But the, yeah, the pace at which we've had to move over the last uh, year and a half has been kind of scary and, and that's the main reason, but you know, hopefully now we're a little bit more established. We can hire some more people to, to just focus on, go true and, and bringing the two folks back together.

[00:22:01] Jeremy: it's just a matter of, like you said speed, I suppose, because the PostgREST you, you chose to continue working off of the existing open source project, right?

[00:22:15] Ant: Yeah, exactly. Exactly. And I think the other thing is it's not a major part of Netlify's business, as I understand it. I think if it was and if both companies had more resource behind it, it would make sense to obviously focus on on the single codebase but I think both companies don't contribute as much resource as as we would like to, but um, but it's, it's for me, it's, it's one of my favorite parts of the stack to work on because it's written in go and I kind of enjoy how that it all fits together.

So Yeah. I, I like to dive in there.

[00:22:55] Jeremy: w w what about go, or what about how it's structured? Do you particularly enjoy about the, that part of the project?

[00:23:02] Ant: I think it's so I actually learned learned go through, gotrue and I'm, I have like a Python and C plus plus background And I hate the fact that I don't get to use Python and C plus posts rarely in my day to day job. It's obviously a lot of type script. And then when we inherited this code base, it was kind of, as I was picking it up I, it just reminded me a lot of, you know, a lot of the things I loved about Python and C plus plus, and, and the tooling around it as well. I just found to be exceptional. So, you know, you just do like a small amounts of conflig. Uh config, And it makes it very difficult to, to write bad code, if that makes sense.

So the compiler will just, boot you back if you try and do something silly which isn't necessarily the case with, with JavaScript. I think TypeScript is a little bit better now, but Yeah, I just, it just reminded me a lot of my Python and C days.

[00:24:01] Jeremy: Yeah, I'm not too familiar with go, but my understanding is that there's, there's a formatter that's a part of the language, so there's kind of a consistency there. And then the language itself tries to get people to, to build things in the same way, or maybe have simpler ways of building things. Um, I don't, I don't know.

Maybe that's part of the appeal.

[00:24:25] Ant: Yeah, exactly. And the package manager as well is great. It just does a lot of the importing automatically. and makes sure like all the declarations at the top are formatted correctly and, and are definitely there. So Yeah. just all of that tool chain is just really easy to pick up.

[00:24:46] Jeremy: Yeah. And I, and I think compiled languages as well, when you have the static type checking. By the compiler, you know, not having things blow up and run time. That's, that's just such a big relief, at least for me in a lot of cases,

[00:25:00] Ant: And I just loved the dopamine hits of when you compile something on it actually compiles this. I lose that with, with working with JavaScript.

[00:25:11] Jeremy: for sure. One of the topics you mentioned earlier was how super base provides real-time database updates. And which is something that as far as I know is not natively a part of Postgres. So I wonder if you could explain a little bit about how that works and how that came about.

[00:25:31] Ant: Yeah. So, So Postgres, when you add replication databases the way it does is it writes everything to this thing called the write ahead log, which is basically all the changes that uh, have, are going to be applied to, to the database. And when you connect to like a replication database. It basically streams that log across.

And that's how the replica knows what, what changes to, to add. So we wrote a server, which basically pretends to be a Postgres rep, replica receives the right ahead log encodes it into JSON. And then you can subscribe to that server over web sockets. And so you can choose whether to subscribe, to changes on a particular schema or a particular table or particular columns, and even do equality matches on rows and things like this.

And then we recently added the role level security policies to the real-time stream as well. So that was something that took us a while to, cause it was probably one of the largest technical challenges we've faced. But now that it's in the real-time stream is, is fully secure and you can apply these, these same policies that you apply over the CRUD API as well.

[00:26:48] Jeremy: So for that part, did you have to look into the internals of Postgres and how it did its row level security and try to duplicate that in your own code?

[00:26:59] Ant: Yeah, pretty much. I mean it's yeah, it's fairly complex and there's a guy on our team who, well, for him, it didn't seem as complex, let's say (laughs) , but yeah, that's pretty much it it's just a lot of it's effectively a SQL um, a Postgres extension itself, uh which in-in interprets those policies and applies them to, to the, to the, the right ahead log.

[00:27:26] Jeremy: and this piece that you wrote, that's listening to the right ahead log. what was it written in and, and how did you choose that, that language or that stack?

[00:27:36] Ant: Yeah. That's written in the Elixir framework which is based on Erlang very horizontally scalable. So any applications that you write in Elixir can kind of just scale horizontally the message passing and, you know, go into the billions and it's no problem. So it just seemed like a sensible choice for this type of application where you don't know.

How large the wall is going to be. So it could just be like a few changes per second. It could be a million changes per second, then you need to be able to scale out. And I think Paul who's my co-founder originally, he wrote the first version of it and I think he wrote it as an excuse to learn Elixir, which is how, a lot of probably how PostgREST ended up being Haskell, I imagine.

But uh, but it's meant that the Elixir community is still like relatively small. But it's a group of like very passionate and very um, highly skilled developers. So when we hire from that pool everyone who comes on board is just like, yeah, just, just really good and really enjoy is working with Elixir.

So it's been a good source of a good source for hires as well. Just, just using those tools.

[00:28:53] Jeremy: with a feature like this, I'm assuming it's where somebody goes to their website. They make a web socket connection to your application and they receive the updates that way. How have you seen how far you're able to push that in terms of connections, in terms of throughput, things like that?

[00:29:12] Ant: Yeah, I don't actually have the numbers at hand. But we have, yeah, we have a team focused on obviously maximizing that but yeah, I don't I don't don't have those numbers right now.

[00:29:24] Jeremy: one of the last things you've you've got on your website is a storage project or a storage product, I should say. And I believe it's written in TypeScript, so I was curious, we've got PostGrest, which is in Haskell. We've got go true and go. Uh, We've got the real-time database part in elixir.

And so with storage, how did we finally get to TypeScript?

[00:29:50] Ant: (Laughs) Well, the policy we kind of landed on was best tool for the job. Again, the good thing about being an open source is we're not resource constrained by the number of people who are in our team. It's by the number of people who are in the community and I'm willing to contribute. And so for that, I think one of the guys just went through a few different options that we could have went with, go just to keep it in line with a couple of the other APIs.

But we just decided, you know, a lot of people well, everyone in the team like TypeScript is kind of just a given. And, and again, it was kind of down to speed, like what's the fastest uh we can get this up and running. And I think if we use TypeScript, it was, it was the best solution there. But yeah, but we just always go with whatever is best.

Um, We don't worry too much uh, about, you know, the resources we have because the open source community has just been so great in helping us build supabase. And building supabase is like building like five companies at the same time actually, because each of these vertical stacks could be its own startup, like the auth stack And the storage layer, and all of this stuff.

And you know, each has, it does have its own dedicated team. So yeah. So we're not too worried about the variation in languages.

[00:31:13] Jeremy: And the storage layer is this basically a wrapper around S3 or like what is that product doing?

[00:31:21] Ant: Yeah, exactly. It's it's wraparound as three. It, it would also work with all of the S3 compatible storage systems. There's a few Backblaze and a few others. So if you wanted to self host and use one of those alternatives, you could, we just have everything in our own S3 booklets inside of AWS.

And then the other awesome thing about the storage system is that because we store the metadata inside of Postgres. So basically the object tree of what buckets and folders and files are there. You can write your role level policies against the object tree. So you can say this, this user should only access this folder and it's, and it's children which was kind of. Kind of an accident. We just landed on that. But it's one of my favorite things now about writing applications and supervisors is the rollover policies kind of work everywhere.

[00:32:21] Jeremy: Yeah, it's interesting. It sounds like everything. Whether it's the storage or the authentication it's all comes back to postgres, right? At all. It's using the row level security. It's using everything that you put into the tables there, and everything's just kind of digging into that to get what it needs.

[00:32:42] Ant: Yeah. And that's why I say we are a database company. We are a Postgres company. We're all in on postgres. We got asked in the early days. Oh, well, would you also make it my SQL compatible compatible with something else? And, but the amounts. Features Postgres has, if we just like continue to leverage them then it, it just makes the stack way more powerful than if we try to you know, go thin across multiple different databases.

[00:33:16] Jeremy: And so that, that kind of brings me to, you mentioned how your Postgres companies, so when somebody signs up for supabase they create their first instance. What's what's happening behind the scenes. Are you creating a Postgres instance for them in a container, for example, how do you size it? That sort of thing.

[00:33:37] Ant: Yeah. So it's basically just easy to under the hood for us we, we have plans eventually to be multi-cloud. But again, going down to the speed of execution that the. The fastest way was to just spin up a dedicated instance, a dedicated Postgres instance per user on EC2. We do also package all of the API APIs together in a second EC2 instance.

But we're starting to break those out into clustered services. So for example, you know, not every user will use the storage API, so it doesn't make sense to Rooney for every user regardless. So we've, we've made that multitenant, the application code, and now we just run a huge global cluster which people connect through to access the S3 bucket.

Basically and we're gonna, we have plans to do that for the other services as well. So right now it's you got two EC2 instances. But over time it will be just the Postgres instance and, and we wanted. Give everyone a dedicated instance, because there's nothing worse than sharing database resource with all the users, especially when you don't know how heavily they're going to use it, whether they're going to be bursty.

So I think one of the things we just said from the start is everyone gets a Postgres instance and you get access to it as well. You can use your Postgres connection string to, to log in from the command line and kind of do whatever you want. It's yours.

[00:35:12] Jeremy: so did it, did I get it right? That when I sign up, I create a super base account. You're actually creating an two instance for me specifically. So it's like every customer gets their, their own isolated it's their own CPU, their own Ram, that sort of thing.

[00:35:29] Ant: Yeah, exactly, exactly. And, and the way the. We've set up the monitoring as well, is that we can expose basically all of that to you in the dashboard as well. so you can, you have some control over like the resource you want to use. If you want to a more powerful instance, we can do that. A lot of that stuff is automated.

So if someone scales beyond the allocated disk size, the disk will automatically scale up by 50% each time. And we're working on automating a bunch of these, these other things as well.

[00:36:03] Jeremy: so is it, is it where, when you first create the account, you might create, for example, a micro instance, and then you have internal monitoring tools that see, oh, the CPU is getting heady hit pretty hard. So we need to migrate this person to a bigger instance, that kind of thing.

[00:36:22] Ant: Yeah, pretty much exactly.

[00:36:25] Jeremy: And is that, is that something that the user would even see or is it the case of where you send them an email and go like, Hey, we notice you're hitting the limits here. Here's what's going to happen.

[00:36:37] Ant: Yeah.

In, in most cases it's handled automatically. There are people who come in and from day one, they say has my requirements. I'm going to have this much traffic. And I'm going to have, you know, a hundred thousand users hitting this every hour. And in those cases we will over-provisioned from the start.

But if it's just the self service case, then it will be start on a smaller instance and an upgrade over time. And this is one of our biggest challenges over the next five years is we want to move to a more scalable Postgres. So cloud native Postgres. But the cool thing about this is there's a lot of.

Different companies and individuals working on this and upstreaming into Postgres itself. So for us, we don't need to, and we, and we would never want to fork Postgres and, you know, and try and separate the storage and the the computes. But more we're gonna fund people who are already working on this so that it gets upstreamed into Postgres itself.

And it's more cloud native.

[00:37:46] Jeremy: Yeah. So I think the, like we talked a little bit about how Firebase was the original inspiration and when you work with Firebase, you, you don't think about an instance at all, right? You, you just put data in, you get data out. And it sounds like in this case, you're, you're kind of working from the standpoint of, we're going to give you this single Postgres instance.

As you hit the limits, we'll give you a bigger one. But at some point you, you will hit a limit of where just that one instance is not enough. And I wonder if there's you have any plans for that, or if you're doing anything currently to, to handle that.

[00:38:28] Ant: Yeah. So, so the medium goal is to do replication like horizontal scaling. We, we do that for some users already but we manually set that up. we do want to bring that to the self serve model as well, where you can just choose from the start. So I want, you know, replicas in these, in these zones and in these different data centers.

But then, like I said, the long-term goal is that. it's not based on. Horizontally scaling a number of instances it's just a Postgres itself can, can scale out. And I think we will get to, I think, honestly, the race at which the Postgres community is working, I think we'll be there in two years.

And, and if we can contribute resource towards that, that goal, I think yeah, like we'd love to do that, but yeah, but for now, it's, we're working on this intermediate solution of, of what people already do with, Postgres, which is, you know, have you replicas to make it highly available.

[00:39:30] Jeremy: And with, with that, I, I suppose at least in the short term, the goal is that your monitoring software and your team is handling the scaling up the instance or creating the read replicas. So to the user, it, for the most part feels like a managed service. And then yeah, the next step would be to, to get something more similar to maybe Amazon's Aurora, I suppose, where it just kind of, you pay per use.

[00:40:01] Ant: Yeah, exactly. Exactly. Aurora was kind of the goal from the start. It's just a shame that it's proprietary. Obviously.

[00:40:08] Jeremy: right.

Um, but it sounds,

[00:40:10] Ant: the world would be a better place. If aurora was opensource.

[00:40:15] Jeremy: yeah. And it sounds like you said, there's people in the open source community that are, that are trying to get there. just it'll take time. to, to all this, about making it feel seamless, making it feel like a serverless experience, even though internally, it really isn't, I'm guessing you must have a fair amount of monitoring or ways that you're making these decisions.

I wonder if you can talk a little bit about, you know, what are the metrics you're looking at and what are the applications you're you have to, to help you make these decisions?

[00:40:48] Ant: Yeah. definitely. So we started with Prometheus which is a, you know, metrics gathering tool. And then we moved to Victoria metrics which was just easier for us to scale out. I think soon we'll be managing like a hundred thousand Postgres databases will have been deployed on, on supabase. So definitely, definitely some scale. So this kind of tooling needs to scale to that as well. And then we have agents kind of everywhere on each application on, on the database itself. And we listen for things like the CPU and the Ram and the network IO. We also poll. Uh, Postgres itself. Th there's a extension called PG stats statements, which will give us information about what are, the intensive queries that are running on that, on that box.

So we just collect as much of this as possible um, which we then obviously use internally. We set alerts to, to know when, when we need to upgrade in a certain direction, but we also have an end point where the dashboard subscribes to these metrics as well. So the user themselves can see a lot of this information.

And we, I think at the moment we do a lot of the, the Ram the CPU, that kind of stuff, but we're working on adding just more and more of these observability metrics uh, so people can can know it could, because it also helps with Let's say you might be lacking an index on a particular table and not know about it.

And so if we can expose that to you and give you alerts about that kind of thing, then it obviously helps with the developer experience as well.

[00:42:29] Jeremy: Yeah. And th that brings me to something that I, I hear from platform as a service companies, where if a user has a problem, whether that's a crash or a performance problem, sometimes it can be difficult to distinguish between is it a problem in their application or is this a problem in super base or, you know, and I wonder how your support team kind of approaches that.

[00:42:52] Ant: Yeah, no, it's, it's, it's a great question. And it's definitely something we, we deal with every day, I think because of where we're at as a company we've always seen, like, we actually have a huge advantage in that.

we can provide. Rarely good support. So anytime an engineer joins super base, we tell them your primary job is actually frontline support.

Everything you do afterwards is, is secondary. And so everyone does a four hour shift per week of, of working directly with the customers to help determine this kind of thing. And where we are at the moment is we are happy to dive in and help people with their application code because it helps our engineers land about how it's being used and where the pitfalls are, where we need better documentation, where we need education.

So it's, that is all part of the product at the moment, actually. And, and like I said, because we're not a 10,000 person company we, it's an advantage that we have, that we can deliver that level of support at the moment.

[00:44:01] Jeremy: w w what are some of the most common things you see happening? Like, is it I would expect you mentioned indexing problems, but I'm wondering if there's any specific things that just come up again and again,

[00:44:15] Ant: I think like the most common is people not batching their requests. So they'll write an application, which, you know, needs to, needs to pull 10,000 rows and they send 10,000 requests (laughs) . That that's, that's a typical one for, for people just getting started maybe. Yeah. and, then I think the other thing we faced in the early days was. People storing blobs in the database which we obviously solve that problem by introducing file storage. But people will be trying to store, you know, 50 megabytes, a hundred megabyte files in Postgres itself, and then asking why the performance was so bad.

So I think we've, we've mitigated that one by, by introducing the blob storage.

[00:45:03] Jeremy: and when you're, you mentioned you have. Over a hundred thousand instances running. I imagine there have to be cases where an incident occurs, where something doesn't go quite right. And I wonder if you could give an example of one and how it was resolved.

[00:45:24] Ant: Yeah, it's a good question. I think, yeah, w w we've improved the systems since then, but there was a period where our real time server wasn't able to handle rarely large uh, right ahead logs. So w there was a period where people would just make tons and tons of requests and updates to, to Postgres. And the real time subscriptions were failing. But like I said, we have some really great Elixir devs on the team, so they were able to jump on that fairly quickly. And now, you know, the application is, is way more scalable as a result. And that's just kind of how the support model works is you have a period where everything is breaking and then uh, then you can just, you know, tackle these things one by one.

[00:46:15] Jeremy: Yeah, I think any, anybody at a, an early startup is going to run into that. Right? You put it out there and then you find out what's broken, you fix it and you just get better and better as it goes along.

[00:46:28] Ant: Yeah, And the funny thing was this model of, of deploying EC2 instances. We had that in like the first week of starting super base, just me and Paul. And it was never intended to be the final solution. We just kind of did it quickly and to get something up and running for our first handful of users But it's scaled surprisingly well.

And actually the things that broke as we started to get a lot of traffic and a lot of attention where was just silly things. Like we give everyone their own domain when they start a new project. So you'll have project ref dot super base dot in or co. And the things that were breaking where like, you know, we'd ran out of sub-domains with our DNS provider and then, but, and those things always happen in periods of like intense traffic.

So we ha we were on the front page of hacker news, or we had a tech crunch article, and then you discover that you've ran out of sub domains and the last thousand people couldn't deploy their projects. So that's always a fun a fun challenge because you are then dependent on the external providers as well and theirs and their support systems.

So yeah, I think. We did a surprisingly good job of, of putting in good infrastructure from the start. But yeah, all of these crazy things just break when obviously when you get a lot of, a lot of traffic

[00:48:00] Jeremy: Yeah, I find it interesting that you mentioned how you started with creating the EC2 instances and it turned out that just work. I wonder if you could walk me through a little bit about how it worked in the beginning, like, was it the two of you going in and creating instances as people signed up and then how it went from there to where it is today?

[00:48:20] Ant: yeah. So there's a good story about, about our fast user, actually. So me and Paul used to contract for a company in Singapore, which was an NFT company. And so we knew the lead developer very well. And we also still had the Postgres credentials on, on our own machines. And so what we did was we set up the th th the other funny thing is when we first started, we didn't intend to host the database.

We, we thought we were just gonna host the applications that would connect to your existing Postgres instance. And so what we did was we hooked up the applications to, to the, to the Postgres instance of this, of this startup that we knew very well. And then we took the bus to their office and we sat with the lead developer, and we said, look, we've already set this thing up for you.

What do you think. know, when, when you think like, ah, we've, we've got the best thing ever, but it's not until you put it in front of someone and you see them, you know, contemplating it and you're like, oh, maybe, maybe it's not so good. Maybe we don't have anything. And we had that moment of panic of like, oh, maybe we just don't maybe this isn't great.

And then what happened was he didn't like use us. He didn't become a supabase user. He asked to join the team.

[00:49:45] Jeremy: nice, nice.

[00:49:46] Ant: that was a good a good kind of a moment where we thought, okay, maybe we have got something, maybe this is maybe this isn't terrible. So, so yeah, so he became our first employee. Yeah.

[00:49:59] Jeremy: And so yeah, so, so that case was, you know, the very beginning you set everything up from, from scratch. Now that you have people signing up and you have, you know, I don't know how many signups you get a day. Did you write custom infrastructure or applications to do the provisioning or is there an open source project that you're using to handle that

[00:50:21] Ant: Yeah. It's, it's actually mostly custom. And you know, AWS does a lot of the heavy lifting for you. They just provide you with a bunch of API end points. So a lot of that is just written in TypeScript fairly straightforward and, and like I said, you never intended to be the thing that last. Two years into the business.

But it's, it's just scaled surprisingly well. And I'm sure at some point we'll, we'll swap it out for some I don't orchestration tooling like Pulumi or something like this. But actually the, what we've got just works really well.

[00:50:59] Ant: Be because we're so into Postgres our queuing system is a Postgres extension called PG boss. And then we have a fleet of workers, which are. Uh, We manage on EC ECS. Um, So it's just a bunch of VMs basically which just subscribed to the, to the queue, which lives inside the database.

And just performs all the, whether it be a project creation, deletion modification a whole, whole suite of these things. Yeah.

[00:51:29] Jeremy: very cool. And so even your provisioning is, is based on Postgres.

[00:51:33] Ant: Yeah, exactly. Exactly (laughs) .

[00:51:36] Jeremy: I guess in that case, I think, did you say you're using the right ahead log there to in order to get notifications?

[00:51:44] Ant: We do use real time, and this is the fun thing about building supabase is we use supabase to build supabase. And a lot of the features start with things that we build for ourselves. So the, the observability features we have a huge logging division. So, so w we were very early users of a tool called a log flare, which is also written in Elixir.

It's basically a log sync backed up by BigQuery. And we loved it so much and we became like super log flare power users that it was kind of, we decided to eventually acquire the company. And now we can just offer log flare to all of our customers as well as part of using supabase. So you can query your logs and get really good business intelligence on what your users um, consuming in from your database.

[00:52:35] Jeremy: the lock flare you're mentioning though, you said that that's a log sink and that that's actually not going to Postgres, right. That's going to a different type of store.

[00:52:43] Ant: Yeah. That is going to big query actually.

[00:52:46] Jeremy: Oh, big query. Okay.

[00:52:47] Ant: yeah, and maybe eventually, and this is the cool thing about watching the Postgres progression is it's become. It's bringing like transactional and analytical databases together. So it's traditionally been a great transactional database, but if you look at a lot of the changes that have been made in recent versions, it's becoming closer and closer to an analytical database.

So maybe at some point we will use it, but yeah, but big query works just great.

[00:53:18] Jeremy: Yeah. It's, it's interesting to see, like, I, I know that we've had episodes on different extensions to Postgres where I believe they change out how the storage works. So there's yeah, it's really interesting how it's it's this one database, but it seems like it can take so many different forms.

[00:53:36] Ant: It's just so extensible and that's why we're so bullish on it because okay. Maybe it wasn't always the best database, but now it seems like it is becoming the best database and the rate at which it's moving. It's like, where's it going to be in five years? And we're just, yeah, we're just very bullish on, on Postgres.

As you can tell from the amount of mentions it's had in this episode.

[00:54:01] Jeremy: yeah, we'll have to count how many times it's been said. I'm sure. It's, I'm sure it's up there. Is there anything else we, we missed or think you should have mentioned.

[00:54:12] Ant: No, some of the things we're excited about are cloud functions. So it's the thing we just get asked for the most at anytime we post anything on Twitter, you're guaranteed to get a reply, which is like when functions. And we're very pleased to say that it's, it's almost there. So um, that will hopefully be a really good developer experience where also we launched like a, a graph QL Postgres extension where the resolver lives inside of Postgres.

And that's still in early alpha, but I think I'm quite excited for when we can start offering that on the on the hosted platform as well. People will have that option to, to use GraphQL instead of, or as well as the restful API.

[00:55:02] Jeremy: the, the common thread here is that PostgreSQL you're able to take it really, really far. Right. In terms of scale up, eventually you'll have the read replicas. Hopefully you'll have. Some kind of I don't know what you would call Aurora, but it's, it's almost like self provisioning, maybe not sharing what, how you describe it.

But I wonder as a, as a company, like we talked about big query, right? I wonder if there's any use cases that you've come across, either from customers or in your own work where you're like, I just, I just can't get it to fit into Postgres.

[00:55:38] Ant: I think like, not very often, but sometimes we'll, we will respond to support requests and recommend that people use Firebase. they're rarely

like if, if they really do have like large amounts of unstructured data, which is which, you know, documented storage is, is kind of perfect for that. We'll just say, you know, maybe you should just use Firebase.

So we definitely come across things like that. And, and like I said, we love, we love Firebase, so we're definitely not trying to, to uh, destroy as a tool. I think it, it has its use cases where it's an incredible tool yeah. And provides a lot of inspiration for, for what we're building as well.

[00:56:28] Jeremy: all right. Well, I think that's a good place to, to wrap it up, but where can people hear more about you hear more about supabase?

[00:56:38] Ant: Yeah, so supeabase is at supabase.com. I'm on Twitter at ant Wilson. Supabase is on Twitter at super base. Just hits us up. We're quite active on the and then definitely check out the repose gets up.com/super base. There's lots of great stuff to dig into as we discussed. There's a lot of different languages, so kind of whatever you're into, you'll probably find something where you can contribute.

[00:57:04] Jeremy: Yeah, and we, we sorta touched on this, but I think everything we've talked about with the exception of the provisioning part and the monitoring part is all open source. Is that correct?

[00:57:16] Ant: Yeah, exactly.

And as, yeah. And hopefully everything we build moving forward, including functions and graph QL we'll continue to be open source.

[00:57:31] Jeremy: And then I suppose the one thing I, I did mean to touch on is what, what is the, the license for all the components you're using that are open source?

[00:57:41] Ant: It's mostly Apache2 or MIT. And then obviously Postgres has its own Postgres license. So as long as it's, it's one of those, then we, we're not too precious. I, As I said, we inherit a fair amounts of projects. So we contribute to and adopt projects. So as long as it's just very permissive, then we don't care too much.

[00:58:05] Jeremy: As far as the projects that your team has worked on, I've noticed that over the years, we've seen a lot of companies move to things like the business source license or there's, there's all these different licenses that are not quite so permissive. And I wonder like what your thoughts are on that for the future of your company and why you think that you'll be able to stay permissive.

[00:58:32] Ant: Yeah, I really, really, rarely hope that we can stay permissive. forever. It's, it's a philosophical thing for, for us. You know, when we, we started the business, it's what just very, just very, as individuals into the idea of open source. And you know, if, if, if AWS come along at some point and offer hosted supabase on AWS, then it will be a signal that where we're doing something.

Right. And at that point we just, I think we just need to be. The best team to continue to move super boost forward. And if we are that, and I, I think we will be there and then hopefully we will never have to tackle this this licensing issue.

[00:59:19] Jeremy: All right. Well, I wish you, I wish you luck.

[00:59:23] Ant: Thanks. Thanks for having me.

[00:59:25] Jeremy: This has been Jeremy Jung for software engineering radio. Thanks for listening.

Testing with Jason Swett

Mon, 04 Apr 2022 05:34:50 -0000

Jason Swett is the author of the Complete Guide to Rails Testing. We covered Jason's experience with testing while building relatively small Ruby on Rails applications. Our conversation applies to just about any language or framework so don't worry if you aren't familiar with Rails.

A few topics covered:
- Listen to advice but be aware of its context. Something good for a large project may not apply to a small one
- Fast feedback loops help us work quicker and tests are great for this
- If you don't involve things like the database in any of your tests your application may not work at all despite your tests passing
- You may not need to worry about scaling at the start for smaller or internal applications
- Try to break features into the smallest pieces possible so they can be checked in and reviewed quickly
- Jason doesn't remember the difference between a stub and a mock because he rarely uses them

Transcript:

[00:00:00] Jeremy: today I'm talking to Jason Swett, he's the author of the complete guide to rails testing, a frequent trainer and conference speaker. And he's the host of the code with Jason podcast. So Jason, welcome to software sessions.

[00:00:13] Jason: Thanks for having me.

[00:00:15] Jeremy: from listening to your podcast, I get a sense that the size of the projects you work on they're, they're relatively modest.

Like they're not like a super huge thing. There, there may be something that you can fit all within your head. And I was wondering if you could talk a little bit to that first, so that we kind of get where your perspective is and the types of projects you work on are.

[00:00:40] Jason: Yeah. Good question. So that is true. Most of my jobs have been at small companies and I think that's probably typical of the typical developer because most businesses in the world are small businesses. You know, there's, there's a whole bunch of small businesses for every large business. And so most of the code bases I've worked on have been not particularly huge.

And most of the teams I've worked on have been relatively small And sometimes so small that it's just me. I'm the only person working on the application. I, don't really know any different. So I can't really compare it to working on a larger application. I have worked at, I worked at AT&T so that was a big place, but I was at, AT&T just working on my own solo project so that wasn't a big code base either.

So yeah, that's been what my experience has been like.

[00:01:36] Jeremy: Yeah. And I, I think that's interesting that you mentioned most people work in that space as well, because that's basically where I fall as well. So when I listened to your podcast and I hear you talking about like, oh, I have a, I have a rails project where I just have a single server and you know, I have a database and rails, and maybe I have nginx in front, maybe redis it's sort of the scale that I'm familiar with versus when I hear podcasts or articles, you know, I'm reading where they're talking about, oh, we have 500 microservices or we have 200 instances of the application.

That's, that's not a space that I've, I've worked in. So I, I found it helpful to, to hear, you know, from you on your show that like, Hey, you know, not everybody is working on these gigantic projects.

[00:02:28] Jason: Yeah. Yeah. It's not terribly relatable when you hear about those huge projects.

And obviously, sometimes, maybe people earlier in their career can get the wrong idea about what's applicable to their situation. I feel like one of the most dangerous kinds of advice is advice that's good advice, but it's good advice for somebody else.

And then I've, I've. Been victim of that, where I get some advice and maybe it's genuinely good advice, but it's not good advice for me where I am doing what I'm doing. And so, I apply the advice, but it's not the right thing. And so it doesn't work out for me. So I'm always careful to like asterisk a lot of the things I say where it's like, Hey, this is, this is good advice if you're in this particular situation, but maybe not for everybody.

And really the truth is I, I try not to give advice these days because like advice is dangerous stuff for that very reason.

[00:03:28] Jeremy: so, so when you mentioned you try not to give advice and you have this book, the complete guide to rails testing, would you not describe what's in the book as advice? I'm kind of curious what the distinction is there.

[00:03:42] Jason: Yeah, Jeremy, right after I said that, I'm like, what am I talking about? I give all kinds of advice. So forget, I said that I totally give advice. But maybe not in certain things like like business advice or anything like that. I do give a lot of advice around testing and various programming things.

So, yeah, ignore that part of what I said.

[00:04:03] Jeremy: something that I found a little bit unique about rails testing was that a lot of the tests are centered around I guess you could call it like a full integration test, right? Because I noticed when working with rails, if I write a test, a lot of times it's talking to the database, it's talking to if, if I.

Have an API or I have a website it's actually talking to the API. So it's actually going through all the layers and spinning up a database and all that. And I wonder if you, you knew how that work, like each time you run a test, is it creating a new database? So that each test is isolated or how does all that stuff actually work?

[00:04:51] Jason: Yeah, good question. First. I want to mention something about terminology. So I think one of the most important things for somebody who's new to testing to learn is that in our industry, we don't have a consensus around terminology. So what you call an integration test might be different from what I call an integration test.

The thing you just described as an integration test, I might call an acceptance test. Although I happen to also call it an integration test cause I use that terminology too, but I just wanted to give that little asterisk for the listener, because if they're like, wait, I thought an integration test was this.

And not that anyway, you asked how does that work? So. It is true that with those types of rails tests, and just to add more terminology into the mix, they call those system tests or system specs, depending on what framework you're using. But those are the tests that actually instantiate a browser and simulating user input, exercise, the UI of the application.

And those are the kinds of tests that like show you that everything works together. And mechanically how that works. One layer of it is that each test runs in a database transaction. So when you, you know, in order to run a certain test, maybe you need certain records like a user. And then I don't know if it's a scheduling test, you might need to create an appointment and whatever. All those records that you create specifically for that test that's happening inside of a database transaction. And then at the end of the test, the transaction is aborted. So that none of the data you create during the test actually gets persisted to the database. then regarding the whole database, it's not actually like creating a new database instance at the start of each test and then blowing it away.

It's still the same database instance is just the data inside of each test is not being persisted at.

[00:07:05] Jeremy: Okay. So when you run. What you would call, I guess you called it an acceptance test, right? Where it's going, it's opening up your website, it's clicking through the website, creating records, things like that. That's happening in a database instance that's created for, I guess, for all your tests that all your tests get to reuse and rails is automatically wrapping your test in a transaction.

So even if you're doing five or 10 database queries at the end of all, that they all get rolled back because they're all within the same transaction.

[00:07:46] Jason: Exactly. And the reason why we want to do that. Is because of a testing principle that you want your tests to be runnable in any order. And the key thing is you want your tests to be deterministic. So deterministic means that the starting state determines the in-state and it's the same every time, no matter what.

So if you have tests a, B and C, it shouldn't be the case that you can run them in the order, ABC, and they all pass. But if you do it CBA, then test a fails because it should only fail. If something's actually wrong, it shouldn't fail for some other reason, like the order in which you run the tests. And so to ensure that property of deterministic newness we need to make it so that each test doesn't leak into the other tests.

Cause imagine if that. Database transaction. thing didn't happen. And it's, it's only incidental that that's achieved via database transactions. It could conceivably be achieved some other way. That's just how this happens to work in this particular case. But imagine if no measure was taken to clean up afterward and I, I ran a test and it generated an appointment.

And then the test that runs after that does some tests that involves like doing a count of appointments or something like that. And maybe like, coincidentally, my second test passes because I've always run the tests in a certain order. and so unbeknownst to me, test B only passes because of what I did in test a that's bad because now the thing that's happening is different from what I think is happening.

And then if it flipped and when we ran it, test B and then test a. It wouldn't work anymore. So that's why we make each test isolated. So it can be deterministic.

[00:09:51] Jeremy: and I wonder if you've worked with any other frameworks or any other languages, and if you found that the approaches and those frameworks or languages is similar to rails, like where it creates these, the transaction for you, does the rollback for you and all of that.

[00:10:08] Jason: Good question. I have to plead ignorance. I've dabbled a little bit in testing and some other languages or frameworks, but not enough to be able to say anything smart about it.

[00:10:22] Jeremy: Yeah, I mean in my experience and of course there are many different frameworks that I'm not familiar with, but in a lot of cases, I I've seen that they don't have this kind of behavior built in, like, they'll provide you a way to test your application, but it's up to you if you want to write code that will wrap everything in a transaction or create a new database instance per test, things like that.

That's all left up to you. so I, I think it's interesting that that rails makes that decision for you and makes it to where you don't really have to think about that or make that decision. And for me personally, I found that really helpful.

[00:11:09] Jason: Yeah, it's really nice. It's a decision that not everybody is going to be on board with. And by that decision, I mean the general decision of rails to make a lot of decisions for you. And it may not be the case that I agree with every single decision that rails has made, but I do appreciate that that the rails team or DHH, or whoever has decided that rails is just going to have all these sensible defaults.

And that's what you get. And if you want to go tweak that stuff, I guess you can, but you get all this stuff this way. Cause we decided what we think is the best way to do it. And that is how most people use their, their rails apps. I think it's great. It eliminates a lot of overhead and then. Use some other technologies, I've done some JavaScript stuff and it's just astonishing how much boiler plate and how many, how much energy I have to expend on decisions that don't really matter.

And maybe frankly, decisions that I'm not all that equipped to make, because I don't have the requisite knowledge to be able to make those decisions. And usually I'd rather just have somebody else make those decisions for me.

[00:12:27] Jeremy: we've been talking about the more high level tests, the acceptance tests, the integration tests. And when you're choosing on how to test something, how do you decide whether it should be tested that, that level, or if it should be more of a unit level tests, something, something smaller

[00:12:49] Jason: Good question. So I want to zoom out just a little bit in order to answer that question and come at it from a distance. So I recently conducted some interviews for a programmer job. I interviewed about 25 candidates, most of those candidates. Okay. And the first step of the interview was this technical coding exercise. most of the candidates did not pass. And maybe, I don't know. Five or six or seven of the candidates out of those 25 did pass. I thought it was really interesting. The ones who failed all failed in the same way and the ones who passed all passed in the same way. And I thought about what exactly is the difference.

And the difference was that the programmers who passed, they coded in feedback loops. So I'll say that a different way, the ones who failed, they tried to write their whole program at once and they would spend 15, 20 minutes carefully writing the program. And then at the end of that 20 minutes, they would try to run it.

And unsurprisingly to me the program would fail like on line 2 of 30, because nobody's smart enough to write that much code and have the whole thing work. And then the ones who did well. They would write maybe one line of code, run it, observe what happens, compare what they observed to what they expected to see, and if any corrections were needed, they made those corrections and ran it again.

And then only once their expectations were satisfied, did they go and write a second line and they would re repeat that process again, that workflow of programming and feedback loops I think is super important. And I think it's what distinguishes, Hmm. I don't exactly want to say successful programmers from unsuccessful programmers, but there's certainly a lot to do with speed.

like, think about how much slower it is to try to write your whole program, run it and see that it fails. And then try to find the needle in the haystack. It's like, okay, I just wrote 30 lines. There's a problem somewhere. I don't know where, and now I have to dig through and find it It's so much harder than if you just write one line and you see a problem and you know, that, that problem lines in that line, you just wrote.

So I say all that, because testing is just feedback loops automated. So rather than writing a line and then manually running your program and using your own judgment to compare what you observed to what you expected to see you write a test that exercises your code and says, I expect to see this when this happens.

And so the kind of test you write now to answer your question will depend first on the nature of the thing you're writing. But for like, if we take kind of the like typical case of, let's say I'm building a form that will allow me to create a customer in a system. And I put in the first name, last name and email address of the customer. that's a really basic like crud functionality thing. There's not a lot of complexity there. And so I am, to be honest, I might just not write a test at all and we can get into how I decide when to write a test and when not to, but I probably would write a test. And if I did, I would write a system spec to use the rails are spec terminology that spins up a browser.

I would fill in the first name field with a first name, fill in the last name field with the last name, email, with email click, the submit button. And then I would assert that on the subsequent page, I see some indicator of success. And then if we think about something that. Maybe more involved, like I'm thinking about some of the complicated stuff I've been working on recently regarding um, coming up with a patient's balance in the medical system that I work on.

That's a case where I'm not going to spin up a browser to check the correctness of a number. Cause that feels like a mismatch. I'm going to work at a lower level and maybe create some database records and say, when I, when I created this charge and when I create this payment, I expect the remaining balance to be such and such.

So the type of test I write depends highly on the kind of functionality.

[00:17:36] Jeremy: So it sounds like in the case of something that's more straight forward, you might write a high level test, I guess, where you were saying I just click this button and I see if the thing I expected to be created is there on the next page. And you might create that test from the start and then just start filling in the code and continually running that test you know, until it passes.

But you also mentioned that in the case of something simple like that, you might actually. Choose to forego the tests and just take a look you know, visually you open the app and you click that same button and you can see the same result. So I wonder if you could talk a little bit more about how you decide, like, yeah, I'm going to write this test or no, I'm just going to inspect a visually

[00:18:28] Jason: Yeah. So real quick before I answer that, I want to say that it's, it's not one of the tests is straightforward or the feature is straightforward that determines which kind of test I write, because sometimes the acceptance test that I write, which spins up a browser and everything. Sometimes that might be quite an involved test and in complicated feature, or sometimes I might write a lower level test and it's a trivially simple one.

It has more to do with um, What's, what's the thing that I care about. Like, is it primarily like a UI based feature that, that is like the meat of it? Or is it like a, a lower level, like calculation type thing or something like that? That's kind of what determines which kind of right. But you asked when would I decide not to write a test.

So the reason I write tests is because it's just like cost prohibitive to manually perform testing, not just in monetary terms, but like in emotional pain and mental energy and stuff like that. I don't want to go back and manually test everything to make sure that it's still working. And so the ROI on writing automated tests is almost always positive, but sometimes it's not a positive ROI.

And so when I don't write it down, It's if these conditions are true, if the cost of that feature braking is extremely low. And if the I'll put that if, if the consequences of the feature breaking are really small and the frequency of the usage is low and the cost of writing the test is high, then I probably won't write a test.

For example, if there's some report that somebody looks at once every six months and it's like some like maybe a front desk person who uses the feature and if it doesn't work, then it means they have to instead go get the answer manually. And instead of getting the answer in 30 seconds, it takes them five.

Extremely low cost to the failure. And it's like, okay, so I'm costing somebody, maybe 20 bucks once every six months, if this feature breaks. And let's say this test is one that would take like an hour for me to write. Clearly it's better just to accept the risk of that feature breaking once in a while, which it's probably not going to anyway. So those are the questions I ask when I decide and, and to, to be clear, it's not like I run through all those questions for every single test I write in the vast, vast majority of cases. I just write the test because it's a no-brainer that it's, that it's better to write the test, but sometimes my instincts tell me like, Hey, is this really actually important to write a test for?

And when I find myself asking that, then I say, okay, what's the consequences of the breakage? How hard is this test to write all that.

[00:21:46] Jeremy: So you talked about the consequences being low, but you also talked about maybe the time to write the test being high. What are the types of tasks that, that take a long time to write?

[00:21:58] Jason: Usually ones that involve a lot of setup. So pretty much every test requires some data to be in place data, either meaning database, data, or like some object structure or something like that. Sometimes it's really easy sometimes to set up is extremely complicated. and that's usually where the cost comes in.

And then sometimes, sometimes you encounter like a technical challenge, like, oh, how do I like download this file? And then like inspect the contents of this file. Like sometimes you just encounter something that's like technically tricky to achieve. But more frequently when a test is hard to write it's because of the setup is hard.

[00:22:49] Jeremy: and you're talking about set up being, you need to insert a whole bunch of different rows into your database or different things that interact with one, another things like that.

[00:23:02] Jason: Exactly.

[00:23:03] Jeremy: when you're testing a system and you create a database that has all these items in it for you to work with, I'm assuming that what's in your test database is much smaller than what's in the real database. So how do you get something that's representative so that if you only have 10 things in your tasks, but in production, there's thousands of them that you can catch that, Hey, this isn't going to work well, once it gets to production,

[00:23:35] Jason: Yeah. that's a really interesting question. And the answers that I don't like, I usually don't try to make the test beta test database representative of the production database in terms of scale, obviously like the right data has to be there in order to exercise the test that it has to be true. But I don't, for example, in production at this moment I know there's some tens of thousands of appointments in the database, but locally at any given time, there are between zero and three or, or So appointments in any particular test, that's obviously nowhere near realistic, but it's only becomes relevant in a great, great minority of cases with, with regard to that stuff, the way I approach that is rather to So I'm thinking about some of those through the, for the first time right now, but obviously with performance in general premature optimization is usually not a profitable endeavor. And so I'll write features without any thought toward performance. And then once things are out there and perform it in production observe the bottlenecks and then fix the bottlenecks, starting with what's the highest ROI.

And usually tests haven't come into the picture for me. It's cause like, okay. The reason for tests again is, so you don't have to go back and do that manual testing, but with these performance improvements, instead of tests, we have like application performance monitoring tools, and that's what tells me whether something needs an issue or people just say like, Hey, this certain page is slow or whatever.

And so tests would be like redundant to those other measures that we have that tell us if there's performance.

[00:25:38] Jeremy: Yeah. So that sorta touches on what you described before, where let's say you were writing some kind of report or adding a report and when you were testing it locally, it worked great generated the report. Uh, Then you pushed it out to production. Somebody went to run it and maybe because of an indexing problem or some other issue It times out, or it doesn't complete takes a long time, but I guess what you're saying is in a lot of cases, the, the consequences of that are not all that high.

Like the person will try it. They'll see like, Hey, it doesn't work. Either you'll get a notification or they'll let you know, and then that's when you go in and go like, okay, now, now we can fix this.

[00:26:30] Jason: Yeah. And I think like the distinction is the performance aspect of it. Because like with a lot of stuff, you know, if you don't have any tests in your application at all, there's a high potential for like silent failure. And so with the performance stuff, we have other ways of ensuring that there won't be silent failure.

So that's how I think about that particular.

[00:26:56] Jeremy: I guess another thing about tests is when you build an application, a lot of times you're not just interacting with your own database, you're interacting with third-party APIs. You may even be connecting to different pieces of hardware, things like that. So when you're writing a test, how do you choose to approach that?

[00:27:23] Jason: yeah, good question. This is an area where I don't have a lot of personal experience, but I do have some there's another principle in testing that is part of the determinism principle where you don't want to involve external HTTP requests and stuff like that in your tests. Because imagine if I run my test today, And it passes, but then I run my test tomorrow and this third-party API is down and my test fails the behavior of my program didn't change. The only thing that's different is this external API is down right now. And so what I do for, for those is I'll capture the response that I get from the API. And I'll usually somehow um, get my hands on a success response and a failure response and whatever kind of response I want to account for.

And then I'll insert those captured responses into my tests. So that then on every subsequent run, I can be using these canned values rather than hitting the real API.

[00:28:37] Jeremy: I think in your um, the description of your book, you mentioned a section on, on stubs and mocks, and I wonder what you're describing here, which of those two things, is it? And what's the difference?

[00:28:53] Jason: Yeah. it's such a tricky concept And I don't even trust myself to say it right every time that I want to remind myself of the difference between mocks and stubs. I have to go back to my own blog posts that I wrote on it and remind myself, okay, what is the difference between a mock and a stub? And I'll just say, I don't remember.

Because this isn't something that I find myself dealing with very frequently. It's something that people always want to know about at least in the rails world. But I'll speak for myself at least. I don't find myself having to use or wanting to use mocks and stubs very much.

I will say that both mocks and stubs are a form of a testable. So a mock is a testable and a stub is a testable and a testable. It's like a play on stunt double instead of using a real object or whatever it is, you have this fake object. And sometimes that can be used to like trick your program into behaving a certain way or it can be used to um, gain visibility into an area that you otherwise wouldn't have visibility into.

And kind of my main use case for mocks and stubs when I do use them, is that when you're testing a particular thing, You want to test the thing you're interested in testing. You don't want to have to involve all the dependencies of the thing you're testing. And so I will like stub out the dependencies.

So, okay. Here's an example. I have a rare usage of stubs in my, in my uh, test suite and dear listener. I'm going to use the word stub. Don't give too much credence to that. Maybe. I mean, mock, I don't remember. But anyway, I have this area where we determine a patient's eligibility to get a certain kind of medicine and there's a ton that goes into it and there's all these, like, there's, there's these four different, like coarse-grained determinations and they all have to be a yes in order for it to overall be a yes.

That they can get this medicine. It has to do with mostly insurance. And then each one of those four core course grain determinations has some number of fine grain determinations that determines whether it is a yes or a no. If I weren't using mocks and stubs in these tests, then in order to test one determination, I would have to set up the conditions.

This goes back to the setup, work stuff we talked about. I'd have to set up all the conditions for the medicine to be a yes. In addition to, to the thing I'm actually interested in. And so that's a waste because that stuff is all irrelevant to my current concern. Let me try to speak a little bit more concretely.

So let's say I have determinations ABC. When I'm interested in determination, a I don't want to have to do all the setup work for determinations, B, C, and D. And so what I'll do is I'll mock the determinations for B, C and D. And I'll say for B, just have the function returned true for C same thing, just return true for D return.

True. So it'd like short circuits, all that stuff and bypasses the actual logic that gives me the yes, no determination. And it just always gives me a yes. That way. There's no setup work for B, C, and D. And I can focus only on.

[00:32:48] Jeremy: And I think it may be hard to say in this example, but would you, would you still have at least one test that would run through and do all the setup, do the checks for ABC and D and then when you're doing more specific things start to put in doubles for the others, or would you actually just never have a full test that actually did the complete setup?

[00:33:14] Jason: well, here's how I'm doing this one. I described the scenario where I'm like thoroughly testing a under many different conditions, but stubbing out B, C and D. They don't have another set of tests where I thoroughly test B and stub out a C and D. And so on. I have one thorough set for, for each of those. If you're asking whether I have one that like exercises, all four of them, No.

I just have ones for each of the four individually, which is maybe kind of a trade off. Cause it's arguable that I don't have complete confidence because I'm never testing the four together. But in the like trade off of like setup?

work and all that, that's necessary to get that complete con confidence and the value of that, like additional, because really it's just like a tiny bit of additional con confidence that I would get from testing all those things together.

In that particular case, my judgment was that that was not worth

[00:34:19] Jeremy: yeah. Cause I was thinking from their perspective of sometimes I hear that people will have a acceptance test that covers sometimes you hear people call it the happy path, right. Where they everything lines up. It's like a very straightforward case of a feature. But then all the different ways that you can test that feature, they don't necessarily write tests for those, but they just write one for the, the base case.

And then, like you said, you actually drill down into more specifics and maybe only test a, a smaller part there, but it sounds like in this case, maybe you made the decision that, Hey, doing a test, that's going to test all four of these things, even in the simplest case is going to involve so much setup and so much work that, that maybe it's not, not worth it in this case.

[00:35:13] Jason: Yeah. And I'd have to go back and refresh my memory as to like what exactly this scenario is for those tasks. Because in general, I'm a proponent of having integration tests that makes sure multiple things work together. Okay. You might've seen that Gif where it says like um, two unit tests, zero integration tests, and there's like a cabinet with two doors.

Each door can open on its own or, or maybe it's drawers. Each drawer can open on its own, but you can't open both drawers at the same time. And so I think that's not smart to have only unit tests and no integration tests. And so I don't remember exactly why I chose to do that eligibility test with the ABC and D the way I did.

Maybe it was just cost-prohibitive to do it altogether. Um, One thing that I want to want to comment on regarding mocks and stubs, there's a mistake that's made kind of frequently where people overdo it with mocks and stuff. They misunderstand the purpose. The purpose again is that you want to test the thing you're testing, not the dependencies of the thing.

But sometimes people step out the very thing they're testing. And so they'll like assert that a certain method will return such and such value, but they'll stub the method they're testing so that the method is guaranteed to return the same value and that doesn't actually test anything. So I just wanted to make, mention that as a common mistake to avoid

[00:36:47] Jeremy: I wonder if you could maybe give an example of when you, you have a certain feature and the thought process you're going through where you decide like, yes, this is the part that I should have a stub or a mock for. And this is the part where I definitely need to make sure I run the code.

[00:37:07] Jason: Well, again, it's very rare that I will use a mocker stub and it's not common that I'll even consider it for better or worse. Like we're talking about. The nature of rails tests is that we spin up actual database records and, and test our models with database data and stuff like that. In other ecosystems, maybe the testing culture is different and there's more mocks and stubs.

I know when I was doing some coding with angular, there was a lot more mocking and stubbing. But with rails, it's kind of like everything's available all the time and we use the database a lot during testing. And so mocks and stubs don't really come into the picture too much.

[00:37:56] Jeremy: Yeah. It's, it's interesting that you, you mentioned that because like I work with some projects that use C-sharp and asp.net, and you'll a lot of times you'll see people say like you should not be talking to the database in your tests. And you know, they go through all this work to probably the equivalent of a mock or a stub.

But then, you know, when I, when I think about that, then I go like, well, but I'm not really testing how the database is going to react. You know, are my, are my queries actually valid. Things like that, because all this stuff is, is just not being run. in some other communities, maybe they're they have different ideas, I guess, about, about how to run tests.

[00:38:44] Jason: Yeah, And it's always interesting to hear expressions. Like you should do this or you shouldn't do that, or it's good to do this. It's bad to do that. And I think maybe that's not quite the right way to think about it. It's more like, well, if I do this, what are the costs and benefits of doing this? Cause it's like, nothing exactly is a good thing to do or a bad thing to do.

It's just, if you do this, this will happen as a consequence. And if you don't this won't and all that stuff. So people who don't want to talk to the database in their tests, why is that? What, what are the bad things they think will happen if you do that? The drawbacks is it appears to me are it's slow to use the database in any performance problem.

Usually the culprit is the database. That's always the first thing I look at. And if you're involving the database and all of your tests, your tests are going to be much slower than if you don't use the database, but the costs of not talking to the database are exactly what you said, where you're like, you're not exercising your real application, you're missing an entire layer and maybe that's fine.

I've never tried approaching testing in that way. And I would love to like, get some experience like working with some people who do it that way. Cause I can't say that I know for an absolute fact that that doesn't work out. But to me it just makes sense to exercise everything that you're actually using when the app runs.

[00:40:18] Jeremy: what's challenging probably for a lot of people is that if you look online for how to do testing in lots of different frameworks, you'll get different answers. Right. And it's not clear what's gonna fit your situation right? And you know, to, to give an example of, we've been talking about how rails will it, it predominantly focuses on tests that, that talks to the database and it wraps everything in a transaction as we talked about before, so that you can reset the state and things like that.

I've also seen in other frameworks where they'll say like, oh, you can run a database test, but you use this in-memory version of the database instead of actually talking to a real MySQL or Postgres instance, or they'll say, oh, for this test we're going to use SQLite in place of the Postgres database you're actually using in production.

And it, it makes the, the setup, I suppose, easier. Um, And maybe it makes the tests run quicker, but then it's also no longer the same as what you're really running. So there's like a lot of different approaches that, that people describe and take. And I think it can be hard for, for people to know, like what, what makes sense for me.

[00:41:42] Jason: Yeah. And this is another area where I have to plead ignorance because again, I don't have experience doing it the other way. Logically, I feel like my way makes sense, but I don't have empirical experience doing it the other way.

[00:41:57] Jeremy: we've talked a little bit about how there's cases where you'll say I'm not going to do this thing because it's going to take a lot of time and I've weighed the benefits. And I wonder if you could give some examples of things where you spent a lot of time on something, and then in hindsight, you, you realize like this really wasn't worth it.

[00:42:18] Jason: I don't think I have any examples of that because I don't think it tends to happen very much. I really can't emphasize enough how old, the case where I choose not to write a test for something is like a one in 5,000 kind of thing. It's really not something I do frequently. The mistake is overwhelmingly in the opposite direction.

Like somebody may, maybe I will get lazy and I'll skip a test and then I'll realize, oh yeah, This is why I write tests because it actually makes everything easier. And uh, we get pain as as a consequence when we skip tests. So that's usually the mistake I make is not writing a test when I should, rather than writing a test when I should not have

[00:43:08] Jeremy: So since then, in general, you, you said that not writing it is, is the, the mistake. How do you get people in the habit of. Of writing the tests where they feel like it's not this thing that's slowing them down or is in the way, but is rather something that's helping them with that feedback loop and is something that they actively want to do.

[00:43:33] Jason: Yeah. So to me, it's all about a mindset. So there's a common perception that tests are something extra. Like I've heard stories about, like somebody gives a quote for a project and then the prospective client asks like, well, how much, if we skip tests, how much less would that be? And it's like, oh, it wouldn't be less.

It'd be like five times more because tests are a time saver. So I want to try to dispel with that notion. But even so it can be hard to bring oneself, to write task because it feels like something that takes discipline. But in my case, I don't feel like it takes discipline. Because I remind myself of a true fact that it's actually the lazy and easy way to code is to code using tests.

And it's the harder, more laborious way to write code. Not using tests because think about what's, what's the alternative to not writing tests. Like we said earlier, the alternative is to manually test everything. And that's just so painful, especially when it's some feature where like, I'm sure you have experience with this, Jeremy, you, you make a code change.

And then in order to verify that the thing still works, you have to go through like nine different steps in the browser. And only on that last step, do you get that answer you're after. That's just so painful. And if you write a test, you can automate that. Some things that might present friction in that process, or just like a lack of familiarity with how to write tests and maybe a um, a lack of an easy process for writing tests.

And just to briefly touch on that, I think something that can help reduce that. Is to write tests in the same way that I write code in feedback loops. So we talked about writing one line, checking, writing, another line, checking that kind of thing. I write my tests in the same way. First I'll write the shell of the test and then I'll run just the shell, even though it seems kind of dumb to just run the shell cause you know, it doesn't do anything. I do that just to demonstrate to myself that I didn't like make some typo or something like that. I'm starting from like a clean baseline. And then I'll write one line of my test. Maybe if I'm writing a system spec, I'll write a line that creates a user of rum that I know that nothing's going to happen when I run the test, but I'll run it just to see it run and make sure there's no errors.

And then I'll add a line that says, log the user in and then I'll run that. And so on just one line at a time. There's this principle that I think is really useful when working, which is to separate the deciding what to do from the actually doing it. I think a lot of developers mixed those two jobs of deciding what to do and doing it in the same step.

But if, if you separate those, so you'd like, decide what you're going to have your tests do. And then after that, so like maybe I'll open my test and I'll write in comments what I want to achieve, not in technical terms necessarily, but I'll just write a comment that says, create a user, right? Another comment that says, log in another comment that says, click on such and such.

And then once I have those, there, I'll go back to that first line and convert that to code. Okay. My comment that says, create a user, I'll change that to the syntax that actually creates a user and again, using the feedback loop. So I'll run that so that I can, you know, once I'm, once I'm done writing all those comments that say what the test does, I'm now free to forget about it.

And I don't have to hold that in my mental Ram anymore. And I can clear my mental RAM. Now all my mental RAM is available to bring, to bear on the task of converting my steps that I already decided into working syntax. If you try to do both those things at the same time, it's more than twice as hard. And so that's why I try to separate.

[00:48:04] Jeremy: So that's interesting. So it's like you're designing, I guess, the feature, what you want to build in the context of the test first it's would that be accurate?

[00:48:19] Jason: that certainly can be the case. So much of this is context dependent. I very regularly give my self permission to be undisciplined and to go on exploratory spikes. And so if I have like very, if I have a really vague idea about what shape a feature is going to take, I give myself permission to forget about tests and I just write some code and I feel cause there's two reasons to write code.

You know, a code is not only a work product code is also a thinking. so I would let go into a different mode, I'll say, okay, I'm not trying to create a work product right now. I'm just using code as a thinking medium, to figure out what I'm even going to do. So that's what I'll do in that case. And then maybe I'll write the test afterward, but if it's very clear, what the thing is that I'm going to write, then I'll often write the test first again, in those two phases of deciding what it's going to be and the deciding how it works.

And I won't do a thing where, where, like I write 10 test cases and then I go through one by one and write code to make them pass. Usually I'll write one test, make a pass, write a second test, make it pass and so on.

[00:49:38] Jeremy: okay. So the more exploratory aspect, I guess, would be when. You're either doing something that you haven't done before, or it's not clear to you what the features should be is, is that right?

[00:49:58] Jason: Yeah, like maybe it's a feature that involves a lot of details. There's like a lot of room for discretion. It could be implemented in more than one way. Like how would I write a test for that? If I don't even know what form it's going to take? Like there's decisions to be made, like, what is the, the route going to be that I visit for this feature?

What am I even going to call like this entity and that entity and stuff like that. And I think that goes back to my desire to not juggle and manage. Multiple jobs at the same time. I don't want to, I don't want to overly mix the design job with the testing job. Cause testing can help with design, but design in like a code structure sense.

I usually don't want to mix testing with like UI design and not even UI design, like, like design in the highest sense. Meaning like what even is this thing? How does it work? Big picture wise and stuff like that. That's not the kind of design that testing helps with in my mind of the kind of design that testing helps with again, is the code structure.

So I want to have my big picture design out of the way before I start writing my test.

[00:51:21] Jeremy: and in terms of the big picture design, is that something that you keep all in your head or are you writing that down somewhere? I'm just wondering what your process is.

[00:51:34] Jason: Yeah, it can work a number of different ways in the past. I've done usability testing where I will do some uh, pen and paper prototypes and then do some usability testing with, with users. And then I will um, convert those pen and paper prototypes to something on the computer. The idea being pen and paper prototypes are the cheapest to create and change.

And then the more you cement it, the more expensive it gets to change. So only once I'm fairly certain that the pen and paper prototypes are right. Will I put it into something that's more of a formal mock. And then once I have my formal mock-up and that's been through whatever scrutiny I want to put it through, then I will do the even more expensive step of implementing that as a working feature.

Now having said all that, I very rarely do I go through all that ceremony. Sometimes a feature, usually a feature is sufficiently small, that all that stuff would be silly to do. So sometimes I'll start straight with the the mock-up on the computer and then I'll work off of that. Sometimes it's small enough that I'll just make a few notes in a note-taking program and then work off of that.

What is usually true is that our tickets in our ticketing system have a bulleted list of acceptance criteria. So we want to make it very black and white. Very yes, no. Whether a particular thing is done and that's super helpful because again, it goes back to the mixing of jobs and separating of jobs.

If we've decided in advance that this feature needs to do these four things. And if it does those four things it's done and it doesn't need to do anything more and if it doesn't meet those four criteria, then it's not done then building the thing is just a matter of following the instructions. Very little thinking is involved.

[00:53:45] Jeremy: depending on the scope of the feature, depending on how much information you have uh, you could either do something elaborate, I suppose, where, you know, you were talking about doing prototypes or sketches and, and so on before you even look at code or there could be something that's not quite that complicated where you have an idea of what it is and you might even play with code a little bit to get a sense of where it should go and how it should work.

But it's all sort of in service of getting to the point where you know enough about how you're going to do the implementation and you know enough about what the actual feature is to where you're comfortable starting to write steps in the test about like, these are the things that are going to happen.

[00:54:35] Jason: Yeah. And another key thing that might not be obvious is that all these things are small. So I never work well, I shouldn't say never, but in general, I, don't work in a feature. That's going to be like a week long feature or something like that. We try to break them down into features that are at most like half.

And so that makes all that stuff a lot easier. Like I use the number four as an example of how many acceptance criteria there might be. And that's a pretty representative example. We don't have tickets where there's 16 acceptance criteria because the bigger something is the more opportunity there is for the conceive design to turn out, not to be viable.

And the more decisions that can't be made, because you don't know the later step until the earlier decision is made and all that kind of stuff. So the small size of everything helps a lot.

[00:55:36] Jeremy: but I, I would imagine if you're breaking things into that small of a piece, then would there be parts that. You build and you tasked and you deploy, but to the user, they actually don't see anything. Is that the appraoch?

[00:55:52] Jason: definitely, we use feature flags. Like for example, there's this feature we're working on right now, where we have a page where you can see a long list of items. The items are of several different types right now. You just see all of them all the time, but depending on who you are and what your role is in the organization, you're not going to be interested in all those things.

And so we want people to be able to have check boxes for each of those types to show or hide those things. Whereas checkbox feature is actually really big and difficult to add. And so the first thing that I chose to do was to have us add just one single check box for one type. And even that one, single checkbox is sufficiently hard that we're not even giving people that yet.

We coded it so that you get the check boxes and that one checkbox is selected by default. When you uncheck it, the thing goes away, but it's selected by default so that we can feature flag that. So the checkbox UI is hidden. Everything looks just the way it did before. And now we can wait until this feature is totally done before we actually surface it to users.

So it's the idea of making a distinction between deployment and release. Cause if we try to do this whole big thing, it's, it's gonna take weeks. If we try to do the whole thing, that's just too much risk for something to go wrong. And then like, we're going to deploy like three weeks of work at once.

That's like asking for trouble. So I'm a huge fan of feature flags.

[00:57:35] Jeremy: Interesting. So it's like the, it's almost like the foundation of the feature is going in. And if you were to show it to the user well, I guess in this case, it actually did have a function right at you. You could filter by that one category.

[00:57:52] Jason: oh, I was just going to say you're exactly right. It wouldn't be a particularly impressive or useful feature, but what we have is complete it's it's not finished, but it is complete.

[00:58:06] Jeremy: I'm not sure if you have any examples of this, but I imagine that there are changes that are large enough that I'm not sure how you would split it up until you, you mentioned like half a days worth of time. And I, I wonder if either have examples of features like that or a general sense of how, what do you do if you, you can't figure out a way to split it up that small.

[00:58:34] Jason: I have yet to encounter a feature that we haven't been able to break up into pieces that are that small. So, unfortunately, I can't really say anything more than that because I just don't have any examples of exceptions

[00:58:49] Jeremy: For, for people listening, maybe that should be a goal at least like, see if you can make everything smaller, see if you can ship as little as possible, you know, maybe you don't hit that half a day mark, but at least give it a, give it a try and see what you can do.

[00:59:10] Jason: yeah. And the way I care would characterize it, maybe wouldn't be to ship as little as possible at a time, but to give a certain limit that you try not to go over. And it's, it's a skill that I think can be improved with practice. You learn certain techniques that you can use over and over. Like for example, one way that I split things up sometimes is we will add the database tables in one chunk. And we'll just deploy that, cause that presents a certain amount of risk, you know, when you're adding database tables or columns or anything like that, like it's always risky when you're messing with the structure of the database. So I like to do just that by itself. And it's kind of tidy most of the time because because it's not something that's like naturally visible to the user is just a structural change.

So that's an example of the kind of thing that you learn as you gain practice, breaking bigger things up into smaller pieces.

[01:00:16] Jeremy: so, and, and that example, in terms of whatever issue tracking system you use, what, what would you call that? Would you just call that setting up schema for X future features, or I'm just kinda curious how you characterize that.

[01:00:35] Jason: yeah, something like that. Those particular tickets don't have great names because ideally each ticket has some amount of value that's visible to the user and that one totally doesn't, it's a purely nuts and bolts kind of thing. So that's just a case where the name's not going to be great, but what's the alternative can't think of anything better. So we do it like that.

[01:01:02] Jeremy: you feel like that's, that's lower risk shipping something that's not user-facing first. Then it is to wait until you have at least like one small thing that, you know, is connected to that change.

[01:01:19] Jason: Yeah. I had a boss in the past who had a certain conception of the reason to do deployments. And, and her belief was that the reason that you deploy is to deliver value to the user which is of course true, but there's another really good reason to deploy, which is to mitigate risk. The further production and development are able to diverge from one another, the greater, the risk.

When you do a deployment. I remember one particular time at that job, I was made to deploy like three months of work at once and it was a disaster and I got the blame because I was the one who did the work. And quite frankly, I was really resentful that that had. And that's part of what informs my preference for deploying small amounts of work at a time.

I think it's best if things can be deployed serially, like rather than deploying in patches, just finish one thing, deploy it, verify it, finish the next thing, deploy it, verify it. I have the saying that it's better to be a hundred percent done with half your work than halfway done with a hundred percent of your work. For, for the hopefully obvious reason that like, if, if you have 15 things that are each halfway in progress, now you have to juggle 15 balls in your head. Whereas, if you have 15 things you have to do, and then you finish seven of them, then you can completely forget about those seven things that you finished and deployed and verified and all that.

And your mental bandwidth is freed up just to focus on the remaining work.

[01:03:10] Jeremy: yeah, that, that makes sense. And, and also if you are putting things out bit by bit, And something goes wrong, then at least it's not all 15 things you have to figure out, which was it. It's just the last thing he pushed out.

[01:03:26] Jason: Exactly. Yeah. It's never fun when you deploy a big delta and something goes wrong and it's a mystery. What introduced the problem? It's obviously never good if you deploy something that turns out to be a problem, but if you deployed just one thing and something goes wrong, at least you can. Roll it back or at the very least have a pretty decent idea of where the problem lies. So you can address it quickly.

[01:03:56] Jeremy: for sure. Well I think that's probably a good place to leave it off on, but is there anything else about testing or just software in general that you, you thought we should've brought up?

[01:04:09] Jason: Well, maybe if I can leave the listener with one thing um, I want to emphasize the importance of programming and feedback loops. It was a real eye-opener for me when I was interviewing these candidates to notice the distinct difference between programmers, who didn't program and feedback loops and programmers, who do I have a post about it?

I'm just, it's just called how to program and feedback loops. I believe if anybody's interested in the details. Cause I have like. It's like seven steps to that feedback loop. First, you write a line of code, then you do this. I don't remember all seven steps off the top of my head, but it's all there in the blog post.

Anyway, if I could give just one piece of advice to anybody who's getting into programming, it's a program in feedback loops.

[01:05:00] Jeremy: yeah, I think that's been the, the common thread, I suppose, throughout this conversation is that whether it's. Writing the features you want them to be as small as possible. So you get that feedback of it being done. And like you said, taking it off of your plate. Then there's the being able to have the tests there as you write the features so that you get that immediate feedback, that this is not doing what the test says it should be doing.

So yeah, it makes it, it makes a lot of sense that basically in everything we do try to get to a point where we get a thumbs up, we get at, this is complete. The faster we can do that, the better we'll we'll all be off. Right.

[01:05:46] Jason: exactly. Exactly.

[01:05:50] Jeremy: if people want to check out your book, check out your podcast, I think you even have a, a conference coming up, right? Uh, where, w where can they learn about that.

[01:06:02] Jason: So the hub for everything is code with jason.com. So that's where I always. Send people, you can find my blog, my podcast, my book there. And yeah, my conference it's called sin city ruby. It's a Ruby conference. This will only be applicable dear listener, if you're listening before March 24th, 2022. But yeah, it's, it's happening in Las Vegas.

It's going to be just a small intimate conference and it's a whole different story, but I kind of put on this conference accidentally. I didn't intend to do a conference. I just kind of uh, stumbled into it, but I think it will be a lot of fun. But yeah, that's, that's another thing that I have going on.

[01:06:49] Jeremy: What, what was it that I guess. Got you into deciding this is, this is what I want to do. I want to make a conference.

[01:06:58] Jason: Well, it started off as I was going to put on a class, but then nobody bought a ticket. And so I had to pivot. And so I'm like, okay, I didn't sell any tickets to this class. Maybe I can sell some tickets to a conference. And luckily for me, it turns out I was right because I was financially obligated to a hotel where I had reserved space for the class.

So I couldn't just cancel it. I had to move forward somehow. So that's where the conference came.

[01:07:28] Jeremy: interesting. yeah, I'm, I'm always kind of curious. How people decide what they want to attend, I guess, like, you know, you said how you didn't get enough signups for your class, but you get signups for a conference. And you know, the people who are signing up and want to go, I wonder to to them, what is, what is it about the going to a conference that is so much more appealing than, than going to a class?

[01:07:54] Jason: Oh, well, I think in order to go to a class, the topic has to be of interest to you. You have to be in like a specific time and place. The price point for that kind of thing is usually much higher than for, for a conference. Whereas with a conference it's affordable to individuals, you don't have to get your boss's permission necessarily, at least not for the money. It's more of like a, you don't have to be a specific kind of person in a specific scenario in order to benefit from it. It's a much more general interest. So that's why I think I've had an easier time selling tickets to that.

[01:08:31] Jeremy: Mm, mm. Yeah, it's, it's more of a I wanna get into a room with a bunch of people and just learn a bunch of cool stuff and not necessarily have a specific specific thing you're looking to get out of it, I guess.

[01:08:46] Jason: Yeah. There's no specific outcome or anything like that. Honestly, it's mostly just to have a good time. That's the main thing I'm hoping to get out of it. And I think that is the main draw for people they want to, they want to see their friends in the Ruby community form relationships and stuff like that.

[01:09:07] Jeremy: Very cool. Jason good luck with the conference and thank you so much for coming on software software sessions.

[01:09:13] Jason: Thanks a lot. And uh, thanks for having me.

Taking Notes on Serverless

Wed, 10 Nov 2021 06:46:37 -0000

Swizec is the author of the Serverless Handbook and a software engineer at Tia.

Swizec

AWS

Lambda
API Gateway
Operating Lambda (The cold start problem)
Provisioned Concurrency
DynamoDB
Relational Database Service
Aurora
Simple Queue Service
CloudFormation
CloudWatch

Other serverless function hosting providers

Related topics

Transcript

You can help edit this transcript on GitHub.

[00:00:00] Jeremy: Today, I'm talking to Swiz Teller. He's a senior software engineer at Tia. The author of the serverless handbook and he's also got a bunch of other courses and I don't know is it thousands of blog posts now you have a lot of them.

[00:00:13] Swizec: It is actually thousands of, uh, it's like 1500. So I don't know if that's exactly thousands, but it's over a thousand.

I'm cheating a little bit. Cause I started in high school back when blogs were still considered social media and then I just kind of kept going on the same domain.

Do you have some kind of process where you're, you're always thinking of what to write next? Or are you writing things down while you're working at your job? Things like that. I'm just curious how you come up with that.

[00:00:41] Swizec: So I'm one of those people who likes to use writing as a way to process things and to learn. So one of the best ways I found to learn something new is to kind of learn it and then figure out how to explain it to other people and through explaining it, you really, you really spot, oh shit. I don't actually understand that part at all, because if I understood it, I would be able to explain it.

And it's also really good as a reference for later. So some, one of my favorite things to do is to spot a problem at work and be like, oh, Hey, this is similar to that side project. I did once for a weekend experiment I did, and I wrote about it so we can kind of crib off of my method and now use it. So we don't have to figure things out from scratch.

And part of it is like you said, that just always thinking about what I can write next. I like to keep a schedule. So I keep myself to posting two articles per week. It used to be every day, but I got too busy for that. when you have that schedule and, you know, okay on Tuesday morning, I'm going to sit down and I have an hour or two hours to write, whatever is on top of mind, you kind of start spotting more and more of these opportunities where it's like a coworker asked me something and I explained it in a slack thread and it, we had an hour. Maybe not an hour, but half an hour of back and forth. And you actually just wrote like three or 400 words to explain something. If you take those 400 words and just polish them up a little bit, or rephrase them a different way so that they're easier to understand for somebody who is not your coworker, Hey, that's a blog post and you can post it on your blog and it might help others.

[00:02:29] Jeremy: It sounds like taking the conversations most people have in their day to day. And writing that down in a more formal way.

[00:02:37] Swizec: Yeah. not even maybe in a more formal way, but more, more about in a way that a broader audience can appreciate. if it's, I'm super gnarly, detailed, deep in our infrastructure in our stack, I would have to explain so much of the stuff around it for anyone to even understand that it's useless, but you often get these nuggets where, oh, this is actually a really good insight that I can share with others and then others can learn from it. I can learn from it.

[00:03:09] Jeremy: What's the most accessible way or the way that I can share this information with the most people who don't have all this context that I have from working in this place.

[00:03:21] Swizec: Exactly. And then the power move, if you're a bit of an asshole is to, instead of answering your coworkers question is to think about the answer, write a blog post and then share the link with them.

I think that's pushing it a little bit.

[00:03:38] Jeremy: Yeah, It's like you're being helpful, but it also feels a little bit passive aggressive.

[00:03:44] Swizec: Exactly. Although that's a really good way to write documentation. One thing I've noticed at work is if people keep asking me the same questions, I try to stop writing my replies in slack and instead put it on confluence or whatever internal wiki that we have, and then share that link. and that has always been super appreciated by everyone.

[00:04:09] Jeremy: I think it's easy to, have that reply in slack and, and solve that problem right then. But when you're creating these Wiki pages or these documents, how're people generally finding these. Cause I know you can go through all this trouble to make this document. And then people just don't know to look or where to go.

[00:04:30] Swizec: Yeah. Discoverability is a really big problem, especially what happens with a lot of internal documentation is that it's kind of this wasteland of good ideas that doesn't get updated and nobody maintains. So people stop even looking at it. And then if you've stopped looking at it before, stop updating it, people stop contributing and it kind of just falls apart.

And the other problem that often happens is that you start writing this documentation in a vacuum. So there's no audience for it, so it's not help. So it's not helpful. That's why I like the slack first approach where you first answered the question is. And now, you know exactly what you're answering and exactly who the audiences.

And then you can even just copy paste from slack, put it in a conf in JIRA board or wherever you put these things. spice it up a little, maybe effect some punctuation. And then next time when somebody asks you the same question, you can be like, oh, Hey, I remember where that is. Go find the link and share it with them and kind of also trains people to start looking at the wiki.

I don't know, maybe it's just the way my brain works, but I'm really bad at remembering information, but I'm really good at remembering how to find it. Like my brain works like a huge reference network and it's very easy for me to remember, oh, I wrote that down and it's over there even if I don't remember the answer, I almost always remember where I wrote it down if I wrote it down, whereas in slack it just kind of gets lost.

[00:06:07] Jeremy: Do you also take more informal notes? Like, do you have notes locally? You look through or something? That's not a straight up Wiki.

[00:06:15] Swizec: I'm actually really bad at that. I, one of the things I do is that when I'm coding, I write down. so I have almost like an engineering log book where everything, I, almost everything I think about, uh, problems I'm working on. I'm always writing them down on by hand, on a piece of paper. And then I never look at those notes again.

And it's almost like it helps me think it helps me organize my thoughts.

And I find that I'm really bad at actually referencing my notes and reading them later because, and this again is probably a quirk of my brain, but I've always been like this. Once I write it down, I rarely have to look at it again.

But if I don't write it down, I immediately forget what it is.

What I do really like doing is writing down SOPs. So if I notice that I keep doing something repeatedly, I write a, uh, standard operating procedure. For my personal life and for work as well, I have a huge, oh, it's not that huge, but I have a repository of standard procedures where, okay, I need to do X.

So you pull up the right recipe and you just follow the recipe. And if you spot a bug in the recipe, you fix the recipe. And then once you have that polished, it's really easy to turn that into an automated process that can do it for you, or even outsource it to somebody else who can work. So we did, you don't have to keep doing the same stuff and figuring out, figuring it out from scratch every time.

[00:07:55] Jeremy: And these standard operating procedures, they sound a little bit like runbooks I guess.

[00:08:01] Swizec: Yep. Run books or I think in DevOps, I think the big red book or the red binder where you take it out and you're like, we're having this emergency, this alert is firing. Here are the next steps of what we have to check.

[00:08:15] Jeremy: So for those kinds of things, those are more for incidents and things like that. But in your case, it sounds like it's more, uh, I need to get started with the next JS project, or I need to set up a Postgres database things like that.

[00:08:30] Swizec: Yeah. Or I need to reset a user to initial states for testing or create a new user. That's sort of thing.

[00:08:39] Jeremy: These probably aren't in that handwritten log book.

[00:08:44] Swizec: The wiki. That's also really good way to share them with new engineers who are coming on to the team.

[00:08:50] Jeremy: Is it where you just basically dump them all on one page or is it where you, you organize them somehow so that people know that this is where, where they need to go.

[00:09:00] Swizec: I like to keep a pretty flat structure because, I think the, the idea of categorization outlived its prime. We have really good search algorithms now and really good fuzzy searching. So it's almost easier if everything is just dumped and it's designed to be easy to search. a really interesting anecdote from, I think they were they were professors at some school and they realized that they try to organize everything into four files and folders.

And they're trying to explain this to their younger students, people who are in their early twenties and the young students just couldn't understand. Why would you put anything in a folder? Like what is a folder? What is why? You just dump everything on your desktop and then command F and you find it. Why would you, why would you even worry about what the file name is? Where the file is? Like, who cares? It's there somewhere.

[00:09:58] Jeremy: Yeah, I think I saw the same article. I think it was on the verge, right?

I mean, I think that's that's right, because when you're using, say a Mac and you don't go look for the application or the document you want to run a lot of times you open up spotlight and just type it and it comes up.

Though, I think what's also sort of interesting is, uh, at least in the note taking space, there's a lot of people who like setting up things like tags and things like that. And in a way that feels a lot like folders, I guess

[00:10:35] Swizec: Yeah. The difference between tags and categories is that the same file can have multiple tags and it cannot be in multiple folders. So that's why categorization systems usually fall apart. You mentioned note taking systems and my opinion on those has always been that it's very easy to fall into the trap of feeling productive because you are working on your note or productivity system, but you're not actually achieving anything.

You're just creating work for work sake. I try to keep everything as simple as possible and kind of avoid the overhead.

[00:11:15] Jeremy: People can definitely spend hours upon hours curating what's my note taking system going to be, the same way that you can try to set up your blog for two weeks and not write any articles.

[00:11:31] Swizec: Yeah. exactly.

[00:11:32] Jeremy: When I take notes, a lot of times I'll just create a new note in apple notes or in a markdown file and I'll just write stuff, but it ends up being very similar to what you described with your, your log book in that, like, because it's, it's not really organized in any way. Um, it can be tricky to go back and actually, find useful information though, Though, I suppose the main difference though, is that when it is digital, uh, sometimes if I search for a specific, uh, software application or a specific tool, then at least I can find, um, those bits there

[00:12:12] Swizec: Yeah. That's true. the other approach I'd like to use is called the good shit stays. So if I can't remember it, it probably wasn't important enough. And you can, especially these days with the internet, when it comes to details and facts, you can always find them. I find that it's pretty easy to find facts as long as you can remember some sort of reference to it.

[00:12:38] Jeremy: You can find specific errors or like you say specific facts, but I think if you haven't been working with a specific technology or in a specific domain for a certain amount of time, you, it, it can be hard to, to find like the right thing to look for, or to even know if the solution you're looking at is, is the right one.

[00:13:07] Swizec: That is very true. Yeah. Yeah, I don't really have a solution for that one other than relearn it again. And it's usually faster the second time. But if you had notes, you would still have to reread the notes. Anyway, I guess that's a little faster, cause it's customized to you personally.

[00:13:26] Jeremy: Where it's helpful is that sometimes when you're looking online, you have to jump through a bunch of different sites to kind of get all the information together. And by that time you've, you've lost your flow a little bit, or you you've lost, kind of what you were working on, uh, to begin with. Yeah.

[00:13:45] Swizec: Yeah. That definitely happens.

[00:13:47] Jeremy: Next I'd like to talk about the serverless handbook. Something that you've talked about publicly a little bit is that when you try to work on something, you don't think it's a great idea to just go look at a bunch of blog posts. Um, you think it's better to, to go to a book or some kind of more, uh, I don't know what you would call it like larger or authoritative resource. And I wonder what the process was for, for you. Like when you decided I'm going to go learn how to do serverless you know, what was your process for doing that?

[00:14:23] Swizec: Yeah. When I started learning serverless, I noticed that maybe I just wasn't good at finding them. That's one thing I've noticed with Google is that when you're jumping into a new technical. It's often hard to find stuff because you don't really know what you're searching for. And Google also likes to tune the algorithms to you personally a little bit.

So it can be hard to find what you want if you are, if you haven't been in that space. So I couldn't really find a lot of good resources, uh, which resulted in me doing a lot of exploration, essentially from scratch or piecing together different blogs and scraps of information here and there. I know that I spend ridiculous amounts of time in even as deep as GitHub issues on closed issues that came up in Google and answer something or figure, or people were figuring out how something works and then kind of piecing all of that together and doing a lot of kind of manual banging my head against the wall until the wall broke.

And I got through. I decided after all of that, that I really liked serverless as a technology. And I really think it's the future of how backend systems are going to be built. I think it's unclear yet. What kind of systems is appropriate for and what kind of kind of systems it isn't.

It does have pros and cons. it does resolve a lot of the very annoying parts of building a modern website or building upon backend go away when you go serverless. So I figured I really liked this and I've learned a lot trying to piece it together over a couple of years.

And if combined, I felt like I was able to do that because I had previous experience with building full stack websites, building full stack apps and understanding how backends work in general. So it wasn't like, oh, How do I do this from scratch? It was more okay. I know how this is supposed to work in theory.

And I understand the principles. What are the new things that I have to add to that to figure out serverless? So I wrote the serverless handbook basically as a, as a reference or as a resource that I wish I had when I started learning this stuff. It gives you a lot of the background of just how backends work in general, how databases connect, what different databases are, how they're, how they work.

Then I talked some, some about distributed systems because that comes up surprisingly quickly when you're going with serverless approaches, because everything is a lot more distributed. And it talks about infrastructure as code because that kind of simplifies a lot of the, they have opposite parts of the process and then talks about how you can piece it together in the ends to get a full product. and I approached it from the perspective of, I didn't want to write a tutorial that teaches you how to do something specific from start to finish, because I personally don't find those to be super useful. Um, they're great for getting started. They're great for building stuff. If you're building something, that's exactly the same as the tutorial you found.

But they don't help you really understand how it works. It's kind of like if you just learn how to cook risotto, you know how to cook risotto, but nobody told you that, Hey, you actually, now that you know how to cook risotto, you also know how to just make rice and peas. It's pretty much the same process.

Uh, and if you don't have that understanding, it's very hard to then transition between technologies and it's hard to apply them to your specific situation. So I try to avoid that and write more from the perspective. How I can give somebody who knows JavaScript who's a front end engineer, or just a JavaScript developer, how I can give them enough to really understand how serverless and backends works and be able to apply those approaches to any project.

[00:18:29] Jeremy: When people hear serverless, a lot of times they're not really sure what that actually means. I think a lot of times people think about Lambdas, they think about functions as a service. but I wonder to you what does serverless mean?

[00:18:45] Swizec: It's not that there's no server, there's almost always some server somewhere. There has to be a machine that actually runs your code. The idea of serverless is that the machine and the system that handles that stuff is trans is invisible to you. You're offloading all of the dev ops work to somebody else so that you can full focus on the business problems that you're trying to solve.

You can focus on the stuff that is specific and unique to your situation because, you know, there's a million different ways to set up a server that runs on a machine somewhere and answers, a, API requests with adjacent. And some people have done that. Thousands of times, new people, new folks have probably never done it.

And honestly, it's really boring, very brittle and kind of annoying, frustrating work that I personally never liked. So with serverless, you can kind of hand that off to a whole team of engineers at AWS or at Google or, whatever other providers there are, and they can deal with that stuff. And you can, you can work on the level of, I have this JavaScript function.

I want this JavaScript function to run when somebody hits this URL and that's it. That's all, that's essentially all you have to think about. So that's what serverless means to me. It's essentially a cloud functions, I guess.

[00:20:12] Jeremy: I mean, there been services like Heroku, for example, that, that have let people make rails apps or Django apps and things like that, where the user doesn't really have to think about the operating system, um, or about creating databases and things like that. And I wonder, to you, if, if that is serverless or if that's something different and, and what the difference there might be.

[00:20:37] Swizec: I think of that as an intermediary step between on prem or handling your own servers and full serverless, because you still have to think about provisioning. You still have to think of your server as a whole blob or a whole glob of things that runs together and runs somewhere and lives or lifts somewhere.

You have to provision capacity. You have to still think about how many servers you have on Heroku. They're called dynos. you still have to deal with the routing. You have to deal with connecting it to the database. Uh, you always have to think about that a little bit, but you're, you're still dealing with a lot of the frameworky stuff where you have to, okay, I'm going to declare a route. And then once I've declared the route, I'm going to tell it how to take data from the, from the request, put it to the function. That's actually doing the work. And then you're still dealing with all of that. Whereas with full serverless, first of all, it can scale down to zero, which is really useful.

If you don't have a lot of traffic, you can have, you're not paying anything unless somebody is actually using your app. The other thing is that you don't deal with any of the routing or any of that. You're just saying, I want this URL to exist, and I want it to run that function, that you don't deal with anything more than that.

And then you just write, the actual function that's doing the work. So it ends up being as a normal jobs function that accepts a request as an argument and returns a JSON response, or even just a JSON object and the serverless machinery handles everything else, which I personally find a lot easier. And you don't have to have these, what I call JSON bureaucracy, where you're piping an object through a bunch of different functions to get from the request to the actual part that's doing the work. You're just doing the core interesting work.

[00:22:40] Jeremy: Sort of sounds like one of the big distinctions is with something like Heroku or something similar. You may not have a server, but you have the dyno, which is basically a server. You have something that is consistently running,

Whereas with what you consider to be serverless, it's, it's something that basically only launches on when it's invoked. Um, whether that's a API call or, or something else. The, the routing thing is a little bit interesting because the, when I was going through the course, there are still the routes that you write. It's just that you're telling, I guess the API gateway Amazon's API gateway, how to route to your functions, which was very similar to how to route to a controller action or something like that in other languages.

[00:23:37] Swizec: Yeah. I think that part is actually is pretty similar where, I think it kind of depends on what kind of framework you end up building. Yeah, it can be very simple. I know with rails, it's relatively simple to define a new route. I think you have to touch three or four different files. I've also worked in large express apps where.

Hooking up the controller with all of the swagger definitions or open API definitions, and everything else ends up being like six or seven different files that have to have functions that are named just right. And you have to copy paste it around. And I, I find that to be kind of a waste of effort, with the serverless framework.

What I like is you have this YAML file and you say, this route is handled by this function. And then the rest happens on its own with next JS or with Gatsby functions, Gatsby cloud functions. They've gone even a step further, which I really like. You have the slash API directory in your project and you just pop a file in there.

And whatever that file is named, that becomes your API route and you don't even have to configure anything. You're just, in both of them, if you put a JavaScript file in slash API called hello, That exports, a handler function that is automatically a route and everything else happens behind the scenes.

[00:25:05] Jeremy: So that that's more of a matter of the framework you're using and how easy does it make it to, to handle routing? Whether that's a pain or a not.

[00:25:15] Swizec: Yeah. and I think with the serverless frameworks, it's because serverless itself, as a concept makes it easier to set this up. We've been able to have these modern frameworks with really good developer experience Gatsby now with how did they have Gatsby cloud and NextJS with Vercel and I think Netlify is working on it as well.

They can have this really good integration between really tight coupling and tight integration between a web framework and the deployment environment, because serverless is enabling them to spin that up. So easily.

[00:25:53] Jeremy: One of the things about your courses, this isn't the only thing you focus on, but one of the use cases is basically replacing a traditional server rendered application or a traditional rails, django, spring application, where you've got Amazon's API gateway in front, which is serving as the load balancer.

And then you have your Lambda functions, which are basically what would be a controller action in a lot of frameworks. and then you're hooking it up to a database which could be Amazon. It could be any database, I suppose. And I wonder in your experience having worked with serverless at your job or in side projects, whether that's like something you would use as a default or whether serverless is more for background jobs and things like that.

[00:26:51] Swizec: I think the underlying hidden question you're asking is about cold starts and API, and the response times, is one of the concerns that people have with serverless is that if your app is not used a lot, your servers scale down to zero. So then when somebody new comes on, it can take a really long time to respond.

And they're going to bail and be upset with you. One way that I've solved, that is using kind of a more JAM Stacky approach. I feel like that buzzword is still kind of in flux, but the idea is that the actual app front-end app, the client app is running off of CDNs and doesn't even touch your servers.

So that first load is of the entire app and of the entire client system is really fast because it comes from a CDN that's running somewhere as close as possible to the user. And it's only the actual APIs are hitting your server. So in the, for example, if you have something like a blog, you can, most blogs are pretty static.

Most of the content is very static. I use that on my blog as well. you can pre-render that when you're deploying the project. So you, you kind of, pre-render everything that's static when you deploy. And then it becomes just static files that are served from the CDN. So you get the initial article. I think if you, I haven't tested in a while, but I think if you load one of my articles on swizec.com, it's readable, like on lighthouse reports, if you look at the lighthouse where it gives you the series of screenshots, the first screenshot is already fully readable.

I think that means it's probably under 30 or 40 milliseconds to get the content and start reading, but then, then it rehydrates and becomes a react app. and then when it's a react app, it can make for their API calls to the backend. So usually on user interaction, like if you have upvotes or comments or something like that, Only when the user clicks something, you then make an API call to your server, and that then calls a Lambda or Gatsby function or a Netlify cloud function, or even a Firebase function, which then then wakes up and talks to the database and does things, and usually people are a lot more forgiving of that one taking 50 milliseconds to respond instead of 10 milliseconds, but, you know, 50 milliseconds is still pretty good.

And I think there were recently some experiments shared where they were comparing cold start times. And if you write your, uh, cloud functions in JavaScript, the average cold startup time is something like a hundred milliseconds. And a big part of that is because you're not wrapping this entire framework, like express or rails into your function. It's just a small function. So the server only has to load up something like, I don't know. I think my biggest cloud functions have been maybe 10 kilobytes with all of the dependencies and everything bundled in, and that's pretty fast for a server to, to load run, start no JS and start serving your request.

It's way fast enough. And then if you need even more speed, you can go to rust or go, which are even faster. As long as you avoid the java, .net, C-sharp those kinds of things. It's usually fine.

[00:30:36] Jeremy: One of the reasons I was curious is because I was going through the rest example you've got, where it's basically going through Amazon's API gateway, um, goes to a Lambda function written in JavaScript, and then talks to dynamoDB gives you a record back or creates a record and, I, I found that just making those calls, making a few calls, hopefully to account for the cold start I getting response times of maybe 150 to 250 milliseconds, which is not terrible, but, it's also not what I would call fast either.

So I was just kind of curious, when you have a real app, like, are, are there things that you've come across where Lambda maybe might have some issues or at least there's tricks you need to do to, to work around them?

[00:31:27] Swizec: Yeah. So the big problem there is, as soon as a database is involved, that tends to get. Especially if that database is not co-located with your Lambda. So it's usually, or when I've experimented, it was a really bad idea to go from a Vercel API function, talk to dynamo DB in AWS that goes over the open internet.

And it becomes really slow very quickly. at my previous job, I experimented with serverless and connecting it to RDS. If you have RDS in a separate private network, then RDS is that they, the Postgres database service they have, if that's running in a separate private network, then your functions, it immediately adds 200 or 300 milliseconds to your response times.

If you keep them together, it usually works a lot faster. ANd then there are ways to keeping them. Pre-warned usually it doesn't work as well as you would want. There are ways on AWS to, I forget what it's called right now, but they have now what's, some, some sort of automatic rewarming, if you really need response times that are smaller than a hundred, 200 milliseconds.

But yeah, it mostly depends on what you're doing. As soon as you're making API calls or database calls. You're essentially talking to a different server that is going to be slower on a lambda then it is if you have a packaged pserver, that's running the database and the server itself on the same machine.

[00:33:11] Jeremy: And are there any specific challenges related to say you mentioned RDS earlier? I know with some databases, like for example, Postgres sometimes, uh, when you have a traditional server application, the server will pool the connections. So it'll make some connection into your data database and just keep reusing them.

Whereas with the Lambda is it making a new connection every time?

[00:33:41] Swizec: Almost. So Lambdas. I think you can configure how long it stays warm, but what AWS tries to do is reuse your laptops. So when the Lambda wakes up, it doesn't die immediately. After that initial request, it stays, it stays alive for the next, let's say it's one minute. Or even if it's 10 minutes, it's, there's a life for the next couple of minutes.

And during that time, it can accept new requests, new requests and serve them. So anything that you put in the global namespace of your phone. We'll potentially remain alive between functions and you can use that to build a connection pool to your database so that you can reuse the connections instead of having to open new connections every time.

What you have to be careful with is that if you get simultaneous requests at actually simultaneous requests, not like 10 requests in 10 milliseconds, if you get 10 requests at the same millisecond, you're going to wake up multiple Lambdas and you're going to have multiple connection pools running in parallel.

So it's very easy to crash your RDS server with something like AWS Lambda, because I think the default concurrency limit is a thousand Lambdas. And if each of those can have a pool of, let's say 10 requests, that's 10,000 open requests or your RDS server. And. You were probably not paying for high enough tier for the RDS server to survive that that's where it gets really tricky.

I think AWS now has a service that lets you kind of offload a connection pool so that you can take your Lambda and connect it to the connection pool. And the connection pool is keeping warm connections to your server. but an even better approach is to use something like Aurora DB, which is also an on AWS or dynamo DB, which are designed from the ground up to work with serverless applications.

[00:35:47] Jeremy: It's things that work, but you have to know sort of the little, uh, gotchas, I guess, that are out there.

[00:35:54] Swizec: Yeah, exactly. There's sharp edges to be found everywhere. part of that is also that. serverless, isn't that old yet I think AWS Lambda launched in 2014 or 2015, which is one forever in internet time, but it's still not that long ago. So we're still figuring out how to make things better.

And, it's also where, where you mentioned earlier that whether it's more appropriate for backend processes or for user-facing processes, it does work really well for backend processes because you CA you have better control over the maximum number of Lambdas that run, and you have more patience for them being slow, being slow sometimes. And so on.

[00:36:41] Jeremy: It sounds like even for front end processes as long as you know, like you said, the sharp edges and you could do things like putting a CDN in front where your Lambdas don't even get hit until some later time.

There's a lot of things you can do to make it where it is a good choice or a good I guess what you're saying, when you're building an application, do you default to using a serverless type of stack?

[00:37:14] Swizec: Yes, for all of my side projects, I default to using serverless. Um, I have a bunch of apps running that way, even when serverless is just no servers at all. Like my blog doesn't have any cloud functions right now. It's all running from CDNs, basically. I think the only, I don't know if you could even count that as a cloud function is w my email signup forms go to an API with my email provider.

So there's also not, I don't have any servers there. It's directly from the front end. I would totally recommend it if you are a startup that just got tens of millions of dollars in funding, and you are planning to have a million requests per second by tomorrow, then maybe not. That's going to be very expensive very quickly.

But there's always a trade off. I think that with serverless, it's a lot easier to build in terms of dev ops and in terms of handling your infrastructure, it's, it takes a bit of a mind shift in how you're building when it comes to the actual logic and the actual, the server system that you're building.

And then in terms of costs, it really depends on what you're doing. If you're a super huge company, it probably doesn't make sense to go and serverless, but if you're that. Or if you have that much traffic, you hopefully are also making enough money to essentially build your own serverless system for yourself.

[00:38:48] Jeremy: For someone who's interested in trying serverless, like I know for myself when I was going through the tutorial you're using the serverless framework and it creates all these different things in AWS for you and at a high level I could follow. Okay. You know, it has the API gateway and you've got your simple queue service and DynamoDB, and the lambdas all that sort of thing.

So at a high level, I could follow along. But when I log into the AWS console, not knowing a whole lot about AWS, it's creating a ton of stuff for you.

And I'm wondering from your perspective for somebody who's learning about serverless, how much do they need to really dive into the AWS internals and understand what's going on there.

[00:39:41] Swizec: That's a tough one because personally I try to stay away as much as possible. And especially with the serverless framework, what I like is configuring everything through the framework rather than doing it manually. Um, because there's a lot of sharp edges there as well. Where if you go in and you manually change something, then AWS can't allow serverless framework to clean up anymore and you can have ghost processes running.

At Tia, we've had that as a really interesting challenge. We're not using serverless framework, we're using something called cloud formation, which is essentially.

One lower level of abstraction, then serverless framework, we're doing a lot more work. We're creating a lot more work for ourselves, but that's what we have. And that's what we're working with. these decisions predate me. So I'm just going along with what we have and we wanted to have more control, because again, we have dev ops people on the team and they want more control because they also know what they're doing and we keep having trouble with, oh, we were trying to use infrastructure as code, but then there's this little part where you do have to go into the AWS console and click around a million times to find the right thing and click it.

And we've had interesting issues with hanging deploys where something gets stuck on the AWS side and we can take it back. We can tear it down, we can stop it. And it's just a hanging process and you have to wait like seven hours for AWS to do. Oh, okay. Yeah. If it's been there for seven hours, it's probably not needed and then kills it and then you can deploy.

So that kind of stuff gets really frustrating very quickly.

[00:41:27] Jeremy: Sounds like maybe in your personal projects, you've been able to, to stick to the serverless framework abstraction and not necessarily have to understand or dive into the details of AWS and it's worked out okay for you.

[00:41:43] Swizec: Yeah, exactly. it's useful to know from a high, from a high level what's there and what the different parts are doing, but I would not recommend configuring them through the, through the AWS console because then you're going to always be in the, in the AWS console. And it's very easy to get something slightly wrong.

[00:42:04] Jeremy: Yeah. I mean, I know for myself just going through the handbook, just going into the console and finding out where I could look at my logs or, um, what was actually running in AWS. It wasn't that straightforward. So, even knowing the bare minimum for somebody who's new to, it was like a little daunting.

[00:42:26] Swizec: Yeah, it's super daunting. And they have thousands, if not hundreds of different products on AWS. and when it comes to, like you mentioned logs, I, I don't think I put this in the handbook because I either didn't know about it yet, or it wasn't available quite yet, but serverless can all the serverless framework also let you look at logs through the servers framework.

So you can say SLS function, name, logs, and it shows you the latest logs. it also lets you run functions locally to an extent. it's really useful from that perspective. And I personally find the AWS console super daunting as well. So I try to stay away as much as possible.

[00:43:13] Jeremy: It's pretty wild when you first log in and you click the button that shows you the services and it's covering your whole screen. Right. And you're like, I just want to see what I just pushed.

[00:43:24] Swizec: Yeah, exactly. And there's so many different ones and they're all they have these obscure names that I don't find meaningful at all.

[00:43:34] Jeremy: I think another thing that I found a little bit challenging was that when I develop applications, I'm used to having the feedback cycle of writing the code, running the application or running a test and seeing like, did it work? And if it didn't, what's the stack trace, what, what happened? And I found the process of going into CloudWatch and looking at the logs and waiting for them to eventually refresh and all that to be, a little challenging. And, and, um, so I was wondering in your, your experience, um, how you've worked through, you know, how are you able to get a fast feedback loop or is this just kind of just part of it.

[00:44:21] Swizec: I am very lazy when it comes to writing tests, or when it comes to fast feedback loops. I like having them I'm really bad at actually setting them up. But what I found works pretty well for serverless is first of all, if you write your backend a or if you write your cloud functions in TypeScript that immediately resolves most of the most common issues, most common sources of bugs, it makes sure that you're not using something that doesn't exist.

Make sure you're not making typos, make sure you're not holding a function wrong, which I personally find very helpful because I have pretty fast and I make typos. And it's so nice to be able to say, if it's completely. I know that it's at least going to run. I'm not going to have some stupid issue of a missing semi-colon or some weird fiddly detail.

So that's already a super fast feedback cycle that runs right in your IDE the next step is because you're just writing the business logic function and you know, that the function itself is going to run. You can write unit tests that treat that function as a normal function. I'm personally really bad at writing those unit tests, but they can really speed up the, the actual process of testing because you can go and you can be like, okay.

So I know that the code is doing what I want it to be doing if it's running in isolation. And that, that can be pretty fast. The next step that is, uh, Another level in abstraction and and gives you more feedback is with serverless. You can locally invoke most Lambdas. The problem with locally running your Lambdas is that it's not the same environment as on AWS.

And I asked one of the original developers of the same serverless framework, and he said, just forget about accurately replicating AWS on your system. There are so many dragons there it's never going to work. and I had an interesting example about that when I was building a little project for my girlfriend that sends her photos from our relationship to an IOT device every day or something like that.

It worked when I ran SLS invoke and it ran and it even called all of the APIs and everything worked. It was amazing. And then when I deployed it, it didn't work and it turned out that it was a permissions issue. I forgot to give myself a specific, I am role for something to work. That's kind of like a stair-stepping process of having fast feedback cycles first, if it compiles, that means that you're not doing anything absolutely wrong.

If the tests are running, that means it's at least doing what you think it's doing. If it's invoking locally, it means that you're holding the API APIs and the third-party stuff correctly. And then the last step is deploying it to AWS and actually running it with a curl or some sort of request and seeing if it works in production.

And that then tells you if it's actually going to work with AWS. And the nice thing there is because uh serverless framework does this. I think it does a sort of incremental deploys. The, that cycle is pretty fast. You're not waiting half an hour for your C code pipeline or for your CIO to run an integration test, to do stuff.

One minute, it takes one minute and it's up and you can call it and you immediately see if it's working.

[00:47:58] Jeremy: Basically you're, you're trying to do everything you can. Static typing and, running tests just on the functions. But I guess when it comes down to it, you really do have to push everything, update AWS, have it all run, um, in order to, to really know. Um, and so I guess it's, it's sort of a trade-off right. Versus being able to, if you're writing a rails application and you've got all your dependencies on your machine, um, you can spin it up and you don't really have to wait for it to, to push anywhere, but,

[00:48:36] Swizec: Yeah. But you still don't know if, what if your database is misconfigured in production?

[00:48:42] Jeremy: right, right. So it's, it's never, never the same as

production. It's just closer. Right? Yeah. Yeah, I totally get When you don't have the real services or the real databases, then there's always going to be stuff that you can miss. Yeah,

[00:49:00] Swizec: Yeah. it's not working until it's working in production.

[00:49:03] Jeremy: That's a good place to end it on, but is there anything else you want to mention before we go?

[00:49:10] Swizec: No, I think that's good. Uh, I think we talked about a lot of really interesting stuff.

[00:49:16] Jeremy: Cool. Well, Swiz, thank you so much for chatting with me today.

[00:49:19] Swizec: Yeah. Thank you for having me.

Robotic Process Automation with Alexander Pugh

Thu, 28 Oct 2021 03:04:29 -0000

Alexander Pugh is a software engineer at Albertsons. He has worked in Robotic Process Automation and the cognitive services industry for over five years.

This episode originally aired on Software Engineering Radio.

Related Links

Alexander Pugh's personal site

Enterprise RPA Solutions

Enterprise "Low Code/No Code" API Solutions

RPA and the OS

Transcript

You can help edit this transcript on GitHub.

[00:00:00] Jeremy: Today, I'm talking to Alexander Pugh. He's a solutions architect with over five years of experience working on robotic process automation and cognitive services.

Today, we're going to focus on robotic process automation.

Alexander welcome to software engineering radio.

[00:00:17] Alex: Thank you, Jeremy. It's really good to be here.

[00:00:18] Jeremy: So what does robotic process automation actually mean?

[00:00:23] Alex: Right. It's a, it's a very broad nebulous term. when we talk about robotic process automation, as a concept, we're talking about automating things that humans do in the way that they do them. So that's the robotic, an automation that is, um, done in the way a human does a thing.

Um, and then process is that thing, um, that we're automating. And then automation is just saying, we're turning this into an automation where we're orchestrating this and automating this. and the best way to think about that in any other way is to think of a factory or a car assembly line. So initially when we went in and we, automated a car or factory, automation line, what they did is essentially they replicated the process as a human did it. So one day you had a human that would pick up a door and then put it on the car and bolt it on with their arms. And so the initial automations that we had on those factory lines were a robot arm that would pick up that door from the same place and put it on the car and bolt it on there.

Um, so the same can be said for robotic process automation. We're essentially looking at these, processes that humans do, and we're replicating them, with an automation that does it in the same way. Um, and where we're doing that is the operating system. So robotic process automation is essentially going in and automating the operating system to perform tasks the same way a human would do them in an operating system.

So that's, that's RPA in a nutshell,

Jeremy: So when you say you're replicating something that a human would do, does it mean it has to go through some kind of GUI or some kind of user interface?

[00:02:23] Alex: That's exactly right, actually. when we're talking about RPA and we look at a process that we want to automate with RPA, we say, okay. let's watch the human do it. Let's record that. Let's document the human process. And then let's use the RPA tool to replicate that exactly in that way.

So go double click on Chrome, launch that click in the URL line and send key in www.cnn.com or what have you, or servicenow hit enter, wait for it to load and then click, you know, where you want to, you know, fill out your ticket for service. Now send key in. So that's exactly how an RPA solution at the most basic can be achieved.

Now and any software engineer knows if you sit there and look over someone's shoulder and watch them use an operating system. Uh, you'll say, well, there's a lot of ways we can do this more efficiently without going over here, clicking that, you know, we can, use a lot of services that the operating system provides in a programmatic way to achieve the same ends and RPA solutions can also do that.

The real key is making sure that it is still achieving something that the human does and that if the RPA solution goes away, a human can still achieve it. So if you're, trying to replace or replicate a process with RPA, you don't want to change that process so much so that a human can no longer achieve it as well.

that's something where if you get a very technical, and very fluent software engineer, they lose sight of that because they say, oh, you know what? There's no reason why we need to go open a browser and go to you know, the service now portal and type this in when I can just directly send information to their backend.

which a human could not replicate. Right? So that's kind of where the line gets fuzzy. How efficiently can we make this RPA solution?

[00:04:32] Jeremy: I, I think a question that a lot of people are probably having is a lot of applications have APIs now. but what you're saying is that for it to, to be, I suppose, true RPA, it needs to be something that a user can do on their own and not something that the user can do by opening up dev tools or making a post to an end point.

[00:04:57] Alex: Yeah. And so this, this is probably really important right now to talk about why RPA, right? Why would you do this when you could put on a server, a a really good, API ingestion point or trigger or a web hook that can do this stuff. So why would we, why would we ever pursue our RPA?

There there's a lot of good reasons for it. RPA is very, very enticing to the business. RPA solutions and tools are marketed as a low code, no code solution for the business to utilize, to solve their processes that may not be solved by an enterprise solution and the in-between processes in a way.

You have, uh, a big enterprise, finance solution that everyone uses for the finance needs of your business, but there are some things that it doesn't provide for that you have a person that's doing a lot of, and the business says, Okay. well, this thing, this human is doing this is really beneath their capability. We need to get a software solution for it, but our enterprise solution just can't account for it. So let's get a RPA capability in here. We can build it ourselves, and then there we go. So there, there are many reasons to do that. financial, IT might not have, um, the capability or the funding to actually build and solve the solution. Or it it's at a scale that is too small to open up, uh, an IT project to solve for. Um, so, you know, a team of five is just doing this and they're doing it for, you know, 20 hours a week, which is uh large, but in a big enterprise, that's not really. Maybe um, worth building an enterprise solution for it. or, and this is a big one. There are regulatory constraints and security constraints around being able to access this or communicate some data or information in a way that is non-human or programmatic. So that's really where, um, RPA is correctly and best applied and you'll see it most often.

So what we're talking about there is in finance, in healthcare or in big companies where they're dealing with a lot of user data or customer data in a way. So when we talk about finance and healthcare, there are a lot of regulatory constraints and security reasons why you would not enable a programmatic solution to operate on your systems.

You know, it's just too hard. We we're not going to expose our databases or our data to any other thing. It would, it would take a huge enterprise project to build out that capability, secure that capability and ensure it's going correctly. We just don't have the money the time or the strength honestly, to afford for it.

So they say, well, we already have. a user pattern. We already allow users to, to talk to this information and communicate this information. Let's get an RPA tool, which for all intents and purposes will be acting as a user. And then it can just automate that process without us exposing to queries or any other thing, an enterprise solution or programmatic, um, solution.

So that's really why RPA, where and why you, you would apply it is there's, there's just no capability at enterprise for one reason or another to solve for it.

[00:08:47] Jeremy: as software engineers, when we see this kind of problem, our first thought is, okay, let's, let's build this custom application or workflow. That's going to talk to all these API APIs. And, and what it sounds like is. In a lot of cases there just isn't the time there just isn't the money, to put in the effort to do that.

And, it also sounds like this is a way of being able to automate that. and maybe introducing less risk because you're going through the same, security, the same workflow that people are doing currently. So, you know, you're not going to get into things that they're not supposed to be able to get into because all of that's already put in place.

[00:09:36] Alex: Correct. And it's an already accepted pattern and it's kind of odd to apply that kind of very IT software engineer term to a human user, but a human user is a pattern in software engineering. We have patterns that do this and that, and, you know, databases and not, and then the user journey or the user permissions and security and all that is a pattern.

And that is accepted by default when you're building these enterprise applications okay.

What's the user pattern. And so since that's already established and well-known, and all the hopefully, you know, walls are built around that to enable it to correctly do what it needs to do. It's saying, Okay. we've already established that. Let's just use that instead of. You know, building a programmatic solution where we have to go and find, do we already have an appropriate pattern to apply to it? Can we build it in safe way? And then can we support it? You know, all of a sudden we, you know, we have the support teams that, you know, watch our Splunk dashboards and make sure nothing's going down with our big enterprise application.

And then you're going to build a, another capability. Okay. WHere's that support going to come from? And now we got to talk about change access boards, user acceptance testing and, uh, you know, UAT dev production environments and all that. So it becomes, untenable, depending on your, your organization to, to do that for things that might fall into a place that is, it doesn't justify the scale that needs to be thrown upon it.

But when we talk about something like APIs and API exist, um, for a lot of things, they don't exist for everything. And, a lot of times that's for legacy databases, that's for mainframe capability. And this is really where RPA shines and is correctly applied. And especially in big businesses are highly regulated businesses where they can't upgrade to the newest thing, or they can't throw something to the cloud.

They have a, you know, their mainframe systems or they have their database systems that have to exist for one reason or the other until there is the motivation and the money and the time to correctly migrate and, and solve for them. So until that day, and again, there's no, API to, to do anything on a, on a mainframe, in this bank or whatnot, it's like, well, Okay. let's just throw RPA on it.

Let's, you know, let's have a RPA do this thing, uh, in the way that a human does it, but it can do it 24 7. and an example, or use cases, you work at a bank and, uh, there's no way that InfoSec is going to let you query against this database with, your users that have this account or your customers that have this no way in any organization at a bank.

Is InfoSec going to say, oh yeah. sure. Let me give you an Odata query, you know, driver on, you know, and you can just set up your own SQL queries and do whatever they're gonna say no way. In fact, how did you find out about this database in the first place and who are you.

How do we solve it? We, we go and say, Okay. how does the user get in here well they open up a mainframe emulator on their desktop, which shows them the mainframe. And then they go in, they click here and they put this number in there, and then they look up this customer and then they switch this value to that value and they say, save.

And it's like, okay. cool. That's that RPA can do. And we can do that quite easily. And we don't need to talk about APIs and we don't need to talk about special access or doing queries that makes, you know, Infosec very scared. you know, a great use case for that is, you know, a bank say they, they acquire, uh, a regional bank and they say, cool, you're now part of our bank, but in your systems that are now going to be a part of our systems, you have users that have this value, whereas in our bank, that value is this value here. So now we have to go and change for 30,000 customers this one field to make it line up with our systems. Traditionally you would get a, you know, extract, transform load tool an ETL tool to kind of do that. But for 30,000 customers that might be below the threshold, and this is banking. So it's very regulated and you have to be very, very. Intentional about how you manipulate and move data around.

So what do we have to do? okay. We have to hire 10 contractors for six months, and literally what they're going to do eight hours a day is go into the mainframe through the simulator and customer by customer. They're going to go change this value and hit save. And they're looking at an Excel spreadsheet that tells them what customer to go into.

And that's going to cost X amount of money and X, you know, for six months, or what we could do is just build a RPA solution, a bot, essentially that goes, and for each line of that Excel spreadsheet, it repeats this one process, open up mainframe emulator, navigate into the customer profile and then changes value, and then shut down and repeat.

And It can do that in one week and, and can be built in two, that's the, the dream use case for RPA and that's really kind of, uh, where it would shine.

[00:15:20] Jeremy: It sounds like the. best use case for it is an old system, a mainframe system, in COBOL maybe, uh, doesn't have an API. And so, uh, it makes sense to rather than go, okay, how can we get directly into the database?

[00:15:38] Alex: How can we build on top of it? Yeah,

[00:15:40] Jeremy: we build on top of it? Let's just go through the, user interface that exists, but just automate that process. And, the, you know, the example you gave, it sounds very, very well-defined you're gonna log in and you're going to put in maybe this ID, here's the fields you want to get back.

and you're going to save those and you didn't have to make any real decisions, I suppose, in, in terms of, do I need to click this thing or this thing it's always going to be the same path to, to get there.

[00:16:12] Alex: exactly. And that's really, you need to be disciplined about your use cases and what those look like. And you can broadly say a use case that I am going to accept has these features, and one of the best ways to do that is say it has to be a binary decision process, which means there is no, dynamic or interpreted decision that needs to, or information that needs to be made.

Exactly like that use case it's very binary either is, or it isn't you go in you journey into there. and you change that one thing and that's it there's no oh, well this information says this, which means, and then I have to go do this. Once you start getting in those if else, uh, processes you're, you're going down a rabbit hole and it could get very shaky and that introduces extreme instability in what you're trying to do.

And also really expands your development time cause you have to capture these processes and you have to say, okay. tell me exactly what we need to build this bot to do. And for, binary decision processes, that's easy go in here, do this, but nine times out of 10, as you're trying to address this and solution for it, you'll find those uncertainties.

You'll find these things where the business says, oh, well, yeah. that happens, you know, one times out of 10 and this is what we need to do. And it's like, well, that's going to break the bot. It, you know, nine times out of 10, this, this spot is going to fall over. this is now where we start getting into, the machine learning and AI, realm.

And why RPA, is classified. Uh, sometimes as a subset of the AI or machine learning field, or is a, a pattern within that field is because now that you have this bot or this software that enables you to do a human process, let's enable that bot to now do decision-making processes where it can interpret something and then do something else.

Because while we can just do a big tree to kind of address every capability, you're never going to be able to do that. And also it's, it's just a really heavy, bad way to build things. So instead let's throw in some machine learning capability where it just can understand what to do and that's, you know, that's the next level of RPA application is Okay. we've got it. We've, we've gone throughout our organization. We found every kind of binary thing, that can be replaced with an RPA bot. Okay.

Now what are the ones that we said we couldn't do? Because it had some of that decision-making that, required too much of a dynamic, uh, intelligence behind it. And let's see if we can address those now that we have this. And so that's, that's the 2.0, in RPA is addressing those non-binary, paths.

I would argue that especially in organizations that are big enough to justify bringing in an RPA solution to solve for their processes. They have enough binary processes, binary decision processes to keep them busy.

Some people, kind of get caught up in trying to right out the gate, say, we need to throw some machine learning. We need to make these bots really capable instead of just saying, well, we we've got plenty of work, just changing the binary processes or addressing those. Let's just be disciplined and take that, approach.

Uh, I will say towards RPA and bots, the best solution or the only solution. When you talk about building a bot is the one that you eventually turn off. So you can say, I built a bot that will go into our mainframe system and update this value. And, uh, that's successful.

I would argue that's not successful. When that bot is successful is when you can turn it off because there's an enterprise solution that addresses it. and, and you don't have to have this RPA bot that lives over here and does it instead, you're enterprise, capability now affords for it. And so that's really, I think a successful bot or a successful RPA solution is you've been able to take away the pain point or that human process until it can be correctly addressed by your systems that everyone uses.

[00:21:01] Jeremy: from, the business perspective, you know, what are some of the limitations or long-term problems with, with leaving an RPA solution in place?

[00:21:12] Alex: that's a, that's a good question. Uh, from the business there, isn't, it's solved for. leaving it in place is other than just servicing it and supporting it. There's no real issue there, especially if it's an internal system, like a mainframe, you guys own that. If it changes, you'll know it, if it changes it's probably being fixed or addressed.

So there's no, problem. However, That's not the only application for RPA. let's talk about another use case here, your organization, uses, a bank and you don't have an internal way to communicate it. Your user literally has to go to the bank's website, log in and see information that the bank is saying, Hey, this is your stuff, right?

The bank doesn't have an API for their, that service. because that would be scary for the bank. They say, we don't want to expose this to another service. So the human has to go in there, log in, look at maybe a PDF and download it and say, oh, Okay.

So that is happens in a browser. So it's a newer technology.

This isn't our mainframe built in 1980. You know, browser based it's in the internet and all that, but that's still a valid RPA application, right? It's a human process. There's no API, there's no easy programmatic way to, to solution for it. It would require the bank and your it team to get together and, you know, hate each other. Think about why this, this is so hard. So let's just throw a bot on it. That's going to go and log in, download this thing from the bank's website and then send it over to someone else. And it's going to do that all day. Every day. That's a valid application. And then tomorrow the bank changes its logo. And now my bot is it's confused.

Stuff has shifted on the page. It doesn't know where to click anymore. So you have to go in and update that bot because sure enough, that bank's not going to send out an email to you and saying, Hey, by the way, we're upgrading our website in two weeks. Not going to happen, you'll know after it's happened.

So that's where you're going to have to upgrade the bot. and that's the indefinite use of RPA is going to have to keep until someone else decides to upgrade their systems and provide for a programmatic solution that is completely outside the, uh, capability of the organization to change. And so that's where the business would say, we need this indefinitely.

It's not up to us. And so that is an indefinite solution that would be valid. Right? You can keep that going for 10 years as long, I would say you probably need to get a bank that maybe meets your business needs a little easier, but it's valid. And that would be a good way for the business to say yes, this needs to keep running forever until it doesn't.

[00:24:01] Jeremy: you, you brought up the case of where the webpage changes and the bot doesn't work anymore. specifically, you're, you're giving the example of finance and I feel like it would be basically catastrophic if the bot is moving money to somewhere, it shouldn't be moving because the UI has moved around or the buttons not where it expects it to be.

And I'm kind of curious what your experience has been with that sort of thing.

[00:24:27] Alex: you need to set organizational thresholds and say, this is this something this impacting or something that could go this wrong. It is not acceptable for us to solve with RPA, even though we could do it, it's just not worth it. Some organizations say that's anything that touches customer data healthcare and banking specialists say, yeah, we have a human process where the human will go and issue refunds to a customer, uh, and that could easily be done via RPA solution, but it's fraught with, what, if it does something wrong, it's literally going to impact.

Uh, someone somewhere they're their moneys or their, their security or something like that. So that, that definitely should be part of your evaluation. And, um, as an organization, you should set that up early and stick to it and say, Nope, this is outside our purview. Even we can do it. It has these things.

So I guess the answer to that is you should never get to that process, but now we're going to talk about, I guess, the actual nuts and bolts of how RPA solutions work and how they can be made to not action upon stuff when it changes or if it does so RPA software, by and large operates by exposing the operating system or the browsers underlying models and interpreting them.

Right. So when we talk about something like a, mainframe emulator, you have your RPA software on Microsoft windows. It's going to use the COM the component operating model, to see what is on the screen, what is on that emulator, and it's gonna expose those objects. to the software and say, you can pick these things and click on that and do that.

when we're talking about browser, what the RPA software is looking at is not only the COM the, the component object model there, which is the browser, itself. But then it's also looking at the DOM the document object model that is the webpage that is being served through the browser. And it's exposing that and saying, these are the things that you can touch or, operate on.

And so when you're building your bots, what you want to make sure is that the uniqueness of the thing that you're trying to access is something that is truly unique. And if it changes that one thing that the bot is looking for will not change. So we let's, let's go back to the, the banking website, right?

We go in and we launch the browser and the bot is sitting there waiting for the operating system to say, this process is running, which is what you wanted to launch. And it is in this state, you know, the bot says, okay. I'm expecting this kind of COM to exist. I see it does exist. It's this process, and it has this kind of name and cool Chrome is running. Okay. Let's go to this website. And after I've typed this in, I'm going to wait and look at the DOM and wait for it to return this expected a webpage name, but they could change their webpage name, the title of it, right. They can say, one day can say, hello, welcome to this bank. And the next day it says bank website, all of a sudden your bot breaks it no longer is finding what it was told to expect.

So you want to find something unique that will never change with that conceivably. And so you find that one thing on the DOM on the banking website, it's, you know, this element or this tag said, okay, there's no way they're changing that. And so it says cool the page is loaded. Now click on this field, which is log in.

Okay. You want to find something unique on that field that won't change when they upgrade, you know, from bootstrap to this kind of, you know, UI framework. that's all well, and good. That's what we call the happy path. It's doing this perfectly. Now you need to define what it should do when it doesn't find these things, which is not keep going or find similar it's it needs to fail fast and gracefully and pass on that failure to someone and not keep going. And that's kind of how we prevent that scary use case where it's like. okay. it's gone in, it's logged into the bank website now it's transactioning, bad things to bad places that we didn't program it for it, Well you unfortunately did not specify in a detailed enough way what it needs to look for.

And if it doesn't find that it needs to break, instead of saying that this is close enough. And so, in all things, software engineering, it's that specificity, it's that detail, that you need to hook onto. And that's also where, when we talk about this being a low-code no-code solutions that sometimes RPA is marketed to the business.

It's just so often not the case, because yes. It might provide a very user, business, friendly interface for you to build bots. But the knowledge you need to be able to ensure stability and accuracy, um, to build the bots is, is a familiarity that's probably not going to be had in the business. It's going to be had by a developer who knows what the DOM and COM are and how the operating system exposes services and processes and how.

JavaScript, especially when we're talking about single page apps and react where you do have this very reactive DOM, that's going to change. You need to be fluent with that and know, not only how HTML tags work and how CSS will change stuff on you in classes, but also how clicking on something on a single page app is as simple as a username input field will dynamically change that whole DOM and you need to account for it. so, it is it's, traditionally not as easy as saying, oh, the business person can just click, click, click, click, and then we have a bot. You'll have a bot, but it's probably going to be break breaking quite often. It's going to be inaccurate in its execution.

this is a business friendly user-friendly non-technical tool. And I launch it and it says, what do you want to do? And it says, let me record what you're going to do. And you say, cool.

And then you go about you open up Chrome and you type in the browser, and then you click here, click there, hit send, and then you stop recording. The tool says, cool, this is what you've done. Well, I have yet to see a, a solution that is that isn't able to not need further direction or, or defining on that process, You still should need to go in there and say, okay, yeah.

you recorded this correctly, but you know, you're not interpreting correctly or as accurate as you need to that field that I clicked on.

And if you know, anybody hits, you know, F12 on their keyboard while they have Chrome open and they see how the DOM is built, especially if this is using kind of any kind of template, Webpage software. It's going to have a lot of cruft in that HTML. So while yes, the recording did correctly see that you clicked on the input box.

What it's actually seen is that you actually clicked on the div. That is four levels scoped above it, whereas the parent, and there are other things within that as well. And so the software could be correctly clicking on that later, but other things could be in there and you're going to get some instability.

So the human or the business, um, bot builder, the roboticist, I guess, would need to say, okay, listen, we need to pare this down, but it's, it's even beyond that. There are concepts that you can't get around when building bots that are unique to software engineering as a concept. And even though they're very basic, it's still sometimes hard for the business user to, they felt to learn that.

And I I'm talking concepts as simple as for loops or loops in general where the business of course has, has knowledge of what we would call a loop, but they wouldn't call it a loop and it's not as accurately defined. So they have to learn that. And it's not as easy as just saying, oh Yeah.

do a loop. And the business will say, well, what's a loop.

Like I know, you know, conceptually what a loop could be like a loop in my, when I'm tying my shoe. But when you're talking about loop, that's a very specific thing in software and what you can do. And when you shouldn't do it, and that's something that these, no matter how good your low code, no code solution might be, it's going to have to afford for that concept.

And so a business user is still going to have to have some lower level capability to apply those concepts. And, and I I've yet to see anybody be able to get around that in their RPA solutions.

[00:33:42] Jeremy: So in your experience, even though these vendors may sell it as being a tool that anybody can just sit down and use but then you would want a developer to, to sit with them or, or see the result and, and try and figure out, okay, what do you, what do you really want this, this code to do?

Um, not just sort of these broad strokes that you were hoping the tool was gonna take care of for you? Yeah.

[00:34:06] Alex: that that's exactly right. And that's how every organization will come to that realization pretty quickly. the head of the game ones have said, okay, we need to have a really good, um, COE structure to this robotic operating model where we can have, a software engineering, developer capability that sits with the business, capability.

And they can, marry with each other, other businesses who may take, um, these vendors at their word and say, it's a low code meant for business. It just needs to make sure it's on and accessible. And then our business people are just gonna, uh, go in there and do this. They find out pretty quickly that they need some technical, um, guidance to go in because they're building unstable or inaccurate bots.

and whether they come to that sooner or later, they, they always come to that. Um, and they realize that, okay, there there's a technical capability And, this is not just RPA. This is the story of all low-code no-code solutions that have ever existed. It always comes around that, while this is a great interface for doing that, and it's very easy and it makes concepts easy.

Every single time, there is a technical capability that needs to be afforded.

[00:35:26] Jeremy: For the. The web browser, you mentioned the DOM, which is how we typically interact with applications there. But for native applications, you, you briefly mentioned, COM. And I was wondering when someone is writing, um, you know, a bot, uh, what are the sorts of things they see, or what are the primitives they're working with?

Like, is there a name attached to each button, each text, field,

[00:35:54] Alex: wouldn't that be a great world to live in, so there's not. And, and, as we build things in the DOM. People get a lot better. We've seen people are getting much better about using uniqueness when they build those things so that they can latch onto when things were built or built for the COM or, you know, a .NET for OS that might, that was not no one no one was like oh yeah, we're going to automate this.

Or, you know, we need to make this so that this button here is going to be unique from that button over there on the COM they didn't care, you know, different name. Um, so yeah, that is, that is sometimes a big issue when you're using, uh, an RPA solution, you say, okay. cool. Look at this, your calculator app.

And Okay. it's showing me the component object model that this was built. It that's describing what is looking at, but none of these nodes have, have a name. They're all, you know, node one node, 1.1 node two, or, or whatnot, or button is just button and there's no uniqueness around it. And that is, you see a lot of that in legacy older software, um, E legacy is things built in 2005, 2010.

Um, you do see that, and that's the difficulty at that point. You can still solve for this, but what you're doing is you're using send keys. So instead of saying, Okay.

RPA software, open up this, uh, application and then look for. You know, thing, this object in the COM and click on it, it's going to, you know, it can't, there is no uniqueness.

So what you say is just open up the software and then just hit tab three times, and that should get you to this one place that was not unique, but we know if you hit tab three times, it's going to get there now. That's all well and good, but there's so many things that could interfere with that and break it.

And the there's no context for the bot to grab onto, to verify, Okay. I am there. So any one thing, you could have a pop-up which essentially hijacks your send key, right? And so the bot yes, absolutely hit tab three times and it should be in that one place. It thinks it is, and it hits in enter, but in between the first and second tab, a pop-up happened and now it's latched onto this other process, hits enter. And all of a sudden outlook's opening bot doesn't know that, but it's still going on and it's going to enter in some financial information into oops, an email that it launched because it thought hitting enter again would do so. Yeah.

That's, that's where you get that instability. Um, there are other ways around it or other solutions.

and this is where we get into the you're using, um, lower level software engineering solutioning instead of doing it exactly how the user does it. When we're talking about the operating system and windows, there are a ton of interop services and assemblies that a, uh, RPA solution can access.

So instead of cracking open Excel, double-clicking on Excel workbook waiting for it to load, and then reading the information and putting information in, you can use the, you know, the office 365 or whatnot that, um, interop service assembly and say, Hey, launch this workbook without the UI, showing it, attach to that process that, you know, it is.

And then just send to it, using that assembly send information into it. And the human user can't do that. It can't manipulate stuff like that, but the bot can, and it it's the same end as the human users trying. And it's much more efficient and stable because the UI couldn't afford for that kind of stability.

So that would be a valid solution. But at that point, you're really migrating into a software engineering, it developer solution of something that you were trying not to do that for. So when is that? Why, you know, why not just go and solve for it with an enterprise or programmatic solution in the first place?

So that's the balance.

[00:40:18] Jeremy: earlier you're talking about the RPA needs to be something that, uh, that the person is able to do. And it sounds like in this case, I guess there still is a way for the person to do it. They can open up the, the Excel sheet and right it's just that the way the, the RPA tool is doing it is different. Yeah.

[00:40:38] Alex: Right. And more efficient and more stable. Certainly. Uh, especially when we're talking about Excel, you have an Excel with, you know, 200,000 lines, just opening that that's, that's your day, that's going to Excel it, just going to take its time opening and visualizing that information for you. Whereas you, you know, an RPA solution doesn't even need to crack that open.

Uh, it can just send data right directly to that workbook and it that's a valid solution. And again, some of these processes, it might be just two people at your organization that are essentially doing it. So it's, you know, you don't really, it's not at a threshold where you need an enterprise solution for it, but they're spending 30 minutes of their day just waiting for that Excel workbook to open and then manipulating the data and saving it.

And then, oh, their computer crashed. So you can do an RPA solution. It's going to be, um, to essentially build for a more efficient way of doing it. And that would be using the programmatic solution, but you're right. It is doing it in a way that a human could not achieve it. Um, and that again is. The where the discipline and the organizational, aspect of this comes in where it's saying, is that acceptable?

Is it okay to have it do things in this way, that are not human, but achieving the same ends. And if you're not disciplined that creeps, and all of a sudden you have a RPA solution that is doing things in a way that where the whole reason to bring that RPA solution is to not have something that did something like that. And that's usually where the stuff falls apart. IT all of a sudden perks their head up and says, wait, I have a lot of connections coming in from this one computer doing stuff very quickly with a, you know, a SQL query. It's like, what is going on? And so all of a sudden, someone built a bot to essentially do a programmatic connection.

And it is like, you should not be who gave you this permissions who did this shut down everything that is RPA here until we figure out what you guys went and did. So that's, that's the dance.

[00:42:55] Jeremy: it's almost like there's this hidden API or there's this API that you're not intended to use. but in the process of trying to automate this thing, you, you use it and then if your, IT is not aware of it, then things just kind of spiral out of control.

[00:43:10] Alex: Exactly. Right. So let's, you know, a use case of that would be, um, we need to get California tax information on alcohol sales. We need to see what each county taxes for alcohol to apply to something. And so today the human users, they go into the California, you know, tobacco, wildlife, whatever website, and they go look up stuff and okay, let's, that's, that's very arduous.

Let's throw a bot on that. Let's have a bot do that. Well, the bot developers, smart person knows their way around Google and they find out, well, California has an API for that. instead of the bot cracking open Chrome, it's just going to send this rest API call and it's going to get information back and that's awesome and accurate and way better than anything. but now all of a sudden IT sees connections going in and out. all of a sudden it's doing very quickly and it's getting information coming into your systems in a way that you did not know was going to be, uh, happening. And so while it was all well and good, it's, it's a good way for, the people whose job it is to protect yourself or know about these things, to get very, um, angry, rightly so that this is happening.

that's an organizational challenge, uh, and it's an oversight challenge and it's a, it's a developer challenge because, what you're getting into is the problems with having too technical people build these RPA bots, right? So on one hand we have business people who are told, Hey, just crack this thing open and build it.

And it's like, well, they don't have enough technical fluency to actually build a stable bot because they're just taking it at face value. Um, on the other hand, you have software engineers or developers that are very technical that say, oh, this process. Yeah. Okay. I can build a bot for that. But what if I used, you know, these interop services, assemblies that Microsoft gives me and I can access it like that.

And then I can send an API call over here. And while I'm at it, I'm going to, you know, I'm just going to spin up a server just on this one computer that can do this. When the bot talks to it. And so you have the opposite problem. Now you have something that is just not at all RPA, it's just using the tool to, uh, you know, manipulate stuff, programmatically.

[00:45:35] Jeremy: so, as a part of all this, is using the same credentials as a real user, right. You're you're logging in with a username and password. if the form requires something like two factor authentication or, you know, or something like that, like, how does that work since it's not an actual person?

[00:45:55] Alex: Right. So in a perfect world, you're correct. Um, a bot is a user. I know a lot of times you'll hear, say, people will be like, oh, hi, I have 20 RPA bots. What they're usually saying is I have 20 automations that are being run for separate processes, with one user's credentials, uh, on a VDI. So you're right.

They, they are using a user's credentials with the same permissions that any user that does that process has, that's why it's easy. but now we have these concepts, like two factor authentication, which every organization is using that should require something that exists outside of that bot users environment. And so how do you afford for that in a perfect world? It would be a service account, not a user account and service accounts are governed a little differently. A lot of times service accounts, um, have much more stringent rules, but also allow for things like password resets, not a thing, um, or two factor authentication is not a thing for those.

So that would be the perfect solution, but now you're dragging in IT. Um, so, you know, if you're not structurally set up for that, that's going to be a long slog. Uh, so what would want to do some people actually literally have a, we'll have a business person that has their two factor auth for that bot user on their phone.

And then just, you know, they'll just go in and say, yeah.

that's me. that's untenable. So, um, sometimes what a lot of these, like Microsoft, for instance, allow you to do is to install a two factor authentication, application, um, on your desktop so that when you go to log in a website and says, Hey, type in your password.

Cool. Okay. Give me that code. That's on your two factor auth app. The bot can actually launch that. Copy the code and paste it in there and be on its way. But you're right now, you're having to afford for things that aren't really part of the process you're trying to automate. They are the incidentals that also happen.

And so you have to build your bot to afford for those things and interpret, oh, I need to do two factor authentication. And a lot of times, especially if you have an entirely business focused PA um, robotic operating model, they will forget about those things or find ways around them that the bot isn't addressing, like having that authenticator app on their phone.

that's, um, stuff that definitely needs to be addressed. And sometimes is only, found at runtime like, oh, it's asking for login. And when I developed it, I didn't need to do that because I had, you know, the cookie that said you're good for 30 days, but now, oh, no.

[00:48:47] Jeremy: yeah. You could have two factor. Um, you could have, it asking you to check your email for a code. There could be a fraud warning. There's like all sorts of, you know, failure cases that can happen.

[00:48:58] Alex: exactly. And those things are when we talk about, uh, third-party vendors, um, third-party provider vendors, like going back to the banking website, if you don't tell them that you're going to be using a bot to get their information or to interface with that website, you're setting yourself up for a bad time because they're going to see that kind of at runtime behavior that is not possible at scale by user.

And so you run into that issue at runtime, but then you're correct. There are other things that you might run into at runtime that are not again, part of the process, the business didn't think that that was part of the process. It's just something they do that actually the bot has to afford for. that's part of the journey, uh, in building these.

[00:49:57] Jeremy: when you're, when you're building these, these bots, what are the types of tools that, that you've used in the past? Are these commercial, packages, are these open source? Like what, what does that ecosystem look like?

[00:50:11] Alex: Yeah, in this space, we have three big ones, which is, uh, automation anywhere UI path and, blue prism. Those are the RPA juggernauts providing this software to the companies that need it. And then you have smaller ones that are, trying to get in there, or provide stuff in a little different way. and you even have now, big juggernauts that are trying to provide for it, like Microsoft with something like power automate desktop.

So all of these, say three years ago, all of these softwares existed or all of these RPA solution softwares existed or operated in the same kind of way, where you would install it on your desktop. And it would provide you a studio to either record or define, uh, originally the process that was going to be automated on that desktop when you pushed play and they all kind of expose or operate in the same way they would interpret the COM or the DOM that the operating system provided. Things like task scheduler have traditionally, uh, exposed, uh, and they all kind of did that in the same way. Their value proposition in their software was the orchestration capability and the management of that.

So I build a bot to do this, Jim over there built a bot to do that. Okay. This RPA software, not only enabled you to define those processes, But what their real value was is they enable a place where I can say this needs to run at this time on this computer.

And it needs to, you know, I need to be able to monitor it and it needs to return information and all that kind of orchestration capability. Now all of these RPA solutions actually exist in that, like everything else in the browser. So instead of installing, you know, the application and launching it and, and whatnot, and the orchestration capability being installed on another computer that looked at these computers and ran stuff on them.

Now it's, it's all in the cloud as it were, and they are in the browser. So I go to. Wherever my RPA solution is in my browser. And then it says, okay, cool. You, you still need to install something on the desktop where you want the spot to run and it deploys it there. But I define and build my process in the provided browser studio.

And then we're going to give you a capability to orchestrate, monitor, and, uh, receive information on those things that you have, those bots that you have running, and then what they're now providing as well is the ability to tie in other services to your bot so that it has expanded capability. So I'm using automation anywhere and I built my bot and it's going, and it's doing this or that.

And automation anywhere says, Hey, that's cool. Wouldn't you like your bot to be able to do OCR? Well, we don't have our own OCR engine, but you probably as an enterprise do. Just use, you know, use your Kofax OCR engine or Hey, if you're really a high speed, why don't you use your Azure cognitive services capability?

We'll tie it right into our software. And so when you're building your bot, instead of just cracking open a PDF and send key control C and key control V to do stuff instead, we'll use your OCR engine that you've already paid for to, to understand stuff. And so that's, how they expand, what they're offering, um, into addressing more and more capabilities.

[00:53:57] Alex: But now we're, we're migrating into a territory where it's like, well, things have APIs why even build a bot for them. You know, you can just build a program that uses the API and the user can drive this. And so that's where people kind of get stuck. It's they they're using RPA on a, something that just as easily provides for a programmatic solution as opposed to an RPA solution.

but because they're in their RPA mode and they say, we can use a bot for everything, they don't even stop and investigate and say, Hey, wouldn't this be just as easy to generate a react app and let a user use this because it has an API and IT can just as easily monitor and support that because it's in an Azure resource bucket.

that's where an organization needs to be. Clear-eyed and say, Okay. at this point RPA is not the actual solution. We can do this just as easy over here and let's pursue that.

[00:54:57] Jeremy: the experience of making these RPAs. It sounds like you have this browser-based IDE, there's probably some kind of drag and drop set up, and then you, you, you mentioned JavaScript. So I suppose, does that mean you can kind of dive a little bit deeper and if you want to set up specific rules or loops, you're actually writing that in JavaScript.

[00:55:18] Alex: Yeah. So not, not necessarily. So, again, the business does not know what an IDE is. It's a studio. Um,

so that's, but you're correct. It's, it's an IDE. Um, each, whether we're talking about blue prism or UiPath or automation anywhere, they all have a different flavor of what that looks like and what they enable.

Um, traditionally blue prism gave you, uh, a studio that was more shape based where you are using UML shapes to define or describe your process. And then there you are, whereas automation anywhere traditionally used, uh, essentially lines or descriptors. So I say, Hey, I want to open this file. And your studio would just show a line that said open file.

You know, um, although they do now, all of them have a shape based way to define your process. Go here, here. You know, here's a circle which represents this. Uh, let's do that. Um, or a way for you to kind of more, um, creatively define it in a, like a text-based way. When we talk about Java script, um, or anything like that, they provide predefined actions, all of them saying, I want to open a file or execute this that you can do, but all of them as well, at least last time I checked also allow you for a way to say, I want to programmatically run something I want to define.

And since they're all in the browser, it is, uh, you know, Javascript that you're going to be saying, Hey, run this, this JavaScript, run this function. Um, previously, uh, things like automation anywhere would, uh, let you write stuff in, in .NET essentially to do that capability. But again, now everything's in the browser.

So yeah, they do, They do provide for a capability to introduce more low level capability to your automation. That can get dangerous. Uh, it can be powerful and it can be stabilizing, but it can be a very slippery slope where you have an RPA solution bot that does the thing. But really all it does is it starts up and then executes code that you built.

[00:57:39] Alex: Like what, what was the, the point in the first place?

[00:57:43] Jeremy: Yeah. And I suppose at that point, then anybody who knows how to use the RPA tool, but isn't familiar with that code you wrote, they're just, they can't maintain it

[00:57:54] Alex: you have business continuity and this goes back to our, it has to be replicable or close as close to the human process, as you can make. Because that's going to be the easiest to inherit and support. That's one of the great things about it. Whereas if you're a low level programmer, a dev who says, I can easily do this with a couple of lines of, you know, dot net or, you know, TypeScript or whatever.

And so the bot just starts up in executes. Well, unless someone that is just as proficient comes along later and says, this is why it's breaking you now have an unsupportable business, solution. that's bad Juju.

[00:58:38] Jeremy: you have the software engineers who they want to write code. then you have the people who are either in business or in IT that go, I don't want to look at your code.

I don't want to have to maintain it. Yeah. So it's like you almost, if you're a software engineer coming in, you almost have to fight that urge to, to write anything yourself and figure out, okay, what can I do with the tool set and only go to code if I can't do it any other way.

[00:59:07] Alex: That's correct. And that's the, it takes discipline. more often than not, not as fun as writing the code where you're like, I can do this. And this is really where the wheels come off is. You went to the business that is that I have this process, very simple. I need to do this and you say, cool, I can do that.

And then you're sitting there writing code and you're like, but you know what? I know what they really want to do. And I can write that now. And so you've changed the process and while it is, and nine times out of 10, the business will be like, oh, that's actually what we wanted. The human process was just as close as we could get nothing else, but you're right.

That's, that's exactly what we needed. Thank you nine times out of 10. They'll love you for that. But now you own their process. Now you're the one that defined it. You have to do the business continuity. You have to document it. And when it falls over, you have to pick it back up and you have to retrain.

And unless you have an organizational capacity to say, okay, I've gone in and changed your process. I didn't automate it. I changed it. Now I have to go in and tell you how I changed it and how you can do it. And so that, unless you have built your robotic operating model and your, your team to afford for that, your developer could be writing checks bigger than they can cash.

Even though this is a better capability.

[01:00:30] Jeremy: you, you sort of touched on this before, and I think this is probably the, the last topic we'll cover, but you've been saying how the end goal should be to not have to use the RPAs anymore

And I wonder if you have any advice for how to approach that process and, and what are some of the mistakes you've seen people make

[01:00:54] Alex: Mm Hmm. I mean the biggest mistake I've seen organizations make, I think is throwing the RPA solution out there, building bots, and they're great bots, and they are creating that value. They're enabling you to save money and also, enabling your employees to go on and do better, more gratifying work. but then they say, that's, it that's as far as we're going to think, instead of taking those savings and saying, this is for replacing this pain point that we had to get a bot in the first place to do so.

That's a huge common mistake. Absolutely understandable if I'm a CEO or even, you know, the person in charge of, you know, um, enterprise transformation. Um, it's very easy for me to say, ha victory, here's our money, here's our savings. I justified what we've done. Go have fun. Um, and instead of saying, we need to squirrel this money away and give it to the people that are going to change the system. So that, that's definitely one of the biggest things.

The problem with that is that's not realized until years later when they're like, oh, we're still supporting these bots. So it is upfront having a turnoff strategy. When can we turn this bot off? What is that going to look like? Does it have a roadmap that will eventually do that?

And that I think is the best way. And that will define what kind of processes you do indeed build bots for is you go to it and say, listen, we've got a lot of these user processes, human processes that are doing this stuff. Is there anything on your roadmap that is going to replace that and they say, oh yeah you know, in three years we're actually going to be standing up our new thing.

We're going to be converting. And part of our, uh, analysis of the solution that we will eventually stand up will be, does it do these things? And so yes, in three years, you're good. And you say, cool, those are the processes I'm going to automate and we can shut those off.

That's your point of entry for these things not doing that leads to bots running and doing things even after there is a enterprise solution for that. And more often than not, I would say greater than five times out of 10, when we are evaluating a process to build a bot for easily five times out of 10, we say, whoa, no, actually there's, you don't even need to do this.

Our enterprise application can do this. you just need retraining, because your process is just old and no one knew you were doing this. And so they didn't come in and tell you, Hey, you need to use this.

So that's really a lot of times what, what the issue is. And then after that, we go in and say, Okay.

no, there's, there's no solution for this. This is definitely a bot needs to do this. Let's make sure number one, that there isn't a solution on the horizon six months to a year, because otherwise we're just going to waste time, but let's make sure there is, or at least IT, or the people in charge are aware that this is something that needs to be replaced bot or no bot.

And so let's have an exit strategy. Let's have a turn-off strategy.

When you have applications that are relatively modern, like you have a JIRA, a ServiceNow, you know, they must have some sort of API and it may just be that nobody has come in and told them, you just need to plug these applications together.

[01:04:27] Alex: And so kind of what you're hitting on and surfacing is the future of RPA. Whereas everything we're talking about is using a bot to essentially bridge a gap, moving data from here to there that can't be done, programmatically. Accessing something from here to there that can't be done programmatically.

So we use a bot to do it. That's only going to exist for so long. Legacy can only be legacy for so long, although you can conceivably because we had that big COBOL thing, um, maybe longer than we we'd all like, but eventually these things will be. upgraded. and so either the RPA market will get smaller because there's less legacy out there.

And so RPA as a tool and a solution will become much more targeted towards specific systems or we expand what RPA is and what it can afford for. And so that I think is more likely the case. And that's the future where bots or automations aren't necessary interpreting the COM and the DOM and saying, okay, click here do that.

But rather you're able to quickly build bots that utilize APIs that are built in and friendly. And so what we're talking about there is things like Appian or MuleSoft, which are these kind of API integrators are eventually going to be classified as RPA. They're going to be within this realm.

And I think, where, where you're seeing that at least surfaced or moving towards is really what Microsoft's offering in that, where they, uh, they have something called power automate, which essentially is it just a very user-friendly way to access API. that they built or other people have built.

So I want to go and I need to get information to service now, service now has an API. Yeah. Your, IT can go in and build you a nice little app that does a little restful call to it, or a rest API call to it gets information back, or you can go in and, you know, use Microsoft power automate and say, okay, I want to access service now.

And it says, cool. These are the things you can do. And I say, okay, I just want to put information in this ticket and we're not talking about get or patch or put, uh, or anything like that. We're just saying, ah, that's what it's going to do. And that's kind of what Microsoft is, is offering. I think that is the new state of RPA is being able to interface in a user-friendly way with APIs. Cause everything's in the browser to the point. where, you know, Microsoft's enabling add ins for Excel to be written in JavaScript, which is just the new frontier. Um, so that's, that's kind of going to be the future state of this. I believe.

[01:07:28] Jeremy: so, so moving from RPAs being this thing, that's gonna click through website, click through, um, a desktop application instead it's maybe more of this high, higher level tool where the user will still get this, I forget the term you used, but this tool to build a workflow, right. A studio. Okay. Um, and instead of saying, oh, I want this to click this button or fill in this form.

It'll be, um, I want to get this information from service now. And I want to send a message using that information to slack or to Twilio, or, um, you're basically, talking directly to these different services and just telling it what you want and where it should go.

[01:08:14] Alex: That's correct. So, as you said, everything's going to have an API, right? Seemingly everything has an API. And so instead of us, our RPA bots or solutions being UI focused, they're going to be API focused, um, where it doesn't have to use the user interface. It's going to use the other service. And again, the cool thing about APIs in that way is that it's not, directly connecting to your data source.

It's the same as your UI for a user. It sits on top of it. It gets the request and it correctly interprets that. And does it the same thing with your UI where I say I click here and you know, wherever it says. okay. yeah. You're allowed to do that. Go ahead. So that's kind of that the benefit to that.

Um, but to your point, the, the user experience for whether you're using a UI or API to build up RPA bot, it's going to be the same experience for the user. And then at this point, what we're talking about, well, where's the value offering or what is the value proposition of RPA and that's orchestration and monitoring and data essentially.

we'll take care of hosting these for you. we'll take care of where they're going to run, uh, giving you a dashboard, things like that.

[01:09:37] Alex: That's a hundred percent correct. It's it's providing a view into that thing and letting the business say, I want to no code this. And I want to be able to just go in and understand and say, oh, I do want to do that. I'm going to put these things together and it's going to automate this business process that I hate, but is vital, and I'm going to save it, the RPA software enables you to say, oh, I saw they did that. And I see it's running and everything's okay in the world and I want to turn it on or off. And so it's that seamless kind of capability that that's what that will provide.

And I think that's really where it isn't, but really where it's going. Uh, it'll be interesting to see when the RPA providers switch to that kind of language because currently and traditionally they've gone to business and said, we can build you bots or no, no, your, your users can build bots and that's the value proposition they can go in.

And instead of writing an Excel where you had one very, very advanced user. Building macros into Excel with VBA and they're unknown to the, the IT or anybody else instead, you know, build a bot for it. And so that's their business proposition today.

Instead, it's going to shift, and I'd be interested to see when it shifts where they say, listen, we can provide you a view into those solutions and you can orchestrate them in, oh, here's the studio that enables people to build them.

But really what you want to do is give that to your IT and just say, Hey, we're going to go over here and address business needs and build them. But don't worry. You'll be able to monitor them and at least say, yeah okay. this is, this is going.

[01:11:16] Jeremy: Yeah. And that's a, a shift. It sounds like where RPA is currently, you were talking about how, when you're configuring them to click on websites and GUIs, you really do still need someone with the software expertise to know what's going on. but maybe when you move over to communicating with API, Um, maybe that won't be as important maybe, somebody who just knows the business process really can just use that studio and get what they need.

[01:11:48] Alex: that's correct. Right. Cause the API only enables you to do what it defined right. So service now, which does have a robust API, it says you can do these things the same as a user can only click a button that's there that you've built and said they can click. And so that is you can't go off the reservation as easy with that stuff, really what's going to become prime or important is as no longer do I actually have an Oracle server physically in my location with a database.

Instead I'm using Oracle's cloud capability, which exists on their own thing. That's where I'm getting data from. What becomes important about being able to monitor these is not necessarily like, oh, is it falling over? Is it breaking? It's saying, what information are you sending or getting from these things that are not within our walled garden.

And that's really where, it or the P InfoSec is, is going to be maybe the main orchestrator owner of RPA, because they're, they're going to be the ones to say you can't, you can't get that. You're not allowed to get that information. It's not necessarily that you can't do it. Um, and you can't do it in a dangerous way, but it's rather, I don't want you transporting that information or bringing it in.

So that's, that's really, what's the what's going to change.

[01:13:13] Jeremy: I think that's a good place to wrap it up, but, uh, is there anything we missed or anything else you want to plug before we go?

[01:13:21] Alex: No. Uh, I think this was uh pretty comprehensive and I really enjoyed it.

Alex thanks for coming on the show

[01:13:28] Alex: No, thank you for having me. It's been, it's been a joy.

[01:13:31] Jeremy: This has been Jeremy Jung for software engineering radio. Thanks for listening.

Deployment from Scratch with Josef Strzibny

Thu, 16 Sep 2021 04:32:57 -0000

Josef Strzibny is the author of Deployment from Scratch and a current Fedora contributor. He previously worked on the Developer Experience team at Red Hat.

This episode originally aired on Software Engineering Radio.

Links:

Transcript:

You can help edit this transcript on GitHub.

[00:00:00] Jeremy: Today, I'm talking to Josef Strzibny.

He's the author of the book deployment from scratch. A fedora contributor. And he previously worked on the developer experience team at red hat.

Josef welcome to software engineering radio.

[00:00:13] Josef: Uh, thanks for having me. I'm really happy to be here.

There are a lot of commercial services for hosting applications these days. One that's been around for quite a while is Heroku, but there's also services like render and Netlify. why should a developer learn how to deploy from scratch and why would a developer choose to self host an application

[00:00:37] Josef: I think that as a web engineers and backend engineers, we should know a little bit more how we run our own applications, that we write. but there is also a business case, right?

For a lot of people, this could be, uh, saving money on hosting, especially with managed databases that can go, high in price very quickly. and for people like me, that apart from daily job have also some side project, some little project they want to, start and maybe turn into a successful startup, you know but it's at the beginning, so they don't want to spend too much money on it, you know?

And, I can deploy and, serve my little projects from $5 virtual private servers in the cloud. So I think that's another reason to look into it. And business wise, if you are, let's say a bigger team and you have the money, of course you can afford all these services. But then what happened to me when I was leading a startup, we were at somewhere (?) and people are coming and asking us, we need to self host their application.

We don't trust the cloud. And then if you want to prepare this environment for them to host your application, then you also need to know how to do it. Right? I understand completely get the point of not knowing it because already backend development can be huge.

You know, you can learn so many different databases, languages, whatever, and learning also operations and servers. It can be overwhelming. I want to say you don't have to do it all at once. Just, you know, learn a little bit, uh, and you can improve as you go. Uh, you will not learn everything in a day.

[00:02:28] Jeremy: So it sounds like the very first reason might be to just have a better understanding of, of how your applications are, are running. Because even if you are using a service, ultimately that is going to be running on a bare machine somewhere or run on a virtual machine somewhere. So it could be helpful maybe for just troubleshooting or a better understanding how your application works.

And then there's what you were talking about with some companies want to self-host and, just the cost aspect.

[00:03:03] Josef: Yeah. for me, really, the primary reason would be to understand it because, you know, when I was starting programming, oh, well, first of there was PHP and I, I used some shared hosting thing, just some SFTP. Right. And they would host it for me. It was fine. Then I switched to Ruby on Rails and at the time, uh, people were struggling with deploying it and I was asking myself, so, okay, so you ran rails s like for a server, right. It starts in development, but can you just do that on the server for, for your production? You know, can you just rails server and is that it, or is there more to it? Or when people were talking about, uh, Linux hardening, I was like, okay, but you know, your Linx distribution have some good defaults, right.

[00:03:52] Jeremy: So why do you need some further hardening? What does it mean? What to change. So for me, I really wanted to know, uh, the reason I wrote this book is that I wanted to like double down on my understanding that I got it right. Yeah, I can definitely relate in the sense that I've also used Ruby and Ruby on rails as well. And there's this, this huge gap between just learning how to run it in a development environment on your computer versus deploying it onto a server and it's pretty overwhelming. So I think it's, it's really great that, that you're putting together a book that, that really goes into a lot of these things that I think that usually aren't talked about when people are just talking about learning a language.

[00:04:39] Josef: you can imagine that a lot of components you can have into this applications, right? You have one database, maybe you have more databases. Maybe you have a redis key-value store. Uh, then you might have load balancers and all that jazz. And I just want to say that there's one thing I also say in the book, like try to keep it simple. If you can just deploy one server, if you don't need to fulfill some SLE (SLA) uh, uptime, just do the simplest thing first, because you will really understand it. And when there was an error you will know how to fix it because when you make things complex for you, then it will be kind of lost, very quickly. So I try to really make things as simple as possible to stay on top of them.

[00:05:25] Jeremy: I think one of the first decisions you have to make, when you're going to self host an application is you have to decide which distribution you're going to use. And there's things like red hat and Ubuntu, and Debian and all these different distributions. And I'm wondering for somebody who just wants to deploy their application, whether that's rails, Django, or anything else, what are the key differences between them and, and how should they choose a distribution?

[00:05:55] Josef: if you already know one particular distribution, there's no need to constantly be on the hunt for a more shiny thing, you know, uh, it's more important that you know it well and, uh, you are not lost. Uh, that said there are differences, you know, and there could be a long list from goals and philosophy to who makes it whether community or company, if it's showing distribution or not, lack of support, especially for security updates, uh, the kind of init systems, uh, that is used, the kind of c library that is used packaging format, package manager, and for what I think most people will care about number of packages and the quality or version, right?

Because essentially the distribution is distribution of software. So you care about the software. If you are putting your own stuff, on top of it. you maybe don't care. You just care about it being a Linux distribution and that's it. That's fine. But if you are using more things from the distribution, you might star, start caring a little bit more.

You know, other thing is maybe a support for some mandatory access control or in the, you know, world of Docker, maybe the most minimal image you can get established because you will be building a lot of, a lot of times the, the Docker image from the Docker file. And I would say that two main family of systems that people probably know, uh, ones based on Fedora and those based on Debian, right from Fedora, you have, uh, Red Hat Enterprise Linux, CentOS, uh, Rocky Linux.

And on the Debian side you have Ubuntu which is maybe the most popular cloud distribution right now. And, uh, of course as a Fedora packager I'm kind of, uh, in the fedora world. Right. But if I can, if I can mention two things that I think makes sense or like our advantage to fedora based systems. And I would say one is modular packages because it's traditional systems for a long time or for only one version of particular component like let's say postgresql, uh, or Ruby, uh, for one big version.

So that means, uh, either it worked for you or it didn't, you know, with databases, maybe you could make it work. With ruby and python versions. usually you start looking at some version manager to compile their own version because the version was old or simply not the same, the one your application uses and with modular packages, this changed and now in fedora and RHEL and all this, We now have several options to install. There are like four different versions of postgresql for instance, you know, four different versions of redis, but also different versions of Ruby, python, of course still, you don't get all of the versions you want. So for some people, it still might not work, but I think it's a big step forward because even when I was working at Red Hat, we were working on a product called software collections.

This was kind of trying to solve this thing for enterprise customers, but I don't think it was particularly a good solution. So I'm quite happy about this modularity effort, you know, and I think the modular packages, I look into them recently are, are very better, but I will say one thing don't expect to use them in a way you use your regular version manager for development.

So, if you want to be switching between versions of different projects, that's not the use case for them, at least as I understand it, not for now, you know, but for server that's fine. And the second, second good advantage of Fedora based system, I think is good initial SELinux profile settings, you know, SE Linux is security enhanced Linux.

What it really is, is a mandatory access control. So, on usual distribution, you have a discrete permissions that you set that user set themselves on their directories and files, you know, but this mandatory access control means that it's kind of a profile that is there beforehand, the administrators prepares. And, it's kind of orthogonal to those other security, uh, boundaries you have there. So that will help you to protect your most vulnerable, uh, processes because especially with SELinux, there are several modes. So there is, uh, MLS (?) mode for like that maybe an army would use, you know, but for what we use, what's like the default, uh, it's uh, something called targeted policy.

And that means you are targeting the vulnerable processes. So that means your services that we are exposing to external world, like whether it's SSH, postgresql, nginx, all those things. So you have a special profile for them. And if someone, some, attacker takes over, of your one component, one process, they still cannot do much more than what the component was, uh, kind of prepared to do.

I think it's really good that you have this high-quality settings already made because other distributions, they might actually be able to run with SELinux. But they don't necessarily provide you any starting points. You will have to do all your policies yourself. And SELinux is actually a quite complex system, you know, it's difficult.

It's even difficult to use it as a user. Kind of, if you see some tutorials for CentOS, uh, you will see a lot of people mentioned SELinux maybe even turning it off, there's this struggle, you know, and that's why I also, use and write like one big chapter on SELinux to get people more familiar and less scared about using it and running with it.

[00:12:00] Jeremy: So SELinux is, it sounds like it's basically something where you have these different profiles for different types of applications. You mentioned SSH, for example, um, maybe there could be one for nginx or, or one for Postgres. And they're basically these collections of permissions that a process should be able to have access to whether that's, network ports or, file system permissions, things like that.

And they're, they're kind of all pre-packaged for you. So you're saying that if you are using a fedora based distribution, you could, you could say that, I want SSH to be allowed. So I'm going to turn on this profile, or I want nginx to be used on this system. So I'm going to turn on this profile and those permissions are just going to be applied to the process that that needs it is that is that correct?

[00:12:54] Josef: Well, actually in the base system, there will be already a set of base settings that are loaded, you know, and you can make your own, uh, policy models that you can load. but essentially it works in a way that, uh, what's not really permitted and allowed is disallowed.

that's why it can be a pain in the ass. And as you said, you are completely correct. You can imagine it as um nginx as a reverse proxy, communicating with Puma application server via Unix socket, right? And now nginx will need to have access to that socket to be even being able to write to a Unix socket and so on.

So things like that. Uh, but luckily you don't have to know all these things, because it's really difficult, especially if you're starting up. Uh, so there are set of tools and utilities that will help you to use SELinux in a very convenient way. So what you, what you do, what I will suggest you to do is to run SELinux in a permissive mode, which means that, uh, it logs any kind of violations that application does against your base system policies, right?

So you will have them in the log, but everything will work. Your application will work. So we don't have to worry about it. And after some time running your application, you've ran these utilities to analyze these logs and these violations, and they can even generate a profile for you. So you will know, okay, this is the profile I need.

This is the access to things I need to add. once after you do that, if, if there will be some problems with your process, if, if some article will try to do something else, they will be denied.

That action is simply not happening. Yeah. But because of the utilities, you can kind of almost automate how, how you make a profile and that way is much, much easier.

Yeah.

[00:14:54] Jeremy: So, basically the, the operating system, it comes with all these defaults of things that you're allowed to do and not allowed to do, you turn on this permissive flag and it logs all the things that it would have blocked if you, were enforcing SELinux. And then you can basically go in and add the things that are, that are missing.

[00:15:14] Josef: Yes exactly right.

[00:15:16] Jeremy: the, next thing I'd like to go into is, one of the things you talk about in the book is about how your services, your, your application, how it runs, uh, as, as daemons. And I wonder if you could define what a daemon is?

[00:15:33] Josef: Uh, you can think about them as a, as a background process, you know, something that continuously runs In the background. Even if the virtual machine goes down and you reboot, you just want them again to be restarted and just run at all times the system is running.

[00:15:52] Jeremy: And for things like an application you write or for a database, should the application itself know how to run itself in the background or is that the responsibility of some operating system level process manager?

[00:16:08] Josef: uh, every Linux operating system has actually, uh, so-called init system, it's actually the second process after the Linux kernel that started on their system, it has a process ID of one. And it's essentially the parent of all your processes because on Linux, you have always parents and children. Because you use forking to make new, make new processes. And so this is your system process manager, but obviously systemd if it's your system process manager, you already trusted with all the systems services, you can also trust them with your application, right? I mean, who else would you trust even if you choose some other purchase manager, because there are many, essentially you would have to wrap up that process manager being a systemd service, because otherwise there is, you wouldn't have this connection of systemd being a supreme supervisor of your application, right?

When, uh, one of your services struggle, uh, you want it to be restarted and continue. So that's what a systemd could do for you. If you, you kind of design everything as a systemd service, for base packages like base postgresql they've already come with a systemd services, very easy to use. You just simply start it and it's running, you know, and then for your application, uh, you would write a systemd service, which is a little file.

There are some directives it's kind of a very simple and straightforward, uh, because before, before systemd people were using the services with bash and it was kind of error prone, but now with systemd it's quite simple. They're just a set of directives, uh, that you learn. you tell systemd, you know, under what user you should run, uh, what working directory you want it to be running with.

Uh, is there a environment file? Is there a pidfile? And then, uh, A few other things. The most important being a directive called ExecStart, which tells systemd what process to start, it will start a process and it will simply oversee oversee it and will look at errors and so on.

[00:18:32] Jeremy: So in the past, I know there used to be applications that were written where the application itself would background itself. And basically that would allow you to run it in the background without something like a systemd. And so it sounds like now, what you should do instead is have your application be built to just run in the foreground.

and your process manager, like systemd can be configured to, um, handle restarting it, which user is running it. environment variables, all sorts of different things that in the past, you might've had to write in your own bash script or write into the application itself.

[00:19:14] Josef: And there's also some. other niceties about systemd because for example, you can, you can define how reloading should work. So for instance, you've just changed some configuration and you've want to achieve some kind of zero downtime, ah, change, zero downtime deploy, you know, uh, you can tell systemd how this could be achieved with your process and if it cannot be achieved, uh, because for instance, uh, Puma application server.

It can fork processes, and it can actually, it can restart those processes in a way that it will be zero downtime, but when you want to change to evolve (?) Puma process. So what do you do, right? And uh systemd have this nice uh thing called, uh, socket activation. And with system socket activation, you can make another unit.

Uh, it's not a service unit. It's a socket unit there are many kinds of units in systemd. And, uh, you will basically make a socket unit that would listen to those connections and then pass them to the application. So while application is just starting and then it could be a completely normal restart, which means stopping, starting, uh, then it will keep the connections open, keep the sockets open and then pass them. when the application is ready to, to process them.

[00:20:42] Jeremy: So it sounds like if, and the socket you're referring to these would be TCP sockets, for example, of someone trying to access a website.

[00:20:53] Josef: Yes, but actually worked with Unix. Uh, socket as well. Okay.

[00:20:58] Jeremy: so in, in that example, Um, let's say a user is trying to go to a website and your service is currently down. You can actually configure systemd to, let the user connect and, and wait for another application to come back up and then hand that connection off to the application once it's, once it's back up.

[00:21:20] Josef: yes, exactly. That, yeah.

[00:21:23] Jeremy: you're basically able to remove some of the complexity out of the applications themselves for some of these special cases and, and offload those to, to systemd.

[00:21:34] Josef: because yeah, otherwise you would actually need a second server, right? Uh, you will have to, uh, start second server, move traffic there and upgrade or update your first server. And exchange them back and with systemd socket activation you can avoid doing that and still have this final effect of zero downtime deployment.

[00:21:58] Jeremy: So the, this, this introduction of systemd as the process manager, I think there's, this happened a few years ago where a lot of Linux distributions moved to using systemd and there, there was some, I suppose, controversy around that. And I'm kind of wondering, um, if you have any perspective on, on why there's some people who, really didn't want that to happen, know, why, why that's something people should worry about or, or, or not.

[00:22:30] Josef: Yeah. there were, I think there were few things, One one was for instance, the system logging that suddenly became a binary format and you need a special utility to, to read it. You know, I mean, it's more efficient, it's in a way better, but it's not plain text rich, all administrators prefer or are used to. So I understand the concern, you know, but it's kind of like, it's fine.

You know, at least to me, it it's fine. And the second, the second thing that people consistently force some kind of system creep because uh systemd is trying to do more and more every year. So, some people say it's not the Unix way, uh systemd should be very minimal in its system and not do anything else.

It's it's partially true, but at the same time, the things that systemd went into, you know, I think they are essentially easier and nice to use. And this is the system, the services I can say. I certainly prefer how it's done now,

[00:23:39] Jeremy: Yeah. So it sounds like we've been talking about systemd as being this process manager, when the operating system first boots systemd starts, and then it's responsible for starting, your applications or other applications running on the same machine. Uh, but then it's also doing all sorts of other things.

Like you talked about that, that socket activation use case, there's logging. I think there's also, scheduled jobs. There's like all sorts of other things that are part of systemd and that's where some people, disagree on whether it should be one application that's handling all these things.

[00:24:20] Josef: Yeah. Yeah. Uh, you're right with the scheduling job, like replacing Cron, you have, now two ways how to do it. But, you can still pretty much choose, what you use, I mean, I still use Cron, so I don't see a trouble there. we'll see. We'll see how it goes.

[00:24:40] Jeremy: One of the things I remember I struggled with a little bit when I was learning to deploy applications is when you're working locally on your development machine, um, you have to install a language runtime in a lot of cases, whether that's for Ruby or Python, uh, Java, anything like that. And when someone is installing on their own machine, they often use something like a, a version manager, like for example, for Ruby there's rbenv and, for node, for example, there's, there's NVM, there's all sorts of, ways of installing language, run times and managing the versions.

How should someone set up their language runtime on a server? Like, would they use the same tools they use on their development machine or is it something different.

[00:25:32] Josef: Yeah. So there are several ways you can do, as I mentioned before, with the modular packages, if you find the version there. I would actually recommend try to do it with the model package because, uh, the thing is it's so easy to install, you know, and it's kind of instant. it takes no time on your server.

It's you just install it. It's a regular package. same is true when building a Docker, uh, docker image, because again, it will be really fast. So if you can use it, I would just use that because it's like kind of convenient, but a lot of people will use some kind of version manager, you know, technically speaking, they can only use the installer part.

Like for instance, chruby with ruby-install to install new versions. Right. but then you would have to reference these full paths to your Ruby and very tedious. So what I personally do, uh, I just really set it up as if I am on a developer workstation, because for me, the mental model of that is very simple.

I use the same thing, you know, and this is true. For instance, when then you are referencing what to start in this ExecStart directive and systedD you know, because you have several choices. For instance, if you need to start Puma, you could be, you could be referencing the address that is like in your user home, .gem, Ruby version number bin Puma, you know, or you can use this version manager, they might have something like chruby-exec, uh, to run with their I (?) version of Ruby, and then you pass it, the actual Puma Puma part, and it will start for you, but what you can also do.

And I think it's kind of beautiful. You can do it is that you can just start bash, uh, with a login shell and then you just give it the bundle exec Puma command that you would use normally after logging. Because if you install it, everything, normally, you know, you have something.

you know, bashprofile that will load that environment that will put the right version of Ruby and suddenly it works.

And I find it very nice. Because even when you are later logging in to your, your, uh, box, you log in as that user as that application user, and suddenly you have all the environment, then it just can start things as you are used to, you know, no problem there.

[00:28:02] Jeremy: yeah, something I've run in into the past is when I would install a language runtime and like you were kind of describing, I would have to type in the, the full path to, to get to the Ruby runtime or the Python runtime. And it sounds like what you're saying is, Just install it like you would on your development machine.

And then in the systemd configuration file, you actually log into a bash shell and, and run your application from the bash shell. So it has access to the, all the same things you would have in an interactive, login environment. Is that, is that right?

[00:28:40] Josef: yeah, yeah. That's exactly right. So it will be basically the same thing. And it's kind of easy to reason about it, you know, like you can start with that might be able to change it later to something else, but, it's a nice way of how to do it.

[00:28:54] Jeremy: So you mentioned having a user to run your application. And so I'm wondering how you decide what Linux users should run your applications. Are you creating a separate user for each application you run? Like, how are you making those decisions?

[00:29:16] Josef: yes, I am actually making a new user for, for my application. Well, at least for the part of the application, that is the application server and workers, you know, so nginx um, might have own user, postgresql might have his own user, you know, I'm not like trying to consolidate that into one user, but, uh, in terms of rails application, like whatever I run Puma or whenever I run uh sidekiq, that will be part of the one user, you know, application user.

Uh, and I will appropriately set the right access to the directories. Uh, so it's isolated from everything else,

[00:30:00] Jeremy: Something that I've seen also when you are installing Ruby or you're installing some other language runtime, you have. The libraries, like in the case of Ruby there's there's gems. and when you're on your development machine and you install these, these gems, these packages, they, they go into the user's home directory.

And so you're able to install and use them without having let's say, um, sudo or root access. is that something that you carry over to your, your deployments as well, or, or do you store your, your libraries and your gems in some place that's accessible outside of that user? I'm just wondering how you approach it.

[00:30:49] Josef: I would actually keep it next to next to my application, this kind of touches maybe the question or where to put your application files on the system. so, uh, there is something called FHS, file system hierarchy standard, you know, that, uh, Linux distributions use, they, of course use it with some little modifications here and there.

And, uh, this standard is basically followed by packagers and enforced in package repositories. Uh, but other than that, it's kind of random, you know, it could be a different path and, uh, it says where certain files should go. So you have /home we have /usr/bin for executables. /var for logs and so on and so on.

And now when you want to put your, your application file somewhere, you are thinking about to put them, right. Uh, you have essentially, I think like three options, for, for one, you can put it to home because it's, as we talked about, I set up a dedicated user for that application. So it could make sense to put it in home.

Why I don't like putting it at home is because there are certain labeling in SELinux that kind of, makes your life more difficult. it's not meant to be there, uh, essentially on some other system. Uh, without SELinux, I think it works quite fine. I also did before, you know, it's not like you cannot do it.

You can, uh, then you have, the, kind of your web server default location. You know, like /usr/share/nginx/html, or /var/www, and these systems will be prepared for you with all these SELinux labeling. So when you put files there, uh, things will mostly work, but, and I also saw a lot of people do that because this particular reason, what I don't like about it is that if nginx is just my reverse proxy, you know, uh, it's not that I am serving the files from there.

So I don't like the location for this reason. If it will be just static website, absolutely put it there that's the best location. then you can put it to some arbitrary location, some new one, that's not conflicting with anything else. You know, if you want to follow the a file system hierarchy standard, you put it to /srv, you know, and then maybe slash the name of the application or your domain name, hostname you can choose, what you like.

Uh, so that's what I do now. I simply do it from scratch to this location. And, uh, as part of the SELinux, I simply make a model, make a, make a profile, uh, an hour, all this paths to work. And So to answer your question where I would put this, uh, gems would actually go to this, to this directory, it will be like /apps/gems, for instance.

there's a few different places people could put their application, they could put it in the user's home folder, but you were saying because of the built-in SELinux rules SELinux is going to basically fight you on that and prevent you from doing a lot of things in that folder.

[00:34:22] Jeremy: what you've chosen to do is to, to create your own folder, that I guess you described it as being somewhat arbitrary, just being a folder that you consistently are going to use in all your projects. And then you're going to configure SELinux to allow you to run, uh, whatever you want to run from this, this custom folder that you've decided.

[00:34:44] Josef: Yeah, you can say that you do almost the same amount of work for home or some other location I simply find it cleaner to do it this way and in a way. I even fulfilled the FHS, uh, suggestion, to put it to /srv but, uh, yeah, it's completely arbitrary. You can choose anything else. Uh, sysadmins choose www or whatever they like, and it's fine.

It'll work. There's there's no problem. There. And, uh, and for the gems, actually, they could be in home, you know, but I just instruct bundler to put it to that location next to my application.

[00:35:27] Jeremy: Okay. Rather than, than having a common folder for multiple applications to pull your libraries or your gems from, uh, you have it installed in the same place as the application. And that just keeps all your dependencies in the same place.

[00:35:44] Josef: Yep,

[00:35:45] Jeremy: and the example you're giving, you're, you're putting everything in /srv/ and then maybe the name of your application. Is that right?

[00:35:55] Josef: Yeah.

[00:35:55] Jeremy: Ok. Yeah. Cause I've, I've noticed that, Just looking at different systems. I've seen people install things into /opt. installed into /srv and it can just be kind of, tricky as, as somebody who's starting out to know, where am I supposed to put this stuff?

So, so basically it sounds like just, just pick a place and, um, at least if it's in slash srv then sysadmins who are familiar with, the, the standard file system hierarchy will will know to, to look at.

[00:36:27] Josef: yeah. Yeah. opt is also a yeah, common location, as you say, or, you know, if it's actually a packaged web application fedora it can even be in /usr/share, you know? So, uh, it might not be necessarily in locations we talked about before

one of the things you cover in the book is. Setting up a deployment system and you're using, shell scripts in the case of the book. And I was wondering how you decide when shell scripts are sufficient and when you should consider more specialized tools like Ansible or chef puppet, things like.

[00:37:07] Josef: yeah, I chose bash in the book because you get to see things without obstructions. You know, if I would be using, let's say Ansible and suddenly we are writing some YAML files and, uh, you are using a lot of, lot of Python modules to Ansible use and you don't really know what's going on at all times. So you learn to do things with ansible 2.0, let's say, and then new ansible comes out and you have to rely on what you did, you know, and I've got to rewrite the book. Uh, but the thing is that, with just Bash I can show, literally just bash commands, like, okay, you run this and this happens, And, another thing uh why I use it is that you realize how simple something can be.

Like, you can have a typical cluster with sssh, uh, and whatever in maybe 20 bash commands around that, so it's not necessarily that difficult and, uh, it's much easier to actually understand it if it's just those 20, uh, 20 bash comments. Uh, I also think that learning a little bit more about bash is actually quite beneficial because you encounter them in various places.

I mean, RPM spec files, like the packages are built. That's bash, you know, language version managers, uh, like pyenv rbenv that's bash. If you want to tweak it, if you have a bug there, you might look into source code and try to fix it. You know, it will be bash. Then Docker files are essentially bash, you know, their entry points scripts might be bash.

So it's not like you can avoid bash. So maybe learning a little bit. Just a little bit more than, you know, and be a little bit more comfortable. I think it can get you a long way because even I am not some bash programmer, you know, I would never call myself like that. also consider this like, uh, you can have full featured rails application, maybe in 200 lines of bash code up and running somewhere.

You can understand it in a afternoon, so for a small deployment, I think it's quite refreshing to use bash and some people miss out on not just doing the first simple thing possible that they can do, but obviously when you go like more team members, more complex applications or a suite of applications, things get difficult, very fast with bash.

So obviously most people will end up with some higher level too. It can be Ansible. Uh, it can be chef, it might be Kubernetes, you know, so, uh, my philosophy, uh, again, it's just to keep it simple. If I can do something with bash and it's like. 100 lines, I will do this bash because when I come back to it in, after three years, it will work and I can directly see what I have to fix.

You know, if there's a postgresql update at this new location whatever, I, I immediately know what to look and what to change. And, uh, with high-level tooling, you kind of have to stay on top of them, the new versions and, updates. So that's the best is very limited, but, uh, it's kind of refreshing for very small deployment you want to do for your side project.

[00:40:29] Jeremy: Yeah. So it sounds like from a learning perspective, it's beneficial because you can see line by line and it's code you wrote and you know exactly what each thing does. Uh, but also it sounds like when you have a project that's relatively small, maybe there, there aren't a lot of different servers or, the deployment process isn't too complicated.

You actually choose to, to start with bash and then only move to, um, something more complicated like Ansible or, or even Kubernetes. once your project has, has gotten to a certain size.

[00:41:03] Josef: you, you would see it in the book. I even explain a multiple server deployment using bash uh, where you can actually keep your components like kind of separate. So like your database have its own life cycle has its own deploy script and your load balancer the same And even when you have application servers.

Maybe you have more of them. So the nice thing is that when you first write your first script to provision one server configure one server, then you simply, uh, write another Uh, supervising script, that would call this single script just in the loop and you will change the server variable to change the IP address or something.

And suddenly you can deploy tomorrow. Of course, it's very basic and it's, uh, you know, it doesn't have some, any kind of parallelization to it or whatever, but if you have like three application servers, you can do it and you understand it almost immediately. You know, if you are already a software engineer, there's almost nothing to understand and you can just start and keep going.

[00:42:12] Jeremy: And when you're deploying to servers a lot of times, you're dealing with credentials, whether that's private keys, passwords or, keys to third-party APIs. And when you're working with this self hosted environment, working with bash scripts, I was wondering what you use to store your credentials and, and how those are managed.

I use a desktop application called password safe, uh, that can save my passwords and whatever. and you can also put their SSH keys, uh, and so on.

[00:42:49] Josef: And then I simply can do a backup of this keys and of this password to some other secure physical location. But basically I don't use any service, uh, online for that. I mean, there are services for that, especially for teams and in clouds, especially the, big clouds they might have their own services for that, but for me personally, again, I just, I just keep it as simple as I can. It's just on my, my computer, maybe my hard disk. And that's it. It's nowhere else.

[00:43:23] Jeremy: So, so would this be a case of where on your local machine, for example, you might have a file that defines all the environment variables for each server. you don't check that into your source code repository, but when you run your bash scripts, maybe read from that file and, use that in deploying to the server?

[00:43:44] Josef: Yeah, Uh, generally speaking. Yes, but I think with rails, uh, there's a nice, uh, nice option to use, their encrypted credentials. So basically then you can commit all these secrets together with your app and the only thing you need to keep to yourself, it's just like one variable. So it's much more easy to store it and keep it safe because it's just like one thing and everything else you keep inside your repository.

I know for sure there are other programs that we have in the same way that can be used with different stacks that doesn't have this baked in, because rails have have it baked in. But if you are using Django, if you are using Elixir, whatever, uh, then they don't have it. But I know that there are some programs I don't remember the names right now, but, uh, they essentially allow you to do exactly the same thing to just commit it to source control, but in a secure way, because it's, encrypted.

[00:44:47] Jeremy: Yeah, that's an interesting solution because you always hear about people checking in passwords and keys into their source code repository. And then, you know, it gets exposed online somehow, but, but in this case, like you said, it's, it's encrypted and, only your machine has the key. So, that actually allows you to, to use the source code, to store all that.

[00:45:12] Josef: Yeah. I think for teams, you know, for more complex deployments, there are various skills, various tools from HashiCorp vault, you know, to some cloud provider's things, but, uh, you can really start And, keep it very, very simple.

[00:45:27] Jeremy: For logging an application that you're, you're self hosting. There's a lot of different managed services that exist. Um, but I was wondering what you use in a self hosted environment and, whether your applications are logging to standard out, whether they're writing to files themselves, I was wondering how you typically approach that.

[00:45:47] Josef: Yeah. So there are lots of logs you can have, right from system log, your web server log application log, database log, whatever. and you somehow need to stay on top of them because, uh, when you have one server, it's quite fine to just look in, in and look around. But when there are more servers involved, it's kind of a pain and uh so people will start to look in some centralized logging system.

I think when you are more mature, you will look to things like Datadog, right. Or you will build something of your own on elastic stack. That's what we do on the project I'm working on right now. But I kind of think that there's some upfront costs uh, setting it all up, you know, and in terms of some looking at elastic stack we are essentially building your logging application.

Even you can say, you know, there's a lot of work I also want to say that you don't look into your logs all that often, especially if you set up proper error and performance monitoring, which is what I do with my project is one of the first thing I do.

So those are services like Rollbar and skylight, and there are some that you can self host so if people uh, want to self host them, they can. But I find it kind of easier to, even though I'm self hosting my application to just rely on this hosted solution, uh, like rollbar, skylight, appsignal, you know, and I have to say, especially I started to like appsignal recently because they kind of bundle everything together.

When you have trouble with your self hosting, the last thing you want to find yourself in a situation when your self hosted logs and sources, error reporting also went down. It doesn't work, you know, so although I like self-hosting my, my application.

[00:47:44] Josef: I kind of like to offload this responsibility to some hosted hosted providers.

[00:47:50] Jeremy: Yeah. So I think that in and of itself is a interesting topic to cover because we've mostly been talking about self hosting, your applications, and you were just saying how logging might be something that's actually better to use a managed service. I was wondering if there's other. Services, for example, CDNs or, or other things where it actually makes more sense for you to let somebody else host it rather than your

[00:48:20] Josef: I think that depends. Logging for me. It's obvious. and then I think a lot of, lots of developers kind of fear databases. So there are they rather have some kind of, one click database you know, replication and all that jazz back then so I think a lot of people would go for a managed database, although it may be one of those pricy services it's also likes one that actually gives you a peace of mind, you know? maybe I would just like point out that even though you get all these automatic backups and so on, maybe you should still try to make your own backup, just for sure. You know, even someone promised something, uh, your data is usually the most valuable thing you have in your application, so you should not lose it.

And some people will go maybe for load balancer, because it's may be easy to start. Like let's say on DigitalOcean, you know, uh, you just click it and it's there. But if you've got opposite direction, if you, for instance, decide to, uh, self host your uh load balancer, it can also give you more, options what to do with that, right?

Because, uh, you can configure it differently. You can even configure it to be a backup server. If all of your application servers go down. Which is maybe could be interesting use case, right? If you mess up and your application servers are not running because you are just messing with, with them.

Suddenly it's okay. Because your load balancers just takes on traffic. Right. And you can do that if it's, if it's your load balancer, the ones hosted are sometimes limited. So I think it comes to also, even if the database is, you know, it's like maybe you use some kind of extension that is simply not available. That kind of makes you, uh, makes you self host something, but if they offer exactly what you want and it's really easy, you know, then maybe you just, you just do it.

And that's why I think I kind of like deploying to uh, virtual machines, uh, in the cloud because you can mix and match all the services do what you want and, uh, you can always change the configurations to fit, to, uh, meet your, meet your needs. And I find that quite, quite nice.

[00:50:39] Jeremy: One of the things you talk about near the end of your book is how you, you start with a single server. You have the database, the application, the web server, everything on the same machine. And I wonder if you could talk a little bit about how far you can, you can take that one server and why people should consider starting with that approach.

Uh, I'm not sure. It depends a lot on your application. For instance, I write applications that are quite simple in nature. I don't have so many SQL calls in one page and so on.

[00:51:13] Josef: But the applications I worked for before, sometimes they are quite heavy and, you know, even, with little traffic, they suddenly need a more beefy server, you know, so it's a lot about application, but there are certainly a lot of good examples out there. For instance. The team, uh, from X-Plane flight simulator simulator, they just deploy to one, one server, you know, the whole backend all those flying players because it's essentially simple and they even use elixir which is based on BEAM VM, which means it's great for concurrency for distributed systems is great for multiple servers, but it's still deployed to one because it's simple. And they use the second only when they do updates to the service and otherwise they can, they go back to one.

ANother one would be maybe Pieter Levels (?) a maker that already has like a $1 million business. And it's, he has all of his projects on one server, you know, because it's enough, you know why you need to make it complicated. You can go and a very profitable service and you might not leave one server. It's not a problem. Another good example, I think is stackoverflow. They have, I think they have some page when they exactly show you what servers they are running. They have multiple servers, but the thing is they have only a few few servers, you know, so those are the examples that goes against maybe the chant of spinning up hundreds of servers, uh, in the cloud, which you can do.

It's easy, easier when you have to do auto scaling, because you can just go little by little, you know, but, uh, I don't see the point of having more servers. To me. It means more work. If I can do it, if one, I do it. But I would mention one thing to pay attention to, when you are on one server, you don't want suddenly your background workers exhaust all the CPU so that your database cannot serve, uh, your queries anymore right? So for that, I recommend looking into control groups or cgroups on Linux. When you create a simple slice, which is where you define how much CPU power, and how much memory can be used for that service. And then you attach it to, to some processes, you know, and when we are talking about systemd services.

They actually have this one directive, uh, where you specify your, uh, C group slice. And then when you have this worker server and maybe it even forks because it runs some utilities, right? For you to process images or what not, uh, then it will be all contained within that C group. So it will not influence the other services you have and you can say, okay, you know, I give worker service only 20% of my CPU power because I don't care if they make it fast or not.

It's not important. Important is that, uh, every visitor still gets its page, you know, and it's, they are working, uh, waiting for some background processes so they will wait and your service is not going down.

[00:54:34] Jeremy: yeah. So it sort of sounds like the difference between if you have a whole bunch of servers, then you have to, Have some way of managing all those servers, whether that's Kubernetes or something else. Whereas, um, an alternative to that is, is having one server or just a few servers, but going a little bit deeper into the capabilities of the operating system, like the C groups you were referring to, where you could, you could specify how much CPU, how much Ram and, and things, for each service on that same machine to use.

So it's kind of. Changing it, I don't know if it's removing work, but it's, it's changing the type of work you do.

[00:55:16] Josef: Yeah, you essentially maybe have to think about it more in a way of this case of splitting the memory or CPU power. Uh, but also it enables you to use, for instance, Unix sockets instead of TCP sockets and they are faster, you know, so in a way it can be also an advantage for you in some cases to actually keep it on one server.

And of course you don't have a network trip so another saving. So to get there, that service will be faster as long as it's running and there's no problem, it will be faster. And for high availability. Yeah. It's a, it's obviously a problem. If you have just one server, but you also have to think about it in more complex way to be high available with all your component components from old balancers to databases, you suddenly have a lot of things.

You know, to take care and that set up might be complex, might be fragile. And maybe you are better off with just one server that you can quickly spin up again. So for instance, there's any problem with your server, you get alert and you simply make a new one, you know, and if you can configure it within 20, 30 minutes, maybe it's not a problem.

Maybe even you are still fulfilling your, uh, service level contract for uptime. So I think if I can go this way, I prefer it simply because it's, it's so much easy to, to think about it. Like that.

[00:56:47] Jeremy: This might be a little difficult to, to answer, but when you, you look at the projects where you've self hosted them, versus the projects where you've gone all in on say AWS, and when you're trying to troubleshoot a problem, do you find that it's easier when you're troubleshooting things on a VM that you set up or do you find it easier to troubleshoot when you're working with something that's connecting a bunch of managed services?

[00:57:20] Josef: Oh, absolutely. I find it much easier to debug anything I set on myself, uh, and especially with one server it's even easier, but simply the fact that you build it yourself means that you know how it works. And at any time you can go and fix your problem. You know, this is what I found a problem with services like digital ocean marketplace.

I don't know how they call this self, uh, hosted apps that you can like one click and have your rails django app up, up and running. I actually used when I, uh, wasn't that skilled with Linux and all those things, I use a, another distribution called. A turnkey Linux. It's the same idea. You know, it's like that they pre prepare the profile for you, and then you can just easily run it as if it's a completely hosted thing like heroku, but actually it's your server and you have to pay attention, but I actually don't like it because.

You didn't set it up. You don't know how it's set up. You don't know if it has some problems, some security issues. And especially the people that come for these services then end up running something and they don't know. I believe they don't know because when I was running it, I didn't know. Right. So they are not even know what they are running.

So if you really don't want to care about it, I think it's completely fine. There's nothing wrong with that. But just go for that render or heroku. And make your life easier, you know,

[00:58:55] Jeremy: Yeah, it sounds like the solutions where it's like a one-click install on your own infrastructure. you get the bad parts of, of both, like you get the bad parts of having this machine that you need to manage, but you didn't set it up. So you're not really sure how to manage it.

you don't have that team at Amazon who, can fix something for you because ultimately it's still your machine. So That could have some issues there.

[00:59:20] Josef: Yeah. Yeah, exactly. I will. I would recommend it or if you really decide to do it, at least really look inside, you know, try to understand it, try to learn it, then it's fine. But just to spin it up and hope for the best, uh, it's not the way to go

[00:59:37] Jeremy: In, in the book, you, you cover a few different things that you use such as Ruby on rails and nginx, Redis, postgres. Um, I'm assuming that the things you would choose for applications you build in self hosts. You want them to have as little maintenance as possible because you're the one who's responsible for all of it.

I'm wondering if there's any other, applications that you consider a part of your default stack that you can depend on. And, that the, the maintenance burden is, is low.

[01:00:12] Josef: Yeah. So, uh, the exactly right. If I can, I would rather minimize the amount of, uh, dependencies I have. So for instance, I would think twice of using, let's say elastic search, even though I used it before. And it's great for what it can do. Uh, if I can avoid it, maybe I will try to avoid it. You know, you can have descent full text search with Postgres today.

So as long as it would work, I would uh, personally avoid it. Uh, I think one relation, uh, database, and let's say redis is kind of necessary, you know, I I've worked a lot with elixir recently, so we don't use redis for instance. So it's kind of nice that you can limit, uh, limit the number of dependencies by just choosing a different stack.

Although then you have to write your application in a little different way. So sometimes even, yeah. In, in such circumstances today, this could be useful. You know, I, I think, it's not difficult to, to run it, so I don't see, I don't see a problem there. I would just say that with the services, like, uh, elastic search, they might not come with a good authentication option.

For instance, I think asked et cetera, offers it, but not in the free version. You know, so I would just like to say that if you are deploying a component like that, be aware of it, that you cannot just keep it completely open to the world, you know? Uh, and, uh, maybe if you don't want to pay for a version that has it, or maybe are using it at the best, it doesn't have it completely.

You could maybe build out just a little bit tiny proxy. That would just do authentication and pass these records back and forth. This is what you could do, you know, but just not forget that, uh, you might run something unauthenticated.

I was wondering if there is any other, applications or capabilities where you would typically hand off to a managed service rather than, than trying to deal with yourself.

[01:02:28] Josef: Oh, sending emails, not because it's hard. Uh, it's actually surprisingly easy to start sending your own emails, but the problem is, uh, the deliverability part, right? Uh, you want your emails to be delivered and I think it's because of the amount of spam everybody's sending.

It's very difficult to get into people's boxes. You know, you simply be flagged, you have some unknown address, uh, and it would just it would just not work. So actually building up some history of some IP address, it could take a while. It could be very annoying and you don't even know how to debug it. You, you cannot really write Google.

Hey, you know, I'm, I'm just like this nice little server so just consider me. You cannot do that. Uh, so I think kind of a trouble. So I would say for email differently, there's another thing that just go with a hosted option. You might still configure, uh, your server to be sending up emails. That could be useful.

For instance, if you want to do some little thing, like scanning your system, a system log and when you see some troublesome. Logging in all that should, it shouldn't happen or something. And maybe you just want an alert on email to be sent to you that something fishy is going on. And so you, you can still set up even your server, not just your main application and might have a nice library for that, you know, to send that email, but you will still need the so-called relay server. to just pass your email. You. Yeah, because building this trust in an email world, that's not something I would do. And I don't think as a, you know, independent in the maker developer, you can really have resources to do something like that. So will be a perfect, perfect example for that. Yeah.

[01:04:22] Jeremy: yeah, I think that's probably a good place to start wrapping up, but is there anything we missed that you think we should have talked about?

[01:04:31] Josef: we kind of covered it. Maybe, maybe we didn't talk much about containers, uh, that a lot of people nowadays, use. uh, maybe I would just like to point out one thing with containers is that you can, again, do just very minimal approach to adopting containers. You know, uh, you don't need to go full on containers at all.

You can just run a little service, maybe your workers in a container. For example, if I want to run something, uh, as part of my application, the ops team, the developers that develop this one component already provide a Docker file. It's very easy way to start, right? Because you just deployed their image and you run it, that's it.

And they don't have to learn what kind of different stack it is, is a Java, is it python, how I would turn it. So maybe you care for your own application, but when you have to just take something that's already made, and it has a Docker image, you just see the nice way to start. And one more thing I would like to mention is that you also don't really need, uh, using services like Docker hub.

You know, most people would use it to host their artifacts that are built images, so they can quickly pull them off and start them on many, many servers and blah, blah. But if you have just one server like me, but you want to use containers. And I think it's to just, you know, push the container directly.

Essentially, it's just an archive.

And, uh, in that archive, there are few folders that represent the layers. That's the layers you build it. And the Docker file and that's it. You can just move it around like that, and you don't need any external services to run your content around this little service.

[01:06:18] Jeremy: Yeah. I think that's a good point because a lot of times when you hear people talking about containers, uh, it's within the context of Kubernetes and you know, that's a whole other thing you have to learn. You have to learn not only, uh, how containers work, but you have to learn how to deploy Kubernetes, how to work with that.

And, uh, I think it's, it's good to remind people that it is possible to, to just choose a few things, run them as containers. Uh, you don't need to. Like you said, even run, everything as containers. You can just try a few things.

[01:06:55] Josef: Yeah, exactly.

[01:06:57] Jeremy: Where can people, uh, check out the book and where can they follow you and see what you're up to.

[01:07:04] Josef: uh, so they can just go to deploymentfromscratch.com. That's like the homepage for the book. And, uh, if they want to follow up, they can find me on twitter. Uh, that would be, uh, slash S T R Z I B N Y J like, uh, J and I try to put updates there, but also some news from, uh, Ruby, Elixir, Linux world. So they can follow along.

[01:07:42] Jeremy: Yeah. I had a chance to, to read through the alpha version of the book and there there's a lot of, really good information in there. I think it's something that I wish I had had when I was first starting out, because there's so much that's not really talked about, like, when you go look online for how to learn Django or how to learn Ruby on Rails or things like that, they teach you how to build the application, how to run it on your, your laptop.

but there's this, this very, large gap between. What you're doing on your laptop and what you need to do to get it running on a server. So I think anybody who's interested in learning more about how to deploy their own application or even how it's done in general. I think they'll find the book really valuable.

[01:08:37] Josef: Okay. Yeah. Thank you. Thank you for saying that. Uh, makes me really happy. And as you say, that's the idea I really packed, like kind of everything. You need in that book. And I just use bash so, it's easier to follow and keep it without any abstractions. And then maybe you will learn some other tools and you will apply the concepts, but you can do whatever you want.

[01:09:02] Jeremy: All right. Well, Josef thank you, so much for talking to me today.

[01:09:05] Josef: Thank you, Jeremy.

Quality Assurance Testing

Sat, 29 May 2021 07:00:28 -0000

Michael Ashburne and Maxwell Huffman are QA Managers at Aspiritech.

This episode originally aired on Software Engineering Radio.

Related Links:

Transcript

You can help edit this transcript on GitHub.

Jeremy: [00:00:00] Today I'm joined by Maxwell, Huffman and Michael Ashburn. They're both QA managers at Aspiritech. I'm going to start with defining quality assurance. Could one of you start by explaining what it is?

Maxwell: [00:00:15] So when I first joined Aspiritech, I was kind of curious about that as well. One of the main things that we do at Aspiritech besides quality assurance is we also, give meaningful employment to individuals on the autism spectrum. I myself am on the autism spectrum and that's what, initially attracted me to the company.
quality assurance in a nutshell is making sure that, products and software is not defective. That it functions the way it was intended to function.

Jeremy: [00:00:47] how would somebody know when they've, when they've met that goal?

Michael: [00:00:50] It all depends on the client's objectives. I guess. quality assurance testing is always about trying to mitigate risk. There's only so much testing that is realistic to do, you know, you could test forever and never release your product and that's not good for business. It's really about, you know, balancing, like how likely is it that the customer is going to encounter defect X, how much time and energy would be required to, to fix it?

Overall company reputation, impact, there's all sorts of different metrics. Uh, and every, every customer is unique really they, they get to set the pace,

Maxwell: [00:01:30] does the product work well? is the user experience frustrating or not? that's always a bar that I look for. One of the main things that we review in the different defects that we find is customer impact.

and how much of this is going to frustrate the customers. And when we're going through that analysis, is this cost effective or not. The client they'll determine it's worth, the cost of the, uh, quality assurance and of the fix of the software to make sure that that customer experience is smooth.

Jeremy: [00:02:03] When you talk to, to software developers, now, a lot of them are familiar with things like they need to test their code right. They have things like unit tests and integration tests that they're running regularly. where does quality assurance fit in with that? Like, is that considered a part of quality assurance is quality assurance something different?

Michael: [00:02:24] we try to partner with our clients, because the goal is the same, right. It's to release a quality product that's as free of defects, as, you know, as possible.

We have multiple clients that will let us know these are clients typically that we've worked with for a long time that have sort of established a rhythm. they'll let us know when they've got a new product in the pipeline and as soon as they have available, Uh, software requirements, documentations specs, user guides, that kind of thing.
They'll provide that to us, to be able to then plan. Okay. You know, what are these new features? Uh, what defects have been repaired since the last build or, you know, it all depends on what the actual product is. And we start preparing tests even before there may be, uh, A version of the software to test, you know, now that's more of a, what they call a waterfall approach where it's kind of a back and forth where, you know, the client preps the software, we test the software.

If there's something amiss, the client makes changes. Then they give us a new build. but we just as well, we work in, uh, iterative design or agile is a popular term, of course, where. We have embedded testers, that are, you know, on a daily basis, interacting with, uh, client developers to address, you know, to, to verify certain parts of the code as it's being developed.

Because of course the problem with waterfall is you find a defect and it, it could be deep in the code or some sort of linchpin aspect of the code. And then there's a lot of work to be done. To try to fix that sort of thing. Whereas, you know, embedded testers can identify a defect or, or even just like a friction point as early as possible.
And so then they don't have to, you know, tear it all down and start over and it's just, Oh, fix that, you know, while they're working on that part, basically.

Jeremy: [00:04:18] so I think there's two things you touched on there. One is. the ability to bring in QA early into the process. And if I understand correctly, what you were sort of describing is. even if you don't have a complete product yet, if you just have an idea of what you want to build, You were saying you start to generate test cases and it almost feels like you would be a part of generating the requirements generating. Like, what are the things that you need to build into your software before uh, the team that's building it necessarily knows themselves did that I sort of get that.

Maxwell: [00:04:55] I've been in projects that we've worked with the product from cradle to grave. a lot of them haven't gotten all the way to a grave yet, but, um, some of them, the amount of support that they're offering. It's reached that milestone in its life cycle, where they're no longer going to, um, address the defects in the same way.

They want to know that they're there. They want to know what exists. But then now there are new products that are being created, right? So we are, um, engaged in embedded testing, which is, which is testing, certain facets of the code actively, and making sure that it's, doing what it needs to do.

And we can make that quick patch on that code and put it out to market. And we're also doing that at earlier stages with, in, in earlier development where before it's an even, fully formed design concepts, we're offering suggestions, and recommending that, you know, this doesn't follow with the design strategy and the concept design.
so that part of embedded testing or unit testing, can be involved at earlier stages as well. For sure.

Michael: [00:06:08] Of course, those, you have to be very, careful you know, we wouldn't necessarily make blanket recommendations to a new client, a lot of the clients that we have, we have been with us for several years. And so. You know, you develop a rhythm, common vocabulary, you know, you know, which generally speaking, which, goals weigh more than other goals and things like that from, from client to client or even coder to coder.

it's only once we've really developed that. shared language that, you know, we would say by the way, you know, such and such as missing from blankety blank, say great example with a bunch of non words in it, but I think you get the picture.

Jeremy: [00:06:48] when you're first starting to work on a project, you don't know a whole lot about it, right? you're trying to, to understand how this product is supposed to work and, what does that process look like? Like what should a company be providing to you? What are the sorts of meetings or conversations you're having that, that sort of thing.

Michael: [00:07:08] we'll have an initial meeting with a handful of people from both sides and just sort of talk about both what we can bring to the project and what their objectives are. and, and, you know, the, the thing that they want us to test, if you will. And, if we reach an agreement that we want to move forward, then the next step would be like a product demo, basically, we would come together and we would start to fold in, you know, leads and some other analysts, you know, people that were, might be a good match for the project say, and we always ask, our clients.

And they're usually pretty accommodating. if we can record the meeting, you know, now everyone's meeting on Google meet and virtually and so forth. And so, uh, that makes it a little easier, but a lot of our analysts have everyone has their own learning style. Right. You know, some people are more auditory, some people are more visual.

So we preserve, you know, the client's own demonstration of what it's either going to be like, or is like or is wrong or whatever they want us to know about it. and then we can add that file to our secure project folder and anybody down the road that's being onboarded. Like that's, that's a resource an asynchronous resource that they can turn to right? A person doesn't have to re demonstrate the software to onboard them, or sometimes, you know, by the time we're onboarding new people, the software has changed enough that we have to set those aside actually. And then you have to do a live in person kinda deal.

Maxwell: [00:08:32] and you really want to consider, individuals on the, on the spectrum, the different analysts and testers they do have different learning styles. We do want to ask for as many different. resources that are available, to, accommodate for that, but also to have us be, the best enabled to be the subject matter experts on the product.

so what we've found is that what we're really involved in is writing test cases and, and, and rewriting test cases to humanize the software to really get at, what are you asking this software to do that in turn is what the product is doing. a lot of the testing we do is black box testing, and we want to understand what the original design concept is.

So that involves the user interface, design document, right? early stages of that, if available, or just that, dialogue that Michael was referring to, to get that common language of what do you want this product to do? What are you really asking this code to do? having recordings, or any sort of training material, is absolutely essential. To being the subject matter experts and then developing the kind of testing that's required for that.

Michael: [00:09:48] And all sorts of different clients have different, different amounts of testing material, so to speak I mean, everything from, you know, a company that has their own internal, test tracking software and they just have to give us access to it. And the test cases are already there to, a piece of paper, like a physical piece of paper that they copied the checklist into Excel.

And now, like, these are the things that we look at, but of course there's always a lot more to it than that, but that at least gives us a starting point to sort of to build off of and, you know, testing areas and sections and, you know, sort of thematically related features, things like that. And then we can, we develop our own tests, on their behalf, basically.

Jeremy: [00:10:29] And when you're building out your own tests, what, what would be the, the level of detail there? Would it be a high level thing that you want to accomplish in the software and then like absolute step by step, click by click,

Michael: [00:10:42] You know, I hate to make every answer conditional, right. But it sort of depends on the software itself and what the client's goals are. one of our clients, uh, is developing a new, screen-sharing app that's for developers both work on the same code at the same time, but they can take turns, typing, controlling the mouse, that sort of thing.

and although this product has been on the market for awhile, we started out with one of those checklists and now have hundreds of test cases based on, both features that they've added, as well as weird things that we found like, Oh, make sure sometimes you have to write a test case, uh, that tests for the negative, like the, the absence of a problem, right?

So you can make sure X connects to Y and the video doesn't drop or. If you can answer the connection, on, before the first ring is done and it successfully connects anyway, or, or, you know, any host of, of options. So our test cases, for that project, we have a lot of, uh, screen caps and stuff because a picture's worth a thousand words as the cliche goes.

but we also try to describe, describe the features, not just, you know, present the picture with an arrow, like click here and see what happens. Because again, everyone has sort of different data processing styles and some would prefer to read step by step instructions rather than try to interpret, you know, some colors in a picture.

And what does this even mean out of context?

Maxwell: [00:12:08] and lots of times you'll end up potentially seeing test cases they seem like they could be very easily automated. Cause literally they're written all in code. and the client will occasionally ask us to do a test cycle scrub or they'll ask us, okay, well, what can be automated within this?

Right. But one of the key things we really look at is, is to try to humanize that test case a little more away from that just basic automation, lots of times that, that. Literally involves asking, what are you trying to get out of this out of this test case? cause it's fallen so much into the, into the weeds that you no longer can really tell what you're really asking it to really do So lots of times we will, we will help them automate them. But also just give it the proper test environment. and the, and the the proper steps, you'd really be amazed. How many test cases just do not have the proper steps get an, an actual expected result. And if it's written wrong at that basic manual level, you're not adding value.

so that's one thing that we, that we really have found it's added value to the clients and to their test cycles.

Michael: [00:13:21] A lot of people ask about automation because it's a very sexy term right now. And it certainly has its place. Right. But uh you can't automate new feature testing. it has to be an aspect of the product that's mature. Not changing from build to build. And you also have to have test cases that are mature, that you know, every little virtual or otherwise, you know, T is crossed and I is dotted, or else you end up having to do manual testing anyway, because the computer just goes oh
it didn't work. Because that's really all the, you know, the automated process can do is either it passes or it doesn't. And so then we have to come in and, and we have clients where we do plenty of that. Like, okay, they ran through the tests and these three failed figure out why, and then they go in and start digging around and, Oh, it turns out this is missing or this got moved in the latest update or something like that.

Jeremy: [00:14:12] that's an interesting, perspective for testing in general, where it sounds like when a feature is new, when you're making a lot of changes to how the software works. that's when, manual testing can actually be really valuable because as a person, you have a sense of what you want and if things kind of move around or don't work exactly the way you expect them to, but you kind of know what the end goal is. you have an idea of like, yes, this worked or no, this didn't. and then once that's solidified then that's when you said it's easier to, to shift into automatic testing. for example, having, an application, spin up a browser and click through things or, or trigger things through code, things like that.

Michael: [00:14:58] And you have to, you know, you have to get the timing just right. Cause the computer can only wait in so many increments and you know, if it, if it tries to click the next thing too soon and it hasn't finished loading, you know, then it's all over. but that's actually the, the, the discernment that you were sort of referring to the, the, using your judgment when executing a test.

that's where we really, we really do our best work and we have some analysts that specialize in exploratory testing, which is where you're just sort of looking around systematically or otherwise. I personally have never been able to do that very well. uh, but that's critical because those, those exploratory tests are always where you turn up the weirdest combination of things.

Oh, I happened to have this old pair of headphones on and when I switched from Bluetooth to. manual plug, you know, just disconnected the phones or the, you know, the conference call altogether you know, and who does that. Right. But, you know, there's all, all sorts of different kinds of combinations and, and, and who knows what the end user is going to bring. He's not going to necessarily buy all new gear, right. When he gets the new computer, the new software, whatever.

Jeremy: [00:16:05] I feel like there's been a. Uh, kind of a trend in terms of testing at software companies, where they, they used to commonly have, in-house testing or in-house QA, it would be separated from development.

And now you're seeing more and more of, people on the engineering staff, on the developing staff being responsible for testing their own software, whether that be through unit tests, integration tests, Or even just using the software themselves, where you're getting to the point where you have more and more people are engineers that maybe have some expertise or some knowledge in tests and less, so people who are specifically dedicated to test. and so I wonder from your perspective you know, a QA firm or just testers in general? Like what their role is in, in software development going forward.

Maxwell: [00:16:55] having specialized individuals that are constantly testing it and analyzing the components and making sure that you're on track to make that end concept design come to life really is essential. And that's what you get with the quality assurance. It's like a whole other wing of your company that basically is making sure that everything you are, that you are doing with this, product and with this software, is within scope.

and you can't be doing anything better as well. that's the other aspect of it, right? cause lots of times when we find a component and we found something, that we've broken or we've found a flaw in the design we look at, what that means.
bigger picture um, with the overall product. and we try to figure out all right, well, does this part of the functionality... is it worth it to fix this part of the functionality? is it cost-effective right. So lots of times quality assurance.

it comes right down to the, to the cost-effectiveness of the different, patches. and lots of times it's even the safety. of the, uh, product itself. it all depends on what exactly you're, you're designing, but I can give you an, an example of a, of a product that, that we were, that we were working with in the past, where we were able to get a component to overheat, obviously that is a critical defect that needs to be addressed and fixed.

that's something that can be found. as you're just designing the product. But to have a specialized division, that's just focused on quality assurance. They're more liable, they're more inclined. And that is what their directive is, is to find those sorts of defects. And I'll tell you the defects that we found that overheated this, this product, it was definitely an exploratory, find it was actually caught.

Off of a test case that was originally automated. so we definitely, we're engaged in every aspect or a lot of the aspects of the, of the engineering departments with this, uh, products. but in the end it was exploratory testing.

It was out of scope of what they had automated then ended up finding us. That's where I really see quality assurance in this, in this field within software engineering, really gaining respect and gaining momentum in understanding that, Hey, these are, these are really intelligent, potentially software engineers themselves.
That their key focus is to, is to testing our product and making sure that it's a design that, that is within the scope.

Michael: [00:19:36] It's helpful to have a fresh set of eyes too you know, if a person's been working on a product for, you know, day in, day out for months on end, inevitably there will be aspects that become second nature. may allow them to effectively like skip steps in the course of testing, some end result when they're doing their own testing, but you bring in, you know, a group of a group of analysts who know testing, but don't know your product other than generally what it's supposed to do and you sort of have at it and you find all sorts of interesting things that way.

Jeremy: [00:20:13] Yeah, I think you brought up two interesting points. one of them is the fact that. Nowadays, there is such a big focus on, automated testing as a part of a continuous integration process, right? Somebody will write code they'll check in their code, it'll build, and then automated tests will see that it's still working.

But those tests that the developers wrote, they're never going to find things that there were never a test written for. Right. So, I think that whole exploratory, testing aspect is interesting. and then Maxwell also brought up a good point uh, it sounds like QA can also not just help find what defects or issues exists, but they can also help. grade how much of an issue those defects are so that, the developers they can prioritize. Okay. Which ones are really a big deal that we need to fix, uh, versus what are things that, yeah, it's, I guess it's a little broken, but it's not, not such a big deal.

Maxwell: [00:21:14] in a broader sense, there are certain whole areas of design right now. Uh, Bluetooth is a really, uh, big area that we've been working in. I'm the QA manager for the Bose client at, Aspiritech and Bluetooth is really a big thing that, that is, that is involved in all of their different speakers.

So obviously if we, if we find anything, anything wrong with, with a certain area, you know, we want them to consider what areas they might want to focus more manual testing and less automation on. Right. and we're always thinking about, feature specific in that sense, um, to help the clients out as well.

and analysts that are on the spectrum, they really have. it's fascinating how, how they tend to be very particular about certain defects. and, and they can really find things that are very exploratory, but they don't miss. the, uh, forest for the trees in the sense that they still maintain, the larger concept design, funnily enough, where they can let you know, you know, is Bluetooth really the factor in this, that should be fixed here or is it, or is it something else? to, it leads to different, to interesting avenues for sure.

Michael: [00:22:32] Yeah, Bluetooth is really, A bag of knots in a lot of ways, you know, the different versions, different hardware vendors, we work with zebra technologies and they make barcode printers and scanners and so forth.
And you know, many of their printers are Bluetooth enabled. but you know, the question is, is it Bluetooth 4 to Bluetooth, 4 is it, backwards compatible.

And, uh, a certain, uh, rather ubiquitous, computer operating system is notorious for having trouble with Bluetooth management, whether it's headphones or printers or whatever. and in that instance, because we want to, you know, we're not testing the computer OS, we're testing the driver for the printer. Right.

So, part of the protocol we wound up having to build into the, into the test cases is like, okay, first go in deactivate, the computer's own, resident internal hardware, Bluetooth, then connect to this, you know, third party USB dongle, install the software, make sure it's communicating, then try to connect to your printer For a long time, an analyst would run into some kind of issue.

And the first question is always, are you using the computer Bluetooth? Or is it a third-party Bluetooth and is discoverable on, is it a Bluetooth, low energy because you don't want to print using Bluetooth low energy because it'll take forever. Right?

And then the customer thinks, Oh, this isn't working. It's broke. You know, not even knowing that there's. Multiple kinds of Bluetooth and yeah. It's, uh, it's hairy for sure.

Jeremy: [00:24:00] Yeah. And then I guess, as a part of that, that process, you're finding out that there isn't necessarily a problem in the customer's software. but it's some external case so that, when you get a support ticket or a call, then, you know, like, okay, this is another thing we can ask them to check. Yeah.

Maxwell: [00:24:18] Absolutely. And then that's something that we are, that we've been, you know, definitely leveraged for, to help out, to try to resolve customer issues that come in as well, and try to set up a testing environment that mimics that. And, and we've occasionally. integrated that to, to become part of our manual testing and some automated scenarios as well.

so that, so those have been interesting scenarios having to buy different routers and what, and what have you. And once again, it gets back to the cost-effectiveness of it. You know, what is, what is the market impact? Yes. This particular AT&T router or what have you, um, might be having an issue. but you know, how many, how many users in the wind, the world are really running the software on this.

Right. and that's something that, everyone needs to, you know, that every company should consider when they're, considering, uh, you know, a a patch, um, in the, in the software.

Jeremy: [00:25:14] and something you, you also brought up is. As a, as a software developer, when there is a problem. One of the things that we always look for is we look for a reproducible case, right? Like, what are the steps, um, you need to take to have this bug or this problem occur. And it sounds like one of the roles might be.
we get in a report from a customer saying like this part of the software doesn't work. Um, but I'm not sure when that happens or how to get it to happen. And so, as a QA, uh, analysts, one of your roles might be taking those reports and then building a repeatable, um, test case.

Michael: [00:25:55] Absolutely. There's lots of times where clients have said we haven't been able to reproduce this, see if you can. And you know, we get back to them after some increment of time. And, sometimes we can, and sometimes we can't, you know, sometimes we have to buy special, uh, like headphones or some kind of, you know, try to reproduce the investment that the client was using.
Uh, in case there was some magic sauce interaction going on there.

Maxwell: [00:26:21] our analysts on the spectrum. they are so particular in writing up defects, all the little details. and that really is so important in quality assurance is documentation for the entire process. that's one area where I think quality assurance really helps development in general is making sure that everything is documented that it's all there on paper and the documentation is, is solid and really sound. Um, so for a lot of these defects, we've actually come in and I think up the standard a little bit where you can't have the defect written where, you know, the reproducibility is one out of one.

And it turns out this was a special build that the developer was using that no one else in the company was even using. It's a waste of time to track this defect down. And that's based on the fact that it was a poorly written up report in the first place.
so it can be fun to have to. Track down all the various equipment you need for it. And analysts are really well-suited for writing those up and, and, uh, and investigating these different defects that are errors that we, that we find sometimes they're, sometimes they're not actually defects. They're just errors in the system.

Michael: [00:27:34] uh, tell them about like The Bose guide that Bose wound up using the guide that we had made internally.

Maxwell: [00:27:40] Yeah. There have been, so many guides that we've ended up creating that have been like terminal access, shortcuts, uh, just different, different ways to, you know, access the, uh, system, from a tester perspective that have, that have absolutely helped, just documenting all these things that engineers end up using to test code. Right. But lots of times these shortcuts Aren't well documented anywhere. so what quality assurance and what the Aspiritech has done, is we come in and we really create excellent training guides for how to, how to check the code.
Um, and, and what all the various commands are that have to be inputted and how that translates to what, the more obvious user experiences, which is, I think a lot of times what ends up being lost. it ends up all being code language and you don't really know what the user experiences, it's nice to, to be able to, To have found that, that the guides that we've created when we show them to the clients, because really we created them to make life easier for us and make the testing easier for us to make it more translatable.

When you see all this different code that some of us are very well versed in. but other analysts might not be. As well-versed in the code or that aspect of it. Right. but once you humanize it and you, and you sort of say, okay, well, this is what you're asking the code to do. then you have that other perspective of, I actually can think of a better way that we could Potentially do this. so we've brought a lot of those guides to the clients and they've really been, they've really been blown away, at how well documented All of that was, um, all the way down to the level of the, uh, GUIDs of all the systems. We have very good inventory tracking, and even being able to test and run, internal, components of the system. and that's why I bring up the a G U I D S so a lot of the testing that we end up doing, or I wouldn't say a lot, but a portion of it is the sort of tests that installers would be running this sort of functionality that only installers of systems would be, would be running. So, it's still, it's still black box testing, but it's behind the scenes of what the normal user experiences. Right. It's sort of the installer experience for lack of a better word.

And even having that well-documented and finding errors. In, in those processes have been quite beneficial. I, I remember one scenario in which there was an emergency update method that we had to test, right. And this is, this was a type of method where if someone had to run it, they would take it into the store and a technician would run it right. So basically we're, we're running software quality assurance on a technicians test for a system and the way a technician would update the system. And what we found is that what they were asking the technician to do was a flawed and complex series of steps. it did work. But only one out of 30 times, and only if you did everything in a very particular timing.

And it just was not something that was user-friendly, for the, a tech technician. So it's the kind of thing that we ended up finding. And lots of times it requires the creation of a guide because they don't have guides for technicians, to end up to end up finding a defect like that.

Michael: [00:31:21] and the poor technician, you know, he's dealing with hundreds of different devices, whatever it is, you know, whatever the field is, whether it's phones or speakers or printers or computers or whatever. And, you know, this guy is not working with the same, software day in and day out. The way we have to sometimes again, because the developer is sort of building the, the tool that will do the stuff, you know, we're, we're dealing with the stuff it's doing. And so in a lot of ways, uh, we can bring our own level of expertise to a product. Uh, we can surpass, you know, the developer even, it's not like a contest, right.

but just in terms of, you know, how many times is a developer installing it for the first time, like Maxwell was saying, what, when we do out of box testing, we have to reset everything and install it fresh over and over and over and over again. And so, so we wind up being exposed to this particular, you know, series of steps that the end user might only see a couple of times, but you know, who wants their brand new shiny thing, especially if it costs hundreds of dollars, you know, you don't want to have a lot of friction points in the process of finally using it.

You know, you just kind of want it to just work as effectively as possible.

Jeremy: [00:32:45] if I understood correctly in, in Maxwell's example, that would be, you had a physical product, like let's say a pair of headphones or something like that, and you need to upgrade, the firmware or. Perform some kind of reset. And that's something that like you were saying, a technician would normally have to go through, but, as QA, you go in and do the same process and realize like, this process is really difficult and it's really easy to make a mistake or, um, just not do it properly at all.

and then, so you can propose like either, you know, ways to improve those steps or just show the developers like, Hey, look, I have to do all these things just to, you know, just update my firmware. Um, you might want to consider like, making that a little easier on, on your customers. Yeah.

Maxwell: [00:33:32] Absolutely. And the other nice thing about it, Jeremy is, you know, we don't look at it at a series of tests like that as lower level functionality. just because, um, you know, it's more for a technician to have run it. It's actually part of the update testing.

So it's, so it's actually very intricate. as far as the design of the, of the product. We find a defect in how this system updates. It's usually going to be a critical defect. Um, we don't want the product to ever end up being a boat anchor or a doorstop. Right. So that's so that's, so that's what we're always trying to avoid.

and in that scenario, it's one of those things where then we don't exactly close the book on it once we, once we figure out, okay, this, this was a difficult scenario. For the technician, we resolve it for the technician. And then we look at, bigger scope, how does this affect the update process in general?

You know, does this affect the, uh, customers testing, that suite of test cases that we have for those update processes. you know, it can, it can extend to that as well. Uh, and, and then we look at it in terms of automation too. Uh, to see if there's any areas where we need to fix the automation tests.

Michael: [00:34:46] it can be as simple as power loss during the update at exactly the wrong time. the system will recover if it happens within the first 50 seconds or the last 30 seconds, but there's this middle part where it's trying to reboot in the process of updating its own firmware. And if the power happens to go out, then. You're out of luck

that does not make for a good, reputation for the client. that, I mean, the first thing a customer that's unhappy about that kind of thing is going to do is tell everybody else about this horrible experience they had.

Maxwell: [00:35:20] Right. And I can think of a great example, Michael, we had found a ad hoc defect. They had asked us to look in this particular area. There was a very rare customer complaints of update issues. but they could not find it with their automation. we had one analyst that amazingly enough, was able to pull the power.
At the right exact time in the right exact sequence. And, and we reported the ticket and we were able to capture the logs for this incident. And they must have run this through 200,000 automated tests and they could not replicate what this human could do with his hands. Um, and it would have really amazed them after we had found it.
cause they really had run it through that many automation tests, but it does, it does happen where you find those.

Jeremy: [00:36:10] we we've been talking about, uh, in this case you were saying this was for Bose, which is a very, large company. And I think that. When the average developer thinks about quality assurance, they usually think about it in the context of, I have a big enterprise company. Um, I have a large staff, I have money to pay for a whole bunch of analysts, things like that.

I want to go back to where Michael, you had mentioned how. one of your customers was for a, uh, a screen-sharing application. we had an interview with Spencer Dixon who's the CTO at Tuple. I believe that's the product you're referring to. So. I wonder if you could walk us through, like for somebody who has a business that's I want to say they're probably maybe four or five people, something like that.

what's the process for them, bringing on a dedicated analysts or testers. given that you're coming in, you have no knowledge of their software. What's the process there like?

Michael: [00:37:13] first of all, not to, not to kiss up, but the guys at Tuple are a really great bunch of guys. They're very easy to work with. we have like an hourly cap, per month, you know, to try to not exceed a certain number of hours. That agreement helps to manage their costs. They're very forthcoming. and they really have, folded us in to their development process. You know, they've given us access to their, uh, trouble ticket, uh, software.

We use their internal instant messaging application, to double-check on, you know, expected results. And is this a new feature or is this something that's changed unintentionally? so when we first started working with them, there was really only one. person on the project. and this person was in essence, tasked with turning, the Excel checklist of features into suites of test cases.

And, you know, you, you start with Make sure X happens when you click Y and then you make that the title of a test case. And, you know, once you get all the easy stuff done, then you go through the steps of making it happen. They offered us a number of very helpful sort of starting videos that they have on their website for how to use the software, by no means are they comprehensive.

but it was enough to get us comfortable, you know, with the basic functionality and then you just wind up playing with the software a lot. they were very open to giving us the ramp up time that we needed in order to check all the different boxes, uh, both ones on their list. And then new ones that we found because, you know, there's, there's more than one. connection type, right? That can be just a voice call or there can be the screen sharing and you can show your local video from your computer camera, so you can see each other in a small box. And, you know, what order do you turn those things on?

And, which one has to work before the next one can work? Or what if a person changes their preferences in the midst of a call and, you know, these are things that, fortunately Tuple's audience is a bunch of developers. So, uh, when their clients, their customers report a problem, uh, the report is extremely thorough because they know what they're talking about.

And so the reproduction steps are pretty good, but we still, sometimes we'll run into a situation, that they've shared with us. It's like, we can't, we can't make this one happen. And I don't know. I mean, The getting back to the Bluetooth, like they've even had customers where, uh, I guess one headset used a different frequency. Uh, than another one, even though they were on the same Bluetooth version. And when he changed this customer, I shouldn't say he, the customer, whoever whomever, they are, uh, they changed from one headset to another and you know, the whole thing fell apart and it's like, how do you even, you know, cause you don't go to the store and look on the package and see, Oh, this particular, uh, you know, headphone uses 48 kilo Hertz for their, you know, At the outset. I didn't even know that that was a thing that could be a problem. Right. It just, you figured Bluetooth has its band of the telecom spectrum and, but, you know, anything's possible. So they gave us time to ramp up, you know, cause they knew that they didn't have any test cases and uh, over time now, there's a dedicated team of three people that are on the project regularly, but it can expand to as many as six, you know, because it's a sharing application, right?

So you tend to need multiple computers involved, And yeah, we've really, we've really enjoyed a relationship with Tupelo and our, and our eagerly awaiting, uh, if there would be windows version, because there's so many times when we'll be working on another project even, and, you know, talking with the person and saying, Oh, I wish I could, you know, we could use Tuple cause then I could click the thing on your screen and you could see it happen instead of just, you know, um, they are working on a Linux version though.

I don't think that's a trade secret. So that's, that's in the pipeline. We're excited about that. And these guys, they pay their bills in like two days. No customers do that. They're, they're really something.

Jeremy: [00:41:14] I mean, I think that's a, a sort of a unique case because it is a screen sharing application, you have things like headsets and webcams and, you're making calls between machines. So, I guess if you're testing, you would have all these different laptops or workstations set up all talking to one another.
So yeah, I can imagine why it would be really valuable to have, people or a team dedicated to that.

Michael: [00:41:40] And external webcams. And you know, whether you're, you're like my Mac mini is a 2012, so it doesn't have the three band audio. port, right. It's got one for microphone and one for headphone. So that in itself is like, well, I wonder how many of their customers are really going to have an older machine, but, it wound up being an interesting challenge because then I had to, if I was doing a testing, I had to have a microphone sort of distinct from the headphones.

And then that brings in a whole other nest of interactivity that you have to. Account for maybe the microphones USB based, you know, all sorts of craziness.

Jeremy: [00:42:19] I'm wondering if you have projects where you not only have to use the client, but you also have to, to set up the server infrastructure so that you can run tests internally. I'm wondering if you do that with clients and if you do like, what's your process for learning? How do I set up a test environment? How do I make it behave like the real thing, things like that.

Maxwell: [00:42:42] So the production and testing equipment is what the customers have right. It's basically to, to create that setup, we just need the equipment from them and the user guides and the less information, frankly, the better in those setups, because you want to mimic what the customer's scenario is, right? You don't want to mimic too pristine of a setup. and that's something that we're always careful about when we're doing that sort of setup. As far as more of the integration. and the, uh, sandbox testing bed, where you're testing a new build for regressions or what, or what have you, that's going to be going out. we'd be connected to a different server environment.

Michael: [00:43:24] And with zebra technologies, they're their zebra designer, printer driver. Uh, they support windows 7 windows 8.1 windows 10 and windows server 2012, 2016, 2019. And in the case of the non server versions, both 32 bit and 64 bit, because apparently windows 10 32 bit is more common in Europe, I guess, than it is here.

And even though, you know, windows 7 has been deprecated by Microsoft, they've still got a customer base, you know, still running, you know, don't fix what ain't broke. Right. So why would you update a machine if it's doing exactly what you want, you know, in your store or a business or whatever it is. And so we make a point of, of executing tests in all 10 environments.

It, it can be tedious because windows 7 uh, 32 and 64 have their own quirks. So we always have to test those too, you know, windows 8 and windows 10. They're fairly similar, but you know, they keep updating windows 10 and so it keeps changing. and then when it's time, for their printer driver to go through the, uh, The windows logo testing, they call it that's like their, their hardware quality labs, hardware certification, uh, that Microsoft has, which in essence means when you run a software update on your computer, uh, if there's a new version of the driver, it'll download it from Microsoft servers.

You don't have to go to the customer website and specifically seek it out. So we actually do, uh, certification testing for zebra, uh, with that driver in all of those same environments and then submit the final, package for Microsoft's approval. And that's, uh, that's actually been, uh, sort of a job of ours if you will, for several years now.
And that it's not something you take lightly when you're dealing with Microsoft and actually this sort of circles back to the, the writing, the guides, Because, you know, there are instructions that come with the windows hardware lab kit, but it doesn't cover everything obviously. And we wound up creating our own internal zebra.
Printer driver certification guide and it's over a hundred pages because we wanted to be sure to include every weird thing that happens, even if it's only sometimes, and be sure you set this before you do this, because in the wrong order, then it will fail. And he won't tell you why and all sorts of strange things.

And we've of course, uh, when we were nearing completion on that guide. Our contact at zebra was actually wanted, wanted a copy. Um, cause you know, we're not they're only a QA vendor obviously. and so if there's anything that would help and they have other divisions too, you know, they do, uh, uh, they have a browser print application that allows you to print directly to the printer from a web browser without installing a driver and that's a whole separate division and, you know, but overall, all these divisions, you know, have the same end goal as we do, which is, you know, sort of reducing the friction for the customer, using the product.

Jeremy: [00:46:31] That's an example of a case where it, it sounds, you said it's like a hundred pages, so you've got these, these test cases basically ballooning in size. And maybe more specifically towards, the average software project, as development continues, new features get added, the product becomes more complex.

I would think that the number of tests would grow, but I would also think that it, it can't grow indefinitely. Right. There has to be a point where. it's just not worth going through, you know, X number of tests versus the value you're going to get. So I wonder how you, how you manage that as things get more complicated, how do you choose what to drop and what to continue on with?

Michael: [00:47:15] it obviously depends on the client, in the case of zebra to use them again, You know when we first started working with them, they put together the test suites. We just executed the test cases, as time went by, They began letting us put the test suites together. Cause you know, we've been working with the same test cases and you know, trying to come up with a system.

So we sort of spread out the use instead of it always being the same number of test cases, because what happens when you get, when you execute the same tests over and over again, and they don't fail. That doesn't mean that you're you fixed everything. It means that your tests are worthless. Eventually. so they actually, a couple of summers ago, they had us go through all of the test cases, looking at, uh, the various results to evaluate like, okay, if this is a test case that we've run 30 times and it hasn't failed for the last 28 times, is there really any value in running it at all anymore?

Uh, so long as that particular functionality isn't being updated because they update their printer driver every few months when they come out with a new line of printers, but they're not really changing the core functionality of what any given printer can do. They're just adding, like model numbers and things like that.

So when it comes to like the ability of the printer to generate such and such a barcode on a particular kind of media, like that only gets you so far. but when you have, you know, uh, some printers have RFID capability and some don't, and so then you can, you get to kind of mix it up a little bit, depending on what features are present on the model.

So deprecation of, worn out test cases, uh, does help to mitigate, you know, the ballooning, test suite. I'm sure. I'm sure Bose has their own approach

Maxwell: [00:49:02] Absolutely. there are certain features then might also fall off entirely, where, you'll look at how many users are actually using a certain feature. like Michael was saying, you know, there might not be any failures on this particular feature, plus it's not particularly being used a lot.

So, so it's a good candidate for being automated. Right. so also we'll look at cases such as such as that. and we'll go through a test cycle scrubs. we've had to do, um, a series of, update matrixes that we've had to, um, progressively. look at how much of the market has already updated to a certain version.

so if a certain part of the market, if 90% of the market has already updated to this version, You don't you no longer have to test from here to here as far as your update testing. So that's another way in which you can, which you slowly start to reduce test, test cases and coverage.

but you're always, you're always looking at that with risk assessment in mind. Right. And, and you're, and you're looking at, you know, who are the end users that, you know, what, what's the, what's the customer impact. If we're, if we're pulling away. Um, or if we're automating this set of test cases.

so, you know, we go about that very, uh, carefully, but, we've been gradually more and more involved in helping them assess, what test cases are the best ones to be manually run? cause those are the ones that we end up finding defects in time and time again.

so those, so those are the areas. that we've really helped rather than having, you know, cause lots of times clients will, if they do have a QA department you know, the test cases will be written more in an automation type language. So it's like, okay, why don't we just automate these test cases to begin with?

And it'll be very broad scope where they have everything is written as a test case, for the overall functionality. And it's just way too much as you're pointing out Jeremy and as features grow. It just, that just continues on. it has to be whittled down in the early stages to begin with.

but that's how we, that's how we help out. to finally, you know, help manage these cycles to get them in a more reasonable, manual testing, cadence, right. And then having, having the automated section of test cases have that be, you know, the larger portion of the overall coverage as it should be in general.

Jeremy: [00:51:29] so it sounds like there's this, this process of you working with, uh, the client figuring out, what are the test cases that. don't or haven't brought up an issue in a long time, or, the things that get the most or the least use from customers, things like that, you, you, you look at all this information to figure out what are the things that for our manual tests we can focus on, um, and try to push everything else. Like you said, into some automated tests.

Michael: [00:51:59] So if, over time, we're starting to see these trends with older test cases or simpler test cases. You know, if we notice that there's a potential we'll bring that to the, to the client's attention.

And we'll say, we were looking at this batch of tests for basic features and we happened to notice that they haven't, failed ever or in two years or whatever. Would you consider us dropping those, at least for the time being, see how things go. and you know, that way we're spending less of their time.

So to speak, you know, on the whole testing process, because as you pointed out, like the more you build a thing, the more time you have to take, you know, to test it from one end to the other. but at the same time, uh, a number of our analysts are, um, OAST 508 trusted tester certified for accessibility testing, using screen readers and things like that.

uh, it's interesting how many web applications, you know, it just becomes baked into the bones. Right. And so, you know, you'll be having a team meeting talking about. yesterday's work. Um, and somebody will mention, you know, when I, when I went to such and such page, you know, because this person happened to use, a stylus to change the custom colors of the webpage or something like that.

Um, they'll say, you know, it really, it was not very accessible and there was light green, there was dark green, there was light blue, like I can, you know, and so I used my style sheet to make them. Red and yellow and whatever. and you see enough of that kind of stuff. And then that's an opportunity, to grow our engagement with the client, right?

Because we can say by the, by, you know, we noticed these things, we do offer this as a service. If you wanted to fold that in or, you know, set it up as like a one-time thing, even, you know, it all depends on, how much value it can bring. The client, right. you know, we're not pushing sales, trying to, Oh, we'll always get more whatever.

Um, but it's just about, when you see an opportunity, for, improvement of the client's product or, you know, uh, helping, uh, better secure their position in the market or, you know, however, however it works or could work to their advantage. You know, we sort of feel like it's our duty. To mention it as their partner.

we also do data analysis, you know, we don't just do QA testing. I know that's the topic here, of course. but that is another, another way where, you know, our discerning analysts can find, one of our products or one of our clients rather. we do monthly, uh, call center. Like help desk calls.

we analyze that data in aggregate and, you know, they'll find these little spikes, you know, on a certain day, say over there or a clutch over a week of people calling about a particular thing. And then we can say to the, to the client, you know, did you push a new feature that day or was it rainy that day?

Or, you know, I mean, it could be any, and maybe the client doesn't care, but. But we see it. So we say it and, and let them decide what to do with the information.
Jeremy: [00:55:08] The comment about accessibility is, is, um, is really good because it sounds like if you're a company and you're building up your product and you may not be aware of the accessibility issues, um, you have a tested by someone who's using a screen reader, you know, sees those issues with contrast and, and so on.

And now the developer, they have like these, these specific, actionable things to do and potentially even, um, moved those into automated tests to go like, okay, we need to make sure that these UI elements have this level of contrast, things like that.

Michael: [00:55:45] Yeah. And there's different screen readers too. you know, the, the certification process, like with the government to become a trusted tester uses one particular screen reader named Andy it's an initialism. Um, but there are others and, you know, then it's on us to become familiar with, you know, what else is out there because it's not like everyone is going to be using the same screen reader, just like not everyone uses the same browser

Maxwell: [00:56:10] I think the clients realize that, yeah, we do have a good automation department, but is it well balanced with what they're doing manual QA wise? And I think that's where we often find that there's a little bit lacking that we can provide extra value for, or we can boost what is currently there.

Michael: [00:56:28] Our employees are quality assurance analysts. They're not testers. They don't just come in, read the script, then, Pokemon go afterwards. we count on them to bring that critical eye, you know, and they're, and everyone's own unique perspective. Uh, when they go to use any given product, you know, Pay attention to what's happening.

You know, even if it's not in the test case, you know, something might, you know, flash on the screen or there might be this pause before the next, uh, thing kicks off that you are waiting for. And that happens enough times and you kind of notice, like there's always this lag right before the next step, you know, and then you can check that out with.

The developer, like, is this lag, do you guys care about this lag at all? And you know, sometimes we find out that it's unavoidable because something, you know, something under the hood has to happen before, the next thing can happen.

Maxwell: [00:57:20] and even asking those questions, we've found out fascinating things like, you know, why is there this lag every time when we run this test, you know, we never want to want to derail a client too much. You know, we're always very patient for the answer. And sometimes we don't, you know, we might not get the answer, but I think that that does help build that level of respect between us and the developers, uh, that we really care what their, what their code is doing. And we want to understand, you know, if there is a slight hiccup what's causing that slight hiccup, it's, it's, it ends up being fascinating for our analysts as we are learning the product.
And that's what makes us wanna want to really learn, um, exactly what that, what the code is doing.

Michael: [00:58:02] though I'm not a developer. Um, when I first started at Aspiritech, I worked on Bose as well, and I really enjoyed just watching their code, scroll down the screen. As you know, the machine was booting up or the speaker was updating because you can learn all sorts of interesting things about what's happening, you know, that you don't see normally.

There's all sorts of weird inside jokes, uh, in terms of like what they call the command or, you know, Oh, there's that same spelling error where it's only one T or, you know, things that you kind of, you kind of get to know the developers in a way, you know, like, Oh, so-and-so wrote that line.

We always wondered. Cause there's only this one T and that word was supposed to have two teeth, you know, and they say, Oh yeah, we keep giving him a hard time about that. But now we can't change it because, so we have fun.

Jeremy: [00:58:52] If people want to learn more about, what you guys are working on or about Aspiritech where should they head?

Maxwell: [00:58:58] www.aspiritech.org. is our, is our website, head to there, uh, give you all the information you need about us.

Michael: [00:59:06] We also have a LinkedIn presence, uh, that we've been trying to leverage lately and, uh, talk to our current clients. I mean, they've really, they've really been our biggest cheerleaders and the vast majority of our, of our work has come from client referrals. was, is an example of that too.

You know, they were referred by a client who was referred, you know, we're very proud of that. You know, it speaks volumes about, about the quality of our work and the relationships that we build and, and, uh, you know, we have very little customer turnover in addition to very little staff turnover and that's because we invest in these relationships and then it seems to work for both sides.

Jeremy: [00:59:46] Michael, maxwell, thanks for coming on the show

Maxwell: [00:59:49] thank you so much, Jeremy. It's great talking to you.

Michael: [00:59:52] Thanks for having us

Scott Hanselman on What's .NET?

Tue, 23 Mar 2021 06:04:24 -0000

Scott is the Community Program Manager for the .NET team at Microsoft and the host of the Hanselminutes podcast.

This episode originally aired on Software Engineering Radio.

Personal Links and Projects:

Related Links:

Transcript:

You can help edit this transcript on GitHub.

Jeremy: [00:00:00] Today I'm talking to Scott Hanselman. He's a partner program manager at Microsoft, the host of over 750 episodes of the Hanselminutes podcast. And he's got a YouTube channel. He's got a lot of interesting videos. one of the series I really enjoy is one called computer stuff they didn't teach you. You're all over the place. Scott, welcome to software engineering Radio.

Scott: [00:00:22] Thank you for having me. Yeah. I was an adjunct professor for a while and I realized that in all of my career choices and all of my volunteerism and outreach, it's just me trying to get back to teaching. So I just started a TikTok. So I'm like the old man on TikTok now. But it's turned out to be really great.

And I've got to engage with a bunch of people that I would never have found on any other social platform. So whether you find me on my podcast, my blog, which is almost 20 years old now, my YouTube or my TikTok or my Twitter, it's pretty much the same person. I'm a professional enthusiast, but that's all done in the spare time because my primary job is to own community for visual studio.

Jeremy: [00:01:02] That experience of being a teacher in so many ways is one of the big reasons why I wanted to have you on today, because I think in the videos and things you put out, you do a really good job of explaining concepts and just making them approachable. So I thought today we could talk a little bit about .NET and what that ecosystem is because I think there's a lot of people who they know .NET is a thing, they know it exists, but they don't really know what that means. So, maybe you could start with defining what is .NET?

Scott: [00:01:37] Sure, so Microsoft is historically not awesome at naming stuff. And, I joke about the idea that 20 years ago, Microsoft took over a top level domain, right? Thereby destroying my entire .org plan to take over the world. When they made the name .NET they were coming out of COM and C and C++ involved component object models.

And they had made a new runtime while Java was doing its thing in a new language and a new family of languages. And then they gave it a blanket name. So while, Java was one language on one runtime, which allowed us to get very clear and fairly understandable acronyms like JRE, Java, runtime, environment, and JDK... Microsoft came out of the gate with here's .NET and C# and VB and you know, COBOL.NET and this and that. And this is a multilanguage runtime. In fact, they called it a CLR, a common language runtime. So rather than saying here's C#, it's a thing. They said here's .NET. And it can run any one of N number of languages now.

The reality is that if you look over time, Java ended up being kind of a more generic runtime and people run all kinds of languages on the Java runtime. .NET now has C#, which I think is arguably first among equals as its language. Although we also have VB and F# as well as a number of niche languages, but the .NET larger envelope name as a marketing thing remains confusing. If they had just simply said, Hey, there's a thing called C# it probably would have cleared things up. So, yeah. Sorry. That was a bit of a long answer and it didn't actually get to the real answer.

Jeremy: [00:03:24] So, I mean, you had mentioned how there was this runtime that C# runs on top of. Is that the same as the JVM? Like the Java virtual machine?

Scott: [00:03:36] Right. So as with any VM, whether it be V8 or the JVM, when you could take C# and you compile it, you compile it to an intermediate language. We call it IL. Java folks would call it a byte code and it's basically a process or non-specific assembler. For lack of another word. If you looked at the code, you would see things like, you know, load, you know, load four byte string into, you know, pseudo register type of stuff.

And, um, you would not be able to identify that this came from C#. It would just look like this kind of middle place between apples and apple juice. It's kind of apple sauce. It's pre chewed, but not all the way. And then what's interesting about C# is that when you compile it into a DLL, a dynamic link library or Linux which is like a shared object where you compile it into an executable.

The executable contains that IL, but the executable doesn't know the processor it's going to run on. That's super interesting, because that means then I could go over to an ARM processor or an Intel x86 or an x64 or whatever, an iPhone theoretically. And then there's that last moment. And then that jitter that just-in-time compiler goes and takes it the last mile, that local runtime, in this case, the CLR, the common language runtime takes that IL.

Chews it finally up into apple juice. And then does the processor specific optimizations that it needs to do. And this is an oversimplification because the, the pipeline is quite long and pretty exquisite. In fact, for example, if you're on varying levels of processor on varying versions of x64, or on an AMD versus on a, uh, an Intel machine.

There's all kinds of optimizations you can do as well as switches that we can do as developers and config files to give hints about a particular machine. Is this a server machine? Is this a client machine? Does it, is it memory constrained? Uh, does it have 64 processors? Can we prep this stuff both with a naive JIT a naive just-in-time compilation, or then over time, learn, literally learning over the runtime cycle of the process and go, you know, I bet we could optimize this better then swap out in while it's running a new implementation so we can actually do a multi-layered JIT. So here's the, I want to start my application fast and run this for loop.

You know, this for loop's been running for a couple of thousand milliseconds and it's kind of not awesome. I bet you, we can do better then we do a second pass. Swap out the function table, drop a new one in, and now we're going even faster. So our jitter, particularly a 64 bit jitter is really quite extraordinary in that respect.

Jeremy: [00:06:29] That's pretty interesting because I think a question that a lot of people might have is what's, what's the reason for having this intermediary language and not compiling straight to machine code. And what it sounds like, what you're saying is that, it can actually at runtime figure out like how can I run things more efficiently based on what's being executed in the program.

Scott: [00:06:50] Right. So when you are running a strongly typed language that has the benefits and the speed of drawing type language. and you also choose to compile it to a processor specific representation. We, we usually as developers and deployers think about things in terms of architectures like this is for an arm processor, but is this a raspberry PI, is it an Android device you really want to get down to the stepping individual versions of like an Intel?

For example, I'm talking to you right now on an Intel. But it's a desktop class processor. That was the first one released many, many years ago. It doesn't have the same instruction set as a modern i9 or a Xeon or something like that. So we have to be able to know what instruction sets are available and do the right thing.

And then if you add in things like, um, SIMD or doing, you know, data parallelization at the CPU level, If that's available, we might treat a vector of floats differently than we would on another thing. So how much information up front do we know about deployment and is it worth doing all that work on the developer's machine?

And this is the important part, which may not necessarily reflect the reality of where it's going to get deployed. You know, my desktop does not represent a machine in Azure or Heroku or AWS. So getting up to the point where we know as much as we can. And then letting the, the runtime environment, adapt to the true representation of what's happening on the, on the, on the final machine.

That doesn't mean that we can't go and do native compilation or what's called end gen or native image generation that if we know a whole lot, but, um, it provides a lot of value if we wait until the end.

Jeremy: [00:08:34] Yeah, that's an interesting point too, in that if we are compiling down to an intermediate language on our own development machines, then we don't have to worry about like, having all these different targets right. Of saying I'm going to build for this specific type of ARM processor you know, all these different places that our application could possibly go because the runtime has taken care of that for us.

Scott: [00:08:58] Right. And I think that's one of those things where it's taking care of it for us, unless we choose to, it reminds me of the time. Uh, if you remember, when we went from, you know, mostly manual shift cars to automatic shift cars, but people who bought fancy Audis and BMWs were like, you know, when I'm going to the shops to get groceries, I'm cool with automatic, but I really like the ability to downshift.

And then we started to see these hybrid cars that had, you know, D and R for drive in reverse and then a little move left and move right. To take control. And like, I'm going to tell the automatic transmission on this car that I really want to run at higher rpms. .NET tries to do that. It gives you both the ability to just say, you know, do what you're gonna do. I trust the defaults, but then a ton of switches, if you choose to really know about, uh, that, that ending environment.

Jeremy: [00:09:50] We've been talking about the common language runtime, which I believe you said is the equivalent of the Java virtual machine. You also mentioned how there are languages that are built to target the intermediate language that, that runtime uses like C# and VB and things like that. And you touched on this earlier before, but I feel like we haven't quite defined what .NET really is like, is it that runtime and so on?

Scott: [00:10:18] Okay. so .NET as an all encompassing marketing term, I would say is the ecosystem. Just as Java is a language and also an ecosystem. So when someone says, I know .NET, I would assume that they are at the very least familiar with C#. They know how to compile code and deploy code. They might be able to be a website maker, or they might be an iPhone developer.

That's where .NET starts getting complicated. Because it is an ecosystem. Someone could work. Let's say two people could work for 10 years in the ecosystem, and one could be entirely microservices and containers on the backend, in the cloud. And the other person could be Android and Unity and games and Xbox, and they are both .NET Developers, except they never touched each other's environment.

One is entirely a game and mobile app developer and the other one is entirely a backend developer. That's where it's harder now. I think the same thing is true in the Java world. Although I would argue that you'd be surprised how many places that you can find .NET, whether it be, in a Python notebook, you can run .NET And C# and an IPYMB in a notebook and Jupiter um, on, you know, on the Google cloud or you can run it on a microcontroller, not a microprocessor, but literally on a tiny 256 K micro controller, all of that is, is .NET.

Now .NET is also somewhat unique, uh, amongst ecosystems in that it has a very large, what we call BCL the base class library. One of the examples might be that we know with a Python or with a JavaScript. You say console.log, hello world. We got the same thing. And then it's like, Hey, I need a really highly optimized, a vector of floats with a complicated math processing. Well, you have to go and figure out which one of those libraries you want to pull in. Right? There's lots of options in the Python world. There's lots of options in the Java world, the JavaScript world, but then you have to go hunting for the right library. You have to start thinking.

Microsoft, I think is somewhat more prescriptive in that our base class library, the stuff that's included by default has everything from regular expressions to math, to, you know, representations of memory all the way up to putting buttons on the screen.

So you don't always have to go out to the third party until it's, uh, it's time to get into higher level abstractions. And that has made .NET historically attractive to enterprises because when you're picking a framework, And it has a whole ton of ecosystem things. Who do you throttle when it doesn't work? Right?

There's an old joke that people picked IBM because they got, it gave them one throat to choke. Uh, people who are in enterprises typically will buy a Microsoft thing and then they'll buy a support license and then they can call Microsoft when it is. Um, now fast forward for 20 years, Microsoft has a balance between a strong base class library and enrich open-source environment.

So it's a little bit more complicated, but the historical context behind why Microsoft ended up in enterprises is because of that base class library and all of those libraries that were available out of the box.

Jeremy: [00:13:20] And that's sort of in direct contrast to something like say JavaScript where people are used to going to the node package manager and getting a whole bunch of different dependencies that have been developed by other people. It sounds like with .NET, the philosophy is more that Microsoft is going to give you most of what you'll need and you can sort of stay in that environment.

Scott: [00:13:46] I would say that that was the original philosophy. I would say that that was pre open source Microsoft. So maybe 10 and 15 years ago. I would think that that is not a philosophy that is very conducive to a healthy, open source ecosystem. But it was early on when it was like, here's a thing you kind of buy, you didn't really buy it, but like it comes with windows and you buy visual studio.

When I joined about 13-14 years ago, myself and others are really pushing open source because we came from the open source community. And as we started introducing things like Azure and services .NET is free, has been free. Is now open source visual studio code, visual studio community are all free. So the stuff I work on, I actually have nothing to sell.

As such, um, it's better to have a healthy community. So we have things like NuGet so NuGet and N U G E T is the same as NPM is the same as Maven. It's our package manager environment. And that is, uh, allows you to pull both Microsoft libraries and third-party libraries. And we want to encourage an open source community that then, you know, pays open source framework, uh, creators for their work using things like github sponsors.

People who are working in enterprises now need to understand that if you pick just the stuff that Microsoft gives you out of the box, you may be limited in the diversity of cool stuff that's happening. But if you go and start using open source technology, You might need to go and pay a support contract for those different tools or frameworks, depending on whether it's a, you know, a reimagination of ASP.NET or web framework.

And you say, I don't like the one Microsoft has. I'm going to use this one or an OAuth identity server that Microsoft does not ship out of the box. It's a missing piece. Therefore use this third party identity server.

Jeremy: [00:15:33] What are some things that you or Microsoft has done to encourage people to build open source stuff for .NET?

Scott: [00:15:43] Well, early on Microsoft was, uh, you know, attempted to open source .NET By creating a thing called rotor, R O T O R, which was a way of saying, Hey, Java, everyone in, in, in academia is using Java. We have one too, but. It didn't exist. The only real open source licenses at the time were GPL, so Microsoft made a very awkward, this is about 20 years ago, very awkward overture, where it's like, here's a zip file with a weird license that Microsoft made called the MSPL, the Microsoft permissive license.

And here's a kind of a neutered version of .NET That you could potentially use. And it wasn't source it wasn't open source. It was source opened. It's like, here's a zip file. Like, go, go play with this. There were no take backs. Fundamentally Microsoft was not a member of the open-source community. They were just like, Hey, look, we have one too.

When we started creating ASP.NET MVC, the model view controller stuff, which is kind of the Ruby on Rails for .NET. The decision was made by a bunch of folks on the team that I worked on and encouraged by me and others who think like me. Just open source the thing and do takebacks it's it's not just, Hey there's Hey, you can watch us write code and do nothing about it, but the takebacks is so important.

And the rise of GitHub and social coding enabled that even more. So now actually about 60% of .NET comes from outside Microsoft. It is truly collaborative. Now Microsoft used to patent a bunch of stuff and they'd patent stuff all over the place and everyone would be worried. I don't want to commit to this because it might be patented, but then they made the patent promise to say that any of the patents that covered .NET won't be things that will be enforced.

And then they donated .NET to the .NET foundation, which is a third party foundation, much like many other open source foundations and .NET and all of the things within it, the logos and the code and the copyrights are all owned by that. So if Microsoft, God forbid were to disappear or, uh, stop caring about .NET, it is the communities that can be built and deployed and run entirely outside of .NET. So it's overtures like that to try to make sure that people understand that this isn't going anywhere, it's truly open source and then take backs is such a big thing. And this has been going on now for, for many, many years, most people who think .NET and probably the folks that are listening to this show are thinking the big giant enterprise monolith that only runs on Windows that came out in the early 2000s.

Now it is a lightweight cross-platform open source framework that runs in containers and runs on Linuxes and raspberry pis. It's totally open and it's totally open source all the way down. The compilers, open source, all the libraries, everything.

Jeremy: [00:18:21] To your point of people, picturing .NET being closed source running only on windows. I think an important component to that is the fact that there have been multiple .NET runtimes right? There's been the framework there's been core. There's been mono. I wonder if you could elaborate a little bit on what each of those are and what role they fit in today.

Scott: [00:18:46] That is a very, very astute and important observation. So if I were to take Java byte code, Uh, I've taken Java source code and I've turned it into Java byte code. There are multiple Java runtimes, Java VMs that one could call upon it's from Oracle and there's Zeus and there's different open source, open JBMs and whatnot.

There are even interpreters. You can run that through a whole series of them. There's X. You can imagine a pie chart of all the different Java VMs that are available. Now. Certainly there's the 80% case, but there are lots of other ones. The same thing applied in the .NET ecosystem. There is the .NET framework that shipped and ships continually still with windows.

And the .NET framework and its associated CLR, I would say is a, and I'm putting my fingers in quotes here, a .NET. But to your point, there are multiple .NETs. Plural. Mono is a really interesting one because it is an open source, clean room reimplementation of the original Windows-based .NET and clean room in that they did not look at the source code.

They looked at the interface. So if I said to you, Jeremy, um, here's the interface for `System.String`, write me an implementation of this interface here are the unit tests and you wrote yourself Jeremy specific `System.String`. And it, it worked, it passed all the tests. Now, you know, who knows under load, it might behave differently.

But for the most part, you made a, a clean room reimplementation of that. Mono did that for all of .NET proper, but then that became a new .NET. So for those that are listening, they can't see us or see a screencast. But imagine that I go out to the command line and with a Microsoft implementation of .NET, I make a new console app.

It says `Console.WriteLine`. Then I compile it into an executable. I can then run that with Mono. So I created it on one environment and I ran it with another. So I, I did the compilation and the first, the first bit of chewing, and then the finally the jitter would run through mono because the IL language is itself a standard that has been submitted to the standards organizations.

And it is a thing. So if you and I were, uh, you know, advanced computer science students in university, that might be an interesting homework assignment, right? A IL you know, interpreter or write an IL jitter. And if you can successfully consume IL that has been created by one of these many .NETs you have passed the course.

Then .NET core is a complete reimplementation opensource cross-platform with no specific ties to windows. And that's the .NET that has kind of, we're hoping will be rebranded just .NET. So five years from now, people won't think about .NET core .NET framework. They'll just think about .NET.

It will run great on windows. It'll run great on Linux it'll run great right in the cloud, yada yada, yada. That is a kind of a third .NET. We're trying to unify all of those with the release of .NET 5, which just came out and .NET 5 we picked the number because it's greater than 4, which was the last version of .NET on windows.

But we also skipped a number because .NET Core 3. Would have been confusing if we named it .NET Core 4, again, Microsoft not being good at naming stuff. You imagine two lines heading to the right .NET framework and .NET core. They then converge into one .NET. The idea being to have one CLR, one common language runtime, one set of compilers and one base class library.

So that if someone were to learn that. If we harken back to the middle of the beginning of our conversation, when I said that there was a theoretical .NET developer out there, who's a gamer and there's another one who's doing backend work. They should be using the same libraries. They should be using the same languages.

They should have a familiarity with everything up to, but not including the implementation specific details of their systems, so that if a unity person or a, uh, decided to become a backend developer or a backend developer decides to move over to make iPhone games, they're going to say, Oh, This is all the system, dot this and that stuff I'm familiar with. That's a `System.String`. I appreciate that. As opposed to picking an iPhone or an Android specific implementation of a library.

Jeremy: [00:23:09] If .NET core is the open source reimplementation of the .NET Framework, and you've also got, you were talking about how mono is this clean room implementation of it? What role does, does mono play, currently?

Scott: [00:23:25] Hmm. Good question. So mano was originally created by, uh, Miguel de Icaza and the folks that were trying to actually make an outlook competitor on Linux that then turned into the Xamarin, uh, company X, A M A R I N and Xamarin, um, then was a different reimplementation, which allowed them to do interesting stuff like, can we take that IL and send it over to I'm doing a thing called SGEN, uh, and generating, you know, uh code that would run on an iPhone or code that we're running on an Android and they created a whole mobile ecosystem and then Microsoft bought them.

So then, because mono and .NET work on the same team we're getting together right now and we're actively merging those together. Now, originally mono was really well known that it was a just nice, solid, clean C code. You could basically clone the repo run configure run make, and it would build while the windows, implementation of .NET Core and .NET Core in general, wasn't super easy to build it.

Wasn't just a clone configure and make. A lot of work has been done to make the best parts of mono be the best parts of .NET core, as opposed to like disassembling one and, you know, taking them apart and making this frankenbuild, we want to make it so.net core is, is a, uh, a re-imagining of both original .NET framework and windows and the best of a cross platform one.

So that is currently in the process in that .NET 5 .NET 6 wave. So this year in, um, in November, we will release .NET 6, and that will also be an LTS or a long-term support release. So those of you who are familiar with Ubuntu's concept of LTS those long short term long-term support versions will have, you know, many years, three years plus of support.

And by then the mono .NET merge will kind of be complete. And then you'll find that .NET core will basically run everywhere that mono would. It doesn't mean mono as an ecosystem goes away. It just becomes another, uh, not the supported version of .NET, uh, at a runtime level.

Jeremy: [00:25:31] When we look at the Java virtual machine and we look at that ecosystem, there are languages like Scala and Clojure, things that were not made by the creator of the JVM. But are new languages that target that runtime.

And I wonder, from your perspective, why there was never that sort of community, with the common language runtime?

Scott: [00:25:56] I would think that because the interest from academia early on 20 years ago was primarily around Java. And Microsoft was slow on the uptake from an open source perspective that, um, slowed down possible implementations of other languages. And then .NET got so popular and C# evolved because C# is now on version nine, the C# that you wrote the idiomatic C# of 2001 is not the idiomatic C# with includes, you know, records and async and await and interesting things that have shown up on C# and then been borrowed by other languages, async await, being a great example, linq language, integrated query, being another thing that didn't exist, that language has evolved fast enough that people didn't feel that it was worth the effort. And it was really F# that picked an entirely different philosophy in the, in the functional way, by doing kind of an Ocaml style thing. And because C# and F# are so different from each other, they really fit nicely. They're as different as English and another language that isn't Latin-based, you know, And they meaning that they add flavor to the environment.

That doesn't mean that there aren't going to continue to be niche languages. Uh, but there was also things like IronRuby and IronPython, but they weren't really needed because the Python and Ruby runtimes already worked quite well.

Jeremy: [00:27:16] We've been talking about .NET and sort of the parts that, make up .NET, but I was wondering, other languages have niches, right? You have go maybe for more systems type programming or Python for machine learning. Are there niches for .NET?

Scott: [00:27:33] Oh, yeah. I mean, so when I would, I would, I would generally push back on the idea of niche though, because I don't want to use niche as a diminutive right. I kind of like the term domain specific just in the sense of you know, when in Finland speak Finnish, that doesn't mean that we would say, Hey, you know, Finnish didn't win.

Like we're not all speaking. It's not the, it's not the language. That one. But we don't want to think about things like that. There are great languages that exist for a reason because people want to do different stuff with them. So for example a really well thought of language in the .NET space is F# and F# is a functional language.

So for people who are familiar with OCaml and Haskell and things like that, and people who want to go and create a language that is, you know, very correct, very maintainable, very clear that has a real focus on you managing the problem domain. As opposed to the details of, of imperative programming. Um, it has immutable data structures by default.

There's a lot more type inference than there is in C#. Um, a lot of people use F# in the financial services industry. and it also has some interesting language features like records and discriminated unions, uh, that, uh, C# people are just now kind of getting hip to.

Jeremy: [00:28:47] What I typically see or what I think people sort of believe when they think of .NET is they think of enterprise applications. And I'm wondering if you, have some examples of things that are sort of beyond that space.

Scott: [00:29:03] Sure. So when I think of .NET, like in the past, I would think of like big windows apps with giant rows of tabs and text boxes over data. You know, if I go to my eye doctor and I see the giant weird application that he's running, that's multiple colors and lot of grids and stuff. And I think, Oh, that must be written in .NET.

Those are the applications of 20 years ago. The other thing that I think people think about when they think about .NET is it's a thing that is tied to windows and built on windows and probably takes about gigabytes of space. .NET core is meant to be more like node and more like go than it is to be like .NET Framework.

So for example, if I were to create a microservice using .NET core, I go out to the command line and I type `dotnet new`. Web or `dotnet new empty`, or `dotnet new console`. And then I can say `dotnet publish` and I can bring in a runtime. And we talked before about how you could keep things very non-specific until the last minute and let the runtime on the deployed machine decide.

Or you could say, you know, this is going to a raspberry PI, it's going to be a microservice on a raspberry PI. I want it to be as small as possible. So there's multiple steps. We've gone to, we made this little microservice and we published it. Then we publish all of the support, DLLs, the dynamic link libraries, the shared objects.

If it's going to Linux all of the native bindings that, that provide that binding between the runtime and, the native processor and the native operating system, all the system calls then that is right now, about 150, 180 megs on .NET. But then we look at the tree of function calls. And we say, you know, this is a micro service.

It only does like six things. It looks at the weather, it calls like six native functions. Then we do what's called tree trimming or tree shaking and we shake, we give it a good shake. And then we basically delete all of the functions that are never called. And we know those things, right. We can, we can introspect the code and we can shake this thing and we can get microservices down to 30, 40 megs.

Now again, you might say, well, compared to go or compared to writing that in C. Sure. But you're talking about still maintaining the jitter, the garbage collector, the strongly typed abilities of .NET. All of the great features of this cool environment shook, shaken, shaken, shaken down to, you know, 30, 40 megs.

And then you can pack that tightly into a container, even better if the container layer above it. And if you imagine a multi-stage Docker container, doesn't include the compiler. Doesn't include the SDK and the container above. It includes the runtime itself. You might only deploy, you know, 10, 15 megs.

You can really pack nicely together a .NET microservice and then a whole series of them. Thousands of them potentially into something like Kubernetes. That's a very fundamentally different thing than the giant .NET application that took two gigs on your hard drive.

This is the cool part, though. A giant, we call it windows forms or that WPF, that windows presentation application that often we find in large enterprises moves slowly because the not, not run slowly, but move slowly from an, um, from a dev ops perspective because the enterprise is afraid to update it, or they need to wait for windows to get the version of .NET that they want because .NET has historically been tied to windows.

Which sucks then that application and the whole team behind it gets depressed. What if we could take the cool benefits that I just described around containers back port it, so that, that giant application at my doctor's office ships its own local copy of .NET, which it can then maintain, manage, and, and deal with it's it's on its own.

All in a local folder. Doesn't need to wait for windows. That's really cool. Then they could do that, that, that prejudging basically do a native image generation and even create a single executable where inside is .NET. So then I give you my doctor's office dot exe. You double click on it. You install nothing.

It simply unfolds like a flower into memory. Including all of the benefits. Now it's not generated native code at this point, right? It's .NET in an envelope that then blooms and then runs with all native dependencies that it needs. That is pretty cool. And that actually really generates excitement in the enterprises so that they can bring those older applications to, to modern techniques like dev ops and, uh, as well as a huge, huge perf speed up like two X, three X per speed up.

Jeremy: [00:33:47] When you talk about a perf speed up, why would including the, the .NET runtime with the application result in a perf speed up?

Scott: [00:33:55] It's not that we're including the, the, the .NET runtime that caused the perf speedup. It's that 20 years of improvement in jitter, 20 years of understanding the garbage collector, rather than using the .NET that shipped with windows, they can use the one that understands. Again, a callback to the beginning instruction sets that are newer and modern as SIMD, SSE2, all the different things that one could potentially exploit when on a newer system. Those things are we get huge numbers and actually .NET itself .NET 5 is 2 or 3x over .NET core three, which is 2 or 3x over. So a huge perf push has been happening and actually all the benchmarks are public. So if you go to techempower, the tech empower benchmarks, you can see that we're having a wonderfully fun kind of competition, a bit of a thumb war with, you know, the gos and the Java's and the, you know, groovy on rails of the world.

To the point where we're actually bumping up against the laws of physics. We can put out, you know, seven or 8 million requests a second. And then you notice that we all bundle at the top and .NET kind of moves within a couple of percentage of another one. And the reason is, is that we've saturated the network.

So without a 10 gigabit network, there's literally nothing, no more bytes can be pushed. So .NET is no longer in the middle or to the far right of the, uh, of the long tail of the curve. We're right up against anything else. And sometimes as fast as 10x, faster than things like node.

Jeremy: [00:35:23] So it sounds like going from the .net framework to .net core and rebuilding that, uh, you've been able to get these, giant speed improvements. You were giving the example of a windows desktop application and saying that, uh, you could take this that was built for the .net framework, um, and run it on .NET core.

I'm wondering, is that a case where somebody can just take their existing project, take their existing code and just choose to run it? Or is, does there have to be some kind of conversion process?

Scott: [00:35:56] So that's a great question. So it depends on if you're doing anything that is like super specific or weird. Like if you were. A weird application, then, then you might bump up against an edge case. The same thing, the same thinking applies. If you were trying to take an application cross-platform or example, I did a tiny mini virtual machine and an operating system for a advanced class in computer science.

15, 20 years ago, I was able. To take that application, which was written and compiled in .net 1.1 and move it to .NET core and run in a container and then run on a raspberry PI because it was completely abstract. It was using byte arrays and, and, and list of T and kind of pretty straightforward base class library stuff.

It didn't call the registry. It didn't call any windows APIs. So I was able to take a 15 year old app and move it in about a day. Another example, my blog, which used to run on .NET to my blog has over 19 years old now has recently been ported from .NET 2 to 4, then .NET core. And now .NET 5 runs in the cloud in Linux, in a container. With 80% of the code, the same, the stuff that isn't the same that I had to actually consciously work around with the help of my friend Mark Downey was, well, we called the registry. The registry is a windows thing. It doesn't exist. So what's the, you know, what's the implementation, is it a JSON file now? Is it an XML file? You know, when it runs in a container, it needs to talk to certain mapped file system things while before it was assuming. Okay. Back slashes or a C:. So implementation details need to move over. Another thing that is interesting to point out though, is that there are primitives, there are language and runtime primitives that didn't exist before that exists now.

And one of the most important one is a thing that's called span of T `Span`. And this is the idea of a completely new way of thinking about. Arbitrary continuous bits of memory. Like you want to be at a high level language and then messing around in bite arrays. Or do you want to think about things at that higher level, but then also re avoid reallocation and copying of memory around.

So something very simple, like some HTTP headers come in, you know, if you're doing 7 million of these a second, this can add up right. And it's like, well, usually we take a look at that chunk of memory and we copy it into an array and then we start using higher level constructs to deal with that. Now I have a list of char and then we for loop around the list of char that's kind of gross.

And then you find out that you're in the garbage collector a lot because you're allocating memory and you're copying stuff around. You could take that older code, update it to you, use span of T and represent that contiguous region of arbitrary memory differently rather than an array, a span you can point to native memory or even memory managed on a staff back and then deal with stuff.

For example, parsing or maneuvering within HTTP header with no allocations. So a really great example, you'd ask for an example of an application that you couldn't believe. VR desktop. Are you familiar with VR desktop as it relates to the Oculus quest? Okay. So the Oculus quest is the Facebook VR thing, right?

And it's an Android phone on your face and it's, you can use it unplugged. And then people say, well, I got this Android phone on my face. I'd really like to plug it in with a USB cable and use the power of my desktop. But then I got to ship those bits across the USB and then have a client application that allows me to use it as a dumb client.

Okay. You would expect an application that someone would write to do that would be written in C or C++ or something really low level VR desktop allows you to actually see your windows desktop and ship those bits across the wire, across the literal wire, the USB or wirelessly is written entirely in C#. And you can use really smart memory allocation techniques. And the trick is don't allocate any memory and C# is more than fast enough to have less than 20 millisecond jitter on frames so that it is buttery smooth, even over wireless. And I did a whole podcast on that with the Guy Godin the Canadian, a gentleman who wrote VR desktop.

Jeremy: [00:40:23] That's really interesting because you're talking about areas where typically when you think of a managed language, like, C#, you're able to use it in more environments than you, you might normally think of. And, I also kind of wonder, you have the new Apple machines coming out. They have their arm processors instead of the x86 processors. what is the state of.net when targeting an arm environment? Can somebody simply take that IL that's on one machine, bring it on to the arm machine and it just works?

Scott: [00:41:01] Yep. So if you are putting, if you're taking over the, the IL and I'm on a windows machine right now, I'm on an X64 machine, and I go, `dotnet new console`. And I go, `dotnet run` and it works on my X64 machine. And I say, `dotnet publish`. I can then take that, that library over to the raspberry PI or an arm machine say `dotnet run`.

And because I'm saying `dotnet run` with the local .NET runtime, it will pick up that IL and it will just run. So I have run using my now 19-20 year old blog. As an example, I have run that entire blog on a raspberry PI unchanged. Once I had worked out the portability issues, like once I had worked out the portability issues like back slashes versus forward slashes and assumptions like that.

So once the underlying syscalls line up, absolutely. And we actually announced in July of 2020 that we will support .NET core on the new Mac hardware. And there are requirements for .NET and .NET core on, you know, Apple like arm 64 and big sur, that would be required to get that runtime working.

But yes, the goal would be to have native processes versus Rosetta to processes working. And that work is happening.

Jeremy: [00:42:24] We were talking about like the typical enterprise application, you've got a, windows forms application is it possible for somebody with an application like that to, to target arm?

Scott: [00:42:36] So right now there is there's multiple layers. So the compiler yes. The runtime. Yes. The application. Yes. At the point where, if I were going to run on windows arm, like a surface pro X or something like that, we would need support for, I believe windows forms, winforms and WPF. I don't know that those are working right now, but those apps still run because just like we have the, the Rosetta, you know, Emulator can emulate x86 stuff.

So for example, paint.net is a great.net application that is basically Photoshop, but it's four bucks, um, that it runs entirely on .NET core and it runs just fine on a surface pro X in an emulated state. And I believe that the intent of course is to get everything running everywhere. So yeah, I would be sad if it didn't. That's a little outside my group, but the people that are working on that stuff. And by the way, the great thing about this go and look at the GitHub issues, right? So, you know, if you look on, I'm looking right now on a .net issue, 4879, where they're talking about the plan is supporting .NET core 3 and 5 on Rosetta 2 via emulation and .NET 6 supported in both native on ARM architecture and Rosetta two.

And there's in fact, a .NET runtime. A tracking issue. That's called support Apple Silicon, and they walk through where they're at current status on both mono and .NET six.

Jeremy: [00:44:00] very cool.

Scott: [00:44:01] Yeah, it is cool. It's cool. Because you get to like, actually see what we're doing, right? Like before you had to know somebody at Microsoft and then call your friend and be like, Hey, what's going on?

We are literally doing the work in, open on GitHub. So if the, if the tech journalist would pay more attention to the actual commits, they would get like scoops weeks before an a blog post came out because you're like, I just saw that, you know, the check-in for, you know, they had an Apple Silicon show up in the .NET but-da-da.

Jeremy: [00:44:26] Might get a little bit technical for the average journalist though.

One of the things that you've been talking about is how with .NET Core, you have a lot of cross-platform capabilities. I wonder from an adoption standpoint, what you've seen, like are there people actually running their services and Linux and Docker, Kubernetes, that sort of thing.

Scott: [00:44:48] Yeah .NET. Um, one of the things that we try to make sure that it works everywhere. So we've seen huge numbers, not only on Azure, but AWS has a very burgeoning, uh, .NET Community, as well as their SDKs all work on .NET. Same with Google cloud. we're trying to be really friendly and supportive of anyone because I want it to run everywhere.

So for example, at our recent .NET conf, which is our little virtual conference that we make, we had Kelsey Hightower from Google cloud, like their main Google cloud Kubernetes person. Learn.net in a weekend, and then deploy to Kubernetes on Google cloud. We're seeing a huge uptick on .NET 5. So that unified version where we took.net core and the.net framework and unified them, it's the most quickly adopted .NET ever.

I mentioned before that like 60% of the, commits are coming from outside the community. We're seeing big numbers, you know, stack overflow is moving to dot has moved to .NET core. They're actually showing their perf. And most of the games that you run are already running on .NET. And, you know, if you see a unity game, there's .NET and C# running in that a lot of the mobile games that you're seeing, um, one of the other things that we're noticing is that more and more students and young people are getting involved.

Um, so we're really trying to put a lot of work into that. One of the other things that's kind of cool that we should talk about before the end is a thing called blazor. Have you heard about that?

Jeremy: [00:46:07] Yeah. Maybe you could explain it to our audience.

Scott: [00:46:09] So just like, um, if we use Ruby on rails, as an example, I'd like to use other frameworks as an example, like builds cool stuff on top of it. So it's like you pick your Ruby, whether it be JRuby or, and Ruby or whatever. And then you've got your framework on top of that and then other frameworks, and then you layer on your layer and your layer.

.NET has this thing called ASP.NET active server pages. .NET. It's a stupid old name. You can make, web APIs that are JSON and XML based API APIs. You can make a standard model view controller type environments. You can do a thing called razor pages where you can basically mix and match C# and HTML.

But if you want to do those higher level, enterprise applications where you just don't want to learn Java script. We've actually taken the CLR and compiled it to web assembly and we can run the CLR inside V8. But then if you actually hit F12 tools on a blazor page, laser is bla Zed, O R hit F 12.

You go, and you hit a blazor page. You can see the DLLs, the actual dynamic link libraries coming over the wire. And then the intermediate language that we've been talking about so much on this podcast gets interpreted on the client side, which allows a whole new family of people. That already knows C#.

Like those individuals that we spoke to spoke about at the beginning, you've got someone with 10 years experience in unity and someone else got 10 years experience on the backend. I never got around to learning JavaScript. They can write C# they can target the browser. And then the browser itself runs.net and then blazor allows them to create spa applications, single page applications, much like, you know, Gmail and Twitter and in PWAs, progressive web apps.

So one could go and create an app. And rather than an electron kind of an app, which is using JavaScript and a whole host of tools, they can make a blazor app, have it run natively anywhere they can then pin it to their mobile phone or their taskbar on windows and they're there suddenly a web developer, and it's a really great environment for those big enterprise applications or those more line of business type apps, because the data binding is just so clean and then that backend can be anything.

So blazor is a really great environment and a pretty interesting technical achievement to get .NET itself running within the context of the browser.

Jeremy: [00:48:33] Does that mean that somehow you've been able to fit the .NET core runtime, in web assembly? I'm trying to understand how that works.

Scott: [00:48:40] Yep. That's exactly. That's exactly right. So remember how we mentioned before that this isn't the multi gigabyte thing that everyone is so afraid of, you basically bring down. I think it's like 15 or 20 megs, which is, you know, a little bit big, but if you think about the homepage of the verge, it's not that big, right.

I've seen PNGs bigger than the .net, uh, runtime. Uh, once you bring that down, of course, if you put that on a, on a CDN. that's handled. I think it's actually even smaller than that. It's not that big of a deal because once it's down, it's cached and that is really the .NET core, you know, mono runtime running within the context of WebAssembly and then you're just calling local browser things.

But here's the cool part. Let's say you don't like any of that, let's say I've described this whole thing. You like the application model. I want to write C#. But you don't want to download and like put a lot of work into getting the local CPU and the local V8 implementation to do that. You can actually not run the .NET framework and the .NET core rather, pardon me on the client side, but you can have it run on the server side and then you open a socket between the front end and the backend.

So then the events that occur. On the client side, get shuttled over web sockets to a stateful server. So there's blazor, which is client side, and then there's blazor server, which means it can run anywhere on any device. So then it's more of a programming model thing. You still don't have to write any JavaScript because you'll be addressing the Dom the document object model within the browser from the server side.

The messaging then goes magically over the web sockets. You basically, you have a persistent channel bus running between the front end and the back end. So you get to choose.

Jeremy: [00:50:28] yeah, we did a show on Phoenix live view, and it sounds like this is a very similar concept, except that you get the choice of having that run in the server or run locally in the client.

Scott: [00:50:40] Yeah. And I think the folks on Ruby on rails are doing similar kinds of things. It's this idea of, you know, bringing HTML back, maybe getting a little less JavaScript heavy, and really thinking about shoving around islands of marked markup. Right. If I have a pretty basic, you know, master detail page and I click on something and I just want to send back a card.

Is that really need to be a full blown react view? Angular application is the idea is you, you make a call to get some data, you ship the HTML back and you swap out a div. So it's trying to find a balance between, too much JavaScript and, uh, and too many post backs. And I think it, it hits a nice balance.

Blazor is seeing huge pickup, particularly in people who have found JavaScript to be somewhat daunting.

Jeremy: [00:51:27] I think we've been talking about things that are a little more cutting edge in terms of what's happening in the .NET ecosystem. Things like web assembly. what else are Microsoft's priorities for the future of .NET?

Are you hoping to move into some more ahead of time compilation environments, For example, where you need something really small, that can run on IoT type devices. What's the future of .NET from your perspective?

Scott: [00:51:57] All of the above. If you take a look at the microsoft .NET IOT, um, repository on github, we've got bindings for dozens and dozens and dozens of sensors and screens and things like that. We have partnerships with folks like PI top. So if you're teaching .NET, you can go in inside of your browser in a Jupiter kind of an environment, uh, go and, um, communicate with actual physical hardware and call GPIO general purpose IO directly. Uh, we're also partnering with folks like wilderness labs, which has a wonderful implementation of .NET Based on mono that allows you to write straight C# on a tiny micro processor, not a micro controller, pardon me uh, rather than a microprocessor and create, you know, commercial grade applications like, uh, you know, uh, uh, thermostat running in .NET, which again, takes those different people that I've mentioned before, who have familiarity with the ecosystem, but maybe not familiarity with the deployment and turns them into IOT developers.

So there's a huge group of people that are thinking about stuff like that. We've also seen 60 frames a second games being written in the browser using blazor. So we are finding that it is performant enough to go and do you know, we've seen, you know, gradients and different games like that. Being implemented using the canvas, uh, with blazor server running it's, uh, at its heart, the intent is for it to be kind of a great environment for everyone anywhere.

Um, and then if you go to the .NET website, which D O T.net dot.net, we can actually run it in the browser. And write the code and hit run so you can learn .NET without downloading anything. And then there's also some excitement around .NET interactive have notebooks, which allows you to use F# and C# in Jupiter notebooks and mix and match it with different languages.

You could have JavaScript, D3JS, some SVG visualizations, using C# all in the browser entirely in the, uh, in the Jupiter notebooks IPYMB ecosystem.

Jeremy: [00:54:00] So when you talk about, .NET Everywhere. It's really increasing the number of domains that you can use .NET at, and like be the person who maybe you built, uh, enterprise windows applications before, but now you're able to take those same skills and like you said, make a game or something like that.

Scott: [00:54:20] Exactly. It doesn't mean that you can't or shouldn't be a polyglot, but the idea is that there's this really great ecosystem that already exists. Why shouldn't those people be enabled to have fun anywhere?

I think that's a great note to end on. So for people who want to check out what you're up to, or learn more about .net, where should they head?

Scott: [00:54:42] Well, they can find me just by Googling for Hanselman, Google, with being, if it makes you happy. Um, everything I've got is linked off of hanselman.com. And then if they're interested in learning more about .NET, they can go to just Google for .NET and you'll find dotnet.microsoft.com. Uh, within the download page, you'll see that we work on Mac, Linux, Docker, windows, and, um, if you already have visual studio code, just download the .NET SDK, go to the command line type `dotnet new console`, and open it up.

And visual studio code VS code will automatically get the extensions that you need and you can start playing around. Feel free to ask me any questions. We've got a whole community section of our website, whether it be stack overflow or Reddit or discord or code newbies or tick tock, we are there and we want to make you successful.

Jeremy: [00:55:29] Very cool. Well, Scott, thank you so much for chatting with me today.

Scott: [00:55:34] Thank you. I appreciate your very deep questions and you clearly did your research and I appreciate that.

Distributed Systems and Careers with Shubheksha Jalan

Thu, 25 Feb 2021 05:44:12 -0000

Shubheksha is a Software Engineer at Apple. She previously worked at Monzo and Microsoft.

Personal Links and Projects:

Other interviews about getting into open source:

Related Links:

Music by Crystal Cola.

Jeremy: This is Jeremy Jung and you're listening to software sessions. Today, I'm talking to Shubheksha Jalan about working with distributed systems, the go programming language and some of the lessons she's learned being a software engineer for the last three years. Previously, she worked as a backend engineer at Monzo Bank. And before that she was a software engineer at Microsoft.

Shubheksha, thank you so much for joining me today.

Shubheksha: Thank you so much for having me.

Jeremy: You picked up a lot of experience with distributed systems at Monzo bank. Could you start with explaining what a distributed system is?

Shubheksha: Yeah. So the main premise and the main difference, between like, a distributed and a non distributed system, is that a distributed system has more than one node. And when I say a node, I mean just computers, machines, servers. So it's basically a network of interconnected nodes that is being asked to behave as a singular entity and present itself as, as something that's not really disjoint.

And that's kind of where the trouble starts. something that you hear a lot of people who work with distributed systems say is that just don't do it unless you really really need to because like things spiral out of control very, very quickly. Because like, when you have a single machine, you don't have to think about like communication over the network.

You don't have to think about like distributing storage. You don't have to think about so many other things, whereas with like, when you have multiple machines, you have to think about coordination. You have to think about communication and like at what levels and like what sort of reliability you need and all of that.

So on and so forth. So it just gets super tricky.

Jeremy: Yeah. And when we talk about distributed systems, you mentioned about how it's having multiple nodes. And I think a more traditional type of application is where you would expect to have a load balancer and some web servers behind the load balancer and having those talk to a database and those that involves multiple nodes or multiple systems. But, I guess in that case, would you consider that to be a distributed system or not really?

Shubheksha: I think, yeah, like if something is going over the network and like, you have to sort of make those trade offs, I think I would still call it a distributed system. And like right now, like it's not just the big companies that are building and deploying distributed systems. It's pretty much everyone, like even small startups, especially like, because the cloud has become so dominant now, like the cloud is essentially a giant distributed system that we sort of take parts off and deploy our stuff, which is isolated from all of the other stuff that's also deployed on it. But like from the perspective of a cloud provider, it is a big big distributed system. And like, even when you, as a customer are using a cloud provider, you are, your code is deployed. You don't even know where, and it's still to you like if you spin up like multiple EC2 instances, for example, that is a distributed system.

Jeremy: It's almost not so much that the different nodes are doing different things I guess it's just the fact that you have to deal with the network at all. And that's where all of these additional troubles or, or challenges come in.

Shubheksha: Yeah, I think, yeah, that's probably a more accurate way to put it. Like, I can't really think of a system that you can like reasonably fit on a single machine right now. especially with the way, like we provision things, in the cloud. So yeah, by that definition, pretty much everything that we are building is a distributed system at some level.

Jeremy: Mmm you were saying earlier about how don't build one if you don't need to. But now it almost sounds like everything is one, right?

Shubheksha: Yeah. To some level. Yeah. Everything is one. Yeah.

Jeremy: When people first start getting started with having a system that involves the network, what are some common mistakes you see people make?

Shubheksha: Like a lot of people misunderstand the CAP theorem. The CAP theorem, is basically one of the most popular theorems in distributed systems literature and it states that like CAP stands for consistency, availability and network partitions.

And there's a lot of debate around like what those terms actually mean. So I don't want to go into that right now because that'll probably take up the rest of our time. CAP theorem states that you can only have two out of those three things at a given point of time.

And a lot of people sort of take that to heart and take it way too literally whereas there's a great blog post about this, which we should definitely link in the show notes, but which argues that like network partitions are not really optional. There's no way you can have a network that's a hundred percent reliable, like that's not humanly possible.

And there's also like some confusion around like what the terms, availability and consistency mean, And that's also like a huge source of debate and confusion in the community. And like a lot of people when they, when they sort of start working. They don't really understand what that means in what context, because those terms are very, very overloaded in tech in general.

And like one thing means like five different things depending on the context. So yeah, it can be really hard to get started.

Jeremy: Mm, and since it's a little difficult to pin down, what CAP means, I guess, for somebody who's, who's starting out. what would you suggest they they focus on to make sure that when they build something, that it works, I guess is one way to put it.

Shubheksha: I think that's, that's a very hard question to answer in like a generalized manner. It, it depends on your requirements and like, what is your end goal? What sort of like availability and reliability characteristics that you're looking for, from the system, for example, if it's a, if it's a system that's like sorting some critical data in that case, like consistency would be more valuable compared to variability, like it's a trade off.

Like everything else. So it really depends on the use case and yeah, like what you hope to achieve with the system.

Jeremy: Do you have a example from your experience where you, you chose those trade offs and could kind of walk us through that process?

Shubheksha: Uh, no, I don't think most people would have to deal with this day to day. And if you do, it's probably not a very good sign.

Like at Monzo, we had like a pretty unified platform. We did not have to like, think about this, at like, that level and go into it like super deep.

Jeremy: If I understand correctly there are engineers who build infrastructure or build tooling so that the software engineers or the backend software engineers don't have to worry about is this going to be eventually consistent or is this being partitioned instead you have some kind of API that's layered on top of that. Is that correct?

Shubheksha: Yeah. So you don't have to worry about like partitioning or anything like that. Like that's just taken care of. you just write a database schema, you deploy it and like data gets populated. You're not super worried about how that's done.

Jeremy: Hmm. And when you give the example of a database schema, is this like a commercial product or an open source product? Or is this something built in house at Monzo?

Shubheksha: Oh no. This is Cassandra.

Jeremy: Oh Okay. Then it's the Cassandra database developers. They are the ones who had to think about the CAP theory and work out the trade offs that they wanted to decide on.

And you, as the engineer, you just have to know how to use Cassandra.

Shubheksha: Yeah. And like we deploy it ourselves as well. So it can get a little tricky there too. You can fine tune it, it has a lot of knobs. So you have to give consideration to that and configure it accordingly basically.

Jeremy: I don't know if this applies to Cassandra specifically, but sometimes when people talk about distributed systems they talk about things like eventual consistency of how in your database, the latest change may not always be there when you go out to read it as a, as a backend software engineer. Is that something that you have to to think about and work around?

Shubheksha: So that also depends to an extent on the configuration and the type of data. if you know some data might not be present when you're reading it, like you have to guard for it. But like a very common pattern that's used is like, if you don't find something in the database, then you just assume that it's not present at all.

So like you try to read by a particular ID and if it's not present, then you create a new one. we don't have to deal with eventual consistency at every step. I would say like, I'm sure Cassandra does something in the background to deal with that. But yeah, like usually the assumption we make is that like, yeah, if it's not found in the database, then you just go and create it.

Jeremy: Mm, okay. I guess another thing I would ask about is you have all these different nodes you're referring to in a distributed system, and they all need some way of communicating with each other. what are some, some good protocols or ways for different nodes to communicate?

Shubheksha: something that I have been using at Monzo, or like I used to use at Monzo is a protobufs. It is a project that has been released by Google and it basically specifies how different services will communicate over the wire. So that sort of standardizes it and you have to still take care of like, some of the marshaling and unmarshaling.

But yeah, like it does most of the, the job on its own because when you're communicating on like, suppose you have two different services deployed on two different machines and machines, and you need a way to make them talk to each other, you know, you basically need some sort of a language that both of them understand.

So like if one is speaking French, and if one is speaking German, they can't really talk. So like, protobufs are sort of like both of them are talking in English and they can understand what they're saying.

Jeremy: And protobuf is the serialization format. Right. So that's a way that you can get a message from one system to another. But how is the, the transport handled? Like, are you using web services or some kind of message bus?

Shubheksha: Oh, yeah. So, that's mostly done over HTTP. Like wire remote procedure calls. Like, that's the most common thing I've seen you basically have some sort of a wrapper around, uh, like go's HTTP primitives to suit the needs of your system.

Jeremy: Hmm. Okay. you have HTTP endpoints in each service and you send that service, a protobuf message,

Shubheksha: Via a remote procedure call yeah.

Jeremy: If you're using HTTP, each node or each service has to know where the other machine is or how to reach the other machines, how is that managed in a in a system?

Shubheksha: that's sort of done, by some sort of service discovery mechanism, something that's super commonly used is Consul by HashiCorp. The job of that piece of software is to basically find out what service is deployed where. That's the other challenge with distributed systems, because you have so many things all over the place, you need to sort of keep track and have an inventory of like what is where so that if you get a request for a particular endpoint, you know where to send it.

you can use like a service discovery, uh, tool like Consul or you can use something like Envoy, which is a network proxy, and which sort of helps you uh do something similar.

Jeremy: And from the, the back engine engineers perspective, if you are using Envoy, um, or Consul or something like that, when you're writing out your end points or which HTTP endpoint you're going to talk to, what does that look like? Is there some central repository you go to and you just put that URL in and Consul or Envoy handles the rest?

Shubheksha: Oh, no. So like as soon as you deploy your service, basically, the platform will be aware of it. And all of the manifests that are associated with your service will get deployed and Envoy will pick the changes up.

For example, a new service, you need to make your proxy aware of it. So there will be some amount of configuration involved either you can do that by hand, or you can do that in an automated way.

Jeremy: So it's, it's almost like, adding your service to the system, making it aware that it exists. rather than having to know exactly who you're talking to, that's all handled by, some kind of infrastructure team or by the product.

Shubheksha: Yeah. So all of that, like, all of the platform components are basically deployed by a separate team.

Jeremy: Hmm. Okay. If you are outside of that team, a lot of this complexity--

Shubheksha: Is hidden away from you, yeah. And I have mixed feelings about that.

I feel like as a backend engineer, I would still want to know how things work. And like, part of that is just because I'm a curious person by nature. But like part of it is I, I genuinely think like developers should know where their code is running and how it's running and like the different characteristics of it, like making it too opaque is not really helpful. And like, I think it's needed to be like a holistic all around engineer basically.

Jeremy: Yeah, because I wonder how that impacts when there's a problem and you need to debug the problem. How do you determine whether it's something happening on the infrastructure side versus a bug in the code? That sort of thing.

Shubheksha: Yeah. So that can be a tricky one. Like you have to go through like multiple layers of investigation to figure out at what level the problem is like you usually start with the code and when, like, and also depending on like your monitoring and alerting, like, if it's something really clear that like a node is down, then you know that yes, it is an infrastructure problem.

But if like the error rate is high. Then it is very likely to be a code problem, but yeah, like in some cases it can be very difficult to figure out like, what is actually going on and like, what is the symptom and what is the cause.

Jeremy: And in the context of a distributed system, are there specific tools or strategies you would use to try and figure out what's going on?

Shubheksha: So this really depends on the system and like how well it's monitored. Like. my main takeaway was that like observability and monitoring should be a first class citizen of the design process, rather than an afterthought that you just add in, after you're done building the entire system, like that goes a long, long way in helping people debug because like, as, as a team grows or as a company grows and as the system grows, it can get so unwieldy that it, it can be super hard to keep track of what's going on and what, what is being added where. That's one of my main, main takeaways that yes, you should always think about observability, right from the start rather than treating it like an afterthought.

And in terms of like investigation, I'm not really sure if there are like specific tactics I would use. Like that's just something. Like you watch and learn other people do. And like it's so specific to the technologies and the kind of platform that you're dealing with. But yeah, like the best thing I have found about like incidents and like on call and all of that is that you just, you have to watch a lot of people who are really good at it. Do it again and again, and again, ask them questions and like, see what workflows and mental models they have and like the kind of patterns that they have built over time. And just go from there.

It starts to makes sense after a while like, initially it just seems like magic you just keep staring and you're like, how did this person know that, you know, this is where they had to look. And like that's where the bug was. I like to think about it as like, you just have to form mental models. Like the more familiar you are with something the stronger your mental models are. And like you have to update them. with time with changes, all sorts of things and they just stick after a while.

Jeremy: Do you have any specific examples of a time you were trying to debug something and some of the steps that you took and trying to resolve it?

Shubheksha: Oh yeah, after like I was, I used to be on the, on call rotation at Monzo. So like I was involved in lots of incidents and like, usually we had pretty decent run books, about like, what to do when a spe specific alert went off.

So we used to just usually follow the steps and if we were like completely out of our depth, then try a bunch of things. Throw darts randomly and see what sticks. Like sometimes you just don't know what else you can do. Like every outage is not like super straightforward, super simple. You have to like, look around, hit at like five different things and like see where it hurts basically. So yeah.

Jeremy: And in terms of the run books you were mentioning, would that be a case where there was a problem previously and then somebody, documented how they fixed the problems, then you would address that the same way next time?

Shubheksha: Usually yes, usually yes. Or something like if like something was being tested and there's a very clear process to like, test whether it's working or not. Something like that would be documented as a runbook as well.

Jeremy: Can you give us a little bit of a picture of specifics in terms of, are you SSH going into machines or are you looking at a bunch of log files? What, what is, uh, that troubleshooting process often look like?

Shubheksha: I'm not sure if I'm supposed to talk about this, especially because I'm not at Monzo anymore, so yeah, I'd be a little cautious talking about that right now.

Jeremy: Okay. That's fair. You had mentioned observability being really important before. what are some examples of things that you can add to your services or add to your infrastructure to make this debugging process a lot easier?

Shubheksha: So one of the main things that, uh, I think work great in practice is error budgets. Having some sort of definition of this is what is acceptable in terms of a particular service or a particular system returning errors. And like, if it crosses that threshold, then it's like, yes, this is our priority. We need to fix it.

Like a lot of the SRE (Site Reliability Engineering) work can not be postponed indefinitely, which is what happens in a lot of companies. That are trying to do SRE but badly. So, yeah, so like that's something that I really like as a philosophy. in terms of like monitoring, I think what I'm trying to get at, when I say that like monitoring should be a first class citizen, is that you should be very clear about the output that you expect from the system or what does healthy, not healthy look like for your system, because if you have that, then you can derive metrics from it and vice versa. If you're not able to come up with good metrics for your system, then have you looked closely enough at like what the system is trying to achieve?

Jeremy: Yeah. I think that idea of error budgets is really interesting because I think a common problem people have is they'll log everything and there will be errors that happen, but people just expect it right? They go, Oh, it's flashing warnings, but don't worry about it. It normalizes it, but nobody really knows when it's an actual problem.

Shubheksha: Yup. Yeah. Like, yeah. So basically yeah if you're logging everything and there are errors that are expected errors then yeah how do you differentiate whether an error is an expected error or it's an unexpected error like that's just a slippery slope that you don't really want to tread on.

And people get used to it. Like if a service keeps spewing errors and nobody really cares then if there's an actual outage, it's very easy to dismiss that yeah. that's what this service does and we have seen it before and it wasn't a problem. So yeah. It's probably fine.

Jeremy: Like you were saying, it's observing what you expect the system to do. So I guess, would that just be looking at. a normal day and looking at your error rate and seeing if people are able to successfully complete whatever they're trying to do on the system. and then if you actually start receiving, I guess, trouble tickets or you receive complaints from customers, that's when you can determine oh this error rate being this percentage and people complaining at this time means that this is probably a real issue. Something like that?

Shubheksha: Yeah, that's a very tricky question to answer because like, yeah, it's, it's hard to know what the right error budget is. Like, that's a much harder question to answer then, you know, just having an error budget. So yeah, like initially, like it it's just playing around and seeing like what sort of throughput the system has, what's the expected load and like all of those things and coming up with some metric and like tweaking it over time as you learn more about the system.

Jeremy: And with these systems, I'm assuming there are a lot of different nodes, a lot of different services. How do you debug or kind of work with this sort of system on your own local machine? Like, are you spinning up all these different services and having them talk or do you have some kind of separate testing environment? What does that look like?

Shubheksha: Yeah. Usually there is a separate testing or staging environment, which tries to mimic the production environment. And on your local computer, you can usually spin up a bunch of containers. Obviously it's nowhere close to like having an actual copy of the production environment, but for simple enough changes it can work.

But usually there are like testing environment, which will give you most of the same capabilities as a production environment, where you can test changes, but it's like the other problem is keeping them in sync. That's also a really, really hard problem. So a lot of times like staging environments exist, but like even when you're deploying to production after testing and staging, it's like fingers crossed.

I don't really know what's going to happen. We'll just deploy and see what breaks. Because yeah, the environments can divert quite a bit.

Jeremy: In terms of the staging environment, would that be where you have a bunch of fake data that's similar to production and then you have tooling to have transactions that would normally occur in production and staging?

Shubheksha: So like mimicking stuff as much as possible to whatever extent. Yeah.

Jeremy: How would engineers coordinate if you have this staging environment, I don't imagine everybody has their own.

Shubheksha: Yeah. That can get that can get tricky as well, because like people can override changes and like if they they're testing at the same time and they don't know, and like they're deploying on the same, the same service, so that's one benefit of microservices that like, if your service is small enough that multiple people will not be working on it at the same time, then you sort of reduce that contention.

But yeah, like if there are multiple people who are working on the same service, then they have to coordinate somehow to just figure out who's using the environment when and like testing stuff. So the other thing is like having some, a small subset of the staging environment given to like every single engineer, but like, yeah, that's, that's obviously not very simple to achieve, especially if you have like an engineering organization that is growing quite a bit.

Jeremy: How large was the engineering team at Monzo?

Shubheksha: I think it was about 150 engineers.

Jeremy: Hmm, 150. Okay. So in your experience, when you were working on a service, were you able to work on it in isolation where you would work out what the contract should be between services. So you didn't actually need to talk to the whole system for a lot of the time? Or what did that look like?

Shubheksha: I think most of the time it was like a single person working on a single service. And yeah, if you do, need to work on the same service with someone else, and usually it was, it was never more than two people in my experience then yeah you just, you coordinate with them and just give them a heads up that yes, you're doing this and you'll be deploying your changes.

Jeremy: And your service most likely is talking to other services. So would that be a part of this design process where you work out what the contract should be and that sort of thing?

Shubheksha: Ah, so usually that was pretty straightforward. Like it was a simple RPC. So you do not have to think about it too much. Like you just know that service A will talk to service B to fetch some data XYZ.

Jeremy: So I guess before you started working on the service, you might provide to the other team, or whoever owns that other service here is the protobuf message that I intend to receive and that sort of thing.

Shubheksha: Yes. So mostly like if like a service A talking to service B. Service B will have its own handlers and like its own, proto files and all of that. And I just reuse that in my service to talk to service A.

Jeremy: And then likewise, if somebody was talking to your service, you would provide them your proto files or schema.

Shubheksha: Yeah.

Jeremy: Another thing about having all these different services is I would imagine that they are all being updated at different times.

What are some of the challenges or things that you have to look out for when you're updating services in a distributed system?

Shubheksha: The biggest problem is dependencies what depends on what, like the. Dependency graph can be pretty difficult to sort of map out, especially if you have like a really large sprawling system with like dependencies all over the place. If you're deploying, like 4 services, let's say, and like A depends on B, which depends on C.

That's a problem in any distributed system. Like if you have like especially something modular, like just tracking the dependencies in that entire system can be a huge task. And like, it can inform a lot of other decisions, like in what order do you deploy things?

And like, if you want to remove something, like, it just leads to this rabbit hole because like, yeah, if you want to delete a service, you need to first figured out, like if it's if there's anything else that's depending on it, otherwise a bunch of things will break and you won't really realize that.

Jeremy: And how do you, keep track of that. Is there some kind of chart or some kind of document? Like how do people know what they can change?

Shubheksha: A lot of it is just in people's heads. Like you talk to people who have worked on that part that you're working on. And like, they'll usually be pretty familiar with like the sort of dependencies that exist. I think. We tried to like map out all of the dependencies at once and like someone posted about this on Twitter as well.

But like, they actually tried to create a dependency graph of like all the services that we had and yeah, it was not a pretty picture,

Jeremy: Hmm. Because how many services are we talking about here? Is it like hundreds

Shubheksha: 1500

Jeremy: 1500 plus. Wow.

Shubheksha: Yep.

Jeremy: That's pretty wild because basically you're relying on maybe going into email or chat and going like, Hey, I'm going to update this service. Uh, which ones might this break?

Shubheksha: Yeah. So that can get tricky and that does make like onboarding people a little bit harder. Like there are there are trade offs, both good and bad ones when you have a system like that. It's not all bad. It's not all good.

Jeremy: 1500 is-- it seems like nobody at the company could even really know exactly what's happening at that scale. I wonder from your perspective, how do you decide when something should be split off you know into an individual service versus having a service take care of more things?

Shubheksha: Yeah, so that was usually pretty scoped, We try to let a service do one thing and do it well, rather than trying to like, put a bunch of functionality into the same thing. So that, that was usually the rule of thumb that we followed for every new service that we designed.

Jeremy: Hm. Cause it seems like a bit of a trade off in that. If there's a problem with that one service, then you have a very localized place on where to work on. But on the other hand, like you said, if you have this dependency graph, you might need to step through 10, 20, 30 services to find the root problem.

Shubheksha: Yeah, there were definitely code paths that can get that big and that long. So yeah.

Jeremy: In traditional systems, we think of concepts like transactions with a relational database. For example, you may want to update a bunch of different records and have that all complete as one action. When, when you're talking about a distributed system and you need to talk to 5, 10, 20, however many services, how do transactions work in distributed systems?

Shubheksha: I'm very tempted to say they don't (laughs). Uh, so yeah, so this is, a really really interesting field within itself, like within distributed systems itself and transactions like distributed transactions are really really hard, like transactions themselves are super hard, even on a single machine.

And then you like add all of this complexity and you're just like, yeah galaxy brain (laughs) . I'm not super familiar with this topic, but I've read a bit because it's just like super fascinating. And like, usually you try to offload all of that.

One way you can do it is via a managed service. You don't have to care like where your database is running, how it's running, you just use APIs, you throw stuff at it and you know, it's going to be stored.

There's a bunch of fine tuning and configuration you can do yes. But yeah, you don't know how to think about like the nitty gritty of it. Distributed transactions, like there's also different definitions for it. Like what do you mean?

I mean, when you say a distributed transaction, like, a transaction, that's executing on multiple machines or transaction, that's like gathering data from multiple machines before returning it to you? A transaction that's I don't know, halted to be executed on different machines? So yeah, it can get really complicated.

Jeremy: Yeah, I'm thinking from the perspective of Monzo is a bank, right? So, you might pull money out of an account and transfer it to somebody else like a very basic example. And maybe as a part of that process, you need to talk to 10 different services and let's say on the sixth service call, it fails, but those previous five have already executed whatever they're going to execute.

Shubheksha: Ah, right. I see what you mean. Do we roll all of that back? Or like, do we, yeah, so like we did not roll stuff back. But yeah, like you have to account for that in your error handling logic that what happens if it fails at this step and like XYZ has already happened?

Jeremy: Could you give like an example of one way you might handle something like that?

Shubheksha: Uh, let me think. I'm not sure. I possibly dealt with a situation like that because like all of the tables were scoped to the service that you were working on. So like nobody else, like no other service can access the tables that you had within your service. So that's simplified a lot of things. And usually there was a lot of correlation by IDs.

So like if you're generating flake IDs in one service and it fails and it doesn't generate an id, then it will not be found. And you know something has gone wrong. So you basically just like log and bail and stop proceeding. But obviously around the parts that move money we had a lot more robust error handling around that. Yeah.

Jeremy: Yeah. So I guess that's one thing that could help you track, as you were saying, having an ID that passes from service to service. So,

Shubheksha: So that's usually accomplished by go's context package. So you have some sort of a trace ID that's like passed through the lifetime of a request all the way down. Uh, so you know, that yes, like all of those calls were part of the same request.

Jeremy: Hmm. Okay. And then that would mean that you could go to some kind of logging or monitoring service and actually see, like, these are the five services that we're trying to handle it, and this was the results. I guess there's like two parts there's there's figuring out which services and which calls were part of the same transaction.

And then like you were saying earlier if something goes wrong, how do we correct for it? And yeah, that sounds like a very, very challenging problem (laughs).

Shubheksha: Yeah. Yeah. Like one of the main problems with distributed systems is when things go wrong, you don't know what's going to break where. Like a lot of things break and then you just have to like start searching for, okay, what has gone wrong, where, and like something completely different in a completely different part of the system might just be like breaking something else in a completely different part of the system. And like, as your system grows, it's impossible to keep track of all of it in your head, especially like a single person cannot do that.

So it can just be really hard to like, have a big picture view of the system and figure out what's going on, where

Jeremy: Mm, and it sounded like in your experience that that knowledge of what was going, where and how things interacted. A lot of that was sort of tribal in a way. Like there were people who, who knew how the pieces fit together. Um, but it's tough to really have this document or this source of truth.

Shubheksha: Yeah

Jeremy: Yeah, I can imagine that being like you were saying earlier, very difficult in terms of onboarding. Yeah.

Shubheksha: Yep.

Jeremy: I guess another thing about when you have distributed systems and you have all these different services. Presumably one of the benefits is so that you can scale up certain parts of the system.

I wonder from your perspective, how do you kind of figure out which parts of the systems need to be scaled up and what are some strategies for, if you're really hitting barriers, a performance in a certain part of the system, how you deal with that?

Shubheksha: Usually metrics. If you're constantly getting alerts that yes, like a particular node or like a particular part on a particular node is constantly hitting its memory and CPU usage, or, you know, you're going to be performing some sort of like heavy workload on a particular service and you scale it up preemptively or like an alert fires, and then you scale it up.

So that's like a very manual way to go about it. Uh, and in terms of performance constraints. So a lot of that was handled by our infrastructure team and I personally did not work on like fine tuning uh performance as such on any of the parts of the system that I worked on. But like what I saw other people doing, was that like, you have to go on chasing, profiling, and figuring out where that bottleneck is.

And like, sometimes it can just be your bug. That's like, like you're leaking goroutines or something like that, or like you're leaking memory somewhere else. So stuff like that, like usually you just, you have to keep profiling till you figure what's going on and then fix it.

Jeremy: And for you as the person who's building the services and handing them off to the infrastructure team, was there any contract or specification written up where you said like, Hey, my service is going to do this. And I think it's going to be able to process this many messages or this many transactions a second.

And then that's how the infrastructure team can figure out how much to scale or how did that work?

Shubheksha: Yeah. So there was some amount of load testing that was done for like services that were really, really high throughput with the infrastructure team so that they like were in the loop and like they knew what was going on and they can scale up preemptively.

Jeremy: In your experience as a backend engineer, we've talked about the infrastructure team. Are there any other teams that you would find yourself working on a regular basis?

Shubheksha: Not really, no, it was mostly backend and and infrastructure. And in some cases security as well. If you needed like a particular like their input on something particular that we were working with.

Jeremy: Would they provide some kind of audit to tell you, like, this is where potential issues might be or?

Shubheksha: Yeah. Like something like that, or like, if you just need their input on like a particular security strategy, like if you wanted to like store some, critical data or something like that, how do we do that? Or like, what would be the best way to go about it so stuff like that.

Jeremy: During your time at Monzo, you were writing systems in, in Go What do you think makes Go a good choice for distributed systems or for the type of work you were doing in general?

Shubheksha: One of the main things that I like about it is that it's super easy to pick up. This was my first job with Go like first, full time job. I was learning Go a little before that, but I was able to pick up very, very quickly. I think what attracted people to Go like, this was definitely a huge part of it.

Then the fact that it was backed by Google and, and it wasn't going to like just vanish overnight was also really helpful. And I think, it really started from docker and docker sort of shot into the limelight and it sort of made it the defacto language of cloud native infrastructure and it just kept catching on.

And then Kubernetes came along and then because Kubernetes was written in Go, everything else started to be written in Go. It has a lot of neat features that I think make it uh really really, easy to sort of start building stuff with it. One of them is definitely like how good the standard libraries are and how easy it is to sort of build on top of them.

And like first-class support for concurrency, which can be a huge, huge pain to sort of implement on its own and like a lot of languages don't support those primitives out of the box. Like it's really good for those sorts of use cases as well. And yeah, it's just easy and fun to learn.

Jeremy: So basically it was easy for you to step in as a new employee, and start learning how services work just because the language was easy to pick up. um, and yeah, that, that built in support for concurrency. I wonder, are there any things compared to your experience with other languages where you, that you didn't like about go.

Shubheksha: Mmm, I think, yeah, this is, this is like a very common and controversial opinion, uh, in the go community, but like sometimes it can feel very repetitive like, especially the error handling bits where you're like constantly checking for errors. I can appreciate explicit error handlings but like, yeah, sometimes it can just feel like you're writing the same block of code, like 50 times in the same file.

And that can be a little bit annoying. It's super verbose. Like it just, verbose, not in the Java sense, but more in the, you should know what you're doing sense, like, it tries to be as simple as possible rather than being as clever as possible and making things harder to understand

Jeremy: And the the error example you were giving, I'm assuming that's because go doesn't have exceptions. Is that right?

Shubheksha: Yes, Go does not have a concept of exceptions at all. you're supposed to do explicitly check for errors every single step and then decide what you're going to do. So in a way, it sort of makes you think harder about like the ways in which your code fails and what you're supposed to do after that, which is a good thing.

Jeremy: Right in some cases you have a very deeply nested call stack, And you might be accustomed to throwing an exception deep and at the, outer level, just handling that. Um, but with Go at every single level of the stack. You would need to check the error.

What do I want to return or, yeah okay.

Shubheksha: So you have to bubble the error up. You need to make sure you're bubbling the right error up. You have to augment it with all of the information and the metadata that you need, which will help you debug the problem no matter at what layer it occurred at so that can definitely be sometimes tricky.

Jeremy: Yeah, I can definitely see, um, I guess why it'd be good and bad. Maybe we should move on to you recently wrote a post about the lessons you learned in your third year as being a software engineer. I thought we could go through a few of those points and maybe, get into where you're coming from with those. I think the first point you had mentioned was just doing things like working on projects and doing the work isn't enough. You have to know how to, how to market it as well.

Shubheksha: Yeah.

Jeremy: You had talked about marketing to the right people. are you talking about people who are internal at your company or outside and, and how do you find what you consider to be the right people?

Shubheksha: Oh, that's a really good point. I think it's a bit of both like internally in terms of like progression and like the stakeholders of your project and all of the other people that you work on on a day to day basis. And externally just to build a network and like, let people know what you're working on and what you're learning.

Like I'm a huge fan of like learning in public in general. That's just something that brings a lot of good things your way if you do it right.

Jeremy: Things like blog posts or conference talks, things like that. Another thing you mentioned was how, how titles matter. And I know that there's a lot of discussion where people say like, Oh, it doesn't really matter. But, um, you were saying like, they really do matter. And. from your perspective, how dod the roles of, of titles change from, from company to company?

Shubheksha: I think it. It really depends on the size of the company as well. Uh, that's one of the main factors, like how well established the company is, how many engineers do they have? Like how much do they like place weight on the title internally? Like there are companies where like there are meetings, which are exclusively for people who have a certain level or a certain title.

So in that case, like, it's very obvious that if you don't get promoted then you get you're stuck, you're losing out because you're literally not allowed to attend. A certain set of meetings that you would probably benefit from. And in, in like in other ways it can also like halt your progression. And when I say titles matter, I don't say that.

Like, just because you're a senior engineer, you know, more I don't believe in like, just number of, years of experience, it's about the quality of the work that you have done more than like just the sheer amount of time that you've put in. But. In terms of like how people perceive you, what they expect from you and what they assume about you definitely shifts when you have like a title attached to your name.

Jeremy: And in your career or just in people's careers in general, how much, weight or priority should people put into making sure that they, they have like a. impressive title or a high title. especially when you think about, you were saying how it changes between the size of the organization.

Um, somebody who's a CTO at a three person company it's not the same as at a 5,000 person company. I wonder what your thoughts are on that.

Shubheksha: Uh, yeah, that's a really good question. I think like what I mentioned in the next point is that it, it's very easy to get caught up in that, in the climb. Like when it comes to career ladders, but that can also be very detrimental and that relates directly to the fact that it's not just years of experience, it's actually the quality of those years of experience that actually counts towards like how good you are.

If you, if you run after two months, it can just be like a second full time job. It can completely drain you. You'll stop learning, you stop actually doing what you enjoy. And you'll just blindly chase a title. Like that's not a good position to be in because like, you have to balance it out.

At the same time if you feel like you're just getting stuck, you're not getting promoted. You're not getting to the level you want to be at. It might just be time to like, change. And see what's out there, and if there's someone who will treat you better, but like, it really depends on what you value more because a title change can bring you better job prospects, especially if it's a change from like just software engineer to senior software engineer, then you just have to much wider pool of roles available to you. So it really depends on the stage of the career you're at. And what do you value personally?

Jeremy: In your posts, you had talked about chasing promotions, What are some of the things that you might find yourself doing that are more going for the promotion rather than growing yourself? technically or, you know, skill wise?

Shubheksha: That depends on what your company values and what do they base promotions on? So like if they have a very rigid structure where it's like a bunch of check boxes that you have to tick, then a lot of people will try to game that system. And they'll just try to do things maybe at the expense of their growth, as well as the expense of what's good for the company just to get ahead and get promoted.

So that's not beneficial for the employee or the company. But yeah, like a lot of, frameworks that are like of that fashion, where you have to check a bunch of boxes in order to get promoted often like end up in people doing things that they are being incentivized to, but in a very wrong way,

Jeremy: Right. So I guess it's looking at what your organization's process is and figuring out is it, you know, some of it is not the stuff you want to do, but you just kind of get it done and you can move on or whether the process it's so, difficult.

And, and so, um, um broken, yeah, basically, uh, that it's worth just moving on to another, another organization. Your, your next point, I think was about, uh, how sponsors are so important. And I wonder if you could explain a little bit about what you mean by a sponsor versus, a mentor or somebody else?

Shubheksha: Yeah. So the difference between uh sponsors and mentors is that I think mentors, the way I think about it at least is that mentors are very passive in the way they help you vs sponsors are very active. Like mentors will be there for you. They will help you. You can go ask them questions, whereas with sponsors, they will actively put your name in the hat for opportunities help you connect with people and like send opportunities that they think are a good fit your way, and they will be your advocate.

Rather than you having to ask your mentor to be your advocate. So I think that's, that's the main difference. And like, there's also the question of sort of like the sponsor's reputation being at stake in a certain sense that they, they basically take a chance on you and they trust you enough to put their reputation on the line in order to do right and good by you, which is not the case with mentors.

Jeremy: It sounds like a sponsor would be someone who is vouching for you, right. Or is giving you recommendations.

Shubheksha: Yes, essentially. Yeah. Someone who watches for you in like rooms either you're not in, is familiar with your work, actively advocates for it.

Jeremy: Hmm. And that could be someone within your organization, for example, saying, I think that you would be a great fit for this project and telling other managers about that, or it could even be someone outside of your work, just saying if you're looking for a position or something, they might say, Oh, I know somebody who would be a great fit for this conference talk or this job, that, that sort of thing.

Shubheksha: Precisely. Yeah.

Jeremy: Do you have any specific advice for how people should.. I don't know if seek out is the right word but to build up a network of sponsors,

Shubheksha: I get that a lot. And I have a blog post in the works for it, which I'm trying to think about, but like, it's really hard to put in words, like it's just like a lot of networking and like reaching out to people and like slowly and steadily sort of surrounding yourself with people that you can admire and look up to and who will be like willing to vouch for you.

So like there's no direct, straight method to do it. I can't prescribe like if you follow step one, two, three, that, yes, you'll definitely have it. I think it's just, it just takes time on a lot of effort and patience, because I remember at like, when I started, like, I was desperately looking for mentors and I was just like reaching out to people and I just felt like I could do so much if I just needed the guidance and it was really really hard to find.

So yeah, like I completely empathize with like people who are struggling with it at the moment. And it's, it's really hard. A lot of people are like super busy. They don't have time. And so it's it can just feel like bad to sort of ask for their time as well. So, yeah, that's also something I definitely struggle with, but like don't have concrete steps, at the moment, will, hopefully have something to say and publish it in a blog post.

Jeremy: Yeah. I mean, I wonder from your personal experience, I remember a few years ago you went on a few podcasts to talk about your experience, getting into open source and things like that.

You know, looking back on the process you went through, is that a path that you would recommend to other people, or do you have thoughts on how you might have approached that differently now?

Shubheksha: I think I got very lucky at like some steps. Like I did not set out I did not plan it out uh that way it just sort of happened. And yeah. So like, I don't think I would take a different path. Like I like along the way, especially via open source like I met so many great people who have been amazing mentors and sponsors for me, but like I did not seek them out.

The, the other thing that I realized was like, you can't meet people and be like, Oh, you're my mentor. Like, that's a relationship. That's built over time as you learn from each other. And the other thing is, yeah, it's a two way relationship. It's not just you leeching off the other person and like trying to, you know, like all, like take away all of their knowledge and like sort of embedded it in your brain.

That's not how it works. Like even if you have less knowledge compared to the other person, they can still learn something from you and like, it's always good to have that mindset rather than, you know, the hero mindset where you're like, Oh my God, this person is, is God. And like, I want to learn everything from them like that doesn't help.

Like placing people on a pedestal, basically. Like it eases the pressure on them as well. And it helps you be more comfortable too. So like a lot of my mentor, like sponsor relationships are like that where we have a bunch of like mutual interests and we talk about it and we learn from each other. So like a lot of them are like, not even like formalized, like we never, like had a conversation where we were like, Oh, like, we are like a mentor and a mentee now, like a sponsor and a sponsee now, like yeah, you just, you sort of build those kind of bonds with people and where they feel comfortable watching for you.

And you just take it from there.

Jeremy: That makes a lot of sense just because relationships in general are very much, it's not you meet somebody and you go, we are going to be friends, right?

Shubheksha: It just takes time. Yeah.

Jeremy: Yeah. So I can see how it'd be too difficult to pin down what are the steps you should take? I'm looking forward to the blog posts though.

Shubheksha: Yeah. Thank you.

Jeremy: I think the last thing you had mentioned in the post was you were talking about how programming got easier over time. And I wonder, were there any specific milestones you remember of things that were really hard and then it, all of a sudden, it just clicked over the years

Shubheksha: I think uh one of the main things I remember was like, just on designing containers and how they work. Like it was literally like a light bulb moment with where like I just, I just felt like, yes, I finally understand what it is. And like, there have been times over the years where I have felt like that without like actively trying to something that I've experienced very frequently is like, I learn about something.

Like say right now and I'll understand maybe like 40% of it. And then I'll come back and revisit it, like from another topic or like an adjacent topic. And I finally understand it and maybe I've been like working with it or around it in the intervening months, but like, it's not like I sat down and I tried to like, learn about it or anything like that.

But like when you're understanding builds up. During the intervening time, you sort of realize that you've not only learned the thing that you were actually learning, but a lot of adjacent concepts also makes sense. And it can be very hard to have this perspective when you're starting out because you're just like, none of this makes sense, like what is going on.

But yeah, it, like your understanding sort of grows and compounds in a similar way to money, I would say. And it's very hard to remember that when you're just starting out.

Jeremy: Yeah, I can definitely relate to that where I think it's, it's very difficult. And I think a lot of people want to be able to sit down and read a book or go through a course and go like, okay, I now know this thing. Uh, but I have the same experience with you where, uh, it's just working on problems that are not even hitting.

Always the main thing, but it's related to it and all of a sudden it might suddenly click. Yeah.

Shubheksha: Yeah.

Jeremy: Cool. Well, I think that's a good place to wrap up, but are there any other things you, uh, you'd like to mention or think we should have talked about.

Shubheksha: No, I think we covered a lot of ground and it was really fun chat thank you so much.

Jeremy: Cool. where can people follow you online to see what you're up to and what you're you're going to be doing next?

Shubheksha: they can follow me on Twitter. Uh, that's where I'm most active. Uh, my handle is scribblingon, I have a website which is shubheksha dot com and that's where I post all of my writing.

Jeremy: Cool. Shubheksha, thank you so much for joining me today.

Shubheksha: Thank you so much.

React Authentication with Ryan Chenkie

Thu, 17 Dec 2020 05:43:48 -0000

Ryan is the author of Advanced React Security Patterns and Securing Angular Applications. He picked up his authentication expertise working at Auth0. Currently, he's a GraphQL developer advocate at Prisma.

Related Links:

Music by Crystal Cola.

Transcript

You can help edit this transcript on GitHub.

Jeremy: [00:00:00] Today, I'm talking to Ryan Chenkie. He used to work for Auth0 and he has a new course out called advanced react security patterns.

I think authentication is something that often doesn't get included in tutorials, or if it's in there, there's a disclaimer that says, this isn't how you do it in production. So I thought it would be good to have Ryan on and maybe walk us through what are the different options for authentication and, how do we do it right?

Thanks for joining me today, Ryan.

Ryan: [00:00:30] Yeah, glad to be here. Thanks for inviting me.

Jeremy: [00:00:33] When I first started doing development, my experience was with server rendered applications. like building an app with rails or something like that.

Ryan: [00:00:42] Yep.

Jeremy: [00:00:42] And the type of authentication I was familiar with there was based on tokens and cookies and sessions. And I wonder if for people who aren't familiar, if you could just walk us through at sort of a basic level, how that approach works.

Ryan: [00:01:01] Sure. Yeah. so for those who have used the internet for, for a long time, the web, I should say for a long time, And who might be familiar with like the typical round trip application, which, which many of these still exist. A round trip application is one where every move you make through the application or the website is a trip to the server that's responsible for rendering the HTML for a page for you to see and on the server things are done. Data is looked up and HTML is constructed which is then sent back to the clients to display on the page.

That approach is in contrast to what we see quite often these days, which is like the single page application experience, where if we're writing a react or an angular or Vue application it's not always the case that we're doing full single page, but in a lot of cases, we're doing full single page applications where all of the JavaScript and HTML and CSS needed for the thing gets downloaded initially, or maybe in chunks as movements are made.

But let's say initially is downloaded and then interacting with data from the server is happening via XHR requests that that go to the server and you get some data back. So it's different than the typical round trip approach, In the roundtrip approach. And what's historically done, what is still very common to do and it's still a very legit way to do auth is you'll have your user login. So username and password go into the box. They hit submit. And if everything checks out, you'll have a cookie be sent back to the browser, which lines up via some kind of ID with a session that gets created on the server.

And a session is an in memory kind of piece of data. It could be in memory on the server. It can be, in some kind of like store, like a Redis store, for example, some kind of key value store, could be any of those things. And the session just points to a user record, or it holds some user data or something.

And it's purpose is that when subsequent requests go to the server, maybe for new data or for a new page or whatever, that cookie that was sent back when the user initially logged in is automatically going to go to the server. That's just by virtue of how browsers work cookies go to the place from which they came automatically.

That is how the browser behaves. The cookie will end up at the server. It will try to line itself up with a session that exists on the server. And if that is there then the user can be considered authenticated and they can get to the page they're looking for or get the data, the data that they're looking for.

That's the typical setup and it's still very common to do. It's a very valid approach. Even with single page applications. That's still a super valid approach and some people recommend that that's what you do, but there are other approaches too these days.

There are other ways to accomplish this kind of authentication thing that we need to do, which I suspect maybe you want to get into next, but you tell me if we want to, if we want to go there.

Jeremy: [00:04:03] You've mentioned how a lot of sites still use this cookie approach, this session approach, and yet I've noticed when you try to look up tutorials for working on react applications or for SPAs and things like that. I've actually found it uncommon, at least, it seems like usually people are talking about JSON web tokens that are not the cookie and session approach.

I wonder, if you had some thoughts on, on why that was.

Ryan: [00:04:37] Yeah, it's an interesting question. And I've thought a lot about this. I think what it comes down to is that especially for front end developers who might not be super interested in the backends, or maybe they're not concerned with it. Maybe they're not working on it, but they need to get some interaction with a backend going.

It's a little bit of a showstopper, perhaps, maybe not a showstopper. It's a hindrance. There are road blocks put in place if you start to introduce the concept of cookies and sessions because it then necessitates that they delve a little bit deep into the server. I think for front end developers who want to make UIs and want to make things look nice on the clients but they need some kind of way to do authentication.

If you go with the JSON web token path, it can be a lot easier to get done what you need to get done. If you use JSON web tokens maybe you're using some third party API or something. And they've got a way for you to get a token, to get access to the resources there.

It becomes really simple for you to retrieve that token based on some kind of authentication flow, and then send it back to that API on requests that you make from your application and all that's really needed there is for you to modify the way that requests go out. Maybe you're using fetch or axios or whatever you just need to modify those requests such that you've got a specific header going out in those requests. So it's not that bad of a time, but if you're dealing with cookies and sessions, then you've got to get into session management on the server. You've got to really get into the server concepts in some way if you're doing cookies and sessions.

I think it's just more attractive to use something like JSON web tokens, even though when I think when you get into, like, if you look at companies that have apps in production and, and like, especially like with organizations that have lots of applications, maybe internally, for example. And they're probably doing single sign-on. Maybe they've got tons of React applications, but if they're doing single sign-on, especially there's going to be some kind of cookie and session story going on there. So yeah, I think all that to say ease of use is probably a good reason why in tutorials, you don't see cookies and sessions, all that often.

Jeremy: [00:06:51] It's interesting that you bring up ease of use because, when you look at tutorials, a lot of times, it just makes this assumption that you're going to get this JSON web token. You're going to send it to the API to be authenticated. But there's a lot of other pieces missing.

I feel like to to make an actual application that's secure and going through your course for example, there's the concept of things like, the fact that with a JWT, a JSON web token, you can't, invalidate a token. Like you can't force this token to no longer be used because it got stolen or something like that.

Or, there's strategies that you discuss. Like you have this concept of a refresh token alongside the JSON web token and going through the course and, and I'm looking at this and I'm going like, wow, this, this actually seems like pretty complicated. This seems like I have to worry about more things than, just holding onto a cookie.

Ryan: [00:07:50] Totally. Yup. Yeah, I think that's, that's exactly right. I think that's one of the selling features of JSON web tokens is especially if you're coming from maybe like maybe if you're newer to the concept, let's say one of the compelling reasons to use them is that they're pretty easy to get started with for, for all the reasons that I went into just a second ago.

But once you get to the point where you've got to tighten up security for your application, you need to make sure that users aren't going to have these long lived tokens and be able to access your APIs indefinitely. And you want to put other guards in place. It requires that you do a bit of gymnastics to create all these things to protect yourself, that cookies and sessions can just do for you if you use this battle-tested well trotted method of authentication and authorization and that's one of the big arguments against JSON web tokens.

There's this really commonly cited article, which if I'm doing training, I'll drop, in any session that I'm doing, I can probably get you the link, but it's joepie.

I think I'm going to Google that joepie, JWT. That's what I always Google when I need to find this. The title is Stop using JWTs for Sessions. So this guy. Sven Sven Slootweg, I think is his name. He, he's got a great article, a great series of articles actually about why it is not a great idea to use JSON web tokens.

And I think there's a lot of validity to what he's saying. It essentially boils down to this-- that by the time you get to the point where you've got a very secure system with JSON web tokens. you will have reinvented the wheel with cookies and sessions and without a whole lot of benefit. I think he's making the case for this in situations where let's say you've got like a single page react application and you've got an API that is controlled by you that is your own first party API. That's just responsible for surfacing data for your application in those situations. You might think you need JSON web tokens, but you would be able to do a lot with cookies and sessions just fine in those situations. Where JSON web tokens have value I think is when you have different services, in different spots, on different domains that you can't send your cookies to because they're on different domains.

Then JSON web tokens can make sense. At that point, you're also introducing the fact that you need to share your secrets for your tokens amongst those different services. Other third-party perhaps services or domains that you don't control would need to know something about your tokens. I don't know if I've ever seen that actually in practice where you're sharing your secret, but in any case you would need to, to be able to validate your tokens at those APIs.

And so that becomes a case for why JWTs makes sense because cross domain requests, you won't be able to use cookies for that. So there's trade-offs but ultimately if you've got an application front end app and your own API, I think cookies and sessions make a lot of sense.

Jeremy: [00:10:50] I think that for myself and probably for a lot of people, there is no clear I guess you could say decision tree or how do you choose which of the two to use? And I wonder if you had some thoughts on, on how you make that decision.

Ryan: [00:11:05] Yeah. Yeah. It's interesting. It's certainly a whole plethora of trade offs that you would need to calculate in some way. It's one of the things that I see people frustrated with the most, I think, especially when you see articles like this, where they say stop using JWTs. The most common refrain that I've seen in response to these articles is that it is never explained *what* we should do then.

Like there's always these articles that tell you not to do something but they never offer any guidance about what you should do. And this article that I pointed out by Sven he says that you should just use cookies and sessions. So he does offer something else you can use.

But I think what it comes down to is you need to, I think first ask yourself, what's my architecture look like. Do I have one or two or a couple or whatever, front end applications backed by an API? Do I have mobile applications in the mix? Maybe communicating with that same API and where's my API hosted is it under the same domain or different domain? Another thing is like, do I want to use a third party provider? If you're rolling your own authentication, then sure. Cookies and sessions would be sensible, but if you're going to go with something like, Auth0 or Okta or another one of these identity providers that lets you plug in authentication to your application, they almost all rely on OAuth, which means you are going to get a JSON web token or a different kind of access token.

It doesn't necessarily need to be a JSON web token, but you're going to get some kind of token and instead of a cookie and session, because they are providing you an authentication artifact that is generated at their domain. So they can't share a domain with you, meaning they can't do cookies and sessions with you.

So the decisions then come down to like, whether you want to use third party services, whether you want to have support multiple domains to, to access data from you know, it would be things like, are you comfortable with, your sessions potentially being ephemeral? Meaning like, if you're going to go with cookies and sessions and you are deployed to something that might wipe out your sessions on like a redeploy or something, I would never recommend that you do that.

I would always recommend that you use like a redis key value store or something similar to keep your sessions. You would never want to have in-memory sessions on your server because these days with cloud deployments, if you redeploy your site, all your sessions will be wiped.

Your users will need to log in again, but it comes down to how comfortable are you with, with that fact, and then on the flip side, it's like, how comfortable are you with having to do some gymnastics with, with your JSON web tokens, to invalidate them potentially.

One thing that comes up a lot when I'm with the applications I've seen that are using JSON web tokens is somebody leaves the company. Or somebody storms out in the middle of the day cause they're upset about something and they've been fired or whatever, but they have a JSON web token that is still valid. The expiry hasn't expired yet. So theoretically, they could go home or something. If they have access to the same thing on their home computer, or maybe they're using a laptop of their own. Anyway, if they still had access to that token, they could still do some damage if they're pissed off right. Something like that. And, unless you put those measures in place ahead of time, you can't invalidate that token.

You could change up the secret key that validates that token on your server, but now all your users are logged out, everyone has to log back in. So yeah, man, it just comes down to like a giant set of trade-offs and the tricky part about it... I think a lot of the reason that people get frustrated is just that all of these little details are hard to know ahead of time.

And that's one of the reasons that I wanted to put this course together is to help enlighten people as to the minutia here. Cause it's a lot of minutiae, like there's a lot of different details to consider when making these kinds of decisions.

Jeremy: [00:15:13] I was listening to some of the interviews, and an example of something that you often see is people go like, okay, let's say I'm going to use JWTs but I don't know where to store it.

And then in your course, you talk about it's probably best to store it in a cookie and make it HTTP only so that the JavaScript can't get to it and so on and so forth. And then you're talking to, I think it was Ben (Awad). And he's like, just put it in local storage what's the big deal? In general in tutorials that you see on the internet, but in this course, it's like, you're getting these mixed messages of like, is it okay to put it here? Or is it not?

Ryan: [00:15:55] Yeah, it's interesting, right? Because I mean, maybe it's just like this common tack that bloggers and tech writers have, where, when they write something, I find this often and I like to go contrary to this, but what I find often, a lot of people are like, do this and don't do this. I'm giving you the prescription about what to do.

Whereas I like to introduce options and develop a conversation around trade-offs and stuff like that. But I think where we see a lot of this notion that you're not supposed to put them in local storage or you're supposed to put them in a cookie or whatever, it's because we see these articles and these posts and whatever about how it's so dangerous to put it in local storage and you should never do it.

And there's this pronouncement that it should be off limits. And the reality is that a lot of applications use local storage to store their tokens. And if we dig a little bit beneath the surface of this this blanket statement that local storage is bad because it's susceptible to cross-site scripting... If we dig a little bit deeper there what we find is that yes, that is true. Local storage can be (cross-site) scripted. So it's potentially not a good place to keep your tokens. But if you have cross-site scripting vulnerabilities in your application, you've arguably got bigger problems anyway. Because let's say you do have some cross site scripting vulnerability that same vulnerability could be exploited to allow somebody to get some JavaScript on your page, which just makes requests to your API on your behalf and then sends the results to some third party location.

So yeah, maybe your tokens aren't being taken out of local storage, but if your users are on the page and there's some malicious script in there, it can be running requests to your API, your browser thinks it's you because it's running in the browser in that session. And you're susceptible that way anyway. So yeah, the argument there to say local storage is not a big deal is because yeah, cross-site scripting bad, but I mean, you're not going to take your tokens out of local storage and be like, ah, I don't have to worry about cross-site scripting now. I'm good. You still have to manage that. So and that's the other thing too, like there's so many different opinions on this topic about what's right and what's wrong. Ultimately, it comes down to comfort level and your appetite for putting in measures that are hopefully more secure things like putting it in a cookie or keeping it in browser state. But maybe with diminishing returns. There's maybe some point at which putting all that extra effort into your app to take your tokens out of local storage might not be worth it because if you don't have that vulnerability of cross-site scripting anyway then maybe you are okay.

Jeremy: [00:18:40] And just to make sure I understand correctly. So with a cross site scripting attack, that would be where somebody is able to execute JavaScript on your website, usually because somebody submitted a form that had JavaScript code in it. And what you're saying is when you have your token stored in a cookie, the JavaScript can't access it directly, which is supposed to be the benefit. But it can still make requests to your API. Get information that is private in the JavaScript, and then start making public requests that don't need the token to other URLs?

Ryan: [00:19:20] Pretty much. Yeah. So if you put your token in a cookie and it's got to be an HTTP only cookie, you need to have that flag set, then JavaScript can't access it. Your browser won't let you script to that cookie, cause like you can with regular cookies, you can do `document.cookie` and you can get a list of cookies that are stored in that browser session, I guess you would say, or for that domain or whatever.

So yeah, you're right. You got it right. If someone is able though, to get some cross-site scripting, let's say you are keeping your token in a cookie. Someone gets some malicious script onto your page. They can start scripting things such that it makes your browser request your API on behalf of the user, without them clicking buttons and doing whatever, just says, make a post request, make a get request, whatever to the API... results come back. And then they just send them off to some storage location of theirs. I've never seen that in practice. In theory that can happen. It might be interesting to, to make a proof of concept of that. I mean, in theory all of that flow checks out but I've never actually seen it be done but theoretically possible. I'd be willing to bet a lot that it's happened in practice in some fashion.

Jeremy: [00:20:39] So following the path of your course, it starts with cookies, it goes to JWTs. And then at the end of the advanced course, it goes back to cookies and sessions. And, so it sounds like if you're going to be implementing something yourself and not using a third party service, maybe implicitly you're saying that implementing cookies and sessions probably has a lesser chance of you messing things up if you go that route. Would you agree with that?

Ryan: [00:21:12] Yeah, I think so. I think if you, especially like, if you follow the guidance, put forth by the library authors of these things, like maybe you got an express API. If you use something like `express-session`, which is a library for managing sessions. and you follow the guidance. You've got a pretty good chance of being really secure. I think, especially if you implement something like cross-site request forgery. If you have a CSRF token, that is, we can get into CSRF if you like, but if you have that protection in place, then you're in pretty good shape.

You're using a mechanism of authentication that has been around for a long time. And it has had a lot of the kinks worked out. JSON web tokens have been around for a while, I suppose in technology years, but it's still fairly new. Like it's not something that's been around for a long, long, long time, maybe 2014 or something like that I want to say. And really just into popular usage man in the last like four years probably is how long that's been going on. So not as much time to be battle-tested.

I've heard opinions before about like, man, we're going to see in a few years' time... So this plethora of single page apps, that's using JSON web tokens, we're going to see all sorts of fallout from that. I've heard this opinion be expressed where people who are not fans of JWTs are able to see their potential weak spots. They say like, people are gonna figure out how to hack these things really easily and, and just, you know, ex expose all sorts of applications that way. So, yeah. TBD on that. I suppose, but it's certainly there's potential for that.

Jeremy: [00:22:46] And so I wonder how things change when, cause I think we're, we're talking about the case where you've got an application hosted on the same domains. So, let's say you've got your react app and you've got your backend, both hosted behind the same URLs basically, or same host name. Once you start bringing in, let's say you have a third party that also needs to be able to talk to the same API. How does that change your thinking or does it change it at all?

Ryan: [00:23:22] Well, I suppose it depends how you're running things. I mean, Anytime that I've had to access third party APIs to do something in my applications. It's generally a call from my server to that third-party server. So in that flow, what it looks like is a request to come from my react application to my API that I control and the user would be validated as the request enters my application. And if it passes on through, if it's valid, then a request can be made to those third parties.

For example, if you're using third parties directly from the browser, I can't think of too many scenarios where you would be calling your API and third parties using the same JSON web token. I don't know that that would be a thing. In fact, that's probably a little bit dangerous because then you're sharing your secret key for your token with other third parties, which is not a great idea. When third parties are in the mix, generally, it's a server to server thing for me.

But you know, maybe there's a way that you validate users both at your API and at the third party using separate mechanisms, separate tokens or something like that. I don't know. I haven't employed that set up strictly from the browser to multiple APIs. For me, it's always through my own server, which is arguably, probably the way you want to do it anyway, because you're probably going to need some data from your server that you need to send to that third party and you're probably going to want to store data in your server that you get back from that third party in some scenarios. So I don't know if you go from your own server, you're probably better off I think.

Jeremy: [00:25:02] And let's say you're going from your own server and you need to make requests to our API. Does that mean that from your server side code you're receiving the cookie and then you send requests from then on using that cookie even though you're a server application?

Ryan: [00:25:23] Well in those scenarios. So this is interesting. This is where, people argue that JSON web tokens are a good fit for this scenario when you have servers to server communication. That article that I mentioned, about how you should just use cookies and sessions and stop using JSON web tokens.

The argument boils down to the fact that JSON web tokens are not meant for sessions. They're not meant to be a replacement to a session. And that's how people often use them is like a direct one for one replacement to sessions. But that's not what they are. Tokens are an artifact that is produced at a point in time that you have proven your identity, that they're basically like a receipt that you proved your identity.

And it's like, I don't know how to think about this. Maybe like when you go to an event or something and you get a stamp on your hands so that you can leave and then come back in, it's kind of like that. And then by the time that stamp washes off. It's expired. So you can't get back in that kind of thing right? So it's a point in time kind of thing. Whereas like a session would be more so like if you wanted to leave and come back from that event and every time you came back, it's like, Hey I was here just a second ago. Can you go to the back and just check with your records and make sure that I'm still valid or whatever. And then they do that and then they let you in, I don't know, maybe a bad analogy, whatever, but the point is that cookies and sessions versus tokens. It's a different world. It's not, they're not interchangeable, but people use them as if they are.

So this article argues that a good use case for JSON web tokens is when you have two servers needing to communicate with one another needing to prove between one another, that they should have access to each other, and maybe they need to pass messages back and forth between one another. That's the sort of the promise of JSON web tokens is that you can encode some information in a token easily, send it over the wire and have it be proved on either end that the message hasn't been tampered with.

That's really the crux of it with JWTs. So getting back to your question there, yeah, that's a good use of that. It's a good use case for, for tokens is if you need to communicate with a third party API, you wouldn't be able to do that necessarily with cookies and sessions.

I mean, I don't know of a way to send a cookie between two servers. Maybe there's something there. I don't know of it. But yeah, JSON web tokens easily done because you just send it in the request header. So they go easily to these other APIs. And that, I mean, if you look at integrating various third party APIs, it's generally like, they're going to give you an access token.

It might not be a JWT, but they're going to give you an access token and that's what you use to get access to those APIs. So yeah, that's how I generally put it together myself.

Jeremy: [00:28:23] It sounds like in that case you would prefer having almost parallel types of authentication. You would have the cookie and session set up for your own single page application talking to your API. But then if you are a server application that needs to talk to that same API... Maybe you use, like you were saying some kind of API key that you just give to that third party, similar to how when you log into to GitHub or something and you need to be able to talk to it from a server, you can request a key. And so that that's authenticating differently than you do when you just log in the browser.

Ryan: [00:29:07] Yeah I think that's right I think the way to think about it is like if you've got a client involved. Client being a web application or a mobile application, that's a good opportunity for cookies and sessions to be your mechanism there. But then yeah, if it's server to server communication, It's probably going to be a token.

I mean, I think that's generally these days anyway, the only way you'd see, see it be done. I think like most of the third party services that I plug into, but the access tokens, they give you aren't JSON web tokens, from what I've seen, but if you're running your own third party services, for example, and you want to talk from your main API, perhaps to other services that you have control of then a JSON web token might be a good fit there, especially if you want to pass data encoded into the token to your other services. Yeah, that's what I've seen before.

Jeremy: [00:29:59] And I wonder in your use of JSON web tokens, what are some common things that you see people doing that they should probably reconsider?

Ryan: [00:30:11] Yeah. Well, One is that it's super easy to get started with JSON web tokens. If you use the HS 256 signing algorithm, which is kind of the default one that you see quite often, that is what we call a synchronous way of doing validation for the token, because you sign your token with a shared secret.

So that shared secret needs to be known by by your server. And whichever other server you may be talking to so that you can validate the tokens between one another. If your API is both producing the token, so signing it with that secret and also then validating it to let your users get access to data. Super common thing to do, maybe not such a big deal because you're not sharing that secret around, but as soon as you have to share that secret around between different services, things start to open up for potentially that secret getting out and then whoever has that secret is able to sign new tokens on behalf of your users and use them at your API, and be able to get into your API.

So the recommendation there is to instead use the RS-256 signing algorithm or some other kind of signing algorithm, which is more robust. It's an asynchronous way of doing this meaning that there's then like a public private key pair at play. And, it's kind of like an SSH key.

You've got your public side and your private side. And so, if you want to be able to push stuff to GitHub, for example, you provide your public side of your key, your private side lives on your computer, and then you can just talk to one another. That is the better way of doing signing, but it's a bit more complex.

You've got to manage a JSON web key set. There's various complexities to it. You've got to have a rotating set of them, hopefully, various things to manage there. So yeah, if you're going to go down the road of JSON, web tokens, I'd recommend using RS 256 it's more work to implement, but it's worthwhile.

One thing that's interesting is that for a long time, eh, I can't remember if this is something that's like in the spec or not, but there's this big issue that surfaced about JSON web tokens, years back a vulnerability, because it was found out that if you, I think if you said the algorithm, so there's this header portion of the token, it's like an object that describes things about the token.

And if you set the algorithm to none I believe it is, most implementations of JSON web token validation would just like accept any token I think that's how it worked is like you could set the algorithm to none and then whatever token you sent would be validated, something like that. And so that issue has been fixed at the library level in most cases. But I think it's probably still out in the wild, in some respects, like there's probably still some service out there that is accepting tokens with an algorithm set to none. Not great. Right? Like you, you don't want that to be your situation. yeah. So watch out for that.

Other things that I can think about would be like, Don't store things in the token payload that are sensitive because anyone who has the token can read the payload, they can't modify it. They can't modify the payload and expect to, to make use of the modified payload, but they can read what's in it.

So you don't want secret info to be in there. If you're going to do a shared secret, make sure it's like a super long strong key, not some simple password. I think because JSON web tokens are not subject to, so I guess I'll back up a little bit. If you think about trying to crack a password and trying to brute force your way into a website, by guessing someone's password, how do you do that?

Well, you have a bot that is going to send every iteration, every variation of password I can think of until it succeeds. If you've got your password verification and things set up properly, you're going to use something like Bcrypt which inherently is slow at being able to decode or being able to verify passwords because you want to not allow for the opportunity that bots would be trying to guess passwords the slower you make it within reason so that a human doesn't have a bad time with it.

The more secure it is because a bot isn't going to be able to crack it. But if somebody has a JSON web token, they can run a script against it that does thousands and thousands of iterations per second, arguably, or however fast their computer is. And this is something that I've seen in real life in a demo is somebody who is able to crack adjacent web token with a reasonably strong secret key in something like 20 minutes, because they had lots of compute power to throw against it.

And they were just guessing, trying to guess various iterations of secret keys and it was successful because, you know, there's no limitation there on how fast you can try to crack it. So, and as soon as you do that, as soon as you crack it and you have that secret key, a whole world of possibilities for a hacker opens up because then they can start producing new JSON web tokens for their, for this application's users potentially, and get into their systems, very, sneakily.

So. Yeah, make sure you use a long, strong key or use RS 256 as your algorithm. That's my recommendation. I think.

Jeremy: [00:35:25] I think something that's, that's interesting about, this discussion of authentication and JSON web tokens is that in the server rendered space. When you talk about Rails or Django or Spring, things like that, there is often support for authentication built into the framework. Like they, they take care of, I try to go to this website and I'm not logged in.

So I get taken to the login page and it takes care of, Creating the session cookie and putting it, putting it in the store and it takes care of, the cross site request, forgery, issues with, with a token for that as well. And, and so in the past, I think people could, could build websites without necessarily fully understanding what was happening because the framework could kind of take care of everything.

And why do we not have that for say react or, or for Vue things like that?

Ryan: [00:36:25] Yeah. Well, I think ultimately it comes down to the facts that these, you know, these libraries like react and Vue are, are client side focused. I think we're seeing a bit of a shift when we think about things like next JS, because there's an opportunity now. if you have a full kind of fully featured framework, like next, where you get a backend portion, you get your API routes.

There's an opportunity for next, I hope they do this at some point in the future. I hope they take a library like next there's this library called next auth. very nice for integrating authentication into a next application, which is a bit tricky to do. It's it's surprisingly a little bit more tricky than you would think, to do.

If they were to take that and integrate it directly into the framework, then that's great. They can provide their own prescriptive way of doing authentication. And we don't need to worry about it as much, maybe, you know, a plugin system for other services, like Auth0 would be great. but the reason that we don't see it as much with react, vue, angular, et cetera, is because of the fact that they don't assume anything about a backend.

If you're going to talk about having something inbuilt for authentication, you need to involve a backend. that's gotta be a component. So yeah, that's, that's that's I think that's why.

Jeremy: [00:37:35] I think that, yeah, hopefully like you said, as you have solutions like, like next that are trying to. hold both parts, in the same place that we, we do start to see more of these, things built in, because I think there's so much, uncertainty, I guess, in terms of when somebody is building an application, not knowing exactly what they're supposed to do and what are all the things they have to account for.

so, so hopefully, hopefully that becomes the case.

Ryan: [00:38:06] I hope so, you know, next is doing some cool things with like I just saw yesterday. I think it was that there's an RFC in place for them to have like first-class tailwind support, which means like you wouldn't have to install and configure it separately. Tailwind CSS would just, come along for the ride with next JS.

So you can just start using tailwind classes, which I think would be great. Like, that'd be, I it's one of the least favorite things about, starting up a next project for me is configuring tailwind. So if they do, if they extend that to be something with off. Man. That'd be fun. That'd be good,

Jeremy: [00:38:37] at least within the JavaScript ecosystem, it felt like, people were people, well, I guess they still are excited about building more minimal libraries and frameworks and kind of piecing things together. and, and getting away from something that's all inclusive, like say a rails or a spring or a Django.

And it almost feels like we're maybe moving back into that direction. I don't know if that.

Ryan: [00:39:02] Yeah. Yeah, I get that sense for sure. I think that there's a lot of people that are interested in building full stack applications. you know, it feels like there was this. Not split. That's not the right way to put it, but there's, there was this trend, four or five years ago to start kind of breaking off into like being a front end developer or a backend developer, as opposed to like somebody who writes a server rendered app.

In which case you have to be, you have to be both just by the very nature of that. so there's specialization that's been taking place, and I think a lot of people now. You know, there's people like me who instead of specializing in one or the other, I just did both. So I call myself a full stack developer.

and now there's this maybe trend to go to, to, to, to go back to like being a full stack developer, but using both sides of the stack as they've been sort of pieced out now into like a single page app plus, some kind of API, but taking that experience and then melding it back into the experience of a full-stack developer.

I think we're seeing that more and more so. Yeah. I don't know. I think also like if you're. If you're wanting to build applications independently, or at least if you want to have like your head around how the whole thing works cohesively, you kind of need to, to be, you know, you need to have your head in, in, in all those areas.

So I think there's a need for it. And I think the easier it's made while we're still able to use things like react, which everybody loves you know, all the better

Jeremy: [00:40:29] Something else about react is something I liked about the course is, more so than just learning authentication. I feel like you, you, in some ways, teach people how to use react in a lot of ways. Some examples I can think of is, is how you use, react's context API to, to give, components access to requests that, that have the token included, or using your authentication context, that state to be able to know anywhere in your app, whether, you should be logged in or whether you're, still loading things like that. I wonder if there's like anything else within react specifically that you found, particularly useful or helpful when you're trying to build, an application with authentication.

Ryan: [00:41:20] Yeah. In React specifically. so I come from mostly doing angular work, years back to focusing more on react in the last few years. It's interesting because angular has more, I would say I would argue angular brings more out of the box that is helpful for doing authentication stuff or helping with authentication related things than something like react.

And, and maybe that's not a surprise because react is this maybe, maybe, maybe you think of reacting to the library as opposed to a framework. And angular is a framework that gives you lots of. Lots of features and is prescriptive about them. in any case, angular has things built in like route guards so that you can easily say this route should not be accessed unless some condition is met.

For example, your user is authenticated. It has its own HTTP library, which is to send requests to a server X XHR request. And it gives you an interceptors layer for them so that you can put a token onto a header to send that. React is far less prescriptive.

It gives you the option to do whatever you want. but some things that it does do nicely, I think which help for authentication would be like the react dot lazy function to lazily load routes in, in your application or at least lazily load components of any time. That's nice because one of the things that you hopefully want to do in a single page app, to help both with performance and with, with security is you don't want to send all of the stuff that a user might need, until they need it.

So an example of this would be like, if you've got, let's say you've got like a marketing page. And you've got your application, maybe under the same domain, maybe under different sub domains or something like that. and you've got like a login screen. There's no reason that you should send the code for your application to the user's browser before they actually log in.

And in fact, you probably shouldn't even put your login page in your react application, if you can help it. if you do have it in your application, you, at the very least you shouldn't ship all of the code for everything else in the application until the user has proved their identity. So to play this out a little bit, if the user did get all of the code for the application before they log in, what's the danger of that?

Well, there's hopefully not too much. That's dangerous because hopefully your API is well-protected and that's where the value is. That's where your, you know, your data is it's protected behind a, an API that's guarded, but the user then. Still has access to all your front end code. Maybe they see some business logic or they see they're able to piece together things about your, your application.

And, and a good example of this is do you follow, Jane Wong on, on Twitter? She finds hidden features in apps and websites through looking at the source code. I don't know how exactly how she does it. I'd love to talk to her about how she does it. I, I suspect that she'll like pull down the source code that gets shipped to the browser.

She'll maybe look through it and then maybe she'll, she'll run that code in her own environment and, and change things in it to coax it, to, to show various, to render various things that normally wouldn't render. So she's able to do that. And see secrets about how about those applications. So imagine that playing out for something, some situation where like, You in your front end code have included things that are like secret to the organization that really shouldn't be known by anybody definitely recommend you don't do that.

That instead you, you hold secret info behind an API, but the potential is still there. If it, if not that, then at least the user is figuring out how your application works. They're really getting a sense of how your application is constructed. So before you even ship that code to the browser, the user users should be authenticated.

before you send things that an admin should see versus like maybe a regular user, you should make sure they're an admin and reacts lazy loading feature helps with that. It allows you to split your application up into chunks and then just load those chunks when, when they're needed. so I'd like that feature aside from that, it's kind of like, you know, one of the downsides of react I suppose, is like, for other things like guarding routes, for attachment tokens, this kind of thing, you kind of have to do it on your own.

So yeah, react's lazy loading feature, super helpful for, you know, allowing you to split up your application. But aside from that other features, you know, it's kind of lacking, I would say in react, there's not, not a ton there to, to help you.

You kinda have to know those things on your own. Once you do them a couple of times and figure it out, it becomes not so bad, but still some work for you to do.

Jeremy: [00:46:04] Yeah. Talking about the hidden features and things like that. that, that's a really good point in that. I, I feel like a lot of applications, you go to the website and you get this giant, bundle of JavaScript. And, in a lot of cases, a lot of companies are using things like feature flags and things like that, where there'll be code for something, but the user just never gets to see it.

And that's, that's probably how you can just dig dig through and find things you're not supposed to see.

Ryan: [00:46:19] Yep. Exactly. You just pull down the source code that ends up in the browser and then flip on those feature flags manually, right? That's probably how she does it, I would assume. And it's very accessible to anybody because, as I like to try to, hammer home in my courses is that the browser is the public client.

You can never trust it to hold secrets. you really need to to take care of not to expose things in the browser that shouldn't be exposed.

Jeremy: [00:46:44] I wonder, like you were saying how in react, you have to build, more things yourself. You gave the example of how angular has a built-in way to, to add, tokens, for example, when you're, you're making HTTP requests, I wonder if you, if you think there's there's room for some kind of library or standard components set that, you know, people could use to, to have authentication or authorization in a react or in a vue, or if that's just kind of goes against the whole ecosystem.

Ryan: [00:47:18] Yeah, I think there's room for it, I wouldn't be surprised actually, if you know, some of the things that I show in my course, how to build yourself. If that exists in a, an, an independent library. I think where it gets tricky is that it's often reliance on the context. And even on the backend, that's, that's being used to make those things really work appropriately.

So. I guess a good example is like the Auth0 library that's allows you to build off into a react application. It's very nice because it gives you some hooks that, are useful for communicating, with Auth0 to, for instance, get a token and then to be able to tell if your user is currently authenticated or not.

So they give you these hooks, which are great, but, but that's reliant on like a specific service. A specific backend, a session being in place Auth0 stores, a session for your users. That's how they're able to tell if they should give you a new token. I mean, I love to see that kind of stuff.

Like if there's, you know, things that are pre-built to help you, but I just. I don't know if there's like a good way to do it in a, in a sort of general purpose way because it is quite context dependent, I think.

But I, I might be wrong. Maybe there's value. In having something that, I mean, I suppose if you created something that you could have like a plug-in system for, so something may be generic enough that it gives you these useful things. Like whether the user is currently authenticated or, if a route should be guarded or something like this.

If you had that with the ability to plug in your context, like, I'm using Auth0 or I'm using Okta, or I'm using my own auth. Whatever, maybe that maybe, maybe there's opportunity for that. that might be interesting.

Jeremy: [00:49:03] Would definitely save people a lot of time.

Ryan: [00:49:05] Yeah. Yeah, for sure. And that's why I love these prebuilt hooks by Auth0, for sure. When I'm using Auth0 in my applications, it certainly saves me time.

Jeremy: [00:49:14] The next thing I'd like to ask you about is you've got a podcast, the entrepreneurial coder and, sort of in parallel to that, you've been working a quote unquote, normal job, but also been doing a lot of different side things. And I wonder, whether that's the angular security book or recording courses for egghead like how are you picking and choosing these things and, and what's your thought process there?

Ryan: [00:49:43] Yeah, that's a good, good question. I have always been curious about business and entrepreneurial things and, and, and selling stuff, I suppose. I would say now I haven't been in business for a super long time, but I, I have been in business for a while now. and what I mean by that is, Yeah, you can take it back to like 2015 when I started at Auth0 at the same time that I was working there, I was also doing some side consulting, building applications for clients. And so since that point, I, you know, I I've been doing some side stuff. it turned into full time stuff back in 2017. I left Auth0, late 2017 to focus on, on consulting.

Then 2020, mid 2020, I realized that the consulting was good and everything like that. And I mean, I still do some of it, but it's, it's difficult when you're on your own as a consultant to be learning. And, and that's something that I really missed is that I found that my learning was really stagnating.

I learned a ton at Auth0 because I was sort of. Pushed to do so it was necessity of my job. As a, as a consultants building applications. I mean, sure. I needed to learn various things here and there, but I, I really felt that my learning had stagnated. And I think it's because I wasn't surrounded by other engineers who were doing things and learning from them and sort of being pushed to, to learn new stuff.

So yeah, an opportunity came up. With, with Prisma, who I'm now, now, working at I'm working at Prisma and it was a, it started everything kind of fell into place to make sense for it. For me. I mean, I, I don't think I would take just, just any job. It was really a great opportunity at Prisma that presented itself.

And I said, you know, I, I do miss this whole learning thing, and I do want to get some more of that. So, that's, that's probably the biggest reason that I decided to join up with a company again. So all that being said, the, the interest in like business yeah. And interest in doing product work, like a course, this react security course, or my angular book, that came out of just some kind of like, I don't know, man, like some kind of desire. I don't know when exactly it sort of sprouted, but this desire to create a product, to productize something, productize my knowledge and to create something out of that and offer it to the world and offer it, in a way where, you know, I could be, be earning some, some income from it. And you know, there there's various people I could point to that have inspired me along that journey, the, the most prominent one is probably Nathan Barry. I don't know if you're familiar with Nathan Barry. He's a CEO of convert kit and I've followed his stuff for a long time. And he was very, always very open about, his, eBooks and his products that he creates.

And that that really inspired me to want to do something similar. And so, in trying to think of what I could offer and what I could create, authentication and, identity stuff kind of made sense because I acquired so much knowledge about it at Auth0. And I was like, you know, this probably makes sense for me to to bundle up into something that is consumable by people and I, and hopefully valuable. I sense that it's been valuable as certainly the response to the products has told me that it has been valuable. So yeah, I think that's the genesis there. And then the. interest in business stuff I think I've just, I don't know. I've always been curious about, especially about like online business, maybe, maybe less so with like traditional brick and mortar, kind of business, but, but, but online business in particular, it just the there's something alluring to it because like, if you can create something and offer it for sale, there's no, there, I'm sure there are better feelings, but I was about to say there, there's no better feeling than to wake up to an email saying that you've made money when you're you've been sleeping right. And that's, it's a cool feeling for sure. so. Yeah, I think that's, that's probably where that comes from.

Jeremy: [00:53:34] And when you first started out, you were making courses for egghead and frontend masters, does that feel like very distinctly different than when you go out and make this react security course?

Ryan: [00:53:49] Yeah. Yeah, definitely. Yeah. For a few reasons. I think that the biggest and probably the overarching thing is that when you do it on your own, you create your own course, the sense of ownership that you have over it is makes it a different thing. I'm in tune quite deeply with the stuff I've created for like react security.

And even though I've done what three, I think front-end masters courses, you know, I've been out to Minneapolis a bunch of times. I know the people there, quite well it's. There, isn't the same sense of like, ownership over that material, because it's been produced for another organization when it's your own thing.

It's, it's like your baby, and you care more deeply about it. I think so. So yeah, that's been, it's a different experience for sure. For that reason. I think too, there's the fact just the, you know, getting down to the, the technicalities of it. There's, there's the fact that you are responsible for all of the elements of production.

You know, you might look at a course and be like, okay, it's just a series of videos, but there's tons of stuff that goes into it. In addition to that, right. There's like all of the write-ups that go along with each video. There's the marketing aspects like of. You know, putting the website together and all, and not only that, like the there's, the lead-up, the interest building lead up over the course of time through building an email list that, and you collect email addresses through blog posts and social engagement.

It was building up interest on social over the course of time, like tweeting out valuable tidbits. there's just this whole complex web of things that go into putting your product together. Which means that you need to be responsible for a lot of those things, if you want it to go well, that you don't really need to think about if you're putting it on someone else's platform.

So there's, trade-offs either way definitely trade-offs. But, I mean, it depends what I want to do, but I think. If I were to do a product with some size to it, it would be I'd want to do my own, my own thing for sure.

Jeremy: [00:55:48] Are there specific parts of working on it on your own what are the parts (that) are just the least fun?

Ryan: [00:55:57] Yeah, the parts that are the least fun. I suppose for me, it's like, probably editing, editing the videos is the least fun. I've got this process down now where it doesn't take super, super long, but it's still, It's a necessity to, to the process that is always there, editing videos and, and the way that I. I dunno, I'd probably do it in a I I probably put this on myself, but I, the way that I record my videos, I editing takes quite a while. yeah, I, I think that is not such a great part about it. what else? I guess there's like the whole aspect of like, if you, if you record a video and you've messed something up and the flow of things doesn't work out, then you have to like rethink the flow.

There's any anytime I've got to do anytime I've got a backtrack, I suppose that that takes the wind out of my sails for a little while. Yeah, those are the two that stand out the most. I think.

Jeremy: [00:56:51] Yeah. I noticed that during some of the videos you would, make a small mistake and correct it like you would say, Oh, we needed these curly braces here. I was curious if that was a mistake you made accidentally and you decided, Oh, you know, it's fine to just leave it in. Or if those were intentional.

Ryan: [00:57:10] Definitely no intentional mistakes. I'd love to post a video sometime on a raw video that isn't edited because you'd laugh at it. I'm sure it's like, inevitably in every video, there's what it looks like. If you, if you were to see it unedited is a bunch of like starts and stops for a section till I finally get through and then a pause for awhile and then a bunch of starts and stops to get to the, to the next section.

So like, "In this section we're going to look at", and I don't like the way that I said it. So I start again "in this section, we're gonna look at" and then there's like five of those until I finally get it. And so my trick with editing is that I will edit from the, the end of the video to the start so that I don't accidentally edit pieces of the video that I'm just going to scrap.

Anyway, that's what I did for a long time. When I first started, it's like, I'd spend like 10 minutes editing a section, and then I get further along and I'm like, Oh, I, I, I did this section in a better way later and now I have to throw it, all that editing that I just did. So if I edit from the back to the to the front, it, it means that I can find the final take first, chop it when I start to take and then just get rid of everything before it right.

So yeah, there's, there's definitely no intentional mistakes, but sometimes like, if there's. If there's something small, like it's, Oh, I I mistyped a variable or something like that. I'll often leave that in. Sometimes it's a result of like, if I do that early on in the video, and then I don't realize it until the code runs like a few minutes later.

I'm just like internally. I'm like arggh! But I don't want to go and redo the video. Right. So I'm just like, okay, I'll, I'll leave that in whatever. And some of, some of those are good. I think in fact, I arguably, I probably have my videos in a state where they're, they're a bit too clean, like, cause I've heard a lot before where people don't mind seeing mistakes.

They actually kind of like when they see you make mistakes and figuring out how to. How to troubleshoot it, right?

Jeremy: [00:59:02] The, Projects that you've done on your own have been learning courses, right. Or they've been books or courses. And I wonder if that's because you, specifically enjoy making, teaching material, or if this is more of a testing ground for like, I'll, I'll try this first. And then I'll, I'll try out, a SAAS or something like that later.

Ryan: [00:59:22] Right. Yeah. I definitely enjoy teaching. Would say that I, am passionate about teaching. maybe I'm not the most passionate teacher you'd come across. I'm sure there are many others that are more passionate than I am, but I enjoy it quite a bit. I mean, and I think that's why I like the kind of work that I do.

The devrel work that I do at Prisma. You know, I enjoy being able to offer my experience and have others learn from it. So at this point, like if I'm making courses or books or whatever, and I don't know if I'll do another book. I mean, writing, writing a book is a different thing than, than making a a video-based course.

And, and I don't know if I would do another one but in, in any case, I don't think it's a proving ground. I'll do other courses regardless of what other sort of things I make. But speaking of SAAS, that is an area that I'd like to get into, for sure. As time wears on here, I'd love to start and I've got a few side projects kind of simmering right now that that would be in that direction.

So yeah, I think that that's, that's an ideal of mine is getting to like a SAS where you started hopefully generating recurring revenue more, you know, more. More predictable recurring revenue, as opposed to, you know, of course it's a great, because you build up to a launch, hopefully you have a good launch.

It's a pretty good amount of money that comes in at launch time, but then it goes away and maybe you have some sales and you generate revenue as time goes on. But, less predictably than perhaps you would in a SAAS if you have one. So yeah, that's kind of where my head's at with that right now.

Jeremy: [01:00:51] I'm curious you, you mentioned how you probably wouldn't do another book again. I'm curious why that is and, and how you feel that is so different than a video course.

Ryan: [01:01:02] Yeah, I think, I think it's because I enjoy creating video based courses more than I, I enjoy writing. I like both, but writing less so. And I also think that there's this perception when you go to sell something, but if it's a book it's worth a certain price range, and if it's a video course, it's worth a different price range that is higher.

I don't think that those perceptions are always valid because I think the value that can come from a book, if it's on a technical topic that is going to teach you something, you know, if you pay $200, $250, $300 for that book, which is in our world, that's a PDF. Right. I think it's perfectly valid. I think that, people have this blocker though, when it comes to that internally, they're like, there's no way I'm going to pay that much for a PDF, but the same information in a video course, people are often quite, amenable to that. So there's that.

I happen to enjoy creating video courses more than I do books. I think that's the reason that I probably wouldn't do a book again. And a book is a book is harder to do than a video course. I think writing a book is, is a difficult, is a difficult thing. For whatever reason. I, I couldn't tell you like technically why or specifically why, but I sense, I've heard this from a lot of people that making a book is harder than making a video course. Yeah.

Jeremy: [01:02:28] Yeah, I have noticed that it feels like everybody who tries to write a book, they always say it was way harder than they thought it was going to be. And it took away longer.

Ryan: [01:02:36] Yeah, for sure.

Jeremy: [01:02:37] I do agree with the value perception. I don't know what it is, but maybe it's just the nature of the fact that, there's so much, you can just go online and see in blog posts and stuff that maybe for some reason, people don't value it as much when it's all assembled and, writing form.

But like the pitch, even your course on the landing pages, you say like, yeah, you can technically find all this information online, but you know, just think about how much time you're going to spend looking for it and not knowing whether or not it's, it's valid or not.

Ryan: [01:03:14] Yep. Absolutely. That's the big, the big sell I think, with, with, online education, rather than having you assemble all that knowledge yourself over the course of time, the value proposition is I will give it to you. I will show you my lessons learned and give it to you succinctly in a nicely. Kind of a, you know, collated fashion.

Jeremy: [01:03:34] Another thing that, I enjoyed about the course is that it was very to the point, sometimes you'll see an advertisement for a video course and they'll say like, Oh, it's got 20 hours of content, but it's like, the fact that it's long doesn't mean you learn more.

If anything, you would think you would want to spend less time to learn the same amount of material. So, yeah. I definitely appreciated that as well as the fact that you've got the transcripts underneath the video.

Ryan: [01:04:05] Oh, that's good to hear. That's awesome. Yeah.

Jeremy: [01:04:07] Cause it's like, I could watch a video and then get to the end and like, maybe I'll remember some things, but other things I won't. And then to, to pick it up, I have to go watch the video again. whereas when you've got that, that transcript, that's pretty helpful.

Ryan: [01:04:23] That's great to hear. Yeah, I was, I wanted to make a point of including transcripts, you know, for accessibility reasons, for sure. But, but also because you know, something that I have realized as time has gone on is that people do appreciate the, the written form. I always thought that maybe it's a little bit weird to read something that is like, the narration of a video, because it doesn't translate perhaps super well if it's in, in, in print, but I think, everything that I've heard about the transcripts has been positive. Like, like you've said, so I'm glad to hear that, people seem to like them.

Jeremy: [01:04:54] The last thing I'll ask is. I guess when authoring courses, or even just when you're releasing a product on your own, what are like some things that you think are really good ideas and what are some things that, you've realized like, Oh, maybe, maybe not so much.

Ryan: [01:05:09] Yeah. So, what I'd offer there is. it's a good idea to try to validate what you, want to create as a product. If you, if you're going for, if you're going for a sale, like right. If you want to, if you want to sell the thing, if you want to make a decent amount of money, it's a really good idea to try to validate it early on and you can do that without much effort. I think by doing smaller things ahead of time, it's things like creating couple of videos for YouTube see what the response is there. offer tidbits on places like Twitter and see what people, how people respond to them. One of the ways that I validated these courses was by doing just that, like tweeting out kind of small, valuable pieces of content, writing blog posts, and seeing how people responded to them.

And then the, like the biggest thing, though, for me for this particular course was I created the free, intro course, the react security fundamentals course. And after I saw, you know, the number of people signing up for it, I was like, okay, I think there's enough volume here. A volume of interest to tell me that if I was able to offer a sale to even a small percentage of that, that list that it would be worthwhile.

The other thing that I, would recommend is that you, that you build up an email list, you find some way to get in touch with your. audience and keep in touch with them. And that's, that's often through email. if you can build up an email list and offer valuable things to them over the course of time, they will have you in mind, they will feel some need for reciprocity when it comes time to, for you to offer something for sale, do all those things.

well before you go to release your course, the last thing I want for anyone is to release the course and nobody is there to, to buy it. And, you know, unfortunately I think a lot of people. have the assumption that if they just, you know, create it, put it up on the internet, it'll sell somehow. But the reality is you almost certainly need to have develop an audience beforehand.

At least if you do, that's the way that, you know, you can make a meaningful sort of, I guess, a impact, but also be. financial results from going down that road. So build that audience, I think is, is the key there.

Jeremy: [01:07:18] Cool, that's a good place to, to start wrapping up, if people want to check out your course or see what you're up to, where should they had?

Ryan: [01:07:26] Sure. Yeah. Thanks so much that you can go to react security.io, and I'll get you some links, to put in the show notes or whatever, but react security.io. There's a link there for. The advanced course. you can check me out on Twitter. I'm at @ryanchenkie on Twitter and, where else I've got a YouTube channel that's my name? Ryan Chenkie has my name on YouTube. I've got some sort of free stuff there. Yeah. Check those places out, you know, kind of find your way around.

Jeremy: [01:07:50] Cool. Well, Ryan, thank you so much for chatting with me today.

Ryan: [01:07:54] Thanks for having me. This has been great. I was happy to be able to do this.

Jeremy: [01:07:58] Thanks again to Ryan Chenkie for coming on the show. I'll be taking a short break and there will be new episodes again next month. i hope you all have a great new year and I'll see you then.

Software Sessions

Scott Hanselman on AI-assisted programming

Related links

Bryan Cantrill on Oxide Computer

Related links

Elizabeth Figura on Wine and Proton

Related links

Transcript

Intro

What's Wine

System Calls

What people run with Wine (Productivity, build systems, servers)

How games are different

What makes some applications hard to support (Some are hard, can't debug other people's apps)

Test suite (Half the code is tests)

How compatibility problems appear to users

Performance is comparable to native

Proton and Valve Software's role

The entire stack receives changes to improve Wine compatibility

Wine's history

Loading executables and linked libraries

Poor system call documentation and private APIs

Debugging and having broad knowledge of Wine

Finding a bug related to XAudio

How much code is OS specific?c

Direct3d, Vulkan, and MoltenVK

Targeting Wine instead of porting applications

Getting support and contributing

François Daoust on the W3C

Related links

Transcript

Intro

What's the W3C?

Funding through membership fees

Relations to TC39, IETF, and WHATWG

How WebRTC spans the IETF and W3C

Recommendations

Royalty free patent access

Implementations are built in parallel with standardization

Why requestVideoFrameCallback doesn't have a formal specification

Tension from specification review (horizontal review)

Privacy concerns

Who participates in working groups and how much power do they have?

Discussions are public on github

Why the W3C doesn't specify a video or audio codec

Digital Rights Management (DRM) and Media Source Extensions

Encrypted Media Extensions

Encrypted Media Extensions are closed source

Content Decryption Module (Widevine, PlayReady)

Documenting what's available in browsers for developers

Brandon Liu on Protomaps

Protomaps

User examples

Related projects

Other links

Transcript

Intro

Why not just use Google Maps?

Protomaps examples

Basemaps vs Overlays

PMTiles for hosting the basemap and overlays

Statically Hosted

Why running servers is a problem for resilience

Case study of news room preferring static hosting

Working on Protomaps as an open source project for five years

Why not make a SaaS?

Many open source SaaS projects are not easy to self host

Businesses going from open source to closed or restricted licenses

Financially supporting open source work (GitHub sponsors, closed source, contracts)

How PMTiles works and building a primitive of the web

Issues with CDNs and range queries

Customizing the map

OpenStreetMap is a dataset and not a map

Protomaps decides what shows at each zoom level

Financials and the idea of a project being complete

You mostly only hear about problems instead of people's wins

Be careful what PRs you accept

Plugin systems might reduce need for PRs

Time split between code vs triage vs talking to users

Running a startup vs open source project