On Understanding Computers - Sam Rose
Welcome to the Learning Driven Development podcast. I'm Josh Cirre, and here's where I get the incredible opportunity to read, learn, and share with you articles, blogs, documentation, stories, or other forms of written content that exist in the tech space. On today's episode of Learning Driven Development, we're gonna be reading On Understanding Computers by Sam Rose. You can find Sam's website at samwho.dev, or find him on Twitter at @samwhoo.
Josh:You might know Sam Rose from some of his more recent popular blog posts, such as Queueing or Load Balancing: phenomenal blog posts with incredible graphics and illustrations. Today, we're diving into the archives of Sam Rose's blog. It's On Understanding Computers, from December 15, 2013. Just as a reminder, I'll save my own thoughts and ideas, as well as personal observations about this blog post, until the end of the episode, where I'll get to share them a little more freely. On Understanding Computers by Sam Rose, written December 15, 2013.
Josh:Many times I've been asked, "Why are you working on that? Is it for work?" This is usually about my tinkering with kernel internals. "No, it's not for work," I reply.
Josh:"I'm just interested in it." Other times, the conversation goes a bit further: "That looks horrible. Why would anyone want to work in C and assembler? No one needs to understand that anymore."
Josh:I'll reply that I think it's important to understand what you rely on. Then I'll be told, "You don't need to understand the internal combustion engine to drive a car." They're right, of course, but there is still value in understanding how an internal combustion engine works. Then maybe someone will tell me about the wonderful abstraction provided by the Node.js event loop, or the concurrency benefits of using a functional language.
Josh:These things scare me because I know that they eventually have to play ball with the real and terrifying world of the kernel and the processor, and only a fraction of a percentage of programmers have any interest in how either of these things run their code. Entitlement and the cost of abstraction. Frustration is part and parcel of programming. A lot of our abstractions are a house of cards, and, frequently, they collapse around us and cause us pain. If you've used any third party library, then you are probably aware of this.
Josh:Having to work around other people's bugs is a huge pain and makes your code ugly. Abstraction example: Amazon SWF. Amazon have a service called Simple Workflow, or SWF. It takes away a lot of the pain of writing a distributed message pipelining application. The idea is that you have multiple activities and a decider.
Josh:Messages are sent to the decider and can contain arbitrary input. We use JSON. The decider can then schedule activities to run. When an activity completes, it reports back to the decider with the result, and the next stage can be initiated. The service works by polling queues that live inside Amazon.
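The following is a toy, in-process Python model of the decider and activity loop described above. It is not the SWF API. Plain `collections.deque` objects stand in for SWF's queues. The three stage names (`fetch`, `transform`, `store`) and the helpers `decide`, `activity`, and `run_workflow` are hypothetical. As in the text, the payload is JSON.

```python
import json
from collections import deque

# Hypothetical three-stage pipeline; the stage names are made up.
STAGES = ["fetch", "transform", "store"]

def decide(history):
    """Toy decider: schedule the next stage, or close once every stage is done."""
    completed = [e["stage"] for e in history if e["type"] == "completed"]
    if len(completed) == len(STAGES):
        return {"type": "close"}
    return {"type": "schedule", "stage": STAGES[len(completed)]}

def activity(stage, payload):
    """Toy activity: record which stage touched the JSON payload."""
    data = json.loads(payload)
    data.setdefault("trail", []).append(stage)
    return json.dumps(data)

def run_workflow(input_json):
    decision_queue = deque([{"type": "started", "input": input_json}])
    activity_queue = deque()
    history, payload = [], input_json
    while True:
        if decision_queue:  # the decider polls its queue
            history.append(decision_queue.popleft())
            decision = decide(history)
            if decision["type"] == "close":
                return payload
            activity_queue.append(decision["stage"])
        if activity_queue:  # an activity worker polls its queue
            stage = activity_queue.popleft()
            payload = activity(stage, payload)
            # Report the result back so the decider can start the next stage.
            decision_queue.append({"type": "completed", "stage": stage})
```

Running `run_workflow('{"id": 1}')` sends the payload through the three stages in order. The decider is the only component that knows the order of the stages.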
Josh:You can have multiple instances of activities, deciders, and entire workflows. It's really quite nice until you start having to do complicated things with it. Each workflow is a separate instance and has its own ID. You can kick off multiple workflows of the same type with different input, and they can be processed in parallel, depending on how many deciders and activity workers you have. SWF gives you the ability to signal a workflow with arbitrary data at an arbitrary time, not dissimilar to UNIX signals.
Josh:When this happens, it forces the decider to run and make a decision on the new data. However, if a decision task had already been started, you end up with a situation where two decisions are scheduled at the same time. SWF ensures that only one decision per workflow happens at a time, but the signal will invalidate any decision made in the first run of the decider. This is not obviously documented. It appears on page 87 of the SWF developer guide as the following note.
Josh:Note: There are some cases where closing a workflow execution fails. For example, if a signal is received while the decider is closing the workflow execution, the close decision will fail. To handle this possibility, ensure that the decider continues polling for decision tasks. Also, ensure that the decider that receives the next decision task responds to the event (in this case, a signal) that prevented the execution from closing.
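The following is a toy model of the failure mode in that note. It assumes a simplified execution in which a close decision fails if any signal arrived after the decision task started. `ToyExecution`, `poll_for_decision_task`, `respond_close`, and `run_decider` are hypothetical names, not the AWS SDK. The sketch only shows the shape of the fix the guide describes: keep polling, and handle the signal that blocked the close.

```python
from dataclasses import dataclass, field

@dataclass
class ToyExecution:
    """Hypothetical stand-in for one workflow execution (not the AWS SDK)."""
    signals: list = field(default_factory=list)
    closed: bool = False

    def signal(self, data):
        # Signals can arrive at any moment, even mid-decision.
        self.signals.append(data)

    def poll_for_decision_task(self):
        # A decision task sees the history as of the moment it starts.
        return len(self.signals)

    def respond_close(self, seen):
        # If a signal arrived after the task started, the close fails
        # and the execution stays open, as the note describes.
        if len(self.signals) > seen:
            return False
        self.closed = True
        return True

def run_decider(execution, while_deciding=None):
    """Keep polling until the close succeeds, handling each signal once."""
    handled, attempts = [], 0
    while not execution.closed:
        attempts += 1
        seen = execution.poll_for_decision_task()
        handled.extend(execution.signals[len(handled):seen])
        if while_deciding:
            while_deciding(attempts)  # e.g. a signal sneaking in mid-decision
        execution.respond_close(seen)
    return attempts, handled

# Usage: a "late" signal arrives while the first close is being decided.
wf = ToyExecution()
wf.signal("start")
attempts, handled = run_decider(
    wf, while_deciding=lambda n: wf.signal("late") if n == 1 else None)
```

If the decider stopped after the first failed close, the "late" signal would go unhandled and the execution would stay open. The loop keeps polling, so the second attempt picks up the signal and then closes.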
Josh:This caused a bug in part of our code that took quite some time to track down and fix. I paid a price for the abstraction I chose to use because I didn't fully understand it. Entitlement. A common response to these types of situations is, "Why does it work that way? That makes no sense.
Josh:It should work this way instead." People get angry that something isn't designed exactly for their use case when, in reality, they are at fault for using something they didn't understand. Of course, taking the time to understand everything you use every day would be a herculean, perhaps impossible, task. Instead, be patient with the abstractions you use and devote time to getting to know them intimately. Don't get upset if they do something you don't expect.
Josh:It just means you haven't paid them the attention they deserve. We're manipulating an array of bytes. By far, the biggest realization I had with computers was that everything we do is the manipulation of a large array of bytes. All of the cutesy things we put on top of that are just ways of making this manipulation map more closely to our thoughts and ideas. We ascribe meaning to regions of this array of bytes: text, data, stack, BSS.
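Here is a small illustration of "ascribing meaning" to bytes. Python's `struct` module reads the same `bytearray` as ASCII text and as little-endian 32-bit integers. The buffer contents are made up for the example.

```python
import struct

# A hypothetical 8-byte region of memory; the contents are invented.
memory = bytearray(b"Hi!\x00\x2a\x00\x00\x00")

# The same array, read with three different meanings.
text = memory[0:3].decode("ascii")                       # bytes 0-2 as ASCII
(number,) = struct.unpack_from("<I", memory, 4)          # bytes 4-7 as uint32
(text_as_number,) = struct.unpack_from("<I", memory, 0)  # bytes 0-3 as uint32

# Writing through one "view" changes what the other view sees.
struct.pack_into("<I", memory, 0, 0x00216F59)  # bytes 59 6F 21 00
changed_text = memory[0:3].decode("ascii")     # now reads "Yo!"
```

Nothing in the bytes themselves says "text" or "integer". The meaning comes entirely from how the code chooses to read them.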
Josh:We divide it up into 4-kilobyte pages for administrative purposes: as a layer of protection against processes accessing the memory of other processes, and as a neat way of swapping memory out to secondary storage when we run out of primary storage. Local variables are nothing more than offsets into the current stack frame. That's why it makes sense that the value of an uninitialized variable could be anything: it depends on which function last occupied that part of the stack. However, none of this applies in the age of the virtual machine.
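The following is a toy model of the page and stack-frame arithmetic described above, for the native (non-VM) world. It assumes 4,096-byte pages and a stack that is just a `bytearray` with a top-of-stack index. `ToyStack` and `split_address` are hypothetical. Real stacks usually grow toward lower addresses, and in C reading an uninitialized local is undefined behaviour. This models the intuition; it is not something to rely on.

```python
PAGE_SIZE = 4096  # 4 KB pages, as in the text

def split_address(address):
    """Return (page number, offset within that page) for a flat address."""
    return divmod(address, PAGE_SIZE)

class ToyStack:
    """Hypothetical stack in a bytearray: a local is frame base + offset."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.top = 0  # index of the next free byte

    def push_frame(self, nbytes):
        base = self.top
        self.top += nbytes
        return base  # the frame's bytes are NOT cleared

    def pop_frame(self, base):
        self.top = base  # popping only moves the index back

stack = ToyStack()
first = stack.push_frame(4)
stack.memory[first + 0] = 99          # first function sets its local at offset 0
stack.pop_frame(first)

second = stack.push_frame(4)          # the next call lands on the same bytes
leftover = stack.memory[second + 0]   # its "uninitialized" local reads 99
```

For example, `split_address(8195)` is `(2, 3)`: byte 3 of page 2. The leftover 99 is exactly the "depends on what the last function was" effect.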
Josh:When it comes to working in Ruby, Python, Java, or any other language that runs on top of its own virtualized environment, the rules change, and I no longer know what's happening. I know that the virtual machine must communicate with the physical one, so parts of my understanding still apply. But the rules for things like function calls and variable lookup are defined differently on a per-environment basis. Example: Ruby instance variables.
Josh:Lately, I've been working on a project by Nick Markwell called boot.rb, a simple x86 kernel that will eventually boot into a Ruby shell. It uses the mruby version of Ruby. As a result of this work, I have had to dive very deep into the mruby internals. A few days ago, I was spelunking around how instance variables are defined on classes. Check this out.
Josh:iv_put, here and here. The iv_put routine is used for setting instance variables on an object. The code reveals two interesting things: setting instance variables creates a Ruby symbol, and Ruby has two methods of setting instance variables. One of them is a segmented list, which appears to operate in O(n) time, n being the number of instance variables already set, but saves memory.
Josh:The other, and the default, is a hash table. A symbol in Ruby is an interned string. The interesting property of symbols in our context is that they are never garbage collected. Therefore, every time you create a differently named instance variable, you lose a little more memory. I won't critique this design decision, but it is an interesting property of instance variables that I sincerely doubt most of the Ruby community knows about. It could potentially bite someone doing some crazy metaprogramming in an embedded Ruby environment configured for low memory.
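The following is a rough Python model of the two ideas above, under a much-simplified design. It has two parts. The first is a symbol table that interns names and never frees them. The second is a "segmented list" instance-variable table made of small fixed-size segments and searched linearly. `SymbolTable`, `SegListIvTable`, and `SEGMENT_SIZE` are hypothetical. mruby's real implementation is written in C and differs in detail.

```python
SEGMENT_SIZE = 4  # hypothetical small, fixed segment size

class SymbolTable:
    """Toy interning table: each distinct name gets an id and is never freed."""
    def __init__(self):
        self.ids = {}
        self.names = []

    def intern(self, name):
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)  # nothing ever removes entries
        return self.ids[name]

class SegListIvTable:
    """Toy segmented list: fixed-size segments, searched linearly."""
    def __init__(self):
        self.segments = []    # each segment is a list of [sym, value] slots
        self.comparisons = 0  # counts key comparisons to expose O(n) cost

    def put(self, sym, value):
        for seg in self.segments:
            for slot in seg:
                self.comparisons += 1
                if slot[0] == sym:
                    slot[1] = value  # overwrite an existing variable
                    return
        if not self.segments or len(self.segments[-1]) == SEGMENT_SIZE:
            self.segments.append([])  # grow by one small segment
        self.segments[-1].append([sym, value])

    def get(self, sym):
        for seg in self.segments:
            for s, v in seg:
                self.comparisons += 1
                if s == sym:
                    return v
        return None

symbols = SymbolTable()
ivs = SegListIvTable()
for i in range(10):
    ivs.put(symbols.intern(f"@var{i}"), i)
put_comparisons = ivs.comparisons  # 0 + 1 + ... + 9
```

The comparison counter makes the O(n) cost visible. Each new variable scans every slot already stored, so ten distinct puts cost 45 comparisons. Meanwhile, every distinct name, including one that is only looked up, stays in the symbol table for good.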
Josh:Tenuous, I know. But these are the types of subtle scenarios that really draw blood when they bite. Wrapping up. The taller the house of cards, the more scared I get and the more painful the bugs can be. Abstractions are necessary to get things done in the kind of time frames that modern businesses expect, but they carry with them a cost.
Josh:You have to be prepared to pay that cost when you run into a bug that sits at a lower-level abstraction than the one you're operating in. Understanding the computer that all abstractions depend on is a very valuable skill, and it has helped me understand some of the hardest bugs I've ever run into. It's not knowledge that comes in handy often, but when it does, it's the sort of knowledge that can turn a multi-day debugging session into a few minutes.
Josh:As I was reading this blog post by Sam, the first thing that came to mind was a couple of comments that Taylor Otwell, the creator of Laravel, made on The Business of Laravel podcast with Matt Stauffer. Taylor commented that a lot of the time, people, especially people newer to programming, newer to Laravel, or newer to building applications, make the decision to grab as many packages as they can to solve their problem, and usually it's the wrong choice.
Josh:Usually, it's something like: okay, I need to impersonate a user as an admin, so I'm just going to grab a package that helps me impersonate a user. I've had this problem myself. I've always opted for the attitude of: hey, someone else has already solved this.
Josh:Usually, they're much smarter than I am, so I'm going to make sure that I just use whatever has already been solved. I'm going to take the easier path and just grab something so I'm not reinventing the wheel. And I think that's a good thing most of the time. It's not good to reinvent the wheel on things that are obviously trivial. But I think there's a difference between reinventing the wheel and actually knowing how the wheel should be invented in the first place.
Josh:And I think that's what Sam is getting at in this blog post. Just because you're using an external package doesn't mean you shouldn't know why that package works the way it does, or even how it works in general. If that package were to disappear, or if some part of your application or some feature of your product just stopped working and you had to write it from scratch, would you know how? I always love pointing out that Laravel gives you auth out of the box, but it also documents the standards for building auth yourself.
Josh:So if I wasn't using one of the starter kits that Laravel provides, yes, it would take me a little longer to build those authentication pages and get everything to the point the starter kit gives you. But I still know how to do that, because the foundation has been laid before me and I've always been curious about how things work. So I think Sam hits the nail on the head: you should always be curious about how something works, because if it doesn't work, you're going to be the one who needs to fix it. So stop relying on "I'm just choosing this" or "I'm just choosing that" because it works for you and you don't have to think about how it works.
Josh:Obviously, I fail at this a lot, especially when it comes to computers and servers and everything like that. Those are things where I'm like, okay, I'm just leaving those decisions to someone who actually knows what they're doing. But I will say that even with things like servers, I've gotten better at things I don't fully understand, because I've had to learn how to understand them. With Laravel, I've had to learn how queues work, how jobs work, how talking to a server and waiting for a response works, and how the inner workings of AWS work.
Josh:All of those things have made me a better developer and a better creative person, because I'm more used to, more familiar with, and more interested in actually diving into the unique aspects of what makes something tick, what makes something work. So, no, you don't have to reinvent the wheel, but it's incredibly important to know how a wheel is actually made and what makes a wheel work, so that you don't have to reinvent it.