Transneptune

beyond the Kuiper Belt, over the sea

Archive for the ‘computers’ Category

Let’s talk about Sphinx

Monday, September 11th, 2017

I’ve fielded a few questions lately from folks about Sphinx, reStructuredText, and documentation in the Python world and more broadly. So I thought I’d write up a bit of an intro to how I use and understand these tools, and how better documentation makes me a better developer.

Restructured understanding of reStructuredText

There’s a lot of cargo-cultish behavior and confusion around writing Python docs, and I think that a lot of this comes from seeing some Sphinx documentation, finding something curious in the source, and then not being able to learn what it is or where it comes from. The first source of confusion, I think, is between Sphinx and reStructuredText. Here’s the distinction: reStructuredText is a markup language (with a processor implemented in the docutils package), and Sphinx is a documentation builder or compiler that takes a bunch of reStructuredText files and makes them into a whole suite of documentation.

Since reStructuredText is in some ways the basis, let’s start by talking about it. It’s a fairly straightforward markup language. If you’ve used Markdown before, it may be just similar enough to cause confusion. Headings are indicated with a series of =, -, or ~ on the following line; paragraphs are broken with two newlines (a single newline within a paragraph is ignored); italic is indicated with surrounding *...*, bold with surrounding **...**, and fixed-width with surrounding ``...`` (note the double backticks; a single backtick means something different in reST). Lists have leading - or *. All broadly familiar.
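
Put together, a small reStructuredText document looks something like this (a minimal sketch):

    A Section Heading
    =================

    A paragraph with *italic*, **bold**, and ``fixed-width`` text.
    This line is part of the same paragraph.

    - a list item
    - another list item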

(There are fancier things, like links and tables, but you can look those up yourself when you need them.)

Now, there are two things that reStructuredText supports that make it particularly useful as a basis for Sphinx: directives and roles. These are custom bits of markup that you can use to define special structures in the abstract representation of the document, and emit into the final compiled document in whatever way is needed.

Roles are inline, and generally look like this: :role:`text`. The role is whatever the name of the custom role is (ref, class, index, or weirder things), and the text is whatever text gets wrapped in that role.
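
For example, in running text (the targets named here are hypothetical):

    See :class:`Widget` for the API, or skip ahead to :ref:`installation`.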

Directives are block-level, and look like this:

.. directive_name::
   :directive_arg1:
   :directive_arg2: value

   Some text that gets the directive applied to it.

They can have some options (the :name: value lines immediately following, which work much like keyword arguments) and, after a blank line, a whole block of arbitrary content that the directive applies to.

There’s a small set of roles and directives included in reStructuredText itself, a larger set in Sphinx, and a vast wide world of third-party ones. The builtins include directives for sidebars, warnings, and topics, and roles for things like links.
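
For instance, the builtin warning directive wraps its content in a highlighted admonition in the output:

    .. warning::

       Don’t run this against a production database.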

The Sphinx’s riddle

So now we can see what’s in a pure reStructuredText file. What’s Sphinx give us? It gives us a whole passel of things, but most importantly it gives us a range of output formats, support for cross-references, and support for pulling documentation information from our Python source.

Sphinx starts with a file called conf.py in the root folder, which sets a number of values. Most crucially, it configures the location of the root reStructuredText file, and sets the theme and options used for the HTML, PDF, ePub, and other output. It also lets you specify extensions and third-party packages that enable more roles and directives.
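
A stripped-down conf.py might look like this (a minimal sketch; sphinx-quickstart generates a much fuller one, and the project name here is hypothetical):

    # conf.py -- Sphinx configuration
    project = 'MyProject'
    master_doc = 'index'      # the root reStructuredText file
    html_theme = 'alabaster'  # theme for HTML output
    extensions = [
        'sphinx.ext.autodoc',  # pull documentation from Python docstrings
    ]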

You obviously don’t want your documentation to exist as one giant .rst file, though, right? Just like you wouldn’t want any webpage to be a single giant page. So you break your docs into many distinct .rst files, and cross-link them. The most crucial form of cross-reference is a .. toctree:: directive in the root .rst file, linking together and ordering all the other .rst documents (there’s a sketch of one after the example below). But you can also use the ref role to cross-link documents in the middle of a paragraph. For instance, make a reference target by putting a label like this above a heading:

.. _some-ref:

Wotta Heading
=============

This is a section I want to refer to later.

And then in another document, link to it:

So, if you consider :ref:`some-ref`.

And that’ll render like this:

So, if you consider <a href="./some_file.html#wotta-heading">Wotta Heading</a>.

(There’s some magic in there, about how headings are turned into URL fragments, and how cross-linking finds the right file, but I hope the general idea is clear.)
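
And here’s the toctree sketch promised above: a root index.rst might tie the other documents together like this (the document names are hypothetical):

    .. toctree::
       :maxdepth: 2

       installation
       usage
       api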

Where do I go from here?

Start a documentation project! Or add docs to an existing project. Here’s a quick recipe:

pip install sphinx
sphinx-quickstart

Now you’ve got a conf.py and an index.rst. Write some docs, run make html, and open _build/html/index.html in your browser.

That’s all! You’ve got docs.

One more thing

Now, personally, I like using the dirhtml builder, instead of the html builder, because I like my URLs to end with a /, not a .html. But that makes it harder to view changes locally. So I made a little tool to serve the docs locally, and while it’s at it, to rebuild them as you change them. It’s called phix, and you can get it from PyPI. Assuming you’re running Python 3, just run pip install phix, and then run phix in the root of your documentation project. You can then see your docs at http://localhost:8000/ as you work on them. Enjoy!

Next steps

This is probably the first in a series. I’ve got a lot of other things I’d like to write about on this subject. I’ll link them here as I write them:

  • getting your docs on Read the Docs
  • building docs for different targets
  • writing custom Sphinx themes
  • writing Sphinx extensions
  • why you should try documentation-driven development

Craft and writing prose

Monday, March 28th, 2016

Prose is important; even if you’re writing for no one else, you’re writing for yourself in the future. The best developers I know write a lot of prose: documentation, commit messages, and comments. The value is not immediately visible, and it won’t make your tests pass, but it is part of keeping technical debt low: some tech debt comes from not knowing in detail what a piece of code does. Good docs and comments and commit messages let you know at least what it’s supposed to do.

So, prose is a part of the craft of software development, but not just any prose: particularly technical prose, explanatory, lucid prose. And it’s not something we, as developers, are usually taught.

We’re, generally speaking, a group that’s pretty good at self-education, though. We pick up new languages, frameworks, protocols, and principles. So not only must we do this, but we can do this. It’s hard, because it’s big and fuzzy (there are no tests that can tell you if your prose is correct), but still, we can do it.

This post is just an exhortation, not a guide, though. The subject is too large to cover with a simple trick or a general principle. So, how do you begin to write better? First, write more and read more. Second, get your stuff edited.

Thanks to Ryan and Owen.

Programmers aren’t special

Monday, March 21st, 2016

Listen to craftspeople of all sorts. Learn from outside the bubble. We are not magical. What we do is not magical. It has some cool properties, so do other things. Learn.

(Yes, this means learn about how writers write. Yes, this means learn about how carpenters carpent. How psychologists psychologize. How baristas bar. How sailors sail. All of it.)

Embellish later

Monday, March 14th, 2016

So, @wholemilk said something I liked:

Yeah. But sometimes that’s hard. Why?

I am a big believer in the value of considering the emotional landscape of any labor, but in my day to day, that means mostly programming.

When you have something that’s working but still incomplete, it can be a big emotional effort to break it so you can start to move it towards the next stable point, the next point at which it’s working(-but-incomplete).

It’s like editing a document: partway through applying edits, you have broken sentences lying around, half-formed ideas not fully explicated, and the thing’s a mess. Sometimes that even shows through, if you fail to clean up after a change to the first half of a sentence leaves the last half incoherent.

It is similar with code.

Fundamentally, this is why source control matters. It is a tool to help with the emotional labor of breaking your work. If you know you can always go back to a stable point, swinging out into the void for a bit becomes exploratory, not all-or-nothing.

Code Folding

Monday, December 7th, 2015

Nick made a comment about code folding, and I promised him an essay. Here it is. It’s not very long.

I like code folding. When I know a mature project well, and the modules are kinda big, I want to open a file and not be distracted by seeing things I’m not working on.

I dislike code folding. When I don’t know a project well, I want to be able to see everything without having to drill down into it. When I do know it well, I want the modules to be small enough that there’s just one topic in each file, and so opening a file does the thing that opening a code fold is supposed to do.

So, the second case is ideal, but also idealistic—and we don’t live in an ideal world. Sometimes there are other pressures against breaking a project into a billionty well-named modules, too, and you have to live with the fact that a given module might have some variety of content.

Code folding, like window focus, should ideally follow my unknown innermost desires—the computer should always know just what I want without my having to tell it. Alas.

The Historiography of Git, or How I learned to stop caring and love writing history

Monday, November 30th, 2015

This post was occasioned by hearing about the experience of pairing with Nvie, and then talking with Owen about how he does similar things.

Git is notorious for allowing you to rewrite history, which rubs some people the wrong way, but which I and some others think is actually pretty neat and useful, if you don’t abuse it. From what I hear, when Nvie is busy coding, his commit messages are just about “foo” or “wip” or whatever, and it’s only after he reaches a good stopping point that he goes back, rewrites history to say something meaningful, and commits. As Owen said to me when we were talking about this, he does something similar, but has had a hard time explaining what utility he gets from it to others. So I promised I would try to explain it.

I don’t think the utility of this became clear to me until I started working on Hexes (which is dormant until I figure out a good way to write a testing framework for it—ideas welcome!). Every time I would take it out to work on it, I would bring it to a stable point, then commit. When I wanted to work on a new feature, I would necessarily break things that were working, for a while, and having good version control practices let me move forward with confidence, knowing I could roll back if I had to. (At some level, obviously, this is the purpose, or at least a purpose, of version control, but the realization hit me deeply on that project.)

But the next step was to be able to roll back not to just the last stable version before I started work on this feature, but to the quasi-stable points between the last and the next fully stable points, where I had expressed an idea but not worked out all the kinks in it. And so, clearly, that’s what other commits are for. So why not make them “real” commits?

Because real commits take too much time and energy. They pull you out of the thought process you are not yet done with and demand you shift from “writer” mode to “editor” mode. You just need a quick hand-hold to let yourself move forward, not a full belay anchor. (NB: I do not rock climb and have no idea what I’m talking about there.) You should be keeping your attention on the code and the architecture, and not the individual anchor points. On the flow.

But why do you then go back through and clean up your commits? Isn’t that a lot of work? Yes! But it also has a lot of value. It’s a communication tool, and even if you’re on a team of one, you will eventually be communicating with yourself in the future. If you don’t ever review your git history, and don’t think you ever will, well, I hope you’re wrong. And if you’re not on a team of one, know that anyone who looks at your PRs will probably step through each commit and try to see what you intended to do there. Leaving your commit history as a series of WIPs is like being a craftsperson who keeps their bench an utter mess—it shows that you haven’t yet learned how to work on a team, and that you probably don’t understand the cost you incur yourself that way, because you’ve become accustomed to it.
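
In practice, that cleanup pass is usually just an interactive rebase before opening the PR (a sketch, assuming the work branched off master):

    git commit -m "wip"                # quick hand-holds while exploring
    git commit -m "wip, kinda working"
    git rebase --interactive master    # then squash, reorder, and reword
                                       # into commits that tell a story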

This is actually writing history, not rewriting it. History is not a series of events, but the interpretation you impose on it after the fact. Imagine what a disaster it would be if “history” books were just enormous piles of primary sources with no analysis, synthesis, cross-referencing, or organization. To put it another way, when you rewrite your commit history, you are doing the job of a historian, making sense of and imposing structure on the events that happened in a given period.

Of course, before you do any of this, get real comfortable with git. If this practice makes you worry about losing work, that worry isn’t worth the value it gives; but if you’re worrying about losing work with git, you are, and I say this with love, still using it at a novice level. You should never have a tool as central to your work as a version control tool make you worry about loss. Understand it more; that’s your next task.

Maybe this’ll at least make you pause and think for a moment about the value this approach can offer. I’m not at all sure it’s convincing, but I tried.

Thanks to Ben Warren and Owen Jacobson for input on this post, and Jonathan Chu for the initial inspiration. Ben says, of arguments about git:

I don’t understand most people when it comes to git. It is a time-travel system, the end. Quibbling over the rightness of time travel when you are using a time machine seems like missing the point.

Oh-my-zsh, virtualenvwrapper plugin, errors when you cd

Wednesday, June 24th, 2015

(Because a few people have been having this problem.)

Have you recently updated oh-my-zsh on your OS X install? Do you use the virtualenvwrapper plugin? Are you seeing “workon_cwd:6: command not found: realpath” whenever you try to cd?

There’s a simple fix: brew install coreutils. That will provide the realpath command.

Git Shortcuts

Saturday, May 23rd, 2015

I talked about how I use git. Let me talk about how I actually use it.

I have an extensive [alias] section in my .gitconfig. Any sufficiently frequently used command gets abbreviated to two (or occasionally three) characters.

    st = status
    ci = commit
    co = checkout
    cob = checkout -b
    di = diff
    amend = commit --amend
    ane = commit --amend --no-edit
    aa = add --all
    ff = merge --ff-only
    rup = remote update --prune
    po = push origin
    b = branch -vv
    dc = diff --cached
    dh1 = diff HEAD~1
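
(The g in what follows is a shell alias rather than a git alias; something like this in a .zshrc:)

    alias g=git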

So in actual day-to-day usage I’ll type g st and g b just reflexively, while I’m pondering where the code is at (depending on whether I’m thinking about code to stage or feature branches), and g aa && g ci when I need to commit the current working tree verbatim, and g rup && g ff when I need to make a local read-only branch match its upstream.

Really, the key here is to pay attention to which commands you use frequently, and enshrine them in your aliases. Also, clean out your aliases sometimes if your behaviors change and you stop using certain commands.

How I git

Saturday, May 9th, 2015

Git is not a version control tool, right? It’s a graph-manipulation tool that you can use to support version control methodologies. So this is how I use git to practice version control.

I’m going to be very explicit throughout this, using long forms of git flags and commands, and avoiding many shortcuts that I actually use in the day-to-day. I’ll write a follow-up with those shortcuts if I get the chance.

If you take nothing else away from this, at least know that git pull is terrible and should be avoided.

The Setup

My use of git is particular to the context in which I use it. That context is the green and lush valley between the mountains of GitHub and Heroku. GitHub provides a web-accessible record of the state of my development on a project, and Heroku provides a target I can deploy to with git.

The GitHub side is a bit more central to how I use git, so we’ll focus on that. Under most circumstances, my git-world looks like this:

github.com:Some/Project.git <----> github.com:me/Project.git
                                        ^
                                        |
                                        v
                                   localhost:me/Project.git

That is, there’s a project I’m contributing to (either as a primary collaborator, or just an interested citizen of the open-source world), and it’s on GitHub. There’s my fork of it on GitHub. And there’s the working copy on my local machine.

The general flow is like this:

  1. Changes I make locally get pushed to my fork.
  2. Changes in my fork get pull-requested to the original project.
  3. Changes in the original project get fetched-and-fast-forwarded in my local copy, then pushed to my fork.

To talk about how I do this, we’ll need to talk about the kinds of objects that I keep in my mental model of my git-world.

The Tools

First, there are a few tools that are intrinsic to git and GitHub:

  • remotes
  • branches
  • pull requests

I augment these with some further categories that exist only in my head:

  • upstream remote and origin remote: The original project’s remote I always call upstream (this collides a bit with some other git terminology, but it’s not been confusing so far), and my fork I always call origin.
  • read branches and write branches: Related to the point above. Some branches are local copies of information on upstream, and they are read-only: I never commit to them. Other branches are local copies of information on origin, and they are writable: all my commits go on these branches, and get pushed to origin. The read-only branches include master. I never commit on master, only on write branches, which make their way back into master eventually. This distinction is kinda crucial, as it helps me avoid merge bubbles and confusing history states.

For the most part, all my write branches are based off of master, which is in turn tracking upstream/master. Every once in a while, I will have a branch based off of something else. For example, say that I am working on contributions to a feature branch a friend is working on. In that case, I add one more remote (beyond upstream and origin) to track their fork on GitHub. I update my remotes (see “The Commands” below), and I make a local read-branch that tracks the branch on their remote that I’m working on. I then make a write-branch based off of that local branch to work on.
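
In commands, that setup looks roughly like this (the remote and branch names are hypothetical):

    git remote add friend git@github.com:friend/SomeProject.git
    git remote update --prune
    git branch their-feature friend/their-feature   # local read-branch
    git branch my-additions their-feature           # write-branch based off it
    git checkout my-additions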

All of this is in aid of one of my fundamental principles: updating tracking information and updating branch state should be clearly separated activities.

(As a side note, this is why I think git pull is toxic; it combines two operations: first a git fetch, which updates some of your remote tracking information, and then a git merge, which may be a fast-forward merge, but may just as well introduce a merge bubble, making the operation hard to cleanly reverse.)

When my friend has updated their branch on their fork (by merging my code, or by adding some of their own, or even by doing the impolite thing and rewriting history on that branch), I can update my remotes, see how different my local read-branch and their fork’s version of the branch are, make smart choices about what to do, and if all’s clear, hard update my read-branch to match their version. Then I can repeat that process with my branches off of it: I will have the ability to see the state of the differences cleanly, and not have to awkwardly back out via the reflog.
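
Continuing the hypothetical names from above, that update dance looks something like this:

    git remote update --prune
    # see how my read-branch and their fork's branch have diverged
    git log --left-right --oneline their-feature...friend/their-feature
    git checkout their-feature
    git reset --hard friend/their-feature   # hard update the read-branch
    git checkout my-additions
    git rebase their-feature                # replay my work on top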

The Commands

OK, enough of me pontificating, you just want to know what git commands to run, and damn the torpedoes, right? Well, that way lies pain, so do take the time to understand what git is doing to the commit graph. But here’s what it looks like for me:

git clone git@github.com:wlonk/SomeProject.git
cd SomeProject
git remote add upstream git@github.com:TheOriginalFounder/SomeProject.git
git remote update --prune

At this point, I have two remotes (as in the diagram above), and all local information about them is up-to-date.

git branch some-feature-branch master

Now I’ve made a branch that I’ll use as a write-branch; the authoritative copy of it is here on my local machine. It’s tracking my local master, too, which is, in turn, tracking the upstream’s master. (That first bit of tracking only happens if you set git config branch.autosetupmerge always, which makes new branches track whatever branch you started them from. See http://grimoire.ca/git/config for more good-but-nonstandard git configs.)

git checkout some-feature-branch

And now I work on this branch! Work work work, commit commit commit. Hm, maybe I want to clean up my history. OK:

git rebase --interactive  # Explaining how to use this is out of scope
git push origin

Oh, wait, there’s some upstream work I want to incorporate. It’s in the upstream’s master, and my PR to upstream hasn’t yet been merged.

git remote update --prune
git checkout master
git merge --ff-only upstream/master

If the above fails, stop, look around you, and then calmly make good choices.

git push origin master

Cool, now my local master and my fork’s master both look just like upstream’s master.

git checkout some-feature-branch

Let’s just replay the work in this branch onto the new master:

git rebase

And we can deal with conflicts as they arise. And finally, we have to force-push (generally considered bad, so be careful!) to rewrite history on our remote:

git push --force origin

And remember, if all else fails:

Call the reflog!

In Defense of the Last 20%

Friday, February 22nd, 2013

There’s an idea right now, in our culture, that if you can at least get something to 80% done, you’re doing well. You see this all over, in the start-up world especially, in ideas like minimum viable product, Agile’s ethos of always being able to push a new release, and so on.

I think that this is great. But it’s also horrible and toxic.

The problem, as I see it, with the 80% solution, is that if you start with that in mind, and reach that 80%, you are very unlikely to ever reach 100%. Everything will be good enough, not good. The last 20% makes that difference.

Why won’t you reach 100% if you aim for 80%? Because the last 20% is hard.

The Unbearable Hardness of Detail

The last 20% is hard for many reasons, but the big one, to my mind, is that it is detail work. The first 80% is, for the most part, at a gross scale. You rush forward, wave after wave of effort, breaking new ground and making new functionality, making something out of nothing. The last 20% is all the trim and polish, the tightening, paring, smoothing, cleaning. The difference between something that works and something that works well.

For those of us who excel at reaching 80%, the detail work is exhausting in part because we don’t get the same feeling of reward from it. The value of doing work in that last 20% is less visible. We put in a bunch of work to tighten things up, and only after a lot of use and reflection on that use do we see the payoff for it.

Further, the individual pieces of this detail work often don’t have much value, even though they have tremendous value in the aggregate. Consider software development for a moment. Unit tests are part of the last 20% (and, if you’re good at TDD, are something you do before you reach the 80% functionality mark—more about that later). One unit test, for one method on one class in your project, has almost no value. Full test coverage on one class still has pretty trivial value. Full test coverage on all your classes has tremendous value.

Compared to the gross-scale work, which has a roughly linear value-to-work curve, the detail work has an exponential curve, but one that starts out well below the linear curve of gross-scale work. Add to this the fact that, if you do all the 20% work at the end, you already have a working product, and you will find that the 20% work very quickly seems not-worth-it—it has dipped below the threshold of “is this work worth my time”.

Soft Emotional Underbellies and Leaps of Faith

So, can’t we just solve this by saying “hey, this will pay off down the road a bit, keep going!”? No. Try as we might, we humans are bad at believing in the future. And in the context of start-up-culture projects especially, this is with good reason.

Quite simply, we don’t know whether what we’re doing is the thing we should be doing, or whether we’ll achieve it if it is. To embark on the project of detail work, where we have to delay our rewards and put in a lot of work before we see much result, is frightening because of the very real possibility that circumstances will not let us reach that stable point on the other side.

So we hedge. We hedge against the financial cost, of course. In any start-up, the time before you have a product is a horrible, dangerous time, when you can feel yourself skating on thin ice. You need to get something out so that you can have some stabilizing cash- or attention-flow. But we also hedge against the emotional cost. Because, as much as it risks our ability to keep a roof over our heads and food in our bellies, working on something only to see it fall through, to see that work in pieces on the ground, is also a real punch in the feels.

Aside: Why do we care?

Does that 20% work really matter? Why? Obviously, I think it does. To me, it has intrinsic virtue. But I hope that I can convince the skeptic in my head that this virtue is real.

If 80% is our mark for something that works, it can be very hard to see why we should aim beyond that point. If it works, it works, and that’s all, right?

That, to me, is an incredibly pessimistic view. There is so much space above minimally-acceptable, and I think it is our duty as humans to aspire to exist in that space. The sublime feeling of correctness that comes from keeping one’s tools in order, from truly good customer service, from a tool that extends just a bit past your own understanding of your need and is there, like Jeeves, with just the right thing when you discover you need it, is not to be underestimated. It reduces friction, enabling us to go further, think clearer, live better.

What can we do?

Given that the last 20% is so hard, but also so valuable, what can we do to make sure we do it? Well, first, we can acknowledge, deeply and in the very fibers of our being, that creating things is hard. There is no shortcut, no silver bullet, nothing that will change the fact that creation is hard mental, emotional, and sometimes physical work.

But, as with anything hard, there are things we can do to help ourselves through it. If our goal is to get to the top of Everest, we don’t need to go up the North ridge, we can take the Southeast ridge. Still hard, incredibly hard, but we’ve used our wits to prepare and give ourselves the best chance of success we can.

What I find helpful (he says, hardly being an expert craftsman) is to intermix the rewarding, energizing 80% work with the important, tiring, difficult 20% work. I call this the “spoonful of sugar” approach. By keeping in my head a clear vision of the end-state, how good I want this thing to be, I can remember that the 20% work is valuable, and will enable me to make a product I can be proud of. By remembering this all throughout the process, I can keep myself from eating my dessert (the 80% work) first.

The real craft, I think, comes in knowing just how to mix these kinds of work such that you are not caught in the small details with no hope of escape, nor stuck at 80% with no will to work further. As with everything in life, it’s a middle way. Of course, I’d rather have an 80% product than no product, but I’d much rather have a 100% product than an 80% product. At the risk of sounding like a stereotypical American, satisfactory is not enough.

Craft is as deep a human mystery as love, and not something I expect to solve, but I’m curious: what do you do to help yourself make something that navigates the treacherous waters between perfectionism and good-enough-ism? How do you do that 20%?

(Many thanks to @strasa and @worldnamer for the conversations that led to this.)