Dave Guarino

about evergreen notes

2024-02-28T06:31:56Z

I've created a new category of posts on this site.

These are "evergreen" notes: thoughts and ideas I've had for some time, that I return to time and again, and which I'd like to flesh out and continue to improve upon over time. This is influenced by the idea of "digital gardens" that many have adopted, and which I'd now like to have.

These can be found in the /notes section of this site with the tag "evergreen".

Another ambition in doing this: to reduce the activation energy required to put up such a note.

And so all of these evergreen notes are written in the Obsidian note-taking application. Obsidian writes markdown files to a sub-folder of this site, which is otherwise an Eleventy site. And I have a bit of custom glue code for convenience (for example, auto-generation of page URLs and date published so that I don't have that extra overhead.)
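To make the "glue code" idea concrete, here is a minimal sketch of what auto-generating a page URL and publish date could look like. It is not the code this site actually uses: the function names `slugify` and `front_matter`, the `/notes/` URL prefix, and the choice of front matter keys are all assumptions made for illustration.

```python
import json
import re
import unicodedata
from datetime import datetime, timezone


def slugify(title: str) -> str:
    """Turn a note title into a lowercase, hyphen-separated URL slug."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def front_matter(title: str, published: datetime) -> str:
    """Build a front matter block with a generated permalink and date.

    `published` should be timezone-aware; it is converted to UTC.
    The keys below (title, permalink, date, tags) are assumptions.
    """
    stamp = published.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "---\n"
        # json.dumps quotes the title so punctuation like ':' stays valid YAML.
        f"title: {json.dumps(title)}\n"
        f"permalink: /notes/{slugify(title)}/\n"
        f"date: {stamp}\n"
        "tags: evergreen\n"
        "---\n"
    )
```

A script like this could run over new markdown files and prepend the generated block, so that writing a note only requires a title and a body.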

Hopefully this will lead me to post more, shorter notes. And also to return to kernels and flesh them out more, add connected thoughts as they come, and more generally build up this site a bit more than it has been in the past.

Government Stuff #1

2023-11-01T00:00:00Z

I wrote a new newsletter walking through the (very reasonable) idea maze of setting quality control metrics for a public benefits program — and how banal accounting details can lead to large structural forces. Click here or below to go give it a read!

What might LLMs/generative AI mean for public benefits and the safety net/tech?

2023-07-26T00:00:00Z

There is so much excitement about things like GPT-4, and the Executive Summary Epistemology™ is proliferating: broad, sweeping summaries of things people are saying about things other people are saying, with very little grounding in tactile interactions with the technology.

In general, given just how new this wave of AI is, my perspective is that people should be spending more time experimenting with it on their problems than generating takes. But doing so does require a solid understanding of actual problems, rather than the Imagined Problems (so-high-level-as-to-be-meaningless) that both drive a lot of work and drive a lot of well-intentioned people to madness.

So here are some scattered thoughts to hopefully inform and encourage more experimentation. I mean them as generative (ha!) — I hope they catalyze your own thoughts for experiments; these are less predictions than provocations.

Replace “a chatbot that knows things” with “a calculator for words” as your anchor mental model. The chat experience is an interface or affordance. The underlying technology breakthrough is actually software that can process and reason about words much, much more effectively. Similarly, it’s not about these models having the answers. GPT-4 scores in the 88th percentile on the LSAT. That is a test of reasoning, not knowledge. Don’t think of these models as domain experts. Think of them more like an intern in your office who is going to law school in the fall and who got a 163 on the LSAT.
Much of the substance of what constitutes “government” is in fact text. A technology that can do orders of magnitude more with text is therefore potentially massively impactful here. Law, policy, regulations, guidance, business process and operating procedures, official letters and notices—much of the substance of what we consider government is in fact made up of text. This gives LLMs much more potential in the context of interacting with or delivering government, almost definitionally, than many other domains.
Many of the sub-tasks of the work of delivering public benefits seem amenable to the application of large language models to help people do this hard work. Eligibility operations are a value chain with concrete work involved. Processing. Verifying. Mapping messy reality to abstract rules. I see many opportunities for large language models to assist the public servants doing that in ways that may increase throughput and decrease the difficulty (and frustration) of parts of that work. Examples just off the top of my head:
- Next-level OCR for documents: OCR is currently good at well-structured tasks like processing a single form that is very common (think a tax return.) We likely now have the technology to effectively extract arbitrary information from virtually any paystub, for example, without requiring more than a human review.
- Pulling up applicable rule citations for an edge case / copilot for policy: Public benefits programs represent the accreted complexity of decades. It is very complicated and difficult work to identify the applicable rules in more complicated cases, often an exception to the exception to the exception. A human may have to reason about convoluted logic spread across 3 distinct sources of policy to get to an answer. LLMs’ reasoning capabilities likely can make this much easier.
- Sensemaking, analysis, and prioritization of complaints or appeals
- Automated support for simplifying client-facing language
- Next-generation self-service options that avoid the “talking to a wall” robot experience: Chatbots may not be the core of this technology, but the sub-task of “compose an answer that is more directly responsive to this question (even if that is that you can’t answer it)” is one LLMs handle extremely well.
- LLMs + RPA to streamline interactions with legacy systems: Robotic Process Automation has become an increasingly common shim layer for making common tasks in legacy systems easier or quicker where the task might be time-intensive (many clicks). LLMs likely supercharge this, given that they can take an arbitrary task, click around in a test environment, and generate a reasonable starting place for an RPA script that does the task. (Remember, a system screen is generally a page with text! Text, folks!)
- Lots more
Many of the sub-tasks of interacting with or assisting someone with public benefits are amenable to LLMs. Some examples:
- Explaining complicated notices or letters
- Helping with questions on application forms
- Generating public web content from direct assistance help interactions, to scale discoverability and distribution of that help
- Triaging and assisting with escalations and appeals of issues
- Lots more
Flowing from the above two points: we may see a path divergence in speed of adoption inside vs. outside. This has implications worth gaming out and considering deeply. In particular, the scale software and low/no cost user discovery brings could well overwhelm systems that currently display linear-to-human-staff scalability.
Also related to the above: LLMs may enable a new generation of software-based agents on top of government systems. Figuring out how to align incentives in the right direction for those would be a useful conversation to start. The metaphor of “TurboTax for X” is so ubiquitous as to be somewhat annoying at this point. But its ubiquity is a function of how densely it compresses complicated experience information. People see two things: (1) “a simple, guided experience for navigating a complicated form”; (2) pernicious industry rent seeking on top of government services. The only point I seek to make here is that #2 largely came from the incentive design of the Free File program. My prior work building GetCalFresh was also a software-based agent helping people navigate a complicated program, but with fundamentally different incentives. If the cost of developing such agents has collapsed, the more useful question to start asking is: what is a strong interface-access regime that aligns incentives towards public aims? (Corollary: absent this conversation, things will likely arise anyway but be managed in a much more ad hoc way, missing opportunities and creating costs and frustration on both sides.)
LLMs are “unstable material” like so many other engineering materials, and so strong quality checks/monitoring/humans in the loop by default are probably necessary to ensure low failure rates. We don’t have a good sense yet of failure rates. In fact they may well be dynamic: some are reporting that GPT-4’s performance changes when re-running the same tasks, as the model takes in input and is modified and aligned for competing goals. This makes monitoring, QA, and human checks all critical to any use of these things, particularly in sensitive contexts where downside risk costs are significant. (Obligatory note: the status quo’s costs are also worth weighing here.)
Moonshot hope—maybe zero cost code generation solves the “our legacy system is so hard to change” problem. This could be a post all on its own. The short version:
- if the starting point for reducing the cost of change is adding lots of automated tests that characterize the system’s current behavior, and
- generation of such tests has become effectively zero-cost, and
- models in short order get good at reasoning about existing code bases
- then maybe “the legacy system problem” is about to get much, much easier to “modernize” (if we define that as 1. making changes we need, and 2. making changes that make other changes easier in the future)
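For readers who have not met the idea of tests that "characterize the system's current behavior," here is a minimal sketch of one. The function `legacy_benefit_amount` and its arithmetic are invented stand-ins for opaque legacy logic (they are not real program rules), and `record_behavior` and `check_behavior` are hypothetical helper names.

```python
import json


def legacy_benefit_amount(household_size: int, monthly_income: int) -> int:
    """Stand-in for opaque legacy logic we do not yet dare to change."""
    base = 200 + 150 * (household_size - 1)
    reduction = (monthly_income * 3) // 10
    return max(base - reduction, 0)


def record_behavior(func, cases):
    """Run func over every case and capture whatever it returns today."""
    return {json.dumps(case): func(*case) for case in cases}


def check_behavior(func, snapshot):
    """Return the cases whose current output differs from the snapshot."""
    mismatches = {}
    for key, expected in snapshot.items():
        actual = func(*json.loads(key))
        if actual != expected:
            mismatches[key] = (expected, actual)
    return mismatches


# Record once, before touching the code; commit the snapshot with the tests.
CASES = [(size, income) for size in (1, 2, 4) for income in (0, 500, 2000)]
SNAPSHOT = record_behavior(legacy_benefit_amount, CASES)
```

The snapshot makes no claim that the current behavior is correct. It only pins the behavior down, so that a later change either leaves `check_behavior` empty or shows exactly which inputs it affected. Generating the list of cases is the part a model could plausibly do cheaply.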

Overall: I’m cautiously optimistic. Zooming out, what are the fundamentals that make me optimistic? We have complicated programs that are largely comprised of text and a fundamental technology breakthrough about processing and reasoning about text. Of course things are more complicated; these are all complex, decentralized systems at the end of the day. Complex systems are intrinsically hazardous. Things can and will get messy. But it seems to me that these fundamentals have net positive implications for reducing burden (on many different actors) and making things better overall.

But—all of this is fairly low confidence unless and until I see these things working against concrete problems! A 90% success rate vs. a 20% success rate will have very different implications.

So my final exhortation: take these thoughts as inspiration for tests and experiments to run, not as opinions or prognostications to evaluate. (And share what you find!)

What might better accountability systems for government technology (and customer experience) look like?

2023-06-02T18:10:00Z

I wrote down some thoughts on accountability, feedback loops,

Read these meandering thoughts over on Substack

(I'll eventually cross-post the content here)

Some ideas on more safely prototyping LLM products

2023-04-22T19:02:26Z

'a computer mainframe with duct tape on it in cyberpunk style' (Dalle)

I really enjoyed two recent posts about engineering and LLMs:

Mitchell Hashimoto's "Prompt Engineering vs. Blind Prompting"
Apenwarr's "System design 2: what we hope we know"

A theme that struck me as underlying both posts is nicely captured in the (rough) quote of the latter's engineering professor:

Engineering isn't about building a paperclip that will never break, it's about building a paperclip that will bend enough times to get the job done, at a reasonable price, in sufficient quantities, out of attainable materials, on schedule.

Put another way: engineering is about building things that meet people's needs at an appropriate cost and an appropriate level of safety — there is no 100% safety level, and the work is in large part applying methods to get at the empirical math of tradeoffs across cost and safety.

Unsurprisingly, both these posts were concerned with Large Language Models (LLMs) such as ChatGPT, GPT-4, Llama, etc.

Mitchell's post gestured towards a few examples of how to safely develop on top of such innately unstable material, under the umbrella of "Trust But Verify and Continuous Improvement" - for example:

For our calendar application example, we may want to explicitly ask users: "is this event correct?" And if they say "no," then log the natural language input for human review. Or, we can maybe do better to automatically track any events our users manually change after our automatic information extraction.

Now I very squarely identify as a product engineer when it comes to building software. What that means is I'm most interested in building things where the undefined variable is the user's needs and preferences (or call those requirements) and applying the right set of methods to meet the goal of building the right thing (this I would juxtapose with what might be considered more pure technical work aimed at building the thing right - "the thing" is much better defined at that point!)

So these posts got me thinking it might be useful to put fingers-to-keyboard on some of the methods that I think might be useful in starting to build products, prototypes, and services on top of LLMs.

Nota bene: AI safety is wholly its own field and I'm not an expert. But I do have the benefit of having built actually-in-production-and-use things on top of unstable material in domains and contexts where the potential downside human cost is not small. I hope this catalyzes more thinking and exchange of ideas rather than being taken as some gospel.

Some ideas for building on top of LLMs with increased safety #

Let the user review and tell you if it's right: this is Mitchell's example above of extracting a date and asking the user if it's right before using it
Delayed-response for 100% manual human review: for example, start prototyping a service as one where a user can email and get a response within 24 hours; all LLM-generated responses are reviewed by a human before responding to the actual end-user, and corrections are made by human reviewers while building up a training data set of input-output pairs for evaluation and fine-tuning
Standardize and "whitelist" outputs that can be returned to users safely: for example, if you have some common responses that you know are generally never that bad an idea for the end-user, you might whitelist pre-vetted responses, use the model to select among them (or say none are appropriate and send to a human), and give the user a pre-vetted response; an example from a domain that I've worked in is that while helping people navigate the SNAP or food stamp program, it's often a quite safe escape hatch to tell them to call their local agency and provide them the phone number
Use LLM output to try to handle edge cases/error states, but with user review: so for example if someone puts in some info that seems to fail some sort of validation, instead of simply blocking the user until they fix it themselves, try to use an LLM to fix it and show the user the output to review first
Try using a separate model as a safety check: If you have some concern that certain responses might not be safe, try having a separate model review them against some rules - again, you could be very aggressive in flagging anything that seems even maybe unsafe, by asking it to let a response through only if certain criteria seem to be definitively met
(Weird but true!) Limit user volume to always be able to have a human review!: this may sound somewhat obvious but I think people tend to under-appreciate how the ways that users are finding or getting to use a thing is a key variable for safety, and you can control some of that by design (for example only sharing to certain users to start, having occasional intentional downtime, offering it for only a limited testing time that is given up front, using ads or other links for user acquisition that can be toggled on/off with a URL that cannot be publicly discovered otherwise, etc.)
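As one illustration, the "whitelist" idea from the list above can be sketched in a few lines. Everything here is hypothetical: the response texts, the `Reply` type, and the `pick` callable, which stands in for whatever model call you use to choose among the vetted options.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Pre-vetted responses, keyed by a short identifier (hypothetical content).
VETTED_RESPONSES = {
    "call_agency": "Please call your local agency at {phone}.",
    "upload_docs": "You can upload documents from the status page.",
}


@dataclass
class Reply:
    text: Optional[str]  # None when nothing safe could be sent
    needs_human: bool


def choose_reply(question: str, pick: Callable[[str, list], str], phone: str) -> Reply:
    """Ask the model to pick a vetted response id; never send free-form model text."""
    choice = pick(question, sorted(VETTED_RESPONSES)).strip()
    if choice not in VETTED_RESPONSES:
        # Covers an explicit "none" as well as any unexpected model output.
        return Reply(text=None, needs_human=True)
    return Reply(text=VETTED_RESPONSES[choice].format(phone=phone), needs_human=False)
```

The design choice is that the model's output is only ever used as a lookup key. If it returns anything other than a known key, the question goes to a human, so the worst case is a delayed answer rather than an unvetted one.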

All of these are ways that I think one could start to offer services to real people (the greatest source of learning, to put my own biases on the table!) while having a lot more control over potential negative cases.

Have other ideas? Have ways these could be better or further de-risked? Drop me a note and I'll happily add your own to this post or another with attribution so that we can collectively get smarter at this and make more things people need and want on top of these (pretty darn impressive if unstable) materials.

A newsletter

2023-04-17T00:00:00Z

I've decided to try out a newsletter using Substack.

I plan to keep blogging here, but people have varying preferences in how they get stuff like this, so I'm considering the Substack to be:

A way to do email distribution where I aggregate and share some things that may be here (or in toots/tweets)
A bit of a test to find serendipity on that nascent network, which seems to be settling into a place at the very least worth exploring

(It's also a decidedly reversible decision, so why not! My first newsletter experiment used Mailchimp, and Substack has so far been both better as a CMS experience and on the cost outlook front.)

So go check out Dave's Occasional Newsletter.

Problems, messes, and LLMs

2023-03-21T00:00:00Z

This post goes into the bucket of "draft" or "thinking out loud"

I — like many others — am very excited by newly accessible large language models (LLMs) like ChatGPT and GPT-4 from OpenAI. I have been experimenting with these tools, and am rapidly developing a viewpoint that LLMs are about to change how we use software fairly dramatically, and fairly quickly.

Recently I tried connecting OpenAI's GPT-3 model via API (using a tool called LlamaIndex) to a large PDF document - specifically a SNAP advocacy guide for Massachusetts residents from the legal services and advocacy org MLRI.

The results were very impressive. Given a general question, the model - using the content from the guide - generally composed a solid answer.
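The general shape of "answer using the content from the guide" can be sketched without any particular library. The code below is a toy, not LlamaIndex's API and not what I ran: it ranks chunks by simple word overlap, where real tools typically use embedding similarity, and all function names are made up for illustration.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_into_chunks(document: str) -> list:
    """Split a document into paragraph-sized chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def top_chunks(question: str, chunks: list, k: int = 2) -> list:
    """Rank chunks by how many question words they share; keep the best k."""
    question_words = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, chunks: list) -> str:
    """Assemble the text sent to the model: instructions, context, question."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The prompt returned by `build_prompt` is what would be sent to the model. The model never sees the whole document, only the chunks most relevant to the question.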

I even gave it a real question from a Massachusetts resident who posted on /r/foodstamps on Reddit and... it answered extremely well.

I then wanted to pop in this model to a simple web app so I could share it with some folks and let them play around. Since the code was already in Python, I figured a small Flask app worked great for this. The problem is I hadn't built a Flask app from scratch in a bit, and had forgotten the syntax and boilerplate.

So... I just asked GPT-4 to write the tiny app for me and... the code worked. Completely and with minimal modification.

What do I make of all this?

Problems vs. messes #

A tangent first: a lot of how I perceive software getting built is in problem decomposition: given a stated problem, what series of sub-steps or -problems are necessary to tackle that build up to a solution to the problem.

A concept I first read about in Lorin Hochstein's great post "The ambiguity of real work" is the distinction between problems and messes, from Russell Ackoff's paper "The Future of Operational Research is Past":

Managers are not confronted with problems that are independent of each other, but with dynamic situations that consist of complex systems of changing problems that interact with each other. I call such situations messes. Problems are abstractions extracted from messes by analysis; they are to messes as atoms are to tables and chairs. We experience messes, tables, and chairs; not problems and atoms.

Because messes are systems of problems, the sum of the optimal solutions to each component problem taken separately is not an optimal solution to the mess. The behavior of a mess depends more on how the solutions to its parts interact than on how they act independently of each other. But the unit in [operations research] is a problem, not a mess. Managers do not solve problems; they manage messes.

This resonates with me in particular because, as much as I love programming, I've always identified with the bundle of characteristics that gets called product engineering. And a key part of that for me has been comfort with ambiguity.

And I will say, it's not been my experience that everyone likes ambiguity! I know plenty of folks whose passion is solving well-scoped problems where the criteria for optimization are clear. It's certainly frustrating to be told your problem is X, work it, and then find out the problem was in fact Y. But also... the world is messes. To be able to be in a "problem solving" mode, there is a prerequisite of someone having decomposed an ambiguous, uncertain, subjective mess into a technical problem.

And reasoning about messes is very different than reasoning about problems. Managing messes is often less about decomposition into clear axioms ("a new page in a web form entails additional routes and a view") and more about poking and prodding at a system to see how it responds in actuality; or modeling it from first principles; or or or...

LLMs and a shift in the human role #

What does this have to do with LLMs?

I think what I'm realizing is that LLMs increasingly cover the surface area of many problems. Writing the correct syntax for a piece of logic in a given language is a problem. Transforming or extracting from a corpus of text is fundamentally a problem. These are things that can be solved and optimized.

I also think that means that more of the human operator role in technology or software becomes dealing with messes.

I don't think this is all that different from prior developments in software. Most who know me in a technology context know I really like the web framework Ruby on Rails.

Why? Because what I most enjoy is spelunking complex domain messes in which problems exist. Once something gets to an unambiguous and clearly scoped problem, I will confess I get a little bored!

Rails enabled me to apply software to messes over and over again, because the problems in the bucket of "how you make a web app" were largely taken care of. My own work was overwhelmingly spent in applying methods to make sure I was building the right thing, not building the thing right (already taken care of!)

I think what I see with LLMs is that same shape of a shift, just at a much more profound level: that more and more of the human work of software and technology will be reasoning about and managing messes.

Take the example above: given a decent piece of documentation on navigating a public benefit program, creating a query interface a user could use is now quite easy. I do not exaggerate when I say that most of the similar interfaces now in production that I'm aware of hard-code responses to an intent detector. This is likely your experience with a government chatbot if you've ever used one.

As another example, we still have a lot of situations where a human is reviewing a bunch of uploaded PDFs to get information out of them and type that into a screen. We were already close to that being solvable with hard-coded logic on top, but are about to get to a world where we now just need to say "these documents show information about income — extract how much income they prove and what the timeframe was (month, biweekly, weekly, etc.)" That... might well work in a high percentage of cases out of the box as of... this month.
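Here is a minimal sketch of what that instruction could look like in code, together with the check that decides whether a reply is usable or should go to a person. The prompt wording, the JSON keys, the list of allowed frequencies, and both function names are assumptions for illustration. The model call itself is left out.

```python
import json
import math

ALLOWED_FREQUENCIES = {"weekly", "biweekly", "semimonthly", "monthly"}


def build_extraction_prompt(document_text: str) -> str:
    """Wrap the document text in an instruction asking for structured output."""
    return (
        "This document shows information about income. "
        'Reply with JSON only, using the keys "amount" (a number) and '
        '"frequency" (one of: ' + ", ".join(sorted(ALLOWED_FREQUENCIES)) + ").\n\n"
        + document_text
    )


def parse_extraction(model_reply: str):
    """Return (amount, frequency), or None if the reply should go to a human."""
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    amount = data.get("amount")
    frequency = data.get("frequency")
    # Reject booleans, non-numbers, negatives, and NaN/infinity.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return None
    if not math.isfinite(amount) or amount < 0:
        return None
    if not isinstance(frequency, str) or frequency not in ALLOWED_FREQUENCIES:
        return None
    return float(amount), frequency
```

Anything that fails to parse returns `None` rather than a best guess, which keeps the human reviewer in the loop for exactly the cases where the model's output is least trustworthy.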

So where does the human attention (a scarce resource!) go to? The messy layers above, around ensuring that answers are safe, relatively confident, having feedback loops, etc etc. Still a lot of work to be done! And that's not even taking into account all of the institutional mess work that entails making use of these technologies in large, old, and complex organizations and markets. An LLM can certainly extract text content much better; it is much worse at, say, navigating a relationship with a regulator, or making in-roads into a multi-sided market.

A conclusion for me: tacit knowledge about domain messes might well soon become the top constraint (and highest ROI human work) on deployment of this technology. And when I say tacit knowledge, I don't necessarily mean the formal knowledge of a domain. Instead I mean understanding how domains really work rather than how we like to think they work. (An effective counter-argument here is that I love that kind of domain spelunking so, so much that this whole last argument is so squarely in my own interest that it's worth being deeply skeptical of!)

I don't know where this will go. But I'm spending a lot of time right now doing two things:

Testing LLM performance about problems I know exist under messes in domains I know a bit about
Thinking about ways for LLMs to provide leverage on spelunking complex domains

It's always a bit easy and dangerous in a hype-incentivized domain like tech to say this time is different. But best I can tell — this time sure seems different.

Masoor dal (red lentils) is a great test app for cooking

2023-01-30T00:00:00Z

I put a lot of intentional time into learning cooking during the pandemic. To be completely frank, I've never cooked much at all in my life, getting by mostly with frozen meals and cost-effective takeout like tacos (a blessing of living in California.) So this was a significant change for me.

A lot of folks have talked about the parallels between cooking and programming. (Aside: I think it's not shocking that Ink and Switch's recent research prototype Potluck uses cooking and recipes as the domain to explore dynamic documents as personal software.) They are activities that can be done solo, in general, with relatively quick feedback loops, plenty of tacit knowledge, and levels of abstraction that can be achieved and transferred across different instances of problems or aims.

Having acquired a number of cooking books during this effort, two really stuck with me as the most valuable:

Salt Fat Acid Heat by Samin Nosrat - This is the highly predictable guess.
Masala Lab: The Science of Indian Cooking by Krish Ashok - This is the less predictable and standout book for me! While focused on Indian cooking's methods, it generalizes and goes into the science quite well. What's more, Ashok explicitly takes an algorithmic approach, describing for example a general-purpose algorithm for composing Indian gravies (dba "curries" in many corners) with different variables to tinker with to yield different flavor profiles that evoke the cuisines of different regions.

What both books have in common is a focus less on specific recipes, and more on patterns and abstractions that are useful across dishes.

But to actually use any of these patterns and abstractions, you actually need a concrete context, and that generally means applying the ideas to generate a dish that does not suck. For an amateur, I'll confess I think that's easier said than done!

Masoor dal: a kind learning environment #

Remember TodoMVC? It's a site that shows how to implement a small todo app in a variety of Javascript frameworks. The point is to illustrate the different tools, how they work, how you use them, all with a sufficiently compact problem domain that your attention goes less to that problem and more to the details of each framework.

Masoor dal - red lentils - has been a wonderful "test app" for me. It's relatively easy, very fast (20-30 mins?), exceedingly cheap (reducing downside risk cost of a failed attempt), and maybe most importantly it is an excellent medium for trying many different flavors and micro-methods involved in cooking.

Put another way, masoor dal has represented a "kind learning environment" for cooking. This concept comes from psychology which differentiates between kind and wicked learning environments, which can also be read as feedback loop environments. I first learned of the concept from David Epstein's Range: Why Generalists Triumph in a Specialized World.

To quote from that book:

[In] 'kind' learning environments. Patterns repeat over and over, and feedback is extremely accurate and usually very rapid. In golf or chess, a ball or piece is moved according to rules and within defined boundaries, a consequence is quickly apparent, and similar challenges occur repeatedly.

In 'wicked' domains, the rules of the game are often unclear or incomplete, there may or may not be repetitive patterns and they may not be obvious, and feedback is often delayed, inaccurate, or both. In the most devilishly wicked learning environments, experience will reinforce the exact wrong lessons. [1]

Making masoor dal has let me quickly do something, and see how it works (or doesn't.)

(Careful observers may note the post-it reading Is the oven off? affixed to the stove face. As they say: service design is my passion - aaand I keep leaving the oven on.)

Some examples of the micro-skills I spend more of my attention on when making it over and over:

Understanding "when the oil shimmers" as a signal of sufficient heat to add whole spices like cumin
Figuring out what onions being cooked to some "golden" and vaguely transparent level really means
Smelling garlic or ginger (or other spices) cooking in oil and learning what "when the spices are aromatic" actually means experientially
Chopping onions (yes! chopping onions well is not something humans leave the womb fully-formed in capability on! and what's more this task drives me to a yak shave - getting a proper chef's knife, which I now use for virtually everything and on the order of 10-30 times per day)
Cooking down fresh tomatoes (and the relative differences of using canned tomatoes - which are great too for this!)
The significant effect of a tiny amount of turmeric on a tomato and onion flavor profile

And by far the most important of all!

How to balance a dish with the incremental addition of salt and acid (fresh lemon juice is my ride-or-die companion in the kitchen these days)

I can hardly describe the revelatory power of tasting a batch fresh off the stove, squeezing some lemon juice, and then finding the dish to be a completely different and wonderful flavor.

My own favorite masoor dal: a Bengali flavor #

After much experimentation, my favorite masoor dal to cook these days is a Bengali-flavor-inspired riff using the ideas in Masala Lab.

It is... very, very good. Oh did I mention dal is fairly healthy?

This recipe is an excellent proxy, though I make the following specific changes:

It is a must to use mustard seed oil here
I use 1 serrano pepper specifically for more (and "clean" i.e. unflavored) heat
I use 1 medium white onion
I use roma tomatoes, but occasionally use canned tomatoes, and I'm still undecided which I prefer (canned is perhaps slightly leading for me right now)

I live in the US, and as such there are two ingredients that may be slightly harder to find, and so I recommend visiting an Indian grocery or ordering online (Amazon has both) - these are:

Mustard seed oil
Panch phoron spice mix (a mix of 5 whole spices, so if you happen to be a spice nut already, and you have cumin, fenugreek, brown mustard seeds, fennel and nigella seeds on hand, you're good to go)

[1] The source of these quotes is the book, but they are pulled from https://www.driverlesscrocodile.com/books-and-recommendations/more-from-david-epstein-on-kind-and-wicked-learning-environments/

Designing for flexibility (or: users find a way)

2022-12-23T00:00:00Z

I spend a lot of time thinking about "practical knowledge." Or, a slight variation, how different forms of practice or participation in a system yield differential access to certain knowledge about a system.

One category of knowledge on my mind lately is the understanding of edge cases when users of software are given some degree of flexibility: for example, allowing input on a front end that may be invalid downstream, open ended questions asking for freeform text, an on-page chat interaction to ask for help.

During the 6 years I spent working on GetCalFresh - but particularly in the early years of rapid iteration and high-touch concierging issues people had trying to apply for food stamps through the service - I came upon many examples of unexpected user behavior. Perhaps even beyond "unexpected" and into the category of what I might call "wat."

In the early days, I remember fondly returning to a phrase time and again - users find a way. Provided even a modicum of space for novel behavior, users would interact with our service in incredibly unexpected ways.

A few examples of unexpected user behavior I have seen first-hand, in the wild:

A user somehow successfully using our web site to apply for food stamps on a video game console browser (if I recall, it was some generation of Xbox)
A user entering their name, character by character, in individual-letter emoji rather than letters themselves
A user who uploaded a single document attached to their application: it was an audio file in a file format I had never seen of their own voice saying "I don't have any documents to submit" (upon investigation, I think we discovered that this format came from a particular flavor of Android phone prompting an option to record audio at a file upload point)
Contra a very fundamental administrative premise of how food stamps work in California (that they are administered at the county-level) many different instances of people putting in an incorrect county for their city/zip code

...and many other examples I can't recall at this early morning hour.

(I would be remiss here if I did not link to the delightful and evergreen Falsehoods programmers believe about names by Patrick McKenzie/patio11 as a classic in a similar vein.)

Now the first few examples above might seem like completely reasonable edge cases that, if impossible, would create a tiny hassle for some people, but maybe not major problems. The last one - county mismatch with city - is potentially a much more significant insight: for me at least, having seen that case (and having figured out how to fix it for people manually prior to submitting it to the correct county) re-anchored me in a very positive way.

Whereas the most obvious way to design this series of questions was to start right there up front with asking "what county do you live in?" I now had much more texture on the really existing spectrum of how users process that question. That led to some very different design approaches over time; approaches that would never have been considered without (1) the initial flexibility for users to signal that processing to us via their behavior and (2) an active process for investigating and fixing corner cases like this.
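To make the "active process for investigating corner cases" a bit more concrete, here is a minimal Python sketch of one way such a check could look. Everything in it is an assumption for illustration: the lookup table holds made-up placeholder ZIP codes and county names, and the function name and labels are hypothetical, not anything from GetCalFresh. The point is only that the mismatch gets flagged for a person to look at rather than being blocked at the form.

```python
# Hypothetical lookup: ZIP code -> set of counties that ZIP touches.
# These entries are made-up placeholders, not real ZIP-to-county data.
ZIP_TO_COUNTIES = {
    "00001": {"County A"},
    "00002": {"County B"},
    "00003": {"County A", "County B"},  # a ZIP that straddles a county line
}


def check_county(zip_code: str, stated_county: str) -> str:
    """Classify a stated county against a ZIP code.

    Returns "ok", "mismatch", or "unknown_zip". Nothing is rejected here;
    the label is meant to route the application to manual review.
    """
    counties = ZIP_TO_COUNTIES.get(zip_code.strip())
    if counties is None:
        return "unknown_zip"
    # Compare case-insensitively so "county a" and "County A" match.
    if stated_county.strip().lower() in {c.lower() for c in counties}:
        return "ok"
    return "mismatch"
```

A design note: returning a label instead of raising a validation error keeps the flexibility described above, since the user can still submit and the odd cases stay visible to whoever reviews them.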

What am I getting at here? Designing software to be flexible specifically in the sense that users can do annoying things you do not anticipate is a window into the actual domain you are building in that should not be undervalued.

To get less abstract, I have found a huge amount of learning value myself in things like:

Minimizing form field validation in the early stages of a service (accept accented characters! accept emoji! accept damn near anything!) even if downstream that form data needs to go into a more restrictive database (say, a legacy system with character limits or intolerant of accent marks)
Having open-ended questions like "is there anything else we didn't ask that you think is relevant?"
Building in "escape hatches" at critical points where the default path might be an uncomfortable pushing of a user down some direction, but you allow them to pull the rip cord and say it's not working for them in some way (say, a terms of service that you suspect might be scary to users in its legalese - you could have an explicit button or way to reach out, offering it if the user is confused or overwhelmed)
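The first item in that list (accept nearly anything up front, restrict only downstream) could be sketched roughly like this in Python. This is a minimal illustration under stated assumptions: the function name, the 30-character limit, and the idea of a `needs_review` flag are all hypothetical, standing in for whatever a real legacy system would require. The raw input is assumed to be stored untouched elsewhere; the conversion happens only at the handoff.

```python
import unicodedata
from typing import Tuple


def to_legacy_text(raw: str, max_len: int = 30) -> Tuple[str, bool]:
    """Convert freeform input into text a hypothetical legacy system accepts.

    Returns (converted, needs_review). needs_review is True when something
    other than an accent mark was dropped, or when the text was truncated.
    """
    # Split accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    # Drop the combining marks (accents), keeping the base letters.
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop anything the ASCII-only legacy system cannot hold (emoji, etc.).
    ascii_text = kept.encode("ascii", "ignore").decode("ascii")
    trimmed = ascii_text.strip()
    needs_review = (ascii_text != kept) or (len(trimmed) > max_len)
    return trimmed[:max_len], needs_review
```

For example, `to_legacy_text("José Núñez")` yields `("Jose Nunez", False)`, while an input containing an emoji comes back with `needs_review` set to `True`, so a person can look at what the user was actually trying to say.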

These examples are grounded in a multi-page web form style service, but I would like to be explicit that this concept - design in flexibility, to let users tell you what you don't know - doesn't solely apply to online or software services. Think about a paper form that might say: "Confused? Overwhelmed? Give us a call here."

The other side of this is a big tradeoff: automation. What makes software a peculiar material is that it can scale and automate things so effectively. The more flexibility you build in, the less automation you potentially have. Infinite flexibility, for example - just narrate your situation and we'll fill out the form for you - is virtually unscalable. And, huh, what does that "infinite flexibility" sound like? It sounds a lot like a direct service provided by an individual! Much of the really-existing use of technology is framed in the vein of self-service - definitionally designing something to take on the burden of some direct service for efficiency and other goals. (None of this is intended as judgment, but rather description. There are many reasonable points on this spectrum of flexibility in design based on the constraints of the system in which a service exists.)

The primary thing - the "odd bit of tacit knowledge" here - that I'm trying to share is this relationship: intentionally flexible design provides a window into aspects of reality that very few others will have.

If you have fun examples of wat user behavior, send me a little note!

This post - like all posts here - represents my personal views and experiences, and not those of any employer or affiliated organization past, present, or future.

Does GitHub have a maximum limit on number of issues?

2022-07-01T00:00:00Z

No.

There is no (documented) maximum limit for the number of issues a GitHub repository can have.

Some repos with a large number of issues:

AdGuard Filters: ~113,000 issues
Flutter: ~70,000 issues
Kubernetes: ~41,000 issues
Tensorflow: ~35,000 issues

If you know of a documented limit or have evidence of an undocumented limit, please email me.

Why I find Google Search ads valuable for user research

2022-01-06T00:00:00Z

Recently, I shared a write-up of some work I did with a few wonderful colleagues (Anne and Justin) at the California Office of Digital Innovation, where I've done varied work for the past 1.5 pandemic years or so.

This work was fundamentally about identifying — in an acute and actionable way — barriers Californians were facing in accessing the Emergency Broadband Benefit (EBB), a pandemic benefit providing $50-$75 per month for internet to lower-income Americans.

Anne made a comment on Twitter:

https://twitter.com/anneneville/status/1475585650246123525

So I figured I would briefly detail why I really like having Google Search ads ("AdWords") in the toolbelt for doing user research.

1. You can reach people right at the point that they have the acute need you're looking to understand (aka "already activated")

Talking to someone who, 15 minutes ago, searched for "cheap internet" or "internet discount" is very different than talking to, for example, someone who is generally eligible for an internet subsidy because they live on a lower income.

The slightly more jargon-y way to put this is the person is already activated: the need is top of mind, and they're pursuing it — right now.

Talking to people who may want but don't have broadband internet at home is also useful; it's just getting at a different piece of the puzzle.

In our case, we wanted to understand what might be standing in the way of a person who is eligible and has already decided they have this need.

And that's precisely the research context where connecting with someone based on a search query they made is so useful.

2. With ads, you can control volume (# of people who see the ad and who you reach out to)

You can turn ads off and on at will. You can also set a daily budget target that will generally constrain your spend (and how many people see it.) It's not perfectly precise throttling, but it's absolutely good enough for most cases.

Controlling volume is particularly valuable if you want to have a fairly involved conversation with someone, as we did, actively helping folks through the application as far as they wanted.

It's also helpful if you're, for example, not wanting to drive a massive amount of traffic to the landing page because this is your very first exposure of users to whatever content you put there. Many government sites are so high-volume that a direct link from, say, a main page would drive an immense amount of traffic. While you should work hard to make whatever landing page content is there valuable, no amount of thinking and working hard on that content is a substitute for getting a view into how actual users engage with it. (To riff on an old saying: not even the strongest content design survives first contact with actual users.)

While search ads are one way to control volume, you can also achieve that in other ways for this kind of recruitment:

Having an on-page "intercept" (thing that actually asks the user if they're open to talking) that you can toggle on or off
Sprinkling just a dash of code on the landing page to show only a certain percent of users (say, 1 in 10) the intercept
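Both of those options fit in a few lines. The sketch below is in Python only to keep it self-contained; the same logic would translate directly to a small page script. The names (`INTERCEPT_ENABLED`, `INTERCEPT_RATE`, `should_show_intercept`) and the idea of a per-visitor ID are assumptions for illustration. Hashing the ID, rather than rolling a random number on every page load, means a given visitor consistently either sees the intercept or doesn't.

```python
import hashlib

INTERCEPT_ENABLED = True  # the on/off toggle (hypothetical setting)
INTERCEPT_RATE = 0.10     # show the intercept to roughly 1 in 10 visitors


def should_show_intercept(
    visitor_id: str,
    enabled: bool = INTERCEPT_ENABLED,
    rate: float = INTERCEPT_RATE,
) -> bool:
    """Decide whether this visitor should see the research intercept."""
    if not enabled:
        return False
    # Hash the visitor ID into a stable bucket from 0 to 9999.
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # Visitors whose bucket falls under the threshold see the intercept.
    return bucket < round(rate * 10_000)
```

Turning `rate` up or down (or flipping `enabled`) then plays the same role as adjusting an ad budget: it throttles how many people you end up in conversation with.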

3. While you might not have set up AdWords, other parts of your organization (even in government) may already have done that

Google Search ads are generally a tool for marketing and digital outreach.

While a research or technology team may not already have done that, a public affairs team or another department might have.

Plenty of government agencies use AdWords for digital outreach already in fact. (As an anecdote, here in California I have seen search ads for getting a vaccine from a public health agency.)

So even in organizational contexts where it's difficult to buy things like pay-as-you-go SaaS, look around and you may well find some smart colleague over in another spot has already navigated all that (albeit not for your research uses!)

Aside: even if you are in a small, nimble org, setting up AdWords may be a little intimidating (it's a very powerful tool and design for power does not necessarily lead to simplicity.) It may not be the most zero-friction onboarding, but you can definitely do it.

4. You can reach users without creating work for others in your org (or in a partner org)

A common way to recruit people for research on a web site is to put that intercept on the page. Let's say you're trying to understand users' unmet needs in navigating the Department of Some Benefit (DSB.) Your first thought might be to put an intercept on the DSB home page. And yes, that's great!

But in doing that, you're creating work for others when you don't have to. If you step back and think about your work with the incredible point of contact you have at DSB as a design problem itself, you're actually making the choice a little bit harder for them because now they may have to go and try and get this prioritized, and communicate with other folks, etc. Sometimes that's worth it to the end of, for example, creating deeper buy-in.

I happen to be of the opinion that if you have not talked to at least one user though, choose whatever path absolutely minimizes the Time To User (TTU.) It's also nice to be able to say to a partner, "we're going to talk to some users — no action needed on your end, let us know if you have any questions or concerns!"

5. You are reaching people before they hit any friction or barriers in the actual service that may push them out

For a lot of people, a web search is the first step to anything. And it's on the whole an extraordinarily low friction step to take.

So if you cast your net there, you can reach a really wide swath of users.

By contrast, intercepting users later in some "journey" may add selection bias in favor of users able to jump through some hoop or get past some point of friction.

One example: intercepting users at a certain question screen in an online benefits application is great to understand how people might struggle with that question. But did users have to create an account to get there? Now you're maybe losing vision into those folks' experience.

There's no perfect approach, but the point is that there are such low barriers to making a web search that you get people before any kind of friction in the process pushes anyone out. That's pretty valuable, particularly for services with "heavy front doors" (say, complex account creation or identity proofing steps early on.)

6. Being forced to write a landing page with some value for users is a good forcing function to actually "work the problem" (the MMM-MVP)

For our EBB research, we didn't just run ads saying "we're doing research please talk to us!" We wanted to provide actual value to Californians who were searching for a need they had, since this benefit was now available to them.

So we had to both write ad copy and write a landing page (in plain, friendly language) that actually communicated the benefit in a way that could create understanding and clarity on (if they wanted) how to go get it.

And doing that led to very practical knowledge development. Whereas before it had been pretty abstract thinking about the benefit, being forced to write content actually forced us to grapple with substantive questions (similar to the ones users themselves would have to tackle.)

Example: Do families with kids receiving free and reduced lunch qualify? Yes, but what about the special pandemic situations where families who don't normally get it did? Have to dig into that!

It's not a lot of work, but it's very different work than background research because of the practical motivation of trying to clearly explain this thing to real people.

Limitations and constraints

Search ads are just one tool. And of course that tool has limitations! (If you've ever met me, you'd know any totalizing view of some single lever "solving" a messy complex problem is decidedly not my jam.)

Depending on your aim, you can and should also talk to people who prefer offline (in-person) services; who don't go to search first; who search in languages other than ones you speak; etc.

So thinking about how this complements other approaches is good. Particularly, how does this stack up against your current default way of finding people to talk to? What different sampling biases does this present relative to the sampling biases in those approaches?

(For example, in our case I wanted to make sure we reached users out in more rural parts of California. Instead of that being a massive lift limited to the one locale we could visit, we were able to talk to multiple people from very different parts of rural California. On the flip side, we didn't really get to talk to people who were completely cut off from the internet: by definition we were reaching people who were often using smartphones to search, or a library or work computer.)

But a big virtue of this particular tool is — as we found in our EBB work — it's one of the lowest-friction ways to "get out of the building" and talk to the people who are the only true experts in their own experience of a thing. And speaking only from my own experience, the longer you wait to do that — and the more you let get in your way — the more likely you are to develop mental models that are detached from that experience.

If this was useful to you, you can sign up for very occasional emails from me on things like this.