Trustless Pursuance: Decentralized Architecture

In a rapid series of events, Distributed Denial of Secrets was banned from Twitter for our BlueLeaks release, along with any links to our website, the leaked documents in question, or magnet links to those documents (including URL-shortened links, forwarding domains, and a ROT13-encoded version of the magnet link). Shortly after, one of the main DDoSecrets servers was seized by German authorities, rendering the document archive and search engine unreachable. The raw documents were hosted with torrents and have all been mirrored elsewhere, but losing Twitter as a soapbox, along with the index and search engine, is an unfortunate setback.

Following the trend of other groups deplatformed from mainstream social media, DDoSecrets has moved to Telegram and is looking at alternative media options for making announcements. On the hosted-infrastructure side, we have dedicated servers for specific projects that remain online, but a shoestring budget limits our ability to run a separate, redundant server for each task.

All of this is to emphasize the importance of decentralized systems that avoid central points of failure and single-party censorship (as opposed to cooperative censorship). When considering the design of the Pursuance Project, we want a system that can outlive Pursuance as an organization, and that can survive without the Pursuance Project’s servers if they are lost to lack of funding, censorship, or equipment failure.

This post proposes a model for Pursuance without reliance on a permanent and trusted server, for use by groups like DDoSecrets that cannot afford such centralized dependencies. We begin by looking at peer-to-peer chat software, and building from there.

Ricochet

Ricochet is an anonymous, encrypted “instant” messaging program. It works by hosting a Tor onion service on each user’s computer. To write to another user you need to know their onion service address, at which point both computers connect to each other’s onion service and exchange messages. This eliminates the need for any centralized chat server, requires no network configuration (such as domain registration or port forwarding), and hides every user’s IP address from the others.
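
As a rough illustration of the underlying mechanism (and not how Ricochet itself is implemented), a program can publish an ephemeral onion service through a local Tor daemon using Python’s stem library. This assumes a Tor daemon is running locally with its control port enabled; the port numbers are arbitrary choices for the example.

from stem.control import Controller

# Connect to the local Tor daemon's control port (9051 by default) and publish
# an ephemeral onion service that forwards onion port 80 to a local listener on
# port 8080. The service disappears when this program exits.
with Controller.from_port() as controller:
    controller.authenticate()
    service = controller.create_ephemeral_hidden_service(
        {80: 8080}, await_publication=True)
    print("Reachable at %s.onion" % service.service_id)
    input("Press enter to tear the service down")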

Unfortunately, Ricochet has a range of shortcomings. Messages cannot be delivered while a user is offline, allowing only synchronous communication. There is no possibility for group-chats, and the trivial solution of “send a message to each user in the group” would require everyone in the group-chat to be online at once and is completely unscalable.

Cwtch

The good people at Open Privacy have been working on a more sophisticated chat system built on top of Ricochet, known as Cwtch (pronounced “kutch”, translating roughly to “a hug that creates a safe space”). The basic premise is to use Ricochet for peer-to-peer messaging, but to add an untrusted server that caches messages during offline periods and facilitates group chat. The server cannot read any messages, and will relay messages to any user that connects and asks. Each message is signed, preventing tampering, and includes a hash of the sender’s previous message (blockchain-style), making omission easily detectable. The server therefore cannot act maliciously without detection, and because it cannot identify users or distinguish their messages, it cannot target specific users; it can only misbehave at random.

Users create a group chat by direct-messaging another user (via Ricochet), establishing a shared symmetric key, and then relaying messages through the untrusted server encrypted with this key. Users are added to the group chat by providing them with the onion address of the relay and the symmetric key. Forward secrecy is achieved by periodically rotating the key and delivering the update to all members of the chat via direct message on Ricochet (and posting all messages under both the old and new keys until rotation is complete). Backward secrecy is achieved by not providing these updated keys to compromised users. Users can be removed from a group by updating the shared key and providing it to everyone but the removed user. More technical details are in the Cwtch whitepaper.
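
To make that concrete, here is a minimal sketch (using the PyNaCl library, and not Cwtch’s actual wire format or API) of how a client might chain each message to the hash of its previous one, sign it, and encrypt it under the shared group key before handing it to the relay. All names and the message layout are illustrative assumptions.

import hashlib
import nacl.secret
import nacl.signing
import nacl.utils

# Shared symmetric group key, established peer-to-peer over Ricochet.
group_key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
group_box = nacl.secret.SecretBox(group_key)

# Each member holds a long-term signing key for tamper-evidence.
alice_signing = nacl.signing.SigningKey.generate()

def send(body, prev_hash):
    # Sign the body plus the hash of our previous message, then encrypt for the group.
    signed = alice_signing.sign(prev_hash + body)
    return group_box.encrypt(signed)  # the relay only ever sees this ciphertext

def receive(ciphertext, sender_verify_key, expected_prev_hash):
    # Decrypt with the group key, verify the signature, and check the hash chain.
    payload = sender_verify_key.verify(group_box.decrypt(ciphertext))
    prev_hash, body = payload[:32], payload[32:]
    assert prev_hash == expected_prev_hash  # a dropped message breaks the chain
    return body

# Chain each message to the previous one; the first message hashes a fixed seed.
prev = hashlib.sha256(b"group-genesis").digest()
ciphertext = send(b"hello group", prev)
print(receive(ciphertext, alice_signing.verify_key, prev))

In this toy model, rotating the group key amounts to generating a fresh group_key, distributing it over Ricochet, and rebuilding group_box.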

It’s important to note how minimal the role of the relay server is in this design. The majority of the work is performed by the client, which discards messages that aren’t encrypted with a group key the user holds, discards duplicate messages, and parses the contents of messages. The client can be connected to multiple groups on multiple servers concurrently, and a group can be moved from one server to another seamlessly, or even use several servers at once by relaying messages through each of them. Since the server is just a relay and connects via Tor, it requires virtually no configuration and limited resources, and can be trivially deployed.

Cwtch has a few drawbacks (aside from being alpha software not ready for public use), but they are relatively small:

  1. The user that created the group has the responsibility of periodically rotating keys and distributing those keys to users via Ricochet. This can be automated, but it requires that the founder periodically log in, and if they abandon the group then key rotation will cease and forward and backward secrecy are lost. This places the founding user in an “administrative” role, which is reflected by their ability to add and remove users, unlike in Signal where no user is an administrator and the group can only be added to.

  2. There is no system for initial key exchange. Users sign their messages to mathematically prove a lack of tampering, but this is only effective if users exchange keys ahead of time over a trusted platform. This can be alleviated with something like Keybase proof integration to demonstrate that a Cwtch user has a well-established identity on many platforms.

Cwtch is intended to be used as an open source platform for peer-to-peer messaging with groups, allowing others to build more complex software and interactions on top of that foundation. So, let’s do just that…

Pursuance on top of Cwtch

What would it take to decentralize Pursuance so that it exists entirely client-side, with no storage except messages sent via Cwtch, eliminating the need for a central, Pursuance-specific, semi-trusted server?

For basic functionality, not much is required. We can describe a pursuance as a Cwtch group-chat with a ton of meta-messages. The first message in a pursuance declares the rules of the pursuance (describing what roles exist and how they interact) and metadata like the pursuance name. Subsequent messages take one of the following formats:

  1. Changing Rules (Creating or destroying roles or amending how roles interact)

  2. Changing Roles (Assigning a role to, or removing a role from, a user)

  3. Tasks (Creating, closing, assigning, and commenting on tasks)

Under this system, every user in the pursuance holds all data in the pursuance, including data they do not have permission to access. Therefore, all tasks must be encrypted so they are only readable by roles with permission to read them. A role exists not only as a designator, but also as a shared key allowing the user to read messages encrypted for that role. Adding and removing users both fall under the category of “changing roles”.
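
As a sketch of what these meta-messages might look like (the format below is purely hypothetical, not a Pursuance specification), each role’s shared key doubles as the ability to read tasks addressed to that role:

import json
import nacl.secret
import nacl.utils

# Each role is both a label and a shared symmetric key held by its members.
role_keys = {
    "organizer": nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE),
    "technical": nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE),
}

def change_rule(rule):
    return {"type": "rule", "rule": rule}

def change_role(user, role, action):  # action is "assign" or "de-assign"
    return {"type": "role", "user": user, "role": role, "action": action}

def create_task(title, body, readable_by):
    # Task contents are encrypted so only holders of the role key can read them,
    # even though every member of the pursuance receives the message.
    box = nacl.secret.SecretBox(role_keys[readable_by])
    payload = box.encrypt(json.dumps({"title": title, "body": body}).encode())
    return {"type": "task", "role": readable_by, "payload": payload.hex()}

messages = [
    change_rule("organizer may assign tasks to technical"),
    change_role("alice", "technical", "assign"),
    create_task("Mirror the archive", "Stand up a new torrent seed", "technical"),
]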

Clients have access to all pursuance rules and a list of all users and their roles, so they can reject illegal messages, such as a user closing a task they have no access to. Since all content in a pursuance consists of tasks and the discussions attached to those tasks, this messaging system describes the entire infrastructure of a pursuance.

Cwtch handles resynchronization in a decentralized manner, allowing one user to contact another and get caught up on messages they’ve missed. This is intended to elegantly handle Cwtch relay servers that only cache recent messages, and to allow a smooth transition if the group has moved between servers (because, perhaps, the original server has gone offline). Pursuance inherits this resynchronization, and allows users to catch up if the pursuance has moved servers.

So to summarize, the advantages of this change include:

  • Encrypted messaging provided by Ricochet and Cwtch instead of reinventing the wheel in Pursuance

  • No central, trusted server is required, and a pursuance is easily moved in case of server failure

  • Automatic recovery in case of data outage

  • Only have to maintain one piece of software, the Pursuance client, instead of a client and a server

However, there are a number of hurdles remaining (besides Cwtch itself, which remains in Alpha):

  • Synchronizing keys between different pursuances to cross-assign tasks can be complicated

  • Sending tasks between pursuances is challenging: the task must be sent by a user, and the receiving pursuance must somehow verify that the sending-user is part of an approved role in the sending-pursuance

  • All of this becomes harder if a pursuance can move from one server to another and disrupt lines of communication with its neighbors

  • Pursuance discovery remains challenging, and it is unclear what “joining” a pursuance looks like unless new users are invited from inside

  • The “group leader” role from Cwtch must be inherited by Pursuance, probably as the pursuance founder; there must be an elegant process for migrating the pursuance and underlying Cwtch chat if the leader becomes inactive

Cwtch isn’t the only option for decentralized data storage and messaging, and alternatives should be considered. One notable alternative is Tox, which stores messages in a decentralized hash table, much like torrents. Another is the InterPlanetary File System, which acts somewhat like an HTTP and git replacement, storing web data with version history in a distributed filestore. A more complete Pursuance implementation may combine several of these technologies - for example storing encrypted tasks on IPFS so they can be accessed by multiple pursuances, but running metadata for a pursuance and exchanging keys within Cwtch. This post should be considered an early abstract for what distributed Pursuance software might look like.

Posted 7/7/20


Information Paradigms and Radical Interfaces

One of my non-social-org interests is the design of interfaces for viewing and interacting with information that can change our perception and thought-processes surrounding that information. I haven’t written much on this subject yet (except tangentially when discussing Halftone QR Codes to create dual-purpose links/images), so this post will be an overview of information paradigms and exciting ideas in the topic space.

Conventional Model: Folder Hierarchies

When computers were imagined as information storage and retrieval systems, they first replaced their physical predecessors: Filing cabinets filled with nested folders. This hierarchy describes every major “filesystem”, including directory trees on Unix/Linux and Windows:

However, a simple hierarchy fails to describe information that belongs in multiple places at once: What if a file could be categorized under “Research” or under “Gradschool”? We can address this problem with links or aliases (so the file is in the “Research” folder but has a reference in the “Gradschool” folder), but it’s an imperfect solution to an awkward shortcoming.

Predominant Internet Model: Hypertext

While a folder hierarchy allows one file to reference another, it does not allow for more sophisticated referencing, like a file referring to several other files, or to a specific portion of another file.

Both of these problems can be improved with hypertext, which introduces a concept of a “link” into the file contents itself. Links can refer to other documents, or to anchors within documents, jumping to a specific region.

Hypertext is most famously used by the World Wide Web to link between web pages, but it is also used in PDFs, SVG files, and protocols like Gopher. Still, there are more sophisticated relations beyond hypertext, like referring to a region of another document rather than a specific point.

Xanadu

Ted Nelson’s Project Xanadu aimed to extend hypertext dramatically. Xanadu departed from the idea of static documents referencing one another, and proposed dynamically creating “documents” out of references to information fragments. This embedding allows bidirectional linking, such that you can both view a piece of information in context, and view what information relies upon the current fragment. The goal was to allow multiple non-sequential views for information, and to incorporate a version control system, so that an information fragment could be updated and thereby update every document that embedded the fragment. Information storage becomes a “living document”, or a thousand living documents sharing the same information in different sequences and formats.

Perhaps unsurprisingly, Xanadu’s scope has hindered development for decades. A subset of Xanadu allowing embeddable fragments can be seen in a demo here. The bidirectional-reference model struggles to mesh with the decentralized reality of the Internet and World Wide Web, and a central database for tracking such links is impossible at any significant scale. HTTP “Referer” headers tell a web server which page a visitor followed a link from, but they were never widely used as a way of creating bidirectional links. Tumblr “reblogging” and Twitter “quote retweeting” come closer by creating bidirectional references, but fall short of the living-document paradigm. Wikipedia accomplishes bidirectional links within its own platform, and allows anyone to edit most pages as they read them, but still treats each page as independent and inter-referential rather than dynamically including content from other pages to create shifting encyclopedic entries.

Some projects like Roam Research continue developing the Xanadu dream, and support embedding pieces of one document in another, and visualizing a network diagram of references and embedded documents. There’s a good writeup of Xanadu’s history and Roam’s accomplishments here.

Explorable Explanations

Linking information together between different fragments from different authors is an insightful and revolutionary development in information design, but it’s not the only one. Bret Victor’s essay Explorable Explanations provides an alternate view of information conveyance. Crucially, it introduces the concept of a reactive document, which works something like a research paper with a simulation built in. Any parameters the author sets or assumptions they make can be edited by the reader, and the included figures and tables adjust themselves accordingly. This gives readers an opportunity to play with the findings and develop their own intuition about how the pieces fit together. There’s an example of this kind of interactive analysis at Ten Brighter Ideas.

Visual Exploration: XCruiser

All the models so far have focused on articles: organizing articles, moving between articles, synthesizing articles out of other articles, and manipulating the contents of articles. Other paradigms move more radically away from simulating sheets of paper.

XCruiser flying through a filesystem

One such paradigm is XCruiser, which renders filesystems as three dimensional space. Directories are concentric galaxies, while files are planets. The user literally “flies” through their computer to explore and can see parent, child, and parallel directories in the distance.

It is, perhaps, not an intuitive or particularly useful model. Nevertheless, it demonstrates that our classic “hierarchical” view is far from the only possibility. A similar 3-D filesystem renderer is FSV, the File System Visualizer, featured in Jurassic Park.

Kinetic Exploration: Dynamicland

Finally, interfaces need not be purely visual, reached through a monitor, keyboard, and mouse. Bret Victor’s most recent and daring project, Dynamicland, seeks to reimagine the boundaries of a physical computer. Terribly oversimplified, the idea is:

  1. Strap a number of projectors and cameras to the ceiling, aimed at the floors and walls

  2. Give the computer OpenCV to recognize objects and read their orientations and text

  3. Program the computer to display information via any projector on any surface defined by physical objects

  4. The entire room is now your computer interface

Ideally, the entire system can be developed from this point using only physical objects: code written on paper, a pencil on the table turned into a spin-dial referenced by a simulation on the wall. No screens, no individual computing.

The main takeaway is that the computer is now a shared and collaborative space. Two computer-users can stand at a wall or desk together, manipulating the system in unison.

Almost all our computing technology assumes a single user on a laptop or a smartphone. We put great effort into bridging the gaps between separate users on individual machines, with “collaboration” tools like Google Docs and git, and communication tools like email, IRC, instant messaging, Slack, and Discord. All of these tools fight the separation and allow us to compute individually and share our efforts between computers, but Dynamicland proposes a future where a shared space is the starting point, and collaboration emerges organically.

Posted 6/16/20


Alternative Social-Media Platforms

This post is an informal discussion related to a research project I’ve been involved in and the broader context of alternative media platforms, deplatforming, and community building. This is not peer-reviewed academic work; see the paper for that.

Background

Mainstream social media platforms (Facebook, Twitter, YouTube) have been under public pressure to limit hate speech and hate groups active in their communities, especially white supremacist and fascist groups. Most of their response has involved deplatforming (banning users) and demonetization (disabling advertisements for content so users can’t profit off of questionable speech). This is usually seen as a broadly good thing because it cleans up the community and removes the most dangerous and toxic members. Most of the debate circles around which users should be deplatformed or demonetized, whether platforms are doing enough, why platforms are disincentivized from acting, and the odd role of cultural censor this puts private companies in.

However, deplatformed users don’t simply disappear. Those people still exist, and still want to produce content and regain their source of income. So where do they go when they’re exiled from mainstream social media?

Alt-Social Media

Unsurprisingly, banned alt-right users have flocked to a number of alternative social media platforms. Minds stepped into Facebook’s shoes, Voat is a clear clone of Reddit, Gab stands in for Twitter, and BitChute serves as an alternate YouTube. Importantly, these platforms don’t advertise themselves as havens for the alt-right or neo-nazis; they all self-describe as bastions of free speech that take a hands-off approach to moderation. Whether this is a cover story or the platforms are innocent and have been co-opted by exiled horrible people is up for some debate, but it doesn’t change the current state of their communities.

Alternative communities face the same user-acquisition problems the mainstream platforms had when they were young: The point of social-media is to be social, and you can’t build a social community without a lot of people to talk to. The network effect is important here; most alternative social networks will fizzle out quickly because nothing interesting is happening there (and we’ve seen that most Reddit clones have stuttered to a halt within a year or so), but successful social networks will attract more and more people, creating a more vibrant community and attracting even more users, amplifying the effect in a feedback loop. Centralization is natural here. It’s a basin of attraction if you want to use system science terminology.

External influence is also important for selecting which alternative communities will thrive. When a major channel like InfoWars is banned from YouTube, it carries its substantial following with it, and whatever platform it lands on will receive an explosion of accounts and activity.

The Radicalization Problem

On mainstream platforms the alt-right needs to take a careful stance. They want to express their ideas to recruit others, but they have to “behave”, staying within the platform’s acceptable language and self-censoring. When a popular post contains something hateful, the comments are filled with detractors explaining why they’re wrong and offering countering views.

On alternative platforms these limitations vanish. Content producers can advocate for race war or genocide or fascist dictatorship or whatever flavor of abhorrent views they hold, without repercussions from the platform. For the most part the only people that join the alternate platform were either banned from mainstream platforms or followed content producers that were, creating a magnificent echo-chamber. Because of the network effect, these users converge to a handful of alternative platforms and all meet one another. Under group-grid theory these platforms would be classified as enclaves - not much social hierarchy, but a clear sense of in-group and out-group and shared attitudes.

Obviously, the content producers on alternative platforms have a smaller reach than on mainstream platforms. However, the intensity of their rhetoric increases dramatically, and so, perhaps, does the threat of radicalization. Deplatforming hateful users from big platforms “cleans up” those platforms, but does it potentially fuel violence by cutting the audience off from counter-viewpoints?

The role of alternative social platforms in radicalization is difficult to measure, and is confounded by other communities like image boards and Discord and Telegram groups. What we know is that incidents of alt-right violence are increasing, and many shooters are active on alt-right media platforms, of either the more private Telegram and 8Chan variety or the more public BitChute flavor. What we can say most confidently is “there may be dangerous unintended consequences of deplatforming that should be investigated.”

Solutions?

If deplatforming is dangerous, what alternatives exist?

More Deplatforming

An obvious answer is to increase deplatforming. Just as we pressured Twitter and Facebook to deplatform harmful users, we can pressure hosting and DNS providers to delist harmful communities. This has precedent: after a terrible series of shootings by 8Chan members, Cloudflare terminated service for the 8Chan image board. The board is back online, but only after finding a more niche domain registrar and hosting provider, Epik, known for supporting far-right sites. Epik was in turn shut down by its own backend hosting provider, who wanted nothing to do with 8Chan, and only came back online after Epik agreed to provide only DNS services for 8Chan. The site is now hosted by a Russian service provider.

This highlights both the successes and limitations of deplatforming. Through collective agreement that a site like 8chan is abhorrent we can pressure companies to stop cooperating, and we can make it very difficult for such communities to operate. However, once a site is committed to using alternative services to stay operational, they can move to increasingly “alternative” services until they find someone willing to take their money, or someone with resources and an agreeable ideology. Deplatforming pushes them away, but they always have somewhere further to go.

De-recommending

The opposite strategy is to let the alt-right remain on mainstream platforms, but find other means to limit their influence and disperse their audience. A key piece of this strategy is recommendation algorithms, which are responsible for selecting similar YouTube videos, surfacing relevant search results, and prioritizing content on a feed. These algorithms can be amended to lower the relevance of alt-right content, making it less likely to be stumbled upon and suggesting it to fewer people. If content producers still have a voice on the mainstream platforms then they will be disinclined to leave for a small alternative soapbox with a minuscule audience, and they may not even know that their content is de-prioritized rather than unpopular.
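
A toy sketch of the idea (not any platform’s actual ranking system, which is far more complex and opaque): a penalty factor quietly scales down a flagged channel’s score without removing its content. Every name and weight below is made up for illustration.

def ranking_score(relevance, engagement, penalty=1.0):
    # penalty < 1.0 de-prioritizes flagged content without deleting or hiding it.
    return penalty * (0.7 * relevance + 0.3 * engagement)

candidates = [
    {"id": "video-a", "relevance": 0.9, "engagement": 0.8, "penalty": 1.0},
    {"id": "video-b", "relevance": 0.9, "engagement": 0.8, "penalty": 0.2},  # flagged channel
]
feed = sorted(candidates, reverse=True,
              key=lambda c: ranking_score(c["relevance"], c["engagement"], c["penalty"]))
# video-b still exists and is reachable by direct link, but rarely surfaces in the feed.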

An important consideration: Changes to recommendation algorithms are fraught with challenges, and place more authority in the hands of media platforms, who would be increasingly responsible for shaping culture through mysterious and unobserved means.

Counter-Suggestions

Social media platforms have been unusually quick to combat misinformation about COVID-19 during the ongoing pandemic. At any mention of COVID, YouTube includes links to the CDC and World Health Organization websites with reliable information about the state of the disease. This protocol could be expanded, linking to the Southern Poverty Law Center or Anti-Defamation League or other positive influences at mentions of hate trigger-phrases.

Is this strategy effective? Does it combat hateful views as well as misinformation? Could countering misinformation help prevent the formation of some hateful views to begin with? This is an ongoing research area. One benefit of this strategy is that it can be deployed widely; attaching an SPLC link just below a video title does not deplatform the uploader, and does not need to carry the same weight as making decisions about censorship.

Federating

Both de-recommending and counter-suggestions place more authority in the hands of platforms and see them as the arbiters who decide which cultural problems must be addressed. Detractors to this idea regularly suggest moving to decentralized platforms like Mastodon and Diaspora. In federated social networks, users have a “home” on a particular server, which has its own rules of conduct and permitted content. Servers can interact with one another, agreeing to bridge content between servers and further its reach (a kind of dynamic collaboration loosely reminiscent of connections between Pursuances). In theory this provides a more organic solution to censorship, where everyone is allowed to exist on the federated platform, but if their content is unsightly then it won’t propagate far.

Unfortunately, in practice these federated platforms have suffered from low membership and high centralization, both as a consequence of the network effect. Projects like Mastodon and Diaspora have good intentions and intriguing designs, but without a large community they cannot attract members from Twitter and Facebook, and so mainstream platforms remain viable recruiting grounds for alt-right spaces. Further, running your own federated server within one of these platforms suffers from the same network effect, and frequently leads to centralization on a small number of servers. I wrote about this very briefly a few years ago, and the problem has persisted, to the point that more than half of Mastodon’s users are on three federated instances.

Almost exactly one year ago we witnessed a case study in how federated platforms can handle censorship, when Gab rebuilt itself as a Mastodon instance. The response was encouraging: most server operators promptly blocked Gab, isolating its instance on the network. Gab can use the Mastodon code, but not its community. This suggests that federated social networks could be an effective solution, if they can grow their populations.

Posted 6/12/20


Splintering a Pursuance

The primary goal of the Pursuance Project that differentiates it from any other group-based task-management software is the emphasis on organic collaboration, and the trivial creation of new Pursuances that allow complex information flow and rapid organization. That’s buzzwordy, so in a practical sense “it’s really easy to make new organizations and working groups, and split a working group off into a new organization”.

I’ve written in some previous posts about the proposed role system for Pursuance, and how to create collaborations between Pursuances built on top of the role system and using “tasks” as a basic unit that can have attached discussions and can be assigned and referenced between Pursuances. This post will center on ways to create Pursuances more organically than by creating a new blank Pursuance and inviting users in.

As a very quick review, most online collaboration works through shared membership, where one or more individuals are a part of two organizations (or an honorary part of one to facilitate collaboration), and “bridge the gap” between both groups:

This has a moderate startup cost - it’s easy to introduce the new person and add them to some group chats, but you need to bring them up to speed on how the organization operates, add them to Google Docs, Signal, wikis, Keybase, or whatever other infrastructure the group uses. This setup is also brittle, because if the “bridge” members become inactive or leave then the collaboration promptly falls apart. Adding a new user to replace the missing bridge requires repeating much of the same onboarding process, and as membership shifts in both organizations it is easy to lose touch over time.

In the Pursuance model we hope to create a link at an organizational level, that allows sharing tasks, messages, and documents together without sharing members:

More formally, a collaboration exists as a shared agreement between a role in each Pursuance, such that members within a role in one pursuance may assign tasks to a role in another pursuance:

This has the benefit of granting each Pursuance autonomy: Each group can have whatever internal structure they want, and whatever membership turnover works for them, and as long as the roles involved in collaboration still exist the groups can continue to interact in this shared context.

Sometimes, however, collaboration can turn into something more. A collaborative project may split into a Pursuance all on its own, that may even outlive the two parent groups. This is natural and acceptable, and the platform should support such shifting bureaucracy.

To enable this kind of growth, we need to expand our concept of how collaboration works. Imagine that two Pursuances (in this case, the Pursuance Project itself and Distributed Denial of Secrets, which worked together on #29Leaks) want to begin a large collaborative project. We know this project is going to get pretty complex and will involve collaboration with several outside individuals (like journalists) that aren’t part of either original group. This project sounds bigger than just a collaboration between two groups: It’s going to need roles and collaborative agreements of its own. This project sounds more like a Pursuance. So let’s create a new Pursuance with agreements with each of its parent pursuances:

organizer {
	assign tasks technical
	accept tasks 29leaks@ddosecrets
	invite journalist
}

technical {
	accept tasks organizer
	accept invite engineers@pursuance-project
	accept invite sysadmins@ddosecrets
}

journalist {
	accept invite relations@pursuance-project
	contact organizer
	contact technical
}

This looks something like the following:

This creates an interesting arrangement: 29Leaks is an independent Pursuance that can create its own rules and roles, and 29Leaks organizers can invite journalists directly into the project. However, the edges of this Pursuance are permeable: technical staff from either the Pursuance Project or DDoSecrets can volunteer themselves and automatically join the technical role of 29Leaks, the facilitating “29leaks” role in DDoSecrets can assign tasks to the 29Leaks organizers, and the public relations group from the Pursuance Project can directly add relevant journalists to the journalist role within 29Leaks. This means that while 29Leaks is an independent entity with its own structure, it is also trivial for it to communicate with the two original Pursuances.
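
A client enforcing these agreements might evaluate them along the following lines. The data structure here is my own reading of the rule blocks above, not a defined Pursuance grammar.

# Each role maps to the set of (action, target) pairs it is granted.
RULES = {
    "organizer":  {("assign tasks", "technical"),
                   ("accept tasks", "29leaks@ddosecrets"),
                   ("invite", "journalist")},
    "technical":  {("accept tasks", "organizer"),
                   ("accept invite", "engineers@pursuance-project"),
                   ("accept invite", "sysadmins@ddosecrets")},
    "journalist": {("accept invite", "relations@pursuance-project"),
                   ("contact", "organizer"),
                   ("contact", "technical")},
}

def permitted(role, action, target):
    # Clients reject any message whose (action, target) pair is not granted to the role.
    return (action, target) in RULES.get(role, set())

# A DDoSecrets sysadmin may join the technical role directly...
assert permitted("technical", "accept invite", "sysadmins@ddosecrets")
# ...but nothing lets a journalist assign tasks to the technical role.
assert not permitted("journalist", "assign tasks", "technical")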

Imagine a scenario where 29Leaks continues to operate long into the future. It has accumulated collaborations with several other groups, and it is no longer appropriate to consider this a “child project” of DDoSecrets and the Pursuance Project. Maybe 29Leaks continues to accept technical support from DDoSecrets, but is otherwise fully independent. The administrators of 29Leaks with the founders role may amend the rules to:

organizer {
	assign tasks technical
	invite journalist
}

technical {
	accept tasks organizer
	accept invite sysadmins@ddosecrets
}

journalist {
	contact organizer
	contact technical
}

And the organization now looks like:

In this way, Pursuances can grow and adapt over time, becoming independent entities or folding into other larger Pursuances. When a Pursuance no longer serves a useful purpose it can be discarded, and active members will have adopted new roles in other Pursuances. Thus, even as the people and groups involved shift, activism moves forward.

Posted 5/22/20


Memory: Allocation Schemes and Relocatable Code

This Spring I’m a graduate teaching assistant for an operating systems course. Campus is closed during the COVID-19 epidemic, so I’ve been holding office hours remotely and writing up a range of OS explanations. Below is one of these writeups for a recent topic of some confusion.

There are, broadly, four categories of memory allocation schemes that operating systems use to provision space for executables. This post will discuss each of them in historical order, with pros and cons, and some executable formats (relocatable and position-independent code) that are closely intertwined. We’ll include shared memory and dynamic libraries at the end. Note that we are not talking about malloc and free - the way an application allocates and deallocates memory within the space the operating system has already assigned it - but the way the operating system assigns memory to each program.

1: Flat Memory Model / Single Contiguous Allocation

In a flat memory model, all memory is accessible in a single continuous space from 0x0 to the total bytes of memory available in the hardware. There is no allocation or deallocation, only total access to all available memory. This is appropriate for older computers that only run one program at a time, like the Apple II or Commodore 64. If a program needs to keep track of which memory it’s used then it may implement a heap and something like malloc and free within the memory range. However, if we want to run multiple programs at the same time then we have a dilemma. We want each program to have its own memory for variables, without stepping on one another. If both programs have full access to all the memory of the system then they can easily and frequently write to the same memory addresses and crash one another.

2: Fixed Contiguous Memory Model

Enter our second model, designed explicitly for multi-process systems. In this model, memory is broken into fixed-length partitions, and a range of partitions can be allocated to a particular program. For example, the following is a diagram of memory broken into 40 equally-sized blocks, with three programs (marked A, B, and C) running:

AAAAAAAAAA
AA........
..BBBBBBB.
...CCCCC..

When a program starts, the operating system determines how much memory it will need for static variables and the estimated stack and heap sizes. This is usually a guess based on the size of the executable file plus some wiggle room, but in older operating systems you could often override the system’s guess if you knew the executable required more memory:

Mac OS 9 settings panel for setting the preferred amount of memory for an executable

Once the operating system determines how much memory is needed, it rounds up to the nearest partition size and determines how many blocks are required. Then it looks for an area of unallocated memory large enough to fit the executable (using one of the algorithms described below), assigns those blocks to the executable, and informs the executable of where its memory begins (a base address) and how much memory it can use.

Since we can only assign an integer number of blocks, there is always some waste involved: A program requiring 1.1 blocks of space will be assigned 2 blocks, wasting most of a block. This is called internal fragmentation.

First-Fit

In the simplest algorithm, the operating system scans memory for the first open space large enough to fit the new program. This is relatively quick, but often leads to poor choices, such as in the following scenario:

AAAAAAAAAA
AA........
..BBBBBBB.
...CCCCC..

We have a new program ‘D’ that we want to start running, which requires three blocks worth of memory. In a First-Fit algorithm we might place ‘D’ here:

AAAAAAAAAA
AADDD.....
..BBBBBBB.
...CCCCC..

This suits D’s purposes perfectly, but means we have reduced the largest free memory space available, so we can no longer fit programs larger than 7 blocks in length.

Next-Fit

The first-fit algorithm also uses memory unevenly - since it always searches memory from 0x0 onwards, it will prefer placing programs towards the start of memory, which will make future allocations slower as it scans over a larger number of assigned blocks. If we keep track of the last assigned block and search for new blocks from that point, the algorithm is called next-fit, and is a slight optimization over first-fit.

Best-Fit

In the “Best-Fit” algorithm we scan all of memory for the smallest unallocated space large enough to fit the program. In this case we would allocate ‘D’ here:

AAAAAAAAAA
AA........
..BBBBBBBD
DD.CCCCC..

This is better in the sense that we can still fit programs of up to 10 blocks in size, but there’s a downside: the space immediately after ‘D’ is only one block long. Unless the user starts a very small 1-block executable, this space will be effectively useless until either process ‘D’ or ‘C’ exits. Over time, the best-fit algorithm will create many such holes. When there is sufficient total space for a program, but no unallocated region is large enough to load it, we call this external fragmentation.

Worst-Fit

“Worst-Fit” is something of a misnomer and should be thought of as “the opposite of best-fit”. In worst-fit we find the largest unallocated memory space, and assign the first blocks from this region to the new program. This avoids the external fragmentation problems of the best-fit algorithm, but encounters the same problem as our first-fit example: A larger program will not have space to run, because we’ve been nibbling away at all the largest memory regions.
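
The four strategies differ only in which free run of blocks they pick. A small simulation over a block map like the diagrams above (the layout and sizes are chosen to match them) makes the differences easy to experiment with:

def free_runs(blocks):
    # Yield (start, length) for every contiguous run of free '.' blocks.
    start = None
    for i, b in enumerate(blocks + ["#"]):  # sentinel closes the final run
        if b == "." and start is None:
            start = i
        elif b != "." and start is not None:
            yield (start, i - start)
            start = None

def place(blocks, size, strategy, last_alloc=0):
    runs = [r for r in free_runs(blocks) if r[1] >= size]
    if not runs:
        return None  # external fragmentation: no single run is big enough
    if strategy == "first":
        return runs[0][0]
    if strategy == "next":  # like first-fit, but resume from the last allocation
        later = [r for r in runs if r[0] >= last_alloc]
        return (later or runs)[0][0]
    if strategy == "best":
        return min(runs, key=lambda r: r[1])[0]  # smallest hole that still fits
    if strategy == "worst":
        return max(runs, key=lambda r: r[1])[0]  # largest hole

memory = list("AAAAAAAAAA" "AA........" "..BBBBBBB." "...CCCCC..")
start = place(memory, 3, "best")
memory[start:start + 3] = "DDD"
print("".join(memory))  # same placement as the best-fit diagram above, flattened to one line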

Relocatable Code

In order to implement any of these memory allocation schemes, we must be able to tell programs where their variables may be placed. Traditional flat-memory executables use absolute addressing, meaning they refer to variable locations by hard-coded memory addresses. In order to run multiple processes concurrently we need to redesign the executable format slightly to use some kind of offset table. When the program starts, the operating system fills in the true locations of variables in this table, and programs refer to variable locations by looking them up in this table rather than hardcoding the absolute variable locations in. This ability to have executable code run at any starting address in memory is called relocatable code. The exact implementation details of relocation are platform-specific.
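
Here is a toy model of the idea (heavily simplified; real relocation formats such as ELF’s are more involved, and every name below is hypothetical): the executable refers to its variables through a table of offsets, and the loader resolves those offsets against whatever base address it chose.

# The on-disk image refers to variables by table slot, never by absolute address.
program_image = {
    "code": ["LOAD slot0", "ADD slot1", "STORE slot0"],
    "offsets": [0x10, 0x14],  # variable positions relative to the program's base
}

def load(image, base_address):
    # The OS picks a base address and fills in the absolute address of each variable.
    table = [base_address + off for off in image["offsets"]]
    return {"code": image["code"], "table": table}

# The same executable can be placed anywhere in memory without rewriting its code.
print([hex(a) for a in load(program_image, 0x4000)["table"]])  # ['0x4010', '0x4014']
print([hex(a) for a in load(program_image, 0x9000)["table"]])  # ['0x9010', '0x9014']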

As a side-note, we also use relocatable code during the compilation of executables. Object files (.o files) are parts of a program that have been compiled, but not yet linked to one another to create the final executable. Because of this, the final locations of variables are not yet known. The linker fills out the variable locations as it combines the object files.

Defragmentation

Obviously, when a program exits we can reclaim the memory it used. But why can’t we also move executables while they’re running to consolidate the free space? Specifically, why can’t we go from:

AAAAAAAAAA
AA........
..BBBBBBBD
DD.CCCCC..

To a cleaner:

AAAAAAAAAA
AABBBBBBBD
DDCCCCC...
..........

This would give us all the room we need for large executables! Unfortunately even relocatable executables aren’t this flexible - while they are designed to be started at any memory location, once the program is running it may create pointers to existing variables and functions, and those pointers may use absolute memory addresses. If we paused an executable, moved it, rewrote the relocation table, and resumed it, then many pointers would be invalid and the program would crash. There are ways to overcome these limitations, but when we get to virtual memory the problem goes away on its own.

3: Dynamic Contiguous Memory Model

The dynamic contiguous memory model is exactly the same as the fixed contiguous memory model, except that blocks have no fixed, pre-determined size. We can assign an arbitrary number of bytes to each executable, eliminating internal fragmentation. This adds significant overhead to the operating system, but from an application perspective the executable is similarly given a starting position and maximum amount of space.

4: Virtual Memory

The modern approach to memory allocation is called virtual memory. In this system we create two separate addressing modes: logical addresses, which are seen by the executable, and physical addresses, which are the true locations of variables seen by the operating system. Whenever the executable reads or writes any memory the operating system translates between the userspace logical address and the physical memory address.

Adding a translation layer before every memory access adds massive overhead - after all, programs read and write to variables all the time! However, there are a number of tempting advantages to virtual memory, so as soon as computer hardware became fast enough to make implementation feasible, all major operating systems rapidly adopted it.

First, we lose the requirement that program memory be contiguous: We can present a userspace program with what appears to be a contiguous memory space from 0x0 to its required memory amount, but when it accesses any address we translate to the true physical location, which can be split up across any number of memory pages. External fragmentation is vanquished!

Further, there’s no reason a logical memory address has to map to any physical memory address. We can tell every executable that it has access to a massive range of memory (4 gigabytes on 32-bit systems, since that’s the range a 32-bit pointer can describe), and only map the pages of memory the executable actually accesses. This means we no longer have to calculate how much space an executable needs before it starts, and can grow as demands change.

Obviously if we offer every executable 4 gigabytes of memory and many take up that much space then we’ll quickly run out of real physical space. Virtual memory comes to the rescue again - we can map a logical address not to a physical address in RAM, but to a location on the disk drive. This lets us store memory that hasn’t been accessed in a long time (known as swap), freeing up space for new programs, and letting us dramatically exceed the true amount of RAM in a system.

Shared Memory

Virtual memory adds a lot of complexity to the operating system, but in one area things are much simpler. Shared memory is when two or more programs have access to the same region of memory, and can each read or write to variables stored inside. With virtual memory, implementing shared memory is trivial: Since we’ve already decoupled logical and physical memory addresses, we can simply map space in two different programs to the same physical region.
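
A toy page table ties the last two sections together (page size, fault handling, and eviction are all drastically simplified, and the classes are invented for illustration): each process sees a private logical address space, frames are mapped on first access, and shared memory is nothing more than two page tables pointing at the same frame.

PAGE_SIZE = 4096

class MemoryManager:
    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
    def allocate_frame(self):
        return self.free_frames.pop(0)  # a real OS might evict a page to swap here

class Process:
    def __init__(self, mm):
        self.mm = mm
        self.page_table = {}  # logical page number -> physical frame number

    def translate(self, logical_address):
        page, offset = divmod(logical_address, PAGE_SIZE)
        if page not in self.page_table:  # page fault: map a frame on first access
            self.page_table[page] = self.mm.allocate_frame()
        return self.page_table[page] * PAGE_SIZE + offset

mm = MemoryManager(num_frames=256)
p = Process(mm)
# Logical pages 0 and 2 land in whatever frames were free; the process never notices.
print(hex(p.translate(0x0000)), hex(p.translate(0x2000)))

# Shared memory: two processes whose page tables reference the same physical frame.
producer, consumer = Process(mm), Process(mm)
frame = mm.allocate_frame()
producer.page_table[5] = frame
consumer.page_table[0] = frame
assert producer.translate(5 * PAGE_SIZE + 8) == consumer.translate(8)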

Shared/Dynamic Libraries

We can re-use shared memory to implement shared libraries. The idea here is that common pieces of code, like compression or cryptography libraries, will be needed by many programs. Instead of including the library code in every executable (a practice called static linking), we can load the library once, in a shared memory location. Every executable that needs the library can have the shared code mapped in at runtime. This is referred to as dynamic linking.

On Windows, dynamically linked libraries (DLLs) are implemented using relocatable code. The first time a library is loaded it can have any logical address in the first program that uses it. Unfortunately, relocatable code has a severe limitation: the code can only be relocated once, at program start. After that, any pointers to variables or functions will contain their absolute memory addresses. If we loaded the DLL in a second program at a different logical address then all the pointers within the library would be invalid. Therefore, Windows requires that each DLL be loaded at the same logical address in every executable that uses it. Windows goes to great lengths to ensure that each library is loaded at a different address, but if for some reason that address is unavailable in an executable then the library cannot be loaded.

Unix and Linux systems implement shared libraries without the restrictions of relocatable executables. To do this, they have restructured the code in libraries once again…

Position-Independent Executables

Position-Independent Executables (PIE) can be loaded at any location, without a relocation table or last minute rewriting by the operating system. To accomplish this feat, PIE code uses only pointers relative to the current instruction. Instead of jumping to an absolute address, or jumping to an offset from a base address, PIE executables can jump only to “500 bytes after the current instruction”. This makes compilation more complicated and restricts the types of jumps available, but means we can load a segment of code at any location in memory with ease.
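
To contrast this with relocatable code, here is a toy interpreter (not how real machine code is encoded): every jump target is stored relative to the current instruction, so nothing needs to be patched at load time.

# Jump targets are relative to the current instruction, so the code is valid at
# any load address without a relocation pass.
code = [
    ("NOP",),
    ("JUMP_REL", +2),  # "jump to the instruction 2 after this one"
    ("NOP",),
    ("HALT",),
]

def run(code, load_address):
    pc = load_address  # absolute address where the code happens to be loaded
    while True:
        op = code[pc - load_address]
        if op[0] == "JUMP_REL":
            pc += op[1]
        elif op[0] == "HALT":
            return pc
        else:
            pc += 1

# The same instructions execute identically at two different load addresses.
assert run(code, 0x1000) - 0x1000 == run(code, 0x9000) - 0x9000 == 3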

Modern Unix and Linux systems require that all shared libraries be compiled as Position-Independent Executables, allowing the operating system to map the library into multiple executables at different logical addresses.

Wrapping Up

Modern operating systems implement virtual memory, allowing them to run many executables concurrently with shared memory and shared libraries, without concern for memory fragmentation or predicting the memory needs of executables. I’ve focused this post on memory allocation from an application perspective, with little regard for hardware, but the implementation details of memory allocation schemes (such as block or page size) are often closely tied to the size and number of physical memory chips and caches.

As a final note, virtual memory suggests that we can switch back from relocatable code to absolute addressing, at least outside of shared libraries. After all, if the operating system is already translating between logical and physical addresses, then can’t we use absolute logical addresses like 0x08049730 and let the OS sort it out? Yes, we could, but instead all executables are compiled using either relocatable code or position-independent code, in order to implement a security feature known as Address Space Layout Randomization (ASLR). Unfortunately, that’s out of scope for this post, as explaining the function and necessity of ASLR would require a longer crash-course on binary exploitation and operating system security.

Posted 4/26/20

