How to Decentralize AI

Updated: Feb 25

Written Mar 3, 2023 and posted on my old Substack blog. Moved to Wix so I could embed AI chatbots into blog posts.

Why Decentralizing AI Matters - Part 1

Around the time I dropped out of university and the military, I worked for an AI startup that asked me to build their data labeling team. Data labelers do tasks that train AI, for example through reinforcement learning from human feedback. A task can be as simple as a captcha-style object recognition exercise: you see a grid of squares and click the squares with cars in them, so the AI model gets better at recognizing cars.

I built up a team of 70 data labelers from developing countries. They'd tell me they could now pay for tuition and thank me for the opportunity, and it felt great. But within 6 months of the company launching, we received a contract offer from a social media company that said we could generate their content instead of them hiring hundreds of people in the Philippines. Plus, the way data labeling works is that you do a task manually enough times that the AI model learns to do it automatically, and then you're out of a job. So eventually the data labelers we employed no longer had income from this work. That's a lot of jobs automated within a very short time.

I realized that the question of the century is an urgent one: how do we decentralize ownership of AI and distribute the benefits? This became very important to me personally as I consider my impact on the world and the future we're building. I don't want to live in a world where people are made irrelevant. I want to live in a world where people are more capable of living the lives they wish to live. How do we make this happen?

How to Decentralize AI - Part 2

Imagine if you could own the data that's being used to train AI models. As the model is used and generates revenue, you get paid some fraction for contributing data to that model, like royalties or like stock in a company. This could be done with profit sharing through an LLC for each LLM, or through NFTs or DAOs. People would continue to receive long-term benefits and hold decentralized ownership of AI models.

On top of this, you still own the data, meaning you can shut off access to it whenever you want, which creates a collective kill switch. If a corporation starts using an AI model in a way you don't like, you can cripple it by removing access to the training data of yours that fine-tuned it. For example, if Palantir starts using drones to blow up cars in Afghanistan, then everyone who contributed data or did captcha-style tasks marking which squares have cars in them could collectively decide to shut off access. The drones would become far worse at recognizing cars and less effective. This would make a decentralized failsafe for AI.
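To make the royalty and kill switch idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ModelLedger` class, the rule that revenue is split in proportion to the number of examples contributed, and the rule that revoking access also ends your payouts. A real system would need to settle those rules through whatever LLC, NFT, or DAO structure it chose.

```python
from dataclasses import dataclass, field


@dataclass
class ModelLedger:
    """Hypothetical ledger of who contributed how many examples to one model."""
    contributions: dict[str, int] = field(default_factory=dict)
    revoked: set[str] = field(default_factory=set)

    def contribute(self, owner: str, examples: int) -> None:
        """Record labeled examples contributed by an owner."""
        self.contributions[owner] = self.contributions.get(owner, 0) + examples

    def revoke(self, owner: str) -> None:
        """Owner shuts off access to their data (one vote for the kill switch)."""
        self.revoked.add(owner)

    def payouts(self, revenue: float) -> dict[str, float]:
        """Split revenue pro rata among owners whose data is still accessible."""
        active = {o: n for o, n in self.contributions.items()
                  if o not in self.revoked}
        total = sum(active.values())
        if total == 0:
            return {}
        return {o: revenue * n / total for o, n in active.items()}

    def revoked_fraction(self) -> float:
        """Share of all contributed examples that has been withdrawn."""
        total = sum(self.contributions.values())
        if total == 0:
            return 0.0
        gone = sum(n for o, n in self.contributions.items() if o in self.revoked)
        return gone / total


ledger = ModelLedger()
ledger.contribute("alice", 600)
ledger.contribute("bob", 300)
ledger.contribute("carol", 100)
before = ledger.payouts(1000.0)   # alice 600.0, bob 300.0, carol 100.0

ledger.revoke("alice")
after = ledger.payouts(1000.0)    # bob 750.0, carol 250.0
print(before, after, ledger.revoked_fraction())
```

The `revoked_fraction` number is the interesting one for the failsafe: it measures how much of the model's training data the contributors have collectively pulled.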

This addresses decentralized data ownership, though decentralized compute is another topic I’ll try to find time to write about in the future.

The Challenges of Decentralizing AI Data Ownership - Part 3

Now there are three main difficulties I see for decentralizing data: data needs to be uncopyable, permissions-enabled, and quickly accessible.

1. For data ownership to mean anything, we need to make sure the data can't simply be copied. So the data needs to be opaque yet still usable. Maybe we could make the data queryable without giving direct access, perhaps using a middleman or a smart contract with encryption and a zero-knowledge proof. This may also be possible through homomorphic encryption, which allows data to be used while it is still encrypted. The CTO of TripleBlind in Kansas City is credited with constructing the first fully homomorphic encryption scheme.

2. To have a decentralized failsafe, we need to be able to remove access to our data or delete it when we want. We need to be able to adjust the permissions for our data, perhaps on a blockchain.

3. Lastly, we need to ensure that data retrieval is low latency so we get rapid responses. Using a model shouldn't be like talking to someone on Mars, where we send a message and wait through serious delays for the response.

I'm not a blockchain guy, and I really don't understand how feasible these challenges are. I hope to chat with smarter people who have ideas on this. Please feel free to reach out, as I'd be grateful for any advice.
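To show what "using data while it is still encrypted" can look like, here is a toy sketch of the Paillier cryptosystem in Python. Two loud caveats: Paillier is only additively homomorphic (you can add encrypted numbers, not run arbitrary computations, which is what fully homomorphic encryption allows), and the tiny primes used here are an assumption for readability and offer no security at all. The function names are my own, not from any library.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two primes (far too small here)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    # mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Encrypt an integer 0 <= m < n with fresh randomness."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    """Recover the plaintext from a ciphertext."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen(1009, 1013)
c_alice = encrypt(pub, 3)   # e.g. Alice labeled 3 squares as cars
c_bob = encrypt(pub, 4)     # Bob labeled 4
c_total = add_encrypted(pub, c_alice, c_bob)
print(decrypt(pub, priv, c_total))  # 7
```

Whoever holds only `pub` can combine the contributions without ever seeing the individual values. Only the holder of `priv` can read the total.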

Update: I’m glad to say there has been progress. At the start of my job, the best I could find was 10x the cost to run a model using homomorphically encrypted contributions of data. A year later, in Denver in 2023 at the first web3 x AI Telegram group meetup, there was someone who had figured out how to do it at only 3.5x the cost.

How to Mine AI - Part 4

Now how do we actually make this understandable from a user perspective? The solution is to make data labeling gamified and educational. Imagine you have a platform like Khan Academy where you complete tasks, do quizzes, and gain points that have real-world, ongoing value and pay dividends the way stocks or tokens do. Here's an example. To train a large language model to do a smarter Command-F, companies or open-source organizations might release a set of tasks where people see a question, then read articles and highlight the passages that are relevant to answering that question.
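Here is a rough sketch in Python of how such a highlighting task and its answers might be represented. The `HighlightTask` type, the `aggregate_highlights` function, and the majority-vote rule for combining labelers' answers are all assumptions of mine, just one simple way to turn many people's highlights into a single training label.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HighlightTask:
    """Hypothetical task: a question plus the passages a labeler reads."""
    question: str
    passages: tuple[str, ...]


def aggregate_highlights(
    task: HighlightTask,
    submissions: dict[str, set[int]],
    min_agreement: float = 0.5,
) -> list[int]:
    """Keep passage indices highlighted by more than min_agreement of labelers."""
    votes = Counter(
        index
        for picks in submissions.values()
        for index in picks
        if 0 <= index < len(task.passages)  # ignore out-of-range answers
    )
    needed = min_agreement * len(submissions)
    return sorted(i for i, count in votes.items() if count > needed)


task = HighlightTask(
    question="When was the bridge completed?",
    passages=(
        "Construction finished in 1937 after four years of work.",
        "The bridge is painted a distinctive orange color.",
        "It opened to traffic in May 1937.",
    ),
)
submissions = {
    "labeler_a": {0, 2},
    "labeler_b": {0, 1},
    "labeler_c": {0, 2},
}
relevant = aggregate_highlights(task, submissions)
print(relevant)  # [0, 2]
```

Passage 1 gets only one vote out of three, so it is dropped, while passages 0 and 2 clear the majority threshold.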

Sets of tasks such as these could be posted by the organization. Users could then find those tasks by filtering for the type of skillset needed or the type of organization they want to contribute to and invest in, gaining tokenized shares in that model as they go. With a good UX, this basically becomes Wikipedia hopping: reading articles, getting paid for it, and contributing to a future with distributed ownership of AI. And I like reading articles just for fun. There are also games where people try to find an object in an image, which could be turned into data labeling for object recognition fine-tuning. This could open up streams of income for people in developing countries.
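As a small illustration of the filtering and share-earning flow, here is a Python sketch. The `TaskSet` fields, the example organizations, and the fixed "shares per completed task" rate are hypothetical choices of mine. How shares would actually be priced or tokenized is an open design question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskSet:
    """Hypothetical set of labeling tasks posted by an organization."""
    name: str
    organization: str
    skill: str
    shares_per_task: int


def find_task_sets(
    catalog: list[TaskSet],
    skill: Optional[str] = None,
    organization: Optional[str] = None,
) -> list[TaskSet]:
    """Filter the catalog by skill and/or organization (None means any)."""
    return [
        t for t in catalog
        if (skill is None or t.skill == skill)
        and (organization is None or t.organization == organization)
    ]


def credit_shares(
    balances: dict[tuple[str, str], int],
    user: str,
    task_set: TaskSet,
    tasks_completed: int,
) -> int:
    """Add earned shares to the user's balance for that model; return the balance."""
    key = (user, task_set.name)
    balances[key] = balances.get(key, 0) + tasks_completed * task_set.shares_per_task
    return balances[key]


catalog = [
    TaskSet("smart-search", "OpenLibraryOrg", "reading", 2),
    TaskSet("car-detector", "RoadSafetyLab", "image", 1),
    TaskSet("recipe-qa", "OpenLibraryOrg", "reading", 3),
]
reading_sets = find_task_sets(catalog, skill="reading")
balances: dict[tuple[str, str], int] = {}
credit_shares(balances, "maria", reading_sets[0], tasks_completed=10)
total = credit_shares(balances, "maria", reading_sets[0], tasks_completed=5)
print([t.name for t in reading_sets], total)
```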

You can also add reputation points, achievements, and badges, let people level up their avatars, and have a marketplace for data labelers to exchange shares if they wish, plus a marketplace for people to share their fine-tuned models, which is why I did just buy and 

This is how to mine AI the way people mine crypto, which could be massive. But I would only want to do this if I knew that the decentralized data ownership challenge was solved.

This is an open and urgent question. How do we decentralize AI?

