Creating Artificial Life with Reinforcement Learning

Overview

Code and instructions for creating Artificial Life in a non-traditional way, namely with Reinforcement Learning instead of Evolutionary Algorithms.

Although Evolutionary Algorithms have shown to result in interesting behavior, they focus on learning across generations whereas behavior could also be learned during ones lifetime. This is where Reinforcement Learning comes in, which learns through a reward/punishment system that allows it to learn new behavior during its live time. Using Reinforcement Learning, entities learn to survive, reproduce, and make sure to maximize the fitness of their kin.

Table of Contents

  1. About the Project
  2. Getting Started
    2.1. Prerequisites
    2.2. Usage
    2.3. Google Colaboratory
  3. Environment
    3.1. Agents
    3.2. Observation
    3.3. Reward
    3.4. Algorithms
  4. Results
  5. Documentation
    5.1. Training
    5.2. Testing

1. About the Project

Back to ToC
The simulation above is a good summary of what this project is about. Entities move and learn independently, eat, attack other entities, and reproduce. This is all possible by applying Reinforcement Learning algorithms to each entity, such as DQN and PPO.

The general principle is simple, each entity starts by randomly executing some actions and will slowly learn, based on specific rewards, whether those actions helped or not. The entity is punished if the action is poor and rewarded if it was helpful.

It is then up to the entities to find a way to survive as long as possible while also making sure their kin is in good shape as possible.


2. Getting Started

Back to ToC

To get started, you will only need to install the requirements and fork/download the ReinLife package, together with the train.py and test.py files.

2.1. Prerequisites

To install the requirements, simply run the following:
pip install -r requirements.txt

2.2. Usage

Due to the many parameters within each model and the environment itself, it is advised to start with train.py and test.py. These files have been prepared such that you can run them as is.

Training

To train one or models, simply run:

from ReinLife.Models import PERD3QN
from ReinLife.Helpers import trainer

brains = [PERD3QN(), 
          PERD3QN()]

trainer(brains, n_episodes=15_000, update_interval=300, width=30, height=30, max_agents=100,
        visualize_results=True, print_results=False, static_families=False, training=True, save=True)

This will start training the models for 15_000 episodes. The most important variable here is static_families. If this is set to True, then there will be at most as many genes as the number of brains chosen. Thus, you will only see two colors. If you set this to False, then any number of genes will be created each with their own brain.

Testing

To test one or models, simply run:

from ReinLife import tester
from ReinLife.Models import DQN, D3QN, PERD3QN, PPO, PERDQN

main_brains = [PPO(load_model="pretrained/PPO/PPO/brain_gene_0.pt"),
               DQN(load_model="pretrained/DQN/DQN/brain_gene_0.pt", training=False),
               D3QN(load_model="pretrained/D3QN/D3QN/brain_gene_0.pt", training=False),
               PERD3QN(load_model="pretrained/PERD3QN/Static Families/PERD3QN/brain_gene_1.pt", training=False),
               PERDQN(load_model="pretrained/PERDQN/PERDQN/brain_gene_1.pt", training=False)]
tester(main_brains, width=30, height=30, max_agents=100, static_families=True, fps=10)

The models above are pre-trained (see results below).
You can choose any number of brains that you have trained previously. Note, make sure to set all models (except PPO) to training=False, otherwise it will demonstrate more random behavior.

2.3. Google Colaboratory

It is possible to run the training code in google colaboratory if you need more computing power. You start by installing pygame and cloning the repo:

!pip install pygame
!git clone https://github.com/MaartenGr/ReinLife.git
%cd ReinLife

After that, you are ready to run the training code:

from ReinLife.Models import PERD3QN
from ReinLife.Helpers import trainer

n_episodes = 15_000

brains = [PERD3QN(train_freq=10), PERD3QN(train_freq=10)]

env = trainer(brains, n_episodes=n_episodes, update_interval=300, width=30, height=30, max_agents=100,
        visualize_results=True, print_results=False, google_colab=True, render=False, static_families=True,
        training=True, save=True)

Then, simply look at the files on the left in ReinLife/experiments/... to find the experiment that was run.


3. Environment

Back to TOC

The environment is build upon a numpy matrix of size n * m where each grid has a pixel size of 24 by 24. Each location within the matrix represents a location which can be occupied by only a single entity.

3.1. Agents

Agents are entities or organisms in the simulation that can move, attack, reproduce, and act independently.

Each agent has the following characteristics:

  • Health
    • Starts at 200 and decreases with 10 each step
    • Their health cannot exceed 200
  • Age
    • Starts at 0 and increases 1 with each step
    • Their maximum age is 50, after which they die
  • Gene
    • Each agents is given a gene, which simply represents an integer
    • All their offspring have the same gene value
    • Any new agent that is created not through reproduction gets a new value
    • This gene is represented by the color of the body

An agent can perform one of the following eight actions:

  • Move one space left, right, up, or down
  • Attack in the left, right, up, or down direction

The order of action execution is as follows:

  • Attack -> Move -> Eat -> Reproduce

Movement

An agent can occupy any un-occupied space and, from that position, can move up, down, left or right. Entities cannot move diagonally. The environment has no walls, which means that if an entity moves left from the most left position in the numpy matrix, then it will move to the most right position. In other words, the environment is a fully-connected world.

Although the movement in itself is not complex, it becomes more difficult as multiple entities want to move into the same spot. For that reason, each entity checks whether the target coordinate is unoccupied and if no other entity wants to move in that space. It does this iteratively as the target coordinate changes if an entity cannot move.

Attacking

An agent can attack in one of four directions:

  • Up, Down, Left, or Right

They stand still if they attack. However, since it is the first thing they do, the other agent cannot move away. When the agent successfully attacks another agent, the other agent dies and the attacker increases its health. Moreover, if the agent successfully attacks another agent, its border becomes red.

(Re)production

Each agent learns continuously during its lifetime. The end of an episode is marked by the end of an agents life.

When a new entity is reproduced, it inherits its brain (RL-algorithm) from its parents.

When a new entity is produced, it inherits its brain (RL-algorithm) from one of the best agents we have seen so far. A list of 10 of the best agents is tracked during the simulation.

3.2. Observation

The field of view of each agent is a square surrounding the agent. Since the world is fully-connected, the agent can see "through" walls.

The input for the neural network can be see in the image below:

test

There are three grids of 7x7 (example shows 5x5) that each show a specific observation of the environment:

  • Health
    • Shows the health of all agents within the agent's fov
  • Kinship
    • Shows whether agents within the agent's fov are related to the agent
  • Nutrition
    • Shows the nutritrional value of food items within the agent's fov

Thus, there are 3 * (7 * 7) + 6 = 153 input values.

3.3. Reward

The reward structure is tricky as you want to minimize the amount you steer the entity towards certain behavior. For that reason, I've adopted a simple and straightforward fitness measure, namely:

test

Where r is the reward given to agent i at time t. The δ is the Kronecker delta which is one if the the gene of agent i, gi, equals the gene of agent j, gj, and zero otherwise. n is the total number of agents that are alive at time t. Thus, the reward essentially checks how many agents are alive that share a gene with agent i at time t and divides by the total number of agents alive.

The result is that an agent's behavior is only steered towards making sure its gene lives on for as long as possible.

3.4. Algorithms

Currently, the following algorithms are implemented that can be used as brains:

  • Deep Q Network (DQN)
  • Prioritized Experience Replay Deep Q Network (PER-DQN)
  • Double Dueling Deep Q Network (D3QN)
  • Prioritized Experience Replay Double Dueling Deep Q Network (PER-D3QN)
  • Proximal Policy Optimization (PPO)

4. Results

Back to TOC

In order to test the quality of the trained algorithms, I ran each algorithm independently against a copy of itself to test the speed at which they converge to a high fitness. Below, you can see all algorithms battling it out with PER-D3QN coming out on top. Note, this does not mean it is necessarily the best algorithm. It might have converged faster than others which limits their learning ability.

test

Moreover, for each algorithm, I ran simulations with and without static families.

DQN

With static families

PER-DQN

With static families

D3QN

With static families

PER-D3QN

With static families
With static families

PPO

With static families

5. Documentation

Back to TOC

5.1. Training

The parameters for train.py:

Parameter Description Default value
brains Contains a list of brains defined as Agents by the ReinLife.Models folder.
n_episodes The number of epsiodes to run the training sequence. 10_000
width, height The width and height of the environment. 30, 30
visualize_results Whether to visualize the results interactively in matplotlib. False
google_colab If you want to visualize your results interactively in google_colab, also set this parameter to True as well as the one above. False
update_interval The interval at which average the results 500
print_results Whether to print the results to the console True
max_agents The maximum number of agents can occupy the environment. 100
render Whether to render the environment in pygame whilst training. False
static_families Whether you want a set number of families to be used. Each family has its own brain defined by the models in the variable brains. False
training Whether you want to train using the settings above or simply show the result. True
limit_reproduction If False, agents can reproduce indefinitely. If True, all agents can only reproduce once. False
incentivize_killing Whether to incentivize killing by adding 0.2 everytime an agent kills another True

5.2. Testing

The parameters for test.py:

Parameter Description Default value
brains Contains a list of brains defined as Agents by the ReinLife.Models folder.
width, height The width and height of the environment. 30, 30
pastel_colors Whether to automatically generate random pastel colors False
max_agents The maximum number of agents can occupy the environment. 100
static_families Whether you want a set number of families to be used. Each family has its own brain defined by the models in the variable brains. False
limit_reproduction If False, agents can reproduce indefinitely. If True, all agents can only reproduce once. False
fps Frames per second 10

Other work

ReinLife was based on:

  • Abrantes, J. P., Abrantes, A. J., & Oliehoek, F. A. (2020). Mimicking Evolution with Reinforcement Learning. arXiv preprint arXiv:2004.00048.
Owner
Maarten Grootendorst
Data Scientist | Psychologist
Maarten Grootendorst
Use MATLAB to simulate the signal and extract features. Use PyTorch to build and train deep network to do spectrum sensing.

Deep-Learning-based-Spectrum-Sensing Use MATLAB to simulate the signal and extract features. Use PyTorch to build and train deep network to do spectru

10 Dec 14, 2022
Resources for the Ki testnet challenge

Ki Testnet Challenge This repository hosts ki-testnet-challenge. A set of scripts and resources to be used for the Ki Testnet Challenge What is the te

Ki Foundation 23 Aug 08, 2022
CUDA Python Low-level Bindings

CUDA Python Low-level Bindings

NVIDIA Corporation 529 Jan 03, 2023
A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

2 Jul 25, 2022
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation. Training python train.py --c

Rishikesh (ऋषिकेश) 55 Dec 26, 2022
🔥3D-RecGAN in Tensorflow (ICCV Workshops 2017)

3D Object Reconstruction from a Single Depth View with Adversarial Learning Bo Yang, Hongkai Wen, Sen Wang, Ronald Clark, Andrew Markham, Niki Trigoni

Bo Yang 125 Nov 26, 2022
Repository of our paper 'Refer-it-in-RGBD' in CVPR 2021

Refer-it-in-RGBD This is the repository of our paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images' in CVPR 2021 Pape

Haolin Liu 34 Nov 07, 2022
Official Tensorflow implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)

U-GAT-IT — Official TensorFlow Implementation (ICLR 2020) : Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization fo

Junho Kim 6.2k Jan 04, 2023
P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks

P-tuning v2 P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks An optimized prompt tuning strategy for sma

THUDM 540 Dec 30, 2022
Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation (CVPR 2021)

Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation Input Image Initial CAM Successive Maps with adversar

Jungbeom Lee 110 Dec 07, 2022
An implementation of a discriminant function over a normal distribution to help classify datasets.

CS4044D Machine Learning Assignment 1 By Dev Sony, B180297CS The question, report and source code can be found here. Github Repo Solution 1 Based on t

Dev Sony 6 Nov 09, 2021
Open-source Monocular Python HawkEye for Tennis

Tennis Tracking 🎾 Objectives Track the ball Detect court lines Detect the players To track the ball we used TrackNet - deep learning network for trac

ArtLabs 188 Jan 08, 2023
Code repository for the paper: Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild (ICCV 2021)

Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild Akash Sengupta, Ignas Budvytis, Robert

Akash Sengupta 149 Dec 14, 2022
Maximum Spatial Perturbation for Image-to-Image Translation (Official Implementation)

MSPC for I2I This repository is by Yanwu Xu and contains the PyTorch source code to reproduce the experiments in our CVPR2022 paper Maximum Spatial Pe

51 Dec 14, 2022
codes for Image Inpainting with External-internal Learning and Monochromic Bottleneck

Image Inpainting with External-internal Learning and Monochromic Bottleneck This repository is for the CVPR 2021 paper: 'Image Inpainting with Externa

97 Nov 29, 2022
Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation) Download Synthia dataset The model uses

32 Sep 21, 2022
An investigation project for SISR.

SISR-Survey An investigation project for SISR. This repository is an official project of the paper "From Beginner to Master: A Survey for Deep Learnin

Juncheng Li 79 Oct 20, 2022
code for paper "Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning" by Zhongzheng Ren*, Raymond A. Yeh*, Alexander G. Schwing.

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning Overview This code is for paper: Not All Unlabeled Data are Equa

Jason Ren 22 Nov 23, 2022
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
Towards Interpretable Deep Metric Learning with Structural Matching

DIML Created by Wenliang Zhao*, Yongming Rao*, Ziyi Wang, Jiwen Lu, Jie Zhou This repository contains PyTorch implementation for paper Towards Interpr

Wenliang Zhao 75 Nov 11, 2022