Reverse engineering the coronavirus (SARS-CoV-2)
i was bored

Start here: corona.md

Background

This project applies techniques from reverse engineering to understand the SARS-CoV-2 virus. The goal here is simply to build an understanding of the virus from first principles.

Biology vs. software

Biological systems are fundamentally information processing systems. While not a perfect analogy, software provides a useful framework for thinking about biology. The table below provides a rough outline of this analogy.

| Biology | Software | Notes |
| --- | --- | --- |
| nucleotide | byte | |
| genome | bytecode | |
| translation | disassembly | 3-byte-wide instruction set with arbitrary "reading frames" |
| protein | function | a polyprotein is a function with multiple pieces |
| protein secondary structure | basic blocks | 80% accuracy in prediction |
| protein tertiary structure | | This seems like the hard one to predict: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205819 |
| quaternary structure | compiled function with inlining | https://en.wikipedia.org/wiki/Protein%E2%80%93protein_interaction_prediction |
| gene | library | bacteria are statically linked, viruses are dynamically linked |
| transcription | loading | |
| protein structure prediction | library identification | |
| genome analysis | static analysis | |
| molecular dynamics simulations of protein folding | dynamic analysis | Simulation doesn't seem to work yet. Constrained by tooling and compute. |
| no equivalent | execution | We are reverse engineering a CAD format. Runs more like FPGA code, all at once. No serial execution. (What are the FPGA reverse engineering tools?) |
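The "reading frames" row of the table can be made concrete with a short sketch. The helper name `codons` is hypothetical and not taken from the repo. The same RNA string splits into different codons depending on which of the first three bases you start from. This is much like disassembling a byte stream from the wrong offset.

```python
from typing import List


def codons(seq: str, frame: int) -> List[str]:
    """Split seq into 3-base codons starting at offset `frame` (0, 1 or 2).

    Trailing bases that do not fill a whole codon are dropped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    body = seq[frame:]
    usable = len(body) - len(body) % 3  # keep whole codons only
    return [body[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    rna = "AUGGCCAUU"
    for f in range(3):
        print(f, codons(rna, f))
```

On "AUGGCCAUU", the three frames give three different "instruction streams":

- frame 0 reads AUG GCC AUU
- frame 1 reads UGG CCA
- frame 2 reads GGC CAU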

Progress

Downloading the SARS-CoV-2 genome

GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA and RNA sequences. The SARS-CoV-2 sequences available in GenBank are downloaded in download_sequences.md.
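Below is a minimal sketch of fetching one genome with only the Python standard library. It assumes NCBI's E-utilities `efetch` endpoint and the RefSeq accession NC_045512.2 (the Wuhan-Hu-1 reference genome). download_sequences.md may well do this differently. GenBank stores the sequence as DNA, so it uses T where the viral RNA has U.

```python
import urllib.parse
import urllib.request
from typing import Dict, List

# NCBI E-utilities efetch endpoint (assumed here as the download route).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_fasta(accession: str) -> str:
    """Download one nucleotide record as FASTA text. Requires network access."""
    query = urllib.parse.urlencode(
        {"db": "nucleotide", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}") as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header line (minus the '>') to its joined sequence lines."""
    chunks: Dict[str, List[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


if __name__ == "__main__":
    records = parse_fasta(fetch_fasta("NC_045512.2"))
    for name, seq in records.items():
        print(name, len(seq))  # the reference genome is roughly 30 kb
```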

Translating RNA to proteins

lib.md contains a function, translate, that converts an RNA sequence into a chain of amino acids.
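The actual translate function lives in lib.md. As a hedged illustration of the idea, and not the repo's code, here is a translator for the standard genetic code (NCBI translation table 1).

- It builds the 64-entry codon table from the conventional TCAG ordering.
- It accepts either RNA (U) or DNA (T) input.
- It marks stop codons with `*`.
- Mapping ambiguous codons to `X` is an assumption of this sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate a nucleotide sequence in frame 0; '*' marks a stop codon.

    U is treated as T so RNA and DNA both work; a trailing partial codon is ignored,
    and any codon containing an ambiguous base (e.g. N) becomes 'X'.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate("AUGGCCUAA"))  # MA*
```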

Annotating functions

The translate function is used in corona.md to identify and annotate functions for all proteins encoded by the genome.
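One common way to locate candidate protein-coding regions is an open-reading-frame (ORF) scan. This is shown as a sketch and is not necessarily how corona.md does it. In each of the three forward frames, the scan records every stretch from an ATG start codon to the next in-frame stop. Scanning only the forward strand suits a positive-sense RNA virus.

The `min_codons` threshold is an arbitrary assumption, used to filter out short chance ORFs. A naive scan like this will split SARS-CoV-2's orf1ab in two. ORF1b is reached through a -1 ribosomal frameshift, not a fresh start codon.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand ORFs, 0-based, end exclusive.

    Each ORF runs from an ATG to the first in-frame stop codon (stop included).
    ATGs inside an already-open ORF are not reported separately, and an ORF
    with no stop codon before the end of the sequence is dropped.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCAUGCCCUAA", min_codons=1))
```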

Folding proteins

The OpenMM toolkit is used for molecular simulation of protein folding in fold.md.
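OpenMM's own API is not reproduced here. Instead, this toy sketch shows what a molecular dynamics engine does at every time step. It integrates Newton's equations with velocity Verlet for a single 1D harmonic "bond". Velocity Verlet is a Verlet-type scheme; OpenMM's `VerletIntegrator` uses the closely related leap-frog form.

The spring constant, mass and time step are arbitrary unitless assumptions. A real folding run does this for every atom, with a full force field, solvent and usually a thermostat.

```python
from typing import Callable, List, Tuple


def harmonic_force(x: float, k: float = 1.0, r0: float = 0.0) -> float:
    """Force of a 1D spring with stiffness k and rest position r0: F = -k (x - r0)."""
    return -k * (x - r0)


def velocity_verlet(
    x: float,
    v: float,
    force: Callable[[float], float],
    mass: float = 1.0,
    dt: float = 0.01,
    steps: int = 1000,
) -> Tuple[float, float, List[float]]:
    """Advance one particle `steps` times; return final position, velocity and trajectory."""
    a = force(x) / mass
    trajectory = [x]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt  # position update uses the current acceleration
        a_new = force(x) / mass             # recompute the force at the new position
        v = v + 0.5 * (a + a_new) * dt      # velocity uses the average of old and new acceleration
        a = a_new
        trajectory.append(x)
    return x, v, trajectory


if __name__ == "__main__":
    # One period of a unit spring is 2*pi time units, about 628 steps of dt = 0.01.
    x, v, _ = velocity_verlet(1.0, 0.0, harmonic_force, steps=628)
    energy = 0.5 * v * v + 0.5 * x * x
    print(f"x={x:.4f} v={v:.4f} energy={energy:.6f}")  # energy stays close to 0.5
```

Verlet-type integrators are popular in MD because they keep the total energy bounded over long runs instead of letting it drift.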

Work to be done

  • Automatic extraction of genes from different coronaviruses
  • A good multiple-sequence comparison tool
  • Molecular dynamics?
  • Secondary structure prediction on orf1a?
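A comparison tool for many sequences is usually built on pairwise alignment. One possible starting point, offered as an assumption about direction and not something the repo implements, is a minimal Needleman-Wunsch global aligner with linear gap penalties. The scoring values are arbitrary placeholders.

The O(nm) table is fine for individual genes. For two ~30 kb genomes, however, it grows to around 900 million cells.

```python
from typing import List, Tuple


def nw_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> Tuple[int, str, str]:
    """Globally align a and b; return (score, aligned_a, aligned_b) with '-' for gaps."""
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(nw_align("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```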

Open questions

Testing

How tests work

Homemade test?

Possible treatments and prophylactics

⚠️ Disclaimer: The information in this repository is for informational purposes only. It is not medical advice.

Hydroxychloroquine + zinc

RdRp (RNA-dependent RNA polymerase) inhibitors

Dexamethasone

Lopinavir-Ritonavir (AIDS cocktail)

Resources

Biology

Bioinformatics

Epidemic modeling

Antibodies

Masks

Vaccines

Genome studies (what genes = bad covid)

DNA Synthesis