A plan to immortalize I-War community heritage

7 years 7 months ago #20499 by palmer
Thank you SoupDragon and schmatzler for your efforts!

As an old I-War fan I'm so happy to see you keeping the heritage of I-War community online.

I just learned about "The End of an Era" that was about to happen in Jan 2015. Despite the successful outcome in that case, I'm still frustrated that my favorite game's entire community website, with years of accumulated user content, could just disappear.

In this post I will analyze root causes of website fragility and suggest counter-measures.

First, some examples of the sad fate of websites for games I played.
  • Mu Online fansite muhq.com shut down around 2014. Today it is only browsable through the web archive.
  • Armada Online had a "forum meltdown". Lots of useful forum threads were lost due to a server malfunction (or a hack -- I don't remember).
  • The Armada Online wiki just broke. Now it prints dozens of PHP errors and fails to display content. Compare what it is now and what it used to be.
  • The Warzone 2100 website had a security breach resulting in a huge data loss. I think they never fully recovered.
  • torn-stars.com was offline for some time. Now it's back, but it could go down again. I cannot easily download the site with wget because this PHP-based site is a total mess, archiving-wise.
  • Some I-War related websites are dead. They are only available thanks to your tireless archiving efforts, and archive.org of course.
The amazing archive.org makes it possible to browse the pages of now-dead websites, but it does not always mirror binaries. Recovering website pages after data loss is not trivial (it requires scripting) and is only possible if archive.org crawled them. Recovering binaries is only possible if you reach people who still have copies, and even then you collect things piece by piece. There is also the problem of integrity protection: you are never sure you have found all the files, those you do recover may differ from what was hosted previously, and you cannot be sure that files you get from people are virus-free.

It is so disappointing to realize how many hours/days/weeks of creative work can vanish. A community is often good at creating content and bad at preserving it for decades.


Problems

Some of the problems that lead to site extinction.


1. Lack of effortless replication

It is not trivial for a random visitor to create an exact copy of an entire public website.

Requirements for the replica:

1. First of all, it must be doable.

Some websites fail at this very first step.

2. The cloned website can be browsed offline.

Most websites fail here for:
  • Not using relative links, which forces the mirroring tool to convert links, which in turn breaks synchronization.
  • Hosting images on external domains that die a few years later.
Any "popular" WordPress or Blogger based blog is a total fail in this regard.

3. It must be possible to completely recover a failed website from such replica.

Any website that does not expose the source of its content fails this requirement. For instance, any website engine that stores content in an SQL database but does not expose a read-only connection to that database fails. MediaWiki fails because it is not trivial to fetch the raw .mediawiki files that are rendered by the engine.


2. Lack of effortless incremental synchronization

Even if you manage to mirror a website with wget, how do you update your mirror when the origin changes? Re-downloading everything is not an option. Using wget as an example, incremental updates are only possible if the website takes care to:
  • Properly serve the Last-Modified HTTP header
  • Use relative links, so wget does not have to convert the links, which alters file timestamps
  • Avoid dynamically generated content (PHP, JavaScript). A Disqus comment section is a good example of horribly non-archivable content.
Very few websites, even properly designed ones with static content, meet these criteria.
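
To illustrate why the Last-Modified header matters, here is a minimal Python sketch of the decision a mirroring tool has to make for every file. It is not how wget is implemented; the function names needs_download and if_modified_since are made up for this example.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime
from typing import Optional


def needs_download(last_modified: Optional[str], local_mtime: Optional[float]) -> bool:
    """Decide whether a mirrored file has to be fetched again.

    last_modified: value of the Last-Modified response header, or None.
    local_mtime: POSIX timestamp of the local copy, or None if there is none.
    """
    if local_mtime is None:
        return True  # nothing mirrored yet
    if not last_modified:
        return True  # no timestamp from the server, so skipping is not safe
    remote = parsedate_to_datetime(last_modified)
    if remote.tzinfo is None:
        remote = remote.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local


def if_modified_since(local_mtime: float) -> dict:
    """Request header that lets the server answer 304 Not Modified."""
    return {"If-Modified-Since": formatdate(local_mtime, usegmt=True)}
```

If the server reports a new Last-Modified on every request, needs_download always returns True and every update turns into a full re-download.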


3. Lack of integrity protection

How do you make sure your replica is not missing files? That no files were infected or damaged? How do you compare replicas made by Alice and Bob?

Replication and synchronization must be considered from the very first days of a website. Unfortunately it is completely ignored by most website authors today.
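
One common way to answer the Alice-and-Bob question is to hash every file and compare the resulting lists. The Python sketch below is only an illustration under that assumption; build_manifest and compare_manifests are hypothetical names, and SHA-256 is my choice of hash function.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large binaries do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(a: dict[str, str], b: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing on either side and files whose content differs."""
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "differing": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
    }
```

Alice and Bob would each run build_manifest on their own replica and exchange only the small manifests. An empty report means the two replicas are byte-for-byte identical.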


Solution

Make it trivial to replicate the whole website. The more people mirror it, the higher the chances that our grandkids will play I-War with mods.

Tools to achieve this are distributed version control systems and hash functions.

Let's start with the requirements for such a website:
  • Distribute source files, not rendered pages. Source files do not include any "cruft" (navigation, sidebars, login or search forms, comment feeds, CSS, javascript, tracking, ads, etc).
  • All visible content is generated from source files.
  • Source files are in easily editable text formats (Markdown).
  • Source files are stored in versioned repositories (Git).
  • Dynamic content generation is avoided (Javascript, PHP). If used, source data is available in plaintext files (CSV or JSON database), which are also stored in repositories.
  • The whole repository can be trivially replicated ("git clone"). If there are multiple, a script is provided to get them all at once.
  • Incremental updates are trivial ("git pull").
A Git-based blog is a good working example of implementing these principles.
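
The "get them all at once" script could look roughly like the Python sketch below. The repository URLs are placeholders (nothing like them exists yet), and the function names are made up for this example. It clones a repository on the first run and pulls on later runs.

```python
import subprocess
from pathlib import Path

# Placeholder URLs for illustration only; the real list would be published by the site.
REPOS = [
    "https://example.org/i-war2/knowledge-base.git",
    "https://example.org/i-war2/server-scripts.git",
]


def repo_dir(url: str, dest_root: Path) -> Path:
    """Local directory for a repository, named after the last URL component."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[:-4]
    return dest_root / name


def sync_command(url: str, dest_root: Path) -> list[str]:
    """Return 'git pull' for an existing clone, 'git clone' otherwise."""
    target = repo_dir(url, dest_root)
    if (target / ".git").exists():
        return ["git", "-C", str(target), "pull", "--ff-only"]
    return ["git", "clone", url, str(target)]


def sync_all(urls: list[str], dest_root: Path) -> None:
    """Create or update a local mirror of every repository in the list."""
    dest_root.mkdir(parents=True, exist_ok=True)
    for url in urls:
        subprocess.run(sync_command(url, dest_root), check=True)
```

Calling sync_all(REPOS, Path("mirror")) from a daily cron job would be enough to keep a mirror current.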


Implementation

Here I propose a specific implementation for i-war2.com.

The entire website is built from several repositories, each of which can be easily mirrored and updated.


1. Knowledge base repository

This is the primary repository with all knowledge and file metadata.

Contents include:
  • One Markdown file per article, including news/blog [1]
  • All images linked from articles
  • JSON file database with metadata for Downloads [2]
  • CSS styles to render pages
  • Code and instructions to build the website
  • Code and instructions to mirror all repositories necessary to fully replicate the website
[1] Strive for plaintext source files for all kinds of content. Avoid anything that is hard to read in a text editor, hard to script, or takes up a lot of space. Examples: PDF, DOC, DOCX, SQL.

[2] Pages in Downloads section have special requirements:
  • They contain hashes of files.
  • All binaries (see repository #3) can be verified against these hashes in one script, reporting any missing or broken files.
  • A page is generated for every file.
For this purpose I propose a JSON file with a simple database. Each record has the fields: (title, description, version, author, website, source code URL, release date, path in file storage, file size, file hash). A good working example is the PRISM Break project. See how they build the website from a simple JSON database hosted on GitHub.
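
As a rough illustration of the "one script" check, here is a Python sketch. It assumes the JSON file is a list of records and that the path, size and hash fields are stored under the keys "path", "size" and "sha256". Those key names, the choice of SHA-256 and the function name verify_downloads are my assumptions, not an existing format.

```python
import hashlib
import json
from pathlib import Path


def verify_downloads(db_path: Path, storage_root: Path) -> dict[str, list[str]]:
    """Check every record of the JSON database against the binary file storage."""
    records = json.loads(db_path.read_text(encoding="utf-8"))
    report = {"ok": [], "missing": [], "broken": []}
    for record in records:
        file_path = storage_root / record["path"]
        if not file_path.is_file():
            report["missing"].append(record["path"])
            continue
        # Reading the whole file is fine for a sketch; stream large files in practice.
        data = file_path.read_bytes()
        same_size = len(data) == record["size"]
        same_hash = hashlib.sha256(data).hexdigest() == record["sha256"]
        report["ok" if same_size and same_hash else "broken"].append(record["path"])
    return report
```

Anyone with a copy of the knowledge base repository and a mirror of the binaries could run this to learn which files are absent or altered.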

Implemented as Git repo.


2. Source code repository for every mod or utility

It's always nice to have source code in addition to binaries.

Source code for your scripts to set up a game server also deserves a repository. "setting up DirectPlay on a Linux machine is the equivalent of hell" -- this work shall not be lost.

Implemented as Git repos.


3. Binary file storage

Stores all binaries (mods, utilities, screenshots, movies, etc).

Implemented as a directory structure with timestamped files. If the files are served via HTTP or FTP, wget will only download new or changed ones. Rsync is an option. GitHub has something for storing large files (Git LFS), worth investigating as well.


4. Recovery plan

To recover after failure (malfunction, site hacked, domain or hosting expired), several up-to-date replicas must be found to check against each other.

One way to find them is to "call for help" as you did on the forum [/forum/general-i-war-talk/3157-help-some-old-content-is-still-missing].

Another is to build a list of mirrors beforehand. This can be done as
  • a simple list of mirror URLs (example) or contact details of mirror maintainers
  • in the case of GitHub, the list of people who forked a given repo (tracked automatically by GitHub)
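
To make "check against each other" concrete: if every mirror can produce a list of file hashes, a simple majority vote tells which version of each file to trust. The Python sketch below assumes such hash lists are already available; the function name consensus and the strict-majority rule are my own illustration, not an established procedure.

```python
from collections import Counter


def consensus(manifests: dict[str, dict[str, str]]) -> tuple[dict[str, str], dict[str, dict[str, str]]]:
    """Pick the hash that a strict majority of mirrors agree on, per file.

    manifests maps a mirror name to its {path: hash} listing.
    Returns (agreed, disputed); disputed keeps every mirror's vote for review.
    """
    agreed: dict[str, str] = {}
    disputed: dict[str, dict[str, str]] = {}
    all_paths = set().union(*(m.keys() for m in manifests.values()))
    for path in sorted(all_paths):
        votes = {name: m[path] for name, m in manifests.items() if path in m}
        digest, count = Counter(votes.values()).most_common(1)[0]
        if count * 2 > len(manifests):  # mirrors lacking the file count against it
            agreed[path] = digest
        else:
            disputed[path] = votes
    return agreed, disputed
```

Files in the disputed set would need a human decision, for example by asking the mirror maintainers on the forum.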

Open questions

I have not yet deeply considered the following:
  • How to store dynamic data? Things like download counters and file ratings.
  • How to store user messages? Forum posts, replies, file comments, news comments.
Intuitively they also must be stored in versioned plaintext files, but this requires more design work.


Benefits

With these ideas implemented, dozens of tech-savvy I-War fans could replicate all available I-War knowledge and, equally important, keep it up-to-date with trusted maintainers.

Using a platform like GitHub opens the door to contributions via the pull request workflow. You could then review and merge big changes (like new FAQ articles) or small ones (like typo fixes) with a couple of clicks.


Conclusion

I think the long-term payoff is totally worth the effort, and I wish all good old game communities strove for something like this.

P.S. Thanks for maintaining Tron! One of my all time favorites as well.

7 years 7 months ago - 7 years 7 months ago #20500 by schmatzler
While I appreciate that you took all the time to write this up, I think converting the Joomla-based installation, the forum and JDownloads to JSON data is total overkill for the community of an old game with not that many active users.

The whole website including databases is mirrored every day on a special backup machine. It has saved my ass countless times. In case of failure, I also store weekly backups on another drive. In all your examples it looks like no backups had been made beforehand, so the data loss was fatal in the end.

A Git-based website is fine for projects like this, but I hesitate to implement it here.

BTW, the sources and prerequisites for running the gameservers with wine are documented in these places:

www.ldso.net/tronforum/viewtopic.php?f=6&t=1242
appdb.winehq.org/objectManager.php?sClass=version&iId=7386

Space. The final frontier.

7 years 7 months ago - 7 years 7 months ago #20501 by palmer
An overkill indeed. I wanted to share a vision of a final ideal setup, while I perfectly realize that it may not be feasible here.

I agree about backups. Maybe sites in my examples were just unlucky to have no backups. Intuitively I knew you do backups from reading the very first announcement ;)

And thanks for the links.

Somehow this site, and no other, triggered this writeup. Maybe I-War is too special for me. It turned out quite big, so thanks for taking the time to read!

Anyway, I wget-ed this site just in case. It mostly went well, with two minor issues. One was that files in "Downloads" were not saved with correct paths because of redirects from "/downloads/send/..." to the real location, but I figured out that with --trust-server-names wget uses the path after redirects. The other is that "Documents" have an always-changing Last-Modified, so updating them is not as nice as "Downloads" (which work fine).

7 years 7 months ago #20502 by Chessking
I remember when I opened I-war2.com two years ago, and got a 404 not found error. :( I had considered backing up the site in the past, and was disappointed that I had not. Thankfully, the website was only down because Schmatzler was re-modeling it. :cheer:

This is one tough navy, boy. They don't give you time off, even for being dead. -Clay

Storm Petrel

7 years 7 months ago #20503 by IronDuke
Did I seriously not check the forums for five days? :blink: I don't normally miss stuff... then again, this week has been overwhelmingly busy.

Skimmed yer wall o' text, and that plan sounds solid, I guess... I'm a gameplay coder, not a website coder. :P But what glimmers I do understand sound smart, so they must be good.

Welcome to the forums! B)

--IronDuke

Very little about the game is not known to me. Any questions you got, throw them at me. :)
