Development Journal: Unit Testing the Entire Website

I’m working on a rather large project at work, and it’s something that I’m struggling with. For that reason I’ve decided to journal my frustrations and the steps in the hopes that I can remember what happened on the other side of this.

May 27, 2015

The goal of the project is pretty straightforward: WordPress has a rather robust unit testing setup for their core files, and there are examples for how to unit test Plugins and Theme files. I’m going to combine all of these tests, install the website files over the WordPress core, and run unit testing on all of the files.

Right from the start I’m realizing it’s not as simple as copy pasta over core. Our site has heavily customized things, and we’re using many plugins and drop-ins that the Core unit tests don’t account for, so I’ve forked the Unit test framework, and I’m making the required changes.

Issue #1: Database

Our website is a Multisite install, which the core files are set up to handle, but unfortunately the default setup isn’t working. Connection is refused, the unit tests aren’t working, etc. To see what the problem is, I’m going to run a simple test.

Steps:

  1. Clone the WordPress development files.
  2. Create a fresh database.
  3. Update wp-tests-config.php with corresponding DB info.
  4. Run phpunit

If the issue really is the difference between our website files and WP core files, PHPUnit should run without issue.

Result:

After doing the steps above, the result was a clean running unit test. That tells me that our database isn’t working with the WordPress core unit test setup. I’m going to test this further with multisite setup, but I think I’m on the right track.

WordPress lets you create custom database error handlers, so I created one to print a backtrace for me whenever the DB isn’t working, and lo and behold, it worked. It let me know that HyperDB, which we use because of multisite, isn’t loading the way it should, for one reason or another.

We heavily customized our config file, I mean a lot, so I’m taking a play from WordPress’ handbook here and creating a config file strictly for our unit testing. This means I’ll be cutting out a lot of the clutter, and the things that aren’t being tested currently, and hopefully that’ll get us to the point we can at least run the tests. I don’t care if they error out, I just want a fully running PHPUnit.

Because I really don’t care about HyperDB, and it’s just causing me heartache, I got around the errors by just not using HyperDB during unit testing. This is obviously a hacky way to handle it, but once I have a working unit test setup, I can work on forking HyperDB and fixing its implementation.

Next up to ruin my day is Batcache! That said, I also don’t really care about caching when I’m running unit tests, so I’m just going to shut that off for this as well.

Shut off caching and it found new errors. Sunrise was causing errors, but being wise to their tricks, I created a blank sunrise file. There were a few more errors, but I realized during the process that I was running Master tests against the 4.1 branch. So I switched to the 4.1 branch tests and ran again. PHPUnit started running.

It’s randomly throwing errors, but that’s fine. I track each error down and either skip it, or fix it.

June 2nd, 2015

The last time I ducked into it we were getting a lot of terminal issues where a rogue UTF-8 character was changing the character structure of the terminal. I spent a few days trying to figure out how to remove the data sets from the PHPUnit debug output. Figured that out today, so now I’m back to tracking down individual tests.

After skipping all of the tests that caused PHP errors, or took more than a few seconds to complete, I finally reached the point where the tests would finish and the PHPUnit report would display. This is good. It’s filled with fails, but it completes.

Now I’m working on a Bash script to automate the setup and teardown of the main repository code, so I don’t have to keep a duplicate copy of our codebase in the testing repository.

June 17th, 2015

I finished this a week or two ago, shortly after my last entry, but forgot to update. Basically what I ended up doing is looking at each fail, and if it was something that would take more than a couple minutes or a line of code change, I skipped the test, and left myself a note in the code as to why. I now have a passing Unit Test.

I’m going to publish this entire brain dump as is, and revisit the topic in a future post where I’ll explain the whys and the whats a lot better. For now, this is what you get.

Theory: Programming Genetic Algorithm via Unit Tests

Genetic algorithms provide a way to let the algorithm evolve into its natural optimum. If done properly, over a long enough period of time, they can be extremely effective. The key to creating a useful genetic algorithm is the fitness tests. It’s survival of the fittest in action.

My theory is that using Unit Tests as a first-entry (or “health”) fitness test could very well be a way to create genetic algorithms that can learn to program. The trick here is to create unit tests that are all encompassing. They should test each and every aspect of the program.

Fitness tests would then be layered:

  1. Does it compile / run
  2. Does it pass unit tests

If both of these are true, it survives. Then we can add in additional fitness tests. For instance, we can compare each generation’s run time and memory usage. Maybe we toss in code quality metrics as well. All of these can be automated, so it’s conceivable that each of these could be used as a fitness test.
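The layering above could be sketched roughly like this in Python. This is only a minimal illustration of the idea, not a tested genetic algorithm: the target function name solve, the UNIT_TESTS list, and the use of source length as the secondary metric are all assumptions made up for the example. Source length is just a crude stand-in for run time, memory usage, or code quality.

```python
from typing import Callable, Optional

# Hypothetical unit tests for the target behaviour: a function named `solve`
# that adds two numbers. Each entry is (arguments, expected result).
UNIT_TESTS = [((1, 2), 3), ((0, 0), 0), ((-4, 4), 0)]


def load_candidate(source: str) -> Optional[Callable]:
    """Layer 1: does the candidate compile and run at all?"""
    namespace: dict = {}
    try:
        # NOTE: exec on generated code is unsafe outside of a sandbox.
        exec(compile(source, "<candidate>", "exec"), namespace)
    except Exception:
        return None
    func = namespace.get("solve")
    return func if callable(func) else None


def passes_unit_tests(func: Callable) -> bool:
    """Layer 2: does the candidate pass every unit test?"""
    for args, expected in UNIT_TESTS:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            return False
    return True


def fitness(source: str) -> Optional[float]:
    """Return None if the candidate dies, otherwise a score (higher is better)."""
    func = load_candidate(source)
    if func is None or not passes_unit_tests(func):
        return None
    # Survivors are ranked by a secondary metric (assumed: shorter is better).
    return 1.0 / len(source)


candidates = [
    "def solve(a, b): return a + b",
    "def solve(a, b): return a - b",   # runs, but fails the unit tests
    "def solve(a, b) return a + b",    # does not compile
    "def solve(a, b):\n    total = a\n    total += b\n    return total",
]

# Only candidates that clear both layers survive; the rest are dropped.
survivors = sorted(
    (c for c in candidates if fitness(c) is not None),
    key=fitness,
    reverse=True,
)
```

A real run would still need mutation and crossover to produce the next generation from the survivors, which this sketch leaves out.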

Now, this is just a theory. I haven’t tested it, but please feel free. I may test it when I have time, and if I do I’ll update this post / write a new one.

Going back to an old project

I started making WordPress plugins about 5 years ago, and one of my first was Auto More Tag, which essentially just adds a “Read More” button automatically into your content. Now, this seems like an easy thing to do, but at the time I had no idea how to go about it. I ended up coding it to insert the more tag into the content during the saving of the content. Once it was working, I left it to do other things.

Let me make something very clear, something that this very plugin taught me.

It is never a good idea to change the content of someone’s website with a plugin.

Read those words. They’re in a blockquote because they’re important. Do not do this. Ever. Do not do it. You might think it’s a good idea, but it’s not. At the time I was naive, and thought I could get away with doing this. I couldn’t. It’s bad. So very bad.

And the support forum taught me that, with many people cursing my name because of deleted content, and rightly so. Turns out that I hadn’t put in support for multibyte strings, so our non-English speaking friends were getting their content chopped. This isn’t a good thing, but I had no idea how to fix it.

Then a developer offered to add in MB String support, and I let him fork it, and take over. Then I stepped away for a couple years. I’ve just recently opened the code back up after getting a few emails that the plugin wasn’t working.

Now that I’m a bit more seasoned, I realize exactly how stupid it was for me to edit content. I realized this before, but now I actually have an idea of how to fix that stupidity. In release 4.0.0, which I pushed out to the WordPress plugin repo this weekend, I’ve converted the codebase from inserting on save, and editing the content (read the quote above) to now using a filter on page load to insert the more tag.

This is a much better way to handle things, and I’m happy to do it.

I’m also happy to see that one of my very first plugins, and in fact the very first plugin I ever released on the WordPress repository, is still being used today. 4 years later.

Future Plans

I plan to improve the settings page, and clean up the code a bit, make it easier to understand. But a lot of what I plan to edit has to do with the development environment. I’m hoping to encourage people to help me to improve the plugin.

Aside from that, I’m looking forward to improving the user experience, and making the plugin even simpler to use. Possibly adding in additional ways to customize where and when the More tag is added.

“Generally WordPress plugins work really well … when they’re small. When you build something that solves one particular problem, or improves one specific WordPress feature, or does something very … purposeful.” – John James Jacoby, LoopConf Vegas, 2015

Node Bot – An IRC Bot with a Brain

So I decided to take my weekend and learn how to use Node.js. It sounds like a tall order, but I didn’t think so. I’ve been a JavaScript programmer for about 13 years now, so I didn’t feel it would be that different.

It wasn’t.

The biggest thing I had to learn was npm, the Node.js Package Manager, which makes adding features to Node.js pretty simple.

Now, to rewind just a bit, I’ve also recently been trying to learn Python, and the first thing you learn about Python is that if you want to do it, someone else already has, so just install the package and use their framework. But I’m still learning Python’s syntax.

But I know JavaScript.

So here I am, with the simple-to-use package idea of Python coupled with the syntax of JavaScript that I already know, with only a few new things tossed in. The major thing I had to learn was how to package my code into multiple files. On the web it was easy: create files, include them with HTML. Can’t do that in Node.

But it was a simple matter of learning that single process, and after that I was able to port the entire concept into a project.

Whenever I learn a new language, or a new system like Node.js, I do the requisite “hello world” examples, but then I say, “Ok, I’m going to spend some time and put together a project.” I do this so I can feel it, I can feel how the code works top to bottom, and see the errors, handle the errors, and find the workarounds that I’m going to need when I tackle it in a production environment.

I own an IRC server on the Jaundies network. It’s sphinx.jaundies.com; you can connect unsecured on port 6667 or secured on port 7001. It uses a self-signed certificate, so you might need to approve it. I tell you this to lead to this: in one of the many channels I sit in on Jaundies, there are two other developers who have been toying with IRC Bots, and practicing their development skills. So when I hit Node.js, the first idea I had was to try to build an IRC bot. It was perfect for two reasons. First, it was something I’ve done before, and I understand the concepts. And second, it’s something I couldn’t do easily in JavaScript in the browser. It’d show me the power behind Node.

So I did it, it took me about an hour, and I had a very basic Node IRC Bot running. It did nothing but greet itself when it hit the room, but it was able to join and stay alive without issue. I did this by using the Node IRC package.

But, I wanted more. That was simple, and it made me really want to expand my knowledge.

Since I was 13 or so I’ve been in love with the concept of artificial intelligence, specifically neural networks. I’ve had this idea for a while to learn how to build a proper neural network, but I had yet to find a neural network framework that allowed it. FANN (Fast Artificial Neural Network) is a C neural network framework that has many ports, but none had been put together and documented well enough for me to learn from.

Then I found Node FANN. Node FANN was a GitHub project I found that wasn’t documented at all, but the code was readable, so I was able to figure it out.

So I ended up spending the weekend adding a neural network to this IRC bot. What it does now is join a channel and build the nick list. It then randomly selects someone in the channel and watches what everyone writes. If the person it is watching speaks, it goes into training mode.

What it’s doing is using up to the 5 latest messages from other people as inputs, and training the neural network to output whatever the person the bot is watching would output.

Is this a great idea for a neural network? Hell no, it’s a horrible idea. But it was fun to write, it taught me how to convert letters to numbers without using a key map, and I was able to learn a bit about how Node works. I’d say I know Node enough now to be comfortable working with it in my day job. That’s how simple it is.
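To give a feel for what “letters to numbers without a key map” can look like, here is a rough Python sketch of the input side. It is not the bot’s actual code (that is written for Node), and the details are assumptions: it scales raw UTF-8 byte values into the 0 to 1 range, and the BYTES_PER_MESSAGE width and function names are made up for the example. Only the limit of 5 messages comes from the description above.

```python
from collections import deque
from typing import List

MAX_MESSAGES = 5         # from the post: up to the 5 latest messages
BYTES_PER_MESSAGE = 16   # hypothetical fixed width per message


def encode_message(text: str) -> List[float]:
    """Turn text into fixed-width floats in [0, 1] using raw UTF-8 byte values."""
    raw = text.encode("utf-8")[:BYTES_PER_MESSAGE]
    # Pad short messages with zero bytes so every message has the same width.
    padded = raw.ljust(BYTES_PER_MESSAGE, b"\x00")
    return [byte / 255 for byte in padded]


def build_inputs(history: deque) -> List[float]:
    """Concatenate the encoded messages, zero-filling unused slots at the front."""
    empty_slots = MAX_MESSAGES - len(history)
    inputs = [0.0] * (empty_slots * BYTES_PER_MESSAGE)
    for message in history:
        inputs.extend(encode_message(message))
    return inputs


# A deque with maxlen drops the oldest message automatically.
history = deque(maxlen=MAX_MESSAGES)
for line in ["hi", "anyone around?", "yep"]:
    history.append(line)

inputs = build_inputs(history)
```

The network always sees the same number of inputs (5 x 16 here), no matter how many messages have actually been said, which is what a fixed-size input layer needs.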

So why write this post? No reason. It’s been a while since I blogged, and I wanted to spread the love about Node.js that I’m feeling. I’m still a PHP guy at heart, but I’m adding Node.js as a permanent tool in my toolbox.

If you’d like to play around with the bot, or if you have any ideas about how to make the Neural Network work better than it does, feel free to fork my repo on GitHub. I’ll keep playing with it, and adding things here and there.

Why It’s Important to do Unpaid Open Source as a Paid Developer

Programming is an art. And just like any other art, it can be distilled down into easily digested parts that you can profit from. The problem is that by putting this profit layer on your art, you’re psychologically more likely to do less. This was shown in a study done by Desmond Morris in 1962, and witnessed by anyone who’s ever gone from doing code because they loved the challenge, to doing it for a paycheck. The luster of the quest for control loses its sheen, and suddenly you’re just gathering tickets and knocking out patches.

That’s why I say it’s important to add that layer of unpaid open source into your routine. And going off of the study I linked to above, the earlier the better. The chimps in the study permanently stopped enjoying their artwork as much as they had before being paid for it. If you can trick yourself into not looking at the paycheck as the product of your code — or artwork — and instead look at it as simply paying you to show up, you’re a step ahead.

This is easier if you’re salaried. If you have the freedom to not go into work one day, and it won’t affect your paycheck, then you’re effectively not getting paid to do your artwork. You’re just getting paid to be.

But the easiest way, by far, to trick your mind into working better is by doing your artwork for free. Find an open source project that you love, open the tickets, and fix one of them. It’s that simple. Submit your patch, and then walk away. Don’t stick around and wait for the merge, don’t refresh the screen until you’re single-handedly DoSing GitHub. Just walk away. Check your email later, or just go find another ticket, and keep doing it. Do it for the love of the code, for the search for the answer, and for the feeling you get when you hit submit.

Just don’t do it for money.

Thank you, Netflix, for telling me I have a problem…

I absolutely love the fact that Netflix has most of the shows that I love to watch in its streaming catalog. The technology itself is a display of engineering genius, being able to deliver high quality content to my Xbox without lag or loss of quality. That said, in their efforts to expand and improve their customer service, they’ve just gone too far.

I like series. I like watching them. I like watching the stories unfold over multiple episodes. And I binge, I binge on entire seasons at a time. This was never a problem before. Now it is. Because now, it seems, every 3 or 4 episodes, it pops up a message asking me if I’m still watching.

This isn’t an issue on the website, but on the Xbox, it’s a pain in the ass. I’m watching TV, watching a series, I get involved in it, get comfortable, and want to just relax. The controller is across the couch, and I’m relaxed, cuddled up under a blanket with a drink and my wife. And then that damned box pops up.

No, Netflix…I’m not watching…that’s why you’re still on.

Normalization – Use It

So I might just be a bit of an overzealous code nerd, but whenever I download a snippet of code, I always analyze it, and try to make sure it’s up to snuff. So it was no surprise that I did this when I was looking at an email newsletter plugin that I was thinking about using. The problem this time was I didn’t even get to the code before stopping.

Wrong format

  1. admin@gmail.com,admin1@gmail.com,     (Comma at the end)
  2. admin@gmail.com,,admin1@gmail.com     (Two comma)

Why is this bad? It’s called Normalization, people…

…use it. Being the devoted coder that I am, I managed to figure out the solution to this dilemma in fewer characters than it took to type that instruction to the end user.

The fix uses trim to strip away any leading or trailing commas (the first incorrect format) and then uses str_replace to replace any doubled-up commas with a single comma (the second incorrect format). Simple, efficient, and a single line of code. It can be run when you validate that you actually have a value in that field… I mean…you do validate that, correct?
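For illustration, here is the same idea as a small Python sketch. It is not the original one-liner: Python’s str.strip and re.sub stand in for PHP’s trim and str_replace, and the function name normalize_recipients is made up for the example. One deliberate difference is the regular expression, which collapses any run of commas, since a single pass that replaces doubled commas would turn three commas in a row into two rather than one.

```python
import re


def normalize_recipients(raw: str) -> str:
    """Collapse repeated commas and strip stray commas/whitespace from the ends."""
    # Drop whitespace around each comma so "a@x.com , b@x.com" is handled too.
    compact = re.sub(r"\s*,\s*", ",", raw.strip())
    # Collapse any run of commas (",,", ",,,", ...) into a single comma.
    collapsed = re.sub(r",+", ",", compact)
    # Remove leading or trailing commas.
    return collapsed.strip(",")


# The two "wrong format" examples from the plugin's instructions:
trailing = normalize_recipients("admin@gmail.com,admin1@gmail.com,")
doubled = normalize_recipients("admin@gmail.com,,admin1@gmail.com")
```

Both inputs come out as the same clean list, so the user never has to be told about commas at all.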

Why is normalization so important?

To put it simply, the end user doesn’t always know what they’re doing. To make a truly usable product with a wide appeal, you need to normalize the input so you can reasonably predict what’s going to be there. In the example above, the developer ignores this aspect of development, and instead puts the responsibility of input sanitization on the end user. This isn’t good design, and it’s also not good customer service.

The end user doesn’t care what’s entered into the box, as long as what comes out on the other side is correct. Good software design should be able to interpret user input in such a way that it can say, “Hey, you probably didn’t mean to do that, so let me fix that for you. It’s okay, no big deal, it’s what I’m here for.”

Get It? Normalize?
