One of the main advantages of building a WebSocket-enabled web application with Phoenix is how "easy" it is for Erlang to form a cluster.
For starters, Erlang does not need multiple OS processes like Ruby (which is limited to one connection per process, or per thread if you're using a threaded server such as Puma). A single Erlang VM instance can take over the entire machine if you need it to. Internally it keeps one real thread per machine core, and each thread runs its own scheduler to manage as many micro-processes as you need. You can read all about it in my post titled "Yocto Services".
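You can see both points from any `iex` session (a quick illustration, nothing Phoenix-specific):

```elixir
# One scheduler thread per core by default:
System.schedulers_online()

# Micro-processes are cheap; spawning 100k of them is routine:
for _ <- 1..100_000, do: spawn(fn -> :ok end)
```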
Moreover, Erlang has built-in capabilities to form a cluster, where each Erlang instance acts as a peer-to-peer Node, without the need for a centralized coordinator. You can read all about it in my post about Nodes. The power of Erlang is in how "easy" it is to form reliable distributed systems.
You can fire up many Phoenix instances, and from any one of them you can broadcast messages to users subscribed to Channels even if their sockets are connected to different instances. It's seamless, and you don't need to do anything special in your code. Phoenix, Elixir and Erlang are doing all the heavy lifting for you behind the scenes.
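Concretely, a broadcast issued on any node reaches the sockets connected to every node (a hedged sketch; `YourAppName.Endpoint` and the topic are placeholder names):

```elixir
# Runs on whichever node handled the request; Phoenix.PubSub relays the
# message to "rooms:lobby" subscribers on all connected nodes:
YourAppName.Endpoint.broadcast("rooms:lobby", "new_msg", %{body: "hello"})
```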
No Heroku for You :-(
Because you want to take advantage of this scalability and high availability (in our small example, a real-time chat system), you will need more control over your infrastructure. This requirement rules out the majority of Platform as a Service (PaaS) offerings out there, such as Heroku. Heroku's model revolves around single, volatile processes in isolated containers. Those jailed processes (dynos) are not aware of other processes or of the internal networking, so you can't fire up dynos and have them form a cluster: they won't be able to find each other.
If you already know how to configure the Linux-related stuff (PostgreSQL, HAProxy, etc.), skip ahead to the Phoenix-specific section.
IaaS (DigitalOcean) to the rescue!
You want long-lived processes on network-reachable servers (through private networking, a VPN, or the plain - insecure! - public network).
In this example I want to walk you through a very simple deployment using DigitalOcean (you can choose any IaaS, such as AWS, Google Cloud, Azure or whatever you feel more comfortable with).
I have created 4 droplets (all using the smallest size, with 512MB of RAM):
- 1 PostgreSQL database (single point of failure: it's not the focus of this article to build a highly available, replicated database setup);
- 1 HAProxy server (single point of failure: again, it's not the focus to create a highly available load balancing scheme);
- 2 Phoenix servers - one in the NYC datacenter and another in the London datacenter, to demonstrate how easy it is for Erlang to form clusters even with geographically separated boxes.
Basic Ubuntu 16.04 configuration
- Goals: configure the locale, ensure unattended upgrades are enabled, upgrade packages, and install and configure Elixir and Node.
- You should also: change SSH to another port, install [fail2ban](https://www.digitalocean.com/community/tutorials/how-to-protect-ssh-with-fail2ban-on-ubuntu-14-04), and disallow login through passwords.
You will want to read my post about configuring Ubuntu 16.04. To summarize, start by configuring proper UTF-8:
```shell
sudo locale-gen "en_US.UTF-8"
sudo dpkg-reconfigure locales
sudo update-locale LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
echo 'LC_ALL=en_US.UTF-8' | sudo tee -a /etc/environment
echo 'LANG=en_US.UTF-8' | sudo tee -a /etc/environment
```
Make sure you add a proper user to the sudo group and from now on do not use the root user. I will create a user named pusher (I will explain why in another post). You should create a username that suits your application.
```shell
adduser pusher
usermod -aG sudo pusher
```
Now log out and log in again as this user: `ssh pusher@server-ip-address`. If you're on a Mac, copy the public key of your SSH certificate like this:
```shell
ssh-copy-id -i ~/.ssh/id_ed25519.pub pusher@server-ip-address
```
It creates the `.ssh/authorized_keys` file if it doesn't exist, sets the correct permission bits and appends your public key. You can do it manually, of course.
DigitalOcean's droplets start without a swap file and I'd recommend adding one, especially if you want to start with the smaller boxes with less than 1GB of RAM:
```shell
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
sudo cp /etc/fstab /etc/fstab.bak
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
sudo sysctl vm.swappiness=10
sudo sysctl vm.vfs_cache_pressure=50
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
echo 'vm.vfs_cache_pressure=50' | sudo tee -a /etc/sysctl.conf
```
Make sure you have unattended upgrades configured. You will want at least to have security updates automatically installed when available.
```shell
sudo apt install unattended-upgrades
```
Now, let's install Elixir and Node (Phoenix needs Node.js):
```shell
wget https://packages.erlang-solutions.com/erlang-solutions_1.0_all.deb && sudo dpkg -i erlang-solutions_1.0_all.deb
curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential nodejs esl-erlang elixir erlang-eunit erlang-base-hipe
sudo npm install -g brunch
mix local.hex
mix archive.install https://github.com/phoenixframework/archives/raw/master/phoenix_new.ez # optional: install if you want to manually test Phoenix on this box
```
Now you have an Elixir-capable machine. Create an image snapshot on DigitalOcean, copy it to the regions where you want your other droplets, and use this image to create as many droplets as you need.
For this example, I created a second droplet in the London region, a third droplet for PostgreSQL in the NYC1 region and a fourth droplet in the NYC3 region for HAProxy.
I will refer to their public IP addresses as "nyc-ip-address", "lon-ip-address", "pg-ip-address", and "ha-ip-address".
Basic PostgreSQL configuration
- Goal: basic configuration of PostgreSQL to allow the Phoenix servers to connect.
- To do: create a secondary role just to connect to the application database, and another superuser role to create the database and migrate the schema. Also lock down the machine and configure SSH tunnels or some other secure channel (at the very least, private networking) rather than allowing plain TCP connections to port 5432 from the public Internet.
Now you can connect with `ssh pusher@pg-ip-address` and follow this:
```shell
sudo apt-get install postgresql postgresql-contrib
```
You should create a new role with the same name as the user you added above ("pusher" in our example):
```shell
$ sudo -u postgres createuser --interactive
Enter name of role to add: pusher
Shall the new role be a superuser? (y/n) y

$ sudo -u postgres createdb pusher
```
PostgreSQL expects to find a database with the same name as the role, and the role should have the same name as the Linux user. Now you can use `psql` to set a password for this new role:
```shell
$ sudo -u postgres psql
\password pusher
```
Register a secure password, take note and let's move on.
PostgreSQL comes locked down to external connections. One way to connect from the outside is to configure your servers to create an SSH tunnel to the database server and keep external TCP connections to port 5432 blocked.
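If you go the tunnel route, the sketch below shows the shape of it (hypothetical: the `pusher` user and `pg-ip-address` placeholder come from this example's setup; with the tunnel up, Ecto's `hostname` becomes `"localhost"`):

```shell
# Each Phoenix server would keep a background tunnel to the database droplet;
# -f backgrounds ssh, -N skips running a remote command, -L forwards the port:
TUNNEL_CMD='ssh -f -N -L 5432:localhost:5432 pusher@pg-ip-address'
echo "$TUNNEL_CMD"    # run this on each Phoenix server
```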
But for this example, we will just allow connections from the public Internet to the 5432 TCP port. Warning: this is VERY insecure!
Edit `/etc/postgresql/9.5/main/postgresql.conf`, find the `listen_addresses` configuration line, and open it up:
```
listen_addresses = '*'    # what IP address(es) to listen on;
```
This should bind the server to the TCP port on all interfaces. Now edit `/etc/postgresql/9.5/main/pg_hba.conf` so that the end looks like this:
```
# IPv4 local connections:
host  all  all  127.0.0.1/32                       trust
host  all  all  your-local-machine-ip-address/24   trust
host  all  all  nyc-ip-address/24                  trust
host  all  all  lon-ip-address/24                  trust
# IPv6 local connections:
host  all  all  ::1/128                            trust
```
Save the configuration file and restart the server:
```shell
sudo service postgresql restart
```
See what I did there? I only allowed connections coming from the public IPs of the Phoenix servers. This does not make the server secure, just a little bit less vulnerable. If you're behind a DHCP/NAT-based network, just Google for "what's my IP" to see your public-facing IP address - which is probably shared by many other users; remember you're allowing connections from an insecure IP to your database server! Once you've made your initial tests and created your new schema, you should remove that `your-local-machine-ip-address/24` line from the configuration.
From your Phoenix application, you can edit your local `config/prod.secret.exs` file to look like this:
```elixir
# Configure your database
config :your_app_name, YourAppName.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "pusher",
  password: "your-super-secure-pg-password",
  database: "your-app-database-name",
  hostname: "pg-ip-address",
  pool_size: 20
```
Replace the information for your server and database and now you can test it out like this:
```shell
MIX_ENV=prod iex -S mix phoenix.server
```
If you see an `:econnrefused` message from Postgrex, you're in trouble: re-check your configuration, restart the server and try again. If everything connects, you can run `MIX_ENV=prod mix do ecto.create, ecto.migrate` to prepare your database.
Finally, you will want to lock down the rest of your server with UFW, at the very least. UFW should come pre-installed in Ubuntu 16, so you can just do:
```shell
sudo ufw allow 5432
sudo ufw allow ssh
sudo ufw enable
```
That's it. And again, this does not make your server secure, it just makes it less insecure. There is a huge difference!
And by the way, if you're a Docker fan:
DO NOT INSTALL A DATABASE INSIDE A DOCKER CONTAINER!
You have been warned!
Basic HAProxy configuration
- Goals: Provide a simple solution to load balance between our 2 Phoenix servers.
- To do: there is something wrong with session handling, as sometimes I have to refresh my browser to avoid being sent back to the login form in my application. Phoenix uses cookie-based sessions, so I don't think it is losing sessions.
Now let's `ssh pusher@ha-ip-address`. This one is easy; let's just install HAProxy:
```shell
sudo apt-get install haproxy
```
Edit `/etc/haproxy/haproxy.cfg`:
```
...
listen your-app-name
  bind 0.0.0.0:80
  mode http
  stats enable
  stats uri /haproxy?stats
  stats realm Strictly\ Private
  stats auth admin:a-secure-password-for-admin
  option forwardfor
  option http-server-close
  option httpclose
  balance roundrobin
  server us nyc-ip-address:8080 check
  server uk lon-ip-address:8080 check
```
You can leave out the `stats` lines if you have other means of monitoring; otherwise, set a secure password for the `admin` user. One very important part is `option http-server-close`, as explained in this other blog post; without it you may have trouble with WebSockets.
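On the WebSockets note, one hedged addition (assuming HAProxy 1.5 or newer; check your version): once a connection is upgraded to a WebSocket it is governed by the tunnel timeout, which you may want to raise in the same section so long-lived sockets aren't dropped:

```
timeout tunnel 1h
```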
For some reason I am having trouble with my application after I log in and it sets the session: sometimes I have to refresh to be sent to the correct page. I am not sure why yet, but I believe it's something in the HAProxy configuration. If anyone knows what it is, let me know in the comments section below.
Now you can restart the server and enable UFW as well:
```shell
sudo service haproxy restart
sudo ufw allow http
sudo ufw allow https
sudo ufw allow ssh
sudo ufw enable
```
Finally, I will assume you have a DNS server/service somewhere where you can register the IP of this HAProxy server as an A record, so you can access it by a full name such as "your-app-name.mydomain.com".
Basic Phoenix Configuration
- Goal: configure the Phoenix app to be deployable. Configure the servers to have the necessary configuration files.
- To do: find out a way to cut down the super slow deployment times.
Finally, we have almost everything in place.
I will assume that you have a working Phoenix application already in place; otherwise, create one from one of the many tutorials out there.
I have assembled this information from posts such as this very helpful one from Pivotal about an AWS-based deployment. In summary, you must make a number of changes to your configuration.
When you're developing your application, you will notice that whenever you run it, it delta-compiles only what changed. The binary bits end up in `_build/dev` or `_build/test` in the form of `.beam` binaries (similar to what `.class` files are for Java).
Unlike Ruby, Python or PHP, you are not deploying source code to production servers. It's more akin to Java, where you must have everything compiled into binary bits and packaged into what's called a release - like a ".war" or ".ear" if you're from Java.
To create this package, people usually used "exrm", but it's being replaced by "distillery", so we will use the latter.
Then, if you're from Ruby you're familiar with Capistrano. Or if you're from Python, you know Capistrano's clone, Fabric. Elixir has a similar tool (much simpler at this point), called "edeliver". It's your basic SSH automation tool.
You add them to `mix.exs` just like any other dependency:
```elixir
...
def application do
  [mod: {ExPusherLite, []},
   applications: [..., :edeliver]]
end

defp deps do
  [..., {:edeliver, "~> 1.4.0"}, {:distillery, "~> 1.0"}]
end
...
```
From the Pivotal blog post, the important thing not to forget is to edit this part of the `config/prod.exs` file:
```elixir
http: [port: 8080],
url: [host: "your-app-name.yourdomain.com", port: 80],
...
config :phoenix, :serve_endpoints, true
```
You MUST hardcode the default port of the Phoenix web server and the allowed domain (remember the domain name you associated with your HAProxy server above? That one). And you MUST uncomment the `config :phoenix, :serve_endpoints, true` line!
For edeliver to work, you have to create a `.deliver/config` file like this:
```shell
# change this to your app name:
APP="your-app-name"

# change this to your own servers' IPs and add as many as you want
US="nyc-ip-address"
UK="lon-ip-address"

# the user you created on your Ubuntu machines above
USER="pusher"

# on which server do you want to build the first release?
BUILD_HOST=$US
BUILD_USER=$USER
BUILD_AT="/tmp/edeliver/$APP/builds"

# list the production servers declared above:
PRODUCTION_HOSTS="$US $UK"
PRODUCTION_USER=$USER
DELIVER_TO="/home/$USER"

# do not change here
LINK_VM_ARGS="/home/$USER/vm.args"

# For *Phoenix* projects, symlink prod.secret.exs to our tmp source
pre_erlang_get_and_update_deps() {
  local _prod_secret_path="/home/$USER/prod.secret.exs"
  if [ "$TARGET_MIX_ENV" = "prod" ]; then
    __sync_remote "
      ln -sfn '$_prod_secret_path' '$BUILD_AT/config/prod.secret.exs'

      cd '$BUILD_AT'
      mkdir -p priv/static

      mix deps.get
      npm install
      brunch build --production

      APP='$APP' MIX_ENV='$TARGET_MIX_ENV' $MIX_CMD phoenix.digest $SILENCE
    "
  fi
}
```
Remember the information we've been gathering since the beginning of this long recipe? These are the options you MUST change to your own. Just follow the comments in the file above and add it to your Git repository. By the way, your project is in a proper Git repository, RIGHT??
If you like to use passphrase-protected SSH private keys, deployment is going to be a huge pain: for each command, edeliver issues an SSH command that will keep asking for your passphrase, a dozen times through the whole process. You've been warned! If you still don't mind that, and you're on a Mac, you will have extra trouble because the Terminal will not be able to create a prompt for you to input your passphrase. You must create a `/usr/local/bin/ssh-askpass` script:
```shell
#!/bin/bash

# Script: ssh-askpass
# Author: Mark Carver
# Created: 2011-09-14
# Licensed under GPL 3.0
# A ssh-askpass command for Mac OS X
# Based from author: Joseph Mocker, Sun Microsystems
# http://blogs.oracle.com/mock/entry/and_now_chicken_of_the

# To use this script:
#   Install this script running INSTALL as root
#
# If you plan on manually installing this script, please note that you will have
# to set the following variable for SSH to recognize where the script is located:
#   export SSH_ASKPASS="/path/to/ssh-askpass"

TITLE="${SSH_ASKPASS_TITLE:-SSH}";
TEXT="$(whoami)'s password:";
IFS=$(printf "\n");
CODE=("on GetCurrentApp()");
CODE=(${CODE[*]} "tell application \"System Events\" to get short name of first process whose frontmost is true");
CODE=(${CODE[*]} "end GetCurrentApp");
CODE=(${CODE[*]} "tell application GetCurrentApp()");
CODE=(${CODE[*]} "activate");
CODE=(${CODE[*]} "display dialog \"${@:-$TEXT}\" default answer \"\" with title \"${TITLE}\" with icon caution with hidden answer");
CODE=(${CODE[*]} "text returned of result");
CODE=(${CODE[*]} "end tell");
SCRIPT="/usr/bin/osascript"
for LINE in ${CODE[*]}; do
  SCRIPT="${SCRIPT} -e $(printf "%q" "${LINE}")";
done;
eval "${SCRIPT}";
```
Now do this:
```shell
sudo chmod +x /usr/local/bin/ssh-askpass
sudo ln -s /usr/local/bin/ssh-askpass /usr/X11R6/bin/ssh-askpass
```
Remember, this is for Macs only. And now every time you try to deploy, you will receive a number of graphical prompt windows asking for the SSH private key passphrase. It's freaking annoying! And you must have XQuartz installed, by the way.
Now you must manually create 3 files on all Phoenix servers. Start with `vm.args`:
```
-name us@nyc-ip-address
-setcookie @bCd&fG
-kernel inet_dist_listen_min 9100 inet_dist_listen_max 9155
-config /home/pusher/your-app-name.config
```
You must create this file on all Phoenix machines, by the way, changing the `-name` bit to the name you declared in the `.deliver/config` file. The `-setcookie` can be any value, as long as it's the same on all servers.
See the `-config /home/pusher/your-app-name.config`? Create that file with the following:
```erlang
[{kernel,
  [
    {sync_nodes_optional, ['uk@lon-ip-address']},
    {sync_nodes_timeout, 30000}
  ]}].
```
This is Erlang source code. On the NYC machine you must declare the London node name, and vice versa. If you have several machines, list all of them except the one you're on right now. Get it?
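For example, with the `us`/`uk` names declared above, the London machine's two files would mirror the NYC ones:

```
# /home/pusher/vm.args on the London machine
-name uk@lon-ip-address
-setcookie @bCd&fG
-kernel inet_dist_listen_min 9100 inet_dist_listen_max 9155
-config /home/pusher/your-app-name.config
```

```erlang
%% /home/pusher/your-app-name.config on the London machine
[{kernel,
  [
    {sync_nodes_optional, ['us@nyc-ip-address']},
    {sync_nodes_timeout, 30000}
  ]}].
```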
Finally, for the Phoenix app itself, you always have a `config/prod.secret.exs` that should never be `git add`ed to the repository, remember? This is where you put the PostgreSQL server information and the random secret key used to sign the session cookies:
```elixir
use Mix.Config

config :your_app_name, YourAppName.Endpoint,
  secret_key_base: "..."

# Configure your database
config :your_app_name, YourAppName.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "pusher",
  password: "your-super-secure-pg-password",
  database: "your-app-database-name",
  hostname: "pg-ip-address",
  pool_size: 20

# if you have Guardian, for example:
config :guardian, Guardian,
  secret_key: "..."
```
How do you create a new random secret key? From your development machine, just run `mix phoenix.gen.secret` and copy the generated string into the file above.
So now you must have those 3 files on each Phoenix server, in the `/home/pusher` home folder:
```
~/vm.args
~/prod.secret.exs
~/your-app-name.config
```
Finally, all set, you can issue this command:
```shell
$ mix edeliver update production --branch=master --start-deploy

--> Updating to revision d07eaea from branch master
--> Building the release for the update
--> Authorizing hosts
--> Ensuring hosts are ready to accept git pushes
--> Pushing new commits with git to: pusher@nyc-ip-address
--> Resetting remote hosts to d07eaea8bdbf08e2b2f30550d164d0cbc5eb45c7
--> Cleaning generated files from last build
--> Fetching / Updating dependencies
--> Compiling sources
--> Detecting exrm version
--> Generating release
--> Copying release 0.0.1 to local release store
--> Copying your-app-name.tar.gz to release store
--> Deploying version 0.0.1 to production hosts
--> Authorizing hosts
--> Uploading archive of release 0.0.1 from local release store
--> Extracting archive your-app-name_0.0.1.tar.gz
--> Starting deployed release
```
Now, this will take an absurdly long time to deploy. That's because it will git-clone the source code of your app, fetch all Elixir dependencies (every time!), compile everything, run the super slow `npm install` (every time!), brunch your assets, create the so-called "release", tar and gzip it, and finally SCP it to the other machines you configured.
In the `.deliver/config` file you set a `BUILD_HOST` option. This is the machine where this whole process takes place, so you will want at least this machine to be beefier than the others. As I am using small 512MB droplets, the process takes forever.
If you do everything right, the edeliver process finishes without any error and it leaves a daemon running in your server, like this:
```
/home/pusher/your-app-name/erts-8.2/bin/beam -- -root /home/pusher/your-app-name -progname home/pusher/your-app-name/releases/0.0.1/your-app-name.sh -- -home /home/pusher -- -boot /home/pusher/your-app-name/releases/0.0.1/your-app-name -config /home/pusher/your-app-name/running-config/sys.config -boot_var ERTS_LIB_DIR /home/pusher/your-app-name/erts-8.2/../lib -pa /home/pusher/your-app-name/lib/your-app-name-0.0.1/consolidated -name us@nyc-ip-address -setcookie ex-push&r-l!te -kernel inet_dist_listen_min 9100 inet_dist_listen_max 9155 -config /home/pusher/your-app-name.config -mode embedded -user Elixir.IEx.CLI -extra --no-halt +iex -- console
```
If you want to stop all those daemons, you can use edeliver and run:
```shell
mix edeliver stop production
```
And you can start them again with:
```shell
mix edeliver start production
```
If, unlike me, you're using the same operating system as the production machines, you can avoid building on the server: create the release locally and deploy the binaries directly:
```shell
mix edeliver build release --verbose
mix edeliver deploy release to production --verbose
mix edeliver start production --verbose
```
It will still take a long time, but it should be easier. So this is a pro tip for you, Linux users. Follow this Gist for more details; you must emulate what's run in the bottom half of the `.deliver/config` file.
Also notice that I ran the migrations manually, but you can do it using `mix edeliver migrate`.
Read their documentation for more commands and configurations.
Also, do not forget to enable UFW:
```shell
sudo ufw allow ssh
sudo ufw allow 8080
sudo ufw allow proto tcp from any to any port 9100:9155
sudo ufw default allow outgoing
sudo ufw enable
```
Debugging Production bugs
Right after I deployed, obviously it failed. And the problem is that in the `/home/pusher/your-app-name/log/erlang.log` files (they are automatically rotated, so you may find several files ending in a number) you won't see much.
What I recommend you do is change the `config/prod.exs` file ONLY on your development machine, setting the log level with `config :logger, level: :debug`, use the same `prod.secret.exs` you edited on the servers above, and run it locally with `MIX_ENV=prod iex -S mix phoenix.server`.
For example, in development mode I had a code in the controller that was checking the existence of an optional query string parameter like this:
```elixir
if params["some_parameter"] do
  ...
```
That was working fine in development but crashing in production, so I had to change it to:
```elixir
if Map.has_key?(params, "some_parameter") do
  ...
```
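I can't say for sure that this was the root cause, but note that the two checks are not equivalent: `params["some_parameter"]` is falsy when the key is present with a `nil` (or `false`) value, while `Map.has_key?/2` only cares about the key's presence:

```elixir
params = %{"some_parameter" => nil}

params["some_parameter"]               # nil, which is falsy
Map.has_key?(params, "some_parameter") # true
```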
Another thing was that Guardian was working normally in development, but in production I had to declare its application in the `mix.exs` like this:
```elixir
def application do
  [mod: {ExPusherLite, []},
   applications: [..., :guardian, :edeliver]]
end
```
I was getting `:econnrefused` errors because I forgot to run `MIX_ENV=prod mix do ecto.create, ecto.migrate` as I instructed above. Once I figured those out, my application was up and running at `http://your-app-name.yourdomain.com`; HAProxy was correctly forwarding to port 8080 on the servers and everything ran fine, including the WebSocket connections.
Conclusion
As I mentioned above, this kind of procedure makes me really miss an easy to deploy solution such as Heroku.
The only problem I am facing right now is that when I log in through Coherence's sign in page, I am not redirected to the correct URI I am trying ("/admin" in my case), sometimes reloading after sign in works, sometimes it doesn't. Sometimes I am inside a "/admin" page but when I click one of the links it sends me back to the sign in page even though I am already signed in. I am not sure if it's a bug in Coherence, ExAdmin, Phoenix itself or an HAProxy misconfiguration. I will update this post if I find out.
Edeliver also takes an obscene amount of time to deploy. Even waiting for Sprockets to process during a `git push heroku master` deploy feels way faster in comparison. And this is for a very bare-bones Phoenix app. Having to fetch and build everything every time (dependencies are vendored and compiled inside each project directory rather than shared globally) doesn't help, and neither does the super slow npm.
I still need to research if there are faster options, but for now what I have "works".
And more importantly, now I have a scalable cluster for real-time bi-directional WebSockets, which is the main reason one might want to use Phoenix in the first place.
If you want to build a "normal" website, keep it simple and do it in Rails, Django, Express or whatever is your web framework of choice. If you want real-time communications the easy way, I might have a better solution. Keep an eye on my blog for news to come soon! ;-)