RailsConf 2018 Day 2

2018-04-18

I visited RailsConf and Pittsburgh this week, and I wanted to post my notes from the talks I attended. Here are the notes and takeaways from day two.

Keynote: The Future of Rails 6: Scalable by Default

Speaker: Eileen M. Uchitelle

Notes

Rails is scalable out of the box up to a point. Once an application gets to GitHub size, though, there are workarounds and optimizations that most teams end up making. These optimizations are going to be built into Rails 6.

Some of the Rails 6 scalable-by-default improvements:

Multiple database support with a three-tier database.yml
Faster test suite
Parallel testing out of the box.
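To make the "three-tier" idea concrete, here is a minimal sketch of what such a config could look like, parsed with plain Ruby so the nesting is easy to see. The database names, hosts, and the `databases_by_role` helper are my own assumptions for illustration, not something shown in the keynote.

```ruby
require "yaml"

# Hypothetical three-tier config: environment -> database name -> settings.
# Database names and hosts are made up for illustration.
DATABASE_YML = <<~YAML
  production:
    primary:
      adapter: postgresql
      database: app_production
      host: primary.db.example.internal
    primary_replica:
      adapter: postgresql
      database: app_production
      host: replica.db.example.internal
      replica: true
YAML

# Split one environment's databases into writers and read replicas.
def databases_by_role(config, env)
  databases = config.fetch(env)
  replicas, writers = databases.keys.partition { |name| databases[name]["replica"] }
  { writers: writers, replicas: replicas }
end

roles = databases_by_role(YAML.safe_load(DATABASE_YML), "production")
puts "writers:  #{roles[:writers].join(', ')}"
puts "replicas: #{roles[:replicas].join(', ')}"
```

The extra tier is the named database under each environment, which is what lets one app describe several connections side by side.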

When companies try to solve problems they often look only inward. Eileen encouraged developers to think about building general solutions and open sourcing them to help the community. Scalability problems affect a lot of people. This encouraged me to restart work on open sourcing a project that addresses one scaling issue.

Takeaways

Some cool new features are coming in Rails 6 that should help tests run faster and make databases easier to work with.
When extending Rails to support your scale, take a look at what you are building and try to make it generic and open sourced.

So You’ve Got Yourself a Kafka: Event-Powered Rails Services

Speaker: Stella Cotton

Notes

Kafka is used to stream data for data pipelines and event-driven applications. Kafka guarantees at-least-once delivery within partitions. Think of a Kafka partition as a long log file with indexes and guaranteed ordering; a Kafka cluster is made up of many partitions. Applications will have multiple partitions, but you should put related events on the same partition so they stay in order. Compared with RPC for distributed event processing, Kafka improves speed and independence, but it does remove the explicit dependencies between services.
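Here is a rough sketch of how related events can end up on the same partition: hash a key and take it modulo the partition count. CRC32 is only a stand-in I picked for illustration (real Kafka clients use their own partitioner), and the event keys and `partition_for` helper are hypothetical.

```ruby
require "zlib"

# Assumption: CRC32 stands in for whatever hash the real client library uses.
# The same key always maps to the same partition, so its events stay ordered.
def partition_for(key, partition_count)
  Zlib.crc32(key.to_s) % partition_count
end

events = [
  { key: "order-42", type: "order.created" },
  { key: "order-7",  type: "order.created" },
  { key: "order-42", type: "order.paid" },
]

events.each do |event|
  partition = partition_for(event[:key], 6)
  puts "#{event[:type]} for #{event[:key]} -> partition #{partition}"
end
```

Both events for order-42 land on one partition, so a consumer sees "created" before "paid". Events for different orders may be on different partitions and have no ordering guarantee relative to each other.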

Martin Fowler defines four types of event systems in his blog post: What do you mean by “Event-Driven”?

Event created: Send only an event with an ID. Downstream services call the sender if they need more information. (Fowler calls this Event Notification.)
Event + Information: The event carries an ID plus the state-change information. There is no reliance on calling the sender; a downstream service has all it needs. (Fowler calls this Event-Carried State Transfer.)
Event Sourcing: All state changes are emitted as events, and you can rebuild the state of the world by replaying all events.
Command Query Responsibility Segregation (CQRS): Split reads and writes into separate models. Good reference here: CQRS
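Event sourcing was the one that clicked best for me once I pictured it as a fold over the event log. Below is a minimal sketch assuming a made-up account with `:deposited` and `:withdrawn` events. The `Event` struct and `replay` function are hypothetical names, not from the talk.

```ruby
# Hypothetical event shape for a simple account balance.
Event = Struct.new(:type, :amount)

# Rebuild current state by replaying every event in order from the start.
def replay(events, initial_balance = 0)
  events.reduce(initial_balance) do |balance, event|
    case event.type
    when :deposited then balance + event.amount
    when :withdrawn then balance - event.amount
    else balance # unknown event types are ignored in this sketch
    end
  end
end

log = [
  Event.new(:deposited, 100),
  Event.new(:withdrawn, 30),
  Event.new(:deposited, 5),
]

puts "balance after replay: #{replay(log)}"
```

Nothing stores the balance itself. The log is the source of truth and the balance is derived from it, which is why replaying from the beginning always gets you back to the same state.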

Stella suggested using Avro for Kafka schemas. Scaling can be as easy as adding more consumers, but you need metrics on latency. Consumers that are slower than the incoming events, or that were paused for a while, may have a hard time catching up.
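One metric that captures "can the consumer catch up" is consumer lag: the newest offset in each partition minus the offset the consumer has committed. Here is a small sketch with made-up offsets. In a real system these numbers would come from the broker and the consumer group, and the helper names here are my own.

```ruby
# Lag per partition = newest offset in the log minus the consumer's committed offset.
# Partitions with no committed offset are treated as starting from 0.
def consumer_lag(end_offsets, committed_offsets)
  end_offsets.map do |partition, end_offset|
    [partition, end_offset - committed_offsets.fetch(partition, 0)]
  end.to_h
end

# A consumer is falling behind when total lag grows from one sample to the next.
def falling_behind?(previous_lag, current_lag)
  current_lag.values.sum > previous_lag.values.sum
end

# Hypothetical offsets sampled at two points in time.
earlier = consumer_lag({ 0 => 1_000, 1 => 900 }, { 0 => 990, 1 => 900 })
later   = consumer_lag({ 0 => 1_500, 1 => 950 }, { 0 => 1_100, 1 => 940 })

puts "earlier lag: #{earlier}"
puts "later lag:   #{later}"
puts "falling behind: #{falling_behind?(earlier, later)}"
```

A lag that keeps growing between samples is the signal to add consumers or look for a slow handler.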

Takeaways

Kafka can help scale applications but be careful about hidden dependencies or coupling.
Add metrics, and make sure consumers keep up with producers and that failures can be recovered from.

Postgres 10, Performance, and You

Speaker: Gabe Enslein

Gabe is on the Heroku Postgres team and summarized some of the cool things coming to Postgres 10.

Notes

Native partitioning in the default installation.
Tables can be created from partitions
Hash indexes are now first-class citizens and don't have corruption issues. This is big! When asked who uses hash indexes, no one raised their hand. We don't because of the WAL logging warning and limited support.
Much better parallel query support. Gather is now ordered, which means parallel scans and merge joins can have natural ordering.

Takeaways

Partition support and better parallelization of queries are coming.
Rails 4 doesn't support any of the new features, and no native Ruby gems support the partitioning or new Postgres-specific indexes yet.

Five Sharding Data Models and Which is Right

Speaker: Craig Kerstiens

Notes

What is sharding? Sharding is separating a large database into smaller, faster databases. Putting tables on different nodes allows performance gains. Tips for sharding:

Shard on a hash or a range, but still hash the ranges so that they are distributed across nodes. This helps prevent clustering.
Define the number of shards up front and go larger than you expect to ever go. This way you can add new nodes without needing to reshard.
You can also bucket time series data into tables, which makes it easy to drop old data.
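The "define the shard count up front" tip made more sense to me as two separate mappings: key to logical shard (never changes) and logical shard to node (changes when you add hardware). This is a minimal sketch under my own assumptions: 64 shards, CRC32 as the hash, and round-robin placement are all arbitrary choices for illustration.

```ruby
require "zlib"

SHARD_COUNT = 64 # fixed up front, deliberately larger than the expected node count

# Key -> logical shard. Depends only on the key, never on the number of nodes.
def shard_for(tenant_id)
  Zlib.crc32(tenant_id.to_s) % SHARD_COUNT
end

# Logical shard -> node. Round-robin here, but any lookup table would do.
def shard_map(nodes)
  (0...SHARD_COUNT).map { |shard| [shard, nodes[shard % nodes.size]] }.to_h
end

two_nodes  = shard_map(%w[node-a node-b])
four_nodes = shard_map(%w[node-a node-b node-c node-d])

shard = shard_for("tenant-123")
puts "tenant-123 is always in shard #{shard}"
puts "with 2 nodes it is on #{two_nodes[shard]}, with 4 nodes on #{four_nodes[shard]}"
```

Adding nodes only means moving whole shards and updating the shard-to-node table. No row ever needs to be rehashed, which is the resharding the tip is trying to avoid.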

There are five general ways to colocate data:

Geography: For when there are clear geographic boundaries. Examples: Uber, Instacart.
Multitenant: Each customer has their own shard. Will not work well if one customer takes up a disproportionate amount of the database; >10% may mean sharding doesn't help much.
Entity ID: Shard on an ID if no joins are needed. Best for aggregations.
Graph Database: Shard on a few relation types and replicate duplicate data. Check out the paper on TAO, Facebook's distributed data store for the social graph.
Time Series: Event data and metrics can be sharded by time period. Works best when dropping older data.
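For the time series model, a sketch of monthly bucketing might look like the following. The `events_YYYY_MM` naming scheme, the monthly granularity, and both helper names are assumptions I made up to illustrate the idea of dropping whole buckets instead of deleting rows.

```ruby
require "date"

# Name of the bucket (table) a timestamp belongs to, e.g. events_2018_04.
def bucket_table(timestamp, prefix = "events")
  format("%s_%04d_%02d", prefix, timestamp.year, timestamp.month)
end

# Buckets older than the cutoff month can be dropped wholesale.
# Zero-padded names sort chronologically, so string comparison is enough.
def expired_tables(today, tables, keep_months:)
  cutoff = bucket_table(today << keep_months) # Date#<< moves back by N months
  tables.select { |table| table < cutoff }
end

today  = Date.new(2018, 4, 18)
tables = %w[events_2017_12 events_2018_01 events_2018_02 events_2018_03 events_2018_04]

puts "writes today go to #{bucket_table(today)}"
puts "safe to drop: #{expired_tables(today, tables, keep_months: 3).join(', ')}"
```

With `keep_months: 3` this keeps the current month plus the three before it, and anything older is a candidate for dropping.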

Takeaways

Not sharding is easier, so try to wait, but when you commit, plan for the big scale.
When picking a sharding strategy, use process of elimination instead of guessing which way would work best.

Ales on Rails: Making a Smarter Brewery with Ruby

Speaker: Ben Shippee

This was more of a fun talk without many takeaways, but it was great to see someone building a cool homebuilt system for managing a brewery.

Notes

The Rails application was custom built for Brew Gentlemen. The goal was to make managing the brewery easier. Features they built to automate the brewery:

What's on tap and what's kicked, with easy management for workers.
PDF menu and label generators.
QR Code inventory management system.
Special releases reservations.
Lots more!

Takeaways

Sounds like a fun project and I definitely would have fun doing something similar.

Containerizing Rails: Techniques, Pitfalls, & Best Practices

Speaker: Daniel Azuma

Blog post from speaker of the talk here: Containerizing Rails: Techniques, Pitfalls, and Best Practices (RailsConf 2018)

Notes

Tips for containerizing your application:

Read and understand the base image.
Combine update, install, and clean commands in one RUN line to prevent image bloat.
Use multi-stage Dockerfiles to have an image for building dependencies, and then copy over the built app without development dependencies.
Set locale in the Dockerfile to potentially avoid some weird Ruby string errors.
Still run your app under an unprivileged user, even inside a container!
Prefer the exec form: CMD ["bundle", "exec", "rails", "s"]. This ensures that the stop signals are sent to the program and not the shell.
If you need the shell form, you can get the same effect by prefacing the command with exec.
Avoid using ONBUILD because it makes assumptions about how the image is used.
Always specify resource constraints to help Kubernetes plan workloads.
Avoid preforking in a container, instead have one process per container.
Scale by adding containers.
Send logs outside the container, either an agent or standard out.
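For the logging tip, the simplest version on the Ruby side is to write to standard out and let the container runtime collect it. Here is a minimal sketch using Ruby's standard Logger. The `build_logger` helper and the line format are my own choices, not something from the talk.

```ruby
require "logger"
require "time"

# Build a logger that writes to the given IO, defaulting to standard out
# so the container runtime (or a log agent) can pick the lines up.
def build_logger(io = $stdout)
  io.sync = true if io.respond_to?(:sync=) # flush each line instead of buffering
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, message|
    "#{time.utc.iso8601} #{severity} #{message}\n"
  end
  logger
end

build_logger.info("listening on port 3000")
```

Turning off buffering matters in a container because buffered lines can be lost if the process is killed before they are flushed.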

Takeaways

Keep your containers small, with defined workloads, and constrained within limits.
Reduce image size by using build images and running multiple commands in one line.