Rails Conf 2018 Day 2
2018-04-18
I visited Rails Conf and Pittsburgh this week and I wanted to post my notes
from the talks I went to. Here are the notes and takeaways from day two.
Keynote: The Future of Rails 6: Scalable by Default
Speaker: Eileen M. Uchitelle
Notes
Rails is scalable out of the box up to a point. Once Rails gets to Github size
though there are some workarounds and optimizations most teams do. These
optimizations are going to be built in to Rails 6.
Some Rails 6 scalable by default improvements:
- Multiple database support with a 3 teir database.yml
- Faster test suite
- Parallel testing out of the box.
When companies try to solve problems they often only look inward. Eileen
encouraged developers to think about building general solutions and open
sourcing them to help the community. Scalablility problems face a lot of
people. This encouraged me to reinitiate open sourcing a project that addresses
one issue of scaling.
Takeaways
- Some cool new features are coming in Rails 6 which should help tests run faster
and databases to be easier to work with.
- When extending Rails to support your scale, take a look at what you are
building and try to make it generic and open sourced.
So You’ve Got Yourself a Kafka: Event-Powered Rails Services
Speaker: Stella Cotton
Notes
Kafka is used to stream data for data pipelines and event driven applications
and pipelines. Kafka guarantees at least once delivery within partitions. Think
of a Kafka partition as a long log file with indexes and guarantees of
ordering, a Kafka cluster is made up of many partitions. Applications will
have multiple partitions but you should put related events on the same
partition for ordering. Kafka improves speed and indepence over using RPC for
distributed events processing but it does remove the explicit dependencies.
Martin Fowler defines four types of event systems in his blog post:
What do you mean by “Event-Driven”?
- Event created: Only send an event an id. Downstream services will call
sender if they need more information.
- Event + Information: Event with id, event and state changed information.
No reliance on calling the sender, a downstream server has all it needs.
- Event Sourcing: All state changes are admitted as events and you can
rebuild the state of the world from replaying all events.
- Command Query Responsibility Segregation(CQRS): Split events into read
and write. Good reference here:
CQRS
Suggested using Avro for Kafka schemas. Scaling can be as easy as adding more
consumers but there needs to be metrics on latency. Consumers that are slower
than the requests coming in or paused for a while may have a hard time catching
up.
Takeaways
- Kafka can help scale applications but be careful about hidden dependencies or
coupling.
- Metricisize and make sure consumers keep up with producers and failures can
be recovered from.
Postgres 10, Performance, and You
Speaker: Gabe Enslein
Gabe is on the Heroku Postgres team and summarized some of the cool things
coming to Postgres 10.
Notes
- Native partitioning in default installation.
- Tables can be created from partitions
- Hash indexes are now first class citizens and don't have corruption issues.
This is big! When asked who uses hash indexes no one raised their hands, we
don't because of the WAL log warning and limited support.
- Much better parallel querying support. Gather is now ordered, this means
parallel scans and merge joins can have natural ordering.
Takeaways
- Partition support and better parallelization of queries is coming.
- Rails 4 doesn't support any of the new features and no native Ruby gems
support the partitoning or new Postgres specific indexes yet.
Five Sharding Data Models and Which is Right
Speaker: Craig Kerstiens
Notes
What is sharding? Sharding is seperatinga large database into smaller faster
databases. Tables on different nodes allow performance gains. Tips for
sharding:
- Shard on hash or range but still hash ranges so that they are distributed
across nodes. This helps prevent clustering.
- Define the number of shards up front and go larger than you expect to ever
go. This way you can add new nodes without needing to reshard.
- Can also bucket time series into tables to make easy bucketing and drop old
data.
Their are five general ways to colate data:
- Geography: For when there is clear geographic boundaries. Example Uber,
Instacart
- Multitenant: Each customer has their own shard. Will not work well if
one customer takes up disproportianate amount of database, >10% may mean
sharding doesn't help much.
- Entity ID: sharding on an id if there aren't joins that are needed. Best
for aggregations.
- Graph Database: shard on a few relation types and replicate duplicate
data. Check out paper TAO distributed graph datastore.
- Time Series: event data and metrics can be sharded by a time period.
Works best when dropping older data.
Takeaways
- Not sharding is easier, try to wait but when you commit plan for the big
scale.
- When looking to pick a sharding strategy. Use process of elimination instead
of guessing which way would work best.
Ales on Rails: Making a Smarter Brewery with Ruby
Speaker: Ben Shippee
This was more of a fun talk without many takeaways but fun to see someone
building a cool home built system for managing a brewery.
Notes
Rails application was custom built for Brew Gentleman. The goals were to make
managing the brewery easier. Features that they built to automate the brewery:
- What's on tap, kicked with easy management for workers.
- PDF menu and label generators generators.
- QR Code inventory management system.
- Special releases reservations.
- Lot's more!
Takeaways
- Sounds like a fun project and I definitely would have fun doing something
similar.
Containerizing Rails: Techniques, Pitfalls, & Best Practices
Speaker: Daniel Azuma
Blog post from speaker of the talk here: Containerizing Rails: Techniques, Pitfalls, and Best Practices (RailsConf 2018)
Notes
Tips for containerizing your application:
- Read and understand the base image.
- Combine update, install and clean commands in one run line to prevent bloat
of image.
- Use multi stage Docker files to have an image for building dependencies and
then copy over the built app without development dependencies.
- Set locale in the Dockerfile to potentially avoid some weird Ruby string
errors.
- Run your app under an unprivileged user still!
- Prefer the exec form:
CMD ["bundle", "exec", "rails", "s"]. This ensures
that the stop signals are sent to the program and not the shell.
- Can get around 6 with by prefacing cmd with
exec.
- Avoid using onbuild because it makes assumptions about how image is used.
- Always specify resource constraints to help Kubernete's plan workload.
- Avoid preforking in a container, instead have one process per container.
- Scale by adding containers.
- Send logs outside the container, either an agent or standard out.
Takeaways
- Keep your containers small, defined workloads and constrained within limits.
- Reduce image size by using build images and running multiple commands in line.