Rails Components and DB Basics

Jan 25, 2017

Deep Down Basic - What is Rails and its Components

I've recently asked "What's Rails components and how they works?" and I was like... no idea! (This was asked during a job interview and I think this question actually cost me the job!)

According to rails guide, it is a "web appliation development framework written in the Ruby language. It is designed to make programming web applications easier by making assumptions about what every developer needs to get started.... Rail is opinionated software. It makes the assumption that there is a best way to do things, and it's desinged to encourage that way. The Rails philosophy includes tow major guiding principles:

DRY: By not writting the same information over and over again, our code is more maintainable, more extensible, and less buggy.
Convention Over Configuration: Rails has opinions about the best way to do many things in a web application, and defauls to this set of conventions, rather than require that you specify every minutiae through endless configuration files.

Knowing cool stuff like how service object works is nice but it is very important to know the basic stuff like what they are made of. By the way, here is the answer:

ActionCable (Rails5)
ActionMailer - module for designing email service layers, especially email based on templates. It works a lot like you'd hope Rails email would, with controllers, actions and 'views' - which for email are test-based templates, not regular web-page templates.
ActionPack - handling and responding to web requests. It also includes routing - the mapping of an incoming URL to a controller and action in Rails. It also sets up your controllers and views, and shepherds a request through its controller action and then through rendering the view. For some of it, ActionPack uses Rack. The template rendering itself is done through an external gem like Erubis for .erb templates, or Haml for .haml templates. ActionPack also handles action-or view-centered functionality like view caching.

Abstract Controller
Action Controller
Action Dispatch
Action Pack

ActionView - handling view template lookup and rendering
ActiveJob - declaring jobs and making them run on a variety of queueing backends
ActiveModel - non-database functionality extracted from Rails 2 Active Record. It has validation functions and other database related methods e.g. validates :name, presence :true. ActiveModel hooks into features of your models that aren't really about the dabatase. Most commonly, ActiveModel implementations are ORMs, but they can also use non-relational storage like MongoDB, Redis or even local machine memory.
ActiveRecord - connects classes to relational database tables(migrations, associations)
ActiveSupport - contains all Ruby extensnions e.g. [].blank? ActiveSupport is a compatibility library including methods that aren't necessarily specific to Rails. ActiveSupport includes methods like how Rails changes words from single to plural, or CamelCase to snake_case. It also includes significantly better time and date support than the Ruby standard library.

Rails components are modules which are included by default in application.rb with require rails/all.

Why DB Indexing is necessary?

This is the second question I blew during the interview. If you have 10 milion users and if you want to render @users = User.order(:first_name), it will be terribly slow. How would you solve this problem?

When data is stored on disk based storate devices, it is stored as blocks of data. These blocks are accessed in their entirety, making them the atomic disk access operation. Disk blocks are structured in much the same way as linked lists; both contain a section for data, a pointer to the location of the next node(or block), and both need not be stored contiguously. Due to the fact that a number of records can only be sorted on one field, we can state that searching on a field that isn't stored isn't sorted requires a Linear Search O(N), which requires N/2 block accessed(on average), where N is the number of blocks that the table spans. If that field is a non-key field (i.e. doesn't contain unique entries) then the entire table space must be searched at N block accesses.

Whereas with a sorted field, a Binary Search may be used, this has log2N block accesses. Also since the data is sorted given a non-key field, the rest of the table doesn't need to be searched for duplicate values, once a higher value is found. Thus the performace increase is substantial.

So..... what is indexing?

Indexing is a way of sorting a number of records on multiple fields. Creating an index on a field in a table creates another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing Binary Searches to be performed on it.

The downside to indexing is that these indexes require additional space on the disk, since the indexes are stored together in a table using the MyISAM engine, this file can quickly reach the size limits of the underlying file system if many fields within the same table are indexed.

So going back to my initial problem: order(:first_name), the first_name field is neither sorted nor a key field, so a binary search is impossible, nor are the values unique, and thus the table will require searching to the end for an extract N block accesses. It is this situation that indexing aims to correct.

Sorted vs Unsorted Fields

Given that an index record contains only the indexed field and a pointer to the original record, it stands to reason that it will be smaller than multi-field record that is points to. So the index itself requires fewer disk blocks than the original table, which therefore requires fewer block accesses to iterate through. THe schema for an index on the first_name field is outlined below;

Downside of Indexing

Keep in mind that creating an index requires additional disk space, and that too many indexes can cause issues arising from the file systems size liimts, careful thought must be used to select the correct fields to index. Since indexes are only used to speed up the searching for a matching field within the records, it stands to reason that indexing fields used only for output would be simply a waste of disk space and processing time when doing an insert and delete operation, and thus should be avoided. Also given the nature of binary search, the cardinality or uniqueness of the data is important. Indexing on a field with a cardinality of 2 would split the data in half, whereas a cardinality of 1,000 would return approximately 1,000 records. With such a low cardinality the effectiveness is reduced to a linear sort, and the query optimizer will avid using the index if the cardinality is less than 30% of the record number, effectively making the index a waste of space.