Best Practices

Please consider the following before you start building your MariaDB database.

What are your goals for the database?

  • How will your database will be used?
  • Are there any security restrictions on the data?
  • Once built, will the database be static or do you plan to update it?
  • Who should have access and can they make changes to the database?

Planning your database

  • Knowing your data well is a crucial first step
    • datatypes, value ranges, missing data, row counts
    • primary keys
    • determine whether cleanup is required before importing
  • Feel free to talk with us about database concepts and architecture, engine, datatypes, importing, handling missing and null values and other questions
  • Table architecture should reflect the right mix of leveraging the efficiency of  relational data and usability for the analysis
  • Getting the data ready for importation
    • as you clean the data keep interim files in case you need to modify any changes
    • determine whether the delimiter is appropriate for loading the data (see Delimiters)

Fully document the preparation and building of the database

  • Design
  • Entity-relationship diagram
  • Steps in cleaning and preparing raw data for importation
  • Column naming is an important part of documentation
    • names should descriptive and usable
    • names should be consistent
    • avoid using spaces and quoted identifiers in names
    • consider how the names will be used in queries

Data definition language

  • Choosing appropriate data types
    • type, length and sign of numeric columns
    • features of fixed or variable length character, text and binary columns
    • row enumeration, auto-increment requirements
    • dates and time columns
    • consider the usability of text fields
      • consider how non-categorical text fields will be used
      • consider a separate (linked) table for large text fields; text may be analyzed differently
      • avoid string literals
    • the CREATE TABLE statement

Loading data

  • Determine whether you need to turn off logging during your data load
  • Analyze any warnings during loading
  • Test loads of tables are highly recommended
  • Import tests are useful to assess appropriateness of data type choices
  • Testing is a timesaver, especially if the full tables will be large
  • Testing may uncover unforeseen data problems

Analysis of your data using SQL

  • Index key columns
  • We recommend querying your data and building analysis tables in your ‘personal’ database rather than in your research (production) database.
  • Consider whether the use of temporary tables when doing complex joins will save time
  • Use SHOW EXPLAIN when your query is running to fine-tune the SQL 
  • Use SELECT * only if needed
  • data analysis may be run via SQL scripts in the client, on the command line or passed through code written in R, Python, SAS and other applications
  • Use ORDER BY only if needed; it is more efficient to sort in a front-end program

 

Resources:

MariaDB Knowledge Base

Database Building

Conceptual Design and the Entity-Relationship Diagram

          Table Statements

          Data Definition

SQL

Basic SQL Statements

Getting Data from MariaDB

Useful MariaDB Queries

Joining Tables

Operators and Built-in Functions

Data Manipulation: SELECT, INSERT, UPDATE, DELETE