Please consider the following before you start building your MariaDB database.
What are your goals for the database?
- How will your database will be used?
- Are there any security restrictions on the data?
- Once built, will the database be static or do you plan to update it?
- Who should have access and can they make changes to the database?
Planning your database
Knowing your data well is a crucial first step
- datatypes, value ranges, missing data, row counts
- primary keys
- determine whether cleanup is required before importing
- Feel free to talk with us about database concepts and architecture, engine, datatypes, importing, handling missing and null values and other questions
- Table architecture should reflect the right mix of leveraging the efficiency of relational data and usability for the analysis
Getting the data ready for importation
- as you clean the data keep interim files in case you need to modify any changes
- determine whether the delimiter is appropriate for loading the data (see Delimiters)
Fully document the preparation and building of the database
- Entity-relationship diagram
- Steps in cleaning and preparing raw data for importation
Column naming is an important part of documentation
- names should descriptive and usable
- names should be consistent
- avoid using spaces and quoted identifiers in names
- consider how the names will be used in queries
Data definition language
Choosing appropriate data types
- type, length and sign of numeric columns
- features of fixed or variable length character, text and binary columns
- row enumeration, auto-increment requirements
- dates and time columns
consider the usability of text fields
- consider how non-categorical text fields will be used
- consider a separate (linked) table for large text fields; text may be analyzed differently
- avoid string literals
- the CREATE TABLE statement
- Determine whether you need to turn off logging during your data load
- Analyze any warnings during loading
- Test loads of tables are highly recommended
- Import tests are useful to assess appropriateness of data type choices
- Testing is a timesaver, especially if the full tables will be large
- Testing may uncover unforeseen data problems
Analysis of your data using SQL
- Index key columns
- We recommend querying your data and building analysis tables in your ‘personal’ database rather than in your research (production) database.
- Consider whether the use of temporary tables when doing complex joins will save time
- Use SHOW EXPLAIN when your query is running to fine-tune the SQL
- Use SELECT * only if needed
- data analysis may be run via SQL scripts in the client, on the command line or passed through code written in R, Python, SAS and other applications
- Use ORDER BY only if needed; it is more efficient to sort in a front-end program
Data Manipulation: SELECT, INSERT, UPDATE, DELETE