The LAMP Stack
As per Wikipedia definition: LAMP is a solution stack of free, open source software. The acronym LAMP refers to the first letters of Linux (Operating System), Apache HTTP Server, MySQL (Database software), and PHP, Perl or Python, principal components to build a viable general purpose web server.
It is evident from the definition that Database is a fundamental component of most web applications. Even for developing static websites CMS software like WordPress, Joomla, Drupal are used all of which have a Database in the back end to store static content. If you are using PHP, Perl or Python then MySQL is an integral part of the development.
PHP is easy, is it a boon or bane?
PHP’s greatest strength has been its ease of use, any entry level programmer who has little bit knowledge of C Language and understands HTML / CSS / JavaScript to some extent can get a head start with PHP programming. But getting a head start does not ensure efficient completion of an expedition; it requires stability, fail safeness, sustainability & reliability. In order to write stable & reliable code along with building a solid, dependable, robust Database takes expertise and expertise comes from practice, experience & learning from mistakes. Here are 10 nasty mistakes I have seen PHP developers making quite often.
1. Choosing the wrong Storage Engine: MyISAM Vs. InnoDB
MySQL offers a number of storage engines but the most popular ones are MyISM and InnoDB.
MyISAM is the default storage engine. It is based on the older ISAM storage engine which is no longer available. Each MySIAM table is stored on disk in three files; <tablename>.frm (stores the table data format or structure), <tablename>.MYD (holds table data) and <tablename>.MYI (keeps table index information).
InnoDB is a high-reliability and high-performance storage engine for MySQL starting with MySQL 5.5, it is the default MySQL storage engine. It is ACID (Atomicity, Consistency, Isolation, Durability) compliant.
MyISAM uses table level locking compared to InnoDB which uses row-level-locking, table-level-locking can become a critical performance bottleneck when there a lot of concurrent writes to the table.
Here is the Wikipedia source for comparison between MyISAM and InnoDB.
It’s a common tendency among PHP developers to choose MyISAM over InnoDB because they consider MyISAM easier to use compared to InnoDB!
2. Not validating user inputs
There is no way you can trust user inputs. You need to validate every single strings using server side code, don’t ever rely on JavaScript. Here is the simplest SQL injection attack example (also my favorite for grilling programmers in interviews! It still works you know!) :
[php]
$username = $_POST[“name”];
$password = $_POST[“password”];
$sql = “SELECT userid FROM usertable WHERE username=’$username’ AND password=’$password’;”;
[/php]
This can be hacked by entering “admin’; –” in the username field and query will look like:
SELECT userid FROM usertable WHERE username=’admin’;
The hacker will easily log in as “admin”; s/he need not to know the password because it’s commented out of the SQL.
3. Not encrypting password field
It’s again a common tendency of PHP programmers to store password a string and not using any kind of encryption algorithm / function like base_64 or MD5. This again makes things easier for a hacker to use SQL injections. A password field must always be stored as MD5(‘password’) because MD5 is irreversible and even the DBA can’t see the password, obviously s/he can reset it using a simple UPDATE QUERY but the data in the password field is neither exposed to the programmer or to the DBA.
4. Using * in SELECT Queries
Never use * to return all columns in a query. When possible, explicitly specify the columns you want. When all the referenced columns (selected, filtered) in your query exists as an index (called a covering index), the database need not touch the table at all. It can be fully resolved from scanning the index only. By selecting all columns you remove this option from the optimizer. Also selecting fewer columns yields faster result.
5. Using the wrong data types
MySQL offers a range of numeric, string, and time data types. Put a little thought while choosing the data type for a field.
If you want to store Boolean value like 0 or 1 in an INTEGER type of field, use TINYINT (3) instead of INTEGER (10).
If you want to store single character values like ‘A’, ‘B’ or ‘T’, ‘F’ then use CHAR(1) instead of VARCHAR.
If you’re storing a date, use a DATE or DATETIME or TIMESTAMP field. Using an INTEGER or STRING can make SQL queries more complicated, if not impossible. TIMESTAMP also lets you set a default value as CURRENT_TIMESTAMP but remember only one TIMESTAMP field can have a default value of CURRENT_TIMESTAMP.
6. Opening & Closing Database Connections Properly
This is probably the most common laziness I have found in PHP developers. There are 3 most popular ways to connect to MySQL from PHP; 1) mysql_connect, 2) mysql_pconnect, and 3) PDO. Both mysql_connect and mysql_pconnect are deprecated as of PHP 5.5.0 and will be removed in future! So PDO remains the only option for developers.
If you are using mysql_connect then you must close every connection properly to avoid getting ‘too many connections’ error, mysql_pconnect is safer that way but since both of them are deprecated as of PHP 5.5.0 its time you should move to PDO for all your applications whether it’s a legacy system or a new system.
7. Not optimizing your queries
99% of PHP performance issues are caused by the database (read not optimized SQL Queries!), and a single bad SQL Query can play havoc with your web application. MySQL’s EXPLAIN statement can you help you a bit, you also need to turn on the slow query log, error log in your database server to understand which queries are performing poorly, which queries are throwing error and resolve accordingly.
8. Favoring PHP over MySQL
LAMP newbies often prefer solving a problem using PHP code rather than righting an efficient MySQL query.
E.g., the problem at hand is to compute average value of a given attribute values, newbie PHP developers would prefer to compute average using PHP loop but better way of doing the same is to use AVG() function of MySQL.
Never run SQL Queries inside loops, rather fetch the required records using one single Query and then process them using WHILE LOOP.
9. Under- or over-indexing
As a general rule of thumb, indexes should be applied to any column named in the WHERE clause of a SELECT query.
For example, assume we have a user table with a numeric ID (the primary key) and an email address. During log on, MySQL must locate the correct ID by searching for an email. With an index, MySQL can use a fast search algorithm to locate the email almost instantly. Without an index, MySQL must check every record in sequence until the address is found.
It’s tempting to add indexes to every column; however, they are regenerated during every table INSERT or UPDATE. That can hit performance; only add indexes when necessary.
10. Not using UTF-8
UTF-8 is important for internationalization. Although it won’t be properly supported in PHP until version 6.0, there’s little to prevent you setting MySQL character sets to UTF-8. Internationalization is critical because except Asian sub continent, UK, US and Australia other countries prefer their mother tongue over English.
Also don’t forget to backup your Databases regularly to make your applications fail safe as far as data is concerned.
MySQL may be the most widely used database for LAMP developers and most other open source technology developers (Ruby, Groovy, Python, etc) but it is not the only option. PostgreSQL & Firebird are the closest competitors. Microsoft provides SQL Server Express and Oracle supply 10g Express; both are free versions of the bigger enterprise editions. Even SQLite may be a viable alternative for smaller or embedded applications. With the new era of NOSQL there are other options like MongoDB, CouchDB for large volumes of data.
© Kamalika Guha Roy
Leave a Reply