• Español
  • Blog

The road to using Mysql

Getting the most from the open source database Mysql
  • Home
  • Contact
  • Log in
  • The road to using Mysql

  • A resource blog for those seeking to utilise Mysql, by Mark Nessfield. Useful tips and links for all who need to utilise their Mysql database to the max.

  • Categories

    • All
    • Daily Commute - Standard hints/tips
    • Formula 1 - More advanced stuff
    • Highway Maintenance - DBA stuff
    • Off-Road - Unrelated (?) stuff
    • Showroom - Examples of Mysql usage
    • Uncategorized
    • Recently
    • Archives
    • Categories
    • Latest comments
  • Search




  • XML Feeds

    • RSS 2.0: Posts, Comments
    • Atom: Posts, Comments
    What is RSS?

Checking for non-text entries.

By admin on Nov 23, 2008 | In Uncategorized, Daily Commute - Standard hints/tips | Send feedback »

One common task in data cleansing is to check that a text field contains valid data. Control characters, initials or even numbers can be unwelcome values in some fields.
If we're merely checking that the cell contains text, as opposed to numbers, I find it easier to do this in Excel and export if necessary.
For those interested, the formula to do this would be something like:
=IF(ISTEXT(A3),"Good","False")
However, we have several options of addressing this problem in Mysql

One interesting way of doing this could be to check that the characters contain a vowel.
However if we just use the substring function as follows, we could be around a long time. The following code shows how we would start this.

Code:

SELECT firstname FROM `customer`
WHERE substring(firstname,1,1) not in ('A','E', 'I', 'O', 'U')
AND substring(firstname,2,1) not in ('A','E', 'I', 'O', 'U')
AND substring(firstname,3,1) not in ('A','E', 'I', 'O', 'U')

A better way would be to use the Locate function to see if a word contains a vowel. If the string is not found then locate returns zero.

Code:

SELECT firstname FROM `customer`
WHERE locate('A', firstname) = 0
and locate('E',firstname) = 0
and locate('I',firstname) = 0
and locate('O',firstname) = 0
and locate('U',firstname) = 0

Although I used locate I could easily have used the function instr instead.
Remember though that the syntax order is different

Code:

ie  instr(<search string>, <pattern>)


Yet another way would be to use NOT LIKE.

Code:

SELECT firstname FROM `customer`
WHERE firstname not like '%A%'
AND firstname not like '%E%'
AND firstname not like '%I%'
AND firstname not like '%O%'
AND firstname not like '%U%'

Some of you may have realised there are flaws in using the above. What about case? Maybe some entries just have initials? There could be acronyms or even the word rhythm (no a,e,i,o or u)!!
Perhaps a better way would be to make sure the value of the character has a proper ascii value.
For instance, if I used the following bit of code to test the first character of a field, I can bring back those records that start with a number (or have a non-standard character value).

Code:

select firstname, ascii(substring(firstname,1,1)), gender
from customer
where ascii(substring(firstname,1,1)) not between 65 and 122
and ascii(substring(firstname,1,1)) > 0

If numbers are valid here, we can test for values between 48 and 122. Here's a link to ascii table values:
ascii table

Of course the above code only works on testing the first character of the string.
In order to test every character of the string we need to iterate through a string's characters. I will show how to do this in my next entry.

Lastly, for advanced programmers, there is a great feature in Mysql - the ability to use regular expressions ie regexp
Again, this needs a greater explanation than I have time for on this entry. However there is a section devoted to this on the Mysql manual pages.
mysql manual regexp
Also, there are many sites that give tutorials in using regular expressions.
Here's a good one regular-expressions

If you do decide to use regular expressions, remember that although they can be very powerful, they can also hit performance issues on large tables.

In conclusion, there are many ways of approaching this sort of task in Mysql. I hope the above notes have given you a good idea of how we can do this.

Tags: data analysis, excel, instr, locate, regexp

More useful GROUP BY stuff.

By admin on Oct 18, 2008 | In Daily Commute - Standard hints/tips | Send feedback »

Hi, I've not gone into hibernation (yet)! I've been busy working on SqlServer queries in my job. One useful clause I use there is "top" to limit how many rows your query brings back. Now of course we have an equivalent in Mysql, the "LIMIT" clause. This is very useful to impose order on the rows your query brings back; you don't just have the facility to restrict the query to the first x rows. You can use OFFSET to skip rows. For example if I want to bring back 5 rows from the table EMP, but not the 5 lowest values, I could use the following statement:

Code:

select sal from emp order by sal limit 5 offset 5;
+------+
| sal  |
+------+
| 1250 |
| 1300 |
| 1500 |
| 1600 |
| 2450 |
+------+
5 rows in set (0.01 sec)

Now another useful point about the LIMIT clause is you can use it on GROUP BY queries. It makes sense really, as the LIMIT clause is working on the record rows that are returned for display. Imagine that you need to group by a certain field and sum on a key value. Using the GROUP BY and ORDER BY clauses together will bring back results in ascending or (if specified by DESC) order. However, it may be that large data sets will produce more rows than you're interested in. Using the LIMIT clause will restrict the number of rows you bring back:

Code:

mysql> select deptno, sum(sal) from emp
    -> group by deptno
    -> order by sum(sal) desc
    -> limit 2;
+--------+----------+
| deptno | sum(sal) |
+--------+----------+
|     20 |    10875 |
|     30 |    10650 |
+--------+----------+

I think the combination of GROUP BY with other functions and clauses, is what really makes sql powerful in analyzing data-sets. The following code demonstrates this, as it relies on the use of the UNION operator (remember to use the same number of columns and column data types in each select statement) to display different summaries in the same report:

Code:

mysql> select deptno, job,
    -> 'TOTAL BY DEPT AND JOB' as category,
    -> sum(sal) as sal
    -> from emp
    -> group by deptno, job
    -> UNION ALL
    -> select deptno, null,
    -> 'TOTAL BY DEPT', sum(sal)
    -> from emp
    -> group by deptno
    -> UNION ALL
    -> select null, null,
    -> 'GRAND TOTAL FOR TABLE', sum(sal)
    -> from emp;
+--------+-----------+-----------------------+-------+
| deptno | job       | category              | sal   |
+--------+-----------+-----------------------+-------+
|     10 | CLERK     | TOTAL BY DEPT AND JOB |  1300 |
|     10 | MANAGER   | TOTAL BY DEPT AND JOB |  2450 |
|     10 | PRESIDENT | TOTAL BY DEPT AND JOB |  5000 |
|     20 | ANALYST   | TOTAL BY DEPT AND JOB |  6000 |
|     20 | CLERK     | TOTAL BY DEPT AND JOB |  1900 |
|     20 | MANAGER   | TOTAL BY DEPT AND JOB |  2975 |
|     30 | CLERK     | TOTAL BY DEPT AND JOB |   950 |
|     30 | MANAGER   | TOTAL BY DEPT AND JOB |  2850 |
|     30 | SALESMAN  | TOTAL BY DEPT AND JOB |  6850 |
|     10 | NULL      | TOTAL BY DEPT         |  8750 |
|     20 | NULL      | TOTAL BY DEPT         | 10875 |
|     30 | NULL      | TOTAL BY DEPT         | 10650 |
|   NULL | NULL      | GRAND TOTAL FOR TABLE | 30275 |
+--------+-----------+-----------------------+-------+
13 rows in set (0.54 sec)

Pretty useful, hey? Before I leave, I have to admit to getting most of my code examples this time from the book SQL CookBook, published by O'Reilly. This really is a book worth purchasing, even if you're just starting in SQL or Mysql programming. You can find the book at this site booklink
Plug over - I wish you a profitable time working with your database of choice, and I hope you visit my site again soon.

Beautiful code

By admin on Aug 9, 2008 | In Daily Commute - Standard hints/tips, Formula 1 - More advanced stuff | Send feedback »

A bit of a delay since my last entry. I've started a new job which uses SQLServer (boo!) so my use of Mysql has been very small lately.
Anyway, let's take a little look at the group by clause. One of the most common ways I use it is to bring back records that are duplicated more than once. For example:

Code:

mysql> select ename, job, deptno,count(*) from emp group by ename, job, deptno having count(*) > 1

;
+-------+----------+--------+----------+
| ename | job | deptno | count(*) |
+-------+----------+--------+----------+
| WARD | SALESMAN | 30 | 2 |
+-------+----------+--------+----------+
1 row in set (0.00 sec)

Now how would you bring back (to delete or update) the latest insert of this duplicated record? In this case there are 2 records, one with empno of 7521, the latest record has the empno 8000. The following code (in my humble opinion) is very elegant, and is perhaps my favourite bit of sql I've seen.. thanks to this book sqlcookbook

Code:

mysql> select * from emp
    -> where empno not in (select min(empno)
    ->                     from emp
    ->                     group by ename, job, deptno);

+-------+-------+----------+------+------------+------+------+--------+
| empno | ename | job | mgr | hiredate | sal | comm | deptno |
+-------+-------+----------+------+------------+------+------+--------+
| 8000 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250 | 500 | 30 |
+-------+-------+----------+------+------------+------+------+--------+
1 row in set (0.01 sec)

Here's another way of using the group by clause (again using subqueries) to bring back the employees with the highest and lowest salary in the dataset.

Code:

mysql> select ename, job, deptno
    -> from emp
    -> where sal in ((select min(sal) from emp),
    ->               (select max(sal) from emp));

+-------+-----------+--------+
| ename | job | deptno |
+-------+-----------+--------+
| SMITH | CLERK | 20 |
| KING | PRESIDENT | 10 |
+-------+-----------+--------+
2 rows in set (0.10 sec)

Now that, to me is also beautiful code.
Enjoy your coding - Until next time!

Tags: duplicates, group by
1 2 3 4 >>
  • Some of the best links you'll use

    Gentle Intro to sql
    Mysql Cheat Sheet
    Database tutorials
    MySql Magazine
    10 tips for optimizing mysql queries
    Mysql 5.0 Reference Manual
    MySQL dba Feed Resource
    dba stuff on users/privileges
  • Top MySql blogs

    Planet Mysql
    Jay Pipes
    Performance blog

    My other sites
    My home site
    My Wordpress blog
powered by b2evolution

©2009 by admin | Contact | evoCamp skin | Credits: Blog Design | blogging tool | UK hosting | Francois