Clustering is the process of partitioning a group of data points into a small number of clusters. For instance, the items in a supermarket are clustered in categories (butter, cheese and milk are grouped in dairy products). Of course this is a qualitative kind of partitioning. A quantitative approach would be to measure certain features of the products, say percentage of milk and others, and products with high percentage of milk would be grouped together. In general, we have $n$ data points $\b{x}_i,i=1...n$ that have to be partitioned in $k$ clusters. The goal is to assign a cluster to each data point. Kmeans is a clustering method that aims to find the positions $\b{\mu}_i,i=1...k$ of the clusters that minimize the distance from the data points to the cluster. Kmeans clustering solves $$\arg\min_\b{c} \sum_{i=1}^k\sum_{\b{x}\in c_i} d(\b{x},\mu_i) = \arg\min_\b{c} \sum_{i=1}^k\sum_{\b{x}\in c_i} \left\Vert \b{x}\mu_i \right\Vert_2^2$$ where $\b{c}_i$ is the set of points that belong to cluster $i$. The Kmeans clustering uses the square of the Euclidean distance $d(\b{x},\mu_i) = \left\Vert \b{x}\mu_i \right\Vert_2^2$. This problem is not trivial (in fact it is NPhard), so the Kmeans algorithm only hopes to find the global minimum, possibly getting stuck in a different solution.
The Lloyd's algorithm, mostly known as kmeans algorithm, is used to solve the kmeans clustering problem and works as follows. First, decide the number of clusters $k$. Then:
1. Initialize the center of the clusters  $\b{\mu}_i =$ some value $, i=1,...,k$ 
2. Attribute the closest cluster to each data point  $$\b{c}_i = \{j: d(\b{x}_j, \mu_i) \le d(\b{x}_j, \mu_l), l \ne i, j=1,...,n\} $$ 
3. Set the position of each cluster to the mean of all data points belonging to that cluster  $\mu_i = \frac{1}{c_i}\sum_{j\in c_i} \b{x}_j,\forall i$ 
4. Repeat steps 23 until convergence  
Notation  $\b{c} =$ number of elements in $\b{c}$ 
The algorithm eventually converges to a point, although it is not necessarily the minimum of the sum of squares. That is because the problem is nonconvex and the algorithm is just a heuristic, converging to a local minimum. The algorithm stops when the assignments do not change from one iteration to the next.
The number of clusters should match the data. An incorrect choice of the number of clusters will invalidate the whole process. An empirical way to find the best number of clusters is to try Kmeans clustering with different number of clusters and measure the resulting sum of squares.
It is really up to you! Here are some common methods:
Initialization method: number of clusters = 1. Initialize clusters
2. Assign data points to closer cluster
3. Calculate center of each cluster

In this scatter plot you have several twodimensional data points, clustered at 4 distinct positions. You can choose the initialization method and the number of clusters used in the kmeans algorithm. The button 'Reset' resets the algorithm and generates a new dataset. The button 'Iterate' runs one step of the algorithm, which becomes bolded in the text below the button. More often than not, you see that the algorithm converges to the best solution. However, if you try enough times, there are some initializations of the clusters that lead to a "bad" local minimum. If you choose the wrong number of clusters, you can see the drastic effects on the result of the algorithm.
The kmeans algorithm works reasonably well when the data fits the cluster model:
Initialization method: number of clusters = 1. Initialize clusters
2. Assign data points to closer cluster
3. Calculate center of each cluster

Although the clusters have the same scatter (in fact, the same shape), they are not spherical. As before, you can choose the initialization method and the number of clusters used in the kmeans algorithm. The button 'Reset' resets the algorithm and generates a new dataset. The button 'Iterate' runs one step of the algorithm, which becomes bolded in the text below the button. The kmeans algorithm never assigns correctly the tips of the shapes because the spherical assumption fails.
Initialization method: number of clusters = 1. Initialize clusters
2. Assign data points to closer cluster
3. Calculate center of each cluster

Although the clusters have spherical shapes, they have different scatters. As before, you can choose the initialization method and the number of clusters used in the kmeans algorithm. The button 'Reset' resets the algorithm and generates a new dataset. The button 'Iterate' runs one step of the algorithm, which becomes bolded in the text below the button. The data points that clearly belong to the large cluster but are closer to the small clusters are misclassified.