Sunday, July 21, 2013

Aggregation: database vs GAMS

The following aggregation code in GAMS aggregates PRODUCTION and AREA HARVESTED from a relatively large data set from FAO (2.2 million records). Afterwards we recalculate YIELDS, as aggregating yields can not be done in the same way as PRODUCTION and AREA (this also famously occurs when aggregating prices: first do volume and value and then recalculate prices).


*-------------------------------------------
* Aggregation
*-------------------------------------------

set
   y(year)
/1990*2000/
   aggr_type(type)
'these quantities are aggregated directly' /
         
'Area Harvested'             Ha
         
'Production Quantity'
        Tonnes
   
/

;
parameter AggrCropData(cropgroup,region,y,type);

AggrCropData(cropgroup,region,y,aggr_type) =
    
sum
((cropmap(crop,cropgroup),regionmap(country, region),unit,flag),
             cropdata(country,crop,aggr_type,y,unit,flag));


*-------------------------------------------

* Recalculate Yield (Hg/Ha)
*-------------------------------------------

AggrCropData(cropgroup,region,y,
'Yield')$AggrCropData(cropgroup,region,y,'Area Harvested') =
    1e5*AggrCropData(cropgroup,region,y,
'Production Quantity'
) /
      AggrCropData(cropgroup,region,y,
'Area Harvested'
)

See also: http://yetanothermathprogrammingconsultant.blogspot.com/2013/05/large-scale-aggregation-example.html.

Of course we can also do this in SQL:

---

--- Aggregation

---

use aggregation;

 

---

--- if target table exists, drop it

---

IF EXISTS (SELECT * FROM sys.objects  WHERE object_id = OBJECT_ID('[dbo].[AggregatedData]') AND type='U')

DROP TABLE AggregatedData;

 

---

--- Step 1:

--- Aggregate Area and Production

---

 

SELECT C.CropGroup, R.Region, A.Element, A.[Year], Sum(A.Value) AS [Value]

INTO AggregatedData

FROM Production_Crops_E_All_Data AS A,

     cropmap AS C,

     regionmap AS R

WHERE

     A.Element In ('Area Harvested','Production Quantity') And

     A.[Year]>='1990' And A.[Year]<='2000' And

     A.Country=R.Country And

     A.Item=C.Crop

GROUP BY C.CropGroup, R.Region, A.Element, A.[Year];

 

 

---

--- Step 2:

--- Recalculate Yield

---

INSERT INTO AggregatedData ( CropGroup, Region, Element, [Year], [Value] )

SELECT A.CropGroup, A.Region, 'Yield', A.[Year], 1.0e5*A.[Value]/B.[Value]

FROM AggregatedData AS A,

     AggregatedData AS B

WHERE

     A.CropGroup = B.CropGroup AND

     A.Region = B.Region AND

     A.[Year] = B.[Year] AND

     A.Element = 'Production Quantity' AND

     B.Element = 'Area Harvested' AND

     B.Value > 0;

 

Interesting to see how well SQL server is doing compared to GAMS. Here are timings in seconds:

  Step1 Step2
GAMS 1.4 0.08
MS Access 5.4 0.04
SQL Server 1.4 0.06