The following aggregation code in GAMS aggregates PRODUCTION and AREA HARVESTED from a relatively large data set from FAO (2.2 million records). Afterwards we recalculate YIELDS, as aggregating yields can not be done in the same way as PRODUCTION and AREA (this also famously occurs when aggregating prices: first do volume and value and then recalculate prices).
|
See also: http://yetanothermathprogrammingconsultant.blogspot.com/2013/05/large-scale-aggregation-example.html.
Of course we can also do this in SQL:
--- --- Aggregation --- use aggregation;
--- --- if target table exists, drop it --- IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID('[dbo].[AggregatedData]') AND type='U') DROP TABLE AggregatedData;
--- --- Step 1: --- Aggregate Area and Production ---
SELECT C.CropGroup, R.Region, A.Element, A.[Year], Sum(A.Value) AS [Value] INTO AggregatedData FROM Production_Crops_E_All_Data AS A, cropmap AS C, regionmap AS R WHERE A.Element In ('Area Harvested','Production Quantity') And A.[Year]>='1990' And A.[Year]<='2000' And A.Country=R.Country And A.Item=C.Crop GROUP BY C.CropGroup, R.Region, A.Element, A.[Year];
--- --- Step 2: --- Recalculate Yield --- INSERT INTO AggregatedData ( CropGroup, Region, Element, [Year], [Value] ) SELECT A.CropGroup, A.Region, 'Yield', A.[Year], 1.0e5*A.[Value]/B.[Value] FROM AggregatedData AS A, AggregatedData AS B WHERE A.CropGroup = B.CropGroup AND A.Region = B.Region AND A.[Year] = B.[Year] AND A.Element = 'Production Quantity' AND B.Element = 'Area Harvested' AND B.Value > 0;
|
Interesting to see how well SQL server is doing compared to GAMS. Here are timings in seconds:
Step1 | Step2 | |
GAMS | 1.4 | 0.08 |
MS Access | 5.4 | 0.04 |
SQL Server | 1.4 | 0.06 |
No comments:
Post a Comment