I am working on the Conwaylife problem on HDLBits (link: https://hdlbits.01xz.net/wiki/Conwaylife). It behaves much like a finite state machine: on each clock edge, the next value of the output is computed from its current value.
Following the earlier Rule90 and Rule110 problems (also on HDLBits), I put both the read and the update of the output register q in a single clocked always block. The code is below:
module top_module(
    input clk,
    input load,
    input [255:0] data,
    output [255:0] q );
    reg [3:0] neighbor_cnt;
    reg [3:0] a, b, c, d;
    reg [7:0] N;
    always @ (posedge clk) begin
        if (load) q <= data;
        else begin
            for (int i=0; i<16; i++) begin:row
                for (int j=0; j<16; j++) begin:column
                    a <= i-1;
                    b <= i+1;
                    c <= j-1;
                    d <= j+1; //overflow handling, wrap around naturally
                    N <= {q[a*16+c], q[a*16+d], q[b*16+c], q[b*16+d], q[i*16+c], q[i*16+d], q[a*16+j], q[b*16+j]};
                    neighbor_cnt <= N[0]+N[1]+N[2]+N[3]+N[4]+N[5]+N[6]+N[7];
                    case (neighbor_cnt)
                        2: q[i*16 + j] <= q[i*16 + j];
                        3: q[i*16 + j] <= 1;
                        default: q[i*16 + j] <= 0;
                    endcase
                end
            end
        end
    end
endmodule
The output is wrong: in simulation, q becomes all 0 at cycle 2 (cycle 1 loads the input).
After some debugging (e.g. running a single cycle and writing intermediate variables such as N and neighbor_cnt into q), I think q is somehow not updated in sync with the clock: it changes either too early or too late, and I cannot tell which. I then tried converting all assignments to blocking ones (I know blocking assignments are not appropriate in a clocked always block, but I wanted to see what would happen). The result is that q is delayed: the output expected at cycle 2 shows up at cycle 3, which corrupts the subsequent calculations.
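The pattern that may be involved can be reduced to a much smaller sketch. The module name nb_demo and all of its signals are hypothetical and are not part of the HDLBits problem. It shows only the scheduling rule for the two assignment types: the right-hand side of a non-blocking assignment is evaluated immediately, but the target is updated at the end of the time step, so a read of that target later in the same block still sees the value from the previous clock edge.

```verilog
// Hypothetical reduction, not part of the HDLBits submission.
module nb_demo (
    input            clk,
    input      [3:0] in,
    output reg [3:0] out_nb,   // fed by a temporary assigned with <=
    output reg [3:0] out_b     // fed by a temporary assigned with =
);
    reg [3:0] tmp_nb;
    reg [3:0] tmp_b;

    always @(posedge clk) begin
        tmp_nb <= in + 4'd1;   // update is deferred to the end of the time step
        out_nb <= tmp_nb;      // reads tmp_nb as it was before this clock edge

        tmp_b  = in + 4'd1;    // update takes effect immediately
        out_b <= tmp_b;        // reads the value computed on the line above
    end
endmodule
```

In this sketch out_nb lags out_b by one clock. If the same rule applies to a, b, c, d, N and neighbor_cnt in my loop, every iteration would read values left over from the previous clock edge, not the values for the current i and j.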
Only after looking at other people's attempts at this problem did I write code that works:
module top_module(
    input clk,
    input load,
    input [255:0] data,
    output [255:0] q );
    reg [3:0] neighbor_cnt;
    reg [3:0] a, b, c, d;
    reg [7:0] N;
    wire [255:0] q_next;
    always @ (*) begin
        for (int i=0; i<16; i++) begin:row
            for (int j=0; j<16; j++) begin:column
                a = i-1;
                b = i+1;
                c = j-1;
                d = j+1; //overflow handling, wrap around naturally
                N = {q[a*16+c], q[a*16+d], q[b*16+c], q[b*16+d], q[i*16+c], q[i*16+d], q[a*16+j], q[b*16+j]};
                neighbor_cnt = N[0]+N[1]+N[2]+N[3]+N[4]+N[5]+N[6]+N[7];
                case (neighbor_cnt)
                    2: q_next[i*16 + j] = q[i*16 + j];
                    3: q_next[i*16 + j] = 1;
                    default: q_next[i*16 + j] = 0;
                endcase
            end
        end
    end
    always @(posedge clk) begin
        if (load) q <= data;
        else q <= q_next;
    end
endmodule
The difference is that I used an intermediate wire q_next (not a register) and placed all the calculations in a combinational always block with blocking assignments. However, I am still confused about why this change makes the code work. If anyone can explain, I would deeply appreciate it.
